Documentation update.
parent
d5dc4e0c33
commit
16d47a9cb1
|
@ -1762,17 +1762,22 @@ subject string does not happen. The first match attempt is run starting from
|
||||||
the overall result is "no match".
|
the overall result is "no match".
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
There are also other start-up optimizations. For example, a minimum length for
|
Another start-up optimization makes use of a minimum length for a matching
|
||||||
the subject may be recorded. Consider the pattern
|
subject, which is recorded when possible. Consider the pattern
|
||||||
<pre>
|
<pre>
|
||||||
(*MARK:A)(X|Y)
|
(*MARK:1)B(*MARK:2)(X|Y)
|
||||||
</pre>
|
</pre>
|
||||||
The minimum length for a match is one character. If the subject is "ABC", there
|
The minimum length for a match is two characters. If the subject is "XXBB", the
|
||||||
will be attempts to match "ABC", "BC", and "C". An attempt to match an empty
|
"starting character" optimization skips "XX", then tries to match "BB", which
|
||||||
string at the end of the subject does not take place, because PCRE2 knows that
|
is long enough. In the process, (*MARK:2) is encountered and remembered. When
|
||||||
the subject is now too short, and so the (*MARK) is never encountered. In this
|
the match attempt fails, the next "B" is found, but there is only one character
|
||||||
case, the optimization does not affect the overall match result, which is still
|
left, so there are no more attempts, and "no match" is returned with the "last
|
||||||
"no match", but it does affect the auxiliary information that is returned.
|
mark seen" set to "2". If NO_START_OPTIMIZE is set, however, matches are tried
|
||||||
|
at every possible starting position, including at the end of the subject, where
|
||||||
|
(*MARK:1) is encountered, but there is no "B", so the "last mark seen" that is
|
||||||
|
returned is "1". In this case, the optimizations do not affect the overall
|
||||||
|
match result, which is still "no match", but they do affect the auxiliary
|
||||||
|
information that is returned.
|
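The walk-through in the new text can be traced with a small hand-written model. This is only a sketch: it hard-codes the pattern (*MARK:1)B(*MARK:2)(X|Y), it does not call PCRE2, and the names `try_at`, `run_match`, `FIRST_CHAR`, and `MIN_LENGTH` are hypothetical. It assumes the two start-up optimizations behave as the paragraph describes (skip to the next "B", and stop once fewer than two characters remain).

```python
FIRST_CHAR = "B"   # assumed "starting character" for this pattern
MIN_LENGTH = 2     # assumed recorded minimum match length


def try_at(subject, pos):
    """Model one match attempt of (*MARK:1)B(*MARK:2)(X|Y) at pos.

    Returns (matched, last_mark_seen_in_this_attempt).
    """
    mark = "1"                                   # (*MARK:1) comes first
    if pos >= len(subject) or subject[pos] != "B":
        return False, mark
    mark = "2"                                   # (*MARK:2) follows the "B"
    if pos + 1 < len(subject) and subject[pos + 1] in "XY":
        return True, mark
    return False, mark


def run_match(subject, start_optimize=True):
    """Return (matched, last_mark_seen) over all attempted start positions."""
    last_mark = None
    pos = 0
    while pos <= len(subject):
        if start_optimize:
            # Skip ahead to the next possible starting character.
            pos = subject.find(FIRST_CHAR, pos)
            # Give up if there is no such character or too little text left.
            if pos < 0 or len(subject) - pos < MIN_LENGTH:
                break
        matched, last_mark = try_at(subject, pos)
        if matched:
            return True, last_mark
        pos += 1
    return False, last_mark


print(run_match("XXBB"))                        # model of the optimized case
print(run_match("XXBB", start_optimize=False))  # model of NO_START_OPTIMIZE
```

In this model the overall result for "XXBB" is "no match" either way; only the reported mark differs, which is the point the paragraph makes.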
||||||
<pre>
|
<pre>
|
||||||
PCRE2_NO_UTF_CHECK
|
PCRE2_NO_UTF_CHECK
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -3831,7 +3836,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 23 May 2019
|
Last updated: 30 May 2019
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2019 University of Cambridge.
|
Copyright © 1997-2019 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@ -741,13 +741,22 @@ ignored when used with <b>-L</b> (list files without matches), because the grand
|
||||||
total would always be zero.
|
total would always be zero.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
<b>-u</b>, <b>--utf-8</b>
|
<b>-u</b>, <b>--utf</b>
|
||||||
Operate in UTF-8 mode. This option is available only if PCRE2 has been compiled
|
Operate in UTF-8 mode. This option is available only if PCRE2 has been compiled
|
||||||
with UTF-8 support. All patterns (including those for any <b>--exclude</b> and
|
with UTF-8 support. All patterns (including those for any <b>--exclude</b> and
|
||||||
<b>--include</b> options) and all subject lines that are scanned must be valid
|
<b>--include</b> options) and all subject lines that are scanned must be valid
|
||||||
strings of UTF-8 characters.
|
strings of UTF-8 characters.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
|
<b>-U</b>, <b>--utf-allow-invalid</b>
|
||||||
|
As <b>--utf</b>, but in addition subject lines may contain invalid UTF-8 code
|
||||||
|
unit sequences. These can never form part of any pattern match. This facility
|
||||||
|
allows valid UTF-8 strings to be sought in executable or other binary files.
|
||||||
|
For more details about matching in non-valid UTF-8 strings, see the
|
||||||
|
<a href="pcre2unicode.html"><b>pcre2unicode</b>(3)</a>
|
||||||
|
documentation.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
<b>-V</b>, <b>--version</b>
|
<b>-V</b>, <b>--version</b>
|
||||||
Write the version numbers of <b>pcre2grep</b> and the PCRE2 library to the
|
Write the version numbers of <b>pcre2grep</b> and the PCRE2 library to the
|
||||||
standard output and then exit. Anything else on the command line is
|
standard output and then exit. Anything else on the command line is
|
||||||
|
@ -806,9 +815,9 @@ as in the GNU <b>grep</b> program. Any long option of the form
|
||||||
<b>--file-offsets</b>, <b>--heap-limit</b>, <b>--include-dir</b>,
|
<b>--file-offsets</b>, <b>--heap-limit</b>, <b>--include-dir</b>,
|
||||||
<b>--line-offsets</b>, <b>--locale</b>, <b>--match-limit</b>, <b>-M</b>,
|
<b>--line-offsets</b>, <b>--locale</b>, <b>--match-limit</b>, <b>-M</b>,
|
||||||
<b>--multiline</b>, <b>-N</b>, <b>--newline</b>, <b>--om-separator</b>,
|
<b>--multiline</b>, <b>-N</b>, <b>--newline</b>, <b>--om-separator</b>,
|
||||||
<b>--output</b>, <b>-u</b>, and <b>--utf-8</b> options are specific to
|
<b>--output</b>, <b>-u</b>, <b>--utf</b>, <b>-U</b>, and <b>--utf-allow-invalid</b>
|
||||||
<b>pcre2grep</b>, as is the use of the <b>--only-matching</b> option with a
|
options are specific to <b>pcre2grep</b>, as is the use of the
|
||||||
capturing parentheses number.
|
<b>--only-matching</b> option with a capturing parentheses number.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
Although most of the common options work the same way, a few are different in
|
Although most of the common options work the same way, a few are different in
|
||||||
|
@ -971,9 +980,9 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC16" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC16" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 24 November 2018
|
Last updated: 28 May 2019
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2018 University of Cambridge.
|
Copyright © 1997-2019 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
<p>
|
<p>
|
||||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
|
1607
doc/pcre2.txt
File diff suppressed because it is too large
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2API 3 "23 May 2019" "PCRE2 10.34"
|
.TH PCRE2API 3 "30 May 2019" "PCRE2 10.34"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.sp
|
.sp
|
||||||
|
@ -1701,17 +1701,22 @@ subject string does not happen. The first match attempt is run starting from
|
||||||
"D" and when this fails, (*COMMIT) prevents any further matches being tried, so
|
"D" and when this fails, (*COMMIT) prevents any further matches being tried, so
|
||||||
the overall result is "no match".
|
the overall result is "no match".
|
||||||
.P
|
.P
|
||||||
There are also other start-up optimizations. For example, a minimum length for
|
Another start-up optimization makes use of a minimum length for a matching
|
||||||
the subject may be recorded. Consider the pattern
|
subject, which is recorded when possible. Consider the pattern
|
||||||
.sp
|
.sp
|
||||||
(*MARK:A)(X|Y)
|
(*MARK:1)B(*MARK:2)(X|Y)
|
||||||
.sp
|
.sp
|
||||||
The minimum length for a match is one character. If the subject is "ABC", there
|
The minimum length for a match is two characters. If the subject is "XXBB", the
|
||||||
will be attempts to match "ABC", "BC", and "C". An attempt to match an empty
|
"starting character" optimization skips "XX", then tries to match "BB", which
|
||||||
string at the end of the subject does not take place, because PCRE2 knows that
|
is long enough. In the process, (*MARK:2) is encountered and remembered. When
|
||||||
the subject is now too short, and so the (*MARK) is never encountered. In this
|
the match attempt fails, the next "B" is found, but there is only one character
|
||||||
case, the optimization does not affect the overall match result, which is still
|
left, so there are no more attempts, and "no match" is returned with the "last
|
||||||
"no match", but it does affect the auxiliary information that is returned.
|
mark seen" set to "2". If NO_START_OPTIMIZE is set, however, matches are tried
|
||||||
|
at every possible starting position, including at the end of the subject, where
|
||||||
|
(*MARK:1) is encountered, but there is no "B", so the "last mark seen" that is
|
||||||
|
returned is "1". In this case, the optimizations do not affect the overall
|
||||||
|
match result, which is still "no match", but they do affect the auxiliary
|
||||||
|
information that is returned.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_NO_UTF_CHECK
|
PCRE2_NO_UTF_CHECK
|
||||||
.sp
|
.sp
|
||||||
|
@ -3843,6 +3848,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 23 May 2019
|
Last updated: 30 May 2019
|
||||||
Copyright (c) 1997-2019 University of Cambridge.
|
Copyright (c) 1997-2019 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -650,7 +650,7 @@ with UTF-8 support. All patterns (including those for any \fB--exclude\fP and
|
||||||
\fB--include\fP options) and all subject lines that are scanned must be valid
|
\fB--include\fP options) and all subject lines that are scanned must be valid
|
||||||
strings of UTF-8 characters.
|
strings of UTF-8 characters.
|
||||||
.TP
|
.TP
|
||||||
\fb-U\fP, \fB--utf-allow-invalid\fP
|
\fB-U\fP, \fB--utf-allow-invalid\fP
|
||||||
As \fB--utf\fP, but in addition subject lines may contain invalid UTF-8 code
|
As \fB--utf\fP, but in addition subject lines may contain invalid UTF-8 code
|
||||||
unit sequences. These can never form part of any pattern match. This facility
|
unit sequences. These can never form part of any pattern match. This facility
|
||||||
allows valid UTF-8 strings to be sought in executable or other binary files.
|
allows valid UTF-8 strings to be sought in executable or other binary files.
|
||||||
|
|
|
@ -719,47 +719,54 @@ OPTIONS
|
||||||
(list files without matches), because the grand total would
|
(list files without matches), because the grand total would
|
||||||
always be zero.
|
always be zero.
|
||||||
|
|
||||||
-u, --utf-8
|
-u, --utf Operate in UTF-8 mode. This option is available only if PCRE2
|
||||||
Operate in UTF-8 mode. This option is available only if PCRE2
|
|
||||||
has been compiled with UTF-8 support. All patterns (including
|
has been compiled with UTF-8 support. All patterns (including
|
||||||
those for any --exclude and --include options) and all sub-
|
those for any --exclude and --include options) and all sub-
|
||||||
ject lines that are scanned must be valid strings of UTF-8
|
ject lines that are scanned must be valid strings of UTF-8
|
||||||
characters.
|
characters.
|
||||||
|
|
||||||
|
-U, --utf-allow-invalid
|
||||||
|
As --utf, but in addition subject lines may contain invalid
|
||||||
|
UTF-8 code unit sequences. These can never form part of any
|
||||||
|
pattern match. This facility allows valid UTF-8 strings to be
|
||||||
|
sought in executable or other binary files. For more details
|
||||||
|
about matching in non-valid UTF-8 strings, see the pcre2uni-
|
||||||
|
code(3) documentation.
|
||||||
|
|
||||||
-V, --version
|
-V, --version
|
||||||
Write the version numbers of pcre2grep and the PCRE2 library
|
Write the version numbers of pcre2grep and the PCRE2 library
|
||||||
to the standard output and then exit. Anything else on the
|
to the standard output and then exit. Anything else on the
|
||||||
command line is ignored.
|
command line is ignored.
|
||||||
|
|
||||||
-v, --invert-match
|
-v, --invert-match
|
||||||
Invert the sense of the match, so that lines which do not
|
Invert the sense of the match, so that lines which do not
|
||||||
match any of the patterns are the ones that are found.
|
match any of the patterns are the ones that are found.
|
||||||
|
|
||||||
-w, --word-regex, --word-regexp
|
-w, --word-regex, --word-regexp
|
||||||
Force the patterns only to match "words". That is, there must
|
Force the patterns only to match "words". That is, there must
|
||||||
be a word boundary at the start and end of each matched
|
be a word boundary at the start and end of each matched
|
||||||
string. This is equivalent to having "\b(?:" at the start of
|
string. This is equivalent to having "\b(?:" at the start of
|
||||||
each pattern, and ")\b" at the end. This option applies only
|
each pattern, and ")\b" at the end. This option applies only
|
||||||
to the patterns that are matched against the contents of
|
to the patterns that are matched against the contents of
|
||||||
files; it does not apply to patterns specified by any of the
|
files; it does not apply to patterns specified by any of the
|
||||||
--include or --exclude options.
|
--include or --exclude options.
|
||||||
|
|
||||||
-x, --line-regex, --line-regexp
|
-x, --line-regex, --line-regexp
|
||||||
Force the patterns to start matching only at the beginnings
|
Force the patterns to start matching only at the beginnings
|
||||||
of lines, and in addition, require them to match entire
|
of lines, and in addition, require them to match entire
|
||||||
lines. In multiline mode the match may be more than one line.
|
lines. In multiline mode the match may be more than one line.
|
||||||
This is equivalent to having "^(?:" at the start of each pat-
|
This is equivalent to having "^(?:" at the start of each pat-
|
||||||
tern and ")$" at the end. This option applies only to the
|
tern and ")$" at the end. This option applies only to the
|
||||||
patterns that are matched against the contents of files; it
|
patterns that are matched against the contents of files; it
|
||||||
does not apply to patterns specified by any of the --include
|
does not apply to patterns specified by any of the --include
|
||||||
or --exclude options.
|
or --exclude options.
|
||||||
|
|
||||||
|
|
||||||
ENVIRONMENT VARIABLES
|
ENVIRONMENT VARIABLES
|
||||||
|
|
||||||
The environment variables LC_ALL and LC_CTYPE are examined, in that
|
The environment variables LC_ALL and LC_CTYPE are examined, in that
|
||||||
order, for a locale. The first one that is set is used. This can be
|
order, for a locale. The first one that is set is used. This can be
|
||||||
overridden by the --locale option. If no locale is set, the PCRE2
|
overridden by the --locale option. If no locale is set, the PCRE2
|
||||||
library's default (usually the "C" locale) is used.
|
library's default (usually the "C" locale) is used.
|
||||||
|
|
||||||
|
|
||||||
|
@ -767,107 +774,107 @@ NEWLINES
|
||||||
|
|
||||||
The -N (--newline) option allows pcre2grep to scan files with different
|
The -N (--newline) option allows pcre2grep to scan files with different
|
||||||
newline conventions from the default. Any parts of the input files that
|
newline conventions from the default. Any parts of the input files that
|
||||||
are written to the standard output are copied identically, with what-
|
are written to the standard output are copied identically, with what-
|
||||||
ever newline sequences they have in the input. However, the setting of
|
ever newline sequences they have in the input. However, the setting of
|
||||||
this option affects only the way scanned files are processed. It does
|
this option affects only the way scanned files are processed. It does
|
||||||
not affect the interpretation of files specified by the -f, --file-
|
not affect the interpretation of files specified by the -f, --file-
|
||||||
list, --exclude-from, or --include-from options, nor does it affect the
|
list, --exclude-from, or --include-from options, nor does it affect the
|
||||||
way in which pcre2grep writes informational messages to the standard
|
way in which pcre2grep writes informational messages to the standard
|
||||||
error and output streams. For these it uses the string "\n" to indicate
|
error and output streams. For these it uses the string "\n" to indicate
|
||||||
newlines, relying on the C I/O library to convert this to an appropri-
|
newlines, relying on the C I/O library to convert this to an appropri-
|
||||||
ate sequence.
|
ate sequence.
|
||||||
|
|
||||||
|
|
||||||
OPTIONS COMPATIBILITY
|
OPTIONS COMPATIBILITY
|
||||||
|
|
||||||
Many of the short and long forms of pcre2grep's options are the same as
|
Many of the short and long forms of pcre2grep's options are the same as
|
||||||
in the GNU grep program. Any long option of the form --xxx-regexp (GNU
|
in the GNU grep program. Any long option of the form --xxx-regexp (GNU
|
||||||
terminology) is also available as --xxx-regex (PCRE2 terminology). How-
|
terminology) is also available as --xxx-regex (PCRE2 terminology). How-
|
||||||
ever, the --depth-limit, --file-list, --file-offsets, --heap-limit,
|
ever, the --depth-limit, --file-list, --file-offsets, --heap-limit,
|
||||||
--include-dir, --line-offsets, --locale, --match-limit, -M, --multi-
|
--include-dir, --line-offsets, --locale, --match-limit, -M, --multi-
|
||||||
line, -N, --newline, --om-separator, --output, -u, and --utf-8 options
|
line, -N, --newline, --om-separator, --output, -u, --utf, -U, and
|
||||||
are specific to pcre2grep, as is the use of the --only-matching option
|
--utf-allow-invalid options are specific to pcre2grep, as is the use of
|
||||||
with a capturing parentheses number.
|
the --only-matching option with a capturing parentheses number.
|
||||||
|
|
||||||
Although most of the common options work the same way, a few are dif-
|
Although most of the common options work the same way, a few are dif-
|
||||||
ferent in pcre2grep. For example, the --include option's argument is a
|
ferent in pcre2grep. For example, the --include option's argument is a
|
||||||
glob for GNU grep, but a regular expression for pcre2grep. If both the
|
glob for GNU grep, but a regular expression for pcre2grep. If both the
|
||||||
-c and -l options are given, GNU grep lists only file names, without
|
-c and -l options are given, GNU grep lists only file names, without
|
||||||
counts, but pcre2grep gives the counts as well.
|
counts, but pcre2grep gives the counts as well.
|
||||||
|
|
||||||
|
|
||||||
OPTIONS WITH DATA
|
OPTIONS WITH DATA
|
||||||
|
|
||||||
There are four different ways in which an option with data can be spec-
|
There are four different ways in which an option with data can be spec-
|
||||||
ified. If a short form option is used, the data may follow immedi-
|
ified. If a short form option is used, the data may follow immedi-
|
||||||
ately, or (with one exception) in the next command line item. For exam-
|
ately, or (with one exception) in the next command line item. For exam-
|
||||||
ple:
|
ple:
|
||||||
|
|
||||||
-f/some/file
|
-f/some/file
|
||||||
-f /some/file
|
-f /some/file
|
||||||
|
|
||||||
The exception is the -o option, which may appear with or without data.
|
The exception is the -o option, which may appear with or without data.
|
||||||
Because of this, if data is present, it must follow immediately in the
|
Because of this, if data is present, it must follow immediately in the
|
||||||
same item, for example -o3.
|
same item, for example -o3.
|
||||||
|
|
||||||
If a long form option is used, the data may appear in the same command
|
If a long form option is used, the data may appear in the same command
|
||||||
line item, separated by an equals character, or (with two exceptions)
|
line item, separated by an equals character, or (with two exceptions)
|
||||||
it may appear in the next command line item. For example:
|
it may appear in the next command line item. For example:
|
||||||
|
|
||||||
--file=/some/file
|
--file=/some/file
|
||||||
--file /some/file
|
--file /some/file
|
||||||
|
|
||||||
Note, however, that if you want to supply a file name beginning with ~
|
Note, however, that if you want to supply a file name beginning with ~
|
||||||
as data in a shell command, and have the shell expand ~ to a home
|
as data in a shell command, and have the shell expand ~ to a home
|
||||||
directory, you must separate the file name from the option, because the
|
directory, you must separate the file name from the option, because the
|
||||||
shell does not treat ~ specially unless it is at the start of an item.
|
shell does not treat ~ specially unless it is at the start of an item.
|
||||||
|
|
||||||
The exceptions to the above are the --colour (or --color) and --only-
|
The exceptions to the above are the --colour (or --color) and --only-
|
||||||
matching options, for which the data is optional. If one of these
|
matching options, for which the data is optional. If one of these
|
||||||
options does have data, it must be given in the first form, using an
|
options does have data, it must be given in the first form, using an
|
||||||
equals character. Otherwise pcre2grep will assume that it has no data.
|
equals character. Otherwise pcre2grep will assume that it has no data.
|
||||||
|
|
||||||
|
|
||||||
USING PCRE2'S CALLOUT FACILITY
|
USING PCRE2'S CALLOUT FACILITY
|
||||||
|
|
||||||
pcre2grep has, by default, support for calling external programs or
|
pcre2grep has, by default, support for calling external programs or
|
||||||
scripts or echoing specific strings during matching by making use of
|
scripts or echoing specific strings during matching by making use of
|
||||||
PCRE2's callout facility. However, this support can be completely or
|
PCRE2's callout facility. However, this support can be completely or
|
||||||
partially disabled when pcre2grep is built. You can find out whether
|
partially disabled when pcre2grep is built. You can find out whether
|
||||||
your binary has support for callouts by running it with the --help
|
your binary has support for callouts by running it with the --help
|
||||||
option. If callout support is completely disabled, all callouts in pat-
|
option. If callout support is completely disabled, all callouts in pat-
|
||||||
terns are ignored by pcre2grep. If the facility is partially disabled,
|
terns are ignored by pcre2grep. If the facility is partially disabled,
|
||||||
calling external programs is not supported, and callouts that request
|
calling external programs is not supported, and callouts that request
|
||||||
it are ignored.
|
it are ignored.
|
||||||
|
|
||||||
A callout in a PCRE2 pattern is of the form (?C<arg>) where the argu-
|
A callout in a PCRE2 pattern is of the form (?C<arg>) where the argu-
|
||||||
ment is either a number or a quoted string (see the pcre2callout docu-
|
ment is either a number or a quoted string (see the pcre2callout docu-
|
||||||
mentation for details). Numbered callouts are ignored by pcre2grep;
|
mentation for details). Numbered callouts are ignored by pcre2grep;
|
||||||
only callouts with string arguments are useful.
|
only callouts with string arguments are useful.
|
||||||
|
|
||||||
Calling external programs or scripts
|
Calling external programs or scripts
|
||||||
|
|
||||||
This facility can be independently disabled when pcre2grep is built. It
|
This facility can be independently disabled when pcre2grep is built. It
|
||||||
is supported for Windows, where a call to _spawnvp() is used, for VMS,
|
is supported for Windows, where a call to _spawnvp() is used, for VMS,
|
||||||
where lib$spawn() is used, and for any other Unix-like environment
|
where lib$spawn() is used, and for any other Unix-like environment
|
||||||
where fork() and execv() are available.
|
where fork() and execv() are available.
|
||||||
|
|
||||||
If the callout string does not start with a pipe (vertical bar) charac-
|
If the callout string does not start with a pipe (vertical bar) charac-
|
||||||
ter, it is parsed into a list of substrings separated by pipe charac-
|
ter, it is parsed into a list of substrings separated by pipe charac-
|
||||||
ters. The first substring must be an executable name, with the follow-
|
ters. The first substring must be an executable name, with the follow-
|
||||||
ing substrings specifying arguments:
|
ing substrings specifying arguments:
|
||||||
|
|
||||||
executable_name|arg1|arg2|...
|
executable_name|arg1|arg2|...
|
||||||
|
|
||||||
Any substring (including the executable name) may contain escape
|
Any substring (including the executable name) may contain escape
|
||||||
sequences started by a dollar character: $<digits> or ${<digits>} is
|
sequences started by a dollar character: $<digits> or ${<digits>} is
|
||||||
replaced by the captured substring of the given decimal number, which
|
replaced by the captured substring of the given decimal number, which
|
||||||
must be greater than zero. If the number is greater than the number of
|
must be greater than zero. If the number is greater than the number of
|
||||||
capturing substrings, or if the capture is unset, the replacement is
|
capturing substrings, or if the capture is unset, the replacement is
|
||||||
empty.
|
empty.
|
||||||
|
|
||||||
Any other character is substituted by itself. In particular, $$ is
|
Any other character is substituted by itself. In particular, $$ is
|
||||||
replaced by a single dollar and $| is replaced by a pipe character.
|
replaced by a single dollar and $| is replaced by a pipe character.
|
||||||
Here is an example:
|
Here is an example:
|
||||||
|
|
||||||
echo -e "abcde\n12345" | pcre2grep \
|
echo -e "abcde\n12345" | pcre2grep \
|
||||||
|
@ -881,13 +888,13 @@ USING PCRE2'S CALLOUT FACILITY
|
||||||
Arg1: [1] [234] [4] Arg2: |1| ()
|
Arg1: [1] [234] [4] Arg2: |1| ()
|
||||||
12345
|
12345
|
||||||
|
|
||||||
The parameters for the system call that is used to run the program or
|
The parameters for the system call that is used to run the program or
|
||||||
script are zero-terminated strings. This means that binary zero charac-
|
script are zero-terminated strings. This means that binary zero charac-
|
||||||
ters in the callout argument will cause premature termination of their
|
ters in the callout argument will cause premature termination of their
|
||||||
substrings, and therefore should not be present. Any syntax errors in
|
substrings, and therefore should not be present. Any syntax errors in
|
||||||
the string (for example, a dollar not followed by another character)
|
the string (for example, a dollar not followed by another character)
|
||||||
cause the callout to be ignored. If running the program fails for any
|
cause the callout to be ignored. If running the program fails for any
|
||||||
reason (including the non-existence of the executable), a local match-
|
reason (including the non-existence of the executable), a local match-
|
||||||
ing failure occurs and the matcher backtracks in the normal way.
|
ing failure occurs and the matcher backtracks in the normal way.
|
||||||
|
|
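The splitting and dollar-escape rules described above for callout strings that do not start with a pipe can be sketched as follows. This is an illustrative model, not pcre2grep's code: the function name `expand_callout` and the `captures` list (index 0 holds group 1, `None` means unset) are hypothetical. Returning `None` stands for "the callout is ignored". Treating `$0` and a malformed `${...}` as syntax errors is an assumption; the text only names a trailing dollar as an example of an error.

```python
DIGITS = "0123456789"


def expand_callout(callout, captures):
    """Split a callout string on pipes and expand its dollar escapes.

    Returns a list [executable_name, arg1, ...], or None on a syntax error.
    """
    parts, current = [], []
    i = 0
    while i < len(callout):
        ch = callout[i]
        i += 1
        if ch == "|":                      # unescaped pipe ends a substring
            parts.append("".join(current))
            current = []
        elif ch != "$":                    # ordinary character
            current.append(ch)
        elif i >= len(callout):            # dollar with nothing after it
            return None
        elif callout[i] in DIGITS or callout[i] == "{":
            braced = callout[i] == "{"
            start = i + 1 if braced else i
            end = start
            while end < len(callout) and callout[end] in DIGITS:
                end += 1
            if end == start:               # "${" without digits (assumed error)
                return None
            if braced and (end >= len(callout) or callout[end] != "}"):
                return None                # missing "}" (assumed error)
            number = int(callout[start:end])
            if number == 0:                # must be greater than zero (assumed error)
                return None
            # Out-of-range or unset captures expand to the empty string.
            if number <= len(captures) and captures[number - 1] is not None:
                current.append(captures[number - 1])
            i = end + 1 if braced else end
        else:                              # "$$" -> "$", "$|" -> "|", and so on
            current.append(callout[i])
            i += 1
    parts.append("".join(current))
    return parts


print(expand_callout("echo|[$1] [${2}] [$3]|$$5 $| end", ["1", "234", None]))
```

A caller would pass the first element of the result as the program to run and the remaining elements as its arguments.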
||||||
Echoing a specific string
|
Echoing a specific string
|
||||||
|
@ -896,41 +903,41 @@ USING PCRE2'S CALLOUT FACILITY
|
||||||
pletely disabled when pcre2grep was built. If the callout string starts
|
pletely disabled when pcre2grep was built. If the callout string starts
|
||||||
with a pipe (vertical bar) character, the rest of the string is written
|
with a pipe (vertical bar) character, the rest of the string is written
|
||||||
to the output, having been passed through the same escape processing as
|
to the output, having been passed through the same escape processing as
|
||||||
text from the --output option. This provides a simple echoing facility
|
text from the --output option. This provides a simple echoing facility
|
||||||
that avoids calling an external program or script. No terminator is
|
that avoids calling an external program or script. No terminator is
|
||||||
added to the string, so if you want a newline, you must include it
|
added to the string, so if you want a newline, you must include it
|
||||||
explicitly. Matching continues normally after the string is output. If
|
explicitly. Matching continues normally after the string is output. If
|
||||||
you want to see only the callout output but not any output from an
|
you want to see only the callout output but not any output from an
|
||||||
actual match, you should end the relevant pattern with (*FAIL).
|
actual match, you should end the relevant pattern with (*FAIL).
|
||||||
|
|
||||||
|
|
||||||
MATCHING ERRORS
|
MATCHING ERRORS
|
||||||
|
|
||||||
It is possible to supply a regular expression that takes a very long
|
It is possible to supply a regular expression that takes a very long
|
||||||
time to fail to match certain lines. Such patterns normally involve
|
time to fail to match certain lines. Such patterns normally involve
|
||||||
nested indefinite repeats, for example: (a+)*\d when matched against a
|
nested indefinite repeats, for example: (a+)*\d when matched against a
|
||||||
line of a's with no final digit. The PCRE2 matching function has a
|
line of a's with no final digit. The PCRE2 matching function has a
|
||||||
resource limit that causes it to abort in these circumstances. If this
|
resource limit that causes it to abort in these circumstances. If this
|
||||||
happens, pcre2grep outputs an error message and the line that caused
|
happens, pcre2grep outputs an error message and the line that caused
|
||||||
the problem to the standard error stream. If there are more than 20
|
the problem to the standard error stream. If there are more than 20
|
||||||
such errors, pcre2grep gives up.
|
such errors, pcre2grep gives up.
|
||||||
|
|
||||||
The --match-limit option of pcre2grep can be used to set the overall
|
The --match-limit option of pcre2grep can be used to set the overall
|
||||||
resource limit. There are also other limits that affect the amount of
|
resource limit. There are also other limits that affect the amount of
|
||||||
memory used during matching; see the discussion of --heap-limit and
|
memory used during matching; see the discussion of --heap-limit and
|
||||||
--depth-limit above.
|
--depth-limit above.
|
||||||
|
|
||||||
|
|
||||||
DIAGNOSTICS
|
DIAGNOSTICS
|
||||||
|
|
||||||
Exit status is 0 if any matches were found, 1 if no matches were found,
|
Exit status is 0 if any matches were found, 1 if no matches were found,
|
||||||
and 2 for syntax errors, overlong lines, non-existent or inaccessible
|
and 2 for syntax errors, overlong lines, non-existent or inaccessible
|
||||||
files (even if matches were found in other files) or too many matching
|
files (even if matches were found in other files) or too many matching
|
||||||
errors. Using the -s option to suppress error messages about inaccessi-
|
errors. Using the -s option to suppress error messages about inaccessi-
|
||||||
ble files does not affect the return code.
|
ble files does not affect the return code.
|
||||||
|
|
||||||
When run under VMS, the return code is placed in the symbol
|
When run under VMS, the return code is placed in the symbol
|
||||||
PCRE2GREP_RC because VMS does not distinguish between exit(0) and
|
PCRE2GREP_RC because VMS does not distinguish between exit(0) and
|
||||||
exit(1).
|
exit(1).
|
||||||
|
|
||||||
|
|
||||||
|
@ -948,5 +955,5 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 24 November 2018
|
Last updated: 28 May 2019
|
||||||
Copyright (c) 1997-2018 University of Cambridge.
|
Copyright (c) 1997-2019 University of Cambridge.
|
||||||
|
|