Documentation update.

parent d5dc4e0c33
commit 16d47a9cb1

@@ -1762,17 +1762,22 @@ subject string does not happen. The first match attempt is run starting from
 the overall result is "no match".
 </P>
 <P>
-There are also other start-up optimizations. For example, a minimum length for
-the subject may be recorded. Consider the pattern
+As another start-up optimization makes use of a minimum length for a matching
+subject, which is recorded when possible. Consider the pattern
 <pre>
-(*MARK:A)(X|Y)
+(*MARK:1)B(*MARK:2)(X|Y)
 </pre>
-The minimum length for a match is one character. If the subject is "ABC", there
-will be attempts to match "ABC", "BC", and "C". An attempt to match an empty
-string at the end of the subject does not take place, because PCRE2 knows that
-the subject is now too short, and so the (*MARK) is never encountered. In this
-case, the optimization does not affect the overall match result, which is still
-"no match", but it does affect the auxiliary information that is returned.
+The minimum length for a match is two characters. If the subject is "XXBB", the
+"starting character" optimization skips "XX", then tries to match "BB", which
+is long enough. In the process, (*MARK:2) is encountered and remembered. When
+the match attempt fails, the next "B" is found, but there is only one character
+left, so there are no more attempts, and "no match" is returned with the "last
+mark seen" set to "2". If NO_START_OPTIMIZE is set, however, matches are tried
+at every possible starting position, including at the end of the subject, where
+(*MARK:1) is encountered, but there is no "B", so the "last mark seen" that is
+returned is "1". In this case, the optimizations do not affect the overall
+match result, which is still "no match", but they do affect the auxiliary
+information that is returned.
 <pre>
 PCRE2_NO_UTF_CHECK
 </pre>
@@ -3831,7 +3836,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC42" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 23 May 2019
+Last updated: 30 May 2019
 <br>
 Copyright © 1997-2019 University of Cambridge.
 <br>
|
@@ -741,13 +741,22 @@ ignored when used with <b>-L</b> (list files without matches), because the grand
 total would always be zero.
 </P>
 <P>
-<b>-u</b>, <b>--utf-8</b>
+<b>-u</b>, <b>--utf</b>
 Operate in UTF-8 mode. This option is available only if PCRE2 has been compiled
 with UTF-8 support. All patterns (including those for any <b>--exclude</b> and
 <b>--include</b> options) and all subject lines that are scanned must be valid
 strings of UTF-8 characters.
 </P>
 <P>
+<b>-U</b>, <b>--utf-allow-invalid</b>
+As <b>--utf</b>, but in addition subject lines may contain invalid UTF-8 code
+unit sequences. These can never form part of any pattern match. This facility
+allows valid UTF-8 strings to be sought in executable or other binary files.
+For more details about matching in non-valid UTF-8 strings, see the
+<a href="pcre2unicode.html"><b>pcre2unicode</b>(3)</a>
+documentation.
+</P>
+<P>
 <b>-V</b>, <b>--version</b>
 Write the version numbers of <b>pcre2grep</b> and the PCRE2 library to the
 standard output and then exit. Anything else on the command line is
@@ -806,9 +815,9 @@ as in the GNU <b>grep</b> program. Any long option of the form
 <b>--file-offsets</b>, <b>--heap-limit</b>, <b>--include-dir</b>,
 <b>--line-offsets</b>, <b>--locale</b>, <b>--match-limit</b>, <b>-M</b>,
 <b>--multiline</b>, <b>-N</b>, <b>--newline</b>, <b>--om-separator</b>,
-<b>--output</b>, <b>-u</b>, and <b>--utf-8</b> options are specific to
-<b>pcre2grep</b>, as is the use of the <b>--only-matching</b> option with a
-capturing parentheses number.
+<b>--output</b>, <b>-u</b>, <b>--utf</b>, <b>-U</b>, and <b>--utf-allow-invalid</b>
+options are specific to <b>pcre2grep</b>, as is the use of the
+<b>--only-matching</b> option with a capturing parentheses number.
 </P>
 <P>
 Although most of the common options work the same way, a few are different in
@@ -971,9 +980,9 @@ Cambridge, England.
 </P>
 <br><a name="SEC16" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 24 November 2018
+Last updated: 28 May 2019
 <br>
-Copyright © 1997-2018 University of Cambridge.
+Copyright © 1997-2019 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
|
@@ -1741,18 +1741,23 @@ COMPILING A PATTERN
 (*COMMIT) prevents any further matches being tried, so the overall
 result is "no match".
 
-There are also other start-up optimizations. For example, a minimum
-length for the subject may be recorded. Consider the pattern
+As another start-up optimization makes use of a minimum length for a
+matching subject, which is recorded when possible. Consider the pattern
 
-(*MARK:A)(X|Y)
+(*MARK:1)B(*MARK:2)(X|Y)
 
-The minimum length for a match is one character. If the subject is
-"ABC", there will be attempts to match "ABC", "BC", and "C". An attempt
-to match an empty string at the end of the subject does not take place,
-because PCRE2 knows that the subject is now too short, and so the
-(*MARK) is never encountered. In this case, the optimization does not
-affect the overall match result, which is still "no match", but it does
-affect the auxiliary information that is returned.
+The minimum length for a match is two characters. If the subject is
+"XXBB", the "starting character" optimization skips "XX", then tries to
+match "BB", which is long enough. In the process, (*MARK:2) is encoun-
+tered and remembered. When the match attempt fails, the next "B" is
+found, but there is only one character left, so there are no more
+attempts, and "no match" is returned with the "last mark seen" set to
+"2". If NO_START_OPTIMIZE is set, however, matches are tried at every
+possible starting position, including at the end of the subject, where
+(*MARK:1) is encountered, but there is no "B", so the "last mark seen"
+that is returned is "1". In this case, the optimizations do not affect
+the overall match result, which is still "no match", but they do affect
+the auxiliary information that is returned.
 
 PCRE2_NO_UTF_CHECK
 
@@ -3698,7 +3703,7 @@ AUTHOR
 
 REVISION
 
-Last updated: 23 May 2019
+Last updated: 30 May 2019
 Copyright (c) 1997-2019 University of Cambridge.
 ------------------------------------------------------------------------------
 
|
@@ -1,4 +1,4 @@
-.TH PCRE2API 3 "23 May 2019" "PCRE2 10.34"
+.TH PCRE2API 3 "30 May 2019" "PCRE2 10.34"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@@ -1701,17 +1701,22 @@ subject string does not happen. The first match attempt is run starting from
 "D" and when this fails, (*COMMIT) prevents any further matches being tried, so
 the overall result is "no match".
 .P
-There are also other start-up optimizations. For example, a minimum length for
-the subject may be recorded. Consider the pattern
+As another start-up optimization makes use of a minimum length for a matching
+subject, which is recorded when possible. Consider the pattern
 .sp
-(*MARK:A)(X|Y)
+(*MARK:1)B(*MARK:2)(X|Y)
 .sp
-The minimum length for a match is one character. If the subject is "ABC", there
-will be attempts to match "ABC", "BC", and "C". An attempt to match an empty
-string at the end of the subject does not take place, because PCRE2 knows that
-the subject is now too short, and so the (*MARK) is never encountered. In this
-case, the optimization does not affect the overall match result, which is still
-"no match", but it does affect the auxiliary information that is returned.
+The minimum length for a match is two characters. If the subject is "XXBB", the
+"starting character" optimization skips "XX", then tries to match "BB", which
+is long enough. In the process, (*MARK:2) is encountered and remembered. When
+the match attempt fails, the next "B" is found, but there is only one character
+left, so there are no more attempts, and "no match" is returned with the "last
+mark seen" set to "2". If NO_START_OPTIMIZE is set, however, matches are tried
+at every possible starting position, including at the end of the subject, where
+(*MARK:1) is encountered, but there is no "B", so the "last mark seen" that is
+returned is "1". In this case, the optimizations do not affect the overall
+match result, which is still "no match", but they do affect the auxiliary
+information that is returned.
 .sp
 PCRE2_NO_UTF_CHECK
 .sp
@@ -3843,6 +3848,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 23 May 2019
+Last updated: 30 May 2019
 Copyright (c) 1997-2019 University of Cambridge.
 .fi
|
@@ -650,7 +650,7 @@ with UTF-8 support. All patterns (including those for any \fB--exclude\fP and
 \fB--include\fP options) and all subject lines that are scanned must be valid
 strings of UTF-8 characters.
 .TP
-\fb-U\fP, \fB--utf-allow-invalid\fP
+\fB-U\fP, \fB--utf-allow-invalid\fP
 As \fB--utf\fP, but in addition subject lines may contain invalid UTF-8 code
 unit sequences. These can never form part of any pattern match. This facility
 allows valid UTF-8 strings to be sought in executable or other binary files.
|
@@ -719,13 +719,20 @@ OPTIONS
 (list files without matches), because the grand total would
 always be zero.
 
--u, --utf-8
-Operate in UTF-8 mode. This option is available only if PCRE2
+-u, --utf Operate in UTF-8 mode. This option is available only if PCRE2
 has been compiled with UTF-8 support. All patterns (including
 those for any --exclude and --include options) and all sub-
 ject lines that are scanned must be valid strings of UTF-8
 characters.
 
+-U, --utf-allow-invalid
+As --utf, but in addition subject lines may contain invalid
+UTF-8 code unit sequences. These can never form part of any
+pattern match. This facility allows valid UTF-8 strings to be
+sought in executable or other binary files. For more details
+about matching in non-valid UTF-8 strings, see the pcre2uni-
+code(3) documentation.
+
 -V, --version
 Write the version numbers of pcre2grep and the PCRE2 library
 to the standard output and then exit. Anything else on the
@@ -785,9 +792,9 @@ OPTIONS COMPATIBILITY
 terminology) is also available as --xxx-regex (PCRE2 terminology). How-
 ever, the --depth-limit, --file-list, --file-offsets, --heap-limit,
 --include-dir, --line-offsets, --locale, --match-limit, -M, --multi-
-line, -N, --newline, --om-separator, --output, -u, and --utf-8 options
-are specific to pcre2grep, as is the use of the --only-matching option
-with a capturing parentheses number.
+line, -N, --newline, --om-separator, --output, -u, --utf, -U, and
+--utf-allow-invalid options are specific to pcre2grep, as is the use of
+the --only-matching option with a capturing parentheses number.
 
 Although most of the common options work the same way, a few are dif-
 ferent in pcre2grep. For example, the --include option's argument is a
@@ -948,5 +955,5 @@ AUTHOR
 
 REVISION
 
-Last updated: 24 November 2018
-Copyright (c) 1997-2018 University of Cambridge.
+Last updated: 28 May 2019
+Copyright (c) 1997-2019 University of Cambridge.
|
|
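The pcre2api hunks in this commit describe how the "last mark seen" after a failed match depends on start-up optimizations. The sketch below is a toy model in C of that walk-through, not PCRE2's implementation. It hard-codes the single pattern (*MARK:1)B(*MARK:2)(X|Y), and the helpers `attempt_at()` and `last_mark_seen()` are hypothetical names introduced only for illustration. With optimization enabled, it skips to the next "B" and stops once fewer than two characters remain. Without it (modelling PCRE2_NO_START_OPTIMIZE), it tries every starting offset up to and including the end of the subject.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: hard-codes the pattern (*MARK:1)B(*MARK:2)(X|Y).
   Performs one match attempt at offset `start`, updating *last_mark
   each time a (*MARK) is passed. Returns true on a match. */
static bool attempt_at(const char *subject, size_t len, size_t start,
                       char *last_mark)
{
    size_t i = start;

    *last_mark = '1';                               /* (*MARK:1) */
    if (i >= len || subject[i] != 'B')
        return false;
    i++;

    *last_mark = '2';                               /* (*MARK:2) */
    if (i >= len || (subject[i] != 'X' && subject[i] != 'Y'))
        return false;
    return true;                                    /* matched "BX" or "BY" */
}

/* Hypothetical helper: returns the "last mark seen" ('1', '2', or '\0' if
   no (*MARK) was reached). If `matched` is non-NULL, it reports whether
   any attempt succeeded. start_optimize == false models compiling the
   pattern with PCRE2_NO_START_OPTIMIZE. */
static char last_mark_seen(const char *subject, bool start_optimize,
                           bool *matched)
{
    const size_t len = strlen(subject);
    const size_t min_length = 2;    /* "B" plus one of "X" or "Y" */
    char last_mark = '\0';
    bool found = false;

    for (size_t start = 0; start <= len; start++) {
        if (start_optimize) {
            /* "Starting character" optimization: jump to the next "B". */
            const char *b = memchr(subject + start, 'B', len - start);
            if (b == NULL)
                break;
            start = (size_t)(b - subject);
            /* Minimum-length optimization: too few characters remain. */
            if (len - start < min_length)
                break;
        }
        if (attempt_at(subject, len, start, &last_mark)) {
            found = true;
            break;
        }
    }

    if (matched != NULL)
        *matched = found;
    return last_mark;
}
```

For the subject "XXBB", `last_mark_seen("XXBB", true, NULL)` yields '2' and `last_mark_seen("XXBB", false, NULL)` yields '1', as in the documentation's walk-through; neither call reports a match. With the real library, the same comparison can be made by compiling the pattern with and without PCRE2_NO_START_OPTIMIZE and calling pcre2_get_mark() after pcre2_match() returns PCRE2_ERROR_NOMATCH.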