Documentation update.
parent d5dc4e0c33
commit 16d47a9cb1
|
@ -1762,17 +1762,22 @@ subject string does not happen. The first match attempt is run starting from
|
|||
the overall result is "no match".
|
||||
</P>
|
||||
<P>
|
||||
There are also other start-up optimizations. For example, a minimum length for
|
||||
the subject may be recorded. Consider the pattern
|
||||
Another start-up optimization makes use of a minimum length for a matching
|
||||
subject, which is recorded when possible. Consider the pattern
|
||||
<pre>
|
||||
(*MARK:A)(X|Y)
|
||||
(*MARK:1)B(*MARK:2)(X|Y)
|
||||
</pre>
|
||||
The minimum length for a match is one character. If the subject is "ABC", there
|
||||
will be attempts to match "ABC", "BC", and "C". An attempt to match an empty
|
||||
string at the end of the subject does not take place, because PCRE2 knows that
|
||||
the subject is now too short, and so the (*MARK) is never encountered. In this
|
||||
case, the optimization does not affect the overall match result, which is still
|
||||
"no match", but it does affect the auxiliary information that is returned.
|
||||
The minimum length for a match is two characters. If the subject is "XXBB", the
|
||||
"starting character" optimization skips "XX", then tries to match "BB", which
|
||||
is long enough. In the process, (*MARK:2) is encountered and remembered. When
|
||||
the match attempt fails, the next "B" is found, but there is only one character
|
||||
left, so there are no more attempts, and "no match" is returned with the "last
|
||||
mark seen" set to "2". If NO_START_OPTIMIZE is set, however, matches are tried
|
||||
at every possible starting position, including at the end of the subject, where
|
||||
(*MARK:1) is encountered, but there is no "B", so the "last mark seen" that is
|
||||
returned is "1". In this case, the optimizations do not affect the overall
|
||||
match result, which is still "no match", but they do affect the auxiliary
|
||||
information that is returned.
|
||||
<pre>
|
||||
PCRE2_NO_UTF_CHECK
|
||||
</pre>
|
||||
|
@ -3831,7 +3836,7 @@ Cambridge, England.
|
|||
</P>
|
||||
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 23 May 2019
|
||||
Last updated: 30 May 2019
|
||||
<br>
|
||||
Copyright © 1997-2019 University of Cambridge.
|
||||
<br>
|
||||
|
|
|
@ -741,13 +741,22 @@ ignored when used with <b>-L</b> (list files without matches), because the grand
|
|||
total would always be zero.
|
||||
</P>
|
||||
<P>
|
||||
<b>-u</b>, <b>--utf-8</b>
|
||||
<b>-u</b>, <b>--utf</b>
|
||||
Operate in UTF-8 mode. This option is available only if PCRE2 has been compiled
|
||||
with UTF-8 support. All patterns (including those for any <b>--exclude</b> and
|
||||
<b>--include</b> options) and all subject lines that are scanned must be valid
|
||||
strings of UTF-8 characters.
|
||||
</P>
|
||||
<P>
|
||||
<b>-U</b>, <b>--utf-allow-invalid</b>
|
||||
As <b>--utf</b>, but in addition subject lines may contain invalid UTF-8 code
|
||||
unit sequences. These can never form part of any pattern match. This facility
|
||||
allows valid UTF-8 strings to be sought in executable or other binary files.
|
||||
For more details about matching in non-valid UTF-8 strings, see the
|
||||
<a href="pcre2unicode.html"><b>pcre2unicode</b>(3)</a>
|
||||
documentation.
|
||||
</P>
|
||||
<P>
|
||||
<b>-V</b>, <b>--version</b>
|
||||
Write the version numbers of <b>pcre2grep</b> and the PCRE2 library to the
|
||||
standard output and then exit. Anything else on the command line is
|
||||
|
@ -806,9 +815,9 @@ as in the GNU <b>grep</b> program. Any long option of the form
|
|||
<b>--file-offsets</b>, <b>--heap-limit</b>, <b>--include-dir</b>,
|
||||
<b>--line-offsets</b>, <b>--locale</b>, <b>--match-limit</b>, <b>-M</b>,
|
||||
<b>--multiline</b>, <b>-N</b>, <b>--newline</b>, <b>--om-separator</b>,
|
||||
<b>--output</b>, <b>-u</b>, and <b>--utf-8</b> options are specific to
|
||||
<b>pcre2grep</b>, as is the use of the <b>--only-matching</b> option with a
|
||||
capturing parentheses number.
|
||||
<b>--output</b>, <b>-u</b>, <b>--utf</b>, <b>-U</b>, and <b>--utf-allow-invalid</b>
|
||||
options are specific to <b>pcre2grep</b>, as is the use of the
|
||||
<b>--only-matching</b> option with a capturing parentheses number.
|
||||
</P>
|
||||
<P>
|
||||
Although most of the common options work the same way, a few are different in
|
||||
|
@ -971,9 +980,9 @@ Cambridge, England.
|
|||
</P>
|
||||
<br><a name="SEC16" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 24 November 2018
|
||||
Last updated: 28 May 2019
|
||||
<br>
|
||||
Copyright © 1997-2018 University of Cambridge.
|
||||
Copyright © 1997-2019 University of Cambridge.
|
||||
<br>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||
|
|
|
@ -1741,18 +1741,23 @@ COMPILING A PATTERN
|
|||
(*COMMIT) prevents any further matches being tried, so the overall
|
||||
result is "no match".
|
||||
|
||||
There are also other start-up optimizations. For example, a minimum
|
||||
length for the subject may be recorded. Consider the pattern
|
||||
Another start-up optimization makes use of a minimum length for a
|
||||
matching subject, which is recorded when possible. Consider the pattern
|
||||
|
||||
(*MARK:A)(X|Y)
|
||||
(*MARK:1)B(*MARK:2)(X|Y)
|
||||
|
||||
The minimum length for a match is one character. If the subject is
|
||||
"ABC", there will be attempts to match "ABC", "BC", and "C". An attempt
|
||||
to match an empty string at the end of the subject does not take place,
|
||||
because PCRE2 knows that the subject is now too short, and so the
|
||||
(*MARK) is never encountered. In this case, the optimization does not
|
||||
affect the overall match result, which is still "no match", but it does
|
||||
affect the auxiliary information that is returned.
|
||||
The minimum length for a match is two characters. If the subject is
|
||||
"XXBB", the "starting character" optimization skips "XX", then tries to
|
||||
match "BB", which is long enough. In the process, (*MARK:2) is encoun-
|
||||
tered and remembered. When the match attempt fails, the next "B" is
|
||||
found, but there is only one character left, so there are no more
|
||||
attempts, and "no match" is returned with the "last mark seen" set to
|
||||
"2". If NO_START_OPTIMIZE is set, however, matches are tried at every
|
||||
possible starting position, including at the end of the subject, where
|
||||
(*MARK:1) is encountered, but there is no "B", so the "last mark seen"
|
||||
that is returned is "1". In this case, the optimizations do not affect
|
||||
the overall match result, which is still "no match", but they do affect
|
||||
the auxiliary information that is returned.
|
||||
|
||||
PCRE2_NO_UTF_CHECK
|
||||
|
||||
|
@ -3698,7 +3703,7 @@ AUTHOR
|
|||
|
||||
REVISION
|
||||
|
||||
Last updated: 23 May 2019
|
||||
Last updated: 30 May 2019
|
||||
Copyright (c) 1997-2019 University of Cambridge.
|
||||
------------------------------------------------------------------------------
|
||||
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2API 3 "23 May 2019" "PCRE2 10.34"
|
||||
.TH PCRE2API 3 "30 May 2019" "PCRE2 10.34"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.sp
|
||||
|
@ -1701,17 +1701,22 @@ subject string does not happen. The first match attempt is run starting from
|
|||
"D" and when this fails, (*COMMIT) prevents any further matches being tried, so
|
||||
the overall result is "no match".
|
||||
.P
|
||||
There are also other start-up optimizations. For example, a minimum length for
|
||||
the subject may be recorded. Consider the pattern
|
||||
Another start-up optimization makes use of a minimum length for a matching
|
||||
subject, which is recorded when possible. Consider the pattern
|
||||
.sp
|
||||
(*MARK:A)(X|Y)
|
||||
(*MARK:1)B(*MARK:2)(X|Y)
|
||||
.sp
|
||||
The minimum length for a match is one character. If the subject is "ABC", there
|
||||
will be attempts to match "ABC", "BC", and "C". An attempt to match an empty
|
||||
string at the end of the subject does not take place, because PCRE2 knows that
|
||||
the subject is now too short, and so the (*MARK) is never encountered. In this
|
||||
case, the optimization does not affect the overall match result, which is still
|
||||
"no match", but it does affect the auxiliary information that is returned.
|
||||
The minimum length for a match is two characters. If the subject is "XXBB", the
|
||||
"starting character" optimization skips "XX", then tries to match "BB", which
|
||||
is long enough. In the process, (*MARK:2) is encountered and remembered. When
|
||||
the match attempt fails, the next "B" is found, but there is only one character
|
||||
left, so there are no more attempts, and "no match" is returned with the "last
|
||||
mark seen" set to "2". If NO_START_OPTIMIZE is set, however, matches are tried
|
||||
at every possible starting position, including at the end of the subject, where
|
||||
(*MARK:1) is encountered, but there is no "B", so the "last mark seen" that is
|
||||
returned is "1". In this case, the optimizations do not affect the overall
|
||||
match result, which is still "no match", but they do affect the auxiliary
|
||||
information that is returned.
|
||||
.sp
|
||||
PCRE2_NO_UTF_CHECK
|
||||
.sp
|
||||
|
@ -3843,6 +3848,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 23 May 2019
|
||||
Last updated: 30 May 2019
|
||||
Copyright (c) 1997-2019 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -650,7 +650,7 @@ with UTF-8 support. All patterns (including those for any \fB--exclude\fP and
|
|||
\fB--include\fP options) and all subject lines that are scanned must be valid
|
||||
strings of UTF-8 characters.
|
||||
.TP
|
||||
\fb-U\fP, \fB--utf-allow-invalid\fP
|
||||
\fB-U\fP, \fB--utf-allow-invalid\fP
|
||||
As \fB--utf\fP, but in addition subject lines may contain invalid UTF-8 code
|
||||
unit sequences. These can never form part of any pattern match. This facility
|
||||
allows valid UTF-8 strings to be sought in executable or other binary files.
|
||||
|
|
|
@ -719,13 +719,20 @@ OPTIONS
|
|||
(list files without matches), because the grand total would
|
||||
always be zero.
|
||||
|
||||
-u, --utf-8
|
||||
Operate in UTF-8 mode. This option is available only if PCRE2
|
||||
-u, --utf Operate in UTF-8 mode. This option is available only if PCRE2
|
||||
has been compiled with UTF-8 support. All patterns (including
|
||||
those for any --exclude and --include options) and all sub-
|
||||
ject lines that are scanned must be valid strings of UTF-8
|
||||
characters.
|
||||
|
||||
-U, --utf-allow-invalid
|
||||
As --utf, but in addition subject lines may contain invalid
|
||||
UTF-8 code unit sequences. These can never form part of any
|
||||
pattern match. This facility allows valid UTF-8 strings to be
|
||||
sought in executable or other binary files. For more details
|
||||
about matching in non-valid UTF-8 strings, see the pcre2uni-
|
||||
code(3) documentation.
|
||||
|
||||
-V, --version
|
||||
Write the version numbers of pcre2grep and the PCRE2 library
|
||||
to the standard output and then exit. Anything else on the
|
||||
|
@ -785,9 +792,9 @@ OPTIONS COMPATIBILITY
|
|||
terminology) is also available as --xxx-regex (PCRE2 terminology). How-
|
||||
ever, the --depth-limit, --file-list, --file-offsets, --heap-limit,
|
||||
--include-dir, --line-offsets, --locale, --match-limit, -M, --multi-
|
||||
line, -N, --newline, --om-separator, --output, -u, and --utf-8 options
|
||||
are specific to pcre2grep, as is the use of the --only-matching option
|
||||
with a capturing parentheses number.
|
||||
line, -N, --newline, --om-separator, --output, -u, --utf, -U, and
|
||||
--utf-allow-invalid options are specific to pcre2grep, as is the use of
|
||||
the --only-matching option with a capturing parentheses number.
|
||||
|
||||
Although most of the common options work the same way, a few are dif-
|
||||
ferent in pcre2grep. For example, the --include option's argument is a
|
||||
|
@ -948,5 +955,5 @@ AUTHOR
|
|||
|
||||
REVISION
|
||||
|
||||
Last updated: 24 November 2018
|
||||
Copyright (c) 1997-2018 University of Cambridge.
|
||||
Last updated: 28 May 2019
|
||||
Copyright (c) 1997-2019 University of Cambridge.
|
||||
|
|