Documentation update.

Philip.Hazel 2019-05-30 15:43:05 +00:00
parent d5dc4e0c33
commit 16d47a9cb1
6 changed files with 951 additions and 920 deletions

@ -1762,17 +1762,22 @@ subject string does not happen. The first match attempt is run starting from
the overall result is "no match". the overall result is "no match".
</P> </P>
<P> <P>
There are also other start-up optimizations. For example, a minimum length for Another start-up optimization makes use of a minimum length for a matching
the subject may be recorded. Consider the pattern subject, which is recorded when possible. Consider the pattern
<pre> <pre>
(*MARK:A)(X|Y) (*MARK:1)B(*MARK:2)(X|Y)
</pre> </pre>
The minimum length for a match is one character. If the subject is "ABC", there The minimum length for a match is two characters. If the subject is "XXBB", the
will be attempts to match "ABC", "BC", and "C". An attempt to match an empty "starting character" optimization skips "XX", then tries to match "BB", which
string at the end of the subject does not take place, because PCRE2 knows that is long enough. In the process, (*MARK:2) is encountered and remembered. When
the subject is now too short, and so the (*MARK) is never encountered. In this the match attempt fails, the next "B" is found, but there is only one character
case, the optimization does not affect the overall match result, which is still left, so there are no more attempts, and "no match" is returned with the "last
"no match", but it does affect the auxiliary information that is returned. mark seen" set to "2". If NO_START_OPTIMIZE is set, however, matches are tried
at every possible starting position, including at the end of the subject, where
(*MARK:1) is encountered, but there is no "B", so the "last mark seen" that is
returned is "1". In this case, the optimizations do not affect the overall
match result, which is still "no match", but they do affect the auxiliary
information that is returned.
<pre> <pre>
PCRE2_NO_UTF_CHECK PCRE2_NO_UTF_CHECK
</pre> </pre>
@ -3831,7 +3836,7 @@ Cambridge, England.
</P> </P>
<br><a name="SEC42" href="#TOC1">REVISION</a><br> <br><a name="SEC42" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 23 May 2019 Last updated: 30 May 2019
<br> <br>
Copyright &copy; 1997-2019 University of Cambridge. Copyright &copy; 1997-2019 University of Cambridge.
<br> <br>

@ -741,13 +741,22 @@ ignored when used with <b>-L</b> (list files without matches), because the grand
total would always be zero. total would always be zero.
</P> </P>
<P> <P>
<b>-u</b>, <b>--utf-8</b> <b>-u</b>, <b>--utf</b>
Operate in UTF-8 mode. This option is available only if PCRE2 has been compiled Operate in UTF-8 mode. This option is available only if PCRE2 has been compiled
with UTF-8 support. All patterns (including those for any <b>--exclude</b> and with UTF-8 support. All patterns (including those for any <b>--exclude</b> and
<b>--include</b> options) and all subject lines that are scanned must be valid <b>--include</b> options) and all subject lines that are scanned must be valid
strings of UTF-8 characters. strings of UTF-8 characters.
</P> </P>
<P> <P>
<b>-U</b>, <b>--utf-allow-invalid</b>
As <b>--utf</b>, but in addition subject lines may contain invalid UTF-8 code
unit sequences. These can never form part of any pattern match. This facility
allows valid UTF-8 strings to be sought in executable or other binary files.
For more details about matching in non-valid UTF-8 strings, see the
<a href="pcre2unicode.html"><b>pcre2unicode</b>(3)</a>
documentation.
</P>
<P>
<b>-V</b>, <b>--version</b> <b>-V</b>, <b>--version</b>
Write the version numbers of <b>pcre2grep</b> and the PCRE2 library to the Write the version numbers of <b>pcre2grep</b> and the PCRE2 library to the
standard output and then exit. Anything else on the command line is standard output and then exit. Anything else on the command line is
@ -806,9 +815,9 @@ as in the GNU <b>grep</b> program. Any long option of the form
<b>--file-offsets</b>, <b>--heap-limit</b>, <b>--include-dir</b>, <b>--file-offsets</b>, <b>--heap-limit</b>, <b>--include-dir</b>,
<b>--line-offsets</b>, <b>--locale</b>, <b>--match-limit</b>, <b>-M</b>, <b>--line-offsets</b>, <b>--locale</b>, <b>--match-limit</b>, <b>-M</b>,
<b>--multiline</b>, <b>-N</b>, <b>--newline</b>, <b>--om-separator</b>, <b>--multiline</b>, <b>-N</b>, <b>--newline</b>, <b>--om-separator</b>,
<b>--output</b>, <b>-u</b>, and <b>--utf-8</b> options are specific to <b>--output</b>, <b>-u</b>, <b>--utf</b>, <b>-U</b>, and <b>--utf-allow-invalid</b>
<b>pcre2grep</b>, as is the use of the <b>--only-matching</b> option with a options are specific to <b>pcre2grep</b>, as is the use of the
capturing parentheses number. <b>--only-matching</b> option with a capturing parentheses number.
</P> </P>
<P> <P>
Although most of the common options work the same way, a few are different in Although most of the common options work the same way, a few are different in
@ -971,9 +980,9 @@ Cambridge, England.
</P> </P>
<br><a name="SEC16" href="#TOC1">REVISION</a><br> <br><a name="SEC16" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 24 November 2018 Last updated: 28 May 2019
<br> <br>
Copyright &copy; 1997-2018 University of Cambridge. Copyright &copy; 1997-2019 University of Cambridge.
<br> <br>
<p> <p>
Return to the <a href="index.html">PCRE2 index page</a>. Return to the <a href="index.html">PCRE2 index page</a>.

@ -1741,18 +1741,23 @@ COMPILING A PATTERN
(*COMMIT) prevents any further matches being tried, so the overall (*COMMIT) prevents any further matches being tried, so the overall
result is "no match". result is "no match".
There are also other start-up optimizations. For example, a minimum Another start-up optimization makes use of a minimum length for a
length for the subject may be recorded. Consider the pattern matching subject, which is recorded when possible. Consider the pattern
(*MARK:A)(X|Y) (*MARK:1)B(*MARK:2)(X|Y)
The minimum length for a match is one character. If the subject is The minimum length for a match is two characters. If the subject is
"ABC", there will be attempts to match "ABC", "BC", and "C". An attempt "XXBB", the "starting character" optimization skips "XX", then tries to
to match an empty string at the end of the subject does not take place, match "BB", which is long enough. In the process, (*MARK:2) is encoun-
because PCRE2 knows that the subject is now too short, and so the tered and remembered. When the match attempt fails, the next "B" is
(*MARK) is never encountered. In this case, the optimization does not found, but there is only one character left, so there are no more
affect the overall match result, which is still "no match", but it does attempts, and "no match" is returned with the "last mark seen" set to
affect the auxiliary information that is returned. "2". If NO_START_OPTIMIZE is set, however, matches are tried at every
possible starting position, including at the end of the subject, where
(*MARK:1) is encountered, but there is no "B", so the "last mark seen"
that is returned is "1". In this case, the optimizations do not affect
the overall match result, which is still "no match", but they do affect
the auxiliary information that is returned.
PCRE2_NO_UTF_CHECK PCRE2_NO_UTF_CHECK
@ -3698,7 +3703,7 @@ AUTHOR
REVISION REVISION
Last updated: 23 May 2019 Last updated: 30 May 2019
Copyright (c) 1997-2019 University of Cambridge. Copyright (c) 1997-2019 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------

@ -1,4 +1,4 @@
.TH PCRE2API 3 "23 May 2019" "PCRE2 10.34" .TH PCRE2API 3 "30 May 2019" "PCRE2 10.34"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.sp .sp
@ -1701,17 +1701,22 @@ subject string does not happen. The first match attempt is run starting from
"D" and when this fails, (*COMMIT) prevents any further matches being tried, so "D" and when this fails, (*COMMIT) prevents any further matches being tried, so
the overall result is "no match". the overall result is "no match".
.P .P
There are also other start-up optimizations. For example, a minimum length for Another start-up optimization makes use of a minimum length for a matching
the subject may be recorded. Consider the pattern subject, which is recorded when possible. Consider the pattern
.sp .sp
(*MARK:A)(X|Y) (*MARK:1)B(*MARK:2)(X|Y)
.sp .sp
The minimum length for a match is one character. If the subject is "ABC", there The minimum length for a match is two characters. If the subject is "XXBB", the
will be attempts to match "ABC", "BC", and "C". An attempt to match an empty "starting character" optimization skips "XX", then tries to match "BB", which
string at the end of the subject does not take place, because PCRE2 knows that is long enough. In the process, (*MARK:2) is encountered and remembered. When
the subject is now too short, and so the (*MARK) is never encountered. In this the match attempt fails, the next "B" is found, but there is only one character
case, the optimization does not affect the overall match result, which is still left, so there are no more attempts, and "no match" is returned with the "last
"no match", but it does affect the auxiliary information that is returned. mark seen" set to "2". If NO_START_OPTIMIZE is set, however, matches are tried
at every possible starting position, including at the end of the subject, where
(*MARK:1) is encountered, but there is no "B", so the "last mark seen" that is
returned is "1". In this case, the optimizations do not affect the overall
match result, which is still "no match", but they do affect the auxiliary
information that is returned.
.sp .sp
PCRE2_NO_UTF_CHECK PCRE2_NO_UTF_CHECK
.sp .sp
@ -3843,6 +3848,6 @@ Cambridge, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 23 May 2019 Last updated: 30 May 2019
Copyright (c) 1997-2019 University of Cambridge. Copyright (c) 1997-2019 University of Cambridge.
.fi .fi

@ -650,7 +650,7 @@ with UTF-8 support. All patterns (including those for any \fB--exclude\fP and
\fB--include\fP options) and all subject lines that are scanned must be valid \fB--include\fP options) and all subject lines that are scanned must be valid
strings of UTF-8 characters. strings of UTF-8 characters.
.TP .TP
\fb-U\fP, \fB--utf-allow-invalid\fP \fB-U\fP, \fB--utf-allow-invalid\fP
As \fB--utf\fP, but in addition subject lines may contain invalid UTF-8 code As \fB--utf\fP, but in addition subject lines may contain invalid UTF-8 code
unit sequences. These can never form part of any pattern match. This facility unit sequences. These can never form part of any pattern match. This facility
allows valid UTF-8 strings to be sought in executable or other binary files. allows valid UTF-8 strings to be sought in executable or other binary files.

@ -719,13 +719,20 @@ OPTIONS
(list files without matches), because the grand total would (list files without matches), because the grand total would
always be zero. always be zero.
-u, --utf-8 -u, --utf Operate in UTF-8 mode. This option is available only if PCRE2
Operate in UTF-8 mode. This option is available only if PCRE2
has been compiled with UTF-8 support. All patterns (including has been compiled with UTF-8 support. All patterns (including
those for any --exclude and --include options) and all sub- those for any --exclude and --include options) and all sub-
ject lines that are scanned must be valid strings of UTF-8 ject lines that are scanned must be valid strings of UTF-8
characters. characters.
-U, --utf-allow-invalid
As --utf, but in addition subject lines may contain invalid
UTF-8 code unit sequences. These can never form part of any
pattern match. This facility allows valid UTF-8 strings to be
sought in executable or other binary files. For more details
about matching in non-valid UTF-8 strings, see the pcre2uni-
code(3) documentation.
-V, --version -V, --version
Write the version numbers of pcre2grep and the PCRE2 library Write the version numbers of pcre2grep and the PCRE2 library
to the standard output and then exit. Anything else on the to the standard output and then exit. Anything else on the
@ -785,9 +792,9 @@ OPTIONS COMPATIBILITY
terminology) is also available as --xxx-regex (PCRE2 terminology). How- terminology) is also available as --xxx-regex (PCRE2 terminology). How-
ever, the --depth-limit, --file-list, --file-offsets, --heap-limit, ever, the --depth-limit, --file-list, --file-offsets, --heap-limit,
--include-dir, --line-offsets, --locale, --match-limit, -M, --multi- --include-dir, --line-offsets, --locale, --match-limit, -M, --multi-
line, -N, --newline, --om-separator, --output, -u, and --utf-8 options line, -N, --newline, --om-separator, --output, -u, --utf, -U, and
are specific to pcre2grep, as is the use of the --only-matching option --utf-allow-invalid options are specific to pcre2grep, as is the use of
with a capturing parentheses number. the --only-matching option with a capturing parentheses number.
Although most of the common options work the same way, a few are dif- Although most of the common options work the same way, a few are dif-
ferent in pcre2grep. For example, the --include option's argument is a ferent in pcre2grep. For example, the --include option's argument is a
@ -948,5 +955,5 @@ AUTHOR
REVISION REVISION
Last updated: 24 November 2018 Last updated: 28 May 2019
Copyright (c) 1997-2018 University of Cambridge. Copyright (c) 1997-2019 University of Cambridge.
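
The revised start-up optimization paragraph in this commit walks through the pattern (*MARK:1)B(*MARK:2)(X|Y) and the subject "XXBB". The sketch below is a hand-written toy model of that walk-through, not PCRE2 itself: the function name `last_mark_seen`, the `start_optimize` flag and the hard-coded pattern logic are all assumptions made for illustration. It only mirrors the two behaviours the text describes (skipping to the next "B", and giving up when fewer than two characters remain).

```python
def last_mark_seen(subject, start_optimize=True):
    """Toy model of matching (*MARK:1)B(*MARK:2)(X|Y) against `subject`.

    Hypothetical helper, not a PCRE2 API. Returns (matched, last_mark).
    """
    MIN_LENGTH = 2  # "B" followed by "X" or "Y"
    last_mark = None
    pos = 0
    # Without optimizations, every position is tried, including the
    # empty position at the very end of the subject.
    while pos <= len(subject):
        if start_optimize:
            # "Starting character" optimization: jump to the next "B".
            pos = subject.find("B", pos)
            if pos < 0:
                break
            # Minimum-length optimization: too few characters remain.
            if len(subject) - pos < MIN_LENGTH:
                break
        # One match attempt at `pos`; (*MARK:1) is always passed first.
        last_mark = "1"
        if subject.startswith("B", pos):
            last_mark = "2"  # (*MARK:2) is passed after "B" matches
            if subject[pos + 1:pos + 2] in ("X", "Y"):
                return True, last_mark
        pos += 1
    return False, last_mark


if __name__ == "__main__":
    print(last_mark_seen("XXBB", start_optimize=True))
    print(last_mark_seen("XXBB", start_optimize=False))
```

In this model both calls report "no match" for "XXBB", but the mark differs: "2" with the optimizations and "1" without them, which is the difference in auxiliary information that the paragraph describes.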