Documentation update.

2019-05-30 15:43:05 +00:00 · 2019-05-30 15:43:05 +00:00 · 16d47a9cb1
parent d5dc4e0c33
commit 16d47a9cb1
6 changed files with 951 additions and 920 deletions
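The reworded pcre2api paragraph in this commit describes how the "starting character" and minimum-length optimizations change the "last mark seen" for the pattern (*MARK:1)B(*MARK:2)(X|Y) and the subject "XXBB". The sketch below is a toy Python model of that walk-through, not PCRE2 code: the pattern is hard-coded, it does not call the library, and the names attempt, run, MIN_LENGTH and FIRST_CHAR are invented for illustration. It assumes only what the paragraph states, namely that a failed attempt remembers the last mark it passed.

```python
# Toy model only: hard-codes the pattern (*MARK:1)B(*MARK:2)(X|Y).
# It is not PCRE2 code and does not call the PCRE2 library.

MIN_LENGTH = 2       # shortest possible match: "B" followed by "X" or "Y"
FIRST_CHAR = "B"     # every match must start with this character


def attempt(subject, start):
    """Try the pattern at `start`; return (matched, last mark passed)."""
    mark = "1"                                   # (*MARK:1) is passed first
    if subject[start:start + 1] != "B":
        return False, mark
    mark = "2"                                   # (*MARK:2) follows the "B"
    if subject[start + 1:start + 2] in ("X", "Y"):
        return True, mark
    return False, mark


def run(subject, start_optimize=True):
    """Return (matched, last mark seen) over all match attempts."""
    last_mark = None
    pos = 0
    while pos <= len(subject):
        if start_optimize:
            # "Starting character" optimization: jump to the next "B".
            pos = subject.find(FIRST_CHAR, pos)
            # Minimum-length optimization: stop if too little is left.
            if pos < 0 or len(subject) - pos < MIN_LENGTH:
                break
        matched, last_mark = attempt(subject, pos)
        if matched:
            return True, last_mark
        pos += 1
    return False, last_mark


print(run("XXBB", start_optimize=True))    # (False, '2')
print(run("XXBB", start_optimize=False))   # (False, '1')
```

With the optimizations on, only the attempt at "BB" runs, so the last mark is "2". With them off, the final attempt at the end of the subject passes only (*MARK:1), so the last mark is "1", matching the documented behaviour of NO_START_OPTIMIZE.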
--- a/doc/html/pcre2api.html
+++ b/doc/html/pcre2api.html
@ -1762,17 +1762,22 @@ subject string does not happen. The first match attempt is run starting from
 the overall result is "no match".
 </P>
 <P>
-There are also other start-up optimizations. For example, a minimum length for
+Another start-up optimization makes use of a minimum length for a matching
-the subject may be recorded. Consider the pattern
+subject, which is recorded when possible. Consider the pattern
 <pre>
-  (*MARK:A)(X|Y)
+  (*MARK:1)B(*MARK:2)(X|Y)
 </pre>
-The minimum length for a match is one character. If the subject is "ABC", there
+The minimum length for a match is two characters. If the subject is "XXBB", the 
-will be attempts to match "ABC", "BC", and "C". An attempt to match an empty
+"starting character" optimization skips "XX", then tries to match "BB", which 
-string at the end of the subject does not take place, because PCRE2 knows that
+is long enough. In the process, (*MARK:2) is encountered and remembered. When 
-the subject is now too short, and so the (*MARK) is never encountered. In this
+the match attempt fails, the next "B" is found, but there is only one character
-case, the optimization does not affect the overall match result, which is still
+left, so there are no more attempts, and "no match" is returned with the "last
-"no match", but it does affect the auxiliary information that is returned.
+mark seen" set to "2". If NO_START_OPTIMIZE is set, however, matches are tried
+at every possible starting position, including at the end of the subject, where
+(*MARK:1) is encountered, but there is no "B", so the "last mark seen" that is
+returned is "1". In this case, the optimizations do not affect the overall
+match result, which is still "no match", but they do affect the auxiliary
+information that is returned.
 <pre>
  PCRE2_NO_UTF_CHECK
 </pre>
@ -3831,7 +3836,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC42" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 23 May 2019
+Last updated: 30 May 2019
 <br>
 Copyright &copy; 1997-2019 University of Cambridge.
 <br>
--- a/doc/html/pcre2grep.html
+++ b/doc/html/pcre2grep.html
@ -741,13 +741,22 @@ ignored when used with <b>-L</b> (list files without matches), because the grand
 total would always be zero.
 </P>
 <P>
-<b>-u</b>, <b>--utf-8</b>
+<b>-u</b>, <b>--utf</b>
 Operate in UTF-8 mode. This option is available only if PCRE2 has been compiled
 with UTF-8 support. All patterns (including those for any <b>--exclude</b> and
 <b>--include</b> options) and all subject lines that are scanned must be valid
 strings of UTF-8 characters.
 </P>
 <P>
+<b>-U</b>, <b>--utf-allow-invalid</b>
+As <b>--utf</b>, but in addition subject lines may contain invalid UTF-8 code
+unit sequences. These can never form part of any pattern match. This facility
+allows valid UTF-8 strings to be sought in executable or other binary files.
+For more details about matching in non-valid UTF-8 strings, see the
+<a href="pcre2unicode.html"><b>pcre2unicode</b>(3)</a>
+documentation.
+</P>
+<P>
 <b>-V</b>, <b>--version</b>
 Write the version numbers of <b>pcre2grep</b> and the PCRE2 library to the
 standard output and then exit. Anything else on the command line is
@ -806,9 +815,9 @@ as in the GNU <b>grep</b> program. Any long option of the form
 <b>--file-offsets</b>, <b>--heap-limit</b>, <b>--include-dir</b>,
 <b>--line-offsets</b>, <b>--locale</b>, <b>--match-limit</b>, <b>-M</b>,
 <b>--multiline</b>, <b>-N</b>, <b>--newline</b>, <b>--om-separator</b>,
-<b>--output</b>, <b>-u</b>, and <b>--utf-8</b> options are specific to
+<b>--output</b>, <b>-u</b>, <b>--utf</b>, <b>-U</b>, and <b>--utf-allow-invalid</b>
-<b>pcre2grep</b>, as is the use of the <b>--only-matching</b> option with a
+options are specific to <b>pcre2grep</b>, as is the use of the
-capturing parentheses number.
+<b>--only-matching</b> option with a capturing parentheses number.
 </P>
 <P>
 Although most of the common options work the same way, a few are different in
@ -971,9 +980,9 @@ Cambridge, England.
 </P>
 <br><a name="SEC16" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 24 November 2018
+Last updated: 28 May 2019
 <br>
-Copyright &copy; 1997-2018 University of Cambridge.
+Copyright &copy; 1997-2019 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
--- a/doc/pcre2.txt
+++ b/doc/pcre2.txt
@ -1741,18 +1741,23 @@ COMPILING A PATTERN
       (*COMMIT)  prevents  any  further  matches  being tried, so the overall
       result is "no match".
-       There are also other start-up optimizations.  For  example,  a  minimum
+       Another start-up optimization makes use of a minimum  length  for  a
-       length for the subject may be recorded. Consider the pattern
+       matching subject, which is recorded when possible. Consider the pattern
-         (*MARK:A)(X|Y)
+         (*MARK:1)B(*MARK:2)(X|Y)
-       The  minimum  length  for  a  match is one character. If the subject is
+       The  minimum  length  for  a match is two characters. If the subject is
-       "ABC", there will be attempts to match "ABC", "BC", and "C". An attempt
+       "XXBB", the "starting character" optimization skips "XX", then tries to
-       to match an empty string at the end of the subject does not take place,
+       match  "BB", which is long enough. In the process, (*MARK:2) is encoun-
-       because PCRE2 knows that the subject is  now  too  short,  and  so  the
+       tered and remembered. When the match attempt fails,  the  next  "B"  is
-       (*MARK)  is  never encountered. In this case, the optimization does not
+       found,  but  there  is  only  one  character left, so there are no more
-       affect the overall match result, which is still "no match", but it does
+       attempts, and "no match" is returned with the "last mark seen"  set  to
-       affect the auxiliary information that is returned.
+       "2".  If  NO_START_OPTIMIZE is set, however, matches are tried at every
+       possible starting position, including at the end of the subject,  where
+       (*MARK:1)  is encountered, but there is no "B", so the "last mark seen"
+       that is returned is "1". In this case, the optimizations do not  affect
+       the overall match result, which is still "no match", but they do affect
+       the auxiliary information that is returned.
         PCRE2_NO_UTF_CHECK
@ -3698,7 +3703,7 @@ AUTHOR
 REVISION
-       Last updated: 23 May 2019
+       Last updated: 30 May 2019
       Copyright (c) 1997-2019 University of Cambridge.
 ------------------------------------------------------------------------------
--- a/doc/pcre2api.3
+++ b/doc/pcre2api.3
@ -1,4 +1,4 @@
-.TH PCRE2API 3 "23 May 2019" "PCRE2 10.34"
+.TH PCRE2API 3 "30 May 2019" "PCRE2 10.34"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@ -1701,17 +1701,22 @@ subject string does not happen. The first match attempt is run starting from
 "D" and when this fails, (*COMMIT) prevents any further matches being tried, so
 the overall result is "no match".
 .P
-There are also other start-up optimizations. For example, a minimum length for
+Another start-up optimization makes use of a minimum length for a matching
-the subject may be recorded. Consider the pattern
+subject, which is recorded when possible. Consider the pattern
 .sp
-  (*MARK:A)(X|Y)
+  (*MARK:1)B(*MARK:2)(X|Y)
 .sp
-The minimum length for a match is one character. If the subject is "ABC", there
+The minimum length for a match is two characters. If the subject is "XXBB", the 
-will be attempts to match "ABC", "BC", and "C". An attempt to match an empty
+"starting character" optimization skips "XX", then tries to match "BB", which 
-string at the end of the subject does not take place, because PCRE2 knows that
+is long enough. In the process, (*MARK:2) is encountered and remembered. When 
-the subject is now too short, and so the (*MARK) is never encountered. In this
+the match attempt fails, the next "B" is found, but there is only one character
-case, the optimization does not affect the overall match result, which is still
+left, so there are no more attempts, and "no match" is returned with the "last
-"no match", but it does affect the auxiliary information that is returned.
+mark seen" set to "2". If NO_START_OPTIMIZE is set, however, matches are tried
+at every possible starting position, including at the end of the subject, where
+(*MARK:1) is encountered, but there is no "B", so the "last mark seen" that is
+returned is "1". In this case, the optimizations do not affect the overall
+match result, which is still "no match", but they do affect the auxiliary
+information that is returned.
 .sp
  PCRE2_NO_UTF_CHECK
 .sp
@ -3843,6 +3848,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 23 May 2019
+Last updated: 30 May 2019
 Copyright (c) 1997-2019 University of Cambridge.
 .fi
--- a/doc/pcre2grep.1
+++ b/doc/pcre2grep.1
@ -650,7 +650,7 @@ with UTF-8 support. All patterns (including those for any \fB--exclude\fP and
 \fB--include\fP options) and all subject lines that are scanned must be valid
 strings of UTF-8 characters.
 .TP
-\fb-U\fP, \fB--utf-allow-invalid\fP
+\fB-U\fP, \fB--utf-allow-invalid\fP
 As \fB--utf\fP, but in addition subject lines may contain invalid UTF-8 code
 unit sequences. These can never form part of any pattern match. This facility
 allows valid UTF-8 strings to be sought in executable or other binary files.
--- a/doc/pcre2grep.txt
+++ b/doc/pcre2grep.txt
@ -719,13 +719,20 @@ OPTIONS
                 (list  files  without matches), because the grand total would
                 always be zero.
-       -u, --utf-8
+       -u, --utf Operate in UTF-8 mode. This option is available only if PCRE2
-                 Operate in UTF-8 mode. This option is available only if PCRE2
                 has been compiled with UTF-8 support. All patterns (including
                 those for any --exclude and --include options) and  all  sub-
                 ject  lines  that  are scanned must be valid strings of UTF-8
                 characters.
+       -U, --utf-allow-invalid
+                 As --utf, but in addition subject lines may  contain  invalid
+                 UTF-8  code  unit sequences. These can never form part of any
+                 pattern match. This facility allows valid UTF-8 strings to be
+                 sought in executable or other binary files.  For more details
+                 about matching in non-valid UTF-8 strings, see the  pcre2uni-
+                 code(3) documentation.
       -V, --version
                 Write  the version numbers of pcre2grep and the PCRE2 library
                 to the standard output and then exit. Anything  else  on  the
@ -785,9 +792,9 @@ OPTIONS COMPATIBILITY
       terminology) is also available as --xxx-regex (PCRE2 terminology). How-
       ever,  the  --depth-limit,  --file-list,  --file-offsets, --heap-limit,
       --include-dir, --line-offsets, --locale,  --match-limit,  -M,  --multi-
-       line, -N, --newline, --om-separator, --output, -u, and --utf-8  options
+       line,  -N,  --newline,  --om-separator,  --output,  -u,  --utf, -U, and
-       are  specific to pcre2grep, as is the use of the --only-matching option
+       --utf-allow-invalid options are specific to pcre2grep, as is the use of
-       with a capturing parentheses number.
+       the --only-matching option with a capturing parentheses number.
       Although  most  of the common options work the same way, a few are dif-
       ferent in pcre2grep. For example, the --include option's argument is  a
@ -948,5 +955,5 @@ AUTHOR
 REVISION
-       Last updated: 24 November 2018
+       Last updated: 28 May 2019
-       Copyright (c) 1997-2018 University of Cambridge.
+       Copyright (c) 1997-2019 University of Cambridge.