Documentation update to clarify ovector usage with DFA matching.

2021-08-28 16:25:59 +01:00 · 6c2fe9da99
parent 5ff1daffa0
commit 6c2fe9da99
11 changed files with 204 additions and 148 deletions
--- a/doc/html/pcre2_dfa_match.html
+++ b/doc/html/pcre2_dfa_match.html
@ -45,10 +45,16 @@ just once (except when processing lookaround assertions). This function is
  <i>workspace</i>    Points to a vector of ints used as working space
  <i>wscount</i>      Number of elements in the vector
 </pre>
-For <b>pcre2_dfa_match()</b>, a match context is needed only if you want to set
+The size of output vector needed to contain all the results depends on the  
-up a callout function or specify the heap limit or the match or the recursion
+number of simultaneous matches, not on the number of parentheses in the        
-depth limits. The <i>length</i> and <i>startoffset</i> values are code units, not
+pattern. Using <b>pcre2_match_data_create_from_pattern()</b> to create the match 
-characters. The options are:
+data block is therefore not advisable when using this function.   
 </P>
 <P>
 A match context is needed only if you want to set up a callout function or
 specify the heap limit or the match or the recursion depth limits. The
 <i>length</i> and <i>startoffset</i> values are code units, not characters. The
 options are:
 <pre>
  PCRE2_ANCHORED          Match only at the first position
  PCRE2_COPY_MATCHED_SUBJECT
--- a/doc/html/pcre2_match_data_create.html
+++ b/doc/html/pcre2_match_data_create.html
@ -30,8 +30,9 @@ This function creates a new match data block, which is used for holding the
 result of a match. The first argument specifies the number of pairs of offsets
 that are required. These form the "output vector" (ovector) within the match
 data block, and are used to identify the matched string and any captured
-substrings. There is always one pair of offsets; if <b>ovecsize</b> is zero, it
+substrings when matching with <b>pcre2_match()</b>, or a number of different
-is treated as one.
+matches at the same point when used with <b>pcre2_dfa_match()</b>. There is
 always one pair of offsets; if <b>ovecsize</b> is zero, it is treated as one.
 </P>
 <P>
 The second argument points to a general context, for custom memory management,
--- a/doc/html/pcre2_match_data_create_from_pattern.html
+++ b/doc/html/pcre2_match_data_create_from_pattern.html
@ -26,12 +26,15 @@ SYNOPSIS
 DESCRIPTION
 </b><br>
 <P>
-This function creates a new match data block, which is used for holding the
+This function creates a new match data block for holding the result of a match.
-result of a match. The first argument points to a compiled pattern. The number
+The first argument points to a compiled pattern. The number of capturing
-of capturing parentheses within the pattern is used to compute the number of
+parentheses within the pattern is used to compute the number of pairs of
-pairs of offsets that are required in the match data block. These form the
+offsets that are required in the match data block. These form the "output
-"output vector" (ovector) within the match data block, and are used to identify
+vector" (ovector) within the match data block, and are used to identify the
-the matched string and any captured substrings.
+matched string and any captured substrings when matching with
 <b>pcre2_match()</b>. If you are using <b>pcre2_dfa_match()</b>, which uses the
 output vector in a different way, you should use <b>pcre2_match_data_create()</b>
 instead of this function.
 </P>
 <P>
 The second argument points to a general context, for custom memory management,
--- a/doc/html/pcre2api.html
+++ b/doc/html/pcre2api.html
@ -2512,20 +2512,31 @@ to an abstract format like Java or .NET serialization.
 Information about a successful or unsuccessful match is placed in a match
 data block, which is an opaque structure that is accessed by function calls. In
 particular, the match data block contains a vector of offsets into the subject
-string that define the matched part of the subject and any substrings that were
+string that define the matched parts of the subject. This is known as the
-captured. This is known as the <i>ovector</i>.
+<i>ovector</i>.
 </P>
 <P>
 Before calling <b>pcre2_match()</b>, <b>pcre2_dfa_match()</b>, or
 <b>pcre2_jit_match()</b> you must create a match data block by calling one of
 the creation functions above. For <b>pcre2_match_data_create()</b>, the first
-argument is the number of pairs of offsets in the <i>ovector</i>. One pair of
+argument is the number of pairs of offsets in the <i>ovector</i>. 
-offsets is required to identify the string that matched the whole pattern, with
+</P>
-an additional pair for each captured substring. For example, a value of 4
+<P>
-creates enough space to record the matched portion of the subject plus three
+When using <b>pcre2_match()</b>, one pair of offsets is required to identify the
-captured substrings. A minimum of at least 1 pair is imposed by
+string that matched the whole pattern, with an additional pair for each
-<b>pcre2_match_data_create()</b>, so it is always possible to return the overall
+captured substring. For example, a value of 4 creates enough space to record
-matched string.
+the matched portion of the subject plus three captured substrings.
 </P>
 <P>
 When using <b>pcre2_dfa_match()</b> there may be multiple matched substrings of 
 different lengths at the same point in the subject. The ovector should be made
 large enough to hold as many as are expected.
 </P>
 <P>
 A minimum of at least 1 pair is imposed by <b>pcre2_match_data_create()</b>, so
 it is always possible to return the overall matched string in the case of 
 <b>pcre2_match()</b> or the longest match in the case of 
 <b>pcre2_dfa_match()</b>.
 </P>
 <P>
 The second argument of <b>pcre2_match_data_create()</b> is a pointer to a
@ -2536,10 +2547,11 @@ pass NULL, which causes <b>malloc()</b> to be used.
 <P>
 For <b>pcre2_match_data_create_from_pattern()</b>, the first argument is a
 pointer to a compiled pattern. The ovector is created to be exactly the right
-size to hold all the substrings a pattern might capture. The second argument is
+size to hold all the substrings a pattern might capture when matched using
-again a pointer to a general context, but in this case if NULL is passed, the
+<b>pcre2_match()</b>. You should not use this call when matching with
-memory is obtained using the same allocator that was used for the compiled
+<b>pcre2_dfa_match()</b>. The second argument is again a pointer to a general
-pattern (custom or default).
+context, but in this case if NULL is passed, the memory is obtained using the
 same allocator that was used for the compiled pattern (custom or default).
 </P>
 <P>
 A match data block can be used many times, with the same or different compiled
@ -3982,16 +3994,16 @@ fail, this error is given.
 <P>
 Philip Hazel
 <br>
-University Computing Service
+Retired from University Computing Service
 <br>
 Cambridge, England.
 <br>
 </P>
 <br><a name="SEC42" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 04 November 2020
+Last updated: 28 August 2021
 <br>
-Copyright &copy; 1997-2020 University of Cambridge.
+Copyright &copy; 1997-2021 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
--- a/doc/html/pcre2matching.html
+++ b/doc/html/pcre2matching.html
@ -78,8 +78,9 @@ tried is controlled by the greedy or ungreedy nature of the quantifier.
 If a leaf node is reached, a matching string has been found, and at that point
 the algorithm stops. Thus, if there is more than one possible match, this
 algorithm returns the first one that it finds. Whether this is the shortest,
-the longest, or some intermediate length depends on the way the greedy and
+the longest, or some intermediate length depends on the way the alternations
-ungreedy repetition quantifiers are specified in the pattern.
+and the greedy or ungreedy repetition quantifiers are specified in the
 pattern.
 </P>
 <P>
 Because it ends up with a single path through the tree, it is relatively
@ -109,11 +110,17 @@ no more unterminated paths. At this point, terminated paths represent the
 different matching possibilities (if there are none, the match has failed).
 Thus, if there is more than one possible match, this algorithm finds all of
 them, and in particular, it finds the longest. The matches are returned in
-decreasing order of length. There is an option to stop the algorithm after the
+the output vector in decreasing order of length. There is an option to stop the
-first match (which is necessarily the shortest) is found.
+algorithm after the first match (which is necessarily the shortest) is found.
 </P>
 <P>
-Note that all the matches that are found start at the same point in the
+Note that the size of vector needed to contain all the results depends on the
 number of simultaneous matches, not on the number of parentheses in the
 pattern. Using <b>pcre2_match_data_create_from_pattern()</b> to create the match
 data block is therefore not advisable when doing DFA matching.
 </P>
 <P>
 Note also that all the matches that are found start at the same point in the
 subject. If the pattern
 <pre>
  cat(er(pillar)?)?
@ -194,21 +201,14 @@ supported by <b>pcre2_dfa_match()</b>.
 </P>
 <br><a name="SEC5" href="#TOC1">ADVANTAGES OF THE ALTERNATIVE ALGORITHM</a><br>
 <P>
-Using the alternative matching algorithm provides the following advantages:
+The main advantage of the alternative algorithm is that all possible matches
 (at a single point in the subject) are automatically found, and in particular,
 the longest match is found. To find more than one match at the same point using
 the standard algorithm, you have to do kludgy things with callouts.
 </P>
 <P>
-1. All possible matches (at a single point in the subject) are automatically
+Partial matching is possible with this algorithm, though it has some
-found, and in particular, the longest match is found. To find more than one
+limitations. The
 match using the standard algorithm, you have to do kludgy things with
 callouts.
 </P>
 <P>
 2. Because the alternative algorithm scans the subject string just once, and
 never needs to backtrack (except for lookbehinds), it is possible to pass very
 long subject strings to the matching function in several pieces, checking for
 partial matching each time. Although it is also possible to do multi-segment
 matching using the standard algorithm, by retaining partially matched
 substrings, it is more complicated. The
 <a href="pcre2partial.html"><b>pcre2partial</b></a>
 documentation gives details of partial matching and discusses multi-segment
 matching.
@ -230,20 +230,23 @@ invalid UTF string are not supported.
 3. Although atomic groups are supported, their use does not provide the
 performance advantage that it does for the standard algorithm.
 </P>
 <P>
 4. JIT optimization is not supported.
 </P>
 <br><a name="SEC7" href="#TOC1">AUTHOR</a><br>
 <P>
 Philip Hazel
 <br>
-University Computing Service
+Retired from University Computing Service
 <br>
 Cambridge, England.
 <br>
 </P>
 <br><a name="SEC8" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 23 May 2019
+Last updated: 28 August 2021
 <br>
-Copyright &copy; 1997-2019 University of Cambridge.
+Copyright &copy; 1997-2021 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
--- a/doc/pcre2.txt
+++ b/doc/pcre2.txt
@ -2468,20 +2468,28 @@ THE MATCH DATA BLOCK
       Information about a successful or unsuccessful match  is  placed  in  a
       match  data  block,  which  is  an opaque structure that is accessed by
       function calls. In particular, the match data block contains  a  vector
-       of  offsets into the subject string that define the matched part of the
+       of offsets into the subject string that define the matched parts of the
-       subject and any substrings that were captured. This  is  known  as  the
+       subject. This is known as the ovector.
       ovector.
       Before calling pcre2_match(), pcre2_dfa_match(),  or  pcre2_jit_match()
       you must create a match data block by calling one of the creation func-
       tions above. For pcre2_match_data_create(), the first argument  is  the
-       number of pairs of offsets in the ovector. One pair of offsets  is  re-
+       number of pairs of offsets in the ovector.
-       quired  to  identify the string that matched the whole pattern, with an
+
-       additional pair for each captured substring. For example, a value of  4
+       When  using  pcre2_match(), one pair of offsets is required to identify
-       creates  enough space to record the matched portion of the subject plus
+       the string that matched the whole pattern, with an additional pair  for
-       three captured substrings. A minimum of at least 1 pair is  imposed  by
+       each captured substring. For example, a value of 4 creates enough space
-       pcre2_match_data_create(), so it is always possible to return the over-
+       to record the matched portion of the subject plus three  captured  sub-
-       all matched string.
+       strings.
       When  using  pcre2_dfa_match() there may be multiple matched substrings
       of different lengths at the same point  in  the  subject.  The  ovector
       should be made large enough to hold as many as are expected.
       A  minimum  of at least 1 pair is imposed by pcre2_match_data_create(),
       so it is always possible to return the overall matched  string  in  the
       case   of   pcre2_match()   or   the  longest  match  in  the  case  of
       pcre2_dfa_match().
       The second argument of pcre2_match_data_create() is a pointer to a gen-
       eral  context, which can specify custom memory management for obtaining
@ -2490,10 +2498,12 @@ THE MATCH DATA BLOCK
       For  pcre2_match_data_create_from_pattern(),  the  first  argument is a
       pointer to a compiled pattern. The ovector is created to be exactly the
-       right size to hold all the substrings a pattern might capture. The sec-
+       right  size  to  hold  all  the substrings a pattern might capture when
-       ond argument is again a pointer to a general context, but in this  case
+       matched using pcre2_match(). You should not use this call when matching
-       if NULL is passed, the memory is obtained using the same allocator that
+       with  pcre2_dfa_match().  The  second  argument is again a pointer to a
-       was used for the compiled pattern (custom or default).
+       general context, but in this case if NULL is passed, the memory is  ob-
       tained  using the same allocator that was used for the compiled pattern
       (custom or default).
       A match data block can be used many times, with the same  or  different
       compiled  patterns. You can extract information from a match data block
@ -3825,14 +3835,14 @@ SEE ALSO
 AUTHOR
       Philip Hazel
-       University Computing Service
+       Retired from University Computing Service
       Cambridge, England.
 REVISION
-       Last updated: 04 November 2020
+       Last updated: 28 August 2021
-       Copyright (c) 1997-2020 University of Cambridge.
+       Copyright (c) 1997-2021 University of Cambridge.
 ------------------------------------------------------------------------------
@ -5635,8 +5645,8 @@ THE STANDARD MATCHING ALGORITHM
       that  point the algorithm stops. Thus, if there is more than one possi-
       ble match, this algorithm returns the first one that it finds.  Whether
       this  is the shortest, the longest, or some intermediate length depends
-       on the way the greedy and ungreedy repetition quantifiers are specified
+       on the way the alternations and the greedy or ungreedy repetition quan-
-       in the pattern.
+       tifiers are specified in the pattern.
       Because  it  ends  up  with a single path through the tree, it is rela-
       tively straightforward for this algorithm to keep  track  of  the  sub-
@ -5665,12 +5675,18 @@ THE ALTERNATIVE MATCHING ALGORITHM
       represent the different matching possibilities (if there are none,  the
       match  has  failed).   Thus,  if there is more than one possible match,
       this algorithm finds all of them, and in particular, it finds the long-
-       est.  The  matches are returned in decreasing order of length. There is
+       est.  The matches are returned in the output vector in decreasing order
-       an option to stop the algorithm after the first match (which is  neces-
+       of length. There is an option to stop the  algorithm  after  the  first
-       sarily the shortest) is found.
+       match (which is necessarily the shortest) is found.
-       Note that all the matches that are found start at the same point in the
+       Note  that the size of vector needed to contain all the results depends
-       subject. If the pattern
+       on the number of simultaneous matches, not on the number of parentheses
       in  the pattern. Using pcre2_match_data_create_from_pattern() to create
       the match data block is therefore not advisable when doing  DFA  match-
       ing.
       Note  also  that all the matches that are found start at the same point
       in the subject. If the pattern
         cat(er(pillar)?)?
@ -5746,22 +5762,15 @@ THE ALTERNATIVE MATCHING ALGORITHM
 ADVANTAGES OF THE ALTERNATIVE ALGORITHM
-       Using  the alternative matching algorithm provides the following advan-
+       The  main  advantage  of the alternative algorithm is that all possible
-       tages:
+       matches (at a single point in the subject) are automatically found, and
-
+       in  particular, the longest match is found. To find more than one match
-       1. All possible matches (at a single point in the subject) are automat-
+       at the same point using the standard algorithm, you have to  do  kludgy
       ically  found,  and  in particular, the longest match is found. To find
       more than one match using the standard algorithm, you have to do kludgy
       things with callouts.
-       2.  Because  the  alternative  algorithm  scans the subject string just
+       Partial  matching  is  possible with this algorithm, though it has some
-       once, and never needs to backtrack (except for lookbehinds), it is pos-
+       limitations. The pcre2partial documentation gives  details  of  partial
-       sible  to  pass  very  long subject strings to the matching function in
+       matching and discusses multi-segment matching.
       several pieces, checking for partial matching each time. Although it is
       also  possible  to  do  multi-segment matching using the standard algo-
       rithm, by retaining partially matched substrings, it  is  more  compli-
       cated. The pcre2partial documentation gives details of partial matching
       and discusses multi-segment matching.
 DISADVANTAGES OF THE ALTERNATIVE ALGORITHM
@ -5778,18 +5787,20 @@ DISADVANTAGES OF THE ALTERNATIVE ALGORITHM
       3. Although atomic groups are supported, their use does not provide the
       performance advantage that it does for the standard algorithm.
       4. JIT optimization is not supported.
 AUTHOR
       Philip Hazel
-       University Computing Service
+       Retired from University Computing Service
       Cambridge, England.
 REVISION
-       Last updated: 23 May 2019
+       Last updated: 28 August 2021
-       Copyright (c) 1997-2019 University of Cambridge.
+       Copyright (c) 1997-2021 University of Cambridge.
 ------------------------------------------------------------------------------
--- a/doc/pcre2_dfa_match.3
+++ b/doc/pcre2_dfa_match.3
@ -1,4 +1,4 @@
-.TH PCRE2_DFA_MATCH 3 "16 October 2018" "PCRE2 10.33"
+.TH PCRE2_DFA_MATCH 3 "28 August 2021" "PCRE2 10.38"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@ -33,10 +33,15 @@ just once (except when processing lookaround assertions). This function is
  \fIworkspace\fP    Points to a vector of ints used as working space
  \fIwscount\fP      Number of elements in the vector
 .sp
-For \fBpcre2_dfa_match()\fP, a match context is needed only if you want to set
+The size of output vector needed to contain all the results depends on the  
-up a callout function or specify the heap limit or the match or the recursion
+number of simultaneous matches, not on the number of parentheses in the        
-depth limits. The \fIlength\fP and \fIstartoffset\fP values are code units, not
+pattern. Using \fBpcre2_match_data_create_from_pattern()\fP to create the match 
-characters. The options are:
+data block is therefore not advisable when using this function.   
 .P
 A match context is needed only if you want to set up a callout function or
 specify the heap limit or the match or the recursion depth limits. The
 \fIlength\fP and \fIstartoffset\fP values are code units, not characters. The
 options are:
 .sp
  PCRE2_ANCHORED          Match only at the first position
  PCRE2_COPY_MATCHED_SUBJECT
--- a/doc/pcre2_match_data_create.3
+++ b/doc/pcre2_match_data_create.3
@ -1,4 +1,4 @@
-.TH PCRE2_MATCH_DATA_CREATE 3 "29 July 2015" "PCRE2 10.21"
+.TH PCRE2_MATCH_DATA_CREATE 3 "28 August 2021" "PCRE2 10.38"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@ -18,8 +18,9 @@ This function creates a new match data block, which is used for holding the
 result of a match. The first argument specifies the number of pairs of offsets
 that are required. These form the "output vector" (ovector) within the match
 data block, and are used to identify the matched string and any captured
-substrings. There is always one pair of offsets; if \fBovecsize\fP is zero, it
+substrings when matching with \fBpcre2_match()\fP, or a number of different
-is treated as one.
+matches at the same point when used with \fBpcre2_dfa_match()\fP. There is
 always one pair of offsets; if \fBovecsize\fP is zero, it is treated as one.
 .P
 The second argument points to a general context, for custom memory management,
 or is NULL for system memory management. The result of the function is NULL if
--- a/doc/pcre2_match_data_create_from_pattern.3
+++ b/doc/pcre2_match_data_create_from_pattern.3
@ -1,4 +1,4 @@
-.TH PCRE2_MATCH_DATA_CREATE_FROM_PATTERN 3 "29 July 2015" "PCRE2 10.21"
+.TH PCRE2_MATCH_DATA_CREATE_FROM_PATTERN 3 "28 August 2021" "PCRE2 10.38"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@ -14,12 +14,15 @@ PCRE2 - Perl-compatible regular expressions (revised API)
 .SH DESCRIPTION
 .rs
 .sp
-This function creates a new match data block, which is used for holding the
+This function creates a new match data block for holding the result of a match.
-result of a match. The first argument points to a compiled pattern. The number
+The first argument points to a compiled pattern. The number of capturing
-of capturing parentheses within the pattern is used to compute the number of
+parentheses within the pattern is used to compute the number of pairs of
-pairs of offsets that are required in the match data block. These form the
+offsets that are required in the match data block. These form the "output
-"output vector" (ovector) within the match data block, and are used to identify
+vector" (ovector) within the match data block, and are used to identify the
-the matched string and any captured substrings.
+matched string and any captured substrings when matching with
 \fBpcre2_match()\fP. If you are using \fBpcre2_dfa_match()\fP, which uses the
 output vector in a different way, you should use \fBpcre2_match_data_create()\fP
 instead of this function.
 .P
 The second argument points to a general context, for custom memory management,
 or is NULL to use the same memory allocator as was used for the compiled
--- a/doc/pcre2api.3
+++ b/doc/pcre2api.3
@ -1,4 +1,4 @@
-.TH PCRE2API 3 "04 November 2020" "PCRE2 10.36"
+.TH PCRE2API 3 "28 August 2021" "PCRE2 10.38"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@ -2490,19 +2490,27 @@ to an abstract format like Java or .NET serialization.
 Information about a successful or unsuccessful match is placed in a match
 data block, which is an opaque structure that is accessed by function calls. In
 particular, the match data block contains a vector of offsets into the subject
-string that define the matched part of the subject and any substrings that were
+string that define the matched parts of the subject. This is known as the
-captured. This is known as the \fIovector\fP.
+\fIovector\fP.
 .P
 Before calling \fBpcre2_match()\fP, \fBpcre2_dfa_match()\fP, or
 \fBpcre2_jit_match()\fP you must create a match data block by calling one of
 the creation functions above. For \fBpcre2_match_data_create()\fP, the first
-argument is the number of pairs of offsets in the \fIovector\fP. One pair of
+argument is the number of pairs of offsets in the \fIovector\fP. 
-offsets is required to identify the string that matched the whole pattern, with
+.P
-an additional pair for each captured substring. For example, a value of 4
+When using \fBpcre2_match()\fP, one pair of offsets is required to identify the
-creates enough space to record the matched portion of the subject plus three
+string that matched the whole pattern, with an additional pair for each
-captured substrings. A minimum of at least 1 pair is imposed by
+captured substring. For example, a value of 4 creates enough space to record
-\fBpcre2_match_data_create()\fP, so it is always possible to return the overall
+the matched portion of the subject plus three captured substrings.
-matched string.
+.P
 When using \fBpcre2_dfa_match()\fP there may be multiple matched substrings of 
 different lengths at the same point in the subject. The ovector should be made
 large enough to hold as many as are expected.
 .P
 A minimum of at least 1 pair is imposed by \fBpcre2_match_data_create()\fP, so
 it is always possible to return the overall matched string in the case of 
 \fBpcre2_match()\fP or the longest match in the case of 
 \fBpcre2_dfa_match()\fP.
 .P
 The second argument of \fBpcre2_match_data_create()\fP is a pointer to a
 general context, which can specify custom memory management for obtaining the
@ -2511,10 +2519,11 @@ pass NULL, which causes \fBmalloc()\fP to be used.
 .P
 For \fBpcre2_match_data_create_from_pattern()\fP, the first argument is a
 pointer to a compiled pattern. The ovector is created to be exactly the right
-size to hold all the substrings a pattern might capture. The second argument is
+size to hold all the substrings a pattern might capture when matched using
-again a pointer to a general context, but in this case if NULL is passed, the
+\fBpcre2_match()\fP. You should not use this call when matching with
-memory is obtained using the same allocator that was used for the compiled
+\fBpcre2_dfa_match()\fP. The second argument is again a pointer to a general
-pattern (custom or default).
+context, but in this case if NULL is passed, the memory is obtained using the
 same allocator that was used for the compiled pattern (custom or default).
 .P
 A match data block can be used many times, with the same or different compiled
 patterns. You can extract information from a match data block after a match
@ -3991,7 +4000,7 @@ fail, this error is given.
 .sp
 .nf
 Philip Hazel
-University Computing Service
+Retired from University Computing Service
 Cambridge, England.
 .fi
 .
@ -4000,6 +4009,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 04 November 2020
+Last updated: 28 August 2021
-Copyright (c) 1997-2020 University of Cambridge.
+Copyright (c) 1997-2021 University of Cambridge.
 .fi
--- a/doc/pcre2matching.3
+++ b/doc/pcre2matching.3
@ -1,4 +1,4 @@
-.TH PCRE2MATCHING 3 "23 May 2019" "PCRE2 10.34"
+.TH PCRE2MATCHING 3 "28 August 2021" "PCRE2 10.38"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 MATCHING ALGORITHMS"
@ -61,8 +61,9 @@ tried is controlled by the greedy or ungreedy nature of the quantifier.
 If a leaf node is reached, a matching string has been found, and at that point
 the algorithm stops. Thus, if there is more than one possible match, this
 algorithm returns the first one that it finds. Whether this is the shortest,
-the longest, or some intermediate length depends on the way the greedy and
+the longest, or some intermediate length depends on the way the alternations
-ungreedy repetition quantifiers are specified in the pattern.
+and the greedy or ungreedy repetition quantifiers are specified in the
 pattern.
 .P
 Because it ends up with a single path through the tree, it is relatively
 straightforward for this algorithm to keep track of the substrings that are
@ -91,10 +92,15 @@ no more unterminated paths. At this point, terminated paths represent the
 different matching possibilities (if there are none, the match has failed).
 Thus, if there is more than one possible match, this algorithm finds all of
 them, and in particular, it finds the longest. The matches are returned in
-decreasing order of length. There is an option to stop the algorithm after the
+the output vector in decreasing order of length. There is an option to stop the
-first match (which is necessarily the shortest) is found.
+algorithm after the first match (which is necessarily the shortest) is found.
 .P
-Note that all the matches that are found start at the same point in the
+Note that the size of vector needed to contain all the results depends on the
 number of simultaneous matches, not on the number of parentheses in the
 pattern. Using \fBpcre2_match_data_create_from_pattern()\fP to create the match
 data block is therefore not advisable when doing DFA matching.
 .P
 Note also that all the matches that are found start at the same point in the
 subject. If the pattern
 .sp
  cat(er(pillar)?)?
@ -165,19 +171,13 @@ supported by \fBpcre2_dfa_match()\fP.
 .SH "ADVANTAGES OF THE ALTERNATIVE ALGORITHM"
 .rs
 .sp
-Using the alternative matching algorithm provides the following advantages:
+The main advantage of the alternative algorithm is that all possible matches
 (at a single point in the subject) are automatically found, and in particular,
 the longest match is found. To find more than one match at the same point using
 the standard algorithm, you have to do kludgy things with callouts.
 .P
-1. All possible matches (at a single point in the subject) are automatically
+Partial matching is possible with this algorithm, though it has some
-found, and in particular, the longest match is found. To find more than one
+limitations. The
 match using the standard algorithm, you have to do kludgy things with
 callouts.
 .P
 2. Because the alternative algorithm scans the subject string just once, and
 never needs to backtrack (except for lookbehinds), it is possible to pass very
 long subject strings to the matching function in several pieces, checking for
 partial matching each time. Although it is also possible to do multi-segment
 matching using the standard algorithm, by retaining partially matched
 substrings, it is more complicated. The
 .\" HREF
 \fBpcre2partial\fP
 .\"
@ -199,6 +199,8 @@ invalid UTF string are not supported.
 .P
 3. Although atomic groups are supported, their use does not provide the
 performance advantage that it does for the standard algorithm.
 .P
 4. JIT optimization is not supported.
 .
 .
 .SH AUTHOR
@ -206,7 +208,7 @@ performance advantage that it does for the standard algorithm.
 .sp
 .nf
 Philip Hazel
-University Computing Service
+Retired from University Computing Service
 Cambridge, England.
 .fi
 .
@ -215,6 +217,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 23 May 2019
+Last updated: 28 August 2021
-Copyright (c) 1997-2019 University of Cambridge.
+Copyright (c) 1997-2021 University of Cambridge.
 .fi
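
Reviewer note: the sizing rule this commit documents can be illustrated with a small toy model. The sketch below is not PCRE2 and calls no PCRE2 functions. `toy_dfa_match()` and its hard-coded candidate list are hypothetical names introduced only for illustration. It mimics the documented ovector behaviour for the pattern `cat(er(pillar)?)?`: all matches start at the same point, are stored as offset pairs in decreasing order of length, and the number of pairs needed depends on how many matches exist, not on the two capturing groups in the pattern. The return convention (match count, 0 when the vector is too small, -1 for no match) is an assumption modelled on the description of `pcre2_dfa_match()`; the real function returns specific negative error codes.

```c
#include <stddef.h>
#include <string.h>

/* Toy model only: this is NOT PCRE2. It mimics how a DFA-style matcher
   reports every match of cat(er(pillar)?)? that starts at one point. */

/* Every string the pattern can match, longest first. */
static const char *const candidates[] = { "caterpillar", "cater", "cat" };
#define CANDIDATE_COUNT (sizeof candidates / sizeof candidates[0])

/* Fill ovector (ovecpairs pairs of start/end offsets, ovecpairs >= 1) with
   the matches that start at subject[start], longest first.
   Returns the number of matches found, 0 if the ovector was too small to
   hold them all (it then holds the longest ones that fit), or -1 if
   nothing matches at that point. */
int toy_dfa_match(const char *subject, size_t start,
                  size_t *ovector, size_t ovecpairs)
{
    size_t subject_len = strlen(subject);
    size_t found = 0;   /* matches that exist at this point */
    size_t stored = 0;  /* matches actually written to ovector */

    if (start > subject_len) return -1;

    for (size_t i = 0; i < CANDIDATE_COUNT; i++) {
        size_t len = strlen(candidates[i]);
        if (len > subject_len - start) continue;
        if (strncmp(subject + start, candidates[i], len) != 0) continue;
        found++;
        if (stored < ovecpairs) {
            ovector[2 * stored] = start;            /* start offset */
            ovector[2 * stored + 1] = start + len;  /* end offset */
            stored++;
        }
    }

    if (found == 0) return -1;
    return (found > ovecpairs) ? 0 : (int)found;
}
```

Calling `toy_dfa_match("the caterpillar catchment", 4, ovector, 3)` fills three pairs, one each for "caterpillar", "cater" and "cat". With room for only one pair, the call returns 0 and keeps just the longest match, which is why the updated text recommends sizing the match data block with `pcre2_match_data_create()` rather than from the pattern's capture count.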