Documentation update to clarify ovector usage with DFA matching.

Philip Hazel 2021-08-28 16:25:59 +01:00
parent 5ff1daffa0
commit 6c2fe9da99
11 changed files with 204 additions and 148 deletions

@ -45,10 +45,16 @@ just once (except when processing lookaround assertions). This function is
<i>workspace</i> Points to a vector of ints used as working space
<i>wscount</i> Number of elements in the vector
</pre>
For <b>pcre2_dfa_match()</b>, a match context is needed only if you want to set
up a callout function or specify the heap limit or the match or the recursion
depth limits. The <i>length</i> and <i>startoffset</i> values are code units, not
characters. The options are:
The size of the output vector needed to contain all the results depends on the
number of simultaneous matches, not on the number of parentheses in the
pattern. Using <b>pcre2_match_data_create_from_pattern()</b> to create the match
data block is therefore not advisable when using this function.
</P>
<P>
A match context is needed only if you want to set up a callout function or
specify the heap limit or the match or the recursion depth limits. The
<i>length</i> and <i>startoffset</i> values are code units, not characters. The
options are:
<pre>
PCRE2_ANCHORED Match only at the first position
PCRE2_COPY_MATCHED_SUBJECT

@ -30,8 +30,9 @@ This function creates a new match data block, which is used for holding the
result of a match. The first argument specifies the number of pairs of offsets
that are required. These form the "output vector" (ovector) within the match
data block, and are used to identify the matched string and any captured
substrings. There is always one pair of offsets; if <b>ovecsize</b> is zero, it
is treated as one.
substrings when matching with <b>pcre2_match()</b>, or a number of different
matches at the same point when used with <b>pcre2_dfa_match()</b>. There is
always one pair of offsets; if <b>ovecsize</b> is zero, it is treated as one.
</P>
<P>
The second argument points to a general context, for custom memory management,

@ -26,12 +26,15 @@ SYNOPSIS
DESCRIPTION
</b><br>
<P>
This function creates a new match data block, which is used for holding the
result of a match. The first argument points to a compiled pattern. The number
of capturing parentheses within the pattern is used to compute the number of
pairs of offsets that are required in the match data block. These form the
"output vector" (ovector) within the match data block, and are used to identify
the matched string and any captured substrings.
This function creates a new match data block for holding the result of a match.
The first argument points to a compiled pattern. The number of capturing
parentheses within the pattern is used to compute the number of pairs of
offsets that are required in the match data block. These form the "output
vector" (ovector) within the match data block, and are used to identify the
matched string and any captured substrings when matching with
<b>pcre2_match()</b>. If you are using <b>pcre2_dfa_match()</b>, which uses the
output vector in a different way, you should use <b>pcre2_match_data_create()</b>
instead of this function.
</P>
<P>
The second argument points to a general context, for custom memory management,

@ -2512,20 +2512,31 @@ to an abstract format like Java or .NET serialization.
Information about a successful or unsuccessful match is placed in a match
data block, which is an opaque structure that is accessed by function calls. In
particular, the match data block contains a vector of offsets into the subject
string that define the matched part of the subject and any substrings that were
captured. This is known as the <i>ovector</i>.
string that define the matched parts of the subject. This is known as the
<i>ovector</i>.
</P>
<P>
Before calling <b>pcre2_match()</b>, <b>pcre2_dfa_match()</b>, or
<b>pcre2_jit_match()</b> you must create a match data block by calling one of
the creation functions above. For <b>pcre2_match_data_create()</b>, the first
argument is the number of pairs of offsets in the <i>ovector</i>. One pair of
offsets is required to identify the string that matched the whole pattern, with
an additional pair for each captured substring. For example, a value of 4
creates enough space to record the matched portion of the subject plus three
captured substrings. A minimum of at least 1 pair is imposed by
<b>pcre2_match_data_create()</b>, so it is always possible to return the overall
matched string.
argument is the number of pairs of offsets in the <i>ovector</i>.
</P>
<P>
When using <b>pcre2_match()</b>, one pair of offsets is required to identify the
string that matched the whole pattern, with an additional pair for each
captured substring. For example, a value of 4 creates enough space to record
the matched portion of the subject plus three captured substrings.
</P>
<P>
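When using <b>pcre2_dfa_match()</b>, there may be multiple matched substrings of
different lengths at the same point in the subject. The ovector should be made
large enough to hold as many matches as are expected. If it is too small,
<b>pcre2_dfa_match()</b> returns zero and fills the ovector with the longest
matches.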
</P>
<P>
A minimum of one pair is imposed by <b>pcre2_match_data_create()</b>, so
it is always possible to return the overall matched string in the case of
<b>pcre2_match()</b> or the longest match in the case of
<b>pcre2_dfa_match()</b>.
</P>
<P>
The second argument of <b>pcre2_match_data_create()</b> is a pointer to a
@ -2536,10 +2547,11 @@ pass NULL, which causes <b>malloc()</b> to be used.
<P>
For <b>pcre2_match_data_create_from_pattern()</b>, the first argument is a
pointer to a compiled pattern. The ovector is created to be exactly the right
size to hold all the substrings a pattern might capture. The second argument is
again a pointer to a general context, but in this case if NULL is passed, the
memory is obtained using the same allocator that was used for the compiled
pattern (custom or default).
size to hold all the substrings a pattern might capture when matched using
<b>pcre2_match()</b>. You should not use this call when matching with
<b>pcre2_dfa_match()</b>. The second argument is again a pointer to a general
context, but in this case if NULL is passed, the memory is obtained using the
same allocator that was used for the compiled pattern (custom or default).
</P>
<P>
A match data block can be used many times, with the same or different compiled
@ -3982,16 +3994,16 @@ fail, this error is given.
<P>
Philip Hazel
<br>
University Computing Service
Retired from University Computing Service
<br>
Cambridge, England.
<br>
</P>
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
<P>
Last updated: 04 November 2020
Last updated: 28 August 2021
<br>
Copyright &copy; 1997-2020 University of Cambridge.
Copyright &copy; 1997-2021 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.

@ -78,8 +78,9 @@ tried is controlled by the greedy or ungreedy nature of the quantifier.
If a leaf node is reached, a matching string has been found, and at that point
the algorithm stops. Thus, if there is more than one possible match, this
algorithm returns the first one that it finds. Whether this is the shortest,
the longest, or some intermediate length depends on the way the greedy and
ungreedy repetition quantifiers are specified in the pattern.
the longest, or some intermediate length depends on the way the alternations
and the greedy or ungreedy repetition quantifiers are specified in the
pattern.
</P>
<P>
Because it ends up with a single path through the tree, it is relatively
@ -109,11 +110,17 @@ no more unterminated paths. At this point, terminated paths represent the
different matching possibilities (if there are none, the match has failed).
Thus, if there is more than one possible match, this algorithm finds all of
them, and in particular, it finds the longest. The matches are returned in
decreasing order of length. There is an option to stop the algorithm after the
first match (which is necessarily the shortest) is found.
the output vector in decreasing order of length. There is an option to stop the
algorithm after the first match (which is necessarily the shortest) is found.
</P>
<P>
Note that all the matches that are found start at the same point in the
Note that the size of the vector needed to contain all the results depends on the
number of simultaneous matches, not on the number of parentheses in the
pattern. Using <b>pcre2_match_data_create_from_pattern()</b> to create the match
data block is therefore not advisable when doing DFA matching.
</P>
<P>
Note also that all the matches that are found start at the same point in the
subject. If the pattern
<pre>
cat(er(pillar)?)?
@ -194,21 +201,14 @@ supported by <b>pcre2_dfa_match()</b>.
</P>
<br><a name="SEC5" href="#TOC1">ADVANTAGES OF THE ALTERNATIVE ALGORITHM</a><br>
<P>
Using the alternative matching algorithm provides the following advantages:
The main advantage of the alternative algorithm is that all possible matches
(at a single point in the subject) are automatically found, and in particular,
the longest match is found. To find more than one match at the same point using
the standard algorithm, you have to do kludgy things with callouts.
</P>
<P>
1. All possible matches (at a single point in the subject) are automatically
found, and in particular, the longest match is found. To find more than one
match using the standard algorithm, you have to do kludgy things with
callouts.
</P>
<P>
2. Because the alternative algorithm scans the subject string just once, and
never needs to backtrack (except for lookbehinds), it is possible to pass very
long subject strings to the matching function in several pieces, checking for
partial matching each time. Although it is also possible to do multi-segment
matching using the standard algorithm, by retaining partially matched
substrings, it is more complicated. The
Partial matching is possible with this algorithm, though it has some
limitations. The
<a href="pcre2partial.html"><b>pcre2partial</b></a>
documentation gives details of partial matching and discusses multi-segment
matching.
@ -230,20 +230,23 @@ invalid UTF string are not supported.
3. Although atomic groups are supported, their use does not provide the
performance advantage that it does for the standard algorithm.
</P>
<P>
4. JIT optimization is not supported.
</P>
<br><a name="SEC7" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
University Computing Service
Retired from University Computing Service
<br>
Cambridge, England.
<br>
</P>
<br><a name="SEC8" href="#TOC1">REVISION</a><br>
<P>
Last updated: 23 May 2019
Last updated: 28 August 2021
<br>
Copyright &copy; 1997-2019 University of Cambridge.
Copyright &copy; 1997-2021 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.

@ -2468,20 +2468,28 @@ THE MATCH DATA BLOCK
Information about a successful or unsuccessful match is placed in a
match data block, which is an opaque structure that is accessed by
function calls. In particular, the match data block contains a vector
of offsets into the subject string that define the matched part of the
subject and any substrings that were captured. This is known as the
ovector.
of offsets into the subject string that define the matched parts of the
subject. This is known as the ovector.
Before calling pcre2_match(), pcre2_dfa_match(), or pcre2_jit_match()
you must create a match data block by calling one of the creation func-
tions above. For pcre2_match_data_create(), the first argument is the
number of pairs of offsets in the ovector. One pair of offsets is re-
quired to identify the string that matched the whole pattern, with an
additional pair for each captured substring. For example, a value of 4
creates enough space to record the matched portion of the subject plus
three captured substrings. A minimum of at least 1 pair is imposed by
pcre2_match_data_create(), so it is always possible to return the over-
all matched string.
tions above. For pcre2_match_data_create(), the first argument is the
number of pairs of offsets in the ovector.
When using pcre2_match(), one pair of offsets is required to identify
the string that matched the whole pattern, with an additional pair for
each captured substring. For example, a value of 4 creates enough space
to record the matched portion of the subject plus three captured sub-
strings.
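When using pcre2_dfa_match(), there may be multiple matched substrings
of different lengths at the same point in the subject. The ovector
should be made large enough to hold as many matches as are expected. If
it is too small, pcre2_dfa_match() returns zero and fills the ovector
with the longest matches.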
A minimum of one pair is imposed by pcre2_match_data_create(),
so it is always possible to return the overall matched string in the
case of pcre2_match() or the longest match in the case of
pcre2_dfa_match().
The second argument of pcre2_match_data_create() is a pointer to a gen-
eral context, which can specify custom memory management for obtaining
@ -2490,10 +2498,12 @@ THE MATCH DATA BLOCK
For pcre2_match_data_create_from_pattern(), the first argument is a
pointer to a compiled pattern. The ovector is created to be exactly the
right size to hold all the substrings a pattern might capture. The sec-
ond argument is again a pointer to a general context, but in this case
if NULL is passed, the memory is obtained using the same allocator that
was used for the compiled pattern (custom or default).
right size to hold all the substrings a pattern might capture when
matched using pcre2_match(). You should not use this call when matching
with pcre2_dfa_match(). The second argument is again a pointer to a
general context, but in this case if NULL is passed, the memory is ob-
tained using the same allocator that was used for the compiled pattern
(custom or default).
A match data block can be used many times, with the same or different
compiled patterns. You can extract information from a match data block
@ -3825,14 +3835,14 @@ SEE ALSO
AUTHOR
Philip Hazel
University Computing Service
Retired from University Computing Service
Cambridge, England.
REVISION
Last updated: 04 November 2020
Copyright (c) 1997-2020 University of Cambridge.
Last updated: 28 August 2021
Copyright (c) 1997-2021 University of Cambridge.
------------------------------------------------------------------------------
@ -5635,8 +5645,8 @@ THE STANDARD MATCHING ALGORITHM
that point the algorithm stops. Thus, if there is more than one possi-
ble match, this algorithm returns the first one that it finds. Whether
this is the shortest, the longest, or some intermediate length depends
on the way the greedy and ungreedy repetition quantifiers are specified
in the pattern.
on the way the alternations and the greedy or ungreedy repetition quan-
tifiers are specified in the pattern.
Because it ends up with a single path through the tree, it is rela-
tively straightforward for this algorithm to keep track of the sub-
@ -5665,12 +5675,18 @@ THE ALTERNATIVE MATCHING ALGORITHM
represent the different matching possibilities (if there are none, the
match has failed). Thus, if there is more than one possible match,
this algorithm finds all of them, and in particular, it finds the long-
est. The matches are returned in decreasing order of length. There is
an option to stop the algorithm after the first match (which is neces-
sarily the shortest) is found.
est. The matches are returned in the output vector in decreasing order
of length. There is an option to stop the algorithm after the first
match (which is necessarily the shortest) is found.
Note that all the matches that are found start at the same point in the
subject. If the pattern
Note that the size of the vector needed to contain all the results depends
on the number of simultaneous matches, not on the number of parentheses
in the pattern. Using pcre2_match_data_create_from_pattern() to create
the match data block is therefore not advisable when doing DFA match-
ing.
Note also that all the matches that are found start at the same point
in the subject. If the pattern
cat(er(pillar)?)?
@ -5746,50 +5762,45 @@ THE ALTERNATIVE MATCHING ALGORITHM
ADVANTAGES OF THE ALTERNATIVE ALGORITHM
Using the alternative matching algorithm provides the following advan-
tages:
1. All possible matches (at a single point in the subject) are automat-
ically found, and in particular, the longest match is found. To find
more than one match using the standard algorithm, you have to do kludgy
The main advantage of the alternative algorithm is that all possible
matches (at a single point in the subject) are automatically found, and
in particular, the longest match is found. To find more than one match
at the same point using the standard algorithm, you have to do kludgy
things with callouts.
2. Because the alternative algorithm scans the subject string just
once, and never needs to backtrack (except for lookbehinds), it is pos-
sible to pass very long subject strings to the matching function in
several pieces, checking for partial matching each time. Although it is
also possible to do multi-segment matching using the standard algo-
rithm, by retaining partially matched substrings, it is more compli-
cated. The pcre2partial documentation gives details of partial matching
and discusses multi-segment matching.
Partial matching is possible with this algorithm, though it has some
limitations. The pcre2partial documentation gives details of partial
matching and discusses multi-segment matching.
DISADVANTAGES OF THE ALTERNATIVE ALGORITHM
The alternative algorithm suffers from a number of disadvantages:
1. It is substantially slower than the standard algorithm. This is
partly because it has to search for all possible matches, but is also
because it is less susceptible to optimization.
2. Capturing parentheses, backreferences, script runs, and matching
within invalid UTF strings are not supported.
3. Although atomic groups are supported, their use does not provide the
performance advantage that it does for the standard algorithm.
4. JIT optimization is not supported.
AUTHOR
Philip Hazel
University Computing Service
Retired from University Computing Service
Cambridge, England.
REVISION
Last updated: 23 May 2019
Copyright (c) 1997-2019 University of Cambridge.
Last updated: 28 August 2021
Copyright (c) 1997-2021 University of Cambridge.
------------------------------------------------------------------------------

@ -1,4 +1,4 @@
.TH PCRE2_DFA_MATCH 3 "16 October 2018" "PCRE2 10.33"
.TH PCRE2_DFA_MATCH 3 "28 August 2021" "PCRE2 10.38"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS
@ -33,10 +33,15 @@ just once (except when processing lookaround assertions). This function is
\fIworkspace\fP Points to a vector of ints used as working space
\fIwscount\fP Number of elements in the vector
.sp
For \fBpcre2_dfa_match()\fP, a match context is needed only if you want to set
up a callout function or specify the heap limit or the match or the recursion
depth limits. The \fIlength\fP and \fIstartoffset\fP values are code units, not
characters. The options are:
The size of the output vector needed to contain all the results depends on the
number of simultaneous matches, not on the number of parentheses in the
pattern. Using \fBpcre2_match_data_create_from_pattern()\fP to create the match
data block is therefore not advisable when using this function.
.P
A match context is needed only if you want to set up a callout function or
specify the heap limit or the match or the recursion depth limits. The
\fIlength\fP and \fIstartoffset\fP values are code units, not characters. The
options are:
.sp
PCRE2_ANCHORED Match only at the first position
PCRE2_COPY_MATCHED_SUBJECT

@ -1,4 +1,4 @@
.TH PCRE2_MATCH_DATA_CREATE 3 "29 July 2015" "PCRE2 10.21"
.TH PCRE2_MATCH_DATA_CREATE 3 "28 August 2021" "PCRE2 10.38"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS
@ -18,8 +18,9 @@ This function creates a new match data block, which is used for holding the
result of a match. The first argument specifies the number of pairs of offsets
that are required. These form the "output vector" (ovector) within the match
data block, and are used to identify the matched string and any captured
substrings. There is always one pair of offsets; if \fBovecsize\fP is zero, it
is treated as one.
substrings when matching with \fBpcre2_match()\fP, or a number of different
matches at the same point when used with \fBpcre2_dfa_match()\fP. There is
always one pair of offsets; if \fBovecsize\fP is zero, it is treated as one.
.P
The second argument points to a general context, for custom memory management,
or is NULL for system memory management. The result of the function is NULL if

@ -1,4 +1,4 @@
.TH PCRE2_MATCH_DATA_CREATE_FROM_PATTERN 3 "29 July 2015" "PCRE2 10.21"
.TH PCRE2_MATCH_DATA_CREATE_FROM_PATTERN 3 "28 August 2021" "PCRE2 10.38"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS
@ -14,12 +14,15 @@ PCRE2 - Perl-compatible regular expressions (revised API)
.SH DESCRIPTION
.rs
.sp
This function creates a new match data block, which is used for holding the
result of a match. The first argument points to a compiled pattern. The number
of capturing parentheses within the pattern is used to compute the number of
pairs of offsets that are required in the match data block. These form the
"output vector" (ovector) within the match data block, and are used to identify
the matched string and any captured substrings.
This function creates a new match data block for holding the result of a match.
The first argument points to a compiled pattern. The number of capturing
parentheses within the pattern is used to compute the number of pairs of
offsets that are required in the match data block. These form the "output
vector" (ovector) within the match data block, and are used to identify the
matched string and any captured substrings when matching with
\fBpcre2_match()\fP. If you are using \fBpcre2_dfa_match()\fP, which uses the
output vector in a different way, you should use \fBpcre2_match_data_create()\fP
instead of this function.
.P
The second argument points to a general context, for custom memory management,
or is NULL to use the same memory allocator as was used for the compiled

@ -1,4 +1,4 @@
.TH PCRE2API 3 "04 November 2020" "PCRE2 10.36"
.TH PCRE2API 3 "28 August 2021" "PCRE2 10.38"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.sp
@ -2490,19 +2490,27 @@ to an abstract format like Java or .NET serialization.
Information about a successful or unsuccessful match is placed in a match
data block, which is an opaque structure that is accessed by function calls. In
particular, the match data block contains a vector of offsets into the subject
string that define the matched part of the subject and any substrings that were
captured. This is known as the \fIovector\fP.
string that define the matched parts of the subject. This is known as the
\fIovector\fP.
.P
Before calling \fBpcre2_match()\fP, \fBpcre2_dfa_match()\fP, or
\fBpcre2_jit_match()\fP you must create a match data block by calling one of
the creation functions above. For \fBpcre2_match_data_create()\fP, the first
argument is the number of pairs of offsets in the \fIovector\fP. One pair of
offsets is required to identify the string that matched the whole pattern, with
an additional pair for each captured substring. For example, a value of 4
creates enough space to record the matched portion of the subject plus three
captured substrings. A minimum of at least 1 pair is imposed by
\fBpcre2_match_data_create()\fP, so it is always possible to return the overall
matched string.
argument is the number of pairs of offsets in the \fIovector\fP.
.P
When using \fBpcre2_match()\fP, one pair of offsets is required to identify the
string that matched the whole pattern, with an additional pair for each
captured substring. For example, a value of 4 creates enough space to record
the matched portion of the subject plus three captured substrings.
.P
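When using \fBpcre2_dfa_match()\fP, there may be multiple matched substrings of
different lengths at the same point in the subject. The ovector should be made
large enough to hold as many matches as are expected. If it is too small,
\fBpcre2_dfa_match()\fP returns zero and fills the ovector with the longest
matches.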
.P
A minimum of one pair is imposed by \fBpcre2_match_data_create()\fP, so
it is always possible to return the overall matched string in the case of
\fBpcre2_match()\fP or the longest match in the case of
\fBpcre2_dfa_match()\fP.
.P
The second argument of \fBpcre2_match_data_create()\fP is a pointer to a
general context, which can specify custom memory management for obtaining the
@ -2511,10 +2519,11 @@ pass NULL, which causes \fBmalloc()\fP to be used.
.P
For \fBpcre2_match_data_create_from_pattern()\fP, the first argument is a
pointer to a compiled pattern. The ovector is created to be exactly the right
size to hold all the substrings a pattern might capture. The second argument is
again a pointer to a general context, but in this case if NULL is passed, the
memory is obtained using the same allocator that was used for the compiled
pattern (custom or default).
size to hold all the substrings a pattern might capture when matched using
\fBpcre2_match()\fP. You should not use this call when matching with
\fBpcre2_dfa_match()\fP. The second argument is again a pointer to a general
context, but in this case if NULL is passed, the memory is obtained using the
same allocator that was used for the compiled pattern (custom or default).
.P
A match data block can be used many times, with the same or different compiled
patterns. You can extract information from a match data block after a match
@ -3991,7 +4000,7 @@ fail, this error is given.
.sp
.nf
Philip Hazel
University Computing Service
Retired from University Computing Service
Cambridge, England.
.fi
.
@ -4000,6 +4009,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 04 November 2020
Copyright (c) 1997-2020 University of Cambridge.
Last updated: 28 August 2021
Copyright (c) 1997-2021 University of Cambridge.
.fi

@ -1,4 +1,4 @@
.TH PCRE2MATCHING 3 "23 May 2019" "PCRE2 10.34"
.TH PCRE2MATCHING 3 "28 August 2021" "PCRE2 10.38"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 MATCHING ALGORITHMS"
@ -61,8 +61,9 @@ tried is controlled by the greedy or ungreedy nature of the quantifier.
If a leaf node is reached, a matching string has been found, and at that point
the algorithm stops. Thus, if there is more than one possible match, this
algorithm returns the first one that it finds. Whether this is the shortest,
the longest, or some intermediate length depends on the way the greedy and
ungreedy repetition quantifiers are specified in the pattern.
the longest, or some intermediate length depends on the way the alternations
and the greedy or ungreedy repetition quantifiers are specified in the
pattern.
.P
Because it ends up with a single path through the tree, it is relatively
straightforward for this algorithm to keep track of the substrings that are
@ -91,10 +92,15 @@ no more unterminated paths. At this point, terminated paths represent the
different matching possibilities (if there are none, the match has failed).
Thus, if there is more than one possible match, this algorithm finds all of
them, and in particular, it finds the longest. The matches are returned in
decreasing order of length. There is an option to stop the algorithm after the
first match (which is necessarily the shortest) is found.
the output vector in decreasing order of length. There is an option to stop the
algorithm after the first match (which is necessarily the shortest) is found.
.P
Note that all the matches that are found start at the same point in the
Note that the size of the vector needed to contain all the results depends on the
number of simultaneous matches, not on the number of parentheses in the
pattern. Using \fBpcre2_match_data_create_from_pattern()\fP to create the match
data block is therefore not advisable when doing DFA matching.
.P
Note also that all the matches that are found start at the same point in the
subject. If the pattern
.sp
cat(er(pillar)?)?
@ -165,19 +171,13 @@ supported by \fBpcre2_dfa_match()\fP.
.SH "ADVANTAGES OF THE ALTERNATIVE ALGORITHM"
.rs
.sp
Using the alternative matching algorithm provides the following advantages:
The main advantage of the alternative algorithm is that all possible matches
(at a single point in the subject) are automatically found, and in particular,
the longest match is found. To find more than one match at the same point using
the standard algorithm, you have to do kludgy things with callouts.
.P
1. All possible matches (at a single point in the subject) are automatically
found, and in particular, the longest match is found. To find more than one
match using the standard algorithm, you have to do kludgy things with
callouts.
.P
2. Because the alternative algorithm scans the subject string just once, and
never needs to backtrack (except for lookbehinds), it is possible to pass very
long subject strings to the matching function in several pieces, checking for
partial matching each time. Although it is also possible to do multi-segment
matching using the standard algorithm, by retaining partially matched
substrings, it is more complicated. The
Partial matching is possible with this algorithm, though it has some
limitations. The
.\" HREF
\fBpcre2partial\fP
.\"
@ -199,6 +199,8 @@ invalid UTF string are not supported.
.P
3. Although atomic groups are supported, their use does not provide the
performance advantage that it does for the standard algorithm.
.P
4. JIT optimization is not supported.
.
.
.SH AUTHOR
@ -206,7 +208,7 @@ performance advantage that it does for the standard algorithm.
.sp
.nf
Philip Hazel
University Computing Service
Retired from University Computing Service
Cambridge, England.
.fi
.
@ -215,6 +217,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 23 May 2019
Copyright (c) 1997-2019 University of Cambridge.
Last updated: 28 August 2021
Copyright (c) 1997-2021 University of Cambridge.
.fi