diff --git a/doc/html/pcre2_dfa_match.html b/doc/html/pcre2_dfa_match.html
index 232e2bc..f32294f 100644
--- a/doc/html/pcre2_dfa_match.html
+++ b/doc/html/pcre2_dfa_match.html
@@ -45,10 +45,16 @@ just once (except when processing lookaround assertions). This function is
 workspace Points to a vector of ints used as working space
 wscount Number of elements in the vector
-For pcre2_dfa_match(), a match context is needed only if you want to set
-up a callout function or specify the heap limit or the match or the recursion
-depth limits. The length and startoffset values are code units, not
-characters. The options are:
+The size of output vector needed to contain all the results depends on the
+number of simultaneous matches, not on the number of parentheses in the
+pattern. Using pcre2_match_data_create_from_pattern() to create the match
+data block is therefore not advisable when using this function.
+

+

+A match context is needed only if you want to set up a callout function or
+specify the heap limit or the match or the recursion depth limits. The
+length and startoffset values are code units, not characters. The
+options are:

   PCRE2_ANCHORED          Match only at the first position
   PCRE2_COPY_MATCHED_SUBJECT
diff --git a/doc/html/pcre2_match_data_create.html b/doc/html/pcre2_match_data_create.html
index 8d0321b..c26c3b3 100644
--- a/doc/html/pcre2_match_data_create.html
+++ b/doc/html/pcre2_match_data_create.html
@@ -30,8 +30,9 @@ This function creates a new match data block, which is used for holding the
 result of a match. The first argument specifies the number of pairs of offsets
 that are required. These form the "output vector" (ovector) within the match
 data block, and are used to identify the matched string and any captured
-substrings. There is always one pair of offsets; if ovecsize is zero, it
-is treated as one.
+substrings when matching with pcre2_match(), or a number of different
+matches at the same point when used with pcre2_dfa_match(). There is
+always one pair of offsets; if ovecsize is zero, it is treated as one.
 

 The second argument points to a general context, for custom memory management,
diff --git a/doc/html/pcre2_match_data_create_from_pattern.html b/doc/html/pcre2_match_data_create_from_pattern.html
index f40cf1e..4836474 100644
--- a/doc/html/pcre2_match_data_create_from_pattern.html
+++ b/doc/html/pcre2_match_data_create_from_pattern.html
@@ -26,12 +26,15 @@ SYNOPSIS
 DESCRIPTION

-This function creates a new match data block, which is used for holding the
-result of a match. The first argument points to a compiled pattern. The number
-of capturing parentheses within the pattern is used to compute the number of
-pairs of offsets that are required in the match data block. These form the
-"output vector" (ovector) within the match data block, and are used to identify
-the matched string and any captured substrings.
+This function creates a new match data block for holding the result of a match.
+The first argument points to a compiled pattern. The number of capturing
+parentheses within the pattern is used to compute the number of pairs of
+offsets that are required in the match data block. These form the "output
+vector" (ovector) within the match data block, and are used to identify the
+matched string and any captured substrings when matching with
+pcre2_match(). If you are using pcre2_dfa_match(), which uses the
+output vector in a different way, you should use pcre2_match_data_create()
+instead of this function.

 The second argument points to a general context, for custom memory management,
diff --git a/doc/html/pcre2api.html b/doc/html/pcre2api.html
index 5d7f12d..673b465 100644
--- a/doc/html/pcre2api.html
+++ b/doc/html/pcre2api.html
@@ -2512,20 +2512,31 @@ to an abstract format like Java or .NET serialization.
 Information about a successful or unsuccessful match is placed in a match data
 block, which is an opaque structure that is accessed by function calls. In
 particular, the match data block contains a vector of offsets into the subject
-string that define the matched part of the subject and any substrings that were
-captured. This is known as the ovector.
+string that define the matched parts of the subject. This is known as the
+ovector.

 Before calling pcre2_match(), pcre2_dfa_match(), or
 pcre2_jit_match() you must create a match data block by calling one of
 the creation functions above. For pcre2_match_data_create(), the first
-argument is the number of pairs of offsets in the ovector. One pair of
-offsets is required to identify the string that matched the whole pattern, with
-an additional pair for each captured substring. For example, a value of 4
-creates enough space to record the matched portion of the subject plus three
-captured substrings. A minimum of at least 1 pair is imposed by
-pcre2_match_data_create(), so it is always possible to return the overall
-matched string.
+argument is the number of pairs of offsets in the ovector.
+

+

+When using pcre2_match(), one pair of offsets is required to identify the
+string that matched the whole pattern, with an additional pair for each
+captured substring. For example, a value of 4 creates enough space to record
+the matched portion of the subject plus three captured substrings.
+

+

+When using pcre2_dfa_match() there may be multiple matched substrings of
+different lengths at the same point in the subject. The ovector should be made
+large enough to hold as many as are expected.
+

+

+A minimum of at least 1 pair is imposed by pcre2_match_data_create(), so
+it is always possible to return the overall matched string in the case of
+pcre2_match() or the longest match in the case of
+pcre2_dfa_match().

 The second argument of pcre2_match_data_create() is a pointer to a
@@ -2536,10 +2547,11 @@ pass NULL, which causes malloc() to be used.

 For pcre2_match_data_create_from_pattern(), the first argument is a
 pointer to a compiled pattern. The ovector is created to be exactly the right
-size to hold all the substrings a pattern might capture. The second argument is
-again a pointer to a general context, but in this case if NULL is passed, the
-memory is obtained using the same allocator that was used for the compiled
-pattern (custom or default).
+size to hold all the substrings a pattern might capture when matched using
+pcre2_match(). You should not use this call when matching with
+pcre2_dfa_match(). The second argument is again a pointer to a general
+context, but in this case if NULL is passed, the memory is obtained using the
+same allocator that was used for the compiled pattern (custom or default).

 A match data block can be used many times, with the same or different compiled
@@ -3982,16 +3994,16 @@ fail, this error is given.

Philip Hazel
-University Computing Service
+Retired from University Computing Service
Cambridge, England.


REVISION

-Last updated: 04 November 2020
+Last updated: 28 August 2021
-Copyright © 1997-2020 University of Cambridge.
+Copyright © 1997-2021 University of Cambridge.

 Return to the PCRE2 index page.
diff --git a/doc/html/pcre2matching.html b/doc/html/pcre2matching.html
index 4b71c8f..ed92caf 100644
--- a/doc/html/pcre2matching.html
+++ b/doc/html/pcre2matching.html
@@ -78,8 +78,9 @@ tried is controlled by the greedy or ungreedy nature of the quantifier. If a
 leaf node is reached, a matching string has been found, and at that point the
 algorithm stops. Thus, if there is more than one possible match, this
 algorithm returns the first one that it finds. Whether this is the shortest,
-the longest, or some intermediate length depends on the way the greedy and
-ungreedy repetition quantifiers are specified in the pattern.
+the longest, or some intermediate length depends on the way the alternations
+and the greedy or ungreedy repetition quantifiers are specified in the
+pattern.

 Because it ends up with a single path through the tree, it is relatively
@@ -109,11 +110,17 @@ no more unterminated paths. At this point, terminated paths represent the
 different matching possibilities (if there are none, the match has failed).
 Thus, if there is more than one possible match, this algorithm finds all of
 them, and in particular, it finds the longest. The matches are returned in
-decreasing order of length. There is an option to stop the algorithm after the
-first match (which is necessarily the shortest) is found.
+the output vector in decreasing order of length. There is an option to stop the
+algorithm after the first match (which is necessarily the shortest) is found.

-Note that all the matches that are found start at the same point in the
+Note that the size of vector needed to contain all the results depends on the
+number of simultaneous matches, not on the number of parentheses in the
+pattern. Using pcre2_match_data_create_from_pattern() to create the match
+data block is therefore not advisable when doing DFA matching.
+

+

+Note also that all the matches that are found start at the same point in the
 subject. If the pattern

   cat(er(pillar)?)?
@@ -194,21 +201,14 @@ supported by pcre2_dfa_match().
 


ADVANTAGES OF THE ALTERNATIVE ALGORITHM

-Using the alternative matching algorithm provides the following advantages:
+The main advantage of the alternative algorithm is that all possible matches
+(at a single point in the subject) are automatically found, and in particular,
+the longest match is found. To find more than one match at the same point using
+the standard algorithm, you have to do kludgy things with callouts.

-1. All possible matches (at a single point in the subject) are automatically
-found, and in particular, the longest match is found. To find more than one
-match using the standard algorithm, you have to do kludgy things with
-callouts.
-

-

-2. Because the alternative algorithm scans the subject string just once, and
-never needs to backtrack (except for lookbehinds), it is possible to pass very
-long subject strings to the matching function in several pieces, checking for
-partial matching each time. Although it is also possible to do multi-segment
-matching using the standard algorithm, by retaining partially matched
-substrings, it is more complicated. The
+Partial matching is possible with this algorithm, though it has some
+limitations. The
 pcre2partial
 documentation gives details of partial matching and discusses multi-segment
 matching.
@@ -230,20 +230,23 @@ invalid UTF string are not supported.
 3. Although atomic groups are supported, their use does not provide the
 performance advantage that it does for the standard algorithm.

+

+4. JIT optimization is not supported.
+


AUTHOR

Philip Hazel
-University Computing Service
+Retired from University Computing Service
Cambridge, England.


REVISION

-Last updated: 23 May 2019
+Last updated: 28 August 2021
-Copyright © 1997-2019 University of Cambridge.
+Copyright © 1997-2021 University of Cambridge.

 Return to the PCRE2 index page.
diff --git a/doc/pcre2.txt b/doc/pcre2.txt
index a847ed6..b420e22 100644
--- a/doc/pcre2.txt
+++ b/doc/pcre2.txt
@@ -2468,20 +2468,28 @@ THE MATCH DATA BLOCK
 Information about a successful or unsuccessful match is placed in a
 match data block, which is an opaque structure that is accessed by
 function calls. In particular, the match data block contains a vector
- of offsets into the subject string that define the matched part of the
- subject and any substrings that were captured. This is known as the
- ovector.
+ of offsets into the subject string that define the matched parts of the
+ subject. This is known as the ovector.
- Before calling pcre2_match(), pcre2_dfa_match(), or pcre2_jit_match()
+ Before calling pcre2_match(), pcre2_dfa_match(), or pcre2_jit_match()
 you must create a match data block by calling one of the creation func-
- tions above. For pcre2_match_data_create(), the first argument is the
- number of pairs of offsets in the ovector. One pair of offsets is re-
- quired to identify the string that matched the whole pattern, with an
- additional pair for each captured substring. For example, a value of 4
- creates enough space to record the matched portion of the subject plus
- three captured substrings. A minimum of at least 1 pair is imposed by
- pcre2_match_data_create(), so it is always possible to return the over-
- all matched string.
+ tions above. For pcre2_match_data_create(), the first argument is the
+ number of pairs of offsets in the ovector.
+
+ When using pcre2_match(), one pair of offsets is required to identify
+ the string that matched the whole pattern, with an additional pair for
+ each captured substring. For example, a value of 4 creates enough space
+ to record the matched portion of the subject plus three captured sub-
+ strings.
+
+ When using pcre2_dfa_match() there may be multiple matched substrings
+ of different lengths at the same point in the subject. The ovector
+ should be made large enough to hold as many as are expected.
+
+ A minimum of at least 1 pair is imposed by pcre2_match_data_create(),
+ so it is always possible to return the overall matched string in the
+ case of pcre2_match() or the longest match in the case of
+ pcre2_dfa_match().
 The second argument of pcre2_match_data_create() is a pointer to a gen-
 eral context, which can specify custom memory management for obtaining
@@ -2490,10 +2498,12 @@ THE MATCH DATA BLOCK
 For pcre2_match_data_create_from_pattern(), the first argument is a
 pointer to a compiled pattern. The ovector is created to be exactly the
- right size to hold all the substrings a pattern might capture. The sec-
- ond argument is again a pointer to a general context, but in this case
- if NULL is passed, the memory is obtained using the same allocator that
- was used for the compiled pattern (custom or default).
+ right size to hold all the substrings a pattern might capture when
+ matched using pcre2_match(). You should not use this call when matching
+ with pcre2_dfa_match(). The second argument is again a pointer to a
+ general context, but in this case if NULL is passed, the memory is ob-
+ tained using the same allocator that was used for the compiled pattern
+ (custom or default).
 A match data block can be used many times, with the same or different
 compiled patterns. You can extract information from a match data block
@@ -3825,14 +3835,14 @@ SEE ALSO
 AUTHOR
 Philip Hazel
- University Computing Service
+ Retired from University Computing Service
 Cambridge, England.
 REVISION
- Last updated: 04 November 2020
- Copyright (c) 1997-2020 University of Cambridge.
+ Last updated: 28 August 2021
+ Copyright (c) 1997-2021 University of Cambridge.
 ------------------------------------------------------------------------------
@@ -5635,8 +5645,8 @@ THE STANDARD MATCHING ALGORITHM
 that point the algorithm stops. Thus, if there is more than one possi-
 ble match, this algorithm returns the first one that it finds. Whether
 this is the shortest, the longest, or some intermediate length depends
- on the way the greedy and ungreedy repetition quantifiers are specified
- in the pattern.
+ on the way the alternations and the greedy or ungreedy repetition quan-
+ tifiers are specified in the pattern.
 Because it ends up with a single path through the tree, it is rela-
 tively straightforward for this algorithm to keep track of the sub-
@@ -5665,12 +5675,18 @@ THE ALTERNATIVE MATCHING ALGORITHM
 represent the different matching possibilities (if there are none, the
 match has failed). Thus, if there is more than one possible match, this
 algorithm finds all of them, and in particular, it finds the long-
- est. The matches are returned in decreasing order of length. There is
- an option to stop the algorithm after the first match (which is neces-
- sarily the shortest) is found.
+ est. The matches are returned in the output vector in decreasing order
+ of length. There is an option to stop the algorithm after the first
+ match (which is necessarily the shortest) is found.
- Note that all the matches that are found start at the same point in the
- subject. If the pattern
+ Note that the size of vector needed to contain all the results depends
+ on the number of simultaneous matches, not on the number of parentheses
+ in the pattern. Using pcre2_match_data_create_from_pattern() to create
+ the match data block is therefore not advisable when doing DFA match-
+ ing.
+
+ Note also that all the matches that are found start at the same point
+ in the subject. If the pattern
 cat(er(pillar)?)?
@@ -5746,50 +5762,45 @@ THE ALTERNATIVE MATCHING ALGORITHM
 ADVANTAGES OF THE ALTERNATIVE ALGORITHM
- Using the alternative matching algorithm provides the following advan-
- tages:
-
- 1. All possible matches (at a single point in the subject) are automat-
- ically found, and in particular, the longest match is found. To find
- more than one match using the standard algorithm, you have to do kludgy
+ The main advantage of the alternative algorithm is that all possible
+ matches (at a single point in the subject) are automatically found, and
+ in particular, the longest match is found. To find more than one match
+ at the same point using the standard algorithm, you have to do kludgy
 things with callouts.
- 2. Because the alternative algorithm scans the subject string just
- once, and never needs to backtrack (except for lookbehinds), it is pos-
- sible to pass very long subject strings to the matching function in
- several pieces, checking for partial matching each time. Although it is
- also possible to do multi-segment matching using the standard algo-
- rithm, by retaining partially matched substrings, it is more compli-
- cated. The pcre2partial documentation gives details of partial matching
- and discusses multi-segment matching.
+ Partial matching is possible with this algorithm, though it has some
+ limitations. The pcre2partial documentation gives details of partial
+ matching and discusses multi-segment matching.
 DISADVANTAGES OF THE ALTERNATIVE ALGORITHM
 The alternative algorithm suffers from a number of disadvantages:
- 1. It is substantially slower than the standard algorithm. This is
- partly because it has to search for all possible matches, but is also
+ 1. It is substantially slower than the standard algorithm. This is
+ partly because it has to search for all possible matches, but is also
 because it is less susceptible to optimization.
- 2. Capturing parentheses, backreferences, script runs, and matching
+ 2. Capturing parentheses, backreferences, script runs, and matching
 within invalid UTF string are not supported.
 3. Although atomic groups are supported, their use does not provide the
 performance advantage that it does for the standard algorithm.
+ 4. JIT optimization is not supported.
+
 AUTHOR
 Philip Hazel
- University Computing Service
+ Retired from University Computing Service
 Cambridge, England.
 REVISION
- Last updated: 23 May 2019
- Copyright (c) 1997-2019 University of Cambridge.
+ Last updated: 28 August 2021
+ Copyright (c) 1997-2021 University of Cambridge.
 ------------------------------------------------------------------------------
diff --git a/doc/pcre2_dfa_match.3 b/doc/pcre2_dfa_match.3
index 6413cb6..2023348 100644
--- a/doc/pcre2_dfa_match.3
+++ b/doc/pcre2_dfa_match.3
@@ -1,4 +1,4 @@
-.TH PCRE2_DFA_MATCH 3 "16 October 2018" "PCRE2 10.33"
+.TH PCRE2_DFA_MATCH 3 "28 August 2021" "PCRE2 10.38"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@@ -33,10 +33,15 @@ just once (except when processing lookaround assertions). This function is
 \fIworkspace\fP Points to a vector of ints used as working space
 \fIwscount\fP Number of elements in the vector
 .sp
-For \fBpcre2_dfa_match()\fP, a match context is needed only if you want to set
-up a callout function or specify the heap limit or the match or the recursion
-depth limits. The \fIlength\fP and \fIstartoffset\fP values are code units, not
-characters. The options are:
+The size of output vector needed to contain all the results depends on the
+number of simultaneous matches, not on the number of parentheses in the
+pattern. Using \fBpcre2_match_data_create_from_pattern()\fP to create the match
+data block is therefore not advisable when using this function.
+.P
+A match context is needed only if you want to set up a callout function or
+specify the heap limit or the match or the recursion depth limits. The
+\fIlength\fP and \fIstartoffset\fP values are code units, not characters. The
+options are:
 .sp
 PCRE2_ANCHORED Match only at the first position
 PCRE2_COPY_MATCHED_SUBJECT
diff --git a/doc/pcre2_match_data_create.3 b/doc/pcre2_match_data_create.3
index 3b0a29e..439dea3 100644
--- a/doc/pcre2_match_data_create.3
+++ b/doc/pcre2_match_data_create.3
@@ -1,4 +1,4 @@
-.TH PCRE2_MATCH_DATA_CREATE 3 "29 July 2015" "PCRE2 10.21"
+.TH PCRE2_MATCH_DATA_CREATE 3 "28 August 2021" "PCRE2 10.38"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@@ -18,8 +18,9 @@ This function creates a new match data block, which is used for holding the
 result of a match. The first argument specifies the number of pairs of offsets
 that are required. These form the "output vector" (ovector) within the match
 data block, and are used to identify the matched string and any captured
-substrings. There is always one pair of offsets; if \fBovecsize\fP is zero, it
-is treated as one.
+substrings when matching with \fBpcre2_match()\fP, or a number of different
+matches at the same point when used with \fBpcre2_dfa_match()\fP. There is
+always one pair of offsets; if \fBovecsize\fP is zero, it is treated as one.
 .P
 The second argument points to a general context, for custom memory management,
 or is NULL for system memory management. The result of the function is NULL if
diff --git a/doc/pcre2_match_data_create_from_pattern.3 b/doc/pcre2_match_data_create_from_pattern.3
index 60bf77c..37486dd 100644
--- a/doc/pcre2_match_data_create_from_pattern.3
+++ b/doc/pcre2_match_data_create_from_pattern.3
@@ -1,4 +1,4 @@
-.TH PCRE2_MATCH_DATA_CREATE_FROM_PATTERN 3 "29 July 2015" "PCRE2 10.21"
+.TH PCRE2_MATCH_DATA_CREATE_FROM_PATTERN 3 "28 August 2021" "PCRE2 10.38"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@@ -14,12 +14,15 @@ PCRE2 - Perl-compatible regular expressions (revised API)
 .SH DESCRIPTION
 .rs
 .sp
-This function creates a new match data block, which is used for holding the
-result of a match. The first argument points to a compiled pattern. The number
-of capturing parentheses within the pattern is used to compute the number of
-pairs of offsets that are required in the match data block. These form the
-"output vector" (ovector) within the match data block, and are used to identify
-the matched string and any captured substrings.
+This function creates a new match data block for holding the result of a match.
+The first argument points to a compiled pattern. The number of capturing
+parentheses within the pattern is used to compute the number of pairs of
+offsets that are required in the match data block. These form the "output
+vector" (ovector) within the match data block, and are used to identify the
+matched string and any captured substrings when matching with
+\fBpcre2_match()\fP. If you are using \fBpcre2_dfa_match()\fP, which uses the
+output vector in a different way, you should use \fBpcre2_match_data_create()\fP
+instead of this function.
 .P
 The second argument points to a general context, for custom memory management,
 or is NULL to use the same memory allocator as was used for the compiled
diff --git a/doc/pcre2api.3 b/doc/pcre2api.3
index f67153a..94a8241 100644
--- a/doc/pcre2api.3
+++ b/doc/pcre2api.3
@@ -1,4 +1,4 @@
-.TH PCRE2API 3 "04 November 2020" "PCRE2 10.36"
+.TH PCRE2API 3 "28 August 2021" "PCRE2 10.38"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@@ -2490,19 +2490,27 @@ to an abstract format like Java or .NET serialization.
 Information about a successful or unsuccessful match is placed in a match data
 block, which is an opaque structure that is accessed by function calls. In
 particular, the match data block contains a vector of offsets into the subject
-string that define the matched part of the subject and any substrings that were
-captured. This is known as the \fIovector\fP.
+string that define the matched parts of the subject. This is known as the
+\fIovector\fP.
 .P
 Before calling \fBpcre2_match()\fP, \fBpcre2_dfa_match()\fP, or
 \fBpcre2_jit_match()\fP you must create a match data block by calling one of
 the creation functions above. For \fBpcre2_match_data_create()\fP, the first
-argument is the number of pairs of offsets in the \fIovector\fP. One pair of
-offsets is required to identify the string that matched the whole pattern, with
-an additional pair for each captured substring. For example, a value of 4
-creates enough space to record the matched portion of the subject plus three
-captured substrings. A minimum of at least 1 pair is imposed by
-\fBpcre2_match_data_create()\fP, so it is always possible to return the overall
-matched string.
+argument is the number of pairs of offsets in the \fIovector\fP.
+.P
+When using \fBpcre2_match()\fP, one pair of offsets is required to identify the
+string that matched the whole pattern, with an additional pair for each
+captured substring. For example, a value of 4 creates enough space to record
+the matched portion of the subject plus three captured substrings.
+.P
+When using \fBpcre2_dfa_match()\fP there may be multiple matched substrings of
+different lengths at the same point in the subject. The ovector should be made
+large enough to hold as many as are expected.
+.P
+A minimum of at least 1 pair is imposed by \fBpcre2_match_data_create()\fP, so
+it is always possible to return the overall matched string in the case of
+\fBpcre2_match()\fP or the longest match in the case of
+\fBpcre2_dfa_match()\fP.
 .P
 The second argument of \fBpcre2_match_data_create()\fP is a pointer to a
 general context, which can specify custom memory management for obtaining the
@@ -2511,10 +2519,11 @@ pass NULL, which causes \fBmalloc()\fP to be used.
 .P
 For \fBpcre2_match_data_create_from_pattern()\fP, the first argument is a
 pointer to a compiled pattern. The ovector is created to be exactly the right
-size to hold all the substrings a pattern might capture. The second argument is
-again a pointer to a general context, but in this case if NULL is passed, the
-memory is obtained using the same allocator that was used for the compiled
-pattern (custom or default).
+size to hold all the substrings a pattern might capture when matched using
+\fBpcre2_match()\fP. You should not use this call when matching with
+\fBpcre2_dfa_match()\fP. The second argument is again a pointer to a general
+context, but in this case if NULL is passed, the memory is obtained using the
+same allocator that was used for the compiled pattern (custom or default).
 .P
 A match data block can be used many times, with the same or different compiled
 patterns. You can extract information from a match data block after a match
@@ -3991,7 +4000,7 @@ fail, this error is given.
 .sp
 .nf
 Philip Hazel
-University Computing Service
+Retired from University Computing Service
 Cambridge, England.
 .fi
 .
@@ -4000,6 +4009,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 04 November 2020
-Copyright (c) 1997-2020 University of Cambridge.
+Last updated: 28 August 2021
+Copyright (c) 1997-2021 University of Cambridge.
 .fi
diff --git a/doc/pcre2matching.3 b/doc/pcre2matching.3
index 7f9bbac..673952d 100644
--- a/doc/pcre2matching.3
+++ b/doc/pcre2matching.3
@@ -1,4 +1,4 @@
-.TH PCRE2MATCHING 3 "23 May 2019" "PCRE2 10.34"
+.TH PCRE2MATCHING 3 "28 August 2021" "PCRE2 10.38"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 MATCHING ALGORITHMS"
@@ -61,8 +61,9 @@ tried is controlled by the greedy or ungreedy nature of the quantifier. If a
 leaf node is reached, a matching string has been found, and at that point the
 algorithm stops. Thus, if there is more than one possible match, this
 algorithm returns the first one that it finds. Whether this is the shortest,
-the longest, or some intermediate length depends on the way the greedy and
-ungreedy repetition quantifiers are specified in the pattern.
+the longest, or some intermediate length depends on the way the alternations
+and the greedy or ungreedy repetition quantifiers are specified in the
+pattern.
 .P
 Because it ends up with a single path through the tree, it is relatively
 straightforward for this algorithm to keep track of the substrings that are
@@ -91,10 +92,15 @@ no more unterminated paths. At this point, terminated paths represent the
 different matching possibilities (if there are none, the match has failed).
 Thus, if there is more than one possible match, this algorithm finds all of
 them, and in particular, it finds the longest. The matches are returned in
-decreasing order of length. There is an option to stop the algorithm after the
-first match (which is necessarily the shortest) is found.
+the output vector in decreasing order of length. There is an option to stop the
+algorithm after the first match (which is necessarily the shortest) is found.
 .P
-Note that all the matches that are found start at the same point in the
+Note that the size of vector needed to contain all the results depends on the
+number of simultaneous matches, not on the number of parentheses in the
+pattern. Using \fBpcre2_match_data_create_from_pattern()\fP to create the match
+data block is therefore not advisable when doing DFA matching.
+.P
+Note also that all the matches that are found start at the same point in the
 subject. If the pattern
 .sp
 cat(er(pillar)?)?
@@ -165,19 +171,13 @@ supported by \fBpcre2_dfa_match()\fP.
 .SH "ADVANTAGES OF THE ALTERNATIVE ALGORITHM"
 .rs
 .sp
-Using the alternative matching algorithm provides the following advantages:
+The main advantage of the alternative algorithm is that all possible matches
+(at a single point in the subject) are automatically found, and in particular,
+the longest match is found. To find more than one match at the same point using
+the standard algorithm, you have to do kludgy things with callouts.
 .P
-1. All possible matches (at a single point in the subject) are automatically
-found, and in particular, the longest match is found. To find more than one
-match using the standard algorithm, you have to do kludgy things with
-callouts.
-.P
-2. Because the alternative algorithm scans the subject string just once, and
-never needs to backtrack (except for lookbehinds), it is possible to pass very
-long subject strings to the matching function in several pieces, checking for
-partial matching each time. Although it is also possible to do multi-segment
-matching using the standard algorithm, by retaining partially matched
-substrings, it is more complicated. The
+Partial matching is possible with this algorithm, though it has some
+limitations. The
 .\" HREF
 \fBpcre2partial\fP
 .\"
@@ -199,6 +199,8 @@ invalid UTF string are not supported.
 .P
 3. Although atomic groups are supported, their use does not provide the
 performance advantage that it does for the standard algorithm.
+.P
+4. JIT optimization is not supported.
 .
 .
 .SH AUTHOR
@@ -206,7 +208,7 @@ performance advantage that it does for the standard algorithm.
 .sp
 .nf
 Philip Hazel
-University Computing Service
+Retired from University Computing Service
 Cambridge, England.
 .fi
 .
@@ -215,6 +217,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 23 May 2019
-Copyright (c) 1997-2019 University of Cambridge.
+Last updated: 28 August 2021
+Copyright (c) 1997-2021 University of Cambridge.
 .fi
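
Note on the ovector sizing change: the point that the required size depends on the number of simultaneous matches rather than on the number of parentheses can be illustrated with a small toy model. This is only a sketch in Python and does not call PCRE2. The function `toy_dfa_match` and its return convention are hypothetical. The pattern is modelled as a list of literal alternatives, and the behaviour when the vector is too small (return 0, keep the longest matches) is an assumption about how pcre2_dfa_match() reports overflow, so check it against pcre2api before relying on it.

```python
def toy_dfa_match(alternatives, subject, start, ovecsize):
    """Toy model (not PCRE2): report every literal alternative that matches
    at `start`, longest first, in an ovector of at most `ovecsize` pairs."""
    # The documentation says a size of zero is treated as one pair.
    ovecsize = max(ovecsize, 1)

    # Collect the distinct end offsets of all alternatives matching at `start`,
    # sorted so that the longest match comes first.
    ends = sorted(
        {start + len(alt) for alt in alternatives if subject.startswith(alt, start)},
        reverse=True,
    )

    # Only the first `ovecsize` (longest) matches fit into the vector.
    ovector = [(start, end) for end in ends[:ovecsize]]

    if not ends:
        return -1, ovector  # hypothetical stand-in for a "no match" error code
    # Assumption: 0 signals that there were more matches than pairs.
    rc = len(ends) if len(ends) <= ovecsize else 0
    return rc, ovector


# cat|cater|caterpillar has no capturing parentheses, yet three matches
# start at offset 4 of the subject.
alts = ["cat", "cater", "caterpillar"]
print(toy_dfa_match(alts, "the caterpillar", 4, 3))
print(toy_dfa_match(alts, "the caterpillar", 4, 1))
```

In this model a pattern with no capturing parentheses would get a single pair if the block were sized from the pattern, which is enough for the longest match only. That is the situation the new text warns about when it advises against pcre2_match_data_create_from_pattern() for DFA matching.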