Documentation update.

Philip.Hazel 2019-06-21 16:10:17 +00:00
parent 175b4919f7
commit a89423624d
5 changed files with 666 additions and 627 deletions

@@ -355,7 +355,14 @@ characters:
 </PRE>
 </P>
 <P>
-3. Because a partial match must always contain at least one character, what
+3. The maximum lookbehind count is also important when the result of a partial
+match attempt is "no match". In this case, the maximum lookbehind characters
+from the end of the current segment must be retained at the start of the next
+segment, in case the lookbehind is at the start of the pattern. Matching the
+next segment must then start at the appropriate offset.
+</P>
+<P>
+4. Because a partial match must always contain at least one character, what
 might be considered a partial match of an empty string actually gives a "no
 match" result. For example:
 <pre>
@@ -369,7 +376,7 @@ happen if characters from the previous segment are retained. For this reason, a
 when the pattern contains lookbehinds.
 </P>
 <P>
-4. Matching a subject string that is split into multiple segments may not
+5. Matching a subject string that is split into multiple segments may not
 always produce exactly the same result as matching over one single long string,
 especially when PCRE2_PARTIAL_SOFT is used. The section "Partial Matching and
 Word Boundaries" above describes an issue that arises if the pattern ends with
@@ -411,7 +418,7 @@ multi-segment data. The example above then behaves differently:
 data&#62; gsb\=ph,dfa,dfa_restart
 Partial match: gsb
 </pre>
-5. Patterns that contain alternatives at the top level which do not all start
+6. Patterns that contain alternatives at the top level which do not all start
 with the same pattern item may not work as expected when PCRE2_DFA_RESTART is
 used. For example, consider this pattern:
 <pre>
@@ -456,9 +463,9 @@ Cambridge, England.
 </P>
 <br><a name="SEC10" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 22 December 2014
+Last updated: 21 June 2019
 <br>
-Copyright &copy; 1997-2014 University of Cambridge.
+Copyright &copy; 1997-2019 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.

@@ -2014,8 +2014,10 @@ no characters with a quantifier that has no upper limit, for example:
 </pre>
 Earlier versions of Perl and PCRE1 used to give an error at compile time for
 such patterns. However, because there are cases where this can be useful, such
-patterns are now accepted, but if any repetition of the group does in fact
-match no characters, the loop is forcibly broken.
+patterns are now accepted, but whenever an iteration of such a group matches no
+characters, matching moves on to the next item in the pattern instead of
+repeatedly matching an empty string. This does not prevent backtracking into
+any of the iterations if a subsequent item fails to match.
 </P>
 <P>
 By default, quantifiers are "greedy", that is, they match as much as possible
@@ -2371,6 +2373,10 @@ A lookaround assertion may also appear as the condition in a
 which branch of the condition is followed.
 </P>
 <P>
+Lookaround assertions are atomic. If an assertion is true, but there is a
+subsequent matching failure, there is no backtracking into the assertion.
+</P>
+<P>
 Assertion groups are not capture groups. If an assertion contains capture
 groups within it, these are counted for the purposes of numbering the capture
 groups in the whole pattern. Within each branch of an assertion, locally
@@ -3519,9 +3525,9 @@ first match attempt, the second attempt would start at the second character
 instead of skipping on to "c".
 </P>
 <P>
-If (*SKIP) is used inside a lookbehind to specify a new starting point that is
-not later than the starting point of the current match, it is ignored, and the
-normal "bumpalong" occurs.
+If (*SKIP) is used inside a lookbehind to specify a new starting position that
+is not later than the starting point of the current match, the position
+specified by (*SKIP) is ignored, and instead the normal "bumpalong" occurs.
 <pre>
 (*SKIP:NAME)
 </pre>
@@ -3748,7 +3754,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC31" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 20 June 2019
+Last updated: 21 June 2019
 <br>
 Copyright &copy; 1997-2019 University of Cambridge.
 <br>

@@ -5970,7 +5970,14 @@ ISSUES WITH MULTI-SEGMENT MATCHING
 Partial match: 123ab
 <<<
-3. Because a partial match must always contain at least one character,
+3. The maximum lookbehind count is also important when the result of a
+partial match attempt is "no match". In this case, the maximum lookbe-
+hind characters from the end of the current segment must be retained at
+the start of the next segment, in case the lookbehind is at the start
+of the pattern. Matching the next segment must then start at the appro-
+priate offset.
+4. Because a partial match must always contain at least one character,
 what might be considered a partial match of an empty string actually
 gives a "no match" result. For example:
@@ -5983,7 +5990,7 @@ ISSUES WITH MULTI-SEGMENT MATCHING
 this reason, a "no match" result should be interpreted as "partial
 match of an empty string" when the pattern contains lookbehinds.
-4. Matching a subject string that is split into multiple segments may
+5. Matching a subject string that is split into multiple segments may
 not always produce exactly the same result as matching over one single
 long string, especially when PCRE2_PARTIAL_SOFT is used. The section
 "Partial Matching and Word Boundaries" above describes an issue that
@@ -6027,7 +6034,7 @@ ISSUES WITH MULTI-SEGMENT MATCHING
 data> gsb\=ph,dfa,dfa_restart
 Partial match: gsb
-5. Patterns that contain alternatives at the top level which do not all
+6. Patterns that contain alternatives at the top level which do not all
 start with the same pattern item may not work as expected when
 PCRE2_DFA_RESTART is used. For example, consider this pattern:
@@ -6072,8 +6079,8 @@ AUTHOR
 REVISION
-Last updated: 22 December 2014
-Copyright (c) 1997-2014 University of Cambridge.
+Last updated: 21 June 2019
+Copyright (c) 1997-2019 University of Cambridge.
 ------------------------------------------------------------------------------
@@ -7801,8 +7808,11 @@ REPETITION
 Earlier versions of Perl and PCRE1 used to give an error at compile
 time for such patterns. However, because there are cases where this can
-be useful, such patterns are now accepted, but if any repetition of the
-group does in fact match no characters, the loop is forcibly broken.
+be useful, such patterns are now accepted, but whenever an iteration of
+such a group matches no characters, matching moves on to the next item
+in the pattern instead of repeatedly matching an empty string. This
+does not prevent backtracking into any of the iterations if a subse-
+quent item fails to match.
 By default, quantifiers are "greedy", that is, they match as much as
 possible (up to the maximum number of permitted times), without causing
@@ -8143,6 +8153,10 @@ ASSERTIONS
 tional group (see below). In this case, the result of matching the
 assertion determines which branch of the condition is followed.
+Lookaround assertions are atomic. If an assertion is true, but there is
+a subsequent matching failure, there is no backtracking into the asser-
+tion.
 Assertion groups are not capture groups. If an assertion contains cap-
 ture groups within it, these are counted for the purposes of numbering
 the capture groups in the whole pattern. Within each branch of an
@@ -9219,9 +9233,10 @@ BACKTRACKING CONTROL
 attempt would start at the second character instead of skipping on to
 "c".
-If (*SKIP) is used inside a lookbehind to specify a new starting point
-that is not later than the starting point of the current match, it is
-ignored, and the normal "bumpalong" occurs.
+If (*SKIP) is used inside a lookbehind to specify a new starting posi-
+tion that is not later than the starting point of the current match,
+the position specified by (*SKIP) is ignored, and instead the normal
+"bumpalong" occurs.
 (*SKIP:NAME)
@@ -9432,7 +9447,7 @@ AUTHOR
 REVISION
-Last updated: 20 June 2019
+Last updated: 21 June 2019
 Copyright (c) 1997-2019 University of Cambridge.
 ------------------------------------------------------------------------------

@@ -1,4 +1,4 @@
-.TH PCRE2PARTIAL 3 "22 December 2014" "PCRE2 10.00"
+.TH PCRE2PARTIAL 3 "21 June 2019" "PCRE2 10.34"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions
 .SH "PARTIAL MATCHING IN PCRE2"
@@ -326,7 +326,13 @@ characters:
 Partial match: 123ab
 <<<
 .P
-3. Because a partial match must always contain at least one character, what
+3. The maximum lookbehind count is also important when the result of a partial
+match attempt is "no match". In this case, the maximum lookbehind characters
+from the end of the current segment must be retained at the start of the next
+segment, in case the lookbehind is at the start of the pattern. Matching the
+next segment must then start at the appropriate offset.
+.P
+4. Because a partial match must always contain at least one character, what
 might be considered a partial match of an empty string actually gives a "no
 match" result. For example:
 .sp
@@ -339,7 +345,7 @@ happen if characters from the previous segment are retained. For this reason, a
 "no match" result should be interpreted as "partial match of an empty string"
 when the pattern contains lookbehinds.
 .P
-4. Matching a subject string that is split into multiple segments may not
+5. Matching a subject string that is split into multiple segments may not
 always produce exactly the same result as matching over one single long string,
 especially when PCRE2_PARTIAL_SOFT is used. The section "Partial Matching and
 Word Boundaries" above describes an issue that arises if the pattern ends with
@@ -380,7 +386,7 @@ multi-segment data. The example above then behaves differently:
 data> gsb\e=ph,dfa,dfa_restart
 Partial match: gsb
 .sp
-5. Patterns that contain alternatives at the top level which do not all start
+6. Patterns that contain alternatives at the top level which do not all start
 with the same pattern item may not work as expected when PCRE2_DFA_RESTART is
 used. For example, consider this pattern:
 .sp
@@ -429,6 +435,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 22 December 2014
-Copyright (c) 1997-2014 University of Cambridge.
+Last updated: 21 June 2019
+Copyright (c) 1997-2019 University of Cambridge.
 .fi

@@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "20 June 2019" "PCRE2 10.34"
+.TH PCRE2PATTERN 3 "21 June 2019" "PCRE2 10.34"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@@ -2022,8 +2022,10 @@ no characters with a quantifier that has no upper limit, for example:
 .sp
 Earlier versions of Perl and PCRE1 used to give an error at compile time for
 such patterns. However, because there are cases where this can be useful, such
-patterns are now accepted, but if any repetition of the group does in fact
-match no characters, the loop is forcibly broken.
+patterns are now accepted, but whenever an iteration of such a group matches no
+characters, matching moves on to the next item in the pattern instead of
+repeatedly matching an empty string. This does not prevent backtracking into
+any of the iterations if a subsequent item fails to match.
 .P
 By default, quantifiers are "greedy", that is, they match as much as possible
 (up to the maximum number of permitted times), without causing the rest of the
@@ -2378,6 +2380,9 @@ conditional group
 (see below). In this case, the result of matching the assertion determines
 which branch of the condition is followed.
 .P
+Lookaround assertions are atomic. If an assertion is true, but there is a
+subsequent matching failure, there is no backtracking into the assertion.
+.P
 Assertion groups are not capture groups. If an assertion contains capture
 groups within it, these are counted for the purposes of numbering the capture
 groups in the whole pattern. Within each branch of an assertion, locally
@@ -3559,9 +3564,9 @@ effect as this example; although it would suppress backtracking during the
 first match attempt, the second attempt would start at the second character
 instead of skipping on to "c".
 .P
-If (*SKIP) is used inside a lookbehind to specify a new starting point that is
-not later than the starting point of the current match, it is ignored, and the
-normal "bumpalong" occurs.
+If (*SKIP) is used inside a lookbehind to specify a new starting position that
+is not later than the starting point of the current match, the position
+specified by (*SKIP) is ignored, and instead the normal "bumpalong" occurs.
 .sp
 (*SKIP:NAME)
 .sp
@@ -3782,6 +3787,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 20 June 2019
+Last updated: 21 June 2019
 Copyright (c) 1997-2019 University of Cambridge.
 .fi