Update HTML docs.

Philip Hazel 2021-08-30 16:59:34 +01:00
parent 21c26698b3
commit c232286c6b
9 changed files with 877 additions and 851 deletions

View File

@ -1914,6 +1914,13 @@ Extra compile options
<P>
The option bits that can be set in a compile context by calling the
<b>pcre2_set_compile_extra_options()</b> function are as follows:
<pre>
PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
</pre>
Since release 10.38 PCRE2 has forbidden the use of \K within lookaround
assertions, following Perl's lead. This option is provided to re-enable the
previous behaviour (act in positive lookarounds, ignore in negative ones) in
case anybody is relying on it.
<pre>
PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
</pre>
@ -4001,7 +4008,7 @@ Cambridge, England.
</P>
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
<P>
Last updated: 28 August 2021
Last updated: 30 August 2021
<br>
Copyright &copy; 1997-2021 University of Cambridge.
<br>

View File

@ -153,8 +153,10 @@ letters, regardless of case, when case independence is specified.
</P>
<P>
16. From release 5.32.0, Perl locks out the use of \K in lookaround
assertions. In PCRE2, \K is acted on when it occurs in positive assertions,
but is ignored in negative assertions.
assertions. From release 10.38 PCRE2 does the same by default. However, there
is an option for re-enabling the previous behaviour. When this option is set,
\K is acted on when it occurs in positive assertions, but is ignored in
negative assertions.
</P>
<P>
17. PCRE2 provides some extensions to the Perl regular expression facilities.
@ -237,7 +239,7 @@ AUTHOR
<P>
Philip Hazel
<br>
University Computing Service
Retired from University Computing Service
<br>
Cambridge, England.
<br>
@ -246,9 +248,9 @@ Cambridge, England.
REVISION
</b><br>
<P>
Last updated: 06 October 2020
Last updated: 30 August 2021
<br>
Copyright &copy; 1997-2019 University of Cambridge.
Copyright &copy; 1997-2021 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.

View File

@ -234,9 +234,12 @@ pcre2_match_data_create_from_pattern() above. */
if (rc == 0)
printf("ovector was not big enough for all the captured substrings\n");
/* We must guard against patterns such as /(?=.\K)/ that use \K in an assertion
to set the start of a match later than its end. In this demonstration program,
we just detect this case and give up. */
/* Since release 10.38 PCRE2 has locked out the use of \K in lookaround
assertions. However, there is an option to re-enable the old behaviour. If that
is set, it is possible to run patterns such as /(?=.\K)/ that use \K in an
assertion to set the start of a match later than its end. In this demonstration
program, we show how to detect this case, but it shouldn't arise because the
option is never set. */
if (ovector[0] &gt; ovector[1])
{
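The guard described in the comment above can be sketched in isolation. This is a minimal illustration, not the code from pcre2demo: the helper names `match_start_after_end` and `report_match` are hypothetical, and the offsets are typed as plain `size_t` on the assumption that this matches `PCRE2_SIZE` in a default build. In a real program the offset pair would come from `pcre2_get_ovector_pointer()` after a successful match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of PCRE2): returns 1 when the reported
   start of the match lies after its end. This can only happen when \K is
   honoured inside a lookaround, e.g. /(?=.\K)/ with the
   PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK option set. */
static int match_start_after_end(const size_t *ovector)
{
  return ovector[0] > ovector[1];
}

/* Hypothetical helper: prints the matched text, or a diagnostic when the
   offsets are inverted. Returns 0 on success and -1 when the span cannot
   be used to slice the subject. */
static int report_match(const char *subject, const size_t *ovector)
{
  assert(subject != NULL);
  if (match_start_after_end(ovector))
    {
    fprintf(stderr, "\\K in an assertion set the match start after its end\n");
    return -1;
    }
  printf("Match at offset %zu: %.*s\n", ovector[0],
    (int)(ovector[1] - ovector[0]), subject + ovector[0]);
  return 0;
}
```

Checking before slicing matters because `ovector[1] - ovector[0]` is an unsigned subtraction: with inverted offsets it would wrap to a very large length rather than a negative one.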

View File

@ -1175,9 +1175,11 @@ For example, when the pattern
matches "foobar", the first substring is still set to "foo".
</P>
<P>
Perl used to document that the use of \K within lookaround assertions is "not
well defined", but from version 5.32.0 Perl does not support this usage at all.
In PCRE2, \K is acted upon when it occurs inside positive assertions, but is
From version 5.32.0 Perl forbids the use of \K in lookaround assertions. From
release 10.38 PCRE2 also forbids this by default. However, the
PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK option can be used when calling
<b>pcre2_compile()</b> to re-enable the previous behaviour. When this option is
set, \K is acted upon when it occurs inside positive assertions, but is
ignored in negative assertions. Note that when a pattern such as (?=ab\K)
matches, the reported start of the match can be greater than the end of the
match. Using \K in a lookbehind assertion at the start of a pattern can also
@ -3845,16 +3847,16 @@ there is a backtrack at the outer level.
<P>
Philip Hazel
<br>
University Computing Service
Retired from University Computing Service
<br>
Cambridge, England.
<br>
</P>
<br><a name="SEC32" href="#TOC1">REVISION</a><br>
<P>
Last updated: 06 October 2020
Last updated: 30 August 2021
<br>
Copyright &copy; 1997-2020 University of Cambridge.
Copyright &copy; 1997-2021 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.

View File

@ -429,6 +429,9 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
<pre>
\K set reported start of match
</pre>
From release 10.38 \K is not permitted by default in lookaround assertions,
for compatibility with Perl. However, if the PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
option is set, the previous behaviour is re-enabled. When this option is set,
\K is honoured in positive assertions, but ignored in negative ones.
</P>
<br><a name="SEC12" href="#TOC1">ALTERNATION</a><br>
@ -682,16 +685,16 @@ delimiter }. To encode the ending delimiter within the string, double it.
<P>
Philip Hazel
<br>
University Computing Service
Retired from University Computing Service
<br>
Cambridge, England.
<br>
</P>
<br><a name="SEC29" href="#TOC1">REVISION</a><br>
<P>
Last updated: 28 December 2019
Last updated: 30 August 2021
<br>
Copyright &copy; 1997-2019 University of Cambridge.
Copyright &copy; 1997-2021 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.

View File

@ -59,12 +59,7 @@ patterns, and the subject lines specify PCRE2 function options, control how the
subject is processed, and what output is produced.
</P>
<P>
As the original fairly simple PCRE library evolved, it acquired many different
features, and as a result, the original <b>pcretest</b> program ended up with a
lot of options in a messy, arcane syntax for testing all the features. The
move to the new PCRE2 API provided an opportunity to re-implement the test
program as <b>pcre2test</b>, with a cleaner modifier syntax. Nevertheless, there
are still many obscure modifiers, some of which are specifically designed for
There are many obscure modifiers, some of which are specifically designed for
use in conjunction with the test script and data files that are distributed as
part of PCRE2. All the modifiers are documented here, some without much
justification, but many of them are unlikely to be of use except when testing
@ -89,10 +84,10 @@ names used in the libraries have a suffix _8, _16, or _32, as appropriate.
<br><a name="SEC3" href="#TOC1">INPUT ENCODING</a><br>
<P>
Input to <b>pcre2test</b> is processed line by line, either by calling the C
library's <b>fgets()</b> function, or via the <b>libreadline</b> library. In some
Windows environments character 26 (hex 1A) causes an immediate end of file, and
no further data is read, so this character should be avoided unless you really
want that action.
library's <b>fgets()</b> function, or via the <b>libreadline</b> or <b>libedit</b>
library. In some Windows environments character 26 (hex 1A) causes an immediate
end of file, and no further data is read, so this character should be avoided
unless you really want that action.
</P>
<P>
The input is processed using C's string functions, so must not
@ -514,11 +509,11 @@ A pattern can be followed by a modifier list (details below).
</P>
<br><a name="SEC9" href="#TOC1">SUBJECT LINE SYNTAX</a><br>
<P>
Before each subject line is passed to <b>pcre2_match()</b> or
<b>pcre2_dfa_match()</b>, leading and trailing white space is removed, and the
line is scanned for backslash escapes, unless the <b>subject_literal</b>
modifier was set for the pattern. The following provide a means of encoding
non-printing characters in a visible way:
Before each subject line is passed to <b>pcre2_match()</b>,
<b>pcre2_dfa_match()</b>, or <b>pcre2_jit_match()</b>, leading and trailing white
space is removed, and the line is scanned for backslash escapes, unless the
<b>subject_literal</b> modifier was set for the pattern. The following provide a
means of encoding non-printing characters in a visible way:
<pre>
\a alarm (BEL, \x07)
\b backspace (\x08)
@ -615,6 +610,7 @@ way <b>pcre2_compile()</b> behaves. See
for a description of the effects of these options.
<pre>
allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS
allow_lookaround_bsk set PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
allow_surrogate_escapes set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
alt_bsux set PCRE2_ALT_BSUX
alt_circumflex set PCRE2_ALT_CIRCUMFLEX
@ -2126,7 +2122,7 @@ Cambridge, England.
</P>
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
<P>
Last updated: 28 August 2021
Last updated: 30 August 2021
<br>
Copyright &copy; 1997-2021 University of Cambridge.
<br>

View File

@ -1877,6 +1877,13 @@ COMPILING A PATTERN
The option bits that can be set in a compile context by calling the
pcre2_set_compile_extra_options() function are as follows:
PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
Since release 10.38 PCRE2 has forbidden the use of \K within lookaround
assertions, following Perl's lead. This option is provided to re-enable
the previous behaviour (act in positive lookarounds, ignore in negative
ones) in case anybody is relying on it.
PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
This option applies when compiling a pattern in UTF-8 or UTF-32 mode.
@ -3841,7 +3848,7 @@ AUTHOR
REVISION
Last updated: 28 August 2021
Last updated: 30 August 2021
Copyright (c) 1997-2021 University of Cambridge.
------------------------------------------------------------------------------
@ -5003,8 +5010,10 @@ DIFFERENCES BETWEEN PCRE2 AND PERL
specified.
16. From release 5.32.0, Perl locks out the use of \K in lookaround as-
sertions. In PCRE2, \K is acted on when it occurs in positive asser-
tions, but is ignored in negative assertions.
sertions. From release 10.38 PCRE2 does the same by default. However,
there is an option for re-enabling the previous behaviour. When this
option is set, \K is acted on when it occurs in positive assertions,
but is ignored in negative assertions.
17. PCRE2 provides some extensions to the Perl regular expression fa-
cilities. Perl 5.10 included new features that were not in earlier
@ -5072,14 +5081,14 @@ DIFFERENCES BETWEEN PCRE2 AND PERL
AUTHOR
Philip Hazel
University Computing Service
Retired from University Computing Service
Cambridge, England.
REVISION
Last updated: 06 October 2020
Copyright (c) 1997-2019 University of Cambridge.
Last updated: 30 August 2021
Copyright (c) 1997-2021 University of Cambridge.
------------------------------------------------------------------------------
@ -7103,14 +7112,16 @@ BACKSLASH
matches "foobar", the first substring is still set to "foo".
Perl used to document that the use of \K within lookaround assertions
is "not well defined", but from version 5.32.0 Perl does not support
this usage at all. In PCRE2, \K is acted upon when it occurs inside
positive assertions, but is ignored in negative assertions. Note that
when a pattern such as (?=ab\K) matches, the reported start of the
match can be greater than the end of the match. Using \K in a lookbe-
hind assertion at the start of a pattern can also lead to odd effects.
For example, consider this pattern:
From version 5.32.0 Perl forbids the use of \K in lookaround asser-
tions. From release 10.38 PCRE2 also forbids this by default. However,
the PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK option can be used when calling
pcre2_compile() to re-enable the previous behaviour. When this option
is set, \K is acted upon when it occurs inside positive assertions, but
is ignored in negative assertions. Note that when a pattern such as
(?=ab\K) matches, the reported start of the match can be greater than
the end of the match. Using \K in a lookbehind assertion at the start
of a pattern can also lead to odd effects. For example, consider this
pattern:
(?<=\Kfoo)bar
@ -9621,14 +9632,14 @@ SEE ALSO
AUTHOR
Philip Hazel
University Computing Service
Retired from University Computing Service
Cambridge, England.
REVISION
Last updated: 06 October 2020
Copyright (c) 1997-2020 University of Cambridge.
Last updated: 30 August 2021
Copyright (c) 1997-2021 University of Cambridge.
------------------------------------------------------------------------------
@ -10735,7 +10746,11 @@ REPORTED MATCH POINT SETTING
\K set reported start of match
\K is honoured in positive assertions, but ignored in negative ones.
From release 10.38 \K is not permitted by default in lookaround asser-
tions, for compatibility with Perl. However, if the PCRE2_EXTRA_AL-
LOW_LOOKAROUND_BSK option is set, the previous behaviour is re-enabled.
When this option is set, \K is honoured in positive assertions, but ig-
nored in negative ones.
ALTERNATION
@ -10985,14 +11000,14 @@ SEE ALSO
AUTHOR
Philip Hazel
University Computing Service
Retired from University Computing Service
Cambridge, England.
REVISION
Last updated: 28 December 2019
Copyright (c) 1997-2019 University of Cambridge.
Last updated: 30 August 2021
Copyright (c) 1997-2021 University of Cambridge.
------------------------------------------------------------------------------

View File

@ -234,9 +234,12 @@ pcre2_match_data_create_from_pattern() above. */
if (rc == 0)
printf("ovector was not big enough for all the captured substrings\en");
/* We must guard against patterns such as /(?=.\eK)/ that use \eK in an assertion
to set the start of a match later than its end. In this demonstration program,
we just detect this case and give up. */
/* Since release 10.38 PCRE2 has locked out the use of \eK in lookaround
assertions. However, there is an option to re-enable the old behaviour. If that
is set, it is possible to run patterns such as /(?=.\eK)/ that use \eK in an
assertion to set the start of a match later than its end. In this demonstration
program, we show how to detect this case, but it shouldn't arise because the
option is never set. */
if (ovector[0] > ovector[1])
{

View File

@ -24,17 +24,11 @@ SYNOPSIS
tion options, control how the subject is processed, and what output is
produced.
As the original fairly simple PCRE library evolved, it acquired many
different features, and as a result, the original pcretest program
ended up with a lot of options in a messy, arcane syntax for testing
all the features. The move to the new PCRE2 API provided an opportunity
to re-implement the test program as pcre2test, with a cleaner modifier
syntax. Nevertheless, there are still many obscure modifiers, some of
which are specifically designed for use in conjunction with the test
script and data files that are distributed as part of PCRE2. All the
modifiers are documented here, some without much justification, but
many of them are unlikely to be of use except when testing the li-
braries.
There are many obscure modifiers, some of which are specifically de-
signed for use in conjunction with the test script and data files that
are distributed as part of PCRE2. All the modifiers are documented
here, some without much justification, but many of them are unlikely to
be of use except when testing the libraries.
PCRE2's 8-BIT, 16-BIT AND 32-BIT LIBRARIES
@ -58,10 +52,10 @@ PCRE2's 8-BIT, 16-BIT AND 32-BIT LIBRARIES
INPUT ENCODING
Input to pcre2test is processed line by line, either by calling the C
library's fgets() function, or via the libreadline library. In some
Windows environments character 26 (hex 1A) causes an immediate end of
file, and no further data is read, so this character should be avoided
unless you really want that action.
library's fgets() function, or via the libreadline or libedit library.
In some Windows environments character 26 (hex 1A) causes an immediate
end of file, and no further data is read, so this character should be
avoided unless you really want that action.
The input is processed using C's string functions, so must not
contain binary zeros, even though in Unix-like environments, fgets()
@ -454,11 +448,11 @@ PATTERN SYNTAX
SUBJECT LINE SYNTAX
Before each subject line is passed to pcre2_match() or
pcre2_dfa_match(), leading and trailing white space is removed, and the
line is scanned for backslash escapes, unless the subject_literal modi-
fier was set for the pattern. The following provide a means of encoding
non-printing characters in a visible way:
Before each subject line is passed to pcre2_match(), pcre2_dfa_match(),
or pcre2_jit_match(), leading and trailing white space is removed, and
the line is scanned for backslash escapes, unless the subject_literal
modifier was set for the pattern. The following provide a means of en-
coding non-printing characters in a visible way:
\a alarm (BEL, \x07)
\b backspace (\x08)
@ -553,6 +547,7 @@ PATTERN MODIFIERS
options.
allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS
allow_lookaround_bsk set PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
allow_surrogate_escapes set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
alt_bsux set PCRE2_ALT_BSUX
alt_circumflex set PCRE2_ALT_CIRCUMFLEX
@ -1938,5 +1933,5 @@ AUTHOR
REVISION
Last updated: 28 August 2021
Last updated: 30 August 2021
Copyright (c) 1997-2021 University of Cambridge.