Update HTML docs.
parent 21c26698b3
commit c232286c6b
|
@ -1914,6 +1914,13 @@ Extra compile options
|
|||
<P>
|
||||
The option bits that can be set in a compile context by calling the
|
||||
<b>pcre2_set_compile_extra_options()</b> function are as follows:
|
||||
<pre>
|
||||
PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
|
||||
</pre>
|
||||
Since release 10.38 PCRE2 has forbidden the use of \K within lookaround
|
||||
assertions, following Perl's lead. This option is provided to re-enable the
|
||||
previous behaviour (act in positive lookarounds, ignore in negative ones) in
|
||||
case anybody is relying on it.
|
||||
<pre>
|
||||
PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
|
||||
</pre>
|
||||
|
@ -4001,7 +4008,7 @@ Cambridge, England.
|
|||
</P>
|
||||
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 28 August 2021
|
||||
Last updated: 30 August 2021
|
||||
<br>
|
||||
Copyright © 1997-2021 University of Cambridge.
|
||||
<br>
|
||||
|
|
|
@ -153,8 +153,10 @@ letters, regardless of case, when case independence is specified.
|
|||
</P>
|
||||
<P>
|
||||
16. From release 5.32.0, Perl locks out the use of \K in lookaround
|
||||
assertions. In PCRE2, \K is acted on when it occurs in positive assertions,
|
||||
but is ignored in negative assertions.
|
||||
assertions. From release 10.38 PCRE2 does the same by default. However, there
|
||||
is an option for re-enabling the previous behaviour. When this option is set,
|
||||
\K is acted on when it occurs in positive assertions, but is ignored in
|
||||
negative assertions.
|
||||
</P>
|
||||
<P>
|
||||
17. PCRE2 provides some extensions to the Perl regular expression facilities.
|
||||
|
@ -237,7 +239,7 @@ AUTHOR
|
|||
<P>
|
||||
Philip Hazel
|
||||
<br>
|
||||
University Computing Service
|
||||
Retired from University Computing Service
|
||||
<br>
|
||||
Cambridge, England.
|
||||
<br>
|
||||
|
@ -246,9 +248,9 @@ Cambridge, England.
|
|||
REVISION
|
||||
</b><br>
|
||||
<P>
|
||||
Last updated: 06 October 2020
|
||||
Last updated: 30 August 2021
|
||||
<br>
|
||||
Copyright © 1997-2019 University of Cambridge.
|
||||
Copyright © 1997-2021 University of Cambridge.
|
||||
<br>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||
|
|
|
@ -234,9 +234,12 @@ pcre2_match_data_create_from_pattern() above. */
|
|||
if (rc == 0)
|
||||
printf("ovector was not big enough for all the captured substrings\n");
|
||||
|
||||
/* We must guard against patterns such as /(?=.\K)/ that use \K in an assertion
|
||||
to set the start of a match later than its end. In this demonstration program,
|
||||
we just detect this case and give up. */
|
||||
/* Since release 10.38 PCRE2 has locked out the use of \K in lookaround
|
||||
assertions. However, there is an option to re-enable the old behaviour. If that
|
||||
is set, it is possible to run patterns such as /(?=.\K)/ that use \K in an
|
||||
assertion to set the start of a match later than its end. In this demonstration
|
||||
program, we show how to detect this case, but it shouldn't arise because the
|
||||
option is never set. */
|
||||
|
||||
if (ovector[0] > ovector[1])
|
||||
{
|
||||
|
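The guard described in the comment above can be sketched in isolation. This is a minimal illustration rather than the demonstration program's own code: the helper names match_start_after_end() and describe_inverted_match() are hypothetical, and size_t stands in for PCRE2_SIZE so that the fragment compiles without the PCRE2 headers. The offsets are assumed to be the first ovector pair after a successful match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 when the reported match start (ovector[0])
   lies after the match end (ovector[1]), which can only happen when \K is
   acted on inside a lookaround assertion. size_t is an assumption standing
   in for PCRE2_SIZE. */
static int match_start_after_end(const size_t *ovector)
{
    assert(ovector != NULL);
    return ovector[0] > ovector[1];
}

/* Hypothetical helper: writes a diagnostic for an inverted match into buf.
   Returns the value of snprintf() for an inverted match, or -1 when the
   offsets are in the normal order and nothing is written. */
static int describe_inverted_match(const char *subject, const size_t *ovector,
                                   char *buf, size_t buflen)
{
    if (!match_start_after_end(ovector))
        return -1;

    /* The text between the end and the start of the match. */
    size_t len = ovector[0] - ovector[1];
    return snprintf(buf, buflen,
                    "\\K set the match start after its end; "
                    "end-to-start text: %.*s",
                    (int)len, subject + ovector[1]);
}
```

A caller would run the check straight after fetching the ovector and abandon or special-case the match when it returns 1. With \K locked out of lookarounds by default, the check should only fire if PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK was set at compile time.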
|
|
@ -1175,9 +1175,11 @@ For example, when the pattern
|
|||
matches "foobar", the first substring is still set to "foo".
|
||||
</P>
|
||||
<P>
|
||||
Perl used to document that the use of \K within lookaround assertions is "not
|
||||
well defined", but from version 5.32.0 Perl does not support this usage at all.
|
||||
In PCRE2, \K is acted upon when it occurs inside positive assertions, but is
|
||||
From version 5.32.0 Perl forbids the use of \K in lookaround assertions. From
|
||||
release 10.38 PCRE2 also forbids this by default. However, the
|
||||
PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK option can be used when calling
|
||||
<b>pcre2_compile()</b> to re-enable the previous behaviour. When this option is
|
||||
set, \K is acted upon when it occurs inside positive assertions, but is
|
||||
ignored in negative assertions. Note that when a pattern such as (?=ab\K)
|
||||
matches, the reported start of the match can be greater than the end of the
|
||||
match. Using \K in a lookbehind assertion at the start of a pattern can also
|
||||
|
@ -3845,16 +3847,16 @@ there is a backtrack at the outer level.
|
|||
<P>
|
||||
Philip Hazel
|
||||
<br>
|
||||
University Computing Service
|
||||
Retired from University Computing Service
|
||||
<br>
|
||||
Cambridge, England.
|
||||
<br>
|
||||
</P>
|
||||
<br><a name="SEC32" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 06 October 2020
|
||||
Last updated: 30 August 2021
|
||||
<br>
|
||||
Copyright © 1997-2020 University of Cambridge.
|
||||
Copyright © 1997-2021 University of Cambridge.
|
||||
<br>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||
|
|
|
@ -429,6 +429,9 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
|
|||
<pre>
|
||||
\K set reported start of match
|
||||
</pre>
|
||||
From release 10.38 \K is not permitted by default in lookaround assertions,
|
||||
for compatibility with Perl. However, if the PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
|
||||
option is set, the previous behaviour is re-enabled. When this option is set,
|
||||
\K is honoured in positive assertions, but ignored in negative ones.
|
||||
</P>
|
||||
<br><a name="SEC12" href="#TOC1">ALTERNATION</a><br>
|
||||
|
@ -682,16 +685,16 @@ delimiter }. To encode the ending delimiter within the string, double it.
|
|||
<P>
|
||||
Philip Hazel
|
||||
<br>
|
||||
University Computing Service
|
||||
Retired from University Computing Service
|
||||
<br>
|
||||
Cambridge, England.
|
||||
<br>
|
||||
</P>
|
||||
<br><a name="SEC29" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 28 December 2019
|
||||
Last updated: 30 August 2021
|
||||
<br>
|
||||
Copyright © 1997-2019 University of Cambridge.
|
||||
Copyright © 1997-2021 University of Cambridge.
|
||||
<br>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||
|
|
|
@ -59,12 +59,7 @@ patterns, and the subject lines specify PCRE2 function options, control how the
|
|||
subject is processed, and what output is produced.
|
||||
</P>
|
||||
<P>
|
||||
As the original fairly simple PCRE library evolved, it acquired many different
|
||||
features, and as a result, the original <b>pcretest</b> program ended up with a
|
||||
lot of options in a messy, arcane syntax for testing all the features. The
|
||||
move to the new PCRE2 API provided an opportunity to re-implement the test
|
||||
program as <b>pcre2test</b>, with a cleaner modifier syntax. Nevertheless, there
|
||||
are still many obscure modifiers, some of which are specifically designed for
|
||||
There are many obscure modifiers, some of which are specifically designed for
|
||||
use in conjunction with the test script and data files that are distributed as
|
||||
part of PCRE2. All the modifiers are documented here, some without much
|
||||
justification, but many of them are unlikely to be of use except when testing
|
||||
|
@ -89,10 +84,10 @@ names used in the libraries have a suffix _8, _16, or _32, as appropriate.
|
|||
<br><a name="SEC3" href="#TOC1">INPUT ENCODING</a><br>
|
||||
<P>
|
||||
Input to <b>pcre2test</b> is processed line by line, either by calling the C
|
||||
library's <b>fgets()</b> function, or via the <b>libreadline</b> library. In some
|
||||
Windows environments character 26 (hex 1A) causes an immediate end of file, and
|
||||
no further data is read, so this character should be avoided unless you really
|
||||
want that action.
|
||||
library's <b>fgets()</b> function, or via the <b>libreadline</b> or <b>libedit</b>
|
||||
library. In some Windows environments character 26 (hex 1A) causes an immediate
|
||||
end of file, and no further data is read, so this character should be avoided
|
||||
unless you really want that action.
|
||||
</P>
|
||||
<P>
|
||||
The input is processed using C's string functions, so must not
|
||||
|
@ -514,11 +509,11 @@ A pattern can be followed by a modifier list (details below).
|
|||
</P>
|
||||
<br><a name="SEC9" href="#TOC1">SUBJECT LINE SYNTAX</a><br>
|
||||
<P>
|
||||
Before each subject line is passed to <b>pcre2_match()</b> or
|
||||
<b>pcre2_dfa_match()</b>, leading and trailing white space is removed, and the
|
||||
line is scanned for backslash escapes, unless the <b>subject_literal</b>
|
||||
modifier was set for the pattern. The following provide a means of encoding
|
||||
non-printing characters in a visible way:
|
||||
Before each subject line is passed to <b>pcre2_match()</b>,
|
||||
<b>pcre2_dfa_match()</b>, or <b>pcre2_jit_match()</b>, leading and trailing white
|
||||
space is removed, and the line is scanned for backslash escapes, unless the
|
||||
<b>subject_literal</b> modifier was set for the pattern. The following provide a
|
||||
means of encoding non-printing characters in a visible way:
|
||||
<pre>
|
||||
\a alarm (BEL, \x07)
|
||||
\b backspace (\x08)
|
||||
|
@ -615,6 +610,7 @@ way <b>pcre2_compile()</b> behaves. See
|
|||
for a description of the effects of these options.
|
||||
<pre>
|
||||
allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS
|
||||
allow_lookaround_bsk set PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
|
||||
allow_surrogate_escapes set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
|
||||
alt_bsux set PCRE2_ALT_BSUX
|
||||
alt_circumflex set PCRE2_ALT_CIRCUMFLEX
|
||||
|
@ -2126,7 +2122,7 @@ Cambridge, England.
|
|||
</P>
|
||||
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 28 August 2021
|
||||
Last updated: 30 August 2021
|
||||
<br>
|
||||
Copyright © 1997-2021 University of Cambridge.
|
||||
<br>
|
||||
|
|
|
@ -1877,6 +1877,13 @@ COMPILING A PATTERN
|
|||
The option bits that can be set in a compile context by calling the
|
||||
pcre2_set_compile_extra_options() function are as follows:
|
||||
|
||||
PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
|
||||
|
||||
Since release 10.38 PCRE2 has forbidden the use of \K within lookaround
|
||||
assertions, following Perl's lead. This option is provided to re-enable
|
||||
the previous behaviour (act in positive lookarounds, ignore in negative
|
||||
ones) in case anybody is relying on it.
|
||||
|
||||
PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
|
||||
|
||||
This option applies when compiling a pattern in UTF-8 or UTF-32 mode.
|
||||
|
@ -3841,7 +3848,7 @@ AUTHOR
|
|||
|
||||
REVISION
|
||||
|
||||
Last updated: 28 August 2021
|
||||
Last updated: 30 August 2021
|
||||
Copyright (c) 1997-2021 University of Cambridge.
|
||||
------------------------------------------------------------------------------
|
||||
|
||||
|
@ -5003,8 +5010,10 @@ DIFFERENCES BETWEEN PCRE2 AND PERL
|
|||
specified.
|
||||
|
||||
16. From release 5.32.0, Perl locks out the use of \K in lookaround as-
|
||||
sertions. In PCRE2, \K is acted on when it occurs in positive asser-
|
||||
tions, but is ignored in negative assertions.
|
||||
sertions. From release 10.38 PCRE2 does the same by default. However,
|
||||
there is an option for re-enabling the previous behaviour. When this
|
||||
option is set, \K is acted on when it occurs in positive assertions,
|
||||
but is ignored in negative assertions.
|
||||
|
||||
17. PCRE2 provides some extensions to the Perl regular expression fa-
|
||||
cilities. Perl 5.10 included new features that were not in earlier
|
||||
|
@ -5072,14 +5081,14 @@ DIFFERENCES BETWEEN PCRE2 AND PERL
|
|||
AUTHOR
|
||||
|
||||
Philip Hazel
|
||||
University Computing Service
|
||||
Retired from University Computing Service
|
||||
Cambridge, England.
|
||||
|
||||
|
||||
REVISION
|
||||
|
||||
Last updated: 06 October 2020
|
||||
Copyright (c) 1997-2019 University of Cambridge.
|
||||
Last updated: 30 August 2021
|
||||
Copyright (c) 1997-2021 University of Cambridge.
|
||||
------------------------------------------------------------------------------
|
||||
|
||||
|
||||
|
@ -7103,14 +7112,16 @@ BACKSLASH
|
|||
|
||||
matches "foobar", the first substring is still set to "foo".
|
||||
|
||||
Perl used to document that the use of \K within lookaround assertions
|
||||
is "not well defined", but from version 5.32.0 Perl does not support
|
||||
this usage at all. In PCRE2, \K is acted upon when it occurs inside
|
||||
positive assertions, but is ignored in negative assertions. Note that
|
||||
when a pattern such as (?=ab\K) matches, the reported start of the
|
||||
match can be greater than the end of the match. Using \K in a lookbe-
|
||||
hind assertion at the start of a pattern can also lead to odd effects.
|
||||
For example, consider this pattern:
|
||||
From version 5.32.0 Perl forbids the use of \K in lookaround asser-
|
||||
tions. From release 10.38 PCRE2 also forbids this by default. However,
|
||||
the PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK option can be used when calling
|
||||
pcre2_compile() to re-enable the previous behaviour. When this option
|
||||
is set, \K is acted upon when it occurs inside positive assertions, but
|
||||
is ignored in negative assertions. Note that when a pattern such as
|
||||
(?=ab\K) matches, the reported start of the match can be greater than
|
||||
the end of the match. Using \K in a lookbehind assertion at the start
|
||||
of a pattern can also lead to odd effects. For example, consider this
|
||||
pattern:
|
||||
|
||||
(?<=\Kfoo)bar
|
||||
|
||||
|
@ -9621,14 +9632,14 @@ SEE ALSO
|
|||
AUTHOR
|
||||
|
||||
Philip Hazel
|
||||
University Computing Service
|
||||
Retired from University Computing Service
|
||||
Cambridge, England.
|
||||
|
||||
|
||||
REVISION
|
||||
|
||||
Last updated: 06 October 2020
|
||||
Copyright (c) 1997-2020 University of Cambridge.
|
||||
Last updated: 30 August 2021
|
||||
Copyright (c) 1997-2021 University of Cambridge.
|
||||
------------------------------------------------------------------------------
|
||||
|
||||
|
||||
|
@ -10735,7 +10746,11 @@ REPORTED MATCH POINT SETTING
|
|||
|
||||
\K set reported start of match
|
||||
|
||||
\K is honoured in positive assertions, but ignored in negative ones.
|
||||
From release 10.38 \K is not permitted by default in lookaround asser-
|
||||
tions, for compatibility with Perl. However, if the PCRE2_EXTRA_AL-
|
||||
LOW_LOOKAROUND_BSK option is set, the previous behaviour is re-enabled.
|
||||
When this option is set, \K is honoured in positive assertions, but ig-
|
||||
nored in negative ones.
|
||||
|
||||
|
||||
ALTERNATION
|
||||
|
@ -10985,14 +11000,14 @@ SEE ALSO
|
|||
AUTHOR
|
||||
|
||||
Philip Hazel
|
||||
University Computing Service
|
||||
Retired from University Computing Service
|
||||
Cambridge, England.
|
||||
|
||||
|
||||
REVISION
|
||||
|
||||
Last updated: 28 December 2019
|
||||
Copyright (c) 1997-2019 University of Cambridge.
|
||||
Last updated: 30 August 2021
|
||||
Copyright (c) 1997-2021 University of Cambridge.
|
||||
------------------------------------------------------------------------------
|
||||
|
||||
|
||||
|
|
|
@ -234,9 +234,12 @@ pcre2_match_data_create_from_pattern() above. */
|
|||
if (rc == 0)
|
||||
printf("ovector was not big enough for all the captured substrings\en");
|
||||
|
||||
/* We must guard against patterns such as /(?=.\eK)/ that use \eK in an assertion
|
||||
to set the start of a match later than its end. In this demonstration program,
|
||||
we just detect this case and give up. */
|
||||
/* Since release 10.38 PCRE2 has locked out the use of \eK in lookaround
|
||||
assertions. However, there is an option to re-enable the old behaviour. If that
|
||||
is set, it is possible to run patterns such as /(?=.\eK)/ that use \eK in an
|
||||
assertion to set the start of a match later than its end. In this demonstration
|
||||
program, we show how to detect this case, but it shouldn't arise because the
|
||||
option is never set. */
|
||||
|
||||
if (ovector[0] > ovector[1])
|
||||
{
|
||||
|
|
|
@ -24,17 +24,11 @@ SYNOPSIS
|
|||
tion options, control how the subject is processed, and what output is
|
||||
produced.
|
||||
|
||||
As the original fairly simple PCRE library evolved, it acquired many
|
||||
different features, and as a result, the original pcretest program
|
||||
ended up with a lot of options in a messy, arcane syntax for testing
|
||||
all the features. The move to the new PCRE2 API provided an opportunity
|
||||
to re-implement the test program as pcre2test, with a cleaner modifier
|
||||
syntax. Nevertheless, there are still many obscure modifiers, some of
|
||||
which are specifically designed for use in conjunction with the test
|
||||
script and data files that are distributed as part of PCRE2. All the
|
||||
modifiers are documented here, some without much justification, but
|
||||
many of them are unlikely to be of use except when testing the li-
|
||||
braries.
|
||||
There are many obscure modifiers, some of which are specifically de-
|
||||
signed for use in conjunction with the test script and data files that
|
||||
are distributed as part of PCRE2. All the modifiers are documented
|
||||
here, some without much justification, but many of them are unlikely to
|
||||
be of use except when testing the libraries.
|
||||
|
||||
|
||||
PCRE2's 8-BIT, 16-BIT AND 32-BIT LIBRARIES
|
||||
|
@ -58,10 +52,10 @@ PCRE2's 8-BIT, 16-BIT AND 32-BIT LIBRARIES
|
|||
INPUT ENCODING
|
||||
|
||||
Input to pcre2test is processed line by line, either by calling the C
|
||||
library's fgets() function, or via the libreadline library. In some
|
||||
Windows environments character 26 (hex 1A) causes an immediate end of
|
||||
file, and no further data is read, so this character should be avoided
|
||||
unless you really want that action.
|
||||
library's fgets() function, or via the libreadline or libedit library.
|
||||
In some Windows environments character 26 (hex 1A) causes an immediate
|
||||
end of file, and no further data is read, so this character should be
|
||||
avoided unless you really want that action.
|
||||
|
||||
The input is processed using C's string functions, so must not
|
||||
contain binary zeros, even though in Unix-like environments, fgets()
|
||||
|
@ -454,11 +448,11 @@ PATTERN SYNTAX
|
|||
|
||||
SUBJECT LINE SYNTAX
|
||||
|
||||
Before each subject line is passed to pcre2_match() or
|
||||
pcre2_dfa_match(), leading and trailing white space is removed, and the
|
||||
line is scanned for backslash escapes, unless the subject_literal modi-
|
||||
fier was set for the pattern. The following provide a means of encoding
|
||||
non-printing characters in a visible way:
|
||||
Before each subject line is passed to pcre2_match(), pcre2_dfa_match(),
|
||||
or pcre2_jit_match(), leading and trailing white space is removed, and
|
||||
the line is scanned for backslash escapes, unless the subject_literal
|
||||
modifier was set for the pattern. The following provide a means of en-
|
||||
coding non-printing characters in a visible way:
|
||||
|
||||
\a alarm (BEL, \x07)
|
||||
\b backspace (\x08)
|
||||
|
@ -553,6 +547,7 @@ PATTERN MODIFIERS
|
|||
options.
|
||||
|
||||
allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS
|
||||
allow_lookaround_bsk set PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
|
||||
allow_surrogate_escapes set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
|
||||
alt_bsux set PCRE2_ALT_BSUX
|
||||
alt_circumflex set PCRE2_ALT_CIRCUMFLEX
|
||||
|
@ -1938,5 +1933,5 @@ AUTHOR
|
|||
|
||||
REVISION
|
||||
|
||||
Last updated: 28 August 2021
|
||||
Last updated: 30 August 2021
|
||||
Copyright (c) 1997-2021 University of Cambridge.
|
||||
|
|