Update HTML docs.
parent
21c26698b3
commit
c232286c6b
|
@@ -1914,6 +1914,13 @@ Extra compile options
|
||||||
<P>
|
<P>
|
||||||
The option bits that can be set in a compile context by calling the
|
The option bits that can be set in a compile context by calling the
|
||||||
<b>pcre2_set_compile_extra_options()</b> function are as follows:
|
<b>pcre2_set_compile_extra_options()</b> function are as follows:
|
||||||
|
<pre>
|
||||||
|
PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
|
||||||
|
</pre>
|
||||||
|
Since release 10.38 PCRE2 has forbidden the use of \K within lookaround
|
||||||
|
assertions, following Perl's lead. This option is provided to re-enable the
|
||||||
|
previous behaviour (act in positive lookarounds, ignore in negative ones) in
|
||||||
|
case anybody is relying on it.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
|
PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
|
||||||
</pre>
|
</pre>
|
||||||
|
@@ -4001,7 +4008,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 28 August 2021
|
Last updated: 30 August 2021
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2021 University of Cambridge.
|
Copyright © 1997-2021 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@@ -153,8 +153,10 @@ letters, regardless of case, when case independence is specified.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
16. From release 5.32.0, Perl locks out the use of \K in lookaround
|
16. From release 5.32.0, Perl locks out the use of \K in lookaround
|
||||||
assertions. In PCRE2, \K is acted on when it occurs in positive assertions,
|
assertions. From release 10.38 PCRE2 does the same by default. However, there
|
||||||
but is ignored in negative assertions.
|
is an option for re-enabling the previous behaviour. When this option is set,
|
||||||
|
\K is acted on when it occurs in positive assertions, but is ignored in
|
||||||
|
negative assertions.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
17. PCRE2 provides some extensions to the Perl regular expression facilities.
|
17. PCRE2 provides some extensions to the Perl regular expression facilities.
|
||||||
|
@@ -237,7 +239,7 @@ AUTHOR
|
||||||
<P>
|
<P>
|
||||||
Philip Hazel
|
Philip Hazel
|
||||||
<br>
|
<br>
|
||||||
University Computing Service
|
Retired from University Computing Service
|
||||||
<br>
|
<br>
|
||||||
Cambridge, England.
|
Cambridge, England.
|
||||||
<br>
|
<br>
|
||||||
|
@@ -246,9 +248,9 @@ Cambridge, England.
|
||||||
REVISION
|
REVISION
|
||||||
</b><br>
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 06 October 2020
|
Last updated: 30 August 2021
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2019 University of Cambridge.
|
Copyright © 1997-2021 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
<p>
|
<p>
|
||||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
|
|
@@ -234,9 +234,12 @@ pcre2_match_data_create_from_pattern() above. */
|
||||||
if (rc == 0)
|
if (rc == 0)
|
||||||
printf("ovector was not big enough for all the captured substrings\n");
|
printf("ovector was not big enough for all the captured substrings\n");
|
||||||
|
|
||||||
/* We must guard against patterns such as /(?=.\K)/ that use \K in an assertion
|
/* Since release 10.38 PCRE2 has locked out the use of \K in lookaround
|
||||||
to set the start of a match later than its end. In this demonstration program,
|
assertions. However, there is an option to re-enable the old behaviour. If that
|
||||||
we just detect this case and give up. */
|
is set, it is possible to run patterns such as /(?=.\K)/ that use \K in an
|
||||||
|
assertion to set the start of a match later than its end. In this demonstration
|
||||||
|
program, we show how to detect this case, but it shouldn't arise because the
|
||||||
|
option is never set. */
|
||||||
|
|
||||||
if (ovector[0] > ovector[1])
|
if (ovector[0] > ovector[1])
|
||||||
{
|
{
|
||||||
|
|
|
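The start-after-end check in the hunk above can be isolated into a small helper. What follows is a minimal sketch, not code taken from pcre2demo: the names `match_start_after_end` and `report_if_inverted` are hypothetical, and the sketch assumes the ovector is an array of `size_t` offsets (PCRE2 defines `PCRE2_SIZE` as `size_t`) in which element 0 holds the reported match start and element 1 the match end.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of PCRE2 or pcre2demo): returns 1 when the
   reported match start (ovector[0]) lies beyond the match end (ovector[1]).
   The demo's comment attributes this case to \K being honoured inside a
   lookaround assertion such as /(?=.\K)/. */
static int match_start_after_end(const size_t *ovector)
{
  return ovector[0] > ovector[1];
}

/* Prints a diagnostic and returns 1 if the caller should give up on this
   match; returns 0 when the offsets are in the normal order. */
static int report_if_inverted(const size_t *ovector)
{
  if (!match_start_after_end(ovector)) return 0;
  printf("\\K set the match start after its end: start=%zu end=%zu\n",
         ovector[0], ovector[1]);
  return 1;
}
```

In a real program the array would come from `pcre2_get_ovector_pointer()` after a successful `pcre2_match()` call. With the default behaviour from release 10.38 the check should never fire, because the PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK option has to be set before such a pattern compiles at all.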
@@ -1175,9 +1175,11 @@ For example, when the pattern
|
||||||
matches "foobar", the first substring is still set to "foo".
|
matches "foobar", the first substring is still set to "foo".
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
Perl used to document that the use of \K within lookaround assertions is "not
|
From version 5.32.0 Perl forbids the use of \K in lookaround assertions. From
|
||||||
well defined", but from version 5.32.0 Perl does not support this usage at all.
|
release 10.38 PCRE2 also forbids this by default. However, the
|
||||||
In PCRE2, \K is acted upon when it occurs inside positive assertions, but is
|
PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK option can be used when calling
|
||||||
|
<b>pcre2_compile()</b> to re-enable the previous behaviour. When this option is
|
||||||
|
set, \K is acted upon when it occurs inside positive assertions, but is
|
||||||
ignored in negative assertions. Note that when a pattern such as (?=ab\K)
|
ignored in negative assertions. Note that when a pattern such as (?=ab\K)
|
||||||
matches, the reported start of the match can be greater than the end of the
|
matches, the reported start of the match can be greater than the end of the
|
||||||
match. Using \K in a lookbehind assertion at the start of a pattern can also
|
match. Using \K in a lookbehind assertion at the start of a pattern can also
|
||||||
|
@@ -3845,16 +3847,16 @@ there is a backtrack at the outer level.
|
||||||
<P>
|
<P>
|
||||||
Philip Hazel
|
Philip Hazel
|
||||||
<br>
|
<br>
|
||||||
University Computing Service
|
Retired from University Computing Service
|
||||||
<br>
|
<br>
|
||||||
Cambridge, England.
|
Cambridge, England.
|
||||||
<br>
|
<br>
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC32" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC32" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 06 October 2020
|
Last updated: 30 August 2021
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2020 University of Cambridge.
|
Copyright © 1997-2021 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
<p>
|
<p>
|
||||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
|
|
@@ -429,6 +429,9 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
|
||||||
<pre>
|
<pre>
|
||||||
\K set reported start of match
|
\K set reported start of match
|
||||||
</pre>
|
</pre>
|
||||||
|
From release 10.38 \K is not permitted by default in lookaround assertions,
|
||||||
|
for compatibility with Perl. However, if the PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
|
||||||
|
option is set, the previous behaviour is re-enabled. When this option is set,
|
||||||
\K is honoured in positive assertions, but ignored in negative ones.
|
\K is honoured in positive assertions, but ignored in negative ones.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC12" href="#TOC1">ALTERNATION</a><br>
|
<br><a name="SEC12" href="#TOC1">ALTERNATION</a><br>
|
||||||
|
@@ -682,16 +685,16 @@ delimiter }. To encode the ending delimiter within the string, double it.
|
||||||
<P>
|
<P>
|
||||||
Philip Hazel
|
Philip Hazel
|
||||||
<br>
|
<br>
|
||||||
University Computing Service
|
Retired from University Computing Service
|
||||||
<br>
|
<br>
|
||||||
Cambridge, England.
|
Cambridge, England.
|
||||||
<br>
|
<br>
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC29" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC29" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 28 December 2019
|
Last updated: 30 August 2021
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2019 University of Cambridge.
|
Copyright © 1997-2021 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
<p>
|
<p>
|
||||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
|
|
@@ -59,12 +59,7 @@ patterns, and the subject lines specify PCRE2 function options, control how the
|
||||||
subject is processed, and what output is produced.
|
subject is processed, and what output is produced.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
As the original fairly simple PCRE library evolved, it acquired many different
|
There are many obscure modifiers, some of which are specifically designed for
|
||||||
features, and as a result, the original <b>pcretest</b> program ended up with a
|
|
||||||
lot of options in a messy, arcane syntax for testing all the features. The
|
|
||||||
move to the new PCRE2 API provided an opportunity to re-implement the test
|
|
||||||
program as <b>pcre2test</b>, with a cleaner modifier syntax. Nevertheless, there
|
|
||||||
are still many obscure modifiers, some of which are specifically designed for
|
|
||||||
use in conjunction with the test script and data files that are distributed as
|
use in conjunction with the test script and data files that are distributed as
|
||||||
part of PCRE2. All the modifiers are documented here, some without much
|
part of PCRE2. All the modifiers are documented here, some without much
|
||||||
justification, but many of them are unlikely to be of use except when testing
|
justification, but many of them are unlikely to be of use except when testing
|
||||||
|
@@ -89,10 +84,10 @@ names used in the libraries have a suffix _8, _16, or _32, as appropriate.
|
||||||
<br><a name="SEC3" href="#TOC1">INPUT ENCODING</a><br>
|
<br><a name="SEC3" href="#TOC1">INPUT ENCODING</a><br>
|
||||||
<P>
|
<P>
|
||||||
Input to <b>pcre2test</b> is processed line by line, either by calling the C
|
Input to <b>pcre2test</b> is processed line by line, either by calling the C
|
||||||
library's <b>fgets()</b> function, or via the <b>libreadline</b> library. In some
|
library's <b>fgets()</b> function, or via the <b>libreadline</b> or <b>libedit</b>
|
||||||
Windows environments character 26 (hex 1A) causes an immediate end of file, and
|
library. In some Windows environments character 26 (hex 1A) causes an immediate
|
||||||
no further data is read, so this character should be avoided unless you really
|
end of file, and no further data is read, so this character should be avoided
|
||||||
want that action.
|
unless you really want that action.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The input is processed using using C's string functions, so must not
|
The input is processed using using C's string functions, so must not
|
||||||
|
@@ -514,11 +509,11 @@ A pattern can be followed by a modifier list (details below).
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC9" href="#TOC1">SUBJECT LINE SYNTAX</a><br>
|
<br><a name="SEC9" href="#TOC1">SUBJECT LINE SYNTAX</a><br>
|
||||||
<P>
|
<P>
|
||||||
Before each subject line is passed to <b>pcre2_match()</b> or
|
Before each subject line is passed to <b>pcre2_match()</b>,
|
||||||
<b>pcre2_dfa_match()</b>, leading and trailing white space is removed, and the
|
<b>pcre2_dfa_match()</b>, or <b>pcre2_jit_match()</b>, leading and trailing white
|
||||||
line is scanned for backslash escapes, unless the <b>subject_literal</b>
|
space is removed, and the line is scanned for backslash escapes, unless the
|
||||||
modifier was set for the pattern. The following provide a means of encoding
|
<b>subject_literal</b> modifier was set for the pattern. The following provide a
|
||||||
non-printing characters in a visible way:
|
means of encoding non-printing characters in a visible way:
|
||||||
<pre>
|
<pre>
|
||||||
\a alarm (BEL, \x07)
|
\a alarm (BEL, \x07)
|
||||||
\b backspace (\x08)
|
\b backspace (\x08)
|
||||||
|
@@ -615,6 +610,7 @@ way <b>pcre2_compile()</b> behaves. See
|
||||||
for a description of the effects of these options.
|
for a description of the effects of these options.
|
||||||
<pre>
|
<pre>
|
||||||
allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS
|
allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS
|
||||||
|
allow_lookaround_bsk set PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
|
||||||
allow_surrogate_escapes set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
|
allow_surrogate_escapes set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
|
||||||
alt_bsux set PCRE2_ALT_BSUX
|
alt_bsux set PCRE2_ALT_BSUX
|
||||||
alt_circumflex set PCRE2_ALT_CIRCUMFLEX
|
alt_circumflex set PCRE2_ALT_CIRCUMFLEX
|
||||||
|
@@ -2126,7 +2122,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 28 August 2021
|
Last updated: 30 August 2021
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2021 University of Cambridge.
|
Copyright © 1997-2021 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
1601
doc/pcre2.txt
File diff suppressed because it is too large
|
@@ -234,9 +234,12 @@ pcre2_match_data_create_from_pattern() above. */
|
||||||
if (rc == 0)
|
if (rc == 0)
|
||||||
printf("ovector was not big enough for all the captured substrings\en");
|
printf("ovector was not big enough for all the captured substrings\en");
|
||||||
|
|
||||||
/* We must guard against patterns such as /(?=.\eK)/ that use \eK in an assertion
|
/* Since release 10.38 PCRE2 has locked out the use of \eK in lookaround
|
||||||
to set the start of a match later than its end. In this demonstration program,
|
assertions. However, there is an option to re-enable the old behaviour. If that
|
||||||
we just detect this case and give up. */
|
is set, it is possible to run patterns such as /(?=.\eK)/ that use \eK in an
|
||||||
|
assertion to set the start of a match later than its end. In this demonstration
|
||||||
|
program, we show how to detect this case, but it shouldn't arise because the
|
||||||
|
option is never set. */
|
||||||
|
|
||||||
if (ovector[0] > ovector[1])
|
if (ovector[0] > ovector[1])
|
||||||
{
|
{
|
||||||
|
|
|
@@ -24,17 +24,11 @@ SYNOPSIS
|
||||||
tion options, control how the subject is processed, and what output is
|
tion options, control how the subject is processed, and what output is
|
||||||
produced.
|
produced.
|
||||||
|
|
||||||
As the original fairly simple PCRE library evolved, it acquired many
|
There are many obscure modifiers, some of which are specifically de-
|
||||||
different features, and as a result, the original pcretest program
|
signed for use in conjunction with the test script and data files that
|
||||||
ended up with a lot of options in a messy, arcane syntax for testing
|
are distributed as part of PCRE2. All the modifiers are documented
|
||||||
all the features. The move to the new PCRE2 API provided an opportunity
|
here, some without much justification, but many of them are unlikely to
|
||||||
to re-implement the test program as pcre2test, with a cleaner modifier
|
be of use except when testing the libraries.
|
||||||
syntax. Nevertheless, there are still many obscure modifiers, some of
|
|
||||||
which are specifically designed for use in conjunction with the test
|
|
||||||
script and data files that are distributed as part of PCRE2. All the
|
|
||||||
modifiers are documented here, some without much justification, but
|
|
||||||
many of them are unlikely to be of use except when testing the li-
|
|
||||||
braries.
|
|
||||||
|
|
||||||
|
|
||||||
PCRE2's 8-BIT, 16-BIT AND 32-BIT LIBRARIES
|
PCRE2's 8-BIT, 16-BIT AND 32-BIT LIBRARIES
|
||||||
|
@@ -58,10 +52,10 @@ PCRE2's 8-BIT, 16-BIT AND 32-BIT LIBRARIES
|
||||||
INPUT ENCODING
|
INPUT ENCODING
|
||||||
|
|
||||||
Input to pcre2test is processed line by line, either by calling the C
|
Input to pcre2test is processed line by line, either by calling the C
|
||||||
library's fgets() function, or via the libreadline library. In some
|
library's fgets() function, or via the libreadline or libedit library.
|
||||||
Windows environments character 26 (hex 1A) causes an immediate end of
|
In some Windows environments character 26 (hex 1A) causes an immediate
|
||||||
file, and no further data is read, so this character should be avoided
|
end of file, and no further data is read, so this character should be
|
||||||
unless you really want that action.
|
avoided unless you really want that action.
|
||||||
|
|
||||||
The input is processed using using C's string functions, so must not
|
The input is processed using using C's string functions, so must not
|
||||||
contain binary zeros, even though in Unix-like environments, fgets()
|
contain binary zeros, even though in Unix-like environments, fgets()
|
||||||
|
@@ -454,11 +448,11 @@ PATTERN SYNTAX
|
||||||
|
|
||||||
SUBJECT LINE SYNTAX
|
SUBJECT LINE SYNTAX
|
||||||
|
|
||||||
Before each subject line is passed to pcre2_match() or
|
Before each subject line is passed to pcre2_match(), pcre2_dfa_match(),
|
||||||
pcre2_dfa_match(), leading and trailing white space is removed, and the
|
or pcre2_jit_match(), leading and trailing white space is removed, and
|
||||||
line is scanned for backslash escapes, unless the subject_literal modi-
|
the line is scanned for backslash escapes, unless the subject_literal
|
||||||
fier was set for the pattern. The following provide a means of encoding
|
modifier was set for the pattern. The following provide a means of en-
|
||||||
non-printing characters in a visible way:
|
coding non-printing characters in a visible way:
|
||||||
|
|
||||||
\a alarm (BEL, \x07)
|
\a alarm (BEL, \x07)
|
||||||
\b backspace (\x08)
|
\b backspace (\x08)
|
||||||
|
@@ -553,6 +547,7 @@ PATTERN MODIFIERS
|
||||||
options.
|
options.
|
||||||
|
|
||||||
allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS
|
allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS
|
||||||
|
allow_lookaround_bsk set PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
|
||||||
allow_surrogate_escapes set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
|
allow_surrogate_escapes set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
|
||||||
alt_bsux set PCRE2_ALT_BSUX
|
alt_bsux set PCRE2_ALT_BSUX
|
||||||
alt_circumflex set PCRE2_ALT_CIRCUMFLEX
|
alt_circumflex set PCRE2_ALT_CIRCUMFLEX
|
||||||
|
@@ -1938,5 +1933,5 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 28 August 2021
|
Last updated: 30 August 2021
|
||||||
Copyright (c) 1997-2021 University of Cambridge.
|
Copyright (c) 1997-2021 University of Cambridge.
|
||||||
|
|