Update HTML documentation.
parent 06477b27af
commit 6aa0f3e56f
@@ -20,7 +20,7 @@ SYNOPSIS
</P>
<P>
<b>int32_t pcre2_serialize_decode(pcre2_code **<i>codes</i>,</b>
<b> int32_t <i>number_of_codes</i>, const uint32_t *<i>bytes</i>,</b>
<b> int32_t <i>number_of_codes</i>, const uint8_t *<i>bytes</i>,</b>
<b> pcre2_general_context *<i>gcontext</i>);</b>
</P>
<br><b>
@@ -19,8 +19,8 @@ SYNOPSIS
<b>#include <pcre2.h></b>
</P>
<P>
<b>int32_t pcre2_serialize_encode(pcre2_code **<i>codes</i>,</b>
<b> int32_t <i>number_of_codes</i>, uint32_t **<i>serialized_bytes</i>,</b>
<b>int32_t pcre2_serialize_encode(const pcre2_code **<i>codes</i>,</b>
<b> int32_t <i>number_of_codes</i>, uint8_t **<i>serialized_bytes</i>,</b>
<b> PCRE2_SIZE *<i>serialized_size</i>, pcre2_general_context *<i>gcontext</i>);</b>
</P>
<br><b>
@@ -266,12 +266,12 @@ document for an overview of all the PCRE2 documentation.
<br><a name="SEC9" href="#TOC1">PCRE2 NATIVE API SERIALIZATION FUNCTIONS</a><br>
<P>
<b>int32_t pcre2_serialize_decode(pcre2_code **<i>codes</i>,</b>
<b> int32_t <i>number_of_codes</i>, const uint32_t *<i>bytes</i>,</b>
<b> int32_t <i>number_of_codes</i>, const uint8_t *<i>bytes</i>,</b>
<b> pcre2_general_context *<i>gcontext</i>);</b>
<br>
<br>
<b>int32_t pcre2_serialize_encode(pcre2_code **<i>codes</i>,</b>
<b> int32_t <i>number_of_codes</i>, uint32_t **<i>serialized_bytes</i>,</b>
<b>int32_t pcre2_serialize_encode(const pcre2_code **<i>codes</i>,</b>
<b> int32_t <i>number_of_codes</i>, uint8_t **<i>serialized_bytes</i>,</b>
<b> PCRE2_SIZE *<i>serialized_size</i>, pcre2_general_context *<i>gcontext</i>);</b>
<br>
<br>
@@ -1091,7 +1091,10 @@ By default, for compatibility with Perl, the name in any verb sequence such as
parenthesis. The name is not processed in any way, and it is not possible to
include a closing parenthesis in the name. However, if the PCRE2_ALT_VERBNAMES
option is set, normal backslash processing is applied to verb names and only an
unescaped closing parenthesis terminates the name.
unescaped closing parenthesis terminates the name. A closing parenthesis can be
included in a name either as \) or between \Q and \E. If the PCRE2_EXTENDED
option is set, unescaped whitespace in verb names is skipped and #-comments are
recognized, exactly as in the rest of the pattern.
<pre>
PCRE2_AUTO_CALLOUT
</pre>
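Editor's note: the PCRE2_ALT_VERBNAMES rule added in this hunk can be illustrated with a small C sketch. It is not PCRE2's implementation. The function name `read_verb_name` and its return convention are hypothetical, "normal backslash processing" is reduced to "a backslash makes the next character literal" plus \Q...\E quoting, and the PCRE2_EXTENDED whitespace and #-comment handling is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies a verb name from src (the text just after
   "(*MARK:" or a similar verb prefix) into out, stopping at the first
   unescaped ')'. Returns the number of source characters consumed, including
   that ')', or 0 if no terminator is found or the buffer is too small.
   Simplification: any escape other than \Q and \E just makes the next
   character literal. */
size_t read_verb_name(const char *src, char *out, size_t outsize)
{
    size_t i = 0, n = 0;
    int quoted = 0;                               /* inside \Q...\E */

    assert(src != NULL && out != NULL && outsize > 0);
    while (src[i] != '\0') {
        char c = src[i];
        if (quoted) {
            if (c == '\\' && src[i + 1] == 'E') { quoted = 0; i += 2; continue; }
        } else if (c == '\\') {
            if (src[i + 1] == 'Q') { quoted = 1; i += 2; continue; }
            if (src[i + 1] == 'E') { i += 2; continue; }  /* stray \E skipped */
            if (src[i + 1] == '\0') return 0;             /* dangling backslash */
            c = src[++i];                         /* escaped character is literal */
        } else if (c == ')') {
            out[n] = '\0';                        /* unescaped ')' ends the name */
            return i + 1;
        }
        if (n + 1 >= outsize) return 0;           /* keep room for the NUL */
        out[n++] = c;
        i++;
    }
    return 0;                                     /* ran out of input */
}
```

Under these assumptions, the input `A\)B)` yields the name `A)B`, and `\Qa)b\E)` yields `a)b`.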
@@ -2909,7 +2912,7 @@ Cambridge, England.
</P>
<br><a name="SEC40" href="#TOC1">REVISION</a><br>
<P>
Last updated: 30 August 2015
Last updated: 02 September 2015
<br>
Copyright © 1997-2015 University of Cambridge.
<br>
@@ -2925,7 +2925,10 @@ that does not include a closing parenthesis. The name is not processed in
any way, and it is not possible to include a closing parenthesis in the name.
However, if the PCRE2_ALT_VERBNAMES option is set, normal backslash processing
is applied to verb names and only an unescaped closing parenthesis terminates
the name.
the name. A closing parenthesis can be included in a name either as \) or
between \Q and \E. If the PCRE2_EXTENDED option is set, unescaped whitespace
in verb names is skipped and #-comments are recognized, exactly as in the rest
of the pattern.
</P>
<P>
The maximum length of a name is 255 in the 8-bit library and 65535 in the
@@ -3348,7 +3351,7 @@ Cambridge, England.
</P>
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
<P>
Last updated: 30 August 2015
Last updated: 01 September 2015
<br>
Copyright © 1997-2015 University of Cambridge.
<br>
@@ -170,7 +170,7 @@ use the contents of the <i>preg</i> structure. If, for example, you pass it to
This area is not simple, because POSIX and Perl take different views of things.
It is not possible to get PCRE2 to obey POSIX semantics, but then PCRE2 was
never intended to be a POSIX engine. The following table lists the different
possibilities for matching newline characters in PCRE2:
possibilities for matching newline characters in Perl and PCRE2:
<pre>
Default Change with
@@ -180,7 +180,7 @@ possibilities for matching newline characters in PCRE2:
$ matches \n in middle no PCRE2_MULTILINE
^ matches \n in middle no PCRE2_MULTILINE
</pre>
This is the equivalent table for POSIX:
This is the equivalent table for a POSIX-compatible pattern matcher:
<pre>
Default Change with
@@ -190,14 +190,18 @@ This is the equivalent table for POSIX:
$ matches \n in middle no REG_NEWLINE
^ matches \n in middle no REG_NEWLINE
</pre>
PCRE2's behaviour is the same as Perl's, except that there is no equivalent for
PCRE2_DOLLAR_ENDONLY in Perl. In both PCRE2 and Perl, there is no way to stop
newline from matching [^a].
This behaviour is not what happens when PCRE2 is called via its POSIX
API. By default, PCRE2's behaviour is the same as Perl's, except that there is
no equivalent for PCRE2_DOLLAR_ENDONLY in Perl. In both PCRE2 and Perl, there
is no way to stop newline from matching [^a].
</P>
<P>
The default POSIX newline handling can be obtained by setting PCRE2_DOTALL and
PCRE2_DOLLAR_ENDONLY, but there is no way to make PCRE2 behave exactly as for
the REG_NEWLINE action.
Default POSIX newline handling can be obtained by setting PCRE2_DOTALL and
PCRE2_DOLLAR_ENDONLY when calling <b>pcre2_compile()</b> directly, but there is
no way to make PCRE2 behave exactly as for the REG_NEWLINE action. When using
the POSIX API, passing REG_NEWLINE to PCRE2's <b>regcomp()</b> function
causes PCRE2_MULTILINE to be passed to <b>pcre2_compile()</b>, and REG_DOTALL
passes PCRE2_DOTALL. There is no way to pass PCRE2_DOLLAR_ENDONLY.
</P>
<br><a name="SEC5" href="#TOC1">MATCHING A PATTERN</a><br>
<P>
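Editor's note: the flag translation described in the new paragraph of this hunk might look roughly like the C sketch below. The `DEMO_` constants are placeholders chosen for illustration, and `demo_compile_options` is a hypothetical name. Real code would include pcre2posix.h and pcre2.h and use their definitions, and the actual wrapper handles more flags than the two mentioned in the text.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder values for illustration only; they stand in for the real
   REG_* and PCRE2_* constants from pcre2posix.h and pcre2.h. */
#define DEMO_REG_NEWLINE      0x0002
#define DEMO_REG_DOTALL       0x0010
#define DEMO_PCRE2_MULTILINE  0x00000400u
#define DEMO_PCRE2_DOTALL     0x00000020u

/* Hypothetical translation of regcomp() cflags into pcre2_compile() options,
   covering only the two flags named in the documentation text. */
uint32_t demo_compile_options(int cflags)
{
    uint32_t options = 0;

    assert(cflags >= 0);                    /* flags form a non-negative bitmask */
    if (cflags & DEMO_REG_NEWLINE) options |= DEMO_PCRE2_MULTILINE;
    if (cflags & DEMO_REG_DOTALL)  options |= DEMO_PCRE2_DOTALL;
    /* No cflag maps to PCRE2_DOLLAR_ENDONLY, so it can never be set here. */
    return options;
}
```

The missing third branch is the point of the example: through the POSIX API a caller can reach PCRE2_MULTILINE and PCRE2_DOTALL, but not PCRE2_DOLLAR_ENDONLY.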
@@ -283,9 +287,9 @@ Cambridge, England.
</P>
<br><a name="SEC9" href="#TOC1">REVISION</a><br>
<P>
Last updated: 20 October 2014
Last updated: 03 September 2015
<br>
Copyright © 1997-2014 University of Cambridge.
Copyright © 1997-2015 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
@@ -304,6 +304,36 @@ output.
This command is used to load a set of precompiled patterns from a file, as
described in the section entitled "Saving and restoring compiled patterns"
<a href="#saverestore">below.</a>
<pre>
#newline_default [<newline-list>]
</pre>
When PCRE2 is built, a default newline convention can be specified. This
determines which characters and/or character pairs are recognized as indicating
a newline in a pattern or subject string. The default can be overridden when a
pattern is compiled. The standard test files contain tests of various newline
conventions, but the majority of the tests expect a single linefeed to be
recognized as a newline by default. Without special action the tests would fail
when PCRE2 is compiled with either CR or CRLF as the default newline.
</P>
<P>
The #newline_default command specifies a list of newline types that are
acceptable as the default. The types must be one of CR, LF, CRLF, ANYCRLF, or
ANY (in upper or lower case), for example:
<pre>
#newline_default LF Any anyCRLF
</pre>
If the default newline is in the list, this command has no effect. Otherwise,
except when testing the POSIX API, a <b>newline</b> modifier that specifies the
first newline convention in the list (LF in the above example) is added to any
pattern that does not already have a <b>newline</b> modifier. If the newline
list is empty, the feature is turned off. This command is present in a number
of the standard test input files.
</P>
<P>
When the POSIX API is being tested there is no way to override the default
newline convention, though it is possible to set the newline convention from
within the pattern. A warning is given if the <b>posix</b> modifier is used when
<b>#newline_default</b> would set a default for the non-POSIX API.
<pre>
#pattern <modifier-list>
</pre>
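Editor's note: the decision rule that the new #newline_default text describes can be sketched in C as follows. This is a reading of the documentation, not code taken from pcre2test. The function names are hypothetical, and the sketch covers only the native-API case for a pattern that has no <b>newline</b> modifier of its own.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive comparison of two newline type names (CR, LF, CRLF, ...).
   Returns 1 when they are the same type, otherwise 0. */
int same_newline_type(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (toupper((unsigned char)*a) != toupper((unsigned char)*b)) return 0;
        a++;
        b++;
    }
    return *a == *b;                 /* equal only if both strings ended */
}

/* Hypothetical helper: returns the newline convention to add as a modifier,
   or NULL when nothing should be added. */
const char *newline_to_force(const char *build_default,
                             const char *const *list, size_t count)
{
    assert(build_default != NULL);
    if (count == 0) return NULL;     /* empty list: the feature is turned off */
    for (size_t i = 0; i < count; i++) {
        if (same_newline_type(build_default, list[i]))
            return NULL;             /* build default is acceptable: no effect */
    }
    return list[0];                  /* otherwise the first listed type is used */
}
```

With the list from the example above (`LF Any anyCRLF`), a build whose default is CR would get `LF` added, while a build whose default is ANYCRLF would be left alone.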
@@ -625,21 +655,51 @@ actual length of the pattern is passed.
JIT compilation
</b><br>
<P>
The <b>/jit</b> modifier may optionally be followed by an equals sign and a
number in the range 0 to 7:
Just-in-time (JIT) compiling is a heavyweight optimization that can greatly
speed up pattern matching. See the
<a href="pcre2jit.html"><b>pcre2jit</b></a>
documentation for details. JIT compiling happens, optionally, after a pattern
has been successfully compiled into an internal form. The JIT compiler converts
this to optimized machine code. It needs to know whether the match-time options
PCRE2_PARTIAL_HARD and PCRE2_PARTIAL_SOFT are going to be used, because
different code is generated for the different cases. See the <b>partial</b>
modifier in "Subject Modifiers"
<a href="#subjectmodifiers">below</a>
for details of how these options are specified for each match attempt.
</P>
<P>
JIT compilation is requested by the <b>/jit</b> pattern modifier, which may
optionally be followed by an equals sign and a number in the range 0 to 7.
The three bits that make up the number specify which of the three JIT operating
modes are to be compiled:
<pre>
1 compile JIT code for non-partial matching
2 compile JIT code for soft partial matching
4 compile JIT code for hard partial matching
</pre>
The possible values for the <b>/jit</b> modifier are therefore:
<pre>
0 disable JIT
1 use JIT for normal match only
2 use JIT for soft partial match only
3 use JIT for normal match and soft partial match
4 use JIT for hard partial match only
6 use JIT for soft and hard partial match
1 normal matching only
2 soft partial matching only
3 normal and soft partial matching
4 hard partial matching only
6 soft and hard partial matching only
7 all three modes
</pre>
If no number is given, 7 is assumed. If JIT compilation is successful, the
compiled JIT code will automatically be used when <b>pcre2_match()</b> is run
for the appropriate type of match, except when incompatible run-time options
are specified. For more details, see the
If no number is given, 7 is assumed. The phrase "partial matching" means a call
to <b>pcre2_match()</b> with either the PCRE2_PARTIAL_SOFT or the
PCRE2_PARTIAL_HARD option set. Note that such a call may return a complete
match; the options enable the possibility of a partial match, but do not
require it. Note also that if you request JIT compilation only for partial
matching (for example, /jit=2) but do not set the <b>partial</b> modifier on a
subject line, that match will not use JIT code because none was compiled for
non-partial matching.
</P>
<P>
If JIT compilation is successful, the compiled JIT code will automatically be
used when an appropriate type of match is run, except when incompatible
run-time options are specified. For more details, see the
<a href="pcre2jit.html"><b>pcre2jit</b></a>
documentation. See also the <b>jitstack</b> modifier below for a way of
setting the size of the JIT stack.
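Editor's note: the bit arithmetic behind the <b>/jit</b> values listed in this hunk can be written out in a few lines of C. The bit values 1, 2 and 4 come from the documentation text. The enum and function names are hypothetical, and the sketch models only whether JIT code was compiled for a given kind of match, not how pcre2test or PCRE2 make that decision internally.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit values from the documentation text; the names are hypothetical. */
enum jit_mode_bits {
    JIT_BIT_NORMAL       = 1,   /* non-partial matching */
    JIT_BIT_PARTIAL_SOFT = 2,   /* soft partial matching */
    JIT_BIT_PARTIAL_HARD = 4    /* hard partial matching */
};

/* Returns true if a pattern compiled with /jit=<jit_value> has JIT code for
   the kind of match identified by match_bit. */
bool jit_code_available(int jit_value, enum jit_mode_bits match_bit)
{
    assert(jit_value >= 0 && jit_value <= 7);
    return (jit_value & match_bit) != 0;
}
```

This mirrors the /jit=2 case in the text: `jit_code_available(2, JIT_BIT_NORMAL)` is false, so a subject line without the <b>partial</b> modifier would not use JIT code.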
@@ -707,8 +767,10 @@ Using the POSIX wrapper API
<P>
The <b>/posix</b> modifier causes <b>pcre2test</b> to call PCRE2 via the POSIX
wrapper API rather than its native API. This supports only the 8-bit library.
When the POSIX API is being used, the following pattern modifiers set options
for the <b>regcomp()</b> function:
Note that it does not imply POSIX matching semantics; for more detail see the
<a href="pcre2posix.html"><b>pcre2posix</b></a>
documentation. When the POSIX API is being used, the following pattern
modifiers set options for the <b>regcomp()</b> function:
<pre>
caseless REG_ICASE
multiline REG_NEWLINE
@@ -790,7 +852,7 @@ The <b>push</b> modifier is incompatible with compilation modifiers such as
warning message, except for <b>replace</b>, which causes an error. Note that,
<b>jitverify</b>, which is allowed, does not carry through to any subsequent
matching that uses this pattern.
</P>
<a name="subjectmodifiers"></a></P>
<br><a name="SEC11" href="#TOC1">SUBJECT MODIFIERS</a><br>
<P>
The modifiers that can appear in subject lines and the <b>#subject</b>
@@ -1471,7 +1533,7 @@ Cambridge, England.
</P>
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
<P>
Last updated: 30 August 2015
Last updated: 12 September 2015
<br>
Copyright © 1997-2015 University of Cambridge.
<br>
@@ -356,11 +356,11 @@ PCRE2 NATIVE API JIT FUNCTIONS
PCRE2 NATIVE API SERIALIZATION FUNCTIONS

int32_t pcre2_serialize_decode(pcre2_code **codes,
int32_t number_of_codes, const uint32_t *bytes,
int32_t number_of_codes, const uint8_t *bytes,
pcre2_general_context *gcontext);

int32_t pcre2_serialize_encode(pcre2_code **codes,
int32_t number_of_codes, uint32_t **serialized_bytes,
int32_t pcre2_serialize_encode(const pcre2_code **codes,
int32_t number_of_codes, uint8_t **serialized_bytes,
PCRE2_SIZE *serialized_size, pcre2_general_context *gcontext);

void pcre2_serialize_free(uint8_t *bytes);
@@ -1136,7 +1136,10 @@ COMPILING A PATTERN
and it is not possible to include a closing parenthesis in the name.
However, if the PCRE2_ALT_VERBNAMES option is set, normal backslash
processing is applied to verb names and only an unescaped closing
parenthesis terminates the name.
parenthesis terminates the name. A closing parenthesis can be included
in a name either as \) or between \Q and \E. If the PCRE2_EXTENDED
option is set, unescaped whitespace in verb names is skipped and #-com-
ments are recognized, exactly as in the rest of the pattern.

PCRE2_AUTO_CALLOUT
@@ -2851,7 +2854,7 @@ AUTHOR

REVISION

Last updated: 30 August 2015
Last updated: 02 September 2015
Copyright (c) 1997-2015 University of Cambridge.
------------------------------------------------------------------------------
@@ -247,6 +247,36 @@ COMMAND LINES
as described in the section entitled "Saving and restoring compiled
patterns" below.

#newline_default [<newline-list>]

When PCRE2 is built, a default newline convention can be specified.
This determines which characters and/or character pairs are recognized
as indicating a newline in a pattern or subject string. The default can
be overridden when a pattern is compiled. The standard test files con-
tain tests of various newline conventions, but the majority of the
tests expect a single linefeed to be recognized as a newline by
default. Without special action the tests would fail when PCRE2 is com-
piled with either CR or CRLF as the default newline.

The #newline_default command specifies a list of newline types that are
acceptable as the default. The types must be one of CR, LF, CRLF, ANY-
CRLF, or ANY (in upper or lower case), for example:

#newline_default LF Any anyCRLF

If the default newline is in the list, this command has no effect. Oth-
erwise, except when testing the POSIX API, a newline modifier that
specifies the first newline convention in the list (LF in the above
example) is added to any pattern that does not already have a newline
modifier. If the newline list is empty, the feature is turned off. This
command is present in a number of the standard test input files.

When the POSIX API is being tested there is no way to override the
default newline convention, though it is possible to set the newline
convention from within the pattern. A warning is given if the posix
modifier is used when #newline_default would set a default for the non-
POSIX API.

#pattern <modifier-list>

This command sets a default modifier list that applies to all subse-
@@ -558,23 +588,49 @@ PATTERN MODIFIERS

JIT compilation

The /jit modifier may optionally be followed by an equals sign and a
number in the range 0 to 7:
Just-in-time (JIT) compiling is a heavyweight optimization that can
greatly speed up pattern matching. See the pcre2jit documentation for
details. JIT compiling happens, optionally, after a pattern has been
successfully compiled into an internal form. The JIT compiler converts
this to optimized machine code. It needs to know whether the match-time
options PCRE2_PARTIAL_HARD and PCRE2_PARTIAL_SOFT are going to be used,
because different code is generated for the different cases. See the
partial modifier in "Subject Modifiers" below for details of how these
options are specified for each match attempt.

JIT compilation is requested by the /jit pattern modifier, which may
optionally be followed by an equals sign and a number in the range 0 to
7. The three bits that make up the number specify which of the three
JIT operating modes are to be compiled:

1 compile JIT code for non-partial matching
2 compile JIT code for soft partial matching
4 compile JIT code for hard partial matching

The possible values for the /jit modifier are therefore:

0 disable JIT
1 use JIT for normal match only
2 use JIT for soft partial match only
3 use JIT for normal match and soft partial match
4 use JIT for hard partial match only
6 use JIT for soft and hard partial match
1 normal matching only
2 soft partial matching only
3 normal and soft partial matching
4 hard partial matching only
6 soft and hard partial matching only
7 all three modes

If no number is given, 7 is assumed. If JIT compilation is successful,
the compiled JIT code will automatically be used when pcre2_match() is
run for the appropriate type of match, except when incompatible run-
time options are specified. For more details, see the pcre2jit documen-
tation. See also the jitstack modifier below for a way of setting the
size of the JIT stack.
If no number is given, 7 is assumed. The phrase "partial matching"
means a call to pcre2_match() with either the PCRE2_PARTIAL_SOFT or the
PCRE2_PARTIAL_HARD option set. Note that such a call may return a com-
plete match; the options enable the possibility of a partial match, but
do not require it. Note also that if you request JIT compilation only
for partial matching (for example, /jit=2) but do not set the partial
modifier on a subject line, that match will not use JIT code because
none was compiled for non-partial matching.

If JIT compilation is successful, the compiled JIT code will automati-
cally be used when an appropriate type of match is run, except when
incompatible run-time options are specified. For more details, see the
pcre2jit documentation. See also the jitstack modifier below for a way
of setting the size of the JIT stack.

If the jitfast modifier is specified, matching is done using the JIT
"fast path" interface, pcre2_jit_match(), which skips some of the san-
@@ -628,8 +684,10 @@ PATTERN MODIFIERS

The /posix modifier causes pcre2test to call PCRE2 via the POSIX wrap-
per API rather than its native API. This supports only the 8-bit
library. When the POSIX API is being used, the following pattern modi-
fiers set options for the regcomp() function:
library. Note that it does not imply POSIX matching semantics; for
more detail see the pcre2posix documentation. When the POSIX API is
being used, the following pattern modifiers set options for the reg-
comp() function:

caseless REG_ICASE
multiline REG_NEWLINE
@@ -1333,5 +1391,5 @@ AUTHOR

REVISION

Last updated: 30 August 2015
Last updated: 12 September 2015
Copyright (c) 1997-2015 University of Cambridge.