Documentation update.
parent
a083420cac
commit
c92bfc3d21
|
@ -47,7 +47,7 @@ system stack size checking, or to change one or more of these parameters:
|
||||||
The newline character sequence;
|
The newline character sequence;
|
||||||
The compile time nested parentheses limit;
|
The compile time nested parentheses limit;
|
||||||
The maximum pattern length (in code units) that is allowed.
|
The maximum pattern length (in code units) that is allowed.
|
||||||
The additional options bits
|
The additional options bits (see pcre2_set_compile_extra_options())
|
||||||
</pre>
|
</pre>
|
||||||
The option bits are:
|
The option bits are:
|
||||||
<pre>
|
<pre>
|
||||||
|
@ -64,6 +64,7 @@ The option bits are:
|
||||||
PCRE2_ENDANCHORED Pattern can match only at end of subject
|
PCRE2_ENDANCHORED Pattern can match only at end of subject
|
||||||
PCRE2_EXTENDED Ignore white space and # comments
|
PCRE2_EXTENDED Ignore white space and # comments
|
||||||
PCRE2_FIRSTLINE Force matching to be before newline
|
PCRE2_FIRSTLINE Force matching to be before newline
|
||||||
|
PCRE2_LITERAL Pattern characters are all literal
|
||||||
PCRE2_MATCH_UNSET_BACKREF Match unset back references
|
PCRE2_MATCH_UNSET_BACKREF Match unset back references
|
||||||
PCRE2_MULTILINE ^ and $ match newlines within data
|
PCRE2_MULTILINE ^ and $ match newlines within data
|
||||||
PCRE2_NEVER_BACKSLASH_C Lock out the use of \C in patterns
|
PCRE2_NEVER_BACKSLASH_C Lock out the use of \C in patterns
|
||||||
|
|
|
@ -32,6 +32,8 @@ options are:
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES Allow \x{d800} to \x{dfff} in UTF-8 and UTF-32 modes
|
PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES Allow \x{d800} to \x{dfff} in UTF-8 and UTF-32 modes
|
||||||
PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL Treat all invalid escapes as a literal following character
|
PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL Treat all invalid escapes as a literal following character
|
||||||
|
PCRE2_EXTRA_MATCH_LINE Pattern matches whole lines
|
||||||
|
PCRE2_EXTRA_MATCH_WORD Pattern matches "words"
|
||||||
</pre>
|
</pre>
|
||||||
There is a complete description of the PCRE2 native API in the
|
There is a complete description of the PCRE2 native API in the
|
||||||
<a href="pcre2api.html"><b>pcre2api</b></a>
|
<a href="pcre2api.html"><b>pcre2api</b></a>
|
||||||
|
|
|
@ -1453,6 +1453,19 @@ continue over the newline. See also PCRE2_USE_OFFSET_LIMIT, which provides a
|
||||||
more general limiting facility. If PCRE2_FIRSTLINE is set with an offset limit,
|
more general limiting facility. If PCRE2_FIRSTLINE is set with an offset limit,
|
||||||
a match must occur in the first line and also within the offset limit. In other
|
a match must occur in the first line and also within the offset limit. In other
|
||||||
words, whichever limit comes first is used.
|
words, whichever limit comes first is used.
|
||||||
|
<pre>
|
||||||
|
PCRE2_LITERAL
|
||||||
|
</pre>
|
||||||
|
If this option is set, all meta-characters in the pattern are disabled, and it
|
||||||
|
is treated as a literal string. Matching literal strings with a regular
|
||||||
|
expression engine is not the most efficient way of doing it. If you are doing a
|
||||||
|
lot of literal matching and are worried about efficiency, you should consider
|
||||||
|
using other approaches. The only other main options that are allowed with
|
||||||
|
PCRE2_LITERAL are: PCRE2_ANCHORED, PCRE2_ENDANCHORED, PCRE2_AUTO_CALLOUT,
|
||||||
|
PCRE2_CASELESS, PCRE2_FIRSTLINE, PCRE2_NO_START_OPTIMIZE, PCRE2_NO_UTF_CHECK,
|
||||||
|
PCRE2_UTF, and PCRE2_USE_OFFSET_LIMIT. The extra options PCRE2_EXTRA_MATCH_LINE
|
||||||
|
and PCRE2_EXTRA_MATCH_WORD are also supported. Any other options cause an
|
||||||
|
error.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_MATCH_UNSET_BACKREF
|
PCRE2_MATCH_UNSET_BACKREF
|
||||||
</pre>
|
</pre>
|
||||||
|
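Note on the PCRE2_LITERAL hunk above: the new text says that a regular expression engine is not the most efficient way to match literal strings and suggests considering other approaches. As one illustration only, assuming a zero-terminated subject and a byte-wise, case-sensitive comparison, the standard C function strstr() finds a literal substring without compiling any pattern. The helper name `find_literal` is hypothetical and is not part of PCRE2.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of PCRE2: return the byte offset of the
   first occurrence of needle in subject, or -1 if there is none.
   Assumes both arguments are zero-terminated strings. */
static long find_literal(const char *subject, const char *needle)
{
    assert(subject != NULL && needle != NULL);
    const char *hit = strstr(subject, needle);
    return hit == NULL ? -1L : (long)(hit - subject);
}
```

Here the "." in a needle such as "a.b" is an ordinary character, which is the same effect PCRE2_LITERAL is described as having on a pattern. This sketch does not cover caseless or UTF-aware matching, which PCRE2_CASELESS and PCRE2_UTF would provide.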
@ -1724,6 +1737,24 @@ treated as single-character escapes. For example, \j is a literal "j" and
|
||||||
\x{2z} is treated as the literal string "x{2z}". Setting this option means
|
\x{2z} is treated as the literal string "x{2z}". Setting this option means
|
||||||
that typos in patterns may go undetected and have unexpected results. This is a
|
that typos in patterns may go undetected and have unexpected results. This is a
|
||||||
dangerous option. Use with care.
|
dangerous option. Use with care.
|
||||||
|
<pre>
|
||||||
|
PCRE2_EXTRA_MATCH_LINE
|
||||||
|
</pre>
|
||||||
|
This option is provided for use by the <b>-x</b> option of <b>pcre2grep</b>. It
|
||||||
|
causes the pattern only to match complete lines. This is achieved by
|
||||||
|
automatically inserting the code for "^(?:" at the start of the compiled
|
||||||
|
pattern and ")$" at the end. Thus, when PCRE2_MULTILINE is set, the matched
|
||||||
|
line may be in the middle of the subject string. This option can be used with
|
||||||
|
PCRE2_LITERAL.
|
||||||
|
<pre>
|
||||||
|
PCRE2_EXTRA_MATCH_WORD
|
||||||
|
</pre>
|
||||||
|
This option is provided for use by the <b>-w</b> option of <b>pcre2grep</b>. It
|
||||||
|
causes the pattern only to match strings that have a word boundary at the start
|
||||||
|
and the end. This is achieved by automatically inserting the code for "\b(?:"
|
||||||
|
at the start of the compiled pattern and ")\b" at the end. The option may be
|
||||||
|
used with PCRE2_LITERAL. However, it is ignored if PCRE2_EXTRA_MATCH_LINE is
|
||||||
|
also set.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC20" href="#TOC1">COMPILATION ERROR CODES</a><br>
|
<br><a name="SEC20" href="#TOC1">COMPILATION ERROR CODES</a><br>
|
||||||
<P>
|
<P>
|
||||||
|
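Note on the PCRE2_EXTRA_MATCH_LINE and PCRE2_EXTRA_MATCH_WORD hunk above: the text describes the options as inserting the code for "^(?:" ... ")$" or "\b(?:" ... ")\b" around the compiled pattern, with the word option ignored when the line option is also set. The sketch below is only a textual analogy of that rule, written with the standard C library; the documentation describes the library as inserting code into the compiled pattern, not editing the pattern string. The function name `wrap_pattern` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of PCRE2: build the pattern text that is
   equivalent to what the documentation describes for a non-literal pattern.
   match_line takes precedence over match_word, as the text states.
   Returns 0 on success, -1 if the output buffer is too small. */
static int wrap_pattern(const char *pattern, int match_line, int match_word,
                        char *out, size_t outsize)
{
    assert(pattern != NULL && out != NULL);
    const char *prefix = "";
    const char *suffix = "";
    if (match_line) {
        prefix = "^(?:";
        suffix = ")$";
    } else if (match_word) {
        prefix = "\\b(?:";
        suffix = ")\\b";
    }
    int n = snprintf(out, outsize, "%s%s%s", prefix, pattern, suffix);
    return (n < 0 || (size_t)n >= outsize) ? -1 : 0;
}
```

The non-capturing group matters for patterns with top-level alternation: wrapping "cat|dog" without it would anchor only "cat" at the start and only "dog" at the end.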
@ -3489,7 +3520,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 01 June 2017
|
Last updated: 16 June 2017
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2017 University of Cambridge.
|
Copyright © 1997-2017 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@ -117,6 +117,14 @@ compilation to the native function.
|
||||||
The PCRE2_MULTILINE option is set when the regular expression is passed for
|
The PCRE2_MULTILINE option is set when the regular expression is passed for
|
||||||
compilation to the native function. Note that this does <i>not</i> mimic the
|
compilation to the native function. Note that this does <i>not</i> mimic the
|
||||||
defined POSIX behaviour for REG_NEWLINE (see the following section).
|
defined POSIX behaviour for REG_NEWLINE (see the following section).
|
||||||
|
<pre>
|
||||||
|
REG_NOSPEC
|
||||||
|
</pre>
|
||||||
|
The PCRE2_LITERAL option is set when the regular expression is passed for
|
||||||
|
compilation to the native function. This disables all meta characters in the
|
||||||
|
pattern, causing it to be treated as a literal string. The only other options
|
||||||
|
that are allowed with REG_NOSPEC are REG_ICASE, REG_NOSUB, REG_PEND, and
|
||||||
|
REG_UTF. Note that REG_NOSPEC is not part of the POSIX standard.
|
||||||
<pre>
|
<pre>
|
||||||
REG_NOSUB
|
REG_NOSUB
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -314,7 +322,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC9" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC9" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 05 June 2017
|
Last updated: 15 June 2017
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2017 University of Cambridge.
|
Copyright © 1997-2017 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@ -96,12 +96,12 @@ want that action.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The input is processed using C's string functions, so must not
|
The input is processed using C's string functions, so must not
|
||||||
contain binary zeroes, even though in Unix-like environments, <b>fgets()</b>
|
contain binary zeros, even though in Unix-like environments, <b>fgets()</b>
|
||||||
treats any bytes other than newline as data characters. An error is generated
|
treats any bytes other than newline as data characters. An error is generated
|
||||||
if a binary zero is encountered. Subject lines are processed for backslash
|
if a binary zero is encountered. By default subject lines are processed for
|
||||||
escapes, which makes it possible to include any data value in strings that are
|
backslash escapes, which makes it possible to include any data value in strings
|
||||||
passed to the library for matching. For patterns, there is a facility for
|
that are passed to the library for matching. For patterns, there is a facility
|
||||||
specifying some or all of the 8-bit input characters as hexadecimal pairs,
|
for specifying some or all of the 8-bit input characters as hexadecimal pairs,
|
||||||
which makes it possible to include binary zeros.
|
which makes it possible to include binary zeros.
|
||||||
</P>
|
</P>
|
||||||
<br><b>
|
<br><b>
|
||||||
|
@ -382,8 +382,9 @@ of the standard test input files.
|
||||||
<P>
|
<P>
|
||||||
When the POSIX API is being tested there is no way to override the default
|
When the POSIX API is being tested there is no way to override the default
|
||||||
newline convention, though it is possible to set the newline convention from
|
newline convention, though it is possible to set the newline convention from
|
||||||
within the pattern. A warning is given if the <b>posix</b> modifier is used when
|
within the pattern. A warning is given if the <b>posix</b> or <b>posix_nosub</b>
|
||||||
<b>#newline_default</b> would set a default for the non-POSIX API.
|
modifier is used when <b>#newline_default</b> would set a default for the
|
||||||
|
non-POSIX API.
|
||||||
<pre>
|
<pre>
|
||||||
#pattern <modifier-list>
|
#pattern <modifier-list>
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -479,8 +480,9 @@ A pattern can be followed by a modifier list (details below).
|
||||||
<P>
|
<P>
|
||||||
Before each subject line is passed to <b>pcre2_match()</b> or
|
Before each subject line is passed to <b>pcre2_match()</b> or
|
||||||
<b>pcre2_dfa_match()</b>, leading and trailing white space is removed, and the
|
<b>pcre2_dfa_match()</b>, leading and trailing white space is removed, and the
|
||||||
line is scanned for backslash escapes. The following provide a means of
|
line is scanned for backslash escapes, unless the <b>subject_literal</b>
|
||||||
encoding non-printing characters in a visible way:
|
modifier was set for the pattern. The following provide a means of encoding
|
||||||
|
non-printing characters in a visible way:
|
||||||
<pre>
|
<pre>
|
||||||
\a alarm (BEL, \x07)
|
\a alarm (BEL, \x07)
|
||||||
\b backspace (\x08)
|
\b backspace (\x08)
|
||||||
|
@ -548,6 +550,12 @@ the very last character in the line is a backslash (and there is no modifier
|
||||||
list), it is ignored. This gives a way of passing an empty line as data, since
|
list), it is ignored. This gives a way of passing an empty line as data, since
|
||||||
a real empty line terminates the data input.
|
a real empty line terminates the data input.
|
||||||
</P>
|
</P>
|
||||||
|
<P>
|
||||||
|
If the <b>subject_literal</b> modifier is set for a pattern, all subject lines
|
||||||
|
that follow are treated as literals, with no special treatment of backslashes.
|
||||||
|
No replication is possible, and any subject modifiers must be set as defaults
|
||||||
|
by a <b>#subject</b> command.
|
||||||
|
</P>
|
||||||
<br><a name="SEC10" href="#TOC1">PATTERN MODIFIERS</a><br>
|
<br><a name="SEC10" href="#TOC1">PATTERN MODIFIERS</a><br>
|
||||||
<P>
|
<P>
|
||||||
There are several types of modifier that can appear in pattern lines. Except
|
There are several types of modifier that can appear in pattern lines. Except
|
||||||
|
@ -586,7 +594,10 @@ for a description of the effects of these options.
|
||||||
/x extended set PCRE2_EXTENDED
|
/x extended set PCRE2_EXTENDED
|
||||||
/xx extended_more set PCRE2_EXTENDED_MORE
|
/xx extended_more set PCRE2_EXTENDED_MORE
|
||||||
firstline set PCRE2_FIRSTLINE
|
firstline set PCRE2_FIRSTLINE
|
||||||
|
literal set PCRE2_LITERAL
|
||||||
|
match_line set PCRE2_EXTRA_MATCH_LINE
|
||||||
match_unset_backref set PCRE2_MATCH_UNSET_BACKREF
|
match_unset_backref set PCRE2_MATCH_UNSET_BACKREF
|
||||||
|
match_word set PCRE2_EXTRA_MATCH_WORD
|
||||||
/m multiline set PCRE2_MULTILINE
|
/m multiline set PCRE2_MULTILINE
|
||||||
never_backslash_c set PCRE2_NEVER_BACKSLASH_C
|
never_backslash_c set PCRE2_NEVER_BACKSLASH_C
|
||||||
never_ucp set PCRE2_NEVER_UCP
|
never_ucp set PCRE2_NEVER_UCP
|
||||||
|
@ -638,6 +649,7 @@ heavily used in the test files.
|
||||||
push push compiled pattern onto the stack
|
push push compiled pattern onto the stack
|
||||||
pushcopy push a copy onto the stack
|
pushcopy push a copy onto the stack
|
||||||
stackguard=<number> test the stackguard feature
|
stackguard=<number> test the stackguard feature
|
||||||
|
subject_literal treat all subject lines as literal
|
||||||
tables=[0|1|2] select internal tables
|
tables=[0|1|2] select internal tables
|
||||||
use_length do not zero-terminate the pattern
|
use_length do not zero-terminate the pattern
|
||||||
utf8_input treat input as UTF-8
|
utf8_input treat input as UTF-8
|
||||||
|
@ -728,18 +740,6 @@ testing that <b>pcre2_compile()</b> behaves correctly in this case (it uses
|
||||||
default values).
|
default values).
|
||||||
</P>
|
</P>
|
||||||
<br><b>
|
<br><b>
|
||||||
Specifying the pattern's length
|
|
||||||
</b><br>
|
|
||||||
<P>
|
|
||||||
By default, patterns are passed to the compiling functions as zero-terminated
|
|
||||||
strings. When using the POSIX wrapper API, there is no other option. However,
|
|
||||||
when using PCRE2's native API, patterns can be passed by length instead of
|
|
||||||
being zero-terminated. The <b>use_length</b> modifier causes this to happen.
|
|
||||||
Using a length happens automatically (whether or not <b>use_length</b> is set)
|
|
||||||
when <b>hex</b> is set, because patterns specified in hexadecimal may contain
|
|
||||||
binary zeros.
|
|
||||||
</P>
|
|
||||||
<br><b>
|
|
||||||
Specifying pattern characters in hexadecimal
|
Specifying pattern characters in hexadecimal
|
||||||
</b><br>
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
|
@ -761,11 +761,20 @@ Either single or double quotes may be used. There is no way of including
|
||||||
the delimiter within a substring. The <b>hex</b> and <b>expand</b> modifiers are
|
the delimiter within a substring. The <b>hex</b> and <b>expand</b> modifiers are
|
||||||
mutually exclusive.
|
mutually exclusive.
|
||||||
</P>
|
</P>
|
||||||
|
<br><b>
|
||||||
|
Specifying the pattern's length
|
||||||
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
The POSIX API cannot be used with patterns specified in hexadecimal because
|
By default, patterns are passed to the compiling functions as zero-terminated
|
||||||
they may contain binary zeros, which conflicts with <b>regcomp()</b>'s
|
strings but can be passed by length instead of being zero-terminated. The
|
||||||
requirement for a zero-terminated string. Such patterns are always passed to
|
<b>use_length</b> modifier causes this to happen. Using a length happens
|
||||||
<b>pcre2_compile()</b> as a string with a length, not as zero-terminated.
|
automatically (whether or not <b>use_length</b> is set) when <b>hex</b> is set,
|
||||||
|
because patterns specified in hexadecimal may contain binary zeros.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
If <b>hex</b> or <b>use_length</b> is used with the POSIX wrapper API (see
|
||||||
|
<a href="#posixwrapper">"Using the POSIX wrapper API"</a>
|
||||||
|
below), the REG_PEND extension is used to pass the pattern's length.
|
||||||
</P>
|
</P>
|
||||||
<br><b>
|
<br><b>
|
||||||
Specifying wide characters in 16-bit and 32-bit modes
|
Specifying wide characters in 16-bit and 32-bit modes
|
||||||
|
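Note on the "Specifying the pattern's length" hunk above: the text says a length is used automatically with <b>hex</b> because such patterns may contain binary zeros. A minimal sketch of why, using only the standard C library: a zero-terminated interface stops at the first zero byte, so any data after it is lost unless the length is passed separately. The helper name `needs_explicit_length` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of PCRE2 or pcre2test: report whether a
   pattern of the given length contains a binary zero, in which case it
   cannot be passed correctly as a zero-terminated string. */
static int needs_explicit_length(const char *pattern, size_t length)
{
    assert(pattern != NULL);
    return memchr(pattern, '\0', length) != NULL;
}
```

For the five bytes 'a', 'b', zero, 'c', 'd', strlen() reports 2, while the true length is 5; this is the case that <b>use_length</b>, and REG_PEND in the POSIX wrapper, are described as handling.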
@ -826,7 +835,7 @@ modifier in "Subject Modifiers"
|
||||||
for details of how these options are specified for each match attempt.
|
for details of how these options are specified for each match attempt.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
JIT compilation is requested by the <b>/jit</b> pattern modifier, which may
|
JIT compilation is requested by the <b>jit</b> pattern modifier, which may
|
||||||
optionally be followed by an equals sign and a number in the range 0 to 7.
|
optionally be followed by an equals sign and a number in the range 0 to 7.
|
||||||
The three bits that make up the number specify which of the three JIT operating
|
The three bits that make up the number specify which of the three JIT operating
|
||||||
modes are to be compiled:
|
modes are to be compiled:
|
||||||
|
@ -850,7 +859,7 @@ to <b>pcre2_match()</b> with either the PCRE2_PARTIAL_SOFT or the
|
||||||
PCRE2_PARTIAL_HARD option set. Note that such a call may return a complete
|
PCRE2_PARTIAL_HARD option set. Note that such a call may return a complete
|
||||||
match; the options enable the possibility of a partial match, but do not
|
match; the options enable the possibility of a partial match, but do not
|
||||||
require it. Note also that if you request JIT compilation only for partial
|
require it. Note also that if you request JIT compilation only for partial
|
||||||
matching (for example, /jit=2) but do not set the <b>partial</b> modifier on a
|
matching (for example, jit=2) but do not set the <b>partial</b> modifier on a
|
||||||
subject line, that match will not use JIT code because none was compiled for
|
subject line, that match will not use JIT code because none was compiled for
|
||||||
non-partial matching.
|
non-partial matching.
|
||||||
</P>
|
</P>
|
||||||
|
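Note on the <b>jit</b> hunks above: the value after <b>jit=</b> is described as three bits selecting which JIT operating modes are compiled, and the jit=2 example shows that a mode whose bit is not set gets no JIT code. The sketch below decodes such a value. The bit assignments (1 for non-partial, 2 for soft partial, 4 for hard partial matching) are taken from the list in the pcre2test documentation, which falls outside the lines shown in this diff; the enum and function names are hypothetical.

```c
#include <assert.h>

/* Bit values as listed in the pcre2test documentation (not shown in this
   diff). The names are hypothetical and exist only for this sketch. */
enum {
    JIT_NON_PARTIAL  = 1,
    JIT_PARTIAL_SOFT = 2,
    JIT_PARTIAL_HARD = 4
};

/* Hypothetical helper: return non-zero if jit=<jit_value> requests JIT
   code for the mode identified by mode_bit. */
static int jit_mode_compiled(int jit_value, int mode_bit)
{
    assert(jit_value >= 0 && jit_value <= 7);
    return (jit_value & mode_bit) != 0;
}
```

With jit=2, only the soft partial bit is set, so a subject line without the <b>partial</b> modifier falls back to non-JIT matching, as the text explains.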
@ -927,12 +936,12 @@ The <b>max_pattern_length</b> modifier sets a limit, in code units, to the
|
||||||
length of pattern that <b>pcre2_compile()</b> will accept. Breaching the limit
|
length of pattern that <b>pcre2_compile()</b> will accept. Breaching the limit
|
||||||
causes a compilation error. The default is the largest number a PCRE2_SIZE
|
causes a compilation error. The default is the largest number a PCRE2_SIZE
|
||||||
variable can hold (essentially unlimited).
|
variable can hold (essentially unlimited).
|
||||||
</P>
|
<a name="posixwrapper"></a></P>
|
||||||
<br><b>
|
<br><b>
|
||||||
Using the POSIX wrapper API
|
Using the POSIX wrapper API
|
||||||
</b><br>
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
The <b>/posix</b> and <b>posix_nosub</b> modifiers cause <b>pcre2test</b> to call
|
The <b>posix</b> and <b>posix_nosub</b> modifiers cause <b>pcre2test</b> to call
|
||||||
PCRE2 via the POSIX wrapper API rather than its native API. When
|
PCRE2 via the POSIX wrapper API rather than its native API. When
|
||||||
<b>posix_nosub</b> is used, the POSIX option REG_NOSUB is passed to
|
<b>posix_nosub</b> is used, the POSIX option REG_NOSUB is passed to
|
||||||
<b>regcomp()</b>. The POSIX wrapper supports only the 8-bit library. Note that
|
<b>regcomp()</b>. The POSIX wrapper supports only the 8-bit library. Note that
|
||||||
|
@ -962,6 +971,11 @@ The <b>aftertext</b> and <b>allaftertext</b> subject modifiers work as described
|
||||||
below. All other modifiers are either ignored, with a warning message, or cause
|
below. All other modifiers are either ignored, with a warning message, or cause
|
||||||
an error.
|
an error.
|
||||||
</P>
|
</P>
|
||||||
|
<P>
|
||||||
|
The pattern is passed to <b>regcomp()</b> as a zero-terminated string by
|
||||||
|
default, but if the <b>use_length</b> or <b>hex</b> modifiers are set, the
|
||||||
|
REG_PEND extension is used to pass it by length.
|
||||||
|
</P>
|
||||||
<br><b>
|
<br><b>
|
||||||
Testing the stack guard feature
|
Testing the stack guard feature
|
||||||
</b><br>
|
</b><br>
|
||||||
|
@ -999,17 +1013,18 @@ are mutually exclusive.
|
||||||
Setting certain match controls
|
Setting certain match controls
|
||||||
</b><br>
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
The following modifiers are really subject modifiers, and are described below.
|
The following modifiers are really subject modifiers, and are described under
|
||||||
However, they may be included in a pattern's modifier list, in which case they
|
"Subject Modifiers" below. However, they may be included in a pattern's
|
||||||
are applied to every subject line that is processed with that pattern. They may
|
modifier list, in which case they are applied to every subject line that is
|
||||||
not appear in <b>#pattern</b> commands. These modifiers do not affect the
|
processed with that pattern. They may not appear in <b>#pattern</b> commands.
|
||||||
compilation process.
|
These modifiers do not affect the compilation process.
|
||||||
<pre>
|
<pre>
|
||||||
aftertext show text after match
|
aftertext show text after match
|
||||||
allaftertext show text after captures
|
allaftertext show text after captures
|
||||||
allcaptures show all captures
|
allcaptures show all captures
|
||||||
allusedtext show all consulted text
|
allusedtext show all consulted text
|
||||||
/g global global matching
|
/g global global matching
|
||||||
|
jitstack=<n> set size of JIT stack
|
||||||
mark show mark values
|
mark show mark values
|
||||||
replace=<string> specify a replacement string
|
replace=<string> specify a replacement string
|
||||||
startchar show starting character when relevant
|
startchar show starting character when relevant
|
||||||
|
@ -1022,6 +1037,15 @@ These modifiers may not appear in a <b>#pattern</b> command. If you want them as
|
||||||
defaults, set them in a <b>#subject</b> command.
|
defaults, set them in a <b>#subject</b> command.
|
||||||
</P>
|
</P>
|
||||||
<br><b>
|
<br><b>
|
||||||
|
Specifying literal subject lines
|
||||||
|
</b><br>
|
||||||
|
<P>
|
||||||
|
If the <b>subject_literal</b> modifier is present on a pattern, all the subject
|
||||||
|
lines that it matches are taken as literal strings, with no interpretation of
|
||||||
|
backslashes. It is not possible to set subject modifiers on such lines, but any
|
||||||
|
that are set as defaults by a <b>#subject</b> command are recognized.
|
||||||
|
</P>
|
||||||
|
<br><b>
|
||||||
Saving a compiled pattern
|
Saving a compiled pattern
|
||||||
</b><br>
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
|
@ -1072,11 +1096,11 @@ The partial matching modifiers are provided with abbreviations because they
|
||||||
appear frequently in tests.
|
appear frequently in tests.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
If the <b>posix</b> modifier was present on the pattern, causing the POSIX
|
If the <b>posix</b> or <b>posix_nosub</b> modifier was present on the pattern,
|
||||||
wrapper API to be used, the only option-setting modifiers that have any effect
|
causing the POSIX wrapper API to be used, the only option-setting modifiers
|
||||||
are <b>notbol</b>, <b>notempty</b>, and <b>noteol</b>, causing REG_NOTBOL,
|
that have any effect are <b>notbol</b>, <b>notempty</b>, and <b>noteol</b>,
|
||||||
REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to <b>regexec()</b>.
|
causing REG_NOTBOL, REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to
|
||||||
The other modifiers are ignored, with a warning message.
|
<b>regexec()</b>. The other modifiers are ignored, with a warning message.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
There is one additional modifier that can be used with the POSIX wrapper. It is
|
There is one additional modifier that can be used with the POSIX wrapper. It is
|
||||||
|
@ -1085,11 +1109,13 @@ ignored (with a warning) if used for non-POSIX matching.
|
||||||
posix_startend=<n>[:<m>]
|
posix_startend=<n>[:<m>]
|
||||||
</pre>
|
</pre>
|
||||||
This causes the subject string to be passed to <b>regexec()</b> using the
|
This causes the subject string to be passed to <b>regexec()</b> using the
|
||||||
REG_STARTEND option, which uses offsets to restrict which part of the string is
|
REG_STARTEND option, which uses offsets to specify which part of the string is
|
||||||
searched. If only one number is given, the end offset is passed as the end of
|
searched. If only one number is given, the end offset is passed as the end of
|
||||||
the subject string. For more detail of REG_STARTEND, see the
|
the subject string. For more detail of REG_STARTEND, see the
|
||||||
<a href="pcre2posix.html"><b>pcre2posix</b></a>
|
<a href="pcre2posix.html"><b>pcre2posix</b></a>
|
||||||
documentation.
|
documentation. If the subject string contains binary zeros (coded as escapes
|
||||||
|
such as \x{00} because <b>pcre2test</b> does not support actual binary zeros in
|
||||||
|
its input), you must use <b>posix_startend</b> to specify its length.
|
||||||
</P>
|
</P>
|
||||||
<br><b>
|
<br><b>
|
||||||
Setting match controls
|
Setting match controls
|
||||||
|
@ -1355,9 +1381,11 @@ Setting the JIT stack size
|
||||||
<P>
|
<P>
|
||||||
The <b>jitstack</b> modifier provides a way of setting the maximum stack size
|
The <b>jitstack</b> modifier provides a way of setting the maximum stack size
|
||||||
that is used by the just-in-time optimization code. It is ignored if JIT
|
that is used by the just-in-time optimization code. It is ignored if JIT
|
||||||
optimization is not being used. The value is a number of kilobytes. Providing a
|
optimization is not being used. The value is a number of kilobytes. Setting
|
||||||
stack that is larger than the default 32K is necessary only for very
|
zero reverts to the default of 32K. Providing a stack that is larger than the
|
||||||
complicated patterns.
|
default is necessary only for very complicated patterns. If <b>jitstack</b> is
|
||||||
|
set non-zero on a subject line it overrides any value that was set on the
|
||||||
|
pattern.
|
||||||
</P>
|
</P>
|
||||||
<br><b>
|
<br><b>
|
||||||
Setting heap, match, and depth limits
|
Setting heap, match, and depth limits
|
||||||
|
@ -1461,8 +1489,8 @@ Passing the subject as zero-terminated
|
||||||
By default, the subject string is passed to a native API matching function with
|
By default, the subject string is passed to a native API matching function with
|
||||||
its correct length. In order to test the facility for passing a zero-terminated
|
its correct length. In order to test the facility for passing a zero-terminated
|
||||||
string, the <b>zero_terminate</b> modifier is provided. It causes the length to
|
string, the <b>zero_terminate</b> modifier is provided. It causes the length to
|
||||||
be passed as PCRE2_ZERO_TERMINATED. (When matching via the POSIX interface,
|
be passed as PCRE2_ZERO_TERMINATED. When matching via the POSIX interface,
|
||||||
this modifier has no effect, as there is no facility for passing a length.)
|
this modifier is ignored, with a warning.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
When testing <b>pcre2_substitute()</b>, this modifier also has the effect of
|
When testing <b>pcre2_substitute()</b>, this modifier also has the effect of
|
||||||
|
@ -1675,7 +1703,7 @@ callout is in a lookbehind assertion.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
Callouts numbered 255 are assumed to be automatic callouts, inserted as a
|
Callouts numbered 255 are assumed to be automatic callouts, inserted as a
|
||||||
result of the <b>/auto_callout</b> pattern modifier. In this case, instead of
|
result of the <b>auto_callout</b> pattern modifier. In this case, instead of
|
||||||
showing the callout number, the offset in the pattern, preceded by a plus, is
|
showing the callout number, the offset in the pattern, preceded by a plus, is
|
||||||
output. For example:
|
output. For example:
|
||||||
<pre>
|
<pre>
|
||||||
|
@ -1830,7 +1858,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 03 June 2017
|
Last updated: 16 June 2017
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2017 University of Cambridge.
|
Copyright © 1997-2017 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@ -1441,6 +1441,20 @@ COMPILING A PATTERN
|
||||||
first line and also within the offset limit. In other words, whichever
|
first line and also within the offset limit. In other words, whichever
|
||||||
limit comes first is used.
|
limit comes first is used.
|
||||||
|
|
||||||
|
PCRE2_LITERAL
|
||||||
|
|
||||||
|
If this option is set, all meta-characters in the pattern are disabled,
|
||||||
|
and it is treated as a literal string. Matching literal strings with a
|
||||||
|
regular expression engine is not the most efficient way of doing it. If
|
||||||
|
you are doing a lot of literal matching and are worried about effi-
|
||||||
|
ciency, you should consider using other approaches. The only other main
|
||||||
|
options that are allowed with PCRE2_LITERAL are: PCRE2_ANCHORED,
|
||||||
|
PCRE2_ENDANCHORED, PCRE2_AUTO_CALLOUT, PCRE2_CASELESS, PCRE2_FIRSTLINE,
|
||||||
|
PCRE2_NO_START_OPTIMIZE, PCRE2_NO_UTF_CHECK, PCRE2_UTF, and
|
||||||
|
PCRE2_USE_OFFSET_LIMIT. The extra options PCRE2_EXTRA_MATCH_LINE and
|
||||||
|
PCRE2_EXTRA_MATCH_WORD are also supported. Any other options cause an
|
||||||
|
error.
|
||||||
|
|
||||||
PCRE2_MATCH_UNSET_BACKREF
|
PCRE2_MATCH_UNSET_BACKREF
|
||||||
|
|
||||||
If this option is set, a back reference to an unset subpattern group
|
If this option is set, a back reference to an unset subpattern group
|
||||||
|
@ -1706,6 +1720,24 @@ COMPILING A PATTERN
|
||||||
option means that typos in patterns may go undetected and have unex-
|
option means that typos in patterns may go undetected and have unex-
|
||||||
pected results. This is a dangerous option. Use with care.
|
pected results. This is a dangerous option. Use with care.
|
||||||
|
|
||||||
|
PCRE2_EXTRA_MATCH_LINE
|
||||||
|
|
||||||
|
This option is provided for use by the -x option of pcre2grep. It
|
||||||
|
causes the pattern only to match complete lines. This is achieved by
|
||||||
|
automatically inserting the code for "^(?:" at the start of the com-
|
||||||
|
piled pattern and ")$" at the end. Thus, when PCRE2_MULTILINE is set,
|
||||||
|
the matched line may be in the middle of the subject string. This
|
||||||
|
option can be used with PCRE2_LITERAL.
|
||||||
|
|
||||||
|
PCRE2_EXTRA_MATCH_WORD
|
||||||
|
|
||||||
|
This option is provided for use by the -w option of pcre2grep. It
|
||||||
|
causes the pattern only to match strings that have a word boundary at
|
||||||
|
the start and the end. This is achieved by automatically inserting the
|
||||||
|
code for "\b(?:" at the start of the compiled pattern and ")\b" at the
|
||||||
|
end. The option may be used with PCRE2_LITERAL. However, it is ignored
|
||||||
|
if PCRE2_EXTRA_MATCH_LINE is also set.
|
||||||
|
|
||||||
|
|
||||||
COMPILATION ERROR CODES
|
COMPILATION ERROR CODES
|
||||||
|
|
||||||
|
@ -3368,7 +3400,7 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 01 June 2017
|
Last updated: 16 June 2017
|
||||||
Copyright (c) 1997-2017 University of Cambridge.
|
Copyright (c) 1997-2017 University of Cambridge.
|
||||||
------------------------------------------------------------------------------
|
------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
@ -9036,6 +9068,15 @@ COMPILING A PATTERN
|
||||||
the defined POSIX behaviour for REG_NEWLINE (see the following sec-
|
the defined POSIX behaviour for REG_NEWLINE (see the following sec-
|
||||||
tion).
|
tion).
|
||||||
|
|
||||||
|
REG_NOSPEC
|
||||||
|
|
||||||
|
The PCRE2_LITERAL option is set when the regular expression is passed
|
||||||
|
for compilation to the native function. This disables all meta charac-
|
||||||
|
ters in the pattern, causing it to be treated as a literal string. The
|
||||||
|
only other options that are allowed with REG_NOSPEC are REG_ICASE,
|
||||||
|
REG_NOSUB, REG_PEND, and REG_UTF. Note that REG_NOSPEC is not part of
|
||||||
|
the POSIX standard.
|
||||||
|
|
||||||
REG_NOSUB
|
REG_NOSUB
|
||||||
|
|
||||||
When a pattern that is compiled with this flag is passed to regexec()
|
When a pattern that is compiled with this flag is passed to regexec()
|
||||||
|
@ -9232,7 +9273,7 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 05 June 2017
|
Last updated: 15 June 2017
|
||||||
Copyright (c) 1997-2017 University of Cambridge.
|
Copyright (c) 1997-2017 University of Cambridge.
|
||||||
------------------------------------------------------------------------------
|
------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2_COMPILE 3 "17 May 2017" "PCRE2 10.30"
|
.TH PCRE2_COMPILE 3 "16 June 2017" "PCRE2 10.30"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH SYNOPSIS
|
.SH SYNOPSIS
|
||||||
|
@ -35,7 +35,7 @@ system stack size checking, or to change one or more of these parameters:
|
||||||
The newline character sequence;
|
The newline character sequence;
|
||||||
The compile time nested parentheses limit;
|
The compile time nested parentheses limit;
|
||||||
The maximum pattern length (in code units) that is allowed.
|
The maximum pattern length (in code units) that is allowed.
|
||||||
The additional options bits
|
The additional options bits (see pcre2_set_compile_extra_options())
|
||||||
.sp
|
.sp
|
||||||
The option bits are:
|
The option bits are:
|
||||||
.sp
|
.sp
|
||||||
|
@ -52,6 +52,7 @@ The option bits are:
|
||||||
PCRE2_ENDANCHORED Pattern can match only at end of subject
|
PCRE2_ENDANCHORED Pattern can match only at end of subject
|
||||||
PCRE2_EXTENDED Ignore white space and # comments
|
PCRE2_EXTENDED Ignore white space and # comments
|
||||||
PCRE2_FIRSTLINE Force matching to be before newline
|
PCRE2_FIRSTLINE Force matching to be before newline
|
||||||
|
PCRE2_LITERAL Pattern characters are all literal
|
||||||
PCRE2_MATCH_UNSET_BACKREF Match unset back references
|
PCRE2_MATCH_UNSET_BACKREF Match unset back references
|
||||||
PCRE2_MULTILINE ^ and $ match newlines within data
|
PCRE2_MULTILINE ^ and $ match newlines within data
|
||||||
PCRE2_NEVER_BACKSLASH_C Lock out the use of \eC in patterns
|
PCRE2_NEVER_BACKSLASH_C Lock out the use of \eC in patterns
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2_SET_MAX_PATTERN_LENGTH 3 "01 June 2017" "PCRE2 10.30"
|
.TH PCRE2_SET_MAX_PATTERN_LENGTH 3 "16 June 2017" "PCRE2 10.30"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH SYNOPSIS
|
.SH SYNOPSIS
|
||||||
|
@ -24,6 +24,8 @@ options are:
|
||||||
.\" JOIN
|
.\" JOIN
|
||||||
PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL Treat all invalid escapes as
|
PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL Treat all invalid escapes as
|
||||||
a literal following character
|
a literal following character
|
||||||
|
PCRE2_EXTRA_MATCH_LINE Pattern matches whole lines
|
||||||
|
PCRE2_EXTRA_MATCH_WORD Pattern matches "words"
|
||||||
.sp
|
.sp
|
||||||
There is a complete description of the PCRE2 native API in the
|
There is a complete description of the PCRE2 native API in the
|
||||||
.\" HREF
|
.\" HREF
|
||||||
|
|
|
@ -64,12 +64,12 @@ INPUT ENCODING
|
||||||
unless you really want that action.
|
unless you really want that action.
|
||||||
|
|
||||||
The input is processed using C's string functions, so must not
|
The input is processed using C's string functions, so must not
|
||||||
contain binary zeroes, even though in Unix-like environments, fgets()
|
contain binary zeros, even though in Unix-like environments, fgets()
|
||||||
treats any bytes other than newline as data characters. An error is
|
treats any bytes other than newline as data characters. An error is
|
||||||
generated if a binary zero is encountered. Subject lines are processed
|
generated if a binary zero is encountered. By default subject lines are
|
||||||
for backslash escapes, which makes it possible to include any data
|
processed for backslash escapes, which makes it possible to include any
|
||||||
value in strings that are passed to the library for matching. For pat-
|
data value in strings that are passed to the library for matching. For
|
||||||
terns, there is a facility for specifying some or all of the 8-bit
|
patterns, there is a facility for specifying some or all of the 8-bit
|
||||||
input characters as hexadecimal pairs, which makes it possible to
|
input characters as hexadecimal pairs, which makes it possible to
|
||||||
include binary zeros.
|
include binary zeros.
|
||||||
|
|
||||||
|
@ -319,9 +319,9 @@ COMMAND LINES
|
||||||
|
|
||||||
When the POSIX API is being tested there is no way to override the
|
When the POSIX API is being tested there is no way to override the
|
||||||
default newline convention, though it is possible to set the newline
|
default newline convention, though it is possible to set the newline
|
||||||
convention from within the pattern. A warning is given if the posix
|
convention from within the pattern. A warning is given if the posix or
|
||||||
modifier is used when #newline_default would set a default for the non-
|
posix_nosub modifier is used when #newline_default would set a default
|
||||||
POSIX API.
|
for the non-POSIX API.
|
||||||
|
|
||||||
#pattern <modifier-list>
|
#pattern <modifier-list>
|
||||||
|
|
||||||
|
@ -424,8 +424,9 @@ SUBJECT LINE SYNTAX
|
||||||
|
|
||||||
Before each subject line is passed to pcre2_match() or
|
Before each subject line is passed to pcre2_match() or
|
||||||
pcre2_dfa_match(), leading and trailing white space is removed, and the
|
pcre2_dfa_match(), leading and trailing white space is removed, and the
|
||||||
line is scanned for backslash escapes. The following provide a means of
|
line is scanned for backslash escapes, unless the subject_literal modi-
|
||||||
encoding non-printing characters in a visible way:
|
fier was set for the pattern. The following provide a means of encoding
|
||||||
|
non-printing characters in a visible way:
|
||||||
|
|
||||||
\a alarm (BEL, \x07)
|
\a alarm (BEL, \x07)
|
||||||
\b backspace (\x08)
|
\b backspace (\x08)
|
||||||
|
@ -493,6 +494,11 @@ SUBJECT LINE SYNTAX
|
||||||
passing an empty line as data, since a real empty line terminates the
|
passing an empty line as data, since a real empty line terminates the
|
||||||
data input.
|
data input.
|
||||||
|
|
||||||
|
If the subject_literal modifier is set for a pattern, all subject lines
|
||||||
|
that follow are treated as literals, with no special treatment of back-
|
||||||
|
slashes. No replication is possible, and any subject modifiers must be
|
||||||
|
set as defaults by a #subject command.
|
||||||
|
|
||||||
|
|
||||||
PATTERN MODIFIERS
|
PATTERN MODIFIERS
|
||||||
|
|
||||||
|
@ -530,7 +536,10 @@ PATTERN MODIFIERS
|
||||||
/x extended set PCRE2_EXTENDED
|
/x extended set PCRE2_EXTENDED
|
||||||
/xx extended_more set PCRE2_EXTENDED_MORE
|
/xx extended_more set PCRE2_EXTENDED_MORE
|
||||||
firstline set PCRE2_FIRSTLINE
|
firstline set PCRE2_FIRSTLINE
|
||||||
|
literal set PCRE2_LITERAL
|
||||||
|
match_line set PCRE2_EXTRA_MATCH_LINE
|
||||||
match_unset_backref set PCRE2_MATCH_UNSET_BACKREF
|
match_unset_backref set PCRE2_MATCH_UNSET_BACKREF
|
||||||
|
match_word set PCRE2_EXTRA_MATCH_WORD
|
||||||
/m multiline set PCRE2_MULTILINE
|
/m multiline set PCRE2_MULTILINE
|
||||||
never_backslash_c set PCRE2_NEVER_BACKSLASH_C
|
never_backslash_c set PCRE2_NEVER_BACKSLASH_C
|
||||||
never_ucp set PCRE2_NEVER_UCP
|
never_ucp set PCRE2_NEVER_UCP
|
||||||
|
@ -580,6 +589,7 @@ PATTERN MODIFIERS
|
||||||
push push compiled pattern onto the stack
|
push push compiled pattern onto the stack
|
||||||
pushcopy push a copy onto the stack
|
pushcopy push a copy onto the stack
|
||||||
stackguard=<number> test the stackguard feature
|
stackguard=<number> test the stackguard feature
|
||||||
|
subject_literal treat all subject lines as literal
|
||||||
tables=[0|1|2] select internal tables
|
tables=[0|1|2] select internal tables
|
||||||
use_length do not zero-terminate the pattern
|
use_length do not zero-terminate the pattern
|
||||||
utf8_input treat input as UTF-8
|
utf8_input treat input as UTF-8
|
||||||
|
@ -659,16 +669,6 @@ PATTERN MODIFIERS
|
||||||
testing that pcre2_compile() behaves correctly in this case (it uses
|
testing that pcre2_compile() behaves correctly in this case (it uses
|
||||||
default values).
|
default values).
|
||||||
|
|
||||||
Specifying the pattern's length
|
|
||||||
|
|
||||||
By default, patterns are passed to the compiling functions as zero-ter-
|
|
||||||
minated strings. When using the POSIX wrapper API, there is no other
|
|
||||||
option. However, when using PCRE2's native API, patterns can be passed
|
|
||||||
by length instead of being zero-terminated. The use_length modifier
|
|
||||||
causes this to happen. Using a length happens automatically (whether
|
|
||||||
or not use_length is set) when hex is set, because patterns specified
|
|
||||||
in hexadecimal may contain binary zeros.
|
|
||||||
|
|
||||||
Specifying pattern characters in hexadecimal
|
Specifying pattern characters in hexadecimal
|
||||||
|
|
||||||
The hex modifier specifies that the characters of the pattern, except
|
The hex modifier specifies that the characters of the pattern, except
|
||||||
|
@ -690,11 +690,18 @@ PATTERN MODIFIERS
|
||||||
ing the delimiter within a substring. The hex and expand modifiers are
|
ing the delimiter within a substring. The hex and expand modifiers are
|
||||||
mutually exclusive.
|
mutually exclusive.
|
||||||
|
|
||||||
The POSIX API cannot be used with patterns specified in hexadecimal
|
Specifying the pattern's length
|
||||||
because they may contain binary zeros, which conflicts with regcomp()'s
|
|
||||||
requirement for a zero-terminated string. Such patterns are always
|
By default, patterns are passed to the compiling functions as zero-ter-
|
||||||
passed to pcre2_compile() as a string with a length, not as zero-termi-
|
minated strings but can be passed by length instead of being zero-ter-
|
||||||
nated.
|
minated. The use_length modifier causes this to happen. Using a length
|
||||||
|
happens automatically (whether or not use_length is set) when hex is
|
||||||
|
set, because patterns specified in hexadecimal may contain binary
|
||||||
|
zeros.
|
||||||
|
|
||||||
|
If hex or use_length is used with the POSIX wrapper API (see "Using the
|
||||||
|
POSIX wrapper API" below), the REG_PEND extension is used to pass the
|
||||||
|
pattern's length.
|
||||||
|
|
||||||
Specifying wide characters in 16-bit and 32-bit modes
|
Specifying wide characters in 16-bit and 32-bit modes
|
||||||
|
|
||||||
|
@ -742,7 +749,7 @@ PATTERN MODIFIERS
|
||||||
partial modifier in "Subject Modifiers" below for details of how these
|
partial modifier in "Subject Modifiers" below for details of how these
|
||||||
options are specified for each match attempt.
|
options are specified for each match attempt.
|
||||||
|
|
||||||
JIT compilation is requested by the /jit pattern modifier, which may
|
JIT compilation is requested by the jit pattern modifier, which may
|
||||||
optionally be followed by an equals sign and a number in the range 0 to
|
optionally be followed by an equals sign and a number in the range 0 to
|
||||||
7. The three bits that make up the number specify which of the three
|
7. The three bits that make up the number specify which of the three
|
||||||
JIT operating modes are to be compiled:
|
JIT operating modes are to be compiled:
|
||||||
|
@ -766,7 +773,7 @@ PATTERN MODIFIERS
|
||||||
PCRE2_PARTIAL_HARD option set. Note that such a call may return a com-
|
PCRE2_PARTIAL_HARD option set. Note that such a call may return a com-
|
||||||
plete match; the options enable the possibility of a partial match, but
|
plete match; the options enable the possibility of a partial match, but
|
||||||
do not require it. Note also that if you request JIT compilation only
|
do not require it. Note also that if you request JIT compilation only
|
||||||
for partial matching (for example, /jit=2) but do not set the partial
|
for partial matching (for example, jit=2) but do not set the partial
|
||||||
modifier on a subject line, that match will not use JIT code because
|
modifier on a subject line, that match will not use JIT code because
|
||||||
none was compiled for non-partial matching.
|
none was compiled for non-partial matching.
|
||||||
|
|
||||||
|
@ -833,7 +840,7 @@ PATTERN MODIFIERS
|
||||||
|
|
||||||
Using the POSIX wrapper API
|
Using the POSIX wrapper API
|
||||||
|
|
||||||
The /posix and posix_nosub modifiers cause pcre2test to call PCRE2 via
|
The posix and posix_nosub modifiers cause pcre2test to call PCRE2 via
|
||||||
the POSIX wrapper API rather than its native API. When posix_nosub is
|
the POSIX wrapper API rather than its native API. When posix_nosub is
|
||||||
used, the POSIX option REG_NOSUB is passed to regcomp(). The POSIX
|
used, the POSIX option REG_NOSUB is passed to regcomp(). The POSIX
|
||||||
wrapper supports only the 8-bit library. Note that it does not imply
|
wrapper supports only the 8-bit library. Note that it does not imply
|
||||||
|
@ -862,6 +869,10 @@ PATTERN MODIFIERS
|
||||||
below. All other modifiers are either ignored, with a warning message,
|
below. All other modifiers are either ignored, with a warning message,
|
||||||
or cause an error.
|
or cause an error.
|
||||||
|
|
||||||
|
The pattern is passed to regcomp() as a zero-terminated string by
|
||||||
|
default, but if the use_length or hex modifiers are set, the REG_PEND
|
||||||
|
extension is used to pass it by length.
|
||||||
|
|
||||||
Testing the stack guard feature
|
Testing the stack guard feature
|
||||||
|
|
||||||
The stackguard modifier is used to test the use of pcre2_set_com-
|
The stackguard modifier is used to test the use of pcre2_set_com-
|
||||||
|
@ -894,16 +905,18 @@ PATTERN MODIFIERS
|
||||||
Setting certain match controls
|
Setting certain match controls
|
||||||
|
|
||||||
The following modifiers are really subject modifiers, and are described
|
The following modifiers are really subject modifiers, and are described
|
||||||
below. However, they may be included in a pattern's modifier list, in
|
under "Subject Modifiers" below. However, they may be included in a
|
||||||
which case they are applied to every subject line that is processed
|
pattern's modifier list, in which case they are applied to every sub-
|
||||||
with that pattern. They may not appear in #pattern commands. These mod-
|
ject line that is processed with that pattern. They may not appear in
|
||||||
ifiers do not affect the compilation process.
|
#pattern commands. These modifiers do not affect the compilation
|
||||||
|
process.
|
||||||
|
|
||||||
aftertext show text after match
|
aftertext show text after match
|
||||||
allaftertext show text after captures
|
allaftertext show text after captures
|
||||||
allcaptures show all captures
|
allcaptures show all captures
|
||||||
allusedtext show all consulted text
|
allusedtext show all consulted text
|
||||||
/g global global matching
|
/g global global matching
|
||||||
|
jitstack=<n> set size of JIT stack
|
||||||
mark show mark values
|
mark show mark values
|
||||||
replace=<string> specify a replacement string
|
replace=<string> specify a replacement string
|
||||||
startchar show starting character when relevant
|
startchar show starting character when relevant
|
||||||
|
@ -915,6 +928,14 @@ PATTERN MODIFIERS
|
||||||
These modifiers may not appear in a #pattern command. If you want them
|
These modifiers may not appear in a #pattern command. If you want them
|
||||||
as defaults, set them in a #subject command.
|
as defaults, set them in a #subject command.
|
||||||
|
|
||||||
|
Specifying literal subject lines
|
||||||
|
|
||||||
|
If the subject_literal modifier is present on a pattern, all the sub-
|
||||||
|
ject lines that it matches are taken as literal strings, with no inter-
|
||||||
|
pretation of backslashes. It is not possible to set subject modifiers
|
||||||
|
on such lines, but any that are set as defaults by a #subject command
|
||||||
|
are recognized.
|
||||||
|
|
||||||
Saving a compiled pattern
|
Saving a compiled pattern
|
||||||
|
|
||||||
When a pattern with the push modifier is successfully compiled, it is
|
When a pattern with the push modifier is successfully compiled, it is
|
||||||
|
@ -959,11 +980,11 @@ SUBJECT MODIFIERS
|
||||||
The partial matching modifiers are provided with abbreviations because
|
The partial matching modifiers are provided with abbreviations because
|
||||||
they appear frequently in tests.
|
they appear frequently in tests.
|
||||||
|
|
||||||
If the posix modifier was present on the pattern, causing the POSIX
|
If the posix or posix_nosub modifier was present on the pattern, caus-
|
||||||
wrapper API to be used, the only option-setting modifiers that have any
|
ing the POSIX wrapper API to be used, the only option-setting modifiers
|
||||||
effect are notbol, notempty, and noteol, causing REG_NOTBOL,
|
that have any effect are notbol, notempty, and noteol, causing REG_NOT-
|
||||||
REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to regexec().
|
BOL, REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to
|
||||||
The other modifiers are ignored, with a warning message.
|
regexec(). The other modifiers are ignored, with a warning message.
|
||||||
|
|
||||||
There is one additional modifier that can be used with the POSIX wrap-
|
There is one additional modifier that can be used with the POSIX wrap-
|
||||||
per. It is ignored (with a warning) if used for non-POSIX matching.
|
per. It is ignored (with a warning) if used for non-POSIX matching.
|
||||||
|
@ -971,10 +992,13 @@ SUBJECT MODIFIERS
|
||||||
posix_startend=<n>[:<m>]
|
posix_startend=<n>[:<m>]
|
||||||
|
|
||||||
This causes the subject string to be passed to regexec() using the
|
This causes the subject string to be passed to regexec() using the
|
||||||
REG_STARTEND option, which uses offsets to restrict which part of the
|
REG_STARTEND option, which uses offsets to specify which part of the
|
||||||
string is searched. If only one number is given, the end offset is
|
string is searched. If only one number is given, the end offset is
|
||||||
passed as the end of the subject string. For more detail of REG_STAR-
|
passed as the end of the subject string. For more detail of REG_STAR-
|
||||||
TEND, see the pcre2posix documentation.
|
TEND, see the pcre2posix documentation. If the subject string contains
|
||||||
|
binary zeros (coded as escapes such as \x{00} because pcre2test does
|
||||||
|
not support actual binary zeros in its input), you must use posix_star-
|
||||||
|
tend to specify its length.
|
||||||
|
|
||||||
Setting match controls
|
Setting match controls
|
||||||
|
|
||||||
|
@ -1222,8 +1246,10 @@ SUBJECT MODIFIERS
|
||||||
The jitstack modifier provides a way of setting the maximum stack size
|
The jitstack modifier provides a way of setting the maximum stack size
|
||||||
that is used by the just-in-time optimization code. It is ignored if
|
that is used by the just-in-time optimization code. It is ignored if
|
||||||
JIT optimization is not being used. The value is a number of kilobytes.
|
JIT optimization is not being used. The value is a number of kilobytes.
|
||||||
Providing a stack that is larger than the default 32K is necessary only
|
Setting zero reverts to the default of 32K. Providing a stack that is
|
||||||
for very complicated patterns.
|
larger than the default is necessary only for very complicated pat-
|
||||||
|
terns. If jitstack is set non-zero on a subject line it overrides any
|
||||||
|
value that was set on the pattern.
|
||||||
|
|
||||||
Setting heap, match, and depth limits
|
Setting heap, match, and depth limits
|
||||||
|
|
||||||
|
@ -1310,9 +1336,8 @@ SUBJECT MODIFIERS
|
||||||
By default, the subject string is passed to a native API matching func-
|
By default, the subject string is passed to a native API matching func-
|
||||||
tion with its correct length. In order to test the facility for passing
|
tion with its correct length. In order to test the facility for passing
|
||||||
a zero-terminated string, the zero_terminate modifier is provided. It
|
a zero-terminated string, the zero_terminate modifier is provided. It
|
||||||
causes the length to be passed as PCRE2_ZERO_TERMINATED. (When matching
|
causes the length to be passed as PCRE2_ZERO_TERMINATED. When matching
|
||||||
via the POSIX interface, this modifier has no effect, as there is no
|
via the POSIX interface, this modifier is ignored, with a warning.
|
||||||
facility for passing a length.)
|
|
||||||
|
|
||||||
When testing pcre2_substitute(), this modifier also has the effect of
|
When testing pcre2_substitute(), this modifier also has the effect of
|
||||||
passing the replacement string as zero-terminated.
|
passing the replacement string as zero-terminated.
|
||||||
|
@ -1513,8 +1538,8 @@ CALLOUTS
|
||||||
position, which can happen if the callout is in a lookbehind assertion.
|
position, which can happen if the callout is in a lookbehind assertion.
|
||||||
|
|
||||||
Callouts numbered 255 are assumed to be automatic callouts, inserted as
|
Callouts numbered 255 are assumed to be automatic callouts, inserted as
|
||||||
a result of the /auto_callout pattern modifier. In this case, instead
|
a result of the auto_callout pattern modifier. In this case, instead of
|
||||||
of showing the callout number, the offset in the pattern, preceded by a
|
showing the callout number, the offset in the pattern, preceded by a
|
||||||
plus, is output. For example:
|
plus, is output. For example:
|
||||||
|
|
||||||
re> /\d?[A-E]\*/auto_callout
|
re> /\d?[A-E]\*/auto_callout
|
||||||
|
@ -1662,5 +1687,5 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 03 June 2017
|
Last updated: 16 June 2017
|
||||||
Copyright (c) 1997-2017 University of Cambridge.
|
Copyright (c) 1997-2017 University of Cambridge.
|
||||||
|
|