Documentation update.

Philip.Hazel 2017-06-16 17:57:18 +00:00
parent a083420cac
commit c92bfc3d21
9 changed files with 522 additions and 383 deletions

@ -47,7 +47,7 @@ system stack size checking, or to change one or more of these parameters:
The newline character sequence;
The compile time nested parentheses limit;
The maximum pattern length (in code units) that is allowed.
The additional options bits
The additional options bits (see pcre2_set_compile_extra_options())
</pre>
The option bits are:
<pre>
@ -64,6 +64,7 @@ The option bits are:
PCRE2_ENDANCHORED Pattern can match only at end of subject
PCRE2_EXTENDED Ignore white space and # comments
PCRE2_FIRSTLINE Force matching to be before newline
PCRE2_LITERAL Pattern characters are all literal
PCRE2_MATCH_UNSET_BACKREF Match unset back references
PCRE2_MULTILINE ^ and $ match newlines within data
PCRE2_NEVER_BACKSLASH_C Lock out the use of \C in patterns

@ -32,6 +32,8 @@ options are:
<pre>
PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES Allow \x{d800} to \x{dfff} in UTF-8 and UTF-32 modes
PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL Treat all invalid escapes as a literal following character
PCRE2_EXTRA_MATCH_LINE Pattern matches whole lines
PCRE2_EXTRA_MATCH_WORD Pattern matches "words"
</pre>
There is a complete description of the PCRE2 native API in the
<a href="pcre2api.html"><b>pcre2api</b></a>

@ -1453,6 +1453,19 @@ continue over the newline. See also PCRE2_USE_OFFSET_LIMIT, which provides a
more general limiting facility. If PCRE2_FIRSTLINE is set with an offset limit,
a match must occur in the first line and also within the offset limit. In other
words, whichever limit comes first is used.
<pre>
PCRE2_LITERAL
</pre>
If this option is set, all meta-characters in the pattern are disabled, and it
is treated as a literal string. Matching literal strings with a regular
expression engine is not the most efficient way of doing it. If you are doing a
lot of literal matching and are worried about efficiency, you should consider
using other approaches. The only other main options that are allowed with
PCRE2_LITERAL are: PCRE2_ANCHORED, PCRE2_ENDANCHORED, PCRE2_AUTO_CALLOUT,
PCRE2_CASELESS, PCRE2_FIRSTLINE, PCRE2_NO_START_OPTIMIZE, PCRE2_NO_UTF_CHECK,
PCRE2_UTF, and PCRE2_USE_OFFSET_LIMIT. The extra options PCRE2_EXTRA_MATCH_LINE
and PCRE2_EXTRA_MATCH_WORD are also supported. Any other options cause an
error.
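As a rough illustration of what literal matching means, here is a minimal sketch that uses Python's standard re module rather than PCRE2 itself. The helper name literal_search is hypothetical, and escaping every metacharacter is only an approximation of the effect of PCRE2_LITERAL. The last line shows one of the "other approaches": a plain substring search with no regular expression engine involved.

```python
import re


def literal_search(pattern, subject, caseless=False):
    """Search for `pattern` as a plain string.

    Hypothetical helper: re.escape() neutralises every metacharacter,
    which approximates (but is not) PCRE2_LITERAL.
    """
    flags = re.IGNORECASE if caseless else 0
    return re.search(re.escape(pattern), subject, flags)


# The dot, star and parentheses are matched as ordinary characters.
match = literal_search("a.b*(c)", "xx a.b*(c) yy")
print(match.span() if match else None)

# A plain substring search finds the same position without a regex engine.
print("xx a.b*(c) yy".find("a.b*(c)"))
```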
<pre>
PCRE2_MATCH_UNSET_BACKREF
</pre>
@ -1724,6 +1737,24 @@ treated as single-character escapes. For example, \j is a literal "j" and
\x{2z} is treated as the literal string "x{2z}". Setting this option means
that typos in patterns may go undetected and have unexpected results. This is a
dangerous option. Use with care.
<pre>
PCRE2_EXTRA_MATCH_LINE
</pre>
This option is provided for use by the <b>-x</b> option of <b>pcre2grep</b>. It
causes the pattern only to match complete lines. This is achieved by
automatically inserting the code for "^(?:" at the start of the compiled
pattern and ")$" at the end. Thus, when PCRE2_MULTILINE is set, the matched
line may be in the middle of the subject string. This option can be used with
PCRE2_LITERAL.
<pre>
PCRE2_EXTRA_MATCH_WORD
</pre>
This option is provided for use by the <b>-w</b> option of <b>pcre2grep</b>. It
causes the pattern only to match strings that have a word boundary at the start
and the end. This is achieved by automatically inserting the code for "\b(?:"
at the start of the compiled pattern and ")\b" at the end. The option may be
used with PCRE2_LITERAL. However, it is ignored if PCRE2_EXTRA_MATCH_LINE is
also set.
</P>
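The wrapping described for PCRE2_EXTRA_MATCH_LINE and PCRE2_EXTRA_MATCH_WORD can be sketched by building the equivalent pattern text by hand. This is an emulation in Python's re module, not PCRE2 code, and the function name wrap_pattern is hypothetical. It also shows why the non-capturing group is needed: without it, a top-level alternation would escape the anchors.

```python
import re


def wrap_pattern(pattern, match_line=False, match_word=False):
    """Emulate the documented wrapping (hypothetical helper).

    match_line takes precedence: match_word is ignored when both are set.
    """
    if match_line:
        return "^(?:" + pattern + ")$"
    if match_word:
        return r"\b(?:" + pattern + r")\b"
    return pattern


subject = "a cat\ndog\nbird"
whole_line = wrap_pattern("cat|dog", match_line=True)

# With multiline matching the whole line may sit in the middle of the subject.
print(re.search(whole_line, subject, re.MULTILINE))
# Without it, only the whole subject could match, so nothing is found here.
print(re.search(whole_line, subject))

# Without the group, "^cat|dog$" means "^cat" OR "dog$" and matches "hotdog".
print(re.search("^cat|dog$", "hotdog"))
print(re.search(whole_line, "hotdog"))

# Word matching rejects "cat" embedded inside a longer word.
print(re.search(wrap_pattern("cat", match_word=True), "concatenate cat"))
```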
<br><a name="SEC20" href="#TOC1">COMPILATION ERROR CODES</a><br>
<P>
@ -3489,7 +3520,7 @@ Cambridge, England.
</P>
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
<P>
Last updated: 01 June 2017
Last updated: 16 June 2017
<br>
Copyright &copy; 1997-2017 University of Cambridge.
<br>

@ -117,6 +117,14 @@ compilation to the native function.
The PCRE2_MULTILINE option is set when the regular expression is passed for
compilation to the native function. Note that this does <i>not</i> mimic the
defined POSIX behaviour for REG_NEWLINE (see the following section).
<pre>
REG_NOSPEC
</pre>
The PCRE2_LITERAL option is set when the regular expression is passed for
compilation to the native function. This disables all meta characters in the
pattern, causing it to be treated as a literal string. The only other options
that are allowed with REG_NOSPEC are REG_ICASE, REG_NOSUB, REG_PEND, and
REG_UTF. Note that REG_NOSPEC is not part of the POSIX standard.
<pre>
REG_NOSUB
</pre>
@ -314,7 +322,7 @@ Cambridge, England.
</P>
<br><a name="SEC9" href="#TOC1">REVISION</a><br>
<P>
Last updated: 05 June 2017
Last updated: 15 June 2017
<br>
Copyright &copy; 1997-2017 University of Cambridge.
<br>

@ -96,12 +96,12 @@ want that action.
</P>
<P>
The input is processed using C's string functions, so must not
contain binary zeroes, even though in Unix-like environments, <b>fgets()</b>
contain binary zeros, even though in Unix-like environments, <b>fgets()</b>
treats any bytes other than newline as data characters. An error is generated
if a binary zero is encountered. Subject lines are processed for backslash
escapes, which makes it possible to include any data value in strings that are
passed to the library for matching. For patterns, there is a facility for
specifying some or all of the 8-bit input characters as hexadecimal pairs,
if a binary zero is encountered. By default subject lines are processed for
backslash escapes, which makes it possible to include any data value in strings
that are passed to the library for matching. For patterns, there is a facility
for specifying some or all of the 8-bit input characters as hexadecimal pairs,
which makes it possible to include binary zeros.
</P>
<br><b>
@ -382,8 +382,9 @@ of the standard test input files.
<P>
When the POSIX API is being tested there is no way to override the default
newline convention, though it is possible to set the newline convention from
within the pattern. A warning is given if the <b>posix</b> modifier is used when
<b>#newline_default</b> would set a default for the non-POSIX API.
within the pattern. A warning is given if the <b>posix</b> or <b>posix_nosub</b>
modifier is used when <b>#newline_default</b> would set a default for the
non-POSIX API.
<pre>
#pattern &#60;modifier-list&#62;
</pre>
@ -479,8 +480,9 @@ A pattern can be followed by a modifier list (details below).
<P>
Before each subject line is passed to <b>pcre2_match()</b> or
<b>pcre2_dfa_match()</b>, leading and trailing white space is removed, and the
line is scanned for backslash escapes. The following provide a means of
encoding non-printing characters in a visible way:
line is scanned for backslash escapes, unless the <b>subject_literal</b>
modifier was set for the pattern. The following provide a means of encoding
non-printing characters in a visible way:
<pre>
\a alarm (BEL, \x07)
\b backspace (\x08)
@ -548,6 +550,12 @@ the very last character in the line is a backslash (and there is no modifier
list), it is ignored. This gives a way of passing an empty line as data, since
a real empty line terminates the data input.
</P>
<P>
If the <b>subject_literal</b> modifier is set for a pattern, all subject lines
that follow are treated as literals, with no special treatment of backslashes.
No replication is possible, and any subject modifiers must be set as defaults
by a <b>#subject</b> command.
</P>
<br><a name="SEC10" href="#TOC1">PATTERN MODIFIERS</a><br>
<P>
There are several types of modifier that can appear in pattern lines. Except
@ -586,7 +594,10 @@ for a description of the effects of these options.
/x extended set PCRE2_EXTENDED
/xx extended_more set PCRE2_EXTENDED_MORE
firstline set PCRE2_FIRSTLINE
literal set PCRE2_LITERAL
match_line set PCRE2_EXTRA_MATCH_LINE
match_unset_backref set PCRE2_MATCH_UNSET_BACKREF
match_word set PCRE2_EXTRA_MATCH_WORD
/m multiline set PCRE2_MULTILINE
never_backslash_c set PCRE2_NEVER_BACKSLASH_C
never_ucp set PCRE2_NEVER_UCP
@ -638,6 +649,7 @@ heavily used in the test files.
push push compiled pattern onto the stack
pushcopy push a copy onto the stack
stackguard=&#60;number&#62; test the stackguard feature
subject_literal treat all subject lines as literal
tables=[0|1|2] select internal tables
use_length do not zero-terminate the pattern
utf8_input treat input as UTF-8
@ -728,18 +740,6 @@ testing that <b>pcre2_compile()</b> behaves correctly in this case (it uses
default values).
</P>
<br><b>
Specifying the pattern's length
</b><br>
<P>
By default, patterns are passed to the compiling functions as zero-terminated
strings. When using the POSIX wrapper API, there is no other option. However,
when using PCRE2's native API, patterns can be passed by length instead of
being zero-terminated. The <b>use_length</b> modifier causes this to happen.
Using a length happens automatically (whether or not <b>use_length</b> is set)
when <b>hex</b> is set, because patterns specified in hexadecimal may contain
binary zeros.
</P>
<br><b>
Specifying pattern characters in hexadecimal
</b><br>
<P>
@ -761,11 +761,20 @@ Either single or double quotes may be used. There is no way of including
the delimiter within a substring. The <b>hex</b> and <b>expand</b> modifiers are
mutually exclusive.
</P>
<br><b>
Specifying the pattern's length
</b><br>
<P>
The POSIX API cannot be used with patterns specified in hexadecimal because
they may contain binary zeros, which conflicts with <b>regcomp()</b>'s
requirement for a zero-terminated string. Such patterns are always passed to
<b>pcre2_compile()</b> as a string with a length, not as zero-terminated.
By default, patterns are passed to the compiling functions as zero-terminated
strings but can be passed by length instead of being zero-terminated. The
<b>use_length</b> modifier causes this to happen. Using a length happens
automatically (whether or not <b>use_length</b> is set) when <b>hex</b> is set,
because patterns specified in hexadecimal may contain binary zeros.
</P>
<P>
If <b>hex</b> or <b>use_length</b> is used with the POSIX wrapper API (see
<a href="#posixwrapper">"Using the POSIX wrapper API"</a>
below), the REG_PEND extension is used to pass the pattern's length.
</P>
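To see why a pattern containing binary zeros has to be passed by length, the sketch below emulates in Python what a zero-terminated reader would see. The helper as_c_string is hypothetical and only models C string termination; it does not call PCRE2 or <b>regcomp()</b>.

```python
def as_c_string(data):
    """Return the bytes a zero-terminated reader would see (hypothetical helper)."""
    return data.split(b"\x00", 1)[0]


# A three-byte pattern written as hexadecimal pairs: 'a', binary zero, 'b'.
pattern = bytes.fromhex("61 00 62")

# Passed by length, all three bytes reach the compiler.
print(len(pattern))

# Treated as zero-terminated, everything after the first zero is lost.
print(as_c_string(pattern))
```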
<br><b>
Specifying wide characters in 16-bit and 32-bit modes
@ -826,7 +835,7 @@ modifier in "Subject Modifiers"
for details of how these options are specified for each match attempt.
</P>
<P>
JIT compilation is requested by the <b>/jit</b> pattern modifier, which may
JIT compilation is requested by the <b>jit</b> pattern modifier, which may
optionally be followed by an equals sign and a number in the range 0 to 7.
The three bits that make up the number specify which of the three JIT operating
modes are to be compiled:
@ -850,7 +859,7 @@ to <b>pcre2_match()</b> with either the PCRE2_PARTIAL_SOFT or the
PCRE2_PARTIAL_HARD option set. Note that such a call may return a complete
match; the options enable the possibility of a partial match, but do not
require it. Note also that if you request JIT compilation only for partial
matching (for example, /jit=2) but do not set the <b>partial</b> modifier on a
matching (for example, jit=2) but do not set the <b>partial</b> modifier on a
subject line, that match will not use JIT code because none was compiled for
non-partial matching.
</P>
@ -927,12 +936,12 @@ The <b>max_pattern_length</b> modifier sets a limit, in code units, to the
length of pattern that <b>pcre2_compile()</b> will accept. Breaching the limit
causes a compilation error. The default is the largest number a PCRE2_SIZE
variable can hold (essentially unlimited).
</P>
<a name="posixwrapper"></a></P>
<br><b>
Using the POSIX wrapper API
</b><br>
<P>
The <b>/posix</b> and <b>posix_nosub</b> modifiers cause <b>pcre2test</b> to call
The <b>posix</b> and <b>posix_nosub</b> modifiers cause <b>pcre2test</b> to call
PCRE2 via the POSIX wrapper API rather than its native API. When
<b>posix_nosub</b> is used, the POSIX option REG_NOSUB is passed to
<b>regcomp()</b>. The POSIX wrapper supports only the 8-bit library. Note that
@ -962,6 +971,11 @@ The <b>aftertext</b> and <b>allaftertext</b> subject modifiers work as described
below. All other modifiers are either ignored, with a warning message, or cause
an error.
</P>
<P>
The pattern is passed to <b>regcomp()</b> as a zero-terminated string by
default, but if the <b>use_length</b> or <b>hex</b> modifiers are set, the
REG_PEND extension is used to pass it by length.
</P>
<br><b>
Testing the stack guard feature
</b><br>
@ -999,17 +1013,18 @@ are mutually exclusive.
Setting certain match controls
</b><br>
<P>
The following modifiers are really subject modifiers, and are described below.
However, they may be included in a pattern's modifier list, in which case they
are applied to every subject line that is processed with that pattern. They may
not appear in <b>#pattern</b> commands. These modifiers do not affect the
compilation process.
The following modifiers are really subject modifiers, and are described under
"Subject Modifiers" below. However, they may be included in a pattern's
modifier list, in which case they are applied to every subject line that is
processed with that pattern. They may not appear in <b>#pattern</b> commands.
These modifiers do not affect the compilation process.
<pre>
aftertext show text after match
allaftertext show text after captures
allcaptures show all captures
allusedtext show all consulted text
/g global global matching
jitstack=&#60;n&#62; set size of JIT stack
mark show mark values
replace=&#60;string&#62; specify a replacement string
startchar show starting character when relevant
@ -1022,6 +1037,15 @@ These modifiers may not appear in a <b>#pattern</b> command. If you want them as
defaults, set them in a <b>#subject</b> command.
</P>
<br><b>
Specifying literal subject lines
</b><br>
<P>
If the <b>subject_literal</b> modifier is present on a pattern, all the subject
lines that it matches are taken as literal strings, with no interpretation of
backslashes. It is not possible to set subject modifiers on such lines, but any
that are set as defaults by a <b>#subject</b> command are recognized.
</P>
<br><b>
Saving a compiled pattern
</b><br>
<P>
@ -1072,11 +1096,11 @@ The partial matching modifiers are provided with abbreviations because they
appear frequently in tests.
</P>
<P>
If the <b>posix</b> modifier was present on the pattern, causing the POSIX
wrapper API to be used, the only option-setting modifiers that have any effect
are <b>notbol</b>, <b>notempty</b>, and <b>noteol</b>, causing REG_NOTBOL,
REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to <b>regexec()</b>.
The other modifiers are ignored, with a warning message.
If the <b>posix</b> or <b>posix_nosub</b> modifier was present on the pattern,
causing the POSIX wrapper API to be used, the only option-setting modifiers
that have any effect are <b>notbol</b>, <b>notempty</b>, and <b>noteol</b>,
causing REG_NOTBOL, REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to
<b>regexec()</b>. The other modifiers are ignored, with a warning message.
</P>
<P>
There is one additional modifier that can be used with the POSIX wrapper. It is
@ -1085,11 +1109,13 @@ ignored (with a warning) if used for non-POSIX matching.
posix_startend=&#60;n&#62;[:&#60;m&#62;]
</pre>
This causes the subject string to be passed to <b>regexec()</b> using the
REG_STARTEND option, which uses offsets to restrict which part of the string is
REG_STARTEND option, which uses offsets to specify which part of the string is
searched. If only one number is given, the end offset is passed as the end of
the subject string. For more detail of REG_STARTEND, see the
<a href="pcre2posix.html"><b>pcre2posix</b></a>
documentation.
documentation. If the subject string contains binary zeros (coded as escapes
such as \x{00} because <b>pcre2test</b> does not support actual binary zeros in
its input), you must use <b>posix_startend</b> to specify its length.
</P>
<br><b>
Setting match controls
@ -1355,9 +1381,11 @@ Setting the JIT stack size
<P>
The <b>jitstack</b> modifier provides a way of setting the maximum stack size
that is used by the just-in-time optimization code. It is ignored if JIT
optimization is not being used. The value is a number of kilobytes. Providing a
stack that is larger than the default 32K is necessary only for very
complicated patterns.
optimization is not being used. The value is a number of kilobytes. Setting
zero reverts to the default of 32K. Providing a stack that is larger than the
default is necessary only for very complicated patterns. If <b>jitstack</b> is
set non-zero on a subject line it overrides any value that was set on the
pattern.
</P>
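The precedence rules for <b>jitstack</b> can be summarised in a few lines. This is only a sketch of the behaviour as described here, not code from <b>pcre2test</b>; the function name and the convention of using 0 for "not set" are assumptions.

```python
DEFAULT_JITSTACK_KB = 32  # default stack size quoted in the text above


def effective_jitstack_kb(pattern_value=0, subject_value=0):
    """Pick the JIT stack size in kilobytes (hypothetical helper).

    A non-zero value on the subject line overrides the pattern's value;
    a resulting value of zero means the default is used.
    """
    chosen = subject_value if subject_value != 0 else pattern_value
    return chosen if chosen != 0 else DEFAULT_JITSTACK_KB


print(effective_jitstack_kb())                                    # nothing set
print(effective_jitstack_kb(pattern_value=64))                    # pattern only
print(effective_jitstack_kb(pattern_value=64, subject_value=128)) # subject wins
```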
<br><b>
Setting heap, match, and depth limits
@ -1461,8 +1489,8 @@ Passing the subject as zero-terminated
By default, the subject string is passed to a native API matching function with
its correct length. In order to test the facility for passing a zero-terminated
string, the <b>zero_terminate</b> modifier is provided. It causes the length to
be passed as PCRE2_ZERO_TERMINATED. (When matching via the POSIX interface,
this modifier has no effect, as there is no facility for passing a length.)
be passed as PCRE2_ZERO_TERMINATED. When matching via the POSIX interface,
this modifier is ignored, with a warning.
</P>
<P>
When testing <b>pcre2_substitute()</b>, this modifier also has the effect of
@ -1675,7 +1703,7 @@ callout is in a lookbehind assertion.
</P>
<P>
Callouts numbered 255 are assumed to be automatic callouts, inserted as a
result of the <b>/auto_callout</b> pattern modifier. In this case, instead of
result of the <b>auto_callout</b> pattern modifier. In this case, instead of
showing the callout number, the offset in the pattern, preceded by a plus, is
output. For example:
<pre>
@ -1830,7 +1858,7 @@ Cambridge, England.
</P>
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
<P>
Last updated: 03 June 2017
Last updated: 16 June 2017
<br>
Copyright &copy; 1997-2017 University of Cambridge.
<br>

@ -1441,6 +1441,20 @@ COMPILING A PATTERN
first line and also within the offset limit. In other words, whichever
limit comes first is used.
PCRE2_LITERAL
If this option is set, all meta-characters in the pattern are disabled,
and it is treated as a literal string. Matching literal strings with a
regular expression engine is not the most efficient way of doing it. If
you are doing a lot of literal matching and are worried about effi-
ciency, you should consider using other approaches. The only other main
options that are allowed with PCRE2_LITERAL are: PCRE2_ANCHORED,
PCRE2_ENDANCHORED, PCRE2_AUTO_CALLOUT, PCRE2_CASELESS, PCRE2_FIRSTLINE,
PCRE2_NO_START_OPTIMIZE, PCRE2_NO_UTF_CHECK, PCRE2_UTF, and
PCRE2_USE_OFFSET_LIMIT. The extra options PCRE2_EXTRA_MATCH_LINE and
PCRE2_EXTRA_MATCH_WORD are also supported. Any other options cause an
error.
PCRE2_MATCH_UNSET_BACKREF
If this option is set, a back reference to an unset subpattern group
@ -1706,6 +1720,24 @@ COMPILING A PATTERN
option means that typos in patterns may go undetected and have unex-
pected results. This is a dangerous option. Use with care.
PCRE2_EXTRA_MATCH_LINE
This option is provided for use by the -x option of pcre2grep. It
causes the pattern only to match complete lines. This is achieved by
automatically inserting the code for "^(?:" at the start of the com-
piled pattern and ")$" at the end. Thus, when PCRE2_MULTILINE is set,
the matched line may be in the middle of the subject string. This
option can be used with PCRE2_LITERAL.
PCRE2_EXTRA_MATCH_WORD
This option is provided for use by the -w option of pcre2grep. It
causes the pattern only to match strings that have a word boundary at
the start and the end. This is achieved by automatically inserting the
code for "\b(?:" at the start of the compiled pattern and ")\b" at the
end. The option may be used with PCRE2_LITERAL. However, it is ignored
if PCRE2_EXTRA_MATCH_LINE is also set.
COMPILATION ERROR CODES
@ -3368,7 +3400,7 @@ AUTHOR
REVISION
Last updated: 01 June 2017
Last updated: 16 June 2017
Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------
@ -9036,6 +9068,15 @@ COMPILING A PATTERN
the defined POSIX behaviour for REG_NEWLINE (see the following sec-
tion).
REG_NOSPEC
The PCRE2_LITERAL option is set when the regular expression is passed
for compilation to the native function. This disables all meta charac-
ters in the pattern, causing it to be treated as a literal string. The
only other options that are allowed with REG_NOSPEC are REG_ICASE,
REG_NOSUB, REG_PEND, and REG_UTF. Note that REG_NOSPEC is not part of
the POSIX standard.
REG_NOSUB
When a pattern that is compiled with this flag is passed to regexec()
@ -9232,7 +9273,7 @@ AUTHOR
REVISION
Last updated: 05 June 2017
Last updated: 15 June 2017
Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------

@ -1,4 +1,4 @@
.TH PCRE2_COMPILE 3 "17 May 2017" "PCRE2 10.30"
.TH PCRE2_COMPILE 3 "16 June 2017" "PCRE2 10.30"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS
@ -35,7 +35,7 @@ system stack size checking, or to change one or more of these parameters:
The newline character sequence;
The compile time nested parentheses limit;
The maximum pattern length (in code units) that is allowed.
The additional options bits
The additional options bits (see pcre2_set_compile_extra_options())
.sp
The option bits are:
.sp
@ -52,6 +52,7 @@ The option bits are:
PCRE2_ENDANCHORED Pattern can match only at end of subject
PCRE2_EXTENDED Ignore white space and # comments
PCRE2_FIRSTLINE Force matching to be before newline
PCRE2_LITERAL Pattern characters are all literal
PCRE2_MATCH_UNSET_BACKREF Match unset back references
PCRE2_MULTILINE ^ and $ match newlines within data
PCRE2_NEVER_BACKSLASH_C Lock out the use of \eC in patterns

@ -1,4 +1,4 @@
.TH PCRE2_SET_MAX_PATTERN_LENGTH 3 "01 June 2017" "PCRE2 10.30"
.TH PCRE2_SET_MAX_PATTERN_LENGTH 3 "16 June 2017" "PCRE2 10.30"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS
@ -24,6 +24,8 @@ options are:
.\" JOIN
PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL Treat all invalid escapes as
a literal following character
PCRE2_EXTRA_MATCH_LINE Pattern matches whole lines
PCRE2_EXTRA_MATCH_WORD Pattern matches "words"
.sp
There is a complete description of the PCRE2 native API in the
.\" HREF

@ -64,12 +64,12 @@ INPUT ENCODING
unless you really want that action.
The input is processed using C's string functions, so must not
contain binary zeroes, even though in Unix-like environments, fgets()
contain binary zeros, even though in Unix-like environments, fgets()
treats any bytes other than newline as data characters. An error is
generated if a binary zero is encountered. Subject lines are processed
for backslash escapes, which makes it possible to include any data
value in strings that are passed to the library for matching. For pat-
terns, there is a facility for specifying some or all of the 8-bit
generated if a binary zero is encountered. By default subject lines are
processed for backslash escapes, which makes it possible to include any
data value in strings that are passed to the library for matching. For
patterns, there is a facility for specifying some or all of the 8-bit
input characters as hexadecimal pairs, which makes it possible to
include binary zeros.
@ -319,9 +319,9 @@ COMMAND LINES
When the POSIX API is being tested there is no way to override the
default newline convention, though it is possible to set the newline
convention from within the pattern. A warning is given if the posix
modifier is used when #newline_default would set a default for the non-
POSIX API.
convention from within the pattern. A warning is given if the posix or
posix_nosub modifier is used when #newline_default would set a default
for the non-POSIX API.
#pattern <modifier-list>
@ -424,8 +424,9 @@ SUBJECT LINE SYNTAX
Before each subject line is passed to pcre2_match() or
pcre2_dfa_match(), leading and trailing white space is removed, and the
line is scanned for backslash escapes. The following provide a means of
encoding non-printing characters in a visible way:
line is scanned for backslash escapes, unless the subject_literal modi-
fier was set for the pattern. The following provide a means of encoding
non-printing characters in a visible way:
\a alarm (BEL, \x07)
\b backspace (\x08)
@ -493,6 +494,11 @@ SUBJECT LINE SYNTAX
passing an empty line as data, since a real empty line terminates the
data input.
If the subject_literal modifier is set for a pattern, all subject lines
that follow are treated as literals, with no special treatment of back-
slashes. No replication is possible, and any subject modifiers must be
set as defaults by a #subject command.
PATTERN MODIFIERS
@ -530,7 +536,10 @@ PATTERN MODIFIERS
/x extended set PCRE2_EXTENDED
/xx extended_more set PCRE2_EXTENDED_MORE
firstline set PCRE2_FIRSTLINE
literal set PCRE2_LITERAL
match_line set PCRE2_EXTRA_MATCH_LINE
match_unset_backref set PCRE2_MATCH_UNSET_BACKREF
match_word set PCRE2_EXTRA_MATCH_WORD
/m multiline set PCRE2_MULTILINE
never_backslash_c set PCRE2_NEVER_BACKSLASH_C
never_ucp set PCRE2_NEVER_UCP
@ -580,6 +589,7 @@ PATTERN MODIFIERS
push push compiled pattern onto the stack
pushcopy push a copy onto the stack
stackguard=<number> test the stackguard feature
subject_literal treat all subject lines as literal
tables=[0|1|2] select internal tables
use_length do not zero-terminate the pattern
utf8_input treat input as UTF-8
@ -659,16 +669,6 @@ PATTERN MODIFIERS
testing that pcre2_compile() behaves correctly in this case (it uses
default values).
Specifying the pattern's length
By default, patterns are passed to the compiling functions as zero-ter-
minated strings. When using the POSIX wrapper API, there is no other
option. However, when using PCRE2's native API, patterns can be passed
by length instead of being zero-terminated. The use_length modifier
causes this to happen. Using a length happens automatically (whether
or not use_length is set) when hex is set, because patterns specified
in hexadecimal may contain binary zeros.
Specifying pattern characters in hexadecimal
The hex modifier specifies that the characters of the pattern, except
@ -690,11 +690,18 @@ PATTERN MODIFIERS
ing the delimiter within a substring. The hex and expand modifiers are
mutually exclusive.
The POSIX API cannot be used with patterns specified in hexadecimal
because they may contain binary zeros, which conflicts with regcomp()'s
requirement for a zero-terminated string. Such patterns are always
passed to pcre2_compile() as a string with a length, not as zero-termi-
nated.
Specifying the pattern's length
By default, patterns are passed to the compiling functions as zero-ter-
minated strings but can be passed by length instead of being zero-ter-
minated. The use_length modifier causes this to happen. Using a length
happens automatically (whether or not use_length is set) when hex is
set, because patterns specified in hexadecimal may contain binary
zeros.
If hex or use_length is used with the POSIX wrapper API (see "Using the
POSIX wrapper API" below), the REG_PEND extension is used to pass the
pattern's length.
Specifying wide characters in 16-bit and 32-bit modes
@ -742,7 +749,7 @@ PATTERN MODIFIERS
partial modifier in "Subject Modifiers" below for details of how these
options are specified for each match attempt.
JIT compilation is requested by the /jit pattern modifier, which may
JIT compilation is requested by the jit pattern modifier, which may
optionally be followed by an equals sign and a number in the range 0 to
7. The three bits that make up the number specify which of the three
JIT operating modes are to be compiled:
@ -766,7 +773,7 @@ PATTERN MODIFIERS
PCRE2_PARTIAL_HARD option set. Note that such a call may return a com-
plete match; the options enable the possibility of a partial match, but
do not require it. Note also that if you request JIT compilation only
for partial matching (for example, /jit=2) but do not set the partial
for partial matching (for example, jit=2) but do not set the partial
modifier on a subject line, that match will not use JIT code because
none was compiled for non-partial matching.
@ -833,7 +840,7 @@ PATTERN MODIFIERS
Using the POSIX wrapper API
The /posix and posix_nosub modifiers cause pcre2test to call PCRE2 via
The posix and posix_nosub modifiers cause pcre2test to call PCRE2 via
the POSIX wrapper API rather than its native API. When posix_nosub is
used, the POSIX option REG_NOSUB is passed to regcomp(). The POSIX
wrapper supports only the 8-bit library. Note that it does not imply
@ -862,6 +869,10 @@ PATTERN MODIFIERS
below. All other modifiers are either ignored, with a warning message,
or cause an error.
The pattern is passed to regcomp() as a zero-terminated string by
default, but if the use_length or hex modifiers are set, the REG_PEND
extension is used to pass it by length.
Testing the stack guard feature
The stackguard modifier is used to test the use of pcre2_set_com-
@ -894,16 +905,18 @@ PATTERN MODIFIERS
Setting certain match controls
The following modifiers are really subject modifiers, and are described
below. However, they may be included in a pattern's modifier list, in
which case they are applied to every subject line that is processed
with that pattern. They may not appear in #pattern commands. These mod-
ifiers do not affect the compilation process.
under "Subject Modifiers" below. However, they may be included in a
pattern's modifier list, in which case they are applied to every sub-
ject line that is processed with that pattern. They may not appear in
#pattern commands. These modifiers do not affect the compilation
process.
aftertext show text after match
allaftertext show text after captures
allcaptures show all captures
allusedtext show all consulted text
/g global global matching
jitstack=<n> set size of JIT stack
mark show mark values
replace=<string> specify a replacement string
startchar show starting character when relevant
@ -915,6 +928,14 @@ PATTERN MODIFIERS
These modifiers may not appear in a #pattern command. If you want them
as defaults, set them in a #subject command.
Specifying literal subject lines
If the subject_literal modifier is present on a pattern, all the sub-
ject lines that it matches are taken as literal strings, with no inter-
pretation of backslashes. It is not possible to set subject modifiers
on such lines, but any that are set as defaults by a #subject command
are recognized.
Saving a compiled pattern
When a pattern with the push modifier is successfully compiled, it is
@ -959,11 +980,11 @@ SUBJECT MODIFIERS
The partial matching modifiers are provided with abbreviations because
they appear frequently in tests.
If the posix modifier was present on the pattern, causing the POSIX
wrapper API to be used, the only option-setting modifiers that have any
effect are notbol, notempty, and noteol, causing REG_NOTBOL,
REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to regexec().
The other modifiers are ignored, with a warning message.
If the posix or posix_nosub modifier was present on the pattern, caus-
ing the POSIX wrapper API to be used, the only option-setting modifiers
that have any effect are notbol, notempty, and noteol, causing REG_NOT-
BOL, REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to
regexec(). The other modifiers are ignored, with a warning message.
There is one additional modifier that can be used with the POSIX wrap-
per. It is ignored (with a warning) if used for non-POSIX matching.
@ -971,10 +992,13 @@ SUBJECT MODIFIERS
posix_startend=<n>[:<m>]
This causes the subject string to be passed to regexec() using the
REG_STARTEND option, which uses offsets to restrict which part of the
REG_STARTEND option, which uses offsets to specify which part of the
string is searched. If only one number is given, the end offset is
passed as the end of the subject string. For more detail of REG_STAR-
TEND, see the pcre2posix documentation.
TEND, see the pcre2posix documentation. If the subject string contains
binary zeros (coded as escapes such as \x{00} because pcre2test does
not support actual binary zeros in its input), you must use posix_star-
tend to specify its length.
Setting match controls
@ -1222,8 +1246,10 @@ SUBJECT MODIFIERS
The jitstack modifier provides a way of setting the maximum stack size
that is used by the just-in-time optimization code. It is ignored if
JIT optimization is not being used. The value is a number of kilobytes.
Providing a stack that is larger than the default 32K is necessary only
for very complicated patterns.
Setting zero reverts to the default of 32K. Providing a stack that is
larger than the default is necessary only for very complicated pat-
terns. If jitstack is set non-zero on a subject line it overrides any
value that was set on the pattern.
Setting heap, match, and depth limits
@ -1310,9 +1336,8 @@ SUBJECT MODIFIERS
By default, the subject string is passed to a native API matching func-
tion with its correct length. In order to test the facility for passing
a zero-terminated string, the zero_terminate modifier is provided. It
causes the length to be passed as PCRE2_ZERO_TERMINATED. (When matching
via the POSIX interface, this modifier has no effect, as there is no
facility for passing a length.)
causes the length to be passed as PCRE2_ZERO_TERMINATED. When matching
via the POSIX interface, this modifier is ignored, with a warning.
When testing pcre2_substitute(), this modifier also has the effect of
passing the replacement string as zero-terminated.
@ -1513,8 +1538,8 @@ CALLOUTS
position, which can happen if the callout is in a lookbehind assertion.
Callouts numbered 255 are assumed to be automatic callouts, inserted as
a result of the /auto_callout pattern modifier. In this case, instead
of showing the callout number, the offset in the pattern, preceded by a
a result of the auto_callout pattern modifier. In this case, instead of
showing the callout number, the offset in the pattern, preceded by a
plus, is output. For example:
re> /\d?[A-E]\*/auto_callout
@ -1662,5 +1687,5 @@ AUTHOR
REVISION
Last updated: 03 June 2017
Last updated: 16 June 2017
Copyright (c) 1997-2017 University of Cambridge.