Documentation update.
parent a89423624d
commit c6ee84317d
@@ -3525,9 +3525,10 @@ first match attempt, the second attempt would start at the second character
 instead of skipping on to "c".
 </P>
 <P>
-If (*SKIP) is used inside a lookbehind to specify a new starting position that
-is not later than the starting point of the current match, the position
-specified by (*SKIP) is ignored, and instead the normal "bumpalong" occurs.
+If (*SKIP) is used to specify a new starting position that is the same as the
+starting position of the current match, or (by being inside a lookbehind)
+earlier, the position specified by (*SKIP) is ignored, and instead the normal
+"bumpalong" occurs.
 <pre>
 (*SKIP:NAME)
 </pre>
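The revised (*SKIP) paragraph above amounts to a small decision rule. A minimal Python sketch of it follows. The function name and the integer-offset model are assumptions made for illustration and are not part of PCRE2, and the "bumpalong" is simplified to a one-position advance (in UTF mode or with CRLF newlines the real advance can be longer).

```python
def next_start_after_skip(match_start, skip_position):
    """Return the offset where the next match attempt would begin.

    match_start   -- starting offset of the current (failed) match attempt
    skip_position -- offset recorded when (*SKIP) was passed
    Both names are hypothetical; PCRE2 does not expose this function.
    """
    if skip_position > match_start:
        # (*SKIP) points later in the subject: honour it.
        return skip_position
    # Same position, or earlier (possible inside a lookbehind):
    # ignore (*SKIP) and do the normal bumpalong.
    return match_start + 1


if __name__ == "__main__":
    print(next_start_after_skip(3, 7))  # later position is used
    print(next_start_after_skip(3, 3))  # same position: bumpalong
    print(next_start_after_skip(3, 1))  # earlier position: bumpalong
```

The second case is the one the new wording adds: a (*SKIP) position equal to the current start is now documented as ignored, not only positions reached through a lookbehind.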
@@ -3754,7 +3755,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC31" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 21 June 2019
+Last updated: 22 June 2019
 <br>
 Copyright © 1997-2019 University of Cambridge.
 <br>
@@ -16,8 +16,8 @@ DESCRIPTION
 
 pcre2-config returns the configuration of the installed PCRE2 libraries
 and the options required to compile a program to use them. Some of the
-options apply only to the 8-bit, or 16-bit, or 32-bit libraries,
-respectively, and are not available for libraries that have not been
+options apply only to the 8-bit, or 16-bit, or 32-bit libraries, re-
+spectively, and are not available for libraries that have not been
 built. If an unavailable option is encountered, the "usage" information
 is output.
 
@@ -36,30 +36,30 @@ OPTIONS
 --version Writes the version number of the installed PCRE2 libraries to
 the standard output.
 
---libs8 Writes to the standard output the command line options
-required to link with the 8-bit PCRE2 library (-lpcre2-8 on
+--libs8 Writes to the standard output the command line options re-
+quired to link with the 8-bit PCRE2 library (-lpcre2-8 on
 many systems).
 
---libs16 Writes to the standard output the command line options
-required to link with the 16-bit PCRE2 library (-lpcre2-16 on
+--libs16 Writes to the standard output the command line options re-
+quired to link with the 16-bit PCRE2 library (-lpcre2-16 on
 many systems).
 
---libs32 Writes to the standard output the command line options
-required to link with the 32-bit PCRE2 library (-lpcre2-32 on
+--libs32 Writes to the standard output the command line options re-
+quired to link with the 32-bit PCRE2 library (-lpcre2-32 on
 many systems).
 
 --libs-posix
-Writes to the standard output the command line options
-required to link with PCRE2's POSIX API wrapper library
+Writes to the standard output the command line options re-
+quired to link with PCRE2's POSIX API wrapper library
 (-lpcre2-posix -lpcre2-8 on many systems).
 
---cflags Writes to the standard output the command line options
-required to compile files that use PCRE2 (this may include
-some -I options, but is blank on many systems).
+--cflags Writes to the standard output the command line options re-
+quired to compile files that use PCRE2 (this may include some
+-I options, but is blank on many systems).
 
 --cflags-posix
-Writes to the standard output the command line options
-required to compile files that use PCRE2's POSIX API wrapper
+Writes to the standard output the command line options re-
+quired to compile files that use PCRE2's POSIX API wrapper
 library (this may include some -I options, but is blank on
 many systems).
 
4151
doc/pcre2.txt
File diff suppressed because it is too large
@@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "21 June 2019" "PCRE2 10.34"
+.TH PCRE2PATTERN 3 "22 June 2019" "PCRE2 10.34"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@@ -3564,9 +3564,10 @@ effect as this example; although it would suppress backtracking during the
 first match attempt, the second attempt would start at the second character
 instead of skipping on to "c".
 .P
-If (*SKIP) is used inside a lookbehind to specify a new starting position that
-is not later than the starting point of the current match, the position
-specified by (*SKIP) is ignored, and instead the normal "bumpalong" occurs.
+If (*SKIP) is used to specify a new starting position that is the same as the
+starting position of the current match, or (by being inside a lookbehind)
+earlier, the position specified by (*SKIP) is ignored, and instead the normal
+"bumpalong" occurs.
 .sp
 (*SKIP:NAME)
 .sp
@@ -3787,6 +3788,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 21 June 2019
+Last updated: 22 June 2019
 Copyright (c) 1997-2019 University of Cambridge.
 .fi
@@ -13,8 +13,8 @@ SYNOPSIS
 but it can also be used for experimenting with regular expressions.
 This document describes the features of the test program; for details
 of the regular expressions themselves, see the pcre2pattern documenta-
-tion. For details of the PCRE2 library function calls and their
-options, see the pcre2api documentation.
+tion. For details of the PCRE2 library function calls and their op-
+tions, see the pcre2api documentation.
 
 The input for pcre2test is a sequence of regular expression patterns
 and subject strings to be matched. There are also command lines for
@@ -33,26 +33,26 @@ SYNOPSIS
 which are specifically designed for use in conjunction with the test
 script and data files that are distributed as part of PCRE2. All the
 modifiers are documented here, some without much justification, but
-many of them are unlikely to be of use except when testing the
-libraries.
+many of them are unlikely to be of use except when testing the li-
+braries.
 
 
 PCRE2's 8-BIT, 16-BIT AND 32-BIT LIBRARIES
 
 Different versions of the PCRE2 library can be built to support charac-
 ter strings that are encoded in 8-bit, 16-bit, or 32-bit code units.
-One, two, or all three of these libraries may be simultaneously
-installed. The pcre2test program can be used to test all the libraries.
+One, two, or all three of these libraries may be simultaneously in-
+stalled. The pcre2test program can be used to test all the libraries.
 However, its own input and output are always in 8-bit format. When
 testing the 16-bit or 32-bit libraries, patterns and subject strings
-are converted to 16-bit or 32-bit format before being passed to the
-library functions. Results are converted back to 8-bit code units for
+are converted to 16-bit or 32-bit format before being passed to the li-
+brary functions. Results are converted back to 8-bit code units for
 output.
 
 In the rest of this document, the names of library functions and struc-
-tures are given in generic form, for example, pcre_compile(). The
-actual names used in the libraries have a suffix _8, _16, or _32, as
-appropriate.
+tures are given in generic form, for example, pcre_compile(). The ac-
+tual names used in the libraries have a suffix _8, _16, or _32, as ap-
+propriate.
 
 
 INPUT ENCODING
@@ -70,18 +70,18 @@ INPUT ENCODING
 processed for backslash escapes, which makes it possible to include any
 data value in strings that are passed to the library for matching. For
 patterns, there is a facility for specifying some or all of the 8-bit
-input characters as hexadecimal pairs, which makes it possible to
-include binary zeros.
+input characters as hexadecimal pairs, which makes it possible to in-
+clude binary zeros.
 
 Input for the 16-bit and 32-bit libraries
 
 When testing the 16-bit or 32-bit libraries, there is a need to be able
 to generate character code points greater than 255 in the strings that
 are passed to the library. For subject lines, backslash escapes can be
-used. In addition, when the utf modifier (see "Setting compilation
-options" below) is set, the pattern and any following subject lines are
-interpreted as UTF-8 strings and translated to UTF-16 or UTF-32 as
-appropriate.
+used. In addition, when the utf modifier (see "Setting compilation op-
+tions" below) is set, the pattern and any following subject lines are
+interpreted as UTF-8 strings and translated to UTF-16 or UTF-32 as ap-
+propriate.
 
 For non-UTF testing of wide characters, the utf8_input modifier can be
 used. This is mutually exclusive with utf, and is allowed only in
@@ -121,8 +121,8 @@ COMMAND LINE OPTIONS
 piled.
 
 -AC As for -ac, but in addition behave as if each subject line
-has the callout_extra modifier, that is, show additional
-information from callouts.
+has the callout_extra modifier, that is, show additional in-
+formation from callouts.
 
 -b Behave as if each pattern has the fullbincode modifier; the
 full internal binary form of the pattern is output after com-
@@ -130,9 +130,9 @@ COMMAND LINE OPTIONS
 
 -C Output the version number of the PCRE2 library, and all
 available information about the optional features that are
-included, and then exit with zero exit code. All other
-options are ignored. If both -C and -LM are present, which-
-ever is first is recognized.
+included, and then exit with zero exit code. All other op-
+tions are ignored. If both -C and -LM are present, whichever
+is first is recognized.
 
 -C option Output information about a specific build-time option, then
 exit. This functionality is intended for use in scripts such
@@ -269,8 +269,8 @@ DESCRIPTION
 supply them explicitly.
 
 An empty line or the end of the file signals the end of the subject
-lines for a test, at which point a new pattern or command line is
-expected if there is still input to be read.
+lines for a test, at which point a new pattern or command line is ex-
+pected if there is still input to be read.
 
 
 COMMAND LINES
@@ -311,8 +311,8 @@ COMMAND LINES
 as indicating a newline in a pattern or subject string. The default can
 be overridden when a pattern is compiled. The standard test files con-
 tain tests of various newline conventions, but the majority of the
-tests expect a single linefeed to be recognized as a newline by
-default. Without special action the tests would fail when PCRE2 is com-
+tests expect a single linefeed to be recognized as a newline by de-
+fault. Without special action the tests would fail when PCRE2 is com-
 piled with either CR or CRLF as the default newline.
 
 The #newline_default command specifies a list of newline types that are
@@ -323,14 +323,14 @@ COMMAND LINES
 
 If the default newline is in the list, this command has no effect. Oth-
 erwise, except when testing the POSIX API, a newline modifier that
-specifies the first newline convention in the list (LF in the above
-example) is added to any pattern that does not already have a newline
+specifies the first newline convention in the list (LF in the above ex-
+ample) is added to any pattern that does not already have a newline
 modifier. If the newline list is empty, the feature is turned off. This
 command is present in a number of the standard test input files.
 
-When the POSIX API is being tested there is no way to override the
-default newline convention, though it is possible to set the newline
-convention from within the pattern. A warning is given if the posix or
+When the POSIX API is being tested there is no way to override the de-
+fault newline convention, though it is possible to set the newline con-
+vention from within the pattern. A warning is given if the posix or
 posix_nosub modifier is used when #newline_default would set a default
 for the non-POSIX API.
 
@@ -344,8 +344,8 @@ COMMAND LINES
 The appearance of this line causes all subsequent modifier settings to
 be checked for compatibility with the perltest.sh script, which is used
 to confirm that Perl gives the same results as PCRE2. Also, apart from
-comment lines, #pattern commands, and #subject commands that set or
-unset "mark", no command lines are permitted, because they and many of
+comment lines, #pattern commands, and #subject commands that set or un-
+set "mark", no command lines are permitted, because they and many of
 the modifiers are specific to pcre2test, and should not be used in test
 files that are also processed by perltest.sh. The #perltest command
 helps detect tests that are accidentally put in the wrong file.
@@ -376,8 +376,8 @@ MODIFIER SYNTAX
 list are separated by commas followed by optional white space. Trailing
 whitespace in a modifier list is ignored. Some modifiers may be given
 for both patterns and subject lines, whereas others are valid only for
-one or the other. Each modifier has a long name, for example
-"anchored", and some of them must be followed by an equals sign and a
+one or the other. Each modifier has a long name, for example "an-
+chored", and some of them must be followed by an equals sign and a
 value, for example, "offset=12". Values cannot contain comma charac-
 ters, but may contain spaces. Modifiers that do not take values may be
 preceded by a minus sign to turn off a previous setting.
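The modifier-list rules quoted in this hunk (comma separators, optional white space after a comma, name=value pairs, and a leading minus sign to unset) can be illustrated with a rough Python sketch. It is not how pcre2test parses its input: the function name and the dictionary result are assumptions for illustration, and single-letter abbreviations, validation of modifier names, and error reporting are all left out.

```python
def parse_modifier_list(text):
    """Parse a pcre2test-style modifier list into a dict (illustrative only).

    Valueless modifiers map to True, "-name" maps to False, and
    "name=value" maps to the value string, which may contain spaces
    but never a comma.
    """
    settings = {}
    # Trailing whitespace in a modifier list is ignored.
    for item in text.rstrip().split(","):
        item = item.lstrip()  # optional white space after each comma
        if not item:
            continue  # assumption: silently skip empty items
        if "=" in item:
            name, value = item.split("=", 1)
            settings[name] = value
        elif item.startswith("-"):
            settings[item[1:]] = False  # turn off a previous setting
        else:
            settings[item] = True
    return settings


if __name__ == "__main__":
    print(parse_modifier_list("anchored, offset=12,-anchored "))
```

Later items override earlier ones here, which mirrors the idea that a minus sign turns off "a previous setting".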
@@ -498,8 +498,8 @@ SUBJECT LINE SYNTAX
 \= This is a comment.
 abc\= This is an invalid modifier list.
 
-A backslash followed by any other non-alphanumeric character just
-escapes that character. A backslash followed by anything else causes an
+A backslash followed by any other non-alphanumeric character just es-
+capes that character. A backslash followed by anything else causes an
 error. However, if the very last character in the line is a backslash
 (and there is no modifier list), it is ignored. This gives a way of
 passing an empty line as data, since a real empty line terminates the
@@ -523,13 +523,13 @@ PATTERN MODIFIERS
 The following modifiers set options for pcre2_compile(). Most of them
 set bits in the options argument of that function, but those whose
 names start with PCRE2_EXTRA are additional options that are set in the
-compile context. For the main options, there are some single-letter
-abbreviations that are the same as Perl options. There is special han-
+compile context. For the main options, there are some single-letter ab-
+breviations that are the same as Perl options. There is special han-
 dling for /x: if a second x is present, PCRE2_EXTENDED is converted
-into PCRE2_EXTENDED_MORE as in Perl. A third appearance adds
-PCRE2_EXTENDED as well, though this makes no difference to the way
-pcre2_compile() behaves. See pcre2api for a description of the effects
-of these options.
+into PCRE2_EXTENDED_MORE as in Perl. A third appearance adds PCRE2_EX-
+TENDED as well, though this makes no difference to the way pcre2_com-
+pile() behaves. See pcre2api for a description of the effects of these
+options.
 
 allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS
 allow_surrogate_escapes set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
@@ -577,9 +577,9 @@ PATTERN MODIFIERS
 
 Setting compilation controls
 
-The following modifiers affect the compilation process or request
-information about the pattern. There are single-letter abbreviations
-for some that are heavily used in the test files.
+The following modifiers affect the compilation process or request in-
+formation about the pattern. There are single-letter abbreviations for
+some that are heavily used in the test files.
 
 bsr=[anycrlf|unicode] specify \R handling
 /B bincode show binary code without lengths
@@ -717,8 +717,8 @@ PATTERN MODIFIERS
 minated strings but can be passed by length instead of being zero-ter-
 minated. The use_length modifier causes this to happen. Using a length
 happens automatically (whether or not use_length is set) when hex is
-set, because patterns specified in hexadecimal may contain binary
-zeros.
+set, because patterns specified in hexadecimal may contain binary ze-
+ros.
 
 If hex or use_length is used with the POSIX wrapper API (see "Using the
 POSIX wrapper API" below), the REG_PEND extension is used to pass the
@@ -770,8 +770,8 @@ PATTERN MODIFIERS
 partial modifier in "Subject Modifiers" below for details of how these
 options are specified for each match attempt.
 
-JIT compilation is requested by the jit pattern modifier, which may
-optionally be followed by an equals sign and a number in the range 0 to
+JIT compilation is requested by the jit pattern modifier, which may op-
+tionally be followed by an equals sign and a number in the range 0 to
 7. The three bits that make up the number specify which of the three
 JIT operating modes are to be compiled:
 
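The jit=number description above treats the value as a three-bit mask. A small Python sketch of decoding such a mask is given below. The bit assignments (1 for non-partial, 2 for soft partial, 4 for hard partial matching) are not shown in this hunk and are taken here as an assumption based on the list that follows this paragraph in the pcre2test documentation; the function and table names are hypothetical.

```python
# Assumed bit meanings; the hunk above does not list them.
JIT_MODE_BITS = {
    1: "non-partial matching",
    2: "soft partial matching",
    4: "hard partial matching",
}


def jit_modes(value):
    """Return the JIT operating modes selected by a jit=<value> modifier."""
    if not 0 <= value <= 7:
        raise ValueError("jit value must be in the range 0 to 7")
    return [name for bit, name in JIT_MODE_BITS.items() if value & bit]


if __name__ == "__main__":
    for value in (0, 5, 7):
        print(value, jit_modes(value))
```

Under these assumptions, 5 selects the non-partial and hard partial modes, and 0 selects none.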
@@ -799,8 +799,8 @@ PATTERN MODIFIERS
 none was compiled for non-partial matching.
 
 If JIT compilation is successful, the compiled JIT code will automati-
-cally be used when an appropriate type of match is run, except when
-incompatible run-time options are specified. For more details, see the
+cally be used when an appropriate type of match is run, except when in-
+compatible run-time options are specified. For more details, see the
 pcre2jit documentation. See also the jitstack modifier below for a way
 of setting the size of the JIT stack.
 
@@ -847,8 +847,8 @@ PATTERN MODIFIERS
 Limiting nested parentheses
 
 The parens_nest_limit modifier sets a limit on the depth of nested
-parentheses in a pattern. Breaching the limit causes a compilation
-error. The default for the library is set when PCRE2 is built, but
+parentheses in a pattern. Breaching the limit causes a compilation er-
+ror. The default for the library is set when PCRE2 is built, but
 pcre2test sets its own default of 220, which is required for running
 the standard test suite.
 
@@ -886,13 +886,13 @@ PATTERN MODIFIERS
 buffer is too small for the error message. If this modifier has not
 been set, a large buffer is used.
 
-The aftertext and allaftertext subject modifiers work as described
-below. All other modifiers are either ignored, with a warning message,
-or cause an error.
+The aftertext and allaftertext subject modifiers work as described be-
+low. All other modifiers are either ignored, with a warning message, or
+cause an error.
 
-The pattern is passed to regcomp() as a zero-terminated string by
-default, but if the use_length or hex modifiers are set, the REG_PEND
-extension is used to pass it by length.
+The pattern is passed to regcomp() as a zero-terminated string by de-
+fault, but if the use_length or hex modifiers are set, the REG_PEND ex-
+tension is used to pass it by length.
 
 Testing the stack guard feature
 
@@ -920,8 +920,8 @@ PATTERN MODIFIERS
 2 a set of tables defining ISO 8859 characters
 
 In table 2, some characters whose codes are greater than 128 are iden-
-tified as letters, digits, spaces, etc. Setting alternate character
-tables and a locale are mutually exclusive.
+tified as letters, digits, spaces, etc. Setting alternate character ta-
+bles and a locale are mutually exclusive.
 
 Setting certain match controls
 
@@ -971,12 +971,12 @@ PATTERN MODIFIERS
 terns" below. If pushcopy is used instead of push, a copy of the com-
 piled pattern is stacked, leaving the original as current, ready to
 match the following input lines. This provides a way of testing the
-pcre2_code_copy() function. The push and pushcopy modifiers are
-incompatible with compilation modifiers such as global that act at
-match time. Any that are specified are ignored (for the stacked copy),
-with a warning message, except for replace, which causes an error. Note
-that jitverify, which is allowed, does not carry through to any subse-
-quent matching that uses a stacked pattern.
+pcre2_code_copy() function. The push and pushcopy modifiers are in-
+compatible with compilation modifiers such as global that act at match
+time. Any that are specified are ignored (for the stacked copy), with a
+warning message, except for replace, which causes an error. Note that
+jitverify, which is allowed, does not carry through to any subsequent
+matching that uses a stacked pattern.
 
 Testing foreign pattern conversion
 
@@ -1124,12 +1124,12 @@ SUBJECT MODIFIERS
 The allusedtext modifier requests that all the text that was consulted
 during a successful pattern match by the interpreter should be shown.
 This feature is not supported for JIT matching, and if requested with
-JIT it is ignored (with a warning message). Setting this modifier
-affects the output if there is a lookbehind at the start of a match, or
-a lookahead at the end, or if \K is used in the pattern. Characters
-that precede or follow the start and end of the actual match are indi-
-cated in the output by '<' or '>' characters underneath them. Here is
-an example:
+JIT it is ignored (with a warning message). Setting this modifier af-
+fects the output if there is a lookbehind at the start of a match, or a
+lookahead at the end, or if \K is used in the pattern. Characters that
+precede or follow the start and end of the actual match are indicated
+in the output by '<' or '>' characters underneath them. Here is an ex-
+ample:
 
 re> /(?<=pqr)abc(?=xyz)/
 data> 123pqrabcxyz456\=allusedtext
@@ -1145,8 +1145,8 @@ SUBJECT MODIFIERS
 string. The only time when this occurs is when \K has been processed as
 part of the match. In this situation, the output for the matched string
 is displayed from the starting character instead of from the match
-point, with circumflex characters under the earlier characters. For
-example:
+point, with circumflex characters under the earlier characters. For ex-
+ample:
 
 re> /abc\Kxyz/
 data> abcxyz\=startchar
@@ -1171,12 +1171,12 @@ SUBJECT MODIFIERS
 The allvector modifier requests that the entire ovector be shown, what-
 ever the outcome of the match. Compare allcaptures, which shows only up
 to the maximum number of capture groups for the pattern, and then only
-for a successful complete non-DFA match. This modifier, which acts
-after any match result, and also for DFA matching, provides a means of
+for a successful complete non-DFA match. This modifier, which acts af-
+ter any match result, and also for DFA matching, provides a means of
 checking that there are no unexpected modifications to ovector fields.
 Before each match attempt, the ovector is filled with a special value,
-and if this is found in both elements of a capturing pair,
-"<unchanged>" is output. After a successful match, this applies to all
+and if this is found in both elements of a capturing pair, "<un-
+changed>" is output. After a successful match, this applies to all
 groups after the maximum capture group for the pattern. In other cases
 it applies to the entire ovector. After a partial match, the first two
 elements are the only ones that should be set. After a DFA match, the
@@ -1207,12 +1207,12 @@ SUBJECT MODIFIERS
 If an empty string is matched, the next match is done with the
 PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search
 for another, non-empty, match at the same point in the subject. If this
-match fails, the start offset is advanced, and the normal match is
-retried. This imitates the way Perl handles such cases when using the
-/g modifier or the split() function. Normally, the start offset is
-advanced by one character, but if the newline convention recognizes
-CRLF as a newline, and the current character is CR followed by LF, an
-advance of two characters occurs.
+match fails, the start offset is advanced, and the normal match is re-
+tried. This imitates the way Perl handles such cases when using the /g
+modifier or the split() function. Normally, the start offset is ad-
+vanced by one character, but if the newline convention recognizes CRLF
+as a newline, and the current character is CR followed by LF, an ad-
+vance of two characters occurs.
 
 Testing substring extraction functions
 
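The offset-advance rule in the paragraph above (one character normally, two when the newline convention recognizes CRLF and the current position holds CR followed by LF) can be sketched in Python as follows. This models only the advance step after the anchored, non-empty retry has failed. The function and parameter names are assumptions for illustration, and a Python string index stands in for a character offset.

```python
def advance_after_failed_retry(subject, offset, crlf_is_newline):
    """Return the next start offset after an empty match whose
    anchored non-empty retry at the same point also failed."""
    if crlf_is_newline and subject.startswith("\r\n", offset):
        # Do not restart between the CR and the LF of one newline.
        return offset + 2
    return offset + 1


if __name__ == "__main__":
    print(advance_after_failed_retry("a\r\nb", 1, True))
    print(advance_after_failed_retry("a\r\nb", 1, False))
```

Stepping over CRLF as a unit avoids starting a match attempt in the middle of a two-character newline.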
@@ -1275,8 +1275,8 @@ SUBJECT MODIFIERS
 than 256 characters) for substitution tests, as fixed-size buffers are
 used. To make it easy to test for buffer overflow, if the replacement
 string starts with a number in square brackets, that number is passed
-to pcre2_substitute() as the size of the output buffer, with the
-replacement string starting at the next character. Here is an example
+to pcre2_substitute() as the size of the output buffer, with the re-
+placement string starting at the next character. Here is an example
 that tests the edge case:
 
 /abc/
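The bracketed size prefix on a replacement string, as described in this hunk, can be separated from the replacement text with a few lines of Python. This is a sketch of the documented convention only, not pcre2test's own parsing code; the function name is hypothetical, and returning None when no prefix is present is an assumption.

```python
import re


def split_replacement(value):
    """Split "[N]text" into (N, "text"); return (None, value) if no prefix."""
    match = re.match(r"\[(\d+)\]", value)
    if match:
        # The replacement string starts at the character after "]".
        return int(match.group(1)), value[match.end():]
    return None, value


if __name__ == "__main__":
    print(split_replacement("[9]XYZ"))
    print(split_replacement("XYZ"))
```

For the `replace=[9]XYZ` example that follows in the documentation, the substituted result "123XYZ123" is nine characters long, which presumably leaves no room for a terminating zero in a nine-unit buffer and so triggers the "no more memory" error.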
@@ -1285,10 +1285,10 @@ SUBJECT MODIFIERS
 123abc123\=replace=[9]XYZ
 Failed: error -47: no more memory
 
-The default action of pcre2_substitute() is to return
-PCRE2_ERROR_NOMEMORY when the output buffer is too small. However, if
-the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the sub-
-stitute_overflow_length modifier), pcre2_substitute() continues to go
+The default action of pcre2_substitute() is to return PCRE2_ER-
+ROR_NOMEMORY when the output buffer is too small. However, if the
+PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the substi-
+tute_overflow_length modifier), pcre2_substitute() continues to go
 through the motions of matching and substituting (but not doing any
 callouts), in order to compute the size of buffer that is required.
 When this happens, pcre2test shows the required buffer length (which
@@ -1323,8 +1323,8 @@ SUBJECT MODIFIERS
 Then are listed the offsets of the old substring, its contents, and the
 same for the replacement.
 
-By default, the substitution callout function returns zero, which
-accepts the replacement and causes matching to continue if /g was used.
+By default, the substitution callout function returns zero, which ac-
+cepts the replacement and causes matching to continue if /g was used.
 Two further modifiers can be used to test other return values. If sub-
 stitute_skip is set to a value greater than zero the callout function
 returns +1 for the match of that number, and similarly substitute_stop
@@ -1411,8 +1411,8 @@ SUBJECT MODIFIERS
 
 The memory modifier causes pcre2test to log the sizes of all heap mem-
 ory allocation and freeing calls that occur during a call to
-pcre2_match() or pcre2_dfa_match(). These occur only when a match
-requires a bigger vector than the default for remembering backtracking
+pcre2_match() or pcre2_dfa_match(). These occur only when a match re-
+quires a bigger vector than the default for remembering backtracking
 points (pcre2_match()) or for internal workspace (pcre2_dfa_match()).
 In many cases there will be no heap memory used and therefore no addi-
 tional output. No heap memory is allocated during matching with JIT, so
@@ -1435,9 +1435,9 @@ SUBJECT MODIFIERS
 
 Setting the size of the output vector
 
-The ovector modifier applies only to the subject line in which it
-appears, though of course it can also be used to set a default in a
-#subject command. It specifies the number of pairs of offsets that are
+The ovector modifier applies only to the subject line in which it ap-
+pears, though of course it can also be used to set a default in a #sub-
+ject command. It specifies the number of pairs of offsets that are
 available for storing matching information. The default is 15.
 
 A value of zero is useful when testing the POSIX API because it causes
@@ -1491,12 +1491,12 @@ DEFAULT OUTPUT FROM pcre2test
 
 When a match succeeds, pcre2test outputs the list of captured sub-
 strings, starting with number 0 for the string that matched the whole
-pattern. Otherwise, it outputs "No match" when the return is
-PCRE2_ERROR_NOMATCH, or "Partial match:" followed by the partially
-matching substring when the return is PCRE2_ERROR_PARTIAL. (Note that
-this is the entire substring that was inspected during the partial
-match; it may include characters before the actual match start if a
-lookbehind assertion, \K, \b, or \B was involved.)
+pattern. Otherwise, it outputs "No match" when the return is PCRE2_ER-
+ROR_NOMATCH, or "Partial match:" followed by the partially matching
+substring when the return is PCRE2_ERROR_PARTIAL. (Note that this is
+the entire substring that was inspected during the partial match; it
+may include characters before the actual match start if a lookbehind
+assertion, \K, \b, or \B was involved.)
 
 For any other return, pcre2test outputs the PCRE2 negative error number
 and a short descriptive phrase. If the error is a failed UTF string
@@ -1541,8 +1541,8 @@ DEFAULT OUTPUT FROM pcre2test
 0: cat
 0+ aract
 
-If global matching is requested, the results of successive matching
-attempts are output in sequence, like this:
+If global matching is requested, the results of successive matching at-
+tempts are output in sequence, like this:
 
 re> /\Bi(\w\w)/g
 data> Mississippi
@@ -1580,12 +1580,12 @@ OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
 2: tan
 
 Using the normal matching function on this data finds only "tang". The
-longest matching string is always given first (and numbered zero).
-After a PCRE2_ERROR_PARTIAL return, the output is "Partial match:",
-followed by the partially matching substring. Note that this is the
-entire substring that was inspected during the partial match; it may
-include characters before the actual match start if a lookbehind asser-
-tion, \b, or \B was involved. (\K is not supported for DFA matching.)
+longest matching string is always given first (and numbered zero). Af-
+ter a PCRE2_ERROR_PARTIAL return, the output is "Partial match:", fol-
+lowed by the partially matching substring. Note that this is the entire
+substring that was inspected during the partial match; it may include
+characters before the actual match start if a lookbehind assertion, \b,
+or \B was involved. (\K is not supported for DFA matching.)
 
 If global matching is requested, the search for further matches resumes
 at the end of the longest match. For example:
@@ -1638,12 +1638,12 @@ CALLOUTS
 --->pqrabcdef
 0 ^ ^ \d

-This output indicates that callout number 0 occurred for a match
-attempt starting at the fourth character of the subject string, when
-the pointer was at the seventh character, and when the next pattern
-item was \d. Just one circumflex is output if the start and current
-positions are the same, or if the current position precedes the start
-position, which can happen if the callout is in a lookbehind assertion.
+This output indicates that callout number 0 occurred for a match at-
+tempt starting at the fourth character of the subject string, when the
+pointer was at the seventh character, and when the next pattern item
+was \d. Just one circumflex is output if the start and current posi-
+tions are the same, or if the current position precedes the start posi-
+tion, which can happen if the callout is in a lookbehind assertion.

 Callouts numbered 255 are assumed to be automatic callouts, inserted as
 a result of the auto_callout pattern modifier. In this case, instead of
@@ -1660,8 +1660,8 @@ CALLOUTS
 0: E*

 If a pattern contains (*MARK) items, an additional line is output when-
-ever a change of latest mark is passed to the callout function. For
-example:
+ever a change of latest mark is passed to the callout function. For ex-
+ample:

 re> /a(*MARK:X)bc/auto_callout
 data> abc
@@ -1683,8 +1683,8 @@ CALLOUTS

 The output for a callout with a string argument is similar, except that
 instead of outputting a callout number before the position indicators,
-the callout string and its offset in the pattern string are output
-before the reflection of the subject string, and the subject string is
+the callout string and its offset in the pattern string are output be-
+fore the reflection of the subject string, and the subject string is
 reflected for each callout. For example:

 re> /^ab(?C'first')cd(?C"second")ef/
@@ -1800,9 +1800,9 @@ NON-PRINTING CHARACTERS

 When pcre2test is outputting text that is a matched part of a subject
 string, it behaves in the same way, unless a different locale has been
-set for the pattern (using the locale modifier). In this case, the
-isprint() function is used to distinguish printing and non-printing
-characters.
+set for the pattern (using the locale modifier). In this case, the is-
+print() function is used to distinguish printing and non-printing char-
+acters.


 SAVING AND RESTORING COMPILED PATTERNS
@@ -1814,14 +1814,14 @@ SAVING AND RESTORING COMPILED PATTERNS
 have the same endianness, pointer width and PCRE2_SIZE type. Before
 compiled patterns can be saved they must be serialized, that is, con-
 verted to a stream of bytes. A single byte stream may contain any num-
-ber of compiled patterns, but they must all use the same character
-tables. A single copy of the tables is included in the byte stream (its
+ber of compiled patterns, but they must all use the same character ta-
+bles. A single copy of the tables is included in the byte stream (its
 size is 1088 bytes).

-The functions whose names begin with pcre2_serialize_ are used for
-serializing and de-serializing. They are described in the pcre2serial-
-ize documentation. In this section we describe the features of
-pcre2test that can be used to test these functions.
+The functions whose names begin with pcre2_serialize_ are used for se-
+rializing and de-serializing. They are described in the pcre2serialize
+documentation. In this section we describe the features of pcre2test
+that can be used to test these functions.

 Note that "serialization" in PCRE2 does not convert compiled patterns
 to an abstract format like Java or .NET. It just makes a reloadable
@@ -1831,8 +1831,8 @@ SAVING AND RESTORING COMPILED PATTERNS
 piled, it is pushed onto a stack of compiled patterns, and pcre2test
 expects the next line to contain a new pattern (or command) instead of
 a subject line. By contrast, the pushcopy modifier causes a copy of the
-compiled pattern to be stacked, leaving the original available for
-immediate matching. By using push and/or pushcopy, a number of patterns
+compiled pattern to be stacked, leaving the original available for im-
+mediate matching. By using push and/or pushcopy, a number of patterns
 can be compiled and retained. These modifiers are incompatible with
 posix, and control modifiers that act at match time are ignored (with a
 message) for the stacked patterns. The jitverify modifier applies only
@@ -1855,8 +1855,8 @@ SAVING AND RESTORING COMPILED PATTERNS
 matched with the pattern, terminated as usual by an empty line or end
 of file. This command may be followed by a modifier list containing
 only control modifiers that act after a pattern has been compiled. In
-particular, hex, posix, posix_nosub, push, and pushcopy are not
-allowed, nor are any option-setting modifiers. The JIT modifiers are,
+particular, hex, posix, posix_nosub, push, and pushcopy are not al-
+lowed, nor are any option-setting modifiers. The JIT modifiers are,
 however permitted. Here is an example that saves and reloads two pat-
 terns.
