Documentation update.
parent a89423624d
commit c6ee84317d
|
@ -3525,9 +3525,10 @@ first match attempt, the second attempt would start at the second character
|
||||||
instead of skipping on to "c".
|
instead of skipping on to "c".
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
If (*SKIP) is used inside a lookbehind to specify a new starting position that
|
If (*SKIP) is used to specify a new starting position that is the same as the
|
||||||
is not later than the starting point of the current match, the position
|
starting position of the current match, or (by being inside a lookbehind)
|
||||||
specified by (*SKIP) is ignored, and instead the normal "bumpalong" occurs.
|
earlier, the position specified by (*SKIP) is ignored, and instead the normal
|
||||||
|
"bumpalong" occurs.
|
||||||
<pre>
|
<pre>
|
||||||
(*SKIP:NAME)
|
(*SKIP:NAME)
|
||||||
</pre>
|
</pre>
|
||||||
|
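The revised (*SKIP) wording in this hunk amounts to a small rule for choosing where the next match attempt starts. The sketch below is only an illustration of that rule, not PCRE2 code: the function name `next_start` and the use of plain integer offsets (counted in characters) are assumptions made for the example.

```python
from typing import Optional


def next_start(current_start: int, skip_pos: Optional[int]) -> int:
    """Return the offset at which the next match attempt would begin.

    current_start: offset where the failed attempt started.
    skip_pos: position requested by (*SKIP), or None if (*SKIP) was not reached.
    """
    # A (*SKIP) position is honoured only if it lies strictly after the
    # start of the current attempt.
    if skip_pos is not None and skip_pos > current_start:
        return skip_pos
    # Same position, or earlier (for example via a lookbehind): the request
    # is ignored and the normal "bumpalong" of one character happens instead.
    return current_start + 1
```

For instance, `next_start(3, 7)` moves straight to offset 7, while `next_start(3, 3)` and `next_start(3, 1)` both fall back to offset 4.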
@ -3754,7 +3755,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC31" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC31" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 21 June 2019
|
Last updated: 22 June 2019
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2019 University of Cambridge.
|
Copyright © 1997-2019 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@ -16,8 +16,8 @@ DESCRIPTION
|
||||||
|
|
||||||
pcre2-config returns the configuration of the installed PCRE2 libraries
|
pcre2-config returns the configuration of the installed PCRE2 libraries
|
||||||
and the options required to compile a program to use them. Some of the
|
and the options required to compile a program to use them. Some of the
|
||||||
options apply only to the 8-bit, or 16-bit, or 32-bit libraries,
|
options apply only to the 8-bit, or 16-bit, or 32-bit libraries, re-
|
||||||
respectively, and are not available for libraries that have not been
|
spectively, and are not available for libraries that have not been
|
||||||
built. If an unavailable option is encountered, the "usage" information
|
built. If an unavailable option is encountered, the "usage" information
|
||||||
is output.
|
is output.
|
||||||
|
|
||||||
|
@ -36,30 +36,30 @@ OPTIONS
|
||||||
--version Writes the version number of the installed PCRE2 libraries to
|
--version Writes the version number of the installed PCRE2 libraries to
|
||||||
the standard output.
|
the standard output.
|
||||||
|
|
||||||
--libs8 Writes to the standard output the command line options
|
--libs8 Writes to the standard output the command line options re-
|
||||||
required to link with the 8-bit PCRE2 library (-lpcre2-8 on
|
quired to link with the 8-bit PCRE2 library (-lpcre2-8 on
|
||||||
many systems).
|
many systems).
|
||||||
|
|
||||||
--libs16 Writes to the standard output the command line options
|
--libs16 Writes to the standard output the command line options re-
|
||||||
required to link with the 16-bit PCRE2 library (-lpcre2-16 on
|
quired to link with the 16-bit PCRE2 library (-lpcre2-16 on
|
||||||
many systems).
|
many systems).
|
||||||
|
|
||||||
--libs32 Writes to the standard output the command line options
|
--libs32 Writes to the standard output the command line options re-
|
||||||
required to link with the 32-bit PCRE2 library (-lpcre2-32 on
|
quired to link with the 32-bit PCRE2 library (-lpcre2-32 on
|
||||||
many systems).
|
many systems).
|
||||||
|
|
||||||
--libs-posix
|
--libs-posix
|
||||||
Writes to the standard output the command line options
|
Writes to the standard output the command line options re-
|
||||||
required to link with PCRE2's POSIX API wrapper library
|
quired to link with PCRE2's POSIX API wrapper library
|
||||||
(-lpcre2-posix -lpcre2-8 on many systems).
|
(-lpcre2-posix -lpcre2-8 on many systems).
|
||||||
|
|
||||||
--cflags Writes to the standard output the command line options
|
--cflags Writes to the standard output the command line options re-
|
||||||
required to compile files that use PCRE2 (this may include
|
quired to compile files that use PCRE2 (this may include some
|
||||||
some -I options, but is blank on many systems).
|
-I options, but is blank on many systems).
|
||||||
|
|
||||||
--cflags-posix
|
--cflags-posix
|
||||||
Writes to the standard output the command line options
|
Writes to the standard output the command line options re-
|
||||||
required to compile files that use PCRE2's POSIX API wrapper
|
quired to compile files that use PCRE2's POSIX API wrapper
|
||||||
library (this may include some -I options, but is blank on
|
library (this may include some -I options, but is blank on
|
||||||
many systems).
|
many systems).
|
||||||
|
|
||||||
|
|
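The options described in this hunk are typically combined into a single compile-and-link command. The following is a minimal sketch of how a build script might do that, assuming pcre2-config is on the PATH. The helper names `query_pcre2_config` and `build_compile_command`, the `cc` compiler name, and the file names are hypothetical; only the `--cflags` and `--libs8`/`--libs16`/`--libs32` options come from the text above.

```python
import shlex
import subprocess
from typing import Callable, List


def query_pcre2_config(option: str) -> List[str]:
    """Run pcre2-config with one option and split its output into arguments."""
    result = subprocess.run(
        ["pcre2-config", option], capture_output=True, text=True, check=True
    )
    return shlex.split(result.stdout)


def build_compile_command(
    source: str,
    output: str,
    width: int = 8,
    query: Callable[[str], List[str]] = query_pcre2_config,
    compiler: str = "cc",
) -> List[str]:
    """Assemble a compiler command line for a program that uses PCRE2."""
    if width not in (8, 16, 32):
        raise ValueError("width must be 8, 16, or 32")
    cflags = query("--cflags")        # may be empty on many systems
    libs = query(f"--libs{width}")    # for example -lpcre2-8 on many systems
    return [compiler, *cflags, source, "-o", output, *libs]
```

The `query` parameter exists so that the command can be assembled and inspected without pcre2-config being installed, for example by passing a function that returns canned values.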
4151
doc/pcre2.txt
File diff suppressed because it is too large
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2PATTERN 3 "21 June 2019" "PCRE2 10.34"
|
.TH PCRE2PATTERN 3 "22 June 2019" "PCRE2 10.34"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
|
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
|
||||||
|
@ -3564,9 +3564,10 @@ effect as this example; although it would suppress backtracking during the
|
||||||
first match attempt, the second attempt would start at the second character
|
first match attempt, the second attempt would start at the second character
|
||||||
instead of skipping on to "c".
|
instead of skipping on to "c".
|
||||||
.P
|
.P
|
||||||
If (*SKIP) is used inside a lookbehind to specify a new starting position that
|
If (*SKIP) is used to specify a new starting position that is the same as the
|
||||||
is not later than the starting point of the current match, the position
|
starting position of the current match, or (by being inside a lookbehind)
|
||||||
specified by (*SKIP) is ignored, and instead the normal "bumpalong" occurs.
|
earlier, the position specified by (*SKIP) is ignored, and instead the normal
|
||||||
|
"bumpalong" occurs.
|
||||||
.sp
|
.sp
|
||||||
(*SKIP:NAME)
|
(*SKIP:NAME)
|
||||||
.sp
|
.sp
|
||||||
|
@ -3787,6 +3788,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 21 June 2019
|
Last updated: 22 June 2019
|
||||||
Copyright (c) 1997-2019 University of Cambridge.
|
Copyright (c) 1997-2019 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -13,8 +13,8 @@ SYNOPSIS
|
||||||
but it can also be used for experimenting with regular expressions.
|
but it can also be used for experimenting with regular expressions.
|
||||||
This document describes the features of the test program; for details
|
This document describes the features of the test program; for details
|
||||||
of the regular expressions themselves, see the pcre2pattern documenta-
|
of the regular expressions themselves, see the pcre2pattern documenta-
|
||||||
tion. For details of the PCRE2 library function calls and their
|
tion. For details of the PCRE2 library function calls and their op-
|
||||||
options, see the pcre2api documentation.
|
tions, see the pcre2api documentation.
|
||||||
|
|
||||||
The input for pcre2test is a sequence of regular expression patterns
|
The input for pcre2test is a sequence of regular expression patterns
|
||||||
and subject strings to be matched. There are also command lines for
|
and subject strings to be matched. There are also command lines for
|
||||||
|
@ -33,26 +33,26 @@ SYNOPSIS
|
||||||
which are specifically designed for use in conjunction with the test
|
which are specifically designed for use in conjunction with the test
|
||||||
script and data files that are distributed as part of PCRE2. All the
|
script and data files that are distributed as part of PCRE2. All the
|
||||||
modifiers are documented here, some without much justification, but
|
modifiers are documented here, some without much justification, but
|
||||||
many of them are unlikely to be of use except when testing the
|
many of them are unlikely to be of use except when testing the li-
|
||||||
libraries.
|
braries.
|
||||||
|
|
||||||
|
|
||||||
PCRE2's 8-BIT, 16-BIT AND 32-BIT LIBRARIES
|
PCRE2's 8-BIT, 16-BIT AND 32-BIT LIBRARIES
|
||||||
|
|
||||||
Different versions of the PCRE2 library can be built to support charac-
|
Different versions of the PCRE2 library can be built to support charac-
|
||||||
ter strings that are encoded in 8-bit, 16-bit, or 32-bit code units.
|
ter strings that are encoded in 8-bit, 16-bit, or 32-bit code units.
|
||||||
One, two, or all three of these libraries may be simultaneously
|
One, two, or all three of these libraries may be simultaneously in-
|
||||||
installed. The pcre2test program can be used to test all the libraries.
|
stalled. The pcre2test program can be used to test all the libraries.
|
||||||
However, its own input and output are always in 8-bit format. When
|
However, its own input and output are always in 8-bit format. When
|
||||||
testing the 16-bit or 32-bit libraries, patterns and subject strings
|
testing the 16-bit or 32-bit libraries, patterns and subject strings
|
||||||
are converted to 16-bit or 32-bit format before being passed to the
|
are converted to 16-bit or 32-bit format before being passed to the li-
|
||||||
library functions. Results are converted back to 8-bit code units for
|
brary functions. Results are converted back to 8-bit code units for
|
||||||
output.
|
output.
|
||||||
|
|
||||||
In the rest of this document, the names of library functions and struc-
|
In the rest of this document, the names of library functions and struc-
|
||||||
tures are given in generic form, for example, pcre_compile(). The
|
tures are given in generic form, for example, pcre_compile(). The ac-
|
||||||
actual names used in the libraries have a suffix _8, _16, or _32, as
|
tual names used in the libraries have a suffix _8, _16, or _32, as ap-
|
||||||
appropriate.
|
propriate.
|
||||||
|
|
||||||
|
|
||||||
INPUT ENCODING
|
INPUT ENCODING
|
||||||
|
@ -70,18 +70,18 @@ INPUT ENCODING
|
||||||
processed for backslash escapes, which makes it possible to include any
|
processed for backslash escapes, which makes it possible to include any
|
||||||
data value in strings that are passed to the library for matching. For
|
data value in strings that are passed to the library for matching. For
|
||||||
patterns, there is a facility for specifying some or all of the 8-bit
|
patterns, there is a facility for specifying some or all of the 8-bit
|
||||||
input characters as hexadecimal pairs, which makes it possible to
|
input characters as hexadecimal pairs, which makes it possible to in-
|
||||||
include binary zeros.
|
clude binary zeros.
|
||||||
|
|
||||||
Input for the 16-bit and 32-bit libraries
|
Input for the 16-bit and 32-bit libraries
|
||||||
|
|
||||||
When testing the 16-bit or 32-bit libraries, there is a need to be able
|
When testing the 16-bit or 32-bit libraries, there is a need to be able
|
||||||
to generate character code points greater than 255 in the strings that
|
to generate character code points greater than 255 in the strings that
|
||||||
are passed to the library. For subject lines, backslash escapes can be
|
are passed to the library. For subject lines, backslash escapes can be
|
||||||
used. In addition, when the utf modifier (see "Setting compilation
|
used. In addition, when the utf modifier (see "Setting compilation op-
|
||||||
options" below) is set, the pattern and any following subject lines are
|
tions" below) is set, the pattern and any following subject lines are
|
||||||
interpreted as UTF-8 strings and translated to UTF-16 or UTF-32 as
|
interpreted as UTF-8 strings and translated to UTF-16 or UTF-32 as ap-
|
||||||
appropriate.
|
propriate.
|
||||||
|
|
||||||
For non-UTF testing of wide characters, the utf8_input modifier can be
|
For non-UTF testing of wide characters, the utf8_input modifier can be
|
||||||
used. This is mutually exclusive with utf, and is allowed only in
|
used. This is mutually exclusive with utf, and is allowed only in
|
||||||
|
@ -121,8 +121,8 @@ COMMAND LINE OPTIONS
|
||||||
piled.
|
piled.
|
||||||
|
|
||||||
-AC As for -ac, but in addition behave as if each subject line
|
-AC As for -ac, but in addition behave as if each subject line
|
||||||
has the callout_extra modifier, that is, show additional
|
has the callout_extra modifier, that is, show additional in-
|
||||||
information from callouts.
|
formation from callouts.
|
||||||
|
|
||||||
-b Behave as if each pattern has the fullbincode modifier; the
|
-b Behave as if each pattern has the fullbincode modifier; the
|
||||||
full internal binary form of the pattern is output after com-
|
full internal binary form of the pattern is output after com-
|
||||||
|
@ -130,9 +130,9 @@ COMMAND LINE OPTIONS
|
||||||
|
|
||||||
-C Output the version number of the PCRE2 library, and all
|
-C Output the version number of the PCRE2 library, and all
|
||||||
available information about the optional features that are
|
available information about the optional features that are
|
||||||
included, and then exit with zero exit code. All other
|
included, and then exit with zero exit code. All other op-
|
||||||
options are ignored. If both -C and -LM are present, which-
|
tions are ignored. If both -C and -LM are present, whichever
|
||||||
ever is first is recognized.
|
is first is recognized.
|
||||||
|
|
||||||
-C option Output information about a specific build-time option, then
|
-C option Output information about a specific build-time option, then
|
||||||
exit. This functionality is intended for use in scripts such
|
exit. This functionality is intended for use in scripts such
|
||||||
|
@ -269,8 +269,8 @@ DESCRIPTION
|
||||||
supply them explicitly.
|
supply them explicitly.
|
||||||
|
|
||||||
An empty line or the end of the file signals the end of the subject
|
An empty line or the end of the file signals the end of the subject
|
||||||
lines for a test, at which point a new pattern or command line is
|
lines for a test, at which point a new pattern or command line is ex-
|
||||||
expected if there is still input to be read.
|
pected if there is still input to be read.
|
||||||
|
|
||||||
|
|
||||||
COMMAND LINES
|
COMMAND LINES
|
||||||
|
@ -311,8 +311,8 @@ COMMAND LINES
|
||||||
as indicating a newline in a pattern or subject string. The default can
|
as indicating a newline in a pattern or subject string. The default can
|
||||||
be overridden when a pattern is compiled. The standard test files con-
|
be overridden when a pattern is compiled. The standard test files con-
|
||||||
tain tests of various newline conventions, but the majority of the
|
tain tests of various newline conventions, but the majority of the
|
||||||
tests expect a single linefeed to be recognized as a newline by
|
tests expect a single linefeed to be recognized as a newline by de-
|
||||||
default. Without special action the tests would fail when PCRE2 is com-
|
fault. Without special action the tests would fail when PCRE2 is com-
|
||||||
piled with either CR or CRLF as the default newline.
|
piled with either CR or CRLF as the default newline.
|
||||||
|
|
||||||
The #newline_default command specifies a list of newline types that are
|
The #newline_default command specifies a list of newline types that are
|
||||||
|
@ -323,14 +323,14 @@ COMMAND LINES
|
||||||
|
|
||||||
If the default newline is in the list, this command has no effect. Oth-
|
If the default newline is in the list, this command has no effect. Oth-
|
||||||
erwise, except when testing the POSIX API, a newline modifier that
|
erwise, except when testing the POSIX API, a newline modifier that
|
||||||
specifies the first newline convention in the list (LF in the above
|
specifies the first newline convention in the list (LF in the above ex-
|
||||||
example) is added to any pattern that does not already have a newline
|
ample) is added to any pattern that does not already have a newline
|
||||||
modifier. If the newline list is empty, the feature is turned off. This
|
modifier. If the newline list is empty, the feature is turned off. This
|
||||||
command is present in a number of the standard test input files.
|
command is present in a number of the standard test input files.
|
||||||
|
|
||||||
When the POSIX API is being tested there is no way to override the
|
When the POSIX API is being tested there is no way to override the de-
|
||||||
default newline convention, though it is possible to set the newline
|
fault newline convention, though it is possible to set the newline con-
|
||||||
convention from within the pattern. A warning is given if the posix or
|
vention from within the pattern. A warning is given if the posix or
|
||||||
posix_nosub modifier is used when #newline_default would set a default
|
posix_nosub modifier is used when #newline_default would set a default
|
||||||
for the non-POSIX API.
|
for the non-POSIX API.
|
||||||
|
|
||||||
|
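The #newline_default rules in this hunk can be read as a short decision procedure. The sketch below is one interpretation of that text, not the pcre2test implementation: the function name `effective_newline`, its parameters, and the use of strings such as "LF" and "CR" for newline types are assumptions made for illustration.

```python
from typing import List, Optional


def effective_newline(
    pattern_newline: Optional[str],
    build_default: str,
    newline_list: List[str],
    posix: bool = False,
) -> str:
    """Return the newline convention that applies to one pattern.

    pattern_newline: newline modifier already on the pattern, or None.
    build_default: default newline that PCRE2 was built with.
    newline_list: newline types given to #newline_default (may be empty).
    posix: True when the POSIX API is being tested.
    """
    # A pattern that already has a newline modifier is left alone.
    if pattern_newline is not None:
        return pattern_newline
    # Empty list: feature off. POSIX API: the default cannot be overridden.
    # Build default already in the list: the command has no effect.
    if posix or not newline_list or build_default in newline_list:
        return build_default
    # Otherwise the first convention in the list is added to the pattern.
    return newline_list[0]
```

With a CR-default build and a list of `["LF", "ANY"]`, an unmodified pattern would be compiled with LF, which is what lets most of the standard tests pass unchanged.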
@ -344,8 +344,8 @@ COMMAND LINES
|
||||||
The appearance of this line causes all subsequent modifier settings to
|
The appearance of this line causes all subsequent modifier settings to
|
||||||
be checked for compatibility with the perltest.sh script, which is used
|
be checked for compatibility with the perltest.sh script, which is used
|
||||||
to confirm that Perl gives the same results as PCRE2. Also, apart from
|
to confirm that Perl gives the same results as PCRE2. Also, apart from
|
||||||
comment lines, #pattern commands, and #subject commands that set or
|
comment lines, #pattern commands, and #subject commands that set or un-
|
||||||
unset "mark", no command lines are permitted, because they and many of
|
set "mark", no command lines are permitted, because they and many of
|
||||||
the modifiers are specific to pcre2test, and should not be used in test
|
the modifiers are specific to pcre2test, and should not be used in test
|
||||||
files that are also processed by perltest.sh. The #perltest command
|
files that are also processed by perltest.sh. The #perltest command
|
||||||
helps detect tests that are accidentally put in the wrong file.
|
helps detect tests that are accidentally put in the wrong file.
|
||||||
|
@ -376,8 +376,8 @@ MODIFIER SYNTAX
|
||||||
list are separated by commas followed by optional white space. Trailing
|
list are separated by commas followed by optional white space. Trailing
|
||||||
whitespace in a modifier list is ignored. Some modifiers may be given
|
whitespace in a modifier list is ignored. Some modifiers may be given
|
||||||
for both patterns and subject lines, whereas others are valid only for
|
for both patterns and subject lines, whereas others are valid only for
|
||||||
one or the other. Each modifier has a long name, for example
|
one or the other. Each modifier has a long name, for example "an-
|
||||||
"anchored", and some of them must be followed by an equals sign and a
|
chored", and some of them must be followed by an equals sign and a
|
||||||
value, for example, "offset=12". Values cannot contain comma charac-
|
value, for example, "offset=12". Values cannot contain comma charac-
|
||||||
ters, but may contain spaces. Modifiers that do not take values may be
|
ters, but may contain spaces. Modifiers that do not take values may be
|
||||||
preceded by a minus sign to turn off a previous setting.
|
preceded by a minus sign to turn off a previous setting.
|
||||||
|
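The modifier-list syntax described in this hunk (comma separators, optional white space after each comma, `name=value` pairs, and a leading minus sign to turn a setting off) can be illustrated with a small parser. This is a simplified sketch and not how pcre2test itself parses modifiers: `parse_modifiers` is a hypothetical name, and single-letter abbreviations and validity checks are deliberately left out.

```python
from typing import Dict, Union


def parse_modifiers(text: str) -> Dict[str, Union[bool, str]]:
    """Parse a modifier list into a mapping of long names to settings."""
    settings: Dict[str, Union[bool, str]] = {}
    # Trailing white space in the list is ignored.
    for item in text.rstrip().split(","):
        # White space after a comma is optional and is skipped.
        item = item.lstrip()
        if not item:
            continue
        if "=" in item:
            # Values cannot contain commas, but may contain spaces.
            name, value = item.split("=", 1)
            settings[name] = value
        elif item.startswith("-"):
            # A minus sign turns off a previous setting.
            settings[item[1:]] = False
        else:
            settings[item] = True
    return settings
```

For example, `parse_modifiers("anchored, offset=12, -global")` yields `{"anchored": True, "offset": "12", "global": False}`.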
@ -498,8 +498,8 @@ SUBJECT LINE SYNTAX
|
||||||
\= This is a comment.
|
\= This is a comment.
|
||||||
abc\= This is an invalid modifier list.
|
abc\= This is an invalid modifier list.
|
||||||
|
|
||||||
A backslash followed by any other non-alphanumeric character just
|
A backslash followed by any other non-alphanumeric character just es-
|
||||||
escapes that character. A backslash followed by anything else causes an
|
capes that character. A backslash followed by anything else causes an
|
||||||
error. However, if the very last character in the line is a backslash
|
error. However, if the very last character in the line is a backslash
|
||||||
(and there is no modifier list), it is ignored. This gives a way of
|
(and there is no modifier list), it is ignored. This gives a way of
|
||||||
passing an empty line as data, since a real empty line terminates the
|
passing an empty line as data, since a real empty line terminates the
|
||||||
|
@ -523,13 +523,13 @@ PATTERN MODIFIERS
|
||||||
The following modifiers set options for pcre2_compile(). Most of them
|
The following modifiers set options for pcre2_compile(). Most of them
|
||||||
set bits in the options argument of that function, but those whose
|
set bits in the options argument of that function, but those whose
|
||||||
names start with PCRE2_EXTRA are additional options that are set in the
|
names start with PCRE2_EXTRA are additional options that are set in the
|
||||||
compile context. For the main options, there are some single-letter
|
compile context. For the main options, there are some single-letter ab-
|
||||||
abbreviations that are the same as Perl options. There is special han-
|
breviations that are the same as Perl options. There is special han-
|
||||||
dling for /x: if a second x is present, PCRE2_EXTENDED is converted
|
dling for /x: if a second x is present, PCRE2_EXTENDED is converted
|
||||||
into PCRE2_EXTENDED_MORE as in Perl. A third appearance adds
|
into PCRE2_EXTENDED_MORE as in Perl. A third appearance adds PCRE2_EX-
|
||||||
PCRE2_EXTENDED as well, though this makes no difference to the way
|
TENDED as well, though this makes no difference to the way pcre2_com-
|
||||||
pcre2_compile() behaves. See pcre2api for a description of the effects
|
pile() behaves. See pcre2api for a description of the effects of these
|
||||||
of these options.
|
options.
|
||||||
|
|
||||||
allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS
|
allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS
|
||||||
allow_surrogate_escapes set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
|
allow_surrogate_escapes set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
|
||||||
|
@ -577,9 +577,9 @@ PATTERN MODIFIERS
|
||||||
|
|
||||||
Setting compilation controls
|
Setting compilation controls
|
||||||
|
|
||||||
The following modifiers affect the compilation process or request
|
The following modifiers affect the compilation process or request in-
|
||||||
information about the pattern. There are single-letter abbreviations
|
formation about the pattern. There are single-letter abbreviations for
|
||||||
for some that are heavily used in the test files.
|
some that are heavily used in the test files.
|
||||||
|
|
||||||
bsr=[anycrlf|unicode] specify \R handling
|
bsr=[anycrlf|unicode] specify \R handling
|
||||||
/B bincode show binary code without lengths
|
/B bincode show binary code without lengths
|
||||||
|
@ -717,8 +717,8 @@ PATTERN MODIFIERS
|
||||||
minated strings but can be passed by length instead of being zero-ter-
|
minated strings but can be passed by length instead of being zero-ter-
|
||||||
minated. The use_length modifier causes this to happen. Using a length
|
minated. The use_length modifier causes this to happen. Using a length
|
||||||
happens automatically (whether or not use_length is set) when hex is
|
happens automatically (whether or not use_length is set) when hex is
|
||||||
set, because patterns specified in hexadecimal may contain binary
|
set, because patterns specified in hexadecimal may contain binary ze-
|
||||||
zeros.
|
ros.
|
||||||
|
|
||||||
If hex or use_length is used with the POSIX wrapper API (see "Using the
|
If hex or use_length is used with the POSIX wrapper API (see "Using the
|
||||||
POSIX wrapper API" below), the REG_PEND extension is used to pass the
|
POSIX wrapper API" below), the REG_PEND extension is used to pass the
|
||||||
|
@ -770,8 +770,8 @@ PATTERN MODIFIERS
|
||||||
partial modifier in "Subject Modifiers" below for details of how these
|
partial modifier in "Subject Modifiers" below for details of how these
|
||||||
options are specified for each match attempt.
|
options are specified for each match attempt.
|
||||||
|
|
||||||
JIT compilation is requested by the jit pattern modifier, which may
|
JIT compilation is requested by the jit pattern modifier, which may op-
|
||||||
optionally be followed by an equals sign and a number in the range 0 to
|
tionally be followed by an equals sign and a number in the range 0 to
|
||||||
7. The three bits that make up the number specify which of the three
|
7. The three bits that make up the number specify which of the three
|
||||||
JIT operating modes are to be compiled:
|
JIT operating modes are to be compiled:
|
||||||
|
|
||||||
|
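The three-bit value given to the jit modifier can be decoded as shown in the sketch below. The bit assignments used here (1 for non-partial matching, 2 for soft partial, 4 for hard partial) are not part of this hunk; they are stated as an assumption based on the PCRE2 JIT option values, and `decode_jit_value` is a hypothetical helper name.

```python
from typing import List

# Assumed bit meanings (not shown in the hunk above).
JIT_MODES = (
    (1, "non-partial"),
    (2, "soft partial"),
    (4, "hard partial"),
)


def decode_jit_value(value: int) -> List[str]:
    """List the JIT operating modes selected by a jit=<number> setting."""
    if not 0 <= value <= 7:
        raise ValueError("jit value must be in the range 0 to 7")
    # Each set bit requests compilation of one operating mode.
    return [name for bit, name in JIT_MODES if value & bit]
```

Under these assumptions, `decode_jit_value(6)` selects the two partial-matching modes and `decode_jit_value(0)` selects none.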
@ -799,8 +799,8 @@ PATTERN MODIFIERS
|
||||||
none was compiled for non-partial matching.
|
none was compiled for non-partial matching.
|
||||||
|
|
||||||
If JIT compilation is successful, the compiled JIT code will automati-
|
If JIT compilation is successful, the compiled JIT code will automati-
|
||||||
cally be used when an appropriate type of match is run, except when
|
cally be used when an appropriate type of match is run, except when in-
|
||||||
incompatible run-time options are specified. For more details, see the
|
compatible run-time options are specified. For more details, see the
|
||||||
pcre2jit documentation. See also the jitstack modifier below for a way
|
pcre2jit documentation. See also the jitstack modifier below for a way
|
||||||
of setting the size of the JIT stack.
|
of setting the size of the JIT stack.
|
||||||
|
|
||||||
|
@ -847,8 +847,8 @@ PATTERN MODIFIERS
|
||||||
Limiting nested parentheses
|
Limiting nested parentheses
|
||||||
|
|
||||||
The parens_nest_limit modifier sets a limit on the depth of nested
|
The parens_nest_limit modifier sets a limit on the depth of nested
|
||||||
parentheses in a pattern. Breaching the limit causes a compilation
|
parentheses in a pattern. Breaching the limit causes a compilation er-
|
||||||
error. The default for the library is set when PCRE2 is built, but
|
ror. The default for the library is set when PCRE2 is built, but
|
||||||
pcre2test sets its own default of 220, which is required for running
|
pcre2test sets its own default of 220, which is required for running
|
||||||
the standard test suite.
|
the standard test suite.
|
||||||
|
|
||||||
|
@ -886,13 +886,13 @@ PATTERN MODIFIERS
|
||||||
buffer is too small for the error message. If this modifier has not
|
buffer is too small for the error message. If this modifier has not
|
||||||
been set, a large buffer is used.
|
been set, a large buffer is used.
|
||||||
|
|
||||||
The aftertext and allaftertext subject modifiers work as described
|
The aftertext and allaftertext subject modifiers work as described be-
|
||||||
below. All other modifiers are either ignored, with a warning message,
|
low. All other modifiers are either ignored, with a warning message, or
|
||||||
or cause an error.
|
cause an error.
|
||||||
|
|
||||||
The pattern is passed to regcomp() as a zero-terminated string by
|
The pattern is passed to regcomp() as a zero-terminated string by de-
|
||||||
default, but if the use_length or hex modifiers are set, the REG_PEND
|
fault, but if the use_length or hex modifiers are set, the REG_PEND ex-
|
||||||
extension is used to pass it by length.
|
tension is used to pass it by length.
|
||||||
|
|
||||||
Testing the stack guard feature
|
Testing the stack guard feature
|
||||||
|
|
||||||
|
@ -920,8 +920,8 @@ PATTERN MODIFIERS
|
||||||
2 a set of tables defining ISO 8859 characters
|
2 a set of tables defining ISO 8859 characters
|
||||||
|
|
||||||
In table 2, some characters whose codes are greater than 128 are iden-
|
In table 2, some characters whose codes are greater than 128 are iden-
|
||||||
tified as letters, digits, spaces, etc. Setting alternate character
|
tified as letters, digits, spaces, etc. Setting alternate character ta-
|
||||||
tables and a locale are mutually exclusive.
|
bles and a locale are mutually exclusive.
|
||||||
|
|
||||||
Setting certain match controls
|
Setting certain match controls
|
||||||
|
|
||||||
|
@ -971,12 +971,12 @@ PATTERN MODIFIERS
|
||||||
terns" below. If pushcopy is used instead of push, a copy of the com-
|
terns" below. If pushcopy is used instead of push, a copy of the com-
|
||||||
piled pattern is stacked, leaving the original as current, ready to
|
piled pattern is stacked, leaving the original as current, ready to
|
||||||
match the following input lines. This provides a way of testing the
|
match the following input lines. This provides a way of testing the
|
||||||
pcre2_code_copy() function. The push and pushcopy modifiers are
|
pcre2_code_copy() function. The push and pushcopy modifiers are in-
|
||||||
incompatible with compilation modifiers such as global that act at
|
compatible with compilation modifiers such as global that act at match
|
||||||
match time. Any that are specified are ignored (for the stacked copy),
|
time. Any that are specified are ignored (for the stacked copy), with a
|
||||||
with a warning message, except for replace, which causes an error. Note
|
warning message, except for replace, which causes an error. Note that
|
||||||
that jitverify, which is allowed, does not carry through to any subse-
|
jitverify, which is allowed, does not carry through to any subsequent
|
||||||
quent matching that uses a stacked pattern.
|
matching that uses a stacked pattern.
|
||||||
|
|
||||||
Testing foreign pattern conversion
|
Testing foreign pattern conversion
|
||||||
|
|
||||||
|
@ -1124,12 +1124,12 @@ SUBJECT MODIFIERS
|
||||||
The allusedtext modifier requests that all the text that was consulted
|
The allusedtext modifier requests that all the text that was consulted
|
||||||
during a successful pattern match by the interpreter should be shown.
|
during a successful pattern match by the interpreter should be shown.
|
||||||
This feature is not supported for JIT matching, and if requested with
|
This feature is not supported for JIT matching, and if requested with
|
||||||
JIT it is ignored (with a warning message). Setting this modifier
|
JIT it is ignored (with a warning message). Setting this modifier af-
|
||||||
affects the output if there is a lookbehind at the start of a match, or
|
fects the output if there is a lookbehind at the start of a match, or a
|
||||||
a lookahead at the end, or if \K is used in the pattern. Characters
|
lookahead at the end, or if \K is used in the pattern. Characters that
|
||||||
that precede or follow the start and end of the actual match are indi-
|
precede or follow the start and end of the actual match are indicated
|
||||||
cated in the output by '<' or '>' characters underneath them. Here is
|
in the output by '<' or '>' characters underneath them. Here is an ex-
|
||||||
an example:
|
ample:
|
||||||
|
|
||||||
re> /(?<=pqr)abc(?=xyz)/
|
re> /(?<=pqr)abc(?=xyz)/
|
||||||
data> 123pqrabcxyz456\=allusedtext
|
data> 123pqrabcxyz456\=allusedtext
|
||||||
|
@ -1145,8 +1145,8 @@ SUBJECT MODIFIERS
|
||||||
string. The only time when this occurs is when \K has been processed as
|
string. The only time when this occurs is when \K has been processed as
|
||||||
part of the match. In this situation, the output for the matched string
|
part of the match. In this situation, the output for the matched string
|
||||||
is displayed from the starting character instead of from the match
|
is displayed from the starting character instead of from the match
|
||||||
point, with circumflex characters under the earlier characters. For
|
point, with circumflex characters under the earlier characters. For ex-
|
||||||
example:
|
ample:
|
||||||
|
|
||||||
re> /abc\Kxyz/
|
re> /abc\Kxyz/
|
||||||
data> abcxyz\=startchar
|
data> abcxyz\=startchar
|
||||||
|
@ -1171,12 +1171,12 @@ SUBJECT MODIFIERS
|
||||||
The allvector modifier requests that the entire ovector be shown, what-
|
The allvector modifier requests that the entire ovector be shown, what-
|
||||||
ever the outcome of the match. Compare allcaptures, which shows only up
|
ever the outcome of the match. Compare allcaptures, which shows only up
|
||||||
to the maximum number of capture groups for the pattern, and then only
|
to the maximum number of capture groups for the pattern, and then only
|
||||||
for a successful complete non-DFA match. This modifier, which acts
|
for a successful complete non-DFA match. This modifier, which acts af-
|
||||||
after any match result, and also for DFA matching, provides a means of
|
ter any match result, and also for DFA matching, provides a means of
|
||||||
checking that there are no unexpected modifications to ovector fields.
|
checking that there are no unexpected modifications to ovector fields.
|
||||||
Before each match attempt, the ovector is filled with a special value,
|
Before each match attempt, the ovector is filled with a special value,
|
||||||
and if this is found in both elements of a capturing pair,
|
and if this is found in both elements of a capturing pair, "<un-
|
||||||
"<unchanged>" is output. After a successful match, this applies to all
|
changed>" is output. After a successful match, this applies to all
|
||||||
groups after the maximum capture group for the pattern. In other cases
|
groups after the maximum capture group for the pattern. In other cases
|
||||||
it applies to the entire ovector. After a partial match, the first two
|
it applies to the entire ovector. After a partial match, the first two
|
||||||
elements are the only ones that should be set. After a DFA match, the
|
elements are the only ones that should be set. After a DFA match, the
|
||||||
|
@ -1207,12 +1207,12 @@ SUBJECT MODIFIERS
|
||||||
If an empty string is matched, the next match is done with the
|
If an empty string is matched, the next match is done with the
|
||||||
PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search
|
PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search
|
||||||
for another, non-empty, match at the same point in the subject. If this
|
for another, non-empty, match at the same point in the subject. If this
|
||||||
match fails, the start offset is advanced, and the normal match is
|
match fails, the start offset is advanced, and the normal match is re-
|
||||||
retried. This imitates the way Perl handles such cases when using the
|
tried. This imitates the way Perl handles such cases when using the /g
|
||||||
/g modifier or the split() function. Normally, the start offset is
|
modifier or the split() function. Normally, the start offset is ad-
|
||||||
advanced by one character, but if the newline convention recognizes
|
vanced by one character, but if the newline convention recognizes CRLF
|
||||||
CRLF as a newline, and the current character is CR followed by LF, an
|
as a newline, and the current character is CR followed by LF, an ad-
|
||||||
advance of two characters occurs.
|
vance of two characters occurs.
|
||||||
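The advance step described in the paragraph above can be sketched in Python. This is only an illustration under stated assumptions, not pcre2test's own code: the function name next_start_offset and the crlf_is_newline flag are hypothetical, and offsets count characters of a Python string rather than PCRE2 code units.

```python
def next_start_offset(subject: str, offset: int, crlf_is_newline: bool) -> int:
    """Return where the normal match is retried after the anchored,
    non-empty retry at `offset` has failed.

    Hypothetical helper: it models only the "advance by one character,
    or by two over CRLF" rule, not the matching itself.
    """
    # If the newline convention recognizes CRLF and we are sitting on
    # CR followed by LF, step over both characters together.
    if crlf_is_newline and subject[offset:offset + 2] == "\r\n":
        return offset + 2
    # Otherwise advance by a single character.
    return offset + 1
```

For a subject such as "a\r\nb", an empty match at offset 1 is followed by a retry at offset 3 when CRLF counts as a newline, and at offset 2 otherwise.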
|
|
||||||
Testing substring extraction functions
|
Testing substring extraction functions
|
||||||
|
|
||||||
|
@ -1275,8 +1275,8 @@ SUBJECT MODIFIERS
|
||||||
than 256 characters) for substitution tests, as fixed-size buffers are
|
than 256 characters) for substitution tests, as fixed-size buffers are
|
||||||
used. To make it easy to test for buffer overflow, if the replacement
|
used. To make it easy to test for buffer overflow, if the replacement
|
||||||
string starts with a number in square brackets, that number is passed
|
string starts with a number in square brackets, that number is passed
|
||||||
to pcre2_substitute() as the size of the output buffer, with the
|
to pcre2_substitute() as the size of the output buffer, with the re-
|
||||||
replacement string starting at the next character. Here is an example
|
placement string starting at the next character. Here is an example
|
||||||
that tests the edge case:
|
that tests the edge case:
|
||||||
|
|
||||||
/abc/
|
/abc/
|
||||||
|
@ -1285,10 +1285,10 @@ SUBJECT MODIFIERS
|
||||||
123abc123\=replace=[9]XYZ
|
123abc123\=replace=[9]XYZ
|
||||||
Failed: error -47: no more memory
|
Failed: error -47: no more memory
|
||||||
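The arithmetic behind this edge case can be sketched in Python. The sketch assumes a single literal replacement and assumes that the output buffer must also hold a terminating zero, which is why a size of 9 is one short for the 9-character result "123XYZ123". The helper names required_buffer_length and fits are hypothetical and are not part of PCRE2 or pcre2test.

```python
def required_buffer_length(subject: str, start: int, end: int,
                           replacement: str) -> int:
    """Units needed for one literal substitution of subject[start:end],
    including one extra unit for the terminating zero (an assumption of
    this sketch)."""
    # Text before the match + replacement + text after the match.
    result_length = start + len(replacement) + (len(subject) - end)
    return result_length + 1


def fits(buffer_size: int, subject: str, start: int, end: int,
         replacement: str) -> bool:
    """True if a buffer of `buffer_size` units is large enough."""
    return buffer_size >= required_buffer_length(subject, start, end,
                                                 replacement)
```

With the subject "123abc123", the match "abc" at offsets 3 to 6, and the replacement "XYZ", the sketch gives a required length of 10, so a buffer size of 9 is too small.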
|
|
||||||
The default action of pcre2_substitute() is to return
|
The default action of pcre2_substitute() is to return PCRE2_ER-
|
||||||
PCRE2_ERROR_NOMEMORY when the output buffer is too small. However, if
|
ROR_NOMEMORY when the output buffer is too small. However, if the
|
||||||
the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the sub-
|
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the substi-
|
||||||
stitute_overflow_length modifier), pcre2_substitute() continues to go
|
tute_overflow_length modifier), pcre2_substitute() continues to go
|
||||||
through the motions of matching and substituting (but not doing any
|
through the motions of matching and substituting (but not doing any
|
||||||
callouts), in order to compute the size of buffer that is required.
|
callouts), in order to compute the size of buffer that is required.
|
||||||
When this happens, pcre2test shows the required buffer length (which
|
When this happens, pcre2test shows the required buffer length (which
|
||||||
|
@ -1323,8 +1323,8 @@ SUBJECT MODIFIERS
|
||||||
Then are listed the offsets of the old substring, its contents, and the
|
Then are listed the offsets of the old substring, its contents, and the
|
||||||
same for the replacement.
|
same for the replacement.
|
||||||
|
|
||||||
By default, the substitution callout function returns zero, which
|
By default, the substitution callout function returns zero, which ac-
|
||||||
accepts the replacement and causes matching to continue if /g was used.
|
cepts the replacement and causes matching to continue if /g was used.
|
||||||
Two further modifiers can be used to test other return values. If sub-
|
Two further modifiers can be used to test other return values. If sub-
|
||||||
stitute_skip is set to a value greater than zero the callout function
|
stitute_skip is set to a value greater than zero the callout function
|
||||||
returns +1 for the match of that number, and similarly substitute_stop
|
returns +1 for the match of that number, and similarly substitute_stop
|
||||||
|
@ -1411,8 +1411,8 @@ SUBJECT MODIFIERS
|
||||||
|
|
||||||
The memory modifier causes pcre2test to log the sizes of all heap mem-
|
The memory modifier causes pcre2test to log the sizes of all heap mem-
|
||||||
ory allocation and freeing calls that occur during a call to
|
ory allocation and freeing calls that occur during a call to
|
||||||
pcre2_match() or pcre2_dfa_match(). These occur only when a match
|
pcre2_match() or pcre2_dfa_match(). These occur only when a match re-
|
||||||
requires a bigger vector than the default for remembering backtracking
|
quires a bigger vector than the default for remembering backtracking
|
||||||
points (pcre2_match()) or for internal workspace (pcre2_dfa_match()).
|
points (pcre2_match()) or for internal workspace (pcre2_dfa_match()).
|
||||||
In many cases there will be no heap memory used and therefore no addi-
|
In many cases there will be no heap memory used and therefore no addi-
|
||||||
tional output. No heap memory is allocated during matching with JIT, so
|
tional output. No heap memory is allocated during matching with JIT, so
|
||||||
|
@ -1435,9 +1435,9 @@ SUBJECT MODIFIERS
|
||||||
|
|
||||||
Setting the size of the output vector
|
Setting the size of the output vector
|
||||||
|
|
||||||
The ovector modifier applies only to the subject line in which it
|
The ovector modifier applies only to the subject line in which it ap-
|
||||||
appears, though of course it can also be used to set a default in a
|
pears, though of course it can also be used to set a default in a #sub-
|
||||||
#subject command. It specifies the number of pairs of offsets that are
|
ject command. It specifies the number of pairs of offsets that are
|
||||||
available for storing matching information. The default is 15.
|
available for storing matching information. The default is 15.
|
||||||
|
|
||||||
A value of zero is useful when testing the POSIX API because it causes
|
A value of zero is useful when testing the POSIX API because it causes
|
||||||
|
@ -1491,12 +1491,12 @@ DEFAULT OUTPUT FROM pcre2test
|
||||||
|
|
||||||
When a match succeeds, pcre2test outputs the list of captured sub-
|
When a match succeeds, pcre2test outputs the list of captured sub-
|
||||||
strings, starting with number 0 for the string that matched the whole
|
strings, starting with number 0 for the string that matched the whole
|
||||||
pattern. Otherwise, it outputs "No match" when the return is
|
pattern. Otherwise, it outputs "No match" when the return is PCRE2_ER-
|
||||||
PCRE2_ERROR_NOMATCH, or "Partial match:" followed by the partially
|
ROR_NOMATCH, or "Partial match:" followed by the partially matching
|
||||||
matching substring when the return is PCRE2_ERROR_PARTIAL. (Note that
|
substring when the return is PCRE2_ERROR_PARTIAL. (Note that this is
|
||||||
this is the entire substring that was inspected during the partial
|
the entire substring that was inspected during the partial match; it
|
||||||
match; it may include characters before the actual match start if a
|
may include characters before the actual match start if a lookbehind
|
||||||
lookbehind assertion, \K, \b, or \B was involved.)
|
assertion, \K, \b, or \B was involved.)
|
||||||
|
|
||||||
For any other return, pcre2test outputs the PCRE2 negative error number
|
For any other return, pcre2test outputs the PCRE2 negative error number
|
||||||
and a short descriptive phrase. If the error is a failed UTF string
|
and a short descriptive phrase. If the error is a failed UTF string
|
||||||
|
@ -1541,8 +1541,8 @@ DEFAULT OUTPUT FROM pcre2test
|
||||||
0: cat
|
0: cat
|
||||||
0+ aract
|
0+ aract
|
||||||
|
|
||||||
If global matching is requested, the results of successive matching
|
If global matching is requested, the results of successive matching at-
|
||||||
attempts are output in sequence, like this:
|
tempts are output in sequence, like this:
|
||||||
|
|
||||||
re> /\Bi(\w\w)/g
|
re> /\Bi(\w\w)/g
|
||||||
data> Mississippi
|
data> Mississippi
|
||||||
|
@ -1580,12 +1580,12 @@ OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
|
||||||
2: tan
|
2: tan
|
||||||
|
|
||||||
Using the normal matching function on this data finds only "tang". The
|
Using the normal matching function on this data finds only "tang". The
|
||||||
longest matching string is always given first (and numbered zero).
|
longest matching string is always given first (and numbered zero). Af-
|
||||||
After a PCRE2_ERROR_PARTIAL return, the output is "Partial match:",
|
ter a PCRE2_ERROR_PARTIAL return, the output is "Partial match:", fol-
|
||||||
followed by the partially matching substring. Note that this is the
|
lowed by the partially matching substring. Note that this is the entire
|
||||||
entire substring that was inspected during the partial match; it may
|
substring that was inspected during the partial match; it may include
|
||||||
include characters before the actual match start if a lookbehind asser-
|
characters before the actual match start if a lookbehind assertion, \b,
|
||||||
tion, \b, or \B was involved. (\K is not supported for DFA matching.)
|
or \B was involved. (\K is not supported for DFA matching.)
|
||||||
|
|
||||||
If global matching is requested, the search for further matches resumes
|
If global matching is requested, the search for further matches resumes
|
||||||
at the end of the longest match. For example:
|
at the end of the longest match. For example:
|
||||||
|
@ -1638,12 +1638,12 @@ CALLOUTS
|
||||||
--->pqrabcdef
|
--->pqrabcdef
|
||||||
0 ^ ^ \d
|
0 ^ ^ \d
|
||||||
|
|
||||||
This output indicates that callout number 0 occurred for a match
|
This output indicates that callout number 0 occurred for a match at-
|
||||||
attempt starting at the fourth character of the subject string, when
|
tempt starting at the fourth character of the subject string, when the
|
||||||
the pointer was at the seventh character, and when the next pattern
|
pointer was at the seventh character, and when the next pattern item
|
||||||
item was \d. Just one circumflex is output if the start and current
|
was \d. Just one circumflex is output if the start and current posi-
|
||||||
positions are the same, or if the current position precedes the start
|
tions are the same, or if the current position precedes the start posi-
|
||||||
position, which can happen if the callout is in a lookbehind assertion.
|
tion, which can happen if the callout is in a lookbehind assertion.
|
||||||
|
|
||||||
Callouts numbered 255 are assumed to be automatic callouts, inserted as
|
Callouts numbered 255 are assumed to be automatic callouts, inserted as
|
||||||
a result of the auto_callout pattern modifier. In this case, instead of
|
a result of the auto_callout pattern modifier. In this case, instead of
|
||||||
|
@ -1660,8 +1660,8 @@ CALLOUTS
|
||||||
0: E*
|
0: E*
|
||||||
|
|
||||||
If a pattern contains (*MARK) items, an additional line is output when-
|
If a pattern contains (*MARK) items, an additional line is output when-
|
||||||
ever a change of latest mark is passed to the callout function. For
|
ever a change of latest mark is passed to the callout function. For ex-
|
||||||
example:
|
ample:
|
||||||
|
|
||||||
re> /a(*MARK:X)bc/auto_callout
|
re> /a(*MARK:X)bc/auto_callout
|
||||||
data> abc
|
data> abc
|
||||||
|
@ -1683,8 +1683,8 @@ CALLOUTS
|
||||||
|
|
||||||
The output for a callout with a string argument is similar, except that
|
The output for a callout with a string argument is similar, except that
|
||||||
instead of outputting a callout number before the position indicators,
|
instead of outputting a callout number before the position indicators,
|
||||||
the callout string and its offset in the pattern string are output
|
the callout string and its offset in the pattern string are output be-
|
||||||
before the reflection of the subject string, and the subject string is
|
fore the reflection of the subject string, and the subject string is
|
||||||
reflected for each callout. For example:
|
reflected for each callout. For example:
|
||||||
|
|
||||||
re> /^ab(?C'first')cd(?C"second")ef/
|
re> /^ab(?C'first')cd(?C"second")ef/
|
||||||
|
@ -1800,9 +1800,9 @@ NON-PRINTING CHARACTERS
|
||||||
|
|
||||||
When pcre2test is outputting text that is a matched part of a subject
|
When pcre2test is outputting text that is a matched part of a subject
|
||||||
string, it behaves in the same way, unless a different locale has been
|
string, it behaves in the same way, unless a different locale has been
|
||||||
set for the pattern (using the locale modifier). In this case, the
|
set for the pattern (using the locale modifier). In this case, the is-
|
||||||
isprint() function is used to distinguish printing and non-printing
|
print() function is used to distinguish printing and non-printing char-
|
||||||
characters.
|
acters.
|
||||||
|
|
||||||
|
|
||||||
SAVING AND RESTORING COMPILED PATTERNS
|
SAVING AND RESTORING COMPILED PATTERNS
|
||||||
|
@ -1814,14 +1814,14 @@ SAVING AND RESTORING COMPILED PATTERNS
|
||||||
have the same endianness, pointer width and PCRE2_SIZE type. Before
|
have the same endianness, pointer width and PCRE2_SIZE type. Before
|
||||||
compiled patterns can be saved they must be serialized, that is, con-
|
compiled patterns can be saved they must be serialized, that is, con-
|
||||||
verted to a stream of bytes. A single byte stream may contain any num-
|
verted to a stream of bytes. A single byte stream may contain any num-
|
||||||
ber of compiled patterns, but they must all use the same character
|
ber of compiled patterns, but they must all use the same character ta-
|
||||||
tables. A single copy of the tables is included in the byte stream (its
|
bles. A single copy of the tables is included in the byte stream (its
|
||||||
size is 1088 bytes).
|
size is 1088 bytes).
|
||||||
|
|
||||||
The functions whose names begin with pcre2_serialize_ are used for
|
The functions whose names begin with pcre2_serialize_ are used for se-
|
||||||
serializing and de-serializing. They are described in the pcre2serial-
|
rializing and de-serializing. They are described in the pcre2serialize
|
||||||
ize documentation. In this section we describe the features of
|
documentation. In this section we describe the features of pcre2test
|
||||||
pcre2test that can be used to test these functions.
|
that can be used to test these functions.
|
||||||
|
|
||||||
Note that "serialization" in PCRE2 does not convert compiled patterns
|
Note that "serialization" in PCRE2 does not convert compiled patterns
|
||||||
to an abstract format like Java or .NET. It just makes a reloadable
|
to an abstract format like Java or .NET. It just makes a reloadable
|
||||||
|
@ -1831,8 +1831,8 @@ SAVING AND RESTORING COMPILED PATTERNS
|
||||||
piled, it is pushed onto a stack of compiled patterns, and pcre2test
|
piled, it is pushed onto a stack of compiled patterns, and pcre2test
|
||||||
expects the next line to contain a new pattern (or command) instead of
|
expects the next line to contain a new pattern (or command) instead of
|
||||||
a subject line. By contrast, the pushcopy modifier causes a copy of the
|
a subject line. By contrast, the pushcopy modifier causes a copy of the
|
||||||
compiled pattern to be stacked, leaving the original available for
|
compiled pattern to be stacked, leaving the original available for im-
|
||||||
immediate matching. By using push and/or pushcopy, a number of patterns
|
mediate matching. By using push and/or pushcopy, a number of patterns
|
||||||
can be compiled and retained. These modifiers are incompatible with
|
can be compiled and retained. These modifiers are incompatible with
|
||||||
posix, and control modifiers that act at match time are ignored (with a
|
posix, and control modifiers that act at match time are ignored (with a
|
||||||
message) for the stacked patterns. The jitverify modifier applies only
|
message) for the stacked patterns. The jitverify modifier applies only
|
||||||
|
@ -1855,8 +1855,8 @@ SAVING AND RESTORING COMPILED PATTERNS
|
||||||
matched with the pattern, terminated as usual by an empty line or end
|
matched with the pattern, terminated as usual by an empty line or end
|
||||||
of file. This command may be followed by a modifier list containing
|
of file. This command may be followed by a modifier list containing
|
||||||
only control modifiers that act after a pattern has been compiled. In
|
only control modifiers that act after a pattern has been compiled. In
|
||||||
particular, hex, posix, posix_nosub, push, and pushcopy are not
|
particular, hex, posix, posix_nosub, push, and pushcopy are not al-
|
||||||
allowed, nor are any option-setting modifiers. The JIT modifiers are,
|
lowed, nor are any option-setting modifiers. The JIT modifiers are,
|
||||||
however permitted. Here is an example that saves and reloads two pat-
|
however permitted. Here is an example that saves and reloads two pat-
|
||||||
terns.
|
terns.
|
||||||
|
|
||||||
|
|