Documentation update.

Philip.Hazel 2019-06-22 16:36:15 +00:00
parent a89423624d
commit c6ee84317d
6 changed files with 2699 additions and 2715 deletions


@ -3525,9 +3525,10 @@ first match attempt, the second attempt would start at the second character
instead of skipping on to "c". instead of skipping on to "c".
</P> </P>
<P> <P>
If (*SKIP) is used inside a lookbehind to specify a new starting position that If (*SKIP) is used to specify a new starting position that is the same as the
is not later than the starting point of the current match, the position starting position of the current match, or (by being inside a lookbehind)
specified by (*SKIP) is ignored, and instead the normal "bumpalong" occurs. earlier, the position specified by (*SKIP) is ignored, and instead the normal
"bumpalong" occurs.
<pre> <pre>
(*SKIP:NAME) (*SKIP:NAME)
</pre> </pre>
@ -3754,7 +3755,7 @@ Cambridge, England.
</P> </P>
<br><a name="SEC31" href="#TOC1">REVISION</a><br> <br><a name="SEC31" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 21 June 2019 Last updated: 22 June 2019
<br> <br>
Copyright &copy; 1997-2019 University of Cambridge. Copyright &copy; 1997-2019 University of Cambridge.
<br> <br>


@ -16,8 +16,8 @@ DESCRIPTION
pcre2-config returns the configuration of the installed PCRE2 libraries pcre2-config returns the configuration of the installed PCRE2 libraries
and the options required to compile a program to use them. Some of the and the options required to compile a program to use them. Some of the
options apply only to the 8-bit, or 16-bit, or 32-bit libraries, options apply only to the 8-bit, or 16-bit, or 32-bit libraries, re-
respectively, and are not available for libraries that have not been spectively, and are not available for libraries that have not been
built. If an unavailable option is encountered, the "usage" information built. If an unavailable option is encountered, the "usage" information
is output. is output.
@ -36,30 +36,30 @@ OPTIONS
--version Writes the version number of the installed PCRE2 libraries to --version Writes the version number of the installed PCRE2 libraries to
the standard output. the standard output.
--libs8 Writes to the standard output the command line options --libs8 Writes to the standard output the command line options re-
required to link with the 8-bit PCRE2 library (-lpcre2-8 on quired to link with the 8-bit PCRE2 library (-lpcre2-8 on
many systems). many systems).
--libs16 Writes to the standard output the command line options --libs16 Writes to the standard output the command line options re-
required to link with the 16-bit PCRE2 library (-lpcre2-16 on quired to link with the 16-bit PCRE2 library (-lpcre2-16 on
many systems). many systems).
--libs32 Writes to the standard output the command line options --libs32 Writes to the standard output the command line options re-
required to link with the 32-bit PCRE2 library (-lpcre2-32 on quired to link with the 32-bit PCRE2 library (-lpcre2-32 on
many systems). many systems).
--libs-posix --libs-posix
Writes to the standard output the command line options Writes to the standard output the command line options re-
required to link with PCRE2's POSIX API wrapper library quired to link with PCRE2's POSIX API wrapper library
(-lpcre2-posix -lpcre2-8 on many systems). (-lpcre2-posix -lpcre2-8 on many systems).
--cflags Writes to the standard output the command line options --cflags Writes to the standard output the command line options re-
required to compile files that use PCRE2 (this may include quired to compile files that use PCRE2 (this may include some
some -I options, but is blank on many systems). -I options, but is blank on many systems).
--cflags-posix --cflags-posix
Writes to the standard output the command line options Writes to the standard output the command line options re-
required to compile files that use PCRE2's POSIX API wrapper quired to compile files that use PCRE2's POSIX API wrapper
library (this may include some -I options, but is blank on library (this may include some -I options, but is blank on
many systems). many systems).

File diff suppressed because it is too large

File diff suppressed because it is too large


@ -1,4 +1,4 @@
.TH PCRE2PATTERN 3 "21 June 2019" "PCRE2 10.34" .TH PCRE2PATTERN 3 "22 June 2019" "PCRE2 10.34"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION DETAILS" .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@ -3564,9 +3564,10 @@ effect as this example; although it would suppress backtracking during the
first match attempt, the second attempt would start at the second character first match attempt, the second attempt would start at the second character
instead of skipping on to "c". instead of skipping on to "c".
.P .P
If (*SKIP) is used inside a lookbehind to specify a new starting position that If (*SKIP) is used to specify a new starting position that is the same as the
is not later than the starting point of the current match, the position starting position of the current match, or (by being inside a lookbehind)
specified by (*SKIP) is ignored, and instead the normal "bumpalong" occurs. earlier, the position specified by (*SKIP) is ignored, and instead the normal
"bumpalong" occurs.
.sp .sp
(*SKIP:NAME) (*SKIP:NAME)
.sp .sp
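The reworded (*SKIP) paragraph in this hunk describes a simple rule for choosing the next starting position. The sketch below models only that rule in Python; it is not PCRE2 code. The function name and the use of plain integer offsets are assumptions made for illustration, and the normal "bumpalong" is modeled as an advance of one character, ignoring multi-unit UTF characters and CRLF handling.

```python
def next_start_after_skip(match_start: int, skip_pos: int) -> int:
    """Return the offset at which the next match attempt would begin.

    match_start: starting position of the current (failed) match attempt.
    skip_pos:    position in the subject where (*SKIP) was encountered.
    """
    if skip_pos <= match_start:
        # Same position as the current start, or earlier (which can only
        # happen when (*SKIP) is inside a lookbehind): the (*SKIP) position
        # is ignored and the normal bumpalong occurs.
        return match_start + 1
    # Otherwise the next attempt starts where (*SKIP) was encountered.
    return skip_pos


if __name__ == "__main__":
    print(next_start_after_skip(0, 4))  # skip position is later than the start
    print(next_start_after_skip(3, 3))  # same position: bumpalong
    print(next_start_after_skip(3, 1))  # earlier position: bumpalong
```

The old wording covered only the lookbehind case; the new wording also covers the case where the two positions are equal, which is why the comparison here is `<=` rather than `<`.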
@ -3787,6 +3788,6 @@ Cambridge, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 21 June 2019 Last updated: 22 June 2019
Copyright (c) 1997-2019 University of Cambridge. Copyright (c) 1997-2019 University of Cambridge.
.fi .fi


@ -13,8 +13,8 @@ SYNOPSIS
but it can also be used for experimenting with regular expressions. but it can also be used for experimenting with regular expressions.
This document describes the features of the test program; for details This document describes the features of the test program; for details
of the regular expressions themselves, see the pcre2pattern documenta- of the regular expressions themselves, see the pcre2pattern documenta-
tion. For details of the PCRE2 library function calls and their tion. For details of the PCRE2 library function calls and their op-
options, see the pcre2api documentation. tions, see the pcre2api documentation.
The input for pcre2test is a sequence of regular expression patterns The input for pcre2test is a sequence of regular expression patterns
and subject strings to be matched. There are also command lines for and subject strings to be matched. There are also command lines for
@ -33,26 +33,26 @@ SYNOPSIS
which are specifically designed for use in conjunction with the test which are specifically designed for use in conjunction with the test
script and data files that are distributed as part of PCRE2. All the script and data files that are distributed as part of PCRE2. All the
modifiers are documented here, some without much justification, but modifiers are documented here, some without much justification, but
many of them are unlikely to be of use except when testing the many of them are unlikely to be of use except when testing the li-
libraries. braries.
PCRE2's 8-BIT, 16-BIT AND 32-BIT LIBRARIES PCRE2's 8-BIT, 16-BIT AND 32-BIT LIBRARIES
Different versions of the PCRE2 library can be built to support charac- Different versions of the PCRE2 library can be built to support charac-
ter strings that are encoded in 8-bit, 16-bit, or 32-bit code units. ter strings that are encoded in 8-bit, 16-bit, or 32-bit code units.
One, two, or all three of these libraries may be simultaneously One, two, or all three of these libraries may be simultaneously in-
installed. The pcre2test program can be used to test all the libraries. stalled. The pcre2test program can be used to test all the libraries.
However, its own input and output are always in 8-bit format. When However, its own input and output are always in 8-bit format. When
testing the 16-bit or 32-bit libraries, patterns and subject strings testing the 16-bit or 32-bit libraries, patterns and subject strings
are converted to 16-bit or 32-bit format before being passed to the are converted to 16-bit or 32-bit format before being passed to the li-
library functions. Results are converted back to 8-bit code units for brary functions. Results are converted back to 8-bit code units for
output. output.
In the rest of this document, the names of library functions and struc- In the rest of this document, the names of library functions and struc-
tures are given in generic form, for example, pcre_compile(). The tures are given in generic form, for example, pcre_compile(). The ac-
actual names used in the libraries have a suffix _8, _16, or _32, as tual names used in the libraries have a suffix _8, _16, or _32, as ap-
appropriate. propriate.
INPUT ENCODING INPUT ENCODING
@ -70,18 +70,18 @@ INPUT ENCODING
processed for backslash escapes, which makes it possible to include any processed for backslash escapes, which makes it possible to include any
data value in strings that are passed to the library for matching. For data value in strings that are passed to the library for matching. For
patterns, there is a facility for specifying some or all of the 8-bit patterns, there is a facility for specifying some or all of the 8-bit
input characters as hexadecimal pairs, which makes it possible to input characters as hexadecimal pairs, which makes it possible to in-
include binary zeros. clude binary zeros.
Input for the 16-bit and 32-bit libraries Input for the 16-bit and 32-bit libraries
When testing the 16-bit or 32-bit libraries, there is a need to be able When testing the 16-bit or 32-bit libraries, there is a need to be able
to generate character code points greater than 255 in the strings that to generate character code points greater than 255 in the strings that
are passed to the library. For subject lines, backslash escapes can be are passed to the library. For subject lines, backslash escapes can be
used. In addition, when the utf modifier (see "Setting compilation used. In addition, when the utf modifier (see "Setting compilation op-
options" below) is set, the pattern and any following subject lines are tions" below) is set, the pattern and any following subject lines are
interpreted as UTF-8 strings and translated to UTF-16 or UTF-32 as interpreted as UTF-8 strings and translated to UTF-16 or UTF-32 as ap-
appropriate. propriate.
For non-UTF testing of wide characters, the utf8_input modifier can be For non-UTF testing of wide characters, the utf8_input modifier can be
used. This is mutually exclusive with utf, and is allowed only in used. This is mutually exclusive with utf, and is allowed only in
@ -121,8 +121,8 @@ COMMAND LINE OPTIONS
piled. piled.
-AC As for -ac, but in addition behave as if each subject line -AC As for -ac, but in addition behave as if each subject line
has the callout_extra modifier, that is, show additional has the callout_extra modifier, that is, show additional in-
information from callouts. formation from callouts.
-b Behave as if each pattern has the fullbincode modifier; the -b Behave as if each pattern has the fullbincode modifier; the
full internal binary form of the pattern is output after com- full internal binary form of the pattern is output after com-
@ -130,9 +130,9 @@ COMMAND LINE OPTIONS
-C Output the version number of the PCRE2 library, and all -C Output the version number of the PCRE2 library, and all
available information about the optional features that are available information about the optional features that are
included, and then exit with zero exit code. All other included, and then exit with zero exit code. All other op-
options are ignored. If both -C and -LM are present, which- tions are ignored. If both -C and -LM are present, whichever
ever is first is recognized. is first is recognized.
-C option Output information about a specific build-time option, then -C option Output information about a specific build-time option, then
exit. This functionality is intended for use in scripts such exit. This functionality is intended for use in scripts such
@ -269,8 +269,8 @@ DESCRIPTION
supply them explicitly. supply them explicitly.
An empty line or the end of the file signals the end of the subject An empty line or the end of the file signals the end of the subject
lines for a test, at which point a new pattern or command line is lines for a test, at which point a new pattern or command line is ex-
expected if there is still input to be read. pected if there is still input to be read.
COMMAND LINES COMMAND LINES
@ -311,8 +311,8 @@ COMMAND LINES
as indicating a newline in a pattern or subject string. The default can as indicating a newline in a pattern or subject string. The default can
be overridden when a pattern is compiled. The standard test files con- be overridden when a pattern is compiled. The standard test files con-
tain tests of various newline conventions, but the majority of the tain tests of various newline conventions, but the majority of the
tests expect a single linefeed to be recognized as a newline by tests expect a single linefeed to be recognized as a newline by de-
default. Without special action the tests would fail when PCRE2 is com- fault. Without special action the tests would fail when PCRE2 is com-
piled with either CR or CRLF as the default newline. piled with either CR or CRLF as the default newline.
The #newline_default command specifies a list of newline types that are The #newline_default command specifies a list of newline types that are
@ -323,14 +323,14 @@ COMMAND LINES
If the default newline is in the list, this command has no effect. Oth- If the default newline is in the list, this command has no effect. Oth-
erwise, except when testing the POSIX API, a newline modifier that erwise, except when testing the POSIX API, a newline modifier that
specifies the first newline convention in the list (LF in the above specifies the first newline convention in the list (LF in the above ex-
example) is added to any pattern that does not already have a newline ample) is added to any pattern that does not already have a newline
modifier. If the newline list is empty, the feature is turned off. This modifier. If the newline list is empty, the feature is turned off. This
command is present in a number of the standard test input files. command is present in a number of the standard test input files.
When the POSIX API is being tested there is no way to override the When the POSIX API is being tested there is no way to override the de-
default newline convention, though it is possible to set the newline fault newline convention, though it is possible to set the newline con-
convention from within the pattern. A warning is given if the posix or vention from within the pattern. A warning is given if the posix or
posix_nosub modifier is used when #newline_default would set a default posix_nosub modifier is used when #newline_default would set a default
for the non-POSIX API. for the non-POSIX API.
@ -344,8 +344,8 @@ COMMAND LINES
The appearance of this line causes all subsequent modifier settings to The appearance of this line causes all subsequent modifier settings to
be checked for compatibility with the perltest.sh script, which is used be checked for compatibility with the perltest.sh script, which is used
to confirm that Perl gives the same results as PCRE2. Also, apart from to confirm that Perl gives the same results as PCRE2. Also, apart from
comment lines, #pattern commands, and #subject commands that set or comment lines, #pattern commands, and #subject commands that set or un-
unset "mark", no command lines are permitted, because they and many of set "mark", no command lines are permitted, because they and many of
the modifiers are specific to pcre2test, and should not be used in test the modifiers are specific to pcre2test, and should not be used in test
files that are also processed by perltest.sh. The #perltest command files that are also processed by perltest.sh. The #perltest command
helps detect tests that are accidentally put in the wrong file. helps detect tests that are accidentally put in the wrong file.
@ -376,8 +376,8 @@ MODIFIER SYNTAX
list are separated by commas followed by optional white space. Trailing list are separated by commas followed by optional white space. Trailing
whitespace in a modifier list is ignored. Some modifiers may be given whitespace in a modifier list is ignored. Some modifiers may be given
for both patterns and subject lines, whereas others are valid only for for both patterns and subject lines, whereas others are valid only for
one or the other. Each modifier has a long name, for example one or the other. Each modifier has a long name, for example "an-
"anchored", and some of them must be followed by an equals sign and a chored", and some of them must be followed by an equals sign and a
value, for example, "offset=12". Values cannot contain comma charac- value, for example, "offset=12". Values cannot contain comma charac-
ters, but may contain spaces. Modifiers that do not take values may be ters, but may contain spaces. Modifiers that do not take values may be
preceded by a minus sign to turn off a previous setting. preceded by a minus sign to turn off a previous setting.
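The modifier-list rules in the MODIFIER SYNTAX hunk above (comma separators followed by optional white space, ignored trailing whitespace, `name=value` items, and a leading minus sign to turn a setting off) can be sketched as a small parser. This is an illustrative Python model, not the parser pcre2test uses; the function name, the returned tuple shape, and the choice to skip empty items are all assumptions.

```python
def parse_modifiers(text: str) -> list:
    """Split a modifier list into (name, value) pairs.

    value is a string for "name=value" items, True for a bare name,
    and False for a name preceded by a minus sign.
    """
    result = []
    # Trailing whitespace in a modifier list is ignored.
    for item in text.rstrip().split(","):
        # White space after a separating comma is optional.
        item = item.lstrip()
        if not item:
            continue  # assumption of this sketch: empty items are skipped
        if "=" in item:
            # Values cannot contain commas but may contain spaces.
            name, value = item.split("=", 1)
            result.append((name, value))
        elif item.startswith("-"):
            # A minus sign turns off a previous setting.
            result.append((item[1:], False))
        else:
            result.append((item, True))
    return result


if __name__ == "__main__":
    print(parse_modifiers("anchored, offset=12,-caseless  "))
```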
@ -498,8 +498,8 @@ SUBJECT LINE SYNTAX
\= This is a comment. \= This is a comment.
abc\= This is an invalid modifier list. abc\= This is an invalid modifier list.
A backslash followed by any other non-alphanumeric character just A backslash followed by any other non-alphanumeric character just es-
escapes that character. A backslash followed by anything else causes an capes that character. A backslash followed by anything else causes an
error. However, if the very last character in the line is a backslash error. However, if the very last character in the line is a backslash
(and there is no modifier list), it is ignored. This gives a way of (and there is no modifier list), it is ignored. This gives a way of
passing an empty line as data, since a real empty line terminates the passing an empty line as data, since a real empty line terminates the
@ -523,13 +523,13 @@ PATTERN MODIFIERS
The following modifiers set options for pcre2_compile(). Most of them The following modifiers set options for pcre2_compile(). Most of them
set bits in the options argument of that function, but those whose set bits in the options argument of that function, but those whose
names start with PCRE2_EXTRA are additional options that are set in the names start with PCRE2_EXTRA are additional options that are set in the
compile context. For the main options, there are some single-letter compile context. For the main options, there are some single-letter ab-
abbreviations that are the same as Perl options. There is special han- breviations that are the same as Perl options. There is special han-
dling for /x: if a second x is present, PCRE2_EXTENDED is converted dling for /x: if a second x is present, PCRE2_EXTENDED is converted
into PCRE2_EXTENDED_MORE as in Perl. A third appearance adds into PCRE2_EXTENDED_MORE as in Perl. A third appearance adds PCRE2_EX-
PCRE2_EXTENDED as well, though this makes no difference to the way TENDED as well, though this makes no difference to the way pcre2_com-
pcre2_compile() behaves. See pcre2api for a description of the effects pile() behaves. See pcre2api for a description of the effects of these
of these options. options.
allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS
allow_surrogate_escapes set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES allow_surrogate_escapes set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
@ -577,9 +577,9 @@ PATTERN MODIFIERS
Setting compilation controls Setting compilation controls
The following modifiers affect the compilation process or request The following modifiers affect the compilation process or request in-
information about the pattern. There are single-letter abbreviations formation about the pattern. There are single-letter abbreviations for
for some that are heavily used in the test files. some that are heavily used in the test files.
bsr=[anycrlf|unicode] specify \R handling bsr=[anycrlf|unicode] specify \R handling
/B bincode show binary code without lengths /B bincode show binary code without lengths
@ -717,8 +717,8 @@ PATTERN MODIFIERS
minated strings but can be passed by length instead of being zero-ter- minated strings but can be passed by length instead of being zero-ter-
minated. The use_length modifier causes this to happen. Using a length minated. The use_length modifier causes this to happen. Using a length
happens automatically (whether or not use_length is set) when hex is happens automatically (whether or not use_length is set) when hex is
set, because patterns specified in hexadecimal may contain binary set, because patterns specified in hexadecimal may contain binary ze-
zeros. ros.
If hex or use_length is used with the POSIX wrapper API (see "Using the If hex or use_length is used with the POSIX wrapper API (see "Using the
POSIX wrapper API" below), the REG_PEND extension is used to pass the POSIX wrapper API" below), the REG_PEND extension is used to pass the
@ -770,8 +770,8 @@ PATTERN MODIFIERS
partial modifier in "Subject Modifiers" below for details of how these partial modifier in "Subject Modifiers" below for details of how these
options are specified for each match attempt. options are specified for each match attempt.
JIT compilation is requested by the jit pattern modifier, which may JIT compilation is requested by the jit pattern modifier, which may op-
optionally be followed by an equals sign and a number in the range 0 to tionally be followed by an equals sign and a number in the range 0 to
7. The three bits that make up the number specify which of the three 7. The three bits that make up the number specify which of the three
JIT operating modes are to be compiled: JIT operating modes are to be compiled:
@ -799,8 +799,8 @@ PATTERN MODIFIERS
none was compiled for non-partial matching. none was compiled for non-partial matching.
If JIT compilation is successful, the compiled JIT code will automati- If JIT compilation is successful, the compiled JIT code will automati-
cally be used when an appropriate type of match is run, except when cally be used when an appropriate type of match is run, except when in-
incompatible run-time options are specified. For more details, see the compatible run-time options are specified. For more details, see the
pcre2jit documentation. See also the jitstack modifier below for a way pcre2jit documentation. See also the jitstack modifier below for a way
of setting the size of the JIT stack. of setting the size of the JIT stack.
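The jit modifier's optional number (0 to 7) is described above as three bits, one for each JIT operating mode. The list that assigns the bits falls outside this hunk, so the mapping in the sketch below (1 for non-partial, 2 for soft partial, 4 for hard partial matching) is an assumption based on the pcre2test manual and should be checked against it. The names `JIT_MODES` and `jit_modes` are hypothetical.

```python
# Assumed bit assignments; the list itself is not visible in this diff.
JIT_MODES = {
    1: "non-partial",
    2: "soft partial",
    4: "hard partial",
}


def jit_modes(value: int) -> list:
    """Return the JIT operating modes selected by a jit=<value> setting."""
    if not 0 <= value <= 7:
        raise ValueError("jit value must be in the range 0 to 7")
    # Test each of the three bits in turn.
    return [name for bit, name in JIT_MODES.items() if value & bit]


if __name__ == "__main__":
    for value in (1, 5, 7):
        print(value, jit_modes(value))
```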
@ -847,8 +847,8 @@ PATTERN MODIFIERS
Limiting nested parentheses Limiting nested parentheses
The parens_nest_limit modifier sets a limit on the depth of nested The parens_nest_limit modifier sets a limit on the depth of nested
parentheses in a pattern. Breaching the limit causes a compilation parentheses in a pattern. Breaching the limit causes a compilation er-
error. The default for the library is set when PCRE2 is built, but ror. The default for the library is set when PCRE2 is built, but
pcre2test sets its own default of 220, which is required for running pcre2test sets its own default of 220, which is required for running
the standard test suite. the standard test suite.
@ -886,13 +886,13 @@ PATTERN MODIFIERS
buffer is too small for the error message. If this modifier has not buffer is too small for the error message. If this modifier has not
been set, a large buffer is used. been set, a large buffer is used.
The aftertext and allaftertext subject modifiers work as described The aftertext and allaftertext subject modifiers work as described be-
below. All other modifiers are either ignored, with a warning message, low. All other modifiers are either ignored, with a warning message, or
or cause an error. cause an error.
The pattern is passed to regcomp() as a zero-terminated string by The pattern is passed to regcomp() as a zero-terminated string by de-
default, but if the use_length or hex modifiers are set, the REG_PEND fault, but if the use_length or hex modifiers are set, the REG_PEND ex-
extension is used to pass it by length. tension is used to pass it by length.
Testing the stack guard feature Testing the stack guard feature
@ -920,8 +920,8 @@ PATTERN MODIFIERS
2 a set of tables defining ISO 8859 characters 2 a set of tables defining ISO 8859 characters
In table 2, some characters whose codes are greater than 128 are iden- In table 2, some characters whose codes are greater than 128 are iden-
tified as letters, digits, spaces, etc. Setting alternate character tified as letters, digits, spaces, etc. Setting alternate character ta-
tables and a locale are mutually exclusive. bles and a locale are mutually exclusive.
Setting certain match controls Setting certain match controls
@ -971,12 +971,12 @@ PATTERN MODIFIERS
terns" below. If pushcopy is used instead of push, a copy of the com- terns" below. If pushcopy is used instead of push, a copy of the com-
piled pattern is stacked, leaving the original as current, ready to piled pattern is stacked, leaving the original as current, ready to
match the following input lines. This provides a way of testing the match the following input lines. This provides a way of testing the
pcre2_code_copy() function. The push and pushcopy modifiers are pcre2_code_copy() function. The push and pushcopy modifiers are in-
incompatible with compilation modifiers such as global that act at compatible with compilation modifiers such as global that act at match
match time. Any that are specified are ignored (for the stacked copy), time. Any that are specified are ignored (for the stacked copy), with a
with a warning message, except for replace, which causes an error. Note warning message, except for replace, which causes an error. Note that
that jitverify, which is allowed, does not carry through to any subse- jitverify, which is allowed, does not carry through to any subsequent
quent matching that uses a stacked pattern. matching that uses a stacked pattern.
Testing foreign pattern conversion Testing foreign pattern conversion
@ -1124,12 +1124,12 @@ SUBJECT MODIFIERS
The allusedtext modifier requests that all the text that was consulted The allusedtext modifier requests that all the text that was consulted
during a successful pattern match by the interpreter should be shown. during a successful pattern match by the interpreter should be shown.
This feature is not supported for JIT matching, and if requested with This feature is not supported for JIT matching, and if requested with
JIT it is ignored (with a warning message). Setting this modifier JIT it is ignored (with a warning message). Setting this modifier af-
affects the output if there is a lookbehind at the start of a match, or fects the output if there is a lookbehind at the start of a match, or a
a lookahead at the end, or if \K is used in the pattern. Characters lookahead at the end, or if \K is used in the pattern. Characters that
that precede or follow the start and end of the actual match are indi- precede or follow the start and end of the actual match are indicated
cated in the output by '<' or '>' characters underneath them. Here is in the output by '<' or '>' characters underneath them. Here is an ex-
an example: ample:
re> /(?<=pqr)abc(?=xyz)/ re> /(?<=pqr)abc(?=xyz)/
data> 123pqrabcxyz456\=allusedtext data> 123pqrabcxyz456\=allusedtext
@ -1145,8 +1145,8 @@ SUBJECT MODIFIERS
string. The only time when this occurs is when \K has been processed as string. The only time when this occurs is when \K has been processed as
part of the match. In this situation, the output for the matched string part of the match. In this situation, the output for the matched string
is displayed from the starting character instead of from the match is displayed from the starting character instead of from the match
point, with circumflex characters under the earlier characters. For point, with circumflex characters under the earlier characters. For ex-
example: ample:
re> /abc\Kxyz/ re> /abc\Kxyz/
data> abcxyz\=startchar data> abcxyz\=startchar
@ -1171,12 +1171,12 @@ SUBJECT MODIFIERS
The allvector modifier requests that the entire ovector be shown, what- The allvector modifier requests that the entire ovector be shown, what-
ever the outcome of the match. Compare allcaptures, which shows only up ever the outcome of the match. Compare allcaptures, which shows only up
to the maximum number of capture groups for the pattern, and then only to the maximum number of capture groups for the pattern, and then only
for a successful complete non-DFA match. This modifier, which acts for a successful complete non-DFA match. This modifier, which acts af-
after any match result, and also for DFA matching, provides a means of ter any match result, and also for DFA matching, provides a means of
checking that there are no unexpected modifications to ovector fields. checking that there are no unexpected modifications to ovector fields.
Before each match attempt, the ovector is filled with a special value, Before each match attempt, the ovector is filled with a special value,
and if this is found in both elements of a capturing pair, and if this is found in both elements of a capturing pair, "<un-
"<unchanged>" is output. After a successful match, this applies to all changed>" is output. After a successful match, this applies to all
groups after the maximum capture group for the pattern. In other cases groups after the maximum capture group for the pattern. In other cases
it applies to the entire ovector. After a partial match, the first two it applies to the entire ovector. After a partial match, the first two
elements are the only ones that should be set. After a DFA match, the elements are the only ones that should be set. After a DFA match, the
@ -1207,12 +1207,12 @@ SUBJECT MODIFIERS
If an empty string is matched, the next match is done with the If an empty string is matched, the next match is done with the
PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search
for another, non-empty, match at the same point in the subject. If this for another, non-empty, match at the same point in the subject. If this
match fails, the start offset is advanced, and the normal match is match fails, the start offset is advanced, and the normal match is re-
retried. This imitates the way Perl handles such cases when using the tried. This imitates the way Perl handles such cases when using the /g
/g modifier or the split() function. Normally, the start offset is modifier or the split() function. Normally, the start offset is ad-
advanced by one character, but if the newline convention recognizes vanced by one character, but if the newline convention recognizes CRLF
CRLF as a newline, and the current character is CR followed by LF, an as a newline, and the current character is CR followed by LF, an ad-
advance of two characters occurs. vance of two characters occurs.
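The paragraph above says how far the start offset advances once the anchored, non-empty retry after an empty match has failed. A minimal Python sketch of that advance follows. The function name and the boolean `crlf_is_newline` parameter are assumptions, and "one character" is modeled as one string index, which ignores multi-unit UTF characters.

```python
def advance_after_empty_match(subject: str, offset: int,
                              crlf_is_newline: bool) -> int:
    """Return the start offset for the next normal match attempt."""
    # If the newline convention recognizes CRLF and the current character
    # is CR followed by LF, advance by two characters.
    if crlf_is_newline and subject[offset:offset + 2] == "\r\n":
        return offset + 2
    # Normally the start offset is advanced by one character.
    return offset + 1


if __name__ == "__main__":
    print(advance_after_empty_match("a\r\nb", 1, crlf_is_newline=True))
    print(advance_after_empty_match("a\r\nb", 1, crlf_is_newline=False))
```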
Testing substring extraction functions Testing substring extraction functions
@ -1275,8 +1275,8 @@ SUBJECT MODIFIERS
than 256 characters) for substitution tests, as fixed-size buffers are than 256 characters) for substitution tests, as fixed-size buffers are
used. To make it easy to test for buffer overflow, if the replacement used. To make it easy to test for buffer overflow, if the replacement
string starts with a number in square brackets, that number is passed string starts with a number in square brackets, that number is passed
to pcre2_substitute() as the size of the output buffer, with the to pcre2_substitute() as the size of the output buffer, with the re-
replacement string starting at the next character. Here is an example placement string starting at the next character. Here is an example
that tests the edge case: that tests the edge case:
/abc/ /abc/
@ -1285,10 +1285,10 @@ SUBJECT MODIFIERS
123abc123\=replace=[9]XYZ 123abc123\=replace=[9]XYZ
Failed: error -47: no more memory Failed: error -47: no more memory
The default action of pcre2_substitute() is to return The default action of pcre2_substitute() is to return PCRE2_ER-
PCRE2_ERROR_NOMEMORY when the output buffer is too small. However, if ROR_NOMEMORY when the output buffer is too small. However, if the
the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the sub- PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the substi-
stitute_overflow_length modifier), pcre2_substitute() continues to go tute_overflow_length modifier), pcre2_substitute() continues to go
through the motions of matching and substituting (but not doing any through the motions of matching and substituting (but not doing any
callouts), in order to compute the size of buffer that is required. callouts), in order to compute the size of buffer that is required.
When this happens, pcre2test shows the required buffer length (which When this happens, pcre2test shows the required buffer length (which
@ -1323,8 +1323,8 @@ SUBJECT MODIFIERS
Then are listed the offsets of the old substring, its contents, and the Then are listed the offsets of the old substring, its contents, and the
same for the replacement. same for the replacement.
By default, the substitution callout function returns zero, which By default, the substitution callout function returns zero, which ac-
accepts the replacement and causes matching to continue if /g was used. cepts the replacement and causes matching to continue if /g was used.
Two further modifiers can be used to test other return values. If sub- Two further modifiers can be used to test other return values. If sub-
stitute_skip is set to a value greater than zero the callout function stitute_skip is set to a value greater than zero the callout function
returns +1 for the match of that number, and similarly substitute_stop returns +1 for the match of that number, and similarly substitute_stop
@ -1411,8 +1411,8 @@ SUBJECT MODIFIERS
The memory modifier causes pcre2test to log the sizes of all heap mem- The memory modifier causes pcre2test to log the sizes of all heap mem-
ory allocation and freeing calls that occur during a call to ory allocation and freeing calls that occur during a call to
pcre2_match() or pcre2_dfa_match(). These occur only when a match pcre2_match() or pcre2_dfa_match(). These occur only when a match re-
requires a bigger vector than the default for remembering backtracking quires a bigger vector than the default for remembering backtracking
points (pcre2_match()) or for internal workspace (pcre2_dfa_match()). points (pcre2_match()) or for internal workspace (pcre2_dfa_match()).
In many cases there will be no heap memory used and therefore no addi- In many cases there will be no heap memory used and therefore no addi-
tional output. No heap memory is allocated during matching with JIT, so tional output. No heap memory is allocated during matching with JIT, so
@ -1435,9 +1435,9 @@ SUBJECT MODIFIERS
Setting the size of the output vector Setting the size of the output vector
The ovector modifier applies only to the subject line in which it The ovector modifier applies only to the subject line in which it ap-
appears, though of course it can also be used to set a default in a pears, though of course it can also be used to set a default in a #sub-
#subject command. It specifies the number of pairs of offsets that are ject command. It specifies the number of pairs of offsets that are
available for storing matching information. The default is 15. available for storing matching information. The default is 15.
A value of zero is useful when testing the POSIX API because it causes A value of zero is useful when testing the POSIX API because it causes
@ -1491,12 +1491,12 @@ DEFAULT OUTPUT FROM pcre2test
When a match succeeds, pcre2test outputs the list of captured sub- When a match succeeds, pcre2test outputs the list of captured sub-
strings, starting with number 0 for the string that matched the whole strings, starting with number 0 for the string that matched the whole
pattern. Otherwise, it outputs "No match" when the return is pattern. Otherwise, it outputs "No match" when the return is PCRE2_ER-
PCRE2_ERROR_NOMATCH, or "Partial match:" followed by the partially ROR_NOMATCH, or "Partial match:" followed by the partially matching
matching substring when the return is PCRE2_ERROR_PARTIAL. (Note that substring when the return is PCRE2_ERROR_PARTIAL. (Note that this is
this is the entire substring that was inspected during the partial the entire substring that was inspected during the partial match; it
match; it may include characters before the actual match start if a may include characters before the actual match start if a lookbehind
lookbehind assertion, \K, \b, or \B was involved.) assertion, \K, \b, or \B was involved.)
For any other return, pcre2test outputs the PCRE2 negative error number For any other return, pcre2test outputs the PCRE2 negative error number
and a short descriptive phrase. If the error is a failed UTF string and a short descriptive phrase. If the error is a failed UTF string
@ -1541,8 +1541,8 @@ DEFAULT OUTPUT FROM pcre2test
0: cat 0: cat
0+ aract 0+ aract
If global matching is requested, the results of successive matching If global matching is requested, the results of successive matching at-
attempts are output in sequence, like this: tempts are output in sequence, like this:
re> /\Bi(\w\w)/g re> /\Bi(\w\w)/g
data> Mississippi data> Mississippi
@ -1580,12 +1580,12 @@ OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
2: tan 2: tan
Using the normal matching function on this data finds only "tang". The Using the normal matching function on this data finds only "tang". The
longest matching string is always given first (and numbered zero). longest matching string is always given first (and numbered zero). Af-
After a PCRE2_ERROR_PARTIAL return, the output is "Partial match:", ter a PCRE2_ERROR_PARTIAL return, the output is "Partial match:", fol-
followed by the partially matching substring. Note that this is the lowed by the partially matching substring. Note that this is the entire
entire substring that was inspected during the partial match; it may substring that was inspected during the partial match; it may include
include characters before the actual match start if a lookbehind asser- characters before the actual match start if a lookbehind assertion, \b,
tion, \b, or \B was involved. (\K is not supported for DFA matching.) or \B was involved. (\K is not supported for DFA matching.)
If global matching is requested, the search for further matches resumes If global matching is requested, the search for further matches resumes
at the end of the longest match. For example: at the end of the longest match. For example:
@ -1638,12 +1638,12 @@ CALLOUTS
--->pqrabcdef --->pqrabcdef
0 ^ ^ \d 0 ^ ^ \d
This output indicates that callout number 0 occurred for a match This output indicates that callout number 0 occurred for a match at-
attempt starting at the fourth character of the subject string, when tempt starting at the fourth character of the subject string, when the
the pointer was at the seventh character, and when the next pattern pointer was at the seventh character, and when the next pattern item
item was \d. Just one circumflex is output if the start and current was \d. Just one circumflex is output if the start and current posi-
positions are the same, or if the current position precedes the start tions are the same, or if the current position precedes the start posi-
position, which can happen if the callout is in a lookbehind assertion. tion, which can happen if the callout is in a lookbehind assertion.
Callouts numbered 255 are assumed to be automatic callouts, inserted as Callouts numbered 255 are assumed to be automatic callouts, inserted as
a result of the auto_callout pattern modifier. In this case, instead of a result of the auto_callout pattern modifier. In this case, instead of
@ -1660,8 +1660,8 @@ CALLOUTS
0: E* 0: E*
If a pattern contains (*MARK) items, an additional line is output when- If a pattern contains (*MARK) items, an additional line is output when-
ever a change of latest mark is passed to the callout function. For ever a change of latest mark is passed to the callout function. For ex-
example: ample:
re> /a(*MARK:X)bc/auto_callout re> /a(*MARK:X)bc/auto_callout
data> abc data> abc
@ -1683,8 +1683,8 @@ CALLOUTS
The output for a callout with a string argument is similar, except that The output for a callout with a string argument is similar, except that
instead of outputting a callout number before the position indicators, instead of outputting a callout number before the position indicators,
the callout string and its offset in the pattern string are output the callout string and its offset in the pattern string are output be-
before the reflection of the subject string, and the subject string is fore the reflection of the subject string, and the subject string is
reflected for each callout. For example: reflected for each callout. For example:
re> /^ab(?C'first')cd(?C"second")ef/ re> /^ab(?C'first')cd(?C"second")ef/
@ -1800,9 +1800,9 @@ NON-PRINTING CHARACTERS
When pcre2test is outputting text that is a matched part of a subject When pcre2test is outputting text that is a matched part of a subject
string, it behaves in the same way, unless a different locale has been string, it behaves in the same way, unless a different locale has been
set for the pattern (using the locale modifier). In this case, the set for the pattern (using the locale modifier). In this case, the is-
isprint() function is used to distinguish printing and non-printing print() function is used to distinguish printing and non-printing char-
characters. acters.
SAVING AND RESTORING COMPILED PATTERNS SAVING AND RESTORING COMPILED PATTERNS
@ -1814,14 +1814,14 @@ SAVING AND RESTORING COMPILED PATTERNS
have the same endianness, pointer width and PCRE2_SIZE type. Before have the same endianness, pointer width and PCRE2_SIZE type. Before
compiled patterns can be saved they must be serialized, that is, con- compiled patterns can be saved they must be serialized, that is, con-
verted to a stream of bytes. A single byte stream may contain any num- verted to a stream of bytes. A single byte stream may contain any num-
ber of compiled patterns, but they must all use the same character ber of compiled patterns, but they must all use the same character ta-
tables. A single copy of the tables is included in the byte stream (its bles. A single copy of the tables is included in the byte stream (its
size is 1088 bytes). size is 1088 bytes).
The functions whose names begin with pcre2_serialize_ are used for The functions whose names begin with pcre2_serialize_ are used for se-
serializing and de-serializing. They are described in the pcre2serial- rializing and de-serializing. They are described in the pcre2serialize
ize documentation. In this section we describe the features of documentation. In this section we describe the features of pcre2test
pcre2test that can be used to test these functions. that can be used to test these functions.
Note that "serialization" in PCRE2 does not convert compiled patterns Note that "serialization" in PCRE2 does not convert compiled patterns
to an abstract format like Java or .NET. It just makes a reloadable to an abstract format like Java or .NET. It just makes a reloadable
@ -1831,8 +1831,8 @@ SAVING AND RESTORING COMPILED PATTERNS
piled, it is pushed onto a stack of compiled patterns, and pcre2test piled, it is pushed onto a stack of compiled patterns, and pcre2test
expects the next line to contain a new pattern (or command) instead of expects the next line to contain a new pattern (or command) instead of
a subject line. By contrast, the pushcopy modifier causes a copy of the a subject line. By contrast, the pushcopy modifier causes a copy of the
compiled pattern to be stacked, leaving the original available for compiled pattern to be stacked, leaving the original available for im-
immediate matching. By using push and/or pushcopy, a number of patterns mediate matching. By using push and/or pushcopy, a number of patterns
can be compiled and retained. These modifiers are incompatible with can be compiled and retained. These modifiers are incompatible with
posix, and control modifiers that act at match time are ignored (with a posix, and control modifiers that act at match time are ignored (with a
message) for the stacked patterns. The jitverify modifier applies only message) for the stacked patterns. The jitverify modifier applies only
@ -1855,8 +1855,8 @@ SAVING AND RESTORING COMPILED PATTERNS
matched with the pattern, terminated as usual by an empty line or end matched with the pattern, terminated as usual by an empty line or end
of file. This command may be followed by a modifier list containing of file. This command may be followed by a modifier list containing
only control modifiers that act after a pattern has been compiled. In only control modifiers that act after a pattern has been compiled. In
particular, hex, posix, posix_nosub, push, and pushcopy are not particular, hex, posix, posix_nosub, push, and pushcopy are not al-
allowed, nor are any option-setting modifiers. The JIT modifiers are, lowed, nor are any option-setting modifiers. The JIT modifiers are,
however permitted. Here is an example that saves and reloads two pat- however permitted. Here is an example that saves and reloads two pat-
terns. terns.