Documentation update.

Philip.Hazel 2019-06-22 16:36:15 +00:00
parent a89423624d
commit c6ee84317d
6 changed files with 2699 additions and 2715 deletions


@@ -3525,9 +3525,10 @@ first match attempt, the second attempt would start at the second character
instead of skipping on to "c".
</P>
<P>
- If (*SKIP) is used inside a lookbehind to specify a new starting position that
- is not later than the starting point of the current match, the position
- specified by (*SKIP) is ignored, and instead the normal "bumpalong" occurs.
+ If (*SKIP) is used to specify a new starting position that is the same as the
+ starting position of the current match, or (by being inside a lookbehind)
+ earlier, the position specified by (*SKIP) is ignored, and instead the normal
+ "bumpalong" occurs.
<pre>
(*SKIP:NAME)
</pre>
@@ -3754,7 +3755,7 @@ Cambridge, England.
</P>
<br><a name="SEC31" href="#TOC1">REVISION</a><br>
<P>
- Last updated: 21 June 2019
+ Last updated: 22 June 2019
<br>
Copyright &copy; 1997-2019 University of Cambridge.
<br>
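Apart from the date updates and the re-hyphenated man-page text, the content change in this commit is the reworded (*SKIP) paragraph. Below is a minimal C sketch of the revised rule. It uses simplified semantics: offsets are plain code-unit indices, and the normal "bumpalong" is modelled as an advance of exactly one unit. The real library advances by one character and has other special cases that are not modelled here. The function name `next_start_after_skip` is hypothetical and is not part of the PCRE2 API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: where does the next match attempt start after an
   attempt that began at match_start fails at a (*SKIP) that recorded
   skip_pos? Inside a lookbehind, skip_pos can be earlier than match_start. */
static size_t next_start_after_skip(size_t match_start, size_t skip_pos)
{
    /* A (*SKIP) position that is the same as, or earlier than, the start
       of the current attempt is ignored: normal bumpalong happens
       (simplified here to "advance by one unit"). */
    if (skip_pos <= match_start)
        return match_start + 1;

    /* Otherwise the next attempt begins at the (*SKIP) position. */
    return skip_pos;
}
```

Suppose a failed attempt started at offset 3. A (*SKIP) position of 1 (reachable only from inside a lookbehind) or 3 is ignored, so the next attempt starts at 4. A (*SKIP) position of 5 moves the next attempt straight to 5.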


@@ -16,8 +16,8 @@ DESCRIPTION
pcre2-config returns the configuration of the installed PCRE2 libraries
and the options required to compile a program to use them. Some of the
- options apply only to the 8-bit, or 16-bit, or 32-bit libraries,
- respectively, and are not available for libraries that have not been
+ options apply only to the 8-bit, or 16-bit, or 32-bit libraries, re-
+ spectively, and are not available for libraries that have not been
built. If an unavailable option is encountered, the "usage" information
is output.
@@ -36,30 +36,30 @@ OPTIONS
--version Writes the version number of the installed PCRE2 libraries to
the standard output.
- --libs8 Writes to the standard output the command line options
- required to link with the 8-bit PCRE2 library (-lpcre2-8 on
+ --libs8 Writes to the standard output the command line options re-
+ quired to link with the 8-bit PCRE2 library (-lpcre2-8 on
many systems).
- --libs16 Writes to the standard output the command line options
- required to link with the 16-bit PCRE2 library (-lpcre2-16 on
+ --libs16 Writes to the standard output the command line options re-
+ quired to link with the 16-bit PCRE2 library (-lpcre2-16 on
many systems).
- --libs32 Writes to the standard output the command line options
- required to link with the 32-bit PCRE2 library (-lpcre2-32 on
+ --libs32 Writes to the standard output the command line options re-
+ quired to link with the 32-bit PCRE2 library (-lpcre2-32 on
many systems).
--libs-posix
- Writes to the standard output the command line options
- required to link with PCRE2's POSIX API wrapper library
+ Writes to the standard output the command line options re-
+ quired to link with PCRE2's POSIX API wrapper library
(-lpcre2-posix -lpcre2-8 on many systems).
- --cflags Writes to the standard output the command line options
- required to compile files that use PCRE2 (this may include
- some -I options, but is blank on many systems).
+ --cflags Writes to the standard output the command line options re-
+ quired to compile files that use PCRE2 (this may include some
+ -I options, but is blank on many systems).
--cflags-posix
- Writes to the standard output the command line options
- required to compile files that use PCRE2's POSIX API wrapper
+ Writes to the standard output the command line options re-
+ quired to compile files that use PCRE2's POSIX API wrapper
library (this may include some -I options, but is blank on
many systems).

File diff suppressed because it is too large.


@@ -12,11 +12,11 @@ SYNOPSIS
DESCRIPTION
pcre2grep searches files for character patterns, in the same way as
- other grep commands do, but it uses the PCRE2 regular expression
- library to support patterns that are compatible with the regular
- expressions of Perl 5. See pcre2syntax(3) for a quick-reference summary
- of pattern syntax, or pcre2pattern(3) for a full description of the
- syntax and semantics of the regular expressions that PCRE2 supports.
+ other grep commands do, but it uses the PCRE2 regular expression li-
+ brary to support patterns that are compatible with the regular expres-
+ sions of Perl 5. See pcre2syntax(3) for a quick-reference summary of
+ pattern syntax, or pcre2pattern(3) for a full description of the syntax
+ and semantics of the regular expressions that PCRE2 supports.
Patterns, whether supplied on the command line or in a separate file,
are given without delimiters. For example:
@@ -26,8 +26,8 @@ DESCRIPTION
If you attempt to use delimiters (for example, by surrounding a pattern
with slashes, as is common in Perl scripts), they are interpreted as
part of the pattern. Quotes can of course be used to delimit patterns
- on the command line because they are interpreted by the shell, and
- indeed quotes are required if a pattern contains white space or shell
+ on the command line because they are interpreted by the shell, and in-
+ deed quotes are required if a pattern contains white space or shell
metacharacters.
The first argument that follows any option settings is treated as the
@@ -54,8 +54,8 @@ DESCRIPTION
controlled by parameters that can be set by the --buffer-size and
--max-buffer-size options. The first of these sets the size of buffer
that is obtained at the start of processing. If an input file contains
- very long lines, a larger buffer may be needed; this is handled by
- automatically extending the buffer, up to the limit specified by --max-
+ very long lines, a larger buffer may be needed; this is handled by au-
+ tomatically extending the buffer, up to the limit specified by --max-
buffer-size. The default values for these parameters can be set when
pcre2grep is built; if nothing is specified, the defaults are set to
20KiB and 1MiB respectively. An error occurs if a line is too long and
@@ -75,12 +75,12 @@ DESCRIPTION
By default, as soon as one pattern matches a line, no further patterns
are considered. However, if --colour (or --color) is used to colour the
matching substrings, or if --only-matching, --file-offsets, or --line-
- offsets is used to output only the part of the line that matched
- (either shown literally, or as an offset), scanning resumes immediately
+ offsets is used to output only the part of the line that matched (ei-
+ ther shown literally, or as an offset), scanning resumes immediately
following the match, so that further matches on the same line can be
- found. If there are multiple patterns, they are all tried on the
- remainder of the line, but patterns that follow the one that matched
- are not tried on the earlier part of the line.
+ found. If there are multiple patterns, they are all tried on the re-
+ mainder of the line, but patterns that follow the one that matched are
+ not tried on the earlier part of the line.
This behaviour means that the order in which multiple patterns are
specified can affect the output when one of the above options is used.
@@ -89,11 +89,11 @@ DESCRIPTION
overlap).
Patterns that can match an empty string are accepted, but empty string
- matches are never recognized. An example is the pattern
- "(super)?(man)?", in which all components are optional. This pattern
- finds all occurrences of both "super" and "man"; the output differs
- from matching with "super|man" when only the matching substrings are
- being shown.
+ matches are never recognized. An example is the pattern "(su-
+ per)?(man)?", in which all components are optional. This pattern finds
+ all occurrences of both "super" and "man"; the output differs from
+ matching with "super|man" when only the matching substrings are being
+ shown.
If the LC_ALL or LC_CTYPE environment variable is set, pcre2grep uses
the value to set a locale when calling the PCRE2 library. The --locale
@@ -116,10 +116,9 @@ BINARY FILES
By default, a file that contains a binary zero byte within the first
1024 bytes is identified as a binary file, and is processed specially.
(GNU grep identifies binary files in this manner.) However, if the new-
- line type is specified as "nul", that is, the line terminator is a
- binary zero, the test for a binary file is not applied. See the
- --binary-files option for a means of changing the way binary files are
- handled.
+ line type is specified as "nul", that is, the line terminator is a bi-
+ nary zero, the test for a binary file is not applied. See the --binary-
+ files option for a means of changing the way binary files are handled.
BINARY ZEROS IN PATTERNS
@@ -148,12 +147,12 @@ OPTIONS
Output up to number lines of context after each matching
line. Fewer lines are output if the next match or the end of
the file is reached, or if the processing buffer size has
- been set too small. If file names and/or line numbers are
- being output, a hyphen separator is used instead of a colon
- for the context lines. A line containing "--" is output
- between each group of lines, unless they are in fact contigu-
- ous in the input file. The value of number is expected to be
- relatively small. When -c is used, -A is ignored.
+ been set too small. If file names and/or line numbers are be-
+ ing output, a hyphen separator is used instead of a colon for
+ the context lines. A line containing "--" is output between
+ each group of lines, unless they are in fact contiguous in
+ the input file. The value of number is expected to be rela-
+ tively small. When -c is used, -A is ignored.
-a, --text
Treat binary files as text. This is equivalent to --binary-
@@ -164,26 +163,26 @@ OPTIONS
line. Fewer lines are output if the previous match or the
start of the file is within number lines, or if the process-
ing buffer size has been set too small. If file names and/or
- line numbers are being output, a hyphen separator is used
- instead of a colon for the context lines. A line containing
+ line numbers are being output, a hyphen separator is used in-
+ stead of a colon for the context lines. A line containing
"--" is output between each group of lines, unless they are
in fact contiguous in the input file. The value of number is
- expected to be relatively small. When -c is used, -B is
- ignored.
+ expected to be relatively small. When -c is used, -B is ig-
+ nored.
--binary-files=word
Specify how binary files are to be processed. If the word is
- "binary" (the default), pattern matching is performed on
- binary files, but the only output is "Binary file <name>
+ "binary" (the default), pattern matching is performed on bi-
+ nary files, but the only output is "Binary file <name>
matches" when a match succeeds. If the word is "text", which
is equivalent to the -a or --text option, binary files are
processed in the same way as any other file. In this case,
when a match succeeds, the output may be binary garbage,
which can have nasty effects if sent to a terminal. If the
- word is "without-match", which is equivalent to the -I
- option, binary files are not processed at all; they are
- assumed not to be of interest and are skipped without causing
- any output or affecting the return code.
+ word is "without-match", which is equivalent to the -I op-
+ tion, binary files are not processed at all; they are assumed
+ not to be of interest and are skipped without causing any
+ output or affecting the return code.
--buffer-size=number
Set the parameter that controls how much memory is obtained
@@ -208,10 +207,10 @@ OPTIONS
If no lines are selected, the number zero is output. If sev-
eral files are are being scanned, a count is output for each
of them and the -t option can be used to cause a total to be
- output at the end. However, if the --files-with-matches
- option is also used, only those files whose counts are
- greater than zero are listed. When -c is used, the -A, -B,
- and -C options are ignored.
+ output at the end. However, if the --files-with-matches op-
+ tion is also used, only those files whose counts are greater
+ than zero are listed. When -c is used, the -A, -B, and -C op-
+ tions are ignored.
--colour, --color
If this option is given without any data, it is equivalent to
@@ -238,8 +237,8 @@ OPTIONS
semicolon, except in the case of GREP_COLORS, which must
start with "ms=" or "mt=" followed by two semicolon-separated
colours, terminated by the end of the string or by a colon.
- If GREP_COLORS does not start with "ms=" or "mt=" it is
- ignored, and GREP_COLOR is checked.
+ If GREP_COLORS does not start with "ms=" or "mt=" it is ig-
+ nored, and GREP_COLOR is checked.
If the string obtained from one of the above variables con-
tains any characters other than semicolon or digits, the set-
@@ -250,9 +249,9 @@ OPTIONS
set, the default is "1;31", which gives red.
-D action, --devices=action
- If an input path is not a regular file or a directory,
- "action" specifies how it is to be processed. Valid values
- are "read" (the default) or "skip" (silently skip the path).
+ If an input path is not a regular file or a directory, "ac-
+ tion" specifies how it is to be processed. Valid values are
+ "read" (the default) or "skip" (silently skip the path).
-d action, --directories=action
If an input path is a directory, "action" specifies how it is
@@ -261,8 +260,8 @@ OPTIONS
"recurse" (equivalent to the -r option), or "skip" (silently
skip the path, the default in Windows environments). In the
"read" case, directories are read as if they were ordinary
- files. In some operating systems the effect of reading a
- directory like this is an immediate end-of-file; in others it
+ files. In some operating systems the effect of reading a di-
+ rectory like this is an immediate end-of-file; in others it
may provoke an error.
--depth-limit=number
@@ -295,8 +294,8 @@ OPTIONS
whether listed on the command line, obtained from --file-
list, or by scanning a directory. The pattern is a PCRE2 reg-
ular expression, and is matched against the final component
- of the file name, not the entire path. The -F, -w, and -x
- options do not apply to this pattern. The option may be given
+ of the file name, not the entire path. The -F, -w, and -x op-
+ tions do not apply to this pattern. The option may be given
any number of times in order to specify multiple patterns. If
a file name matches both an --include and an --exclude pat-
tern, it is excluded. There is no short form for this option.
@@ -310,29 +309,29 @@ OPTIONS
--exclude-dir=pattern
Directories whose names match the pattern are skipped without
- being processed, whatever the setting of the --recursive
- option. This applies to all directories, whether listed on
- the command line, obtained from --file-list, or by scanning a
+ being processed, whatever the setting of the --recursive op-
+ tion. This applies to all directories, whether listed on the
+ command line, obtained from --file-list, or by scanning a
parent directory. The pattern is a PCRE2 regular expression,
and is matched against the final component of the directory
name, not the entire path. The -F, -w, and -x options do not
apply to this pattern. The option may be given any number of
times in order to specify more than one pattern. If a direc-
- tory matches both --include-dir and --exclude-dir, it is
- excluded. There is no short form for this option.
+ tory matches both --include-dir and --exclude-dir, it is ex-
+ cluded. There is no short form for this option.
-F, --fixed-strings
Interpret each data-matching pattern as a list of fixed
- strings, separated by newlines, instead of as a regular
- expression. What constitutes a newline for this purpose is
- controlled by the --newline option. The -w (match as a word)
- and -x (match whole line) options can be used with -F. They
- apply to each of the fixed strings. A line is selected if any
+ strings, separated by newlines, instead of as a regular ex-
+ pression. What constitutes a newline for this purpose is con-
+ trolled by the --newline option. The -w (match as a word) and
+ -x (match whole line) options can be used with -F. They ap-
+ ply to each of the fixed strings. A line is selected if any
of the fixed strings are found in it (subject to -w or -x, if
present). This option applies only to the patterns that are
matched against the contents of files; it does not apply to
- patterns specified by any of the --include or --exclude
- options.
+ patterns specified by any of the --include or --exclude op-
+ tions.
-f filename, --file=filename
Read patterns from the file, one per line, and match them
@@ -360,8 +359,8 @@ OPTIONS
--file-list=filename
Read a list of files and/or directories that are to be
scanned from the given file, one per line. What constitutes a
- newline when reading the file is the operating system's
- default. Trailing white space is removed from each line, and
+ newline when reading the file is the operating system's de-
+ fault. Trailing white space is removed from each line, and
blank lines are ignored. These paths are processed before any
that are listed on the command line. The file name can be
given as "-" to refer to the standard input. If --file and
@@ -388,8 +387,8 @@ OPTIONS
is used. If a line number is also being output, it follows
the file name. When the -M option causes a pattern to match
more than one line, only the first is preceded by the file
- name. This option overrides any previous -h, -l, or -L
- options.
+ name. This option overrides any previous -h, -l, or -L op-
+ tions.
-h, --no-filename
Suppress the output file names when searching multiple files.
@@ -415,16 +414,16 @@ OPTIONS
--include=pattern
If any --include patterns are specified, the only files that
are processed are those that match one of the patterns (and
- do not match an --exclude pattern). This option does not
- affect directories, but it applies to all files, whether
- listed on the command line, obtained from --file-list, or by
- scanning a directory. The pattern is a PCRE2 regular expres-
- sion, and is matched against the final component of the file
- name, not the entire path. The -F, -w, and -x options do not
- apply to this pattern. The option may be given any number of
- times. If a file name matches both an --include and an
- --exclude pattern, it is excluded. There is no short form
- for this option.
+ do not match an --exclude pattern). This option does not af-
+ fect directories, but it applies to all files, whether listed
+ on the command line, obtained from --file-list, or by scan-
+ ning a directory. The pattern is a PCRE2 regular expression,
+ and is matched against the final component of the file name,
+ not the entire path. The -F, -w, and -x options do not apply
+ to this pattern. The option may be given any number of times.
+ If a file name matches both an --include and an --exclude
+ pattern, it is excluded. There is no short form for this op-
+ tion.
--include-from=filename
Treat each non-empty line of the file as the data for an
@@ -438,8 +437,8 @@ OPTIONS
tories that are processed are those that match one of the
patterns (and do not match an --exclude-dir pattern). This
applies to all directories, whether listed on the command
- line, obtained from --file-list, or by scanning a parent
- directory. The pattern is a PCRE2 regular expression, and is
+ line, obtained from --file-list, or by scanning a parent di-
+ rectory. The pattern is a PCRE2 regular expression, and is
matched against the final component of the directory name,
not the entire path. The -F, -w, and -x options do not apply
to this pattern. The option may be given any number of times.
@@ -480,8 +479,8 @@ OPTIONS
flushed by the operating system. This option can be useful
when the input or output is attached to a pipe and you do not
want pcre2grep to buffer up large amounts of data. However,
- its use will affect performance, and the -M (multiline)
- option ceases to work. When input is from a compressed .gz or
+ its use will affect performance, and the -M (multiline) op-
+ tion ceases to work. When input is from a compressed .gz or
.bz2 file, --line-buffered is ignored.
--line-offsets
@@ -498,9 +497,9 @@ OPTIONS
--locale=locale-name
This option specifies a locale to be used for pattern match-
ing. It overrides the value in the LC_ALL or LC_CTYPE envi-
- ronment variables. If no locale is specified, the PCRE2
- library's default (usually the "C" locale) is used. There is
- no short form for this option.
+ ronment variables. If no locale is specified, the PCRE2 li-
+ brary's default (usually the "C" locale) is used. There is no
+ short form for this option.
--match-limit=number
Processing some regular expression patterns may take a very
@@ -509,13 +508,13 @@ OPTIONS
options that set resource limits for matching.
The --match-limit option provides a means of limiting comput-
- ing resource usage when processing patterns that are not
- going to match, but which have a very large number of possi-
- bilities in their search trees. The classic example is a pat-
- tern that uses nested unlimited repeats. Internally, PCRE2
- has a counter that is incremented each time around its main
- processing loop. If the value set by --match-limit is
- reached, an error occurs.
+ ing resource usage when processing patterns that are not go-
+ ing to match, but which have a very large number of possibil-
+ ities in their search trees. The classic example is a pattern
+ that uses nested unlimited repeats. Internally, PCRE2 has a
+ counter that is incremented each time around its main pro-
+ cessing loop. If the value set by --match-limit is reached,
+ an error occurs.
The --heap-limit option specifies, as a number of kibibytes
(units of 1024 bytes), the amount of heap memory that may be
@@ -567,10 +566,10 @@ OPTIONS
pcre2grep -M 'regular\s+expression' <file>
- The \s escape sequence matches any white space character,
- including newlines, and is followed by + so as to match
- trailing white space on the first line as well as possibly
- handling a two-character newline sequence.
+ The \s escape sequence matches any white space character, in-
+ cluding newlines, and is followed by + so as to match trail-
+ ing white space on the first line as well as possibly han-
+ dling a two-character newline sequence.
There is a limit to the number of lines that can be matched,
imposed by the way that pcre2grep buffers the input file as
@@ -579,30 +578,30 @@ OPTIONS
when input is read line by line (see --line-buffered.)
-N newline-type, --newline=newline-type
- The PCRE2 library supports five different conventions for
- indicating the ends of lines. They are the single-character
- sequences CR (carriage return) and LF (linefeed), the two-
- character sequence CRLF, an "anycrlf" convention, which rec-
- ognizes any of the preceding three types, and an "any" con-
- vention, in which any Unicode line ending sequence is assumed
- to end a line. The Unicode sequences are the three just men-
+ The PCRE2 library supports five different conventions for in-
+ dicating the ends of lines. They are the single-character se-
+ quences CR (carriage return) and LF (linefeed), the two-char-
+ acter sequence CRLF, an "anycrlf" convention, which recog-
+ nizes any of the preceding three types, and an "any" conven-
+ tion, in which any Unicode line ending sequence is assumed to
+ end a line. The Unicode sequences are the three just men-
tioned, plus VT (vertical tab, U+000B), FF (form feed,
U+000C), NEL (next line, U+0085), LS (line separator,
U+2028), and PS (paragraph separator, U+2029).
- When the PCRE2 library is built, a default line-ending
- sequence is specified. This is normally the standard
- sequence for the operating system. Unless otherwise specified
- by this option, pcre2grep uses the library's default. The
- possible values for this option are CR, LF, CRLF, ANYCRLF, or
- ANY. This makes it possible to use pcre2grep to scan files
- that have come from other environments without having to mod-
- ify their line endings. If the data that is being scanned
- does not agree with the convention set by this option,
- pcre2grep may behave in strange ways. Note that this option
- does not apply to files specified by the -f, --exclude-from,
- or --include-from options, which are expected to use the
- operating system's standard newline sequence.
+ When the PCRE2 library is built, a default line-ending se-
+ quence is specified. This is normally the standard sequence
+ for the operating system. Unless otherwise specified by this
+ option, pcre2grep uses the library's default. The possible
+ values for this option are CR, LF, CRLF, ANYCRLF, or ANY.
+ This makes it possible to use pcre2grep to scan files that
+ have come from other environments without having to modify
+ their line endings. If the data that is being scanned does
+ not agree with the convention set by this option, pcre2grep
+ may behave in strange ways. Note that this option does not
+ apply to files specified by the -f, --exclude-from, or --in-
+ clude-from options, which are expected to use the operating
+ system's standard newline sequence.
-n, --line-number
Precede each output line by its line number in the file, fol-
@@ -621,8 +620,8 @@ OPTIONS
-O text, --output=text
When there is a match, instead of outputting the whole line
- that matched, output just the given text. This option is
- mutually exclusive with --only-matching, --file-offsets, and
+ that matched, output just the given text. This option is mu-
+ tually exclusive with --only-matching, --file-offsets, and
--line-offsets. Escape sequences starting with a dollar char-
acter may be used to insert the contents of the matched part
of the line and/or captured substrings into the text.
@@ -651,9 +650,9 @@ OPTIONS
of the whole line. In this mode, no context is shown. That
is, the -A, -B, and -C options are ignored. If there is more
than one match in a line, each of them is shown separately,
- on a separate line of output. If -o is combined with -v
- (invert the sense of the match to find non-matching lines),
- no output is generated, but the return code is set appropri-
+ on a separate line of output. If -o is combined with -v (in-
+ vert the sense of the match to find non-matching lines), no
+ output is generated, but the return code is set appropri-
ately. If the matched portion of the line is empty, nothing
is output unless the file name or line number are being
printed, in which case they are shown on an otherwise empty
@@ -671,8 +670,8 @@ OPTIONS
-o0 is the same as -o without a number. Because these options
can be given without an argument (see above), if an argument
- is present, it must be given in the same shell item, for
- example, -o3 or --only-matching=2. The comments given for the
+ is present, it must be given in the same shell item, for ex-
+ ample, -o3 or --only-matching=2. The comments given for the
non-argument case above also apply to this option. If the
specified capturing parentheses do not exist in the pattern,
or were not set in the match, nothing is output unless the
@@ -704,8 +703,8 @@ OPTIONS
it contains, taking note of any --include and --exclude set-
tings. By default, a directory is read as a normal file; in
some operating systems this gives an immediate end-of-file.
- This option is a shorthand for setting the -d option to
- "recurse".
+ This option is a shorthand for setting the -d option to "re-
+ curse".
--recursion-limit=number
See --match-limit above.
@@ -719,8 +718,8 @@ OPTIONS
This option is useful when scanning more than one file. If
used on its own, -t suppresses all output except for a grand
total number of matching lines (or non-matching lines if -v
- is used) in all the files. If -t is used with -c, a grand
- total is output except when the previous output is just one
+ is used) in all the files. If -t is used with -c, a grand to-
+ tal is output except when the previous output is just one
line. In other words, it is not output when just one file's
count is listed. If file names are being output, the grand
total is preceded by "TOTAL:". Otherwise, it appears as just
@@ -773,10 +772,10 @@ OPTIONS
ENVIRONMENT VARIABLES
- The environment variables LC_ALL and LC_CTYPE are examined, in that
- order, for a locale. The first one that is set is used. This can be
- overridden by the --locale option. If no locale is set, the PCRE2
- library's default (usually the "C" locale) is used.
+ The environment variables LC_ALL and LC_CTYPE are examined, in that or-
+ der, for a locale. The first one that is set is used. This can be over-
+ ridden by the --locale option. If no locale is set, the PCRE2 library's
+ default (usually the "C" locale) is used.
NEWLINES
@@ -834,13 +833,13 @@ OPTIONS WITH DATA
--file /some/file
Note, however, that if you want to supply a file name beginning with ~
- as data in a shell command, and have the shell expand ~ to a home
- directory, you must separate the file name from the option, because the
+ as data in a shell command, and have the shell expand ~ to a home di-
+ rectory, you must separate the file name from the option, because the
shell does not treat ~ specially unless it is at the start of an item.
The exceptions to the above are the --colour (or --color) and --only-
- matching options, for which the data is optional. If one of these
- options does have data, it must be given in the first form, using an
+ matching options, for which the data is optional. If one of these op-
+ tions does have data, it must be given in the first form, using an
equals character. Otherwise pcre2grep will assume that it has no data.
@@ -850,8 +849,8 @@ USING PCRE2'S CALLOUT FACILITY
scripts or echoing specific strings during matching by making use of
PCRE2's callout facility. However, this support can be completely or
partially disabled when pcre2grep is built. You can find out whether
- your binary has support for callouts by running it with the --help
- option. If callout support is completely disabled, all callouts in pat-
+ your binary has support for callouts by running it with the --help op-
+ tion. If callout support is completely disabled, all callouts in pat-
terns are ignored by pcre2grep. If the facility is partially disabled,
calling external programs is not supported, and callouts that request
it are ignored.
@@ -875,16 +874,16 @@ USING PCRE2'S CALLOUT FACILITY
executable_name|arg1|arg2|...
- Any substring (including the executable name) may contain escape
- sequences started by a dollar character: $<digits> or ${<digits>} is
- replaced by the captured substring of the given decimal number, which
+ Any substring (including the executable name) may contain escape se-
+ quences started by a dollar character: $<digits> or ${<digits>} is re-
+ placed by the captured substring of the given decimal number, which
must be greater than zero. If the number is greater than the number of
capturing substrings, or if the capture is unset, the replacement is
empty.
- Any other character is substituted by itself. In particular, $$ is
- replaced by a single dollar and $| is replaced by a pipe character.
- Here is an example:
+ Any other character is substituted by itself. In particular, $$ is re-
+ placed by a single dollar and $| is replaced by a pipe character. Here
+ is an example:
echo -e "abcde\n12345" | pcre2grep \
'(?x)(.)(..(.))
@@ -914,10 +913,10 @@ USING PCRE2'S CALLOUT FACILITY
to the output, having been passed through the same escape processing as
text from the --output option. This provides a simple echoing facility
that avoids calling an external program or script. No terminator is
- added to the string, so if you want a newline, you must include it
- explicitly. Matching continues normally after the string is output. If
- you want to see only the callout output but not any output from an
- actual match, you should end the relevant pattern with (*FAIL).
+ added to the string, so if you want a newline, you must include it ex-
+ plicitly. Matching continues normally after the string is output. If
+ you want to see only the callout output but not any output from an ac-
+ tual match, you should end the relevant pattern with (*FAIL).
MATCHING ERRORS
@@ -925,8 +924,8 @@ MATCHING ERRORS
It is possible to supply a regular expression that takes a very long
time to fail to match certain lines. Such patterns normally involve
nested indefinite repeats, for example: (a+)*\d when matched against a
- line of a's with no final digit. The PCRE2 matching function has a
- resource limit that causes it to abort in these circumstances. If this
+ line of a's with no final digit. The PCRE2 matching function has a re-
+ source limit that causes it to abort in these circumstances. If this
happens, pcre2grep outputs an error message and the line that caused
the problem to the standard error stream. If there are more than 20
such errors, pcre2grep gives up.
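The BINARY FILES paragraph of the pcre2grep page states its detection rule precisely enough to sketch. The C function below is a minimal illustration of that rule as worded. It is not pcre2grep's source, and the name `looks_binary` and its `newline_is_nul` flag are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: a buffer is treated as binary if a zero byte occurs
   within its first 1024 bytes, unless the newline type is "nul" (a binary
   zero is then the line terminator, so the test is not applied). */
static int looks_binary(const unsigned char *buf, size_t len, int newline_is_nul)
{
    if (newline_is_nul)
        return 0;                        /* test skipped for "nul" newlines */

    size_t n = len < 1024 ? len : 1024;  /* only the first 1024 bytes count */
    return memchr(buf, 0, n) != NULL;
}
```

Under this rule, a buffer whose first zero byte is at offset 1500 is not classified as binary, because only the first 1024 bytes are examined.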
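The --newline paragraph of the pcre2grep page lists five line-ending conventions. The hypothetical sketch below works on an array of Unicode code points rather than on PCRE2's 8-, 16- or 32-bit code units. It returns how many code points form the line terminator that starts at a given position under each convention, or 0 if none starts there. The enum names are illustrative and are not PCRE2 identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The five conventions described for --newline (illustrative names). */
enum newline_type { NL_CR, NL_LF, NL_CRLF, NL_ANYCRLF, NL_ANY };

/* Length of the line terminator starting at s[pos] under convention nl,
   or 0 if no terminator starts there. */
static size_t newline_length(const uint32_t *s, size_t len, size_t pos,
                             enum newline_type nl)
{
    if (pos >= len)
        return 0;

    uint32_t c = s[pos];
    int crlf = (c == 0x0d && pos + 1 < len && s[pos + 1] == 0x0a);

    switch (nl) {
    case NL_CR:
        return c == 0x0d ? 1 : 0;
    case NL_LF:
        return c == 0x0a ? 1 : 0;
    case NL_CRLF:
        return crlf ? 2 : 0;            /* only the two-character pair */
    case NL_ANYCRLF:
        if (crlf)
            return 2;                   /* CRLF is one two-unit terminator */
        return (c == 0x0d || c == 0x0a) ? 1 : 0;
    case NL_ANY:
        if (crlf)
            return 2;
        switch (c) {
        case 0x0a: case 0x0b: case 0x0c: case 0x0d:  /* LF, VT, FF, CR */
        case 0x85:                                   /* NEL */
        case 0x2028: case 0x2029:                    /* LS, PS */
            return 1;
        default:
            return 0;
        }
    }
    return 0;
}
```

Under the CRLF convention a lone LF is not a terminator. Under ANY, a CR immediately followed by LF counts as a single two-unit terminator rather than two separate line endings.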


@@ -1,4 +1,4 @@
.TH PCRE2PATTERN 3 "21 June 2019" "PCRE2 10.34"
.TH PCRE2PATTERN 3 "22 June 2019" "PCRE2 10.34"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
@ -3564,9 +3564,10 @@ effect as this example; although it would suppress backtracking during the
first match attempt, the second attempt would start at the second character
instead of skipping on to "c".
.P
If (*SKIP) is used inside a lookbehind to specify a new starting position that
is not later than the starting point of the current match, the position
specified by (*SKIP) is ignored, and instead the normal "bumpalong" occurs.
If (*SKIP) is used to specify a new starting position that is the same as the
starting position of the current match, or (by being inside a lookbehind)
earlier, the position specified by (*SKIP) is ignored, and instead the normal
"bumpalong" occurs.
.sp
(*SKIP:NAME)
.sp
@ -3787,6 +3788,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 21 June 2019
Last updated: 22 June 2019
Copyright (c) 1997-2019 University of Cambridge.
.fi

@ -13,8 +13,8 @@ SYNOPSIS
but it can also be used for experimenting with regular expressions.
This document describes the features of the test program; for details
of the regular expressions themselves, see the pcre2pattern documenta-
tion. For details of the PCRE2 library function calls and their
options, see the pcre2api documentation.
tion. For details of the PCRE2 library function calls and their op-
tions, see the pcre2api documentation.
The input for pcre2test is a sequence of regular expression patterns
and subject strings to be matched. There are also command lines for
@ -33,26 +33,26 @@ SYNOPSIS
which are specifically designed for use in conjunction with the test
script and data files that are distributed as part of PCRE2. All the
modifiers are documented here, some without much justification, but
many of them are unlikely to be of use except when testing the
libraries.
many of them are unlikely to be of use except when testing the li-
braries.
PCRE2's 8-BIT, 16-BIT AND 32-BIT LIBRARIES
Different versions of the PCRE2 library can be built to support charac-
ter strings that are encoded in 8-bit, 16-bit, or 32-bit code units.
One, two, or all three of these libraries may be simultaneously
installed. The pcre2test program can be used to test all the libraries.
One, two, or all three of these libraries may be simultaneously in-
stalled. The pcre2test program can be used to test all the libraries.
However, its own input and output are always in 8-bit format. When
testing the 16-bit or 32-bit libraries, patterns and subject strings
are converted to 16-bit or 32-bit format before being passed to the
library functions. Results are converted back to 8-bit code units for
are converted to 16-bit or 32-bit format before being passed to the li-
brary functions. Results are converted back to 8-bit code units for
output.
In the rest of this document, the names of library functions and struc-
tures are given in generic form, for example, pcre_compile(). The
actual names used in the libraries have a suffix _8, _16, or _32, as
appropriate.
tures are given in generic form, for example, pcre_compile(). The ac-
tual names used in the libraries have a suffix _8, _16, or _32, as ap-
propriate.
INPUT ENCODING
@ -70,18 +70,18 @@ INPUT ENCODING
processed for backslash escapes, which makes it possible to include any
data value in strings that are passed to the library for matching. For
patterns, there is a facility for specifying some or all of the 8-bit
input characters as hexadecimal pairs, which makes it possible to
include binary zeros.
input characters as hexadecimal pairs, which makes it possible to in-
clude binary zeros.
Input for the 16-bit and 32-bit libraries
When testing the 16-bit or 32-bit libraries, there is a need to be able
to generate character code points greater than 255 in the strings that
are passed to the library. For subject lines, backslash escapes can be
used. In addition, when the utf modifier (see "Setting compilation
options" below) is set, the pattern and any following subject lines are
interpreted as UTF-8 strings and translated to UTF-16 or UTF-32 as
appropriate.
used. In addition, when the utf modifier (see "Setting compilation op-
tions" below) is set, the pattern and any following subject lines are
interpreted as UTF-8 strings and translated to UTF-16 or UTF-32 as ap-
propriate.
For non-UTF testing of wide characters, the utf8_input modifier can be
used. This is mutually exclusive with utf, and is allowed only in
@ -121,8 +121,8 @@ COMMAND LINE OPTIONS
piled.
-AC As for -ac, but in addition behave as if each subject line
has the callout_extra modifier, that is, show additional
information from callouts.
has the callout_extra modifier, that is, show additional in-
formation from callouts.
-b Behave as if each pattern has the fullbincode modifier; the
full internal binary form of the pattern is output after com-
@ -130,9 +130,9 @@ COMMAND LINE OPTIONS
-C Output the version number of the PCRE2 library, and all
available information about the optional features that are
included, and then exit with zero exit code. All other
options are ignored. If both -C and -LM are present, which-
ever is first is recognized.
included, and then exit with zero exit code. All other op-
tions are ignored. If both -C and -LM are present, whichever
is first is recognized.
-C option Output information about a specific build-time option, then
exit. This functionality is intended for use in scripts such
@ -269,8 +269,8 @@ DESCRIPTION
supply them explicitly.
An empty line or the end of the file signals the end of the subject
lines for a test, at which point a new pattern or command line is
expected if there is still input to be read.
lines for a test, at which point a new pattern or command line is ex-
pected if there is still input to be read.
COMMAND LINES
@ -311,8 +311,8 @@ COMMAND LINES
as indicating a newline in a pattern or subject string. The default can
be overridden when a pattern is compiled. The standard test files con-
tain tests of various newline conventions, but the majority of the
tests expect a single linefeed to be recognized as a newline by
default. Without special action the tests would fail when PCRE2 is com-
tests expect a single linefeed to be recognized as a newline by de-
fault. Without special action the tests would fail when PCRE2 is com-
piled with either CR or CRLF as the default newline.
The #newline_default command specifies a list of newline types that are
@ -323,14 +323,14 @@ COMMAND LINES
If the default newline is in the list, this command has no effect. Oth-
erwise, except when testing the POSIX API, a newline modifier that
specifies the first newline convention in the list (LF in the above
example) is added to any pattern that does not already have a newline
specifies the first newline convention in the list (LF in the above ex-
ample) is added to any pattern that does not already have a newline
modifier. If the newline list is empty, the feature is turned off. This
command is present in a number of the standard test input files.
When the POSIX API is being tested there is no way to override the
default newline convention, though it is possible to set the newline
convention from within the pattern. A warning is given if the posix or
When the POSIX API is being tested there is no way to override the de-
fault newline convention, though it is possible to set the newline con-
vention from within the pattern. A warning is given if the posix or
posix_nosub modifier is used when #newline_default would set a default
for the non-POSIX API.
@ -344,8 +344,8 @@ COMMAND LINES
The appearance of this line causes all subsequent modifier settings to
be checked for compatibility with the perltest.sh script, which is used
to confirm that Perl gives the same results as PCRE2. Also, apart from
comment lines, #pattern commands, and #subject commands that set or
unset "mark", no command lines are permitted, because they and many of
comment lines, #pattern commands, and #subject commands that set or un-
set "mark", no command lines are permitted, because they and many of
the modifiers are specific to pcre2test, and should not be used in test
files that are also processed by perltest.sh. The #perltest command
helps detect tests that are accidentally put in the wrong file.
@ -376,8 +376,8 @@ MODIFIER SYNTAX
list are separated by commas followed by optional white space. Trailing
whitespace in a modifier list is ignored. Some modifiers may be given
for both patterns and subject lines, whereas others are valid only for
one or the other. Each modifier has a long name, for example
"anchored", and some of them must be followed by an equals sign and a
one or the other. Each modifier has a long name, for example "an-
chored", and some of them must be followed by an equals sign and a
value, for example, "offset=12". Values cannot contain comma charac-
ters, but may contain spaces. Modifiers that do not take values may be
preceded by a minus sign to turn off a previous setting.
@ -498,8 +498,8 @@ SUBJECT LINE SYNTAX
\= This is a comment.
abc\= This is an invalid modifier list.
A backslash followed by any other non-alphanumeric character just
escapes that character. A backslash followed by anything else causes an
A backslash followed by any other non-alphanumeric character just es-
capes that character. A backslash followed by anything else causes an
error. However, if the very last character in the line is a backslash
(and there is no modifier list), it is ignored. This gives a way of
passing an empty line as data, since a real empty line terminates the
@ -523,13 +523,13 @@ PATTERN MODIFIERS
The following modifiers set options for pcre2_compile(). Most of them
set bits in the options argument of that function, but those whose
names start with PCRE2_EXTRA are additional options that are set in the
compile context. For the main options, there are some single-letter
abbreviations that are the same as Perl options. There is special han-
compile context. For the main options, there are some single-letter ab-
breviations that are the same as Perl options. There is special han-
dling for /x: if a second x is present, PCRE2_EXTENDED is converted
into PCRE2_EXTENDED_MORE as in Perl. A third appearance adds
PCRE2_EXTENDED as well, though this makes no difference to the way
pcre2_compile() behaves. See pcre2api for a description of the effects
of these options.
into PCRE2_EXTENDED_MORE as in Perl. A third appearance adds PCRE2_EX-
TENDED as well, though this makes no difference to the way pcre2_com-
pile() behaves. See pcre2api for a description of the effects of these
options.
allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS
allow_surrogate_escapes set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
@ -577,9 +577,9 @@ PATTERN MODIFIERS
Setting compilation controls
The following modifiers affect the compilation process or request
information about the pattern. There are single-letter abbreviations
for some that are heavily used in the test files.
The following modifiers affect the compilation process or request in-
formation about the pattern. There are single-letter abbreviations for
some that are heavily used in the test files.
bsr=[anycrlf|unicode] specify \R handling
/B bincode show binary code without lengths
@ -717,8 +717,8 @@ PATTERN MODIFIERS
minated strings but can be passed by length instead of being zero-ter-
minated. The use_length modifier causes this to happen. Using a length
happens automatically (whether or not use_length is set) when hex is
set, because patterns specified in hexadecimal may contain binary
zeros.
set, because patterns specified in hexadecimal may contain binary ze-
ros.
If hex or use_length is used with the POSIX wrapper API (see "Using the
POSIX wrapper API" below), the REG_PEND extension is used to pass the
@ -770,8 +770,8 @@ PATTERN MODIFIERS
partial modifier in "Subject Modifiers" below for details of how these
options are specified for each match attempt.
JIT compilation is requested by the jit pattern modifier, which may
optionally be followed by an equals sign and a number in the range 0 to
JIT compilation is requested by the jit pattern modifier, which may op-
tionally be followed by an equals sign and a number in the range 0 to
7. The three bits that make up the number specify which of the three
JIT operating modes are to be compiled:
@ -799,8 +799,8 @@ PATTERN MODIFIERS
none was compiled for non-partial matching.
If JIT compilation is successful, the compiled JIT code will automati-
cally be used when an appropriate type of match is run, except when
incompatible run-time options are specified. For more details, see the
cally be used when an appropriate type of match is run, except when in-
compatible run-time options are specified. For more details, see the
pcre2jit documentation. See also the jitstack modifier below for a way
of setting the size of the JIT stack.
@ -847,8 +847,8 @@ PATTERN MODIFIERS
Limiting nested parentheses
The parens_nest_limit modifier sets a limit on the depth of nested
parentheses in a pattern. Breaching the limit causes a compilation
error. The default for the library is set when PCRE2 is built, but
parentheses in a pattern. Breaching the limit causes a compilation er-
ror. The default for the library is set when PCRE2 is built, but
pcre2test sets its own default of 220, which is required for running
the standard test suite.
@ -886,13 +886,13 @@ PATTERN MODIFIERS
buffer is too small for the error message. If this modifier has not
been set, a large buffer is used.
The aftertext and allaftertext subject modifiers work as described
below. All other modifiers are either ignored, with a warning message,
or cause an error.
The aftertext and allaftertext subject modifiers work as described be-
low. All other modifiers are either ignored, with a warning message, or
cause an error.
The pattern is passed to regcomp() as a zero-terminated string by
default, but if the use_length or hex modifiers are set, the REG_PEND
extension is used to pass it by length.
The pattern is passed to regcomp() as a zero-terminated string by de-
fault, but if the use_length or hex modifiers are set, the REG_PEND ex-
tension is used to pass it by length.
Testing the stack guard feature
@ -920,8 +920,8 @@ PATTERN MODIFIERS
2 a set of tables defining ISO 8859 characters
In table 2, some characters whose codes are greater than 128 are iden-
tified as letters, digits, spaces, etc. Setting alternate character
tables and a locale are mutually exclusive.
tified as letters, digits, spaces, etc. Setting alternate character ta-
bles and a locale are mutually exclusive.
Setting certain match controls
@ -971,12 +971,12 @@ PATTERN MODIFIERS
terns" below. If pushcopy is used instead of push, a copy of the com-
piled pattern is stacked, leaving the original as current, ready to
match the following input lines. This provides a way of testing the
pcre2_code_copy() function. The push and pushcopy modifiers are
incompatible with compilation modifiers such as global that act at
match time. Any that are specified are ignored (for the stacked copy),
with a warning message, except for replace, which causes an error. Note
that jitverify, which is allowed, does not carry through to any subse-
quent matching that uses a stacked pattern.
pcre2_code_copy() function. The push and pushcopy modifiers are in-
compatible with compilation modifiers such as global that act at match
time. Any that are specified are ignored (for the stacked copy), with a
warning message, except for replace, which causes an error. Note that
jitverify, which is allowed, does not carry through to any subsequent
matching that uses a stacked pattern.
Testing foreign pattern conversion
@ -1124,12 +1124,12 @@ SUBJECT MODIFIERS
The allusedtext modifier requests that all the text that was consulted
during a successful pattern match by the interpreter should be shown.
This feature is not supported for JIT matching, and if requested with
JIT it is ignored (with a warning message). Setting this modifier
affects the output if there is a lookbehind at the start of a match, or
a lookahead at the end, or if \K is used in the pattern. Characters
that precede or follow the start and end of the actual match are indi-
cated in the output by '<' or '>' characters underneath them. Here is
an example:
JIT it is ignored (with a warning message). Setting this modifier af-
fects the output if there is a lookbehind at the start of a match, or a
lookahead at the end, or if \K is used in the pattern. Characters that
precede or follow the start and end of the actual match are indicated
in the output by '<' or '>' characters underneath them. Here is an ex-
ample:
re> /(?<=pqr)abc(?=xyz)/
data> 123pqrabcxyz456\=allusedtext
@ -1145,8 +1145,8 @@ SUBJECT MODIFIERS
string. The only time when this occurs is when \K has been processed as
part of the match. In this situation, the output for the matched string
is displayed from the starting character instead of from the match
point, with circumflex characters under the earlier characters. For
example:
point, with circumflex characters under the earlier characters. For ex-
ample:
re> /abc\Kxyz/
data> abcxyz\=startchar
@ -1171,12 +1171,12 @@ SUBJECT MODIFIERS
The allvector modifier requests that the entire ovector be shown, what-
ever the outcome of the match. Compare allcaptures, which shows only up
to the maximum number of capture groups for the pattern, and then only
for a successful complete non-DFA match. This modifier, which acts
after any match result, and also for DFA matching, provides a means of
for a successful complete non-DFA match. This modifier, which acts af-
ter any match result, and also for DFA matching, provides a means of
checking that there are no unexpected modifications to ovector fields.
Before each match attempt, the ovector is filled with a special value,
and if this is found in both elements of a capturing pair,
"<unchanged>" is output. After a successful match, this applies to all
and if this is found in both elements of a capturing pair, "<un-
changed>" is output. After a successful match, this applies to all
groups after the maximum capture group for the pattern. In other cases
it applies to the entire ovector. After a partial match, the first two
elements are the only ones that should be set. After a DFA match, the
@ -1207,12 +1207,12 @@ SUBJECT MODIFIERS
If an empty string is matched, the next match is done with the
PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search
for another, non-empty, match at the same point in the subject. If this
match fails, the start offset is advanced, and the normal match is
retried. This imitates the way Perl handles such cases when using the
/g modifier or the split() function. Normally, the start offset is
advanced by one character, but if the newline convention recognizes
CRLF as a newline, and the current character is CR followed by LF, an
advance of two characters occurs.
match fails, the start offset is advanced, and the normal match is re-
tried. This imitates the way Perl handles such cases when using the /g
modifier or the split() function. Normally, the start offset is ad-
vanced by one character, but if the newline convention recognizes CRLF
as a newline, and the current character is CR followed by LF, an ad-
vance of two characters occurs.
Testing substring extraction functions
@ -1275,8 +1275,8 @@ SUBJECT MODIFIERS
than 256 characters) for substitution tests, as fixed-size buffers are
used. To make it easy to test for buffer overflow, if the replacement
string starts with a number in square brackets, that number is passed
to pcre2_substitute() as the size of the output buffer, with the
replacement string starting at the next character. Here is an example
to pcre2_substitute() as the size of the output buffer, with the re-
placement string starting at the next character. Here is an example
that tests the edge case:
/abc/
@ -1285,10 +1285,10 @@ SUBJECT MODIFIERS
123abc123\=replace=[9]XYZ
Failed: error -47: no more memory
The default action of pcre2_substitute() is to return
PCRE2_ERROR_NOMEMORY when the output buffer is too small. However, if
the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the sub-
stitute_overflow_length modifier), pcre2_substitute() continues to go
The default action of pcre2_substitute() is to return PCRE2_ER-
ROR_NOMEMORY when the output buffer is too small. However, if the
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the substi-
tute_overflow_length modifier), pcre2_substitute() continues to go
through the motions of matching and substituting (but not doing any
callouts), in order to compute the size of buffer that is required.
When this happens, pcre2test shows the required buffer length (which
@ -1323,8 +1323,8 @@ SUBJECT MODIFIERS
Then are listed the offsets of the old substring, its contents, and the
same for the replacement.
By default, the substitution callout function returns zero, which
accepts the replacement and causes matching to continue if /g was used.
By default, the substitution callout function returns zero, which ac-
cepts the replacement and causes matching to continue if /g was used.
Two further modifiers can be used to test other return values. If sub-
stitute_skip is set to a value greater than zero the callout function
returns +1 for the match of that number, and similarly substitute_stop
@ -1411,8 +1411,8 @@ SUBJECT MODIFIERS
The memory modifier causes pcre2test to log the sizes of all heap mem-
ory allocation and freeing calls that occur during a call to
pcre2_match() or pcre2_dfa_match(). These occur only when a match
requires a bigger vector than the default for remembering backtracking
pcre2_match() or pcre2_dfa_match(). These occur only when a match re-
quires a bigger vector than the default for remembering backtracking
points (pcre2_match()) or for internal workspace (pcre2_dfa_match()).
In many cases there will be no heap memory used and therefore no addi-
tional output. No heap memory is allocated during matching with JIT, so
@ -1435,9 +1435,9 @@ SUBJECT MODIFIERS
Setting the size of the output vector
The ovector modifier applies only to the subject line in which it
appears, though of course it can also be used to set a default in a
#subject command. It specifies the number of pairs of offsets that are
The ovector modifier applies only to the subject line in which it ap-
pears, though of course it can also be used to set a default in a #sub-
ject command. It specifies the number of pairs of offsets that are
available for storing matching information. The default is 15.
A value of zero is useful when testing the POSIX API because it causes
@ -1491,12 +1491,12 @@ DEFAULT OUTPUT FROM pcre2test
When a match succeeds, pcre2test outputs the list of captured sub-
strings, starting with number 0 for the string that matched the whole
pattern. Otherwise, it outputs "No match" when the return is
PCRE2_ERROR_NOMATCH, or "Partial match:" followed by the partially
matching substring when the return is PCRE2_ERROR_PARTIAL. (Note that
this is the entire substring that was inspected during the partial
match; it may include characters before the actual match start if a
lookbehind assertion, \K, \b, or \B was involved.)
pattern. Otherwise, it outputs "No match" when the return is PCRE2_ER-
ROR_NOMATCH, or "Partial match:" followed by the partially matching
substring when the return is PCRE2_ERROR_PARTIAL. (Note that this is
the entire substring that was inspected during the partial match; it
may include characters before the actual match start if a lookbehind
assertion, \K, \b, or \B was involved.)
For any other return, pcre2test outputs the PCRE2 negative error number
and a short descriptive phrase. If the error is a failed UTF string
@ -1541,8 +1541,8 @@ DEFAULT OUTPUT FROM pcre2test
0: cat
0+ aract
If global matching is requested, the results of successive matching
attempts are output in sequence, like this:
If global matching is requested, the results of successive matching at-
tempts are output in sequence, like this:
re> /\Bi(\w\w)/g
data> Mississippi
@ -1580,12 +1580,12 @@ OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
2: tan
Using the normal matching function on this data finds only "tang". The
longest matching string is always given first (and numbered zero).
After a PCRE2_ERROR_PARTIAL return, the output is "Partial match:",
followed by the partially matching substring. Note that this is the
entire substring that was inspected during the partial match; it may
include characters before the actual match start if a lookbehind asser-
tion, \b, or \B was involved. (\K is not supported for DFA matching.)
longest matching string is always given first (and numbered zero). Af-
ter a PCRE2_ERROR_PARTIAL return, the output is "Partial match:", fol-
lowed by the partially matching substring. Note that this is the entire
substring that was inspected during the partial match; it may include
characters before the actual match start if a lookbehind assertion, \b,
or \B was involved. (\K is not supported for DFA matching.)
If global matching is requested, the search for further matches resumes
at the end of the longest match. For example:
@ -1638,12 +1638,12 @@ CALLOUTS
--->pqrabcdef
0 ^ ^ \d
This output indicates that callout number 0 occurred for a match
attempt starting at the fourth character of the subject string, when
the pointer was at the seventh character, and when the next pattern
item was \d. Just one circumflex is output if the start and current
positions are the same, or if the current position precedes the start
position, which can happen if the callout is in a lookbehind assertion.
This output indicates that callout number 0 occurred for a match at-
tempt starting at the fourth character of the subject string, when the
pointer was at the seventh character, and when the next pattern item
was \d. Just one circumflex is output if the start and current posi-
tions are the same, or if the current position precedes the start posi-
tion, which can happen if the callout is in a lookbehind assertion.
Callouts numbered 255 are assumed to be automatic callouts, inserted as
a result of the auto_callout pattern modifier. In this case, instead of
@ -1660,8 +1660,8 @@ CALLOUTS
0: E*
If a pattern contains (*MARK) items, an additional line is output when-
ever a change of latest mark is passed to the callout function. For
example:
ever a change of latest mark is passed to the callout function. For ex-
ample:
re> /a(*MARK:X)bc/auto_callout
data> abc
@ -1683,8 +1683,8 @@ CALLOUTS
The output for a callout with a string argument is similar, except that
instead of outputting a callout number before the position indicators,
the callout string and its offset in the pattern string are output
before the reflection of the subject string, and the subject string is
the callout string and its offset in the pattern string are output be-
fore the reflection of the subject string, and the subject string is
reflected for each callout. For example:
re> /^ab(?C'first')cd(?C"second")ef/
@ -1800,9 +1800,9 @@ NON-PRINTING CHARACTERS
When pcre2test is outputting text that is a matched part of a subject
string, it behaves in the same way, unless a different locale has been
set for the pattern (using the locale modifier). In this case, the
isprint() function is used to distinguish printing and non-printing
characters.
set for the pattern (using the locale modifier). In this case, the is-
print() function is used to distinguish printing and non-printing char-
acters.
SAVING AND RESTORING COMPILED PATTERNS
@ -1814,14 +1814,14 @@ SAVING AND RESTORING COMPILED PATTERNS
have the same endianness, pointer width and PCRE2_SIZE type. Before
compiled patterns can be saved they must be serialized, that is, con-
verted to a stream of bytes. A single byte stream may contain any num-
ber of compiled patterns, but they must all use the same character
tables. A single copy of the tables is included in the byte stream (its
ber of compiled patterns, but they must all use the same character ta-
bles. A single copy of the tables is included in the byte stream (its
size is 1088 bytes).
The functions whose names begin with pcre2_serialize_ are used for
serializing and de-serializing. They are described in the pcre2serial-
ize documentation. In this section we describe the features of
pcre2test that can be used to test these functions.
The functions whose names begin with pcre2_serialize_ are used for se-
rializing and de-serializing. They are described in the pcre2serialize
documentation. In this section we describe the features of pcre2test
that can be used to test these functions.
Note that "serialization" in PCRE2 does not convert compiled patterns
to an abstract format like Java or .NET. It just makes a reloadable
@ -1831,8 +1831,8 @@ SAVING AND RESTORING COMPILED PATTERNS
piled, it is pushed onto a stack of compiled patterns, and pcre2test
expects the next line to contain a new pattern (or command) instead of
a subject line. By contrast, the pushcopy modifier causes a copy of the
compiled pattern to be stacked, leaving the original available for
immediate matching. By using push and/or pushcopy, a number of patterns
compiled pattern to be stacked, leaving the original available for im-
mediate matching. By using push and/or pushcopy, a number of patterns
can be compiled and retained. These modifiers are incompatible with
posix, and control modifiers that act at match time are ignored (with a
message) for the stacked patterns. The jitverify modifier applies only
@ -1855,8 +1855,8 @@ SAVING AND RESTORING COMPILED PATTERNS
matched with the pattern, terminated as usual by an empty line or end
of file. This command may be followed by a modifier list containing
only control modifiers that act after a pattern has been compiled. In
particular, hex, posix, posix_nosub, push, and pushcopy are not
allowed, nor are any option-setting modifiers. The JIT modifiers are,
particular, hex, posix, posix_nosub, push, and pushcopy are not al-
lowed, nor are any option-setting modifiers. The JIT modifiers are,
however permitted. Here is an example that saves and reloads two pat-
terns.