Documentation update.

Philip.Hazel 2019-06-22 16:36:15 +00:00
parent a89423624d
commit c6ee84317d
6 changed files with 2699 additions and 2715 deletions

@ -3525,9 +3525,10 @@ first match attempt, the second attempt would start at the second character
instead of skipping on to "c". instead of skipping on to "c".
</P> </P>
<P> <P>
If (*SKIP) is used inside a lookbehind to specify a new starting position that If (*SKIP) is used to specify a new starting position that is the same as the
is not later than the starting point of the current match, the position starting position of the current match, or (by being inside a lookbehind)
specified by (*SKIP) is ignored, and instead the normal "bumpalong" occurs. earlier, the position specified by (*SKIP) is ignored, and instead the normal
"bumpalong" occurs.
<pre> <pre>
(*SKIP:NAME) (*SKIP:NAME)
</pre> </pre>
@ -3754,7 +3755,7 @@ Cambridge, England.
</P> </P>
<br><a name="SEC31" href="#TOC1">REVISION</a><br> <br><a name="SEC31" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 21 June 2019 Last updated: 22 June 2019
<br> <br>
Copyright &copy; 1997-2019 University of Cambridge. Copyright &copy; 1997-2019 University of Cambridge.
<br> <br>

@ -16,8 +16,8 @@ DESCRIPTION
pcre2-config returns the configuration of the installed PCRE2 libraries pcre2-config returns the configuration of the installed PCRE2 libraries
and the options required to compile a program to use them. Some of the and the options required to compile a program to use them. Some of the
options apply only to the 8-bit, or 16-bit, or 32-bit libraries, options apply only to the 8-bit, or 16-bit, or 32-bit libraries, re-
respectively, and are not available for libraries that have not been spectively, and are not available for libraries that have not been
built. If an unavailable option is encountered, the "usage" information built. If an unavailable option is encountered, the "usage" information
is output. is output.
@ -36,30 +36,30 @@ OPTIONS
--version Writes the version number of the installed PCRE2 libraries to --version Writes the version number of the installed PCRE2 libraries to
the standard output. the standard output.
--libs8 Writes to the standard output the command line options --libs8 Writes to the standard output the command line options re-
required to link with the 8-bit PCRE2 library (-lpcre2-8 on quired to link with the 8-bit PCRE2 library (-lpcre2-8 on
many systems). many systems).
--libs16 Writes to the standard output the command line options --libs16 Writes to the standard output the command line options re-
required to link with the 16-bit PCRE2 library (-lpcre2-16 on quired to link with the 16-bit PCRE2 library (-lpcre2-16 on
many systems). many systems).
--libs32 Writes to the standard output the command line options --libs32 Writes to the standard output the command line options re-
required to link with the 32-bit PCRE2 library (-lpcre2-32 on quired to link with the 32-bit PCRE2 library (-lpcre2-32 on
many systems). many systems).
--libs-posix --libs-posix
Writes to the standard output the command line options Writes to the standard output the command line options re-
required to link with PCRE2's POSIX API wrapper library quired to link with PCRE2's POSIX API wrapper library
(-lpcre2-posix -lpcre2-8 on many systems). (-lpcre2-posix -lpcre2-8 on many systems).
--cflags Writes to the standard output the command line options --cflags Writes to the standard output the command line options re-
required to compile files that use PCRE2 (this may include quired to compile files that use PCRE2 (this may include some
some -I options, but is blank on many systems). -I options, but is blank on many systems).
--cflags-posix --cflags-posix
Writes to the standard output the command line options Writes to the standard output the command line options re-
required to compile files that use PCRE2's POSIX API wrapper quired to compile files that use PCRE2's POSIX API wrapper
library (this may include some -I options, but is blank on library (this may include some -I options, but is blank on
many systems). many systems).

File diff suppressed because it is too large

@ -12,11 +12,11 @@ SYNOPSIS
DESCRIPTION DESCRIPTION
pcre2grep searches files for character patterns, in the same way as pcre2grep searches files for character patterns, in the same way as
other grep commands do, but it uses the PCRE2 regular expression other grep commands do, but it uses the PCRE2 regular expression li-
library to support patterns that are compatible with the regular brary to support patterns that are compatible with the regular expres-
expressions of Perl 5. See pcre2syntax(3) for a quick-reference summary sions of Perl 5. See pcre2syntax(3) for a quick-reference summary of
of pattern syntax, or pcre2pattern(3) for a full description of the pattern syntax, or pcre2pattern(3) for a full description of the syntax
syntax and semantics of the regular expressions that PCRE2 supports. and semantics of the regular expressions that PCRE2 supports.
Patterns, whether supplied on the command line or in a separate file, Patterns, whether supplied on the command line or in a separate file,
are given without delimiters. For example: are given without delimiters. For example:
@ -26,8 +26,8 @@ DESCRIPTION
If you attempt to use delimiters (for example, by surrounding a pattern If you attempt to use delimiters (for example, by surrounding a pattern
with slashes, as is common in Perl scripts), they are interpreted as with slashes, as is common in Perl scripts), they are interpreted as
part of the pattern. Quotes can of course be used to delimit patterns part of the pattern. Quotes can of course be used to delimit patterns
on the command line because they are interpreted by the shell, and on the command line because they are interpreted by the shell, and in-
indeed quotes are required if a pattern contains white space or shell deed quotes are required if a pattern contains white space or shell
metacharacters. metacharacters.
The first argument that follows any option settings is treated as the The first argument that follows any option settings is treated as the
@ -54,8 +54,8 @@ DESCRIPTION
controlled by parameters that can be set by the --buffer-size and controlled by parameters that can be set by the --buffer-size and
--max-buffer-size options. The first of these sets the size of buffer --max-buffer-size options. The first of these sets the size of buffer
that is obtained at the start of processing. If an input file contains that is obtained at the start of processing. If an input file contains
very long lines, a larger buffer may be needed; this is handled by very long lines, a larger buffer may be needed; this is handled by au-
automatically extending the buffer, up to the limit specified by --max- tomatically extending the buffer, up to the limit specified by --max-
buffer-size. The default values for these parameters can be set when buffer-size. The default values for these parameters can be set when
pcre2grep is built; if nothing is specified, the defaults are set to pcre2grep is built; if nothing is specified, the defaults are set to
20KiB and 1MiB respectively. An error occurs if a line is too long and 20KiB and 1MiB respectively. An error occurs if a line is too long and
@ -75,12 +75,12 @@ DESCRIPTION
By default, as soon as one pattern matches a line, no further patterns By default, as soon as one pattern matches a line, no further patterns
are considered. However, if --colour (or --color) is used to colour the are considered. However, if --colour (or --color) is used to colour the
matching substrings, or if --only-matching, --file-offsets, or --line- matching substrings, or if --only-matching, --file-offsets, or --line-
offsets is used to output only the part of the line that matched offsets is used to output only the part of the line that matched (ei-
(either shown literally, or as an offset), scanning resumes immediately ther shown literally, or as an offset), scanning resumes immediately
following the match, so that further matches on the same line can be following the match, so that further matches on the same line can be
found. If there are multiple patterns, they are all tried on the found. If there are multiple patterns, they are all tried on the re-
remainder of the line, but patterns that follow the one that matched mainder of the line, but patterns that follow the one that matched are
are not tried on the earlier part of the line. not tried on the earlier part of the line.
This behaviour means that the order in which multiple patterns are This behaviour means that the order in which multiple patterns are
specified can affect the output when one of the above options is used. specified can affect the output when one of the above options is used.
@ -89,11 +89,11 @@ DESCRIPTION
overlap). overlap).
Patterns that can match an empty string are accepted, but empty string Patterns that can match an empty string are accepted, but empty string
matches are never recognized. An example is the pattern matches are never recognized. An example is the pattern "(su-
"(super)?(man)?", in which all components are optional. This pattern per)?(man)?", in which all components are optional. This pattern finds
finds all occurrences of both "super" and "man"; the output differs all occurrences of both "super" and "man"; the output differs from
from matching with "super|man" when only the matching substrings are matching with "super|man" when only the matching substrings are being
being shown. shown.
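
Reviewer note: the difference between "(super)?(man)?" and "super|man" described in this paragraph can be illustrated roughly with Python's `re` module. This is only a sketch under stated assumptions: `re` is not PCRE2, dropping empty matches merely approximates how pcre2grep treats them when showing matching substrings, and the helper name `nonempty_matches` is hypothetical.

```python
import re


def nonempty_matches(pattern, line):
    """Return the matched substrings of `line`, discarding empty matches.

    Assumption: filtering out empty matches stands in for pcre2grep
    never recognizing empty-string matches.
    """
    return [m.group(0) for m in re.finditer(pattern, line) if m.group(0)]


line = "superman and man"
optional = nonempty_matches(r"(super)?(man)?", line)
alternation = nonempty_matches(r"super|man", line)
print(optional)
print(alternation)
```

With this input the all-optional pattern reports "superman" as a single substring, whereas the alternation reports "super" and "man" separately, which is the kind of difference the text refers to.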
If the LC_ALL or LC_CTYPE environment variable is set, pcre2grep uses If the LC_ALL or LC_CTYPE environment variable is set, pcre2grep uses
the value to set a locale when calling the PCRE2 library. The --locale the value to set a locale when calling the PCRE2 library. The --locale
@ -116,10 +116,9 @@ BINARY FILES
By default, a file that contains a binary zero byte within the first By default, a file that contains a binary zero byte within the first
1024 bytes is identified as a binary file, and is processed specially. 1024 bytes is identified as a binary file, and is processed specially.
(GNU grep identifies binary files in this manner.) However, if the new- (GNU grep identifies binary files in this manner.) However, if the new-
line type is specified as "nul", that is, the line terminator is a line type is specified as "nul", that is, the line terminator is a bi-
binary zero, the test for a binary file is not applied. See the nary zero, the test for a binary file is not applied. See the --binary-
--binary-files option for a means of changing the way binary files are files option for a means of changing the way binary files are handled.
handled.
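
Reviewer note: the binary-file test described in this paragraph might be sketched as follows. The function name `looks_binary` and the `newline_is_nul` parameter are hypothetical, and this is not pcre2grep's actual code, only an illustration of the documented rule.

```python
def looks_binary(data: bytes, newline_is_nul: bool = False) -> bool:
    """Approximate the documented test for a binary file.

    A file counts as binary if a zero byte occurs within its first
    1024 bytes, unless the newline type is "nul", in which case the
    test is not applied at all.
    """
    if newline_is_nul:
        # Zero bytes are line terminators here, so they prove nothing.
        return False
    return b"\x00" in data[:1024]
```

A zero byte that first appears after the initial 1024 bytes does not trigger the test in this sketch, matching the wording of the text.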
BINARY ZEROS IN PATTERNS BINARY ZEROS IN PATTERNS
@ -148,12 +147,12 @@ OPTIONS
Output up to number lines of context after each matching Output up to number lines of context after each matching
line. Fewer lines are output if the next match or the end of line. Fewer lines are output if the next match or the end of
the file is reached, or if the processing buffer size has the file is reached, or if the processing buffer size has
been set too small. If file names and/or line numbers are been set too small. If file names and/or line numbers are be-
being output, a hyphen separator is used instead of a colon ing output, a hyphen separator is used instead of a colon for
for the context lines. A line containing "--" is output the context lines. A line containing "--" is output between
between each group of lines, unless they are in fact contigu- each group of lines, unless they are in fact contiguous in
ous in the input file. The value of number is expected to be the input file. The value of number is expected to be rela-
relatively small. When -c is used, -A is ignored. tively small. When -c is used, -A is ignored.
-a, --text -a, --text
Treat binary files as text. This is equivalent to --binary- Treat binary files as text. This is equivalent to --binary-
@ -164,26 +163,26 @@ OPTIONS
line. Fewer lines are output if the previous match or the line. Fewer lines are output if the previous match or the
start of the file is within number lines, or if the process- start of the file is within number lines, or if the process-
ing buffer size has been set too small. If file names and/or ing buffer size has been set too small. If file names and/or
line numbers are being output, a hyphen separator is used line numbers are being output, a hyphen separator is used in-
instead of a colon for the context lines. A line containing stead of a colon for the context lines. A line containing
"--" is output between each group of lines, unless they are "--" is output between each group of lines, unless they are
in fact contiguous in the input file. The value of number is in fact contiguous in the input file. The value of number is
expected to be relatively small. When -c is used, -B is expected to be relatively small. When -c is used, -B is ig-
ignored. nored.
--binary-files=word --binary-files=word
Specify how binary files are to be processed. If the word is Specify how binary files are to be processed. If the word is
"binary" (the default), pattern matching is performed on "binary" (the default), pattern matching is performed on bi-
binary files, but the only output is "Binary file <name> nary files, but the only output is "Binary file <name>
matches" when a match succeeds. If the word is "text", which matches" when a match succeeds. If the word is "text", which
is equivalent to the -a or --text option, binary files are is equivalent to the -a or --text option, binary files are
processed in the same way as any other file. In this case, processed in the same way as any other file. In this case,
when a match succeeds, the output may be binary garbage, when a match succeeds, the output may be binary garbage,
which can have nasty effects if sent to a terminal. If the which can have nasty effects if sent to a terminal. If the
word is "without-match", which is equivalent to the -I word is "without-match", which is equivalent to the -I op-
option, binary files are not processed at all; they are tion, binary files are not processed at all; they are assumed
assumed not to be of interest and are skipped without causing not to be of interest and are skipped without causing any
any output or affecting the return code. output or affecting the return code.
--buffer-size=number --buffer-size=number
Set the parameter that controls how much memory is obtained Set the parameter that controls how much memory is obtained
@ -208,10 +207,10 @@ OPTIONS
If no lines are selected, the number zero is output. If sev- If no lines are selected, the number zero is output. If sev-
eral files are are being scanned, a count is output for each eral files are are being scanned, a count is output for each
of them and the -t option can be used to cause a total to be of them and the -t option can be used to cause a total to be
output at the end. However, if the --files-with-matches output at the end. However, if the --files-with-matches op-
option is also used, only those files whose counts are tion is also used, only those files whose counts are greater
greater than zero are listed. When -c is used, the -A, -B, than zero are listed. When -c is used, the -A, -B, and -C op-
and -C options are ignored. tions are ignored.
--colour, --color --colour, --color
If this option is given without any data, it is equivalent to If this option is given without any data, it is equivalent to
@ -238,8 +237,8 @@ OPTIONS
semicolon, except in the case of GREP_COLORS, which must semicolon, except in the case of GREP_COLORS, which must
start with "ms=" or "mt=" followed by two semicolon-separated start with "ms=" or "mt=" followed by two semicolon-separated
colours, terminated by the end of the string or by a colon. colours, terminated by the end of the string or by a colon.
If GREP_COLORS does not start with "ms=" or "mt=" it is If GREP_COLORS does not start with "ms=" or "mt=" it is ig-
ignored, and GREP_COLOR is checked. nored, and GREP_COLOR is checked.
If the string obtained from one of the above variables con- If the string obtained from one of the above variables con-
tains any characters other than semicolon or digits, the set- tains any characters other than semicolon or digits, the set-
@ -250,9 +249,9 @@ OPTIONS
set, the default is "1;31", which gives red. set, the default is "1;31", which gives red.
-D action, --devices=action -D action, --devices=action
If an input path is not a regular file or a directory, If an input path is not a regular file or a directory, "ac-
"action" specifies how it is to be processed. Valid values tion" specifies how it is to be processed. Valid values are
are "read" (the default) or "skip" (silently skip the path). "read" (the default) or "skip" (silently skip the path).
-d action, --directories=action -d action, --directories=action
If an input path is a directory, "action" specifies how it is If an input path is a directory, "action" specifies how it is
@ -261,8 +260,8 @@ OPTIONS
"recurse" (equivalent to the -r option), or "skip" (silently "recurse" (equivalent to the -r option), or "skip" (silently
skip the path, the default in Windows environments). In the skip the path, the default in Windows environments). In the
"read" case, directories are read as if they were ordinary "read" case, directories are read as if they were ordinary
files. In some operating systems the effect of reading a files. In some operating systems the effect of reading a di-
directory like this is an immediate end-of-file; in others it rectory like this is an immediate end-of-file; in others it
may provoke an error. may provoke an error.
--depth-limit=number --depth-limit=number
@ -295,8 +294,8 @@ OPTIONS
whether listed on the command line, obtained from --file- whether listed on the command line, obtained from --file-
list, or by scanning a directory. The pattern is a PCRE2 reg- list, or by scanning a directory. The pattern is a PCRE2 reg-
ular expression, and is matched against the final component ular expression, and is matched against the final component
of the file name, not the entire path. The -F, -w, and -x of the file name, not the entire path. The -F, -w, and -x op-
options do not apply to this pattern. The option may be given tions do not apply to this pattern. The option may be given
any number of times in order to specify multiple patterns. If any number of times in order to specify multiple patterns. If
a file name matches both an --include and an --exclude pat- a file name matches both an --include and an --exclude pat-
tern, it is excluded. There is no short form for this option. tern, it is excluded. There is no short form for this option.
@ -310,29 +309,29 @@ OPTIONS
--exclude-dir=pattern --exclude-dir=pattern
Directories whose names match the pattern are skipped without Directories whose names match the pattern are skipped without
being processed, whatever the setting of the --recursive being processed, whatever the setting of the --recursive op-
option. This applies to all directories, whether listed on tion. This applies to all directories, whether listed on the
the command line, obtained from --file-list, or by scanning a command line, obtained from --file-list, or by scanning a
parent directory. The pattern is a PCRE2 regular expression, parent directory. The pattern is a PCRE2 regular expression,
and is matched against the final component of the directory and is matched against the final component of the directory
name, not the entire path. The -F, -w, and -x options do not name, not the entire path. The -F, -w, and -x options do not
apply to this pattern. The option may be given any number of apply to this pattern. The option may be given any number of
times in order to specify more than one pattern. If a direc- times in order to specify more than one pattern. If a direc-
tory matches both --include-dir and --exclude-dir, it is tory matches both --include-dir and --exclude-dir, it is ex-
excluded. There is no short form for this option. cluded. There is no short form for this option.
-F, --fixed-strings -F, --fixed-strings
Interpret each data-matching pattern as a list of fixed Interpret each data-matching pattern as a list of fixed
strings, separated by newlines, instead of as a regular strings, separated by newlines, instead of as a regular ex-
expression. What constitutes a newline for this purpose is pression. What constitutes a newline for this purpose is con-
controlled by the --newline option. The -w (match as a word) trolled by the --newline option. The -w (match as a word) and
and -x (match whole line) options can be used with -F. They -x (match whole line) options can be used with -F. They ap-
apply to each of the fixed strings. A line is selected if any ply to each of the fixed strings. A line is selected if any
of the fixed strings are found in it (subject to -w or -x, if of the fixed strings are found in it (subject to -w or -x, if
present). This option applies only to the patterns that are present). This option applies only to the patterns that are
matched against the contents of files; it does not apply to matched against the contents of files; it does not apply to
patterns specified by any of the --include or --exclude patterns specified by any of the --include or --exclude op-
options. tions.
-f filename, --file=filename -f filename, --file=filename
Read patterns from the file, one per line, and match them Read patterns from the file, one per line, and match them
@ -360,8 +359,8 @@ OPTIONS
--file-list=filename --file-list=filename
Read a list of files and/or directories that are to be Read a list of files and/or directories that are to be
scanned from the given file, one per line. What constitutes a scanned from the given file, one per line. What constitutes a
newline when reading the file is the operating system's newline when reading the file is the operating system's de-
default. Trailing white space is removed from each line, and fault. Trailing white space is removed from each line, and
blank lines are ignored. These paths are processed before any blank lines are ignored. These paths are processed before any
that are listed on the command line. The file name can be that are listed on the command line. The file name can be
given as "-" to refer to the standard input. If --file and given as "-" to refer to the standard input. If --file and
@ -388,8 +387,8 @@ OPTIONS
is used. If a line number is also being output, it follows is used. If a line number is also being output, it follows
the file name. When the -M option causes a pattern to match the file name. When the -M option causes a pattern to match
more than one line, only the first is preceded by the file more than one line, only the first is preceded by the file
name. This option overrides any previous -h, -l, or -L name. This option overrides any previous -h, -l, or -L op-
options. tions.
-h, --no-filename -h, --no-filename
Suppress the output file names when searching multiple files. Suppress the output file names when searching multiple files.
@ -415,16 +414,16 @@ OPTIONS
--include=pattern --include=pattern
If any --include patterns are specified, the only files that If any --include patterns are specified, the only files that
are processed are those that match one of the patterns (and are processed are those that match one of the patterns (and
do not match an --exclude pattern). This option does not do not match an --exclude pattern). This option does not af-
affect directories, but it applies to all files, whether fect directories, but it applies to all files, whether listed
listed on the command line, obtained from --file-list, or by on the command line, obtained from --file-list, or by scan-
scanning a directory. The pattern is a PCRE2 regular expres- ning a directory. The pattern is a PCRE2 regular expression,
sion, and is matched against the final component of the file and is matched against the final component of the file name,
name, not the entire path. The -F, -w, and -x options do not not the entire path. The -F, -w, and -x options do not apply
apply to this pattern. The option may be given any number of to this pattern. The option may be given any number of times.
times. If a file name matches both an --include and an If a file name matches both an --include and an --exclude
--exclude pattern, it is excluded. There is no short form pattern, it is excluded. There is no short form for this op-
for this option. tion.
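
Reviewer note: the interaction between --include and --exclude described above can be sketched as below. This is a minimal illustration, assuming unanchored matching and using Python's `re` in place of PCRE2; the function `should_process` and its parameters are hypothetical names, not part of pcre2grep.

```python
import posixpath
import re


def should_process(path, include=(), exclude=()):
    """Decide whether a file would be scanned under the documented rules."""
    # Only the final component of the path is matched, not the whole path.
    name = posixpath.basename(path)
    # A name matching both kinds of pattern is excluded, so check exclude first.
    if any(re.search(p, name) for p in exclude):
        return False
    # If any include patterns are given, the name must match one of them.
    if include:
        return any(re.search(p, name) for p in include)
    return True
```

For example, `should_process("lib.c/readme.txt", include=[r"\.c$"])` is false in this sketch because the directory part "lib.c" is never consulted.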
--include-from=filename --include-from=filename
Treat each non-empty line of the file as the data for an Treat each non-empty line of the file as the data for an
@ -438,8 +437,8 @@ OPTIONS
tories that are processed are those that match one of the tories that are processed are those that match one of the
patterns (and do not match an --exclude-dir pattern). This patterns (and do not match an --exclude-dir pattern). This
applies to all directories, whether listed on the command applies to all directories, whether listed on the command
line, obtained from --file-list, or by scanning a parent line, obtained from --file-list, or by scanning a parent di-
directory. The pattern is a PCRE2 regular expression, and is rectory. The pattern is a PCRE2 regular expression, and is
matched against the final component of the directory name, matched against the final component of the directory name,
not the entire path. The -F, -w, and -x options do not apply not the entire path. The -F, -w, and -x options do not apply
to this pattern. The option may be given any number of times. to this pattern. The option may be given any number of times.
@ -480,8 +479,8 @@ OPTIONS
flushed by the operating system. This option can be useful flushed by the operating system. This option can be useful
when the input or output is attached to a pipe and you do not when the input or output is attached to a pipe and you do not
want pcre2grep to buffer up large amounts of data. However, want pcre2grep to buffer up large amounts of data. However,
its use will affect performance, and the -M (multiline) its use will affect performance, and the -M (multiline) op-
option ceases to work. When input is from a compressed .gz or tion ceases to work. When input is from a compressed .gz or
.bz2 file, --line-buffered is ignored. .bz2 file, --line-buffered is ignored.
--line-offsets --line-offsets
@ -498,9 +497,9 @@ OPTIONS
--locale=locale-name --locale=locale-name
This option specifies a locale to be used for pattern match- This option specifies a locale to be used for pattern match-
ing. It overrides the value in the LC_ALL or LC_CTYPE envi- ing. It overrides the value in the LC_ALL or LC_CTYPE envi-
ronment variables. If no locale is specified, the PCRE2 ronment variables. If no locale is specified, the PCRE2 li-
library's default (usually the "C" locale) is used. There is brary's default (usually the "C" locale) is used. There is no
no short form for this option. short form for this option.
--match-limit=number --match-limit=number
Processing some regular expression patterns may take a very Processing some regular expression patterns may take a very
@ -509,13 +508,13 @@ OPTIONS
options that set resource limits for matching. options that set resource limits for matching.
The --match-limit option provides a means of limiting comput- The --match-limit option provides a means of limiting comput-
ing resource usage when processing patterns that are not ing resource usage when processing patterns that are not go-
going to match, but which have a very large number of possi- ing to match, but which have a very large number of possibil-
bilities in their search trees. The classic example is a pat- ities in their search trees. The classic example is a pattern
tern that uses nested unlimited repeats. Internally, PCRE2 that uses nested unlimited repeats. Internally, PCRE2 has a
has a counter that is incremented each time around its main counter that is incremented each time around its main pro-
processing loop. If the value set by --match-limit is cessing loop. If the value set by --match-limit is reached,
reached, an error occurs. an error occurs.
The --heap-limit option specifies, as a number of kibibytes The --heap-limit option specifies, as a number of kibibytes
(units of 1024 bytes), the amount of heap memory that may be (units of 1024 bytes), the amount of heap memory that may be
@ -567,10 +566,10 @@ OPTIONS
pcre2grep -M 'regular\s+expression' <file> pcre2grep -M 'regular\s+expression' <file>
The \s escape sequence matches any white space character, The \s escape sequence matches any white space character, in-
including newlines, and is followed by + so as to match cluding newlines, and is followed by + so as to match trail-
trailing white space on the first line as well as possibly ing white space on the first line as well as possibly han-
handling a two-character newline sequence. dling a two-character newline sequence.
There is a limit to the number of lines that can be matched, There is a limit to the number of lines that can be matched,
imposed by the way that pcre2grep buffers the input file as imposed by the way that pcre2grep buffers the input file as
@ -579,30 +578,30 @@ OPTIONS
when input is read line by line (see --line-buffered.) when input is read line by line (see --line-buffered.)
-N newline-type, --newline=newline-type -N newline-type, --newline=newline-type
The PCRE2 library supports five different conventions for The PCRE2 library supports five different conventions for in-
indicating the ends of lines. They are the single-character dicating the ends of lines. They are the single-character se-
sequences CR (carriage return) and LF (linefeed), the two- quences CR (carriage return) and LF (linefeed), the two-char-
character sequence CRLF, an "anycrlf" convention, which rec- acter sequence CRLF, an "anycrlf" convention, which recog-
ognizes any of the preceding three types, and an "any" con- nizes any of the preceding three types, and an "any" conven-
vention, in which any Unicode line ending sequence is assumed tion, in which any Unicode line ending sequence is assumed to
to end a line. The Unicode sequences are the three just men- end a line. The Unicode sequences are the three just men-
tioned, plus VT (vertical tab, U+000B), FF (form feed, tioned, plus VT (vertical tab, U+000B), FF (form feed,
U+000C), NEL (next line, U+0085), LS (line separator, U+000C), NEL (next line, U+0085), LS (line separator,
U+2028), and PS (paragraph separator, U+2029). U+2028), and PS (paragraph separator, U+2029).
When the PCRE2 library is built, a default line-ending When the PCRE2 library is built, a default line-ending se-
sequence is specified. This is normally the standard quence is specified. This is normally the standard sequence
sequence for the operating system. Unless otherwise specified for the operating system. Unless otherwise specified by this
by this option, pcre2grep uses the library's default. The option, pcre2grep uses the library's default. The possible
possible values for this option are CR, LF, CRLF, ANYCRLF, or values for this option are CR, LF, CRLF, ANYCRLF, or ANY.
ANY. This makes it possible to use pcre2grep to scan files This makes it possible to use pcre2grep to scan files that
that have come from other environments without having to mod- have come from other environments without having to modify
ify their line endings. If the data that is being scanned their line endings. If the data that is being scanned does
does not agree with the convention set by this option, not agree with the convention set by this option, pcre2grep
pcre2grep may behave in strange ways. Note that this option may behave in strange ways. Note that this option does not
does not apply to files specified by the -f, --exclude-from, apply to files specified by the -f, --exclude-from, or --in-
or --include-from options, which are expected to use the clude-from options, which are expected to use the operating
operating system's standard newline sequence. system's standard newline sequence.
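The "any" convention described above can be approximated outside pcre2grep. The following is a minimal Python sketch, not pcre2grep's own implementation: the helper name split_lines_any is hypothetical, and it works on Python strings (code points) rather than on the code units the library sees.

```python
import re

# Line endings recognized under the "any" convention, as listed in the text:
# CRLF, LF, CR, VT (U+000B), FF (U+000C), NEL (U+0085), LS (U+2028), PS (U+2029).
# CRLF is listed first so that it is consumed as a single line ending.
ANY_NEWLINE = re.compile("\r\n|[\n\r\x0b\x0c\x85\u2028\u2029]")


def split_lines_any(text):
    """Hypothetical helper: split text at any Unicode line ending sequence."""
    return ANY_NEWLINE.split(text)


sample = "one\r\ntwo\x0bthree\u2028four"
print(split_lines_any(sample))  # ['one', 'two', 'three', 'four']
```

Trying the two-character CRLF sequence before the single characters is the one design point that matters here; otherwise a CRLF pair would be counted as two line endings.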
-n, --line-number -n, --line-number
Precede each output line by its line number in the file, fol- Precede each output line by its line number in the file, fol-
@ -621,8 +620,8 @@ OPTIONS
-O text, --output=text -O text, --output=text
When there is a match, instead of outputting the whole line When there is a match, instead of outputting the whole line
that matched, output just the given text. This option is that matched, output just the given text. This option is mu-
mutually exclusive with --only-matching, --file-offsets, and tually exclusive with --only-matching, --file-offsets, and
--line-offsets. Escape sequences starting with a dollar char- --line-offsets. Escape sequences starting with a dollar char-
acter may be used to insert the contents of the matched part acter may be used to insert the contents of the matched part
of the line and/or captured substrings into the text. of the line and/or captured substrings into the text.
@ -651,9 +650,9 @@ OPTIONS
of the whole line. In this mode, no context is shown. That of the whole line. In this mode, no context is shown. That
is, the -A, -B, and -C options are ignored. If there is more is, the -A, -B, and -C options are ignored. If there is more
than one match in a line, each of them is shown separately, than one match in a line, each of them is shown separately,
on a separate line of output. If -o is combined with -v on a separate line of output. If -o is combined with -v (in-
(invert the sense of the match to find non-matching lines), vert the sense of the match to find non-matching lines), no
no output is generated, but the return code is set appropri- output is generated, but the return code is set appropri-
ately. If the matched portion of the line is empty, nothing ately. If the matched portion of the line is empty, nothing
is output unless the file name or line number are being is output unless the file name or line number are being
printed, in which case they are shown on an otherwise empty printed, in which case they are shown on an otherwise empty
@ -671,8 +670,8 @@ OPTIONS
-o0 is the same as -o without a number. Because these options -o0 is the same as -o without a number. Because these options
can be given without an argument (see above), if an argument can be given without an argument (see above), if an argument
is present, it must be given in the same shell item, for is present, it must be given in the same shell item, for ex-
example, -o3 or --only-matching=2. The comments given for the ample, -o3 or --only-matching=2. The comments given for the
non-argument case above also apply to this option. If the non-argument case above also apply to this option. If the
specified capturing parentheses do not exist in the pattern, specified capturing parentheses do not exist in the pattern,
or were not set in the match, nothing is output unless the or were not set in the match, nothing is output unless the
@ -704,8 +703,8 @@ OPTIONS
it contains, taking note of any --include and --exclude set- it contains, taking note of any --include and --exclude set-
tings. By default, a directory is read as a normal file; in tings. By default, a directory is read as a normal file; in
some operating systems this gives an immediate end-of-file. some operating systems this gives an immediate end-of-file.
This option is a shorthand for setting the -d option to This option is a shorthand for setting the -d option to "re-
"recurse". curse".
--recursion-limit=number --recursion-limit=number
See --match-limit above. See --match-limit above.
@ -719,8 +718,8 @@ OPTIONS
This option is useful when scanning more than one file. If This option is useful when scanning more than one file. If
used on its own, -t suppresses all output except for a grand used on its own, -t suppresses all output except for a grand
total number of matching lines (or non-matching lines if -v total number of matching lines (or non-matching lines if -v
is used) in all the files. If -t is used with -c, a grand is used) in all the files. If -t is used with -c, a grand to-
total is output except when the previous output is just one tal is output except when the previous output is just one
line. In other words, it is not output when just one file's line. In other words, it is not output when just one file's
count is listed. If file names are being output, the grand count is listed. If file names are being output, the grand
total is preceded by "TOTAL:". Otherwise, it appears as just total is preceded by "TOTAL:". Otherwise, it appears as just
@ -773,10 +772,10 @@ OPTIONS
ENVIRONMENT VARIABLES ENVIRONMENT VARIABLES
The environment variables LC_ALL and LC_CTYPE are examined, in that The environment variables LC_ALL and LC_CTYPE are examined, in that or-
order, for a locale. The first one that is set is used. This can be der, for a locale. The first one that is set is used. This can be over-
overridden by the --locale option. If no locale is set, the PCRE2 ridden by the --locale option. If no locale is set, the PCRE2 library's
library's default (usually the "C" locale) is used. default (usually the "C" locale) is used.
NEWLINES NEWLINES
@ -834,13 +833,13 @@ OPTIONS WITH DATA
--file /some/file --file /some/file
Note, however, that if you want to supply a file name beginning with ~ Note, however, that if you want to supply a file name beginning with ~
as data in a shell command, and have the shell expand ~ to a home as data in a shell command, and have the shell expand ~ to a home di-
directory, you must separate the file name from the option, because the rectory, you must separate the file name from the option, because the
shell does not treat ~ specially unless it is at the start of an item. shell does not treat ~ specially unless it is at the start of an item.
The exceptions to the above are the --colour (or --color) and --only- The exceptions to the above are the --colour (or --color) and --only-
matching options, for which the data is optional. If one of these matching options, for which the data is optional. If one of these op-
options does have data, it must be given in the first form, using an tions does have data, it must be given in the first form, using an
equals character. Otherwise pcre2grep will assume that it has no data. equals character. Otherwise pcre2grep will assume that it has no data.
@ -850,8 +849,8 @@ USING PCRE2'S CALLOUT FACILITY
scripts or echoing specific strings during matching by making use of scripts or echoing specific strings during matching by making use of
PCRE2's callout facility. However, this support can be completely or PCRE2's callout facility. However, this support can be completely or
partially disabled when pcre2grep is built. You can find out whether partially disabled when pcre2grep is built. You can find out whether
your binary has support for callouts by running it with the --help your binary has support for callouts by running it with the --help op-
option. If callout support is completely disabled, all callouts in pat- tion. If callout support is completely disabled, all callouts in pat-
terns are ignored by pcre2grep. If the facility is partially disabled, terns are ignored by pcre2grep. If the facility is partially disabled,
calling external programs is not supported, and callouts that request calling external programs is not supported, and callouts that request
it are ignored. it are ignored.
@ -875,16 +874,16 @@ USING PCRE2'S CALLOUT FACILITY
executable_name|arg1|arg2|... executable_name|arg1|arg2|...
Any substring (including the executable name) may contain escape Any substring (including the executable name) may contain escape se-
sequences started by a dollar character: $<digits> or ${<digits>} is quences started by a dollar character: $<digits> or ${<digits>} is re-
replaced by the captured substring of the given decimal number, which placed by the captured substring of the given decimal number, which
must be greater than zero. If the number is greater than the number of must be greater than zero. If the number is greater than the number of
capturing substrings, or if the capture is unset, the replacement is capturing substrings, or if the capture is unset, the replacement is
empty. empty.
Any other character is substituted by itself. In particular, $$ is Any other character is substituted by itself. In particular, $$ is re-
replaced by a single dollar and $| is replaced by a pipe character. placed by a single dollar and $| is replaced by a pipe character. Here
Here is an example: is an example:
echo -e "abcde\n12345" | pcre2grep \ echo -e "abcde\n12345" | pcre2grep \
'(?x)(.)(..(.)) '(?x)(.)(..(.))
@ -914,10 +913,10 @@ USING PCRE2'S CALLOUT FACILITY
to the output, having been passed through the same escape processing as to the output, having been passed through the same escape processing as
text from the --output option. This provides a simple echoing facility text from the --output option. This provides a simple echoing facility
that avoids calling an external program or script. No terminator is that avoids calling an external program or script. No terminator is
added to the string, so if you want a newline, you must include it added to the string, so if you want a newline, you must include it ex-
explicitly. Matching continues normally after the string is output. If plicitly. Matching continues normally after the string is output. If
you want to see only the callout output but not any output from an you want to see only the callout output but not any output from an ac-
actual match, you should end the relevant pattern with (*FAIL). tual match, you should end the relevant pattern with (*FAIL).
MATCHING ERRORS MATCHING ERRORS
@ -925,8 +924,8 @@ MATCHING ERRORS
It is possible to supply a regular expression that takes a very long It is possible to supply a regular expression that takes a very long
time to fail to match certain lines. Such patterns normally involve time to fail to match certain lines. Such patterns normally involve
nested indefinite repeats, for example: (a+)*\d when matched against a nested indefinite repeats, for example: (a+)*\d when matched against a
line of a's with no final digit. The PCRE2 matching function has a line of a's with no final digit. The PCRE2 matching function has a re-
resource limit that causes it to abort in these circumstances. If this source limit that causes it to abort in these circumstances. If this
happens, pcre2grep outputs an error message and the line that caused happens, pcre2grep outputs an error message and the line that caused
the problem to the standard error stream. If there are more than 20 the problem to the standard error stream. If there are more than 20
such errors, pcre2grep gives up. such errors, pcre2grep gives up.
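The cost of the nested-repeat pattern mentioned above can be observed with any backtracking matcher. The sketch below uses Python's re module purely as a stand-in: it is a different engine from PCRE2, it has no resource limit, and the helper name time_failed_match is hypothetical. The subject lengths are kept small so that the run finishes quickly.

```python
import re
import time

# The pattern from the text: nested indefinite repeats followed by a digit.
PATTERN = re.compile(r"(a+)*\d")


def time_failed_match(n):
    """Hypothetical helper: time a failing search against n a's with no digit."""
    subject = "a" * n
    start = time.perf_counter()
    result = PATTERN.search(subject)
    elapsed = time.perf_counter() - start
    return result, elapsed


# The search fails each time; the time taken typically grows quickly with n.
for n in (8, 12, 16):
    result, elapsed = time_failed_match(n)
    print(f"{n} a's: match={result!r}, {elapsed:.4f}s")
```

Because this engine has no equivalent of the PCRE2 match limit, increasing n much further simply makes the search run for a very long time, which is the situation the resource limit is there to cut short.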


@ -1,4 +1,4 @@
.TH PCRE2PATTERN 3 "21 June 2019" "PCRE2 10.34" .TH PCRE2PATTERN 3 "22 June 2019" "PCRE2 10.34"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION DETAILS" .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@ -3564,9 +3564,10 @@ effect as this example; although it would suppress backtracking during the
first match attempt, the second attempt would start at the second character first match attempt, the second attempt would start at the second character
instead of skipping on to "c". instead of skipping on to "c".
.P .P
If (*SKIP) is used inside a lookbehind to specify a new starting position that If (*SKIP) is used to specify a new starting position that is the same as the
is not later than the starting point of the current match, the position starting position of the current match, or (by being inside a lookbehind)
specified by (*SKIP) is ignored, and instead the normal "bumpalong" occurs. earlier, the position specified by (*SKIP) is ignored, and instead the normal
"bumpalong" occurs.
.sp .sp
(*SKIP:NAME) (*SKIP:NAME)
.sp .sp
@ -3787,6 +3788,6 @@ Cambridge, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 21 June 2019 Last updated: 22 June 2019
Copyright (c) 1997-2019 University of Cambridge. Copyright (c) 1997-2019 University of Cambridge.
.fi .fi


@ -13,8 +13,8 @@ SYNOPSIS
but it can also be used for experimenting with regular expressions. but it can also be used for experimenting with regular expressions.
This document describes the features of the test program; for details This document describes the features of the test program; for details
of the regular expressions themselves, see the pcre2pattern documenta- of the regular expressions themselves, see the pcre2pattern documenta-
tion. For details of the PCRE2 library function calls and their tion. For details of the PCRE2 library function calls and their op-
options, see the pcre2api documentation. tions, see the pcre2api documentation.
The input for pcre2test is a sequence of regular expression patterns The input for pcre2test is a sequence of regular expression patterns
and subject strings to be matched. There are also command lines for and subject strings to be matched. There are also command lines for
@ -33,26 +33,26 @@ SYNOPSIS
which are specifically designed for use in conjunction with the test which are specifically designed for use in conjunction with the test
script and data files that are distributed as part of PCRE2. All the script and data files that are distributed as part of PCRE2. All the
modifiers are documented here, some without much justification, but modifiers are documented here, some without much justification, but
many of them are unlikely to be of use except when testing the many of them are unlikely to be of use except when testing the li-
libraries. braries.
PCRE2's 8-BIT, 16-BIT AND 32-BIT LIBRARIES PCRE2's 8-BIT, 16-BIT AND 32-BIT LIBRARIES
Different versions of the PCRE2 library can be built to support charac- Different versions of the PCRE2 library can be built to support charac-
ter strings that are encoded in 8-bit, 16-bit, or 32-bit code units. ter strings that are encoded in 8-bit, 16-bit, or 32-bit code units.
One, two, or all three of these libraries may be simultaneously One, two, or all three of these libraries may be simultaneously in-
installed. The pcre2test program can be used to test all the libraries. stalled. The pcre2test program can be used to test all the libraries.
However, its own input and output are always in 8-bit format. When However, its own input and output are always in 8-bit format. When
testing the 16-bit or 32-bit libraries, patterns and subject strings testing the 16-bit or 32-bit libraries, patterns and subject strings
are converted to 16-bit or 32-bit format before being passed to the are converted to 16-bit or 32-bit format before being passed to the li-
library functions. Results are converted back to 8-bit code units for brary functions. Results are converted back to 8-bit code units for
output. output.
In the rest of this document, the names of library functions and struc- In the rest of this document, the names of library functions and struc-
tures are given in generic form, for example, pcre2_compile(). The tures are given in generic form, for example, pcre2_compile(). The ac-
actual names used in the libraries have a suffix _8, _16, or _32, as tual names used in the libraries have a suffix _8, _16, or _32, as ap-
appropriate. propriate.
INPUT ENCODING INPUT ENCODING
@ -70,18 +70,18 @@ INPUT ENCODING
processed for backslash escapes, which makes it possible to include any processed for backslash escapes, which makes it possible to include any
data value in strings that are passed to the library for matching. For data value in strings that are passed to the library for matching. For
patterns, there is a facility for specifying some or all of the 8-bit patterns, there is a facility for specifying some or all of the 8-bit
input characters as hexadecimal pairs, which makes it possible to input characters as hexadecimal pairs, which makes it possible to in-
include binary zeros. clude binary zeros.
Input for the 16-bit and 32-bit libraries Input for the 16-bit and 32-bit libraries
When testing the 16-bit or 32-bit libraries, there is a need to be able When testing the 16-bit or 32-bit libraries, there is a need to be able
to generate character code points greater than 255 in the strings that to generate character code points greater than 255 in the strings that
are passed to the library. For subject lines, backslash escapes can be are passed to the library. For subject lines, backslash escapes can be
used. In addition, when the utf modifier (see "Setting compilation used. In addition, when the utf modifier (see "Setting compilation op-
options" below) is set, the pattern and any following subject lines are tions" below) is set, the pattern and any following subject lines are
interpreted as UTF-8 strings and translated to UTF-16 or UTF-32 as interpreted as UTF-8 strings and translated to UTF-16 or UTF-32 as ap-
appropriate. propriate.
For non-UTF testing of wide characters, the utf8_input modifier can be For non-UTF testing of wide characters, the utf8_input modifier can be
used. This is mutually exclusive with utf, and is allowed only in used. This is mutually exclusive with utf, and is allowed only in
@ -121,8 +121,8 @@ COMMAND LINE OPTIONS
piled. piled.
-AC As for -ac, but in addition behave as if each subject line -AC As for -ac, but in addition behave as if each subject line
has the callout_extra modifier, that is, show additional has the callout_extra modifier, that is, show additional in-
information from callouts. formation from callouts.
-b Behave as if each pattern has the fullbincode modifier; the -b Behave as if each pattern has the fullbincode modifier; the
full internal binary form of the pattern is output after com- full internal binary form of the pattern is output after com-
@ -130,9 +130,9 @@ COMMAND LINE OPTIONS
-C Output the version number of the PCRE2 library, and all -C Output the version number of the PCRE2 library, and all
available information about the optional features that are available information about the optional features that are
included, and then exit with zero exit code. All other included, and then exit with zero exit code. All other op-
options are ignored. If both -C and -LM are present, which- tions are ignored. If both -C and -LM are present, whichever
ever is first is recognized. is first is recognized.
-C option Output information about a specific build-time option, then -C option Output information about a specific build-time option, then
exit. This functionality is intended for use in scripts such exit. This functionality is intended for use in scripts such
@ -269,8 +269,8 @@ DESCRIPTION
supply them explicitly. supply them explicitly.
An empty line or the end of the file signals the end of the subject An empty line or the end of the file signals the end of the subject
lines for a test, at which point a new pattern or command line is lines for a test, at which point a new pattern or command line is ex-
expected if there is still input to be read. pected if there is still input to be read.
COMMAND LINES COMMAND LINES
@ -311,8 +311,8 @@ COMMAND LINES
as indicating a newline in a pattern or subject string. The default can as indicating a newline in a pattern or subject string. The default can
be overridden when a pattern is compiled. The standard test files con- be overridden when a pattern is compiled. The standard test files con-
tain tests of various newline conventions, but the majority of the tain tests of various newline conventions, but the majority of the
tests expect a single linefeed to be recognized as a newline by tests expect a single linefeed to be recognized as a newline by de-
default. Without special action the tests would fail when PCRE2 is com- fault. Without special action the tests would fail when PCRE2 is com-
piled with either CR or CRLF as the default newline. piled with either CR or CRLF as the default newline.
The #newline_default command specifies a list of newline types that are The #newline_default command specifies a list of newline types that are
@ -323,14 +323,14 @@ COMMAND LINES
If the default newline is in the list, this command has no effect. Oth- If the default newline is in the list, this command has no effect. Oth-
erwise, except when testing the POSIX API, a newline modifier that erwise, except when testing the POSIX API, a newline modifier that
specifies the first newline convention in the list (LF in the above specifies the first newline convention in the list (LF in the above ex-
example) is added to any pattern that does not already have a newline ample) is added to any pattern that does not already have a newline
modifier. If the newline list is empty, the feature is turned off. This modifier. If the newline list is empty, the feature is turned off. This
command is present in a number of the standard test input files. command is present in a number of the standard test input files.
When the POSIX API is being tested there is no way to override the When the POSIX API is being tested there is no way to override the de-
default newline convention, though it is possible to set the newline fault newline convention, though it is possible to set the newline con-
convention from within the pattern. A warning is given if the posix or vention from within the pattern. A warning is given if the posix or
posix_nosub modifier is used when #newline_default would set a default posix_nosub modifier is used when #newline_default would set a default
for the non-POSIX API. for the non-POSIX API.
@ -344,8 +344,8 @@ COMMAND LINES
The appearance of this line causes all subsequent modifier settings to The appearance of this line causes all subsequent modifier settings to
be checked for compatibility with the perltest.sh script, which is used be checked for compatibility with the perltest.sh script, which is used
to confirm that Perl gives the same results as PCRE2. Also, apart from to confirm that Perl gives the same results as PCRE2. Also, apart from
comment lines, #pattern commands, and #subject commands that set or comment lines, #pattern commands, and #subject commands that set or un-
unset "mark", no command lines are permitted, because they and many of set "mark", no command lines are permitted, because they and many of
the modifiers are specific to pcre2test, and should not be used in test the modifiers are specific to pcre2test, and should not be used in test
files that are also processed by perltest.sh. The #perltest command files that are also processed by perltest.sh. The #perltest command
helps detect tests that are accidentally put in the wrong file. helps detect tests that are accidentally put in the wrong file.
@ -376,8 +376,8 @@ MODIFIER SYNTAX
list are separated by commas followed by optional white space. Trailing list are separated by commas followed by optional white space. Trailing
whitespace in a modifier list is ignored. Some modifiers may be given whitespace in a modifier list is ignored. Some modifiers may be given
for both patterns and subject lines, whereas others are valid only for for both patterns and subject lines, whereas others are valid only for
one or the other. Each modifier has a long name, for example one or the other. Each modifier has a long name, for example "an-
"anchored", and some of them must be followed by an equals sign and a chored", and some of them must be followed by an equals sign and a
value, for example, "offset=12". Values cannot contain comma charac- value, for example, "offset=12". Values cannot contain comma charac-
ters, but may contain spaces. Modifiers that do not take values may be ters, but may contain spaces. Modifiers that do not take values may be
preceded by a minus sign to turn off a previous setting. preceded by a minus sign to turn off a previous setting.
@ -498,8 +498,8 @@ SUBJECT LINE SYNTAX
\= This is a comment. \= This is a comment.
abc\= This is an invalid modifier list. abc\= This is an invalid modifier list.
A backslash followed by any other non-alphanumeric character just A backslash followed by any other non-alphanumeric character just es-
escapes that character. A backslash followed by anything else causes an capes that character. A backslash followed by anything else causes an
error. However, if the very last character in the line is a backslash error. However, if the very last character in the line is a backslash
(and there is no modifier list), it is ignored. This gives a way of (and there is no modifier list), it is ignored. This gives a way of
passing an empty line as data, since a real empty line terminates the passing an empty line as data, since a real empty line terminates the
@ -523,13 +523,13 @@ PATTERN MODIFIERS
The following modifiers set options for pcre2_compile(). Most of them The following modifiers set options for pcre2_compile(). Most of them
set bits in the options argument of that function, but those whose set bits in the options argument of that function, but those whose
names start with PCRE2_EXTRA are additional options that are set in the names start with PCRE2_EXTRA are additional options that are set in the
compile context. For the main options, there are some single-letter compile context. For the main options, there are some single-letter ab-
abbreviations that are the same as Perl options. There is special han- breviations that are the same as Perl options. There is special han-
dling for /x: if a second x is present, PCRE2_EXTENDED is converted dling for /x: if a second x is present, PCRE2_EXTENDED is converted
into PCRE2_EXTENDED_MORE as in Perl. A third appearance adds into PCRE2_EXTENDED_MORE as in Perl. A third appearance adds PCRE2_EX-
PCRE2_EXTENDED as well, though this makes no difference to the way TENDED as well, though this makes no difference to the way pcre2_com-
pcre2_compile() behaves. See pcre2api for a description of the effects pile() behaves. See pcre2api for a description of the effects of these
of these options. options.
allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS
allow_surrogate_escapes set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES allow_surrogate_escapes set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
@ -577,9 +577,9 @@ PATTERN MODIFIERS
Setting compilation controls Setting compilation controls
The following modifiers affect the compilation process or request The following modifiers affect the compilation process or request in-
information about the pattern. There are single-letter abbreviations formation about the pattern. There are single-letter abbreviations for
for some that are heavily used in the test files. some that are heavily used in the test files.
bsr=[anycrlf|unicode] specify \R handling bsr=[anycrlf|unicode] specify \R handling
/B bincode show binary code without lengths /B bincode show binary code without lengths
@ -717,8 +717,8 @@ PATTERN MODIFIERS
minated strings but can be passed by length instead of being zero-ter- minated strings but can be passed by length instead of being zero-ter-
minated. The use_length modifier causes this to happen. Using a length minated. The use_length modifier causes this to happen. Using a length
happens automatically (whether or not use_length is set) when hex is happens automatically (whether or not use_length is set) when hex is
set, because patterns specified in hexadecimal may contain binary set, because patterns specified in hexadecimal may contain binary ze-
zeros. ros.
If hex or use_length is used with the POSIX wrapper API (see "Using the If hex or use_length is used with the POSIX wrapper API (see "Using the
POSIX wrapper API" below), the REG_PEND extension is used to pass the POSIX wrapper API" below), the REG_PEND extension is used to pass the
@ -770,8 +770,8 @@ PATTERN MODIFIERS
partial modifier in "Subject Modifiers" below for details of how these partial modifier in "Subject Modifiers" below for details of how these
options are specified for each match attempt. options are specified for each match attempt.
JIT compilation is requested by the jit pattern modifier, which may JIT compilation is requested by the jit pattern modifier, which may op-
optionally be followed by an equals sign and a number in the range 0 to tionally be followed by an equals sign and a number in the range 0 to
7. The three bits that make up the number specify which of the three 7. The three bits that make up the number specify which of the three
JIT operating modes are to be compiled: JIT operating modes are to be compiled:
@ -799,8 +799,8 @@ PATTERN MODIFIERS
none was compiled for non-partial matching. none was compiled for non-partial matching.
If JIT compilation is successful, the compiled JIT code will automati- If JIT compilation is successful, the compiled JIT code will automati-
cally be used when an appropriate type of match is run, except when cally be used when an appropriate type of match is run, except when in-
incompatible run-time options are specified. For more details, see the compatible run-time options are specified. For more details, see the
pcre2jit documentation. See also the jitstack modifier below for a way pcre2jit documentation. See also the jitstack modifier below for a way
of setting the size of the JIT stack. of setting the size of the JIT stack.
@ -847,8 +847,8 @@ PATTERN MODIFIERS
Limiting nested parentheses Limiting nested parentheses
The parens_nest_limit modifier sets a limit on the depth of nested The parens_nest_limit modifier sets a limit on the depth of nested
parentheses in a pattern. Breaching the limit causes a compilation parentheses in a pattern. Breaching the limit causes a compilation er-
error. The default for the library is set when PCRE2 is built, but ror. The default for the library is set when PCRE2 is built, but
pcre2test sets its own default of 220, which is required for running pcre2test sets its own default of 220, which is required for running
the standard test suite. the standard test suite.
@ -886,13 +886,13 @@ PATTERN MODIFIERS
buffer is too small for the error message. If this modifier has not buffer is too small for the error message. If this modifier has not
been set, a large buffer is used. been set, a large buffer is used.
The aftertext and allaftertext subject modifiers work as described The aftertext and allaftertext subject modifiers work as described be-
below. All other modifiers are either ignored, with a warning message, low. All other modifiers are either ignored, with a warning message, or
or cause an error. cause an error.
The pattern is passed to regcomp() as a zero-terminated string by The pattern is passed to regcomp() as a zero-terminated string by de-
default, but if the use_length or hex modifiers are set, the REG_PEND fault, but if the use_length or hex modifiers are set, the REG_PEND ex-
extension is used to pass it by length. tension is used to pass it by length.
Testing the stack guard feature Testing the stack guard feature
@ -920,8 +920,8 @@ PATTERN MODIFIERS
2 a set of tables defining ISO 8859 characters 2 a set of tables defining ISO 8859 characters
In table 2, some characters whose codes are greater than 128 are iden- In table 2, some characters whose codes are greater than 128 are iden-
tified as letters, digits, spaces, etc. Setting alternate character tified as letters, digits, spaces, etc. Setting alternate character ta-
tables and a locale are mutually exclusive. bles and a locale are mutually exclusive.
Setting certain match controls Setting certain match controls
@ -971,12 +971,12 @@ PATTERN MODIFIERS
terns" below. If pushcopy is used instead of push, a copy of the com- terns" below. If pushcopy is used instead of push, a copy of the com-
piled pattern is stacked, leaving the original as current, ready to piled pattern is stacked, leaving the original as current, ready to
match the following input lines. This provides a way of testing the match the following input lines. This provides a way of testing the
pcre2_code_copy() function. The push and pushcopy modifiers are pcre2_code_copy() function. The push and pushcopy modifiers are in-
incompatible with compilation modifiers such as global that act at compatible with compilation modifiers such as global that act at match
match time. Any that are specified are ignored (for the stacked copy), time. Any that are specified are ignored (for the stacked copy), with a
with a warning message, except for replace, which causes an error. Note warning message, except for replace, which causes an error. Note that
that jitverify, which is allowed, does not carry through to any subse- jitverify, which is allowed, does not carry through to any subsequent
quent matching that uses a stacked pattern. matching that uses a stacked pattern.
Testing foreign pattern conversion Testing foreign pattern conversion
@ -1124,12 +1124,12 @@ SUBJECT MODIFIERS
The allusedtext modifier requests that all the text that was consulted The allusedtext modifier requests that all the text that was consulted
during a successful pattern match by the interpreter should be shown. during a successful pattern match by the interpreter should be shown.
This feature is not supported for JIT matching, and if requested with This feature is not supported for JIT matching, and if requested with
JIT it is ignored (with a warning message). Setting this modifier JIT it is ignored (with a warning message). Setting this modifier af-
affects the output if there is a lookbehind at the start of a match, or fects the output if there is a lookbehind at the start of a match, or a
a lookahead at the end, or if \K is used in the pattern. Characters lookahead at the end, or if \K is used in the pattern. Characters that
that precede or follow the start and end of the actual match are indi- precede or follow the start and end of the actual match are indicated
cated in the output by '<' or '>' characters underneath them. Here is in the output by '<' or '>' characters underneath them. Here is an ex-
an example: ample:
re> /(?<=pqr)abc(?=xyz)/ re> /(?<=pqr)abc(?=xyz)/
data> 123pqrabcxyz456\=allusedtext data> 123pqrabcxyz456\=allusedtext
@ -1145,8 +1145,8 @@ SUBJECT MODIFIERS
string. The only time when this occurs is when \K has been processed as string. The only time when this occurs is when \K has been processed as
part of the match. In this situation, the output for the matched string part of the match. In this situation, the output for the matched string
is displayed from the starting character instead of from the match is displayed from the starting character instead of from the match
point, with circumflex characters under the earlier characters. For point, with circumflex characters under the earlier characters. For ex-
example: ample:
re> /abc\Kxyz/ re> /abc\Kxyz/
data> abcxyz\=startchar data> abcxyz\=startchar
@ -1171,12 +1171,12 @@ SUBJECT MODIFIERS
The allvector modifier requests that the entire ovector be shown, what- The allvector modifier requests that the entire ovector be shown, what-
ever the outcome of the match. Compare allcaptures, which shows only up ever the outcome of the match. Compare allcaptures, which shows only up
to the maximum number of capture groups for the pattern, and then only to the maximum number of capture groups for the pattern, and then only
for a successful complete non-DFA match. This modifier, which acts for a successful complete non-DFA match. This modifier, which acts af-
after any match result, and also for DFA matching, provides a means of ter any match result, and also for DFA matching, provides a means of
checking that there are no unexpected modifications to ovector fields. checking that there are no unexpected modifications to ovector fields.
Before each match attempt, the ovector is filled with a special value, Before each match attempt, the ovector is filled with a special value,
and if this is found in both elements of a capturing pair, and if this is found in both elements of a capturing pair, "<un-
"<unchanged>" is output. After a successful match, this applies to all changed>" is output. After a successful match, this applies to all
groups after the maximum capture group for the pattern. In other cases groups after the maximum capture group for the pattern. In other cases
it applies to the entire ovector. After a partial match, the first two it applies to the entire ovector. After a partial match, the first two
elements are the only ones that should be set. After a DFA match, the elements are the only ones that should be set. After a DFA match, the
@ -1207,12 +1207,12 @@ SUBJECT MODIFIERS
If an empty string is matched, the next match is done with the If an empty string is matched, the next match is done with the
PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search
for another, non-empty, match at the same point in the subject. If this for another, non-empty, match at the same point in the subject. If this
match fails, the start offset is advanced, and the normal match is match fails, the start offset is advanced, and the normal match is re-
retried. This imitates the way Perl handles such cases when using the tried. This imitates the way Perl handles such cases when using the /g
/g modifier or the split() function. Normally, the start offset is modifier or the split() function. Normally, the start offset is ad-
advanced by one character, but if the newline convention recognizes vanced by one character, but if the newline convention recognizes CRLF
CRLF as a newline, and the current character is CR followed by LF, an as a newline, and the current character is CR followed by LF, an ad-
advance of two characters occurs. vance of two characters occurs.
Testing substring extraction functions Testing substring extraction functions
@ -1275,8 +1275,8 @@ SUBJECT MODIFIERS
than 256 characters) for substitution tests, as fixed-size buffers are than 256 characters) for substitution tests, as fixed-size buffers are
used. To make it easy to test for buffer overflow, if the replacement used. To make it easy to test for buffer overflow, if the replacement
string starts with a number in square brackets, that number is passed string starts with a number in square brackets, that number is passed
to pcre2_substitute() as the size of the output buffer, with the to pcre2_substitute() as the size of the output buffer, with the re-
replacement string starting at the next character. Here is an example placement string starting at the next character. Here is an example
that tests the edge case: that tests the edge case:
/abc/ /abc/
@ -1285,10 +1285,10 @@ SUBJECT MODIFIERS
123abc123\=replace=[9]XYZ 123abc123\=replace=[9]XYZ
Failed: error -47: no more memory Failed: error -47: no more memory
The default action of pcre2_substitute() is to return The default action of pcre2_substitute() is to return PCRE2_ER-
PCRE2_ERROR_NOMEMORY when the output buffer is too small. However, if ROR_NOMEMORY when the output buffer is too small. However, if the
the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the sub- PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the substi-
stitute_overflow_length modifier), pcre2_substitute() continues to go tute_overflow_length modifier), pcre2_substitute() continues to go
through the motions of matching and substituting (but not doing any through the motions of matching and substituting (but not doing any
callouts), in order to compute the size of buffer that is required. callouts), in order to compute the size of buffer that is required.
When this happens, pcre2test shows the required buffer length (which When this happens, pcre2test shows the required buffer length (which
@ -1323,8 +1323,8 @@ SUBJECT MODIFIERS
Then are listed the offsets of the old substring, its contents, and the Then are listed the offsets of the old substring, its contents, and the
same for the replacement. same for the replacement.
By default, the substitution callout function returns zero, which By default, the substitution callout function returns zero, which ac-
accepts the replacement and causes matching to continue if /g was used. cepts the replacement and causes matching to continue if /g was used.
Two further modifiers can be used to test other return values. If sub- Two further modifiers can be used to test other return values. If sub-
stitute_skip is set to a value greater than zero the callout function stitute_skip is set to a value greater than zero the callout function
returns +1 for the match of that number, and similarly substitute_stop returns +1 for the match of that number, and similarly substitute_stop
@ -1411,8 +1411,8 @@ SUBJECT MODIFIERS
The memory modifier causes pcre2test to log the sizes of all heap mem- The memory modifier causes pcre2test to log the sizes of all heap mem-
ory allocation and freeing calls that occur during a call to ory allocation and freeing calls that occur during a call to
pcre2_match() or pcre2_dfa_match(). These occur only when a match pcre2_match() or pcre2_dfa_match(). These occur only when a match re-
requires a bigger vector than the default for remembering backtracking quires a bigger vector than the default for remembering backtracking
points (pcre2_match()) or for internal workspace (pcre2_dfa_match()). points (pcre2_match()) or for internal workspace (pcre2_dfa_match()).
In many cases there will be no heap memory used and therefore no addi- In many cases there will be no heap memory used and therefore no addi-
tional output. No heap memory is allocated during matching with JIT, so tional output. No heap memory is allocated during matching with JIT, so
@ -1435,9 +1435,9 @@ SUBJECT MODIFIERS
Setting the size of the output vector Setting the size of the output vector
The ovector modifier applies only to the subject line in which it The ovector modifier applies only to the subject line in which it ap-
appears, though of course it can also be used to set a default in a pears, though of course it can also be used to set a default in a #sub-
#subject command. It specifies the number of pairs of offsets that are ject command. It specifies the number of pairs of offsets that are
available for storing matching information. The default is 15. available for storing matching information. The default is 15.
A value of zero is useful when testing the POSIX API because it causes A value of zero is useful when testing the POSIX API because it causes
@ -1491,12 +1491,12 @@ DEFAULT OUTPUT FROM pcre2test
When a match succeeds, pcre2test outputs the list of captured sub- When a match succeeds, pcre2test outputs the list of captured sub-
strings, starting with number 0 for the string that matched the whole strings, starting with number 0 for the string that matched the whole
pattern. Otherwise, it outputs "No match" when the return is pattern. Otherwise, it outputs "No match" when the return is PCRE2_ER-
PCRE2_ERROR_NOMATCH, or "Partial match:" followed by the partially ROR_NOMATCH, or "Partial match:" followed by the partially matching
matching substring when the return is PCRE2_ERROR_PARTIAL. (Note that substring when the return is PCRE2_ERROR_PARTIAL. (Note that this is
this is the entire substring that was inspected during the partial the entire substring that was inspected during the partial match; it
match; it may include characters before the actual match start if a may include characters before the actual match start if a lookbehind
lookbehind assertion, \K, \b, or \B was involved.) assertion, \K, \b, or \B was involved.)
For any other return, pcre2test outputs the PCRE2 negative error number For any other return, pcre2test outputs the PCRE2 negative error number
and a short descriptive phrase. If the error is a failed UTF string and a short descriptive phrase. If the error is a failed UTF string
@ -1541,8 +1541,8 @@ DEFAULT OUTPUT FROM pcre2test
0: cat 0: cat
0+ aract 0+ aract
If global matching is requested, the results of successive matching If global matching is requested, the results of successive matching at-
attempts are output in sequence, like this: tempts are output in sequence, like this:
re> /\Bi(\w\w)/g re> /\Bi(\w\w)/g
data> Mississippi data> Mississippi
@ -1580,12 +1580,12 @@ OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
2: tan 2: tan
Using the normal matching function on this data finds only "tang". The Using the normal matching function on this data finds only "tang". The
longest matching string is always given first (and numbered zero). longest matching string is always given first (and numbered zero). Af-
After a PCRE2_ERROR_PARTIAL return, the output is "Partial match:", ter a PCRE2_ERROR_PARTIAL return, the output is "Partial match:", fol-
followed by the partially matching substring. Note that this is the lowed by the partially matching substring. Note that this is the entire
entire substring that was inspected during the partial match; it may substring that was inspected during the partial match; it may include
include characters before the actual match start if a lookbehind asser- characters before the actual match start if a lookbehind assertion, \b,
tion, \b, or \B was involved. (\K is not supported for DFA matching.) or \B was involved. (\K is not supported for DFA matching.)
If global matching is requested, the search for further matches resumes If global matching is requested, the search for further matches resumes
at the end of the longest match. For example: at the end of the longest match. For example:
@ -1638,12 +1638,12 @@ CALLOUTS
--->pqrabcdef --->pqrabcdef
0 ^ ^ \d 0 ^ ^ \d
This output indicates that callout number 0 occurred for a match This output indicates that callout number 0 occurred for a match at-
attempt starting at the fourth character of the subject string, when tempt starting at the fourth character of the subject string, when the
the pointer was at the seventh character, and when the next pattern pointer was at the seventh character, and when the next pattern item
item was \d. Just one circumflex is output if the start and current was \d. Just one circumflex is output if the start and current posi-
positions are the same, or if the current position precedes the start tions are the same, or if the current position precedes the start posi-
position, which can happen if the callout is in a lookbehind assertion. tion, which can happen if the callout is in a lookbehind assertion.
Callouts numbered 255 are assumed to be automatic callouts, inserted as Callouts numbered 255 are assumed to be automatic callouts, inserted as
a result of the auto_callout pattern modifier. In this case, instead of a result of the auto_callout pattern modifier. In this case, instead of
@ -1660,8 +1660,8 @@ CALLOUTS
0: E* 0: E*
If a pattern contains (*MARK) items, an additional line is output when- If a pattern contains (*MARK) items, an additional line is output when-
ever a change of latest mark is passed to the callout function. For ever a change of latest mark is passed to the callout function. For ex-
example: ample:
re> /a(*MARK:X)bc/auto_callout re> /a(*MARK:X)bc/auto_callout
data> abc data> abc
@ -1683,8 +1683,8 @@ CALLOUTS
The output for a callout with a string argument is similar, except that The output for a callout with a string argument is similar, except that
instead of outputting a callout number before the position indicators, instead of outputting a callout number before the position indicators,
the callout string and its offset in the pattern string are output the callout string and its offset in the pattern string are output be-
before the reflection of the subject string, and the subject string is fore the reflection of the subject string, and the subject string is
reflected for each callout. For example: reflected for each callout. For example:
re> /^ab(?C'first')cd(?C"second")ef/ re> /^ab(?C'first')cd(?C"second")ef/
@ -1800,9 +1800,9 @@ NON-PRINTING CHARACTERS
When pcre2test is outputting text that is a matched part of a subject When pcre2test is outputting text that is a matched part of a subject
string, it behaves in the same way, unless a different locale has been string, it behaves in the same way, unless a different locale has been
set for the pattern (using the locale modifier). In this case, the set for the pattern (using the locale modifier). In this case, the is-
isprint() function is used to distinguish printing and non-printing print() function is used to distinguish printing and non-printing char-
characters. acters.
SAVING AND RESTORING COMPILED PATTERNS SAVING AND RESTORING COMPILED PATTERNS
@ -1814,14 +1814,14 @@ SAVING AND RESTORING COMPILED PATTERNS
have the same endianness, pointer width and PCRE2_SIZE type. Before have the same endianness, pointer width and PCRE2_SIZE type. Before
compiled patterns can be saved they must be serialized, that is, con- compiled patterns can be saved they must be serialized, that is, con-
verted to a stream of bytes. A single byte stream may contain any num- verted to a stream of bytes. A single byte stream may contain any num-
ber of compiled patterns, but they must all use the same character ber of compiled patterns, but they must all use the same character ta-
tables. A single copy of the tables is included in the byte stream (its bles. A single copy of the tables is included in the byte stream (its
size is 1088 bytes). size is 1088 bytes).
The functions whose names begin with pcre2_serialize_ are used for The functions whose names begin with pcre2_serialize_ are used for se-
serializing and de-serializing. They are described in the pcre2serial- rializing and de-serializing. They are described in the pcre2serialize
ize documentation. In this section we describe the features of documentation. In this section we describe the features of pcre2test
pcre2test that can be used to test these functions. that can be used to test these functions.
Note that "serialization" in PCRE2 does not convert compiled patterns Note that "serialization" in PCRE2 does not convert compiled patterns
to an abstract format like Java or .NET. It just makes a reloadable to an abstract format like Java or .NET. It just makes a reloadable
@ -1831,8 +1831,8 @@ SAVING AND RESTORING COMPILED PATTERNS
piled, it is pushed onto a stack of compiled patterns, and pcre2test piled, it is pushed onto a stack of compiled patterns, and pcre2test
expects the next line to contain a new pattern (or command) instead of expects the next line to contain a new pattern (or command) instead of
a subject line. By contrast, the pushcopy modifier causes a copy of the a subject line. By contrast, the pushcopy modifier causes a copy of the
compiled pattern to be stacked, leaving the original available for compiled pattern to be stacked, leaving the original available for im-
immediate matching. By using push and/or pushcopy, a number of patterns mediate matching. By using push and/or pushcopy, a number of patterns
can be compiled and retained. These modifiers are incompatible with can be compiled and retained. These modifiers are incompatible with
posix, and control modifiers that act at match time are ignored (with a posix, and control modifiers that act at match time are ignored (with a
message) for the stacked patterns. The jitverify modifier applies only message) for the stacked patterns. The jitverify modifier applies only
@ -1855,8 +1855,8 @@ SAVING AND RESTORING COMPILED PATTERNS
matched with the pattern, terminated as usual by an empty line or end matched with the pattern, terminated as usual by an empty line or end
of file. This command may be followed by a modifier list containing of file. This command may be followed by a modifier list containing
only control modifiers that act after a pattern has been compiled. In only control modifiers that act after a pattern has been compiled. In
particular, hex, posix, posix_nosub, push, and pushcopy are not particular, hex, posix, posix_nosub, push, and pushcopy are not al-
allowed, nor are any option-setting modifiers. The JIT modifiers are, lowed, nor are any option-setting modifiers. The JIT modifiers are,
however permitted. Here is an example that saves and reloads two pat- however permitted. Here is an example that saves and reloads two pat-
terns. terns.