Documentation update.
parent
a89423624d
commit
c6ee84317d
|
@ -3525,9 +3525,10 @@ first match attempt, the second attempt would start at the second character
|
||||||
instead of skipping on to "c".
|
instead of skipping on to "c".
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
If (*SKIP) is used inside a lookbehind to specify a new starting position that
|
If (*SKIP) is used to specify a new starting position that is the same as the
|
||||||
is not later than the starting point of the current match, the position
|
starting position of the current match, or (by being inside a lookbehind)
|
||||||
specified by (*SKIP) is ignored, and instead the normal "bumpalong" occurs.
|
earlier, the position specified by (*SKIP) is ignored, and instead the normal
|
||||||
|
"bumpalong" occurs.
|
||||||
<pre>
|
<pre>
|
||||||
(*SKIP:NAME)
|
(*SKIP:NAME)
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -3754,7 +3755,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC31" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC31" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 21 June 2019
|
Last updated: 22 June 2019
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2019 University of Cambridge.
|
Copyright © 1997-2019 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@ -16,8 +16,8 @@ DESCRIPTION
|
||||||
|
|
||||||
pcre2-config returns the configuration of the installed PCRE2 libraries
|
pcre2-config returns the configuration of the installed PCRE2 libraries
|
||||||
and the options required to compile a program to use them. Some of the
|
and the options required to compile a program to use them. Some of the
|
||||||
options apply only to the 8-bit, or 16-bit, or 32-bit libraries,
|
options apply only to the 8-bit, or 16-bit, or 32-bit libraries, re-
|
||||||
respectively, and are not available for libraries that have not been
|
spectively, and are not available for libraries that have not been
|
||||||
built. If an unavailable option is encountered, the "usage" information
|
built. If an unavailable option is encountered, the "usage" information
|
||||||
is output.
|
is output.
|
||||||
|
|
||||||
|
@ -36,30 +36,30 @@ OPTIONS
|
||||||
--version Writes the version number of the installed PCRE2 libraries to
|
--version Writes the version number of the installed PCRE2 libraries to
|
||||||
the standard output.
|
the standard output.
|
||||||
|
|
||||||
--libs8 Writes to the standard output the command line options
|
--libs8 Writes to the standard output the command line options re-
|
||||||
required to link with the 8-bit PCRE2 library (-lpcre2-8 on
|
quired to link with the 8-bit PCRE2 library (-lpcre2-8 on
|
||||||
many systems).
|
many systems).
|
||||||
|
|
||||||
--libs16 Writes to the standard output the command line options
|
--libs16 Writes to the standard output the command line options re-
|
||||||
required to link with the 16-bit PCRE2 library (-lpcre2-16 on
|
quired to link with the 16-bit PCRE2 library (-lpcre2-16 on
|
||||||
many systems).
|
many systems).
|
||||||
|
|
||||||
--libs32 Writes to the standard output the command line options
|
--libs32 Writes to the standard output the command line options re-
|
||||||
required to link with the 32-bit PCRE2 library (-lpcre2-32 on
|
quired to link with the 32-bit PCRE2 library (-lpcre2-32 on
|
||||||
many systems).
|
many systems).
|
||||||
|
|
||||||
--libs-posix
|
--libs-posix
|
||||||
Writes to the standard output the command line options
|
Writes to the standard output the command line options re-
|
||||||
required to link with PCRE2's POSIX API wrapper library
|
quired to link with PCRE2's POSIX API wrapper library
|
||||||
(-lpcre2-posix -lpcre2-8 on many systems).
|
(-lpcre2-posix -lpcre2-8 on many systems).
|
||||||
|
|
||||||
--cflags Writes to the standard output the command line options
|
--cflags Writes to the standard output the command line options re-
|
||||||
required to compile files that use PCRE2 (this may include
|
quired to compile files that use PCRE2 (this may include some
|
||||||
some -I options, but is blank on many systems).
|
-I options, but is blank on many systems).
|
||||||
|
|
||||||
--cflags-posix
|
--cflags-posix
|
||||||
Writes to the standard output the command line options
|
Writes to the standard output the command line options re-
|
||||||
required to compile files that use PCRE2's POSIX API wrapper
|
quired to compile files that use PCRE2's POSIX API wrapper
|
||||||
library (this may include some -I options, but is blank on
|
library (this may include some -I options, but is blank on
|
||||||
many systems).
|
many systems).
|
||||||
|
|
||||||
|
|
1805
doc/pcre2.txt
File diff suppressed because it is too large
|
@ -12,11 +12,11 @@ SYNOPSIS
|
||||||
DESCRIPTION
|
DESCRIPTION
|
||||||
|
|
||||||
pcre2grep searches files for character patterns, in the same way as
|
pcre2grep searches files for character patterns, in the same way as
|
||||||
other grep commands do, but it uses the PCRE2 regular expression
|
other grep commands do, but it uses the PCRE2 regular expression li-
|
||||||
library to support patterns that are compatible with the regular
|
brary to support patterns that are compatible with the regular expres-
|
||||||
expressions of Perl 5. See pcre2syntax(3) for a quick-reference summary
|
sions of Perl 5. See pcre2syntax(3) for a quick-reference summary of
|
||||||
of pattern syntax, or pcre2pattern(3) for a full description of the
|
pattern syntax, or pcre2pattern(3) for a full description of the syntax
|
||||||
syntax and semantics of the regular expressions that PCRE2 supports.
|
and semantics of the regular expressions that PCRE2 supports.
|
||||||
|
|
||||||
Patterns, whether supplied on the command line or in a separate file,
|
Patterns, whether supplied on the command line or in a separate file,
|
||||||
are given without delimiters. For example:
|
are given without delimiters. For example:
|
||||||
|
@ -26,8 +26,8 @@ DESCRIPTION
|
||||||
If you attempt to use delimiters (for example, by surrounding a pattern
|
If you attempt to use delimiters (for example, by surrounding a pattern
|
||||||
with slashes, as is common in Perl scripts), they are interpreted as
|
with slashes, as is common in Perl scripts), they are interpreted as
|
||||||
part of the pattern. Quotes can of course be used to delimit patterns
|
part of the pattern. Quotes can of course be used to delimit patterns
|
||||||
on the command line because they are interpreted by the shell, and
|
on the command line because they are interpreted by the shell, and in-
|
||||||
indeed quotes are required if a pattern contains white space or shell
|
deed quotes are required if a pattern contains white space or shell
|
||||||
metacharacters.
|
metacharacters.
|
||||||
|
|
||||||
The first argument that follows any option settings is treated as the
|
The first argument that follows any option settings is treated as the
|
||||||
|
@ -54,8 +54,8 @@ DESCRIPTION
|
||||||
controlled by parameters that can be set by the --buffer-size and
|
controlled by parameters that can be set by the --buffer-size and
|
||||||
--max-buffer-size options. The first of these sets the size of buffer
|
--max-buffer-size options. The first of these sets the size of buffer
|
||||||
that is obtained at the start of processing. If an input file contains
|
that is obtained at the start of processing. If an input file contains
|
||||||
very long lines, a larger buffer may be needed; this is handled by
|
very long lines, a larger buffer may be needed; this is handled by au-
|
||||||
automatically extending the buffer, up to the limit specified by --max-
|
tomatically extending the buffer, up to the limit specified by --max-
|
||||||
buffer-size. The default values for these parameters can be set when
|
buffer-size. The default values for these parameters can be set when
|
||||||
pcre2grep is built; if nothing is specified, the defaults are set to
|
pcre2grep is built; if nothing is specified, the defaults are set to
|
||||||
20KiB and 1MiB respectively. An error occurs if a line is too long and
|
20KiB and 1MiB respectively. An error occurs if a line is too long and
|
||||||
|
@ -75,12 +75,12 @@ DESCRIPTION
|
||||||
By default, as soon as one pattern matches a line, no further patterns
|
By default, as soon as one pattern matches a line, no further patterns
|
||||||
are considered. However, if --colour (or --color) is used to colour the
|
are considered. However, if --colour (or --color) is used to colour the
|
||||||
matching substrings, or if --only-matching, --file-offsets, or --line-
|
matching substrings, or if --only-matching, --file-offsets, or --line-
|
||||||
offsets is used to output only the part of the line that matched
|
offsets is used to output only the part of the line that matched (ei-
|
||||||
(either shown literally, or as an offset), scanning resumes immediately
|
ther shown literally, or as an offset), scanning resumes immediately
|
||||||
following the match, so that further matches on the same line can be
|
following the match, so that further matches on the same line can be
|
||||||
found. If there are multiple patterns, they are all tried on the
|
found. If there are multiple patterns, they are all tried on the re-
|
||||||
remainder of the line, but patterns that follow the one that matched
|
mainder of the line, but patterns that follow the one that matched are
|
||||||
are not tried on the earlier part of the line.
|
not tried on the earlier part of the line.
|
||||||
|
|
||||||
This behaviour means that the order in which multiple patterns are
|
This behaviour means that the order in which multiple patterns are
|
||||||
specified can affect the output when one of the above options is used.
|
specified can affect the output when one of the above options is used.
|
||||||
|
@ -89,11 +89,11 @@ DESCRIPTION
|
||||||
overlap).
|
overlap).
|
||||||
|
|
||||||
Patterns that can match an empty string are accepted, but empty string
|
Patterns that can match an empty string are accepted, but empty string
|
||||||
matches are never recognized. An example is the pattern
|
matches are never recognized. An example is the pattern "(su-
|
||||||
"(super)?(man)?", in which all components are optional. This pattern
|
per)?(man)?", in which all components are optional. This pattern finds
|
||||||
finds all occurrences of both "super" and "man"; the output differs
|
all occurrences of both "super" and "man"; the output differs from
|
||||||
from matching with "super|man" when only the matching substrings are
|
matching with "super|man" when only the matching substrings are being
|
||||||
being shown.
|
shown.
|
||||||
|
|
||||||
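The difference between "(super)?(man)?" and "super|man" described above can be illustrated with a rough sketch. It uses Python's standard `re` module as a stand-in for PCRE2 (an assumption; pcre2grep itself is not invoked), and the helper name `nonempty_matches` is hypothetical. Empty matches are discarded to mimic the statement that they are never recognized.

```python
import re

def nonempty_matches(pattern, line):
    """Return the non-empty substrings matched by pattern, left to right."""
    # Empty matches are dropped, mirroring "empty string matches are
    # never recognized" in the text above.
    return [m.group(0) for m in re.finditer(pattern, line) if m.group(0)]

line = "superman and man"
optional = nonempty_matches(r"(super)?(man)?", line)
alternation = nonempty_matches(r"super|man", line)
print(optional)     # "superman" is reported as a single match
print(alternation)  # "super" and "man" are reported separately
```

Both patterns select the same line; the output differs only when the matching substrings themselves are shown, as with --only-matching.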
If the LC_ALL or LC_CTYPE environment variable is set, pcre2grep uses
|
If the LC_ALL or LC_CTYPE environment variable is set, pcre2grep uses
|
||||||
the value to set a locale when calling the PCRE2 library. The --locale
|
the value to set a locale when calling the PCRE2 library. The --locale
|
||||||
|
@ -116,10 +116,9 @@ BINARY FILES
|
||||||
By default, a file that contains a binary zero byte within the first
|
By default, a file that contains a binary zero byte within the first
|
||||||
1024 bytes is identified as a binary file, and is processed specially.
|
1024 bytes is identified as a binary file, and is processed specially.
|
||||||
(GNU grep identifies binary files in this manner.) However, if the new-
|
(GNU grep identifies binary files in this manner.) However, if the new-
|
||||||
line type is specified as "nul", that is, the line terminator is a
|
line type is specified as "nul", that is, the line terminator is a bi-
|
||||||
binary zero, the test for a binary file is not applied. See the
|
nary zero, the test for a binary file is not applied. See the --binary-
|
||||||
--binary-files option for a means of changing the way binary files are
|
files option for a means of changing the way binary files are handled.
|
||||||
handled.
|
|
||||||
|
|
||||||
|
|
||||||
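As a minimal sketch of the test described in the BINARY FILES paragraph above: the function name `looks_binary` and its `newline_type` parameter are hypothetical, chosen for illustration, and this is not pcre2grep's actual code.

```python
def looks_binary(data, newline_type="lf"):
    """Apply the documented test: a zero byte within the first 1024 bytes.

    The test is skipped when the newline type is "nul", because a binary
    zero is then the line terminator rather than a sign of binary data.
    """
    if newline_type == "nul":
        return False
    return b"\x00" in data[:1024]
```

A zero byte that first appears after the initial 1024 bytes does not cause the file to be identified as binary under this rule.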
BINARY ZEROS IN PATTERNS
|
BINARY ZEROS IN PATTERNS
|
||||||
|
@ -148,12 +147,12 @@ OPTIONS
|
||||||
Output up to number lines of context after each matching
|
Output up to number lines of context after each matching
|
||||||
line. Fewer lines are output if the next match or the end of
|
line. Fewer lines are output if the next match or the end of
|
||||||
the file is reached, or if the processing buffer size has
|
the file is reached, or if the processing buffer size has
|
||||||
been set too small. If file names and/or line numbers are
|
been set too small. If file names and/or line numbers are be-
|
||||||
being output, a hyphen separator is used instead of a colon
|
ing output, a hyphen separator is used instead of a colon for
|
||||||
for the context lines. A line containing "--" is output
|
the context lines. A line containing "--" is output between
|
||||||
between each group of lines, unless they are in fact contigu-
|
each group of lines, unless they are in fact contiguous in
|
||||||
ous in the input file. The value of number is expected to be
|
the input file. The value of number is expected to be rela-
|
||||||
relatively small. When -c is used, -A is ignored.
|
tively small. When -c is used, -A is ignored.
|
||||||
|
|
||||||
-a, --text
|
-a, --text
|
||||||
Treat binary files as text. This is equivalent to --binary-
|
Treat binary files as text. This is equivalent to --binary-
|
||||||
|
@ -164,26 +163,26 @@ OPTIONS
|
||||||
line. Fewer lines are output if the previous match or the
|
line. Fewer lines are output if the previous match or the
|
||||||
start of the file is within number lines, or if the process-
|
start of the file is within number lines, or if the process-
|
||||||
ing buffer size has been set too small. If file names and/or
|
ing buffer size has been set too small. If file names and/or
|
||||||
line numbers are being output, a hyphen separator is used
|
line numbers are being output, a hyphen separator is used in-
|
||||||
instead of a colon for the context lines. A line containing
|
stead of a colon for the context lines. A line containing
|
||||||
"--" is output between each group of lines, unless they are
|
"--" is output between each group of lines, unless they are
|
||||||
in fact contiguous in the input file. The value of number is
|
in fact contiguous in the input file. The value of number is
|
||||||
expected to be relatively small. When -c is used, -B is
|
expected to be relatively small. When -c is used, -B is ig-
|
||||||
ignored.
|
nored.
|
||||||
|
|
||||||
--binary-files=word
|
--binary-files=word
|
||||||
Specify how binary files are to be processed. If the word is
|
Specify how binary files are to be processed. If the word is
|
||||||
"binary" (the default), pattern matching is performed on
|
"binary" (the default), pattern matching is performed on bi-
|
||||||
binary files, but the only output is "Binary file <name>
|
nary files, but the only output is "Binary file <name>
|
||||||
matches" when a match succeeds. If the word is "text", which
|
matches" when a match succeeds. If the word is "text", which
|
||||||
is equivalent to the -a or --text option, binary files are
|
is equivalent to the -a or --text option, binary files are
|
||||||
processed in the same way as any other file. In this case,
|
processed in the same way as any other file. In this case,
|
||||||
when a match succeeds, the output may be binary garbage,
|
when a match succeeds, the output may be binary garbage,
|
||||||
which can have nasty effects if sent to a terminal. If the
|
which can have nasty effects if sent to a terminal. If the
|
||||||
word is "without-match", which is equivalent to the -I
|
word is "without-match", which is equivalent to the -I op-
|
||||||
option, binary files are not processed at all; they are
|
tion, binary files are not processed at all; they are assumed
|
||||||
assumed not to be of interest and are skipped without causing
|
not to be of interest and are skipped without causing any
|
||||||
any output or affecting the return code.
|
output or affecting the return code.
|
||||||
|
|
||||||
--buffer-size=number
|
--buffer-size=number
|
||||||
Set the parameter that controls how much memory is obtained
|
Set the parameter that controls how much memory is obtained
|
||||||
|
@ -208,10 +207,10 @@ OPTIONS
|
||||||
If no lines are selected, the number zero is output. If sev-
|
If no lines are selected, the number zero is output. If sev-
|
||||||
eral files are being scanned, a count is output for each
|
eral files are being scanned, a count is output for each
|
||||||
of them and the -t option can be used to cause a total to be
|
of them and the -t option can be used to cause a total to be
|
||||||
output at the end. However, if the --files-with-matches
|
output at the end. However, if the --files-with-matches op-
|
||||||
option is also used, only those files whose counts are
|
tion is also used, only those files whose counts are greater
|
||||||
greater than zero are listed. When -c is used, the -A, -B,
|
than zero are listed. When -c is used, the -A, -B, and -C op-
|
||||||
and -C options are ignored.
|
tions are ignored.
|
||||||
|
|
||||||
--colour, --color
|
--colour, --color
|
||||||
If this option is given without any data, it is equivalent to
|
If this option is given without any data, it is equivalent to
|
||||||
|
@ -238,8 +237,8 @@ OPTIONS
|
||||||
semicolon, except in the case of GREP_COLORS, which must
|
semicolon, except in the case of GREP_COLORS, which must
|
||||||
start with "ms=" or "mt=" followed by two semicolon-separated
|
start with "ms=" or "mt=" followed by two semicolon-separated
|
||||||
colours, terminated by the end of the string or by a colon.
|
colours, terminated by the end of the string or by a colon.
|
||||||
If GREP_COLORS does not start with "ms=" or "mt=" it is
|
If GREP_COLORS does not start with "ms=" or "mt=" it is ig-
|
||||||
ignored, and GREP_COLOR is checked.
|
nored, and GREP_COLOR is checked.
|
||||||
|
|
||||||
If the string obtained from one of the above variables con-
|
If the string obtained from one of the above variables con-
|
||||||
tains any characters other than semicolon or digits, the set-
|
tains any characters other than semicolon or digits, the set-
|
||||||
|
@ -250,9 +249,9 @@ OPTIONS
|
||||||
set, the default is "1;31", which gives red.
|
set, the default is "1;31", which gives red.
|
||||||
|
|
||||||
-D action, --devices=action
|
-D action, --devices=action
|
||||||
If an input path is not a regular file or a directory,
|
If an input path is not a regular file or a directory, "ac-
|
||||||
"action" specifies how it is to be processed. Valid values
|
tion" specifies how it is to be processed. Valid values are
|
||||||
are "read" (the default) or "skip" (silently skip the path).
|
"read" (the default) or "skip" (silently skip the path).
|
||||||
|
|
||||||
-d action, --directories=action
|
-d action, --directories=action
|
||||||
If an input path is a directory, "action" specifies how it is
|
If an input path is a directory, "action" specifies how it is
|
||||||
|
@ -261,8 +260,8 @@ OPTIONS
|
||||||
"recurse" (equivalent to the -r option), or "skip" (silently
|
"recurse" (equivalent to the -r option), or "skip" (silently
|
||||||
skip the path, the default in Windows environments). In the
|
skip the path, the default in Windows environments). In the
|
||||||
"read" case, directories are read as if they were ordinary
|
"read" case, directories are read as if they were ordinary
|
||||||
files. In some operating systems the effect of reading a
|
files. In some operating systems the effect of reading a di-
|
||||||
directory like this is an immediate end-of-file; in others it
|
rectory like this is an immediate end-of-file; in others it
|
||||||
may provoke an error.
|
may provoke an error.
|
||||||
|
|
||||||
--depth-limit=number
|
--depth-limit=number
|
||||||
|
@ -295,8 +294,8 @@ OPTIONS
|
||||||
whether listed on the command line, obtained from --file-
|
whether listed on the command line, obtained from --file-
|
||||||
list, or by scanning a directory. The pattern is a PCRE2 reg-
|
list, or by scanning a directory. The pattern is a PCRE2 reg-
|
||||||
ular expression, and is matched against the final component
|
ular expression, and is matched against the final component
|
||||||
of the file name, not the entire path. The -F, -w, and -x
|
of the file name, not the entire path. The -F, -w, and -x op-
|
||||||
options do not apply to this pattern. The option may be given
|
tions do not apply to this pattern. The option may be given
|
||||||
any number of times in order to specify multiple patterns. If
|
any number of times in order to specify multiple patterns. If
|
||||||
a file name matches both an --include and an --exclude pat-
|
a file name matches both an --include and an --exclude pat-
|
||||||
tern, it is excluded. There is no short form for this option.
|
tern, it is excluded. There is no short form for this option.
|
||||||
|
@ -310,29 +309,29 @@ OPTIONS
|
||||||
|
|
||||||
--exclude-dir=pattern
|
--exclude-dir=pattern
|
||||||
Directories whose names match the pattern are skipped without
|
Directories whose names match the pattern are skipped without
|
||||||
being processed, whatever the setting of the --recursive
|
being processed, whatever the setting of the --recursive op-
|
||||||
option. This applies to all directories, whether listed on
|
tion. This applies to all directories, whether listed on the
|
||||||
the command line, obtained from --file-list, or by scanning a
|
command line, obtained from --file-list, or by scanning a
|
||||||
parent directory. The pattern is a PCRE2 regular expression,
|
parent directory. The pattern is a PCRE2 regular expression,
|
||||||
and is matched against the final component of the directory
|
and is matched against the final component of the directory
|
||||||
name, not the entire path. The -F, -w, and -x options do not
|
name, not the entire path. The -F, -w, and -x options do not
|
||||||
apply to this pattern. The option may be given any number of
|
apply to this pattern. The option may be given any number of
|
||||||
times in order to specify more than one pattern. If a direc-
|
times in order to specify more than one pattern. If a direc-
|
||||||
tory matches both --include-dir and --exclude-dir, it is
|
tory matches both --include-dir and --exclude-dir, it is ex-
|
||||||
excluded. There is no short form for this option.
|
cluded. There is no short form for this option.
|
||||||
|
|
||||||
-F, --fixed-strings
|
-F, --fixed-strings
|
||||||
Interpret each data-matching pattern as a list of fixed
|
Interpret each data-matching pattern as a list of fixed
|
||||||
strings, separated by newlines, instead of as a regular
|
strings, separated by newlines, instead of as a regular ex-
|
||||||
expression. What constitutes a newline for this purpose is
|
pression. What constitutes a newline for this purpose is con-
|
||||||
controlled by the --newline option. The -w (match as a word)
|
trolled by the --newline option. The -w (match as a word) and
|
||||||
and -x (match whole line) options can be used with -F. They
|
-x (match whole line) options can be used with -F. They ap-
|
||||||
apply to each of the fixed strings. A line is selected if any
|
ply to each of the fixed strings. A line is selected if any
|
||||||
of the fixed strings are found in it (subject to -w or -x, if
|
of the fixed strings are found in it (subject to -w or -x, if
|
||||||
present). This option applies only to the patterns that are
|
present). This option applies only to the patterns that are
|
||||||
matched against the contents of files; it does not apply to
|
matched against the contents of files; it does not apply to
|
||||||
patterns specified by any of the --include or --exclude
|
patterns specified by any of the --include or --exclude op-
|
||||||
options.
|
tions.
|
||||||
|
|
||||||
-f filename, --file=filename
|
-f filename, --file=filename
|
||||||
Read patterns from the file, one per line, and match them
|
Read patterns from the file, one per line, and match them
|
||||||
|
@ -360,8 +359,8 @@ OPTIONS
|
||||||
--file-list=filename
|
--file-list=filename
|
||||||
Read a list of files and/or directories that are to be
|
Read a list of files and/or directories that are to be
|
||||||
scanned from the given file, one per line. What constitutes a
|
scanned from the given file, one per line. What constitutes a
|
||||||
newline when reading the file is the operating system's
|
newline when reading the file is the operating system's de-
|
||||||
default. Trailing white space is removed from each line, and
|
fault. Trailing white space is removed from each line, and
|
||||||
blank lines are ignored. These paths are processed before any
|
blank lines are ignored. These paths are processed before any
|
||||||
that are listed on the command line. The file name can be
|
that are listed on the command line. The file name can be
|
||||||
given as "-" to refer to the standard input. If --file and
|
given as "-" to refer to the standard input. If --file and
|
||||||
|
@ -388,8 +387,8 @@ OPTIONS
|
||||||
is used. If a line number is also being output, it follows
|
is used. If a line number is also being output, it follows
|
||||||
the file name. When the -M option causes a pattern to match
|
the file name. When the -M option causes a pattern to match
|
||||||
more than one line, only the first is preceded by the file
|
more than one line, only the first is preceded by the file
|
||||||
name. This option overrides any previous -h, -l, or -L
|
name. This option overrides any previous -h, -l, or -L op-
|
||||||
options.
|
tions.
|
||||||
|
|
||||||
-h, --no-filename
|
-h, --no-filename
|
||||||
Suppress the output file names when searching multiple files.
|
Suppress the output file names when searching multiple files.
|
||||||
|
@ -415,16 +414,16 @@ OPTIONS
|
||||||
--include=pattern
|
--include=pattern
|
||||||
If any --include patterns are specified, the only files that
|
If any --include patterns are specified, the only files that
|
||||||
are processed are those that match one of the patterns (and
|
are processed are those that match one of the patterns (and
|
||||||
do not match an --exclude pattern). This option does not
|
do not match an --exclude pattern). This option does not af-
|
||||||
affect directories, but it applies to all files, whether
|
fect directories, but it applies to all files, whether listed
|
||||||
listed on the command line, obtained from --file-list, or by
|
on the command line, obtained from --file-list, or by scan-
|
||||||
scanning a directory. The pattern is a PCRE2 regular expres-
|
ning a directory. The pattern is a PCRE2 regular expression,
|
||||||
sion, and is matched against the final component of the file
|
and is matched against the final component of the file name,
|
||||||
name, not the entire path. The -F, -w, and -x options do not
|
not the entire path. The -F, -w, and -x options do not apply
|
||||||
apply to this pattern. The option may be given any number of
|
to this pattern. The option may be given any number of times.
|
||||||
times. If a file name matches both an --include and an
|
If a file name matches both an --include and an --exclude
|
||||||
--exclude pattern, it is excluded. There is no short form
|
pattern, it is excluded. There is no short form for this op-
|
||||||
for this option.
|
tion.
|
||||||
|
|
||||||
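The selection rules for --include and --exclude can be sketched as follows. This is an illustration under stated assumptions, not pcre2grep's implementation: Python's `re` stands in for PCRE2, the patterns are assumed to be unanchored searches, and the function name `is_selected` is hypothetical.

```python
import os
import re

def is_selected(path, includes, excludes):
    """Decide whether a file is processed under --include/--exclude rules."""
    name = os.path.basename(path)  # only the final component is matched
    if any(re.search(p, name) for p in excludes):
        return False  # a file matching both kinds of pattern is excluded
    if includes:
        return any(re.search(p, name) for p in includes)
    return True  # no --include patterns: every non-excluded file qualifies
```

For example, `is_selected("src/main.c", [r"\.c$"], [r"^main"])` is false because the exclude pattern takes precedence, and a pattern that matches only a directory part of the path has no effect.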
--include-from=filename
|
--include-from=filename
|
||||||
Treat each non-empty line of the file as the data for an
|
Treat each non-empty line of the file as the data for an
|
||||||
|
@ -438,8 +437,8 @@ OPTIONS
|
||||||
tories that are processed are those that match one of the
|
tories that are processed are those that match one of the
|
||||||
patterns (and do not match an --exclude-dir pattern). This
|
patterns (and do not match an --exclude-dir pattern). This
|
||||||
applies to all directories, whether listed on the command
|
applies to all directories, whether listed on the command
|
||||||
line, obtained from --file-list, or by scanning a parent
|
line, obtained from --file-list, or by scanning a parent di-
|
||||||
directory. The pattern is a PCRE2 regular expression, and is
|
rectory. The pattern is a PCRE2 regular expression, and is
|
||||||
matched against the final component of the directory name,
|
matched against the final component of the directory name,
|
||||||
not the entire path. The -F, -w, and -x options do not apply
|
not the entire path. The -F, -w, and -x options do not apply
|
||||||
to this pattern. The option may be given any number of times.
|
to this pattern. The option may be given any number of times.
|
||||||
|
@ -480,8 +479,8 @@ OPTIONS
|
||||||
flushed by the operating system. This option can be useful
|
flushed by the operating system. This option can be useful
|
||||||
when the input or output is attached to a pipe and you do not
|
when the input or output is attached to a pipe and you do not
|
||||||
want pcre2grep to buffer up large amounts of data. However,
|
want pcre2grep to buffer up large amounts of data. However,
|
||||||
its use will affect performance, and the -M (multiline)
|
its use will affect performance, and the -M (multiline) op-
|
||||||
option ceases to work. When input is from a compressed .gz or
|
tion ceases to work. When input is from a compressed .gz or
|
||||||
.bz2 file, --line-buffered is ignored.
|
.bz2 file, --line-buffered is ignored.
|
||||||
|
|
||||||
--line-offsets
|
--line-offsets
|
||||||
|
@ -498,9 +497,9 @@ OPTIONS
|
||||||
--locale=locale-name
|
--locale=locale-name
|
||||||
This option specifies a locale to be used for pattern match-
|
This option specifies a locale to be used for pattern match-
|
||||||
ing. It overrides the value in the LC_ALL or LC_CTYPE envi-
|
ing. It overrides the value in the LC_ALL or LC_CTYPE envi-
|
||||||
ronment variables. If no locale is specified, the PCRE2
|
ronment variables. If no locale is specified, the PCRE2 li-
|
||||||
library's default (usually the "C" locale) is used. There is
|
brary's default (usually the "C" locale) is used. There is no
|
||||||
no short form for this option.
|
short form for this option.
|
||||||
|
|
||||||
--match-limit=number
|
--match-limit=number
|
||||||
Processing some regular expression patterns may take a very
|
Processing some regular expression patterns may take a very
|
||||||
|
@ -509,13 +508,13 @@ OPTIONS
|
||||||
options that set resource limits for matching.
|
options that set resource limits for matching.
|
||||||
|
|
||||||
The --match-limit option provides a means of limiting comput-
|
The --match-limit option provides a means of limiting comput-
|
||||||
ing resource usage when processing patterns that are not
|
ing resource usage when processing patterns that are not go-
|
||||||
going to match, but which have a very large number of possi-
|
ing to match, but which have a very large number of possibil-
|
||||||
bilities in their search trees. The classic example is a pat-
|
ities in their search trees. The classic example is a pattern
|
||||||
tern that uses nested unlimited repeats. Internally, PCRE2
|
that uses nested unlimited repeats. Internally, PCRE2 has a
|
||||||
has a counter that is incremented each time around its main
|
counter that is incremented each time around its main pro-
|
||||||
processing loop. If the value set by --match-limit is
|
cessing loop. If the value set by --match-limit is reached,
|
||||||
reached, an error occurs.
|
an error occurs.
|
||||||
|
|
||||||
The --heap-limit option specifies, as a number of kibibytes
|
The --heap-limit option specifies, as a number of kibibytes
|
||||||
(units of 1024 bytes), the amount of heap memory that may be
|
(units of 1024 bytes), the amount of heap memory that may be
|
||||||
|
@ -567,10 +566,10 @@ OPTIONS
|
||||||
|
|
||||||
pcre2grep -M 'regular\s+expression' <file>
|
pcre2grep -M 'regular\s+expression' <file>
|
||||||
|
|
||||||
The \s escape sequence matches any white space character,
|
The \s escape sequence matches any white space character, in-
|
||||||
including newlines, and is followed by + so as to match
|
cluding newlines, and is followed by + so as to match trail-
|
||||||
trailing white space on the first line as well as possibly
|
ing white space on the first line as well as possibly han-
|
||||||
handling a two-character newline sequence.
|
dling a two-character newline sequence.
|
||||||
|
|
||||||
There is a limit to the number of lines that can be matched,
|
There is a limit to the number of lines that can be matched,
|
||||||
imposed by the way that pcre2grep buffers the input file as
|
imposed by the way that pcre2grep buffers the input file as
|
||||||
|
@ -579,30 +578,30 @@ OPTIONS
|
||||||
when input is read line by line (see --line-buffered).
|
when input is read line by line (see --line-buffered).
|
||||||
|
|
||||||
-N newline-type, --newline=newline-type
|
-N newline-type, --newline=newline-type
|
||||||
The PCRE2 library supports five different conventions for
|
The PCRE2 library supports five different conventions for in-
|
||||||
indicating the ends of lines. They are the single-character
|
dicating the ends of lines. They are the single-character se-
|
||||||
sequences CR (carriage return) and LF (linefeed), the two-
|
quences CR (carriage return) and LF (linefeed), the two-char-
|
||||||
character sequence CRLF, an "anycrlf" convention, which rec-
|
acter sequence CRLF, an "anycrlf" convention, which recog-
|
||||||
ognizes any of the preceding three types, and an "any" con-
|
nizes any of the preceding three types, and an "any" conven-
|
||||||
vention, in which any Unicode line ending sequence is assumed
|
tion, in which any Unicode line ending sequence is assumed to
|
||||||
to end a line. The Unicode sequences are the three just men-
|
end a line. The Unicode sequences are the three just men-
|
||||||
tioned, plus VT (vertical tab, U+000B), FF (form feed,
|
tioned, plus VT (vertical tab, U+000B), FF (form feed,
|
||||||
U+000C), NEL (next line, U+0085), LS (line separator,
|
U+000C), NEL (next line, U+0085), LS (line separator,
|
||||||
U+2028), and PS (paragraph separator, U+2029).
|
U+2028), and PS (paragraph separator, U+2029).
|
||||||
|
|
||||||
When the PCRE2 library is built, a default line-ending
|
When the PCRE2 library is built, a default line-ending se-
|
||||||
sequence is specified. This is normally the standard
|
quence is specified. This is normally the standard sequence
|
||||||
sequence for the operating system. Unless otherwise specified
|
for the operating system. Unless otherwise specified by this
|
||||||
by this option, pcre2grep uses the library's default. The
|
option, pcre2grep uses the library's default. The possible
|
||||||
possible values for this option are CR, LF, CRLF, ANYCRLF, or
|
values for this option are CR, LF, CRLF, ANYCRLF, or ANY.
|
||||||
ANY. This makes it possible to use pcre2grep to scan files
|
This makes it possible to use pcre2grep to scan files that
|
||||||
that have come from other environments without having to mod-
|
have come from other environments without having to modify
|
||||||
ify their line endings. If the data that is being scanned
|
their line endings. If the data that is being scanned does
|
||||||
does not agree with the convention set by this option,
|
not agree with the convention set by this option, pcre2grep
|
||||||
pcre2grep may behave in strange ways. Note that this option
|
may behave in strange ways. Note that this option does not
|
||||||
does not apply to files specified by the -f, --exclude-from,
|
apply to files specified by the -f, --exclude-from, or --in-
|
||||||
or --include-from options, which are expected to use the
|
clude-from options, which are expected to use the operating
|
||||||
operating system's standard newline sequence.
|
system's standard newline sequence.
|
||||||
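The five newline conventions listed for -N can be illustrated with a small Python sketch. This is not pcre2grep's implementation: the table name `NEWLINE_PATTERNS` and the function `split_lines` are hypothetical, and Python's `re` module stands in for PCRE2 purely to show which sequences end a line under each convention.

```python
import re

# One splitting pattern per value accepted by -N / --newline.
# In the multi-sequence conventions CRLF is tried first, so that it
# counts as one line ending rather than CR followed by LF.
NEWLINE_PATTERNS = {
    "CR": r"\r",
    "LF": r"\n",
    "CRLF": r"\r\n",
    "ANYCRLF": r"\r\n|\r|\n",
    # CR, LF, CRLF plus VT, FF, NEL, LS and PS.
    "ANY": r"\r\n|[\r\n\x0b\x0c\x85\u2028\u2029]",
}


def split_lines(text, convention):
    """Split text into lines under the named newline convention."""
    return re.split(NEWLINE_PATTERNS[convention], text)
```

Splitting CRLF-terminated data with the LF convention leaves a stray CR at the end of each line, which is one way the "strange" behaviour mentioned above can arise when the data does not agree with the chosen convention.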
|
|
||||||
-n, --line-number
|
-n, --line-number
|
||||||
Precede each output line by its line number in the file, fol-
|
Precede each output line by its line number in the file, fol-
|
||||||
|
@ -621,8 +620,8 @@ OPTIONS
|
||||||
|
|
||||||
-O text, --output=text
|
-O text, --output=text
|
||||||
When there is a match, instead of outputting the whole line
|
When there is a match, instead of outputting the whole line
|
||||||
that matched, output just the given text. This option is
|
that matched, output just the given text. This option is mu-
|
||||||
mutually exclusive with --only-matching, --file-offsets, and
|
tually exclusive with --only-matching, --file-offsets, and
|
||||||
--line-offsets. Escape sequences starting with a dollar char-
|
--line-offsets. Escape sequences starting with a dollar char-
|
||||||
acter may be used to insert the contents of the matched part
|
acter may be used to insert the contents of the matched part
|
||||||
of the line and/or captured substrings into the text.
|
of the line and/or captured substrings into the text.
|
||||||
|
@ -651,9 +650,9 @@ OPTIONS
|
||||||
of the whole line. In this mode, no context is shown. That
|
of the whole line. In this mode, no context is shown. That
|
||||||
is, the -A, -B, and -C options are ignored. If there is more
|
is, the -A, -B, and -C options are ignored. If there is more
|
||||||
than one match in a line, each of them is shown separately,
|
than one match in a line, each of them is shown separately,
|
||||||
on a separate line of output. If -o is combined with -v
|
on a separate line of output. If -o is combined with -v (in-
|
||||||
(invert the sense of the match to find non-matching lines),
|
vert the sense of the match to find non-matching lines), no
|
||||||
no output is generated, but the return code is set appropri-
|
output is generated, but the return code is set appropri-
|
||||||
ately. If the matched portion of the line is empty, nothing
|
ately. If the matched portion of the line is empty, nothing
|
||||||
is output unless the file name or line number are being
|
is output unless the file name or line number are being
|
||||||
printed, in which case they are shown on an otherwise empty
|
printed, in which case they are shown on an otherwise empty
|
||||||
|
@ -671,8 +670,8 @@ OPTIONS
|
||||||
|
|
||||||
-o0 is the same as -o without a number. Because these options
|
-o0 is the same as -o without a number. Because these options
|
||||||
can be given without an argument (see above), if an argument
|
can be given without an argument (see above), if an argument
|
||||||
is present, it must be given in the same shell item, for
|
is present, it must be given in the same shell item, for ex-
|
||||||
example, -o3 or --only-matching=2. The comments given for the
|
ample, -o3 or --only-matching=2. The comments given for the
|
||||||
non-argument case above also apply to this option. If the
|
non-argument case above also apply to this option. If the
|
||||||
specified capturing parentheses do not exist in the pattern,
|
specified capturing parentheses do not exist in the pattern,
|
||||||
or were not set in the match, nothing is output unless the
|
or were not set in the match, nothing is output unless the
|
||||||
|
@ -704,8 +703,8 @@ OPTIONS
|
||||||
it contains, taking note of any --include and --exclude set-
|
it contains, taking note of any --include and --exclude set-
|
||||||
tings. By default, a directory is read as a normal file; in
|
tings. By default, a directory is read as a normal file; in
|
||||||
some operating systems this gives an immediate end-of-file.
|
some operating systems this gives an immediate end-of-file.
|
||||||
This option is a shorthand for setting the -d option to
|
This option is a shorthand for setting the -d option to "re-
|
||||||
"recurse".
|
curse".
|
||||||
|
|
||||||
--recursion-limit=number
|
--recursion-limit=number
|
||||||
See --match-limit above.
|
See --match-limit above.
|
||||||
|
@ -719,8 +718,8 @@ OPTIONS
|
||||||
This option is useful when scanning more than one file. If
|
This option is useful when scanning more than one file. If
|
||||||
used on its own, -t suppresses all output except for a grand
|
used on its own, -t suppresses all output except for a grand
|
||||||
total number of matching lines (or non-matching lines if -v
|
total number of matching lines (or non-matching lines if -v
|
||||||
is used) in all the files. If -t is used with -c, a grand
|
is used) in all the files. If -t is used with -c, a grand to-
|
||||||
total is output except when the previous output is just one
|
tal is output except when the previous output is just one
|
||||||
line. In other words, it is not output when just one file's
|
line. In other words, it is not output when just one file's
|
||||||
count is listed. If file names are being output, the grand
|
count is listed. If file names are being output, the grand
|
||||||
total is preceded by "TOTAL:". Otherwise, it appears as just
|
total is preceded by "TOTAL:". Otherwise, it appears as just
|
||||||
|
@ -773,10 +772,10 @@ OPTIONS
|
||||||
|
|
||||||
ENVIRONMENT VARIABLES
|
ENVIRONMENT VARIABLES
|
||||||
|
|
||||||
The environment variables LC_ALL and LC_CTYPE are examined, in that
|
The environment variables LC_ALL and LC_CTYPE are examined, in that or-
|
||||||
order, for a locale. The first one that is set is used. This can be
|
der, for a locale. The first one that is set is used. This can be over-
|
||||||
overridden by the --locale option. If no locale is set, the PCRE2
|
ridden by the --locale option. If no locale is set, the PCRE2 library's
|
||||||
library's default (usually the "C" locale) is used.
|
default (usually the "C" locale) is used.
|
||||||
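The lookup order for the locale can be sketched in Python. This is an illustration only, not pcre2grep's code: the function name `resolve_locale` and its parameters are hypothetical, and the sketch assumes that a variable counts as "set" whenever it is present in the environment.

```python
import os


def resolve_locale(cli_locale=None, env=None):
    """Return the locale name to use, or None for the library default."""
    env = os.environ if env is None else env
    # An explicit --locale value overrides the environment.
    if cli_locale is not None:
        return cli_locale
    # LC_ALL is examined before LC_CTYPE; the first one that is set wins.
    for name in ("LC_ALL", "LC_CTYPE"):
        if name in env:
            return env[name]
    # Nothing set: the caller falls back to the default (usually "C").
    return None
```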
|
|
||||||
|
|
||||||
NEWLINES
|
NEWLINES
|
||||||
|
@ -834,13 +833,13 @@ OPTIONS WITH DATA
|
||||||
--file /some/file
|
--file /some/file
|
||||||
|
|
||||||
Note, however, that if you want to supply a file name beginning with ~
|
Note, however, that if you want to supply a file name beginning with ~
|
||||||
as data in a shell command, and have the shell expand ~ to a home
|
as data in a shell command, and have the shell expand ~ to a home di-
|
||||||
directory, you must separate the file name from the option, because the
|
rectory, you must separate the file name from the option, because the
|
||||||
shell does not treat ~ specially unless it is at the start of an item.
|
shell does not treat ~ specially unless it is at the start of an item.
|
||||||
|
|
||||||
The exceptions to the above are the --colour (or --color) and --only-
|
The exceptions to the above are the --colour (or --color) and --only-
|
||||||
matching options, for which the data is optional. If one of these
|
matching options, for which the data is optional. If one of these op-
|
||||||
options does have data, it must be given in the first form, using an
|
tions does have data, it must be given in the first form, using an
|
||||||
equals character. Otherwise pcre2grep will assume that it has no data.
|
equals character. Otherwise pcre2grep will assume that it has no data.
|
||||||
|
|
||||||
|
|
||||||
|
@ -850,8 +849,8 @@ USING PCRE2'S CALLOUT FACILITY
|
||||||
scripts or echoing specific strings during matching by making use of
|
scripts or echoing specific strings during matching by making use of
|
||||||
PCRE2's callout facility. However, this support can be completely or
|
PCRE2's callout facility. However, this support can be completely or
|
||||||
partially disabled when pcre2grep is built. You can find out whether
|
partially disabled when pcre2grep is built. You can find out whether
|
||||||
your binary has support for callouts by running it with the --help
|
your binary has support for callouts by running it with the --help op-
|
||||||
option. If callout support is completely disabled, all callouts in pat-
|
tion. If callout support is completely disabled, all callouts in pat-
|
||||||
terns are ignored by pcre2grep. If the facility is partially disabled,
|
terns are ignored by pcre2grep. If the facility is partially disabled,
|
||||||
calling external programs is not supported, and callouts that request
|
calling external programs is not supported, and callouts that request
|
||||||
it are ignored.
|
it are ignored.
|
||||||
|
@ -875,16 +874,16 @@ USING PCRE2'S CALLOUT FACILITY
|
||||||
|
|
||||||
executable_name|arg1|arg2|...
|
executable_name|arg1|arg2|...
|
||||||
|
|
||||||
Any substring (including the executable name) may contain escape
|
Any substring (including the executable name) may contain escape se-
|
||||||
sequences started by a dollar character: $<digits> or ${<digits>} is
|
quences started by a dollar character: $<digits> or ${<digits>} is re-
|
||||||
replaced by the captured substring of the given decimal number, which
|
placed by the captured substring of the given decimal number, which
|
||||||
must be greater than zero. If the number is greater than the number of
|
must be greater than zero. If the number is greater than the number of
|
||||||
capturing substrings, or if the capture is unset, the replacement is
|
capturing substrings, or if the capture is unset, the replacement is
|
||||||
empty.
|
empty.
|
||||||
|
|
||||||
Any other character is substituted by itself. In particular, $$ is
|
Any other character is substituted by itself. In particular, $$ is re-
|
||||||
replaced by a single dollar and $| is replaced by a pipe character.
|
placed by a single dollar and $| is replaced by a pipe character. Here
|
||||||
Here is an example:
|
is an example:
|
||||||
|
|
||||||
echo -e "abcde\n12345" | pcre2grep \
|
echo -e "abcde\n12345" | pcre2grep \
|
||||||
'(?x)(.)(..(.))
|
'(?x)(.)(..(.))
|
||||||
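The dollar-escape rules for callout strings of the form executable_name|arg1|arg2|... can be sketched in Python. This is a reading of the rules in the text, not pcre2grep's code. The function name `expand_callout` is hypothetical, and several behaviours are assumptions: "any other character" is taken to mean any character following a dollar, a trailing dollar is kept literally, and a capture number of zero or a malformed ${...} raises an error.

```python
DIGITS = "0123456789"


def expand_callout(spec, captures):
    """Split spec on unescaped '|' and expand dollar escapes.

    captures[0] holds capture group 1; None marks an unset group.
    """
    args = [""]
    i = 0
    while i < len(spec):
        ch = spec[i]
        i += 1
        if ch == "|":
            args.append("")            # unescaped pipe starts the next item
            continue
        if ch != "$" or i == len(spec):
            args[-1] += ch             # ordinary character, or trailing '$'
            continue
        braced = spec[i] == "{"
        start = i + 1 if braced else i
        end = start
        while end < len(spec) and spec[end] in DIGITS:
            end += 1
        if end == start:
            if braced:
                raise ValueError("expected digits after ${")
            # "$$" gives "$", "$|" gives "|"; other characters stand for themselves.
            args[-1] += spec[i]
            i += 1
            continue
        if braced:
            if end == len(spec) or spec[end] != "}":
                raise ValueError("missing } after ${<digits>")
            i = end + 1
        else:
            i = end
        number = int(spec[start:end])
        if number == 0:
            raise ValueError("capture number must be greater than zero")
        # Out-of-range or unset captures expand to nothing.
        if number <= len(captures) and captures[number - 1] is not None:
            args[-1] += captures[number - 1]
    return args
```

With captures "ab" and "cd", the string `echo|$1$|${2}|cost $$5` yields three items: `echo`, `ab|cd`, and `cost $5`.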
|
@ -914,10 +913,10 @@ USING PCRE2'S CALLOUT FACILITY
|
||||||
to the output, having been passed through the same escape processing as
|
to the output, having been passed through the same escape processing as
|
||||||
text from the --output option. This provides a simple echoing facility
|
text from the --output option. This provides a simple echoing facility
|
||||||
that avoids calling an external program or script. No terminator is
|
that avoids calling an external program or script. No terminator is
|
||||||
added to the string, so if you want a newline, you must include it
|
added to the string, so if you want a newline, you must include it ex-
|
||||||
explicitly. Matching continues normally after the string is output. If
|
plicitly. Matching continues normally after the string is output. If
|
||||||
you want to see only the callout output but not any output from an
|
you want to see only the callout output but not any output from an ac-
|
||||||
actual match, you should end the relevant pattern with (*FAIL).
|
tual match, you should end the relevant pattern with (*FAIL).
|
||||||
|
|
||||||
|
|
||||||
MATCHING ERRORS
|
MATCHING ERRORS
|
||||||
|
@ -925,8 +924,8 @@ MATCHING ERRORS
|
||||||
It is possible to supply a regular expression that takes a very long
|
It is possible to supply a regular expression that takes a very long
|
||||||
time to fail to match certain lines. Such patterns normally involve
|
time to fail to match certain lines. Such patterns normally involve
|
||||||
nested indefinite repeats, for example: (a+)*\d when matched against a
|
nested indefinite repeats, for example: (a+)*\d when matched against a
|
||||||
line of a's with no final digit. The PCRE2 matching function has a
|
line of a's with no final digit. The PCRE2 matching function has a re-
|
||||||
resource limit that causes it to abort in these circumstances. If this
|
source limit that causes it to abort in these circumstances. If this
|
||||||
happens, pcre2grep outputs an error message and the line that caused
|
happens, pcre2grep outputs an error message and the line that caused
|
||||||
the problem to the standard error stream. If there are more than 20
|
the problem to the standard error stream. If there are more than 20
|
||||||
such errors, pcre2grep gives up.
|
such errors, pcre2grep gives up.
|
||||||
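The effect of a resource limit on a pattern such as (a+)*\d can be illustrated with a toy backtracking matcher in Python. This is a sketch under stated assumptions, not PCRE2's algorithm: `match_nested_repeat` and `MatchLimitExceeded` are hypothetical names, the matcher handles only this one pattern anchored at the start of the subject, and its step counter merely imitates the idea of a counter that is checked against a limit.

```python
class MatchLimitExceeded(Exception):
    """Raised when the step counter passes the configured limit."""


def match_nested_repeat(subject, match_limit):
    r"""Try to match (a+)*\d at the start of subject by backtracking.

    Returns (matched, steps). Raises MatchLimitExceeded once more than
    match_limit steps have been taken.
    """
    steps = 0

    def star(pos):
        nonlocal steps
        steps += 1
        if steps > match_limit:
            raise MatchLimitExceeded(f"match limit {match_limit} exceeded")
        # Find how far a run of 'a' extends from pos.
        end = pos
        while end < len(subject) and subject[end] == "a":
            end += 1
        # One more iteration of (a+): longest run first, then shorter ones.
        for stop in range(end, pos, -1):
            if star(stop):
                return True
        # No further iteration: the next character must be a digit.
        return pos < len(subject) and subject[pos] in "0123456789"

    matched = star(0)
    return matched, steps
```

In this sketch a subject of n a's with no final digit costs 2^n steps before failing, so a modest limit turns a very long failure into a prompt error, which is the situation the paragraph above describes.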
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2PATTERN 3 "21 June 2019" "PCRE2 10.34"
|
.TH PCRE2PATTERN 3 "22 June 2019" "PCRE2 10.34"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
|
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
|
||||||
|
@ -3564,9 +3564,10 @@ effect as this example; although it would suppress backtracking during the
|
||||||
first match attempt, the second attempt would start at the second character
|
first match attempt, the second attempt would start at the second character
|
||||||
instead of skipping on to "c".
|
instead of skipping on to "c".
|
||||||
.P
|
.P
|
||||||
If (*SKIP) is used inside a lookbehind to specify a new starting position that
|
If (*SKIP) is used to specify a new starting position that is the same as the
|
||||||
is not later than the starting point of the current match, the position
|
starting position of the current match, or (by being inside a lookbehind)
|
||||||
specified by (*SKIP) is ignored, and instead the normal "bumpalong" occurs.
|
earlier, the position specified by (*SKIP) is ignored, and instead the normal
|
||||||
|
"bumpalong" occurs.
|
||||||
.sp
|
.sp
|
||||||
(*SKIP:NAME)
|
(*SKIP:NAME)
|
||||||
.sp
|
.sp
|
||||||
|
@ -3787,6 +3788,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 21 June 2019
|
Last updated: 22 June 2019
|
||||||
Copyright (c) 1997-2019 University of Cambridge.
|
Copyright (c) 1997-2019 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -13,8 +13,8 @@ SYNOPSIS
|
||||||
but it can also be used for experimenting with regular expressions.
|
but it can also be used for experimenting with regular expressions.
|
||||||
This document describes the features of the test program; for details
|
This document describes the features of the test program; for details
|
||||||
of the regular expressions themselves, see the pcre2pattern documenta-
|
of the regular expressions themselves, see the pcre2pattern documenta-
|
||||||
tion. For details of the PCRE2 library function calls and their
|
tion. For details of the PCRE2 library function calls and their op-
|
||||||
options, see the pcre2api documentation.
|
tions, see the pcre2api documentation.
|
||||||
|
|
||||||
The input for pcre2test is a sequence of regular expression patterns
|
The input for pcre2test is a sequence of regular expression patterns
|
||||||
and subject strings to be matched. There are also command lines for
|
and subject strings to be matched. There are also command lines for
|
||||||
|
@ -33,26 +33,26 @@ SYNOPSIS
|
||||||
which are specifically designed for use in conjunction with the test
|
which are specifically designed for use in conjunction with the test
|
||||||
script and data files that are distributed as part of PCRE2. All the
|
script and data files that are distributed as part of PCRE2. All the
|
||||||
modifiers are documented here, some without much justification, but
|
modifiers are documented here, some without much justification, but
|
||||||
many of them are unlikely to be of use except when testing the
|
many of them are unlikely to be of use except when testing the li-
|
||||||
libraries.
|
braries.
|
||||||
|
|
||||||
|
|
||||||
PCRE2's 8-BIT, 16-BIT AND 32-BIT LIBRARIES
|
PCRE2's 8-BIT, 16-BIT AND 32-BIT LIBRARIES
|
||||||
|
|
||||||
Different versions of the PCRE2 library can be built to support charac-
|
Different versions of the PCRE2 library can be built to support charac-
|
||||||
ter strings that are encoded in 8-bit, 16-bit, or 32-bit code units.
|
ter strings that are encoded in 8-bit, 16-bit, or 32-bit code units.
|
||||||
One, two, or all three of these libraries may be simultaneously
|
One, two, or all three of these libraries may be simultaneously in-
|
||||||
installed. The pcre2test program can be used to test all the libraries.
|
stalled. The pcre2test program can be used to test all the libraries.
|
||||||
However, its own input and output are always in 8-bit format. When
|
However, its own input and output are always in 8-bit format. When
|
||||||
testing the 16-bit or 32-bit libraries, patterns and subject strings
|
testing the 16-bit or 32-bit libraries, patterns and subject strings
|
||||||
are converted to 16-bit or 32-bit format before being passed to the
|
are converted to 16-bit or 32-bit format before being passed to the li-
|
||||||
library functions. Results are converted back to 8-bit code units for
|
brary functions. Results are converted back to 8-bit code units for
|
||||||
output.
|
output.
|
||||||
|
|
||||||
In the rest of this document, the names of library functions and struc-
|
In the rest of this document, the names of library functions and struc-
|
||||||
tures are given in generic form, for example, pcre2_compile(). The
|
tures are given in generic form, for example, pcre2_compile(). The ac-
|
||||||
actual names used in the libraries have a suffix _8, _16, or _32, as
|
tual names used in the libraries have a suffix _8, _16, or _32, as ap-
|
||||||
appropriate.
|
propriate.
|
||||||
|
|
||||||
|
|
||||||
INPUT ENCODING
|
INPUT ENCODING
|
||||||
|
@ -70,18 +70,18 @@ INPUT ENCODING
|
||||||
processed for backslash escapes, which makes it possible to include any
|
processed for backslash escapes, which makes it possible to include any
|
||||||
data value in strings that are passed to the library for matching. For
|
data value in strings that are passed to the library for matching. For
|
||||||
patterns, there is a facility for specifying some or all of the 8-bit
|
patterns, there is a facility for specifying some or all of the 8-bit
|
||||||
input characters as hexadecimal pairs, which makes it possible to
|
input characters as hexadecimal pairs, which makes it possible to in-
|
||||||
include binary zeros.
|
clude binary zeros.
|
||||||
|
|
||||||
Input for the 16-bit and 32-bit libraries
|
Input for the 16-bit and 32-bit libraries
|
||||||
|
|
||||||
When testing the 16-bit or 32-bit libraries, there is a need to be able
|
When testing the 16-bit or 32-bit libraries, there is a need to be able
|
||||||
to generate character code points greater than 255 in the strings that
|
to generate character code points greater than 255 in the strings that
|
||||||
are passed to the library. For subject lines, backslash escapes can be
|
are passed to the library. For subject lines, backslash escapes can be
|
||||||
used. In addition, when the utf modifier (see "Setting compilation
|
used. In addition, when the utf modifier (see "Setting compilation op-
|
||||||
options" below) is set, the pattern and any following subject lines are
|
tions" below) is set, the pattern and any following subject lines are
|
||||||
interpreted as UTF-8 strings and translated to UTF-16 or UTF-32 as
|
interpreted as UTF-8 strings and translated to UTF-16 or UTF-32 as ap-
|
||||||
appropriate.
|
propriate.
|
||||||
|
|
||||||
For non-UTF testing of wide characters, the utf8_input modifier can be
|
For non-UTF testing of wide characters, the utf8_input modifier can be
|
||||||
used. This is mutually exclusive with utf, and is allowed only in
|
used. This is mutually exclusive with utf, and is allowed only in
|
||||||
|
@ -121,8 +121,8 @@ COMMAND LINE OPTIONS
|
||||||
piled.
|
piled.
|
||||||
|
|
||||||
-AC As for -ac, but in addition behave as if each subject line
|
-AC As for -ac, but in addition behave as if each subject line
|
||||||
has the callout_extra modifier, that is, show additional
|
has the callout_extra modifier, that is, show additional in-
|
||||||
information from callouts.
|
formation from callouts.
|
||||||
|
|
||||||
-b Behave as if each pattern has the fullbincode modifier; the
|
-b Behave as if each pattern has the fullbincode modifier; the
|
||||||
full internal binary form of the pattern is output after com-
|
full internal binary form of the pattern is output after com-
|
||||||
|
@ -130,9 +130,9 @@ COMMAND LINE OPTIONS
|
||||||
|
|
||||||
-C Output the version number of the PCRE2 library, and all
|
-C Output the version number of the PCRE2 library, and all
|
||||||
available information about the optional features that are
|
available information about the optional features that are
|
||||||
included, and then exit with zero exit code. All other
|
included, and then exit with zero exit code. All other op-
|
||||||
options are ignored. If both -C and -LM are present, which-
|
tions are ignored. If both -C and -LM are present, whichever
|
||||||
ever is first is recognized.
|
is first is recognized.
|
||||||
|
|
||||||
-C option Output information about a specific build-time option, then
|
-C option Output information about a specific build-time option, then
|
||||||
exit. This functionality is intended for use in scripts such
|
exit. This functionality is intended for use in scripts such
|
||||||
|
@ -269,8 +269,8 @@ DESCRIPTION
|
||||||
supply them explicitly.
|
supply them explicitly.
|
||||||
|
|
||||||
An empty line or the end of the file signals the end of the subject
|
An empty line or the end of the file signals the end of the subject
|
||||||
lines for a test, at which point a new pattern or command line is
|
lines for a test, at which point a new pattern or command line is ex-
|
||||||
expected if there is still input to be read.
|
pected if there is still input to be read.
|
||||||
|
|
||||||
|
|
||||||
COMMAND LINES
|
COMMAND LINES
|
||||||
|
@ -311,8 +311,8 @@ COMMAND LINES
|
||||||
as indicating a newline in a pattern or subject string. The default can
|
as indicating a newline in a pattern or subject string. The default can
|
||||||
be overridden when a pattern is compiled. The standard test files con-
|
be overridden when a pattern is compiled. The standard test files con-
|
||||||
tain tests of various newline conventions, but the majority of the
|
tain tests of various newline conventions, but the majority of the
|
||||||
tests expect a single linefeed to be recognized as a newline by
|
tests expect a single linefeed to be recognized as a newline by de-
|
||||||
default. Without special action the tests would fail when PCRE2 is com-
|
fault. Without special action the tests would fail when PCRE2 is com-
|
||||||
piled with either CR or CRLF as the default newline.
|
piled with either CR or CRLF as the default newline.
|
||||||
|
|
||||||
The #newline_default command specifies a list of newline types that are
|
The #newline_default command specifies a list of newline types that are
|
||||||
|
@ -323,14 +323,14 @@ COMMAND LINES
|
||||||
|
|
||||||
If the default newline is in the list, this command has no effect. Oth-
|
If the default newline is in the list, this command has no effect. Oth-
|
||||||
erwise, except when testing the POSIX API, a newline modifier that
|
erwise, except when testing the POSIX API, a newline modifier that
|
||||||
specifies the first newline convention in the list (LF in the above
|
specifies the first newline convention in the list (LF in the above ex-
|
||||||
example) is added to any pattern that does not already have a newline
|
ample) is added to any pattern that does not already have a newline
|
||||||
modifier. If the newline list is empty, the feature is turned off. This
|
modifier. If the newline list is empty, the feature is turned off. This
|
||||||
command is present in a number of the standard test input files.
|
command is present in a number of the standard test input files.
|
||||||
|
|
||||||
When the POSIX API is being tested there is no way to override the
|
When the POSIX API is being tested there is no way to override the de-
|
||||||
default newline convention, though it is possible to set the newline
|
fault newline convention, though it is possible to set the newline con-
|
||||||
convention from within the pattern. A warning is given if the posix or
|
vention from within the pattern. A warning is given if the posix or
|
||||||
posix_nosub modifier is used when #newline_default would set a default
|
posix_nosub modifier is used when #newline_default would set a default
|
||||||
for the non-POSIX API.
|
for the non-POSIX API.
|
||||||
|
|
||||||
|
@ -344,8 +344,8 @@ COMMAND LINES
|
||||||
The appearance of this line causes all subsequent modifier settings to
|
The appearance of this line causes all subsequent modifier settings to
|
||||||
be checked for compatibility with the perltest.sh script, which is used
|
be checked for compatibility with the perltest.sh script, which is used
|
||||||
to confirm that Perl gives the same results as PCRE2. Also, apart from
|
to confirm that Perl gives the same results as PCRE2. Also, apart from
|
||||||
comment lines, #pattern commands, and #subject commands that set or
|
comment lines, #pattern commands, and #subject commands that set or un-
|
||||||
unset "mark", no command lines are permitted, because they and many of
|
set "mark", no command lines are permitted, because they and many of
|
||||||
the modifiers are specific to pcre2test, and should not be used in test
|
the modifiers are specific to pcre2test, and should not be used in test
|
||||||
files that are also processed by perltest.sh. The #perltest command
|
files that are also processed by perltest.sh. The #perltest command
|
||||||
helps detect tests that are accidentally put in the wrong file.
|
helps detect tests that are accidentally put in the wrong file.
|
||||||
|
@ -376,8 +376,8 @@ MODIFIER SYNTAX
|
||||||
list are separated by commas followed by optional white space. Trailing
|
list are separated by commas followed by optional white space. Trailing
|
||||||
whitespace in a modifier list is ignored. Some modifiers may be given
|
whitespace in a modifier list is ignored. Some modifiers may be given
|
||||||
for both patterns and subject lines, whereas others are valid only for
|
for both patterns and subject lines, whereas others are valid only for
|
||||||
one or the other. Each modifier has a long name, for example
|
one or the other. Each modifier has a long name, for example "an-
|
||||||
"anchored", and some of them must be followed by an equals sign and a
|
chored", and some of them must be followed by an equals sign and a
|
||||||
value, for example, "offset=12". Values cannot contain comma charac-
|
value, for example, "offset=12". Values cannot contain comma charac-
|
||||||
ters, but may contain spaces. Modifiers that do not take values may be
|
ters, but may contain spaces. Modifiers that do not take values may be
|
||||||
preceded by a minus sign to turn off a previous setting.
|
preceded by a minus sign to turn off a previous setting.
|
||||||
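The modifier list rules can be sketched as a small Python parser. This is not pcre2test's parser: the function name `parse_modifiers` and the (name, value) result shape are hypothetical, single-letter abbreviations are not handled, and no check is made that a given modifier actually accepts a value or a minus sign.

```python
def parse_modifiers(text):
    """Parse a modifier list into (name, value) pairs.

    A bare name yields True, "-name" yields False, and "name=value"
    yields the value as a string.
    """
    result = []
    # Trailing white space in the list is ignored.
    text = text.rstrip()
    if not text:
        return result
    for item in text.split(","):
        # White space after a separating comma is optional.
        item = item.lstrip()
        if "=" in item:
            # Values cannot contain commas but may contain spaces.
            name, value = item.split("=", 1)
            result.append((name, value))
        elif item.startswith("-"):
            # A minus sign turns off a previous setting.
            result.append((item[1:], False))
        else:
            result.append((item, True))
    return result
```

Because values cannot contain commas, splitting on every comma is enough here; spaces inside a value such as `replace=a b` are preserved.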
|
@ -498,8 +498,8 @@ SUBJECT LINE SYNTAX
|
||||||
\= This is a comment.
|
\= This is a comment.
|
||||||
abc\= This is an invalid modifier list.
|
abc\= This is an invalid modifier list.
|
||||||
|
|
||||||
A backslash followed by any other non-alphanumeric character just
|
A backslash followed by any other non-alphanumeric character just es-
|
||||||
escapes that character. A backslash followed by anything else causes an
|
capes that character. A backslash followed by anything else causes an
|
||||||
error. However, if the very last character in the line is a backslash
|
error. However, if the very last character in the line is a backslash
|
||||||
(and there is no modifier list), it is ignored. This gives a way of
|
(and there is no modifier list), it is ignored. This gives a way of
|
||||||
passing an empty line as data, since a real empty line terminates the
|
passing an empty line as data, since a real empty line terminates the
|
||||||
|
@ -523,13 +523,13 @@ PATTERN MODIFIERS
|
||||||
The following modifiers set options for pcre2_compile(). Most of them
|
The following modifiers set options for pcre2_compile(). Most of them
|
||||||
set bits in the options argument of that function, but those whose
|
set bits in the options argument of that function, but those whose
|
||||||
names start with PCRE2_EXTRA are additional options that are set in the
|
names start with PCRE2_EXTRA are additional options that are set in the
|
||||||
compile context. For the main options, there are some single-letter
|
compile context. For the main options, there are some single-letter ab-
|
||||||
abbreviations that are the same as Perl options. There is special han-
|
breviations that are the same as Perl options. There is special han-
|
||||||
dling for /x: if a second x is present, PCRE2_EXTENDED is converted
|
dling for /x: if a second x is present, PCRE2_EXTENDED is converted
|
||||||
into PCRE2_EXTENDED_MORE as in Perl. A third appearance adds
|
into PCRE2_EXTENDED_MORE as in Perl. A third appearance adds PCRE2_EX-
|
||||||
PCRE2_EXTENDED as well, though this makes no difference to the way
|
TENDED as well, though this makes no difference to the way pcre2_com-
|
||||||
pcre2_compile() behaves. See pcre2api for a description of the effects
|
pile() behaves. See pcre2api for a description of the effects of these
|
||||||
of these options.
|
options.
|
||||||
|
|
||||||
allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS
|
allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS
|
||||||
allow_surrogate_escapes set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
|
allow_surrogate_escapes set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
|
||||||
|
@ -577,9 +577,9 @@ PATTERN MODIFIERS
|
||||||
|
|
||||||
Setting compilation controls
|
Setting compilation controls
|
||||||
|
|
||||||
The following modifiers affect the compilation process or request
|
The following modifiers affect the compilation process or request in-
|
||||||
information about the pattern. There are single-letter abbreviations
|
formation about the pattern. There are single-letter abbreviations for
|
||||||
for some that are heavily used in the test files.
|
some that are heavily used in the test files.
|
||||||
|
|
||||||
bsr=[anycrlf|unicode] specify \R handling
|
bsr=[anycrlf|unicode] specify \R handling
|
||||||
/B bincode show binary code without lengths
|
/B bincode show binary code without lengths
|
||||||
|
@ -717,8 +717,8 @@ PATTERN MODIFIERS
|
||||||
minated strings but can be passed by length instead of being zero-ter-
|
minated strings but can be passed by length instead of being zero-ter-
|
||||||
minated. The use_length modifier causes this to happen. Using a length
|
minated. The use_length modifier causes this to happen. Using a length
|
||||||
happens automatically (whether or not use_length is set) when hex is
|
happens automatically (whether or not use_length is set) when hex is
|
||||||
set, because patterns specified in hexadecimal may contain binary
|
set, because patterns specified in hexadecimal may contain binary ze-
|
||||||
zeros.
|
ros.
|
||||||
|
|
||||||
If hex or use_length is used with the POSIX wrapper API (see "Using the
|
If hex or use_length is used with the POSIX wrapper API (see "Using the
|
||||||
POSIX wrapper API" below), the REG_PEND extension is used to pass the
|
POSIX wrapper API" below), the REG_PEND extension is used to pass the
|
||||||
|
@ -770,8 +770,8 @@ PATTERN MODIFIERS
|
||||||
partial modifier in "Subject Modifiers" below for details of how these
|
partial modifier in "Subject Modifiers" below for details of how these
|
||||||
options are specified for each match attempt.
|
options are specified for each match attempt.
|
||||||
|
|
||||||
JIT compilation is requested by the jit pattern modifier, which may
|
JIT compilation is requested by the jit pattern modifier, which may op-
|
||||||
optionally be followed by an equals sign and a number in the range 0 to
|
tionally be followed by an equals sign and a number in the range 0 to
|
||||||
7. The three bits that make up the number specify which of the three
|
7. The three bits that make up the number specify which of the three
|
||||||
JIT operating modes are to be compiled:
|
JIT operating modes are to be compiled:
|
||||||
|
|
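The list of JIT mode bits falls between the hunks shown here. As a minimal sketch, assuming the values given in the pcre2test man page (1 for non-partial matching, 2 for soft partial matching, 4 for hard partial matching, and 7 assumed when no number follows jit), the number can be decoded as below. The names JIT_MODES and decode_jit_modifier are hypothetical and are not part of pcre2test.

```python
# Bit values for the jit=<number> modifier. These come from the pcre2test
# man page, not from this diff, so treat them as an assumption here.
JIT_MODES = {
    1: "non-partial matching",
    2: "soft partial matching",
    4: "hard partial matching",
}


def decode_jit_modifier(value=7):
    """Return the JIT modes selected by jit=<value>.

    A bare "jit" with no number is treated as 7 (all three modes).
    """
    if not 0 <= value <= 7:
        raise ValueError("jit value must be in the range 0 to 7")
    # Each of the three low bits independently selects one mode.
    return [name for bit, name in JIT_MODES.items() if value & bit]


if __name__ == "__main__":
    for n in range(8):
        print(n, decode_jit_modifier(n))
```

A value of 0 selects no modes, which corresponds to not using JIT at all.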
||||||
|
@ -799,8 +799,8 @@ PATTERN MODIFIERS
|
||||||
none was compiled for non-partial matching.
|
none was compiled for non-partial matching.
|
||||||
|
|
||||||
If JIT compilation is successful, the compiled JIT code will automati-
|
If JIT compilation is successful, the compiled JIT code will automati-
|
||||||
cally be used when an appropriate type of match is run, except when
|
cally be used when an appropriate type of match is run, except when in-
|
||||||
incompatible run-time options are specified. For more details, see the
|
compatible run-time options are specified. For more details, see the
|
||||||
pcre2jit documentation. See also the jitstack modifier below for a way
|
pcre2jit documentation. See also the jitstack modifier below for a way
|
||||||
of setting the size of the JIT stack.
|
of setting the size of the JIT stack.
|
||||||
|
|
||||||
|
@ -847,8 +847,8 @@ PATTERN MODIFIERS
|
||||||
Limiting nested parentheses
|
Limiting nested parentheses
|
||||||
|
|
||||||
The parens_nest_limit modifier sets a limit on the depth of nested
|
The parens_nest_limit modifier sets a limit on the depth of nested
|
||||||
parentheses in a pattern. Breaching the limit causes a compilation
|
parentheses in a pattern. Breaching the limit causes a compilation er-
|
||||||
error. The default for the library is set when PCRE2 is built, but
|
ror. The default for the library is set when PCRE2 is built, but
|
||||||
pcre2test sets its own default of 220, which is required for running
|
pcre2test sets its own default of 220, which is required for running
|
||||||
the standard test suite.
|
the standard test suite.
|
||||||
|
|
||||||
|
@ -886,13 +886,13 @@ PATTERN MODIFIERS
|
||||||
buffer is too small for the error message. If this modifier has not
|
buffer is too small for the error message. If this modifier has not
|
||||||
been set, a large buffer is used.
|
been set, a large buffer is used.
|
||||||
|
|
||||||
The aftertext and allaftertext subject modifiers work as described
|
The aftertext and allaftertext subject modifiers work as described be-
|
||||||
below. All other modifiers are either ignored, with a warning message,
|
low. All other modifiers are either ignored, with a warning message, or
|
||||||
or cause an error.
|
cause an error.
|
||||||
|
|
||||||
The pattern is passed to regcomp() as a zero-terminated string by
|
The pattern is passed to regcomp() as a zero-terminated string by de-
|
||||||
default, but if the use_length or hex modifiers are set, the REG_PEND
|
fault, but if the use_length or hex modifiers are set, the REG_PEND ex-
|
||||||
extension is used to pass it by length.
|
tension is used to pass it by length.
|
||||||
|
|
||||||
Testing the stack guard feature
|
Testing the stack guard feature
|
||||||
|
|
||||||
|
@ -920,8 +920,8 @@ PATTERN MODIFIERS
|
||||||
2 a set of tables defining ISO 8859 characters
|
2 a set of tables defining ISO 8859 characters
|
||||||
|
|
||||||
In table 2, some characters whose codes are greater than 128 are iden-
|
In table 2, some characters whose codes are greater than 128 are iden-
|
||||||
tified as letters, digits, spaces, etc. Setting alternate character
|
tified as letters, digits, spaces, etc. Setting alternate character ta-
|
||||||
tables and a locale are mutually exclusive.
|
bles and a locale are mutually exclusive.
|
||||||
|
|
||||||
Setting certain match controls
|
Setting certain match controls
|
||||||
|
|
||||||
|
@ -971,12 +971,12 @@ PATTERN MODIFIERS
|
||||||
terns" below. If pushcopy is used instead of push, a copy of the com-
|
terns" below. If pushcopy is used instead of push, a copy of the com-
|
||||||
piled pattern is stacked, leaving the original as current, ready to
|
piled pattern is stacked, leaving the original as current, ready to
|
||||||
match the following input lines. This provides a way of testing the
|
match the following input lines. This provides a way of testing the
|
||||||
pcre2_code_copy() function. The push and pushcopy modifiers are
|
pcre2_code_copy() function. The push and pushcopy modifiers are in-
|
||||||
incompatible with compilation modifiers such as global that act at
|
compatible with compilation modifiers such as global that act at match
|
||||||
match time. Any that are specified are ignored (for the stacked copy),
|
time. Any that are specified are ignored (for the stacked copy), with a
|
||||||
with a warning message, except for replace, which causes an error. Note
|
warning message, except for replace, which causes an error. Note that
|
||||||
that jitverify, which is allowed, does not carry through to any subse-
|
jitverify, which is allowed, does not carry through to any subsequent
|
||||||
quent matching that uses a stacked pattern.
|
matching that uses a stacked pattern.
|
||||||
|
|
||||||
Testing foreign pattern conversion
|
Testing foreign pattern conversion
|
||||||
|
|
||||||
|
@ -1124,12 +1124,12 @@ SUBJECT MODIFIERS
|
||||||
The allusedtext modifier requests that all the text that was consulted
|
The allusedtext modifier requests that all the text that was consulted
|
||||||
during a successful pattern match by the interpreter should be shown.
|
during a successful pattern match by the interpreter should be shown.
|
||||||
This feature is not supported for JIT matching, and if requested with
|
This feature is not supported for JIT matching, and if requested with
|
||||||
JIT it is ignored (with a warning message). Setting this modifier
|
JIT it is ignored (with a warning message). Setting this modifier af-
|
||||||
affects the output if there is a lookbehind at the start of a match, or
|
fects the output if there is a lookbehind at the start of a match, or a
|
||||||
a lookahead at the end, or if \K is used in the pattern. Characters
|
lookahead at the end, or if \K is used in the pattern. Characters that
|
||||||
that precede or follow the start and end of the actual match are indi-
|
precede or follow the start and end of the actual match are indicated
|
||||||
cated in the output by '<' or '>' characters underneath them. Here is
|
in the output by '<' or '>' characters underneath them. Here is an ex-
|
||||||
an example:
|
ample:
|
||||||
|
|
||||||
re> /(?<=pqr)abc(?=xyz)/
|
re> /(?<=pqr)abc(?=xyz)/
|
||||||
data> 123pqrabcxyz456\=allusedtext
|
data> 123pqrabcxyz456\=allusedtext
|
||||||
|
@ -1145,8 +1145,8 @@ SUBJECT MODIFIERS
|
||||||
string. The only time when this occurs is when \K has been processed as
|
string. The only time when this occurs is when \K has been processed as
|
||||||
part of the match. In this situation, the output for the matched string
|
part of the match. In this situation, the output for the matched string
|
||||||
is displayed from the starting character instead of from the match
|
is displayed from the starting character instead of from the match
|
||||||
point, with circumflex characters under the earlier characters. For
|
point, with circumflex characters under the earlier characters. For ex-
|
||||||
example:
|
ample:
|
||||||
|
|
||||||
re> /abc\Kxyz/
|
re> /abc\Kxyz/
|
||||||
data> abcxyz\=startchar
|
data> abcxyz\=startchar
|
||||||
|
@ -1171,12 +1171,12 @@ SUBJECT MODIFIERS
|
||||||
The allvector modifier requests that the entire ovector be shown, what-
|
The allvector modifier requests that the entire ovector be shown, what-
|
||||||
ever the outcome of the match. Compare allcaptures, which shows only up
|
ever the outcome of the match. Compare allcaptures, which shows only up
|
||||||
to the maximum number of capture groups for the pattern, and then only
|
to the maximum number of capture groups for the pattern, and then only
|
||||||
for a successful complete non-DFA match. This modifier, which acts
|
for a successful complete non-DFA match. This modifier, which acts af-
|
||||||
after any match result, and also for DFA matching, provides a means of
|
ter any match result, and also for DFA matching, provides a means of
|
||||||
checking that there are no unexpected modifications to ovector fields.
|
checking that there are no unexpected modifications to ovector fields.
|
||||||
Before each match attempt, the ovector is filled with a special value,
|
Before each match attempt, the ovector is filled with a special value,
|
||||||
and if this is found in both elements of a capturing pair,
|
and if this is found in both elements of a capturing pair, "<un-
|
||||||
"<unchanged>" is output. After a successful match, this applies to all
|
changed>" is output. After a successful match, this applies to all
|
||||||
groups after the maximum capture group for the pattern. In other cases
|
groups after the maximum capture group for the pattern. In other cases
|
||||||
it applies to the entire ovector. After a partial match, the first two
|
it applies to the entire ovector. After a partial match, the first two
|
||||||
elements are the only ones that should be set. After a DFA match, the
|
elements are the only ones that should be set. After a DFA match, the
|
||||||
|
@ -1207,12 +1207,12 @@ SUBJECT MODIFIERS
|
||||||
If an empty string is matched, the next match is done with the
|
If an empty string is matched, the next match is done with the
|
||||||
PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search
|
PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search
|
||||||
for another, non-empty, match at the same point in the subject. If this
|
for another, non-empty, match at the same point in the subject. If this
|
||||||
match fails, the start offset is advanced, and the normal match is
|
match fails, the start offset is advanced, and the normal match is re-
|
||||||
retried. This imitates the way Perl handles such cases when using the
|
tried. This imitates the way Perl handles such cases when using the /g
|
||||||
/g modifier or the split() function. Normally, the start offset is
|
modifier or the split() function. Normally, the start offset is ad-
|
||||||
advanced by one character, but if the newline convention recognizes
|
vanced by one character, but if the newline convention recognizes CRLF
|
||||||
CRLF as a newline, and the current character is CR followed by LF, an
|
as a newline, and the current character is CR followed by LF, an ad-
|
||||||
advance of two characters occurs.
|
vance of two characters occurs.
|
||||||
|
|
||||||
Testing substring extraction functions
|
Testing substring extraction functions
|
||||||
|
|
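The advance rule in the hunk above can be sketched as follows. This is an illustration only, not pcre2test's code: it works on Python string characters, whereas the real program works in code units and has to step over whole UTF characters. The function name and the crlf_is_newline flag are hypothetical.

```python
def next_start_offset(subject, offset, crlf_is_newline):
    """Offset at which the normal match is retried after the anchored,
    non-empty attempt at `offset` has failed.

    crlf_is_newline should be True when the newline convention in force
    recognizes CRLF as a newline.
    """
    if crlf_is_newline and subject[offset:offset + 2] == "\r\n":
        return offset + 2  # step over CR LF as a single newline
    return offset + 1      # otherwise advance by one character


if __name__ == "__main__":
    print(next_start_offset("a\r\nb", 1, True))
    print(next_start_offset("a\r\nb", 1, False))
```

Skipping both characters of a CRLF pair keeps the retried match from starting between the CR and the LF.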
||||||
|
@ -1275,8 +1275,8 @@ SUBJECT MODIFIERS
|
||||||
than 256 characters) for substitution tests, as fixed-size buffers are
|
than 256 characters) for substitution tests, as fixed-size buffers are
|
||||||
used. To make it easy to test for buffer overflow, if the replacement
|
used. To make it easy to test for buffer overflow, if the replacement
|
||||||
string starts with a number in square brackets, that number is passed
|
string starts with a number in square brackets, that number is passed
|
||||||
to pcre2_substitute() as the size of the output buffer, with the
|
to pcre2_substitute() as the size of the output buffer, with the re-
|
||||||
replacement string starting at the next character. Here is an example
|
placement string starting at the next character. Here is an example
|
||||||
that tests the edge case:
|
that tests the edge case:
|
||||||
|
|
||||||
/abc/
|
/abc/
|
||||||
|
@ -1285,10 +1285,10 @@ SUBJECT MODIFIERS
|
||||||
123abc123\=replace=[9]XYZ
|
123abc123\=replace=[9]XYZ
|
||||||
Failed: error -47: no more memory
|
Failed: error -47: no more memory
|
||||||
|
|
||||||
The default action of pcre2_substitute() is to return
|
The default action of pcre2_substitute() is to return PCRE2_ER-
|
||||||
PCRE2_ERROR_NOMEMORY when the output buffer is too small. However, if
|
ROR_NOMEMORY when the output buffer is too small. However, if the
|
||||||
the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the sub-
|
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the substi-
|
||||||
stitute_overflow_length modifier), pcre2_substitute() continues to go
|
tute_overflow_length modifier), pcre2_substitute() continues to go
|
||||||
through the motions of matching and substituting (but not doing any
|
through the motions of matching and substituting (but not doing any
|
||||||
callouts), in order to compute the size of buffer that is required.
|
callouts), in order to compute the size of buffer that is required.
|
||||||
When this happens, pcre2test shows the required buffer length (which
|
When this happens, pcre2test shows the required buffer length (which
|
||||||
|
@ -1323,8 +1323,8 @@ SUBJECT MODIFIERS
|
||||||
Then are listed the offsets of the old substring, its contents, and the
|
Then are listed the offsets of the old substring, its contents, and the
|
||||||
same for the replacement.
|
same for the replacement.
|
||||||
|
|
||||||
By default, the substitution callout function returns zero, which
|
By default, the substitution callout function returns zero, which ac-
|
||||||
accepts the replacement and causes matching to continue if /g was used.
|
cepts the replacement and causes matching to continue if /g was used.
|
||||||
Two further modifiers can be used to test other return values. If sub-
|
Two further modifiers can be used to test other return values. If sub-
|
||||||
stitute_skip is set to a value greater than zero the callout function
|
stitute_skip is set to a value greater than zero the callout function
|
||||||
returns +1 for the match of that number, and similarly substitute_stop
|
returns +1 for the match of that number, and similarly substitute_stop
|
||||||
|
@ -1411,8 +1411,8 @@ SUBJECT MODIFIERS
|
||||||
|
|
||||||
The memory modifier causes pcre2test to log the sizes of all heap mem-
|
The memory modifier causes pcre2test to log the sizes of all heap mem-
|
||||||
ory allocation and freeing calls that occur during a call to
|
ory allocation and freeing calls that occur during a call to
|
||||||
pcre2_match() or pcre2_dfa_match(). These occur only when a match
|
pcre2_match() or pcre2_dfa_match(). These occur only when a match re-
|
||||||
requires a bigger vector than the default for remembering backtracking
|
quires a bigger vector than the default for remembering backtracking
|
||||||
points (pcre2_match()) or for internal workspace (pcre2_dfa_match()).
|
points (pcre2_match()) or for internal workspace (pcre2_dfa_match()).
|
||||||
In many cases there will be no heap memory used and therefore no addi-
|
In many cases there will be no heap memory used and therefore no addi-
|
||||||
tional output. No heap memory is allocated during matching with JIT, so
|
tional output. No heap memory is allocated during matching with JIT, so
|
||||||
|
@ -1435,9 +1435,9 @@ SUBJECT MODIFIERS
|
||||||
|
|
||||||
Setting the size of the output vector
|
Setting the size of the output vector
|
||||||
|
|
||||||
The ovector modifier applies only to the subject line in which it
|
The ovector modifier applies only to the subject line in which it ap-
|
||||||
appears, though of course it can also be used to set a default in a
|
pears, though of course it can also be used to set a default in a #sub-
|
||||||
#subject command. It specifies the number of pairs of offsets that are
|
ject command. It specifies the number of pairs of offsets that are
|
||||||
available for storing matching information. The default is 15.
|
available for storing matching information. The default is 15.
|
||||||
|
|
||||||
A value of zero is useful when testing the POSIX API because it causes
|
A value of zero is useful when testing the POSIX API because it causes
|
||||||
|
@ -1491,12 +1491,12 @@ DEFAULT OUTPUT FROM pcre2test
|
||||||
|
|
||||||
When a match succeeds, pcre2test outputs the list of captured sub-
|
When a match succeeds, pcre2test outputs the list of captured sub-
|
||||||
strings, starting with number 0 for the string that matched the whole
|
strings, starting with number 0 for the string that matched the whole
|
||||||
pattern. Otherwise, it outputs "No match" when the return is
|
pattern. Otherwise, it outputs "No match" when the return is PCRE2_ER-
|
||||||
PCRE2_ERROR_NOMATCH, or "Partial match:" followed by the partially
|
ROR_NOMATCH, or "Partial match:" followed by the partially matching
|
||||||
matching substring when the return is PCRE2_ERROR_PARTIAL. (Note that
|
substring when the return is PCRE2_ERROR_PARTIAL. (Note that this is
|
||||||
this is the entire substring that was inspected during the partial
|
the entire substring that was inspected during the partial match; it
|
||||||
match; it may include characters before the actual match start if a
|
may include characters before the actual match start if a lookbehind
|
||||||
lookbehind assertion, \K, \b, or \B was involved.)
|
assertion, \K, \b, or \B was involved.)
|
||||||
|
|
||||||
For any other return, pcre2test outputs the PCRE2 negative error number
|
For any other return, pcre2test outputs the PCRE2 negative error number
|
||||||
and a short descriptive phrase. If the error is a failed UTF string
|
and a short descriptive phrase. If the error is a failed UTF string
|
||||||
|
@ -1541,8 +1541,8 @@ DEFAULT OUTPUT FROM pcre2test
|
||||||
0: cat
|
0: cat
|
||||||
0+ aract
|
0+ aract
|
||||||
|
|
||||||
If global matching is requested, the results of successive matching
|
If global matching is requested, the results of successive matching at-
|
||||||
attempts are output in sequence, like this:
|
tempts are output in sequence, like this:
|
||||||
|
|
||||||
re> /\Bi(\w\w)/g
|
re> /\Bi(\w\w)/g
|
||||||
data> Mississippi
|
data> Mississippi
|
||||||
|
@ -1580,12 +1580,12 @@ OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
|
||||||
2: tan
|
2: tan
|
||||||
|
|
||||||
Using the normal matching function on this data finds only "tang". The
|
Using the normal matching function on this data finds only "tang". The
|
||||||
longest matching string is always given first (and numbered zero).
|
longest matching string is always given first (and numbered zero). Af-
|
||||||
After a PCRE2_ERROR_PARTIAL return, the output is "Partial match:",
|
ter a PCRE2_ERROR_PARTIAL return, the output is "Partial match:", fol-
|
||||||
followed by the partially matching substring. Note that this is the
|
lowed by the partially matching substring. Note that this is the entire
|
||||||
entire substring that was inspected during the partial match; it may
|
substring that was inspected during the partial match; it may include
|
||||||
include characters before the actual match start if a lookbehind asser-
|
characters before the actual match start if a lookbehind assertion, \b,
|
||||||
tion, \b, or \B was involved. (\K is not supported for DFA matching.)
|
or \B was involved. (\K is not supported for DFA matching.)
|
||||||
|
|
||||||
If global matching is requested, the search for further matches resumes
|
If global matching is requested, the search for further matches resumes
|
||||||
at the end of the longest match. For example:
|
at the end of the longest match. For example:
|
||||||
|
@ -1638,12 +1638,12 @@ CALLOUTS
|
||||||
--->pqrabcdef
|
--->pqrabcdef
|
||||||
0 ^ ^ \d
|
0 ^ ^ \d
|
||||||
|
|
||||||
This output indicates that callout number 0 occurred for a match
|
This output indicates that callout number 0 occurred for a match at-
|
||||||
attempt starting at the fourth character of the subject string, when
|
tempt starting at the fourth character of the subject string, when the
|
||||||
the pointer was at the seventh character, and when the next pattern
|
pointer was at the seventh character, and when the next pattern item
|
||||||
item was \d. Just one circumflex is output if the start and current
|
was \d. Just one circumflex is output if the start and current posi-
|
||||||
positions are the same, or if the current position precedes the start
|
tions are the same, or if the current position precedes the start posi-
|
||||||
position, which can happen if the callout is in a lookbehind assertion.
|
tion, which can happen if the callout is in a lookbehind assertion.
|
||||||
|
|
||||||
Callouts numbered 255 are assumed to be automatic callouts, inserted as
|
Callouts numbered 255 are assumed to be automatic callouts, inserted as
|
||||||
a result of the auto_callout pattern modifier. In this case, instead of
|
a result of the auto_callout pattern modifier. In this case, instead of
|
||||||
|
@ -1660,8 +1660,8 @@ CALLOUTS
|
||||||
0: E*
|
0: E*
|
||||||
|
|
||||||
If a pattern contains (*MARK) items, an additional line is output when-
|
If a pattern contains (*MARK) items, an additional line is output when-
|
||||||
ever a change of latest mark is passed to the callout function. For
|
ever a change of latest mark is passed to the callout function. For ex-
|
||||||
example:
|
ample:
|
||||||
|
|
||||||
re> /a(*MARK:X)bc/auto_callout
|
re> /a(*MARK:X)bc/auto_callout
|
||||||
data> abc
|
data> abc
|
||||||
|
@ -1683,8 +1683,8 @@ CALLOUTS
|
||||||
|
|
||||||
The output for a callout with a string argument is similar, except that
|
The output for a callout with a string argument is similar, except that
|
||||||
instead of outputting a callout number before the position indicators,
|
instead of outputting a callout number before the position indicators,
|
||||||
the callout string and its offset in the pattern string are output
|
the callout string and its offset in the pattern string are output be-
|
||||||
before the reflection of the subject string, and the subject string is
|
fore the reflection of the subject string, and the subject string is
|
||||||
reflected for each callout. For example:
|
reflected for each callout. For example:
|
||||||
|
|
||||||
re> /^ab(?C'first')cd(?C"second")ef/
|
re> /^ab(?C'first')cd(?C"second")ef/
|
||||||
|
@ -1800,9 +1800,9 @@ NON-PRINTING CHARACTERS
|
||||||
|
|
||||||
When pcre2test is outputting text that is a matched part of a subject
|
When pcre2test is outputting text that is a matched part of a subject
|
||||||
string, it behaves in the same way, unless a different locale has been
|
string, it behaves in the same way, unless a different locale has been
|
||||||
set for the pattern (using the locale modifier). In this case, the
|
set for the pattern (using the locale modifier). In this case, the is-
|
||||||
isprint() function is used to distinguish printing and non-printing
|
print() function is used to distinguish printing and non-printing char-
|
||||||
characters.
|
acters.
|
||||||
|
|
||||||
|
|
||||||
SAVING AND RESTORING COMPILED PATTERNS
|
SAVING AND RESTORING COMPILED PATTERNS
|
||||||
|
@ -1814,14 +1814,14 @@ SAVING AND RESTORING COMPILED PATTERNS
|
||||||
have the same endianness, pointer width and PCRE2_SIZE type. Before
|
have the same endianness, pointer width and PCRE2_SIZE type. Before
|
||||||
compiled patterns can be saved they must be serialized, that is, con-
|
compiled patterns can be saved they must be serialized, that is, con-
|
||||||
verted to a stream of bytes. A single byte stream may contain any num-
|
verted to a stream of bytes. A single byte stream may contain any num-
|
||||||
ber of compiled patterns, but they must all use the same character
|
ber of compiled patterns, but they must all use the same character ta-
|
||||||
tables. A single copy of the tables is included in the byte stream (its
|
bles. A single copy of the tables is included in the byte stream (its
|
||||||
size is 1088 bytes).
|
size is 1088 bytes).
|
||||||
|
|
||||||
The functions whose names begin with pcre2_serialize_ are used for
|
The functions whose names begin with pcre2_serialize_ are used for se-
|
||||||
serializing and de-serializing. They are described in the pcre2serial-
|
rializing and de-serializing. They are described in the pcre2serialize
|
||||||
ize documentation. In this section we describe the features of
|
documentation. In this section we describe the features of pcre2test
|
||||||
pcre2test that can be used to test these functions.
|
that can be used to test these functions.
|
||||||
|
|
||||||
Note that "serialization" in PCRE2 does not convert compiled patterns
|
Note that "serialization" in PCRE2 does not convert compiled patterns
|
||||||
to an abstract format like Java or .NET. It just makes a reloadable
|
to an abstract format like Java or .NET. It just makes a reloadable
|
||||||
|
@ -1831,8 +1831,8 @@ SAVING AND RESTORING COMPILED PATTERNS
|
||||||
piled, it is pushed onto a stack of compiled patterns, and pcre2test
|
piled, it is pushed onto a stack of compiled patterns, and pcre2test
|
||||||
expects the next line to contain a new pattern (or command) instead of
|
expects the next line to contain a new pattern (or command) instead of
|
||||||
a subject line. By contrast, the pushcopy modifier causes a copy of the
|
a subject line. By contrast, the pushcopy modifier causes a copy of the
|
||||||
compiled pattern to be stacked, leaving the original available for
|
compiled pattern to be stacked, leaving the original available for im-
|
||||||
immediate matching. By using push and/or pushcopy, a number of patterns
|
mediate matching. By using push and/or pushcopy, a number of patterns
|
||||||
can be compiled and retained. These modifiers are incompatible with
|
can be compiled and retained. These modifiers are incompatible with
|
||||||
posix, and control modifiers that act at match time are ignored (with a
|
posix, and control modifiers that act at match time are ignored (with a
|
||||||
message) for the stacked patterns. The jitverify modifier applies only
|
message) for the stacked patterns. The jitverify modifier applies only
|
||||||
|
@ -1855,8 +1855,8 @@ SAVING AND RESTORING COMPILED PATTERNS
|
||||||
matched with the pattern, terminated as usual by an empty line or end
|
matched with the pattern, terminated as usual by an empty line or end
|
||||||
of file. This command may be followed by a modifier list containing
|
of file. This command may be followed by a modifier list containing
|
||||||
only control modifiers that act after a pattern has been compiled. In
|
only control modifiers that act after a pattern has been compiled. In
|
||||||
particular, hex, posix, posix_nosub, push, and pushcopy are not
|
particular, hex, posix, posix_nosub, push, and pushcopy are not al-
|
||||||
allowed, nor are any option-setting modifiers. The JIT modifiers are,
|
lowed, nor are any option-setting modifiers. The JIT modifiers are,
|
||||||
however permitted. Here is an example that saves and reloads two pat-
|
however permitted. Here is an example that saves and reloads two pat-
|
||||||
terns.
|
terns.
|
||||||
|
|
||||||
|
|