Documentation update.
parent a89423624d
commit c6ee84317d
|
@ -3525,9 +3525,10 @@ first match attempt, the second attempt would start at the second character
|
|||
instead of skipping on to "c".
|
||||
</P>
|
||||
<P>
|
||||
If (*SKIP) is used inside a lookbehind to specify a new starting position that
|
||||
is not later than the starting point of the current match, the position
|
||||
specified by (*SKIP) is ignored, and instead the normal "bumpalong" occurs.
|
||||
If (*SKIP) is used to specify a new starting position that is the same as the
|
||||
starting position of the current match, or (by being inside a lookbehind)
|
||||
earlier, the position specified by (*SKIP) is ignored, and instead the normal
|
||||
"bumpalong" occurs.
|
||||
<pre>
|
||||
(*SKIP:NAME)
|
||||
</pre>
|
||||
|
@ -3754,7 +3755,7 @@ Cambridge, England.
|
|||
</P>
|
||||
<br><a name="SEC31" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 21 June 2019
|
||||
Last updated: 22 June 2019
|
||||
<br>
|
||||
Copyright © 1997-2019 University of Cambridge.
|
||||
<br>
|
||||
|
|
|
@ -16,8 +16,8 @@ DESCRIPTION
|
|||
|
||||
pcre2-config returns the configuration of the installed PCRE2 libraries
|
||||
and the options required to compile a program to use them. Some of the
|
||||
options apply only to the 8-bit, or 16-bit, or 32-bit libraries,
|
||||
respectively, and are not available for libraries that have not been
|
||||
options apply only to the 8-bit, or 16-bit, or 32-bit libraries, re-
|
||||
spectively, and are not available for libraries that have not been
|
||||
built. If an unavailable option is encountered, the "usage" information
|
||||
is output.
|
||||
|
||||
|
@ -36,30 +36,30 @@ OPTIONS
|
|||
--version Writes the version number of the installed PCRE2 libraries to
|
||||
the standard output.
|
||||
|
||||
--libs8 Writes to the standard output the command line options
|
||||
required to link with the 8-bit PCRE2 library (-lpcre2-8 on
|
||||
--libs8 Writes to the standard output the command line options re-
|
||||
quired to link with the 8-bit PCRE2 library (-lpcre2-8 on
|
||||
many systems).
|
||||
|
||||
--libs16 Writes to the standard output the command line options
|
||||
required to link with the 16-bit PCRE2 library (-lpcre2-16 on
|
||||
--libs16 Writes to the standard output the command line options re-
|
||||
quired to link with the 16-bit PCRE2 library (-lpcre2-16 on
|
||||
many systems).
|
||||
|
||||
--libs32 Writes to the standard output the command line options
|
||||
required to link with the 32-bit PCRE2 library (-lpcre2-32 on
|
||||
--libs32 Writes to the standard output the command line options re-
|
||||
quired to link with the 32-bit PCRE2 library (-lpcre2-32 on
|
||||
many systems).
|
||||
|
||||
--libs-posix
|
||||
Writes to the standard output the command line options
|
||||
required to link with PCRE2's POSIX API wrapper library
|
||||
Writes to the standard output the command line options re-
|
||||
quired to link with PCRE2's POSIX API wrapper library
|
||||
(-lpcre2-posix -lpcre2-8 on many systems).
|
||||
|
||||
--cflags Writes to the standard output the command line options
|
||||
required to compile files that use PCRE2 (this may include
|
||||
some -I options, but is blank on many systems).
|
||||
--cflags Writes to the standard output the command line options re-
|
||||
quired to compile files that use PCRE2 (this may include some
|
||||
-I options, but is blank on many systems).
|
||||
|
||||
--cflags-posix
|
||||
Writes to the standard output the command line options
|
||||
required to compile files that use PCRE2's POSIX API wrapper
|
||||
Writes to the standard output the command line options re-
|
||||
quired to compile files that use PCRE2's POSIX API wrapper
|
||||
library (this may include some -I options, but is blank on
|
||||
many systems).
|
||||
|
||||
|
|
1805
doc/pcre2.txt
File diff suppressed because it is too large
|
@ -12,11 +12,11 @@ SYNOPSIS
|
|||
DESCRIPTION
|
||||
|
||||
pcre2grep searches files for character patterns, in the same way as
|
||||
other grep commands do, but it uses the PCRE2 regular expression
|
||||
library to support patterns that are compatible with the regular
|
||||
expressions of Perl 5. See pcre2syntax(3) for a quick-reference summary
|
||||
of pattern syntax, or pcre2pattern(3) for a full description of the
|
||||
syntax and semantics of the regular expressions that PCRE2 supports.
|
||||
other grep commands do, but it uses the PCRE2 regular expression li-
|
||||
brary to support patterns that are compatible with the regular expres-
|
||||
sions of Perl 5. See pcre2syntax(3) for a quick-reference summary of
|
||||
pattern syntax, or pcre2pattern(3) for a full description of the syntax
|
||||
and semantics of the regular expressions that PCRE2 supports.
|
||||
|
||||
Patterns, whether supplied on the command line or in a separate file,
|
||||
are given without delimiters. For example:
|
||||
|
@ -26,8 +26,8 @@ DESCRIPTION
|
|||
If you attempt to use delimiters (for example, by surrounding a pattern
|
||||
with slashes, as is common in Perl scripts), they are interpreted as
|
||||
part of the pattern. Quotes can of course be used to delimit patterns
|
||||
on the command line because they are interpreted by the shell, and
|
||||
indeed quotes are required if a pattern contains white space or shell
|
||||
on the command line because they are interpreted by the shell, and in-
|
||||
deed quotes are required if a pattern contains white space or shell
|
||||
metacharacters.
|
||||
|
||||
The first argument that follows any option settings is treated as the
|
||||
|
@ -54,8 +54,8 @@ DESCRIPTION
|
|||
controlled by parameters that can be set by the --buffer-size and
|
||||
--max-buffer-size options. The first of these sets the size of buffer
|
||||
that is obtained at the start of processing. If an input file contains
|
||||
very long lines, a larger buffer may be needed; this is handled by
|
||||
automatically extending the buffer, up to the limit specified by --max-
|
||||
very long lines, a larger buffer may be needed; this is handled by au-
|
||||
tomatically extending the buffer, up to the limit specified by --max-
|
||||
buffer-size. The default values for these parameters can be set when
|
||||
pcre2grep is built; if nothing is specified, the defaults are set to
|
||||
20KiB and 1MiB respectively. An error occurs if a line is too long and
|
||||
|
@ -75,12 +75,12 @@ DESCRIPTION
|
|||
By default, as soon as one pattern matches a line, no further patterns
|
||||
are considered. However, if --colour (or --color) is used to colour the
|
||||
matching substrings, or if --only-matching, --file-offsets, or --line-
|
||||
offsets is used to output only the part of the line that matched
|
||||
(either shown literally, or as an offset), scanning resumes immediately
|
||||
offsets is used to output only the part of the line that matched (ei-
|
||||
ther shown literally, or as an offset), scanning resumes immediately
|
||||
following the match, so that further matches on the same line can be
|
||||
found. If there are multiple patterns, they are all tried on the
|
||||
remainder of the line, but patterns that follow the one that matched
|
||||
are not tried on the earlier part of the line.
|
||||
found. If there are multiple patterns, they are all tried on the re-
|
||||
mainder of the line, but patterns that follow the one that matched are
|
||||
not tried on the earlier part of the line.
|
||||
|
||||
This behaviour means that the order in which multiple patterns are
|
||||
specified can affect the output when one of the above options is used.
|
||||
|
@ -89,11 +89,11 @@ DESCRIPTION
|
|||
overlap).
|
||||
|
||||
Patterns that can match an empty string are accepted, but empty string
|
||||
matches are never recognized. An example is the pattern
|
||||
"(super)?(man)?", in which all components are optional. This pattern
|
||||
finds all occurrences of both "super" and "man"; the output differs
|
||||
from matching with "super|man" when only the matching substrings are
|
||||
being shown.
|
||||
matches are never recognized. An example is the pattern "(su-
|
||||
per)?(man)?", in which all components are optional. This pattern finds
|
||||
all occurrences of both "super" and "man"; the output differs from
|
||||
matching with "super|man" when only the matching substrings are being
|
||||
shown.
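The difference between the two patterns can be illustrated with a small sketch. It uses Python's `re` module as a stand-in for PCRE2, which is an assumption (the two engines are not identical), and the helper name `shown_matches` is hypothetical. It mimics showing only the matching substrings while never recognizing empty matches.

```python
import re

def shown_matches(pattern, line):
    """Collect matches left to right, discarding empty-string matches."""
    return [m.group(0) for m in re.finditer(pattern, line) if m.group(0)]

line = "superman and man"
both_optional = shown_matches(r"(super)?(man)?", line)
alternation = shown_matches(r"super|man", line)
print(both_optional)  # ['superman', 'man']
print(alternation)    # ['super', 'man', 'man']
```

With the optional groups, "superman" is reported as one substring; with the alternation, "super" and "man" are reported separately.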
|
||||
|
||||
If the LC_ALL or LC_CTYPE environment variable is set, pcre2grep uses
|
||||
the value to set a locale when calling the PCRE2 library. The --locale
|
||||
|
@ -116,10 +116,9 @@ BINARY FILES
|
|||
By default, a file that contains a binary zero byte within the first
|
||||
1024 bytes is identified as a binary file, and is processed specially.
|
||||
(GNU grep identifies binary files in this manner.) However, if the new-
|
||||
line type is specified as "nul", that is, the line terminator is a
|
||||
binary zero, the test for a binary file is not applied. See the
|
||||
--binary-files option for a means of changing the way binary files are
|
||||
handled.
|
||||
line type is specified as "nul", that is, the line terminator is a bi-
|
||||
nary zero, the test for a binary file is not applied. See the --binary-
|
||||
files option for a means of changing the way binary files are handled.
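The binary-file test described here can be sketched in a few lines. This is only an illustration in Python, not pcre2grep's own code; the function name `looks_binary` and the `newline_is_nul` parameter are hypothetical.

```python
def looks_binary(data, newline_is_nul=False):
    """Return True if a zero byte occurs within the first 1024 bytes.

    The test is not applied when the line terminator is itself a
    binary zero (newline type "nul").
    """
    if newline_is_nul:
        return False
    return b"\x00" in data[:1024]
```

A zero byte that first appears after the 1024th byte does not mark the file as binary under this rule.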
|
||||
|
||||
|
||||
BINARY ZEROS IN PATTERNS
|
||||
|
@ -148,12 +147,12 @@ OPTIONS
|
|||
Output up to number lines of context after each matching
|
||||
line. Fewer lines are output if the next match or the end of
|
||||
the file is reached, or if the processing buffer size has
|
||||
been set too small. If file names and/or line numbers are
|
||||
being output, a hyphen separator is used instead of a colon
|
||||
for the context lines. A line containing "--" is output
|
||||
between each group of lines, unless they are in fact contigu-
|
||||
ous in the input file. The value of number is expected to be
|
||||
relatively small. When -c is used, -A is ignored.
|
||||
been set too small. If file names and/or line numbers are be-
|
||||
ing output, a hyphen separator is used instead of a colon for
|
||||
the context lines. A line containing "--" is output between
|
||||
each group of lines, unless they are in fact contiguous in
|
||||
the input file. The value of number is expected to be rela-
|
||||
tively small. When -c is used, -A is ignored.
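The after-context behaviour can be sketched as follows. This is a simplified Python model, assuming line numbers are always being output (so context lines use a hyphen and matching lines a colon); the function name `grep_after_context` is hypothetical and Python's `re` stands in for PCRE2.

```python
import re

def grep_after_context(lines, pattern, after):
    """Emit matching lines plus up to `after` following context lines."""
    rx = re.compile(pattern)
    out = []
    last_printed = None  # index of the most recently emitted line
    remaining = 0        # context lines still owed after the last match
    for i, line in enumerate(lines):
        if rx.search(line):
            # Separate groups that are not contiguous in the input.
            if last_printed is not None and i > last_printed + 1:
                out.append("--")
            out.append(f"{i + 1}:{line}")
            last_printed = i
            remaining = after
        elif remaining > 0:
            out.append(f"{i + 1}-{line}")
            last_printed = i
            remaining -= 1
    return out

print(grep_after_context(["a", "hit", "b", "c", "d", "hit", "e"], "hit", 1))
# ['2:hit', '3-b', '--', '6:hit', '7-e']
```

Fewer context lines appear when the next match or the end of the input arrives first, and no "--" line is produced when two groups are contiguous.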
|
||||
|
||||
-a, --text
|
||||
Treat binary files as text. This is equivalent to --binary-
|
||||
|
@ -164,26 +163,26 @@ OPTIONS
|
|||
line. Fewer lines are output if the previous match or the
|
||||
start of the file is within number lines, or if the process-
|
||||
ing buffer size has been set too small. If file names and/or
|
||||
line numbers are being output, a hyphen separator is used
|
||||
instead of a colon for the context lines. A line containing
|
||||
line numbers are being output, a hyphen separator is used in-
|
||||
stead of a colon for the context lines. A line containing
|
||||
"--" is output between each group of lines, unless they are
|
||||
in fact contiguous in the input file. The value of number is
|
||||
expected to be relatively small. When -c is used, -B is
|
||||
ignored.
|
||||
expected to be relatively small. When -c is used, -B is ig-
|
||||
nored.
|
||||
|
||||
--binary-files=word
|
||||
Specify how binary files are to be processed. If the word is
|
||||
"binary" (the default), pattern matching is performed on
|
||||
binary files, but the only output is "Binary file <name>
|
||||
"binary" (the default), pattern matching is performed on bi-
|
||||
nary files, but the only output is "Binary file <name>
|
||||
matches" when a match succeeds. If the word is "text", which
|
||||
is equivalent to the -a or --text option, binary files are
|
||||
processed in the same way as any other file. In this case,
|
||||
when a match succeeds, the output may be binary garbage,
|
||||
which can have nasty effects if sent to a terminal. If the
|
||||
word is "without-match", which is equivalent to the -I
|
||||
option, binary files are not processed at all; they are
|
||||
assumed not to be of interest and are skipped without causing
|
||||
any output or affecting the return code.
|
||||
word is "without-match", which is equivalent to the -I op-
|
||||
tion, binary files are not processed at all; they are assumed
|
||||
not to be of interest and are skipped without causing any
|
||||
output or affecting the return code.
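The three values of the word can be modelled with a short sketch. This is a Python illustration under stated assumptions, not pcre2grep's implementation: the function name `process_binary_file` is hypothetical, lines are split on LF only, and Python's `re` stands in for PCRE2.

```python
import re

def process_binary_file(name, data, pattern, mode="binary"):
    """Return the output lines for a file already identified as binary."""
    if mode == "without-match":
        return []  # skipped entirely: no output
    rx = re.compile(pattern)
    if mode == "binary":
        # Matching is performed, but only a one-line notice is output.
        if rx.search(data):
            return [f"Binary file {name} matches"]
        return []
    if mode == "text":
        # Processed like any other file; output may contain raw bytes.
        return [ln.decode("latin-1") for ln in data.split(b"\n") if rx.search(ln)]
    raise ValueError(f"unknown --binary-files word: {mode}")
```

For example, `process_binary_file("blob.bin", b"\x00needle\n", rb"needle")` yields the single notice line, while the same call with `mode="text"` yields the matching line itself, control bytes included.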
|
||||
|
||||
--buffer-size=number
|
||||
Set the parameter that controls how much memory is obtained
|
||||
|
@ -208,10 +207,10 @@ OPTIONS
|
|||
If no lines are selected, the number zero is output. If several
|
||||
files are being scanned, a count is output for each
|
||||
of them and the -t option can be used to cause a total to be
|
||||
output at the end. However, if the --files-with-matches
|
||||
option is also used, only those files whose counts are
|
||||
greater than zero are listed. When -c is used, the -A, -B,
|
||||
and -C options are ignored.
|
||||
output at the end. However, if the --files-with-matches op-
|
||||
tion is also used, only those files whose counts are greater
|
||||
than zero are listed. When -c is used, the -A, -B, and -C op-
|
||||
tions are ignored.
|
||||
|
||||
--colour, --color
|
||||
If this option is given without any data, it is equivalent to
|
||||
|
@ -238,8 +237,8 @@ OPTIONS
|
|||
semicolon, except in the case of GREP_COLORS, which must
|
||||
start with "ms=" or "mt=" followed by two semicolon-separated
|
||||
colours, terminated by the end of the string or by a colon.
|
||||
If GREP_COLORS does not start with "ms=" or "mt=" it is
|
||||
ignored, and GREP_COLOR is checked.
|
||||
If GREP_COLORS does not start with "ms=" or "mt=" it is ig-
|
||||
nored, and GREP_COLOR is checked.
|
||||
|
||||
If the string obtained from one of the above variables con-
|
||||
tains any characters other than semicolon or digits, the set-
|
||||
|
@ -250,9 +249,9 @@ OPTIONS
|
|||
set, the default is "1;31", which gives red.
|
||||
|
||||
-D action, --devices=action
|
||||
If an input path is not a regular file or a directory,
|
||||
"action" specifies how it is to be processed. Valid values
|
||||
are "read" (the default) or "skip" (silently skip the path).
|
||||
If an input path is not a regular file or a directory, "ac-
|
||||
tion" specifies how it is to be processed. Valid values are
|
||||
"read" (the default) or "skip" (silently skip the path).
|
||||
|
||||
-d action, --directories=action
|
||||
If an input path is a directory, "action" specifies how it is
|
||||
|
@ -261,8 +260,8 @@ OPTIONS
|
|||
"recurse" (equivalent to the -r option), or "skip" (silently
|
||||
skip the path, the default in Windows environments). In the
|
||||
"read" case, directories are read as if they were ordinary
|
||||
files. In some operating systems the effect of reading a
|
||||
directory like this is an immediate end-of-file; in others it
|
||||
files. In some operating systems the effect of reading a di-
|
||||
rectory like this is an immediate end-of-file; in others it
|
||||
may provoke an error.
|
||||
|
||||
--depth-limit=number
|
||||
|
@ -295,8 +294,8 @@ OPTIONS
|
|||
whether listed on the command line, obtained from --file-
|
||||
list, or by scanning a directory. The pattern is a PCRE2 reg-
|
||||
ular expression, and is matched against the final component
|
||||
of the file name, not the entire path. The -F, -w, and -x
|
||||
options do not apply to this pattern. The option may be given
|
||||
of the file name, not the entire path. The -F, -w, and -x op-
|
||||
tions do not apply to this pattern. The option may be given
|
||||
any number of times in order to specify multiple patterns. If
|
||||
a file name matches both an --include and an --exclude pat-
|
||||
tern, it is excluded. There is no short form for this option.
|
||||
|
@ -310,29 +309,29 @@ OPTIONS
|
|||
|
||||
--exclude-dir=pattern
|
||||
Directories whose names match the pattern are skipped without
|
||||
being processed, whatever the setting of the --recursive
|
||||
option. This applies to all directories, whether listed on
|
||||
the command line, obtained from --file-list, or by scanning a
|
||||
being processed, whatever the setting of the --recursive op-
|
||||
tion. This applies to all directories, whether listed on the
|
||||
command line, obtained from --file-list, or by scanning a
|
||||
parent directory. The pattern is a PCRE2 regular expression,
|
||||
and is matched against the final component of the directory
|
||||
name, not the entire path. The -F, -w, and -x options do not
|
||||
apply to this pattern. The option may be given any number of
|
||||
times in order to specify more than one pattern. If a direc-
|
||||
tory matches both --include-dir and --exclude-dir, it is
|
||||
excluded. There is no short form for this option.
|
||||
tory matches both --include-dir and --exclude-dir, it is ex-
|
||||
cluded. There is no short form for this option.
|
||||
|
||||
-F, --fixed-strings
|
||||
Interpret each data-matching pattern as a list of fixed
|
||||
strings, separated by newlines, instead of as a regular
|
||||
expression. What constitutes a newline for this purpose is
|
||||
controlled by the --newline option. The -w (match as a word)
|
||||
and -x (match whole line) options can be used with -F. They
|
||||
apply to each of the fixed strings. A line is selected if any
|
||||
strings, separated by newlines, instead of as a regular ex-
|
||||
pression. What constitutes a newline for this purpose is con-
|
||||
trolled by the --newline option. The -w (match as a word) and
|
||||
-x (match whole line) options can be used with -F. They ap-
|
||||
ply to each of the fixed strings. A line is selected if any
|
||||
of the fixed strings are found in it (subject to -w or -x, if
|
||||
present). This option applies only to the patterns that are
|
||||
matched against the contents of files; it does not apply to
|
||||
patterns specified by any of the --include or --exclude
|
||||
options.
|
||||
patterns specified by any of the --include or --exclude op-
|
||||
tions.
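Fixed-string selection, with the optional word and whole-line constraints, might look like the sketch below. It is written in Python as an illustration only; the function name `line_selected` is hypothetical, LF is assumed as the newline that separates the strings, empty strings in the list are skipped (an assumption), and the word constraint is modelled with `\b` on both sides.

```python
import re

def line_selected(line, fixed_strings, word=False, whole_line=False):
    """Return True if any of the newline-separated fixed strings selects the line."""
    for text in fixed_strings.split("\n"):
        if not text:
            continue
        literal = re.escape(text)  # no character is treated as a metacharacter
        if whole_line:
            if re.fullmatch(literal, line):
                return True
        elif word:
            if re.search(r"\b(?:" + literal + r")\b", line):
                return True
        elif text in line:
            return True
    return False
```

For instance, `line_selected("cost is a.b today", "a.b")` is true, whereas the same string does not select "cost is axb today" because the dot is taken literally.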
|
||||
|
||||
-f filename, --file=filename
|
||||
Read patterns from the file, one per line, and match them
|
||||
|
@ -360,8 +359,8 @@ OPTIONS
|
|||
--file-list=filename
|
||||
Read a list of files and/or directories that are to be
|
||||
scanned from the given file, one per line. What constitutes a
|
||||
newline when reading the file is the operating system's
|
||||
default. Trailing white space is removed from each line, and
|
||||
newline when reading the file is the operating system's de-
|
||||
fault. Trailing white space is removed from each line, and
|
||||
blank lines are ignored. These paths are processed before any
|
||||
that are listed on the command line. The file name can be
|
||||
given as "-" to refer to the standard input. If --file and
|
||||
|
@ -388,8 +387,8 @@ OPTIONS
|
|||
is used. If a line number is also being output, it follows
|
||||
the file name. When the -M option causes a pattern to match
|
||||
more than one line, only the first is preceded by the file
|
||||
name. This option overrides any previous -h, -l, or -L
|
||||
options.
|
||||
name. This option overrides any previous -h, -l, or -L op-
|
||||
tions.
|
||||
|
||||
-h, --no-filename
|
||||
Suppress the output file names when searching multiple files.
|
||||
|
@ -415,16 +414,16 @@ OPTIONS
|
|||
--include=pattern
|
||||
If any --include patterns are specified, the only files that
|
||||
are processed are those that match one of the patterns (and
|
||||
do not match an --exclude pattern). This option does not
|
||||
affect directories, but it applies to all files, whether
|
||||
listed on the command line, obtained from --file-list, or by
|
||||
scanning a directory. The pattern is a PCRE2 regular expres-
|
||||
sion, and is matched against the final component of the file
|
||||
name, not the entire path. The -F, -w, and -x options do not
|
||||
apply to this pattern. The option may be given any number of
|
||||
times. If a file name matches both an --include and an
|
||||
--exclude pattern, it is excluded. There is no short form
|
||||
for this option.
|
||||
do not match an --exclude pattern). This option does not af-
|
||||
fect directories, but it applies to all files, whether listed
|
||||
on the command line, obtained from --file-list, or by scan-
|
||||
ning a directory. The pattern is a PCRE2 regular expression,
|
||||
and is matched against the final component of the file name,
|
||||
not the entire path. The -F, -w, and -x options do not apply
|
||||
to this pattern. The option may be given any number of times.
|
||||
If a file name matches both an --include and an --exclude
|
||||
pattern, it is excluded. There is no short form for this op-
|
||||
tion.
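The interaction of --include and --exclude can be sketched as below. This is an illustrative Python model, not pcre2grep's code: the function name `file_is_processed` is hypothetical, Python's `re` stands in for PCRE2, and an unanchored search against the final path component is assumed.

```python
import os
import re

def file_is_processed(path, includes=(), excludes=()):
    """Decide whether a file is scanned, given include/exclude patterns."""
    name = os.path.basename(path)  # only the final component is tested
    if any(re.search(p, name) for p in excludes):
        return False               # exclusion wins over inclusion
    if includes:
        return any(re.search(p, name) for p in includes)
    return True                    # no --include patterns: everything is eligible
```

With `includes=[r"\.c$"]`, the path `src.c/readme.txt` is not processed, because the directory part of the path is never matched.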
|
||||
|
||||
--include-from=filename
|
||||
Treat each non-empty line of the file as the data for an
|
||||
|
@ -438,8 +437,8 @@ OPTIONS
|
|||
tories that are processed are those that match one of the
|
||||
patterns (and do not match an --exclude-dir pattern). This
|
||||
applies to all directories, whether listed on the command
|
||||
line, obtained from --file-list, or by scanning a parent
|
||||
directory. The pattern is a PCRE2 regular expression, and is
|
||||
line, obtained from --file-list, or by scanning a parent di-
|
||||
rectory. The pattern is a PCRE2 regular expression, and is
|
||||
matched against the final component of the directory name,
|
||||
not the entire path. The -F, -w, and -x options do not apply
|
||||
to this pattern. The option may be given any number of times.
|
||||
|
@ -480,8 +479,8 @@ OPTIONS
|
|||
flushed by the operating system. This option can be useful
|
||||
when the input or output is attached to a pipe and you do not
|
||||
want pcre2grep to buffer up large amounts of data. However,
|
||||
its use will affect performance, and the -M (multiline)
|
||||
option ceases to work. When input is from a compressed .gz or
|
||||
its use will affect performance, and the -M (multiline) op-
|
||||
tion ceases to work. When input is from a compressed .gz or
|
||||
.bz2 file, --line-buffered is ignored.
|
||||
|
||||
--line-offsets
|
||||
|
@ -498,9 +497,9 @@ OPTIONS
|
|||
--locale=locale-name
|
||||
This option specifies a locale to be used for pattern match-
|
||||
ing. It overrides the value in the LC_ALL or LC_CTYPE envi-
|
||||
ronment variables. If no locale is specified, the PCRE2
|
||||
library's default (usually the "C" locale) is used. There is
|
||||
no short form for this option.
|
||||
ronment variables. If no locale is specified, the PCRE2 li-
|
||||
brary's default (usually the "C" locale) is used. There is no
|
||||
short form for this option.
|
||||
|
||||
--match-limit=number
|
||||
Processing some regular expression patterns may take a very
|
||||
|
@ -509,13 +508,13 @@ OPTIONS
|
|||
options that set resource limits for matching.
|
||||
|
||||
The --match-limit option provides a means of limiting comput-
|
||||
ing resource usage when processing patterns that are not
|
||||
going to match, but which have a very large number of possi-
|
||||
bilities in their search trees. The classic example is a pat-
|
||||
tern that uses nested unlimited repeats. Internally, PCRE2
|
||||
has a counter that is incremented each time around its main
|
||||
processing loop. If the value set by --match-limit is
|
||||
reached, an error occurs.
|
||||
ing resource usage when processing patterns that are not go-
|
||||
ing to match, but which have a very large number of possibil-
|
||||
ities in their search trees. The classic example is a pattern
|
||||
that uses nested unlimited repeats. Internally, PCRE2 has a
|
||||
counter that is incremented each time around its main pro-
|
||||
cessing loop. If the value set by --match-limit is reached,
|
||||
an error occurs.
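The effect of a match limit on nested unlimited repeats can be seen in a toy model. The Python sketch below is not PCRE2's matcher; it is a naive backtracking matcher for the single pattern `(a+)+b`, anchored at the start of the subject, with a hypothetical call counter standing in for the internal counter mentioned in the text. The names `match_nested` and `MatchLimitError` are invented for the example.

```python
class MatchLimitError(Exception):
    """Raised when the step counter reaches the configured limit."""

def match_nested(subject, limit):
    """Naively match (a+)+b at position 0, counting backtracking steps."""
    calls = 0

    def groups(pos):
        nonlocal calls
        calls += 1
        if calls > limit:
            raise MatchLimitError(f"limit of {limit} steps exceeded")
        # Find how far a run of "a" extends from pos.
        end = pos
        while end < len(subject) and subject[end] == "a":
            end += 1
        # Try the longest (a+) first, then back off one character at a time.
        for stop in range(end, pos, -1):
            if stop < len(subject) and subject[stop] == "b":
                return True
            if groups(stop):
                return True
        return False

    return groups(0)
```

A subject such as `"aaab"` matches in one step, but a subject of thirty "a" characters with no "b" would need roughly 2**30 steps to fail; with a limit of 1000 the sketch raises `MatchLimitError` almost immediately instead.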
|
||||
|
||||
The --heap-limit option specifies, as a number of kibibytes
|
||||
(units of 1024 bytes), the amount of heap memory that may be
|
||||
|
@ -567,10 +566,10 @@ OPTIONS
|
|||
|
||||
pcre2grep -M 'regular\s+expression' <file>
|
||||
|
||||
The \s escape sequence matches any white space character,
|
||||
including newlines, and is followed by + so as to match
|
||||
trailing white space on the first line as well as possibly
|
||||
handling a two-character newline sequence.
|
||||
The \s escape sequence matches any white space character, in-
|
||||
cluding newlines, and is followed by + so as to match trail-
|
||||
ing white space on the first line as well as possibly han-
|
||||
dling a two-character newline sequence.
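The role of `\s+` in that pattern can be checked with a short sketch. It uses Python's `re` module rather than PCRE2, which is an assumption; for this particular pattern the two engines are expected to agree. The helper name `spans_lines` is hypothetical.

```python
import re

pattern = re.compile(r"regular\s+expression")

def spans_lines(text):
    """Return True if the pattern matches and the match crosses a line ending."""
    m = pattern.search(text)
    return m is not None and ("\n" in m.group(0) or "\r" in m.group(0))

# Trailing spaces on the first line followed by a two-character CRLF newline.
print(spans_lines("a regular  \r\nexpression here"))  # True
```

Because `\s+` consumes the trailing spaces, the CR, and the LF in one go, the match succeeds whichever newline sequence the file uses.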
|
||||
|
||||
There is a limit to the number of lines that can be matched,
|
||||
imposed by the way that pcre2grep buffers the input file as
|
||||
|
@ -579,30 +578,30 @@ OPTIONS
|
|||
when input is read line by line (see --line-buffered).
|
||||
|
||||
-N newline-type, --newline=newline-type
|
||||
The PCRE2 library supports five different conventions for
|
||||
indicating the ends of lines. They are the single-character
|
||||
sequences CR (carriage return) and LF (linefeed), the two-
|
||||
character sequence CRLF, an "anycrlf" convention, which rec-
|
||||
ognizes any of the preceding three types, and an "any" con-
|
||||
vention, in which any Unicode line ending sequence is assumed
|
||||
to end a line. The Unicode sequences are the three just men-
|
||||
The PCRE2 library supports five different conventions for in-
|
||||
dicating the ends of lines. They are the single-character se-
|
||||
quences CR (carriage return) and LF (linefeed), the two-char-
|
||||
acter sequence CRLF, an "anycrlf" convention, which recog-
|
||||
nizes any of the preceding three types, and an "any" conven-
|
||||
tion, in which any Unicode line ending sequence is assumed to
|
||||
end a line. The Unicode sequences are the three just men-
|
||||
tioned, plus VT (vertical tab, U+000B), FF (form feed,
|
||||
U+000C), NEL (next line, U+0085), LS (line separator,
|
||||
U+2028), and PS (paragraph separator, U+2029).
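The five conventions can be written down as regular expressions and used to split text into lines. The Python sketch below is only an illustration of the definitions above; the table name `NEWLINE_PATTERNS` and the function `split_lines` are hypothetical, and the input is assumed to be already decoded text.

```python
import re

# One regular expression per convention named in the text.
NEWLINE_PATTERNS = {
    "CR": r"\r",
    "LF": r"\n",
    "CRLF": r"\r\n",
    "ANYCRLF": r"\r\n|\r|\n",
    # CRLF is tried first so that it counts as a single line ending.
    "ANY": r"\r\n|[\r\n\x0b\x0c\x85\u2028\u2029]",
}

def split_lines(text, convention):
    """Split text into lines according to one of the five conventions."""
    parts = re.split(NEWLINE_PATTERNS[convention], text)
    # A terminator at the very end leaves an empty final piece; drop it.
    if parts and parts[-1] == "":
        parts.pop()
    return parts

print(split_lines("a\r\nb\nc\r", "ANYCRLF"))  # ['a', 'b', 'c']
print(split_lines("a\r\nb\nc", "LF"))         # ['a\r', 'b', 'c']
```

The second call shows what happens when the data does not agree with the chosen convention: the stray CR stays attached to the first line.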
|
||||
|
||||
When the PCRE2 library is built, a default line-ending
|
||||
sequence is specified. This is normally the standard
|
||||
sequence for the operating system. Unless otherwise specified
|
||||
by this option, pcre2grep uses the library's default. The
|
||||
possible values for this option are CR, LF, CRLF, ANYCRLF, or
|
||||
ANY. This makes it possible to use pcre2grep to scan files
|
||||
that have come from other environments without having to mod-
|
||||
ify their line endings. If the data that is being scanned
|
||||
does not agree with the convention set by this option,
|
||||
pcre2grep may behave in strange ways. Note that this option
|
||||
does not apply to files specified by the -f, --exclude-from,
|
||||
or --include-from options, which are expected to use the
|
||||
operating system's standard newline sequence.
|
||||
When the PCRE2 library is built, a default line-ending se-
|
||||
quence is specified. This is normally the standard sequence
|
||||
for the operating system. Unless otherwise specified by this
|
||||
option, pcre2grep uses the library's default. The possible
|
||||
values for this option are CR, LF, CRLF, ANYCRLF, or ANY.
|
||||
This makes it possible to use pcre2grep to scan files that
|
||||
have come from other environments without having to modify
|
||||
their line endings. If the data that is being scanned does
|
||||
not agree with the convention set by this option, pcre2grep
|
||||
may behave in strange ways. Note that this option does not
|
||||
apply to files specified by the -f, --exclude-from, or --in-
|
||||
clude-from options, which are expected to use the operating
|
||||
system's standard newline sequence.
|
||||
|
||||
-n, --line-number
|
||||
Precede each output line by its line number in the file, fol-
|
||||
|
@ -621,8 +620,8 @@ OPTIONS
|
|||
|
||||
-O text, --output=text
|
||||
When there is a match, instead of outputting the whole line
|
||||
that matched, output just the given text. This option is
|
||||
mutually exclusive with --only-matching, --file-offsets, and
|
||||
that matched, output just the given text. This option is mu-
|
||||
tually exclusive with --only-matching, --file-offsets, and
|
||||
--line-offsets. Escape sequences starting with a dollar char-
|
||||
acter may be used to insert the contents of the matched part
|
||||
of the line and/or captured substrings into the text.
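Dollar-escape expansion of this kind can be sketched as below. This is a Python illustration, not pcre2grep's implementation: the function name `expand_output` is hypothetical, only the `$<digits>` and `${<digits>}` forms are handled, number 0 is assumed to stand for the whole matched part, and references to groups that do not exist or were not set are assumed to expand to nothing.

```python
import re

def expand_output(template, match):
    """Replace $<digits> and ${<digits>} in template with captured substrings."""
    def replace(ref):
        number = int(ref.group(1) or ref.group(2))
        if number > match.re.groups:
            return ""                      # no such capturing group
        return match.group(number) or ""   # unset groups give an empty string
    return re.sub(r"\$(?:(\d+)|\{(\d+)\})", replace, template)

m = re.search(r"(\w+)@(\w+)", "mail bob@example now")
print(expand_output("user=$1 host=${2}", m))  # user=bob host=example
```

The braced form is useful when the reference is followed directly by a digit that is not part of the group number.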
|
||||
|
@ -651,9 +650,9 @@ OPTIONS
|
|||
of the whole line. In this mode, no context is shown. That
|
||||
is, the -A, -B, and -C options are ignored. If there is more
|
||||
than one match in a line, each of them is shown separately,
|
||||
on a separate line of output. If -o is combined with -v
|
||||
(invert the sense of the match to find non-matching lines),
|
||||
no output is generated, but the return code is set appropri-
|
||||
on a separate line of output. If -o is combined with -v (in-
|
||||
vert the sense of the match to find non-matching lines), no
|
||||
output is generated, but the return code is set appropri-
|
||||
ately. If the matched portion of the line is empty, nothing
|
||||
is output unless the file name or line number are being
|
||||
printed, in which case they are shown on an otherwise empty
|
||||
|
@ -671,8 +670,8 @@ OPTIONS
|
|||
|
||||
-o0 is the same as -o without a number. Because these options
|
||||
can be given without an argument (see above), if an argument
|
||||
is present, it must be given in the same shell item, for
|
||||
example, -o3 or --only-matching=2. The comments given for the
|
||||
is present, it must be given in the same shell item, for ex-
|
||||
ample, -o3 or --only-matching=2. The comments given for the
|
||||
non-argument case above also apply to this option. If the
|
||||
specified capturing parentheses do not exist in the pattern,
|
||||
or were not set in the match, nothing is output unless the
|
||||
|
@ -704,8 +703,8 @@ OPTIONS
|
|||
it contains, taking note of any --include and --exclude set-
|
||||
tings. By default, a directory is read as a normal file; in
|
||||
some operating systems this gives an immediate end-of-file.
|
||||
This option is a shorthand for setting the -d option to
|
||||
"recurse".
|
||||
This option is a shorthand for setting the -d option to "re-
|
||||
curse".
|
||||
|
||||
--recursion-limit=number
|
||||
See --match-limit above.
|
||||
|
@ -719,8 +718,8 @@ OPTIONS
|
|||
This option is useful when scanning more than one file. If
|
||||
used on its own, -t suppresses all output except for a grand
|
||||
total number of matching lines (or non-matching lines if -v
|
||||
is used) in all the files. If -t is used with -c, a grand
|
||||
total is output except when the previous output is just one
|
||||
is used) in all the files. If -t is used with -c, a grand to-
|
||||
tal is output except when the previous output is just one
|
||||
line. In other words, it is not output when just one file's
|
||||
count is listed. If file names are being output, the grand
|
||||
total is preceded by "TOTAL:". Otherwise, it appears as just
|
||||
|
@ -773,10 +772,10 @@ OPTIONS
|
|||
|
||||
ENVIRONMENT VARIABLES
|
||||
|
||||
The environment variables LC_ALL and LC_CTYPE are examined, in that
|
||||
order, for a locale. The first one that is set is used. This can be
|
||||
overridden by the --locale option. If no locale is set, the PCRE2
|
||||
library's default (usually the "C" locale) is used.
|
||||
The environment variables LC_ALL and LC_CTYPE are examined, in that or-
|
||||
der, for a locale. The first one that is set is used. This can be over-
|
||||
ridden by the --locale option. If no locale is set, the PCRE2 library's
|
||||
default (usually the "C" locale) is used.
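The order of precedence can be expressed as a small function. This is a hedged Python sketch, not pcre2grep's code; the function name `choose_locale` is hypothetical, and treating an empty variable as unset is an assumption.

```python
def choose_locale(environ, cli_locale=None):
    """Pick the locale name: --locale first, then LC_ALL, then LC_CTYPE."""
    if cli_locale is not None:
        return cli_locale
    for name in ("LC_ALL", "LC_CTYPE"):
        value = environ.get(name)
        if value:
            return value
    return None  # nothing set: the library default (usually "C") applies

print(choose_locale({"LC_ALL": "fr_FR", "LC_CTYPE": "de_DE"}))  # fr_FR
```

Passing `os.environ` as the first argument would apply the same rule to the real environment.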
|
||||
|
||||
|
||||
NEWLINES
|
||||
|
@ -834,13 +833,13 @@ OPTIONS WITH DATA
|
|||
--file /some/file
|
||||
|
||||
Note, however, that if you want to supply a file name beginning with ~
|
||||
as data in a shell command, and have the shell expand ~ to a home
|
||||
directory, you must separate the file name from the option, because the
|
||||
as data in a shell command, and have the shell expand ~ to a home di-
|
||||
rectory, you must separate the file name from the option, because the
|
||||
shell does not treat ~ specially unless it is at the start of an item.
|
||||
|
||||
The exceptions to the above are the --colour (or --color) and --only-
|
||||
matching options, for which the data is optional. If one of these
|
||||
options does have data, it must be given in the first form, using an
|
||||
matching options, for which the data is optional. If one of these op-
|
||||
tions does have data, it must be given in the first form, using an
|
||||
equals character. Otherwise pcre2grep will assume that it has no data.
|
||||
|
||||
|
||||
|
@ -850,8 +849,8 @@ USING PCRE2'S CALLOUT FACILITY
|
|||
scripts or echoing specific strings during matching by making use of
|
||||
PCRE2's callout facility. However, this support can be completely or
|
||||
partially disabled when pcre2grep is built. You can find out whether
|
||||
your binary has support for callouts by running it with the --help
|
||||
option. If callout support is completely disabled, all callouts in pat-
|
||||
your binary has support for callouts by running it with the --help op-
|
||||
tion. If callout support is completely disabled, all callouts in pat-
|
||||
terns are ignored by pcre2grep. If the facility is partially disabled,
|
||||
calling external programs is not supported, and callouts that request
|
||||
it are ignored.
|
||||
|
@ -875,16 +874,16 @@ USING PCRE2'S CALLOUT FACILITY
|
|||
|
||||
executable_name|arg1|arg2|...
|
||||
|
||||
Any substring (including the executable name) may contain escape
|
||||
sequences started by a dollar character: $<digits> or ${<digits>} is
|
||||
replaced by the captured substring of the given decimal number, which
|
||||
Any substring (including the executable name) may contain escape se-
|
||||
quences started by a dollar character: $<digits> or ${<digits>} is re-
|
||||
placed by the captured substring of the given decimal number, which
|
||||
must be greater than zero. If the number is greater than the number of
|
||||
capturing substrings, or if the capture is unset, the replacement is
|
||||
empty.
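
As a rough illustration of the dollar escapes in this section, the following Python sketch expands one substring of a callout argument. It is not pcre2grep's implementation: the function name expand_callout_arg is hypothetical, raising an error for group number zero is an assumption, and a lone trailing dollar is simply copied. Only the escapes described here ($<digits>, ${<digits>}, and a dollar followed by any other character) are handled.

```python
import re

# Matches the part after the dollar: {digits} or plain digits.
_GROUP_REF = re.compile(r"\{(\d+)\}|(\d+)")

def expand_callout_arg(template, captures):
    """Expand dollar escapes in one substring.

    captures[0] holds capture group 1, captures[1] group 2, and so on;
    None marks an unset group.
    """
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch != "$" or i + 1 == len(template):
            out.append(ch)          # ordinary character, or a lone final "$"
            i += 1
            continue
        m = _GROUP_REF.match(template, i + 1)
        if m:
            number = int(m.group(1) or m.group(2))
            if number == 0:
                # Assumption: the sketch rejects zero outright.
                raise ValueError("group number must be greater than zero")
            if number <= len(captures) and captures[number - 1] is not None:
                out.append(captures[number - 1])
            # Too-large or unset groups contribute an empty string.
            i = m.end()
        else:
            out.append(template[i + 1])   # "$$" gives "$", "$|" gives "|"
            i += 2
    return "".join(out)
```

For example, with the captures from matching (.)(..(.)) against "abcde", the template "$1-${3}$$$|" expands to "a-d$|".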
|
||||
|
||||
Any other character is substituted by itself. In particular, $$ is
|
||||
replaced by a single dollar and $| is replaced by a pipe character.
|
||||
Here is an example:
|
||||
Any other character is substituted by itself. In particular, $$ is re-
|
||||
placed by a single dollar and $| is replaced by a pipe character. Here
|
||||
is an example:
|
||||
|
||||
echo -e "abcde\n12345" | pcre2grep \
|
||||
'(?x)(.)(..(.))
|
||||
|
@ -914,10 +913,10 @@ USING PCRE2'S CALLOUT FACILITY
|
|||
to the output, having been passed through the same escape processing as
|
||||
text from the --output option. This provides a simple echoing facility
|
||||
that avoids calling an external program or script. No terminator is
|
||||
added to the string, so if you want a newline, you must include it
|
||||
explicitly. Matching continues normally after the string is output. If
|
||||
you want to see only the callout output but not any output from an
|
||||
actual match, you should end the relevant pattern with (*FAIL).
|
||||
added to the string, so if you want a newline, you must include it ex-
|
||||
plicitly. Matching continues normally after the string is output. If
|
||||
you want to see only the callout output but not any output from an ac-
|
||||
tual match, you should end the relevant pattern with (*FAIL).
|
||||
|
||||
|
||||
MATCHING ERRORS
|
||||
|
@ -925,8 +924,8 @@ MATCHING ERRORS
|
|||
It is possible to supply a regular expression that takes a very long
|
||||
time to fail to match certain lines. Such patterns normally involve
|
||||
nested indefinite repeats, for example: (a+)*\d when matched against a
|
||||
line of a's with no final digit. The PCRE2 matching function has a
|
||||
resource limit that causes it to abort in these circumstances. If this
|
||||
line of a's with no final digit. The PCRE2 matching function has a re-
|
||||
source limit that causes it to abort in these circumstances. If this
|
||||
happens, pcre2grep outputs an error message and the line that caused
|
||||
the problem to the standard error stream. If there are more than 20
|
||||
such errors, pcre2grep gives up.
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2PATTERN 3 "21 June 2019" "PCRE2 10.34"
|
||||
.TH PCRE2PATTERN 3 "22 June 2019" "PCRE2 10.34"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
|
||||
|
@ -3564,9 +3564,10 @@ effect as this example; although it would suppress backtracking during the
|
|||
first match attempt, the second attempt would start at the second character
|
||||
instead of skipping on to "c".
|
||||
.P
|
||||
If (*SKIP) is used inside a lookbehind to specify a new starting position that
|
||||
is not later than the starting point of the current match, the position
|
||||
specified by (*SKIP) is ignored, and instead the normal "bumpalong" occurs.
|
||||
If (*SKIP) is used to specify a new starting position that is the same as the
|
||||
starting position of the current match, or (by being inside a lookbehind)
|
||||
earlier, the position specified by (*SKIP) is ignored, and instead the normal
|
||||
"bumpalong" occurs.
|
||||
.sp
|
||||
(*SKIP:NAME)
|
||||
.sp
|
||||
|
@ -3787,6 +3788,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 21 June 2019
|
||||
Last updated: 22 June 2019
|
||||
Copyright (c) 1997-2019 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -13,8 +13,8 @@ SYNOPSIS
|
|||
but it can also be used for experimenting with regular expressions.
|
||||
This document describes the features of the test program; for details
|
||||
of the regular expressions themselves, see the pcre2pattern documenta-
|
||||
tion. For details of the PCRE2 library function calls and their
|
||||
options, see the pcre2api documentation.
|
||||
tion. For details of the PCRE2 library function calls and their op-
|
||||
tions, see the pcre2api documentation.
|
||||
|
||||
The input for pcre2test is a sequence of regular expression patterns
|
||||
and subject strings to be matched. There are also command lines for
|
||||
|
@ -33,26 +33,26 @@ SYNOPSIS
|
|||
which are specifically designed for use in conjunction with the test
|
||||
script and data files that are distributed as part of PCRE2. All the
|
||||
modifiers are documented here, some without much justification, but
|
||||
many of them are unlikely to be of use except when testing the
|
||||
libraries.
|
||||
many of them are unlikely to be of use except when testing the li-
|
||||
braries.
|
||||
|
||||
|
||||
PCRE2's 8-BIT, 16-BIT AND 32-BIT LIBRARIES
|
||||
|
||||
Different versions of the PCRE2 library can be built to support charac-
|
||||
ter strings that are encoded in 8-bit, 16-bit, or 32-bit code units.
|
||||
One, two, or all three of these libraries may be simultaneously
|
||||
installed. The pcre2test program can be used to test all the libraries.
|
||||
One, two, or all three of these libraries may be simultaneously in-
|
||||
stalled. The pcre2test program can be used to test all the libraries.
|
||||
However, its own input and output are always in 8-bit format. When
|
||||
testing the 16-bit or 32-bit libraries, patterns and subject strings
|
||||
are converted to 16-bit or 32-bit format before being passed to the
|
||||
library functions. Results are converted back to 8-bit code units for
|
||||
are converted to 16-bit or 32-bit format before being passed to the li-
|
||||
brary functions. Results are converted back to 8-bit code units for
|
||||
output.
|
||||
|
||||
In the rest of this document, the names of library functions and struc-
|
||||
tures are given in generic form, for example, pcre2_compile(). The
|
||||
actual names used in the libraries have a suffix _8, _16, or _32, as
|
||||
appropriate.
|
||||
tures are given in generic form, for example, pcre2_compile(). The ac-
|
||||
tual names used in the libraries have a suffix _8, _16, or _32, as ap-
|
||||
propriate.
|
||||
|
||||
|
||||
INPUT ENCODING
|
||||
|
@ -70,18 +70,18 @@ INPUT ENCODING
|
|||
processed for backslash escapes, which makes it possible to include any
|
||||
data value in strings that are passed to the library for matching. For
|
||||
patterns, there is a facility for specifying some or all of the 8-bit
|
||||
input characters as hexadecimal pairs, which makes it possible to
|
||||
include binary zeros.
|
||||
input characters as hexadecimal pairs, which makes it possible to in-
|
||||
clude binary zeros.
|
||||
|
||||
Input for the 16-bit and 32-bit libraries
|
||||
|
||||
When testing the 16-bit or 32-bit libraries, there is a need to be able
|
||||
to generate character code points greater than 255 in the strings that
|
||||
are passed to the library. For subject lines, backslash escapes can be
|
||||
used. In addition, when the utf modifier (see "Setting compilation
|
||||
options" below) is set, the pattern and any following subject lines are
|
||||
interpreted as UTF-8 strings and translated to UTF-16 or UTF-32 as
|
||||
appropriate.
|
||||
used. In addition, when the utf modifier (see "Setting compilation op-
|
||||
tions" below) is set, the pattern and any following subject lines are
|
||||
interpreted as UTF-8 strings and translated to UTF-16 or UTF-32 as ap-
|
||||
propriate.
|
||||
|
||||
For non-UTF testing of wide characters, the utf8_input modifier can be
|
||||
used. This is mutually exclusive with utf, and is allowed only in
|
||||
|
@ -121,8 +121,8 @@ COMMAND LINE OPTIONS
|
|||
piled.
|
||||
|
||||
-AC As for -ac, but in addition behave as if each subject line
|
||||
has the callout_extra modifier, that is, show additional
|
||||
information from callouts.
|
||||
has the callout_extra modifier, that is, show additional in-
|
||||
formation from callouts.
|
||||
|
||||
-b Behave as if each pattern has the fullbincode modifier; the
|
||||
full internal binary form of the pattern is output after com-
|
||||
|
@ -130,9 +130,9 @@ COMMAND LINE OPTIONS
|
|||
|
||||
-C Output the version number of the PCRE2 library, and all
|
||||
available information about the optional features that are
|
||||
included, and then exit with zero exit code. All other
|
||||
options are ignored. If both -C and -LM are present, which-
|
||||
ever is first is recognized.
|
||||
included, and then exit with zero exit code. All other op-
|
||||
tions are ignored. If both -C and -LM are present, whichever
|
||||
is first is recognized.
|
||||
|
||||
-C option Output information about a specific build-time option, then
|
||||
exit. This functionality is intended for use in scripts such
|
||||
|
@ -269,8 +269,8 @@ DESCRIPTION
|
|||
supply them explicitly.
|
||||
|
||||
An empty line or the end of the file signals the end of the subject
|
||||
lines for a test, at which point a new pattern or command line is
|
||||
expected if there is still input to be read.
|
||||
lines for a test, at which point a new pattern or command line is ex-
|
||||
pected if there is still input to be read.
|
||||
|
||||
|
||||
COMMAND LINES
|
||||
|
@ -311,8 +311,8 @@ COMMAND LINES
|
|||
as indicating a newline in a pattern or subject string. The default can
|
||||
be overridden when a pattern is compiled. The standard test files con-
|
||||
tain tests of various newline conventions, but the majority of the
|
||||
tests expect a single linefeed to be recognized as a newline by
|
||||
default. Without special action the tests would fail when PCRE2 is com-
|
||||
tests expect a single linefeed to be recognized as a newline by de-
|
||||
fault. Without special action the tests would fail when PCRE2 is com-
|
||||
piled with either CR or CRLF as the default newline.
|
||||
|
||||
The #newline_default command specifies a list of newline types that are
|
||||
|
@ -323,14 +323,14 @@ COMMAND LINES
|
|||
|
||||
If the default newline is in the list, this command has no effect. Oth-
|
||||
erwise, except when testing the POSIX API, a newline modifier that
|
||||
specifies the first newline convention in the list (LF in the above
|
||||
example) is added to any pattern that does not already have a newline
|
||||
specifies the first newline convention in the list (LF in the above ex-
|
||||
ample) is added to any pattern that does not already have a newline
|
||||
modifier. If the newline list is empty, the feature is turned off. This
|
||||
command is present in a number of the standard test input files.
|
||||
|
||||
When the POSIX API is being tested there is no way to override the
|
||||
default newline convention, though it is possible to set the newline
|
||||
convention from within the pattern. A warning is given if the posix or
|
||||
When the POSIX API is being tested there is no way to override the de-
|
||||
fault newline convention, though it is possible to set the newline con-
|
||||
vention from within the pattern. A warning is given if the posix or
|
||||
posix_nosub modifier is used when #newline_default would set a default
|
||||
for the non-POSIX API.
|
||||
|
||||
|
@ -344,8 +344,8 @@ COMMAND LINES
|
|||
The appearance of this line causes all subsequent modifier settings to
|
||||
be checked for compatibility with the perltest.sh script, which is used
|
||||
to confirm that Perl gives the same results as PCRE2. Also, apart from
|
||||
comment lines, #pattern commands, and #subject commands that set or
|
||||
unset "mark", no command lines are permitted, because they and many of
|
||||
comment lines, #pattern commands, and #subject commands that set or un-
|
||||
set "mark", no command lines are permitted, because they and many of
|
||||
the modifiers are specific to pcre2test, and should not be used in test
|
||||
files that are also processed by perltest.sh. The #perltest command
|
||||
helps detect tests that are accidentally put in the wrong file.
|
||||
|
@ -376,8 +376,8 @@ MODIFIER SYNTAX
|
|||
list are separated by commas followed by optional white space. Trailing
|
||||
whitespace in a modifier list is ignored. Some modifiers may be given
|
||||
for both patterns and subject lines, whereas others are valid only for
|
||||
one or the other. Each modifier has a long name, for example
|
||||
"anchored", and some of them must be followed by an equals sign and a
|
||||
one or the other. Each modifier has a long name, for example "an-
|
||||
chored", and some of them must be followed by an equals sign and a
|
||||
value, for example, "offset=12". Values cannot contain comma charac-
|
||||
ters, but may contain spaces. Modifiers that do not take values may be
|
||||
preceded by a minus sign to turn off a previous setting.
|
||||
|
@ -498,8 +498,8 @@ SUBJECT LINE SYNTAX
|
|||
\= This is a comment.
|
||||
abc\= This is an invalid modifier list.
|
||||
|
||||
A backslash followed by any other non-alphanumeric character just
|
||||
escapes that character. A backslash followed by anything else causes an
|
||||
A backslash followed by any other non-alphanumeric character just es-
|
||||
capes that character. A backslash followed by anything else causes an
|
||||
error. However, if the very last character in the line is a backslash
|
||||
(and there is no modifier list), it is ignored. This gives a way of
|
||||
passing an empty line as data, since a real empty line terminates the
|
||||
|
@ -523,13 +523,13 @@ PATTERN MODIFIERS
|
|||
The following modifiers set options for pcre2_compile(). Most of them
|
||||
set bits in the options argument of that function, but those whose
|
||||
names start with PCRE2_EXTRA are additional options that are set in the
|
||||
compile context. For the main options, there are some single-letter
|
||||
abbreviations that are the same as Perl options. There is special han-
|
||||
compile context. For the main options, there are some single-letter ab-
|
||||
breviations that are the same as Perl options. There is special han-
|
||||
dling for /x: if a second x is present, PCRE2_EXTENDED is converted
|
||||
into PCRE2_EXTENDED_MORE as in Perl. A third appearance adds
|
||||
PCRE2_EXTENDED as well, though this makes no difference to the way
|
||||
pcre2_compile() behaves. See pcre2api for a description of the effects
|
||||
of these options.
|
||||
into PCRE2_EXTENDED_MORE as in Perl. A third appearance adds PCRE2_EX-
|
||||
TENDED as well, though this makes no difference to the way pcre2_com-
|
||||
pile() behaves. See pcre2api for a description of the effects of these
|
||||
options.
|
||||
|
||||
allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS
|
||||
allow_surrogate_escapes set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
|
||||
|
@ -577,9 +577,9 @@ PATTERN MODIFIERS
|
|||
|
||||
Setting compilation controls
|
||||
|
||||
The following modifiers affect the compilation process or request
|
||||
information about the pattern. There are single-letter abbreviations
|
||||
for some that are heavily used in the test files.
|
||||
The following modifiers affect the compilation process or request in-
|
||||
formation about the pattern. There are single-letter abbreviations for
|
||||
some that are heavily used in the test files.
|
||||
|
||||
bsr=[anycrlf|unicode] specify \R handling
|
||||
/B bincode show binary code without lengths
|
||||
|
@ -717,8 +717,8 @@ PATTERN MODIFIERS
|
|||
minated strings but can be passed by length instead of being zero-ter-
|
||||
minated. The use_length modifier causes this to happen. Using a length
|
||||
happens automatically (whether or not use_length is set) when hex is
|
||||
set, because patterns specified in hexadecimal may contain binary
|
||||
zeros.
|
||||
set, because patterns specified in hexadecimal may contain binary ze-
|
||||
ros.
|
||||
|
||||
If hex or use_length is used with the POSIX wrapper API (see "Using the
|
||||
POSIX wrapper API" below), the REG_PEND extension is used to pass the
|
||||
|
@ -770,8 +770,8 @@ PATTERN MODIFIERS
|
|||
partial modifier in "Subject Modifiers" below for details of how these
|
||||
options are specified for each match attempt.
|
||||
|
||||
JIT compilation is requested by the jit pattern modifier, which may
|
||||
optionally be followed by an equals sign and a number in the range 0 to
|
||||
JIT compilation is requested by the jit pattern modifier, which may op-
|
||||
tionally be followed by an equals sign and a number in the range 0 to
|
||||
7. The three bits that make up the number specify which of the three
|
||||
JIT operating modes are to be compiled:
|
||||
|
||||
|
@ -799,8 +799,8 @@ PATTERN MODIFIERS
|
|||
none was compiled for non-partial matching.
|
||||
|
||||
If JIT compilation is successful, the compiled JIT code will automati-
|
||||
cally be used when an appropriate type of match is run, except when
|
||||
incompatible run-time options are specified. For more details, see the
|
||||
cally be used when an appropriate type of match is run, except when in-
|
||||
compatible run-time options are specified. For more details, see the
|
||||
pcre2jit documentation. See also the jitstack modifier below for a way
|
||||
of setting the size of the JIT stack.
|
||||
|
||||
|
@ -847,8 +847,8 @@ PATTERN MODIFIERS
|
|||
Limiting nested parentheses
|
||||
|
||||
The parens_nest_limit modifier sets a limit on the depth of nested
|
||||
parentheses in a pattern. Breaching the limit causes a compilation
|
||||
error. The default for the library is set when PCRE2 is built, but
|
||||
parentheses in a pattern. Breaching the limit causes a compilation er-
|
||||
ror. The default for the library is set when PCRE2 is built, but
|
||||
pcre2test sets its own default of 220, which is required for running
|
||||
the standard test suite.
|
||||
|
||||
|
@ -886,13 +886,13 @@ PATTERN MODIFIERS
|
|||
buffer is too small for the error message. If this modifier has not
|
||||
been set, a large buffer is used.
|
||||
|
||||
The aftertext and allaftertext subject modifiers work as described
|
||||
below. All other modifiers are either ignored, with a warning message,
|
||||
or cause an error.
|
||||
The aftertext and allaftertext subject modifiers work as described be-
|
||||
low. All other modifiers are either ignored, with a warning message, or
|
||||
cause an error.
|
||||
|
||||
The pattern is passed to regcomp() as a zero-terminated string by
|
||||
default, but if the use_length or hex modifiers are set, the REG_PEND
|
||||
extension is used to pass it by length.
|
||||
The pattern is passed to regcomp() as a zero-terminated string by de-
|
||||
fault, but if the use_length or hex modifiers are set, the REG_PEND ex-
|
||||
tension is used to pass it by length.
|
||||
|
||||
Testing the stack guard feature
|
||||
|
||||
|
@ -920,8 +920,8 @@ PATTERN MODIFIERS
|
|||
2 a set of tables defining ISO 8859 characters
|
||||
|
||||
In table 2, some characters whose codes are greater than 128 are iden-
|
||||
tified as letters, digits, spaces, etc. Setting alternate character
|
||||
tables and a locale are mutually exclusive.
|
||||
tified as letters, digits, spaces, etc. Setting alternate character ta-
|
||||
bles and a locale are mutually exclusive.
|
||||
|
||||
Setting certain match controls
|
||||
|
||||
|
@ -971,12 +971,12 @@ PATTERN MODIFIERS
|
|||
terns" below. If pushcopy is used instead of push, a copy of the com-
|
||||
piled pattern is stacked, leaving the original as current, ready to
|
||||
match the following input lines. This provides a way of testing the
|
||||
pcre2_code_copy() function. The push and pushcopy modifiers are
|
||||
incompatible with compilation modifiers such as global that act at
|
||||
match time. Any that are specified are ignored (for the stacked copy),
|
||||
with a warning message, except for replace, which causes an error. Note
|
||||
that jitverify, which is allowed, does not carry through to any subse-
|
||||
quent matching that uses a stacked pattern.
|
||||
pcre2_code_copy() function. The push and pushcopy modifiers are in-
|
||||
compatible with compilation modifiers such as global that act at match
|
||||
time. Any that are specified are ignored (for the stacked copy), with a
|
||||
warning message, except for replace, which causes an error. Note that
|
||||
jitverify, which is allowed, does not carry through to any subsequent
|
||||
matching that uses a stacked pattern.
|
||||
|
||||
Testing foreign pattern conversion
|
||||
|
||||
|
@ -1124,12 +1124,12 @@ SUBJECT MODIFIERS
|
|||
The allusedtext modifier requests that all the text that was consulted
|
||||
during a successful pattern match by the interpreter should be shown.
|
||||
This feature is not supported for JIT matching, and if requested with
|
||||
JIT it is ignored (with a warning message). Setting this modifier
|
||||
affects the output if there is a lookbehind at the start of a match, or
|
||||
a lookahead at the end, or if \K is used in the pattern. Characters
|
||||
that precede or follow the start and end of the actual match are indi-
|
||||
cated in the output by '<' or '>' characters underneath them. Here is
|
||||
an example:
|
||||
JIT it is ignored (with a warning message). Setting this modifier af-
|
||||
fects the output if there is a lookbehind at the start of a match, or a
|
||||
lookahead at the end, or if \K is used in the pattern. Characters that
|
||||
precede or follow the start and end of the actual match are indicated
|
||||
in the output by '<' or '>' characters underneath them. Here is an ex-
|
||||
ample:
|
||||
|
||||
re> /(?<=pqr)abc(?=xyz)/
|
||||
data> 123pqrabcxyz456\=allusedtext
|
||||
|
@ -1145,8 +1145,8 @@ SUBJECT MODIFIERS
|
|||
string. The only time when this occurs is when \K has been processed as
|
||||
part of the match. In this situation, the output for the matched string
|
||||
is displayed from the starting character instead of from the match
|
||||
point, with circumflex characters under the earlier characters. For
|
||||
example:
|
||||
point, with circumflex characters under the earlier characters. For ex-
|
||||
ample:
|
||||
|
||||
re> /abc\Kxyz/
|
||||
data> abcxyz\=startchar
|
||||
|
@ -1171,12 +1171,12 @@ SUBJECT MODIFIERS
|
|||
The allvector modifier requests that the entire ovector be shown, what-
|
||||
ever the outcome of the match. Compare allcaptures, which shows only up
|
||||
to the maximum number of capture groups for the pattern, and then only
|
||||
for a successful complete non-DFA match. This modifier, which acts
|
||||
after any match result, and also for DFA matching, provides a means of
|
||||
for a successful complete non-DFA match. This modifier, which acts af-
|
||||
ter any match result, and also for DFA matching, provides a means of
|
||||
checking that there are no unexpected modifications to ovector fields.
|
||||
Before each match attempt, the ovector is filled with a special value,
|
||||
and if this is found in both elements of a capturing pair,
|
||||
"<unchanged>" is output. After a successful match, this applies to all
|
||||
and if this is found in both elements of a capturing pair, "<un-
|
||||
changed>" is output. After a successful match, this applies to all
|
||||
groups after the maximum capture group for the pattern. In other cases
|
||||
it applies to the entire ovector. After a partial match, the first two
|
||||
elements are the only ones that should be set. After a DFA match, the
|
||||
|
@ -1207,12 +1207,12 @@ SUBJECT MODIFIERS
|
|||
If an empty string is matched, the next match is done with the
|
||||
PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search
|
||||
for another, non-empty, match at the same point in the subject. If this
|
||||
match fails, the start offset is advanced, and the normal match is
|
||||
retried. This imitates the way Perl handles such cases when using the
|
||||
/g modifier or the split() function. Normally, the start offset is
|
||||
advanced by one character, but if the newline convention recognizes
|
||||
CRLF as a newline, and the current character is CR followed by LF, an
|
||||
advance of two characters occurs.
|
||||
match fails, the start offset is advanced, and the normal match is re-
|
||||
tried. This imitates the way Perl handles such cases when using the /g
|
||||
modifier or the split() function. Normally, the start offset is ad-
|
||||
vanced by one character, but if the newline convention recognizes CRLF
|
||||
as a newline, and the current character is CR followed by LF, an ad-
|
||||
vance of two characters occurs.
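
The advance rule in the last sentence can be sketched as below. This is an illustrative Python fragment only, not pcre2test's code; the function name next_start_offset and the boolean crlf_is_newline are hypothetical, and offsets are counted in characters rather than code units.

```python
def next_start_offset(subject, offset, crlf_is_newline):
    """Return the start offset for the retry after an empty match at
    offset, when the anchored non-empty attempt has also failed."""
    # CR followed by LF is stepped over as a unit, but only when the
    # newline convention in force recognizes CRLF as a newline.
    if crlf_is_newline and subject.startswith("\r\n", offset):
        return offset + 2
    # Otherwise advance by a single character.
    return offset + 1
```

With the subject "a\r\nb" and an empty match at offset 1, the retry starts at offset 3 when CRLF is a recognized newline and at offset 2 otherwise.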
|
||||
|
||||
Testing substring extraction functions
|
||||
|
||||
|
@ -1275,8 +1275,8 @@ SUBJECT MODIFIERS
|
|||
than 256 characters) for substitution tests, as fixed-size buffers are
|
||||
used. To make it easy to test for buffer overflow, if the replacement
|
||||
string starts with a number in square brackets, that number is passed
|
||||
to pcre2_substitute() as the size of the output buffer, with the
|
||||
replacement string starting at the next character. Here is an example
|
||||
to pcre2_substitute() as the size of the output buffer, with the re-
|
||||
placement string starting at the next character. Here is an example
|
||||
that tests the edge case:
|
||||
|
||||
/abc/
|
||||
|
@ -1285,10 +1285,10 @@ SUBJECT MODIFIERS
|
|||
123abc123\=replace=[9]XYZ
|
||||
Failed: error -47: no more memory
|
||||
|
||||
The default action of pcre2_substitute() is to return
|
||||
PCRE2_ERROR_NOMEMORY when the output buffer is too small. However, if
|
||||
the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the sub-
|
||||
stitute_overflow_length modifier), pcre2_substitute() continues to go
|
||||
The default action of pcre2_substitute() is to return PCRE2_ER-
|
||||
ROR_NOMEMORY when the output buffer is too small. However, if the
|
||||
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the substi-
|
||||
tute_overflow_length modifier), pcre2_substitute() continues to go
|
||||
through the motions of matching and substituting (but not doing any
|
||||
callouts), in order to compute the size of buffer that is required.
|
||||
When this happens, pcre2test shows the required buffer length (which
|
||||
|
@ -1323,8 +1323,8 @@ SUBJECT MODIFIERS
|
|||
Next, the offsets and contents of the old substring are listed, and then the
|
||||
same for the replacement.
|
||||
|
||||
By default, the substitution callout function returns zero, which
|
||||
accepts the replacement and causes matching to continue if /g was used.
|
||||
By default, the substitution callout function returns zero, which ac-
|
||||
cepts the replacement and causes matching to continue if /g was used.
|
||||
Two further modifiers can be used to test other return values. If sub-
|
||||
stitute_skip is set to a value greater than zero the callout function
|
||||
returns +1 for the match of that number, and similarly substitute_stop
|
||||
|
@ -1411,8 +1411,8 @@ SUBJECT MODIFIERS
|
|||
|
||||
The memory modifier causes pcre2test to log the sizes of all heap mem-
|
||||
ory allocation and freeing calls that occur during a call to
|
||||
pcre2_match() or pcre2_dfa_match(). These occur only when a match
|
||||
requires a bigger vector than the default for remembering backtracking
|
||||
pcre2_match() or pcre2_dfa_match(). These occur only when a match re-
|
||||
quires a bigger vector than the default for remembering backtracking
|
||||
points (pcre2_match()) or for internal workspace (pcre2_dfa_match()).
|
||||
In many cases there will be no heap memory used and therefore no addi-
|
||||
tional output. No heap memory is allocated during matching with JIT, so
|
||||
|
@ -1435,9 +1435,9 @@ SUBJECT MODIFIERS
|
|||
|
||||
Setting the size of the output vector
|
||||
|
||||
The ovector modifier applies only to the subject line in which it
|
||||
appears, though of course it can also be used to set a default in a
|
||||
#subject command. It specifies the number of pairs of offsets that are
|
||||
The ovector modifier applies only to the subject line in which it ap-
|
||||
pears, though of course it can also be used to set a default in a #sub-
|
||||
ject command. It specifies the number of pairs of offsets that are
|
||||
available for storing matching information. The default is 15.
|
||||
|
||||
A value of zero is useful when testing the POSIX API because it causes
|
||||
|
@ -1491,12 +1491,12 @@ DEFAULT OUTPUT FROM pcre2test
|
|||
|
||||
When a match succeeds, pcre2test outputs the list of captured sub-
|
||||
strings, starting with number 0 for the string that matched the whole
|
||||
pattern. Otherwise, it outputs "No match" when the return is
|
||||
PCRE2_ERROR_NOMATCH, or "Partial match:" followed by the partially
|
||||
matching substring when the return is PCRE2_ERROR_PARTIAL. (Note that
|
||||
this is the entire substring that was inspected during the partial
|
||||
match; it may include characters before the actual match start if a
|
||||
lookbehind assertion, \K, \b, or \B was involved.)
|
||||
pattern. Otherwise, it outputs "No match" when the return is PCRE2_ER-
|
||||
ROR_NOMATCH, or "Partial match:" followed by the partially matching
|
||||
substring when the return is PCRE2_ERROR_PARTIAL. (Note that this is
|
||||
the entire substring that was inspected during the partial match; it
|
||||
may include characters before the actual match start if a lookbehind
|
||||
assertion, \K, \b, or \B was involved.)
|
||||
|
||||
For any other return, pcre2test outputs the PCRE2 negative error number
|
||||
and a short descriptive phrase. If the error is a failed UTF string
|
||||
|
@ -1541,8 +1541,8 @@ DEFAULT OUTPUT FROM pcre2test
|
|||
0: cat
|
||||
0+ aract
|
||||
|
||||
If global matching is requested, the results of successive matching
|
||||
attempts are output in sequence, like this:
|
||||
If global matching is requested, the results of successive matching at-
|
||||
tempts are output in sequence, like this:
|
||||
|
||||
re> /\Bi(\w\w)/g
|
||||
data> Mississippi
|
||||
|
@ -1580,12 +1580,12 @@ OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
|
|||
2: tan
|
||||
|
||||
Using the normal matching function on this data finds only "tang". The
|
||||
longest matching string is always given first (and numbered zero).
|
||||
After a PCRE2_ERROR_PARTIAL return, the output is "Partial match:",
|
||||
followed by the partially matching substring. Note that this is the
|
||||
entire substring that was inspected during the partial match; it may
|
||||
include characters before the actual match start if a lookbehind asser-
|
||||
tion, \b, or \B was involved. (\K is not supported for DFA matching.)
|
||||
longest matching string is always given first (and numbered zero). Af-
|
||||
ter a PCRE2_ERROR_PARTIAL return, the output is "Partial match:", fol-
|
||||
lowed by the partially matching substring. Note that this is the entire
|
||||
substring that was inspected during the partial match; it may include
|
||||
characters before the actual match start if a lookbehind assertion, \b,
|
||||
or \B was involved. (\K is not supported for DFA matching.)
|
||||
|
||||
If global matching is requested, the search for further matches resumes
|
||||
at the end of the longest match. For example:
|
||||
|
@ -1638,12 +1638,12 @@ CALLOUTS
|
|||
--->pqrabcdef
|
||||
0 ^ ^ \d
|
||||
|
||||
This output indicates that callout number 0 occurred for a match
|
||||
attempt starting at the fourth character of the subject string, when
|
||||
the pointer was at the seventh character, and when the next pattern
|
||||
item was \d. Just one circumflex is output if the start and current
|
||||
positions are the same, or if the current position precedes the start
|
||||
position, which can happen if the callout is in a lookbehind assertion.
|
||||
This output indicates that callout number 0 occurred for a match at-
|
||||
tempt starting at the fourth character of the subject string, when the
|
||||
pointer was at the seventh character, and when the next pattern item
|
||||
was \d. Just one circumflex is output if the start and current posi-
|
||||
tions are the same, or if the current position precedes the start posi-
|
||||
tion, which can happen if the callout is in a lookbehind assertion.
|
||||
|
||||
Callouts numbered 255 are assumed to be automatic callouts, inserted as
|
||||
a result of the auto_callout pattern modifier. In this case, instead of
|
||||
|
@ -1660,8 +1660,8 @@ CALLOUTS
|
|||
0: E*
|
||||
|
||||
If a pattern contains (*MARK) items, an additional line is output when-
|
||||
ever a change of latest mark is passed to the callout function. For
|
||||
example:
|
||||
ever a change of latest mark is passed to the callout function. For ex-
|
||||
ample:
|
||||
|
||||
re> /a(*MARK:X)bc/auto_callout
|
||||
data> abc
|
||||
|
@ -1683,8 +1683,8 @@ CALLOUTS
|
|||
|
||||
The output for a callout with a string argument is similar, except that
|
||||
instead of outputting a callout number before the position indicators,
|
||||
the callout string and its offset in the pattern string are output
|
||||
before the reflection of the subject string, and the subject string is
|
||||
the callout string and its offset in the pattern string are output be-
|
||||
fore the reflection of the subject string, and the subject string is
|
||||
reflected for each callout. For example:
|
||||
|
||||
re> /^ab(?C'first')cd(?C"second")ef/
|
||||
|
@ -1800,9 +1800,9 @@ NON-PRINTING CHARACTERS
|
|||
|
||||
When pcre2test is outputting text that is a matched part of a subject
|
||||
string, it behaves in the same way, unless a different locale has been
|
||||
set for the pattern (using the locale modifier). In this case, the
|
||||
isprint() function is used to distinguish printing and non-printing
|
||||
characters.
|
||||
set for the pattern (using the locale modifier). In this case, the is-
|
||||
print() function is used to distinguish printing and non-printing char-
|
||||
acters.
|
||||
|
||||
|
||||
SAVING AND RESTORING COMPILED PATTERNS
|
||||
|
@ -1814,14 +1814,14 @@ SAVING AND RESTORING COMPILED PATTERNS
|
|||
have the same endianness, pointer width and PCRE2_SIZE type. Before
|
||||
compiled patterns can be saved they must be serialized, that is, con-
|
||||
verted to a stream of bytes. A single byte stream may contain any num-
|
||||
ber of compiled patterns, but they must all use the same character
|
||||
tables. A single copy of the tables is included in the byte stream (its
|
||||
ber of compiled patterns, but they must all use the same character ta-
|
||||
bles. A single copy of the tables is included in the byte stream (its
|
||||
size is 1088 bytes).
|
||||
|
||||
The functions whose names begin with pcre2_serialize_ are used for
|
||||
serializing and de-serializing. They are described in the pcre2serial-
|
||||
ize documentation. In this section we describe the features of
|
||||
pcre2test that can be used to test these functions.
|
||||
The functions whose names begin with pcre2_serialize_ are used for se-
|
||||
rializing and de-serializing. They are described in the pcre2serialize
|
||||
documentation. In this section we describe the features of pcre2test
|
||||
that can be used to test these functions.
|
||||
|
||||
Note that "serialization" in PCRE2 does not convert compiled patterns
|
||||
to an abstract format like Java or .NET. It just makes a reloadable
|
||||
|
@ -1831,8 +1831,8 @@ SAVING AND RESTORING COMPILED PATTERNS
|
|||
piled, it is pushed onto a stack of compiled patterns, and pcre2test
|
||||
expects the next line to contain a new pattern (or command) instead of
|
||||
a subject line. By contrast, the pushcopy modifier causes a copy of the
|
||||
compiled pattern to be stacked, leaving the original available for
|
||||
immediate matching. By using push and/or pushcopy, a number of patterns
|
||||
compiled pattern to be stacked, leaving the original available for im-
|
||||
mediate matching. By using push and/or pushcopy, a number of patterns
|
||||
can be compiled and retained. These modifiers are incompatible with
|
||||
posix, and control modifiers that act at match time are ignored (with a
|
||||
message) for the stacked patterns. The jitverify modifier applies only
|
||||
|
@ -1855,8 +1855,8 @@ SAVING AND RESTORING COMPILED PATTERNS
|
|||
matched with the pattern, terminated as usual by an empty line or end
|
||||
of file. This command may be followed by a modifier list containing
|
||||
only control modifiers that act after a pattern has been compiled. In
|
||||
particular, hex, posix, posix_nosub, push, and pushcopy are not
|
||||
allowed, nor are any option-setting modifiers. The JIT modifiers are,
|
||||
particular, hex, posix, posix_nosub, push, and pushcopy are not al-
|
||||
lowed, nor are any option-setting modifiers. The JIT modifiers are,
|
||||
however permitted. Here is an example that saves and reloads two pat-
|
||||
terns.
|
||||
|
||||
|
|