Documentation update
parent 25cec7a966
commit 935183d2d5
|
@ -43,26 +43,25 @@ offsets to captured substrings. Its arguments are:
|
||||||
A match context is needed only if you want to:
|
A match context is needed only if you want to:
|
||||||
<pre>
|
<pre>
|
||||||
Set up a callout function
|
Set up a callout function
|
||||||
Change the limit for calling the internal function <i>match()</i>
|
Set a matching offset limit
|
||||||
Change the limit for calling <i>match()</i> recursively
|
Change the backtracking match limit
|
||||||
Set custom memory management when the heap is used for recursion
|
Change the backtracking depth limit
|
||||||
|
Set custom memory management in the match context
|
||||||
</pre>
|
</pre>
|
||||||
The <i>length</i> and <i>startoffset</i> values are code
|
The <i>length</i> and <i>startoffset</i> values are code
|
||||||
units, not characters. The options are:
|
units, not characters. The length may be given as PCRE2_ZERO_TERMINATED for a
|
||||||
|
subject that is terminated by a binary zero code unit. The options are:
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_ANCHORED Match only at the first position
|
PCRE2_ANCHORED Match only at the first position
|
||||||
PCRE2_NOTBOL Subject string is not the beginning of a line
|
PCRE2_NOTBOL Subject string is not the beginning of a line
|
||||||
PCRE2_NOTEOL Subject string is not the end of a line
|
PCRE2_NOTEOL Subject string is not the end of a line
|
||||||
PCRE2_NOTEMPTY An empty string is not a valid match
|
PCRE2_NOTEMPTY An empty string is not a valid match
|
||||||
PCRE2_NOTEMPTY_ATSTART An empty string at the start of the subject
|
PCRE2_NOTEMPTY_ATSTART An empty string at the start of the subject is not a valid match
|
||||||
is not a valid match
|
PCRE2_NO_JIT Do not use JIT matching
|
||||||
PCRE2_NO_UTF_CHECK Do not check the subject for UTF
|
PCRE2_NO_UTF_CHECK Do not check the subject for UTF validity (only relevant if PCRE2_UTF
|
||||||
validity (only relevant if PCRE2_UTF
|
|
||||||
was set at compile time)
|
was set at compile time)
|
||||||
PCRE2_PARTIAL_SOFT Return PCRE2_ERROR_PARTIAL for a partial
|
PCRE2_PARTIAL_HARD Return PCRE2_ERROR_PARTIAL for a partial match even if there is a full match
|
||||||
match if no full matches are found
|
PCRE2_PARTIAL_SOFT Return PCRE2_ERROR_PARTIAL for a partial match if no full matches are found
|
||||||
PCRE2_PARTIAL_HARD Return PCRE2_ERROR_PARTIAL for a partial match
|
|
||||||
if that is found before a full match
|
|
||||||
</pre>
|
</pre>
|
||||||
For details of partial matching, see the
|
For details of partial matching, see the
|
||||||
<a href="pcre2partial.html"><b>pcre2partial</b></a>
|
<a href="pcre2partial.html"><b>pcre2partial</b></a>
|
||||||
|
|
|
@ -1082,14 +1082,14 @@ pattern.
|
||||||
copy=<number or name> copy captured substring
|
copy=<number or name> copy captured substring
|
||||||
depth_limit=<n> set a depth limit
|
depth_limit=<n> set a depth limit
|
||||||
dfa use <b>pcre2_dfa_match()</b>
|
dfa use <b>pcre2_dfa_match()</b>
|
||||||
find_limits find match and recursion limits
|
find_limits find match and depth limits
|
||||||
get=<number or name> extract captured substring
|
get=<number or name> extract captured substring
|
||||||
getall extract all captured substrings
|
getall extract all captured substrings
|
||||||
/g global global matching
|
/g global global matching
|
||||||
jitstack=<n> set size of JIT stack
|
jitstack=<n> set size of JIT stack
|
||||||
mark show mark values
|
mark show mark values
|
||||||
match_limit=<n> set a match limit
|
match_limit=<n> set a match limit
|
||||||
memory show memory usage
|
memory show heap memory usage
|
||||||
null_context match with a NULL context
|
null_context match with a NULL context
|
||||||
offset=<n> set starting offset
|
offset=<n> set starting offset
|
||||||
offset_limit=<n> set offset limit
|
offset_limit=<n> set offset limit
|
||||||
|
@ -1339,25 +1339,29 @@ limits in the match context. These values are ignored when the
|
||||||
Finding minimum limits
|
Finding minimum limits
|
||||||
</b><br>
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
If the <b>find_limits</b> modifier is present, <b>pcre2test</b> calls
|
If the <b>find_limits</b> modifier is present on a subject line, <b>pcre2test</b>
|
||||||
<b>pcre2_match()</b> several times, setting different values in the match
|
calls the relevant matching function several times, setting different values in
|
||||||
context via <b>pcre2_set_match_limit()</b> and <b>pcre2_set_depth_limit()</b>
|
the match context via <b>pcre2_set_match_limit()</b> or
|
||||||
until it finds the minimum values for each parameter that allow
|
<b>pcre2_set_depth_limit()</b> until it finds the minimum values for each
|
||||||
<b>pcre2_match()</b> to complete without error.
|
parameter that allows the match to complete without error.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
If JIT is being used, only the match limit is relevant. If DFA matching is
|
If JIT is being used, only the match limit is relevant. If DFA matching is
|
||||||
being used, only the depth limit is relevant, but at present this modifier is
|
being used, only the depth limit is relevant.
|
||||||
ignored (with a warning message).
|
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The <i>match_limit</i> number is a measure of the amount of backtracking
|
The <i>match_limit</i> number is a measure of the amount of backtracking
|
||||||
that takes place, and learning the minimum value can be instructive. For most
|
that takes place, and learning the minimum value can be instructive. For most
|
||||||
simple matches, the number is quite small, but for patterns with very large
|
simple matches, the number is quite small, but for patterns with very large
|
||||||
numbers of matching possibilities, it can become large very quickly with
|
numbers of matching possibilities, it can become large very quickly with
|
||||||
increasing length of subject string. The <i>depth_limit</i> number is
|
increasing length of subject string.
|
||||||
a measure of how much memory for recording backtracking points is needed to
|
</P>
|
||||||
complete the match attempt.
|
<P>
|
||||||
|
For non-DFA matching, the minimum <i>depth_limit</i> number is a measure of how
|
||||||
|
much memory for recording backtracking points is needed to complete the match
|
||||||
|
attempt. In the case of DFA matching, <i>depth_limit</i> controls the depth of
|
||||||
|
recursive calls of the internal function that is used for handling pattern
|
||||||
|
recursion, lookaround assertions, and atomic groups.
|
||||||
</P>
|
</P>
|
||||||
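The find_limits text above says that pcre2test repeats the match with different limit values until it finds the smallest one that lets the match complete. The C sketch below shows one way such a search could work: double the trial limit until an attempt succeeds, then bisect between the last failing value and the first succeeding one.

The sketch makes two assumptions. First, raising a limit never turns a success back into a limit error. Second, a limit of zero never suffices. `try_match_fn`, `needs_at_least()` and the `SKETCH_*` codes are hypothetical stand-ins, and this is not the pcre2test source. A real harness would set the limit with `pcre2_set_match_limit()` or `pcre2_set_depth_limit()` on a match context before each attempt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes: "completed" versus a limit error such as
   PCRE2_ERROR_MATCHLIMIT. */
enum { SKETCH_OK = 0, SKETCH_LIMIT_EXCEEDED = -1 };

/* One match attempt run with the given limit (hypothetical callback). */
typedef int (*try_match_fn)(uint32_t limit, void *data);

/* Return the smallest limit for which try_match() does not report
   SKETCH_LIMIT_EXCEEDED, or 0 if even 2^31 is not enough. Assumes the
   outcome is monotonic in the limit and that 0 never suffices. */
static uint32_t find_min_limit(try_match_fn try_match, void *data)
{
  uint32_t fail = 0;  /* largest value known (or assumed) to fail */
  uint32_t ok = 1;    /* trial value; ends as the smallest success */

  assert(try_match != NULL);

  /* Phase 1: double the trial value until an attempt completes. */
  while (try_match(ok, data) == SKETCH_LIMIT_EXCEEDED) {
    if (ok >= 0x80000000u) return 0;   /* give up rather than overflow */
    fail = ok;
    ok *= 2;
  }

  /* Phase 2: bisect, keeping "fail fails" and "ok succeeds" true. */
  while (ok - fail > 1) {
    uint32_t mid = fail + (ok - fail) / 2;
    if (try_match(mid, data) == SKETCH_LIMIT_EXCEEDED)
      fail = mid;
    else
      ok = mid;
  }
  return ok;
}

/* Mock matcher: completes only when given at least *data units. */
static int needs_at_least(uint32_t limit, void *data)
{
  uint32_t needed = *(const uint32_t *)data;
  return limit >= needed ? SKETCH_OK : SKETCH_LIMIT_EXCEEDED;
}
```

Doubling first keeps the number of attempts logarithmic even when the minimum is large, which matters because each attempt is a full match run.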
<br><b>
|
<br><b>
|
||||||
Showing MARK names
|
Showing MARK names
|
||||||
|
@ -1373,8 +1377,11 @@ is added to the non-match message.
|
||||||
Showing memory usage
|
Showing memory usage
|
||||||
</b><br>
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
The <b>memory</b> modifier causes <b>pcre2test</b> to log all memory allocation
|
The <b>memory</b> modifier causes <b>pcre2test</b> to log all heap memory
|
||||||
and freeing calls that occur during a match operation.
|
allocation and freeing calls that occur during a call to <b>pcre2_match()</b>.
|
||||||
|
These occur only when a match requires a bigger vector than the default for
|
||||||
|
remembering backtracking points. In many cases there will be none. No heap
|
||||||
|
memory is allocated during matching with <b>pcre2_dfa_match</b> or with JIT.
|
||||||
</P>
|
</P>
|
||||||
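The new memory text says that heap allocations occur only when a match needs a bigger vector for backtracking points than the default, and that often none occur. The C sketch below illustrates that general "start with fixed storage, spill to the heap" pattern. Its counters stand in for the calls the memory modifier would log.

The default size of 8 slots and all names here (`backtrack_vector`, `bv_push()` and so on) are hypothetical. This is not PCRE2's internal data structure.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical default capacity of the backtracking vector. */
#define SKETCH_DEFAULT_SLOTS 8

typedef struct {
  size_t inline_slots[SKETCH_DEFAULT_SLOTS]; /* default, non-heap storage */
  size_t *slots;          /* inline_slots until the vector outgrows it */
  size_t count;
  size_t capacity;
  unsigned heap_allocs;   /* the events a memory log would report */
  unsigned heap_frees;
} backtrack_vector;

static void bv_init(backtrack_vector *v)
{
  v->slots = v->inline_slots;
  v->count = 0;
  v->capacity = SKETCH_DEFAULT_SLOTS;
  v->heap_allocs = 0;
  v->heap_frees = 0;
}

/* Record a backtracking point, using the heap only when the current
   storage is full. Returns 0 on success, -1 if allocation fails. */
static int bv_push(backtrack_vector *v, size_t point)
{
  assert(v != NULL);
  if (v->count == v->capacity) {
    size_t new_capacity = v->capacity * 2;
    size_t *bigger = malloc(new_capacity * sizeof *bigger);
    if (bigger == NULL) return -1;
    v->heap_allocs++;
    memcpy(bigger, v->slots, v->count * sizeof *bigger);
    if (v->slots != v->inline_slots) {   /* free an earlier heap vector */
      free(v->slots);
      v->heap_frees++;
    }
    v->slots = bigger;
    v->capacity = new_capacity;
  }
  v->slots[v->count++] = point;
  return 0;
}

/* Release any heap storage and fall back to the default vector. */
static void bv_release(backtrack_vector *v)
{
  if (v->slots != v->inline_slots) {
    free(v->slots);
    v->heap_frees++;
  }
  v->slots = v->inline_slots;
  v->count = 0;
  v->capacity = SKETCH_DEFAULT_SLOTS;
}
```

A match that records eight or fewer points never touches the heap, so a memory log for it would be empty.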
<br><b>
|
<br><b>
|
||||||
Setting a starting offset
|
Setting a starting offset
|
||||||
|
@ -1787,7 +1794,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 21 March 2017
|
Last updated: 24 March 2017
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2017 University of Cambridge.
|
Copyright © 1997-2017 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2_MATCH 3 "21 October 2014" "PCRE2 10.00"
|
.TH PCRE2_MATCH 3 "25 March 2017" "PCRE2 10.30"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH SYNOPSIS
|
.SH SYNOPSIS
|
||||||
|
@ -31,26 +31,33 @@ offsets to captured substrings. Its arguments are:
|
||||||
A match context is needed only if you want to:
|
A match context is needed only if you want to:
|
||||||
.sp
|
.sp
|
||||||
Set up a callout function
|
Set up a callout function
|
||||||
Change the limit for calling the internal function \fImatch()\fP
|
Set a matching offset limit
|
||||||
Change the limit for calling \fImatch()\fP recursively
|
Change the backtracking match limit
|
||||||
Set custom memory management when the heap is used for recursion
|
Change the backtracking depth limit
|
||||||
|
Set custom memory management in the match context
|
||||||
.sp
|
.sp
|
||||||
The \fIlength\fP and \fIstartoffset\fP values are code
|
The \fIlength\fP and \fIstartoffset\fP values are code
|
||||||
units, not characters. The options are:
|
units, not characters. The length may be given as PCRE2_ZERO_TERMINATED for a
|
||||||
|
subject that is terminated by a binary zero code unit. The options are:
|
||||||
.sp
|
.sp
|
||||||
PCRE2_ANCHORED Match only at the first position
|
PCRE2_ANCHORED Match only at the first position
|
||||||
PCRE2_NOTBOL Subject string is not the beginning of a line
|
PCRE2_NOTBOL Subject string is not the beginning of a line
|
||||||
PCRE2_NOTEOL Subject string is not the end of a line
|
PCRE2_NOTEOL Subject string is not the end of a line
|
||||||
PCRE2_NOTEMPTY An empty string is not a valid match
|
PCRE2_NOTEMPTY An empty string is not a valid match
|
||||||
|
.\" JOIN
|
||||||
PCRE2_NOTEMPTY_ATSTART An empty string at the start of the subject
|
PCRE2_NOTEMPTY_ATSTART An empty string at the start of the subject
|
||||||
is not a valid match
|
is not a valid match
|
||||||
|
PCRE2_NO_JIT Do not use JIT matching
|
||||||
|
.\" JOIN
|
||||||
PCRE2_NO_UTF_CHECK Do not check the subject for UTF
|
PCRE2_NO_UTF_CHECK Do not check the subject for UTF
|
||||||
validity (only relevant if PCRE2_UTF
|
validity (only relevant if PCRE2_UTF
|
||||||
was set at compile time)
|
was set at compile time)
|
||||||
|
.\" JOIN
|
||||||
|
PCRE2_PARTIAL_HARD Return PCRE2_ERROR_PARTIAL for a partial
|
||||||
|
match even if there is a full match
|
||||||
|
.\" JOIN
|
||||||
PCRE2_PARTIAL_SOFT Return PCRE2_ERROR_PARTIAL for a partial
|
PCRE2_PARTIAL_SOFT Return PCRE2_ERROR_PARTIAL for a partial
|
||||||
match if no full matches are found
|
match if no full matches are found
|
||||||
PCRE2_PARTIAL_HARD Return PCRE2_ERROR_PARTIAL for a partial match
|
|
||||||
if that is found before a full match
|
|
||||||
.sp
|
.sp
|
||||||
For details of partial matching, see the
|
For details of partial matching, see the
|
||||||
.\" HREF
|
.\" HREF
|
||||||
|
|
|
@ -966,14 +966,14 @@ SUBJECT MODIFIERS
|
||||||
copy=<number or name> copy captured substring
|
copy=<number or name> copy captured substring
|
||||||
depth_limit=<n> set a depth limit
|
depth_limit=<n> set a depth limit
|
||||||
dfa use pcre2_dfa_match()
|
dfa use pcre2_dfa_match()
|
||||||
find_limits find match and recursion limits
|
find_limits find match and depth limits
|
||||||
get=<number or name> extract captured substring
|
get=<number or name> extract captured substring
|
||||||
getall extract all captured substrings
|
getall extract all captured substrings
|
||||||
/g global global matching
|
/g global global matching
|
||||||
jitstack=<n> set size of JIT stack
|
jitstack=<n> set size of JIT stack
|
||||||
mark show mark values
|
mark show mark values
|
||||||
match_limit=<n> set a match limit
|
match_limit=<n> set a match limit
|
||||||
memory show memory usage
|
memory show heap memory usage
|
||||||
null_context match with a NULL context
|
null_context match with a NULL context
|
||||||
offset=<n> set starting offset
|
offset=<n> set starting offset
|
||||||
offset_limit=<n> set offset limit
|
offset_limit=<n> set offset limit
|
||||||
|
@ -1202,120 +1202,127 @@ SUBJECT MODIFIERS
|
||||||
|
|
||||||
Finding minimum limits
|
Finding minimum limits
|
||||||
|
|
||||||
If the find_limits modifier is present, pcre2test calls pcre2_match()
|
If the find_limits modifier is present on a subject line, pcre2test
|
||||||
several times, setting different values in the match context via
|
calls the relevant matching function several times, setting different
|
||||||
pcre2_set_match_limit() and pcre2_set_depth_limit() until it finds the
|
values in the match context via pcre2_set_match_limit() or
|
||||||
minimum values for each parameter that allow pcre2_match() to complete
|
pcre2_set_depth_limit() until it finds the minimum values for each
|
||||||
without error.
|
parameter that allows the match to complete without error.
|
||||||
|
|
||||||
If JIT is being used, only the match limit is relevant. If DFA matching
|
If JIT is being used, only the match limit is relevant. If DFA matching
|
||||||
is being used, only the depth limit is relevant, but at present this
|
is being used, only the depth limit is relevant.
|
||||||
modifier is ignored (with a warning message).
|
|
||||||
|
|
||||||
The match_limit number is a measure of the amount of backtracking that
|
The match_limit number is a measure of the amount of backtracking that
|
||||||
takes place, and learning the minimum value can be instructive. For
|
takes place, and learning the minimum value can be instructive. For
|
||||||
most simple matches, the number is quite small, but for patterns with
|
most simple matches, the number is quite small, but for patterns with
|
||||||
very large numbers of matching possibilities, it can become large very
|
very large numbers of matching possibilities, it can become large very
|
||||||
quickly with increasing length of subject string. The depth_limit num-
|
quickly with increasing length of subject string.
|
||||||
ber is a measure of how much memory for recording backtracking points
|
|
||||||
is needed to complete the match attempt.
|
For non-DFA matching, the minimum depth_limit number is a measure of
|
||||||
|
how much memory for recording backtracking points is needed to complete
|
||||||
|
the match attempt. In the case of DFA matching, depth_limit controls
|
||||||
|
the depth of recursive calls of the internal function that is used for
|
||||||
|
handling pattern recursion, lookaround assertions, and atomic groups.
|
||||||
|
|
||||||
Showing MARK names
|
Showing MARK names
|
||||||
|
|
||||||
|
|
||||||
The mark modifier causes the names from backtracking control verbs that
|
The mark modifier causes the names from backtracking control verbs that
|
||||||
are returned from calls to pcre2_match() to be displayed. If a mark is
|
are returned from calls to pcre2_match() to be displayed. If a mark is
|
||||||
returned for a match, non-match, or partial match, pcre2test shows it.
|
returned for a match, non-match, or partial match, pcre2test shows it.
|
||||||
For a match, it is on a line by itself, tagged with "MK:". Otherwise,
|
For a match, it is on a line by itself, tagged with "MK:". Otherwise,
|
||||||
it is added to the non-match message.
|
it is added to the non-match message.
|
||||||
|
|
||||||
Showing memory usage
|
Showing memory usage
|
||||||
|
|
||||||
The memory modifier causes pcre2test to log all memory allocation and
|
The memory modifier causes pcre2test to log all heap memory allocation
|
||||||
freeing calls that occur during a match operation.
|
and freeing calls that occur during a call to pcre2_match(). These
|
||||||
|
occur only when a match requires a bigger vector than the default for
|
||||||
|
remembering backtracking points. In many cases there will be none. No
|
||||||
|
heap memory is allocated during matching with pcre2_dfa_match or with
|
||||||
|
JIT.
|
||||||
|
|
||||||
Setting a starting offset
|
Setting a starting offset
|
||||||
|
|
||||||
The offset modifier sets an offset in the subject string at which
|
The offset modifier sets an offset in the subject string at which
|
||||||
matching starts. Its value is a number of code units, not characters.
|
matching starts. Its value is a number of code units, not characters.
|
||||||
|
|
||||||
Setting an offset limit
|
Setting an offset limit
|
||||||
|
|
||||||
The offset_limit modifier sets a limit for unanchored matches. If a
|
The offset_limit modifier sets a limit for unanchored matches. If a
|
||||||
match cannot be found starting at or before this offset in the subject,
|
match cannot be found starting at or before this offset in the subject,
|
||||||
a "no match" return is given. The data value is a number of code units,
|
a "no match" return is given. The data value is a number of code units,
|
||||||
not characters. When this modifier is used, the use_offset_limit modi-
|
not characters. When this modifier is used, the use_offset_limit modi-
|
||||||
fier must have been set for the pattern; if not, an error is generated.
|
fier must have been set for the pattern; if not, an error is generated.
|
||||||
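The offset_limit description restricts where an unanchored match may start, not where it may end. The C sketch below makes that concrete with a plain literal search rather than a regex engine: it tries start positions only up to and including the limit.

`find_with_offset_limit()` and `SKETCH_NO_MATCH` are hypothetical names. In the library itself the limit is set on a match context with `pcre2_set_offset_limit()`, and the pattern must have been compiled to allow it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_NO_MATCH ((size_t)-1)

/* Unanchored search for a literal needle. Only start positions from
   start_offset up to and including offset_limit are tried; a match that
   starts in range may extend past the limit. Offsets are code units. */
static size_t find_with_offset_limit(const char *subject, size_t length,
                                     const char *needle, size_t start_offset,
                                     size_t offset_limit)
{
  assert(subject != NULL && needle != NULL);
  size_t needle_length = strlen(needle);

  for (size_t pos = start_offset; pos <= offset_limit && pos <= length; pos++) {
    if (length - pos >= needle_length &&
        memcmp(subject + pos, needle, needle_length) == 0)
      return pos;
  }
  return SKETCH_NO_MATCH;
}
```

For example, searching "abcdefabc" for "def" with a limit of 3 succeeds at offset 3, even though the match ends at offset 6. With a limit of 2 there is no match.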
|
|
||||||
Setting the size of the output vector
|
Setting the size of the output vector
|
||||||
|
|
||||||
The ovector modifier applies only to the subject line in which it
|
The ovector modifier applies only to the subject line in which it
|
||||||
appears, though of course it can also be used to set a default in a
|
appears, though of course it can also be used to set a default in a
|
||||||
#subject command. It specifies the number of pairs of offsets that are
|
#subject command. It specifies the number of pairs of offsets that are
|
||||||
available for storing matching information. The default is 15.
|
available for storing matching information. The default is 15.
|
||||||
|
|
||||||
A value of zero is useful when testing the POSIX API because it causes
|
A value of zero is useful when testing the POSIX API because it causes
|
||||||
regexec() to be called with a NULL capture vector. When not testing the
|
regexec() to be called with a NULL capture vector. When not testing the
|
||||||
POSIX API, a value of zero is used to cause pcre2_match_data_cre-
|
POSIX API, a value of zero is used to cause pcre2_match_data_cre-
|
||||||
ate_from_pattern() to be called, in order to create a match block of
|
ate_from_pattern() to be called, in order to create a match block of
|
||||||
exactly the right size for the pattern. (It is not possible to create a
|
exactly the right size for the pattern. (It is not possible to create a
|
||||||
match block with a zero-length ovector; there is always at least one
|
match block with a zero-length ovector; there is always at least one
|
||||||
pair of offsets.)
|
pair of offsets.)
|
||||||
|
|
||||||
Passing the subject as zero-terminated
|
Passing the subject as zero-terminated
|
||||||
|
|
||||||
By default, the subject string is passed to a native API matching func-
|
By default, the subject string is passed to a native API matching func-
|
||||||
tion with its correct length. In order to test the facility for passing
|
tion with its correct length. In order to test the facility for passing
|
||||||
a zero-terminated string, the zero_terminate modifier is provided. It
|
a zero-terminated string, the zero_terminate modifier is provided. It
|
||||||
causes the length to be passed as PCRE2_ZERO_TERMINATED. (When matching
|
causes the length to be passed as PCRE2_ZERO_TERMINATED. (When matching
|
||||||
via the POSIX interface, this modifier has no effect, as there is no
|
via the POSIX interface, this modifier has no effect, as there is no
|
||||||
facility for passing a length.)
|
facility for passing a length.)
|
||||||
|
|
||||||
When testing pcre2_substitute(), this modifier also has the effect of
|
When testing pcre2_substitute(), this modifier also has the effect of
|
||||||
passing the replacement string as zero-terminated.
|
passing the replacement string as zero-terminated.
|
||||||
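The zero_terminate text above, together with the rule that lengths and offsets are code units rather than characters, can be illustrated with a small C sketch for an 8-bit (UTF-8) subject.

`SKETCH_ZERO_TERMINATED` is a hypothetical local macro. It mirrors the way pcre2.h defines PCRE2_ZERO_TERMINATED, as the largest PCRE2_SIZE value. `resolve_length()` and `utf8_char_to_offset()` are illustrative helpers, not PCRE2 functions, and the second one assumes valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for PCRE2_ZERO_TERMINATED. */
#define SKETCH_ZERO_TERMINATED (~(size_t)0)
/* Returned when a character index lies past the end of the subject. */
#define SKETCH_BAD_OFFSET ((size_t)-1)

/* Resolve a caller-supplied length: the sentinel means "scan for the
   terminating zero code unit". */
static size_t resolve_length(const unsigned char *subject, size_t length)
{
  assert(subject != NULL);
  return (length == SKETCH_ZERO_TERMINATED) ?
    strlen((const char *)subject) : length;
}

/* Convert a character index into a code-unit (byte) offset in a UTF-8
   subject by skipping continuation bytes (10xxxxxx). */
static size_t utf8_char_to_offset(const unsigned char *subject,
                                  size_t length, size_t char_index)
{
  size_t offset = 0;
  while (char_index > 0) {
    if (offset >= length) return SKETCH_BAD_OFFSET;
    offset++;
    while (offset < length && (subject[offset] & 0xC0) == 0x80) offset++;
    char_index--;
  }
  return offset;
}
```

In "h\xC3\xA9llo" (the word "héllo"), the third character 'l' is character index 2, but it sits at code-unit offset 3 because "é" occupies two code units. A starting offset for that position would therefore be 3.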
|
|
||||||
Passing a NULL context
|
Passing a NULL context
|
||||||
|
|
||||||
Normally, pcre2test passes a context block to pcre2_match(),
|
Normally, pcre2test passes a context block to pcre2_match(),
|
||||||
pcre2_dfa_match() or pcre2_jit_match(). If the null_context modifier is
|
pcre2_dfa_match() or pcre2_jit_match(). If the null_context modifier is
|
||||||
set, however, NULL is passed. This is for testing that the matching
|
set, however, NULL is passed. This is for testing that the matching
|
||||||
functions behave correctly in this case (they use default values). This
|
functions behave correctly in this case (they use default values). This
|
||||||
modifier cannot be used with the find_limits modifier or when testing
|
modifier cannot be used with the find_limits modifier or when testing
|
||||||
the substitution function.
|
the substitution function.
|
||||||
|
|
||||||
|
|
||||||
THE ALTERNATIVE MATCHING FUNCTION
|
THE ALTERNATIVE MATCHING FUNCTION
|
||||||
|
|
||||||
By default, pcre2test uses the standard PCRE2 matching function,
|
By default, pcre2test uses the standard PCRE2 matching function,
|
||||||
pcre2_match() to match each subject line. PCRE2 also supports an alter-
|
pcre2_match() to match each subject line. PCRE2 also supports an alter-
|
||||||
native matching function, pcre2_dfa_match(), which operates in a dif-
|
native matching function, pcre2_dfa_match(), which operates in a dif-
|
||||||
ferent way, and has some restrictions. The differences between the two
|
ferent way, and has some restrictions. The differences between the two
|
||||||
functions are described in the pcre2matching documentation.
|
functions are described in the pcre2matching documentation.
|
||||||
|
|
||||||
If the dfa modifier is set, the alternative matching function is used.
|
If the dfa modifier is set, the alternative matching function is used.
|
||||||
This function finds all possible matches at a given point in the sub-
|
This function finds all possible matches at a given point in the sub-
|
||||||
ject. If, however, the dfa_shortest modifier is set, processing stops
|
ject. If, however, the dfa_shortest modifier is set, processing stops
|
||||||
after the first match is found. This is always the shortest possible
|
after the first match is found. This is always the shortest possible
|
||||||
match.
|
match.
|
||||||
|
|
||||||
|
|
||||||
DEFAULT OUTPUT FROM pcre2test
|
DEFAULT OUTPUT FROM pcre2test
|
||||||
|
|
||||||
This section describes the output when the normal matching function,
|
This section describes the output when the normal matching function,
|
||||||
pcre2_match(), is being used.
|
pcre2_match(), is being used.
|
||||||
|
|
||||||
When a match succeeds, pcre2test outputs the list of captured sub-
|
When a match succeeds, pcre2test outputs the list of captured sub-
|
||||||
strings, starting with number 0 for the string that matched the whole
|
strings, starting with number 0 for the string that matched the whole
|
||||||
pattern. Otherwise, it outputs "No match" when the return is
|
pattern. Otherwise, it outputs "No match" when the return is
|
||||||
PCRE2_ERROR_NOMATCH, or "Partial match:" followed by the partially
|
PCRE2_ERROR_NOMATCH, or "Partial match:" followed by the partially
|
||||||
matching substring when the return is PCRE2_ERROR_PARTIAL. (Note that
|
matching substring when the return is PCRE2_ERROR_PARTIAL. (Note that
|
||||||
this is the entire substring that was inspected during the partial
|
this is the entire substring that was inspected during the partial
|
||||||
match; it may include characters before the actual match start if a
|
match; it may include characters before the actual match start if a
|
||||||
lookbehind assertion, \K, \b, or \B was involved.)
|
lookbehind assertion, \K, \b, or \B was involved.)
|
||||||
|
|
||||||
For any other return, pcre2test outputs the PCRE2 negative error number
|
For any other return, pcre2test outputs the PCRE2 negative error number
|
||||||
and a short descriptive phrase. If the error is a failed UTF string
|
and a short descriptive phrase. If the error is a failed UTF string
|
||||||
check, the code unit offset of the start of the failing character is
|
check, the code unit offset of the start of the failing character is
|
||||||
also output. Here is an example of an interactive pcre2test run.
|
also output. Here is an example of an interactive pcre2test run.
|
||||||
|
|
||||||
$ pcre2test
|
$ pcre2test
|
||||||
|
@ -1331,8 +1338,8 @@ DEFAULT OUTPUT FROM pcre2test
|
||||||
Unset capturing substrings that are not followed by one that is set are
|
Unset capturing substrings that are not followed by one that is set are
|
||||||
not shown by pcre2test unless the allcaptures modifier is specified. In
|
not shown by pcre2test unless the allcaptures modifier is specified. In
|
||||||
the following example, there are two capturing substrings, but when the
|
the following example, there are two capturing substrings, but when the
|
||||||
first data line is matched, the second, unset substring is not shown.
|
first data line is matched, the second, unset substring is not shown.
|
||||||
An "internal" unset substring is shown as "<unset>", as for the second
|
An "internal" unset substring is shown as "<unset>", as for the second
|
||||||
data line.
|
data line.
|
||||||
|
|
||||||
re> /(a)|(b)/
|
re> /(a)|(b)/
|
||||||
|
@ -1344,11 +1351,11 @@ DEFAULT OUTPUT FROM pcre2test
|
||||||
1: <unset>
|
1: <unset>
|
||||||
2: b
|
2: b
|
||||||
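The display rule described above can be sketched in C over an ovector of (start, end) offset pairs. Under that rule, unset captures that are not followed by a set one are hidden unless allcaptures is given, and an "internal" unset capture is shown as "<unset>".

`SKETCH_UNSET` is a hypothetical stand-in for PCRE2_UNSET. `format_captures()` is an illustrative helper rather than pcre2test code, and its " 0: " prefix only approximates pcre2test's layout.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for PCRE2_UNSET. */
#define SKETCH_UNSET (~(size_t)0)

/* Write captured substrings to buf, one per line. ovector holds
   pair_count (start, end) pairs; pair 0 is the whole match. Returns the
   number of characters stored (output is truncated to fit bufsize). */
static size_t format_captures(const char *subject, const size_t *ovector,
                              size_t pair_count, int all_captures,
                              char *buf, size_t bufsize)
{
  size_t last = pair_count;
  size_t used = 0;

  assert(subject != NULL && ovector != NULL && buf != NULL && bufsize > 0);
  if (!all_captures) {
    /* Drop trailing unset pairs; keep pair 0 always. */
    while (last > 1 && ovector[2 * (last - 1)] == SKETCH_UNSET) last--;
  }
  buf[0] = '\0';
  for (size_t i = 0; i < last && used < bufsize; i++) {
    size_t start = ovector[2 * i], end = ovector[2 * i + 1];
    int n;
    if (start == SKETCH_UNSET)
      n = snprintf(buf + used, bufsize - used, "%2zu: <unset>\n", i);
    else
      n = snprintf(buf + used, bufsize - used, "%2zu: %.*s\n", i,
                   (int)(end - start), subject + start);
    if (n < 0) break;
    used += (size_t)n;
  }
  return used < bufsize ? used : bufsize - 1;
}
```

For /(a)|(b)/ matched against "b", the pairs are (0,1), unset, (0,1). Capture 1 is printed as "<unset>" because capture 2 is set. Against "a", the unset capture 2 is trailing and is omitted unless all_captures is nonzero.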
|
|
||||||
If the strings contain any non-printing characters, they are output as
|
If the strings contain any non-printing characters, they are output as
|
||||||
\xhh escapes if the value is less than 256 and UTF mode is not set.
|
\xhh escapes if the value is less than 256 and UTF mode is not set.
|
||||||
Otherwise they are output as \x{hh...} escapes. See below for the defi-
|
Otherwise they are output as \x{hh...} escapes. See below for the defi-
|
||||||
nition of non-printing characters. If the aftertext modifier is set,
|
nition of non-printing characters. If the aftertext modifier is set,
|
||||||
the output for substring 0 is followed by the rest of the subject
|
the output for substring 0 is followed by the rest of the subject
|
||||||
string, identified by "0+" like this:
|
string, identified by "0+" like this:
|
||||||
|
|
||||||
re> /cat/aftertext
|
re> /cat/aftertext
|
||||||
|
@ -1356,7 +1363,7 @@ DEFAULT OUTPUT FROM pcre2test
|
||||||
0: cat
|
0: cat
|
||||||
0+ aract
|
0+ aract
|
||||||
|
|
||||||
If global matching is requested, the results of successive matching
|
If global matching is requested, the results of successive matching
|
||||||
attempts are output in sequence, like this:
|
attempts are output in sequence, like this:
|
||||||
|
|
||||||
re> /\Bi(\w\w)/g
|
re> /\Bi(\w\w)/g
|
||||||
|
@ -1368,8 +1375,8 @@ DEFAULT OUTPUT FROM pcre2test
|
||||||
0: ipp
|
0: ipp
|
||||||
1: pp
|
1: pp
|
||||||
|
|
||||||
"No match" is output only if the first match attempt fails. Here is an
|
"No match" is output only if the first match attempt fails. Here is an
|
||||||
example of a failure message (the offset 4 that is specified by the
|
example of a failure message (the offset 4 that is specified by the
|
||||||
offset modifier is past the end of the subject string):
|
offset modifier is past the end of the subject string):
|
||||||
|
|
||||||
re> /xyz/
|
re> /xyz/
|
||||||
|
@ -1377,7 +1384,7 @@ DEFAULT OUTPUT FROM pcre2test
|
||||||
Error -24 (bad offset value)
|
Error -24 (bad offset value)
|
||||||
|
|
||||||
Note that whereas patterns can be continued over several lines (a plain
|
Note that whereas patterns can be continued over several lines (a plain
|
||||||
">" prompt is used for continuations), subject lines may not. However
|
">" prompt is used for continuations), subject lines may not. However
|
||||||
newlines can be included in a subject by means of the \n escape (or \r,
|
newlines can be included in a subject by means of the \n escape (or \r,
|
||||||
\r\n, etc., depending on the newline sequence setting).
|
\r\n, etc., depending on the newline sequence setting).
|
||||||
|
|
||||||
|
@ -1385,7 +1392,7 @@ DEFAULT OUTPUT FROM pcre2test
|
||||||
OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
|
OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
|
||||||
|
|
||||||
When the alternative matching function, pcre2_dfa_match(), is used, the
|
When the alternative matching function, pcre2_dfa_match(), is used, the
|
||||||
output consists of a list of all the matches that start at the first
|
output consists of a list of all the matches that start at the first
|
||||||
point in the subject where there is at least one match. For example:
|
point in the subject where there is at least one match. For example:
|
||||||
|
|
||||||
re> /(tang|tangerine|tan)/
|
re> /(tang|tangerine|tan)/
|
||||||
|
@ -1394,11 +1401,11 @@ OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
|
||||||
1: tang
|
1: tang
|
||||||
2: tan
|
2: tan
|
||||||
|
|
||||||
Using the normal matching function on this data finds only "tang". The
|
Using the normal matching function on this data finds only "tang". The
|
||||||
longest matching string is always given first (and numbered zero).
|
longest matching string is always given first (and numbered zero).
|
||||||
After a PCRE2_ERROR_PARTIAL return, the output is "Partial match:",
|
After a PCRE2_ERROR_PARTIAL return, the output is "Partial match:",
|
||||||
followed by the partially matching substring. Note that this is the
|
followed by the partially matching substring. Note that this is the
|
||||||
entire substring that was inspected during the partial match; it may
|
entire substring that was inspected during the partial match; it may
|
||||||
include characters before the actual match start if a lookbehind asser-
|
include characters before the actual match start if a lookbehind asser-
|
||||||
tion, \b, or \B was involved. (\K is not supported for DFA matching.)
|
tion, \b, or \B was involved. (\K is not supported for DFA matching.)
|
||||||
|
|
||||||
|
@ -1414,16 +1421,16 @@ OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
|
||||||
1: tan
|
1: tan
|
||||||
0: tan
|
0: tan
|
||||||
|
|
||||||
The alternative matching function does not support substring capture,
|
The alternative matching function does not support substring capture,
|
||||||
so the modifiers that are concerned with captured substrings are not
|
so the modifiers that are concerned with captured substrings are not
|
||||||
relevant.
|
relevant.
|
||||||
|
|
||||||
|
|
||||||
RESTARTING AFTER A PARTIAL MATCH
|
RESTARTING AFTER A PARTIAL MATCH
|
||||||
|
|
||||||
When the alternative matching function has given the PCRE2_ERROR_PAR-
|
When the alternative matching function has given the PCRE2_ERROR_PAR-
|
||||||
TIAL return, indicating that the subject partially matched the pattern,
|
TIAL return, indicating that the subject partially matched the pattern,
|
||||||
you can restart the match with additional subject data by means of the
|
you can restart the match with additional subject data by means of the
|
||||||
dfa_restart modifier. For example:
|
dfa_restart modifier. For example:
|
||||||
|
|
||||||
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
|
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
|
||||||
|
@ -1432,45 +1439,45 @@ RESTARTING AFTER A PARTIAL MATCH
|
||||||
data> n05\=dfa,dfa_restart
|
data> n05\=dfa,dfa_restart
|
||||||
0: n05
|
0: n05
|
||||||
|
|
For further information about partial matching, see the pcre2partial
documentation.


CALLOUTS

If the pattern contains any callout requests, pcre2test's callout
function is called during matching unless callout_none is specified.
This works with both matching functions.

The callout function in pcre2test returns zero (carry on matching) by
default, but you can use a callout_fail modifier in a subject line (as
described above) to change this and other parameters of the callout.

Inserting callouts can be helpful when using pcre2test to check
complicated regular expressions. For further information about
callouts, see the pcre2callout documentation.

The output for callouts with numerical arguments and those with string
arguments is slightly different.

Callouts with numerical arguments

By default, the callout function displays the callout number, the start
and current positions in the subject text at the callout time, and the
next pattern item to be tested. For example:

--->pqrabcdef
  0    ^  ^     \d

This output indicates that callout number 0 occurred for a match
attempt starting at the fourth character of the subject string, when
the pointer was at the seventh character, and when the next pattern
item was \d. Just one circumflex is output if the start and current
positions are the same, or if the current position precedes the start
position, which can happen if the callout is in a lookbehind assertion.

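The circumflex rule can be expressed as a small C function. This is a sketch, not pcre2test's own code: the name caret_line() is invented, it assumes one output column per subject character (so that column k lies under subject character k once each line carries its four-character prefix), and it assumes the single circumflex is placed at the start position when the current position precedes it.

```c
#include <stddef.h>
#include <string.h>

/* Build the circumflex part of a numbered-callout trace line.
   start and current are offsets into the subject. Returns the number
   of columns written, or 0 (with an empty string) if out is too small. */
size_t caret_line(size_t start, size_t current, char *out, size_t outlen) {
  size_t cols = (current > start ? current : start) + 1;  /* columns used */
  if (outlen < cols + 1) {
    if (outlen > 0) out[0] = '\0';
    return 0;
  }
  memset(out, ' ', cols);
  out[start] = '^';        /* start of the match attempt */
  if (current > start)     /* second circumflex only when current is later */
    out[current] = '^';
  out[cols] = '\0';
  return cols;
}
```

For the example above, caret_line(3, 6, ...) produces "   ^  ^"; a current position equal to or before 3 produces just "   ^".
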
Callouts numbered 255 are assumed to be automatic callouts, inserted as
a result of the /auto_callout pattern modifier. In this case, instead
of showing the callout number, the offset in the pattern, preceded by a
plus, is output. For example:

@ -1484,7 +1491,7 @@ CALLOUTS
0: E*

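Automatic callouts differ only in the label written before the circumflexes. A hypothetical trace_line() helper could choose between the two forms as follows; the three-column right-aligned field followed by a space is an assumption made so that the label lines up with the four-character "--->" prefix of the subject line.

```c
#include <stddef.h>
#include <stdio.h>

/* Write "<label><indicator>" into out. The label is the callout number,
   or "+offset" for an automatic callout (number 255). Returns the length
   snprintf() reports. */
int trace_line(char *out, size_t outlen, int number, size_t pattern_offset,
               const char *indicator) {
  if (number == 255)  /* automatic callout: show the pattern offset */
    return snprintf(out, outlen, "%+3d %s", (int)pattern_offset, indicator);
  return snprintf(out, outlen, "%3d %s", number, indicator);
}
```

Given number 255, pattern offset 12 and the indicator "^  ^", it produces "+12 ^  ^"; given callout number 0 it produces "  0 " followed by the indicator.
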
If a pattern contains (*MARK) items, an additional line is output
whenever a change of latest mark is passed to the callout function. For
example:

re> /a(*MARK:X)bc/auto_callout
@ -1498,17 +1505,17 @@ CALLOUTS
+12 ^  ^
0: abc

The mark changes between matching "a" and "b", but stays the same for
the rest of the match, so nothing more is output. If, as a result of
backtracking, the mark reverts to being unset, the text "<unset>" is
output.

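The rule for the extra mark line amounts to remembering the mark seen at the previous callout and reporting only changes. In PCRE2 the current mark is available to a callout function through the mark field of the callout block (NULL when unset). The C sketch below uses the invented name mark_change(), compares marks by content (an assumption), and returns the text to show, or NULL when nothing should be printed.

```c
#include <stddef.h>
#include <string.h>

/* *last holds the mark seen at the previous callout (NULL = unset).
   Returns the text for an extra line when the mark has changed,
   "<unset>" when it has reverted to unset, or NULL if unchanged. */
const char *mark_change(const char **last, const char *mark) {
  int changed;
  if (mark == NULL)
    changed = (*last != NULL);
  else
    changed = (*last == NULL || strcmp(*last, mark) != 0);
  if (!changed) return NULL;
  *last = mark;  /* remember the new latest mark */
  return mark != NULL ? mark : "<unset>";
}
```
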
Callouts with string arguments

The output for a callout with a string argument is similar, except that
instead of outputting a callout number before the position indicators,
the callout string and its offset in the pattern string are output
before the reflection of the subject string, and the subject string is
reflected for each callout. For example:

re> /^ab(?C'first')cd(?C"second")ef/
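The offset shown with a string callout is a position in the pattern. The following sketch locates callout strings in a pattern and computes that offset, assuming it counts code units from the start of the pattern to the first character after the opening delimiter. The helper find_callout_string() is hypothetical and handles only the ' and " delimiters, without doubled-delimiter escapes; PCRE2 accepts several other delimiters as well.

```c
#include <stddef.h>
#include <string.h>

/* Find the n-th (0-based) (?C'...') or (?C"...") string in pattern.
   On success store the offset of its first character and its length,
   and return 1; return 0 if there is no such string. */
int find_callout_string(const char *pattern, int n,
                        size_t *offset, size_t *length) {
  const char *p = pattern;
  while ((p = strstr(p, "(?C")) != NULL) {
    char delim = p[3];
    if (delim == '\'' || delim == '"') {
      const char *start = p + 4;
      const char *end = strchr(start, delim);
      if (end == NULL) return 0;  /* unterminated string */
      if (n-- == 0) {
        *offset = (size_t)(start - pattern);
        *length = (size_t)(end - start);
        return 1;
      }
      p = end + 1;
    } else {
      p += 3;  /* numbered callout such as (?C1) */
    }
  }
  return 0;
}
```

For the pattern above it reports "first" at offset 7 and "second" at offset 20.
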
@ -1525,43 +1532,43 @@ CALLOUTS
NON-PRINTING CHARACTERS

When pcre2test is outputting text in the compiled version of a pattern,
bytes other than 32-126 are always treated as non-printing characters
and are therefore shown as hex escapes.

When pcre2test is outputting text that is a matched part of a subject
string, it behaves in the same way, unless a different locale has been
set for the pattern (using the locale modifier). In this case, the
isprint() function is used to distinguish printing and non-printing
characters.

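The two display rules differ only in how a byte is judged printable. In the C sketch below, with the invented name show_bytes(), bytes 32-126 are printable by default and isprint() decides instead when a locale is in force. The \xhh escape spelling is an assumption, not necessarily pcre2test's exact output.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Render bytes for display. Printable bytes are copied; all others are
   shown as hex escapes. Returns the number of characters written and
   stops early, without a partial escape, if out is full. */
size_t show_bytes(const unsigned char *s, size_t len, int use_isprint,
                  char *out, size_t outlen) {
  size_t pos = 0;
  if (outlen == 0) return 0;
  out[0] = '\0';
  for (size_t i = 0; i < len; i++) {
    int c = s[i];
    int printable = use_isprint ? isprint(c) != 0 : (c >= 32 && c <= 126);
    int n = printable ? snprintf(out + pos, outlen - pos, "%c", c)
                      : snprintf(out + pos, outlen - pos, "\\x%02x", (unsigned)c);
    if (n < 0 || (size_t)n >= outlen - pos) {
      out[pos] = '\0';  /* drop a partially written item */
      break;
    }
    pos += (size_t)n;
  }
  return pos;
}
```

In the default "C" locale both rules give the same result, so the bytes A, tab, B and 0xe9 are shown as A\x09B\xe9 either way; the results diverge only after setlocale() has selected a locale in which isprint() accepts more characters.
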
SAVING AND RESTORING COMPILED PATTERNS

It is possible to save compiled patterns on disc or elsewhere, and
reload them later, subject to a number of restrictions. JIT data cannot
be saved. The host on which the patterns are reloaded must be running
the same version of PCRE2, with the same code unit width, and must also
have the same endianness, pointer width and PCRE2_SIZE type. Before
compiled patterns can be saved they must be serialized, that is,
converted to a stream of bytes. A single byte stream may contain any
number of compiled patterns, but they must all use the same character
tables. A single copy of the tables is included in the byte stream (its
size is 1088 bytes).

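These restrictions exist because a serialized stream is only meaningful to a host that interprets its bytes the same way. The sketch below uses a hypothetical header, not PCRE2's actual serialized format (real code should use pcre2_serialize_encode() and pcre2_serialize_decode()); the demo_header type and its two functions are invented. It records each property the reloading host must share and rejects a stream whose properties differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAGIC 0x50435232u  /* "PCR2" in ASCII; byte order reveals endianness */

/* Hypothetical stream header recording what both hosts must agree on. */
typedef struct {
  uint32_t magic;          /* reads back byte-reversed on an opposite-endian host */
  uint32_t version;        /* library version of the saving host */
  uint8_t code_unit_width; /* 8, 16 or 32 */
  uint8_t pointer_size;    /* sizeof(void *) on the saving host */
  uint8_t size_type_size;  /* size of the PCRE2_SIZE-like type (size_t here) */
  int32_t pattern_count;   /* any number of patterns may follow */
} demo_header;

void demo_make_header(demo_header *h, uint32_t version, uint8_t width,
                      int32_t count) {
  memset(h, 0, sizeof *h);  /* start from a fully zeroed header */
  h->magic = DEMO_MAGIC;
  h->version = version;
  h->code_unit_width = width;
  h->pointer_size = (uint8_t)sizeof(void *);
  h->size_type_size = (uint8_t)sizeof(size_t);
  h->pattern_count = count;
}

/* Return 1 if a stream with header *h can be loaded on this host. */
int demo_header_compatible(const demo_header *h, uint32_t version,
                           uint8_t width) {
  return h->magic == DEMO_MAGIC &&
         h->version == version &&
         h->code_unit_width == width &&
         (size_t)h->pointer_size == sizeof(void *) &&
         (size_t)h->size_type_size == sizeof(size_t);
}
```

In this invented layout the pattern data and the single 1088-byte copy of the character tables would follow the header. Because a magic number written on a big-endian host reads back byte-reversed on a little-endian one, the first comparison also catches an endianness mismatch.
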
The functions whose names begin with pcre2_serialize_ are used for
serializing and de-serializing. They are described in the
pcre2serialize documentation. In this section we describe the features
of pcre2test that can be used to test these functions.

When a pattern with the push modifier is successfully compiled, it is
pushed onto a stack of compiled patterns, and pcre2test expects the
next line to contain a new pattern (or command) instead of a subject
line. By contrast, the pushcopy modifier causes a copy of the compiled
pattern to be stacked, leaving the original available for immediate
matching. By using push and/or pushcopy, a number of patterns can be
compiled and retained. These modifiers are incompatible with posix, and
control modifiers that act at match time are ignored (with a message)
for the stacked patterns. The jitverify modifier applies only at
compile time.

The command
@ -1569,21 +1576,21 @@ SAVING AND RESTORING COMPILED PATTERNS
#save <filename>

causes all the stacked patterns to be serialized and the result written
to the named file. Afterwards, all the stacked patterns are freed. The
command

#load <filename>

reads the data in the file, and then arranges for it to be
de-serialized, with the resulting compiled patterns added to the
pattern stack. The pattern on the top of the stack can be retrieved by
the #pop command, which must be followed by lines of subjects that are
to be matched with the pattern, terminated as usual by an empty line or
end of file. This command may be followed by a modifier list containing
only control modifiers that act after a pattern has been compiled. In
particular, hex, posix, posix_nosub, push, and pushcopy are not
allowed, nor are any option-setting modifiers. The JIT modifiers are,
however, permitted. Here is an example that saves and reloads two
patterns.

/abc/push
@ -1596,10 +1603,10 @@ SAVING AND RESTORING COMPILED PATTERNS
#pop jit,bincode
abc

If jitverify is used with #pop, it does not automatically imply jit,
which is different behaviour from when it is used on a pattern.

The #popcopy command is analogous to the pushcopy modifier in that it
makes current a copy of the topmost stack pattern, leaving the original
still on the stack.

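The push, pushcopy, #pop, #popcopy and #save behaviour described above follows a simple stack discipline. This toy C model is not pcre2test's implementation: strings stand in for compiled patterns, every name is invented, and #load, modifier handling and the posix restriction are left out.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_STACK 16

/* Toy pattern stack; strings stand in for compiled patterns. */
typedef struct {
  char *items[MAX_STACK]; /* items[top - 1] is the top of the stack */
  int top;                /* number of stacked patterns */
  char *current;          /* pattern used for subject lines, or NULL */
} pattern_stack;

static char *copy_pattern(const char *p) {
  char *c = malloc(strlen(p) + 1);
  if (c != NULL) strcpy(c, p);
  return c;
}

/* push: move the current pattern onto the stack, leaving none current. */
int stack_push(pattern_stack *ps) {
  if (ps->current == NULL || ps->top == MAX_STACK) return 0;
  ps->items[ps->top++] = ps->current;
  ps->current = NULL;
  return 1;
}

/* pushcopy: stack a copy, keeping the original available for matching. */
int stack_pushcopy(pattern_stack *ps) {
  if (ps->current == NULL || ps->top == MAX_STACK) return 0;
  char *c = copy_pattern(ps->current);
  if (c == NULL) return 0;
  ps->items[ps->top++] = c;
  return 1;
}

/* #pop: remove the top pattern and make it current. */
int stack_pop(pattern_stack *ps) {
  if (ps->top == 0) return 0;
  free(ps->current);
  ps->current = ps->items[--ps->top];
  return 1;
}

/* #popcopy: make a copy of the top pattern current; the original stays. */
int stack_popcopy(pattern_stack *ps) {
  if (ps->top == 0) return 0;
  char *c = copy_pattern(ps->items[ps->top - 1]);
  if (c == NULL) return 0;
  free(ps->current);
  ps->current = c;
  return 1;
}

/* #save: serialization omitted here; afterwards every stacked pattern
   is freed. Returns how many patterns were on the stack. */
int stack_save_and_free(pattern_stack *ps) {
  int n = ps->top;
  while (ps->top > 0) free(ps->items[--ps->top]);
  return n;
}
```

Pushing "abc" leaves no current pattern; after "xyz" is compiled and stacked with pushcopy it remains current, and #popcopy then makes a copy of "xyz" current while the stack still holds two entries.
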
@ -1619,5 +1626,5 @@ AUTHOR

REVISION

Last updated: 21 March 2017
Last updated: 24 March 2017
Copyright (c) 1997-2017 University of Cambridge.