Documentation update

Philip.Hazel 2017-03-25 11:52:22 +00:00
parent 25cec7a966
commit 935183d2d5
4 changed files with 182 additions and 162 deletions


@ -43,26 +43,25 @@ offsets to captured substrings. Its arguments are:
A match context is needed only if you want to: A match context is needed only if you want to:
<pre> <pre>
Set up a callout function Set up a callout function
Change the limit for calling the internal function <i>match()</i> Set a matching offset limit
Change the limit for calling <i>match()</i> recursively Change the backtracking match limit
Set custom memory management when the heap is used for recursion Change the backtracking depth limit
Set custom memory management in the match context
</pre> </pre>
The <i>length</i> and <i>startoffset</i> values are code The <i>length</i> and <i>startoffset</i> values are code
units, not characters. The options are: units, not characters. The length may be given as PCRE2_ZERO_TERMINATED for a
subject that is terminated by a binary zero code unit. The options are:
<pre> <pre>
PCRE2_ANCHORED Match only at the first position PCRE2_ANCHORED Match only at the first position
PCRE2_NOTBOL Subject string is not the beginning of a line PCRE2_NOTBOL Subject string is not the beginning of a line
PCRE2_NOTEOL Subject string is not the end of a line PCRE2_NOTEOL Subject string is not the end of a line
PCRE2_NOTEMPTY An empty string is not a valid match PCRE2_NOTEMPTY An empty string is not a valid match
PCRE2_NOTEMPTY_ATSTART An empty string at the start of the subject PCRE2_NOTEMPTY_ATSTART An empty string at the start of the subject is not a valid match
is not a valid match PCRE2_NO_JIT Do not use JIT matching
PCRE2_NO_UTF_CHECK Do not check the subject for UTF PCRE2_NO_UTF_CHECK Do not check the subject for UTF validity (only relevant if PCRE2_UTF
validity (only relevant if PCRE2_UTF
was set at compile time) was set at compile time)
PCRE2_PARTIAL_SOFT Return PCRE2_ERROR_PARTIAL for a partial PCRE2_PARTIAL_HARD Return PCRE2_ERROR_PARTIAL for a partial match even if there is a full match
match if no full matches are found PCRE2_PARTIAL_SOFT Return PCRE2_ERROR_PARTIAL for a partial match if no full matches are found
PCRE2_PARTIAL_HARD Return PCRE2_ERROR_PARTIAL for a partial match
if that is found before a full match
</pre> </pre>
For details of partial matching, see the For details of partial matching, see the
<a href="pcre2partial.html"><b>pcre2partial</b></a> <a href="pcre2partial.html"><b>pcre2partial</b></a>
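
The distinction between code units and characters in the text above can be illustrated with a small sketch. It assumes the 8-bit library with UTF-8 subjects, where one code unit is one byte; the helper names `utf8_char_count` and `utf8_char_to_code_unit_offset` are hypothetical and are not part of the PCRE2 API.

```c
#include <stddef.h>
#include <string.h>

/* Count UTF-8 characters (code points) in a zero-terminated string by
   skipping continuation bytes, which have the bit pattern 10xxxxxx.
   Assumes the input is valid UTF-8. */
static size_t utf8_char_count(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80) n++;
    return n;
}

/* Convert a character index into a code unit (byte) offset, which is the
   kind of value a startoffset argument expects. Returns the length of the
   string in code units if the index is past the end. */
static size_t utf8_char_to_code_unit_offset(const char *s, size_t char_index)
{
    size_t off = 0;
    while (s[off] != '\0') {
        if (((unsigned char)s[off] & 0xC0) != 0x80) {
            if (char_index == 0) return off;
            char_index--;
        }
        off++;
    }
    return off;
}
```

For the subject "caf\xC3\xA9 au lait", `strlen()` gives 13 code units while `utf8_char_count()` gives 12 characters, so a caller that wants to start matching at the fifth character must pass offset 5, not 4.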


@ -1082,14 +1082,14 @@ pattern.
copy=&#60;number or name&#62; copy captured substring copy=&#60;number or name&#62; copy captured substring
depth_limit=&#60;n&#62; set a depth limit depth_limit=&#60;n&#62; set a depth limit
dfa use <b>pcre2_dfa_match()</b> dfa use <b>pcre2_dfa_match()</b>
find_limits find match and recursion limits find_limits find match and depth limits
get=&#60;number or name&#62; extract captured substring get=&#60;number or name&#62; extract captured substring
getall extract all captured substrings getall extract all captured substrings
/g global global matching /g global global matching
jitstack=&#60;n&#62; set size of JIT stack jitstack=&#60;n&#62; set size of JIT stack
mark show mark values mark show mark values
match_limit=&#60;n&#62; set a match limit match_limit=&#60;n&#62; set a match limit
memory show memory usage memory show heap memory usage
null_context match with a NULL context null_context match with a NULL context
offset=&#60;n&#62; set starting offset offset=&#60;n&#62; set starting offset
offset_limit=&#60;n&#62; set offset limit offset_limit=&#60;n&#62; set offset limit
@ -1339,25 +1339,29 @@ limits in the match context. These values are ignored when the
Finding minimum limits Finding minimum limits
</b><br> </b><br>
<P> <P>
If the <b>find_limits</b> modifier is present, <b>pcre2test</b> calls If the <b>find_limits</b> modifier is present on a subject line, <b>pcre2test</b>
<b>pcre2_match()</b> several times, setting different values in the match calls the relevant matching function several times, setting different values in
context via <b>pcre2_set_match_limit()</b> and <b>pcre2_set_depth_limit()</b> the match context via <b>pcre2_set_match_limit()</b> or
until it finds the minimum values for each parameter that allow <b>pcre2_set_depth_limit()</b> until it finds the minimum values for each
<b>pcre2_match()</b> to complete without error. parameter that allow the match to complete without error.
</P> </P>
<P> <P>
If JIT is being used, only the match limit is relevant. If DFA matching is If JIT is being used, only the match limit is relevant. If DFA matching is
being used, only the depth limit is relevant, but at present this modifier is being used, only the depth limit is relevant.
ignored (with a warning message).
</P> </P>
<P> <P>
The <i>match_limit</i> number is a measure of the amount of backtracking The <i>match_limit</i> number is a measure of the amount of backtracking
that takes place, and learning the minimum value can be instructive. For most that takes place, and learning the minimum value can be instructive. For most
simple matches, the number is quite small, but for patterns with very large simple matches, the number is quite small, but for patterns with very large
numbers of matching possibilities, it can become large very quickly with numbers of matching possibilities, it can become large very quickly with
increasing length of subject string. The <i>depth_limit</i> number is increasing length of subject string.
a measure of how much memory for recording backtracking points is needed to </P>
complete the match attempt. <P>
For non-DFA matching, the minimum <i>depth_limit</i> number is a measure of how
much memory for recording backtracking points is needed to complete the match
attempt. In the case of DFA matching, <i>depth_limit</i> controls the depth of
recursive calls of the internal function that is used for handling pattern
recursion, lookaround assertions, and atomic groups.
</P> </P>
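
The repeated-call search for a minimum limit described above might look roughly like the following. This is a sketch only: the source does not say which search strategy <b>pcre2test</b> uses, so the doubling-then-bisecting approach, the callback type `try_limit_fn`, and the stand-in `needs_at_least` are all assumptions. In a real harness the callback would set the limit in the match context and run one match attempt.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callback: returns true if one match attempt completes
   without a limit error when run with the given limit. */
typedef bool (*try_limit_fn)(uint32_t limit, void *data);

/* Find the smallest limit in [1, max] for which try_limit() succeeds,
   assuming that success at some limit implies success at every larger
   limit. Returns 0 if the attempt fails even at max. */
static uint32_t find_min_limit(try_limit_fn try_limit, void *data, uint32_t max)
{
    uint32_t lo = 1, hi = 1;

    /* Phase 1: double the limit until an attempt succeeds. */
    while (!try_limit(hi, data)) {
        if (hi >= max) return 0;
        lo = hi + 1;
        hi = (hi > max / 2) ? max : hi * 2;
    }

    /* Phase 2: bisect [lo, hi]; hi is always known to succeed. */
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        if (try_limit(mid, data)) hi = mid;
        else lo = mid + 1;
    }
    return hi;
}

/* Stand-in for a real match attempt: succeeds once the limit reaches the
   threshold that data points to. */
static bool needs_at_least(uint32_t limit, void *data)
{
    return limit >= *(const uint32_t *)data;
}
```

With a threshold of 37 and a maximum of 1000, `find_min_limit(needs_at_least, &threshold, 1000)` returns 37 after a dozen attempts rather than 37 sequential ones.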
<br><b> <br><b>
Showing MARK names Showing MARK names
@ -1373,8 +1377,11 @@ is added to the non-match message.
Showing memory usage Showing memory usage
</b><br> </b><br>
<P> <P>
The <b>memory</b> modifier causes <b>pcre2test</b> to log all memory allocation The <b>memory</b> modifier causes <b>pcre2test</b> to log all heap memory
and freeing calls that occur during a match operation. allocation and freeing calls that occur during a call to <b>pcre2_match()</b>.
These occur only when a match requires a bigger vector than the default for
remembering backtracking points. In many cases there will be none. No heap
memory is allocated during matching with <b>pcre2_dfa_match()</b> or with JIT.
</P> </P>
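
Logging of allocation and freeing calls can be pictured with a pair of wrapper functions like the ones below. This is an illustrative sketch, not the code <b>pcre2test</b> uses: the `mem_log` structure, the function names, and the log format are invented here. The signatures are chosen to mirror the custom memory functions that PCRE2 contexts accept (a size plus a user data pointer for allocation, a block plus a user data pointer for freeing), with `size_t` standing in for `PCRE2_SIZE`.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical bookkeeping passed to the wrappers as user data. */
typedef struct {
    FILE *log;              /* where to write log lines; may be NULL */
    unsigned long mallocs;  /* number of allocation calls seen */
    unsigned long frees;    /* number of freeing calls seen */
} mem_log;

/* Allocate a block and record the call. */
static void *logging_malloc(size_t size, void *data)
{
    mem_log *m = data;
    void *block = malloc(size);
    m->mallocs++;
    if (m->log != NULL)
        fprintf(m->log, "malloc %zu %p\n", size, block);
    return block;
}

/* Record the call, then free the block. */
static void logging_free(void *block, void *data)
{
    mem_log *m = data;
    m->frees++;
    if (m->log != NULL)
        fprintf(m->log, "free %p\n", block);
    free(block);
}
```

If a match never needs more than the default backtracking vector, neither wrapper would be called and the log would stay empty, which is the "in many cases there will be none" situation described above.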
<br><b> <br><b>
Setting a starting offset Setting a starting offset
@ -1787,7 +1794,7 @@ Cambridge, England.
</P> </P>
<br><a name="SEC21" href="#TOC1">REVISION</a><br> <br><a name="SEC21" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 21 March 2017 Last updated: 24 March 2017
<br> <br>
Copyright &copy; 1997-2017 University of Cambridge. Copyright &copy; 1997-2017 University of Cambridge.
<br> <br>


@ -1,4 +1,4 @@
.TH PCRE2_MATCH 3 "21 October 2014" "PCRE2 10.00" .TH PCRE2_MATCH 3 "25 March 2017" "PCRE2 10.30"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS .SH SYNOPSIS
@ -31,26 +31,33 @@ offsets to captured substrings. Its arguments are:
A match context is needed only if you want to: A match context is needed only if you want to:
.sp .sp
Set up a callout function Set up a callout function
Change the limit for calling the internal function \fImatch()\fP Set a matching offset limit
Change the limit for calling \fImatch()\fP recursively Change the backtracking match limit
Set custom memory management when the heap is used for recursion Change the backtracking depth limit
Set custom memory management in the match context
.sp .sp
The \fIlength\fP and \fIstartoffset\fP values are code The \fIlength\fP and \fIstartoffset\fP values are code
units, not characters. The options are: units, not characters. The length may be given as PCRE2_ZERO_TERMINATED for a
subject that is terminated by a binary zero code unit. The options are:
.sp .sp
PCRE2_ANCHORED Match only at the first position PCRE2_ANCHORED Match only at the first position
PCRE2_NOTBOL Subject string is not the beginning of a line PCRE2_NOTBOL Subject string is not the beginning of a line
PCRE2_NOTEOL Subject string is not the end of a line PCRE2_NOTEOL Subject string is not the end of a line
PCRE2_NOTEMPTY An empty string is not a valid match PCRE2_NOTEMPTY An empty string is not a valid match
.\" JOIN
PCRE2_NOTEMPTY_ATSTART An empty string at the start of the subject PCRE2_NOTEMPTY_ATSTART An empty string at the start of the subject
is not a valid match is not a valid match
PCRE2_NO_JIT Do not use JIT matching
.\" JOIN
PCRE2_NO_UTF_CHECK Do not check the subject for UTF PCRE2_NO_UTF_CHECK Do not check the subject for UTF
validity (only relevant if PCRE2_UTF validity (only relevant if PCRE2_UTF
was set at compile time) was set at compile time)
.\" JOIN
PCRE2_PARTIAL_HARD Return PCRE2_ERROR_PARTIAL for a partial
match even if there is a full match
.\" JOIN
PCRE2_PARTIAL_SOFT Return PCRE2_ERROR_PARTIAL for a partial PCRE2_PARTIAL_SOFT Return PCRE2_ERROR_PARTIAL for a partial
match if no full matches are found match if no full matches are found
PCRE2_PARTIAL_HARD Return PCRE2_ERROR_PARTIAL for a partial match
if that is found before a full match
.sp .sp
For details of partial matching, see the For details of partial matching, see the
.\" HREF .\" HREF


@ -966,14 +966,14 @@ SUBJECT MODIFIERS
copy=<number or name> copy captured substring copy=<number or name> copy captured substring
depth_limit=<n> set a depth limit depth_limit=<n> set a depth limit
dfa use pcre2_dfa_match() dfa use pcre2_dfa_match()
find_limits find match and recursion limits find_limits find match and depth limits
get=<number or name> extract captured substring get=<number or name> extract captured substring
getall extract all captured substrings getall extract all captured substrings
/g global global matching /g global global matching
jitstack=<n> set size of JIT stack jitstack=<n> set size of JIT stack
mark show mark values mark show mark values
match_limit=<n> set a match limit match_limit=<n> set a match limit
memory show memory usage memory show heap memory usage
null_context match with a NULL context null_context match with a NULL context
offset=<n> set starting offset offset=<n> set starting offset
offset_limit=<n> set offset limit offset_limit=<n> set offset limit
@ -1202,120 +1202,127 @@ SUBJECT MODIFIERS
Finding minimum limits Finding minimum limits
If the find_limits modifier is present, pcre2test calls pcre2_match() If the find_limits modifier is present on a subject line, pcre2test
several times, setting different values in the match context via calls the relevant matching function several times, setting different
pcre2_set_match_limit() and pcre2_set_depth_limit() until it finds the values in the match context via pcre2_set_match_limit() or
minimum values for each parameter that allow pcre2_match() to complete pcre2_set_depth_limit() until it finds the minimum values for each
without error. parameter that allow the match to complete without error.
If JIT is being used, only the match limit is relevant. If DFA matching If JIT is being used, only the match limit is relevant. If DFA matching
is being used, only the depth limit is relevant, but at present this is being used, only the depth limit is relevant.
modifier is ignored (with a warning message).
The match_limit number is a measure of the amount of backtracking that The match_limit number is a measure of the amount of backtracking that
takes place, and learning the minimum value can be instructive. For takes place, and learning the minimum value can be instructive. For
most simple matches, the number is quite small, but for patterns with most simple matches, the number is quite small, but for patterns with
very large numbers of matching possibilities, it can become large very very large numbers of matching possibilities, it can become large very
quickly with increasing length of subject string. The depth_limit num- quickly with increasing length of subject string.
ber is a measure of how much memory for recording backtracking points
is needed to complete the match attempt. For non-DFA matching, the minimum depth_limit number is a measure of
how much memory for recording backtracking points is needed to complete
the match attempt. In the case of DFA matching, depth_limit controls
the depth of recursive calls of the internal function that is used for
handling pattern recursion, lookaround assertions, and atomic groups.
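
To see why a backtracking count can grow quickly with subject length, consider the toy matcher below. It is not PCRE2's algorithm and its counter is not PCRE2's match limit; it is a hand-written recursive matcher for the single anchored pattern ^(a|aa)*c, with a call counter and a cut-off that play a similar role. All names (`toy_match`, `toy_ctx`, `TOY_LIMIT`) are hypothetical.

```c
#include <stddef.h>

enum { TOY_MATCH = 1, TOY_NOMATCH = 0, TOY_LIMIT = -1 };

typedef struct {
    unsigned long calls;  /* number of toy_match() calls so far */
    unsigned long limit;  /* give up once calls exceeds this */
} toy_ctx;

/* Toy backtracking matcher for ^(a|aa)*c on a zero-terminated subject.
   Each call tries "a", then "aa", then falls back to testing for "c". */
static int toy_match(const char *s, toy_ctx *ctx)
{
    int rc;
    if (++ctx->calls > ctx->limit) return TOY_LIMIT;
    if (s[0] == 'a') {
        rc = toy_match(s + 1, ctx);          /* alternative "a" */
        if (rc != TOY_NOMATCH) return rc;
        if (s[1] == 'a') {
            rc = toy_match(s + 2, ctx);      /* alternative "aa" */
            if (rc != TOY_NOMATCH) return rc;
        }
    }
    return (s[0] == 'c') ? TOY_MATCH : TOY_NOMATCH;
}

/* Run one attempt with the given limit and report the number of calls. */
static int toy_run(const char *subject, unsigned long limit,
                   unsigned long *calls_out)
{
    toy_ctx ctx = { 0, limit };
    int rc = toy_match(subject, &ctx);
    if (calls_out != NULL) *calls_out = ctx.calls;
    return rc;
}
```

A subject of 5 "a" characters with no "c" fails after 20 calls, 10 characters need 232 calls, and 20 characters need 28656, so the count follows a Fibonacci-like recurrence. With a limit of 100, the 10-character subject stops early with `TOY_LIMIT` instead of running to completion.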
Showing MARK names Showing MARK names
The mark modifier causes the names from backtracking control verbs that The mark modifier causes the names from backtracking control verbs that
are returned from calls to pcre2_match() to be displayed. If a mark is are returned from calls to pcre2_match() to be displayed. If a mark is
returned for a match, non-match, or partial match, pcre2test shows it. returned for a match, non-match, or partial match, pcre2test shows it.
For a match, it is on a line by itself, tagged with "MK:". Otherwise, For a match, it is on a line by itself, tagged with "MK:". Otherwise,
it is added to the non-match message. it is added to the non-match message.
Showing memory usage Showing memory usage
The memory modifier causes pcre2test to log all memory allocation and The memory modifier causes pcre2test to log all heap memory allocation
freeing calls that occur during a match operation. and freeing calls that occur during a call to pcre2_match(). These
occur only when a match requires a bigger vector than the default for
remembering backtracking points. In many cases there will be none. No
heap memory is allocated during matching with pcre2_dfa_match() or with
JIT.
Setting a starting offset Setting a starting offset
The offset modifier sets an offset in the subject string at which The offset modifier sets an offset in the subject string at which
matching starts. Its value is a number of code units, not characters. matching starts. Its value is a number of code units, not characters.
Setting an offset limit Setting an offset limit
The offset_limit modifier sets a limit for unanchored matches. If a The offset_limit modifier sets a limit for unanchored matches. If a
match cannot be found starting at or before this offset in the subject, match cannot be found starting at or before this offset in the subject,
a "no match" return is given. The data value is a number of code units, a "no match" return is given. The data value is a number of code units,
not characters. When this modifier is used, the use_offset_limit modi- not characters. When this modifier is used, the use_offset_limit modi-
fier must have been set for the pattern; if not, an error is generated. fier must have been set for the pattern; if not, an error is generated.
Setting the size of the output vector Setting the size of the output vector
The ovector modifier applies only to the subject line in which it The ovector modifier applies only to the subject line in which it
appears, though of course it can also be used to set a default in a appears, though of course it can also be used to set a default in a
#subject command. It specifies the number of pairs of offsets that are #subject command. It specifies the number of pairs of offsets that are
available for storing matching information. The default is 15. available for storing matching information. The default is 15.
A value of zero is useful when testing the POSIX API because it causes A value of zero is useful when testing the POSIX API because it causes
regexec() to be called with a NULL capture vector. When not testing the regexec() to be called with a NULL capture vector. When not testing the
POSIX API, a value of zero is used to cause pcre2_match_data_cre- POSIX API, a value of zero is used to cause pcre2_match_data_cre-
ate_from_pattern() to be called, in order to create a match block of ate_from_pattern() to be called, in order to create a match block of
exactly the right size for the pattern. (It is not possible to create a exactly the right size for the pattern. (It is not possible to create a
match block with a zero-length ovector; there is always at least one match block with a zero-length ovector; there is always at least one
pair of offsets.) pair of offsets.)
Passing the subject as zero-terminated Passing the subject as zero-terminated
By default, the subject string is passed to a native API matching func- By default, the subject string is passed to a native API matching func-
tion with its correct length. In order to test the facility for passing tion with its correct length. In order to test the facility for passing
a zero-terminated string, the zero_terminate modifier is provided. It a zero-terminated string, the zero_terminate modifier is provided. It
causes the length to be passed as PCRE2_ZERO_TERMINATED. (When matching causes the length to be passed as PCRE2_ZERO_TERMINATED. (When matching
via the POSIX interface, this modifier has no effect, as there is no via the POSIX interface, this modifier has no effect, as there is no
facility for passing a length.) facility for passing a length.)
When testing pcre2_substitute(), this modifier also has the effect of When testing pcre2_substitute(), this modifier also has the effect of
passing the replacement string as zero-terminated. passing the replacement string as zero-terminated.
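
The idea of a sentinel length meaning "zero-terminated" can be sketched as follows. The macro `SKETCH_ZERO_TERMINATED` is a stand-in defined here for illustration; it assumes the real PCRE2_ZERO_TERMINATED is the all-ones value of the size type, and the helper `resolve_subject_length` is hypothetical. The sketch also assumes 8-bit code units, so `strlen()` gives the length in code units.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for PCRE2_ZERO_TERMINATED (assumed to be the all-ones value
   of the size type). */
#define SKETCH_ZERO_TERMINATED (~(size_t)0)

/* Return the subject length in code units: either the explicit length the
   caller passed, or the distance to the terminating zero when the caller
   passed the sentinel. */
static size_t resolve_subject_length(const char *subject, size_t length)
{
    return (length == SKETCH_ZERO_TERMINATED) ? strlen(subject) : length;
}
```

An explicit length is still needed for subjects that contain binary zeros: `resolve_subject_length("abc\0def", 7)` keeps all 7 code units, whereas the sentinel would stop at 3.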
Passing a NULL context Passing a NULL context
Normally, pcre2test passes a context block to pcre2_match(), Normally, pcre2test passes a context block to pcre2_match(),
pcre2_dfa_match() or pcre2_jit_match(). If the null_context modifier is pcre2_dfa_match() or pcre2_jit_match(). If the null_context modifier is
set, however, NULL is passed. This is for testing that the matching set, however, NULL is passed. This is for testing that the matching
functions behave correctly in this case (they use default values). This functions behave correctly in this case (they use default values). This
modifier cannot be used with the find_limits modifier or when testing modifier cannot be used with the find_limits modifier or when testing
the substitution function. the substitution function.
THE ALTERNATIVE MATCHING FUNCTION THE ALTERNATIVE MATCHING FUNCTION
By default, pcre2test uses the standard PCRE2 matching function, By default, pcre2test uses the standard PCRE2 matching function,
pcre2_match() to match each subject line. PCRE2 also supports an alter- pcre2_match() to match each subject line. PCRE2 also supports an alter-
native matching function, pcre2_dfa_match(), which operates in a dif- native matching function, pcre2_dfa_match(), which operates in a dif-
ferent way, and has some restrictions. The differences between the two ferent way, and has some restrictions. The differences between the two
functions are described in the pcre2matching documentation. functions are described in the pcre2matching documentation.
If the dfa modifier is set, the alternative matching function is used. If the dfa modifier is set, the alternative matching function is used.
This function finds all possible matches at a given point in the sub- This function finds all possible matches at a given point in the sub-
ject. If, however, the dfa_shortest modifier is set, processing stops ject. If, however, the dfa_shortest modifier is set, processing stops
after the first match is found. This is always the shortest possible after the first match is found. This is always the shortest possible
match. match.
DEFAULT OUTPUT FROM pcre2test DEFAULT OUTPUT FROM pcre2test
This section describes the output when the normal matching function, This section describes the output when the normal matching function,
pcre2_match(), is being used. pcre2_match(), is being used.
When a match succeeds, pcre2test outputs the list of captured sub- When a match succeeds, pcre2test outputs the list of captured sub-
strings, starting with number 0 for the string that matched the whole strings, starting with number 0 for the string that matched the whole
pattern. Otherwise, it outputs "No match" when the return is pattern. Otherwise, it outputs "No match" when the return is
PCRE2_ERROR_NOMATCH, or "Partial match:" followed by the partially PCRE2_ERROR_NOMATCH, or "Partial match:" followed by the partially
matching substring when the return is PCRE2_ERROR_PARTIAL. (Note that matching substring when the return is PCRE2_ERROR_PARTIAL. (Note that
this is the entire substring that was inspected during the partial this is the entire substring that was inspected during the partial
match; it may include characters before the actual match start if a match; it may include characters before the actual match start if a
lookbehind assertion, \K, \b, or \B was involved.) lookbehind assertion, \K, \b, or \B was involved.)
For any other return, pcre2test outputs the PCRE2 negative error number For any other return, pcre2test outputs the PCRE2 negative error number
and a short descriptive phrase. If the error is a failed UTF string and a short descriptive phrase. If the error is a failed UTF string
check, the code unit offset of the start of the failing character is check, the code unit offset of the start of the failing character is
also output. Here is an example of an interactive pcre2test run. also output. Here is an example of an interactive pcre2test run.
$ pcre2test $ pcre2test
@ -1331,8 +1338,8 @@ DEFAULT OUTPUT FROM pcre2test
Unset capturing substrings that are not followed by one that is set are Unset capturing substrings that are not followed by one that is set are
not shown by pcre2test unless the allcaptures modifier is specified. In not shown by pcre2test unless the allcaptures modifier is specified. In
the following example, there are two capturing substrings, but when the the following example, there are two capturing substrings, but when the
first data line is matched, the second, unset substring is not shown. first data line is matched, the second, unset substring is not shown.
An "internal" unset substring is shown as "<unset>", as for the second An "internal" unset substring is shown as "<unset>", as for the second
data line. data line.
re> /(a)|(b)/ re> /(a)|(b)/
@ -1344,11 +1351,11 @@ DEFAULT OUTPUT FROM pcre2test
1: <unset> 1: <unset>
2: b 2: b
If the strings contain any non-printing characters, they are output as If the strings contain any non-printing characters, they are output as
\xhh escapes if the value is less than 256 and UTF mode is not set. \xhh escapes if the value is less than 256 and UTF mode is not set.
Otherwise they are output as \x{hh...} escapes. See below for the defi- Otherwise they are output as \x{hh...} escapes. See below for the defi-
nition of non-printing characters. If the aftertext modifier is set, nition of non-printing characters. If the aftertext modifier is set,
the output for substring 0 is followed by the rest of the subject the output for substring 0 is followed by the rest of the subject
string, identified by "0+" like this: string, identified by "0+" like this:
re> /cat/aftertext re> /cat/aftertext
@ -1356,7 +1363,7 @@ DEFAULT OUTPUT FROM pcre2test
0: cat 0: cat
0+ aract 0+ aract
If global matching is requested, the results of successive matching If global matching is requested, the results of successive matching
attempts are output in sequence, like this: attempts are output in sequence, like this:
re> /\Bi(\w\w)/g re> /\Bi(\w\w)/g
@ -1368,8 +1375,8 @@ DEFAULT OUTPUT FROM pcre2test
0: ipp 0: ipp
1: pp 1: pp
"No match" is output only if the first match attempt fails. Here is an "No match" is output only if the first match attempt fails. Here is an
example of a failure message (the offset 4 that is specified by the example of a failure message (the offset 4 that is specified by the
offset modifier is past the end of the subject string): offset modifier is past the end of the subject string):
re> /xyz/ re> /xyz/
@ -1377,7 +1384,7 @@ DEFAULT OUTPUT FROM pcre2test
Error -24 (bad offset value) Error -24 (bad offset value)
Note that whereas patterns can be continued over several lines (a plain Note that whereas patterns can be continued over several lines (a plain
">" prompt is used for continuations), subject lines may not. However ">" prompt is used for continuations), subject lines may not. However
newlines can be included in a subject by means of the \n escape (or \r, newlines can be included in a subject by means of the \n escape (or \r,
\r\n, etc., depending on the newline sequence setting). \r\n, etc., depending on the newline sequence setting).
@ -1385,7 +1392,7 @@ DEFAULT OUTPUT FROM pcre2test
OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
When the alternative matching function, pcre2_dfa_match(), is used, the When the alternative matching function, pcre2_dfa_match(), is used, the
output consists of a list of all the matches that start at the first output consists of a list of all the matches that start at the first
point in the subject where there is at least one match. For example: point in the subject where there is at least one match. For example:
re> /(tang|tangerine|tan)/ re> /(tang|tangerine|tan)/
@ -1394,11 +1401,11 @@ OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
1: tang 1: tang
2: tan 2: tan
Using the normal matching function on this data finds only "tang". The Using the normal matching function on this data finds only "tang". The
longest matching string is always given first (and numbered zero). longest matching string is always given first (and numbered zero).
After a PCRE2_ERROR_PARTIAL return, the output is "Partial match:", After a PCRE2_ERROR_PARTIAL return, the output is "Partial match:",
followed by the partially matching substring. Note that this is the followed by the partially matching substring. Note that this is the
entire substring that was inspected during the partial match; it may entire substring that was inspected during the partial match; it may
include characters before the actual match start if a lookbehind asser- include characters before the actual match start if a lookbehind asser-
tion, \b, or \B was involved. (\K is not supported for DFA matching.) tion, \b, or \B was involved. (\K is not supported for DFA matching.)
@ -1414,16 +1421,16 @@ OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
1: tan 1: tan
0: tan 0: tan
The alternative matching function does not support substring capture, The alternative matching function does not support substring capture,
so the modifiers that are concerned with captured substrings are not so the modifiers that are concerned with captured substrings are not
relevant. relevant.
RESTARTING AFTER A PARTIAL MATCH RESTARTING AFTER A PARTIAL MATCH
When the alternative matching function has given the PCRE2_ERROR_PAR- When the alternative matching function has given the PCRE2_ERROR_PAR-
TIAL return, indicating that the subject partially matched the pattern, TIAL return, indicating that the subject partially matched the pattern,
you can restart the match with additional subject data by means of the you can restart the match with additional subject data by means of the
dfa_restart modifier. For example: dfa_restart modifier. For example:
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/ re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
@ -1432,45 +1439,45 @@ RESTARTING AFTER A PARTIAL MATCH
data> n05\=dfa,dfa_restart data> n05\=dfa,dfa_restart
0: n05 0: n05
For further information about partial matching, see the pcre2partial For further information about partial matching, see the pcre2partial
documentation. documentation.
CALLOUTS CALLOUTS
If the pattern contains any callout requests, pcre2test's callout func- If the pattern contains any callout requests, pcre2test's callout func-
tion is called during matching unless callout_none is specified. This tion is called during matching unless callout_none is specified. This
works with both matching functions. works with both matching functions.
The callout function in pcre2test returns zero (carry on matching) by The callout function in pcre2test returns zero (carry on matching) by
default, but you can use a callout_fail modifier in a subject line (as default, but you can use a callout_fail modifier in a subject line (as
described above) to change this and other parameters of the callout. described above) to change this and other parameters of the callout.
Inserting callouts can be helpful when using pcre2test to check compli- Inserting callouts can be helpful when using pcre2test to check compli-
cated regular expressions. For further information about callouts, see cated regular expressions. For further information about callouts, see
the pcre2callout documentation. the pcre2callout documentation.
The output for callouts with numerical arguments and those with string The output for callouts with numerical arguments and those with string
arguments is slightly different. arguments is slightly different.
Callouts with numerical arguments Callouts with numerical arguments
By default, the callout function displays the callout number, the start By default, the callout function displays the callout number, the start
and current positions in the subject text at the callout time, and the and current positions in the subject text at the callout time, and the
next pattern item to be tested. For example: next pattern item to be tested. For example:
--->pqrabcdef --->pqrabcdef
0 ^ ^ \d 0 ^ ^ \d
This output indicates that callout number 0 occurred for a match This output indicates that callout number 0 occurred for a match
attempt starting at the fourth character of the subject string, when attempt starting at the fourth character of the subject string, when
the pointer was at the seventh character, and when the next pattern the pointer was at the seventh character, and when the next pattern
item was \d. Just one circumflex is output if the start and current item was \d. Just one circumflex is output if the start and current
positions are the same, or if the current position precedes the start positions are the same, or if the current position precedes the start
position, which can happen if the callout is in a lookbehind assertion. position, which can happen if the callout is in a lookbehind assertion.
Callouts numbered 255 are assumed to be automatic callouts, inserted as Callouts numbered 255 are assumed to be automatic callouts, inserted as
a result of the /auto_callout pattern modifier. In this case, instead a result of the /auto_callout pattern modifier. In this case, instead
of showing the callout number, the offset in the pattern, preceded by a of showing the callout number, the offset in the pattern, preceded by a
plus, is output. For example: plus, is output. For example:
@ -1484,7 +1491,7 @@ CALLOUTS
0: E* 0: E*
If a pattern contains (*MARK) items, an additional line is output when- If a pattern contains (*MARK) items, an additional line is output when-
ever a change of latest mark is passed to the callout function. For ever a change of latest mark is passed to the callout function. For
example: example:
re> /a(*MARK:X)bc/auto_callout re> /a(*MARK:X)bc/auto_callout
@ -1498,17 +1505,17 @@ CALLOUTS
+12 ^ ^
0: abc
The mark changes between matching "a" and "b", but stays the same for
the rest of the match, so nothing more is output. If, as a result of
backtracking, the mark reverts to being unset, the text "<unset>" is
output.
Callouts with string arguments
The output for a callout with a string argument is similar, except that
instead of outputting a callout number before the position indicators,
the callout string and its offset in the pattern string are output
before the reflection of the subject string, and the subject string is
reflected for each callout. For example:
re> /^ab(?C'first')cd(?C"second")ef/
@ -1525,43 +1532,43 @@ CALLOUTS
NON-PRINTING CHARACTERS
When pcre2test is outputting text in the compiled version of a pattern,
bytes other than 32-126 are always treated as non-printing characters
and are therefore shown as hex escapes.
When pcre2test is outputting text that is a matched part of a subject
string, it behaves in the same way, unless a different locale has been
set for the pattern (using the locale modifier). In this case, the
isprint() function is used to distinguish printing and non-printing
characters.
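The two rules above can be sketched in C as follows. This is only an illustration under stated assumptions, not the pcre2test source: the function name show_bytes, its parameters, and the \x{hh} spelling of the hex escape are all chosen for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: renders `len` bytes of `s` into `out`.
   use_isprint == 0: only bytes 32-126 are copied through unchanged.
   use_isprint != 0: isprint() decides, so the result depends on the
   locale that the caller has set with setlocale().
   Returns the number of characters written (excluding the final NUL). */
size_t show_bytes(const unsigned char *s, size_t len, int use_isprint,
                  char *out, size_t outsize)
{
  size_t used = 0;

  assert(out != NULL && outsize > 0);
  out[0] = '\0';

  for (size_t i = 0; i < len; i++) {
    int printable = use_isprint ? (isprint(s[i]) != 0)
                                : (s[i] >= 32 && s[i] <= 126);
    int n;

    if (printable)
      n = snprintf(out + used, outsize - used, "%c", (int)s[i]);
    else  /* assumed escape format for this sketch */
      n = snprintf(out + used, outsize - used, "\\x{%02x}", (unsigned int)s[i]);

    if (n < 0 || (size_t)n >= outsize - used) {
      out[used] = '\0';   /* drop a partially written item and stop */
      break;
    }
    used += (size_t)n;
  }
  return used;
}
```

With use_isprint set to 0, a tab byte (9) and a DEL byte (127) are both rendered as escapes, while "a" and "b" pass through unchanged.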
SAVING AND RESTORING COMPILED PATTERNS
It is possible to save compiled patterns on disc or elsewhere, and
reload them later, subject to a number of restrictions. JIT data cannot
be saved. The host on which the patterns are reloaded must be running
the same version of PCRE2, with the same code unit width, and must also
have the same endianness, pointer width and PCRE2_SIZE type. Before
compiled patterns can be saved they must be serialized, that is,
converted to a stream of bytes. A single byte stream may contain any
number of compiled patterns, but they must all use the same character
tables. A single copy of the tables is included in the byte stream (its
size is 1088 bytes).
The functions whose names begin with pcre2_serialize_ are used for
serializing and de-serializing. They are described in the
pcre2serialize documentation. In this section we describe the features
of pcre2test that can be used to test these functions.
When a pattern with the push modifier is successfully compiled, it is
pushed onto a stack of compiled patterns, and pcre2test expects the
next line to contain a new pattern (or command) instead of a subject
line. By contrast, the pushcopy modifier causes a copy of the compiled
pattern to be stacked, leaving the original available for immediate
matching. By using push and/or pushcopy, a number of patterns can be
compiled and retained. These modifiers are incompatible with posix, and
control modifiers that act at match time are ignored (with a message)
for the stacked patterns. The jitverify modifier applies only at
compile time.
The command
@ -1569,21 +1576,21 @@ SAVING AND RESTORING COMPILED PATTERNS
#save <filename>
causes all the stacked patterns to be serialized and the result written
to the named file. Afterwards, all the stacked patterns are freed. The
command
#load <filename>
reads the data in the file, and then arranges for it to be
de-serialized, with the resulting compiled patterns added to the
pattern stack. The pattern on the top of the stack can be retrieved by
the #pop command, which must be followed by lines of subjects that are
to be matched with the pattern, terminated as usual by an empty line or
end of file. This command may be followed by a modifier list containing
only control modifiers that act after a pattern has been compiled. In
particular, hex, posix, posix_nosub, push, and pushcopy are not
allowed, nor are any option-setting modifiers. The JIT modifiers are,
however, permitted. Here is an example that saves and reloads two
patterns.
/abc/push
@ -1596,10 +1603,10 @@ SAVING AND RESTORING COMPILED PATTERNS
#pop jit,bincode
abc
If jitverify is used with #pop, it does not automatically imply jit,
which is different behaviour from when it is used on a pattern.
The #popcopy command is analogous to the pushcopy modifier in that it
makes current a copy of the topmost stack pattern, leaving the original
still on the stack.
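The stack behaviour of push, pushcopy, #pop and #popcopy can be modelled with a small C sketch. Everything here is hypothetical and for illustration only: the pattern and pattern_stack types, the function names, and the fixed STACK_MAX limit are not taken from pcre2test, and a real implementation would hold compiled patterns instead of source strings.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_MAX 16   /* arbitrary limit chosen for this sketch */

/* Stand-in for a compiled pattern. */
typedef struct { const char *source; } pattern;

typedef struct {
  pattern items[STACK_MAX];
  size_t count;
} pattern_stack;

/* push and pushcopy both stack a pattern; they differ only in whether
   the caller keeps using the original afterwards. Returns 0 if full. */
int stack_push(pattern_stack *st, pattern p)
{
  assert(st != NULL);
  if (st->count == STACK_MAX) return 0;
  st->items[st->count++] = p;
  return 1;
}

/* #pop: remove the topmost pattern and make it current. */
int stack_pop(pattern_stack *st, pattern *current)
{
  assert(st != NULL && current != NULL);
  if (st->count == 0) return 0;
  *current = st->items[--st->count];
  return 1;
}

/* #popcopy: make current a copy of the topmost pattern, which stays
   on the stack. */
int stack_popcopy(const pattern_stack *st, pattern *current)
{
  assert(st != NULL && current != NULL);
  if (st->count == 0) return 0;
  *current = st->items[st->count - 1];
  return 1;
}
```

In this model, stack_popcopy leaves the count unchanged while stack_pop reduces it by one, which is the distinction the text draws between #popcopy and #pop.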
@ -1619,5 +1626,5 @@ AUTHOR
REVISION
Last updated: 24 March 2017 (previously 21 March 2017)
Copyright (c) 1997-2017 University of Cambridge.