Implement REG_PEND (GNU extension) for the POSIX wrapper.

Philip.Hazel 2017-06-05 18:25:47 +00:00
parent f850015168
commit bcba497c0b
13 changed files with 447 additions and 327 deletions


@ -182,6 +182,8 @@ deeply. (Compare item 10.23/36.) This should fix oss-fuzz #1761.
38. Fix returned offsets from regexec() when REG_STARTEND is used with a 38. Fix returned offsets from regexec() when REG_STARTEND is used with a
starting offset greater than zero. starting offset greater than zero.
39. Implement REG_PEND (GNU extension) for the POSIX wrapper.
Version 10.23 14-February-2017 Version 10.23 14-February-2017
------------------------------ ------------------------------


@ -69,7 +69,7 @@ replacement library. Other POSIX options are not even defined.
<P> <P>
There are also some options that are not defined by POSIX. These have been There are also some options that are not defined by POSIX. These have been
added at the request of users who want to make use of certain PCRE2-specific added at the request of users who want to make use of certain PCRE2-specific
features via the POSIX calling interface. features via the POSIX calling interface or to add BSD or GNU functionality.
</P> </P>
<P> <P>
When PCRE2 is called via these functions, it is only the API that is POSIX-like When PCRE2 is called via these functions, it is only the API that is POSIX-like
@ -91,10 +91,11 @@ identifying error codes.
<br><a name="SEC3" href="#TOC1">COMPILING A PATTERN</a><br> <br><a name="SEC3" href="#TOC1">COMPILING A PATTERN</a><br>
<P> <P>
The function <b>regcomp()</b> is called to compile a pattern into an The function <b>regcomp()</b> is called to compile a pattern into an
internal form. The pattern is a C string terminated by a binary zero, and internal form. By default, the pattern is a C string terminated by a binary
is passed in the argument <i>pattern</i>. The <i>preg</i> argument is a pointer zero (but see REG_PEND below). The <i>preg</i> argument is a pointer to a
to a <b>regex_t</b> structure that is used as a base for storing information <b>regex_t</b> structure that is used as a base for storing information about
about the compiled regular expression. the compiled regular expression. (It is also used for input when REG_PEND is
set.)
</P> </P>
<P> <P>
The argument <i>cflags</i> is either zero, or contains one or more of the bits The argument <i>cflags</i> is either zero, or contains one or more of the bits
@ -124,6 +125,16 @@ matching, the <i>nmatch</i> and <i>pmatch</i> arguments are ignored, and no
captured strings are returned. Versions of the PCRE library prior to 10.22 used captured strings are returned. Versions of the PCRE library prior to 10.22 used
to set the PCRE2_NO_AUTO_CAPTURE compile option, but this no longer happens to set the PCRE2_NO_AUTO_CAPTURE compile option, but this no longer happens
because it disables the use of back references. because it disables the use of back references.
<pre>
REG_PEND
</pre>
If this option is set, the <b>re_endp</b> field in the <i>preg</i> structure
(which has the type const char *) must be set to point to the character beyond
the end of the pattern before calling <b>regcomp()</b>. The pattern itself may
now contain binary zeroes, which are treated as data characters. Without
REG_PEND, a binary zero terminates the pattern and the <b>re_endp</b> field is
ignored. This is a GNU extension to the POSIX standard and should be used with
caution in software intended to be portable to other systems.
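The REG_PEND description above can be illustrated with a minimal sketch. It assumes the wrapper header is reachable as regex.h (for PCRE2 that means pcre2posix.h installed or aliased under that name). The helper name compile_counted is hypothetical and is not part of any library. Because the text warns about portability, the sketch falls back to a zero-terminated copy when REG_PEND is not defined.

```c
#include <regex.h>   /* assumption: pcre2posix.h is installed or aliased under this name */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: compile a pattern given as pointer + length.
   Returns 0 on success or a non-zero regcomp() error code. */
int compile_counted(regex_t *preg, const char *pattern, size_t length, int cflags)
{
#ifdef REG_PEND
    /* re_endp is an input field here: it must point just past the last
       byte of the pattern before regcomp() is called. */
    preg->re_endp = pattern + length;
    return regcomp(preg, pattern, cflags | REG_PEND);
#else
    /* Fallback for headers without REG_PEND: embedded binary zeroes cannot
       be expressed, so reject them and compile a zero-terminated copy. */
    char *copy;
    int rc;

    if (memchr(pattern, '\0', length) != NULL) return REG_BADPAT;
    copy = malloc(length + 1);
    if (copy == NULL) return REG_ESPACE;
    memcpy(copy, pattern, length);
    copy[length] = '\0';
    rc = regcomp(preg, copy, cflags);
    free(copy);
    return rc;
#endif
}
```

A caller would pass, for example, a buffer and its byte count, and call regfree() on the regex_t only when the helper returns zero.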
<pre> <pre>
REG_UCP REG_UCP
</pre> </pre>
@ -156,9 +167,10 @@ class such as [^a] (they are).
</P> </P>
<P> <P>
The yield of <b>regcomp()</b> is zero on success, and non-zero otherwise. The The yield of <b>regcomp()</b> is zero on success, and non-zero otherwise. The
<i>preg</i> structure is filled in on success, and one member of the structure <i>preg</i> structure is filled in on success, and one other member of the
is public: <i>re_nsub</i> contains the number of capturing subpatterns in structure (as well as <i>re_endp</i>) is public: <i>re_nsub</i> contains the
the regular expression. Various error codes are defined in the header file. number of capturing subpatterns in the regular expression. Various error codes
are defined in the header file.
</P> </P>
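As a small illustration of the public re_nsub member, the sketch below reports the number of capturing subpatterns in a pattern. The function name capture_count is hypothetical, and the header is assumed to be reachable as regex.h.

```c
#include <regex.h>   /* assumption: pcre2posix.h is installed or aliased under this name */

/* Hypothetical helper: number of capturing subpatterns in `pattern`,
   or -1 if the pattern fails to compile. */
long capture_count(const char *pattern)
{
    regex_t re;
    long n;

    /* REG_EXTENDED is accepted by the PCRE2 wrapper; other POSIX
       libraries need it for unescaped parentheses to form groups. */
    if (regcomp(&re, pattern, REG_EXTENDED) != 0) return -1;
    n = (long)re.re_nsub;   /* filled in by a successful regcomp() */
    regfree(&re);
    return n;
}
```

A pmatch array sized re_nsub + 1 is enough to receive the whole match plus every captured substring.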
<P> <P>
NOTE: If the yield of <b>regcomp()</b> is non-zero, you must not attempt to NOTE: If the yield of <b>regcomp()</b> is non-zero, you must not attempt to
@ -228,15 +240,26 @@ function.
<pre> <pre>
REG_STARTEND REG_STARTEND
</pre> </pre>
The string is considered to start at <i>string</i> + <i>pmatch[0].rm_so</i> and When this option is set, the subject string starts at <i>string</i> +
to have a terminating NUL located at <i>string</i> + <i>pmatch[0].rm_eo</i> <i>pmatch[0].rm_so</i> and ends at <i>string</i> + <i>pmatch[0].rm_eo</i>, which
(there need not actually be a NUL at that location), regardless of the value of should point to the first character beyond the string. There may be binary
<i>nmatch</i>. This is a BSD extension, compatible with but not specified by zeroes within the subject string, and indeed, using REG_STARTEND is the only
IEEE Standard 1003.2 (POSIX.2), and should be used with caution in software way to pass a subject string that contains a binary zero.
intended to be portable to other systems. Note that a non-zero <i>rm_so</i> does </P>
not imply REG_NOTBOL; REG_STARTEND affects only the location of the string, not <P>
how it is matched. Setting REG_STARTEND and passing <i>pmatch</i> as NULL are Whatever the value of <i>pmatch[0].rm_so</i>, the offsets of the matched string
mutually exclusive; the error REG_INVARG is returned. and any captured substrings are still given relative to the start of
<i>string</i> itself. (Before PCRE2 release 10.30 these were given relative to
<i>string</i> + <i>pmatch[0].rm_so</i>, but this differs from other
implementations.)
</P>
<P>
This is a BSD extension, compatible with but not specified by IEEE Standard
1003.2 (POSIX.2), and should be used with caution in software intended to be
portable to other systems. Note that a non-zero <i>rm_so</i> does not imply
REG_NOTBOL; REG_STARTEND affects only the location and length of the string,
not how it is matched. Setting REG_STARTEND and passing <i>pmatch</i> as NULL
are mutually exclusive; the error REG_INVARG is returned.
</P> </P>
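The REG_STARTEND behaviour described above can be sketched as follows. The helper name match_start_in_range is hypothetical, and the header is assumed to be reachable as regex.h. The returned offset is relative to the start of the subject, which is what the text specifies for PCRE2 10.30 and later.

```c
#include <regex.h>   /* assumption: pcre2posix.h is installed or aliased under this name */
#include <stddef.h>

/* Hypothetical helper: search subject[start..end) for `pattern`.
   The range may contain binary zeroes. Returns the match start offset,
   measured from `subject` itself, or -1 on no match or compile failure. */
long match_start_in_range(const char *pattern, const char *subject,
                          size_t start, size_t end)
{
    regex_t re;
    regmatch_t pm[1];
    long result = -1;

    if (regcomp(&re, pattern, 0) != 0) return -1;

    /* With REG_STARTEND, pmatch[0] is input as well as output:
       rm_so is the first byte to search, rm_eo is one past the last. */
    pm[0].rm_so = (regoff_t)start;
    pm[0].rm_eo = (regoff_t)end;

    if (regexec(&re, subject, 1, pm, REG_STARTEND) == 0)
        result = (long)pm[0].rm_so;

    regfree(&re);
    return result;
}
```

Passing pmatch as NULL together with REG_STARTEND is an error, which is why the sketch always supplies a one-element array.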
<P> <P>
If the pattern was compiled with the REG_NOSUB flag, no data about any matched If the pattern was compiled with the REG_NOSUB flag, no data about any matched
@ -291,9 +314,9 @@ Cambridge, England.
</P> </P>
<br><a name="SEC9" href="#TOC1">REVISION</a><br> <br><a name="SEC9" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 31 January 2016 Last updated: 05 June 2017
<br> <br>
Copyright &copy; 1997-2016 University of Cambridge. Copyright &copy; 1997-2017 University of Cambridge.
<br> <br>
<p> <p>
Return to the <a href="index.html">PCRE2 index page</a>. Return to the <a href="index.html">PCRE2 index page</a>.


@ -1078,6 +1078,19 @@ are <b>notbol</b>, <b>notempty</b>, and <b>noteol</b>, causing REG_NOTBOL,
REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to <b>regexec()</b>. REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to <b>regexec()</b>.
The other modifiers are ignored, with a warning message. The other modifiers are ignored, with a warning message.
</P> </P>
<P>
There is one additional modifier that can be used with the POSIX wrapper. It is
ignored (with a warning) if used for non-POSIX matching.
<pre>
posix_startend=&#60;n&#62;[:&#60;m&#62;]
</pre>
This causes the subject string to be passed to <b>regexec()</b> using the
REG_STARTEND option, which uses offsets to restrict which part of the string is
searched. If only one number is given, the end offset is passed as the end of
the subject string. For more detail of REG_STARTEND, see the
<a href="pcre2posix.html"><b>pcre2posix</b></a>
documentation.
</P>
<br><b> <br><b>
Setting match controls Setting match controls
</b><br> </b><br>
@ -1817,7 +1830,7 @@ Cambridge, England.
</P> </P>
<br><a name="SEC21" href="#TOC1">REVISION</a><br> <br><a name="SEC21" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 01 June 2017 Last updated: 03 June 2017
<br> <br>
Copyright &copy; 1997-2017 University of Cambridge. Copyright &copy; 1997-2017 University of Cambridge.
<br> <br>


@ -8986,32 +8986,34 @@ DESCRIPTION
There are also some options that are not defined by POSIX. These have There are also some options that are not defined by POSIX. These have
been added at the request of users who want to make use of certain been added at the request of users who want to make use of certain
PCRE2-specific features via the POSIX calling interface. PCRE2-specific features via the POSIX calling interface or to add BSD
or GNU functionality.
When PCRE2 is called via these functions, it is only the API that is When PCRE2 is called via these functions, it is only the API that is
POSIX-like in style. The syntax and semantics of the regular expres- POSIX-like in style. The syntax and semantics of the regular expres-
sions themselves are still those of Perl, subject to the setting of sions themselves are still those of Perl, subject to the setting of
various PCRE2 options, as described below. "POSIX-like in style" means various PCRE2 options, as described below. "POSIX-like in style" means
that the API approximates to the POSIX definition; it is not fully that the API approximates to the POSIX definition; it is not fully
POSIX-compatible, and in multi-unit encoding domains it is probably POSIX-compatible, and in multi-unit encoding domains it is probably
even less compatible. even less compatible.
The header for these functions is supplied as pcre2posix.h to avoid any The header for these functions is supplied as pcre2posix.h to avoid any
potential clash with other POSIX libraries. It can, of course, be potential clash with other POSIX libraries. It can, of course, be
renamed or aliased as regex.h, which is the "correct" name. It provides renamed or aliased as regex.h, which is the "correct" name. It provides
two structure types, regex_t for compiled internal forms, and reg- two structure types, regex_t for compiled internal forms, and reg-
match_t for returning captured substrings. It also defines some con- match_t for returning captured substrings. It also defines some con-
stants whose names start with "REG_"; these are used for setting stants whose names start with "REG_"; these are used for setting
options and identifying error codes. options and identifying error codes.
COMPILING A PATTERN COMPILING A PATTERN
The function regcomp() is called to compile a pattern into an internal The function regcomp() is called to compile a pattern into an internal
form. The pattern is a C string terminated by a binary zero, and is form. By default, the pattern is a C string terminated by a binary zero
passed in the argument pattern. The preg argument is a pointer to a (but see REG_PEND below). The preg argument is a pointer to a regex_t
regex_t structure that is used as a base for storing information about structure that is used as a base for storing information about the com-
the compiled regular expression. piled regular expression. (It is also used for input when REG_PEND is
set.)
The argument cflags is either zero, or contains one or more of the bits The argument cflags is either zero, or contains one or more of the bits
defined by the following macros: defined by the following macros:
@ -9042,38 +9044,50 @@ COMPILING A PATTERN
used to set the PCRE2_NO_AUTO_CAPTURE compile option, but this no used to set the PCRE2_NO_AUTO_CAPTURE compile option, but this no
longer happens because it disables the use of back references. longer happens because it disables the use of back references.
REG_PEND
If this option is set, the re_endp field in the preg structure (which
has the type const char *) must be set to point to the character beyond
the end of the pattern before calling regcomp(). The pattern itself may
now contain binary zeroes, which are treated as data characters. With-
out REG_PEND, a binary zero terminates the pattern and the re_endp
field is ignored. This is a GNU extension to the POSIX standard and
should be used with caution in software intended to be portable to
other systems.
REG_UCP REG_UCP
The PCRE2_UCP option is set when the regular expression is passed for The PCRE2_UCP option is set when the regular expression is passed for
compilation to the native function. This causes PCRE2 to use Unicode compilation to the native function. This causes PCRE2 to use Unicode
properties when matching \d, \w, etc., instead of just recognizing properties when matching \d, \w, etc., instead of just recognizing
ASCII values. Note that REG_UCP is not part of the POSIX standard. ASCII values. Note that REG_UCP is not part of the POSIX standard.
REG_UNGREEDY REG_UNGREEDY
The PCRE2_UNGREEDY option is set when the regular expression is passed The PCRE2_UNGREEDY option is set when the regular expression is passed
for compilation to the native function. Note that REG_UNGREEDY is not for compilation to the native function. Note that REG_UNGREEDY is not
part of the POSIX standard. part of the POSIX standard.
REG_UTF REG_UTF
The PCRE2_UTF option is set when the regular expression is passed for The PCRE2_UTF option is set when the regular expression is passed for
compilation to the native function. This causes the pattern itself and compilation to the native function. This causes the pattern itself and
all data strings used for matching it to be treated as UTF-8 strings. all data strings used for matching it to be treated as UTF-8 strings.
Note that REG_UTF is not part of the POSIX standard. Note that REG_UTF is not part of the POSIX standard.
In the absence of these flags, no options are passed to the native In the absence of these flags, no options are passed to the native
function. This means the regex is compiled with PCRE2 default function. This means the regex is compiled with PCRE2 default
semantics. In particular, the way it handles newline characters in the semantics. In particular, the way it handles newline characters in the
subject string is the Perl way, not the POSIX way. Note that setting subject string is the Perl way, not the POSIX way. Note that setting
PCRE2_MULTILINE has only some of the effects specified for REG_NEWLINE. PCRE2_MULTILINE has only some of the effects specified for REG_NEWLINE.
It does not affect the way newlines are matched by the dot metacharac- It does not affect the way newlines are matched by the dot metacharac-
ter (they are not) or by a negative class such as [^a] (they are). ter (they are not) or by a negative class such as [^a] (they are).
The yield of regcomp() is zero on success, and non-zero otherwise. The The yield of regcomp() is zero on success, and non-zero otherwise. The
preg structure is filled in on success, and one member of the structure preg structure is filled in on success, and one other member of the
is public: re_nsub contains the number of capturing subpatterns in the structure (as well as re_endp) is public: re_nsub contains the number
regular expression. Various error codes are defined in the header file. of capturing subpatterns in the regular expression. Various error codes
are defined in the header file.
NOTE: If the yield of regcomp() is non-zero, you must not attempt to NOTE: If the yield of regcomp() is non-zero, you must not attempt to
use the contents of the preg structure. If, for example, you pass it to use the contents of the preg structure. If, for example, you pass it to
@ -9146,57 +9160,66 @@ MATCHING A PATTERN
REG_STARTEND REG_STARTEND
The string is considered to start at string + pmatch[0].rm_so and to When this option is set, the subject string starts at string +
have a terminating NUL located at string + pmatch[0].rm_eo (there need pmatch[0].rm_so and ends at string + pmatch[0].rm_eo, which should
not actually be a NUL at that location), regardless of the value of point to the first character beyond the string. There may be binary
nmatch. This is a BSD extension, compatible with but not specified by zeroes within the subject string, and indeed, using REG_STARTEND is the
IEEE Standard 1003.2 (POSIX.2), and should be used with caution in only way to pass a subject string that contains a binary zero.
software intended to be portable to other systems. Note that a non-zero
rm_so does not imply REG_NOTBOL; REG_STARTEND affects only the location Whatever the value of pmatch[0].rm_so, the offsets of the matched
of the string, not how it is matched. Setting REG_STARTEND and passing string and any captured substrings are still given relative to the
pmatch as NULL are mutually exclusive; the error REG_INVARG is start of string itself. (Before PCRE2 release 10.30 these were given
relative to string + pmatch[0].rm_so, but this differs from other
implementations.)
This is a BSD extension, compatible with but not specified by IEEE
Standard 1003.2 (POSIX.2), and should be used with caution in software
intended to be portable to other systems. Note that a non-zero rm_so
does not imply REG_NOTBOL; REG_STARTEND affects only the location and
length of the string, not how it is matched. Setting REG_STARTEND and
passing pmatch as NULL are mutually exclusive; the error REG_INVARG is
returned. returned.
If the pattern was compiled with the REG_NOSUB flag, no data about any If the pattern was compiled with the REG_NOSUB flag, no data about any
matched strings is returned. The nmatch and pmatch arguments of matched strings is returned. The nmatch and pmatch arguments of
regexec() are ignored (except possibly as input for REG_STARTEND). regexec() are ignored (except possibly as input for REG_STARTEND).
The value of nmatch may be zero, and the value of pmatch may be NULL The value of nmatch may be zero, and the value of pmatch may be NULL
(unless REG_STARTEND is set); in both these cases no data about any (unless REG_STARTEND is set); in both these cases no data about any
matched strings is returned. matched strings is returned.
Otherwise, the portion of the string that was matched, and also any Otherwise, the portion of the string that was matched, and also any
captured substrings, are returned via the pmatch argument, which points captured substrings, are returned via the pmatch argument, which points
to an array of nmatch structures of type regmatch_t, containing the to an array of nmatch structures of type regmatch_t, containing the
members rm_so and rm_eo. These contain the byte offset to the first members rm_so and rm_eo. These contain the byte offset to the first
character of each substring and the offset to the first character after character of each substring and the offset to the first character after
the end of each substring, respectively. The 0th element of the vector the end of each substring, respectively. The 0th element of the vector
relates to the entire portion of string that was matched; subsequent relates to the entire portion of string that was matched; subsequent
elements relate to the capturing subpatterns of the regular expression. elements relate to the capturing subpatterns of the regular expression.
Unused entries in the array have both structure members set to -1. Unused entries in the array have both structure members set to -1.
A successful match yields a zero return; various error codes are A successful match yields a zero return; various error codes are
defined in the header file, of which REG_NOMATCH is the "expected" defined in the header file, of which REG_NOMATCH is the "expected"
failure code. failure code.
ERROR MESSAGES ERROR MESSAGES
The regerror() function maps a non-zero errorcode from either regcomp() The regerror() function maps a non-zero errorcode from either regcomp()
or regexec() to a printable message. If preg is not NULL, the error or regexec() to a printable message. If preg is not NULL, the error
should have arisen from the use of that structure. A message terminated should have arisen from the use of that structure. A message terminated
by a binary zero is placed in errbuf. If the buffer is too short, only by a binary zero is placed in errbuf. If the buffer is too short, only
the first errbuf_size - 1 characters of the error message are used. The the first errbuf_size - 1 characters of the error message are used. The
yield of the function is the size of buffer needed to hold the whole yield of the function is the size of buffer needed to hold the whole
message, including the terminating zero. This value is greater than message, including the terminating zero. This value is greater than
errbuf_size if the message was truncated. errbuf_size if the message was truncated.
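The buffer-sizing rule for regerror() can be sketched as below: try a small buffer first, then use the returned size to allocate one that holds the whole message. The helper name compile_error_message is hypothetical, and the header is assumed to be reachable as regex.h.

```c
#include <regex.h>   /* assumption: pcre2posix.h is installed or aliased under this name */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'ed copy of the full error message
   for a pattern that fails to compile. Returns NULL if the pattern
   compiles (or if memory cannot be allocated). The caller frees it. */
char *compile_error_message(const char *pattern)
{
    regex_t re;
    char small[16];
    size_t needed;
    char *msg;
    int rc = regcomp(&re, pattern, 0);

    if (rc == 0) {
        regfree(&re);
        return NULL;
    }

    /* The yield is the size needed for the whole message, including the
       terminating zero, even when the message was truncated. */
    needed = regerror(rc, &re, small, sizeof small);
    msg = malloc(needed);
    if (msg == NULL) return NULL;

    if (needed <= sizeof small)
        memcpy(msg, small, needed);       /* the first attempt already fit */
    else
        regerror(rc, &re, msg, needed);   /* retry with a full-size buffer */
    return msg;
}
```

Only regerror() is called on the regex_t after the failed regcomp(); the structure is not passed to regexec() or regfree().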
MEMORY USAGE MEMORY USAGE
Compiling a regular expression causes memory to be allocated and asso- Compiling a regular expression causes memory to be allocated and asso-
ciated with the preg structure. The function regfree() frees all such ciated with the preg structure. The function regfree() frees all such
memory, after which preg may no longer be used as a compiled expres- memory, after which preg may no longer be used as a compiled expres-
sion. sion.
@ -9209,8 +9232,8 @@ AUTHOR
REVISION REVISION
Last updated: 31 January 2016 Last updated: 05 June 2017
Copyright (c) 1997-2016 University of Cambridge. Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------


@ -1,4 +1,4 @@
.TH PCRE2POSIX 3 "03 June 2017" "PCRE2 10.30" .TH PCRE2POSIX 3 "05 June 2017" "PCRE2 10.30"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.SH "SYNOPSIS" .SH "SYNOPSIS"
@ -46,7 +46,7 @@ replacement library. Other POSIX options are not even defined.
.P .P
There are also some options that are not defined by POSIX. These have been There are also some options that are not defined by POSIX. These have been
added at the request of users who want to make use of certain PCRE2-specific added at the request of users who want to make use of certain PCRE2-specific
features via the POSIX calling interface. features via the POSIX calling interface or to add BSD or GNU functionality.
.P .P
When PCRE2 is called via these functions, it is only the API that is POSIX-like When PCRE2 is called via these functions, it is only the API that is POSIX-like
in style. The syntax and semantics of the regular expressions themselves are in style. The syntax and semantics of the regular expressions themselves are
@ -68,10 +68,11 @@ identifying error codes.
.rs .rs
.sp .sp
The function \fBregcomp()\fP is called to compile a pattern into an The function \fBregcomp()\fP is called to compile a pattern into an
internal form. The pattern is a C string terminated by a binary zero, and internal form. By default, the pattern is a C string terminated by a binary
is passed in the argument \fIpattern\fP. The \fIpreg\fP argument is a pointer zero (but see REG_PEND below). The \fIpreg\fP argument is a pointer to a
to a \fBregex_t\fP structure that is used as a base for storing information \fBregex_t\fP structure that is used as a base for storing information about
about the compiled regular expression. the compiled regular expression. (It is also used for input when REG_PEND is
set.)
.P .P
The argument \fIcflags\fP is either zero, or contains one or more of the bits The argument \fIcflags\fP is either zero, or contains one or more of the bits
defined by the following macros: defined by the following macros:
@ -100,6 +101,16 @@ matching, the \fInmatch\fP and \fIpmatch\fP arguments are ignored, and no
captured strings are returned. Versions of the PCRE library prior to 10.22 used captured strings are returned. Versions of the PCRE library prior to 10.22 used
to set the PCRE2_NO_AUTO_CAPTURE compile option, but this no longer happens to set the PCRE2_NO_AUTO_CAPTURE compile option, but this no longer happens
because it disables the use of back references. because it disables the use of back references.
.sp
REG_PEND
.sp
If this option is set, the \fBre_endp\fP field in the \fIpreg\fP structure
(which has the type const char *) must be set to point to the character beyond
the end of the pattern before calling \fBregcomp()\fP. The pattern itself may
now contain binary zeroes, which are treated as data characters. Without
REG_PEND, a binary zero terminates the pattern and the \fBre_endp\fP field is
ignored. This is a GNU extension to the POSIX standard and should be used with
caution in software intended to be portable to other systems.
.sp .sp
REG_UCP REG_UCP
.sp .sp
@ -130,9 +141,10 @@ newlines are matched by the dot metacharacter (they are not) or by a negative
class such as [^a] (they are). class such as [^a] (they are).
.P .P
The yield of \fBregcomp()\fP is zero on success, and non-zero otherwise. The The yield of \fBregcomp()\fP is zero on success, and non-zero otherwise. The
\fIpreg\fP structure is filled in on success, and one member of the structure \fIpreg\fP structure is filled in on success, and one other member of the
is public: \fIre_nsub\fP contains the number of capturing subpatterns in structure (as well as \fIre_endp\fP) is public: \fIre_nsub\fP contains the
the regular expression. Various error codes are defined in the header file. number of capturing subpatterns in the regular expression. Various error codes
are defined in the header file.
.P .P
NOTE: If the yield of \fBregcomp()\fP is non-zero, you must not attempt to NOTE: If the yield of \fBregcomp()\fP is non-zero, you must not attempt to
use the contents of the \fIpreg\fP structure. If, for example, you pass it to use the contents of the \fIpreg\fP structure. If, for example, you pass it to
@ -204,21 +216,24 @@ function.
.sp .sp
REG_STARTEND REG_STARTEND
.sp .sp
When this option is set, the string is considered to start at \fIstring\fP + When this option is set, the subject string starts at \fIstring\fP +
\fIpmatch[0].rm_so\fP and to have a terminating NUL located at \fIstring\fP + \fIpmatch[0].rm_so\fP and ends at \fIstring\fP + \fIpmatch[0].rm_eo\fP, which
\fIpmatch[0].rm_eo\fP (there need not actually be a NUL at that location), should point to the first character beyond the string. There may be binary
regardless of the value of \fInmatch\fP. However, the offsets of the matched zeroes within the subject string, and indeed, using REG_STARTEND is the only
string and any captured substrings are still given relative to the start of way to pass a subject string that contains a binary zero.
\fIstring\fP. (Before PCRE2 release 10.30 these were given relative to .P
Whatever the value of \fIpmatch[0].rm_so\fP, the offsets of the matched string
and any captured substrings are still given relative to the start of
\fIstring\fP itself. (Before PCRE2 release 10.30 these were given relative to
\fIstring\fP + \fIpmatch[0].rm_so\fP, but this differs from other \fIstring\fP + \fIpmatch[0].rm_so\fP, but this differs from other
implementations.) implementations.)
.P .P
This is a BSD extension, compatible with but not specified by IEEE Standard This is a BSD extension, compatible with but not specified by IEEE Standard
1003.2 (POSIX.2), and should be used with caution in software intended to be 1003.2 (POSIX.2), and should be used with caution in software intended to be
portable to other systems. Note that a non-zero \fIrm_so\fP does not imply portable to other systems. Note that a non-zero \fIrm_so\fP does not imply
REG_NOTBOL; REG_STARTEND affects only the location of the string, not how it is REG_NOTBOL; REG_STARTEND affects only the location and length of the string,
matched. Setting REG_STARTEND and passing \fIpmatch\fP as NULL are mutually not how it is matched. Setting REG_STARTEND and passing \fIpmatch\fP as NULL
exclusive; the error REG_INVARG is returned. are mutually exclusive; the error REG_INVARG is returned.
.P .P
If the pattern was compiled with the REG_NOSUB flag, no data about any matched If the pattern was compiled with the REG_NOSUB flag, no data about any matched
strings is returned. The \fInmatch\fP and \fIpmatch\fP arguments of strings is returned. The \fInmatch\fP and \fIpmatch\fP arguments of
@ -277,6 +292,6 @@ Cambridge, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 03 June 2017 Last updated: 05 June 2017
Copyright (c) 1997-2017 University of Cambridge. Copyright (c) 1997-2017 University of Cambridge.
.fi .fi


@ -965,11 +965,22 @@ SUBJECT MODIFIERS
REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to regexec(). REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to regexec().
The other modifiers are ignored, with a warning message. The other modifiers are ignored, with a warning message.
There is one additional modifier that can be used with the POSIX wrap-
per. It is ignored (with a warning) if used for non-POSIX matching.
posix_startend=<n>[:<m>]
This causes the subject string to be passed to regexec() using the
REG_STARTEND option, which uses offsets to restrict which part of the
string is searched. If only one number is given, the end offset is
passed as the end of the subject string. For more detail of REG_STAR-
TEND, see the pcre2posix documentation.
Setting match controls Setting match controls
The following modifiers affect the matching process or request addi- The following modifiers affect the matching process or request addi-
tional information. Some of them may also be specified on a pattern tional information. Some of them may also be specified on a pattern
line (see above), in which case they apply to every subject line that line (see above), in which case they apply to every subject line that
is matched against that pattern. is matched against that pattern.
aftertext show text after match aftertext show text after match
@ -1009,29 +1020,29 @@ SUBJECT MODIFIERS
zero_terminate pass the subject as zero-terminated zero_terminate pass the subject as zero-terminated
The effects of these modifiers are described in the following sections. The effects of these modifiers are described in the following sections.
When matching via the POSIX wrapper API, the aftertext, allaftertext, When matching via the POSIX wrapper API, the aftertext, allaftertext,
and ovector subject modifiers work as described below. All other modi- and ovector subject modifiers work as described below. All other modi-
fiers are either ignored, with a warning message, or cause an error. fiers are either ignored, with a warning message, or cause an error.
Showing more text Showing more text
The aftertext modifier requests that as well as outputting the part of The aftertext modifier requests that as well as outputting the part of
the subject string that matched the entire pattern, pcre2test should in the subject string that matched the entire pattern, pcre2test should in
addition output the remainder of the subject string. This is useful for addition output the remainder of the subject string. This is useful for
tests where the subject contains multiple copies of the same substring. tests where the subject contains multiple copies of the same substring.
The allaftertext modifier requests the same action for captured sub- The allaftertext modifier requests the same action for captured sub-
strings as well as the main matched substring. In each case the remain- strings as well as the main matched substring. In each case the remain-
der is output on the following line with a plus character following the der is output on the following line with a plus character following the
capture number. capture number.
The allusedtext modifier requests that all the text that was consulted The allusedtext modifier requests that all the text that was consulted
during a successful pattern match by the interpreter should be shown. during a successful pattern match by the interpreter should be shown.
This feature is not supported for JIT matching, and if requested with This feature is not supported for JIT matching, and if requested with
JIT it is ignored (with a warning message). Setting this modifier JIT it is ignored (with a warning message). Setting this modifier
affects the output if there is a lookbehind at the start of a match, or affects the output if there is a lookbehind at the start of a match, or
a lookahead at the end, or if \K is used in the pattern. Characters a lookahead at the end, or if \K is used in the pattern. Characters
that precede or follow the start and end of the actual match are indi- that precede or follow the start and end of the actual match are indi-
cated in the output by '<' or '>' characters underneath them. Here is cated in the output by '<' or '>' characters underneath them. Here is
an example: an example:
re> /(?<=pqr)abc(?=xyz)/ re> /(?<=pqr)abc(?=xyz)/
@ -1039,16 +1050,16 @@ SUBJECT MODIFIERS
0: pqrabcxyz 0: pqrabcxyz
<<< >>> <<< >>>
This shows that the matched string is "abc", with the preceding and This shows that the matched string is "abc", with the preceding and
following strings "pqr" and "xyz" having been consulted during the following strings "pqr" and "xyz" having been consulted during the
match (when processing the assertions). match (when processing the assertions).
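The marker line in the example above can be sketched in a few lines of Python. This is only an illustration of the layout, not code from pcre2test; the function name and its four offset parameters are hypothetical.

```python
def used_text_markers(used_start, match_start, match_end, used_end):
    """Build the marker line for one match (all offsets index the subject)."""
    before = "<" * (match_start - used_start)   # text consulted before the match
    middle = " " * (match_end - match_start)    # the matched text itself
    after = ">" * (used_end - match_end)        # text consulted after the match
    return before + middle + after

# Subject "pqrabcxyz": "abc" matched at 3..6, text consulted from 0 to 9.
print("pqrabcxyz")
print(used_text_markers(0, 3, 6, 9))
```

The markers line up under the consulted characters only when the subject is printed directly above them, as it is here.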
The startchar modifier requests that the starting character for the The startchar modifier requests that the starting character for the
match be indicated, if it is different to the start of the matched match be indicated, if it is different to the start of the matched
string. The only time when this occurs is when \K has been processed as string. The only time when this occurs is when \K has been processed as
part of the match. In this situation, the output for the matched string part of the match. In this situation, the output for the matched string
is displayed from the starting character instead of from the match is displayed from the starting character instead of from the match
point, with circumflex characters under the earlier characters. For point, with circumflex characters under the earlier characters. For
example: example:
re> /abc\Kxyz/ re> /abc\Kxyz/
@ -1056,7 +1067,7 @@ SUBJECT MODIFIERS
0: abcxyz 0: abcxyz
^^^ ^^^
Unlike allusedtext, the startchar modifier can be used with JIT. How- Unlike allusedtext, the startchar modifier can be used with JIT. How-
ever, these two modifiers are mutually exclusive. ever, these two modifiers are mutually exclusive.
Showing the value of all capture groups Showing the value of all capture groups
@ -1064,98 +1075,98 @@ SUBJECT MODIFIERS
The allcaptures modifier requests that the values of all potential cap- The allcaptures modifier requests that the values of all potential cap-
tured parentheses be output after a match. By default, only those up to tured parentheses be output after a match. By default, only those up to
the highest one actually used in the match are output (corresponding to the highest one actually used in the match are output (corresponding to
the return code from pcre2_match()). Groups that did not take part in the return code from pcre2_match()). Groups that did not take part in
the match are output as "<unset>". This modifier is not relevant for the match are output as "<unset>". This modifier is not relevant for
DFA matching (which does no capturing); it is ignored, with a warning DFA matching (which does no capturing); it is ignored, with a warning
message, if present. message, if present.
Testing callouts Testing callouts
A callout function is supplied when pcre2test calls the library match- A callout function is supplied when pcre2test calls the library match-
ing functions, unless callout_none is specified. If callout_capture is ing functions, unless callout_none is specified. If callout_capture is
set, the current captured groups are output when a callout occurs. The set, the current captured groups are output when a callout occurs. The
default return from the callout function is zero, which allows matching default return from the callout function is zero, which allows matching
to continue. to continue.
The callout_fail modifier can be given one or two numbers. If there is The callout_fail modifier can be given one or two numbers. If there is
only one number, 1 is returned instead of 0 (causing matching to back- only one number, 1 is returned instead of 0 (causing matching to back-
track) when a callout of that number is reached. If two numbers track) when a callout of that number is reached. If two numbers
(<n>:<m>) are given, 1 is returned when callout <n> is reached and (<n>:<m>) are given, 1 is returned when callout <n> is reached and
there have been at least <m> callouts. The callout_error modifier is there have been at least <m> callouts. The callout_error modifier is
similar, except that PCRE2_ERROR_CALLOUT is returned, causing the similar, except that PCRE2_ERROR_CALLOUT is returned, causing the
entire matching process to be aborted. If both these modifiers are set entire matching process to be aborted. If both these modifiers are set
for the same callout number, callout_error takes precedence. for the same callout number, callout_error takes precedence.
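The rules for callout_fail and callout_error can be modelled as a small decision function. This is a sketch under assumptions: the function and parameter names are invented, a missing <m> is treated as 0, the count is assumed to include the current callout, and PCRE2_ERROR_CALLOUT is a symbolic stand-in for the real negative constant in pcre2.h.

```python
PCRE2_ERROR_CALLOUT = "PCRE2_ERROR_CALLOUT"  # symbolic stand-in, not the real value

def callout_return(number, count, callout_fail=None, callout_error=None):
    """Decide what the test callout returns.

    number: the callout number that was reached
    count:  callouts so far (assumed to include this one)
    callout_fail / callout_error: (n, m) tuples, with m = 0 if only n was given
    """
    # callout_error is checked first, so it wins when both modifiers apply.
    if callout_error is not None:
        n, m = callout_error
        if number == n and count >= m:
            return PCRE2_ERROR_CALLOUT   # abort the whole match
    if callout_fail is not None:
        n, m = callout_fail
        if number == n and count >= m:
            return 1                     # force backtracking
    return 0                             # default: matching continues
```

For example, callout_return(2, 1, callout_fail=(2, 3)) gives 0 because fewer than three callouts have happened, while the same call with count=3 gives 1.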
Note that callouts with string arguments are always given the number Note that callouts with string arguments are always given the number
zero. See "Callouts" below for a description of the output when a zero. See "Callouts" below for a description of the output when a
callout is taken. callout is taken.
The callout_data modifier can be given an unsigned or a negative num- The callout_data modifier can be given an unsigned or a negative num-
ber. This is set as the "user data" that is passed to the matching ber. This is set as the "user data" that is passed to the matching
function, and passed back when the callout function is invoked. Any function, and passed back when the callout function is invoked. Any
value other than zero is used as a return from pcre2test's callout value other than zero is used as a return from pcre2test's callout
function. function.
Finding all matches in a string Finding all matches in a string
Searching for all possible matches within a subject can be requested by Searching for all possible matches within a subject can be requested by
the global or altglobal modifier. After finding a match, the matching the global or altglobal modifier. After finding a match, the matching
function is called again to search the remainder of the subject. The function is called again to search the remainder of the subject. The
difference between global and altglobal is that the former uses the difference between global and altglobal is that the former uses the
start_offset argument to pcre2_match() or pcre2_dfa_match() to start start_offset argument to pcre2_match() or pcre2_dfa_match() to start
searching at a new point within the entire string (which is what Perl searching at a new point within the entire string (which is what Perl
does), whereas the latter passes over a shortened subject. This makes a does), whereas the latter passes over a shortened subject. This makes a
difference to the matching process if the pattern begins with a lookbe- difference to the matching process if the pattern begins with a lookbe-
hind assertion (including \b or \B). hind assertion (including \b or \B).
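The difference between moving the start offset and shortening the subject can be seen with any engine that supports both. The sketch below uses Python's re module as a stand-in for the PCRE2 calls; the pattern and subject are made up for the illustration.

```python
import re

word_cat = re.compile(r"\bcat")
subject = "concat cat"

# Like "global": keep the whole subject and move the start offset.
# The engine can still see the "n" before offset 3, so \b fails there.
with_offset = word_cat.search(subject, 3)

# Like "altglobal": pass a shortened subject. Offset 3 is now the start
# of the string, so \b succeeds immediately.
shortened = word_cat.search(subject[3:])

print(with_offset.start())    # position in the full subject
print(shortened.start() + 3)  # converted back to the full subject
```

The first search skips the "cat" inside "concat" and reports the later word, whereas the second one accepts it because the preceding text is no longer visible.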
If an empty string is matched, the next match is done with the If an empty string is matched, the next match is done with the
PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search
for another, non-empty, match at the same point in the subject. If this for another, non-empty, match at the same point in the subject. If this
match fails, the start offset is advanced, and the normal match is match fails, the start offset is advanced, and the normal match is
retried. This imitates the way Perl handles such cases when using the retried. This imitates the way Perl handles such cases when using the
/g modifier or the split() function. Normally, the start offset is /g modifier or the split() function. Normally, the start offset is
advanced by one character, but if the newline convention recognizes advanced by one character, but if the newline convention recognizes
CRLF as a newline, and the current character is CR followed by LF, an CRLF as a newline, and the current character is CR followed by LF, an
advance of two characters occurs. advance of two characters occurs.
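The advance step described above is small enough to write out. This is a minimal sketch, assuming a Python string subject where one element is one character; the function name and the crlf_is_newline flag are hypothetical.

```python
def next_start_offset(subject, offset, crlf_is_newline):
    """Return where the normal match is retried after the anchored,
    non-empty retry at `offset` has failed."""
    if crlf_is_newline and subject[offset:offset + 2] == "\r\n":
        return offset + 2   # step over CR LF as a single newline
    return offset + 1       # otherwise advance by one character
```

Stepping over CR LF as a unit avoids starting a match attempt between the two characters of a single newline.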
Testing substring extraction functions Testing substring extraction functions
The copy and get modifiers can be used to test the pcre2_sub- The copy and get modifiers can be used to test the pcre2_sub-
string_copy_xxx() and pcre2_substring_get_xxx() functions. They can be string_copy_xxx() and pcre2_substring_get_xxx() functions. They can be
given more than once, and each can specify a group name or number, for given more than once, and each can specify a group name or number, for
example: example:
abcd\=copy=1,copy=3,get=G1 abcd\=copy=1,copy=3,get=G1
If the #subject command is used to set default copy and/or get lists, If the #subject command is used to set default copy and/or get lists,
these can be unset by specifying a negative number to cancel all num- these can be unset by specifying a negative number to cancel all num-
bered groups and an empty name to cancel all named groups. bered groups and an empty name to cancel all named groups.
The getall modifier tests pcre2_substring_list_get(), which extracts The getall modifier tests pcre2_substring_list_get(), which extracts
all captured substrings. all captured substrings.
If the subject line is successfully matched, the substrings extracted If the subject line is successfully matched, the substrings extracted
by the convenience functions are output with C, G, or L after the by the convenience functions are output with C, G, or L after the
string number instead of a colon. This is in addition to the normal string number instead of a colon. This is in addition to the normal
full list. The string length (that is, the return from the extraction full list. The string length (that is, the return from the extraction
function) is given in parentheses after each substring, followed by the function) is given in parentheses after each substring, followed by the
name when the extraction was by name. name when the extraction was by name.
Testing the substitution function Testing the substitution function
If the replace modifier is set, the pcre2_substitute() function is If the replace modifier is set, the pcre2_substitute() function is
called instead of one of the matching functions. Note that replacement called instead of one of the matching functions. Note that replacement
strings cannot contain commas, because a comma signifies the end of a strings cannot contain commas, because a comma signifies the end of a
modifier. This is not thought to be an issue in a test program. modifier. This is not thought to be an issue in a test program.
Unlike subject strings, pcre2test does not process replacement strings Unlike subject strings, pcre2test does not process replacement strings
for escape sequences. In UTF mode, a replacement string is checked to for escape sequences. In UTF mode, a replacement string is checked to
see if it is a valid UTF-8 string. If so, it is correctly converted to see if it is a valid UTF-8 string. If so, it is correctly converted to
a UTF string of the appropriate code unit width. If it is not a valid a UTF string of the appropriate code unit width. If it is not a valid
UTF-8 string, the individual code units are copied directly. This pro- UTF-8 string, the individual code units are copied directly. This pro-
vides a means of passing an invalid UTF-8 string for testing purposes. vides a means of passing an invalid UTF-8 string for testing purposes.
The following modifiers set options (in addition to the normal match The following modifiers set options (in addition to the normal match
options) for pcre2_substitute(): options) for pcre2_substitute():
global PCRE2_SUBSTITUTE_GLOBAL global PCRE2_SUBSTITUTE_GLOBAL
@ -1165,8 +1176,8 @@ SUBJECT MODIFIERS
substitute_unset_empty PCRE2_SUBSTITUTE_UNSET_EMPTY substitute_unset_empty PCRE2_SUBSTITUTE_UNSET_EMPTY
After a successful substitution, the modified string is output, pre- After a successful substitution, the modified string is output, pre-
ceded by the number of replacements. This may be zero if there were no ceded by the number of replacements. This may be zero if there were no
matches. Here is a simple example of a substitution test: matches. Here is a simple example of a substitution test:
/abc/replace=xxx /abc/replace=xxx
@ -1175,12 +1186,12 @@ SUBJECT MODIFIERS
=abc=abc=\=global =abc=abc=\=global
2: =xxx=xxx= 2: =xxx=xxx=
Subject and replacement strings should be kept relatively short (fewer Subject and replacement strings should be kept relatively short (fewer
than 256 characters) for substitution tests, as fixed-size buffers are than 256 characters) for substitution tests, as fixed-size buffers are
used. To make it easy to test for buffer overflow, if the replacement used. To make it easy to test for buffer overflow, if the replacement
string starts with a number in square brackets, that number is passed string starts with a number in square brackets, that number is passed
to pcre2_substitute() as the size of the output buffer, with the to pcre2_substitute() as the size of the output buffer, with the
replacement string starting at the next character. Here is an example replacement string starting at the next character. Here is an example
that tests the edge case: that tests the edge case:
/abc/ /abc/
@ -1189,11 +1200,11 @@ SUBJECT MODIFIERS
123abc123\=replace=[9]XYZ 123abc123\=replace=[9]XYZ
Failed: error -47: no more memory Failed: error -47: no more memory
The default action of pcre2_substitute() is to return The default action of pcre2_substitute() is to return
PCRE2_ERROR_NOMEMORY when the output buffer is too small. However, if PCRE2_ERROR_NOMEMORY when the output buffer is too small. However, if
the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the sub- the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the sub-
stitute_overflow_length modifier), pcre2_substitute() continues to go stitute_overflow_length modifier), pcre2_substitute() continues to go
through the motions of matching and substituting, in order to compute through the motions of matching and substituting, in order to compute
the size of buffer that is required. When this happens, pcre2test shows the size of buffer that is required. When this happens, pcre2test shows
the required buffer length (which includes space for the trailing zero) the required buffer length (which includes space for the trailing zero)
as part of the error message. For example: as part of the error message. For example:
@ -1203,151 +1214,151 @@ SUBJECT MODIFIERS
Failed: error -47: no more memory: 10 code units are needed Failed: error -47: no more memory: 10 code units are needed
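The arithmetic behind the "10 code units are needed" message can be reproduced with a short sketch. It uses Python's re.sub in place of pcre2_substitute(), counts characters as code units, and the helper name is invented for the illustration.

```python
import re

def substitute_with_buffer(pattern, replacement, subject, buffer_size):
    """Return (result, needed); result is None when the buffer is too small."""
    result = re.sub(pattern, replacement, subject, count=1)  # non-global
    needed = len(result) + 1          # +1 for the trailing zero
    if needed > buffer_size:
        return None, needed           # caller can retry with `needed`
    return result, needed

print(substitute_with_buffer("abc", "XYZ", "123abc123", 9))
print(substitute_with_buffer("abc", "XYZ", "123abc123", 10))
```

The result "123XYZ123" is nine characters long, so a buffer of nine is one short once the trailing zero is counted.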
A replacement string is ignored with POSIX and DFA matching. Specifying A replacement string is ignored with POSIX and DFA matching. Specifying
partial matching provokes an error return ("bad option value") from partial matching provokes an error return ("bad option value") from
pcre2_substitute(). pcre2_substitute().
Setting the JIT stack size Setting the JIT stack size
The jitstack modifier provides a way of setting the maximum stack size The jitstack modifier provides a way of setting the maximum stack size
that is used by the just-in-time optimization code. It is ignored if that is used by the just-in-time optimization code. It is ignored if
JIT optimization is not being used. The value is a number of kilobytes. JIT optimization is not being used. The value is a number of kilobytes.
Providing a stack that is larger than the default 32K is necessary only Providing a stack that is larger than the default 32K is necessary only
for very complicated patterns. for very complicated patterns.
Setting heap, match, and depth limits Setting heap, match, and depth limits
The heap_limit, match_limit, and depth_limit modifiers set the appro- The heap_limit, match_limit, and depth_limit modifiers set the appro-
priate limits in the match context. These values are ignored when the priate limits in the match context. These values are ignored when the
find_limits modifier is specified. find_limits modifier is specified.
Finding minimum limits Finding minimum limits
If the find_limits modifier is present on a subject line, pcre2test If the find_limits modifier is present on a subject line, pcre2test
calls the relevant matching function several times, setting different calls the relevant matching function several times, setting different
values in the match context via pcre2_set_heap_limit(), values in the match context via pcre2_set_heap_limit(),
pcre2_set_match_limit(), or pcre2_set_depth_limit() until it finds the pcre2_set_match_limit(), or pcre2_set_depth_limit() until it finds the
minimum values for each parameter that allows the match to complete minimum values for each parameter that allows the match to complete
without error. without error.
If JIT is being used, only the match limit is relevant. If DFA matching If JIT is being used, only the match limit is relevant. If DFA matching
is being used, only the depth limit is relevant. is being used, only the depth limit is relevant.
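The text does not say how the repeated calls choose their values. One plausible strategy, shown here purely as an assumption, is to double a trial limit until the match succeeds and then bisect. The callable succeeds(limit) stands in for "run the match with this limit and report whether it completed"; it is assumed to be monotonic and to fail for a limit of zero.

```python
def find_min_limit(succeeds, start=64):
    """Smallest limit for which succeeds(limit) is true."""
    lo, hi = 0, start            # lo: assumed failing, hi: trial value
    while not succeeds(hi):      # grow until the match completes
        lo = hi
        hi *= 2
    while hi - lo > 1:           # bisect between failing lo and working hi
        mid = (lo + hi) // 2
        if succeeds(mid):
            hi = mid
        else:
            lo = mid
    return hi

# A fake match that needs a limit of at least 100.
print(find_min_limit(lambda limit: limit >= 100))
```

Doubling followed by bisection needs only a logarithmic number of match attempts, which matters when each attempt is an expensive backtracking run.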
The match_limit number is a measure of the amount of backtracking that The match_limit number is a measure of the amount of backtracking that
takes place, and learning the minimum value can be instructive. For takes place, and learning the minimum value can be instructive. For
most simple matches, the number is quite small, but for patterns with most simple matches, the number is quite small, but for patterns with
very large numbers of matching possibilities, it can become large very very large numbers of matching possibilities, it can become large very
quickly with increasing length of subject string. quickly with increasing length of subject string.
For non-DFA matching, the minimum depth_limit number is a measure of For non-DFA matching, the minimum depth_limit number is a measure of
how much nested backtracking happens (that is, how deeply the pattern's how much nested backtracking happens (that is, how deeply the pattern's
tree is searched). In the case of DFA matching, depth_limit controls tree is searched). In the case of DFA matching, depth_limit controls
the depth of recursive calls of the internal function that is used for the depth of recursive calls of the internal function that is used for
handling pattern recursion, lookaround assertions, and atomic groups. handling pattern recursion, lookaround assertions, and atomic groups.
Showing MARK names Showing MARK names
The mark modifier causes the names from backtracking control verbs that The mark modifier causes the names from backtracking control verbs that
are returned from calls to pcre2_match() to be displayed. If a mark is are returned from calls to pcre2_match() to be displayed. If a mark is
returned for a match, non-match, or partial match, pcre2test shows it. returned for a match, non-match, or partial match, pcre2test shows it.
For a match, it is on a line by itself, tagged with "MK:". Otherwise, For a match, it is on a line by itself, tagged with "MK:". Otherwise,
it is added to the non-match message. it is added to the non-match message.
Showing memory usage Showing memory usage
The memory modifier causes pcre2test to log the sizes of all heap mem- The memory modifier causes pcre2test to log the sizes of all heap mem-
ory allocation and freeing calls that occur during a call to ory allocation and freeing calls that occur during a call to
pcre2_match(). These occur only when a match requires a bigger vector pcre2_match(). These occur only when a match requires a bigger vector
than the default for remembering backtracking points. In many cases than the default for remembering backtracking points. In many cases
there will be no heap memory used and therefore no additional output. there will be no heap memory used and therefore no additional output.
No heap memory is allocated during matching with pcre2_dfa_match() or No heap memory is allocated during matching with pcre2_dfa_match() or
with JIT, so in those cases the memory modifier never has any effect. with JIT, so in those cases the memory modifier never has any effect.
For this modifier to work, the null_context modifier must not be set on For this modifier to work, the null_context modifier must not be set on
both the pattern and the subject, though it can be set on one or the both the pattern and the subject, though it can be set on one or the
other. other.
Setting a starting offset Setting a starting offset
The offset modifier sets an offset in the subject string at which The offset modifier sets an offset in the subject string at which
matching starts. Its value is a number of code units, not characters. matching starts. Its value is a number of code units, not characters.
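Because the offset counts code units, a character position has to be converted when the subject is UTF-encoded. The following sketch shows the conversion for the three code unit widths; the helper name is hypothetical and it assumes the subject is held as a Python string.

```python
def code_unit_offset(subject, char_index, width):
    """Convert a character index into a code unit offset (UTF mode)."""
    prefix = subject[:char_index]
    if width == 8:
        return len(prefix.encode("utf-8"))            # one unit per byte
    if width == 16:
        return len(prefix.encode("utf-16-le")) // 2   # two bytes per unit
    if width == 32:
        return char_index                             # one unit per character
    raise ValueError("width must be 8, 16, or 32")

print(code_unit_offset("h\u00e9llo", 2, 8))
print(code_unit_offset("h\u00e9llo", 2, 16))
```

For plain ASCII subjects the two counts coincide, so the distinction only shows up with multi-unit characters.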
Setting an offset limit Setting an offset limit
The offset_limit modifier sets a limit for unanchored matches. If a The offset_limit modifier sets a limit for unanchored matches. If a
match cannot be found starting at or before this offset in the subject, match cannot be found starting at or before this offset in the subject,
a "no match" return is given. The data value is a number of code units, a "no match" return is given. The data value is a number of code units,
not characters. When this modifier is used, the use_offset_limit modi- not characters. When this modifier is used, the use_offset_limit modi-
fier must have been set for the pattern; if not, an error is generated. fier must have been set for the pattern; if not, an error is generated.
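The effect of an offset limit can be approximated by checking where the leftmost match starts. This sketch uses Python's re module and filters after the fact, whereas the library stops searching once the limit is passed; the outcome is the same for this illustration. The helper name is invented.

```python
import re

def search_with_offset_limit(pattern, subject, offset_limit):
    """Return the leftmost match only if it starts at or before the limit."""
    m = re.search(pattern, subject)   # leftmost match, if any
    if m is None or m.start() > offset_limit:
        return None                   # reported as "no match"
    return m

print(search_with_offset_limit("abc", "xxabc", 2))
print(search_with_offset_limit("abc", "xxabc", 1))
```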
Setting the size of the output vector Setting the size of the output vector
The ovector modifier applies only to the subject line in which it The ovector modifier applies only to the subject line in which it
appears, though of course it can also be used to set a default in a appears, though of course it can also be used to set a default in a
#subject command. It specifies the number of pairs of offsets that are #subject command. It specifies the number of pairs of offsets that are
available for storing matching information. The default is 15. available for storing matching information. The default is 15.
A value of zero is useful when testing the POSIX API because it causes A value of zero is useful when testing the POSIX API because it causes
regexec() to be called with a NULL capture vector. When not testing the regexec() to be called with a NULL capture vector. When not testing the
POSIX API, a value of zero is used to cause pcre2_match_data_cre- POSIX API, a value of zero is used to cause pcre2_match_data_cre-
ate_from_pattern() to be called, in order to create a match block of ate_from_pattern() to be called, in order to create a match block of
exactly the right size for the pattern. (It is not possible to create a exactly the right size for the pattern. (It is not possible to create a
match block with a zero-length ovector; there is always at least one match block with a zero-length ovector; there is always at least one
pair of offsets.) pair of offsets.)
Passing the subject as zero-terminated Passing the subject as zero-terminated
By default, the subject string is passed to a native API matching func- By default, the subject string is passed to a native API matching func-
tion with its correct length. In order to test the facility for passing tion with its correct length. In order to test the facility for passing
a zero-terminated string, the zero_terminate modifier is provided. It a zero-terminated string, the zero_terminate modifier is provided. It
causes the length to be passed as PCRE2_ZERO_TERMINATED. (When matching causes the length to be passed as PCRE2_ZERO_TERMINATED. (When matching
via the POSIX interface, this modifier has no effect, as there is no via the POSIX interface, this modifier has no effect, as there is no
facility for passing a length.) facility for passing a length.)
When testing pcre2_substitute(), this modifier also has the effect of When testing pcre2_substitute(), this modifier also has the effect of
passing the replacement string as zero-terminated. passing the replacement string as zero-terminated.
Passing a NULL context Passing a NULL context
Normally, pcre2test passes a context block to pcre2_match(), Normally, pcre2test passes a context block to pcre2_match(),
pcre2_dfa_match() or pcre2_jit_match(). If the null_context modifier is pcre2_dfa_match() or pcre2_jit_match(). If the null_context modifier is
set, however, NULL is passed. This is for testing that the matching set, however, NULL is passed. This is for testing that the matching
functions behave correctly in this case (they use default values). This functions behave correctly in this case (they use default values). This
modifier cannot be used with the find_limits modifier or when testing modifier cannot be used with the find_limits modifier or when testing
the substitution function. the substitution function.
THE ALTERNATIVE MATCHING FUNCTION THE ALTERNATIVE MATCHING FUNCTION
By default, pcre2test uses the standard PCRE2 matching function, By default, pcre2test uses the standard PCRE2 matching function,
pcre2_match() to match each subject line. PCRE2 also supports an alter- pcre2_match() to match each subject line. PCRE2 also supports an alter-
native matching function, pcre2_dfa_match(), which operates in a dif- native matching function, pcre2_dfa_match(), which operates in a dif-
ferent way, and has some restrictions. The differences between the two ferent way, and has some restrictions. The differences between the two
functions are described in the pcre2matching documentation. functions are described in the pcre2matching documentation.
If the dfa modifier is set, the alternative matching function is used. If the dfa modifier is set, the alternative matching function is used.
This function finds all possible matches at a given point in the sub- This function finds all possible matches at a given point in the sub-
ject. If, however, the dfa_shortest modifier is set, processing stops ject. If, however, the dfa_shortest modifier is set, processing stops
after the first match is found. This is always the shortest possible after the first match is found. This is always the shortest possible
match. match.
DEFAULT OUTPUT FROM pcre2test DEFAULT OUTPUT FROM pcre2test
This section describes the output when the normal matching function, This section describes the output when the normal matching function,
pcre2_match(), is being used. pcre2_match(), is being used.
When a match succeeds, pcre2test outputs the list of captured sub- When a match succeeds, pcre2test outputs the list of captured sub-
strings, starting with number 0 for the string that matched the whole strings, starting with number 0 for the string that matched the whole
pattern. Otherwise, it outputs "No match" when the return is pattern. Otherwise, it outputs "No match" when the return is
PCRE2_ERROR_NOMATCH, or "Partial match:" followed by the partially PCRE2_ERROR_NOMATCH, or "Partial match:" followed by the partially
matching substring when the return is PCRE2_ERROR_PARTIAL. (Note that matching substring when the return is PCRE2_ERROR_PARTIAL. (Note that
this is the entire substring that was inspected during the partial this is the entire substring that was inspected during the partial
match; it may include characters before the actual match start if a match; it may include characters before the actual match start if a
lookbehind assertion, \K, \b, or \B was involved.) lookbehind assertion, \K, \b, or \B was involved.)
For any other return, pcre2test outputs the PCRE2 negative error number For any other return, pcre2test outputs the PCRE2 negative error number
and a short descriptive phrase. If the error is a failed UTF string and a short descriptive phrase. If the error is a failed UTF string
check, the code unit offset of the start of the failing character is check, the code unit offset of the start of the failing character is
also output. Here is an example of an interactive pcre2test run. also output. Here is an example of an interactive pcre2test run.
$ pcre2test $ pcre2test
@ -1363,8 +1374,8 @@ DEFAULT OUTPUT FROM pcre2test
Unset capturing substrings that are not followed by one that is set are Unset capturing substrings that are not followed by one that is set are
not shown by pcre2test unless the allcaptures modifier is specified. In not shown by pcre2test unless the allcaptures modifier is specified. In
the following example, there are two capturing substrings, but when the the following example, there are two capturing substrings, but when the
first data line is matched, the second, unset substring is not shown. first data line is matched, the second, unset substring is not shown.
An "internal" unset substring is shown as "<unset>", as for the second An "internal" unset substring is shown as "<unset>", as for the second
data line. data line.
re> /(a)|(b)/ re> /(a)|(b)/
@ -1376,11 +1387,11 @@ DEFAULT OUTPUT FROM pcre2test
1: <unset> 1: <unset>
2: b 2: b
If the strings contain any non-printing characters, they are output as If the strings contain any non-printing characters, they are output as
\xhh escapes if the value is less than 256 and UTF mode is not set. \xhh escapes if the value is less than 256 and UTF mode is not set.
Otherwise they are output as \x{hh...} escapes. See below for the defi- Otherwise they are output as \x{hh...} escapes. See below for the defi-
nition of non-printing characters. If the aftertext modifier is set, nition of non-printing characters. If the aftertext modifier is set,
the output for substring 0 is followed by the rest of the subject the output for substring 0 is followed by the rest of the subject
string, identified by "0+" like this: string, identified by "0+" like this:
re> /cat/aftertext re> /cat/aftertext
@ -1388,7 +1399,7 @@ DEFAULT OUTPUT FROM pcre2test
0: cat 0: cat
0+ aract 0+ aract
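The escaping rule for non-printing characters stated above can be sketched as follows. The test for "printing" used here (ASCII 32 to 126) is an assumption made for the illustration, and the function name is hypothetical.

```python
def show_escaped(code_points, utf=False):
    """Render a list of character values the way the rule above describes."""
    out = []
    for c in code_points:
        if 32 <= c < 127:                 # assumed definition of "printing"
            out.append(chr(c))
        elif c < 256 and not utf:
            out.append("\\x%02x" % c)     # short form: \xhh
        else:
            out.append("\\x{%02x}" % c)   # braced form: \x{hh...}
    return "".join(out)

print(show_escaped([0x61, 0x07]))
print(show_escaped([0x61, 0x100]))
```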
If global matching is requested, the results of successive matching If global matching is requested, the results of successive matching
attempts are output in sequence, like this: attempts are output in sequence, like this:
re> /\Bi(\w\w)/g re> /\Bi(\w\w)/g
@ -1400,8 +1411,8 @@ DEFAULT OUTPUT FROM pcre2test
0: ipp 0: ipp
1: pp 1: pp
"No match" is output only if the first match attempt fails. Here is an "No match" is output only if the first match attempt fails. Here is an
example of a failure message (the offset 4 that is specified by the example of a failure message (the offset 4 that is specified by the
offset modifier is past the end of the subject string): offset modifier is past the end of the subject string):
re> /xyz/ re> /xyz/
@ -1409,7 +1420,7 @@ DEFAULT OUTPUT FROM pcre2test
Error -24 (bad offset value) Error -24 (bad offset value)
Note that whereas patterns can be continued over several lines (a plain Note that whereas patterns can be continued over several lines (a plain
">" prompt is used for continuations), subject lines may not. However, ">" prompt is used for continuations), subject lines may not. However,
newlines can be included in a subject by means of the \n escape (or \r, newlines can be included in a subject by means of the \n escape (or \r,
\r\n, etc., depending on the newline sequence setting). \r\n, etc., depending on the newline sequence setting).
@ -1417,7 +1428,7 @@ DEFAULT OUTPUT FROM pcre2test
OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
When the alternative matching function, pcre2_dfa_match(), is used, the When the alternative matching function, pcre2_dfa_match(), is used, the
output consists of a list of all the matches that start at the first output consists of a list of all the matches that start at the first
point in the subject where there is at least one match. For example: point in the subject where there is at least one match. For example:
re> /(tang|tangerine|tan)/ re> /(tang|tangerine|tan)/
@ -1426,11 +1437,11 @@ OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
1: tang 1: tang
2: tan 2: tan
Using the normal matching function on this data finds only "tang". The Using the normal matching function on this data finds only "tang". The
longest matching string is always given first (and numbered zero). longest matching string is always given first (and numbered zero).
After a PCRE2_ERROR_PARTIAL return, the output is "Partial match:", After a PCRE2_ERROR_PARTIAL return, the output is "Partial match:",
followed by the partially matching substring. Note that this is the followed by the partially matching substring. Note that this is the
entire substring that was inspected during the partial match; it may entire substring that was inspected during the partial match; it may
include characters before the actual match start if a lookbehind asser- include characters before the actual match start if a lookbehind asser-
tion, \b, or \B was involved. (\K is not supported for DFA matching.) tion, \b, or \B was involved. (\K is not supported for DFA matching.)
@ -1446,16 +1457,16 @@ OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
1: tan 1: tan
0: tan 0: tan
The alternative matching function does not support substring capture, The alternative matching function does not support substring capture,
so the modifiers that are concerned with captured substrings are not so the modifiers that are concerned with captured substrings are not
relevant. relevant.
RESTARTING AFTER A PARTIAL MATCH RESTARTING AFTER A PARTIAL MATCH
When the alternative matching function has given the PCRE2_ERROR_PAR- When the alternative matching function has given the PCRE2_ERROR_PAR-
TIAL return, indicating that the subject partially matched the pattern, TIAL return, indicating that the subject partially matched the pattern,
you can restart the match with additional subject data by means of the you can restart the match with additional subject data by means of the
dfa_restart modifier. For example: dfa_restart modifier. For example:
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/ re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
@ -1464,45 +1475,45 @@ RESTARTING AFTER A PARTIAL MATCH
data> n05\=dfa,dfa_restart data> n05\=dfa,dfa_restart
0: n05 0: n05
For further information about partial matching, see the pcre2partial For further information about partial matching, see the pcre2partial
documentation. documentation.
CALLOUTS CALLOUTS
If the pattern contains any callout requests, pcre2test's callout func- If the pattern contains any callout requests, pcre2test's callout func-
tion is called during matching unless callout_none is specified. This tion is called during matching unless callout_none is specified. This
works with both matching functions. works with both matching functions.
The callout function in pcre2test returns zero (carry on matching) by The callout function in pcre2test returns zero (carry on matching) by
default, but you can use a callout_fail modifier in a subject line (as default, but you can use a callout_fail modifier in a subject line (as
described above) to change this and other parameters of the callout. described above) to change this and other parameters of the callout.
Inserting callouts can be helpful when using pcre2test to check compli- Inserting callouts can be helpful when using pcre2test to check compli-
cated regular expressions. For further information about callouts, see cated regular expressions. For further information about callouts, see
the pcre2callout documentation. the pcre2callout documentation.
The output for callouts with numerical arguments and those with string The output for callouts with numerical arguments and those with string
arguments is slightly different. arguments is slightly different.
Callouts with numerical arguments Callouts with numerical arguments
By default, the callout function displays the callout number, the start By default, the callout function displays the callout number, the start
and current positions in the subject text at the callout time, and the and current positions in the subject text at the callout time, and the
next pattern item to be tested. For example: next pattern item to be tested. For example:
--->pqrabcdef --->pqrabcdef
0 ^ ^ \d 0 ^ ^ \d
This output indicates that callout number 0 occurred for a match This output indicates that callout number 0 occurred for a match
attempt starting at the fourth character of the subject string, when attempt starting at the fourth character of the subject string, when
the pointer was at the seventh character, and when the next pattern the pointer was at the seventh character, and when the next pattern
item was \d. Just one circumflex is output if the start and current item was \d. Just one circumflex is output if the start and current
positions are the same, or if the current position precedes the start positions are the same, or if the current position precedes the start
position, which can happen if the callout is in a lookbehind assertion. position, which can happen if the callout is in a lookbehind assertion.
Callouts numbered 255 are assumed to be automatic callouts, inserted as Callouts numbered 255 are assumed to be automatic callouts, inserted as
a result of the /auto_callout pattern modifier. In this case, instead a result of the /auto_callout pattern modifier. In this case, instead
of showing the callout number, the offset in the pattern, preceded by a of showing the callout number, the offset in the pattern, preceded by a
plus, is output. For example: plus, is output. For example:
@ -1516,7 +1527,7 @@ CALLOUTS
0: E* 0: E*
If a pattern contains (*MARK) items, an additional line is output when- If a pattern contains (*MARK) items, an additional line is output when-
ever a change of latest mark is passed to the callout function. For ever a change of latest mark is passed to the callout function. For
example: example:
re> /a(*MARK:X)bc/auto_callout re> /a(*MARK:X)bc/auto_callout
@ -1530,17 +1541,17 @@ CALLOUTS
+12 ^ ^ +12 ^ ^
0: abc 0: abc
The mark changes between matching "a" and "b", but stays the same for The mark changes between matching "a" and "b", but stays the same for
the rest of the match, so nothing more is output. If, as a result of the rest of the match, so nothing more is output. If, as a result of
backtracking, the mark reverts to being unset, the text "<unset>" is backtracking, the mark reverts to being unset, the text "<unset>" is
output. output.
Callouts with string arguments Callouts with string arguments
The output for a callout with a string argument is similar, except that The output for a callout with a string argument is similar, except that
instead of outputting a callout number before the position indicators, instead of outputting a callout number before the position indicators,
the callout string and its offset in the pattern string are output the callout string and its offset in the pattern string are output
before the reflection of the subject string, and the subject string is before the reflection of the subject string, and the subject string is
reflected for each callout. For example: reflected for each callout. For example:
re> /^ab(?C'first')cd(?C"second")ef/ re> /^ab(?C'first')cd(?C"second")ef/
@ -1557,43 +1568,43 @@ CALLOUTS
NON-PRINTING CHARACTERS NON-PRINTING CHARACTERS
When pcre2test is outputting text in the compiled version of a pattern, When pcre2test is outputting text in the compiled version of a pattern,
bytes other than 32-126 are always treated as non-printing characters bytes other than 32-126 are always treated as non-printing characters
and are therefore shown as hex escapes. and are therefore shown as hex escapes.
When pcre2test is outputting text that is a matched part of a subject When pcre2test is outputting text that is a matched part of a subject
string, it behaves in the same way, unless a different locale has been string, it behaves in the same way, unless a different locale has been
set for the pattern (using the locale modifier). In this case, the set for the pattern (using the locale modifier). In this case, the
isprint() function is used to distinguish printing and non-printing isprint() function is used to distinguish printing and non-printing
characters. characters.
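The default rule described above (bytes outside 32-126 are shown as hex escapes) can be sketched in C as follows. This is only an illustration of the rule for the 8-bit, non-UTF, default-locale case. The function name `escape_bytes` and its buffer-handling conventions are assumptions made for this sketch and are not taken from pcre2test itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy len bytes from src into dst, writing bytes in the
   range 32-126 as themselves and every other byte as a \xhh escape. Returns
   the number of characters written (excluding the terminating zero), or
   (size_t)-1 if dst is too small. */
static size_t escape_bytes(const unsigned char *src, size_t len,
                           char *dst, size_t dstsize)
{
size_t out = 0;
assert(src != NULL && dst != NULL);
if (dstsize == 0) return (size_t)-1;

for (size_t i = 0; i < len; i++)
  {
  unsigned char c = src[i];
  if (c >= 32 && c <= 126)
    {
    /* Printing character: needs one byte plus room for the final zero. */
    if (out + 1 >= dstsize) return (size_t)-1;
    dst[out++] = (char)c;
    }
  else
    {
    /* Non-printing byte: needs four bytes plus room for the final zero. */
    if (out + 4 >= dstsize) return (size_t)-1;
    snprintf(dst + out, 5, "\\x%02x", c);
    out += 4;
    }
  }

dst[out] = '\0';
return out;
}
```

For instance, passing the three bytes `a`, binary zero, `b` fills the buffer with `a\x00b`, which is the form that appears in the POSIX test output in this commit. A locale-aware variant would replace the range test with a call to `isprint()`.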
SAVING AND RESTORING COMPILED PATTERNS SAVING AND RESTORING COMPILED PATTERNS
It is possible to save compiled patterns on disc or elsewhere, and It is possible to save compiled patterns on disc or elsewhere, and
reload them later, subject to a number of restrictions. JIT data cannot reload them later, subject to a number of restrictions. JIT data cannot
be saved. The host on which the patterns are reloaded must be running be saved. The host on which the patterns are reloaded must be running
the same version of PCRE2, with the same code unit width, and must also the same version of PCRE2, with the same code unit width, and must also
have the same endianness, pointer width and PCRE2_SIZE type. Before have the same endianness, pointer width and PCRE2_SIZE type. Before
compiled patterns can be saved they must be serialized, that is, con- compiled patterns can be saved they must be serialized, that is, con-
verted to a stream of bytes. A single byte stream may contain any num- verted to a stream of bytes. A single byte stream may contain any num-
ber of compiled patterns, but they must all use the same character ber of compiled patterns, but they must all use the same character
tables. A single copy of the tables is included in the byte stream (its tables. A single copy of the tables is included in the byte stream (its
size is 1088 bytes). size is 1088 bytes).
The functions whose names begin with pcre2_serialize_ are used for The functions whose names begin with pcre2_serialize_ are used for
serializing and de-serializing. They are described in the pcre2serial- serializing and de-serializing. They are described in the pcre2serial-
ize documentation. In this section we describe the features of ize documentation. In this section we describe the features of
pcre2test that can be used to test these functions. pcre2test that can be used to test these functions.
When a pattern with push modifier is successfully compiled, it is When a pattern with push modifier is successfully compiled, it is
pushed onto a stack of compiled patterns, and pcre2test expects the pushed onto a stack of compiled patterns, and pcre2test expects the
next line to contain a new pattern (or command) instead of a subject next line to contain a new pattern (or command) instead of a subject
line. By contrast, the pushcopy modifier causes a copy of the compiled line. By contrast, the pushcopy modifier causes a copy of the compiled
pattern to be stacked, leaving the original available for immediate pattern to be stacked, leaving the original available for immediate
matching. By using push and/or pushcopy, a number of patterns can be matching. By using push and/or pushcopy, a number of patterns can be
compiled and retained. These modifiers are incompatible with posix, and compiled and retained. These modifiers are incompatible with posix, and
control modifiers that act at match time are ignored (with a message) control modifiers that act at match time are ignored (with a message)
for the stacked patterns. The jitverify modifier applies only at com- for the stacked patterns. The jitverify modifier applies only at com-
pile time. pile time.
The command The command
@ -1601,21 +1612,21 @@ SAVING AND RESTORING COMPILED PATTERNS
#save <filename> #save <filename>
causes all the stacked patterns to be serialized and the result written causes all the stacked patterns to be serialized and the result written
to the named file. Afterwards, all the stacked patterns are freed. The to the named file. Afterwards, all the stacked patterns are freed. The
command command
#load <filename> #load <filename>
reads the data in the file, and then arranges for it to be de-serial- reads the data in the file, and then arranges for it to be de-serial-
ized, with the resulting compiled patterns added to the pattern stack. ized, with the resulting compiled patterns added to the pattern stack.
The pattern on the top of the stack can be retrieved by the #pop com- The pattern on the top of the stack can be retrieved by the #pop com-
mand, which must be followed by lines of subjects that are to be mand, which must be followed by lines of subjects that are to be
matched with the pattern, terminated as usual by an empty line or end matched with the pattern, terminated as usual by an empty line or end
of file. This command may be followed by a modifier list containing of file. This command may be followed by a modifier list containing
only control modifiers that act after a pattern has been compiled. In only control modifiers that act after a pattern has been compiled. In
particular, hex, posix, posix_nosub, push, and pushcopy are not particular, hex, posix, posix_nosub, push, and pushcopy are not
allowed, nor are any option-setting modifiers. The JIT modifiers are, allowed, nor are any option-setting modifiers. The JIT modifiers are,
however, permitted. Here is an example that saves and reloads two however, permitted. Here is an example that saves and reloads two
patterns. patterns.
/abc/push /abc/push
@ -1628,10 +1639,10 @@ SAVING AND RESTORING COMPILED PATTERNS
#pop jit,bincode #pop jit,bincode
abc abc
If jitverify is used with #pop, it does not automatically imply jit, If jitverify is used with #pop, it does not automatically imply jit,
which is different behaviour from when it is used on a pattern. which is different behaviour from when it is used on a pattern.
The #popcopy command is analogous to the pushcopy modifier in that it The #popcopy command is analogous to the pushcopy modifier in that it
makes current a copy of the topmost stack pattern, leaving the original makes current a copy of the topmost stack pattern, leaving the original
still on the stack. still on the stack.
@ -1651,5 +1662,5 @@ AUTHOR
REVISION REVISION
Last updated: 01 June 2017 Last updated: 03 June 2017
Copyright (c) 1997-2017 University of Cambridge. Copyright (c) 1997-2017 University of Cambridge.


@ -231,10 +231,14 @@ PCRE2POSIX_EXP_DEFN int PCRE2_CALL_CONVENTION
regcomp(regex_t *preg, const char *pattern, int cflags) regcomp(regex_t *preg, const char *pattern, int cflags)
{ {
PCRE2_SIZE erroffset; PCRE2_SIZE erroffset;
PCRE2_SIZE patlen;
int errorcode; int errorcode;
int options = 0; int options = 0;
int re_nsub = 0; int re_nsub = 0;
patlen = ((cflags & REG_PEND) != 0)? (PCRE2_SIZE)(preg->re_endp - pattern) :
PCRE2_ZERO_TERMINATED;
if ((cflags & REG_ICASE) != 0) options |= PCRE2_CASELESS; if ((cflags & REG_ICASE) != 0) options |= PCRE2_CASELESS;
if ((cflags & REG_NEWLINE) != 0) options |= PCRE2_MULTILINE; if ((cflags & REG_NEWLINE) != 0) options |= PCRE2_MULTILINE;
if ((cflags & REG_DOTALL) != 0) options |= PCRE2_DOTALL; if ((cflags & REG_DOTALL) != 0) options |= PCRE2_DOTALL;
@ -243,8 +247,8 @@ if ((cflags & REG_UCP) != 0) options |= PCRE2_UCP;
if ((cflags & REG_UNGREEDY) != 0) options |= PCRE2_UNGREEDY; if ((cflags & REG_UNGREEDY) != 0) options |= PCRE2_UNGREEDY;
preg->re_cflags = cflags; preg->re_cflags = cflags;
preg->re_pcre2_code = pcre2_compile((PCRE2_SPTR)pattern, PCRE2_ZERO_TERMINATED, preg->re_pcre2_code = pcre2_compile((PCRE2_SPTR)pattern, patlen, options,
options, &errorcode, &erroffset, NULL); &errorcode, &erroffset, NULL);
preg->re_erroffset = erroffset; preg->re_erroffset = erroffset;
if (preg->re_pcre2_code == NULL) if (preg->re_pcre2_code == NULL)


@ -62,6 +62,7 @@ extern "C" {
#define REG_NOTEMPTY 0x0100 /* NOT defined by POSIX; maps to PCRE2_NOTEMPTY */ #define REG_NOTEMPTY 0x0100 /* NOT defined by POSIX; maps to PCRE2_NOTEMPTY */
#define REG_UNGREEDY 0x0200 /* NOT defined by POSIX; maps to PCRE2_UNGREEDY */ #define REG_UNGREEDY 0x0200 /* NOT defined by POSIX; maps to PCRE2_UNGREEDY */
#define REG_UCP 0x0400 /* NOT defined by POSIX; maps to PCRE2_UCP */ #define REG_UCP 0x0400 /* NOT defined by POSIX; maps to PCRE2_UCP */
#define REG_PEND 0x0800 /* GNU feature: pass end pattern by re_endp */
/* This is not used by PCRE2, but by defining it we make it easier /* This is not used by PCRE2, but by defining it we make it easier
to slot PCRE2 into existing programs that make POSIX calls. */ to slot PCRE2 into existing programs that make POSIX calls. */
@ -91,11 +92,13 @@ enum {
}; };
/* The structure representing a compiled regular expression. */ /* The structure representing a compiled regular expression. It is also used
for passing the pattern end pointer when REG_PEND is set. */
typedef struct { typedef struct {
void *re_pcre2_code; void *re_pcre2_code;
void *re_match_data; void *re_match_data;
const char *re_endp;
size_t re_nsub; size_t re_nsub;
size_t re_erroffset; size_t re_erroffset;
int re_cflags; int re_cflags;


@ -538,7 +538,7 @@ typedef struct datctl { /* Structure for data line modifiers. */
uint32_t control; /* Must be in same position as patctl */ uint32_t control; /* Must be in same position as patctl */
uint32_t control2; /* Must be in same position as patctl */ uint32_t control2; /* Must be in same position as patctl */
uint8_t replacement[REPLACE_MODSIZE]; /* So must this */ uint8_t replacement[REPLACE_MODSIZE]; /* So must this */
uint32_t startend[2]; uint32_t startend[2];
uint32_t cerror[2]; uint32_t cerror[2];
uint32_t cfail[2]; uint32_t cfail[2];
int32_t callout_data; int32_t callout_data;
@ -699,7 +699,8 @@ static modstruct modlist[] = {
#define POSIX_SUPPORTED_COMPILE_EXTRA_OPTIONS (0) #define POSIX_SUPPORTED_COMPILE_EXTRA_OPTIONS (0)
#define POSIX_SUPPORTED_COMPILE_CONTROLS ( \ #define POSIX_SUPPORTED_COMPILE_CONTROLS ( \
CTL_AFTERTEXT|CTL_ALLAFTERTEXT|CTL_EXPAND|CTL_POSIX|CTL_POSIX_NOSUB) CTL_AFTERTEXT|CTL_ALLAFTERTEXT|CTL_EXPAND|CTL_HEXPAT|CTL_POSIX| \
CTL_POSIX_NOSUB|CTL_USE_LENGTH)
#define POSIX_SUPPORTED_COMPILE_CONTROLS2 (0) #define POSIX_SUPPORTED_COMPILE_CONTROLS2 (0)
@ -733,11 +734,9 @@ the first control word. Note that CTL_POSIX_NOSUB is always accompanied by
CTL_POSIX, so it doesn't need its own entries. */ CTL_POSIX, so it doesn't need its own entries. */
static uint32_t exclusive_pat_controls[] = { static uint32_t exclusive_pat_controls[] = {
CTL_POSIX | CTL_HEXPAT,
CTL_POSIX | CTL_PUSH, CTL_POSIX | CTL_PUSH,
CTL_POSIX | CTL_PUSHCOPY, CTL_POSIX | CTL_PUSHCOPY,
CTL_POSIX | CTL_PUSHTABLESCOPY, CTL_POSIX | CTL_PUSHTABLESCOPY,
CTL_POSIX | CTL_USE_LENGTH,
CTL_PUSH | CTL_PUSHCOPY, CTL_PUSH | CTL_PUSHCOPY,
CTL_PUSH | CTL_PUSHTABLESCOPY, CTL_PUSH | CTL_PUSHTABLESCOPY,
CTL_PUSHCOPY | CTL_PUSHTABLESCOPY, CTL_PUSHCOPY | CTL_PUSHTABLESCOPY,
@ -896,7 +895,7 @@ static PCRE2_SIZE malloclistlength[MALLOCLISTSIZE];
static uint32_t malloclistptr = 0; static uint32_t malloclistptr = 0;
#ifdef SUPPORT_PCRE2_8 #ifdef SUPPORT_PCRE2_8
static regex_t preg = { NULL, NULL, 0, 0, 0 }; static regex_t preg = { NULL, NULL, 0, 0, 0, 0 };
#endif #endif
static int *dfa_workspace = NULL; static int *dfa_workspace = NULL;
@ -5264,6 +5263,12 @@ if ((pat_patctl.control & CTL_POSIX) != 0)
if ((pat_patctl.options & PCRE2_DOTALL) != 0) cflags |= REG_DOTALL; if ((pat_patctl.options & PCRE2_DOTALL) != 0) cflags |= REG_DOTALL;
if ((pat_patctl.options & PCRE2_UNGREEDY) != 0) cflags |= REG_UNGREEDY; if ((pat_patctl.options & PCRE2_UNGREEDY) != 0) cflags |= REG_UNGREEDY;
if ((pat_patctl.control & (CTL_HEXPAT|CTL_USE_LENGTH)) != 0)
{
preg.re_endp = (char *)pbuffer8 + patlen;
cflags |= REG_PEND;
}
rc = regcomp(&preg, (char *)pbuffer8, cflags); rc = regcomp(&preg, (char *)pbuffer8, cflags);
/* Compiling failed */ /* Compiling failed */
@ -6665,10 +6670,10 @@ if ((pat_patctl.control & CTL_POSIX) != 0)
if (dat_datctl.startend[0] != CFORE_UNSET) if (dat_datctl.startend[0] != CFORE_UNSET)
{ {
pmatch[0].rm_so = dat_datctl.startend[0]; pmatch[0].rm_so = dat_datctl.startend[0];
pmatch[0].rm_eo = (dat_datctl.startend[1] != 0)? pmatch[0].rm_eo = (dat_datctl.startend[1] != 0)?
dat_datctl.startend[1] : len; dat_datctl.startend[1] : len;
eflags |= REG_STARTEND; eflags |= REG_STARTEND;
} }
if ((dat_datctl.options & PCRE2_NOTBOL) != 0) eflags |= REG_NOTBOL; if ((dat_datctl.options & PCRE2_NOTBOL) != 0) eflags |= REG_NOTBOL;
if ((dat_datctl.options & PCRE2_NOTEOL) != 0) eflags |= REG_NOTEOL; if ((dat_datctl.options & PCRE2_NOTEOL) != 0) eflags |= REG_NOTEOL;


@ -123,4 +123,10 @@
/^a\x{00}b$/posix /^a\x{00}b$/posix
a\x{00}b\=posix_startend=0:3 a\x{00}b\=posix_startend=0:3
/"A" 00 "B"/hex
A\x{00}B\=posix_startend=0:3
/ABC/use_length
ABC
# End of testdata/testinput18 # End of testdata/testinput18


@ -15,4 +15,7 @@
/\w/ucp /\w/ucp
+++\x{c2} +++\x{c2}
/"^AB" 00 "\x{1234}$"/hex,utf
AB\x{00}\x{1234}\=posix_startend=0:6
# End of testdata/testinput19 # End of testdata/testinput19


@ -191,4 +191,12 @@ No match: POSIX code 17: match failed
a\x{00}b\=posix_startend=0:3 a\x{00}b\=posix_startend=0:3
0: a\x00b 0: a\x00b
/"A" 00 "B"/hex
A\x{00}B\=posix_startend=0:3
0: A\x00B
/ABC/use_length
ABC
0: ABC
# End of testdata/testinput18 # End of testdata/testinput18


@ -18,4 +18,8 @@ No match: POSIX code 17: match failed
+++\x{c2} +++\x{c2}
0: \xc2 0: \xc2
/"^AB" 00 "\x{1234}$"/hex,utf
AB\x{00}\x{1234}\=posix_startend=0:6
0: AB\x{00}\x{1234}
# End of testdata/testinput19 # End of testdata/testinput19