Implement REG_PEND (GNU extension) for the POSIX wrapper.

parent f850015168
commit bcba497c0b
@@ -182,6 +182,8 @@ deeply. (Compare item 10.23/36.) This should fix oss-fuzz #1761.
 38. Fix returned offsets from regexec() when REG_STARTEND is used with a
 starting offset greater than zero.
 
+39. Implement REG_PEND (GNU extension) for the POSIX wrapper.
+
 
 Version 10.23 14-February-2017
 ------------------------------
@@ -69,7 +69,7 @@ replacement library. Other POSIX options are not even defined.
 <P>
 There are also some options that are not defined by POSIX. These have been
 added at the request of users who want to make use of certain PCRE2-specific
-features via the POSIX calling interface.
+features via the POSIX calling interface or to add BSD or GNU functionality.
 </P>
 <P>
 When PCRE2 is called via these functions, it is only the API that is POSIX-like
@@ -91,10 +91,11 @@ identifying error codes.
 <br><a name="SEC3" href="#TOC1">COMPILING A PATTERN</a><br>
 <P>
 The function <b>regcomp()</b> is called to compile a pattern into an
-internal form. The pattern is a C string terminated by a binary zero, and
-is passed in the argument <i>pattern</i>. The <i>preg</i> argument is a pointer
-to a <b>regex_t</b> structure that is used as a base for storing information
-about the compiled regular expression.
+internal form. By default, the pattern is a C string terminated by a binary
+zero (but see REG_PEND below). The <i>preg</i> argument is a pointer to a
+<b>regex_t</b> structure that is used as a base for storing information about
+the compiled regular expression. (It is also used for input when REG_PEND is
+set.)
 </P>
 <P>
 The argument <i>cflags</i> is either zero, or contains one or more of the bits
@@ -124,6 +125,16 @@ matching, the <i>nmatch</i> and <i>pmatch</i> arguments are ignored, and no
 captured strings are returned. Versions of the PCRE library prior to 10.22 used
 to set the PCRE2_NO_AUTO_CAPTURE compile option, but this no longer happens
 because it disables the use of back references.
+<pre>
+REG_PEND
+</pre>
+If this option is set, the <b>re_endp</b> field in the <i>preg</i> structure
+(which has the type const char *) must be set to point to the character beyond
+the end of the pattern before calling <b>regcomp()</b>. The pattern itself may
+now contain binary zeroes, which are treated as data characters. Without
+REG_PEND, a binary zero terminates the pattern and the <b>re_endp</b> field is
+ignored. This is a GNU extension to the POSIX standard and should be used with
+caution in software intended to be portable to other systems.
 <pre>
 REG_UCP
 </pre>
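The REG_PEND hunk above describes passing the end of the pattern through re_endp. The following is a minimal sketch of a helper that compiles a pattern given as a pointer and a length. It is written against the POSIX <regex.h> interface so that it builds on its own. With PCRE2, the same calls come from pcre2posix.h, linked with libpcre2-posix. The helper name compile_counted is hypothetical. The fallback branch makes two assumptions: that the C library has no REG_PEND (glibc, for example), and that regcomp() keeps no pointer to the pattern text after it returns.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: compile a pattern supplied as (pointer, length)
   instead of as a NUL-terminated string. */
int compile_counted(regex_t *preg, const char *pat, size_t len, int cflags)
{
#ifdef REG_PEND
  /* REG_PEND path: re_endp points at the first character beyond the
     pattern, so binary zeroes inside it are treated as data. */
  preg->re_endp = pat + len;
  return regcomp(preg, pat, cflags | REG_PEND);
#else
  /* Fallback when the header has no REG_PEND: copy into a NUL-terminated
     buffer. A binary zero would silently end the pattern early, so such
     patterns are rejected instead of being compiled wrongly. */
  if (memchr(pat, '\0', len) != NULL)
    return REG_BADPAT;
  char *copy = malloc(len + 1);
  if (copy == NULL)
    return REG_ESPACE;
  memcpy(copy, pat, len);
  copy[len] = '\0';
  int rc = regcomp(preg, copy, cflags);
  free(copy);  /* assumes regcomp() does not keep a pointer to the text */
  return rc;
#endif
}
```

The fallback lets one call site build on systems with and without REG_PEND. Only patterns that contain a binary zero actually need REG_PEND.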
@@ -156,9 +167,10 @@ class such as [^a] (they are).
 </P>
 <P>
 The yield of <b>regcomp()</b> is zero on success, and non-zero otherwise. The
-<i>preg</i> structure is filled in on success, and one member of the structure
-is public: <i>re_nsub</i> contains the number of capturing subpatterns in
-the regular expression. Various error codes are defined in the header file.
+<i>preg</i> structure is filled in on success, and one other member of the
+structure (as well as <i>re_endp</i>) is public: <i>re_nsub</i> contains the
+number of capturing subpatterns in the regular expression. Various error codes
+are defined in the header file.
 </P>
 <P>
 NOTE: If the yield of <b>regcomp()</b> is non-zero, you must not attempt to
@@ -228,15 +240,26 @@ function.
 <pre>
 REG_STARTEND
 </pre>
-The string is considered to start at <i>string</i> + <i>pmatch[0].rm_so</i> and
-to have a terminating NUL located at <i>string</i> + <i>pmatch[0].rm_eo</i>
-(there need not actually be a NUL at that location), regardless of the value of
-<i>nmatch</i>. This is a BSD extension, compatible with but not specified by
-IEEE Standard 1003.2 (POSIX.2), and should be used with caution in software
-intended to be portable to other systems. Note that a non-zero <i>rm_so</i> does
-not imply REG_NOTBOL; REG_STARTEND affects only the location of the string, not
-how it is matched. Setting REG_STARTEND and passing <i>pmatch</i> as NULL are
-mutually exclusive; the error REG_INVARG is returned.
+When this option is set, the subject string starts at <i>string</i> +
+<i>pmatch[0].rm_so</i> and ends at <i>string</i> + <i>pmatch[0].rm_eo</i>, which
+should point to the first character beyond the string. There may be binary
+zeroes within the subject string, and indeed, using REG_STARTEND is the only
+way to pass a subject string that contains a binary zero.
+</P>
+<P>
+Whatever the value of <i>pmatch[0].rm_so</i>, the offsets of the matched string
+and any captured substrings are still given relative to the start of
+<i>string</i> itself. (Before PCRE2 release 10.30 these were given relative to
+<i>string</i> + <i>pmatch[0].rm_so</i>, but this differs from other
+implementations.)
+</P>
+<P>
+This is a BSD extension, compatible with but not specified by IEEE Standard
+1003.2 (POSIX.2), and should be used with caution in software intended to be
+portable to other systems. Note that a non-zero <i>rm_so</i> does not imply
+REG_NOTBOL; REG_STARTEND affects only the location and length of the string,
+not how it is matched. Setting REG_STARTEND and passing <i>pmatch</i> as NULL
+are mutually exclusive; the error REG_INVARG is returned.
 </P>
 <P>
 If the pattern was compiled with the REG_NOSUB flag, no data about any matched
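To illustrate REG_STARTEND, here is a sketch of a helper that searches only part of a buffer, which may contain binary zeroes. The name search_range is hypothetical. The sketch assumes two things: that the regex library defines REG_STARTEND, as glibc, the BSD libraries and the PCRE2 wrapper do, and that the returned offsets are relative to the start of the buffer. That is the behaviour described above for PCRE2 10.30 and later.

```c
#include <regex.h>

/* Hypothetical helper: search only buf[start..end) for a compiled pattern.
   On success, *so and *eo receive the match offsets, measured from buf
   itself (not from buf + start). Returns the regexec() yield. */
int search_range(const regex_t *preg, const char *buf,
                 regoff_t start, regoff_t end,
                 regoff_t *so, regoff_t *eo)
{
  regmatch_t m[1];
  m[0].rm_so = start;  /* subject starts at buf + start */
  m[0].rm_eo = end;    /* first character beyond the subject */
  /* pmatch must not be NULL when REG_STARTEND is set. */
  int rc = regexec(preg, buf, 1, m, REG_STARTEND);
  if (rc == 0) {
    *so = m[0].rm_so;
    *eo = m[0].rm_eo;
  }
  return rc;
}
```

Since the end is given by rm_eo rather than by a terminating zero, the same buffer can be searched in slices without copying it or writing NULs into it.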
@@ -291,9 +314,9 @@ Cambridge, England.
 </P>
 <br><a name="SEC9" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 31 January 2016
+Last updated: 05 June 2017
 <br>
-Copyright © 1997-2016 University of Cambridge.
+Copyright © 1997-2017 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
@@ -1078,6 +1078,19 @@ are <b>notbol</b>, <b>notempty</b>, and <b>noteol</b>, causing REG_NOTBOL,
 REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to <b>regexec()</b>.
 The other modifiers are ignored, with a warning message.
 </P>
+<P>
+There is one additional modifier that can be used with the POSIX wrapper. It is
+ignored (with a warning) if used for non-POSIX matching.
+<pre>
+posix_startend=<n>[:<m>]
+</pre>
+This causes the subject string to be passed to <b>regexec()</b> using the
+REG_STARTEND option, which uses offsets to restrict which part of the string is
+searched. If only one number is given, the end offset is passed as the end of
+the subject string. For more detail of REG_STARTEND, see the
+<a href="pcre2posix.html"><b>pcre2posix</b></a>
+documentation.
+</P>
 <br><b>
 Setting match controls
 </b><br>
@@ -1817,7 +1830,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC21" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 01 June 2017
+Last updated: 03 June 2017
 <br>
 Copyright © 1997-2017 University of Cambridge.
 <br>
147 doc/pcre2.txt
@@ -8986,32 +8986,34 @@ DESCRIPTION
 
 There are also some options that are not defined by POSIX. These have
 been added at the request of users who want to make use of certain
-PCRE2-specific features via the POSIX calling interface.
+PCRE2-specific features via the POSIX calling interface or to add BSD
+or GNU functionality.
 
 When PCRE2 is called via these functions, it is only the API that is
 POSIX-like in style. The syntax and semantics of the regular expres-
 sions themselves are still those of Perl, subject to the setting of
 various PCRE2 options, as described below. "POSIX-like in style" means
 that the API approximates to the POSIX definition; it is not fully
 POSIX-compatible, and in multi-unit encoding domains it is probably
 even less compatible.
 
 The header for these functions is supplied as pcre2posix.h to avoid any
 potential clash with other POSIX libraries. It can, of course, be
 renamed or aliased as regex.h, which is the "correct" name. It provides
 two structure types, regex_t for compiled internal forms, and reg-
 match_t for returning captured substrings. It also defines some con-
 stants whose names start with "REG_"; these are used for setting
 options and identifying error codes.
 
 
 COMPILING A PATTERN
 
-The function regcomp() is called to compile a pattern into an internal
-form. The pattern is a C string terminated by a binary zero, and is
-passed in the argument pattern. The preg argument is a pointer to a
-regex_t structure that is used as a base for storing information about
-the compiled regular expression.
+The function regcomp() is called to compile a pattern into an internal
+form. By default, the pattern is a C string terminated by a binary zero
+(but see REG_PEND below). The preg argument is a pointer to a regex_t
+structure that is used as a base for storing information about the com-
+piled regular expression. (It is also used for input when REG_PEND is
+set.)
 
 The argument cflags is either zero, or contains one or more of the bits
 defined by the following macros:
@@ -9042,38 +9044,50 @@ COMPILING A PATTERN
 used to set the PCRE2_NO_AUTO_CAPTURE compile option, but this no
 longer happens because it disables the use of back references.
 
+REG_PEND
+
+If this option is set, the re_endp field in the preg structure (which
+has the type const char *) must be set to point to the character beyond
+the end of the pattern before calling regcomp(). The pattern itself may
+now contain binary zeroes, which are treated as data characters. With-
+out REG_PEND, a binary zero terminates the pattern and the re_endp
+field is ignored. This is a GNU extension to the POSIX standard and
+should be used with caution in software intended to be portable to
+other systems.
+
 REG_UCP
 
 The PCRE2_UCP option is set when the regular expression is passed for
 compilation to the native function. This causes PCRE2 to use Unicode
 properties when matching \d, \w, etc., instead of just recognizing
 ASCII values. Note that REG_UCP is not part of the POSIX standard.
 
 REG_UNGREEDY
 
 The PCRE2_UNGREEDY option is set when the regular expression is passed
 for compilation to the native function. Note that REG_UNGREEDY is not
 part of the POSIX standard.
 
 REG_UTF
 
 The PCRE2_UTF option is set when the regular expression is passed for
 compilation to the native function. This causes the pattern itself and
 all data strings used for matching it to be treated as UTF-8 strings.
 Note that REG_UTF is not part of the POSIX standard.
 
 In the absence of these flags, no options are passed to the native
 function. This means the regex is compiled with PCRE2 default
 semantics. In particular, the way it handles newline characters in the
 subject string is the Perl way, not the POSIX way. Note that setting
 PCRE2_MULTILINE has only some of the effects specified for REG_NEWLINE.
 It does not affect the way newlines are matched by the dot metacharac-
 ter (they are not) or by a negative class such as [^a] (they are).
 
-The yield of regcomp() is zero on success, and non-zero otherwise. The
-preg structure is filled in on success, and one member of the structure
-is public: re_nsub contains the number of capturing subpatterns in the
-regular expression. Various error codes are defined in the header file.
+The yield of regcomp() is zero on success, and non-zero otherwise. The
+preg structure is filled in on success, and one other member of the
+structure (as well as re_endp) is public: re_nsub contains the number
+of capturing subpatterns in the regular expression. Various error codes
+are defined in the header file.
 
 NOTE: If the yield of regcomp() is non-zero, you must not attempt to
 use the contents of the preg structure. If, for example, you pass it to
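Because re_nsub is public after a successful regcomp(), a caller can size the pmatch array to hold the whole match plus every capturing group. This is a minimal sketch that assumes the POSIX <regex.h> interface (pcre2posix.h provides the same calls). match_all_groups is a hypothetical helper name.

```c
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: match subject and return a malloc'd array of
   re_nsub + 1 entries (element 0 is the whole match, then one entry per
   capturing subpattern). Returns NULL if there is no match or memory runs
   out. The caller frees the array. */
regmatch_t *match_all_groups(const regex_t *preg, const char *subject)
{
  size_t n = preg->re_nsub + 1;
  regmatch_t *m = malloc(n * sizeof *m);
  if (m != NULL && regexec(preg, subject, n, m, 0) != 0) {
    free(m);  /* no match: do not hand back stale offsets */
    m = NULL;
  }
  return m;
}
```

Groups that did not take part in the match come back with both members set to -1, so check rm_so before using an entry to slice the subject.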
@@ -9146,57 +9160,66 @@ MATCHING A PATTERN
 
 REG_STARTEND
 
-The string is considered to start at string + pmatch[0].rm_so and to
-have a terminating NUL located at string + pmatch[0].rm_eo (there need
-not actually be a NUL at that location), regardless of the value of
-nmatch. This is a BSD extension, compatible with but not specified by
-IEEE Standard 1003.2 (POSIX.2), and should be used with caution in
-software intended to be portable to other systems. Note that a non-zero
-rm_so does not imply REG_NOTBOL; REG_STARTEND affects only the location
-of the string, not how it is matched. Setting REG_STARTEND and passing
-pmatch as NULL are mutually exclusive; the error REG_INVARG is
+When this option is set, the subject string starts at string +
+pmatch[0].rm_so and ends at string + pmatch[0].rm_eo, which should
+point to the first character beyond the string. There may be binary
+zeroes within the subject string, and indeed, using REG_STARTEND is the
+only way to pass a subject string that contains a binary zero.
+
+Whatever the value of pmatch[0].rm_so, the offsets of the matched
+string and any captured substrings are still given relative to the
+start of string itself. (Before PCRE2 release 10.30 these were given
+relative to string + pmatch[0].rm_so, but this differs from other
+implementations.)
+
+This is a BSD extension, compatible with but not specified by IEEE
+Standard 1003.2 (POSIX.2), and should be used with caution in software
+intended to be portable to other systems. Note that a non-zero rm_so
+does not imply REG_NOTBOL; REG_STARTEND affects only the location and
+length of the string, not how it is matched. Setting REG_STARTEND and
+passing pmatch as NULL are mutually exclusive; the error REG_INVARG is
 returned.
 
 If the pattern was compiled with the REG_NOSUB flag, no data about any
 matched strings is returned. The nmatch and pmatch arguments of
 regexec() are ignored (except possibly as input for REG_STARTEND).
 
 The value of nmatch may be zero, and the value of pmatch may be NULL
 (unless REG_STARTEND is set); in both these cases no data about any
 matched strings is returned.
 
 Otherwise, the portion of the string that was matched, and also any
 captured substrings, are returned via the pmatch argument, which points
 to an array of nmatch structures of type regmatch_t, containing the
 members rm_so and rm_eo. These contain the byte offset to the first
 character of each substring and the offset to the first character after
 the end of each substring, respectively. The 0th element of the vector
 relates to the entire portion of string that was matched; subsequent
 elements relate to the capturing subpatterns of the regular expression.
 Unused entries in the array have both structure members set to -1.
 
 A successful match yields a zero return; various error codes are
 defined in the header file, of which REG_NOMATCH is the "expected"
 failure code.
 
 
 ERROR MESSAGES
 
 The regerror() function maps a non-zero errorcode from either regcomp()
 or regexec() to a printable message. If preg is not NULL, the error
 should have arisen from the use of that structure. A message terminated
 by a binary zero is placed in errbuf. If the buffer is too short, only
 the first errbuf_size - 1 characters of the error message are used. The
 yield of the function is the size of buffer needed to hold the whole
 message, including the terminating zero. This value is greater than
 errbuf_size if the message was truncated.
 
 
 MEMORY USAGE
 
 Compiling a regular expression causes memory to be allocated and asso-
 ciated with the preg structure. The function regfree() frees all such
 memory, after which preg may no longer be used as a compiled expres-
 sion.
 
 
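The regerror() yield described under ERROR MESSAGES supports a two-call pattern: first ask for the size of the whole message, then fetch it into a buffer of exactly that size. The sketch below assumes the POSIX rule that a zero errbuf_size makes regerror() ignore errbuf. error_message is a hypothetical helper name.

```c
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: return a malloc'd copy of the complete message for
   errcode, or NULL if memory runs out. Pass the same preg that produced
   the error (or NULL). The caller frees the result. */
char *error_message(int errcode, const regex_t *preg)
{
  /* First call: zero-sized buffer, so only the required size (including
     the terminating zero) is returned. */
  size_t need = regerror(errcode, preg, NULL, 0);
  char *buf = malloc(need);
  if (buf != NULL)
    regerror(errcode, preg, buf, need);  /* second call: never truncated */
  return buf;
}
```

Sizing from the first call avoids choosing a fixed buffer length and silently losing the end of a long message.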
@@ -9209,8 +9232,8 @@ AUTHOR
 
 REVISION
 
-Last updated: 31 January 2016
-Copyright (c) 1997-2016 University of Cambridge.
+Last updated: 05 June 2017
+Copyright (c) 1997-2017 University of Cambridge.
 ------------------------------------------------------------------------------
 
 
@@ -1,4 +1,4 @@
-.TH PCRE2POSIX 3 "03 June 2017" "PCRE2 10.30"
+.TH PCRE2POSIX 3 "05 June 2017" "PCRE2 10.30"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "SYNOPSIS"
@@ -46,7 +46,7 @@ replacement library. Other POSIX options are not even defined.
 .P
 There are also some options that are not defined by POSIX. These have been
 added at the request of users who want to make use of certain PCRE2-specific
-features via the POSIX calling interface.
+features via the POSIX calling interface or to add BSD or GNU functionality.
 .P
 When PCRE2 is called via these functions, it is only the API that is POSIX-like
 in style. The syntax and semantics of the regular expressions themselves are
@@ -68,10 +68,11 @@ identifying error codes.
 .rs
 .sp
 The function \fBregcomp()\fP is called to compile a pattern into an
-internal form. The pattern is a C string terminated by a binary zero, and
-is passed in the argument \fIpattern\fP. The \fIpreg\fP argument is a pointer
-to a \fBregex_t\fP structure that is used as a base for storing information
-about the compiled regular expression.
+internal form. By default, the pattern is a C string terminated by a binary
+zero (but see REG_PEND below). The \fIpreg\fP argument is a pointer to a
+\fBregex_t\fP structure that is used as a base for storing information about
+the compiled regular expression. (It is also used for input when REG_PEND is
+set.)
 .P
 The argument \fIcflags\fP is either zero, or contains one or more of the bits
 defined by the following macros:
@@ -100,6 +101,16 @@ matching, the \fInmatch\fP and \fIpmatch\fP arguments are ignored, and no
 captured strings are returned. Versions of the PCRE library prior to 10.22 used
 to set the PCRE2_NO_AUTO_CAPTURE compile option, but this no longer happens
 because it disables the use of back references.
+.sp
+REG_PEND
+.sp
+If this option is set, the \fBre_endp\fP field in the \fIpreg\fP structure
+(which has the type const char *) must be set to point to the character beyond
+the end of the pattern before calling \fBregcomp()\fP. The pattern itself may
+now contain binary zeroes, which are treated as data characters. Without
+REG_PEND, a binary zero terminates the pattern and the \fBre_endp\fP field is
+ignored. This is a GNU extension to the POSIX standard and should be used with
+caution in software intended to be portable to other systems.
 .sp
 REG_UCP
 .sp
@@ -130,9 +141,10 @@ newlines are matched by the dot metacharacter (they are not) or by a negative
 class such as [^a] (they are).
 .P
 The yield of \fBregcomp()\fP is zero on success, and non-zero otherwise. The
-\fIpreg\fP structure is filled in on success, and one member of the structure
-is public: \fIre_nsub\fP contains the number of capturing subpatterns in
-the regular expression. Various error codes are defined in the header file.
+\fIpreg\fP structure is filled in on success, and one other member of the
+structure (as well as \fIre_endp\fP) is public: \fIre_nsub\fP contains the
+number of capturing subpatterns in the regular expression. Various error codes
+are defined in the header file.
 .P
 NOTE: If the yield of \fBregcomp()\fP is non-zero, you must not attempt to
 use the contents of the \fIpreg\fP structure. If, for example, you pass it to
@@ -204,21 +216,24 @@ function.
 .sp
 REG_STARTEND
 .sp
-When this option is set, the string is considered to start at \fIstring\fP +
-\fIpmatch[0].rm_so\fP and to have a terminating NUL located at \fIstring\fP +
-\fIpmatch[0].rm_eo\fP (there need not actually be a NUL at that location),
-regardless of the value of \fInmatch\fP. However, the offsets of the matched
-string and any captured substrings are still given relative to the start of
-\fIstring\fP. (Before PCRE2 release 10.30 these were given relative to
+When this option is set, the subject string starts at \fIstring\fP +
+\fIpmatch[0].rm_so\fP and ends at \fIstring\fP + \fIpmatch[0].rm_eo\fP, which
+should point to the first character beyond the string. There may be binary
+zeroes within the subject string, and indeed, using REG_STARTEND is the only
+way to pass a subject string that contains a binary zero.
+.P
+Whatever the value of \fIpmatch[0].rm_so\fP, the offsets of the matched string
+and any captured substrings are still given relative to the start of
+\fIstring\fP itself. (Before PCRE2 release 10.30 these were given relative to
 \fIstring\fP + \fIpmatch[0].rm_so\fP, but this differs from other
 implementations.)
 .P
 This is a BSD extension, compatible with but not specified by IEEE Standard
 1003.2 (POSIX.2), and should be used with caution in software intended to be
 portable to other systems. Note that a non-zero \fIrm_so\fP does not imply
-REG_NOTBOL; REG_STARTEND affects only the location of the string, not how it is
-matched. Setting REG_STARTEND and passing \fIpmatch\fP as NULL are mutually
-exclusive; the error REG_INVARG is returned.
+REG_NOTBOL; REG_STARTEND affects only the location and length of the string,
+not how it is matched. Setting REG_STARTEND and passing \fIpmatch\fP as NULL
+are mutually exclusive; the error REG_INVARG is returned.
 .P
 If the pattern was compiled with the REG_NOSUB flag, no data about any matched
 strings is returned. The \fInmatch\fP and \fIpmatch\fP arguments of
@@ -277,6 +292,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 03 June 2017
+Last updated: 05 June 2017
 Copyright (c) 1997-2017 University of Cambridge.
 .fi
@@ -965,11 +965,22 @@ SUBJECT MODIFIERS
 REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to regexec().
 The other modifiers are ignored, with a warning message.
 
+There is one additional modifier that can be used with the POSIX wrap-
+per. It is ignored (with a warning) if used for non-POSIX matching.
+
+posix_startend=<n>[:<m>]
+
+This causes the subject string to be passed to regexec() using the
+REG_STARTEND option, which uses offsets to restrict which part of the
+string is searched. If only one number is given, the end offset is
+passed as the end of the subject string. For more detail of REG_STAR-
+TEND, see the pcre2posix documentation.
+
 Setting match controls
 
 The following modifiers affect the matching process or request addi-
 tional information. Some of them may also be specified on a pattern
 line (see above), in which case they apply to every subject line that
 is matched against that pattern.
 
 aftertext show text after match
@@ -1009,29 +1020,29 @@ SUBJECT MODIFIERS
 zero_terminate pass the subject as zero-terminated
 
 The effects of these modifiers are described in the following sections.
 When matching via the POSIX wrapper API, the aftertext, allaftertext,
 and ovector subject modifiers work as described below. All other modi-
 fiers are either ignored, with a warning message, or cause an error.
 
 Showing more text
 
 The aftertext modifier requests that as well as outputting the part of
 the subject string that matched the entire pattern, pcre2test should in
 addition output the remainder of the subject string. This is useful for
 tests where the subject contains multiple copies of the same substring.
 The allaftertext modifier requests the same action for captured sub-
 strings as well as the main matched substring. In each case the remain-
 der is output on the following line with a plus character following the
 capture number.
 
 The allusedtext modifier requests that all the text that was consulted
 during a successful pattern match by the interpreter should be shown.
 This feature is not supported for JIT matching, and if requested with
 JIT it is ignored (with a warning message). Setting this modifier
 affects the output if there is a lookbehind at the start of a match, or
 a lookahead at the end, or if \K is used in the pattern. Characters
 that precede or follow the start and end of the actual match are indi-
 cated in the output by '<' or '>' characters underneath them. Here is
 an example:
 
 re> /(?<=pqr)abc(?=xyz)/
@@ -1039,16 +1050,16 @@ SUBJECT MODIFIERS
  0: pqrabcxyz
     <<<   >>>
 
 This shows that the matched string is "abc", with the preceding and
 following strings "pqr" and "xyz" having been consulted during the
 match (when processing the assertions).
 
 The startchar modifier requests that the starting character for the
 match be indicated, if it is different to the start of the matched
 string. The only time when this occurs is when \K has been processed as
 part of the match. In this situation, the output for the matched string
 is displayed from the starting character instead of from the match
 point, with circumflex characters under the earlier characters. For
 example:
 
 re> /abc\Kxyz/
@@ -1056,7 +1067,7 @@ SUBJECT MODIFIERS
  0: abcxyz
     ^^^
 
 Unlike allusedtext, the startchar modifier can be used with JIT. How-
 ever, these two modifiers are mutually exclusive.
 
 Showing the value of all capture groups
@ -1064,98 +1075,98 @@ SUBJECT MODIFIERS
|
|||
The allcaptures modifier requests that the values of all potential cap-
|
||||
tured parentheses be output after a match. By default, only those up to
|
||||
the highest one actually used in the match are output (corresponding to
|
||||
the return code from pcre2_match()). Groups that did not take part in
|
||||
the match are output as "<unset>". This modifier is not relevant for
|
||||
DFA matching (which does no capturing); it is ignored, with a warning
|
||||
the return code from pcre2_match()). Groups that did not take part in
|
||||
the match are output as "<unset>". This modifier is not relevant for
|
||||
DFA matching (which does no capturing); it is ignored, with a warning
|
||||
message, if present.

Testing callouts

A callout function is supplied when pcre2test calls the library
matching functions, unless callout_none is specified. If callout_capture
is set, the current captured groups are output when a callout occurs.
The default return from the callout function is zero, which allows
matching to continue.

The callout_fail modifier can be given one or two numbers. If there is
only one number, 1 is returned instead of 0 (causing matching to
backtrack) when a callout of that number is reached. If two numbers
(<n>:<m>) are given, 1 is returned when callout <n> is reached and
there have been at least <m> callouts. The callout_error modifier is
similar, except that PCRE2_ERROR_CALLOUT is returned, causing the
entire matching process to be aborted. If both these modifiers are set
for the same callout number, callout_error takes precedence.
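
The selection rule can be summarised in a short C sketch. Everything in it is hypothetical: callout_decision() and struct callout_rule are not pcre2test or PCRE2 names. The sketch also assumes that the callout count includes the callout currently being handled, because the text above does not say exactly how pcre2test counts. A single-number callout_fail corresponds to a minimum count of zero, and a rule number of -1 means "modifier not set".

```c
#include <stdbool.h>

/* Hypothetical model of the value returned by the test callout function. */
enum callout_action {
    CALLOUT_CONTINUE = 0,   /* 0: carry on matching */
    CALLOUT_FAIL = 1,       /* 1: fail at this point and backtrack */
    CALLOUT_ABORT = -1      /* stands for PCRE2_ERROR_CALLOUT */
};

/* number == -1 means the modifier was not given; min_count is <m>. */
struct callout_rule { int number; unsigned min_count; };

static bool rule_fires(struct callout_rule r, int callout_number,
                       unsigned callout_count)
{
    return r.number >= 0 && r.number == callout_number &&
           callout_count >= r.min_count;
}

enum callout_action callout_decision(struct callout_rule fail,
                                     struct callout_rule error,
                                     int callout_number,
                                     unsigned callout_count)
{
    /* callout_error is checked first, so it takes precedence. */
    if (rule_fires(error, callout_number, callout_count))
        return CALLOUT_ABORT;
    if (rule_fires(fail, callout_number, callout_count))
        return CALLOUT_FAIL;
    return CALLOUT_CONTINUE;
}
```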

Note that callouts with string arguments are always given the number
zero. See "Callouts" below for a description of the output when a
callout is taken.

The callout_data modifier can be given an unsigned or a negative
number. This is set as the "user data" that is passed to the matching
function, and passed back when the callout function is invoked. Any
value other than zero is used as a return from pcre2test's callout
function.

Finding all matches in a string

Searching for all possible matches within a subject can be requested by
the global or altglobal modifier. After finding a match, the matching
function is called again to search the remainder of the subject. The
difference between global and altglobal is that the former uses the
start_offset argument to pcre2_match() or pcre2_dfa_match() to start
searching at a new point within the entire string (which is what Perl
does), whereas the latter passes over a shortened subject. This makes a
difference to the matching process if the pattern begins with a
lookbehind assertion (including \b or \B).
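
A small model of \b shows the difference. The C helper below is hypothetical (word_boundary_at() is not a PCRE2 function) and uses an ASCII definition of a word character. It tests for a word boundary at a position. Position 1 of "cat", checked as part of the whole string (as global does via start_offset), has no boundary. Position 0 of the shortened subject "at", which is effectively what altglobal checks, does have one, because the preceding "c" is no longer visible.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* ASCII approximation of \w: letter, digit, or underscore. */
static bool is_word(unsigned char c)
{
    return isalnum(c) || c == '_';
}

/* True if there is a word boundary (\b) at position pos of
   subject[0 .. len). The character before pos is only visible if it lies
   inside the subject that was passed in. */
bool word_boundary_at(const char *subject, size_t len, size_t pos)
{
    bool before = pos > 0 && is_word((unsigned char)subject[pos - 1]);
    bool after  = pos < len && is_word((unsigned char)subject[pos]);
    return before != after;
}
```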

If an empty string is matched, the next match is done with the
PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search
for another, non-empty, match at the same point in the subject. If this
match fails, the start offset is advanced, and the normal match is
retried. This imitates the way Perl handles such cases when using the
/g modifier or the split() function. Normally, the start offset is
advanced by one character, but if the newline convention recognizes
CRLF as a newline, and the current character is CR followed by LF, an
advance of two characters occurs.
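
The advance step can be sketched in C as follows. This is a model of the behaviour described above, not pcre2test's code. It assumes 8-bit code units and treats UTF-8 as the only UTF mode, and the helper name is hypothetical. In UTF-8 mode, "one character" may span several code units, so continuation bytes are skipped.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: next start offset (in code units) after an empty
   match at `offset` could not be extended to a non-empty match at the
   same point. */
size_t advance_after_empty_match(const unsigned char *subject, size_t len,
                                 size_t offset, bool utf8, bool crlf_newline)
{
    if (offset >= len)
        return offset + 1;          /* past the end: caller stops looping */
    if (crlf_newline && subject[offset] == '\r' &&
        offset + 1 < len && subject[offset + 1] == '\n')
        return offset + 2;          /* skip CR and LF together */
    offset++;
    if (utf8) {
        /* Skip UTF-8 continuation bytes (10xxxxxx). */
        while (offset < len && (subject[offset] & 0xC0) == 0x80)
            offset++;
    }
    return offset;
}
```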

Testing substring extraction functions

The copy and get modifiers can be used to test the
pcre2_substring_copy_xxx() and pcre2_substring_get_xxx() functions.
They can be given more than once, and each can specify a group name or
number, for example:

abcd\=copy=1,copy=3,get=G1

If the #subject command is used to set default copy and/or get lists,
these can be unset by specifying a negative number to cancel all
numbered groups and an empty name to cancel all named groups.

The getall modifier tests pcre2_substring_list_get(), which extracts
all captured substrings.

If the subject line is successfully matched, the substrings extracted
by the convenience functions are output with C, G, or L after the
string number instead of a colon. This is in addition to the normal
full list. The string length (that is, the return from the extraction
function) is given in parentheses after each substring, followed by the
name when the extraction was by name.

Testing the substitution function

If the replace modifier is set, the pcre2_substitute() function is
called instead of one of the matching functions. Note that replacement
strings cannot contain commas, because a comma signifies the end of a
modifier. This is not thought to be an issue in a test program.

Unlike subject strings, pcre2test does not process replacement strings
for escape sequences. In UTF mode, a replacement string is checked to
see if it is a valid UTF-8 string. If so, it is correctly converted to
a UTF string of the appropriate code unit width. If it is not a valid
UTF-8 string, the individual code units are copied directly. This
provides a means of passing an invalid UTF-8 string for testing
purposes.

The following modifiers set options (in addition to the normal match
options) for pcre2_substitute():

global                      PCRE2_SUBSTITUTE_GLOBAL
@ -1165,8 +1176,8 @@ SUBJECT MODIFIERS
substitute_unset_empty      PCRE2_SUBSTITUTE_UNSET_EMPTY

After a successful substitution, the modified string is output,
preceded by the number of replacements. This may be zero if there were
no matches. Here is a simple example of a substitution test:

/abc/replace=xxx
@ -1175,12 +1186,12 @@ SUBJECT MODIFIERS
=abc=abc=\=global
2: =xxx=xxx=

Subject and replacement strings should be kept relatively short (fewer
than 256 characters) for substitution tests, as fixed-size buffers are
used. To make it easy to test for buffer overflow, if the replacement
string starts with a number in square brackets, that number is passed
to pcre2_substitute() as the size of the output buffer, with the
replacement string starting at the next character. Here is an example
that tests the edge case:

/abc/
@ -1189,11 +1200,11 @@ SUBJECT MODIFIERS
123abc123\=replace=[9]XYZ
Failed: error -47: no more memory

The default action of pcre2_substitute() is to return
PCRE2_ERROR_NOMEMORY when the output buffer is too small. However, if
the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the
substitute_overflow_length modifier), pcre2_substitute() continues to
go through the motions of matching and substituting, in order to
compute the size of buffer that is required. When this happens,
pcre2test shows the required buffer length (which includes space for
the trailing zero) as part of the error message. For example:

@ -1203,151 +1214,151 @@ SUBJECT MODIFIERS
Failed: error -47: no more memory: 10 code units are needed
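
The arithmetic behind these examples is as follows. "123abc123" with "abc" replaced by "XYZ" is 9 code units, plus 1 for the trailing zero. A 9-unit buffer is therefore one short, and 10 units are needed. The C sketch below models the two pieces involved. Both helpers are hypothetical, and the second is a toy that replaces the first literal occurrence with strstr() rather than matching a pattern. With PCRE2_SUBSTITUTE_OVERFLOW_LENGTH set, the real pcre2_substitute() reports the needed size through its output-length argument.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper mirroring the "[n]" convention: if the replacement
   starts with a number in square brackets, store it in *bufsize and
   return the text after ']'. Otherwise return the replacement as-is. */
const char *split_sized_replacement(const char *repl, size_t *bufsize)
{
    if (repl[0] != '[' || !isdigit((unsigned char)repl[1]))
        return repl;
    char *end;
    unsigned long n = strtoul(repl + 1, &end, 10);
    if (*end != ']')
        return repl;                  /* not a complete "[n]" prefix */
    *bufsize = (size_t)n;
    return end + 1;
}

/* Toy stand-in for one substitution: code units needed (including the
   trailing zero) after replacing the first literal `find` by `repl`. */
size_t needed_for_literal_substitution(const char *subject,
                                       const char *find, const char *repl)
{
    size_t len = strlen(subject);
    if (strstr(subject, find) != NULL)
        len = len - strlen(find) + strlen(repl);
    return len + 1;
}
```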

A replacement string is ignored with POSIX and DFA matching. Specifying
partial matching provokes an error return ("bad option value") from
pcre2_substitute().

Setting the JIT stack size

The jitstack modifier provides a way of setting the maximum stack size
that is used by the just-in-time optimization code. It is ignored if
JIT optimization is not being used. The value is a number of kilobytes.
Providing a stack that is larger than the default 32K is necessary only
for very complicated patterns.

Setting heap, match, and depth limits

The heap_limit, match_limit, and depth_limit modifiers set the
appropriate limits in the match context. These values are ignored when
the find_limits modifier is specified.

Finding minimum limits

If the find_limits modifier is present on a subject line, pcre2test
calls the relevant matching function several times, setting different
values in the match context via pcre2_set_heap_limit(),
pcre2_set_match_limit(), or pcre2_set_depth_limit() until it finds the
minimum values for each parameter that allow the match to complete
without error.
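
One way to find such a minimum is a binary search over candidate values. This relies on the assumption that if a limit works, every larger limit also works. The C sketch below is hypothetical and does not reproduce pcre2test's own search. In a real program the predicate would set the limit in a match context (for example with pcre2_set_match_limit()) and run the match. The needs_at_least() predicate is only a stand-in for illustration.

```c
#include <stdbool.h>

/* Does the match complete without error when the limit is `limit`? */
typedef bool (*limit_ok_fn)(unsigned long limit, void *ctx);

/* Smallest limit in [1, max] for which ok() succeeds, assuming success is
   monotonic in the limit. Returns 0 if even max fails. */
unsigned long find_min_limit(limit_ok_fn ok, void *ctx, unsigned long max)
{
    if (!ok(max, ctx))
        return 0;
    unsigned long lo = 1, hi = max;   /* the answer lies in [lo, hi] */
    while (lo < hi) {
        unsigned long mid = lo + (hi - lo) / 2;
        if (ok(mid, ctx))
            hi = mid;                 /* mid works: answer is mid or lower */
        else
            lo = mid + 1;             /* mid fails: answer is above mid */
    }
    return lo;
}

/* Stand-in predicate: pretends the match needs a limit of at least
   *(unsigned long *)ctx. */
bool needs_at_least(unsigned long limit, void *ctx)
{
    return limit >= *(const unsigned long *)ctx;
}
```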

If JIT is being used, only the match limit is relevant. If DFA matching
is being used, only the depth limit is relevant.

The match_limit number is a measure of the amount of backtracking that
takes place, and learning the minimum value can be instructive. For
most simple matches, the number is quite small, but for patterns with
very large numbers of matching possibilities, it can become large very
quickly with increasing length of subject string.

For non-DFA matching, the minimum depth_limit number is a measure of
how much nested backtracking happens (that is, how deeply the pattern's
tree is searched). In the case of DFA matching, depth_limit controls
the depth of recursive calls of the internal function that is used for
handling pattern recursion, lookaround assertions, and atomic groups.

Showing MARK names

The mark modifier causes the names from backtracking control verbs that
are returned from calls to pcre2_match() to be displayed. If a mark is
returned for a match, non-match, or partial match, pcre2test shows it.
For a match, it is on a line by itself, tagged with "MK:". Otherwise,
it is added to the non-match message.

Showing memory usage

The memory modifier causes pcre2test to log the sizes of all heap
memory allocation and freeing calls that occur during a call to
pcre2_match(). These occur only when a match requires a bigger vector
than the default for remembering backtracking points. In many cases
there will be no heap memory used and therefore no additional output.
No heap memory is allocated during matching with pcre2_dfa_match() or
with JIT, so in those cases the memory modifier never has any effect.
For this modifier to work, the null_context modifier must not be set on
both the pattern and the subject, though it can be set on one or the
other.

Setting a starting offset

The offset modifier sets an offset in the subject string at which
matching starts. Its value is a number of code units, not characters.

Setting an offset limit

The offset_limit modifier sets a limit for unanchored matches. If a
match cannot be found starting at or before this offset in the subject,
a "no match" return is given. The data value is a number of code units,
not characters. When this modifier is used, the use_offset_limit
modifier must have been set for the pattern; if not, an error is
generated.
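
The C sketch below shows how an offset limit restricts where a match may start, using a literal string in place of a compiled pattern. The function is hypothetical. In PCRE2 itself, the limit is set in a match context with pcre2_set_offset_limit(), and it requires the PCRE2_USE_OFFSET_LIMIT compile option, which is what the use_offset_limit modifier sets.

```c
#include <string.h>

/* Hypothetical helper: unanchored search for `needle` that may start at
   any offset from `start` up to and including `limit`. Returns the first
   starting offset found, or -1 for "no match". The match itself may
   extend past `limit`; only its starting point is restricted. Offsets
   are in bytes (code units), not characters. */
long find_with_offset_limit(const char *subject, const char *needle,
                            size_t start, size_t limit)
{
    size_t len = strlen(subject), nlen = strlen(needle);
    for (size_t i = start; i <= limit && i + nlen <= len; i++) {
        if (memcmp(subject + i, needle, nlen) == 0)
            return (long)i;
    }
    return -1;
}
```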

Setting the size of the output vector

The ovector modifier applies only to the subject line in which it
appears, though of course it can also be used to set a default in a
#subject command. It specifies the number of pairs of offsets that are
available for storing matching information. The default is 15.

A value of zero is useful when testing the POSIX API because it causes
regexec() to be called with a NULL capture vector. When not testing the
POSIX API, a value of zero is used to cause
pcre2_match_data_create_from_pattern() to be called, in order to create
a match block of exactly the right size for the pattern. (It is not
possible to create a match block with a zero-length ovector; there is
always at least one pair of offsets.)

Passing the subject as zero-terminated

By default, the subject string is passed to a native API matching
function with its correct length. In order to test the facility for
passing a zero-terminated string, the zero_terminate modifier is
provided. It causes the length to be passed as PCRE2_ZERO_TERMINATED.
(When matching via the POSIX interface, this modifier has no effect, as
there is no facility for passing a length.)

When testing pcre2_substitute(), this modifier also has the effect of
passing the replacement string as zero-terminated.

Passing a NULL context

Normally, pcre2test passes a context block to pcre2_match(),
pcre2_dfa_match() or pcre2_jit_match(). If the null_context modifier is
set, however, NULL is passed. This is for testing that the matching
functions behave correctly in this case (they use default values). This
modifier cannot be used with the find_limits modifier or when testing
the substitution function.


THE ALTERNATIVE MATCHING FUNCTION

By default, pcre2test uses the standard PCRE2 matching function,
pcre2_match(), to match each subject line. PCRE2 also supports an
alternative matching function, pcre2_dfa_match(), which operates in a
different way, and has some restrictions. The differences between the
two functions are described in the pcre2matching documentation.

If the dfa modifier is set, the alternative matching function is used.
This function finds all possible matches at a given point in the
subject. If, however, the dfa_shortest modifier is set, processing
stops after the first match is found. This is always the shortest
possible match.


DEFAULT OUTPUT FROM pcre2test

This section describes the output when the normal matching function,
pcre2_match(), is being used.

When a match succeeds, pcre2test outputs the list of captured
substrings, starting with number 0 for the string that matched the
whole pattern. Otherwise, it outputs "No match" when the return is
PCRE2_ERROR_NOMATCH, or "Partial match:" followed by the partially
matching substring when the return is PCRE2_ERROR_PARTIAL. (Note that
this is the entire substring that was inspected during the partial
match; it may include characters before the actual match start if a
lookbehind assertion, \K, \b, or \B was involved.)

For any other return, pcre2test outputs the PCRE2 negative error number
and a short descriptive phrase. If the error is a failed UTF string
check, the code unit offset of the start of the failing character is
also output. Here is an example of an interactive pcre2test run.

$ pcre2test
@ -1363,8 +1374,8 @@ DEFAULT OUTPUT FROM pcre2test
Unset capturing substrings that are not followed by one that is set are
not shown by pcre2test unless the allcaptures modifier is specified. In
the following example, there are two capturing substrings, but when the
first data line is matched, the second, unset substring is not shown.
An "internal" unset substring is shown as "<unset>", as for the second
data line.

re> /(a)|(b)/
@ -1376,11 +1387,11 @@ DEFAULT OUTPUT FROM pcre2test
1: <unset>
2: b

If the strings contain any non-printing characters, they are output as
\xhh escapes if the value is less than 256 and UTF mode is not set.
Otherwise they are output as \x{hh...} escapes. See below for the
definition of non-printing characters. If the aftertext modifier is
set, the output for substring 0 is followed by the rest of the subject
string, identified by "0+" like this:

re> /cat/aftertext
@ -1388,7 +1399,7 @@ DEFAULT OUTPUT FROM pcre2test
0: cat
0+ aract
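
The C sketch below shows the escaping rule for a single character value. The helper is hypothetical. Treating only 0x20 to 0x7e as printable is an assumption, since pcre2test's own definition of non-printing characters is given elsewhere. The two-digit minimum inside \x{...} is also an assumption.

```c
#include <stdio.h>

/* Hypothetical helper: format one character value as described above.
   Printable ASCII is copied as-is; other values use \xhh when below 256
   and not in UTF mode, and \x{hh...} otherwise. Returns the snprintf()
   result (number of characters that would be written). */
int format_char(unsigned int c, int utf, char *out, size_t size)
{
    if (c >= 0x20 && c <= 0x7e)
        return snprintf(out, size, "%c", (int)c);
    if (c < 256 && !utf)
        return snprintf(out, size, "\\x%02x", c);
    return snprintf(out, size, "\\x{%02x}", c);
}
```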

If global matching is requested, the results of successive matching
attempts are output in sequence, like this:

re> /\Bi(\w\w)/g
@ -1400,8 +1411,8 @@ DEFAULT OUTPUT FROM pcre2test
0: ipp
1: pp

"No match" is output only if the first match attempt fails. Here is an
example of a failure message (the offset 4 that is specified by the
offset modifier is past the end of the subject string):

re> /xyz/
@ -1409,7 +1420,7 @@ DEFAULT OUTPUT FROM pcre2test
Error -24 (bad offset value)

Note that whereas patterns can be continued over several lines (a plain
">" prompt is used for continuations), subject lines may not. However,
newlines can be included in a subject by means of the \n escape (or \r,
\r\n, etc., depending on the newline sequence setting).


@ -1417,7 +1428,7 @@ DEFAULT OUTPUT FROM pcre2test
OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION

When the alternative matching function, pcre2_dfa_match(), is used, the
output consists of a list of all the matches that start at the first
point in the subject where there is at least one match. For example:

re> /(tang|tangerine|tan)/
@ -1426,11 +1437,11 @@ OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
1: tang
2: tan

Using the normal matching function on this data finds only "tang". The
longest matching string is always given first (and numbered zero).
After a PCRE2_ERROR_PARTIAL return, the output is "Partial match:",
followed by the partially matching substring. Note that this is the
entire substring that was inspected during the partial match; it may
include characters before the actual match start if a lookbehind
assertion, \b, or \B was involved. (\K is not supported for DFA
matching.)
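
The "all matches at the first point, longest first" shape of the output can be illustrated with a toy in C that treats each alternative of (tang|tangerine|tan) as a literal string. This is not the DFA algorithm used by pcre2_dfa_match(). The helper and the subject "yellow tangerine" are illustrative assumptions.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator: larger lengths first. */
static int by_length_desc(const void *a, const void *b)
{
    size_t x = *(const size_t *)a, y = *(const size_t *)b;
    return (x < y) - (x > y);
}

/* Hypothetical toy: find the first subject position where at least one
   literal alternative matches, store the lengths of all alternatives that
   match there in lens[] (longest first, at most max of them), store the
   position in *pos, and return how many were found (0 = no match). */
int all_matches_at_first_point(const char *subject, const char *const *alts,
                               int nalts, size_t *pos, size_t *lens, int max)
{
    size_t len = strlen(subject);
    for (size_t i = 0; i <= len; i++) {
        int n = 0;
        for (int a = 0; a < nalts && n < max; a++) {
            size_t alen = strlen(alts[a]);
            if (i + alen <= len && memcmp(subject + i, alts[a], alen) == 0)
                lens[n++] = alen;
        }
        if (n > 0) {
            qsort(lens, (size_t)n, sizeof lens[0], by_length_desc);
            *pos = i;
            return n;
        }
    }
    return 0;
}
```

For "yellow tangerine", the three alternatives all match at offset 7. They are reported as lengths 9, 4, and 3, which corresponds to "tangerine", "tang", and "tan" numbered 0, 1, and 2.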
|
||||
|
||||
|
@ -1446,16 +1457,16 @@ OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
|
|||
1: tan
|
||||
0: tan
|
||||
|
||||
The alternative matching function does not support substring capture,
|
||||
so the modifiers that are concerned with captured substrings are not
|
||||
The alternative matching function does not support substring capture,
|
||||
so the modifiers that are concerned with captured substrings are not
|
||||
relevant.
|
||||
|
||||
|
||||
RESTARTING AFTER A PARTIAL MATCH
|
||||
|
||||
When the alternative matching function has given the PCRE2_ERROR_PAR-
|
||||
When the alternative matching function has given the PCRE2_ERROR_PAR-
|
||||
TIAL return, indicating that the subject partially matched the pattern,
|
||||
you can restart the match with additional subject data by means of the
|
||||
you can restart the match with additional subject data by means of the
|
||||
dfa_restart modifier. For example:
|
||||
|
||||
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
|
||||
|
@ -1464,45 +1475,45 @@ RESTARTING AFTER A PARTIAL MATCH
|
|||
data> n05\=dfa,dfa_restart
|
||||
0: n05
|
||||
|
||||
For further information about partial matching, see the pcre2partial
|
||||
For further information about partial matching, see the pcre2partial
|
||||
documentation.
|
||||
|
||||
|
||||
CALLOUTS
|
||||
|
||||
If the pattern contains any callout requests, pcre2test's callout func-
|
||||
tion is called during matching unless callout_none is specified. This
|
||||
tion is called during matching unless callout_none is specified. This
|
||||
works with both matching functions.
|
||||
|
||||
The callout function in pcre2test returns zero (carry on matching) by
|
||||
default, but you can use a callout_fail modifier in a subject line (as
|
||||
The callout function in pcre2test returns zero (carry on matching) by
|
||||
default, but you can use a callout_fail modifier in a subject line (as
|
||||
described above) to change this and other parameters of the callout.
|
||||
|
||||
Inserting callouts can be helpful when using pcre2test to check compli-
|
||||
cated regular expressions. For further information about callouts, see
|
||||
cated regular expressions. For further information about callouts, see
|
||||
the pcre2callout documentation.
|
||||
|
||||
The output for callouts with numerical arguments and those with string
|
||||
The output for callouts with numerical arguments and those with string
|
||||
arguments is slightly different.
|
||||
|
||||
Callouts with numerical arguments
|
||||
|
||||
By default, the callout function displays the callout number, the start
|
||||
and current positions in the subject text at the callout time, and the
|
||||
and current positions in the subject text at the callout time, and the
|
||||
next pattern item to be tested. For example:
|
||||
|
||||
--->pqrabcdef
|
||||
0 ^ ^ \d
|
||||
|
||||
This output indicates that callout number 0 occurred for a match
attempt starting at the fourth character of the subject string, when
the pointer was at the seventh character, and when the next pattern
item was \d. Just one circumflex is output if the start and current
positions are the same, or if the current position precedes the start
position, which can happen if the callout is in a lookbehind assertion.
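
The layout of the position line can be modelled in a few lines of C.
This is a simplified sketch, not pcre2test's code: the function name
format_callout_line is hypothetical, the 3-character label field is
inferred from the example above, the spacing before the next pattern
item is simplified, and placing the single circumflex at the current
position when it precedes the start is an assumption. The label also
covers the "+offset" form used for automatic callouts (number 255).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of the callout display. The label is the callout
   number, or "+<pattern offset>" for automatic callouts (number 255),
   right-aligned in 3 characters plus a space, so that the circumflexes
   line up under a subject printed after the 4-character "--->" prefix. */
static void format_callout_line(char *out, size_t outsize, int number,
                                int pattern_offset, size_t start,
                                size_t current, const char *next_item)
{
  char label[16];
  char marks[256];
  size_t width;

  if (number == 255)
    snprintf(label, sizeof label, "+%d", pattern_offset);
  else
    snprintf(label, sizeof label, "%d", number);

  /* Space-filled indicator line long enough to reach the current position. */
  width = current + 1;
  if (width >= sizeof marks) width = sizeof marks - 1;
  memset(marks, ' ', width);
  marks[width] = '\0';

  /* Always mark the current position; mark the start separately only when
     it lies strictly before the current position. Otherwise (same place,
     or current before start as in a lookbehind) one circumflex is shown. */
  if (current < width) marks[current] = '^';
  if (start < current && start < width) marks[start] = '^';

  snprintf(out, outsize, "%3s %s %s", label, marks, next_item);
}
```

For the example above, a start offset of 3 and a current offset of 6
put the circumflexes under the fourth and seventh subject characters.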

Callouts numbered 255 are assumed to be automatic callouts, inserted as
a result of the /auto_callout pattern modifier. In this case, instead
of showing the callout number, the offset in the pattern, preceded by a
plus, is output. For example:

@ -1516,7 +1527,7 @@ CALLOUTS
0: E*

If a pattern contains (*MARK) items, an additional line is output
whenever a change of latest mark is passed to the callout function.
For example:

re> /a(*MARK:X)bc/auto_callout
@ -1530,17 +1541,17 @@ CALLOUTS
+12 ^ ^
0: abc

The mark changes between matching "a" and "b", but stays the same for
the rest of the match, so nothing more is output. If, as a result of
backtracking, the mark reverts to being unset, the text "<unset>" is
output.
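
The rule for the extra line (report only when the latest mark differs
from the one last reported, and show "<unset>" when it reverts to none)
can be sketched as follows. The tracker type, the function name and the
"Latest Mark:" wording are assumptions for illustration, not taken from
pcre2test's source.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical tracker: remembers the mark last reported. The caller must
   keep the mark strings alive while the tracker refers to them. */
typedef struct {
  const char *last_reported;   /* NULL means "unset" */
} mark_tracker;

/* If current_mark differs from the last reported mark, writes a report
   line into out and returns 1; otherwise returns 0 and leaves out alone. */
static int mark_changed(mark_tracker *t, const char *current_mark,
                        char *out, size_t outsize)
{
  int same = (t->last_reported == NULL && current_mark == NULL) ||
             (t->last_reported != NULL && current_mark != NULL &&
              strcmp(t->last_reported, current_mark) == 0);
  if (same) return 0;
  t->last_reported = current_mark;
  snprintf(out, outsize, "Latest Mark: %s",
           current_mark == NULL ? "<unset>" : current_mark);
  return 1;
}
```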

Callouts with string arguments

The output for a callout with a string argument is similar, except that
instead of outputting a callout number before the position indicators,
the callout string and its offset in the pattern string are output
before the reflection of the subject string, and the subject string is
reflected for each callout. For example:

re> /^ab(?C'first')cd(?C"second")ef/
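
The string offsets for this pattern can be worked out by hand: 'first'
starts at offset 7 and "second" at offset 20, assuming (as this sketch
does) that the reported offset is that of the first character inside
the delimiters. The helper find_callout_strings is hypothetical and
handles only the ', " and ` delimiters, without doubled-delimiter
escapes or the other delimiters PCRE2 accepts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: finds (?C'...'), (?C"...") and (?C`...`) callouts
   and records the offset and length of each string's contents. */
static size_t find_callout_strings(const char *pattern, size_t offsets[],
                                   size_t lengths[], size_t max)
{
  size_t found = 0;
  const char *p = pattern;

  while (found < max && (p = strstr(p, "(?C")) != NULL) {
    char delim = p[3];
    if (delim == '\'' || delim == '"' || delim == '`') {
      const char *start = p + 4;
      const char *end = strchr(start, delim);
      if (end == NULL) break;              /* unterminated: stop scanning */
      offsets[found] = (size_t)(start - pattern);
      lengths[found] = (size_t)(end - start);
      found++;
      p = end + 1;
    } else {
      p += 3;                              /* numbered callout or plain (?C) */
    }
  }
  return found;
}
```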
@ -1557,43 +1568,43 @@ CALLOUTS

NON-PRINTING CHARACTERS

When pcre2test is outputting text in the compiled version of a pattern,
bytes other than 32-126 are always treated as non-printing characters
and are therefore shown as hex escapes.
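
The fixed rule for 8-bit, non-UTF output can be sketched as below. The
function name is hypothetical; the \xhh form with two lower-case hex
digits is taken from output such as "0: a\x00b" in the test files of
this commit. Locale-based isprint() handling and the \x{...} form used
in UTF mode are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Copies bytes 32-126 as they are and turns every other byte into \xhh.
   Output that would not fit in out is truncated, never overflowed. */
static void escape_nonprinting(const unsigned char *s, size_t len,
                               char *out, size_t outsize)
{
  size_t used = 0;
  if (outsize == 0) return;
  out[0] = '\0';
  for (size_t i = 0; i < len; i++) {
    char piece[5];                         /* "\xhh" plus terminator */
    if (s[i] >= 32 && s[i] <= 126) {
      piece[0] = (char)s[i];
      piece[1] = '\0';
    } else {
      snprintf(piece, sizeof piece, "\\x%02x", s[i]);
    }
    size_t plen = strlen(piece);
    if (used + plen >= outsize) break;
    memcpy(out + used, piece, plen + 1);
    used += plen;
  }
}
```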

When pcre2test is outputting text that is a matched part of a subject
string, it behaves in the same way, unless a different locale has been
set for the pattern (using the locale modifier). In this case, the
isprint() function is used to distinguish printing and non-printing
characters.


SAVING AND RESTORING COMPILED PATTERNS

It is possible to save compiled patterns on disc or elsewhere, and
reload them later, subject to a number of restrictions. JIT data cannot
be saved. The host on which the patterns are reloaded must be running
the same version of PCRE2, with the same code unit width, and must also
have the same endianness, pointer width and PCRE2_SIZE type. Before
compiled patterns can be saved they must be serialized, that is,
converted to a stream of bytes. A single byte stream may contain any
number of compiled patterns, but they must all use the same character
tables. A single copy of the tables is included in the byte stream (its
size is 1088 bytes).
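
The compatibility rules above amount to comparing a few properties of
the saving and loading hosts. The sketch below uses a hypothetical
header struct, magic value and status names; it is not PCRE2's real
serialized layout (that is described in the pcre2serialize
documentation). Endianness is detected by storing a known magic number
and checking whether it reads back byte-swapped.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAGIC 0x50523253u   /* hypothetical magic value */

/* Hypothetical header written in front of a serialized byte stream. */
typedef struct {
  uint32_t magic;           /* byte-swapped when read on the other endianness */
  uint32_t version;         /* library version that wrote the stream */
  uint8_t  code_unit_size;  /* 1, 2 or 4 bytes */
  uint8_t  pointer_size;    /* sizeof(void *) on the saving host */
  uint8_t  size_type_size;  /* sizeof(PCRE2_SIZE) on the saving host */
  uint32_t pattern_count;   /* any number of patterns, one shared set of tables */
} demo_stream_header;

static uint32_t swap32(uint32_t v)
{
  return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
         ((v << 8) & 0x00ff0000u) | (v << 24);
}

typedef enum { LOAD_OK, LOAD_BAD_MAGIC, LOAD_WRONG_ENDIANNESS,
               LOAD_WRONG_VERSION, LOAD_WRONG_CONFIG } load_status;

/* Returns LOAD_OK only if the stream was written by a compatible host. */
static load_status check_stream(const demo_stream_header *saved,
                                const demo_stream_header *here)
{
  if (saved->magic == swap32(DEMO_MAGIC)) return LOAD_WRONG_ENDIANNESS;
  if (saved->magic != DEMO_MAGIC) return LOAD_BAD_MAGIC;
  if (saved->version != here->version) return LOAD_WRONG_VERSION;
  if (saved->code_unit_size != here->code_unit_size ||
      saved->pointer_size != here->pointer_size ||
      saved->size_type_size != here->size_type_size)
    return LOAD_WRONG_CONFIG;
  return LOAD_OK;
}
```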

The functions whose names begin with pcre2_serialize_ are used for
serializing and de-serializing. They are described in the
pcre2serialize documentation. In this section we describe the features
of pcre2test that can be used to test these functions.

When a pattern with push modifier is successfully compiled, it is
pushed onto a stack of compiled patterns, and pcre2test expects the
next line to contain a new pattern (or command) instead of a subject
line. By contrast, the pushcopy modifier causes a copy of the compiled
pattern to be stacked, leaving the original available for immediate
matching. By using push and/or pushcopy, a number of patterns can be
compiled and retained. These modifiers are incompatible with posix, and
control modifiers that act at match time are ignored (with a message)
for the stacked patterns. The jitverify modifier applies only at
compile time.

The command
@ -1601,21 +1612,21 @@ SAVING AND RESTORING COMPILED PATTERNS
#save <filename>

causes all the stacked patterns to be serialized and the result written
to the named file. Afterwards, all the stacked patterns are freed. The
command

#load <filename>

reads the data in the file, and then arranges for it to be
de-serialized, with the resulting compiled patterns added to the
pattern stack. The pattern on the top of the stack can be retrieved by
the #pop command, which must be followed by lines of subjects that are
to be matched with the pattern, terminated as usual by an empty line or
end of file. This command may be followed by a modifier list containing
only control modifiers that act after a pattern has been compiled. In
particular, hex, posix, posix_nosub, push, and pushcopy are not
allowed, nor are any option-setting modifiers. The JIT modifiers are,
however, permitted. Here is an example that saves and reloads two
patterns.

/abc/push
@ -1628,10 +1639,10 @@ SAVING AND RESTORING COMPILED PATTERNS
#pop jit,bincode
abc

If jitverify is used with #pop, it does not automatically imply jit,
which is different behaviour from when it is used on a pattern.

The #popcopy command is analogous to the pushcopy modifier in that it
makes current a copy of the topmost stack pattern, leaving the original
still on the stack.
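
The four stack operations described in this section differ only in
whether the pattern moves or is copied. A minimal model, using strings
as stand-ins for compiled patterns; all names are hypothetical and this
is not pcre2test's own stack code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define STACK_MAX 8

/* Hypothetical model: a "compiled pattern" is a heap-allocated string. */
typedef struct {
  char *items[STACK_MAX];
  int depth;
  char *current;              /* pattern available for matching, or NULL */
} pattern_stack;

static char *dup_pattern(const char *p)
{
  char *copy = malloc(strlen(p) + 1);
  if (copy != NULL) strcpy(copy, p);
  return copy;
}

/* push: the pattern moves onto the stack; nothing is left for matching. */
static int stack_push(pattern_stack *s)
{
  if (s->current == NULL || s->depth == STACK_MAX) return 0;
  s->items[s->depth++] = s->current;
  s->current = NULL;
  return 1;
}

/* pushcopy: a copy is stacked and the original stays current. */
static int stack_pushcopy(pattern_stack *s)
{
  if (s->current == NULL || s->depth == STACK_MAX) return 0;
  char *copy = dup_pattern(s->current);
  if (copy == NULL) return 0;
  s->items[s->depth++] = copy;
  return 1;
}

/* #pop: the top pattern leaves the stack and becomes current. */
static int stack_pop(pattern_stack *s)
{
  if (s->depth == 0) return 0;
  free(s->current);
  s->current = s->items[--s->depth];
  return 1;
}

/* #popcopy: a copy of the top pattern becomes current; the stack is unchanged. */
static int stack_popcopy(pattern_stack *s)
{
  if (s->depth == 0) return 0;
  char *copy = dup_pattern(s->items[s->depth - 1]);
  if (copy == NULL) return 0;
  free(s->current);
  s->current = copy;
  return 1;
}
```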
@ -1651,5 +1662,5 @@ AUTHOR

REVISION

-Last updated: 01 June 2017
+Last updated: 03 June 2017
Copyright (c) 1997-2017 University of Cambridge.
@ -231,10 +231,14 @@ PCRE2POSIX_EXP_DEFN int PCRE2_CALL_CONVENTION
regcomp(regex_t *preg, const char *pattern, int cflags)
{
PCRE2_SIZE erroffset;
+PCRE2_SIZE patlen;
int errorcode;
int options = 0;
int re_nsub = 0;

+patlen = ((cflags & REG_PEND) != 0)? (PCRE2_SIZE)(preg->re_endp - pattern) :
+  PCRE2_ZERO_TERMINATED;
+
if ((cflags & REG_ICASE) != 0) options |= PCRE2_CASELESS;
if ((cflags & REG_NEWLINE) != 0) options |= PCRE2_MULTILINE;
if ((cflags & REG_DOTALL) != 0) options |= PCRE2_DOTALL;
@ -243,8 +247,8 @@ if ((cflags & REG_UCP) != 0) options |= PCRE2_UCP;
if ((cflags & REG_UNGREEDY) != 0) options |= PCRE2_UNGREEDY;

preg->re_cflags = cflags;
-preg->re_pcre2_code = pcre2_compile((PCRE2_SPTR)pattern, PCRE2_ZERO_TERMINATED,
-  options, &errorcode, &erroffset, NULL);
+preg->re_pcre2_code = pcre2_compile((PCRE2_SPTR)pattern, patlen, options,
+  &errorcode, &erroffset, NULL);
preg->re_erroffset = erroffset;

if (preg->re_pcre2_code == NULL)
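
The new patlen logic in regcomp() reduces to choosing between an
explicit length and the zero-terminated sentinel. The standalone sketch
below mirrors that choice without linking against PCRE2: DEMO_REG_PEND
copies the 0x0800 value given to REG_PEND in pcre2posix.h,
DEMO_ZERO_TERMINATED stands in for PCRE2_ZERO_TERMINATED (the all-ones
PCRE2_SIZE value), and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_REG_PEND 0x0800                  /* value of REG_PEND */
#define DEMO_ZERO_TERMINATED (~(size_t)0)     /* stand-in for PCRE2_ZERO_TERMINATED */

/* Mirrors the patlen computation: with REG_PEND the length is the distance
   to re_endp, so binary zeros inside the pattern are allowed; otherwise the
   sentinel tells the compiler to stop at the first binary zero. */
static size_t pattern_length(const char *pattern, const char *re_endp,
                             int cflags)
{
  return ((cflags & DEMO_REG_PEND) != 0) ? (size_t)(re_endp - pattern)
                                         : DEMO_ZERO_TERMINATED;
}

/* The number of pattern bytes the compiler would then consume. */
static size_t effective_length(const char *pattern, size_t patlen)
{
  return (patlen == DEMO_ZERO_TERMINATED) ? strlen(pattern) : patlen;
}
```

Because re_endp is read only when REG_PEND is set, existing callers
that never pass the flag are unaffected by the new field.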
@ -62,6 +62,7 @@ extern "C" {
#define REG_NOTEMPTY 0x0100 /* NOT defined by POSIX; maps to PCRE2_NOTEMPTY */
#define REG_UNGREEDY 0x0200 /* NOT defined by POSIX; maps to PCRE2_UNGREEDY */
#define REG_UCP 0x0400 /* NOT defined by POSIX; maps to PCRE2_UCP */
+#define REG_PEND 0x0800 /* GNU feature: pass end pattern by re_endp */

/* This is not used by PCRE2, but by defining it we make it easier
to slot PCRE2 into existing programs that make POSIX calls. */
@ -91,11 +92,13 @@ enum {
};


-/* The structure representing a compiled regular expression. */
+/* The structure representing a compiled regular expression. It is also used
+for passing the pattern end pointer when REG_PEND is set. */

typedef struct {
void *re_pcre2_code;
void *re_match_data;
+const char *re_endp;
size_t re_nsub;
size_t re_erroffset;
int re_cflags;
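
From the caller's side, the struct change means that re_endp must be
set before regcomp() is called with REG_PEND. To stay independent of an
installed pcre2posix.h, the sketch uses demo_regex_t, a hypothetical
mirror of the new regex_t layout, and a hypothetical helper name; with
the real header the same assignment is made to a regex_t.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_REG_PEND 0x0800   /* value of REG_PEND in the header above */

/* Hypothetical mirror of the new regex_t layout. */
typedef struct {
  void *re_pcre2_code;
  void *re_match_data;
  const char *re_endp;      /* end of the pattern when REG_PEND is set */
  size_t re_nsub;
  size_t re_erroffset;
  int re_cflags;
} demo_regex_t;

/* Records where a pattern of known length ends (it may contain binary
   zeros) and returns the flag to OR into cflags. With the real API this
   would be followed by regcomp(&preg, pattern, cflags | REG_PEND). */
static int prepare_pattern_end(demo_regex_t *preg, const char *pattern,
                               size_t length)
{
  preg->re_endp = pattern + length;
  return DEMO_REG_PEND;
}
```

Without REG_PEND, the same buffer would be treated as the one-byte
pattern "A", because a C string ends at the first binary zero.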
@ -699,7 +699,8 @@ static modstruct modlist[] = {
#define POSIX_SUPPORTED_COMPILE_EXTRA_OPTIONS (0)

#define POSIX_SUPPORTED_COMPILE_CONTROLS ( \
-CTL_AFTERTEXT|CTL_ALLAFTERTEXT|CTL_EXPAND|CTL_POSIX|CTL_POSIX_NOSUB)
+CTL_AFTERTEXT|CTL_ALLAFTERTEXT|CTL_EXPAND|CTL_HEXPAT|CTL_POSIX| \
+CTL_POSIX_NOSUB|CTL_USE_LENGTH)

#define POSIX_SUPPORTED_COMPILE_CONTROLS2 (0)
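
POSIX_SUPPORTED_COMPILE_CONTROLS works as a whitelist: any requested
control bit outside the mask is flagged as unsupported on the POSIX
path. The bit values and helper name below are hypothetical (pcre2test
assigns its own values to the CTL_ macros); the sketch only shows why
adding CTL_HEXPAT and CTL_USE_LENGTH to the mask lets hex and
use_length be combined with posix.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values; only the set logic matters here. */
#define DEMO_CTL_AFTERTEXT    0x0001u
#define DEMO_CTL_ALLAFTERTEXT 0x0002u
#define DEMO_CTL_EXPAND       0x0004u
#define DEMO_CTL_HEXPAT       0x0008u
#define DEMO_CTL_POSIX        0x0010u
#define DEMO_CTL_POSIX_NOSUB  0x0020u
#define DEMO_CTL_USE_LENGTH   0x0040u
#define DEMO_CTL_PUSH         0x0080u

#define DEMO_OLD_POSIX_CONTROLS (DEMO_CTL_AFTERTEXT|DEMO_CTL_ALLAFTERTEXT| \
  DEMO_CTL_EXPAND|DEMO_CTL_POSIX|DEMO_CTL_POSIX_NOSUB)
#define DEMO_NEW_POSIX_CONTROLS (DEMO_OLD_POSIX_CONTROLS| \
  DEMO_CTL_HEXPAT|DEMO_CTL_USE_LENGTH)

/* Returns the requested control bits that the POSIX path does not support. */
static uint32_t unsupported_controls(uint32_t requested, uint32_t supported)
{
  return requested & ~supported;
}
```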
@ -733,11 +734,9 @@ the first control word. Note that CTL_POSIX_NOSUB is always accompanied by
CTL_POSIX, so it doesn't need its own entries. */

static uint32_t exclusive_pat_controls[] = {
-CTL_POSIX | CTL_HEXPAT,
CTL_POSIX | CTL_PUSH,
CTL_POSIX | CTL_PUSHCOPY,
CTL_POSIX | CTL_PUSHTABLESCOPY,
-CTL_POSIX | CTL_USE_LENGTH,
CTL_PUSH | CTL_PUSHCOPY,
CTL_PUSH | CTL_PUSHTABLESCOPY,
CTL_PUSHCOPY | CTL_PUSHTABLESCOPY,
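
Each entry in exclusive_pat_controls names a pair of modifiers that may
not be used together, so deleting the CTL_POSIX | CTL_HEXPAT and
CTL_POSIX | CTL_USE_LENGTH entries is what makes those combinations
legal. A sketch of checking controls against such a table, with
hypothetical bit values and function name; the exact test pcre2test
applies is not part of this diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit values. */
#define DEMO_CTL_HEXPAT     0x01u
#define DEMO_CTL_POSIX      0x02u
#define DEMO_CTL_PUSH       0x04u
#define DEMO_CTL_PUSHCOPY   0x08u
#define DEMO_CTL_USE_LENGTH 0x10u

/* Reduced table after this change: no POSIX/HEXPAT or POSIX/USE_LENGTH pairs. */
static const uint32_t demo_exclusive[] = {
  DEMO_CTL_POSIX | DEMO_CTL_PUSH,
  DEMO_CTL_POSIX | DEMO_CTL_PUSHCOPY,
  DEMO_CTL_PUSH  | DEMO_CTL_PUSHCOPY,
};

/* Returns the first forbidden pair fully present in controls, or 0. */
static uint32_t first_conflict(uint32_t controls)
{
  for (size_t i = 0; i < sizeof demo_exclusive / sizeof demo_exclusive[0]; i++) {
    uint32_t pair = demo_exclusive[i];
    if ((controls & pair) == pair) return pair;
  }
  return 0;
}
```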
@ -896,7 +895,7 @@ static PCRE2_SIZE malloclistlength[MALLOCLISTSIZE];
static uint32_t malloclistptr = 0;

#ifdef SUPPORT_PCRE2_8
-static regex_t preg = { NULL, NULL, 0, 0, 0 };
+static regex_t preg = { NULL, NULL, 0, 0, 0, 0 };
#endif

static int *dfa_workspace = NULL;
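
Inserting re_endp as the third member shifts every later field, which
is why this positional initializer gains a sixth value. Where a code
base can rely on C99, a designated initializer avoids depending on
member order. The struct and function below are hypothetical mirrors
for illustration, not the real header.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the new regex_t layout. */
typedef struct {
  void *re_pcre2_code;
  void *re_match_data;
  const char *re_endp;
  size_t re_nsub;
  size_t re_erroffset;
  int re_cflags;
} demo_regex_t;

/* Positional form, as in pcre2test: one value per member, in order. (An
   object with static storage duration is zero-initialized even without
   an initializer.) */
static demo_regex_t positional = { NULL, NULL, 0, 0, 0, 0 };

/* Designated form for an automatic variable: members not named are
   zero-initialized, so adding or reordering members needs no edit here. */
static demo_regex_t make_empty_regex(void)
{
  demo_regex_t r = { .re_pcre2_code = NULL };
  return r;
}
```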
@ -5264,6 +5263,12 @@ if ((pat_patctl.control & CTL_POSIX) != 0)
if ((pat_patctl.options & PCRE2_DOTALL) != 0) cflags |= REG_DOTALL;
if ((pat_patctl.options & PCRE2_UNGREEDY) != 0) cflags |= REG_UNGREEDY;

+if ((pat_patctl.control & (CTL_HEXPAT|CTL_USE_LENGTH)) != 0)
+  {
+  preg.re_endp = (char *)pbuffer8 + patlen;
+  cflags |= REG_PEND;
+  }
+
rc = regcomp(&preg, (char *)pbuffer8, cflags);

/* Compiling failed */
@ -123,4 +123,10 @@
/^a\x{00}b$/posix
a\x{00}b\=posix_startend=0:3

/"A" 00 "B"/hex
A\x{00}B\=posix_startend=0:3

/ABC/use_length
ABC

# End of testdata/testinput18
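
With the hex modifier the pattern is read as pairs of hexadecimal
digits, with quoted strings taken literally, so /"A" 00 "B"/hex yields
the three bytes A, 0x00, B. Such a pattern cannot be passed to
regcomp() as a C string, which is why pcre2test now sets REG_PEND for
hex patterns. A simplified decoder; decode_hex_pattern is hypothetical,
skips white space, and omits any other syntax or error reporting that
pcre2test may have.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

static int hex_value(int c)
{
  if (c >= '0' && c <= '9') return c - '0';
  c = tolower(c);
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  return -1;
}

/* Pairs of hex digits become bytes, '...' and "..." strings are copied
   literally, white space is skipped. Returns the byte count, or -1 on
   malformed input or if out is too small. */
static long decode_hex_pattern(const char *text, unsigned char *out,
                               size_t outsize)
{
  size_t n = 0;
  const char *p = text;

  while (*p != '\0') {
    if (isspace((unsigned char)*p)) { p++; continue; }
    if (*p == '"' || *p == '\'') {
      const char *end = strchr(p + 1, *p);
      if (end == NULL) return -1;              /* unterminated string */
      for (const char *q = p + 1; q < end; q++) {
        if (n >= outsize) return -1;
        out[n++] = (unsigned char)*q;
      }
      p = end + 1;
      continue;
    }
    int hi = hex_value((unsigned char)p[0]);
    int lo = (hi < 0) ? -1 : hex_value((unsigned char)p[1]);
    if (lo < 0 || n >= outsize) return -1;
    out[n++] = (unsigned char)(hi * 16 + lo);
    p += 2;
  }
  return (long)n;
}
```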
@ -15,4 +15,7 @@
/\w/ucp
+++\x{c2}

/"^AB" 00 "\x{1234}$"/hex,utf
AB\x{00}\x{1234}\=posix_startend=0:6

# End of testdata/testinput19
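
The posix_startend=0:6 range in this UTF test counts bytes, not
characters: A, B and the binary zero take one byte each, and U+1234
needs three bytes in UTF-8 (E1 88 B4), giving 6. A minimal encoder
that checks this arithmetic; the helper names are hypothetical, and no
surrogate or range validation is done.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encodes one code point as UTF-8 into out (room for 4 bytes) and
   returns the number of bytes written. */
static size_t utf8_encode(uint32_t cp, unsigned char *out)
{
  if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
  if (cp < 0x800) {
    out[0] = (unsigned char)(0xC0 | (cp >> 6));
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));
    return 2;
  }
  if (cp < 0x10000) {
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
  }
  out[0] = (unsigned char)(0xF0 | (cp >> 18));
  out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
  out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
  out[3] = (unsigned char)(0x80 | (cp & 0x3F));
  return 4;
}

/* Total UTF-8 length in bytes of a sequence of code points. */
static size_t utf8_length(const uint32_t *cps, size_t count)
{
  unsigned char tmp[4];
  size_t total = 0;
  for (size_t i = 0; i < count; i++) total += utf8_encode(cps[i], tmp);
  return total;
}
```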
@ -191,4 +191,12 @@ No match: POSIX code 17: match failed
a\x{00}b\=posix_startend=0:3
0: a\x00b

/"A" 00 "B"/hex
A\x{00}B\=posix_startend=0:3
0: A\x00B

/ABC/use_length
ABC
0: ABC

# End of testdata/testinput18
@ -18,4 +18,8 @@ No match: POSIX code 17: match failed
+++\x{c2}
0: \xc2

/"^AB" 00 "\x{1234}$"/hex,utf
AB\x{00}\x{1234}\=posix_startend=0:6
0: AB\x{00}\x{1234}

# End of testdata/testinput19