Implement REG_PEND (GNU extension) for the POSIX wrapper.
parent f850015168
commit bcba497c0b
|
@ -182,6 +182,8 @@ deeply. (Compare item 10.23/36.) This should fix oss-fuzz #1761.
|
||||||
38. Fix returned offsets from regexec() when REG_STARTEND is used with a
|
38. Fix returned offsets from regexec() when REG_STARTEND is used with a
|
||||||
starting offset greater than zero.
|
starting offset greater than zero.
|
||||||
|
|
||||||
|
39. Implement REG_PEND (GNU extension) for the POSIX wrapper.
|
||||||
|
|
||||||
|
|
||||||
Version 10.23 14-February-2017
|
Version 10.23 14-February-2017
|
||||||
------------------------------
|
------------------------------
|
||||||
|
|
|
@ -69,7 +69,7 @@ replacement library. Other POSIX options are not even defined.
|
||||||
<P>
|
<P>
|
||||||
There are also some options that are not defined by POSIX. These have been
|
There are also some options that are not defined by POSIX. These have been
|
||||||
added at the request of users who want to make use of certain PCRE2-specific
|
added at the request of users who want to make use of certain PCRE2-specific
|
||||||
features via the POSIX calling interface.
|
features via the POSIX calling interface or to add BSD or GNU functionality.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
When PCRE2 is called via these functions, it is only the API that is POSIX-like
|
When PCRE2 is called via these functions, it is only the API that is POSIX-like
|
||||||
|
@ -91,10 +91,11 @@ identifying error codes.
|
||||||
<br><a name="SEC3" href="#TOC1">COMPILING A PATTERN</a><br>
|
<br><a name="SEC3" href="#TOC1">COMPILING A PATTERN</a><br>
|
||||||
<P>
|
<P>
|
||||||
The function <b>regcomp()</b> is called to compile a pattern into an
|
The function <b>regcomp()</b> is called to compile a pattern into an
|
||||||
internal form. The pattern is a C string terminated by a binary zero, and
|
internal form. By default, the pattern is a C string terminated by a binary
|
||||||
is passed in the argument <i>pattern</i>. The <i>preg</i> argument is a pointer
|
zero (but see REG_PEND below). The <i>preg</i> argument is a pointer to a
|
||||||
to a <b>regex_t</b> structure that is used as a base for storing information
|
<b>regex_t</b> structure that is used as a base for storing information about
|
||||||
about the compiled regular expression.
|
the compiled regular expression. (It is also used for input when REG_PEND is
|
||||||
|
set.)
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The argument <i>cflags</i> is either zero, or contains one or more of the bits
|
The argument <i>cflags</i> is either zero, or contains one or more of the bits
|
||||||
|
@ -124,6 +125,16 @@ matching, the <i>nmatch</i> and <i>pmatch</i> arguments are ignored, and no
|
||||||
captured strings are returned. Versions of the PCRE library prior to 10.22 used
|
captured strings are returned. Versions of the PCRE library prior to 10.22 used
|
||||||
to set the PCRE2_NO_AUTO_CAPTURE compile option, but this no longer happens
|
to set the PCRE2_NO_AUTO_CAPTURE compile option, but this no longer happens
|
||||||
because it disables the use of back references.
|
because it disables the use of back references.
|
||||||
|
<pre>
|
||||||
|
REG_PEND
|
||||||
|
</pre>
|
||||||
|
If this option is set, the <b>re_endp</b> field in the <i>preg</i> structure
|
||||||
|
(which has the type const char *) must be set to point to the character beyond
|
||||||
|
the end of the pattern before calling <b>regcomp()</b>. The pattern itself may
|
||||||
|
now contain binary zeroes, which are treated as data characters. Without
|
||||||
|
REG_PEND, a binary zero terminates the pattern and the <b>re_endp</b> field is
|
||||||
|
ignored. This is a GNU extension to the POSIX standard and should be used with
|
||||||
|
caution in software intended to be portable to other systems.
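As an illustration of the REG_PEND description above, the following C sketch compiles a counted pattern by setting re_endp before calling regcomp(). The helper name compile_counted and the USE_PCRE2_POSIX build macro are assumptions made for this example and are not part of PCRE2. The #ifdef guard is there because not every system regex.h defines REG_PEND.

```c
#include <stddef.h>

/* USE_PCRE2_POSIX is a hypothetical build macro used only in this sketch. */
#ifdef USE_PCRE2_POSIX
#include <pcre2posix.h>   /* link with -lpcre2-posix -lpcre2-8 */
#else
#include <regex.h>
#endif

/* Compile a counted pattern, which may contain binary zeroes.
   Returns the regcomp() result, or -1 if this header lacks REG_PEND. */
static int compile_counted(regex_t *preg, const char *pattern, size_t length,
                           int cflags)
{
#ifdef REG_PEND
  preg->re_endp = pattern + length;   /* one past the last pattern byte */
  return regcomp(preg, pattern, cflags | REG_PEND);
#else
  (void)preg; (void)pattern; (void)length; (void)cflags;
  return -1;                          /* REG_PEND is not available here */
#endif
}
```

A caller might pass a three-byte array holding 'A', a binary zero, and 'B' with a length of 3, and should call regfree() after a zero return.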
|
||||||
<pre>
|
<pre>
|
||||||
REG_UCP
|
REG_UCP
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -156,9 +167,10 @@ class such as [^a] (they are).
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The yield of <b>regcomp()</b> is zero on success, and non-zero otherwise. The
|
The yield of <b>regcomp()</b> is zero on success, and non-zero otherwise. The
|
||||||
<i>preg</i> structure is filled in on success, and one member of the structure
|
<i>preg</i> structure is filled in on success, and one other member of the
|
||||||
is public: <i>re_nsub</i> contains the number of capturing subpatterns in
|
structure (as well as <i>re_endp</i>) is public: <i>re_nsub</i> contains the
|
||||||
the regular expression. Various error codes are defined in the header file.
|
number of capturing subpatterns in the regular expression. Various error codes
|
||||||
|
are defined in the header file.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
NOTE: If the yield of <b>regcomp()</b> is non-zero, you must not attempt to
|
NOTE: If the yield of <b>regcomp()</b> is non-zero, you must not attempt to
|
||||||
|
@ -228,15 +240,26 @@ function.
|
||||||
<pre>
|
<pre>
|
||||||
REG_STARTEND
|
REG_STARTEND
|
||||||
</pre>
|
</pre>
|
||||||
The string is considered to start at <i>string</i> + <i>pmatch[0].rm_so</i> and
|
When this option is set, the subject string starts at <i>string</i> +
|
||||||
to have a terminating NUL located at <i>string</i> + <i>pmatch[0].rm_eo</i>
|
<i>pmatch[0].rm_so</i> and ends at <i>string</i> + <i>pmatch[0].rm_eo</i>, which
|
||||||
(there need not actually be a NUL at that location), regardless of the value of
|
should point to the first character beyond the string. There may be binary
|
||||||
<i>nmatch</i>. This is a BSD extension, compatible with but not specified by
|
zeroes within the subject string, and indeed, using REG_STARTEND is the only
|
||||||
IEEE Standard 1003.2 (POSIX.2), and should be used with caution in software
|
way to pass a subject string that contains a binary zero.
|
||||||
intended to be portable to other systems. Note that a non-zero <i>rm_so</i> does
|
</P>
|
||||||
not imply REG_NOTBOL; REG_STARTEND affects only the location of the string, not
|
<P>
|
||||||
how it is matched. Setting REG_STARTEND and passing <i>pmatch</i> as NULL are
|
Whatever the value of <i>pmatch[0].rm_so</i>, the offsets of the matched string
|
||||||
mutually exclusive; the error REG_INVARG is returned.
|
and any captured substrings are still given relative to the start of
|
||||||
|
<i>string</i> itself. (Before PCRE2 release 10.30 these were given relative to
|
||||||
|
<i>string</i> + <i>pmatch[0].rm_so</i>, but this differs from other
|
||||||
|
implementations.)
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
This is a BSD extension, compatible with but not specified by IEEE Standard
|
||||||
|
1003.2 (POSIX.2), and should be used with caution in software intended to be
|
||||||
|
portable to other systems. Note that a non-zero <i>rm_so</i> does not imply
|
||||||
|
REG_NOTBOL; REG_STARTEND affects only the location and length of the string,
|
||||||
|
not how it is matched. Setting REG_STARTEND and passing <i>pmatch</i> as NULL
|
||||||
|
are mutually exclusive; the error REG_INVARG is returned.
|
||||||
</P>
|
</P>
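The REG_STARTEND behaviour described above can be sketched in C as follows. This is a minimal example under stated assumptions: the helper name search_range is invented for illustration, and the code includes the system regex.h (with PCRE2 one would include pcre2posix.h instead). Where REG_STARTEND is not defined, the helper simply reports -1.

```c
#include <stddef.h>
#include <regex.h>   /* with PCRE2, include <pcre2posix.h> instead */

/* Search only bytes [start, end) of subject. On a match, store the match
   offsets (relative to the start of subject, not to start) and return 0.
   Otherwise return the regexec() error code, or -1 if REG_STARTEND is
   not available in this header. */
static int search_range(const regex_t *preg, const char *subject,
                        size_t start, size_t end,
                        size_t *match_start, size_t *match_end)
{
#ifdef REG_STARTEND
  regmatch_t pmatch[1];
  int rc;

  pmatch[0].rm_so = (regoff_t)start;  /* first byte to be searched */
  pmatch[0].rm_eo = (regoff_t)end;    /* first byte beyond the subject */
  rc = regexec(preg, subject, 1, pmatch, REG_STARTEND);
  if (rc == 0)
    {
    *match_start = (size_t)pmatch[0].rm_so;
    *match_end = (size_t)pmatch[0].rm_eo;
    }
  return rc;
#else
  (void)preg; (void)subject; (void)start; (void)end;
  (void)match_start; (void)match_end;
  return -1;
#endif
}
```

Because pmatch is used to pass the range in, it cannot be NULL when REG_STARTEND is set, which is why the helper always supplies a one-element array.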
|
||||||
<P>
|
<P>
|
||||||
If the pattern was compiled with the REG_NOSUB flag, no data about any matched
|
If the pattern was compiled with the REG_NOSUB flag, no data about any matched
|
||||||
|
@ -291,9 +314,9 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC9" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC9" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 31 January 2016
|
Last updated: 05 June 2017
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2016 University of Cambridge.
|
Copyright © 1997-2017 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
<p>
|
<p>
|
||||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
|
|
@ -1078,6 +1078,19 @@ are <b>notbol</b>, <b>notempty</b>, and <b>noteol</b>, causing REG_NOTBOL,
|
||||||
REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to <b>regexec()</b>.
|
REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to <b>regexec()</b>.
|
||||||
The other modifiers are ignored, with a warning message.
|
The other modifiers are ignored, with a warning message.
|
||||||
</P>
|
</P>
|
||||||
|
<P>
|
||||||
|
There is one additional modifier that can be used with the POSIX wrapper. It is
|
||||||
|
ignored (with a warning) if used for non-POSIX matching.
|
||||||
|
<pre>
|
||||||
|
posix_startend=<n>[:<m>]
|
||||||
|
</pre>
|
||||||
|
This causes the subject string to be passed to <b>regexec()</b> using the
|
||||||
|
REG_STARTEND option, which uses offsets to restrict which part of the string is
|
||||||
|
searched. If only one number is given, the end offset is passed as the end of
|
||||||
|
the subject string. For more detail of REG_STARTEND, see the
|
||||||
|
<a href="pcre2posix.html"><b>pcre2posix</b></a>
|
||||||
|
documentation.
|
||||||
|
</P>
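To make the default for the end offset concrete, here is a small C sketch of how a value of the form &lt;n&gt;[:&lt;m&gt;] could be interpreted. The function parse_startend is hypothetical and is not pcre2test's actual parser; it only mirrors the rule stated above that a missing second number means the end of the subject string.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: parse "<n>[:<m>]" into start and end offsets.
   If ":<m>" is absent, end defaults to subject_length.
   Returns 0 on success and -1 on a malformed value. */
static int parse_startend(const char *text, size_t subject_length,
                          size_t *start, size_t *end)
{
  char *rest;
  const char *mtext;
  unsigned long n, m;

  n = strtoul(text, &rest, 10);
  if (rest == text) return -1;        /* no digits for <n> */
  *start = (size_t)n;

  if (*rest == '\0')                  /* only one number was given */
    {
    *end = subject_length;
    return 0;
    }

  if (*rest != ':') return -1;
  mtext = rest + 1;
  m = strtoul(mtext, &rest, 10);
  if (rest == mtext || *rest != '\0') return -1;
  *end = (size_t)m;
  return 0;
}
```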
|
||||||
<br><b>
|
<br><b>
|
||||||
Setting match controls
|
Setting match controls
|
||||||
</b><br>
|
</b><br>
|
||||||
|
@ -1817,7 +1830,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 01 June 2017
|
Last updated: 03 June 2017
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2017 University of Cambridge.
|
Copyright © 1997-2017 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@ -8986,7 +8986,8 @@ DESCRIPTION
|
||||||
|
|
||||||
There are also some options that are not defined by POSIX. These have
|
There are also some options that are not defined by POSIX. These have
|
||||||
been added at the request of users who want to make use of certain
|
been added at the request of users who want to make use of certain
|
||||||
PCRE2-specific features via the POSIX calling interface.
|
PCRE2-specific features via the POSIX calling interface or to add BSD
|
||||||
|
or GNU functionality.
|
||||||
|
|
||||||
When PCRE2 is called via these functions, it is only the API that is
|
When PCRE2 is called via these functions, it is only the API that is
|
||||||
POSIX-like in style. The syntax and semantics of the regular expres-
|
POSIX-like in style. The syntax and semantics of the regular expres-
|
||||||
|
@ -9008,10 +9009,11 @@ DESCRIPTION
|
||||||
COMPILING A PATTERN
|
COMPILING A PATTERN
|
||||||
|
|
||||||
The function regcomp() is called to compile a pattern into an internal
|
The function regcomp() is called to compile a pattern into an internal
|
||||||
form. The pattern is a C string terminated by a binary zero, and is
|
form. By default, the pattern is a C string terminated by a binary zero
|
||||||
passed in the argument pattern. The preg argument is a pointer to a
|
(but see REG_PEND below). The preg argument is a pointer to a regex_t
|
||||||
regex_t structure that is used as a base for storing information about
|
structure that is used as a base for storing information about the com-
|
||||||
the compiled regular expression.
|
piled regular expression. (It is also used for input when REG_PEND is
|
||||||
|
set.)
|
||||||
|
|
||||||
The argument cflags is either zero, or contains one or more of the bits
|
The argument cflags is either zero, or contains one or more of the bits
|
||||||
defined by the following macros:
|
defined by the following macros:
|
||||||
|
@ -9042,6 +9044,17 @@ COMPILING A PATTERN
|
||||||
used to set the PCRE2_NO_AUTO_CAPTURE compile option, but this no
|
used to set the PCRE2_NO_AUTO_CAPTURE compile option, but this no
|
||||||
longer happens because it disables the use of back references.
|
longer happens because it disables the use of back references.
|
||||||
|
|
||||||
|
REG_PEND
|
||||||
|
|
||||||
|
If this option is set, the re_endp field in the preg structure (which
|
||||||
|
has the type const char *) must be set to point to the character beyond
|
||||||
|
the end of the pattern before calling regcomp(). The pattern itself may
|
||||||
|
now contain binary zeroes, which are treated as data characters. With-
|
||||||
|
out REG_PEND, a binary zero terminates the pattern and the re_endp
|
||||||
|
field is ignored. This is a GNU extension to the POSIX standard and
|
||||||
|
should be used with caution in software intended to be portable to
|
||||||
|
other systems.
|
||||||
|
|
||||||
REG_UCP
|
REG_UCP
|
||||||
|
|
||||||
The PCRE2_UCP option is set when the regular expression is passed for
|
The PCRE2_UCP option is set when the regular expression is passed for
|
||||||
|
@ -9071,9 +9084,10 @@ COMPILING A PATTERN
|
||||||
ter (they are not) or by a negative class such as [^a] (they are).
|
ter (they are not) or by a negative class such as [^a] (they are).
|
||||||
|
|
||||||
The yield of regcomp() is zero on success, and non-zero otherwise. The
|
The yield of regcomp() is zero on success, and non-zero otherwise. The
|
||||||
preg structure is filled in on success, and one member of the structure
|
preg structure is filled in on success, and one other member of the
|
||||||
is public: re_nsub contains the number of capturing subpatterns in the
|
structure (as well as re_endp) is public: re_nsub contains the number
|
||||||
regular expression. Various error codes are defined in the header file.
|
of capturing subpatterns in the regular expression. Various error codes
|
||||||
|
are defined in the header file.
|
||||||
|
|
||||||
NOTE: If the yield of regcomp() is non-zero, you must not attempt to
|
NOTE: If the yield of regcomp() is non-zero, you must not attempt to
|
||||||
use the contents of the preg structure. If, for example, you pass it to
|
use the contents of the preg structure. If, for example, you pass it to
|
||||||
|
@ -9146,15 +9160,24 @@ MATCHING A PATTERN
|
||||||
|
|
||||||
REG_STARTEND
|
REG_STARTEND
|
||||||
|
|
||||||
The string is considered to start at string + pmatch[0].rm_so and to
|
When this option is set, the subject string starts at string +
|
||||||
have a terminating NUL located at string + pmatch[0].rm_eo (there need
|
pmatch[0].rm_so and ends at string + pmatch[0].rm_eo, which should
|
||||||
not actually be a NUL at that location), regardless of the value of
|
point to the first character beyond the string. There may be binary
|
||||||
nmatch. This is a BSD extension, compatible with but not specified by
|
zeroes within the subject string, and indeed, using REG_STARTEND is the
|
||||||
IEEE Standard 1003.2 (POSIX.2), and should be used with caution in
|
only way to pass a subject string that contains a binary zero.
|
||||||
software intended to be portable to other systems. Note that a non-zero
|
|
||||||
rm_so does not imply REG_NOTBOL; REG_STARTEND affects only the location
|
Whatever the value of pmatch[0].rm_so, the offsets of the matched
|
||||||
of the string, not how it is matched. Setting REG_STARTEND and passing
|
string and any captured substrings are still given relative to the
|
||||||
pmatch as NULL are mutually exclusive; the error REG_INVARG is
|
start of string itself. (Before PCRE2 release 10.30 these were given
|
||||||
|
relative to string + pmatch[0].rm_so, but this differs from other
|
||||||
|
implementations.)
|
||||||
|
|
||||||
|
This is a BSD extension, compatible with but not specified by IEEE
|
||||||
|
Standard 1003.2 (POSIX.2), and should be used with caution in software
|
||||||
|
intended to be portable to other systems. Note that a non-zero rm_so
|
||||||
|
does not imply REG_NOTBOL; REG_STARTEND affects only the location and
|
||||||
|
length of the string, not how it is matched. Setting REG_STARTEND and
|
||||||
|
passing pmatch as NULL are mutually exclusive; the error REG_INVARG is
|
||||||
returned.
|
returned.
|
||||||
|
|
||||||
If the pattern was compiled with the REG_NOSUB flag, no data about any
|
If the pattern was compiled with the REG_NOSUB flag, no data about any
|
||||||
|
@ -9209,8 +9232,8 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 31 January 2016
|
Last updated: 05 June 2017
|
||||||
Copyright (c) 1997-2016 University of Cambridge.
|
Copyright (c) 1997-2017 University of Cambridge.
|
||||||
------------------------------------------------------------------------------
|
------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2POSIX 3 "03 June 2017" "PCRE2 10.30"
|
.TH PCRE2POSIX 3 "05 June 2017" "PCRE2 10.30"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH "SYNOPSIS"
|
.SH "SYNOPSIS"
|
||||||
|
@ -46,7 +46,7 @@ replacement library. Other POSIX options are not even defined.
|
||||||
.P
|
.P
|
||||||
There are also some options that are not defined by POSIX. These have been
|
There are also some options that are not defined by POSIX. These have been
|
||||||
added at the request of users who want to make use of certain PCRE2-specific
|
added at the request of users who want to make use of certain PCRE2-specific
|
||||||
features via the POSIX calling interface.
|
features via the POSIX calling interface or to add BSD or GNU functionality.
|
||||||
.P
|
.P
|
||||||
When PCRE2 is called via these functions, it is only the API that is POSIX-like
|
When PCRE2 is called via these functions, it is only the API that is POSIX-like
|
||||||
in style. The syntax and semantics of the regular expressions themselves are
|
in style. The syntax and semantics of the regular expressions themselves are
|
||||||
|
@ -68,10 +68,11 @@ identifying error codes.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
The function \fBregcomp()\fP is called to compile a pattern into an
|
The function \fBregcomp()\fP is called to compile a pattern into an
|
||||||
internal form. The pattern is a C string terminated by a binary zero, and
|
internal form. By default, the pattern is a C string terminated by a binary
|
||||||
is passed in the argument \fIpattern\fP. The \fIpreg\fP argument is a pointer
|
zero (but see REG_PEND below). The \fIpreg\fP argument is a pointer to a
|
||||||
to a \fBregex_t\fP structure that is used as a base for storing information
|
\fBregex_t\fP structure that is used as a base for storing information about
|
||||||
about the compiled regular expression.
|
the compiled regular expression. (It is also used for input when REG_PEND is
|
||||||
|
set.)
|
||||||
.P
|
.P
|
||||||
The argument \fIcflags\fP is either zero, or contains one or more of the bits
|
The argument \fIcflags\fP is either zero, or contains one or more of the bits
|
||||||
defined by the following macros:
|
defined by the following macros:
|
||||||
|
@ -100,6 +101,16 @@ matching, the \fInmatch\fP and \fIpmatch\fP arguments are ignored, and no
|
||||||
captured strings are returned. Versions of the PCRE library prior to 10.22 used
|
captured strings are returned. Versions of the PCRE library prior to 10.22 used
|
||||||
to set the PCRE2_NO_AUTO_CAPTURE compile option, but this no longer happens
|
to set the PCRE2_NO_AUTO_CAPTURE compile option, but this no longer happens
|
||||||
because it disables the use of back references.
|
because it disables the use of back references.
|
||||||
|
.sp
|
||||||
|
REG_PEND
|
||||||
|
.sp
|
||||||
|
If this option is set, the \fBre_endp\fP field in the \fIpreg\fP structure
|
||||||
|
(which has the type const char *) must be set to point to the character beyond
|
||||||
|
the end of the pattern before calling \fBregcomp()\fP. The pattern itself may
|
||||||
|
now contain binary zeroes, which are treated as data characters. Without
|
||||||
|
REG_PEND, a binary zero terminates the pattern and the \fBre_endp\fP field is
|
||||||
|
ignored. This is a GNU extension to the POSIX standard and should be used with
|
||||||
|
caution in software intended to be portable to other systems.
|
||||||
.sp
|
.sp
|
||||||
REG_UCP
|
REG_UCP
|
||||||
.sp
|
.sp
|
||||||
|
@ -130,9 +141,10 @@ newlines are matched by the dot metacharacter (they are not) or by a negative
|
||||||
class such as [^a] (they are).
|
class such as [^a] (they are).
|
||||||
.P
|
.P
|
||||||
The yield of \fBregcomp()\fP is zero on success, and non-zero otherwise. The
|
The yield of \fBregcomp()\fP is zero on success, and non-zero otherwise. The
|
||||||
\fIpreg\fP structure is filled in on success, and one member of the structure
|
\fIpreg\fP structure is filled in on success, and one other member of the
|
||||||
is public: \fIre_nsub\fP contains the number of capturing subpatterns in
|
structure (as well as \fIre_endp\fP) is public: \fIre_nsub\fP contains the
|
||||||
the regular expression. Various error codes are defined in the header file.
|
number of capturing subpatterns in the regular expression. Various error codes
|
||||||
|
are defined in the header file.
|
||||||
.P
|
.P
|
||||||
NOTE: If the yield of \fBregcomp()\fP is non-zero, you must not attempt to
|
NOTE: If the yield of \fBregcomp()\fP is non-zero, you must not attempt to
|
||||||
use the contents of the \fIpreg\fP structure. If, for example, you pass it to
|
use the contents of the \fIpreg\fP structure. If, for example, you pass it to
|
||||||
|
@ -204,21 +216,24 @@ function.
|
||||||
.sp
|
.sp
|
||||||
REG_STARTEND
|
REG_STARTEND
|
||||||
.sp
|
.sp
|
||||||
When this option is set, the string is considered to start at \fIstring\fP +
|
When this option is set, the subject string starts at \fIstring\fP +
|
||||||
\fIpmatch[0].rm_so\fP and to have a terminating NUL located at \fIstring\fP +
|
\fIpmatch[0].rm_so\fP and ends at \fIstring\fP + \fIpmatch[0].rm_eo\fP, which
|
||||||
\fIpmatch[0].rm_eo\fP (there need not actually be a NUL at that location),
|
should point to the first character beyond the string. There may be binary
|
||||||
regardless of the value of \fInmatch\fP. However, the offsets of the matched
|
zeroes within the subject string, and indeed, using REG_STARTEND is the only
|
||||||
string and any captured substrings are still given relative to the start of
|
way to pass a subject string that contains a binary zero.
|
||||||
\fIstring\fP. (Before PCRE2 release 10.30 these were given relative to
|
.P
|
||||||
|
Whatever the value of \fIpmatch[0].rm_so\fP, the offsets of the matched string
|
||||||
|
and any captured substrings are still given relative to the start of
|
||||||
|
\fIstring\fP itself. (Before PCRE2 release 10.30 these were given relative to
|
||||||
\fIstring\fP + \fIpmatch[0].rm_so\fP, but this differs from other
|
\fIstring\fP + \fIpmatch[0].rm_so\fP, but this differs from other
|
||||||
implementations.)
|
implementations.)
|
||||||
.P
|
.P
|
||||||
This is a BSD extension, compatible with but not specified by IEEE Standard
|
This is a BSD extension, compatible with but not specified by IEEE Standard
|
||||||
1003.2 (POSIX.2), and should be used with caution in software intended to be
|
1003.2 (POSIX.2), and should be used with caution in software intended to be
|
||||||
portable to other systems. Note that a non-zero \fIrm_so\fP does not imply
|
portable to other systems. Note that a non-zero \fIrm_so\fP does not imply
|
||||||
REG_NOTBOL; REG_STARTEND affects only the location of the string, not how it is
|
REG_NOTBOL; REG_STARTEND affects only the location and length of the string,
|
||||||
matched. Setting REG_STARTEND and passing \fIpmatch\fP as NULL are mutually
|
not how it is matched. Setting REG_STARTEND and passing \fIpmatch\fP as NULL
|
||||||
exclusive; the error REG_INVARG is returned.
|
are mutually exclusive; the error REG_INVARG is returned.
|
||||||
.P
|
.P
|
||||||
If the pattern was compiled with the REG_NOSUB flag, no data about any matched
|
If the pattern was compiled with the REG_NOSUB flag, no data about any matched
|
||||||
strings is returned. The \fInmatch\fP and \fIpmatch\fP arguments of
|
strings is returned. The \fInmatch\fP and \fIpmatch\fP arguments of
|
||||||
|
@ -277,6 +292,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 03 June 2017
|
Last updated: 05 June 2017
|
||||||
Copyright (c) 1997-2017 University of Cambridge.
|
Copyright (c) 1997-2017 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -965,6 +965,17 @@ SUBJECT MODIFIERS
|
||||||
REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to regexec().
|
REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to regexec().
|
||||||
The other modifiers are ignored, with a warning message.
|
The other modifiers are ignored, with a warning message.
|
||||||
|
|
||||||
|
There is one additional modifier that can be used with the POSIX wrap-
|
||||||
|
per. It is ignored (with a warning) if used for non-POSIX matching.
|
||||||
|
|
||||||
|
posix_startend=<n>[:<m>]
|
||||||
|
|
||||||
|
This causes the subject string to be passed to regexec() using the
|
||||||
|
REG_STARTEND option, which uses offsets to restrict which part of the
|
||||||
|
string is searched. If only one number is given, the end offset is
|
||||||
|
passed as the end of the subject string. For more detail of REG_STAR-
|
||||||
|
TEND, see the pcre2posix documentation.
|
||||||
|
|
||||||
Setting match controls
|
Setting match controls
|
||||||
|
|
||||||
The following modifiers affect the matching process or request addi-
|
The following modifiers affect the matching process or request addi-
|
||||||
|
@ -1651,5 +1662,5 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 01 June 2017
|
Last updated: 03 June 2017
|
||||||
Copyright (c) 1997-2017 University of Cambridge.
|
Copyright (c) 1997-2017 University of Cambridge.
|
||||||
|
|
|
@ -231,10 +231,14 @@ PCRE2POSIX_EXP_DEFN int PCRE2_CALL_CONVENTION
|
||||||
regcomp(regex_t *preg, const char *pattern, int cflags)
|
regcomp(regex_t *preg, const char *pattern, int cflags)
|
||||||
{
|
{
|
||||||
PCRE2_SIZE erroffset;
|
PCRE2_SIZE erroffset;
|
||||||
|
PCRE2_SIZE patlen;
|
||||||
int errorcode;
|
int errorcode;
|
||||||
int options = 0;
|
int options = 0;
|
||||||
int re_nsub = 0;
|
int re_nsub = 0;
|
||||||
|
|
||||||
|
patlen = ((cflags & REG_PEND) != 0)? (PCRE2_SIZE)(preg->re_endp - pattern) :
|
||||||
|
PCRE2_ZERO_TERMINATED;
|
||||||
|
|
||||||
if ((cflags & REG_ICASE) != 0) options |= PCRE2_CASELESS;
|
if ((cflags & REG_ICASE) != 0) options |= PCRE2_CASELESS;
|
||||||
if ((cflags & REG_NEWLINE) != 0) options |= PCRE2_MULTILINE;
|
if ((cflags & REG_NEWLINE) != 0) options |= PCRE2_MULTILINE;
|
||||||
if ((cflags & REG_DOTALL) != 0) options |= PCRE2_DOTALL;
|
if ((cflags & REG_DOTALL) != 0) options |= PCRE2_DOTALL;
|
||||||
|
@ -243,8 +247,8 @@ if ((cflags & REG_UCP) != 0) options |= PCRE2_UCP;
|
||||||
if ((cflags & REG_UNGREEDY) != 0) options |= PCRE2_UNGREEDY;
|
if ((cflags & REG_UNGREEDY) != 0) options |= PCRE2_UNGREEDY;
|
||||||
|
|
||||||
preg->re_cflags = cflags;
|
preg->re_cflags = cflags;
|
||||||
preg->re_pcre2_code = pcre2_compile((PCRE2_SPTR)pattern, PCRE2_ZERO_TERMINATED,
|
preg->re_pcre2_code = pcre2_compile((PCRE2_SPTR)pattern, patlen, options,
|
||||||
options, &errorcode, &erroffset, NULL);
|
&errorcode, &erroffset, NULL);
|
||||||
preg->re_erroffset = erroffset;
|
preg->re_erroffset = erroffset;
|
||||||
|
|
||||||
if (preg->re_pcre2_code == NULL)
|
if (preg->re_pcre2_code == NULL)
|
||||||
|
|
|
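The length selection in the regcomp() hunk above can be modelled in isolation. The sketch below is not the wrapper's code: the real function hands PCRE2_ZERO_TERMINATED to pcre2_compile() when REG_PEND is unset, whereas this stand-alone version calls strlen() to the same effect. The names SKETCH_REG_PEND and effective_pattern_length are assumptions made for illustration; the flag value is copied from the header hunk in this commit.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for REG_PEND, using the value from the header. */
#define SKETCH_REG_PEND 0x0800

/* Return the number of pattern bytes that would be compiled.
   With the flag set, the length comes from the end pointer, so binary
   zeroes inside the pattern are counted as data. Without it, the first
   binary zero ends the pattern and endp is ignored. */
static size_t effective_pattern_length(const char *pattern, const char *endp,
                                       int cflags)
{
  if ((cflags & SKETCH_REG_PEND) != 0)
    return (size_t)(endp - pattern);
  return strlen(pattern);
}
```

For a four-byte buffer holding 'A', zero, 'B', zero with endp set three bytes in, the flagged call reports 3, while the unflagged call stops at the first zero and reports 1.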
@ -62,6 +62,7 @@ extern "C" {
|
||||||
#define REG_NOTEMPTY 0x0100 /* NOT defined by POSIX; maps to PCRE2_NOTEMPTY */
|
#define REG_NOTEMPTY 0x0100 /* NOT defined by POSIX; maps to PCRE2_NOTEMPTY */
|
||||||
#define REG_UNGREEDY 0x0200 /* NOT defined by POSIX; maps to PCRE2_UNGREEDY */
|
#define REG_UNGREEDY 0x0200 /* NOT defined by POSIX; maps to PCRE2_UNGREEDY */
|
||||||
#define REG_UCP 0x0400 /* NOT defined by POSIX; maps to PCRE2_UCP */
|
#define REG_UCP 0x0400 /* NOT defined by POSIX; maps to PCRE2_UCP */
|
||||||
|
#define REG_PEND 0x0800 /* GNU feature: pass end pattern by re_endp */
|
||||||
|
|
||||||
/* This is not used by PCRE2, but by defining it we make it easier
|
/* This is not used by PCRE2, but by defining it we make it easier
|
||||||
to slot PCRE2 into existing programs that make POSIX calls. */
|
to slot PCRE2 into existing programs that make POSIX calls. */
|
||||||
|
@ -91,11 +92,13 @@ enum {
|
||||||
};
|
};
|
||||||
|
|
||||||
|
|
||||||
/* The structure representing a compiled regular expression. */
|
/* The structure representing a compiled regular expression. It is also used
|
||||||
|
for passing the pattern end pointer when REG_PEND is set. */
|
||||||
|
|
||||||
typedef struct {
|
typedef struct {
|
||||||
void *re_pcre2_code;
|
void *re_pcre2_code;
|
||||||
void *re_match_data;
|
void *re_match_data;
|
||||||
|
const char *re_endp;
|
||||||
size_t re_nsub;
|
size_t re_nsub;
|
||||||
size_t re_erroffset;
|
size_t re_erroffset;
|
||||||
int re_cflags;
|
int re_cflags;
|
||||||
|
|
|
@ -699,7 +699,8 @@ static modstruct modlist[] = {
|
||||||
#define POSIX_SUPPORTED_COMPILE_EXTRA_OPTIONS (0)
|
#define POSIX_SUPPORTED_COMPILE_EXTRA_OPTIONS (0)
|
||||||
|
|
||||||
#define POSIX_SUPPORTED_COMPILE_CONTROLS ( \
|
#define POSIX_SUPPORTED_COMPILE_CONTROLS ( \
|
||||||
CTL_AFTERTEXT|CTL_ALLAFTERTEXT|CTL_EXPAND|CTL_POSIX|CTL_POSIX_NOSUB)
|
CTL_AFTERTEXT|CTL_ALLAFTERTEXT|CTL_EXPAND|CTL_HEXPAT|CTL_POSIX| \
|
||||||
|
CTL_POSIX_NOSUB|CTL_USE_LENGTH)
|
||||||
|
|
||||||
#define POSIX_SUPPORTED_COMPILE_CONTROLS2 (0)
|
#define POSIX_SUPPORTED_COMPILE_CONTROLS2 (0)
|
||||||
|
|
||||||
|
@ -733,11 +734,9 @@ the first control word. Note that CTL_POSIX_NOSUB is always accompanied by
|
||||||
CTL_POSIX, so it doesn't need its own entries. */
|
CTL_POSIX, so it doesn't need its own entries. */
|
||||||
|
|
||||||
static uint32_t exclusive_pat_controls[] = {
|
static uint32_t exclusive_pat_controls[] = {
|
||||||
CTL_POSIX | CTL_HEXPAT,
|
|
||||||
CTL_POSIX | CTL_PUSH,
|
CTL_POSIX | CTL_PUSH,
|
||||||
CTL_POSIX | CTL_PUSHCOPY,
|
CTL_POSIX | CTL_PUSHCOPY,
|
||||||
CTL_POSIX | CTL_PUSHTABLESCOPY,
|
CTL_POSIX | CTL_PUSHTABLESCOPY,
|
||||||
CTL_POSIX | CTL_USE_LENGTH,
|
|
||||||
CTL_PUSH | CTL_PUSHCOPY,
|
CTL_PUSH | CTL_PUSHCOPY,
|
||||||
CTL_PUSH | CTL_PUSHTABLESCOPY,
|
CTL_PUSH | CTL_PUSHTABLESCOPY,
|
||||||
CTL_PUSHCOPY | CTL_PUSHTABLESCOPY,
|
CTL_PUSHCOPY | CTL_PUSHTABLESCOPY,
|
||||||
|
@ -896,7 +895,7 @@ static PCRE2_SIZE malloclistlength[MALLOCLISTSIZE];
|
||||||
static uint32_t malloclistptr = 0;
|
static uint32_t malloclistptr = 0;
|
||||||
|
|
||||||
#ifdef SUPPORT_PCRE2_8
|
#ifdef SUPPORT_PCRE2_8
|
||||||
static regex_t preg = { NULL, NULL, 0, 0, 0 };
|
static regex_t preg = { NULL, NULL, 0, 0, 0, 0 };
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
static int *dfa_workspace = NULL;
|
static int *dfa_workspace = NULL;
|
||||||
|
@ -5264,6 +5263,12 @@ if ((pat_patctl.control & CTL_POSIX) != 0)
|
||||||
if ((pat_patctl.options & PCRE2_DOTALL) != 0) cflags |= REG_DOTALL;
|
if ((pat_patctl.options & PCRE2_DOTALL) != 0) cflags |= REG_DOTALL;
|
||||||
if ((pat_patctl.options & PCRE2_UNGREEDY) != 0) cflags |= REG_UNGREEDY;
|
if ((pat_patctl.options & PCRE2_UNGREEDY) != 0) cflags |= REG_UNGREEDY;
|
||||||
|
|
||||||
|
if ((pat_patctl.control & (CTL_HEXPAT|CTL_USE_LENGTH)) != 0)
|
||||||
|
{
|
||||||
|
preg.re_endp = (char *)pbuffer8 + patlen;
|
||||||
|
cflags |= REG_PEND;
|
||||||
|
}
|
||||||
|
|
||||||
rc = regcomp(&preg, (char *)pbuffer8, cflags);
|
rc = regcomp(&preg, (char *)pbuffer8, cflags);
|
||||||
|
|
||||||
/* Compiling failed */
|
/* Compiling failed */
|
||||||
|
|
|
@ -123,4 +123,10 @@
|
||||||
/^a\x{00}b$/posix
|
/^a\x{00}b$/posix
|
||||||
a\x{00}b\=posix_startend=0:3
|
a\x{00}b\=posix_startend=0:3
|
||||||
|
|
||||||
|
/"A" 00 "B"/hex
|
||||||
|
A\x{00}B\=posix_startend=0:3
|
||||||
|
|
||||||
|
/ABC/use_length
|
||||||
|
ABC
|
||||||
|
|
||||||
# End of testdata/testinput18
|
# End of testdata/testinput18
|
||||||
|
|
|
@ -15,4 +15,7 @@
|
||||||
/\w/ucp
|
/\w/ucp
|
||||||
+++\x{c2}
|
+++\x{c2}
|
||||||
|
|
||||||
|
/"^AB" 00 "\x{1234}$"/hex,utf
|
||||||
|
AB\x{00}\x{1234}\=posix_startend=0:6
|
||||||
|
|
||||||
# End of testdata/testinput19
|
# End of testdata/testinput19
|
||||||
|
|
|
@ -191,4 +191,12 @@ No match: POSIX code 17: match failed
|
||||||
a\x{00}b\=posix_startend=0:3
|
a\x{00}b\=posix_startend=0:3
|
||||||
0: a\x00b
|
0: a\x00b
|
||||||
|
|
||||||
|
/"A" 00 "B"/hex
|
||||||
|
A\x{00}B\=posix_startend=0:3
|
||||||
|
0: A\x00B
|
||||||
|
|
||||||
|
/ABC/use_length
|
||||||
|
ABC
|
||||||
|
0: ABC
|
||||||
|
|
||||||
# End of testdata/testinput18
|
# End of testdata/testinput18
|
||||||
|
|
|
@ -18,4 +18,8 @@ No match: POSIX code 17: match failed
|
||||||
+++\x{c2}
|
+++\x{c2}
|
||||||
0: \xc2
|
0: \xc2
|
||||||
|
|
||||||
|
/"^AB" 00 "\x{1234}$"/hex,utf
|
||||||
|
AB\x{00}\x{1234}\=posix_startend=0:6
|
||||||
|
0: AB\x{00}\x{1234}
|
||||||
|
|
||||||
# End of testdata/testinput19
|
# End of testdata/testinput19
|
||||||
|
|