Implement REG_PEND (GNU extension) for the POSIX wrapper.

Philip.Hazel 2017-06-05 18:25:47 +00:00
parent f850015168
commit bcba497c0b
13 changed files with 447 additions and 327 deletions


@ -182,6 +182,8 @@ deeply. (Compare item 10.23/36.) This should fix oss-fuzz #1761.
 38. Fix returned offsets from regexec() when REG_STARTEND is used with a
 starting offset greater than zero.

+39. Implement REG_PEND (GNU extension) for the POSIX wrapper.
+

 Version 10.23 14-February-2017
 ------------------------------


@ -69,7 +69,7 @@ replacement library. Other POSIX options are not even defined.
 <P>
 There are also some options that are not defined by POSIX. These have been
 added at the request of users who want to make use of certain PCRE2-specific
-features via the POSIX calling interface.
+features via the POSIX calling interface or to add BSD or GNU functionality.
 </P>
 <P>
 When PCRE2 is called via these functions, it is only the API that is POSIX-like
@ -91,10 +91,11 @@ identifying error codes.
 <br><a name="SEC3" href="#TOC1">COMPILING A PATTERN</a><br>
 <P>
 The function <b>regcomp()</b> is called to compile a pattern into an
-internal form. The pattern is a C string terminated by a binary zero, and
-is passed in the argument <i>pattern</i>. The <i>preg</i> argument is a pointer
-to a <b>regex_t</b> structure that is used as a base for storing information
-about the compiled regular expression.
+internal form. By default, the pattern is a C string terminated by a binary
+zero (but see REG_PEND below). The <i>preg</i> argument is a pointer to a
+<b>regex_t</b> structure that is used as a base for storing information about
+the compiled regular expression. (It is also used for input when REG_PEND is
+set.)
 </P>
 <P>
 The argument <i>cflags</i> is either zero, or contains one or more of the bits
@ -124,6 +125,16 @@ matching, the <i>nmatch</i> and <i>pmatch</i> arguments are ignored, and no
 captured strings are returned. Versions of the PCRE library prior to 10.22 used
 to set the PCRE2_NO_AUTO_CAPTURE compile option, but this no longer happens
 because it disables the use of back references.
+<pre>
+REG_PEND
+</pre>
+If this option is set, the <b>re_endp</b> field in the <i>preg</i> structure
+(which has the type const char *) must be set to point to the character beyond
+the end of the pattern before calling <b>regcomp()</b>. The pattern itself may
+now contain binary zeroes, which are treated as data characters. Without
+REG_PEND, a binary zero terminates the pattern and the <b>re_endp</b> field is
+ignored. This is a GNU extension to the POSIX standard and should be used with
+caution in software intended to be portable to other systems.
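
Note: a minimal sketch of how a caller might use the REG_PEND behaviour described above. It assumes a regex header that defines both REG_PEND and REG_STARTEND and exposes the re_endp field; for PCRE2 that would presumably mean including pcre2posix.h and linking the POSIX wrapper library, and <regex.h> is used here only so the sketch compiles on other systems. The function name match_pattern_with_nul is hypothetical. Where the extensions are missing, the function simply reports -1.

```c
#include <sys/types.h>
#include <regex.h>

/* Hypothetical helper: compiles the 3-byte pattern 'a', NUL, 'b' using
   REG_PEND and searches a subject that also contains a NUL byte.
   Returns 1 on match, 0 on no match, and -1 if the extensions are not
   available or the pattern does not compile. */
int match_pattern_with_nul(void)
{
#if defined(REG_PEND) && defined(REG_STARTEND)
    static const char pattern[] = { 'a', '\0', 'b' };
    static const char subject[] = { 'x', 'a', '\0', 'b', 'x' };
    regex_t re;
    regmatch_t m[1];
    int rc;

    /* re_endp must point just past the last pattern byte before regcomp(). */
    re.re_endp = pattern + sizeof pattern;
    if (regcomp(&re, pattern, REG_PEND) != 0)
        return -1;

    /* A subject containing a NUL has to be delimited with REG_STARTEND. */
    m[0].rm_so = 0;
    m[0].rm_eo = (regoff_t)sizeof subject;
    rc = regexec(&re, subject, 1, m, REG_STARTEND);

    regfree(&re);
    return rc == 0 ? 1 : 0;
#else
    return -1;
#endif
}
```

The preprocessor guard reflects the portability caution in the text: code that relies on REG_PEND should be prepared for headers that do not provide it.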
 <pre>
 REG_UCP
 </pre>
@ -156,9 +167,10 @@ class such as [^a] (they are).
 </P>
 <P>
 The yield of <b>regcomp()</b> is zero on success, and non-zero otherwise. The
-<i>preg</i> structure is filled in on success, and one member of the structure
-is public: <i>re_nsub</i> contains the number of capturing subpatterns in
-the regular expression. Various error codes are defined in the header file.
+<i>preg</i> structure is filled in on success, and one other member of the
+structure (as well as <i>re_endp</i>) is public: <i>re_nsub</i> contains the
+number of capturing subpatterns in the regular expression. Various error codes
+are defined in the header file.
 </P>
 <P>
 NOTE: If the yield of <b>regcomp()</b> is non-zero, you must not attempt to
@ -228,15 +240,26 @@ function.
 <pre>
 REG_STARTEND
 </pre>
-The string is considered to start at <i>string</i> + <i>pmatch[0].rm_so</i> and
-to have a terminating NUL located at <i>string</i> + <i>pmatch[0].rm_eo</i>
-(there need not actually be a NUL at that location), regardless of the value of
-<i>nmatch</i>. This is a BSD extension, compatible with but not specified by
-IEEE Standard 1003.2 (POSIX.2), and should be used with caution in software
-intended to be portable to other systems. Note that a non-zero <i>rm_so</i> does
-not imply REG_NOTBOL; REG_STARTEND affects only the location of the string, not
-how it is matched. Setting REG_STARTEND and passing <i>pmatch</i> as NULL are
-mutually exclusive; the error REG_INVARG is returned.
+When this option is set, the subject string starts at <i>string</i> +
+<i>pmatch[0].rm_so</i> and ends at <i>string</i> + <i>pmatch[0].rm_eo</i>, which
+should point to the first character beyond the string. There may be binary
+zeroes within the subject string, and indeed, using REG_STARTEND is the only
+way to pass a subject string that contains a binary zero.
+</P>
+<P>
+Whatever the value of <i>pmatch[0].rm_so</i>, the offsets of the matched string
+and any captured substrings are still given relative to the start of
+<i>string</i> itself. (Before PCRE2 release 10.30 these were given relative to
+<i>string</i> + <i>pmatch[0].rm_so</i>, but this differs from other
+implementations.)
+</P>
+<P>
+This is a BSD extension, compatible with but not specified by IEEE Standard
+1003.2 (POSIX.2), and should be used with caution in software intended to be
+portable to other systems. Note that a non-zero <i>rm_so</i> does not imply
+REG_NOTBOL; REG_STARTEND affects only the location and length of the string,
+not how it is matched. Setting REG_STARTEND and passing <i>pmatch</i> as NULL
+are mutually exclusive; the error REG_INVARG is returned.
 </P>
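
Note: the REG_STARTEND description above can be illustrated with a short sketch. It assumes a regex header that defines REG_STARTEND (glibc and BSD <regex.h> do; for PCRE2 the header would be pcre2posix.h). The helper name search_window is hypothetical, and the function returns -1 when the extension is unavailable, the pattern fails to compile, or nothing matches.

```c
#include <stddef.h>
#include <sys/types.h>
#include <regex.h>

/* Hypothetical helper: searches only subject[start..end) and returns the
   offset of the match start, or -1 if there is no match. The window is
   passed in through pmatch[0], as REG_STARTEND requires. */
long search_window(const char *pattern, const char *subject,
                   size_t start, size_t end)
{
#ifdef REG_STARTEND
    regex_t re;
    regmatch_t m[1];
    long result = -1;

    if (regcomp(&re, pattern, 0) != 0)
        return -1;

    m[0].rm_so = (regoff_t)start;   /* first byte of the window */
    m[0].rm_eo = (regoff_t)end;     /* first byte beyond the window */
    if (regexec(&re, subject, 1, m, REG_STARTEND) == 0)
        result = (long)m[0].rm_so;  /* overwritten with the match offset */

    regfree(&re);
    return result;
#else
    (void)pattern; (void)subject; (void)start; (void)end;
    return -1;
#endif
}
```

With an implementation that follows the behaviour described above, search_window("b", "abcabc", 2, 6) would be expected to return 4 rather than 2, because the reported offset is measured from the start of the subject and not from the start of the window.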
 <P>
 If the pattern was compiled with the REG_NOSUB flag, no data about any matched
@ -291,9 +314,9 @@ Cambridge, England.
 </P>
 <br><a name="SEC9" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 31 January 2016
+Last updated: 05 June 2017
 <br>
-Copyright &copy; 1997-2016 University of Cambridge.
+Copyright &copy; 1997-2017 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.


@ -1078,6 +1078,19 @@ are <b>notbol</b>, <b>notempty</b>, and <b>noteol</b>, causing REG_NOTBOL,
 REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to <b>regexec()</b>.
 The other modifiers are ignored, with a warning message.
 </P>
+<P>
+There is one additional modifier that can be used with the POSIX wrapper. It is
+ignored (with a warning) if used for non-POSIX matching.
+<pre>
+posix_startend=&#60;n&#62;[:&#60;m&#62;]
+</pre>
+This causes the subject string to be passed to <b>regexec()</b> using the
+REG_STARTEND option, which uses offsets to restrict which part of the string is
+searched. If only one number is given, the end offset is passed as the end of
+the subject string. For more detail of REG_STARTEND, see the
+<a href="pcre2posix.html"><b>pcre2posix</b></a>
+documentation.
+</P>
 <br><b>
 Setting match controls
 </b><br>
@ -1817,7 +1830,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC21" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 01 June 2017
+Last updated: 03 June 2017
 <br>
 Copyright &copy; 1997-2017 University of Cambridge.
 <br>


@ -8986,7 +8986,8 @@ DESCRIPTION
 There are also some options that are not defined by POSIX. These have
 been added at the request of users who want to make use of certain
-PCRE2-specific features via the POSIX calling interface.
+PCRE2-specific features via the POSIX calling interface or to add BSD
+or GNU functionality.

 When PCRE2 is called via these functions, it is only the API that is
 POSIX-like in style. The syntax and semantics of the regular expres-
@ -9008,10 +9009,11 @@ DESCRIPTION
 COMPILING A PATTERN

 The function regcomp() is called to compile a pattern into an internal
-form. The pattern is a C string terminated by a binary zero, and is
-passed in the argument pattern. The preg argument is a pointer to a
-regex_t structure that is used as a base for storing information about
-the compiled regular expression.
+form. By default, the pattern is a C string terminated by a binary zero
+(but see REG_PEND below). The preg argument is a pointer to a regex_t
+structure that is used as a base for storing information about the com-
+piled regular expression. (It is also used for input when REG_PEND is
+set.)

 The argument cflags is either zero, or contains one or more of the bits
 defined by the following macros:
@ -9042,6 +9044,17 @@ COMPILING A PATTERN
 used to set the PCRE2_NO_AUTO_CAPTURE compile option, but this no
 longer happens because it disables the use of back references.

+REG_PEND
+
+If this option is set, the re_endp field in the preg structure (which
+has the type const char *) must be set to point to the character beyond
+the end of the pattern before calling regcomp(). The pattern itself may
+now contain binary zeroes, which are treated as data characters. With-
+out REG_PEND, a binary zero terminates the pattern and the re_endp
+field is ignored. This is a GNU extension to the POSIX standard and
+should be used with caution in software intended to be portable to
+other systems.
+
 REG_UCP

 The PCRE2_UCP option is set when the regular expression is passed for
@ -9071,9 +9084,10 @@ COMPILING A PATTERN
 ter (they are not) or by a negative class such as [^a] (they are).

 The yield of regcomp() is zero on success, and non-zero otherwise. The
-preg structure is filled in on success, and one member of the structure
-is public: re_nsub contains the number of capturing subpatterns in the
-regular expression. Various error codes are defined in the header file.
+preg structure is filled in on success, and one other member of the
+structure (as well as re_endp) is public: re_nsub contains the number
+of capturing subpatterns in the regular expression. Various error codes
+are defined in the header file.

 NOTE: If the yield of regcomp() is non-zero, you must not attempt to
 use the contents of the preg structure. If, for example, you pass it to
@ -9146,15 +9160,24 @@ MATCHING A PATTERN
 REG_STARTEND

-The string is considered to start at string + pmatch[0].rm_so and to
-have a terminating NUL located at string + pmatch[0].rm_eo (there need
-not actually be a NUL at that location), regardless of the value of
-nmatch. This is a BSD extension, compatible with but not specified by
-IEEE Standard 1003.2 (POSIX.2), and should be used with caution in
-software intended to be portable to other systems. Note that a non-zero
-rm_so does not imply REG_NOTBOL; REG_STARTEND affects only the location
-of the string, not how it is matched. Setting REG_STARTEND and passing
-pmatch as NULL are mutually exclusive; the error REG_INVARG is
+When this option is set, the subject string starts at string +
+pmatch[0].rm_so and ends at string + pmatch[0].rm_eo, which should
+point to the first character beyond the string. There may be binary
+zeroes within the subject string, and indeed, using REG_STARTEND is the
+only way to pass a subject string that contains a binary zero.
+
+Whatever the value of pmatch[0].rm_so, the offsets of the matched
+string and any captured substrings are still given relative to the
+start of string itself. (Before PCRE2 release 10.30 these were given
+relative to string + pmatch[0].rm_so, but this differs from other
+implementations.)
+
+This is a BSD extension, compatible with but not specified by IEEE
+Standard 1003.2 (POSIX.2), and should be used with caution in software
+intended to be portable to other systems. Note that a non-zero rm_so
+does not imply REG_NOTBOL; REG_STARTEND affects only the location and
+length of the string, not how it is matched. Setting REG_STARTEND and
+passing pmatch as NULL are mutually exclusive; the error REG_INVARG is
 returned.

 If the pattern was compiled with the REG_NOSUB flag, no data about any
@ -9209,8 +9232,8 @@ AUTHOR
 REVISION

-Last updated: 31 January 2016
-Copyright (c) 1997-2016 University of Cambridge.
+Last updated: 05 June 2017
+Copyright (c) 1997-2017 University of Cambridge.
 ------------------------------------------------------------------------------


@ -1,4 +1,4 @@
-.TH PCRE2POSIX 3 "03 June 2017" "PCRE2 10.30"
+.TH PCRE2POSIX 3 "05 June 2017" "PCRE2 10.30"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "SYNOPSIS"
@ -46,7 +46,7 @@ replacement library. Other POSIX options are not even defined.
 .P
 There are also some options that are not defined by POSIX. These have been
 added at the request of users who want to make use of certain PCRE2-specific
-features via the POSIX calling interface.
+features via the POSIX calling interface or to add BSD or GNU functionality.
 .P
 When PCRE2 is called via these functions, it is only the API that is POSIX-like
 in style. The syntax and semantics of the regular expressions themselves are
@ -68,10 +68,11 @@ identifying error codes.
 .rs
 .sp
 The function \fBregcomp()\fP is called to compile a pattern into an
-internal form. The pattern is a C string terminated by a binary zero, and
-is passed in the argument \fIpattern\fP. The \fIpreg\fP argument is a pointer
-to a \fBregex_t\fP structure that is used as a base for storing information
-about the compiled regular expression.
+internal form. By default, the pattern is a C string terminated by a binary
+zero (but see REG_PEND below). The \fIpreg\fP argument is a pointer to a
+\fBregex_t\fP structure that is used as a base for storing information about
+the compiled regular expression. (It is also used for input when REG_PEND is
+set.)
 .P
 The argument \fIcflags\fP is either zero, or contains one or more of the bits
 defined by the following macros:
@ -100,6 +101,16 @@ matching, the \fInmatch\fP and \fIpmatch\fP arguments are ignored, and no
 captured strings are returned. Versions of the PCRE library prior to 10.22 used
 to set the PCRE2_NO_AUTO_CAPTURE compile option, but this no longer happens
 because it disables the use of back references.
+.sp
+REG_PEND
+.sp
+If this option is set, the \fBre_endp\fP field in the \fIpreg\fP structure
+(which has the type const char *) must be set to point to the character beyond
+the end of the pattern before calling \fBregcomp()\fP. The pattern itself may
+now contain binary zeroes, which are treated as data characters. Without
+REG_PEND, a binary zero terminates the pattern and the \fBre_endp\fP field is
+ignored. This is a GNU extension to the POSIX standard and should be used with
+caution in software intended to be portable to other systems.
 .sp
 REG_UCP
 .sp
@ -130,9 +141,10 @@ newlines are matched by the dot metacharacter (they are not) or by a negative
 class such as [^a] (they are).
 .P
 The yield of \fBregcomp()\fP is zero on success, and non-zero otherwise. The
-\fIpreg\fP structure is filled in on success, and one member of the structure
-is public: \fIre_nsub\fP contains the number of capturing subpatterns in
-the regular expression. Various error codes are defined in the header file.
+\fIpreg\fP structure is filled in on success, and one other member of the
+structure (as well as \fIre_endp\fP) is public: \fIre_nsub\fP contains the
+number of capturing subpatterns in the regular expression. Various error codes
+are defined in the header file.
 .P
 NOTE: If the yield of \fBregcomp()\fP is non-zero, you must not attempt to
 use the contents of the \fIpreg\fP structure. If, for example, you pass it to
@ -204,21 +216,24 @@ function.
 .sp
 REG_STARTEND
 .sp
-When this option is set, the string is considered to start at \fIstring\fP +
-\fIpmatch[0].rm_so\fP and to have a terminating NUL located at \fIstring\fP +
-\fIpmatch[0].rm_eo\fP (there need not actually be a NUL at that location),
-regardless of the value of \fInmatch\fP. However, the offsets of the matched
-string and any captured substrings are still given relative to the start of
-\fIstring\fP. (Before PCRE2 release 10.30 these were given relative to
+When this option is set, the subject string starts at \fIstring\fP +
+\fIpmatch[0].rm_so\fP and ends at \fIstring\fP + \fIpmatch[0].rm_eo\fP, which
+should point to the first character beyond the string. There may be binary
+zeroes within the subject string, and indeed, using REG_STARTEND is the only
+way to pass a subject string that contains a binary zero.
+.P
+Whatever the value of \fIpmatch[0].rm_so\fP, the offsets of the matched string
+and any captured substrings are still given relative to the start of
+\fIstring\fP itself. (Before PCRE2 release 10.30 these were given relative to
 \fIstring\fP + \fIpmatch[0].rm_so\fP, but this differs from other
 implementations.)
 .P
 This is a BSD extension, compatible with but not specified by IEEE Standard
 1003.2 (POSIX.2), and should be used with caution in software intended to be
 portable to other systems. Note that a non-zero \fIrm_so\fP does not imply
-REG_NOTBOL; REG_STARTEND affects only the location of the string, not how it is
-matched. Setting REG_STARTEND and passing \fIpmatch\fP as NULL are mutually
-exclusive; the error REG_INVARG is returned.
+REG_NOTBOL; REG_STARTEND affects only the location and length of the string,
+not how it is matched. Setting REG_STARTEND and passing \fIpmatch\fP as NULL
+are mutually exclusive; the error REG_INVARG is returned.
 .P
 If the pattern was compiled with the REG_NOSUB flag, no data about any matched
 strings is returned. The \fInmatch\fP and \fIpmatch\fP arguments of
@ -277,6 +292,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 03 June 2017
+Last updated: 05 June 2017
 Copyright (c) 1997-2017 University of Cambridge.
 .fi


@ -965,6 +965,17 @@ SUBJECT MODIFIERS
 REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to regexec().
 The other modifiers are ignored, with a warning message.

+There is one additional modifier that can be used with the POSIX wrap-
+per. It is ignored (with a warning) if used for non-POSIX matching.
+
+posix_startend=<n>[:<m>]
+
+This causes the subject string to be passed to regexec() using the
+REG_STARTEND option, which uses offsets to restrict which part of the
+string is searched. If only one number is given, the end offset is
+passed as the end of the subject string. For more detail of REG_STAR-
+TEND, see the pcre2posix documentation.
+
 Setting match controls

 The following modifiers affect the matching process or request addi-
@ -1651,5 +1662,5 @@ AUTHOR
 REVISION

-Last updated: 01 June 2017
+Last updated: 03 June 2017
 Copyright (c) 1997-2017 University of Cambridge.


@ -231,10 +231,14 @@ PCRE2POSIX_EXP_DEFN int PCRE2_CALL_CONVENTION
 regcomp(regex_t *preg, const char *pattern, int cflags)
 {
 PCRE2_SIZE erroffset;
+PCRE2_SIZE patlen;
 int errorcode;
 int options = 0;
 int re_nsub = 0;

+patlen = ((cflags & REG_PEND) != 0)? (PCRE2_SIZE)(preg->re_endp - pattern) :
+PCRE2_ZERO_TERMINATED;
+
 if ((cflags & REG_ICASE) != 0) options |= PCRE2_CASELESS;
 if ((cflags & REG_NEWLINE) != 0) options |= PCRE2_MULTILINE;
 if ((cflags & REG_DOTALL) != 0) options |= PCRE2_DOTALL;
@ -243,8 +247,8 @@ if ((cflags & REG_UCP) != 0) options |= PCRE2_UCP;
 if ((cflags & REG_UNGREEDY) != 0) options |= PCRE2_UNGREEDY;

 preg->re_cflags = cflags;
-preg->re_pcre2_code = pcre2_compile((PCRE2_SPTR)pattern, PCRE2_ZERO_TERMINATED,
-options, &errorcode, &erroffset, NULL);
+preg->re_pcre2_code = pcre2_compile((PCRE2_SPTR)pattern, patlen, options,
+&errorcode, &erroffset, NULL);
 preg->re_erroffset = erroffset;

 if (preg->re_pcre2_code == NULL)
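
Note: the length selection added to regcomp() in this file can be looked at in isolation. The sketch below is a stand-alone illustration, not the wrapper's code: SKETCH_REG_PEND, SKETCH_ZERO_TERMINATED and sketch_pattern_length are hypothetical names standing in for REG_PEND, PCRE2_ZERO_TERMINATED and the inline expression in the diff.

```c
#include <stddef.h>

/* Hypothetical stand-ins for REG_PEND and PCRE2_ZERO_TERMINATED. */
#define SKETCH_REG_PEND        0x0800
#define SKETCH_ZERO_TERMINATED (~(size_t)0)

/* With the REG_PEND bit set, the pattern length is the distance from the
   pattern start to the caller-supplied end pointer, so embedded NUL bytes
   count as data. Without it, a sentinel tells the compiler to stop at the
   first NUL and the end pointer is ignored. */
size_t sketch_pattern_length(const char *pattern, const char *endp, int cflags)
{
    if ((cflags & SKETCH_REG_PEND) != 0)
        return (size_t)(endp - pattern);
    return SKETCH_ZERO_TERMINATED;
}
```

Like the expression in the diff, this sketch does not validate the end pointer: a pointer that lies before the pattern start would turn into a very large length after the cast, so callers are responsible for setting it correctly.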


@ -62,6 +62,7 @@ extern "C" {
 #define REG_NOTEMPTY 0x0100 /* NOT defined by POSIX; maps to PCRE2_NOTEMPTY */
 #define REG_UNGREEDY 0x0200 /* NOT defined by POSIX; maps to PCRE2_UNGREEDY */
 #define REG_UCP 0x0400 /* NOT defined by POSIX; maps to PCRE2_UCP */
+#define REG_PEND 0x0800 /* GNU feature: pass end pattern by re_endp */

 /* This is not used by PCRE2, but by defining it we make it easier
 to slot PCRE2 into existing programs that make POSIX calls. */
@ -91,11 +92,13 @@ enum {
 };

-/* The structure representing a compiled regular expression. */
+/* The structure representing a compiled regular expression. It is also used
+for passing the pattern end pointer when REG_PEND is set. */

 typedef struct {
 void *re_pcre2_code;
 void *re_match_data;
+const char *re_endp;
 size_t re_nsub;
 size_t re_erroffset;
 int re_cflags;


@ -699,7 +699,8 @@ static modstruct modlist[] = {
 #define POSIX_SUPPORTED_COMPILE_EXTRA_OPTIONS (0)

 #define POSIX_SUPPORTED_COMPILE_CONTROLS ( \
-CTL_AFTERTEXT|CTL_ALLAFTERTEXT|CTL_EXPAND|CTL_POSIX|CTL_POSIX_NOSUB)
+CTL_AFTERTEXT|CTL_ALLAFTERTEXT|CTL_EXPAND|CTL_HEXPAT|CTL_POSIX| \
+CTL_POSIX_NOSUB|CTL_USE_LENGTH)

 #define POSIX_SUPPORTED_COMPILE_CONTROLS2 (0)
@ -733,11 +734,9 @@ the first control word. Note that CTL_POSIX_NOSUB is always accompanied by
 CTL_POSIX, so it doesn't need its own entries. */

 static uint32_t exclusive_pat_controls[] = {
-CTL_POSIX | CTL_HEXPAT,
 CTL_POSIX | CTL_PUSH,
 CTL_POSIX | CTL_PUSHCOPY,
 CTL_POSIX | CTL_PUSHTABLESCOPY,
-CTL_POSIX | CTL_USE_LENGTH,
 CTL_PUSH | CTL_PUSHCOPY,
 CTL_PUSH | CTL_PUSHTABLESCOPY,
 CTL_PUSHCOPY | CTL_PUSHTABLESCOPY,
@ -896,7 +895,7 @@ static PCRE2_SIZE malloclistlength[MALLOCLISTSIZE];
 static uint32_t malloclistptr = 0;

 #ifdef SUPPORT_PCRE2_8
-static regex_t preg = { NULL, NULL, 0, 0, 0 };
+static regex_t preg = { NULL, NULL, 0, 0, 0, 0 };
 #endif

 static int *dfa_workspace = NULL;
@ -5264,6 +5263,12 @@ if ((pat_patctl.control & CTL_POSIX) != 0)
 if ((pat_patctl.options & PCRE2_DOTALL) != 0) cflags |= REG_DOTALL;
 if ((pat_patctl.options & PCRE2_UNGREEDY) != 0) cflags |= REG_UNGREEDY;

+if ((pat_patctl.control & (CTL_HEXPAT|CTL_USE_LENGTH)) != 0)
+{
+preg.re_endp = (char *)pbuffer8 + patlen;
+cflags |= REG_PEND;
+}
+
 rc = regcomp(&preg, (char *)pbuffer8, cflags);

 /* Compiling failed */


@ -123,4 +123,10 @@
 /^a\x{00}b$/posix
 a\x{00}b\=posix_startend=0:3

+/"A" 00 "B"/hex
+A\x{00}B\=posix_startend=0:3
+
+/ABC/use_length
+ABC
+
 # End of testdata/testinput18


@ -15,4 +15,7 @@
/\w/ucp /\w/ucp
+++\x{c2} +++\x{c2}
/"^AB" 00 "\x{1234}$"/hex,utf
AB\x{00}\x{1234}\=posix_startend=0:6
# End of testdata/testinput19 # End of testdata/testinput19


@ -191,4 +191,12 @@ No match: POSIX code 17: match failed
a\x{00}b\=posix_startend=0:3 a\x{00}b\=posix_startend=0:3
0: a\x00b 0: a\x00b
/"A" 00 "B"/hex
A\x{00}B\=posix_startend=0:3
0: A\x00B
/ABC/use_length
ABC
0: ABC
# End of testdata/testinput18 # End of testdata/testinput18


@ -18,4 +18,8 @@ No match: POSIX code 17: match failed
+++\x{c2} +++\x{c2}
0: \xc2 0: \xc2
/"^AB" 00 "\x{1234}$"/hex,utf
AB\x{00}\x{1234}\=posix_startend=0:6
0: AB\x{00}\x{1234}
# End of testdata/testinput19 # End of testdata/testinput19