Implement PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL.

Philip.Hazel 2017-06-01 18:10:15 +00:00
parent c0902e176f
commit e3a0f22349
16 changed files with 206 additions and 50 deletions


@ -27,10 +27,11 @@ DESCRIPTION
</b><br> </b><br>
<P> <P>
This function sets additional option bits for <b>pcre2_compile()</b> that are This function sets additional option bits for <b>pcre2_compile()</b> that are
housed in a compile context. It completely replaces all the bits. The extra housed in a compile context. It completely replaces all the bits. The extra
options are: options are:
<pre> <pre>
PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES Allow \x{df800} to \x{dfff} in UTF-8 and UTF-32 modes PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES Allow \x{df800} to \x{dfff} in UTF-8 and UTF-32 modes
PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL Treat all invalid escapes as a literal following character
</pre> </pre>
There is a complete description of the PCRE2 native API in the There is a complete description of the PCRE2 native API in the
<a href="pcre2api.html"><b>pcre2api</b></a> <a href="pcre2api.html"><b>pcre2api</b></a>
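The description in this hunk says the call completely replaces the extra option bits rather than adding to them. A minimal sketch of that semantics follows. It assumes a toy context struct (`demo_context`) and two hypothetical helpers (`demo_set_extra_options`, `demo_add_extra_option`) that stand in for the library; only the two bit values are taken from this commit's header change.

```c
#include <stdint.h>

/* Bit values copied from this commit's header diff. The DEMO_ names are
   hypothetical stand-ins so the sketch compiles without pcre2.h. */
#define DEMO_EXTRA_ALLOW_SURROGATE_ESCAPES 0x00000001u
#define DEMO_EXTRA_BAD_ESCAPE_IS_LITERAL   0x00000002u

/* Toy model of a compile context holding the extra options word. */
typedef struct demo_context {
  uint32_t extra_options;
} demo_context;

/* Models "completely replaces all the bits": plain assignment, no OR. */
static int demo_set_extra_options(demo_context *ctx, uint32_t extra_options)
{
  ctx->extra_options = extra_options;
  return 0;
}

/* A caller that wants to add one bit has to pass the combined value. */
static int demo_add_extra_option(demo_context *ctx, uint32_t bit)
{
  return demo_set_extra_options(ctx, ctx->extra_options | bit);
}
```

Under this model, setting the surrogate bit and then setting only the bad-escape bit leaves just the bad-escape bit set; keeping both requires passing the OR of the two values in a single call.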


@ -1706,6 +1706,24 @@ If the extra option PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is set, surrogate code
point values in UTF-8 and UTF-32 patterns no longer provoke errors and are point values in UTF-8 and UTF-32 patterns no longer provoke errors and are
incorporated in the compiled pattern. However, they can only match subject incorporated in the compiled pattern. However, they can only match subject
characters if the matching function is called with PCRE2_NO_UTF_CHECK set. characters if the matching function is called with PCRE2_NO_UTF_CHECK set.
<pre>
PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
</pre>
This is a dangerous option. Use with care. By default, an unrecognized escape
such as \j or a malformed one such as \x{2z} causes a compile-time error when
detected by <b>pcre2_compile()</b>. Perl is somewhat inconsistent in handling
such items: for example, \j is treated as a literal "j", and non-hexadecimal
digits in \x{} are just ignored, though warnings are given in both cases if
Perl's warning switch is enabled. However, a malformed octal number after \o{
always causes an error in Perl.
</P>
<P>
If the PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL extra option is passed to
<b>pcre2_compile()</b>, all unrecognized or erroneous escape sequences are
treated as single-character escapes. For example, \j is a literal "j" and
\x{2z} is treated as the literal string "x{2z}". Setting this option means
that typos in patterns may go undetected and have unexpected results. This is a
dangerous option. Use with care.
</P> </P>
<br><a name="SEC20" href="#TOC1">COMPILATION ERROR CODES</a><br> <br><a name="SEC20" href="#TOC1">COMPILATION ERROR CODES</a><br>
<P> <P>
@ -3471,7 +3489,7 @@ Cambridge, England.
</P> </P>
<br><a name="SEC42" href="#TOC1">REVISION</a><br> <br><a name="SEC42" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 30 May 2017 Last updated: 01 June 2017
<br> <br>
Copyright &copy; 1997-2017 University of Cambridge. Copyright &copy; 1997-2017 University of Cambridge.
<br> <br>


@ -577,6 +577,7 @@ for a description of the effects of these options.
alt_verbnames set PCRE2_ALT_VERBNAMES alt_verbnames set PCRE2_ALT_VERBNAMES
anchored set PCRE2_ANCHORED anchored set PCRE2_ANCHORED
auto_callout set PCRE2_AUTO_CALLOUT auto_callout set PCRE2_AUTO_CALLOUT
bad_escape_is_literal set PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
/i caseless set PCRE2_CASELESS /i caseless set PCRE2_CASELESS
dollar_endonly set PCRE2_DOLLAR_ENDONLY dollar_endonly set PCRE2_DOLLAR_ENDONLY
/s dotall set PCRE2_DOTALL /s dotall set PCRE2_DOTALL
@ -1816,7 +1817,7 @@ Cambridge, England.
</P> </P>
<br><a name="SEC21" href="#TOC1">REVISION</a><br> <br><a name="SEC21" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 26 May 2017 Last updated: 01 June 2017
<br> <br>
Copyright &copy; 1997-2017 University of Cambridge. Copyright &copy; 1997-2017 University of Cambridge.
<br> <br>


@ -1688,6 +1688,24 @@ COMPILING A PATTERN
only match subject characters if the matching function is called with only match subject characters if the matching function is called with
PCRE2_NO_UTF_CHECK set. PCRE2_NO_UTF_CHECK set.
PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
This is a dangerous option. Use with care. By default, an unrecognized
escape such as \j or a malformed one such as \x{2z} causes a compile-
time error when detected by pcre2_compile(). Perl is somewhat inconsis-
tent in handling such items: for example, \j is treated as a literal
"j", and non-hexadecimal digits in \x{} are just ignored, though warn-
ings are given in both cases if Perl's warning switch is enabled. How-
ever, a malformed octal number after \o{ always causes an error in
Perl.
If the PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL extra option is passed to
pcre2_compile(), all unrecognized or erroneous escape sequences are
treated as single-character escapes. For example, \j is a literal "j"
and \x{2z} is treated as the literal string "x{2z}". Setting this
option means that typos in patterns may go undetected and have unex-
pected results. This is a dangerous option. Use with care.
COMPILATION ERROR CODES COMPILATION ERROR CODES
@ -3350,7 +3368,7 @@ AUTHOR
REVISION REVISION
Last updated: 30 May 2017 Last updated: 01 June 2017
Copyright (c) 1997-2017 University of Cambridge. Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------


@ -1,4 +1,4 @@
.TH PCRE2_SET_MAX_PATTERN_LENGTH 3 "17 May 2017" "PCRE2 10.30" .TH PCRE2_SET_MAX_PATTERN_LENGTH 3 "01 June 2017" "PCRE2 10.30"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS .SH SYNOPSIS
@ -15,12 +15,15 @@ PCRE2 - Perl-compatible regular expressions (revised API)
.rs .rs
.sp .sp
This function sets additional option bits for \fBpcre2_compile()\fP that are This function sets additional option bits for \fBpcre2_compile()\fP that are
housed in a compile context. It completely replaces all the bits. The extra housed in a compile context. It completely replaces all the bits. The extra
options are: options are:
.sp .sp
.\" JOIN .\" JOIN
PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES Allow \ex{df800} to \ex{dfff} PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES Allow \ex{df800} to \ex{dfff}
in UTF-8 and UTF-32 modes in UTF-8 and UTF-32 modes
.\" JOIN
PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL Treat all invalid escapes as
a literal following character
.sp .sp
There is a complete description of the PCRE2 native API in the There is a complete description of the PCRE2 native API in the
.\" HREF .\" HREF


@ -1,4 +1,4 @@
.TH PCRE2API 3 "30 May 2017" "PCRE2 10.30" .TH PCRE2API 3 "01 June 2017" "PCRE2 10.30"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.sp .sp
@ -1661,6 +1661,23 @@ If the extra option PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is set, surrogate code
point values in UTF-8 and UTF-32 patterns no longer provoke errors and are point values in UTF-8 and UTF-32 patterns no longer provoke errors and are
incorporated in the compiled pattern. However, they can only match subject incorporated in the compiled pattern. However, they can only match subject
characters if the matching function is called with PCRE2_NO_UTF_CHECK set. characters if the matching function is called with PCRE2_NO_UTF_CHECK set.
.sp
PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
.sp
This is a dangerous option. Use with care. By default, an unrecognized escape
such as \ej or a malformed one such as \ex{2z} causes a compile-time error when
detected by \fBpcre2_compile()\fP. Perl is somewhat inconsistent in handling
such items: for example, \ej is treated as a literal "j", and non-hexadecimal
digits in \ex{} are just ignored, though warnings are given in both cases if
Perl's warning switch is enabled. However, a malformed octal number after \eo{
always causes an error in Perl.
.P
If the PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL extra option is passed to
\fBpcre2_compile()\fP, all unrecognized or erroneous escape sequences are
treated as single-character escapes. For example, \ej is a literal "j" and
\ex{2z} is treated as the literal string "x{2z}". Setting this option means
that typos in patterns may go undetected and have unexpected results. This is a
dangerous option. Use with care.
. .
. .
.SH "COMPILATION ERROR CODES" .SH "COMPILATION ERROR CODES"
@ -3491,6 +3508,6 @@ Cambridge, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 30 May 2017 Last updated: 01 June 2017
Copyright (c) 1997-2017 University of Cambridge. Copyright (c) 1997-2017 University of Cambridge.
.fi .fi


@ -1,4 +1,4 @@
.TH PCRE2TEST 1 "26 May 2017" "PCRE 10.30" .TH PCRE2TEST 1 "01 June 2017" "PCRE 10.30"
.SH NAME .SH NAME
pcre2test - a program for testing Perl-compatible regular expressions. pcre2test - a program for testing Perl-compatible regular expressions.
.SH SYNOPSIS .SH SYNOPSIS
@ -539,6 +539,7 @@ for a description of the effects of these options.
alt_verbnames set PCRE2_ALT_VERBNAMES alt_verbnames set PCRE2_ALT_VERBNAMES
anchored set PCRE2_ANCHORED anchored set PCRE2_ANCHORED
auto_callout set PCRE2_AUTO_CALLOUT auto_callout set PCRE2_AUTO_CALLOUT
bad_escape_is_literal set PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
/i caseless set PCRE2_CASELESS /i caseless set PCRE2_CASELESS
dollar_endonly set PCRE2_DOLLAR_ENDONLY dollar_endonly set PCRE2_DOLLAR_ENDONLY
/s dotall set PCRE2_DOTALL /s dotall set PCRE2_DOTALL
@ -1792,6 +1793,6 @@ Cambridge, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 26 May 2017 Last updated: 01 June 2017
Copyright (c) 1997-2017 University of Cambridge. Copyright (c) 1997-2017 University of Cambridge.
.fi .fi


@ -521,6 +521,7 @@ PATTERN MODIFIERS
alt_verbnames set PCRE2_ALT_VERBNAMES alt_verbnames set PCRE2_ALT_VERBNAMES
anchored set PCRE2_ANCHORED anchored set PCRE2_ANCHORED
auto_callout set PCRE2_AUTO_CALLOUT auto_callout set PCRE2_AUTO_CALLOUT
bad_escape_is_literal set PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
/i caseless set PCRE2_CASELESS /i caseless set PCRE2_CASELESS
dollar_endonly set PCRE2_DOLLAR_ENDONLY dollar_endonly set PCRE2_DOLLAR_ENDONLY
/s dotall set PCRE2_DOTALL /s dotall set PCRE2_DOTALL
@ -1650,5 +1651,5 @@ AUTHOR
REVISION REVISION
Last updated: 26 May 2017 Last updated: 01 June 2017
Copyright (c) 1997-2017 University of Cambridge. Copyright (c) 1997-2017 University of Cambridge.


@ -142,6 +142,7 @@ D is inspected during pcre2_dfa_match() execution
/* An additional compile options word is available in the compile context. */ /* An additional compile options word is available in the compile context. */
#define PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES 0x00000001u /* C */ #define PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES 0x00000001u /* C */
#define PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL 0x00000002u /* C */
/* These are for pcre2_jit_compile(). */ /* These are for pcre2_jit_compile(). */


@ -142,6 +142,7 @@ D is inspected during pcre2_dfa_match() execution
/* An additional compile options word is available in the compile context. */ /* An additional compile options word is available in the compile context. */
#define PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES 0x00000001u /* C */ #define PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES 0x00000001u /* C */
#define PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL 0x00000002u /* C */
/* These are for pcre2_jit_compile(). */ /* These are for pcre2_jit_compile(). */


@ -2591,11 +2591,23 @@ while (ptr < ptrend)
/* ---- Escape sequence ---- */ /* ---- Escape sequence ---- */
case CHAR_BACKSLASH: case CHAR_BACKSLASH:
tempptr = ptr;
escape = PRIV(check_escape)(&ptr, ptrend, &c, &errorcode, options, escape = PRIV(check_escape)(&ptr, ptrend, &c, &errorcode, options,
FALSE, cb); FALSE, cb);
if (errorcode != 0) goto FAILED; if (errorcode != 0)
{
ESCAPE_FAILED:
if ((cb->cx->extra_options & PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL) == 0)
goto FAILED;
ptr = tempptr;
if (ptr >= ptrend) c = CHAR_BACKSLASH; else
{
GETCHARINCTEST(c, ptr); /* Get character value, increment pointer */
}
escape = 0; /* Treat as literal character */
}
/* The escape was a data character. */ /* The escape was a data escape or literal character. */
if (escape == 0) if (escape == 0)
{ {
@ -2647,12 +2659,12 @@ while (ptr < ptrend)
case ESC_C: case ESC_C:
#ifdef NEVER_BACKSLASH_C #ifdef NEVER_BACKSLASH_C
errorcode = ERR85; errorcode = ERR85;
goto FAILED; goto ESCAPE_FAILED;
#else #else
if ((options & PCRE2_NEVER_BACKSLASH_C) != 0) if ((options & PCRE2_NEVER_BACKSLASH_C) != 0)
{ {
errorcode = ERR83; errorcode = ERR83;
goto FAILED; goto ESCAPE_FAILED;
} }
#endif #endif
okquantifier = TRUE; okquantifier = TRUE;
@ -2662,7 +2674,7 @@ while (ptr < ptrend)
case ESC_X: case ESC_X:
#ifndef SUPPORT_UNICODE #ifndef SUPPORT_UNICODE
errorcode = ERR45; /* Supported only with Unicode support */ errorcode = ERR45; /* Supported only with Unicode support */
goto FAILED; goto ESCAPE_FAILED;
#endif #endif
case ESC_H: case ESC_H:
case ESC_h: case ESC_h:
@ -2727,7 +2739,7 @@ while (ptr < ptrend)
BOOL negated; BOOL negated;
uint16_t ptype = 0, pdata = 0; uint16_t ptype = 0, pdata = 0;
if (!get_ucp(&ptr, &negated, &ptype, &pdata, &errorcode, cb)) if (!get_ucp(&ptr, &negated, &ptype, &pdata, &errorcode, cb))
goto FAILED; goto ESCAPE_FAILED;
if (negated) escape = (escape == ESC_P)? ESC_p : ESC_P; if (negated) escape = (escape == ESC_P)? ESC_p : ESC_P;
*parsed_pattern++ = META_ESCAPE + escape; *parsed_pattern++ = META_ESCAPE + escape;
*parsed_pattern++ = (ptype << 16) | pdata; *parsed_pattern++ = (ptype << 16) | pdata;
@ -2735,7 +2747,7 @@ while (ptr < ptrend)
} }
#else #else
errorcode = ERR45; errorcode = ERR45;
goto FAILED; goto ESCAPE_FAILED;
#endif #endif
break; /* End \P and \p */ break; /* End \P and \p */
@ -2751,7 +2763,7 @@ while (ptr < ptrend)
*ptr != CHAR_LESS_THAN_SIGN && *ptr != CHAR_APOSTROPHE)) *ptr != CHAR_LESS_THAN_SIGN && *ptr != CHAR_APOSTROPHE))
{ {
errorcode = (escape == ESC_g)? ERR57 : ERR69; errorcode = (escape == ESC_g)? ERR57 : ERR69;
goto FAILED; goto ESCAPE_FAILED;
} }
terminator = (*ptr == CHAR_LESS_THAN_SIGN)? terminator = (*ptr == CHAR_LESS_THAN_SIGN)?
CHAR_GREATER_THAN_SIGN : (*ptr == CHAR_APOSTROPHE)? CHAR_GREATER_THAN_SIGN : (*ptr == CHAR_APOSTROPHE)?
@ -2769,18 +2781,18 @@ while (ptr < ptrend)
if (p >= ptrend || *p != terminator) if (p >= ptrend || *p != terminator)
{ {
errorcode = ERR57; errorcode = ERR57;
goto FAILED; goto ESCAPE_FAILED;
} }
ptr = p; ptr = p;
goto SET_RECURSION; goto SET_RECURSION;
} }
if (errorcode != 0) goto FAILED; if (errorcode != 0) goto ESCAPE_FAILED;
} }
/* Not a numerical recursion */ /* Not a numerical recursion */
if (!read_name(&ptr, ptrend, terminator, &offset, &name, &namelen, if (!read_name(&ptr, ptrend, terminator, &offset, &name, &namelen,
&errorcode, cb)) goto FAILED; &errorcode, cb)) goto ESCAPE_FAILED;
/* \k and \g when used with braces are back references, whereas \g used /* \k and \g when used with braces are back references, whereas \g used
with quotes or angle brackets is a recursion */ with quotes or angle brackets is a recursion */
@ -2792,7 +2804,7 @@ while (ptr < ptrend)
PUTOFFSET(offset, parsed_pattern); PUTOFFSET(offset, parsed_pattern);
okquantifier = TRUE; okquantifier = TRUE;
break; break; /* End special escape processing */
} }
break; /* End escape sequence processing */ break; /* End escape sequence processing */
@ -3139,10 +3151,23 @@ while (ptr < ptrend)
else else
{ {
tempptr = ptr;
escape = PRIV(check_escape)(&ptr, ptrend, &c, &errorcode, escape = PRIV(check_escape)(&ptr, ptrend, &c, &errorcode,
options, TRUE, cb); options, TRUE, cb);
if (errorcode != 0) goto FAILED; if (errorcode != 0)
{
CLASS_ESCAPE_FAILED:
if ((cb->cx->extra_options & PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL) == 0)
goto FAILED;
ptr = tempptr;
if (ptr >= ptrend) c = CHAR_BACKSLASH; else
{
GETCHARINCTEST(c, ptr); /* Get character value, increment pointer */
}
escape = 0; /* Treat as literal character */
}
if (escape == 0) /* Escaped character code point is in c */ if (escape == 0) /* Escaped character code point is in c */
{ {
char_is_literal = FALSE; char_is_literal = FALSE;
@ -3176,7 +3201,7 @@ while (ptr < ptrend)
if (class_range_state == RANGE_STARTED) if (class_range_state == RANGE_STARTED)
{ {
errorcode = ERR50; errorcode = ERR50;
goto FAILED; goto CLASS_ESCAPE_FAILED;
} }
/* Of the remaining escapes, only those that define characters are /* Of the remaining escapes, only those that define characters are
@ -3187,7 +3212,7 @@ while (ptr < ptrend)
{ {
case ESC_N: case ESC_N:
errorcode = ERR71; /* Not supported in a class */ errorcode = ERR71; /* Not supported in a class */
goto FAILED; goto CLASS_ESCAPE_FAILED;
case ESC_H: case ESC_H:
case ESC_h: case ESC_h:
@ -3250,13 +3275,14 @@ while (ptr < ptrend)
} }
#else #else
errorcode = ERR45; errorcode = ERR45;
goto FAILED; goto CLASS_ESCAPE_FAILED;
#endif #endif
break; /* End \P and \p */ break; /* End \P and \p */
default: /* All others are not allowed in a class */ default: /* All others are not allowed in a class */
errorcode = ERR7; errorcode = ERR7;
goto FAILED_BACK; ptr--;
goto CLASS_ESCAPE_FAILED;
} }
} }
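The fallback added in this file saves the position just after the backslash, and on a failed escape rewinds to it and takes the next character as a literal (or the backslash itself at end of pattern). A self-contained sketch of that control flow follows. The checker `demo_check_escape` is a hypothetical toy that knows only `\n`, `\t`, `\\` and lowercase-hex `\x{..}` up to 0x7f; it is not PCRE2's `check_escape`, and `demo_parse` simply copies resolved characters to a buffer instead of building a parsed pattern.

```c
#include <stddef.h>
#include <string.h>

/* Toy escape checker (hypothetical). On success stores the character in *c,
   advances *pp past the escape and returns 0. On a bad escape returns -1,
   possibly leaving *pp part-way through the escape. */
static int demo_check_escape(const char **pp, const char *end, char *c)
{
  const char *p = *pp;
  if (p >= end) return -1;               /* backslash at end of pattern */
  switch (*p++) {
    case 'n':  *c = '\n'; break;
    case 't':  *c = '\t'; break;
    case '\\': *c = '\\'; break;
    case 'x': {
      unsigned v = 0;
      int digits = 0;
      if (p >= end || *p != '{') { *pp = p; return -1; }
      for (p++; p < end && *p != '}'; p++, digits++) {
        unsigned d;
        if (*p >= '0' && *p <= '9') d = (unsigned)(*p - '0');
        else if (*p >= 'a' && *p <= 'f') d = (unsigned)(*p - 'a' + 10);
        else { *pp = p; return -1; }      /* non-hex digit, as in \x{2z} */
        v = v * 16 + d;
        if (v > 0x7f) { *pp = p; return -1; }
      }
      if (p >= end || digits == 0 || v == 0) { *pp = p; return -1; }
      p++;                                /* skip the closing brace */
      *c = (char)v;
      break;
    }
    default: *pp = p; return -1;          /* unrecognized, as in \j */
  }
  *pp = p;
  return 0;
}

/* Copies pattern to out, resolving escapes. Returns 0 on success, or -1 on a
   bad escape with the option off, or if out is too small. */
static int demo_parse(const char *pattern, int bad_escape_is_literal,
                      char *out, size_t outsize)
{
  const char *ptr = pattern;
  const char *end = pattern + strlen(pattern);
  size_t n = 0;

  if (outsize == 0) return -1;
  while (ptr < end) {
    char c = *ptr++;
    if (c == '\\') {
      const char *tempptr = ptr;          /* position just after the backslash */
      if (demo_check_escape(&ptr, end, &c) != 0) {
        if (!bad_escape_is_literal) return -1;
        ptr = tempptr;                    /* rewind to after the backslash */
        c = (ptr >= end) ? '\\' : *ptr++; /* next char becomes a literal */
      }
    }
    if (n + 1 >= outsize) return -1;      /* keep room for the terminator */
    out[n++] = c;
  }
  out[n] = '\0';
  return 0;
}
```

With the flag set, `demo_parse("\\x{2z}", 1, buf, sizeof buf)` rewinds to the `x`, emits it, and then reads `{2z}` as ordinary characters, which mirrors the documented result of the literal string "x{2z}".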


@ -402,9 +402,9 @@ typedef struct convertstruct {
static convertstruct convertlist[] = { static convertstruct convertlist[] = {
{ "glob", PCRE2_CONVERT_GLOB }, { "glob", PCRE2_CONVERT_GLOB },
{ "glob_basic", PCRE2_CONVERT_GLOB_BASIC }, { "glob_basic", PCRE2_CONVERT_GLOB_BASIC },
{ "glob_ignore_dot_start", PCRE2_CONVERT_GLOB_IGNORE_DOT_START }, { "glob_ignore_dot_start", PCRE2_CONVERT_GLOB_IGNORE_DOT_START },
{ "glob_no_starstar", PCRE2_CONVERT_GLOB_NO_STARSTAR }, { "glob_no_starstar", PCRE2_CONVERT_GLOB_NO_STARSTAR },
{ "glob_no_wild_separator", PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR }, { "glob_no_wild_separator", PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR },
{ "posix_basic", PCRE2_CONVERT_POSIX_BASIC }, { "posix_basic", PCRE2_CONVERT_POSIX_BASIC },
{ "posix_extended", PCRE2_CONVERT_POSIX_EXTENDED }, { "posix_extended", PCRE2_CONVERT_POSIX_EXTENDED },
{ "unset", CONVERT_UNSET }}; { "unset", CONVERT_UNSET }};
@ -590,6 +590,7 @@ static modstruct modlist[] = {
{ "altglobal", MOD_PND, MOD_CTL, CTL_ALTGLOBAL, PO(control) }, { "altglobal", MOD_PND, MOD_CTL, CTL_ALTGLOBAL, PO(control) },
{ "anchored", MOD_PD, MOD_OPT, PCRE2_ANCHORED, PD(options) }, { "anchored", MOD_PD, MOD_OPT, PCRE2_ANCHORED, PD(options) },
{ "auto_callout", MOD_PAT, MOD_OPT, PCRE2_AUTO_CALLOUT, PO(options) }, { "auto_callout", MOD_PAT, MOD_OPT, PCRE2_AUTO_CALLOUT, PO(options) },
{ "bad_escape_is_literal", MOD_CTC, MOD_OPT, PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL, CO(extra_options) },
{ "bincode", MOD_PAT, MOD_CTL, CTL_BINCODE, PO(control) }, { "bincode", MOD_PAT, MOD_CTL, CTL_BINCODE, PO(control) },
{ "bsr", MOD_CTC, MOD_BSR, 0, CO(bsr_convention) }, { "bsr", MOD_CTC, MOD_BSR, 0, CO(bsr_convention) },
{ "callout_capture", MOD_DAT, MOD_CTL, CTL_CALLOUT_CAPTURE, DO(control) }, { "callout_capture", MOD_DAT, MOD_CTL, CTL_CALLOUT_CAPTURE, DO(control) },
@ -692,8 +693,8 @@ static modstruct modlist[] = {
#define POSIX_SUPPORTED_COMPILE_OPTIONS ( \ #define POSIX_SUPPORTED_COMPILE_OPTIONS ( \
PCRE2_CASELESS|PCRE2_DOTALL|PCRE2_MULTILINE|PCRE2_UCP|PCRE2_UTF| \ PCRE2_CASELESS|PCRE2_DOTALL|PCRE2_MULTILINE|PCRE2_UCP|PCRE2_UTF| \
PCRE2_UNGREEDY) PCRE2_UNGREEDY)
#define POSIX_SUPPORTED_COMPILE_EXTRA_OPTIONS (0) #define POSIX_SUPPORTED_COMPILE_EXTRA_OPTIONS (0)
#define POSIX_SUPPORTED_COMPILE_CONTROLS ( \ #define POSIX_SUPPORTED_COMPILE_CONTROLS ( \
CTL_AFTERTEXT|CTL_ALLAFTERTEXT|CTL_EXPAND|CTL_POSIX|CTL_POSIX_NOSUB) CTL_AFTERTEXT|CTL_ALLAFTERTEXT|CTL_EXPAND|CTL_POSIX|CTL_POSIX_NOSUB)
@ -3701,7 +3702,7 @@ for (;;)
case MOD_CON: /* A convert type/options list */ case MOD_CON: /* A convert type/options list */
for (;; pp++) for (;; pp++)
{ {
uint8_t *colon = (uint8_t *)strchr((const char *)pp, ':'); uint8_t *colon = (uint8_t *)strchr((const char *)pp, ':');
len = ((colon != NULL && colon < ep)? colon:ep) - pp; len = ((colon != NULL && colon < ep)? colon:ep) - pp;
for (i = 0; i < convertlistcount; i++) for (i = 0; i < convertlistcount; i++)
@ -4073,13 +4074,14 @@ Returns: nothing
*/ */
static void static void
show_compile_extra_options(uint32_t options, const char *before, show_compile_extra_options(uint32_t options, const char *before,
const char *after) const char *after)
{ {
if (options == 0) fprintf(outfile, "%s <none>%s", before, after); if (options == 0) fprintf(outfile, "%s <none>%s", before, after);
else fprintf(outfile, "%s%s%s", else fprintf(outfile, "%s%s%s%s",
before, before,
((options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) != 0)? " allow_surrogate_escapes" : "", ((options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) != 0)? " allow_surrogate_escapes" : "",
((options & PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL) != 0)? " bad_escape_is_literal" : "",
after); after);
} }
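The change to `show_compile_extra_options()` adds one `%s` and one conditional name per option bit. The sketch below reproduces that formatting so the resulting text can be inspected. It is an illustration only: `demo_format_extra_options` is a hypothetical name, it writes into a caller-supplied buffer rather than to `outfile`, and the `DEMO_` constants are local copies of the bit values from the header diff.

```c
#include <stdint.h>
#include <stdio.h>

/* Local copies of the bit values from this commit's header diff. */
#define DEMO_EXTRA_ALLOW_SURROGATE_ESCAPES 0x00000001u
#define DEMO_EXTRA_BAD_ESCAPE_IS_LITERAL   0x00000002u

/* Formats the names of the set bits between `before` and `after`.
   Returns the value of snprintf(): the length that was or would be written. */
static int demo_format_extra_options(uint32_t options, const char *before,
                                     const char *after, char *buf, size_t size)
{
  if (options == 0)
    return snprintf(buf, size, "%s <none>%s", before, after);
  return snprintf(buf, size, "%s%s%s%s",
    before,
    ((options & DEMO_EXTRA_ALLOW_SURROGATE_ESCAPES) != 0)? " allow_surrogate_escapes" : "",
    ((options & DEMO_EXTRA_BAD_ESCAPE_IS_LITERAL) != 0)? " bad_escape_is_literal" : "",
    after);
}
```

Each further extra option would need another `%s` in the format string and another conditional argument, which is the pattern this hunk follows.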
@ -5225,14 +5227,14 @@ if ((pat_patctl.control & CTL_POSIX) != 0)
msg = ""; msg = "";
} }
if ((FLD(pat_context, extra_options) & if ((FLD(pat_context, extra_options) &
~POSIX_SUPPORTED_COMPILE_EXTRA_OPTIONS) != 0) ~POSIX_SUPPORTED_COMPILE_EXTRA_OPTIONS) != 0)
{ {
show_compile_extra_options( show_compile_extra_options(
FLD(pat_context, extra_options) & ~POSIX_SUPPORTED_COMPILE_EXTRA_OPTIONS, FLD(pat_context, extra_options) & ~POSIX_SUPPORTED_COMPILE_EXTRA_OPTIONS,
msg, ""); msg, "");
msg = ""; msg = "";
} }
if ((pat_patctl.control & ~POSIX_SUPPORTED_COMPILE_CONTROLS) != 0 || if ((pat_patctl.control & ~POSIX_SUPPORTED_COMPILE_CONTROLS) != 0 ||
(pat_patctl.control2 & ~POSIX_SUPPORTED_COMPILE_CONTROLS2) != 0) (pat_patctl.control2 & ~POSIX_SUPPORTED_COMPILE_CONTROLS2) != 0)
@ -5246,8 +5248,8 @@ if ((pat_patctl.control & CTL_POSIX) != 0)
if (FLD(pat_context, max_pattern_length) != PCRE2_UNSET) if (FLD(pat_context, max_pattern_length) != PCRE2_UNSET)
prmsg(&msg, "max_pattern_length"); prmsg(&msg, "max_pattern_length");
if (FLD(pat_context, parens_nest_limit) != PARENS_NEST_DEFAULT) if (FLD(pat_context, parens_nest_limit) != PARENS_NEST_DEFAULT)
prmsg(&msg, "parens_nest_limit"); prmsg(&msg, "parens_nest_limit");
if (msg[0] == 0) fprintf(outfile, "\n"); if (msg[0] == 0) fprintf(outfile, "\n");
/* Translate PCRE2 options to POSIX options and then compile. */ /* Translate PCRE2 options to POSIX options and then compile. */
@ -5413,7 +5415,7 @@ if (pat_patctl.convert_type != CONVERT_UNSET)
if (pat_patctl.convert_glob_escape != 0) if (pat_patctl.convert_glob_escape != 0)
{ {
uint32_t escape = (pat_patctl.convert_glob_escape == '0')? 0 : uint32_t escape = (pat_patctl.convert_glob_escape == '0')? 0 :
pat_patctl.convert_glob_escape; pat_patctl.convert_glob_escape;
PCRE2_SET_GLOB_ESCAPE(rc, con_context, escape); PCRE2_SET_GLOB_ESCAPE(rc, con_context, escape);
if (rc != 0) if (rc != 0)
{ {
@ -7057,10 +7059,10 @@ else for (gmatched = 0;; gmatched++)
if ((dat_datctl.control & CTL_DFA) == 0 && if ((dat_datctl.control & CTL_DFA) == 0 &&
(FLD(compiled_code, executable_jit) == NULL || (FLD(compiled_code, executable_jit) == NULL ||
(dat_datctl.options & PCRE2_NO_JIT) != 0)) (dat_datctl.options & PCRE2_NO_JIT) != 0))
{ {
(void)check_match_limit(pp, arg_ulen, PCRE2_ERROR_HEAPLIMIT, "heap"); (void)check_match_limit(pp, arg_ulen, PCRE2_ERROR_HEAPLIMIT, "heap");
} }
capcount = check_match_limit(pp, arg_ulen, PCRE2_ERROR_MATCHLIMIT, capcount = check_match_limit(pp, arg_ulen, PCRE2_ERROR_MATCHLIMIT,
"match"); "match");

testdata/testinput2 (13 lines changed)

@ -5279,4 +5279,17 @@ a)"xI
/(a)(?-n:(b))(c)/nB /(a)(?-n:(b))(c)/nB
# ----------------------------------------------------------------------
# These test the dangerous PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL option.
/\j\x{z}\o{82}\L\uabcd\u\U\g{\g/B,\bad_escape_is_literal
/\N{\c/B,bad_escape_is_literal
/[\j\x{z}\o\gA-\Nb-\g]/B,bad_escape_is_literal
/[Q-\N]/B,bad_escape_is_literal
# ----------------------------------------------------------------------
# End of testinput2 # End of testinput2

testdata/testinput5 (9 lines changed)

@ -2015,6 +2015,13 @@
\= Expect no match \= Expect no match
X$ X$
# --------------------------------------------------------------------------- # ----------------------------------------------------------------------
# These test the dangerous PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL option.
/\x{d800}/B,utf,bad_escape_is_literal
/\ud800/B,utf,alt_bsux,bad_escape_is_literal
# ----------------------------------------------------------------------
# End of testinput5 # End of testinput5

testdata/testoutput2 (27 lines changed)

@ -15988,6 +15988,33 @@ Subject length lower bound = 1
End End
------------------------------------------------------------------ ------------------------------------------------------------------
# ----------------------------------------------------------------------
# These test the dangerous PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL option.
/\j\x{z}\o{82}\L\uabcd\u\U\g{\g/B,\bad_escape_is_literal
** Unrecognized modifier '\' in '\bad_escape_is_literal'
/\N{\c/B,bad_escape_is_literal
------------------------------------------------------------------
Bra
N{c
Ket
End
------------------------------------------------------------------
/[\j\x{z}\o\gA-\Nb-\g]/B,bad_escape_is_literal
------------------------------------------------------------------
Bra
[A-Nb-gjoxz{}]
Ket
End
------------------------------------------------------------------
/[Q-\N]/B,bad_escape_is_literal
Failed: error 108 at offset 4: range out of order in character class
# ----------------------------------------------------------------------
# End of testinput2 # End of testinput2
Error -65: PCRE2_ERROR_BADDATA (unknown error number) Error -65: PCRE2_ERROR_BADDATA (unknown error number)
Error -62: bad serialized data Error -62: bad serialized data
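The last class test in this hunk still fails even with the option set. Once `\N` falls back to a literal "N", the class reads as the range Q-N, and "Q" sorts after "N", so the ordinary range check rejects it. By contrast, `A-\N` and `b-\g` in the preceding test become the valid ranges A-N and b-g. A minimal sketch of that ordering check follows; `demo_check_range` is a hypothetical helper, and 108 is just the error number shown in the test output.

```c
/* Returns 0 if the range lo-hi is in order, or 108 (the error number shown
   in the test output) if the start sorts after the end. */
static int demo_check_range(unsigned char lo, unsigned char hi)
{
  return (lo <= hi) ? 0 : 108;
}
```

So the option changes how the bad escape is read, but it does not suppress errors that the resulting literal characters cause on their own.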

testdata/testoutput5 (21 lines changed)

@ -4579,6 +4579,25 @@ No match
X$ X$
No match No match
# --------------------------------------------------------------------------- # ----------------------------------------------------------------------
# These test the dangerous PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL option.
/\x{d800}/B,utf,bad_escape_is_literal
------------------------------------------------------------------
Bra
x{d800}
Ket
End
------------------------------------------------------------------
/\ud800/B,utf,alt_bsux,bad_escape_is_literal
------------------------------------------------------------------
Bra
ud800
Ket
End
------------------------------------------------------------------
# ----------------------------------------------------------------------
# End of testinput5 # End of testinput5
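Both patterns in this test name the code point d800, which is a surrogate. According to the API text in this commit, surrogate escapes provoke an error in UTF-8 and UTF-32 patterns unless PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is set, so with bad_escape_is_literal the failed escape falls back to the literal text shown in the output. A small sketch of the surrogate test follows; `demo_is_surrogate` is a hypothetical helper, not a PCRE2 function.

```c
#include <stdint.h>

/* Surrogate code points occupy U+D800 through U+DFFF inclusive. */
static int demo_is_surrogate(uint32_t cp)
{
  return cp >= 0xd800u && cp <= 0xdfffu;
}
```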