Fix issues with BAD_ESCAPE_IS_LITERAL in character classes.
This commit is contained in:
parent
8f165d376e
commit
7de013bac3
10
ChangeLog
10
ChangeLog
|
@ -102,6 +102,16 @@ for the stack as it needs for -bigstack.
|
||||||
|
|
||||||
26. Insert a cast in pcre2_dfa_match.c to suppress a compiler warning.
|
26. Insert a cast in pcre2_dfa_match.c to suppress a compiler warning.
|
||||||
|
|
||||||
|
26. With PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL set, escape sequences such as \s
|
||||||
|
which are valid in character classes, but not as the end of ranges, were being
|
||||||
|
treated as literals. An example is [_-\s] (but not [\s-_] because that gave an
|
||||||
|
error at the *start* of a range). Now an "invalid range" error is given
|
||||||
|
independently of PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL.
|
||||||
|
|
||||||
|
27. Related to 26 above, PCRE2_BAD_ESCAPE_IS_LITERAL was affecting known escape
|
||||||
|
sequences such as \eX when they appeared invalidly in a character class. Now
|
||||||
|
the option applies only to unrecognized or malformed escape sequences.
|
||||||
|
|
||||||
|
|
||||||
Version 10.32 10-September-2018
|
Version 10.32 10-September-2018
|
||||||
-------------------------------
|
-------------------------------
|
||||||
|
|
|
@ -1870,11 +1870,14 @@ always causes an error in Perl.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
If the PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL extra option is passed to
|
If the PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL extra option is passed to
|
||||||
<b>pcre2_compile()</b>, all unrecognized or erroneous escape sequences are
|
<b>pcre2_compile()</b>, all unrecognized or malformed escape sequences are
|
||||||
treated as single-character escapes. For example, \j is a literal "j" and
|
treated as single-character escapes. For example, \j is a literal "j" and
|
||||||
\x{2z} is treated as the literal string "x{2z}". Setting this option means
|
\x{2z} is treated as the literal string "x{2z}". Setting this option means
|
||||||
that typos in patterns may go undetected and have unexpected results. This is a
|
that typos in patterns may go undetected and have unexpected results. Also note
|
||||||
dangerous option. Use with care.
|
that a sequence such as [\N{] is interpreted as a malformed attempt at
|
||||||
|
[\N{...}] and so is treated as [N{] whereas [\N] gives an error because an
|
||||||
|
unqualified \N is a valid escape sequence but is not supported in a character
|
||||||
|
class. To reiterate: this is a dangerous option. Use with great care.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_EXTRA_ESCAPED_CR_IS_LF
|
PCRE2_EXTRA_ESCAPED_CR_IS_LF
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -3782,9 +3785,9 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 27 November 2018
|
Last updated: 04 January 2019
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2018 University of Cambridge.
|
Copyright © 1997-2019 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
<p>
|
<p>
|
||||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
|
|
@ -1846,11 +1846,15 @@ COMPILING A PATTERN
|
||||||
Perl.
|
Perl.
|
||||||
|
|
||||||
If the PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL extra option is passed to
|
If the PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL extra option is passed to
|
||||||
pcre2_compile(), all unrecognized or erroneous escape sequences are
|
pcre2_compile(), all unrecognized or malformed escape sequences are
|
||||||
treated as single-character escapes. For example, \j is a literal "j"
|
treated as single-character escapes. For example, \j is a literal "j"
|
||||||
and \x{2z} is treated as the literal string "x{2z}". Setting this
|
and \x{2z} is treated as the literal string "x{2z}". Setting this
|
||||||
option means that typos in patterns may go undetected and have unex-
|
option means that typos in patterns may go undetected and have unex-
|
||||||
pected results. This is a dangerous option. Use with care.
|
pected results. Also note that a sequence such as [\N{] is interpreted
|
||||||
|
as a malformed attempt at [\N{...}] and so is treated as [N{] whereas
|
||||||
|
[\N] gives an error because an unqualified \N is a valid escape
|
||||||
|
sequence but is not supported in a character class. To reiterate: this
|
||||||
|
is a dangerous option. Use with great care.
|
||||||
|
|
||||||
PCRE2_EXTRA_ESCAPED_CR_IS_LF
|
PCRE2_EXTRA_ESCAPED_CR_IS_LF
|
||||||
|
|
||||||
|
@ -3654,8 +3658,8 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 27 November 2018
|
Last updated: 04 January 2019
|
||||||
Copyright (c) 1997-2018 University of Cambridge.
|
Copyright (c) 1997-2019 University of Cambridge.
|
||||||
------------------------------------------------------------------------------
|
------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2API 3 "27 November 2018" "PCRE2 10.33"
|
.TH PCRE2API 3 "04 January 2019" "PCRE2 10.33"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.sp
|
.sp
|
||||||
|
@ -1825,11 +1825,14 @@ Perl's warning switch is enabled. However, a malformed octal number after \eo{
|
||||||
always causes an error in Perl.
|
always causes an error in Perl.
|
||||||
.P
|
.P
|
||||||
If the PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL extra option is passed to
|
If the PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL extra option is passed to
|
||||||
\fBpcre2_compile()\fP, all unrecognized or erroneous escape sequences are
|
\fBpcre2_compile()\fP, all unrecognized or malformed escape sequences are
|
||||||
treated as single-character escapes. For example, \ej is a literal "j" and
|
treated as single-character escapes. For example, \ej is a literal "j" and
|
||||||
\ex{2z} is treated as the literal string "x{2z}". Setting this option means
|
\ex{2z} is treated as the literal string "x{2z}". Setting this option means
|
||||||
that typos in patterns may go undetected and have unexpected results. This is a
|
that typos in patterns may go undetected and have unexpected results. Also note
|
||||||
dangerous option. Use with care.
|
that a sequence such as [\eN{] is interpreted as a malformed attempt at
|
||||||
|
[\eN{...}] and so is treated as [N{] whereas [\eN] gives an error because an
|
||||||
|
unqualified \eN is a valid escape sequence but is not supported in a character
|
||||||
|
class. To reiterate: this is a dangerous option. Use with great care.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_EXTRA_ESCAPED_CR_IS_LF
|
PCRE2_EXTRA_ESCAPED_CR_IS_LF
|
||||||
.sp
|
.sp
|
||||||
|
@ -3790,6 +3793,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 27 November 2018
|
Last updated: 04 January 2019
|
||||||
Copyright (c) 1997-2018 University of Cambridge.
|
Copyright (c) 1997-2019 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
|
||||||
|
|
||||||
Written by Philip Hazel
|
Written by Philip Hazel
|
||||||
Original API code Copyright (c) 1997-2012 University of Cambridge
|
Original API code Copyright (c) 1997-2012 University of Cambridge
|
||||||
New API code Copyright (c) 2016-2018 University of Cambridge
|
New API code Copyright (c) 2016-2019 University of Cambridge
|
||||||
|
|
||||||
-----------------------------------------------------------------------------
|
-----------------------------------------------------------------------------
|
||||||
Redistribution and use in source and binary forms, with or without
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
@ -3346,9 +3346,9 @@ while (ptr < ptrend)
|
||||||
tempptr = ptr;
|
tempptr = ptr;
|
||||||
escape = PRIV(check_escape)(&ptr, ptrend, &c, &errorcode,
|
escape = PRIV(check_escape)(&ptr, ptrend, &c, &errorcode,
|
||||||
options, TRUE, cb);
|
options, TRUE, cb);
|
||||||
|
|
||||||
if (errorcode != 0)
|
if (errorcode != 0)
|
||||||
{
|
{
|
||||||
CLASS_ESCAPE_FAILED:
|
|
||||||
if ((cb->cx->extra_options & PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL) == 0)
|
if ((cb->cx->extra_options & PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL) == 0)
|
||||||
goto FAILED;
|
goto FAILED;
|
||||||
ptr = tempptr;
|
ptr = tempptr;
|
||||||
|
@ -3359,30 +3359,32 @@ while (ptr < ptrend)
|
||||||
escape = 0; /* Treat as literal character */
|
escape = 0; /* Treat as literal character */
|
||||||
}
|
}
|
||||||
|
|
||||||
if (escape == 0) /* Escaped character code point is in c */
|
switch(escape)
|
||||||
{
|
{
|
||||||
|
case 0: /* Escaped character code point is in c */
|
||||||
char_is_literal = FALSE;
|
char_is_literal = FALSE;
|
||||||
goto CLASS_LITERAL;
|
goto CLASS_LITERAL;
|
||||||
}
|
|
||||||
|
|
||||||
/* These three escapes do not alter the class range state. */
|
case ESC_b:
|
||||||
|
c = CHAR_BS; /* \b is backspace in a class */
|
||||||
if (escape == ESC_b)
|
|
||||||
{
|
|
||||||
c = CHAR_BS; /* \b is backspace in a class */
|
|
||||||
char_is_literal = FALSE;
|
char_is_literal = FALSE;
|
||||||
goto CLASS_LITERAL;
|
goto CLASS_LITERAL;
|
||||||
}
|
|
||||||
|
|
||||||
else if (escape == ESC_Q)
|
case ESC_Q:
|
||||||
{
|
|
||||||
inescq = TRUE; /* Enter literal mode */
|
inescq = TRUE; /* Enter literal mode */
|
||||||
goto CLASS_CONTINUE;
|
goto CLASS_CONTINUE;
|
||||||
}
|
|
||||||
|
|
||||||
else if (escape == ESC_E) /* Ignore orphan \E */
|
case ESC_E: /* Ignore orphan \E */
|
||||||
goto CLASS_CONTINUE;
|
goto CLASS_CONTINUE;
|
||||||
|
|
||||||
|
case ESC_B: /* Always an error in a class */
|
||||||
|
case ESC_R:
|
||||||
|
case ESC_X:
|
||||||
|
errorcode = ERR7;
|
||||||
|
ptr--;
|
||||||
|
goto FAILED;
|
||||||
|
}
|
||||||
|
|
||||||
/* The second part of a range can be a single-character escape
|
/* The second part of a range can be a single-character escape
|
||||||
sequence (detected above), but not any of the other escapes. Perl
|
sequence (detected above), but not any of the other escapes. Perl
|
||||||
treats a hyphen as a literal in such circumstances. However, in Perl's
|
treats a hyphen as a literal in such circumstances. However, in Perl's
|
||||||
|
@ -3392,7 +3394,7 @@ while (ptr < ptrend)
|
||||||
if (class_range_state == RANGE_STARTED)
|
if (class_range_state == RANGE_STARTED)
|
||||||
{
|
{
|
||||||
errorcode = ERR50;
|
errorcode = ERR50;
|
||||||
goto CLASS_ESCAPE_FAILED;
|
goto FAILED; /* Not CLASS_ESCAPE_FAILED; always an error */
|
||||||
}
|
}
|
||||||
|
|
||||||
/* Of the remaining escapes, only those that define characters are
|
/* Of the remaining escapes, only those that define characters are
|
||||||
|
@ -3402,8 +3404,8 @@ while (ptr < ptrend)
|
||||||
switch(escape)
|
switch(escape)
|
||||||
{
|
{
|
||||||
case ESC_N:
|
case ESC_N:
|
||||||
errorcode = ERR71; /* Not supported in a class */
|
errorcode = ERR71;
|
||||||
goto CLASS_ESCAPE_FAILED;
|
goto FAILED;
|
||||||
|
|
||||||
case ESC_H:
|
case ESC_H:
|
||||||
case ESC_h:
|
case ESC_h:
|
||||||
|
@ -3466,14 +3468,14 @@ while (ptr < ptrend)
|
||||||
}
|
}
|
||||||
#else
|
#else
|
||||||
errorcode = ERR45;
|
errorcode = ERR45;
|
||||||
goto CLASS_ESCAPE_FAILED;
|
goto FAILED;
|
||||||
#endif
|
#endif
|
||||||
break; /* End \P and \p */
|
break; /* End \P and \p */
|
||||||
|
|
||||||
default: /* All others are not allowed in a class */
|
default: /* All others are not allowed in a class */
|
||||||
errorcode = ERR7;
|
errorcode = ERR7;
|
||||||
ptr--;
|
ptr--;
|
||||||
goto CLASS_ESCAPE_FAILED;
|
goto FAILED;
|
||||||
}
|
}
|
||||||
|
|
||||||
/* Perl gives a warning unless a following hyphen is the last character
|
/* Perl gives a warning unless a following hyphen is the last character
|
||||||
|
|
|
@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
|
||||||
|
|
||||||
Written by Philip Hazel
|
Written by Philip Hazel
|
||||||
Original API code Copyright (c) 1997-2012 University of Cambridge
|
Original API code Copyright (c) 1997-2012 University of Cambridge
|
||||||
New API code Copyright (c) 2016-2018 University of Cambridge
|
New API code Copyright (c) 2016-2019 University of Cambridge
|
||||||
|
|
||||||
-----------------------------------------------------------------------------
|
-----------------------------------------------------------------------------
|
||||||
Redistribution and use in source and binary forms, with or without
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
@ -71,7 +71,7 @@ static const unsigned char compile_error_texts[] =
|
||||||
/* 5 */
|
/* 5 */
|
||||||
"number too big in {} quantifier\0"
|
"number too big in {} quantifier\0"
|
||||||
"missing terminating ] for character class\0"
|
"missing terminating ] for character class\0"
|
||||||
"invalid escape sequence in character class\0"
|
"escape sequence is invalid in character class\0"
|
||||||
"range out of order in character class\0"
|
"range out of order in character class\0"
|
||||||
"quantifier does not follow a repeatable item\0"
|
"quantifier does not follow a repeatable item\0"
|
||||||
/* 10 */
|
/* 10 */
|
||||||
|
|
|
@ -5304,10 +5304,22 @@ a)"xI
|
||||||
|
|
||||||
/\N{\c/IB,bad_escape_is_literal
|
/\N{\c/IB,bad_escape_is_literal
|
||||||
|
|
||||||
/[\j\x{z}\o\gA-\Nb-\g]/B,bad_escape_is_literal
|
/[\j\x{z}\o\gAb\g]/B,bad_escape_is_literal
|
||||||
|
|
||||||
/[Q-\N]/B,bad_escape_is_literal
|
/[Q-\N]/B,bad_escape_is_literal
|
||||||
|
|
||||||
|
/[\s-_]/bad_escape_is_literal
|
||||||
|
|
||||||
|
/[_-\s]/bad_escape_is_literal
|
||||||
|
|
||||||
|
/[\B\R\X]/B
|
||||||
|
|
||||||
|
/[\B\R\X]/B,bad_escape_is_literal
|
||||||
|
|
||||||
|
/[A-\BP-\RV-\X]/B
|
||||||
|
|
||||||
|
/[A-\BP-\RV-\X]/B,bad_escape_is_literal
|
||||||
|
|
||||||
# ----------------------------------------------------------------------
|
# ----------------------------------------------------------------------
|
||||||
|
|
||||||
/a\b(c/literal
|
/a\b(c/literal
|
||||||
|
|
|
@ -135,13 +135,13 @@ Failed: error 105 at offset 7: number too big in {} quantifier
|
||||||
Failed: error 106 at offset 5: missing terminating ] for character class
|
Failed: error 106 at offset 5: missing terminating ] for character class
|
||||||
|
|
||||||
/[\B]/B
|
/[\B]/B
|
||||||
Failed: error 107 at offset 2: invalid escape sequence in character class
|
Failed: error 107 at offset 2: escape sequence is invalid in character class
|
||||||
|
|
||||||
/[\R]/B
|
/[\R]/B
|
||||||
Failed: error 107 at offset 2: invalid escape sequence in character class
|
Failed: error 107 at offset 2: escape sequence is invalid in character class
|
||||||
|
|
||||||
/[\X]/B
|
/[\X]/B
|
||||||
Failed: error 107 at offset 2: invalid escape sequence in character class
|
Failed: error 107 at offset 2: escape sequence is invalid in character class
|
||||||
|
|
||||||
/[z-a]/
|
/[z-a]/
|
||||||
Failed: error 108 at offset 3: range out of order in character class
|
Failed: error 108 at offset 3: range out of order in character class
|
||||||
|
@ -16224,16 +16224,34 @@ First code unit = 'N'
|
||||||
Last code unit = 'c'
|
Last code unit = 'c'
|
||||||
Subject length lower bound = 3
|
Subject length lower bound = 3
|
||||||
|
|
||||||
/[\j\x{z}\o\gA-\Nb-\g]/B,bad_escape_is_literal
|
/[\j\x{z}\o\gAb\g]/B,bad_escape_is_literal
|
||||||
------------------------------------------------------------------
|
------------------------------------------------------------------
|
||||||
Bra
|
Bra
|
||||||
[A-Nb-gjoxz{}]
|
[Abgjoxz{}]
|
||||||
Ket
|
Ket
|
||||||
End
|
End
|
||||||
------------------------------------------------------------------
|
------------------------------------------------------------------
|
||||||
|
|
||||||
/[Q-\N]/B,bad_escape_is_literal
|
/[Q-\N]/B,bad_escape_is_literal
|
||||||
Failed: error 108 at offset 4: range out of order in character class
|
Failed: error 150 at offset 5: invalid range in character class
|
||||||
|
|
||||||
|
/[\s-_]/bad_escape_is_literal
|
||||||
|
Failed: error 150 at offset 3: invalid range in character class
|
||||||
|
|
||||||
|
/[_-\s]/bad_escape_is_literal
|
||||||
|
Failed: error 150 at offset 5: invalid range in character class
|
||||||
|
|
||||||
|
/[\B\R\X]/B
|
||||||
|
Failed: error 107 at offset 2: escape sequence is invalid in character class
|
||||||
|
|
||||||
|
/[\B\R\X]/B,bad_escape_is_literal
|
||||||
|
Failed: error 107 at offset 2: escape sequence is invalid in character class
|
||||||
|
|
||||||
|
/[A-\BP-\RV-\X]/B
|
||||||
|
Failed: error 107 at offset 4: escape sequence is invalid in character class
|
||||||
|
|
||||||
|
/[A-\BP-\RV-\X]/B,bad_escape_is_literal
|
||||||
|
Failed: error 107 at offset 4: escape sequence is invalid in character class
|
||||||
|
|
||||||
# ----------------------------------------------------------------------
|
# ----------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
Loading…
Reference in New Issue