Implement PCRE2_SUBSTITUTE_{OVERFLOW_LENGTH,UNKNOWN_UNSET}.
parent
215e2185e4
commit
35e0f55783
|
@ -386,6 +386,9 @@ possible to test it.
|
|||
111. "Harden" pcre2test against ridiculously large values in modifiers and
|
||||
command line arguments.
|
||||
|
||||
112. Implemented PCRE2_SUBSTITUTE_UNKNOWN_UNSET and
|
||||
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH.
|
||||
|
||||
|
||||
Version 10.20 30-June-2015
|
||||
--------------------------
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2_SUBSTITUTE 3 "04 December 2015" "PCRE2 10.21"
|
||||
.TH PCRE2_SUBSTITUTE 3 "12 December 2015" "PCRE2 10.21"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.SH SYNOPSIS
|
||||
|
@ -58,6 +58,8 @@ The options are:
|
|||
PCRE2_UTF was set at compile time)
|
||||
PCRE2_SUBSTITUTE_EXTENDED Do extended replacement processing
|
||||
PCRE2_SUBSTITUTE_GLOBAL Replace all occurrences in the subject
|
||||
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH If overflow, compute needed length
|
||||
PCRE2_SUBSTITUTE_UNKNOWN_UNSET Treat unknown group as unset
|
||||
PCRE2_SUBSTITUTE_UNSET_EMPTY Simple unset insert = empty string
|
||||
.sp
|
||||
The function returns the number of substitutions, which may be zero if there
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2API 3 "04 December 2015" "PCRE2 10.21"
|
||||
.TH PCRE2API 3 "12 December 2015" "PCRE2 10.21"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.sp
|
||||
|
@ -2704,12 +2704,20 @@ functions from the match context, if provided, or else those that were used to
|
|||
allocate memory for the compiled code.
|
||||
.P
|
||||
The \fIoutlengthptr\fP argument must point to a variable that contains the
|
||||
length, in code units, of the output buffer. If the function is successful,
|
||||
the value is updated to contain the length of the new string, excluding the
|
||||
trailing zero that is automatically added. If the function is not successful,
|
||||
the value is set to PCRE2_UNSET for general errors (such as output buffer too
|
||||
small). For syntax errors in the replacement string, the value is set to the
|
||||
offset in the replacement string where the error was detected.
|
||||
length, in code units, of the output buffer. If the function is successful, the
|
||||
value is updated to contain the length of the new string, excluding the
|
||||
trailing zero that is automatically added.
|
||||
.P
|
||||
If the function is not successful, the value set via \fIoutlengthptr\fP depends
|
||||
on the type of error. For syntax errors in the replacement string, the value is
|
||||
the offset in the replacement string where the error was detected. For other
|
||||
errors, the value is PCRE2_UNSET by default. This includes the case of the
|
||||
output buffer being too small, unless PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set
|
||||
(see below), in which case the value is the minimum length needed, including
|
||||
space for the trailing zero. Note that in order to compute the required length,
|
||||
\fBpcre2_substitute()\fP has to simulate all the matching and copying, instead
|
||||
of giving an error return as soon as the buffer overflows. Note also that the
|
||||
length is in code units, not bytes.
|
||||
.P
|
||||
In the replacement string, which is interpreted as a UTF string in UTF mode,
|
||||
and is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option is set, a
|
||||
|
@ -2734,7 +2742,8 @@ simultaneous substitutions, as this \fBpcre2test\fP example shows:
|
|||
apple lemon
|
||||
2: pear orange
|
||||
.sp
|
||||
Three additional options are available:
|
||||
As well as the usual options for \fBpcre2_match()\fP, a number of additional
|
||||
options can be set in the \fIoptions\fP argument.
|
||||
.P
|
||||
PCRE2_SUBSTITUTE_GLOBAL causes the function to iterate over the subject string,
|
||||
replacing every matching substring. If this is not set, only the first matching
|
||||
|
@ -2745,10 +2754,30 @@ advanced by one character except when CRLF is a valid newline sequence and the
|
|||
next two characters are CR, LF. In this case, the current position is advanced
|
||||
by two characters.
|
||||
.P
|
||||
PCRE2_SUBSTITUTE_UNSET_EMPTY causes unset capturing groups to be treated as
|
||||
empty strings when inserted as described above. If this option is not set, an
|
||||
attempt to insert an unset group causes the PCRE2_ERROR_UNSET error. This
|
||||
option does not influence the extended substitution syntax described below.
|
||||
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when the output buffer is
|
||||
too small. The default action is to return PCRE2_ERROR_NOMEMORY immediately. If
|
||||
this option is set, however, \fBpcre2_substitute()\fP continues to go through
|
||||
the motions of matching and substituting (without, of course, writing anything)
|
||||
in order to compute the size of buffer that is needed. This value is passed
|
||||
back via the \fIoutlengthptr\fP variable, with the result of the function still
|
||||
being PCRE2_ERROR_NOMEMORY.
|
||||
.P
|
||||
Passing a buffer size of zero is a permitted way of finding out how much memory
|
||||
is needed for a given substitution. However, this does mean that the entire
|
||||
operation is carried out twice. Depending on the application, it may be more
|
||||
efficient to allocate a large buffer and free the excess afterwards, instead of
|
||||
using PCRE2_SUBSTITUTE_OVERFLOW_LENGTH.
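The two-pass pattern described in this paragraph can be sketched as follows. This is only an illustration under stated assumptions: fake_substitute() and EX_ERROR_NOMEMORY are hypothetical stand-ins, not PCRE2 names. They mimic the length-reporting convention of a pcre2_substitute() call made with PCRE2_SUBSTITUTE_OVERFLOW_LENGTH (needed length includes the trailing zero; length on success excludes it) and do no real matching. Code units are assumed to be 8 bits wide.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for PCRE2_ERROR_NOMEMORY; the real value is not used. */
#define EX_ERROR_NOMEMORY (-1)

/* Hypothetical stand-in for a substitute call with the overflow-length option.
   On overflow it stores the required size (including the trailing zero) in
   *outlen and returns the error; on success it stores the length of the new
   string (excluding the trailing zero) and returns the substitution count. */
static int fake_substitute(char *out, size_t *outlen)
{
  static const char result[] = "123XYZ123";
  size_t needed = sizeof result;            /* 9 code units plus the zero */
  assert(outlen != NULL);
  if (*outlen < needed)
    {
    *outlen = needed;
    return EX_ERROR_NOMEMORY;
    }
  memcpy(out, result, needed);
  *outlen = needed - 1;
  return 1;
}

/* First pass with a zero-length buffer to learn the size, then a second pass
   into a buffer of exactly that size. The caller frees the result. */
static char *substitute_alloc(size_t *final_len)
{
  size_t len = 0;
  char *buf;
  int rc = fake_substitute(NULL, &len);     /* size-only pass */
  if (rc != EX_ERROR_NOMEMORY) return NULL;
  buf = malloc(len);
  if (buf == NULL) return NULL;
  rc = fake_substitute(buf, &len);          /* real pass */
  if (rc < 0)
    {
    free(buf);
    return NULL;
    }
  *final_len = len;
  return buf;
}
```

As the paragraph notes, the work is done twice in this pattern, so a single call with a generously sized buffer may be cheaper in some applications.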
|
||||
.P
|
||||
PCRE2_SUBSTITUTE_UNKNOWN_UNSET causes references to capturing groups that do
|
||||
not appear in the pattern to be treated as unset groups. This option should be
|
||||
used with care, because it means that a typo in a group name or number no
|
||||
longer causes the PCRE2_ERROR_NOSUBSTRING error.
|
||||
.P
|
||||
PCRE2_SUBSTITUTE_UNSET_EMPTY causes unset capturing groups (including unknown
|
||||
groups when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) to be treated as empty
|
||||
strings when inserted as described above. If this option is not set, an attempt
|
||||
to insert an unset group causes the PCRE2_ERROR_UNSET error. This option does
|
||||
not influence the extended substitution syntax described below.
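The interaction of PCRE2_SUBSTITUTE_UNKNOWN_UNSET and PCRE2_SUBSTITUTE_UNSET_EMPTY for simple (non-extended) insertions, as described in the two paragraphs above, can be summarized as a small decision function. This is a sketch only: the enum and function names are hypothetical and do not appear in PCRE2, and the two flags are passed as plain integers rather than option bits.

```c
#include <assert.h>

/* Hypothetical classification of a group reference in the replacement. */
typedef enum { GROUP_SET, GROUP_UNSET, GROUP_UNKNOWN } group_state;

/* Hypothetical outcomes: FAIL_UNSET corresponds to PCRE2_ERROR_UNSET and
   FAIL_NOSUBSTRING to PCRE2_ERROR_NOSUBSTRING. */
typedef enum { INSERT_VALUE, INSERT_EMPTY, FAIL_UNSET, FAIL_NOSUBSTRING } outcome;

static outcome simple_insert_outcome(group_state g, int unknown_unset,
  int unset_empty)
{
  assert(g == GROUP_SET || g == GROUP_UNSET || g == GROUP_UNKNOWN);
  if (g == GROUP_UNKNOWN)
    {
    if (!unknown_unset) return FAIL_NOSUBSTRING;
    g = GROUP_UNSET;               /* unknown group is treated as unset */
    }
  if (g == GROUP_UNSET) return unset_empty ? INSERT_EMPTY : FAIL_UNSET;
  return INSERT_VALUE;
}
```

Note that setting only PCRE2_SUBSTITUTE_UNKNOWN_UNSET changes the error for a mistyped group from "no substring" to "unset"; both options are needed for the reference to vanish silently.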
|
||||
.P
|
||||
PCRE2_SUBSTITUTE_EXTENDED causes extra processing to be applied to the
|
||||
replacement string. Without this option, only the dollar character is special,
|
||||
|
@ -2800,26 +2829,38 @@ string remains in force afterwards, as shown in this \fBpcre2test\fP example:
|
|||
1: HELLO
|
||||
.sp
|
||||
The PCRE2_SUBSTITUTE_UNSET_EMPTY option does not affect these extended
|
||||
substitutions.
|
||||
substitutions. However, PCRE2_SUBSTITUTE_UNKNOWN_UNSET does cause unknown
|
||||
groups in the extended syntax forms to be treated as unset.
|
||||
.P
|
||||
If successful, the function returns the number of replacements that were made.
|
||||
This may be zero if no matches were found, and is never greater than 1 unless
|
||||
PCRE2_SUBSTITUTE_GLOBAL is set.
|
||||
If successful, \fBpcre2_substitute()\fP returns the number of replacements that
|
||||
were made. This may be zero if no matches were found, and is never greater than
|
||||
1 unless PCRE2_SUBSTITUTE_GLOBAL is set.
|
||||
.P
|
||||
In the event of an error, a negative error code is returned. Except for
|
||||
PCRE2_ERROR_NOMATCH (which is never returned), errors from \fBpcre2_match()\fP
|
||||
are passed straight back. PCRE2_ERROR_NOSUBSTRING is returned for a
|
||||
non-existent substring insertion, and PCRE2_ERROR_UNSET is returned for an
|
||||
unset substring insertion when the simple (non-extended) syntax is used and
|
||||
PCRE2_SUBSTITUTE_UNSET_EMPTY is not set. PCRE2_ERROR_NOMEMORY is returned if
|
||||
the output buffer is not big enough. PCRE2_ERROR_BADREPLACEMENT is used for
|
||||
miscellaneous syntax errors in the replacement string, with more particular
|
||||
errors being PCRE2_ERROR_BADREPESCAPE (invalid escape sequence),
|
||||
PCRE2_ERROR_REPMISSING_BRACE (closing curly bracket not found),
|
||||
PCRE2_BADSUBSTITUTION (syntax error in extended group substitution), and
|
||||
PCRE2_BADSUBPATTERN (the pattern match ended before it started). As for all
|
||||
PCRE2 errors, a text message that describes the error can be obtained by
|
||||
calling \fBpcre2_get_error_message()\fP.
|
||||
are passed straight back.
|
||||
.P
|
||||
PCRE2_ERROR_NOSUBSTRING is returned for a non-existent substring insertion,
|
||||
unless PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set.
|
||||
.P
|
||||
PCRE2_ERROR_UNSET is returned for an unset substring insertion (including an
|
||||
unknown substring when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) when the simple
|
||||
(non-extended) syntax is used and PCRE2_SUBSTITUTE_UNSET_EMPTY is not set.
|
||||
.P
|
||||
PCRE2_ERROR_NOMEMORY is returned if the output buffer is not big enough. If the
|
||||
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set, the size of buffer that is
|
||||
needed is returned via \fIoutlengthptr\fP. Note that this does not happen by
|
||||
default.
|
||||
.P
|
||||
PCRE2_ERROR_BADREPLACEMENT is used for miscellaneous syntax errors in the
|
||||
replacement string, with more particular errors being PCRE2_ERROR_BADREPESCAPE
|
||||
(invalid escape sequence), PCRE2_ERROR_REPMISSING_BRACE (closing curly bracket
|
||||
not found), PCRE2_ERROR_BADSUBSTITUTION (syntax error in extended group
|
||||
substitution), and PCRE2_ERROR_BADSUBSPATTERN (the pattern match ended before it
|
||||
started, which can happen if \eK is used in an assertion).
|
||||
.P
|
||||
As for all PCRE2 errors, a text message that describes the error can be
|
||||
obtained by calling \fBpcre2_get_error_message()\fP.
|
||||
.
|
||||
.
|
||||
.SH "DUPLICATE SUBPATTERN NAMES"
|
||||
|
@ -3113,6 +3154,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 04 December 2015
|
||||
Last updated: 21 December 2015
|
||||
Copyright (c) 1997-2015 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2TEST 1 "04 December 2015" "PCRE 10.21"
|
||||
.TH PCRE2TEST 1 "12 December 2015" "PCRE 10.21"
|
||||
.SH NAME
|
||||
pcre2test - a program for testing Perl-compatible regular expressions.
|
||||
.SH SYNOPSIS
|
||||
|
@ -863,6 +863,8 @@ compilation process.
|
|||
replace=<string> specify a replacement string
|
||||
startchar show starting character when relevant
|
||||
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
|
||||
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
|
||||
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
|
||||
substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY
|
||||
.sp
|
||||
These modifiers may not appear in a \fB#pattern\fP command. If you want them as
|
||||
|
@ -963,6 +965,8 @@ pattern.
|
|||
startchar show startchar when relevant
|
||||
startoffset=<n> same as offset=<n>
|
||||
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
|
||||
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
|
||||
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
|
||||
substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY
|
||||
zero_terminate pass the subject as zero-terminated
|
||||
.sp
|
||||
|
@ -1107,10 +1111,15 @@ the appropriate code unit width. If it is not a valid UTF-8 string, the
|
|||
individual code units are copied directly. This provides a means of passing an
|
||||
invalid UTF-8 string for testing purposes.
|
||||
.P
|
||||
If the \fBglobal\fP modifier is set, PCRE2_SUBSTITUTE_GLOBAL is passed to
|
||||
\fBpcre2_substitute()\fP. The \fBsubstitute_extended\fP and
|
||||
\fBsubstitute_unset_empty\fP modifiers set PCRE2_SUBSTITUTE_EXTENDED and
|
||||
PCRE2_SUBSTITUTE_UNSET_EMPTY, respectively.
|
||||
The following modifiers set options (in addition to the normal match options)
|
||||
for \fBpcre2_substitute()\fP:
|
||||
.sp
|
||||
global PCRE2_SUBSTITUTE_GLOBAL
|
||||
substitute_extended PCRE2_SUBSTITUTE_EXTENDED
|
||||
substitute_overflow_length PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
|
||||
substitute_unknown_unset PCRE2_SUBSTITUTE_UNKNOWN_UNSET
|
||||
substitute_unset_empty PCRE2_SUBSTITUTE_UNSET_EMPTY
|
||||
.sp
|
||||
.P
|
||||
After a successful substitution, the modified string is output, preceded by the
|
||||
number of replacements. This may be zero if there were no matches. Here is a
|
||||
|
@ -1135,6 +1144,19 @@ character. Here is an example that tests the edge case:
|
|||
123abc123\e=replace=[9]XYZ
|
||||
Failed: error -47: no more memory
|
||||
.sp
|
||||
The default action of \fBpcre2_substitute()\fP is to return
|
||||
PCRE2_ERROR_NOMEMORY when the output buffer is too small. However, if the
|
||||
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the
|
||||
\fBsubstitute_overflow_length\fP modifier), \fBpcre2_substitute()\fP continues
|
||||
to go through the motions of matching and substituting, in order to compute the
|
||||
size of buffer that is required. When this happens, \fBpcre2test\fP shows the
|
||||
required buffer length (which includes space for the trailing zero) as part of
|
||||
the error message. For example:
|
||||
.sp
|
||||
/abc/substitute_overflow_length
|
||||
123abc123\e=replace=[9]XYZ
|
||||
Failed: error -47: no more memory: 10 code units are needed
|
||||
.sp
|
||||
A replacement string is ignored with POSIX and DFA matching. Specifying partial
|
||||
matching provokes an error return ("bad option value") from
|
||||
\fBpcre2_substitute()\fP.
|
||||
|
@ -1618,6 +1640,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 04 December 2015
|
||||
Last updated: 12 December 2015
|
||||
Copyright (c) 1997-2015 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -151,6 +151,8 @@ sanity checks). */
|
|||
#define PCRE2_SUBSTITUTE_GLOBAL 0x00000100u
|
||||
#define PCRE2_SUBSTITUTE_EXTENDED 0x00000200u
|
||||
#define PCRE2_SUBSTITUTE_UNSET_EMPTY 0x00000400u
|
||||
#define PCRE2_SUBSTITUTE_UNKNOWN_UNSET 0x00000800u
|
||||
#define PCRE2_SUBSTITUTE_OVERFLOW_LENGTH 0x00001000u
|
||||
|
||||
/* Newline and \R settings, for use in compile contexts. The newline values
|
||||
must be kept in step with values set in config.h and both sets must all be
|
||||
|
|
|
@ -151,6 +151,8 @@ sanity checks). */
|
|||
#define PCRE2_SUBSTITUTE_GLOBAL 0x00000100u
|
||||
#define PCRE2_SUBSTITUTE_EXTENDED 0x00000200u
|
||||
#define PCRE2_SUBSTITUTE_UNSET_EMPTY 0x00000400u
|
||||
#define PCRE2_SUBSTITUTE_UNKNOWN_UNSET 0x00000800u
|
||||
#define PCRE2_SUBSTITUTE_OVERFLOW_LENGTH 0x00001000u
|
||||
|
||||
/* Newline and \R settings, for use in compile contexts. The newline values
|
||||
must be kept in step with values set in config.h and both sets must all be
|
||||
|
|
|
@ -47,6 +47,12 @@ POSSIBILITY OF SUCH DAMAGE.
|
|||
|
||||
#define PTR_STACK_SIZE 20
|
||||
|
||||
#define SUBSTITUTE_OPTIONS \
|
||||
(PCRE2_SUBSTITUTE_EXTENDED|PCRE2_SUBSTITUTE_GLOBAL| \
|
||||
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH|PCRE2_SUBSTITUTE_UNKNOWN_UNSET| \
|
||||
PCRE2_SUBSTITUTE_UNSET_EMPTY)
|
||||
|
||||
|
||||
|
||||
/*************************************************
|
||||
* Find end of substitute text *
|
||||
|
@ -181,6 +187,30 @@ Returns: >= 0 number of substitutions made
|
|||
PCRE2_ERROR_BADREPLACEMENT means invalid use of $
|
||||
*/
|
||||
|
||||
/* This macro checks for space in the buffer before copying into it. On
|
||||
overflow, either give an error immediately, or keep on, accumulating the
|
||||
length. */
|
||||
|
||||
#define CHECKMEMCPY(from,length) \
|
||||
if (!overflowed && lengthleft < length) \
|
||||
{ \
|
||||
if ((suboptions & PCRE2_SUBSTITUTE_OVERFLOW_LENGTH) == 0) goto NOROOM; \
|
||||
overflowed = TRUE; \
|
||||
extra_needed = length - lengthleft; \
|
||||
} \
|
||||
else if (overflowed) \
|
||||
{ \
|
||||
extra_needed += length; \
|
||||
} \
|
||||
else \
|
||||
{ \
|
||||
memcpy(buffer + buff_offset, from, CU2BYTES(length)); \
|
||||
buff_offset += length; \
|
||||
lengthleft -= length; \
|
||||
}
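The behaviour of the CHECKMEMCPY macro can be illustrated as an ordinary function. This is a standalone sketch, not code from the commit: the outbuf struct and the function names are hypothetical, the want_length flag stands in for the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH test on suboptions, a return of -1 stands in for "goto NOROOM", and code units are assumed to be single bytes (so CU2BYTES is omitted).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical state holder; field names mirror the locals used by the macro. */
typedef struct {
  char  *buffer;        /* output buffer */
  size_t buff_length;   /* total capacity in code units */
  size_t buff_offset;   /* code units written so far */
  size_t lengthleft;    /* code units still free */
  size_t extra_needed;  /* code units that did not fit */
  int    overflowed;    /* nonzero once an overflow has happened */
  int    want_length;   /* stands in for PCRE2_SUBSTITUTE_OVERFLOW_LENGTH */
} outbuf;

static void outbuf_init(outbuf *ob, char *buffer, size_t size, int want_length)
{
  assert(ob != NULL);
  ob->buffer = buffer;
  ob->buff_length = ob->lengthleft = size;
  ob->buff_offset = 0;
  ob->extra_needed = 0;
  ob->overflowed = 0;
  ob->want_length = want_length;
}

/* Returns 0 to carry on, or -1 where the macro would jump to NOROOM. */
static int checked_copy(outbuf *ob, const char *from, size_t length)
{
  assert(ob != NULL);
  if (!ob->overflowed && ob->lengthleft < length)
    {
    if (!ob->want_length) return -1;           /* immediate error */
    ob->overflowed = 1;
    ob->extra_needed = length - ob->lengthleft;
    }
  else if (ob->overflowed)
    {
    ob->extra_needed += length;                /* count only, write nothing */
    }
  else
    {
    if (length > 0) memcpy(ob->buffer + ob->buff_offset, from, length);
    ob->buff_offset += length;
    ob->lengthleft -= length;
    }
  return 0;
}

/* Length reported after an overflow: capacity plus everything that did not fit. */
static size_t needed_length(const outbuf *ob)
{
  return ob->buff_length + ob->extra_needed;
}
```

With a 9-unit buffer, copying "123", "XYZ", "123" and then a trailing zero overflows only on the last unit, so the reported length is 10, which agrees with the pcre2test example elsewhere in this commit.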
|
||||
|
||||
/* Here's the function */
|
||||
|
||||
PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION
|
||||
pcre2_substitute(const pcre2_code *code, PCRE2_SPTR subject, PCRE2_SIZE length,
|
||||
PCRE2_SIZE start_offset, uint32_t options, pcre2_match_data *match_data,
|
||||
|
@ -193,20 +223,22 @@ int forcecase = 0;
|
|||
int forcecasereset = 0;
|
||||
uint32_t ovector_count;
|
||||
uint32_t goptions = 0;
|
||||
uint32_t suboptions;
|
||||
BOOL match_data_created = FALSE;
|
||||
BOOL global = FALSE;
|
||||
BOOL extended = FALSE;
|
||||
BOOL literal = FALSE;
|
||||
BOOL uempty = FALSE; /* Unset/unknown groups => empty string */
|
||||
BOOL overflowed = FALSE;
|
||||
#ifdef SUPPORT_UNICODE
|
||||
BOOL utf = (code->overall_options & PCRE2_UTF) != 0;
|
||||
#endif
|
||||
PCRE2_UCHAR temp[6];
|
||||
PCRE2_SPTR ptr;
|
||||
PCRE2_SPTR repend;
|
||||
PCRE2_SIZE extra_needed = 0;
|
||||
PCRE2_SIZE buff_offset, buff_length, lengthleft, fraglength;
|
||||
PCRE2_SIZE *ovector;
|
||||
|
||||
buff_length = *blength;
|
||||
buff_offset = 0;
|
||||
lengthleft = buff_length = *blength;
|
||||
*blength = PCRE2_UNSET;
|
||||
|
||||
/* Partial matching is not valid. */
|
||||
|
@ -248,33 +280,14 @@ if (utf && (options & PCRE2_NO_UTF_CHECK) == 0)
|
|||
}
|
||||
#endif /* SUPPORT_UNICODE */
|
||||
|
||||
/* Notice the global and extended options and remove them from the options that
|
||||
are passed to pcre2_match(). */
|
||||
/* Save the substitute options and remove them from the match options. */
|
||||
|
||||
if ((options & PCRE2_SUBSTITUTE_GLOBAL) != 0)
|
||||
{
|
||||
options &= ~PCRE2_SUBSTITUTE_GLOBAL;
|
||||
global = TRUE;
|
||||
}
|
||||
|
||||
if ((options & PCRE2_SUBSTITUTE_EXTENDED) != 0)
|
||||
{
|
||||
options &= ~PCRE2_SUBSTITUTE_EXTENDED;
|
||||
extended = TRUE;
|
||||
}
|
||||
|
||||
if ((options & PCRE2_SUBSTITUTE_UNSET_EMPTY) != 0)
|
||||
{
|
||||
options &= ~PCRE2_SUBSTITUTE_UNSET_EMPTY;
|
||||
uempty = TRUE;
|
||||
}
|
||||
suboptions = options & SUBSTITUTE_OPTIONS;
|
||||
options &= ~SUBSTITUTE_OPTIONS;
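The effect of the two masking statements above can be checked in isolation. The sketch below copies the PCRE2_SUBSTITUTE_* bit values and the SUBSTITUTE_OPTIONS mask from the hunks in this commit; EXAMPLE_MATCH_OPTION, the split_options struct and the function name are hypothetical and exist only for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as shown in the header hunk of this commit. */
#define PCRE2_SUBSTITUTE_GLOBAL           0x00000100u
#define PCRE2_SUBSTITUTE_EXTENDED         0x00000200u
#define PCRE2_SUBSTITUTE_UNSET_EMPTY      0x00000400u
#define PCRE2_SUBSTITUTE_UNKNOWN_UNSET    0x00000800u
#define PCRE2_SUBSTITUTE_OVERFLOW_LENGTH  0x00001000u

#define SUBSTITUTE_OPTIONS \
  (PCRE2_SUBSTITUTE_EXTENDED|PCRE2_SUBSTITUTE_GLOBAL| \
   PCRE2_SUBSTITUTE_OVERFLOW_LENGTH|PCRE2_SUBSTITUTE_UNKNOWN_UNSET| \
   PCRE2_SUBSTITUTE_UNSET_EMPTY)

/* Hypothetical stand-in for any option bit meant for pcre2_match(). */
#define EXAMPLE_MATCH_OPTION 0x00000001u

typedef struct {
  uint32_t suboptions;    /* bits kept for the substitute logic */
  uint32_t matchoptions;  /* bits that would be passed on to the matcher */
} split_options;

static split_options split_substitute_options(uint32_t options)
{
  split_options s;
  s.suboptions = options & SUBSTITUTE_OPTIONS;
  s.matchoptions = options & ~SUBSTITUTE_OPTIONS;
  assert((s.suboptions & s.matchoptions) == 0);   /* the two sets are disjoint */
  return s;
}
```

Keeping all substitute bits in one word is what lets later tests such as (suboptions & PCRE2_SUBSTITUTE_GLOBAL) != 0 replace the separate BOOL variables that this commit removes.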
|
||||
|
||||
/* Copy up to the start offset */
|
||||
|
||||
if (start_offset > buff_length) goto NOROOM;
|
||||
memcpy(buffer, subject, start_offset * (PCRE2_CODE_UNIT_WIDTH/8));
|
||||
buff_offset = start_offset;
|
||||
lengthleft = buff_length - start_offset;
|
||||
CHECKMEMCPY(subject, start_offset);
|
||||
|
||||
/* Loop for global substituting. */
|
||||
|
||||
|
@ -330,13 +343,11 @@ do
|
|||
#endif
|
||||
}
|
||||
|
||||
fraglength = start_offset - save_start;
|
||||
if (lengthleft < fraglength) goto NOROOM;
|
||||
memcpy(buffer + buff_offset, subject + save_start,
|
||||
fraglength*(PCRE2_CODE_UNIT_WIDTH/8));
|
||||
buff_offset += fraglength;
|
||||
lengthleft -= fraglength;
|
||||
/* Copy what we have advanced past, reset the special global options, and
|
||||
continue to the next match. */
|
||||
|
||||
fraglength = start_offset - save_start;
|
||||
CHECKMEMCPY(subject + save_start, fraglength);
|
||||
goptions = 0;
|
||||
continue;
|
||||
}
|
||||
|
@ -350,25 +361,21 @@ do
|
|||
goto EXIT;
|
||||
}
|
||||
|
||||
/* Paranoid check for integer overflow; surely no real call to this function
|
||||
would ever hit this! */
|
||||
/* Count substitutions with a paranoid check for integer overflow; surely no
|
||||
real call to this function would ever hit this! */
|
||||
|
||||
if (subs == INT_MAX)
|
||||
{
|
||||
rc = PCRE2_ERROR_TOOMANYREPLACE;
|
||||
goto EXIT;
|
||||
}
|
||||
|
||||
/* Count substitutions and proceed */
|
||||
|
||||
subs++;
|
||||
|
||||
/* Copy the text leading up to the match. */
|
||||
|
||||
if (rc == 0) rc = ovector_count;
|
||||
fraglength = ovector[0] - start_offset;
|
||||
if (fraglength >= lengthleft) goto NOROOM;
|
||||
memcpy(buffer + buff_offset, subject + start_offset,
|
||||
fraglength*(PCRE2_CODE_UNIT_WIDTH/8));
|
||||
buff_offset += fraglength;
|
||||
lengthleft -= fraglength;
|
||||
CHECKMEMCPY(subject + start_offset, fraglength);
|
||||
|
||||
/* Process the replacement string. Literal mode is set by \Q, but only in
|
||||
extended mode when backslashes are being interpreted. In extended mode we
|
||||
|
@ -378,12 +385,13 @@ do
|
|||
for (;;)
|
||||
{
|
||||
uint32_t ch;
|
||||
unsigned int chlen;
|
||||
|
||||
/* If at the end of a nested substring, pop the stack. */
|
||||
|
||||
if (ptr >= repend)
|
||||
{
|
||||
if (ptrstackptr <= 0) break;
|
||||
if (ptrstackptr <= 0) break; /* End of replacement string */
|
||||
repend = ptrstack[--ptrstackptr];
|
||||
ptr = ptrstack[--ptrstackptr];
|
||||
continue;
|
||||
|
@ -450,15 +458,25 @@ do
|
|||
group = group * 10 + next - CHAR_0;
|
||||
|
||||
/* A check for a number greater than the highest captured group
|
||||
is sufficient here; no need for a separate overflow check. */
|
||||
is sufficient here; no need for a separate overflow check. If unknown
|
||||
groups are to be treated as unset, just skip over any remaining
|
||||
digits and carry on. */
|
||||
|
||||
if (group > code->top_bracket)
|
||||
{
|
||||
if ((suboptions & PCRE2_SUBSTITUTE_UNKNOWN_UNSET) != 0)
|
||||
{
|
||||
while (++ptr < repend && *ptr >= CHAR_0 && *ptr <= CHAR_9);
|
||||
break;
|
||||
}
|
||||
else
|
||||
{
|
||||
rc = PCRE2_ERROR_NOSUBSTRING;
|
||||
goto PTREXIT;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
else
|
||||
{
|
||||
const uint8_t *ctypes = code->tables + ctypes_offset;
|
||||
|
@ -478,7 +496,8 @@ do
|
|||
|
||||
if (inparens)
|
||||
{
|
||||
if (extended && !star && ptr < repend - 2 && next == CHAR_COLON)
|
||||
if ((suboptions & PCRE2_SUBSTITUTE_EXTENDED) != 0 &&
|
||||
!star && ptr < repend - 2 && next == CHAR_COLON)
|
||||
{
|
||||
special = *(++ptr);
|
||||
if (special != CHAR_PLUS && special != CHAR_MINUS)
|
||||
|
@ -513,8 +532,8 @@ do
|
|||
ptr++;
|
||||
}
|
||||
|
||||
/* Have found a syntactically correct group number or name, or
|
||||
*name. Only *MARK is currently recognized. */
|
||||
/* Have found a syntactically correct group number or name, or *name.
|
||||
Only *MARK is currently recognized. */
|
||||
|
||||
if (star)
|
||||
{
|
||||
|
@ -523,11 +542,10 @@ do
|
|||
PCRE2_SPTR mark = pcre2_get_mark(match_data);
|
||||
if (mark != NULL)
|
||||
{
|
||||
while (*mark != 0)
|
||||
{
|
||||
if (lengthleft-- < 1) goto NOROOM;
|
||||
buffer[buff_offset++] = *mark++;
|
||||
}
|
||||
PCRE2_SPTR mark_start = mark;
|
||||
while (*mark != 0) mark++;
|
||||
fraglength = mark - mark_start;
|
||||
CHECKMEMCPY(mark_start, fraglength);
|
||||
}
|
||||
}
|
||||
else goto BAD;
|
||||
|
@ -541,12 +559,21 @@ do
|
|||
PCRE2_SPTR subptr, subptrend;
|
||||
|
||||
/* Find a number for a named group. In case there are duplicate names,
|
||||
search for the first one that is set. */
|
||||
search for the first one that is set. If the name is not found when
|
||||
PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set, set the group number to a
|
||||
non-existent group. */
|
||||
|
||||
if (group < 0)
|
||||
{
|
||||
PCRE2_SPTR first, last, entry;
|
||||
rc = pcre2_substring_nametable_scan(code, name, &first, &last);
|
||||
if (rc == PCRE2_ERROR_NOSUBSTRING &&
|
||||
(suboptions & PCRE2_SUBSTITUTE_UNKNOWN_UNSET) != 0)
|
||||
{
|
||||
group = code->top_bracket + 1;
|
||||
}
|
||||
else
|
||||
{
|
||||
if (rc < 0) goto PTREXIT;
|
||||
for (entry = first; entry <= last; entry += rc)
|
||||
{
|
||||
|
@ -562,11 +589,12 @@ do
|
|||
}
|
||||
}
|
||||
|
||||
/* If group is still negative, it means we did not find a group that
|
||||
is in the ovector. Just set the first group. */
|
||||
/* If group is still negative, it means we did not find a group
|
||||
that is in the ovector. Just set the first group. */
|
||||
|
||||
if (group < 0) group = GET2(first, 0);
|
||||
}
|
||||
}
|
||||
|
||||
/* We now have a group that is identified by number. Find the length of
|
||||
the captured string. If a group in a non-special substitution is unset
|
||||
|
@ -575,10 +603,15 @@ do
|
|||
rc = pcre2_substring_length_bynumber(match_data, group, &sublength);
|
||||
if (rc < 0)
|
||||
{
|
||||
if (rc == PCRE2_ERROR_NOSUBSTRING &&
|
||||
(suboptions & PCRE2_SUBSTITUTE_UNKNOWN_UNSET) != 0)
|
||||
{
|
||||
rc = PCRE2_ERROR_UNSET;
|
||||
}
|
||||
if (rc != PCRE2_ERROR_UNSET) goto PTREXIT; /* Non-unset errors */
|
||||
if (special == 0) /* Plain substitution */
|
||||
{
|
||||
if (uempty) continue; /* Treat as empty */
|
||||
if ((suboptions & PCRE2_SUBSTITUTE_UNSET_EMPTY) != 0) continue;
|
||||
goto PTREXIT; /* Else error */
|
||||
}
|
||||
}
|
||||
|
@ -646,26 +679,13 @@ do
|
|||
}
|
||||
|
||||
#ifdef SUPPORT_UNICODE
|
||||
if (utf)
|
||||
{
|
||||
unsigned int chlen;
|
||||
#if PCRE2_CODE_UNIT_WIDTH == 8
|
||||
if (lengthleft < 6) goto NOROOM;
|
||||
#elif PCRE2_CODE_UNIT_WIDTH == 16
|
||||
if (lengthleft < 2) goto NOROOM;
|
||||
#else
|
||||
if (lengthleft < 1) goto NOROOM;
|
||||
#endif
|
||||
chlen = PRIV(ord2utf)(ch, buffer + buff_offset);
|
||||
buff_offset += chlen;
|
||||
lengthleft -= chlen;
|
||||
}
|
||||
else
|
||||
if (utf) chlen = PRIV(ord2utf)(ch, temp); else
|
||||
#endif
|
||||
{
|
||||
if (lengthleft-- < 1) goto NOROOM;
|
||||
buffer[buff_offset++] = ch;
|
||||
temp[0] = ch;
|
||||
chlen = 1;
|
||||
}
|
||||
CHECKMEMCPY(temp, chlen);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
@ -675,7 +695,8 @@ do
|
|||
the case-forcing escapes are not supported in pcre2_compile() so must be
|
||||
recognized here. */
|
||||
|
||||
else if (extended && *ptr == CHAR_BACKSLASH)
|
||||
else if ((suboptions & PCRE2_SUBSTITUTE_EXTENDED) != 0 &&
|
||||
*ptr == CHAR_BACKSLASH)
|
||||
{
|
||||
int errorcode = 0;
|
||||
|
||||
|
@ -756,33 +777,19 @@ do
|
|||
)[ch/8] & (1 << (ch%8))) == 0)
|
||||
ch = (code->tables + fcc_offset)[ch];
|
||||
}
|
||||
|
||||
forcecase = forcecasereset;
|
||||
}
|
||||
|
||||
#ifdef SUPPORT_UNICODE
|
||||
if (utf)
|
||||
{
|
||||
unsigned int chlen;
|
||||
#if PCRE2_CODE_UNIT_WIDTH == 8
|
||||
if (lengthleft < 6) goto NOROOM;
|
||||
#elif PCRE2_CODE_UNIT_WIDTH == 16
|
||||
if (lengthleft < 2) goto NOROOM;
|
||||
#else
|
||||
if (lengthleft < 1) goto NOROOM;
|
||||
#endif
|
||||
chlen = PRIV(ord2utf)(ch, buffer + buff_offset);
|
||||
buff_offset += chlen;
|
||||
lengthleft -= chlen;
|
||||
}
|
||||
else
|
||||
if (utf) chlen = PRIV(ord2utf)(ch, temp); else
|
||||
#endif
|
||||
{
|
||||
if (lengthleft-- < 1) goto NOROOM;
|
||||
buffer[buff_offset++] = ch;
|
||||
}
|
||||
}
|
||||
temp[0] = ch;
|
||||
chlen = 1;
|
||||
}
|
||||
CHECKMEMCPY(temp, chlen);
|
||||
} /* End handling a literal code unit */
|
||||
} /* End of loop for scanning the replacement. */
|
||||
|
||||
/* The replacement has been copied to the output. Update the start offset to
|
||||
point to the rest of the subject string. If we matched an empty string,
|
||||
|
@ -791,18 +798,33 @@ do
|
|||
start_offset = ovector[1];
|
||||
goptions = (ovector[0] != ovector[1])? 0 :
|
||||
PCRE2_ANCHORED|PCRE2_NOTEMPTY_ATSTART;
|
||||
} while (global); /* Repeat "do" loop */
|
||||
} while ((suboptions & PCRE2_SUBSTITUTE_GLOBAL) != 0); /* Repeat "do" loop */
|
||||
|
||||
/* Copy the rest of the subject and return the number of substitutions. */
|
||||
/* Copy the rest of the subject. */
|
||||
|
||||
rc = subs;
|
||||
fraglength = length - start_offset;
|
||||
if (fraglength + 1 > lengthleft) goto NOROOM;
|
||||
memcpy(buffer + buff_offset, subject + start_offset,
|
||||
fraglength*(PCRE2_CODE_UNIT_WIDTH/8));
|
||||
buff_offset += fraglength;
|
||||
buffer[buff_offset] = 0;
|
||||
*blength = buff_offset;
|
||||
CHECKMEMCPY(subject + start_offset, fraglength);
|
||||
temp[0] = 0;
|
||||
CHECKMEMCPY(temp , 1);
|
||||
|
||||
/* If overflowed is set it means that PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set,
|
||||
and matching has carried on after a full buffer, in order to compute the length
|
||||
needed. Otherwise, an overflow generates an immediate error return. */
|
||||
|
||||
if (overflowed)
|
||||
{
|
||||
rc = PCRE2_ERROR_NOMEMORY;
|
||||
*blength = buff_length + extra_needed;
|
||||
}
|
||||
|
||||
/* After a successful execution, return the number of substitutions and set the
|
||||
length of buffer used, excluding the trailing zero. */
|
||||
|
||||
else
|
||||
{
|
||||
rc = subs;
|
||||
*blength = buff_offset - 1;
|
||||
}
|
||||
|
||||
EXIT:
|
||||
if (match_data_created) pcre2_match_data_free(match_data);
|
||||
|
|
116
src/pcre2test.c
|
@ -399,7 +399,8 @@ enum { MOD_CTC, /* Applies to a compile context */
|
|||
MOD_STR }; /* Is a string */
|
||||
|
||||
/* Control bits. Some apply to compiling, some to matching, but some can be set
|
||||
either on a pattern or a data line, so they must all be distinct. */
|
||||
either on a pattern or a data line, so they must all be distinct. There are now
|
||||
so many of them that they are split into two fields. */
|
||||
|
||||
#define CTL_AFTERTEXT 0x00000001u
|
||||
#define CTL_ALLAFTERTEXT 0x00000002u
|
||||
|
@ -426,12 +427,22 @@ either on a pattern or a data line, so they must all be distinct. */
|
|||
#define CTL_POSIX 0x00400000u
|
||||
#define CTL_PUSH 0x00800000u
|
||||
#define CTL_STARTCHAR 0x01000000u
|
||||
#define CTL_SUBSTITUTE_EXTENDED 0x02000000u
|
||||
#define CTL_SUBSTITUTE_UNSET_EMPTY 0x04000000u
|
||||
#define CTL_ZERO_TERMINATE 0x08000000u
|
||||
#define CTL_ZERO_TERMINATE 0x02000000u
|
||||
/* Spare 0x04000000u */
|
||||
/* Spare 0x08000000u */
|
||||
/* Spare 0x10000000u */
|
||||
/* Spare 0x20000000u */
|
||||
#define CTL_NL_SET 0x40000000u /* Informational */
|
||||
#define CTL_BSR_SET 0x80000000u /* Informational */
|
||||
|
||||
#define CTL_BSR_SET 0x80000000u /* This is informational */
|
||||
#define CTL_NL_SET 0x40000000u /* This is informational */
|
||||
/* Second control word */
|
||||
|
||||
#define CTL2_SUBSTITUTE_EXTENDED 0x00000001u
|
||||
#define CTL2_SUBSTITUTE_OVERFLOW_LENGTH 0x00000002u
|
||||
#define CTL2_SUBSTITUTE_UNKNOWN_UNSET 0x00000004u
|
||||
#define CTL2_SUBSTITUTE_UNSET_EMPTY 0x00000008u
|
||||
|
||||
/* Combinations */
|
||||
|
||||
#define CTL_DEBUG (CTL_FULLBINCODE|CTL_INFO) /* For setting */
|
||||
#define CTL_ANYINFO (CTL_DEBUG|CTL_BINCODE|CTL_CALLOUT_INFO)
|
||||
|
@ -448,9 +459,12 @@ data line. */
|
|||
CTL_GLOBAL|\
|
||||
CTL_MARK|\
|
||||
CTL_MEMORY|\
|
||||
CTL_STARTCHAR|\
|
||||
CTL_SUBSTITUTE_EXTENDED|\
|
||||
CTL_SUBSTITUTE_UNSET_EMPTY)
|
||||
CTL_STARTCHAR)
|
||||
|
||||
#define CTL2_ALLPD (CTL2_SUBSTITUTE_EXTENDED|\
|
||||
CTL2_SUBSTITUTE_OVERFLOW_LENGTH|\
|
||||
CTL2_SUBSTITUTE_UNKNOWN_UNSET|\
|
||||
CTL2_SUBSTITUTE_UNSET_EMPTY)
|
||||
|
||||
/* Structures for holding modifier information for patterns and subject strings
|
||||
(data). Fields containing modifiers that can be set either for a pattern or a
|
||||
|
@ -460,6 +474,7 @@ same offset in the big table below works for both. */
|
|||
typedef struct patctl { /* Structure for pattern modifiers. */
|
||||
uint32_t options; /* Must be in same position as datctl */
|
||||
uint32_t control; /* Must be in same position as datctl */
|
||||
uint32_t control2; /* Must be in same position as datctl */
|
||||
uint8_t replacement[REPLACE_MODSIZE]; /* So must this */
|
||||
uint32_t jit;
|
||||
uint32_t stackguard_test;
|
||||
|
@ -474,6 +489,7 @@ typedef struct patctl { /* Structure for pattern modifiers. */
|
|||
typedef struct datctl { /* Structure for data line modifiers. */
|
||||
uint32_t options; /* Must be in same position as patctl */
|
||||
uint32_t control; /* Must be in same position as patctl */
|
||||
uint32_t control2; /* Must be in same position as patctl */
|
||||
uint8_t replacement[REPLACE_MODSIZE]; /* So must this */
|
||||
uint32_t cfail[2];
|
||||
int32_t callout_data;
|
||||
|
@@ -592,8 +608,10 @@ static modstruct modlist[] = {
{ "stackguard", MOD_PAT, MOD_INT, 0, PO(stackguard_test) },
{ "startchar", MOD_PND, MOD_CTL, CTL_STARTCHAR, PO(control) },
{ "startoffset", MOD_DAT, MOD_INT, 0, DO(offset) },
{ "substitute_extended", MOD_PND, MOD_CTL, CTL_SUBSTITUTE_EXTENDED, PO(control) },
{ "substitute_unset_empty", MOD_PND, MOD_CTL, CTL_SUBSTITUTE_UNSET_EMPTY, PO(control) },
{ "substitute_extended", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_EXTENDED, PO(control2) },
{ "substitute_overflow_length", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_OVERFLOW_LENGTH, PO(control2) },
{ "substitute_unknown_unset", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_UNKNOWN_UNSET, PO(control2) },
{ "substitute_unset_empty", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_UNSET_EMPTY, PO(control2) },
{ "tables", MOD_PAT, MOD_INT, 0, PO(tables_id) },
{ "ucp", MOD_PATP, MOD_OPT, PCRE2_UCP, PO(options) },
{ "ungreedy", MOD_PAT, MOD_OPT, PCRE2_UNGREEDY, PO(options) },
@@ -613,10 +631,13 @@ static modstruct modlist[] = {
#define POSIX_SUPPORTED_COMPILE_CONTROLS ( \
CTL_AFTERTEXT|CTL_ALLAFTERTEXT|CTL_EXPAND|CTL_POSIX)

#define POSIX_SUPPORTED_COMPILE_CONTROLS2 (0)

#define POSIX_SUPPORTED_MATCH_OPTIONS ( \
PCRE2_NOTBOL|PCRE2_NOTEMPTY|PCRE2_NOTEOL)

#define POSIX_SUPPORTED_MATCH_CONTROLS (CTL_AFTERTEXT|CTL_ALLAFTERTEXT)
#define POSIX_SUPPORTED_MATCH_CONTROLS2 (0)

/* Control bits that are not ignored with 'push'. */

@@ -624,23 +645,27 @@ static modstruct modlist[] = {
CTL_BINCODE|CTL_CALLOUT_INFO|CTL_FULLBINCODE|CTL_HEXPAT|CTL_INFO| \
CTL_JITVERIFY|CTL_MEMORY|CTL_PUSH|CTL_BSR_SET|CTL_NL_SET)

#define PUSH_SUPPORTED_COMPILE_CONTROLS2 (0)

/* Controls that apply only at compile time with 'push'. */

#define PUSH_COMPILE_ONLY_CONTROLS CTL_JITVERIFY
#define PUSH_COMPILE_ONLY_CONTROLS2 (0)

/* Controls that are forbidden with #pop. */

#define NOTPOP_CONTROLS (CTL_HEXPAT|CTL_POSIX|CTL_PUSH)

/* Pattern controls that are mutually exclusive. */
/* Pattern controls that are mutually exclusive. At present these are all in
the first control word. */

static uint32_t exclusive_pat_controls[] = {
CTL_POSIX | CTL_HEXPAT,
CTL_POSIX | CTL_PUSH,
CTL_EXPAND | CTL_HEXPAT };

/* Data controls that are mutually exclusive. */

/* Data controls that are mutually exclusive. At present these are all in the
first control word. */
static uint32_t exclusive_dat_controls[] = {
CTL_ALLUSEDTEXT | CTL_STARTCHAR,
CTL_FINDLIMITS | CTL_NULLCONTEXT };
@@ -3528,15 +3553,16 @@ words.

Arguments:
controls control bits
controls2 more control bits
before text to print before

Returns: nothing
*/

static void
show_controls(uint32_t controls, const char *before)
show_controls(uint32_t controls, uint32_t controls2, const char *before)
{
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
before,
((controls & CTL_AFTERTEXT) != 0)? " aftertext" : "",
((controls & CTL_ALLAFTERTEXT) != 0)? " allaftertext" : "",
@@ -3565,8 +3591,10 @@ fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s
((controls & CTL_POSIX) != 0)? " posix" : "",
((controls & CTL_PUSH) != 0)? " push" : "",
((controls & CTL_STARTCHAR) != 0)? " startchar" : "",
((controls & CTL_SUBSTITUTE_EXTENDED) != 0)? " substitute_extended" : "",
((controls & CTL_SUBSTITUTE_UNSET_EMPTY) != 0)? " substitute_unset_empty" : "",
((controls2 & CTL2_SUBSTITUTE_EXTENDED) != 0)? " substitute_extended" : "",
((controls2 & CTL2_SUBSTITUTE_OVERFLOW_LENGTH) != 0)? " substitute_overflow_length" : "",
((controls2 & CTL2_SUBSTITUTE_UNKNOWN_UNSET) != 0)? " substitute_unknown_unset" : "",
((controls2 & CTL2_SUBSTITUTE_UNSET_EMPTY) != 0)? " substitute_unset_empty" : "",
((controls & CTL_ZERO_TERMINATE) != 0)? " zero_terminate" : "");
}

@@ -4398,14 +4426,15 @@ patlen = p - buffer - 2;
if (!decode_modifiers(p, CTX_PAT, &pat_patctl, NULL)) return PR_SKIP;
utf = (pat_patctl.options & PCRE2_UTF) != 0;

/* Check for mutually exclusive modifiers. */
/* Check for mutually exclusive modifiers. At present, these are all in the
first control word. */

for (k = 0; k < sizeof(exclusive_pat_controls)/sizeof(uint32_t); k++)
{
uint32_t c = pat_patctl.control & exclusive_pat_controls[k];
if (c != 0 && c != (c & (~c+1)))
{
show_controls(c, "** Not allowed together:");
show_controls(c, 0, "** Not allowed together:");
fprintf(outfile, "\n");
return PR_SKIP;
}
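The test `c != 0 && c != (c & (~c+1))` in the hunk above is compact enough to deserve a note: `c & (~c+1)` isolates the lowest set bit of `c`, so the comparison is true only when at least two bits of an exclusive pair survive the mask. A minimal standalone sketch of the same check follows; the `DEMO_*` flags, the `demo_exclusive` table, and both function names are hypothetical stand-ins, not identifiers from pcre2test.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical control bits, standing in for CTL_POSIX and friends. */
#define DEMO_POSIX  0x1u
#define DEMO_HEXPAT 0x2u
#define DEMO_PUSH   0x4u

/* Each entry lists bits that must not be set together. */
static const uint32_t demo_exclusive[] = {
  DEMO_POSIX | DEMO_HEXPAT,
  DEMO_POSIX | DEMO_PUSH };

/* Returns 1 when more than one bit is set in c. (c & (~c + 1)) keeps only
the lowest set bit, so it equals c exactly when c has a single bit set. */
static int demo_more_than_one_bit(uint32_t c)
{
return c != 0 && c != (c & (~c + 1));
}

/* Returns 1 when control contains any forbidden combination. */
static int demo_has_conflict(uint32_t control)
{
size_t k;
for (k = 0; k < sizeof(demo_exclusive)/sizeof(demo_exclusive[0]); k++)
  {
  if (demo_more_than_one_bit(control & demo_exclusive[k])) return 1;
  }
return 0;
}
```

Under these assumptions, `demo_has_conflict(DEMO_POSIX | DEMO_HEXPAT)` reports a conflict, while `DEMO_HEXPAT | DEMO_PUSH` does not, because no table entry contains both bits.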
@@ -4605,9 +4634,11 @@ if ((pat_patctl.control & CTL_POSIX) != 0)
pat_patctl.options & ~POSIX_SUPPORTED_COMPILE_OPTIONS, msg, "");
msg = "";
}
if ((pat_patctl.control & ~POSIX_SUPPORTED_COMPILE_CONTROLS) != 0)
if ((pat_patctl.control & ~POSIX_SUPPORTED_COMPILE_CONTROLS) != 0 ||
(pat_patctl.control2 & ~POSIX_SUPPORTED_COMPILE_CONTROLS2) != 0)
{
show_controls(pat_patctl.control & ~POSIX_SUPPORTED_COMPILE_CONTROLS, msg);
show_controls(pat_patctl.control & ~POSIX_SUPPORTED_COMPILE_CONTROLS,
pat_patctl.control2 & ~POSIX_SUPPORTED_COMPILE_CONTROLS2, msg);
msg = "";
}

@@ -4663,15 +4694,19 @@ if ((pat_patctl.control & CTL_PUSH) != 0)
fprintf(outfile, "** Replacement text is not supported with 'push'.\n");
return PR_OK;
}
if ((pat_patctl.control & ~PUSH_SUPPORTED_COMPILE_CONTROLS) != 0)
if ((pat_patctl.control & ~PUSH_SUPPORTED_COMPILE_CONTROLS) != 0 ||
(pat_patctl.control2 & ~PUSH_SUPPORTED_COMPILE_CONTROLS2) != 0)
{
show_controls(pat_patctl.control & ~PUSH_SUPPORTED_COMPILE_CONTROLS,
pat_patctl.control2 & ~PUSH_SUPPORTED_COMPILE_CONTROLS2,
"** Ignored when compiled pattern is stacked with 'push':");
fprintf(outfile, "\n");
}
if ((pat_patctl.control & PUSH_COMPILE_ONLY_CONTROLS) != 0)
if ((pat_patctl.control & PUSH_COMPILE_ONLY_CONTROLS) != 0 ||
(pat_patctl.control2 & PUSH_COMPILE_ONLY_CONTROLS2) != 0)
{
show_controls(pat_patctl.control & PUSH_COMPILE_ONLY_CONTROLS,
pat_patctl.control2 & PUSH_COMPILE_ONLY_CONTROLS2,
"** Applies only to compile when pattern is stacked with 'push':");
fprintf(outfile, "\n");
}
@@ -5340,6 +5375,7 @@ matching. */
DATCTXCPY(dat_context, default_dat_context);
memcpy(&dat_datctl, &def_datctl, sizeof(datctl));
dat_datctl.control |= (pat_patctl.control & CTL_ALLPD);
dat_datctl.control2 |= (pat_patctl.control2 & CTL2_ALLPD);
strcpy((char *)dat_datctl.replacement, (char *)pat_patctl.replacement);

/* Initialize for scanning the data line. */
@@ -5657,14 +5693,15 @@ ulen = len/code_unit_size; /* Length in code units */
if (p[-1] != 0 && !decode_modifiers(p, CTX_DAT, NULL, &dat_datctl))
return PR_OK;

/* Check for mutually exclusive modifiers. */
/* Check for mutually exclusive modifiers. At present, these are all in the
first control word. */

for (k = 0; k < sizeof(exclusive_dat_controls)/sizeof(uint32_t); k++)
{
c = dat_datctl.control & exclusive_dat_controls[k];
if (c != 0 && c != (c & (~c+1)))
{
show_controls(c, "** Not allowed together:");
show_controls(c, 0, "** Not allowed together:");
fprintf(outfile, "\n");
return PR_OK;
}
@@ -5717,9 +5754,11 @@ if ((pat_patctl.control & CTL_POSIX) != 0)
show_match_options(dat_datctl.options & ~POSIX_SUPPORTED_MATCH_OPTIONS);
msg = "";
}
if ((dat_datctl.control & ~POSIX_SUPPORTED_MATCH_CONTROLS) != 0)
if ((dat_datctl.control & ~POSIX_SUPPORTED_MATCH_CONTROLS) != 0 ||
(dat_datctl.control2 & ~POSIX_SUPPORTED_MATCH_CONTROLS2) != 0)
{
show_controls(dat_datctl.control & ~POSIX_SUPPORTED_MATCH_CONTROLS, msg);
show_controls(dat_datctl.control & ~POSIX_SUPPORTED_MATCH_CONTROLS,
dat_datctl.control2 & ~POSIX_SUPPORTED_MATCH_CONTROLS2, msg);
msg = "";
}

@@ -5891,9 +5930,13 @@ if (dat_datctl.replacement[0] != 0)

xoptions = (((dat_datctl.control & CTL_GLOBAL) == 0)? 0 :
PCRE2_SUBSTITUTE_GLOBAL) |
(((dat_datctl.control & CTL_SUBSTITUTE_EXTENDED) == 0)? 0 :
(((dat_datctl.control2 & CTL2_SUBSTITUTE_EXTENDED) == 0)? 0 :
PCRE2_SUBSTITUTE_EXTENDED) |
(((dat_datctl.control & CTL_SUBSTITUTE_UNSET_EMPTY) == 0)? 0 :
(((dat_datctl.control2 & CTL2_SUBSTITUTE_OVERFLOW_LENGTH) == 0)? 0 :
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH) |
(((dat_datctl.control2 & CTL2_SUBSTITUTE_UNKNOWN_UNSET) == 0)? 0 :
PCRE2_SUBSTITUTE_UNKNOWN_UNSET) |
(((dat_datctl.control2 & CTL2_SUBSTITUTE_UNSET_EMPTY) == 0)? 0 :
PCRE2_SUBSTITUTE_UNSET_EMPTY);

SETCASTPTR(r, rbuffer); /* Sets r8, r16, or r32, as appropriate. */
@@ -5987,12 +6030,16 @@ if (dat_datctl.replacement[0] != 0)

if (rc < 0)
{
PCRE2_SIZE msize;
fprintf(outfile, "Failed: error %d", rc);
if (nsize != PCRE2_UNSET)
if (rc != PCRE2_ERROR_NOMEMORY && nsize != PCRE2_UNSET)
fprintf(outfile, " at offset %ld in replacement", nsize);
fprintf(outfile, ": ");
PCRE2_GET_ERROR_MESSAGE(nsize, rc, pbuffer);
PCHARSV(CASTVAR(void *, pbuffer), 0, nsize, FALSE, outfile);
PCRE2_GET_ERROR_MESSAGE(msize, rc, pbuffer);
PCHARSV(CASTVAR(void *, pbuffer), 0, msize, FALSE, outfile);
if (rc == PCRE2_ERROR_NOMEMORY &&
(xoptions & PCRE2_SUBSTITUTE_OVERFLOW_LENGTH) != 0)
fprintf(outfile, ": %ld code units are needed", nsize);
}
else
{
@@ -6850,7 +6897,8 @@ control blocks must be the same so that common options and controls such as
We cannot test this till runtime because "offsetof" does not work in the
preprocessor. */

if (PO(options) != DO(options) || PO(control) != DO(control))
if (PO(options) != DO(options) || PO(control) != DO(control) ||
PO(control2) != DO(control2))
{
fprintf(stderr, "** Coding error: "
"options and control offsets for pattern and data must be the same.\n");

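The runtime check in the last pcre2test hunk relies on the shared fields sitting at the same offset in both control structures, so that one modifier-table entry can address either. A small sketch of that idea, assuming `PO()` and `DO()` wrap `offsetof` for the pattern and data structures respectively (the hunk's comment suggests so, but their definitions are not shown here). The `demo_*` structures and macros below are simplified, hypothetical stand-ins.

```c
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-ins for patctl and datctl: the first three fields are
shared, the rest differ. */
typedef struct demo_patctl {
  uint32_t options;
  uint32_t control;
  uint32_t control2;
  uint32_t jit;
} demo_patctl;

typedef struct demo_datctl {
  uint32_t options;
  uint32_t control;
  uint32_t control2;
  int32_t  callout_data;
} demo_datctl;

/* Assumed shape of the PO()/DO() helpers. */
#define DEMO_PO(name) offsetof(demo_patctl, name)
#define DEMO_DO(name) offsetof(demo_datctl, name)

/* Returns 1 when every shared field has the same offset in both
structures. offsetof cannot be used in #if, hence a runtime test. */
static int demo_layouts_agree(void)
{
return DEMO_PO(options) == DEMO_DO(options) &&
       DEMO_PO(control) == DEMO_DO(control) &&
       DEMO_PO(control2) == DEMO_DO(control2);
}
```

Adding a field such as `control2` to only one of the two structures, or at different positions, would make `demo_layouts_agree()` return 0, which is the situation the "Coding error" message guards against.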
@@ -4042,8 +4042,6 @@

/(((((a)))))/parens_nest_limit=2

# Tests for pcre2_substitute()

/abc/replace=XYZ
123123
123abc123
@@ -4149,11 +4147,24 @@

/(*:pear)apple|(*:orange)lemon|(*:strawberry)blackberry/g,replace=[22]${*MARK}
apple lemon blackberry
apple lemon blackberry\=substitute_overflow_length

/(*:pear)apple|(*:orange)lemon|(*:strawberry)blackberry/g,replace=[23]${*MARK}
apple lemon blackberry

# End of substitute tests
/abc/
123abc123\=replace=[9]XYZ
123abc123\=substitute_overflow_length,replace=[9]XYZ
123abc123\=substitute_overflow_length,replace=[6]XYZ
123abc123\=substitute_overflow_length,replace=[1]XYZ
123abc123\=substitute_overflow_length,replace=[0]XYZ

/a(b)c/
123abc123\=replace=[9]x$1z
123abc123\=substitute_overflow_length,replace=[9]x$1z
123abc123\=substitute_overflow_length,replace=[6]x$1z
123abc123\=substitute_overflow_length,replace=[1]x$1z
123abc123\=substitute_overflow_length,replace=[0]x$1z

"((?=(?(?=(?(?=(?(?=()))))))))"
a
@@ -4749,12 +4760,24 @@ a)"xI
cat\=replace=>$1<,substitute_unset_empty
xbcom\=replace=>$1<,substitute_unset_empty

/a|(b)c/substitute_extended
cat\=replace=>${2:-xx}<
cat\=replace=>${2:-xx}<,substitute_unknown_unset
cat\=replace=>${X:-xx}<,substitute_unknown_unset

/a|(?'X'b)c/replace=>$X<,substitute_unset_empty
cat
xbcom

/a|(?'X'b)c/replace=>$Y<,substitute_unset_empty
cat
cat\=substitute_unknown_unset
cat\=substitute_unknown_unset,-substitute_unset_empty

/a|(b)c/replace=>$2<,substitute_unset_empty
cat
cat\=substitute_unknown_unset
cat\=substitute_unknown_unset,-substitute_unset_empty

/()()()/use_offset_limit
\=ovector=11000000000

@@ -13432,8 +13432,6 @@ Subject length lower bound = 0
/(((((a)))))/parens_nest_limit=2
Failed: error 119 at offset 3: parentheses are too deeply nested

# Tests for pcre2_substitute()

/abc/replace=XYZ
123123
0: 123123
@@ -13583,12 +13581,36 @@ Failed: error -35 at offset 9 in replacement: invalid replacement string
/(*:pear)apple|(*:orange)lemon|(*:strawberry)blackberry/g,replace=[22]${*MARK}
apple lemon blackberry
Failed: error -48: no more memory
apple lemon blackberry\=substitute_overflow_length
Failed: error -48: no more memory: 23 code units are needed

/(*:pear)apple|(*:orange)lemon|(*:strawberry)blackberry/g,replace=[23]${*MARK}
apple lemon blackberry
3: pear orange strawberry

# End of substitute tests
/abc/
123abc123\=replace=[9]XYZ
Failed: error -48: no more memory
123abc123\=substitute_overflow_length,replace=[9]XYZ
Failed: error -48: no more memory: 10 code units are needed
123abc123\=substitute_overflow_length,replace=[6]XYZ
Failed: error -48: no more memory: 10 code units are needed
123abc123\=substitute_overflow_length,replace=[1]XYZ
Failed: error -48: no more memory: 10 code units are needed
123abc123\=substitute_overflow_length,replace=[0]XYZ
Failed: error -48: no more memory: 10 code units are needed

/a(b)c/
123abc123\=replace=[9]x$1z
Failed: error -48: no more memory
123abc123\=substitute_overflow_length,replace=[9]x$1z
Failed: error -48: no more memory: 10 code units are needed
123abc123\=substitute_overflow_length,replace=[6]x$1z
Failed: error -48: no more memory: 10 code units are needed
123abc123\=substitute_overflow_length,replace=[1]x$1z
Failed: error -48: no more memory: 10 code units are needed
123abc123\=substitute_overflow_length,replace=[0]x$1z
Failed: error -48: no more memory: 10 code units are needed

"((?=(?(?=(?(?=(?(?=()))))))))"
a
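The expected output above shows the overflow behaviour the new option adds: with `substitute_overflow_length`, a too-small buffer still fails with error -48, but the needed size is reported, and that size counts the trailing zero (10 code units for the 9-character result `123XYZ123`, whatever the buffer size given). The sketch below imitates that contract with a plain literal replacement so it can run without the library. `demo_substitute`, `DEMO_ERROR_NOMEMORY`, and `DEMO_UNSET` are hypothetical names, and the function is an illustration of the length-reporting idea, not how `pcre2_substitute()` is implemented.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_ERROR_NOMEMORY (-48)         /* value shown in the test output */
#define DEMO_UNSET          (~(size_t)0)  /* stand-in for PCRE2_UNSET */

/* Replaces the first literal occurrence of find in subject with repl.
On entry *outlen is the size of out; on success it becomes the result
length, excluding the trailing zero, and the return value is the number
of replacements (0 or 1). On overflow the function returns
DEMO_ERROR_NOMEMORY and sets *outlen to the size needed, including the
trailing zero, when want_length is non-zero, or to DEMO_UNSET otherwise. */
static int demo_substitute(const char *subject, const char *find,
  const char *repl, int want_length, char *out, size_t *outlen)
{
size_t slen = strlen(subject);
size_t flen = strlen(find);
size_t rlen = strlen(repl);
const char *hit = strstr(subject, find);
size_t needed = (hit == NULL)? slen : slen - flen + rlen;

if (needed + 1 > *outlen)
  {
  *outlen = want_length? needed + 1 : DEMO_UNSET;
  return DEMO_ERROR_NOMEMORY;
  }

if (hit == NULL) memcpy(out, subject, slen + 1);
else
  {
  size_t prefix = (size_t)(hit - subject);
  memcpy(out, subject, prefix);
  memcpy(out + prefix, repl, rlen);
  /* Copy the rest of the subject together with its terminating zero. */
  memcpy(out + prefix + rlen, hit + flen, slen - prefix - flen + 1);
  }

*outlen = needed;
return hit != NULL;
}
```

A caller would typically try once with whatever buffer it has, and on `DEMO_ERROR_NOMEMORY` allocate `*outlen` code units and call again; the second call then succeeds without guessing a size.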
@@ -15075,15 +15097,35 @@ Failed: error -55 at offset 3 in replacement: requested value is not set
xbcom\=replace=>$1<,substitute_unset_empty
1: x>b<om

/a|(b)c/substitute_extended
cat\=replace=>${2:-xx}<
Failed: error -49 at offset 9 in replacement: unknown substring
cat\=replace=>${2:-xx}<,substitute_unknown_unset
1: c>xx<t
cat\=replace=>${X:-xx}<,substitute_unknown_unset
1: c>xx<t

/a|(?'X'b)c/replace=>$X<,substitute_unset_empty
cat
1: c><t
xbcom
1: x>b<om

/a|(?'X'b)c/replace=>$Y<,substitute_unset_empty
cat
Failed: error -49 at offset 3 in replacement: unknown substring
cat\=substitute_unknown_unset
1: c><t
cat\=substitute_unknown_unset,-substitute_unset_empty
Failed: error -55 at offset 3 in replacement: requested value is not set

/a|(b)c/replace=>$2<,substitute_unset_empty
cat
Failed: error -49 at offset 3 in replacement: unknown substring
cat\=substitute_unknown_unset
1: c><t
cat\=substitute_unknown_unset,-substitute_unset_empty
Failed: error -55 at offset 3 in replacement: requested value is not set

/()()()/use_offset_limit
\=ovector=11000000000

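The three results for `$Y` and `$2` above suggest how the two options stack for a simple `$n` or `$name` insert: an unknown group is an error (-49) unless `substitute_unknown_unset` downgrades it to "unset", and an unset group is an error (-55) unless `substitute_unset_empty` turns it into an empty string. The sketch below encodes just that decision as read from the test output; the `DEMO_*` names and `demo_resolve` are hypothetical, and the real library code may be organised quite differently.

```c
#include <stddef.h>

#define DEMO_UNKNOWN_UNSET      0x1u
#define DEMO_UNSET_EMPTY        0x2u
#define DEMO_ERROR_NOSUBSTRING  (-49)  /* "unknown substring" */
#define DEMO_ERROR_UNSET        (-55)  /* "requested value is not set" */

enum demo_group_state { DEMO_GROUP_UNKNOWN, DEMO_GROUP_UNSET, DEMO_GROUP_SET };

/* Decides what a simple group reference inserts. Returns 0 and stores
the text to insert in *insert, or returns a negative error code and
leaves *insert untouched. value is the captured text for a set group. */
static int demo_resolve(enum demo_group_state state, unsigned int options,
  const char *value, const char **insert)
{
if (state == DEMO_GROUP_UNKNOWN)
  {
  if ((options & DEMO_UNKNOWN_UNSET) == 0) return DEMO_ERROR_NOSUBSTRING;
  state = DEMO_GROUP_UNSET;   /* Treat the unknown group as unset. */
  }

if (state == DEMO_GROUP_UNSET)
  {
  if ((options & DEMO_UNSET_EMPTY) == 0) return DEMO_ERROR_UNSET;
  *insert = "";               /* Unset inserts the empty string. */
  return 0;
  }

*insert = value;
return 0;
}
```

Read this way, `substitute_unknown_unset` on its own only changes which error is reported (-55 rather than -49); it takes `substitute_unset_empty` as well to make an unknown reference insert an empty string, which matches the `c><t` line in the expected output.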