Substitution tests and documentation.
parent b3ac0ffb32
commit c19bd9a377
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2TEST 1 "09 November 2014" "PCRE 10.00"
|
.TH PCRE2TEST 1 "12 November 2014" "PCRE 10.00"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
pcre2test - a program for testing Perl-compatible regular expressions.
|
pcre2test - a program for testing Perl-compatible regular expressions.
|
||||||
.SH SYNOPSIS
|
.SH SYNOPSIS
|
||||||
|
@ -645,6 +645,7 @@ not affect the compilation process.
|
||||||
allusedtext show all consulted text
|
allusedtext show all consulted text
|
||||||
/g global global matching
|
/g global global matching
|
||||||
mark show mark values
|
mark show mark values
|
||||||
|
replace=<string> specify a replacement string
|
||||||
startchar show starting character when relevant
|
startchar show starting character when relevant
|
||||||
.sp
|
.sp
|
||||||
These modifiers may not appear in a \fB#pattern\fP command. If you want them as
|
These modifiers may not appear in a \fB#pattern\fP command. If you want them as
|
||||||
|
@ -719,6 +720,7 @@ pattern.
|
||||||
offset=<n> set starting offset
|
offset=<n> set starting offset
|
||||||
ovector=<n> set size of output vector
|
ovector=<n> set size of output vector
|
||||||
recursion_limit=<n> set a recursion limit
|
recursion_limit=<n> set a recursion limit
|
||||||
|
replace=<string> specify a replacement string
|
||||||
startchar show startchar when relevant
|
startchar show startchar when relevant
|
||||||
zero_terminate pass the subject as zero-terminated
|
zero_terminate pass the subject as zero-terminated
|
||||||
.sp
|
.sp
|
||||||
|
@ -797,6 +799,29 @@ Any value other than zero is used as a return from \fBpcre2test\fP's callout
|
||||||
function.
|
function.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
|
.SS "Finding all matches in a string"
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
Searching for all possible matches within a subject can be requested by the
|
||||||
|
\fBglobal\fP or \fB/altglobal\fP modifier. After finding a match, the matching
|
||||||
|
function is called again to search the remainder of the subject. The difference
|
||||||
|
between \fBglobal\fP and \fBaltglobal\fP is that the former uses the
|
||||||
|
\fIstart_offset\fP argument to \fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP
|
||||||
|
to start searching at a new point within the entire string (which is what Perl
|
||||||
|
does), whereas the latter passes over a shortened substring. This makes a
|
||||||
|
difference to the matching process if the pattern begins with a lookbehind
|
||||||
|
assertion (including \eb or \eB).
|
||||||
|
.P
|
||||||
|
If an empty string is matched, the next match is done with the
|
||||||
|
PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search for
|
||||||
|
another, non-empty, match at the same point in the subject. If this match
|
||||||
|
fails, the start offset is advanced, and the normal match is retried. This
|
||||||
|
imitates the way Perl handles such cases when using the \fB/g\fP modifier or
|
||||||
|
the \fBsplit()\fP function. Normally, the start offset is advanced by one
|
||||||
|
character, but if the newline convention recognizes CRLF as a newline, and the
|
||||||
|
current character is CR followed by LF, an advance of two is used.
|
||||||
|
.
|
||||||
|
.
|
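The advance rule described in the added paragraph (one character normally, two when CRLF is a recognized newline and the current position is CR followed by LF) can be sketched as below. This is only an illustration: `advance_after_empty_match` and its `crlf_is_newline` flag are hypothetical names, not taken from pcre2test.c, and the sketch counts code units, so it ignores multi-unit characters in UTF mode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from pcre2test.c): how far to move the start
   offset after the anchored, non-empty retry has failed at `offset`.
   `crlf_is_newline` stands in for "the newline convention recognizes CRLF". */
size_t advance_after_empty_match(const unsigned char *subject, size_t length,
                                 size_t offset, int crlf_is_newline)
{
  assert(offset < length);              /* caller stops at end of subject */
  if (crlf_is_newline &&
      subject[offset] == '\r' &&
      offset + 1 < length &&
      subject[offset + 1] == '\n')
    return 2;                           /* step over CR LF as one newline */
  return 1;                             /* normal case: one character */
}
```

A caller would add the returned value to its start offset and then retry the normal (unanchored) match from there.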
||||||
.SS "Testing substring extraction functions"
|
.SS "Testing substring extraction functions"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
|
@ -821,27 +846,38 @@ length (that is, the return from the extraction function) is given in
|
||||||
parentheses after each substring.
|
parentheses after each substring.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SS "Finding all matches in a string"
|
.SS "Testing the substitution function"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
Searching for all possible matches within a subject can be requested by the
|
If the \fBreplace\fP modifier is set, the \fBpcre2_substitute()\fP function is
|
||||||
\fBglobal\fP or \fB/altglobal\fP modifier. After finding a match, the matching
|
called instead of one of the matching functions. Unlike subject strings,
|
||||||
function is called again to search the remainder of the subject. The difference
|
\fBpcre2test\fP does not process replacement strings for escape sequences. In
|
||||||
between \fBglobal\fP and \fBaltglobal\fP is that the former uses the
|
UTF mode, a replacement string is checked to see if it is a valid UTF-8 string.
|
||||||
\fIstart_offset\fP argument to \fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP
|
If so, it is correctly converted to a UTF string of the appropriate code unit
|
||||||
to start searching at a new point within the entire string (which is what Perl
|
width. If it is not a valid UTF-8 string, the individual code units are copied
|
||||||
does), whereas the latter passes over a shortened substring. This makes a
|
directly. This provides a means of passing an invalid UTF-8 string for testing
|
||||||
difference to the matching process if the pattern begins with a lookbehind
|
purposes.
|
||||||
assertion (including \eb or \eB).
|
|
||||||
.P
|
.P
|
||||||
If an empty string is matched, the next match is done with the
|
If the \fBglobal\fP modifier is set, PCRE2_SUBSTITUTE_GLOBAL is passed to
|
||||||
PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search for
|
\fBpcre2_substitute()\fP. After a successful substitution, the modified string
|
||||||
another, non-empty, match at the same point in the subject. If this match
|
is output, preceded by the number of replacements. This may be zero if there
|
||||||
fails, the start offset is advanced, and the normal match is retried. This
|
were no matches. Here is a simple example of a substitution test:
|
||||||
imitates the way Perl handles such cases when using the \fB/g\fP modifier or
|
.sp
|
||||||
the \fBsplit()\fP function. Normally, the start offset is advanced by one
|
/abc/replace=xxx
|
||||||
character, but if the newline convention recognizes CRLF as a newline, and the
|
=abc=abc=
|
||||||
current character is CR followed by LF, an advance of two is used.
|
1: =xxx=abc=
|
||||||
|
=abc=abc=\=global
|
||||||
|
2: =xxx=xxx=
|
||||||
|
.sp
|
||||||
|
Subject and replacement strings should be kept relatively short for
|
||||||
|
substitution tests, as fixed-size buffers are used. To make it easy to test for
|
||||||
|
buffer overflow, if the replacement string starts with a number in square
|
||||||
|
brackets, that number is passed to \fBpcre2_substitute()\fP as the size of the
|
||||||
|
output buffer, with the replacement string starting at the next character.
|
||||||
|
.P
|
||||||
|
A replacement string is ignored with POSIX and DFA matching. Specifying partial
|
||||||
|
matching provokes an error return ("bad option value") from
|
||||||
|
\fBpcre2_substitute()\fP.
|
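The "number in square brackets" convention for the output buffer size could be parsed roughly as follows. This is a sketch under assumptions: `split_size_prefix` is a hypothetical helper, it treats a malformed prefix as ordinary replacement text (the real program may instead report an error), and it does not guard against numeric overflow.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: if `rep` starts with "[<digits>]", store the number in
   *size and return a pointer to the text after ']'. Otherwise leave *size
   untouched and return `rep` unchanged. */
const char *split_size_prefix(const char *rep, size_t *size)
{
  const char *p = rep;
  size_t n = 0;

  assert(rep != NULL && size != NULL);
  if (*p++ != '[') return rep;
  if (!isdigit((unsigned char)*p)) return rep;   /* need at least one digit */
  while (isdigit((unsigned char)*p)) n = n * 10 + (size_t)(*p++ - '0');
  if (*p != ']') return rep;                     /* malformed: treat as text */
  *size = n;
  return p + 1;
}
```

With a replacement such as `[10]xyz`, the helper would yield a buffer size of 10 and the replacement text `xyz`, which makes it easy to provoke a deliberate overflow in a test.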
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SS "Setting the JIT stack size"
|
.SS "Setting the JIT stack size"
|
||||||
|
@ -1200,6 +1236,6 @@ Cambridge CB2 3QH, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 09 November 2014
|
Last updated: 12 November 2014
|
||||||
Copyright (c) 1997-2014 University of Cambridge.
|
Copyright (c) 1997-2014 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -102,7 +102,7 @@ static const char compile_error_texts[] =
|
||||||
/* 30 */
|
/* 30 */
|
||||||
"unknown POSIX class name\0"
|
"unknown POSIX class name\0"
|
||||||
"internal error in pcre2_study(): should not occur\0"
|
"internal error in pcre2_study(): should not occur\0"
|
||||||
"this version of PCRE does not have UTF or Unicode property support\0"
|
"this version of PCRE2 does not have Unicode support\0"
|
||||||
"parentheses are too deeply nested (stack check)\0"
|
"parentheses are too deeply nested (stack check)\0"
|
||||||
"character code point value in \\x{} or \\o{} is too large\0"
|
"character code point value in \\x{} or \\o{} is too large\0"
|
||||||
/* 35 */
|
/* 35 */
|
||||||
|
@ -118,7 +118,7 @@ static const char compile_error_texts[] =
|
||||||
"two named subpatterns have the same name (PCRE2_DUPNAMES not set)\0"
|
"two named subpatterns have the same name (PCRE2_DUPNAMES not set)\0"
|
||||||
"group name must start with a non-digit\0"
|
"group name must start with a non-digit\0"
|
||||||
/* 45 */
|
/* 45 */
|
||||||
"this version of PCRE does not have support for \\P, \\p, or \\X\0"
|
"this version of PCRE2 does not have support for \\P, \\p, or \\X\0"
|
||||||
"malformed \\P or \\p sequence\0"
|
"malformed \\P or \\p sequence\0"
|
||||||
"unknown property name after \\P or \\p\0"
|
"unknown property name after \\P or \\p\0"
|
||||||
"subpattern name is too long (maximum " XSTRING(MAX_NAME_SIZE) " characters)\0"
|
"subpattern name is too long (maximum " XSTRING(MAX_NAME_SIZE) " characters)\0"
|
||||||
|
|
|
@ -40,14 +40,16 @@ POSSIBILITY OF SUCH DAMAGE.
|
||||||
|
|
||||||
|
|
||||||
/* This module contains an internal function for validating UTF character
|
/* This module contains an internal function for validating UTF character
|
||||||
strings. */
|
strings. This file is also #included by the pcre2test program, which uses
|
||||||
|
macros to change names from _pcre2_xxx to xxxx, thereby avoiding name clashes
|
||||||
|
with the library. In this case, PCRE2_PCRE2TEST is defined. */
|
||||||
|
|
||||||
|
#ifndef PCRE2_PCRE2TEST /* We're compiling the library */
|
||||||
#ifdef HAVE_CONFIG_H
|
#ifdef HAVE_CONFIG_H
|
||||||
#include "config.h"
|
#include "config.h"
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
#include "pcre2_internal.h"
|
#include "pcre2_internal.h"
|
||||||
|
#endif /* PCRE2_PCRE2TEST */
|
||||||
|
|
||||||
|
|
||||||
#ifndef SUPPORT_UNICODE
|
#ifndef SUPPORT_UNICODE
|
||||||
|
|
127
src/pcre2test.c
|
@ -165,9 +165,14 @@ void vms_setsymbol( char *, char *, int );
|
||||||
#define DEFAULT_OVECCOUNT 15 /* Default ovector count */
|
#define DEFAULT_OVECCOUNT 15 /* Default ovector count */
|
||||||
#define JUNK_OFFSET 0xdeadbeef /* For initializing ovector */
|
#define JUNK_OFFSET 0xdeadbeef /* For initializing ovector */
|
||||||
#define LOOPREPEAT 500000 /* Default loop count for timing */
|
#define LOOPREPEAT 500000 /* Default loop count for timing */
|
||||||
#define REPLACE_BUFFSIZE 400 /* For replacement strings */
|
#define REPLACE_MODSIZE 96 /* Field for reading 8-bit replacement */
|
||||||
#define VERSION_SIZE 64 /* Size of buffer for the version strings */
|
#define VERSION_SIZE 64 /* Size of buffer for the version strings */
|
||||||
|
|
||||||
|
/* Make sure the buffer into which replacement strings are copied is big enough
|
||||||
|
to hold them as 32-bit code units. */
|
||||||
|
|
||||||
|
#define REPLACE_BUFFSIZE (4*REPLACE_MODSIZE)
|
||||||
|
|
||||||
/* Execution modes */
|
/* Execution modes */
|
||||||
|
|
||||||
#define PCRE8_MODE 8
|
#define PCRE8_MODE 8
|
||||||
|
@ -258,6 +263,20 @@ these inclusions should not be changed. */
|
||||||
|
|
||||||
#define PCRE2_SUFFIX(a) a
|
#define PCRE2_SUFFIX(a) a
|
||||||
|
|
||||||
|
/* We need to be able to check input text for UTF-8 validity, whatever code
|
||||||
|
widths are actually available, because the input to pcre2test is always in
|
||||||
|
8-bit code units. So we include the UTF validity checking function for 8-bit
|
||||||
|
code units. */
|
||||||
|
|
||||||
|
extern int valid_utf(PCRE2_SPTR8, PCRE2_SIZE, PCRE2_SIZE *);
|
||||||
|
|
||||||
|
#define PCRE2_CODE_UNIT_WIDTH 8
|
||||||
|
#undef PCRE2_SPTR
|
||||||
|
#define PCRE2_SPTR PCRE2_SPTR8
|
||||||
|
#include "pcre2_valid_utf.c"
|
||||||
|
#undef PCRE2_CODE_UNIT_WIDTH
|
||||||
|
#undef PCRE2_SPTR
|
||||||
|
|
||||||
/* If we have 8-bit support, default to it; if there is also 16-or 32-bit
|
/* If we have 8-bit support, default to it; if there is also 16-or 32-bit
|
||||||
support, it can be selected by a command-line option. If there is no 8-bit
|
support, it can be selected by a command-line option. If there is no 8-bit
|
||||||
support, there must be 16- or 32-bit support, so default to one of them. The
|
support, there must be 16- or 32-bit support, so default to one of them. The
|
||||||
|
@ -369,15 +388,20 @@ data line. */
|
||||||
CTL_MARK|\
|
CTL_MARK|\
|
||||||
CTL_MEMORY|\
|
CTL_MEMORY|\
|
||||||
CTL_STARTCHAR)
|
CTL_STARTCHAR)
|
||||||
|
|
||||||
|
/* Structures for holding modifier information for patterns and subject strings
|
||||||
|
(data). Fields containing modifiers that can be set either for a pattern or a
|
||||||
|
subject must be at the start and in the same order in both cases so that the
|
||||||
|
same offset in the big table below works for both. */
|
||||||
|
|
||||||
typedef struct patctl { /* Structure for pattern modifiers. */
|
typedef struct patctl { /* Structure for pattern modifiers. */
|
||||||
uint32_t options; /* Must be in same position as datctl */
|
uint32_t options; /* Must be in same position as datctl */
|
||||||
uint32_t control; /* Must be in same position as datctl */
|
uint32_t control; /* Must be in same position as datctl */
|
||||||
|
uint8_t replacement[REPLACE_MODSIZE]; /* So must this */
|
||||||
uint32_t jit;
|
uint32_t jit;
|
||||||
uint32_t stackguard_test;
|
uint32_t stackguard_test;
|
||||||
uint32_t tables_id;
|
uint32_t tables_id;
|
||||||
uint8_t locale[32];
|
uint8_t locale[32];
|
||||||
uint8_t replacement[REPLACE_BUFFSIZE];
|
|
||||||
} patctl;
|
} patctl;
|
||||||
|
|
||||||
#define MAXCPYGET 10
|
#define MAXCPYGET 10
|
||||||
|
@ -386,6 +410,7 @@ typedef struct patctl { /* Structure for pattern modifiers. */
|
||||||
typedef struct datctl { /* Structure for data line modifiers. */
|
typedef struct datctl { /* Structure for data line modifiers. */
|
||||||
uint32_t options; /* Must be in same position as patctl */
|
uint32_t options; /* Must be in same position as patctl */
|
||||||
uint32_t control; /* Must be in same position as patctl */
|
uint32_t control; /* Must be in same position as patctl */
|
||||||
|
uint8_t replacement[REPLACE_MODSIZE]; /* So must this */
|
||||||
uint32_t cfail[2];
|
uint32_t cfail[2];
|
||||||
int32_t callout_data;
|
int32_t callout_data;
|
||||||
int32_t copy_numbers[MAXCPYGET];
|
int32_t copy_numbers[MAXCPYGET];
|
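The requirement that shared modifier fields sit at the same offset in both structures can be checked at compile time. The sketch below uses simplified, hypothetical stand-ins (`pat_mods`, `dat_mods`, `MODSIZE`, `set_string_field`) rather than the real `patctl` and `datctl`, and assumes a C11 compiler for `_Static_assert`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODSIZE 96   /* mirrors REPLACE_MODSIZE in the diff */

/* Simplified, hypothetical stand-ins for patctl and datctl. */
typedef struct pat_mods {
  uint32_t options;
  uint32_t control;
  uint8_t  replacement[MODSIZE];
  uint32_t jit;                 /* pattern-only field */
} pat_mods;

typedef struct dat_mods {
  uint32_t options;
  uint32_t control;
  uint8_t  replacement[MODSIZE];
  int32_t  callout_data;        /* data-only field */
} dat_mods;

/* Fail the build if the shared prefix ever drifts apart. */
_Static_assert(offsetof(pat_mods, replacement) == offsetof(dat_mods, replacement),
               "replacement must sit at the same offset in both structs");

/* Store a string modifier using only a base pointer and a table offset.
   Returns 0 on success, -1 if the value does not fit in the field. */
int set_string_field(void *base, size_t offset, const char *value)
{
  size_t len = strlen(value);
  assert(base != NULL);
  if (len >= MODSIZE) return -1;
  memcpy((uint8_t *)base + offset, value, len + 1);
  return 0;
}
```

Because the offsets agree, a single table entry can serve a modifier that is valid on both pattern and data lines; the caller only chooses which structure to pass as `base`.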
||||||
|
@ -487,7 +512,7 @@ static modstruct modlist[] = {
|
||||||
{ "posix", MOD_PAT, MOD_CTL, CTL_POSIX, PO(control) },
|
{ "posix", MOD_PAT, MOD_CTL, CTL_POSIX, PO(control) },
|
||||||
{ "ps", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_SOFT, DO(options) },
|
{ "ps", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_SOFT, DO(options) },
|
||||||
{ "recursion_limit", MOD_CTM, MOD_INT, 0, MO(recursion_limit) },
|
{ "recursion_limit", MOD_CTM, MOD_INT, 0, MO(recursion_limit) },
|
||||||
{ "replace", MOD_PAT, MOD_STR, 0, PO(replacement) },
|
{ "replace", MOD_PND, MOD_STR, 0, PO(replacement) },
|
||||||
{ "stackguard", MOD_PAT, MOD_INT, 0, PO(stackguard_test) },
|
{ "stackguard", MOD_PAT, MOD_INT, 0, PO(stackguard_test) },
|
||||||
{ "startchar", MOD_PND, MOD_CTL, CTL_STARTCHAR, PO(control) },
|
{ "startchar", MOD_PND, MOD_CTL, CTL_STARTCHAR, PO(control) },
|
||||||
{ "tables", MOD_PAT, MOD_INT, 0, PO(tables_id) },
|
{ "tables", MOD_PAT, MOD_INT, 0, PO(tables_id) },
|
||||||
|
@ -4211,13 +4236,14 @@ uint32_t *q32 = NULL;
|
||||||
|
|
||||||
/* Copy the default context and data control blocks to the active ones. Then
|
/* Copy the default context and data control blocks to the active ones. Then
|
||||||
copy from the pattern the controls that can be set in either the pattern or the
|
copy from the pattern the controls that can be set in either the pattern or the
|
||||||
data. This allows them to be unset in the data line. We do not do this for
|
data. This allows them to be overridden in the data line. We do not do this for
|
||||||
options because those that are common apply separately to compiling and
|
options because those that are common apply separately to compiling and
|
||||||
matching. */
|
matching. */
|
||||||
|
|
||||||
DATCTXCPY(dat_context, default_dat_context);
|
DATCTXCPY(dat_context, default_dat_context);
|
||||||
memcpy(&dat_datctl, &def_datctl, sizeof(datctl));
|
memcpy(&dat_datctl, &def_datctl, sizeof(datctl));
|
||||||
dat_datctl.control |= (pat_patctl.control & CTL_ALLPD);
|
dat_datctl.control |= (pat_patctl.control & CTL_ALLPD);
|
||||||
|
strcpy((char *)dat_datctl.replacement, (char *)pat_patctl.replacement);
|
||||||
|
|
||||||
/* Initialize for scanning the data line. */
|
/* Initialize for scanning the data line. */
|
||||||
|
|
||||||
|
@ -4715,20 +4741,28 @@ else
|
||||||
PCRE2_MATCH_DATA_FREE(match_data);
|
PCRE2_MATCH_DATA_FREE(match_data);
|
||||||
PCRE2_MATCH_DATA_CREATE(match_data, max_oveccount, NULL);
|
PCRE2_MATCH_DATA_CREATE(match_data, max_oveccount, NULL);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/* Replacement processing is ignored for DFA matching. */
|
||||||
|
|
||||||
|
if (dat_datctl.replacement[0] != 0 && (dat_datctl.control & CTL_DFA) != 0)
|
||||||
|
{
|
||||||
|
fprintf(outfile, "** Ignored for DFA matching: replace\n");
|
||||||
|
dat_datctl.replacement[0] = 0;
|
||||||
|
}
|
||||||
|
|
||||||
/* If a replacement string is provided, call pcre2_substitute() instead of one
|
/* If a replacement string is provided, call pcre2_substitute() instead of one
|
||||||
of the matching functions. First we have to convert the replacement string to
|
of the matching functions. First we have to convert the replacement string to
|
||||||
the appropriate width. */
|
the appropriate width. */
|
||||||
|
|
||||||
if (pat_patctl.replacement[0] != 0)
|
if (dat_datctl.replacement[0] != 0)
|
||||||
{
|
{
|
||||||
int rc;
|
int rc;
|
||||||
uint8_t *pr;
|
uint8_t *pr;
|
||||||
uint8_t rbuffer[REPLACE_BUFFSIZE];
|
uint8_t rbuffer[REPLACE_BUFFSIZE];
|
||||||
uint8_t nbuffer[REPLACE_BUFFSIZE];
|
uint8_t nbuffer[REPLACE_BUFFSIZE];
|
||||||
uint32_t goption;
|
uint32_t goption;
|
||||||
PCRE2_SIZE rlen;
|
PCRE2_SIZE rlen, nsize, erroroffset;
|
||||||
PCRE2_SIZE nsize;
|
BOOL badutf = FALSE;
|
||||||
|
|
||||||
#ifdef SUPPORT_PCRE2_8
|
#ifdef SUPPORT_PCRE2_8
|
||||||
uint8_t *r8 = NULL;
|
uint8_t *r8 = NULL;
|
||||||
|
@ -4740,10 +4774,13 @@ if (pat_patctl.replacement[0] != 0)
|
||||||
uint32_t *r32 = NULL;
|
uint32_t *r32 = NULL;
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
goption = ((pat_patctl.control & CTL_GLOBAL) == 0)? 0 :
|
if (timeitm)
|
||||||
|
fprintf(outfile, "** Timing is not supported with replace: ignored\n");
|
||||||
|
|
||||||
|
goption = ((dat_datctl.control & CTL_GLOBAL) == 0)? 0 :
|
||||||
PCRE2_SUBSTITUTE_GLOBAL;
|
PCRE2_SUBSTITUTE_GLOBAL;
|
||||||
SETCASTPTR(r, rbuffer); /* Sets r8, r16, or r32, as appropriate. */
|
SETCASTPTR(r, rbuffer); /* Sets r8, r16, or r32, as appropriate. */
|
||||||
pr = pat_patctl.replacement;
|
pr = dat_datctl.replacement;
|
||||||
|
|
||||||
/* If the replacement starts with '[<number>]' we interpret that as length
|
/* If the replacement starts with '[<number>]' we interpret that as length
|
||||||
value for the replacement buffer. */
|
value for the replacement buffer. */
|
||||||
|
@ -4767,52 +4804,58 @@ if (pat_patctl.replacement[0] != 0)
|
||||||
nsize = n;
|
nsize = n;
|
||||||
}
|
}
|
||||||
|
|
||||||
/* Now copy the replacement string to a buffer of the appropriate width. */
|
/* Now copy the replacement string to a buffer of the appropriate width. No
|
||||||
|
escape processing is done for replacements. In UTF mode, check for an invalid
|
||||||
|
UTF-8 input string, and if it is invalid, just copy its code units without
|
||||||
|
UTF interpretation. This provides a means of checking that an invalid string
|
||||||
|
is detected. Otherwise, UTF-8 can be used to include wide characters in a
|
||||||
|
replacement. */
|
||||||
|
|
||||||
|
if (utf) badutf = valid_utf(pr, strlen((const char *)pr), &erroroffset);
|
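To illustrate the kind of check that decides between the two copy loops, here is a compact UTF-8 validator. It is a sketch, not the library's `valid_utf()`: the name `utf8_is_valid` is hypothetical, it returns a plain boolean (1 for valid) instead of an error code plus error offset, and it does not distinguish between kinds of failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical validator: returns 1 if s[0..len) is well-formed UTF-8,
   0 otherwise. Rejects truncated sequences, stray continuation bytes,
   overlong forms, surrogates, and code points above U+10FFFF. */
int utf8_is_valid(const uint8_t *s, size_t len)
{
  size_t i = 0;

  assert(s != NULL || len == 0);
  while (i < len) {
    uint8_t c = s[i++];
    uint32_t cp, min;
    int extra;

    if (c < 0x80) continue;                               /* ASCII */
    else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; min = 0x80; }
    else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min = 0x800; }
    else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min = 0x10000; }
    else return 0;                  /* stray continuation or bad lead byte */

    if ((size_t)extra > len - i) return 0;                /* truncated */
    while (extra-- > 0) {
      uint8_t t = s[i++];
      if ((t & 0xC0) != 0x80) return 0;                   /* not 10xxxxxx */
      cp = (cp << 6) | (t & 0x3F);
    }
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
      return 0;                     /* overlong, out of range, or surrogate */
  }
  return 1;
}
```

A replacement consisting of a lone lead byte such as 0xC3 fails this check, so under the scheme in the diff its code units would be copied unchanged and the invalid string would reach the function under test.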
||||||
|
|
||||||
while ((c = *pr++) != 0)
|
/* Not UTF or invalid UTF-8: just copy the code units. */
|
||||||
|
|
||||||
|
if (!utf || badutf)
|
||||||
{
|
{
|
||||||
if (utf && HASUTF8EXTRALEN(c)) { GETUTF8INC(c, pr); }
|
while ((c = *pr++) != 0)
|
||||||
|
{
|
||||||
/* At present no escape processing is provided for replacements. */
|
#ifdef SUPPORT_PCRE2_8
|
||||||
|
if (test_mode == PCRE8_MODE) *r8++ = c;
|
||||||
|
#endif
|
||||||
|
#ifdef SUPPORT_PCRE2_16
|
||||||
|
if (test_mode == PCRE16_MODE) *r16++ = c;
|
||||||
|
#endif
|
||||||
|
#ifdef SUPPORT_PCRE2_32
|
||||||
|
if (test_mode == PCRE32_MODE) *r32++ = c;
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Valid UTF-8 replacement string */
|
||||||
|
|
||||||
|
else while ((c = *pr++) != 0)
|
||||||
|
{
|
||||||
|
if (HASUTF8EXTRALEN(c)) { GETUTF8INC(c, pr); }
|
||||||
|
|
||||||
#ifdef SUPPORT_PCRE2_8
|
#ifdef SUPPORT_PCRE2_8
|
||||||
if (test_mode == PCRE8_MODE)
|
if (test_mode == PCRE8_MODE) r8 += ord2utf8(c, r8);
|
||||||
{
|
|
||||||
if (utf)
|
|
||||||
{
|
|
||||||
r8 += ord2utf8(c, r8);
|
|
||||||
}
|
|
||||||
else
|
|
||||||
{
|
|
||||||
*r8++ = c;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
#ifdef SUPPORT_PCRE2_16
|
#ifdef SUPPORT_PCRE2_16
|
||||||
if (test_mode == PCRE16_MODE)
|
if (test_mode == PCRE16_MODE)
|
||||||
{
|
{
|
||||||
if (utf)
|
if (c >= 0x10000u)
|
||||||
{
|
{
|
||||||
if (c >= 0x10000u)
|
c-= 0x10000u;
|
||||||
{
|
*r16++ = 0xD800 | (c >> 10);
|
||||||
c-= 0x10000u;
|
*r16++ = 0xDC00 | (c & 0x3ff);
|
||||||
*r16++ = 0xD800 | (c >> 10);
|
|
||||||
*r16++ = 0xDC00 | (c & 0x3ff);
|
|
||||||
}
|
|
||||||
else
|
|
||||||
*r16++ = c;
|
|
||||||
}
|
|
||||||
else
|
|
||||||
{
|
|
||||||
*r16++ = c;
|
|
||||||
}
|
}
|
||||||
|
else *r16++ = c;
|
||||||
}
|
}
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
#ifdef SUPPORT_PCRE2_32
|
#ifdef SUPPORT_PCRE2_32
|
||||||
if (test_mode == PCRE32_MODE)
|
if (test_mode == PCRE32_MODE) *r32++ = c;
|
||||||
{
|
|
||||||
*r32++ = c;
|
|
||||||
}
|
|
||||||
#endif
|
#endif
|
||||||
}
|
}
|
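The 16-bit branch above splits code points of 0x10000 and higher into a surrogate pair. The same arithmetic, isolated into a small function for illustration (`put_utf16` is a hypothetical name, not part of pcre2test.c):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes the UTF-16 encoding of code point c to out and returns the number
   of 16-bit units written (1 or 2). Same arithmetic as the diff's loop. */
size_t put_utf16(uint32_t c, uint16_t *out)
{
  assert(c <= 0x10FFFFu);
  if (c >= 0x10000u) {
    c -= 0x10000u;                               /* 20 significant bits remain */
    out[0] = (uint16_t)(0xD800u | (c >> 10));    /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00u | (c & 0x3FFu)); /* low surrogate: low 10 bits */
    return 2;
  }
  out[0] = (uint16_t)c;                          /* BMP: one unit */
  return 1;
}
```

For example, U+1F600 becomes the pair 0xD83D 0xDE00, while U+20AC is stored as the single unit 0x20AC.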
||||||
|
|
||||||
|
|
|
@ -444,4 +444,7 @@
|
||||||
|
|
||||||
/\x{3a3}B/IBi,utf
|
/\x{3a3}B/IBi,utf
|
||||||
|
|
||||||
|
/abc/utf,replace=Ã
|
||||||
|
abc
|
||||||
|
|
||||||
# End of testinput10
|
# End of testinput10
|
||||||
|
|
|
@ -4067,6 +4067,12 @@ a random value. /Ix
|
||||||
/abc/replace=xyz
|
/abc/replace=xyz
|
||||||
1abc2\=partial_hard
|
1abc2\=partial_hard
|
||||||
|
|
||||||
|
/abc/replace=xyz
|
||||||
|
123abc456
|
||||||
|
123abc456\=replace=pqr
|
||||||
|
123abc456abc789
|
||||||
|
123abc456abc789\=g
|
||||||
|
|
||||||
# End of substitute tests
|
# End of substitute tests
|
||||||
|
|
||||||
# End of testinput2
|
# End of testinput2
|
||||||
|
|
|
@ -1546,4 +1546,8 @@ Starting code units: \xce \xcf
|
||||||
Last code unit = 'B' (caseless)
|
Last code unit = 'B' (caseless)
|
||||||
Subject length lower bound = 2
|
Subject length lower bound = 2
|
||||||
|
|
||||||
|
/abc/utf,replace=Ã
|
||||||
|
abc
|
||||||
|
Failed: error -3: UTF-8 error: 1 byte missing at end
|
||||||
|
|
||||||
# End of testinput10
|
# End of testinput10
|
||||||
|
|
|
@ -13689,6 +13689,16 @@ Failed: error -47: no more memory
|
||||||
1abc2\=partial_hard
|
1abc2\=partial_hard
|
||||||
Failed: error -34: bad option value
|
Failed: error -34: bad option value
|
||||||
|
|
||||||
|
/abc/replace=xyz
|
||||||
|
123abc456
|
||||||
|
1: 123xyz456
|
||||||
|
123abc456\=replace=pqr
|
||||||
|
1: 123pqr456
|
||||||
|
123abc456abc789
|
||||||
|
1: 123xyz456abc789
|
||||||
|
123abc456abc789\=g
|
||||||
|
2: 123xyz456xyz789
|
||||||
|
|
||||||
# End of substitute tests
|
# End of substitute tests
|
||||||
|
|
||||||
# End of testinput2
|
# End of testinput2
|
||||||
|
|