From 810d9b6da5c8e598c12c35ad20122eddb42239c2 Mon Sep 17 00:00:00 2001
From: "Philip.Hazel"
This function frees the memory used for a compiled pattern, including any
-memory used by the JIT compiler. If the compiled pattern was created by a call
-to pcre2_code_copy_with_tables(), the memory for the character tables is
+memory used by the JIT compiler. If the compiled pattern was created by a call
+to pcre2_code_copy_with_tables(), the memory for the character tables is
also freed.
diff --git a/doc/html/pcre2_compile.html b/doc/html/pcre2_compile.html
index da103cd..0a9eafa 100644
--- a/doc/html/pcre2_compile.html
+++ b/doc/html/pcre2_compile.html
@@ -64,7 +64,7 @@ The option bits are:
PCRE2_ENDANCHORED Pattern can match only at end of subject
PCRE2_EXTENDED Ignore white space and # comments
PCRE2_FIRSTLINE Force matching to be before newline
- PCRE2_LITERAL Pattern characters are all literal
+ PCRE2_LITERAL Pattern characters are all literal
PCRE2_MATCH_UNSET_BACKREF Match unset back references
PCRE2_MULTILINE ^ and $ match newlines within data
PCRE2_NEVER_BACKSLASH_C Lock out the use of \C in patterns
diff --git a/doc/html/pcre2_config.html b/doc/html/pcre2_config.html
index 7929d62..465f6a1 100644
--- a/doc/html/pcre2_config.html
+++ b/doc/html/pcre2_config.html
@@ -45,7 +45,7 @@ point to a uint32_t integer variable. The available codes are:
PCRE2_CONFIG_BSR Indicates what \R matches by default:
PCRE2_BSR_UNICODE
PCRE2_BSR_ANYCRLF
- PCRE2_CONFIG_HEAPLIMIT Default heap memory limit
+ PCRE2_CONFIG_HEAPLIMIT Default heap memory limit
PCRE2_CONFIG_DEPTHLIMIT Default backtracking depth limit
PCRE2_CONFIG_JIT Availability of just-in-time compiler support (1=yes 0=no)
PCRE2_CONFIG_JITTARGET Information (a string) about the target architecture for the JIT compiler
@@ -57,7 +57,7 @@ point to a uint32_t integer variable. The available codes are:
PCRE2_NEWLINE_CRLF
PCRE2_NEWLINE_ANY
PCRE2_NEWLINE_ANYCRLF
- PCRE2_NEWLINE_NUL
+ PCRE2_NEWLINE_NUL
PCRE2_CONFIG_PARENSLIMIT Default parentheses nesting limit
PCRE2_CONFIG_RECURSIONLIMIT Obsolete: use PCRE2_CONFIG_DEPTHLIMIT
PCRE2_CONFIG_STACKRECURSE Obsolete: always returns 0
diff --git a/doc/html/pcre2_converted_pattern_free.html b/doc/html/pcre2_converted_pattern_free.html
index 961f04f..11adefd 100644
--- a/doc/html/pcre2_converted_pattern_free.html
+++ b/doc/html/pcre2_converted_pattern_free.html
@@ -26,8 +26,8 @@ DESCRIPTION
This function is part of an experimental set of pattern conversion functions.
-It frees the memory occupied by a converted pattern that was obtained by
-calling pcre2_pattern_convert() with arguments that caused it to place
+It frees the memory occupied by a converted pattern that was obtained by
+calling pcre2_pattern_convert() with arguments that caused it to place
the converted pattern into newly obtained heap memory.
diff --git a/doc/html/pcre2_maketables.html b/doc/html/pcre2_maketables.html
index 995c23a..6d240e3 100644
--- a/doc/html/pcre2_maketables.html
+++ b/doc/html/pcre2_maketables.html
@@ -25,7 +25,7 @@ SYNOPSIS
DESCRIPTION
-This function builds a set of character tables for character code points that
+This function builds a set of character tables for character code points that
are less than 256. These can be passed to pcre2_compile() in a compile
context in order to override the internal, built-in tables (which were either
defaulted or made by pcre2_maketables() when PCRE2 was compiled). See the
diff --git a/doc/html/pcre2_match.html b/doc/html/pcre2_match.html
index 724a39f..5f6f0b1 100644
--- a/doc/html/pcre2_match.html
+++ b/doc/html/pcre2_match.html
@@ -43,14 +43,14 @@ offsets to captured substrings. Its arguments are:
A match context is needed only if you want to:
Set up a callout function
- Set a matching offset limit
- Change the heap memory limit
- Change the backtracking match limit
+ Set a matching offset limit
+ Change the heap memory limit
+ Change the backtracking match limit
Change the backtracking depth limit
Set custom memory management specifically for the match
The length and startoffset values are code
-units, not characters. The length may be given as PCRE2_ZERO_TERMINATE for a
+units, not characters. The length may be given as PCRE2_ZERO_TERMINATE for a
subject that is terminated by a binary zero code unit. The options are:
PCRE2_ANCHORED Match only at the first position
@@ -59,7 +59,7 @@ subject that is terminated by a binary zero code unit. The options are:
PCRE2_NOTEOL Subject string is not the end of a line
PCRE2_NOTEMPTY An empty string is not a valid match
PCRE2_NOTEMPTY_ATSTART An empty string at the start of the subject is not a valid match
- PCRE2_NO_JIT Do not use JIT matching
+ PCRE2_NO_JIT Do not use JIT matching
PCRE2_NO_UTF_CHECK Do not check the subject for UTF validity (only relevant if PCRE2_UTF was set at compile time)
PCRE2_PARTIAL_HARD Return PCRE2_ERROR_PARTIAL for a partial match even if there is a full match
diff --git a/doc/html/pcre2_pattern_info.html b/doc/html/pcre2_pattern_info.html
index d07f9ed..ae3e7ff 100644
--- a/doc/html/pcre2_pattern_info.html
+++ b/doc/html/pcre2_pattern_info.html
@@ -48,7 +48,7 @@ request are as follows:
1 first code unit is set
2 start of string or after newline
PCRE2_INFO_FIRSTCODEUNIT First code unit when type is 1
- PCRE2_INFO_FRAMESIZE Size of backtracking frame
+ PCRE2_INFO_FRAMESIZE Size of backtracking frame
PCRE2_INFO_HASBACKSLASHC Return 1 if pattern contains \C
PCRE2_INFO_HASCRORLF Return 1 if explicit CR or LF matches exist in the pattern
PCRE2_INFO_HEAPLIMIT Heap memory limit if set, otherwise PCRE2_ERROR_UNSET
@@ -71,7 +71,7 @@ request are as follows:
PCRE2_NEWLINE_CRLF
PCRE2_NEWLINE_ANY
PCRE2_NEWLINE_ANYCRLF
- PCRE2_NEWLINE_NUL
+ PCRE2_NEWLINE_NUL
PCRE2_INFO_RECURSIONLIMIT Obsolete synonym for PCRE2_INFO_DEPTHLIMIT
PCRE2_INFO_SIZE Size of compiled pattern
diff --git a/doc/html/pcre2_set_newline.html b/doc/html/pcre2_set_newline.html
index a078f69..ba81300 100644
--- a/doc/html/pcre2_set_newline.html
+++ b/doc/html/pcre2_set_newline.html
@@ -35,7 +35,7 @@ matching patterns. The second argument must be one of:
PCRE2_NEWLINE_CRLF CR followed by LF only
PCRE2_NEWLINE_ANYCRLF Any of the above
PCRE2_NEWLINE_ANY Any Unicode newline sequence
- PCRE2_NEWLINE_NUL The NUL character (binary zero)
+ PCRE2_NEWLINE_NUL The NUL character (binary zero)
The result is zero for success or PCRE2_ERROR_BADDATA if the second argument is
invalid.
diff --git a/doc/html/pcre2_set_recursion_limit.html b/doc/html/pcre2_set_recursion_limit.html
index c415aa3..9ff68c2 100644
--- a/doc/html/pcre2_set_recursion_limit.html
+++ b/doc/html/pcre2_set_recursion_limit.html
@@ -26,7 +26,7 @@ SYNOPSIS
DESCRIPTION
-This function is obsolete and should not be used in new code. Use
+This function is obsolete and should not be used in new code. Use
pcre2_set_depth_limit() instead.
diff --git a/doc/html/pcre2_substitute.html b/doc/html/pcre2_substitute.html
index c937802..2215ce9 100644
--- a/doc/html/pcre2_substitute.html
+++ b/doc/html/pcre2_substitute.html
@@ -60,7 +60,7 @@ want to:
The length, startoffset and rlength values are code units, not characters, as
is the contents of the variable pointed at by outlengthptr, which is updated
to the actual length of the new string.
-The subject and replacement lengths can be given as PCRE2_ZERO_TERMINATED for
+The subject and replacement lengths can be given as PCRE2_ZERO_TERMINATED for
zero-terminated strings. The options are:
PCRE2_ANCHORED Match only at the first position
diff --git a/doc/html/pcre2build.html b/doc/html/pcre2build.html
index 3dfe07f..823e605 100644
--- a/doc/html/pcre2build.html
+++ b/doc/html/pcre2build.html
@@ -87,10 +87,10 @@ Options that specify values have names that start with --with.
BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES
By default, a library called libpcre2-8 is built, containing functions
-that take string arguments contained in vectors of bytes, interpreted either as
+that take string arguments contained in arrays of bytes, interpreted either as
single-byte characters, or UTF-8 strings. You can also build two other
libraries, called libpcre2-16 and libpcre2-32, which process
-strings that are contained in vectors of 16-bit and 32-bit code units,
+strings that are contained in arrays of 16-bit and 32-bit code units,
respectively. These can be interpreted either as single-unit characters or
UTF-16/UTF-32 strings. To build these additional libraries, add one or both of
the following to the configure command:
@@ -208,19 +208,23 @@ to the configure command. There is a fourth option, specified by
--enable-newline-is-anycrlf
which causes PCRE2 to recognize any of the three sequences CR, LF, or CRLF as
-indicating a line ending. Finally, a fifth option, specified by
+indicating a line ending. A fifth option, specified by
--enable-newline-is-any
causes PCRE2 to recognize any Unicode newline sequence. The Unicode newline
sequences are the three just mentioned, plus the single characters VT
(vertical tab, U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line
-separator, U+2028), and PS (paragraph separator, U+2029).
+separator, U+2028), and PS (paragraph separator, U+2029). The final option is
+
+ --enable-newline-is-nul
+
+which causes NUL (binary zero) to be set as the default line-ending character.
Whatever default line ending convention is selected when PCRE2 is built can be
overridden by applications that use the library. At build time it is
-conventional to use the standard for your operating system.
+recommended to use the standard for your operating system.
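The line-ending conventions described in this hunk can be illustrated with a small, self-contained C sketch. It is not PCRE2 source code, and the names `nl_convention` and `newline_length` are hypothetical; it only encodes which code points each convention treats as a line ending, working on an array of already-decoded code points so that no UTF handling is needed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; these are not PCRE2 identifiers. */
typedef enum { NL_CR, NL_LF, NL_CRLF, NL_ANYCRLF, NL_ANY, NL_NUL } nl_convention;

/* Return the length (in code points) of the line ending that starts at
   s[i], or 0 if no line ending starts there under the given convention. */
static size_t newline_length(const uint32_t *s, size_t len, size_t i,
                             nl_convention conv)
{
    if (i >= len) return 0;
    uint32_t c = s[i];
    int crlf = (c == 0x0D && i + 1 < len && s[i + 1] == 0x0A);

    switch (conv) {
    case NL_CR:      return c == 0x0D ? 1 : 0;
    case NL_LF:      return c == 0x0A ? 1 : 0;
    case NL_CRLF:    return crlf ? 2 : 0;
    case NL_NUL:     return c == 0x00 ? 1 : 0;
    case NL_ANYCRLF:
        if (crlf) return 2;
        return (c == 0x0D || c == 0x0A) ? 1 : 0;
    case NL_ANY:
        if (crlf) return 2;
        switch (c) {
        case 0x0A: case 0x0B: case 0x0C: case 0x0D:   /* LF, VT, FF, CR */
        case 0x85: case 0x2028: case 0x2029:          /* NEL, LS, PS */
            return 1;
        default:
            return 0;
        }
    }
    return 0;
}
```

Under these assumptions, a CR followed by LF counts as one two-unit line ending for the CRLF, ANYCRLF and ANY conventions, while the NUL convention recognizes only binary zero.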
@@ -301,7 +305,9 @@ because the size of each backtracking "frame" depends on the number of
capturing parentheses in a pattern, the amount of heap that is used before the
limit is reached varies from pattern to pattern. This limit was more useful in
versions before 10.30, where function recursion was used for backtracking.
-However, as well as applying to pcre2_match(), this limit also controls
+
+
+As well as applying to pcre2_match(), the depth limit also controls
the depth of recursive function calls in pcre2_dfa_match(). These are used for
lookaround assertions, atomic groups, and recursion within patterns. The limit
does not apply to JIT matching.
@@ -559,7 +565,7 @@ Cambridge, England.
-Last updated: 17 June 2017
+Last updated: 18 July 2017
Copyright © 1997-2017 University of Cambridge.
diff --git a/doc/html/pcre2compat.html b/doc/html/pcre2compat.html
index 5c890b0..e6d2e7e 100644
--- a/doc/html/pcre2compat.html
+++ b/doc/html/pcre2compat.html
@@ -85,7 +85,7 @@ documentation for details.
8. Subroutine calls (whether recursive or not) were treated as atomic groups up
to PCRE2 release 10.23, but from release 10.30 this changed, and backtracking
-into subroutine calls is now supported, as in Perl.
+into subroutine calls is now supported, as in Perl.
9. If any of the backtracking control verbs are used in a subpattern that is
diff --git a/doc/html/pcre2grep.html b/doc/html/pcre2grep.html
index 0a028a0..b5565a8 100644
--- a/doc/html/pcre2grep.html
+++ b/doc/html/pcre2grep.html
@@ -517,20 +517,20 @@ memory. There are three options that set resource limits for matching.
The --match-limit option provides a means of limiting computing resource
usage when processing patterns that are not going to match, but which have a
very large number of possibilities in their search trees. The classic example
-is a pattern that uses nested unlimited repeats. Internally, PCRE2 has a
-counter that is incremented each time around its main processing loop. If the
+is a pattern that uses nested unlimited repeats. Internally, PCRE2 has a
+counter that is incremented each time around its main processing loop. If the
value set by --match-limit is reached, an error occurs.
The --heap-limit option specifies, as a number of kilobytes, the amount
of heap memory that may be used for matching. Heap memory is needed only if
matching the pattern requires a significant number of nested backtracking
-points to be remembered. This parameter can be set to zero to forbid the use of
+points to be remembered. This parameter can be set to zero to forbid the use of
heap memory altogether.
The --depth-limit option limits the depth of nested backtracking points,
-which indirectly limits the amount of memory that is used. The amount of memory
+which indirectly limits the amount of memory that is used. The amount of memory
needed for each backtracking point depends on the number of capturing
parentheses in the pattern, so the amount of memory that is used before this
limit acts varies from pattern to pattern. This limit is of use only if it is
@@ -538,7 +538,7 @@ set smaller than --match-limit.
There are no short forms for these options. The default settings are specified
-when the PCRE2 library is compiled, with the default defaults being very large
+when the PCRE2 library is compiled, with the default defaults being very large
and so effectively unlimited.
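The match-limit idea in the text above (a counter that is incremented on every pass through the matcher and raises an error at a set value) can be sketched with a toy backtracking matcher. This is only an illustration under stated assumptions: it is not how PCRE2 is implemented, it supports just literals, `.` and `x*`, and every name in it (`toy_match`, `toy_budget`, the `TOY_*` codes) is hypothetical.

```c
#include <stddef.h>

/* Hypothetical result codes and names; not PCRE2 identifiers. */
enum { TOY_NOMATCH = 0, TOY_MATCH = 1, TOY_LIMIT = -1 };

typedef struct {
    unsigned long steps;   /* number of matching steps taken so far */
    unsigned long limit;   /* give up once steps exceeds this value */
} toy_budget;

static int match_here(const char *re, const char *text, toy_budget *b);

/* Match c* followed by the rest of the pattern, shortest repeat first. */
static int match_star(int c, const char *re, const char *text, toy_budget *b)
{
    do {
        int rc = match_here(re, text, b);
        if (rc != TOY_NOMATCH) return rc;   /* match found or limit hit */
    } while (*text != '\0' && (*text++ == c || c == '.'));
    return TOY_NOMATCH;
}

static int match_here(const char *re, const char *text, toy_budget *b)
{
    /* The counter is bumped on every step; reaching the limit is an error,
       distinct from an ordinary failure to match. */
    if (++b->steps > b->limit) return TOY_LIMIT;
    if (re[0] == '\0') return TOY_MATCH;
    if (re[1] == '*') return match_star(re[0], re + 2, text, b);
    if (*text != '\0' && (re[0] == '.' || re[0] == *text))
        return match_here(re + 1, text + 1, b);
    return TOY_NOMATCH;
}

/* Anchored match of a pattern made of literals, '.' and 'x*'. */
static int toy_match(const char *re, const char *text, unsigned long limit)
{
    toy_budget b = { 0, limit };
    return match_here(re, text, &b);
}
```

With a generous limit, `toy_match("a*b", "aaab", 1000)` succeeds. A pattern with several unlimited repeats that cannot match, such as `a*a*a*a*b` against a run of twenty `a` characters, exhausts a limit of 100 and reports `TOY_LIMIT` instead of continuing to search.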
@@ -841,7 +841,7 @@ patterns are ignored by pcre2grep.
A callout in a PCRE2 pattern is of the form (?C<arg>) where the argument is
either a number or a quoted string (see the pcre2callout
-documentation for details). Numbered callouts are ignored by pcre2grep;
+documentation for details). Numbered callouts are ignored by pcre2grep;
only callouts with string arguments are useful.
The --match-limit option of pcre2grep can be used to set the
-overall resource limit. There are also other limits that affect the amount of
-memory used during matching; see the discussion of --heap-limit and
+overall resource limit. There are also other limits that affect the amount of
+memory used during matching; see the discussion of --heap-limit and
--depth-limit above.
Patterns are compiled by PCRE2 into a reasonably efficient interpretive code,
-so that most simple patterns do not use much memory for storing the compiled
+so that most simple patterns do not use much memory for storing the compiled
version. However, there is one case where the memory usage of a compiled
pattern can be unexpectedly large. If a parenthesized subpattern has a
quantifier with a minimum greater than 1 and/or a limited maximum, the whole
@@ -91,7 +91,7 @@ vector is used. Rewriting patterns to be time-efficient, as described below, may
also reduce the memory requirements.
-In contrast to pcre2_match(), pcre2_dfa_match() does use recursive
+In contrast to pcre2_match(), pcre2_dfa_match() does use recursive
function calls, but only for processing atomic groups, lookaround assertions,
and recursion within the pattern. Too much nested recursion may cause stack
issues. The "match depth" parameter can be used to limit the depth of function
@@ -184,7 +184,7 @@ appreciable time with strings longer than about 20 characters.
In many cases, the solution to this kind of performance issue is to use an
-atomic group or a possessive quantifier. This can often reduce memory
+atomic group or a possessive quantifier. This can often reduce memory
requirements as well. As another example, consider this pattern:
([^<]|<(?!inet))+
@@ -205,7 +205,7 @@ are "swallowed" in one item inside the parentheses, and a possessive quantifier
is used to stop any backtracking into the runs of non-"<" characters. This
version also uses a lot less memory because entry to a new set of parentheses
happens only when a "<" character that is not followed by "inet" is encountered
-(and we assume this is relatively rare).
+(and we assume this is relatively rare).
The following are recognized only at the very start of a pattern or after one
diff --git a/doc/pcre2.3 b/doc/pcre2.3
index 02dddaf..83a7655 100644
--- a/doc/pcre2.3
+++ b/doc/pcre2.3
@@ -130,7 +130,7 @@ against this: see the \fBpcre2_set_match_limit()\fP function in the
.\" HREF
\fBpcre2api\fP
.\"
-page. There is a similar function called \fBpcre2_set_depth_limit()\fP that can
+page. There is a similar function called \fBpcre2_set_depth_limit()\fP that can
be used to restrict the amount of memory that is used.
.
.
diff --git a/doc/pcre2.txt b/doc/pcre2.txt
index 186cbc7..bb0314f 100644
--- a/doc/pcre2.txt
+++ b/doc/pcre2.txt
@@ -170,8 +170,8 @@ REVISION
Last updated: 01 April 2017
Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------
-
-
+
+
PCRE2API(3) Library Functions Manual PCRE2API(3)
@@ -3432,8 +3432,8 @@ REVISION
Last updated: 10 July 2017
Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------
-
-
+
+
PCRE2BUILD(3) Library Functions Manual PCRE2BUILD(3)
@@ -3487,10 +3487,10 @@ PCRE2 BUILD-TIME OPTIONS
BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES
By default, a library called libpcre2-8 is built, containing functions
- that take string arguments contained in vectors of bytes, interpreted
+ that take string arguments contained in arrays of bytes, interpreted
either as single-byte characters, or UTF-8 strings.
You can also build two other libraries, called libpcre2-16 and libpcre2-32, which process
- strings that are contained in vectors of 16-bit and 32-bit code units,
+ strings that are contained in arrays of 16-bit and 32-bit code units,
respectively. These can be interpreted either as single-unit characters or
UTF-16/UTF-32 strings. To build these additional libraries, add one or both
of the following to the configure command:
@@ -3609,7 +3609,7 @@ NEWLINE RECOGNITION
--enable-newline-is-anycrlf
which causes PCRE2 to recognize any of the three sequences CR, LF, or
- CRLF as indicating a line ending. Finally, a fifth option, specified by
+ CRLF as indicating a line ending. A fifth option, specified by
--enable-newline-is-any
@@ -3617,97 +3617,103 @@ NEWLINE RECOGNITION
newline sequences are the three just mentioned, plus the single charac-
ters VT (vertical tab, U+000B), FF (form feed, U+000C), NEL (next line,
U+0085), LS (line separator, U+2028), and PS (paragraph separator,
- U+2029).
+ U+2029). The final option is
+
+ --enable-newline-is-nul
+
+ which causes NUL (binary zero) to be set as the default line-ending
+ character.
Whatever default line ending convention is selected when PCRE2 is built
- can be overridden by applications that use the library. At build time
- it is conventional to use the standard for your operating system.
+ can be overridden by applications that use the library. At build time
+ it is recommended to use the standard for your operating system.
WHAT \R MATCHES
- By default, the sequence \R in a pattern matches any Unicode newline
- sequence, independently of what has been selected as the line ending
+ By default, the sequence \R in a pattern matches any Unicode newline
+ sequence, independently of what has been selected as the line ending
sequence. If you specify
--enable-bsr-anycrlf
- the default is changed so that \R matches only CR, LF, or CRLF.
What- - ever is selected when PCRE2 is built can be overridden by applications + the default is changed so that \R matches only CR, LF, or CRLF. What- + ever is selected when PCRE2 is built can be overridden by applications that use the library. HANDLING VERY LARGE PATTERNS - Within a compiled pattern, offset values are used to point from one - part to another (for example, from an opening parenthesis to an alter- - nation metacharacter). By default, in the 8-bit and 16-bit libraries, - two-byte values are used for these offsets, leading to a maximum size - for a compiled pattern of around 64K code units. This is sufficient to + Within a compiled pattern, offset values are used to point from one + part to another (for example, from an opening parenthesis to an alter- + nation metacharacter). By default, in the 8-bit and 16-bit libraries, + two-byte values are used for these offsets, leading to a maximum size + for a compiled pattern of around 64K code units. This is sufficient to handle all but the most gigantic patterns. Nevertheless, some people do - want to process truly enormous patterns, so it is possible to compile - PCRE2 to use three-byte or four-byte offsets by adding a setting such + want to process truly enormous patterns, so it is possible to compile + PCRE2 to use three-byte or four-byte offsets by adding a setting such as --with-link-size=3 - to the configure command. The value given must be 2, 3, or 4. For the - 16-bit library, a value of 3 is rounded up to 4. In these libraries, - using longer offsets slows down the operation of PCRE2 because it has - to load additional data when handling them. For the 32-bit library the - value is always 4 and cannot be overridden; the value of --with-link- + to the configure command. The value given must be 2, 3, or 4. For the + 16-bit library, a value of 3 is rounded up to 4. In these libraries, + using longer offsets slows down the operation of PCRE2 because it has + to load additional data when handling them. 
For the 32-bit library the + value is always 4 and cannot be overridden; the value of --with-link- size is ignored. LIMITING PCRE2 RESOURCE USAGE The pcre2_match() function increments a counter each time it goes round - its main loop. Putting a limit on this counter controls the amount of - computing resource used by a single call to pcre2_match(). The limit + its main loop. Putting a limit on this counter controls the amount of + computing resource used by a single call to pcre2_match(). The limit can be changed at run time, as described in the pcre2api documentation. - The default is 10 million, but this can be changed by adding a setting + The default is 10 million, but this can be changed by adding a setting such as --with-match-limit=500000 - to the configure command. This setting also applies to the - pcre2_dfa_match() matching function, and to JIT matching (though the + to the configure command. This setting also applies to the + pcre2_dfa_match() matching function, and to JIT matching (though the counting is done differently). - The pcre2_match() function starts out using a 20K vector on the system - stack to record backtracking points. The more nested backtracking + The pcre2_match() function starts out using a 20K vector on the system + stack to record backtracking points. The more nested backtracking points there are (that is, the deeper the search tree), the more memory - is needed. If the initial vector is not large enough, heap memory is + is needed. If the initial vector is not large enough, heap memory is used, up to a certain limit, which is specified in kilobytes. The limit can be changed at run time, as described in the pcre2api documentation. - The default limit (in effect unlimited) is 20 million. You can change + The default limit (in effect unlimited) is 20 million. You can change this by a setting such as --with-heap-limit=500 - which limits the amount of heap to 500 kilobytes. 
This limit applies - only to interpretive matching in pcre2_match(). It does not apply when - JIT (which has its own memory arrangements) is used, nor does it apply + which limits the amount of heap to 500 kilobytes. This limit applies + only to interpretive matching in pcre2_match(). It does not apply when + JIT (which has its own memory arrangements) is used, nor does it apply to pcre2_dfa_match(). - You can also explicitly limit the depth of nested backtracking in the + You can also explicitly limit the depth of nested backtracking in the pcre2_match() interpreter. This limit defaults to the value that is set - for --with-match-limit. You can set a lower default limit by adding, + for --with-match-limit. You can set a lower default limit by adding, for example, --with-match-limit_depth=10000 - to the configure command. This value can be overridden at run time. - This depth limit indirectly limits the amount of heap memory that is - used, but because the size of each backtracking "frame" depends on the - number of capturing parentheses in a pattern, the amount of heap that - is used before the limit is reached varies from pattern to pattern. - This limit was more useful in versions before 10.30, where function - recursion was used for backtracking. However, as well as applying to - pcre2_match(), this limit also controls the depth of recursive function - calls in pcre2_dfa_match(). These are used for lookaround assertions, - atomic groups, and recursion within patterns. The limit does not apply - to JIT matching. + to the configure command. This value can be overridden at run time. + This depth limit indirectly limits the amount of heap memory that is + used, but because the size of each backtracking "frame" depends on the + number of capturing parentheses in a pattern, the amount of heap that + is used before the limit is reached varies from pattern to pattern. 
+ This limit was more useful in versions before 10.30, where function + recursion was used for backtracking. + + As well as applying to pcre2_match(), the depth limit also controls the + depth of recursive function calls in pcre2_dfa_match(). These are used + for lookaround assertions, atomic groups, and recursion within pat- + terns. The limit does not apply to JIT matching. CREATING CHARACTER TABLES AT BUILD TIME @@ -3969,11 +3975,11 @@ AUTHOR REVISION - Last updated: 17 June 2017 + Last updated: 18 July 2017 Copyright (c) 1997-2017 University of Cambridge. ------------------------------------------------------------------------------ - - + + PCRE2CALLOUT(3) Library Functions Manual PCRE2CALLOUT(3) @@ -4366,8 +4372,8 @@ REVISION Last updated: 14 April 2017 Copyright (c) 1997-2017 University of Cambridge. ------------------------------------------------------------------------------ - - + + PCRE2COMPAT(3) Library Functions Manual PCRE2COMPAT(3) @@ -4564,8 +4570,8 @@ REVISION Last updated: 18 April 2017 Copyright (c) 1997-2017 University of Cambridge. ------------------------------------------------------------------------------ - - + + PCRE2JIT(3) Library Functions Manual PCRE2JIT(3) @@ -4958,8 +4964,8 @@ REVISION Last updated: 31 March 2017 Copyright (c) 1997-2017 University of Cambridge. ------------------------------------------------------------------------------ - - + + PCRE2LIMITS(3) Library Functions Manual PCRE2LIMITS(3) @@ -5029,8 +5035,8 @@ REVISION Last updated: 30 March 2017 Copyright (c) 1997-2017 University of Cambridge. ------------------------------------------------------------------------------ - - + + PCRE2MATCHING(3) Library Functions Manual PCRE2MATCHING(3) @@ -5248,8 +5254,8 @@ REVISION Last updated: 29 September 2014 Copyright (c) 1997-2014 University of Cambridge. 
------------------------------------------------------------------------------ - - + + PCRE2PARTIAL(3) Library Functions Manual PCRE2PARTIAL(3) @@ -5688,8 +5694,8 @@ REVISION Last updated: 22 December 2014 Copyright (c) 1997-2014 University of Cambridge. ------------------------------------------------------------------------------ - - + + PCRE2PATTERN(3) Library Functions Manual PCRE2PATTERN(3) @@ -8790,8 +8796,8 @@ REVISION Last updated: 05 July 2017 Copyright (c) 1997-2017 University of Cambridge. ------------------------------------------------------------------------------ - - + + PCRE2PERFORM(3) Library Functions Manual PCRE2PERFORM(3) @@ -9018,8 +9024,8 @@ REVISION Last updated: 08 April 2017 Copyright (c) 1997-2017 University of Cambridge. ------------------------------------------------------------------------------ - - + + PCRE2POSIX(3) Library Functions Manual PCRE2POSIX(3) @@ -9326,8 +9332,8 @@ REVISION Last updated: 15 June 2017 Copyright (c) 1997-2017 University of Cambridge. ------------------------------------------------------------------------------ - - + + PCRE2SAMPLE(3) Library Functions Manual PCRE2SAMPLE(3) @@ -9595,8 +9601,8 @@ REVISION Last updated: 21 March 2017 Copyright (c) 1997-2017 University of Cambridge. ------------------------------------------------------------------------------ - - + + PCRE2SYNTAX(3) Library Functions Manual PCRE2SYNTAX(3) @@ -10043,8 +10049,8 @@ REVISION Last updated: 17 June 2017 Copyright (c) 1997-2017 University of Cambridge. ------------------------------------------------------------------------------ - - + + PCRE2UNICODE(3) Library Functions Manual PCRE2UNICODE(3) @@ -10300,5 +10306,5 @@ REVISION Last updated: 17 May 2017 Copyright (c) 1997-2017 University of Cambridge. 
------------------------------------------------------------------------------ - - + + diff --git a/doc/pcre2_code_free.3 b/doc/pcre2_code_free.3 index 58a8b1c..7376869 100644 --- a/doc/pcre2_code_free.3 +++ b/doc/pcre2_code_free.3 @@ -14,8 +14,8 @@ PCRE2 - Perl-compatible regular expressions (revised API) .rs .sp This function frees the memory used for a compiled pattern, including any -memory used by the JIT compiler. If the compiled pattern was created by a call -to \fBpcre2_code_copy_with_tables()\fP, the memory for the character tables is +memory used by the JIT compiler. If the compiled pattern was created by a call +to \fBpcre2_code_copy_with_tables()\fP, the memory for the character tables is also freed. .P There is a complete description of the PCRE2 native API in the diff --git a/doc/pcre2_compile.3 b/doc/pcre2_compile.3 index acc5736..19f35c3 100644 --- a/doc/pcre2_compile.3 +++ b/doc/pcre2_compile.3 @@ -52,7 +52,7 @@ The option bits are: PCRE2_ENDANCHORED Pattern can match only at end of subject PCRE2_EXTENDED Ignore white space and # comments PCRE2_FIRSTLINE Force matching to be before newline - PCRE2_LITERAL Pattern characters are all literal + PCRE2_LITERAL Pattern characters are all literal PCRE2_MATCH_UNSET_BACKREF Match unset back references PCRE2_MULTILINE ^ and $ match newlines within data PCRE2_NEVER_BACKSLASH_C Lock out the use of \eC in patterns diff --git a/doc/pcre2_config.3 b/doc/pcre2_config.3 index 6c48da6..1f99370 100644 --- a/doc/pcre2_config.3 +++ b/doc/pcre2_config.3 @@ -31,7 +31,7 @@ point to a uint32_t integer variable. The available codes are: PCRE2_CONFIG_BSR Indicates what \eR matches by default: PCRE2_BSR_UNICODE PCRE2_BSR_ANYCRLF - PCRE2_CONFIG_HEAPLIMIT Default heap memory limit + PCRE2_CONFIG_HEAPLIMIT Default heap memory limit PCRE2_CONFIG_DEPTHLIMIT Default backtracking depth limit .\" JOIN PCRE2_CONFIG_JIT Availability of just-in-time compiler @@ -47,7 +47,7 @@ point to a uint32_t integer variable. 
The available codes are: PCRE2_NEWLINE_CRLF PCRE2_NEWLINE_ANY PCRE2_NEWLINE_ANYCRLF - PCRE2_NEWLINE_NUL + PCRE2_NEWLINE_NUL PCRE2_CONFIG_PARENSLIMIT Default parentheses nesting limit PCRE2_CONFIG_RECURSIONLIMIT Obsolete: use PCRE2_CONFIG_DEPTHLIMIT PCRE2_CONFIG_STACKRECURSE Obsolete: always returns 0 diff --git a/doc/pcre2_converted_pattern_free.3 b/doc/pcre2_converted_pattern_free.3 index 1f4c8e6..687e078 100644 --- a/doc/pcre2_converted_pattern_free.3 +++ b/doc/pcre2_converted_pattern_free.3 @@ -14,8 +14,8 @@ PCRE2 - Perl-compatible regular expressions (revised API) .rs .sp This function is part of an experimental set of pattern conversion functions. -It frees the memory occupied by a converted pattern that was obtained by -calling \fBpcre2_pattern_convert()\fP with arguments that caused it to place +It frees the memory occupied by a converted pattern that was obtained by +calling \fBpcre2_pattern_convert()\fP with arguments that caused it to place the converted pattern into newly obtained heap memory. 
.P The pattern conversion functions are described in the diff --git a/doc/pcre2_dfa_match.3 b/doc/pcre2_dfa_match.3 index 32a22c8..7839145 100644 --- a/doc/pcre2_dfa_match.3 +++ b/doc/pcre2_dfa_match.3 @@ -43,17 +43,17 @@ The options are: PCRE2_NOTBOL Subject is not the beginning of a line PCRE2_NOTEOL Subject is not the end of a line PCRE2_NOTEMPTY An empty string is not a valid match -.\" JOIN +.\" JOIN PCRE2_NOTEMPTY_ATSTART An empty string at the start of the subject is not a valid match -.\" JOIN +.\" JOIN PCRE2_NO_UTF_CHECK Do not check the subject for UTF validity (only relevant if PCRE2_UTF was set at compile time) -.\" JOIN +.\" JOIN PCRE2_PARTIAL_HARD Return PCRE2_ERROR_PARTIAL for a partial match even if there is a full match -.\" JOIN +.\" JOIN PCRE2_PARTIAL_SOFT Return PCRE2_ERROR_PARTIAL for a partial match if no full matches are found PCRE2_DFA_RESTART Restart after a partial match diff --git a/doc/pcre2_maketables.3 b/doc/pcre2_maketables.3 index 1250315..740954b 100644 --- a/doc/pcre2_maketables.3 +++ b/doc/pcre2_maketables.3 @@ -12,7 +12,7 @@ PCRE2 - Perl-compatible regular expressions (revised API) .SH DESCRIPTION .rs .sp -This function builds a set of character tables for character code points that +This function builds a set of character tables for character code points that are less than 256. These can be passed to \fBpcre2_compile()\fP in a compile context in order to override the internal, built-in tables (which were either defaulted or made by \fBpcre2_maketables()\fP when PCRE2 was compiled). See the diff --git a/doc/pcre2_match.3 b/doc/pcre2_match.3 index feaa470..ee4e159 100644 --- a/doc/pcre2_match.3 +++ b/doc/pcre2_match.3 @@ -31,14 +31,14 @@ offsets to captured substrings. 
Its arguments are: A match context is needed only if you want to: .sp Set up a callout function - Set a matching offset limit - Change the heap memory limit - Change the backtracking match limit + Set a matching offset limit + Change the heap memory limit + Change the backtracking match limit Change the backtracking depth limit Set custom memory management specifically for the match .sp The \fIlength\fP and \fIstartoffset\fP values are code -units, not characters. The length may be given as PCRE2_ZERO_TERMINATE for a +units, not characters. The length may be given as PCRE2_ZERO_TERMINATE for a subject that is terminated by a binary zero code unit. The options are: .sp PCRE2_ANCHORED Match only at the first position @@ -49,7 +49,7 @@ subject that is terminated by a binary zero code unit. The options are: .\" JOIN PCRE2_NOTEMPTY_ATSTART An empty string at the start of the subject is not a valid match - PCRE2_NO_JIT Do not use JIT matching + PCRE2_NO_JIT Do not use JIT matching .\" JOIN PCRE2_NO_UTF_CHECK Do not check the subject for UTF validity (only relevant if PCRE2_UTF diff --git a/doc/pcre2_pattern_info.3 b/doc/pcre2_pattern_info.3 index 3c3bae5..256e386 100644 --- a/doc/pcre2_pattern_info.3 +++ b/doc/pcre2_pattern_info.3 @@ -38,7 +38,7 @@ request are as follows: 1 first code unit is set 2 start of string or after newline PCRE2_INFO_FIRSTCODEUNIT First code unit when type is 1 - PCRE2_INFO_FRAMESIZE Size of backtracking frame + PCRE2_INFO_FRAMESIZE Size of backtracking frame PCRE2_INFO_HASBACKSLASHC Return 1 if pattern contains \eC .\" JOIN PCRE2_INFO_HASCRORLF Return 1 if explicit CR or LF matches @@ -71,7 +71,7 @@ request are as follows: PCRE2_NEWLINE_CRLF PCRE2_NEWLINE_ANY PCRE2_NEWLINE_ANYCRLF - PCRE2_NEWLINE_NUL + PCRE2_NEWLINE_NUL PCRE2_INFO_RECURSIONLIMIT Obsolete synonym for PCRE2_INFO_DEPTHLIMIT PCRE2_INFO_SIZE Size of compiled pattern .sp diff --git a/doc/pcre2_set_newline.3 b/doc/pcre2_set_newline.3 index 5d58701..0bccfc7 100644 --- 
a/doc/pcre2_set_newline.3 +++ b/doc/pcre2_set_newline.3 @@ -23,7 +23,7 @@ matching patterns. The second argument must be one of: PCRE2_NEWLINE_CRLF CR followed by LF only PCRE2_NEWLINE_ANYCRLF Any of the above PCRE2_NEWLINE_ANY Any Unicode newline sequence - PCRE2_NEWLINE_NUL The NUL character (binary zero) + PCRE2_NEWLINE_NUL The NUL character (binary zero) .sp The result is zero for success or PCRE2_ERROR_BADDATA if the second argument is invalid. diff --git a/doc/pcre2_set_recursion_limit.3 b/doc/pcre2_set_recursion_limit.3 index 1b74456..26f4257 100644 --- a/doc/pcre2_set_recursion_limit.3 +++ b/doc/pcre2_set_recursion_limit.3 @@ -14,7 +14,7 @@ PCRE2 - Perl-compatible regular expressions (revised API) .SH DESCRIPTION .rs .sp -This function is obsolete and should not be used in new code. Use +This function is obsolete and should not be used in new code. Use \fBpcre2_set_depth_limit()\fP instead. .P There is a complete description of the PCRE2 native API in the diff --git a/doc/pcre2_substitute.3 b/doc/pcre2_substitute.3 index 17de3ec..7da668c 100644 --- a/doc/pcre2_substitute.3 +++ b/doc/pcre2_substitute.3 @@ -48,7 +48,7 @@ want to: The \fIlength\fP, \fIstartoffset\fP and \fIrlength\fP values are code units, not characters, as is the contents of the variable pointed at by \fIoutlengthptr\fP, which is updated to the actual length of the new string. -The subject and replacement lengths can be given as PCRE2_ZERO_TERMINATED for +The subject and replacement lengths can be given as PCRE2_ZERO_TERMINATED for zero-terminated strings. The options are: .sp PCRE2_ANCHORED Match only at the first position diff --git a/doc/pcre2build.3 b/doc/pcre2build.3 index 8b081d9..7586d22 100644 --- a/doc/pcre2build.3 +++ b/doc/pcre2build.3 @@ -1,4 +1,4 @@ -.TH PCRE2BUILD 3 "17 June 2017" "PCRE2 10.30" +.TH PCRE2BUILD 3 "18 July 2017" "PCRE2 10.30" .SH NAME PCRE2 - Perl-compatible regular expressions (revised API) . 
@@ -66,10 +66,10 @@ Options that specify values have names that start with --with. .rs .sp By default, a library called \fBlibpcre2-8\fP is built, containing functions -that take string arguments contained in vectors of bytes, interpreted either as +that take string arguments contained in arrays of bytes, interpreted either as single-byte characters, or UTF-8 strings. You can also build two other libraries, called \fBlibpcre2-16\fP and \fBlibpcre2-32\fP, which process -strings that are contained in vectors of 16-bit and 32-bit code units, +strings that are contained in arrays of 16-bit and 32-bit code units, respectively. These can be interpreted either as single-unit characters or UTF-16/UTF-32 strings. To build these additional libraries, add one or both of the following to the \fBconfigure\fP command: @@ -197,18 +197,22 @@ to the \fBconfigure\fP command. There is a fourth option, specified by --enable-newline-is-anycrlf .sp which causes PCRE2 to recognize any of the three sequences CR, LF, or CRLF as -indicating a line ending. Finally, a fifth option, specified by +indicating a line ending. A fifth option, specified by .sp --enable-newline-is-any .sp causes PCRE2 to recognize any Unicode newline sequence. The Unicode newline sequences are the three just mentioned, plus the single characters VT (vertical tab, U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line -separator, U+2028), and PS (paragraph separator, U+2029). +separator, U+2028), and PS (paragraph separator, U+2029). The final option is +.sp + --enable-newline-is-nul +.sp +which causes NUL (binary zero) to be set as the default line-ending character. .P Whatever default line ending convention is selected when PCRE2 is built can be overridden by applications that use the library. At build time it is -conventional to use the standard for your operating system. +recommended to use the standard for your operating system. . . 
.SH "WHAT \eR MATCHES" @@ -297,7 +301,8 @@ because the size of each backtracking "frame" depends on the number of capturing parentheses in a pattern, the amount of heap that is used before the limit is reached varies from pattern to pattern. This limit was more useful in versions before 10.30, where function recursion was used for backtracking. -However, as well as applying to \fBpcre2_match()\fP, this limit also controls +.P +As well as applying to \fBpcre2_match()\fP, the depth limit also controls the depth of recursive function calls in \fBpcre2_dfa_match()\fP. These are used for lookaround assertions, atomic groups, and recursion within patterns. The limit does not apply to JIT matching. @@ -577,6 +582,6 @@ Cambridge, England. .rs .sp .nf -Last updated: 17 June 2017 +Last updated: 18 July 2017 Copyright (c) 1997-2017 University of Cambridge. .fi diff --git a/doc/pcre2compat.3 b/doc/pcre2compat.3 index b3722df..8094ebd 100644 --- a/doc/pcre2compat.3 +++ b/doc/pcre2compat.3 @@ -71,7 +71,7 @@ documentation for details. .P 8. Subroutine calls (whether recursive or not) were treated as atomic groups up to PCRE2 release 10.23, but from release 10.30 this changed, and backtracking -into subroutine calls is now supported, as in Perl. +into subroutine calls is now supported, as in Perl. .P 9. If any of the backtracking control verbs are used in a subpattern that is called as a subroutine (whether or not recursively), their effect is confined diff --git a/doc/pcre2grep.1 b/doc/pcre2grep.1 index f6f79b4..8d5f338 100644 --- a/doc/pcre2grep.1 +++ b/doc/pcre2grep.1 @@ -446,25 +446,25 @@ memory. There are three options that set resource limits for matching. The \fB--match-limit\fP option provides a means of limiting computing resource usage when processing patterns that are not going to match, but which have a very large number of possibilities in their search trees. The classic example -is a pattern that uses nested unlimited repeats. 
Internally, PCRE2 has a -counter that is incremented each time around its main processing loop. If the +is a pattern that uses nested unlimited repeats. Internally, PCRE2 has a +counter that is incremented each time around its main processing loop. If the value set by \fB--match-limit\fP is reached, an error occurs. .sp The \fB--heap-limit\fP option specifies, as a number of kilobytes, the amount of heap memory that may be used for matching. Heap memory is needed only if matching the pattern requires a significant number of nested backtracking -points to be remembered. This parameter can be set to zero to forbid the use of +points to be remembered. This parameter can be set to zero to forbid the use of heap memory altogether. .sp The \fB--depth-limit\fP option limits the depth of nested backtracking points, -which indirectly limits the amount of memory that is used. The amount of memory +which indirectly limits the amount of memory that is used. The amount of memory needed for each backtracking point depends on the number of capturing parentheses in the pattern, so the amount of memory that is used before this limit acts varies from pattern to pattern. This limit is of use only if it is set smaller than \fB--match-limit\fP. .sp There are no short forms for these options. The default settings are specified -when the PCRE2 library is compiled, with the default defaults being very large +when the PCRE2 library is compiled, with the default defaults being very large and so effectively unlimited. .TP \fB--max-buffer-size=\fInumber\fP @@ -747,7 +747,7 @@ either a number or a quoted string (see the .\" HREF \fBpcre2callout\fP .\" -documentation for details). Numbered callouts are ignored by \fBpcre2grep\fP; +documentation for details). Numbered callouts are ignored by \fBpcre2grep\fP; only callouts with string arguments are useful. . . @@ -797,10 +797,10 @@ matcher backtracks in the normal way. 
If the callout string starts with a pipe (vertical bar) character, the rest of the string is written to the output, having been passed through the same escape processing as text from the --output option. This provides a simple echoing -facility that avoids calling an external program or script. No terminator is +facility that avoids calling an external program or script. No terminator is added to the string, so if you want a newline, you must include it explicitly. -Matching continues normally after the string is output. If you want to see only -the callout output but not any output from an actual match, you should end the +Matching continues normally after the string is output. If you want to see only +the callout output but not any output from an actual match, you should end the relevant pattern with (*FAIL). . . @@ -816,8 +816,8 @@ message and the line that caused the problem to the standard error stream. If there are more than 20 such errors, \fBpcre2grep\fP gives up. .P The \fB--match-limit\fP option of \fBpcre2grep\fP can be used to set the -overall resource limit. There are also other limits that affect the amount of -memory used during matching; see the discussion of \fB--heap-limit\fP and +overall resource limit. There are also other limits that affect the amount of +memory used during matching; see the discussion of \fB--heap-limit\fP and \fB--depth-limit\fP above. . . diff --git a/doc/pcre2perform.3 b/doc/pcre2perform.3 index 0781102..8b49a2a 100644 --- a/doc/pcre2perform.3 +++ b/doc/pcre2perform.3 @@ -12,7 +12,7 @@ of them. .rs .sp Patterns are compiled by PCRE2 into a reasonably efficient interpretive code, -so that most simple patterns do not use much memory for storing the compiled +so that most simple patterns do not use much memory for storing the compiled version. However, there is one case where the memory usage of a compiled pattern can be unexpectedly large. 
If a parenthesized subpattern has a quantifier with a minimum greater than 1 and/or a limited maximum, the whole @@ -76,7 +76,7 @@ memory can be limited; if the limit is set to zero, only the initial stack vector is used. Rewriting patterns to be time-efficient, as described below, may also reduce the memory requirements. .P -In contrast to \fBpcre2_match()\fP, \fBpcre2_dfa_match()\fP does use recursive +In contrast to \fBpcre2_match()\fP, \fBpcre2_dfa_match()\fP does use recursive function calls, but only for processing atomic groups, lookaround assertions, and recursion within the pattern. Too much nested recursion may cause stack issues. The "match depth" parameter can be used to limit the depth of function @@ -163,7 +163,7 @@ applied to a whole line of "a" characters, whereas the latter takes an appreciable time with strings longer than about 20 characters. .P In many cases, the solution to this kind of performance issue is to use an -atomic group or a possessive quantifier. This can often reduce memory +atomic group or a possessive quantifier. This can often reduce memory requirements as well. As another example, consider this pattern: .sp ([^<]|<(?!inet))+ @@ -184,7 +184,7 @@ are "swallowed" in one item inside the parentheses, and a possessive quantifier is used to stop any backtracking into the runs of non-"<" characters. This version also uses a lot less memory because entry to a new set of parentheses happens only when a "<" character that is not followed by "inet" is encountered -(and we assume this is relatively rare). +(and we assume this is relatively rare). .P This example shows that one way of optimizing performance when matching long subject strings is to write repeated parenthesized subpatterns to match more @@ -194,10 +194,10 @@ than one character whenever possible. 
.SS "SETTING RESOURCE LIMITS" .rs .sp -You can set limits on the amount of processing that takes place when matching, +You can set limits on the amount of processing that takes place when matching, and on the amount of heap memory that is used. The default values of the limits are very large, and unlikely ever to operate. They can be changed when PCRE2 is -built, and they can also be set when \fBpcre2_match()\fP or +built, and they can also be set when \fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP is called. For details of these interfaces, see the .\" HREF \fBpcre2build\fP diff --git a/doc/pcre2syntax.3 b/doc/pcre2syntax.3 index 27f0aab..6eb0235 100644 --- a/doc/pcre2syntax.3 +++ b/doc/pcre2syntax.3 @@ -407,11 +407,11 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use (?i) caseless (?J) allow duplicate names (?m) multiline - (?n) no auto capture + (?n) no auto capture (?s) single line (dotall) (?U) default ungreedy (lazy) (?x) extended: ignore white space except in classes - (?xx) as (?x) but also ignore space and tab in classes + (?xx) as (?x) but also ignore space and tab in classes (?-...) unset option(s) .sp The following are recognized only at the very start of a pattern or after one diff --git a/perltest.sh b/perltest.sh index 806a888..1a7679a 100755 --- a/perltest.sh +++ b/perltest.sh @@ -50,7 +50,7 @@ fi # ucp sets Perl's /u modifier # utf invoke UTF-8 functionality # -# The data lines must not have any pcre2test modifiers. Unless +# The data lines must not have any pcre2test modifiers. Unless # "subject_litersl" is on the pattern, data lines are processed as # Perl double-quoted strings, so if they contain " $ or @ characters, these # have to be escaped. For this reason, all such characters in the @@ -141,20 +141,20 @@ for (;;) chomp($pattern); $pattern =~ s/\s+$//; - + # Split the pattern from the modifiers and adjust them as necessary. 
$pattern =~ /^\s*((.).*\2)(.*)$/s; $pat = $1; $mod = $3; - + # The private "aftertext" modifier means "print $' afterwards". $showrest = ($mod =~ s/aftertext,?//); - + # The "subject_literal" modifer disables escapes in subjects. - - $subject_literal = ($mod =~ s/subject_literal,?//); + + $subject_literal = ($mod =~ s/subject_literal,?//); # "allaftertext" is used by pcre2test to print remainders after captures @@ -238,7 +238,7 @@ for (;;) $x = $_; } else - { + { $x = eval "\"$_\""; # To get escapes processed } diff --git a/src/config.h.generic b/src/config.h.generic index 98a55d5..f794576 100644 --- a/src/config.h.generic +++ b/src/config.h.generic @@ -132,6 +132,12 @@ sure both macros are undefined; an emulation function will then be used. */ /* Define to 1 if you have theThis example shows that one way of optimizing performance when matching long @@ -216,10 +216,10 @@ than one character whenever possible. SETTING RESOURCE LIMITS
-You can set limits on the amount of processing that takes place when matching, +You can set limits on the amount of processing that takes place when matching, and on the amount of heap memory that is used. The default values of the limits are very large, and unlikely ever to operate. They can be changed when PCRE2 is -built, and they can also be set when pcre2_match() or +built, and they can also be set when pcre2_match() or pcre2_dfa_match() is called. For details of these interfaces, see the pcre2build documentation and the section entitled diff --git a/doc/html/pcre2syntax.html b/doc/html/pcre2syntax.html index ce7e7da..9098f47 100644 --- a/doc/html/pcre2syntax.html +++ b/doc/html/pcre2syntax.html @@ -430,11 +430,11 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use (?i) caseless (?J) allow duplicate names (?m) multiline - (?n) no auto capture + (?n) no auto capture (?s) single line (dotall) (?U) default ungreedy (lazy) (?x) extended: ignore white space except in classes - (?xx) as (?x) but also ignore space and tab in classes + (?xx) as (?x) but also ignore space and tab in classes (?-...) unset option(s)
(?')|(? ")) | - b(? (?')|(? ")) ) + b(? (?')|(? ")) ) (?('quote')[a-z]+|[0-9]+)/Ix,dupnames a"aaaaa - b"aaaaa -\= Expect no match + b"aaaaa +\= Expect no match b"11111 - a"11111 - + a"11111 + /^(?|(a)(b)(c)(? d)|(? e)) (?('D')X|Y)/IBx,dupnames abcdX eX \= Expect no match abcdY - ey - + ey + /(?a) (b)(c) (?d (?(R&A)$ | (?4)) )/IBx,dupnames abcdd \= Expect no match - abcdde + abcdde /abcd*/ xxxxabcd\=ps @@ -2910,16 +2910,16 @@ /i(?(DEFINE)(? a))/I i - + /()i(?(1)a)/I ia /(?i)a(?-i)b|c/B XabX XAbX - CcC + CcC \= Expect no match - XABX + XABX /(?i)a(?s)b|c/B @@ -2927,20 +2927,20 @@ /^(ab(c\1)d|x){2}$/B xabcxd - + /^(?&t)*+(?(DEFINE)(?.))$/B /^(?&t)*(?(DEFINE)(? .))$/B # This one is here because Perl gives the match as "b" rather than "ab". I # believe this to be a Perl bug. - + /(?>a\Kb)z|(ab)/ - ab\=startchar + ab\=startchar /(?P (?P 0|)|(?P>L2)(?P>L1))/ abcd - 0abc + 0abc /abc(*MARK:)pqr/ @@ -2948,7 +2948,7 @@ /abc(*FAIL:123)xyz/ -# This should, and does, fail. In Perl, it does not, which I think is a +# This should, and does, fail. In Perl, it does not, which I think is a # bug because replacing the B in the pattern by (B|D) does make it fail. /A(*COMMIT)B/aftertext,mark @@ -2964,37 +2964,37 @@ /A(*PRUNE)B|A(*PRUNE)C/mark \= Expect no match AC - + # Mark names can be duplicated. Perl doesn't give a mark for this one, # though PCRE2 does. /^A(*:A)B|^X(*:A)Y/mark \= Expect no match XAQQ - -# COMMIT at the start of a pattern should be the same as an anchor. Perl + +# COMMIT at the start of a pattern should be the same as an anchor. Perl # optimizations defeat this. So does the PCRE2 optimization unless we disable # it. 
/(*COMMIT)ABC/ ABCDEFG - + /(*COMMIT)ABC/no_start_optimize \= Expect no match DEFGABC - + /^(ab (c+(*THEN)cd) | xyz)/x \= Expect no match - abcccd + abcccd /^(ab (c+(*PRUNE)cd) | xyz)/x \= Expect no match - abcccd + abcccd /^(ab (c+(*FAIL)cd) | xyz)/x \= Expect no match - abcccd - + abcccd + # Perl gets some of these wrong /(?>.(*ACCEPT))*?5/ @@ -3013,19 +3013,19 @@ ACBD \= Expect no match A\nB - ACB\n + ACB\n /A\NB./Bs ACBD - ACB\n + ACB\n \= Expect no match - A\nB - + A\nB + /A\NB/newline=crlf A\nB A\rB \= Expect no match - A\r\nB + A\r\nB /\R+b/B @@ -3096,7 +3096,7 @@ /.+/ \= Bad offsets abc\=offset=4 - abc\=offset=-4 + abc\=offset=-4 \= Valid data abc\=offset=0 abc\=offset=1 @@ -3116,24 +3116,24 @@ /(?P (?P=axn)xxx)(? yy)/B -# These tests are here because Perl gets the first one wrong. +# These tests are here because Perl gets the first one wrong. /(\R*)(.)/s \r\n - \r\r\n\n\r - \r\r\n\n\r\n + \r\r\n\n\r + \r\r\n\n\r\n /(\R)*(.)/s \r\n - \r\r\n\n\r - \r\r\n\n\r\n + \r\r\n\n\r + \r\r\n\n\r\n /((?>\r\n|\n|\x0b|\f|\r|\x85)*)(.)/s \r\n - \r\r\n\n\r - \r\r\n\n\r\n + \r\r\n\n\r + \r\r\n\n\r\n -# ------------- +# ------------- /^abc$/B @@ -3141,12 +3141,12 @@ /^(a)*+(\w)/ aaaaX -\= Expect no match +\= Expect no match aaaa /^(?:a)*+(\w)/ aaaaX -\= Expect no match +\= Expect no match aaaa /(a)++1234/IB @@ -3205,39 +3205,39 @@ /(abc)\1+/ -# Perl doesn't get these right IMO (the 3rd is PCRE2-specific) +# Perl doesn't get these right IMO (the 3rd is PCRE2-specific) /(?1)(?:(b(*ACCEPT))){0}/ b /(?1)(?:(b(*ACCEPT))){0}c/ bc -\= Expect no match - b +\= Expect no match + b /(?1)(?:((*ACCEPT))){0}c/ c c\=notempty /^.*?(?(?=a)a|b(*THEN)c)/ -\= Expect no match +\= Expect no match ba /^.*?(?(?=a)a|bc)/ ba /^.*?(?(?=a)a(*THEN)b|c)/ -\= Expect no match +\= Expect no match ac /^.*?(?(?=a)a(*THEN)b)c/ -\= Expect no match +\= Expect no match ac /^.*?(a(*THEN)b)c/ -\= Expect no match +\= Expect no match aabc - + /^.*? 
(?1) c (?(DEFINE)(a(*THEN)b))/x aabc @@ -3252,11 +3252,11 @@ /(*MARK:A)(*SKIP:B)(C|X)/mark C -\= Expect no match +\= Expect no match D - + /(*:A)A+(*SKIP:A)(B|Z)/mark -\= Expect no match +\= Expect no match AAAC # ---------------------------- @@ -3264,14 +3264,14 @@ "(?=a*(*ACCEPT)b)c" c c\=notempty - + /(?1)c(?(DEFINE)((*ACCEPT)b))/ c c\=notempty - + /(?>(*ACCEPT)b)c/ c -\= Expect no match +\= Expect no match c\=notempty /(?:(?>(a)))+a%/allaftertext @@ -3279,7 +3279,7 @@ /(a)b|ac/allaftertext ac\=ovector=1 - + /(a)(b)x|abc/allaftertext abc\=ovector=2 @@ -3304,7 +3304,7 @@ foobazbarX barfooX bazX - foobarbazX + foobarbazX bazfooX\=ovector=0 bazfooX\=ovector=1 bazfooX\=ovector=2 @@ -3368,17 +3368,17 @@ /^(?>a+)(?>(z+))\w/B aaaazzzzb \= Expect no match - aazz + aazz /(.)(\1|a(?2))/ bab - + /\1|(.)(?R)\1/ cbbbc - + /(.)((?(1)c|a)|a(?2))/ \= Expect no match - baa + baa /(?P (?P=abn)xxx)/B @@ -3419,7 +3419,7 @@ /a[\NB]c/ aNc - + /a[B-\Nc]/ /a[B\Nc]/ @@ -3431,34 +3431,34 @@ # This test, with something more complicated than individual letters, causes # different behaviour in Perl. Perhaps it disables some optimization; no tag is # passed back for the failures, whereas in PCRE2 there is a tag. - + /(A|P)(*:A)(B|P) | (X|P)(X|P)(*:B)(Y|P)/x,mark AABC - XXYZ + XXYZ \= Expect no match - XAQQ - XAQQXZZ - AXQQQ - AXXQQQ + XAQQ + XAQQXZZ + AXQQQ + AXXQQQ # Perl doesn't give marks for these, though it does if the alternatives are -# replaced by single letters. - +# replaced by single letters. + /(b|q)(*:m)f|a(*:n)w/mark - aw -\= Expect no match + aw +\= Expect no match abc /(q|b)(*:m)f|a(*:n)w/mark - aw -\= Expect no match + aw +\= Expect no match abc -# After a partial match, the behaviour is as for a failure. +# After a partial match, the behaviour is as for a failure. /^a(*:X)bcde/mark abc\=ps - + # These are here because Perl doesn't return a mark, except for the first. 
/(?=(*:x))(q|)/aftertext,mark @@ -3526,22 +3526,22 @@ ababa\=ps ababa\=ph abababx - ababababx + ababababx /^(..)\1{2,3}?x/ aba\=ps ababa\=ps ababa\=ph abababx - ababababx - + ababababx + /^(..)(\1{2,3})ab/ abababab /^\R/ \r\=ps \r\=ph - + /^\R{2,3}x/ \r\=ps \r\=ph @@ -3550,7 +3550,7 @@ \r\r\r\=ps \r\r\r\=ph \r\rx - \r\r\rx + \r\r\rx /^\R{2,3}?x/ \r\=ps @@ -3560,20 +3560,20 @@ \r\r\r\=ps \r\r\r\=ph \r\rx - \r\r\rx - + \r\r\rx + /^\R?x/ \r\=ps \r\=ph x - \rx + \rx /^\R+x/ \r\=ps \r\=ph \r\n\=ps \r\n\=ph - \rx + \rx /^a$/newline=crlf a\r\=ps @@ -3594,7 +3594,7 @@ /./newline=crlf \r\=ps \r\=ph - + /.{2,3}/newline=crlf \r\=ps \r\=ph @@ -3613,9 +3613,9 @@ "AB(C(D))(E(F))?(?(?=\2)(?=\4))" ABCDGHI\=ovector=01 - + # These are all run as real matches in test 1; here we are just checking the -# settings of the anchored and startline bits. +# settings of the anchored and startline bits. /(?>.*?a)(?<=ba)/I @@ -3651,10 +3651,10 @@ /(?:(a)+(?C1)bb|aa(?C2)b)/ aab\=callout_capture - + /(?:(a)++(?C1)bb|aa(?C2)b)/ aab\=callout_capture - + /(?:(?>(a))(?C1)bb|aa(?C2)b)/ aab\=callout_capture @@ -3671,11 +3671,11 @@ /(ab)x|ab/ ab\=ovector=0 ab\=ovector=1 - + /(?<=123)(*MARK:xx)abc/mark xxxx123a\=ph xxxx123a\=ps - + /123\Kabc/startchar xxxx123a\=ph xxxx123a\=ps @@ -3690,22 +3690,22 @@ /aaaaa(*COMMIT)(*PRUNE)b|a+c/ aaaaaac - + # Here are some that Perl treats differently because of the way it handles -# backtracking verbs. +# backtracking verbs. 
/(?!a(*COMMIT)b)ac|ad/ ac - ad + ad /^(?!a(*THEN)b|ac)../ - ad + ad \= Expect no match ac /^(?=a(*THEN)b|ac)/ ac - + /\A.*?(?:a|b(*THEN)c)/ ba @@ -3716,14 +3716,14 @@ ba /(?:(a(*MARK:X)a+(*SKIP:X)b)){0}(?:(?1)|aac)/ - aac + aac /\A.*?(a|b(*THEN)c)/ ba /^(A(*THEN)B|A(*THEN)D)/ - AD - + AD + /(?!b(*THEN)a)bn|bnn/ bnn @@ -3733,7 +3733,7 @@ /(?=b(*THEN)a|)bn|bnn/ bnn -# This test causes a segfault with Perl 5.18.0 +# This test causes a segfault with Perl 5.18.0 /^(?=(a)){0}b(?1)/ backgammon @@ -3841,13 +3841,13 @@ /[a-c]{0,6}d/IB -# End of special auto-possessive tests +# End of special auto-possessive tests /^A\o{1239}B/ A\123B /^A\oB/ - + /^A\x{zz}B/ /^A\x{12Z/ @@ -3919,13 +3919,13 @@ /[[:<:]]red[[:>:]]/B little red riding hood - a /red/ thing + a /red/ thing red is a colour - put it all on red + put it all on red \= Expect no match no reduction Alfred Winifred - + /[a[:<:]] should give error/ /(?=ab\K)/aftertext @@ -3934,7 +3934,7 @@ /abcd/newline=lf,firstline \= Expect no match xx\nxabcd - + # Test stack guard external calls. /(((a)))/stackguard=1 @@ -3965,25 +3965,25 @@ /A\9B/ -# This one is here because Perl fails to match "12" for this pattern when the $ +# This one is here because Perl fails to match "12" for this pattern when the $ # is present. - + /^(?(?=abc)\w{3}:|\d\d)$/ abc: 12 \= Expect no match 123 - xyz + xyz -# Perl gets this one wrong, giving "a" as the after text for ca and failing to +# Perl gets this one wrong, giving "a" as the after text for ca and failing to # match for cd. /(?(?=ab)ab)/aftertext abxxx ca - cd - -# This should test both paths for processing OP_RECURSE. + cd + +# This should test both paths for processing OP_RECURSE. /(?(R)a+|(?R)b)/ aaaabcde @@ -3995,29 +3995,29 @@ /(*NOTEMPTY)a*?b*?/ ab ba - cb + cb /(*NOTEMPTY_ATSTART)a*?b*?/aftertext ab - cdab + cdab /(?(VERSION>=10.0)yes|no)/I yesno - + /(?(VERSION=8)yes){3}/BI,aftertext yesno /(?(VERSION=8)yes|no){3}/I yesnononoyes \= Expect no match - yesno + yesno /(?:(? 
abc)|xyz)(?(VERSION)yes|no)/I abcyes xyzno \= Expect no match abcno - xyzyes + xyzyes /(?(VERSION<10)yes|no)/ @@ -4033,11 +4033,11 @@ /(|ab)*?d/I abd - xyd + xyd /(|ab)*?d/I,no_start_optimize abd - xyd + xyd /\k*(?aa)(?bb)/match_unset_backref,dupnames aabb @@ -4097,7 +4097,7 @@ /abc/replace=[9]XYZ 123abc123 - + /abc/replace=xyz 1abc2\=partial_hard @@ -4109,23 +4109,23 @@ /(?<=abc)(|def)/g,replace=<$0> 123abcxyzabcdef789abcpqr - + /./replace=$0 a - + /(.)(.)/replace=$2+$1 abc - + /(?.)(?.)/replace=$B+$A abc - + /(.)(.)/g,replace=$2$1 - abcdefgh - + abcdefgh + /(*:pear)apple|(*:orange)lemon|(*:strawberry)blackberry/g,replace=${*MARK} apple lemon blackberry apple strudel - fruitless + fruitless /(*:pear)apple|(*:orange)lemon|(*:strawberry)blackberry/replace=${*MARK} sauce, apple lemon blackberry @@ -4133,15 +4133,15 @@ /(*:pear)apple|(*:orange)lemon|(*:strawberry)blackberry/g,replace=<$*MARK> apple lemon blackberry apple strudel - fruitless - -/(*:pear)apple/g,replace=${*MARKING} + fruitless + +/(*:pear)apple/g,replace=${*MARKING} apple lemon blackberry /(*:pear)apple/g,replace=${*MARK-time apple lemon blackberry -/(*:pear)apple/g,replace=${*mark} +/(*:pear)apple/g,replace=${*mark} apple lemon blackberry /(*:pear)apple|(*:orange)lemon|(*:strawberry)blackberry/g,replace=<$*MARKET> @@ -4177,10 +4177,10 @@ /(a)(b)|(c)/ XcX\=ovector=2,get=1,get=2,get=3,get=4,getall - + /x(?=ab\K)/ - xab\=get=0 - xab\=copy=0 + xab\=get=0 + xab\=copy=0 xab\=getall /(?a)|(?b)/dupnames @@ -4243,16 +4243,16 @@ 00765 456 \= Expect no match - 356 + 356 '^(a)*+(\w)' g - g\=ovector=1 + g\=ovector=1 '^(?:a)*+(\w)' g - g\=ovector=1 - + g\=ovector=1 + # These two pattern showeds up compile-time bugs "((?2){0,1999}())?" 
@@ -4293,11 +4293,11 @@ /^(?(?C25)(?=abc)abcd|xyz)/B,callout_info abcdefg - xyz123 + xyz123 /^(?(?C$abc$)(?=abc)abcd|xyz)/B abcdefg - xyz123 + xyz123 /^ab(?C'first')cd(?C"second")ef/ abcdefg @@ -4314,8 +4314,8 @@ /(?(?!)a|b)/ bbb -\= Expect no match - aaa +\= Expect no match + aaa # JIT gives a different error message for the infinite recursion @@ -4349,9 +4349,9 @@ /abc/ \= Expect no match \[9x!xxx(]{9999} - + /(abc)*/ - \[abc]{5} + \[abc]{5} /^/gm \n\n\n @@ -4369,17 +4369,17 @@ /A\8B\9C/ A8B9C - + /(?x:((?'a')) # comment (with parentheses) and | vertical (?-x:#not a comment (?'b')) # this is a comment () (?'c')) # not a comment (?'d')/info /(?|(?'a')(2)(?'b')|(?'a')(?'a')(3))/I,dupnames A23B - B32A + B32A # These are some patterns that used to cause buffer overflows or other errors -# while compiling. +# while compiling. /.((?2)(?R)|\1|$)()/B @@ -4463,7 +4463,7 @@ {4,5a}bc /\x0{ab}/ - \0{ab} + \0{ab} /^(a(b))\1\g1\g{1}\g-1\g{-1}\g{-02}Z/ ababababbbabZXXXX @@ -4505,8 +4505,8 @@ \= Expect no match aacb -/(*MARK:a\zb)z/alt_verbnames - +/(*MARK:a\zb)z/alt_verbnames + /(*:ab\t(d\)c)xxx/ /(*:ab\t(d\)c)xxx/alt_verbnames,mark @@ -4514,28 +4514,28 @@ /(*:A\Qxx)x\EB)x/alt_verbnames,mark x - + /(*:A\ExxxB)x/alt_verbnames,mark - x - + x + /(*: A \ and #comment \ B)x/x,alt_verbnames,mark - x - + x + /(*: A \ and #comment \ B)x/alt_verbnames,mark - x - + x + /(*: A \ and #comment \ B)x/x,mark - x - + x + /(*: A \ and #comment \ B)x/mark - x - + x + /(*:A -B)x/alt_verbnames,mark +B)x/alt_verbnames,mark x /(*:abc\Qpqr)/alt_verbnames @@ -4553,7 +4553,7 @@ B)x/alt_verbnames,mark 1234abc\=offset_limit=7 \= Expect no match 1234abc\=offset_limit=6 - + /A/g,replace=-,use_offset_limit XAXAXAXAXA\=offset_limit=4 @@ -4567,16 +4567,16 @@ B)x/alt_verbnames,mark /abcd/null_context abcd\=null_context -\= Expect error +\= Expect error abcd\=null_context,find_limits - abcd\=allusedtext,startchar + abcd\=allusedtext,startchar 
/abcd/replace=w\rx\x82y\o{333}z(\Q12\$34$$\x34\E5$$),substitute_extended abcd - + /a(bc)(DE)/replace=a\u$1\U$1\E$1\l$2\L$2\Eab\Uab\LYZ\EDone,substitute_extended abcDE - + /abcd/replace=xy\kz,substitute_extended abcd @@ -4614,8 +4614,8 @@ B)x/alt_verbnames,mark /(?J)(?:(?a)|(?b))/replace=<$A> [a] - [b] -\= Expect error + [b] +\= Expect error (a)\=ovector=1 /(a)|(b)/replace=<$1> @@ -4640,10 +4640,10 @@ B)x/alt_verbnames,mark /(?=a\K)/replace=z BaCaD - + /(?'abcdefghijklmnopqrstuvwxyzABCDEFG'toolong)/ - -/(?'abcdefghijklmnopqrstuvwxyzABCDEF'justright)/ + +/(?'abcdefghijklmnopqrstuvwxyzABCDEF'justright)/ # These two use zero-termination /abcd/max_pattern_length=3 @@ -4766,7 +4766,7 @@ a)"xI /a|(b)c/replace=>$1<,substitute_unset_empty cat - xbcom + xbcom /a|(b)c/ cat\=replace=>$1< @@ -4780,26 +4780,26 @@ a)"xI /a|(?'X'b)c/replace=>$X<,substitute_unset_empty cat - xbcom + xbcom /a|(?'X'b)c/replace=>$Y<,substitute_unset_empty cat - cat\=substitute_unknown_unset - cat\=substitute_unknown_unset,-substitute_unset_empty + cat\=substitute_unknown_unset + cat\=substitute_unknown_unset,-substitute_unset_empty /a|(b)c/replace=>$2<,substitute_unset_empty cat - cat\=substitute_unknown_unset - cat\=substitute_unknown_unset,-substitute_unset_empty + cat\=substitute_unknown_unset + cat\=substitute_unknown_unset,-substitute_unset_empty /()()()/use_offset_limit \=ovector=11000000000 \=callout_fail=11000000000 \=callout_fail=1:11000000000 - \=callout_data=11000000000 - \=callout_data=-11000000000 - \=offset_limit=1100000000000000000000 - \=copy=11000000000 + \=callout_data=11000000000 + \=callout_data=-11000000000 + \=offset_limit=1100000000000000000000 + \=copy=11000000000 /(*MARK:A\x00b)/mark abc @@ -4848,22 +4848,22 @@ a)"xI /([ab])...(?<=\1)z/ a11az - b11bz + b11bz \= Expect no match - b11az - + b11az + /(?|([ab]))...(?<=\1)z/ /([ab])(\1)...(?<=\2)z/ aa11az - -/(a\2)(b\1)(?<=\2)/ - + +/(a\2)(b\1)(?<=\2)/ + /(?[ab])...(?<=\k'A')z/ a11az - b11bz + b11bz \= Expect no match - b11az + 
b11az /(?[ab])...(?<=\k'A')(?)z/dupnames @@ -4877,8 +4877,8 @@ a)"xI /'ab(?C1)c'/hex,auto_callout abc - -# Perl accepts these, but gives a warning. We can't warn, so give an error. + +# Perl accepts these, but gives a warning. We can't warn, so give an error. /[a-[:digit:]]+/ a-a9-a @@ -4943,7 +4943,7 @@ a)"xI "()X|((((((((()))))))((((())))))\2())((((((\2\2)))\2)(\22((((\2\2)2))\2)))(2\ZZZ)+:)Z^|91ZiZZnter(ZZ |91Z(ZZ ZZ(\r2Z( or#(\Z2(Z\Z(\2\2)2))\2Z)Z(\22Z((\Z2(Z\Z(\2\2)2))\2Z+:)Z|91Z(ZZ ZZ(\r2Z( or#(\Z2(Z\Z((Z*(\2(Z\':))\0)i|||||||||||||||loZ\2\2)2))\2Z)Z(\22Z((\Z2(Z\Z(\2\2)2))\2Z)))int \)\0nte!rnal errpr\2\\21r(2\ZZZ)+:)Z!|91Z(ZZ ZZ(\r2Z( or#(\Z2(Z\Z(\2\2)2))\2Z)Z(\22Z((\Z2(Z\Z(\2\2)2))\2Z)))int \)\0(2\ZZZ)+:)Z^|91ZiZZnter(ZZ |91Z(ZZ ZZ(\r2Z( or#(\Z2(Z\Z(\2\2)2))\2Z)Z(\22Z((\Z2(Z\Z(\2\2)2))\2Z)))int \)\0(2\ZZZ)+:)Z^)))int \)\0(2\ZZZ)+:)Z^|91ZiZZnter(ZZernZal ZZ(\r2Z( or#(\Z2(Z\Z(\2\2)2))\2Z)Z(\22Z((\Z2(Z\Z(\2\2)2))\2Z)))int \))\ZZ(\r2Z( or#(\Z2(Z\Z(\2\2)2))\2Z)Z(\22Z((\Z2(Z\Z(\2\2)))\2))))((((((\2\2))))))"I # This checks that new code for handling groups that may match an empty string -# works on a very large number of alternatives. This pattern used to provoke a +# works on a very large number of alternatives. This pattern used to provoke a # complaint that it was too complicated. /(?:\[A|B|C|D|E|F|G|H|I|J|]{200}Z)/expand @@ -4975,35 +4975,35 @@ a)"xI // \=ovector=7777777777 - -# This is here because Perl matches, even though a COMMIT is encountered -# outside of the recursion. + +# This is here because Perl matches, even though a COMMIT is encountered +# outside of the recursion. /(?1)(A(*COMMIT)|B)D/ BAXBAD - + "(?1){2}(a)"B "(?1){2,4}(a)"B # This test differs from Perl for the first subject. Perl ends up with -# $1 set to 'B'; PCRE2 has it unset (which I think is right). +# $1 set to 'B'; PCRE2 has it unset (which I think is right). 
/^(?: -(?:A| (?:B|B(*ACCEPT)) (?<=(.)) D) +(?:A| (?:B|B(*ACCEPT)) (?<=(.)) D) (Z) )+$/x AZB - AZBDZ - -# The first of these, when run by Perl, gives the mark 'aa', which is wrong. + AZBDZ + +# The first of these, when run by Perl, gives the mark 'aa', which is wrong. '(?>a(*:aa))b|ac' mark ac '(?:a(*:aa))b|ac' mark ac - + /(R?){65}/ (R?){65} @@ -5023,7 +5023,7 @@ a)"xI /^ (?(DEFINE) (..(*ACCEPT)|...) ) (?1)$/x \= Expect no match abc - + # Perl gives no match for this one /(a(*MARK:m)(*ACCEPT)){0}(?1)/mark @@ -5088,7 +5088,7 @@ a)"xI /^[^a]{3,}?x/i,no_start_optimize,no_auto_possess \= Expect no match bbb - cc + cc /^X\S/no_start_optimize,no_auto_possess \= Expect no match @@ -5145,7 +5145,7 @@ a)"xI /^X\V+?/no_start_optimize,no_auto_possess \= Expect no match X - X\n + X\n /^X\D+?/no_start_optimize,no_auto_possess \= Expect no match @@ -5155,17 +5155,17 @@ a)"xI /^X\S+?/no_start_optimize,no_auto_possess \= Expect no match X - X\n + X\n /^X\W+?/no_start_optimize,no_auto_possess \= Expect no match X - XX + XX /^X.+?Z/no_start_optimize,no_auto_possess \= Expect no match XY\n - + /(*CRLF)^X.+?Z/no_start_optimize,no_auto_possess \= Expect no match XY\r\=ps @@ -5176,20 +5176,20 @@ a)"xI X\n\r\n X\n\rY X\n\nY - X\n\x{0c}Y - + X\n\x{0c}Y + /(*BSR_ANYCRLF)^X\R+?Z/no_start_optimize,no_auto_possess \= Expect no match X\nX X\n\r\n X\n\rY X\n\nY - X\n\x{0c}Y - + X\n\x{0c}Y + /^X\H+?Z/no_start_optimize,no_auto_possess \= Expect no match XY\t - XYY + XYY /^X\h+?Z/no_start_optimize,no_auto_possess \= Expect no match @@ -5199,7 +5199,7 @@ a)"xI /^X\V+?Z/no_start_optimize,no_auto_possess \= Expect no match XY\n - XYY + XYY /^X\v+?Z/no_start_optimize,no_auto_possess \= Expect no match @@ -5209,7 +5209,7 @@ a)"xI /^X\D+?Z/no_start_optimize,no_auto_possess \= Expect no match XY9 - XYY + XYY /^X\d+?Z/no_start_optimize,no_auto_possess \= Expect no match @@ -5219,7 +5219,7 @@ a)"xI /^X\S+?Z/no_start_optimize,no_auto_possess \= Expect no match XY\n - XYY + XYY 
/^X\s+?Z/no_start_optimize,no_auto_possess \= Expect no match @@ -5229,12 +5229,12 @@ a)"xI /^X\W+?Z/no_start_optimize,no_auto_possess \= Expect no match X.A - X++ + X++ /^X\w+?Z/no_start_optimize,no_auto_possess \= Expect no match Xa. - Xaa + Xaa /^X.{1,3}Z/s,no_start_optimize,no_auto_possess \= Expect no match @@ -5248,12 +5248,12 @@ a)"xI /^X\V+Z/no_start_optimize,no_auto_possess \= Expect no match XY\n - XYY + XYY /^(X(*THEN)Y|AB){0}(?1)/ ABX \= Expect no match - XAB + XAB /^(?!A(?C1)B)C/ ABC\=callout_error=1,no_jit @@ -5332,14 +5332,14 @@ a)"xI /cat|dog/match_word the cat sat -\= Expect no match +\= Expect no match caterpillar snowcat syndicate /(cat)|dog/match_line,literal (cat)|dog -\= Expect no match +\= Expect no match the cat sat caterpillar snowcat @@ -5348,6 +5348,21 @@ a)"xI /a whole line/match_line,multiline Rhubarb \na whole line\n custard \= Expect no match - Not a whole line + Not a whole line -# End of testinput2 +# Perl gets this wrong, failing to capture 'b' in group 1. + +/^(b+|a){1,2}?bc/ + bbc + +# And again here, for the "babc" subject string. + +/^(b*|ba){1,2}?bc/ + babc + bbabc + bababc +\= Expect no match + bababbc + babababc + +# End of testinput2 diff --git a/testdata/testoutput1 b/testdata/testoutput1 index 789fe49..85c1aad 100644 --- a/testdata/testoutput1 +++ b/testdata/testoutput1 @@ -183,27 +183,6 @@ No match abbbbbbbbbbbac No match -/^(b+|a){1,2}?bc/ - bbc - 0: bbc - 1: b - -/^(b*|ba){1,2}?bc/ - babc - 0: babc - 1: ba - bbabc - 0: bbabc - 1: ba - bababc - 0: bababc - 1: ba -\= Expect no match - bababbc -No match - babababc -No match - /^(ba|b*){1,2}?bc/ babc 0: babc diff --git a/testdata/testoutput2 b/testdata/testoutput2 index ef71e50..7e138dc 100644 --- a/testdata/testoutput2 +++ b/testdata/testoutput2 @@ -1,12 +1,12 @@ # This set of tests is not Perl-compatible. It checks on special features # of PCRE2's API, error diagnostics, and the compiled code of some patterns. 
-# It also checks the non-Perl syntax that PCRE2 supports (Python, .NET, -# Oniguruma). There are also some tests where PCRE2 and Perl differ, -# either because PCRE2 can't be compatible, or there is a possible Perl +# It also checks the non-Perl syntax that PCRE2 supports (Python, .NET, +# Oniguruma). There are also some tests where PCRE2 and Perl differ, +# either because PCRE2 can't be compatible, or there is a possible Perl # bug. # NOTE: This is a non-UTF set of tests. When UTF support is needed, use -# test 5. +# test 5. #forbid_utf #newline_default lf any anycrlf @@ -823,7 +823,7 @@ Subject length lower bound = 4 No match aaaaaa No match - + # Perl does not fail these two for the final subjects. Neither did PCRE until # release 8.01. The problem is in backtracking into a subpattern that contains # a recursive reference to itself. PCRE has now made these into atomic patterns. @@ -4400,7 +4400,7 @@ Subject length lower bound = 2 Callout data = 1 0: ab 1: ab -\= Expect no match +\= Expect no match aaabbb\=callout_data=-1 --->aaabbb 1 ^ ^ b @@ -4844,7 +4844,7 @@ Subject length lower bound = 2 +2 ^ ^ b +3 ^ ^ 0: aaaab -\= Expect no match +\= Expect no match aaaacb --->aaaacb +0 ^ a+ @@ -4912,7 +4912,7 @@ Subject length lower bound = 4 +10 ^ ^ 0: defx 1: def -\= Expect no match +\= Expect no match abcdefzx --->abcdefzx +0 ^ ( @@ -4986,7 +4986,7 @@ Subject length lower bound = 4 +10 ^ ^ 0: defx 1: def -\= Expect no match +\= Expect no match abcdefzx --->abcdefzx +0 ^ ( @@ -5102,7 +5102,7 @@ Capturing subpattern count = 1 Options: auto_callout Starting code units: a b x Subject length lower bound = 2 -\= Expect no match +\= Expect no match Note: that { does NOT introduce a quantifier --->Note: that { does NOT introduce a quantifier +0 ^ ( @@ -5152,7 +5152,7 @@ Capturing subpattern count = 1 Options: auto_callout Starting code units: a b x Subject length lower bound = 2 -\= Expect no match +\= Expect no match Note: that { does NOT introduce a quantifier --->Note: 
that { does NOT introduce a quantifier +0 ^ ( @@ -5819,7 +5819,7 @@ Subject length lower bound = 2 Number not found for group 'Z' Copy substring 'Z' failed (-49): unknown substring C a1 (2) A (non-unique) - + /(?|(?)(?)(?)|(?)(?)(?))/I,dupnames Capturing subpattern count = 3 Named capturing subpatterns: @@ -6147,7 +6147,7 @@ Subject length lower bound = 3 No match xyz\rabclf No match - + /^abc/Im,newline=cr Capturing subpattern count = 0 Options: multiline @@ -7722,13 +7722,13 @@ No match 0: \x0d\x0afoo \nfoo 0: \x0afoo - + /^$/gm,newline=any abc\r\rxyz 0: - abc\n\rxyz + abc\n\rxyz 0: -\= Expect no match +\= Expect no match abc\r\nxyz No match @@ -7743,7 +7743,7 @@ No match 0+ \x0d\x0a 0: \x0d\x0a 0+ - + /(?m)$/g,newline=any,aftertext abc\r\n\r\n 0: @@ -7763,7 +7763,7 @@ No match /^X/m XABC 0: X -\= Expect no match +\= Expect no match XABC\=notbol No match @@ -7798,9 +7798,9 @@ No match 0: xyabcabc 1: abc \= Expect no match - xyabc + xyabc No match - + /x(?-0)y/ Failed: error 126 at offset 5: a relative value of zero is not allowed @@ -7836,9 +7836,9 @@ Failed: error 115 at offset 5: reference to non-existent subpattern Y 0: Y \= Expect no match - abcY + abcY No match - + /^((?(+1)X|Y)(abc))+/B ------------------------------------------------------------------ Bra @@ -7866,7 +7866,7 @@ No match 1: Xabc 2: abc \= Expect no match - XabcXabc + XabcXabc No match /(?(-1)a)/B @@ -7912,11 +7912,11 @@ Failed: error 115 at offset 6: reference to non-existent subpattern tom-tom 0: tom-tom 1: tom - bon-bon + bon-bon 0: bon-bon 1: bon \= Expect no match - tom-bon + tom-bon No match /\g{A/ @@ -7940,7 +7940,7 @@ Failed: error 142 at offset 4: syntax error in subpattern name (missing terminat >abc< 0: abc 1: abc - >xyz< + >xyz< 0: xyz 1: xyz @@ -7970,7 +7970,7 @@ Failed: error 142 at offset 4: syntax error in subpattern name (missing terminat 1: x 2: abc 3: x - xxyzx + xxyzx 0: xxyzx 1: x 2: xyz @@ -8006,7 +8006,7 @@ Failed: error 142 at offset 4: syntax error in subpattern name 
(missing terminat 2: abc 3: pqr 4: x - xxyzx + xxyzx 0: xxyzx 1: x 2: xyz @@ -8024,7 +8024,7 @@ Failed: error 142 at offset 4: syntax error in subpattern name (missing terminat \= Expect no match XXXX No match - + /\H+\hY/B ------------------------------------------------------------------ Bra @@ -8034,7 +8034,7 @@ No match Ket End ------------------------------------------------------------------ - XXXX Y + XXXX Y 0: XXXX Y /\H+ Y/B @@ -8261,7 +8261,7 @@ Failed: error 106 at offset 10: missing terminating ] for character class +3 ^ ^ (*FAIL) +3 ^^ (*FAIL) No match - + /a+b?c+(*FAIL)/auto_callout \= Expect no match aaabccc @@ -8325,7 +8325,7 @@ No match +15 ^ ^ (*FAIL) +15 ^ ^ (*FAIL) No match - + /a+b?(*SKIP)c+(*FAIL)/auto_callout \= Expect no match aaabcccaaabccc @@ -8372,7 +8372,7 @@ No match +13 ^ ^ (*FAIL) +13 ^ ^ (*FAIL) No match - + /a(*MARK)b/ Failed: error 166 at offset 7: (*MARK) must have an argument @@ -8397,17 +8397,17 @@ Failed: error 115 at offset 3: reference to non-existent subpattern \= Expect no match \r\nA No match - + /\nA/newline=crlf - \r\nA + \r\nA 0: \x0aA /[\r\n]A/newline=crlf - \r\nA + \r\nA 0: \x0aA /(\r|\n)A/newline=crlf - \r\nA + \r\nA 0: \x0aA 1: \x0a @@ -8418,52 +8418,52 @@ Failed: error 160 at offset 5: (*VERB) not recognized or malformed a\nb 0: a\x0ab \= Expect no match - a\rb + a\rb No match /(*CR)a.b/newline=lf a\nb 0: a\x0ab \= Expect no match - a\rb + a\rb No match /(*LF)a.b/newline=CRLF a\rb 0: a\x0db \= Expect no match - a\nb + a\nb No match /(*CRLF)a.b/ a\rb 0: a\x0db - a\nb + a\nb 0: a\x0ab \= Expect no match - a\r\nb + a\r\nb No match /(*ANYCRLF)a.b/newline=CR \= Expect no match a\rb No match - a\nb + a\nb No match - a\r\nb + a\r\nb No match /(*ANY)a.b/newline=cr \= Expect no match a\rb No match - a\nb + a\nb No match - a\r\nb + a\r\nb No match - a\x85b + a\x85b No match - + /(*ANY).*/g abc\r\ndef 0: abc @@ -8484,21 +8484,21 @@ No match 0: 0: def 0: - + /(*NUL)^.*/ a\nb\x00ccc 0: a\x0ab - + /(*NUL)^.*/s a\nb\x00ccc 0: 
a\x0ab\x00ccc - + /^x/m,newline=NUL ab\x00xy 0: x - + /'#comment' 0d 0a 00 '^x\' 0a 'y'/x,newline=nul,hex - x\nyz + x\nyz 0: x\x0ay /(*NUL)^X\NY/ @@ -8507,7 +8507,7 @@ No match X\rY 0: X\x0dY \= Expect no match - X\x00Y + X\x00Y No match /a\Rb/I,bsr=anycrlf @@ -8525,7 +8525,7 @@ Subject length lower bound = 3 \= Expect no match a\x85b No match - a\x0bb + a\x0bb No match /a\Rb/I,bsr=unicode @@ -8542,9 +8542,9 @@ Subject length lower bound = 3 0: a\x0d\x0ab a\x85b 0: a\x85b - a\x0bb + a\x0bb 0: a\x0bb - + /a\R?b/I,bsr=anycrlf Capturing subpattern count = 0 \R matches CR, LF, or CRLF @@ -8560,7 +8560,7 @@ Subject length lower bound = 2 \= Expect no match a\x85b No match - a\x0bb + a\x0bb No match /a\R?b/I,bsr=unicode @@ -8577,9 +8577,9 @@ Subject length lower bound = 2 0: a\x0d\x0ab a\x85b 0: a\x85b - a\x0bb + a\x0bb 0: a\x0bb - + /a\R{2,4}b/I,bsr=anycrlf Capturing subpattern count = 0 \R matches CR, LF, or CRLF @@ -8595,7 +8595,7 @@ Subject length lower bound = 4 \= Expect no match a\x85\x85b No match - a\x0b\x0bb + a\x0b\x0bb No match /a\R{2,4}b/I,bsr=unicode @@ -8612,12 +8612,12 @@ Subject length lower bound = 4 0: a\x0d\x0a\x0a\x0d\x0db a\x85\x85b 0: a\x85\x85b - a\x0b\x0bb + a\x0b\x0bb 0: a\x0b\x0bb -\= Expect no match - a\r\r\r\r\rb +\= Expect no match + a\r\r\r\r\rb No match - + /(*BSR_ANYCRLF)a\Rb/I Capturing subpattern count = 0 \R matches CR, LF, or CRLF @@ -8626,7 +8626,7 @@ Last code unit = 'b' Subject length lower bound = 3 a\nb 0: a\x0ab - a\rb + a\rb 0: a\x0db /(*BSR_UNICODE)a\Rb/I @@ -8647,7 +8647,7 @@ Last code unit = 'b' Subject length lower bound = 3 a\nb 0: a\x0ab - a\rb + a\rb 0: a\x0db /(*CRLF)(*BSR_UNICODE)a\Rb/I @@ -8750,10 +8750,10 @@ Failed: error 157 at offset 8: \g is not followed by a braced, angle-bracketed, /^(?+1)(?x|y){0}z/ xzxx 0: xz - yzyy + yzyy 0: yz \= Expect no match - xxz + xxz No match /(\3)(\1)(a)/ @@ -8767,13 +8767,13 @@ No match 1: 2: 3: a - + /TA]/ - The ACTA] comes + The ACTA] comes 0: TA] 
/TA]/alt_bsux,allow_empty_class,match_unset_backref,dupnames - The ACTA] comes + The ACTA] comes 0: TA] /(?2)[]a()b](abc)/ @@ -8788,7 +8788,7 @@ Failed: error 115 at offset 3: reference to non-existent subpattern abcbabc 0: abcbabc 1: abc -\= Expect no match +\= Expect no match abcXabc No match @@ -8796,7 +8796,7 @@ No match abcXabc 0: abcXabc 1: abc -\= Expect no match +\= Expect no match abcbabc No match @@ -8827,30 +8827,30 @@ No match /a[]+b/alt_bsux,allow_empty_class,match_unset_backref,dupnames \= Expect no match - ab + ab No match /a[]*+b/alt_bsux,allow_empty_class,match_unset_backref,dupnames \= Expect no match - ab + ab No match /a[^]b/alt_bsux,allow_empty_class,match_unset_backref,dupnames aXb 0: aXb - a\nb + a\nb 0: a\x0ab \= Expect no match - ab + ab No match - + /a[^]+b/alt_bsux,allow_empty_class,match_unset_backref,dupnames aXb 0: aXb - a\nX\nXb + a\nX\nXb 0: a\x0aX\x0aXb \= Expect no match - ab + ab No match /a(?!)b/B @@ -8902,7 +8902,7 @@ Subject length lower bound = 0 +12 ^ ) +13 ^ 0: - abc + abc --->abc +0 ^ (? 
+2 ^ (?= @@ -8923,7 +8923,7 @@ Subject length lower bound = 0 +10 ^^ | +13 ^^ 0: b - + /(?(?=b).*b|^d)/I Capturing subpattern count = 0 Subject length lower bound = 1 @@ -8933,28 +8933,28 @@ Capturing subpattern count = 0 Subject length lower bound = 1 /xyz/auto_callout - xyz + xyz --->xyz +0 ^ x +1 ^^ y +2 ^ ^ z +3 ^ ^ 0: xyz - abcxyz + abcxyz --->abcxyz +0 ^ x +1 ^^ y +2 ^ ^ z +3 ^ ^ 0: xyz -\= Expect no match +\= Expect no match abc No match - abcxypqr + abcxypqr No match - + /xyz/auto_callout,no_start_optimize - abcxyz + abcxyz --->abcxyz +0 ^ x +0 ^ x @@ -8964,7 +8964,7 @@ No match +2 ^ ^ z +3 ^ ^ 0: xyz -\= Expect no match +\= Expect no match abc --->abc +0 ^ x @@ -8972,7 +8972,7 @@ No match +0 ^ x +0 ^ x No match - abcxypqr + abcxypqr --->abcxypqr +0 ^ x +0 ^ x @@ -8986,7 +8986,7 @@ No match +0 ^ x +0 ^ x No match - + /(*NO_START_OPT)xyz/auto_callout abcxyz --->abcxyz @@ -8998,7 +8998,7 @@ No match +17 ^ ^ z +18 ^ ^ 0: xyz - + /(*NO_AUTO_POSSESS)a+b/B ------------------------------------------------------------------ Bra @@ -9009,7 +9009,7 @@ No match ------------------------------------------------------------------ /xyz/auto_callout,no_start_optimize - abcxyz + abcxyz --->abcxyz +0 ^ x +0 ^ x @@ -9067,7 +9067,7 @@ Failed: error 115 at offset 5: reference to non-existent subpattern 3: c 4: d 5: Y - + /Xa{2,4}b/ X\=ps Partial match: X @@ -9079,7 +9079,7 @@ Partial match: Xaa Partial match: Xaaa Xaaaa\=ps Partial match: Xaaaa - + /Xa{2,4}?b/ X\=ps Partial match: X @@ -9091,7 +9091,7 @@ Partial match: Xaa Partial match: Xaaa Xaaaa\=ps Partial match: Xaaaa - + /Xa{2,4}+b/ X\=ps Partial match: X @@ -9103,7 +9103,7 @@ Partial match: Xaa Partial match: Xaaa Xaaaa\=ps Partial match: Xaaaa - + /X\d{2,4}b/ X\=ps Partial match: X @@ -9115,7 +9115,7 @@ Partial match: X33 Partial match: X333 X3333\=ps Partial match: X3333 - + /X\d{2,4}?b/ X\=ps Partial match: X @@ -9127,7 +9127,7 @@ Partial match: X33 Partial match: X333 X3333\=ps Partial match: X3333 - + /X\d{2,4}+b/ 
X\=ps Partial match: X @@ -9139,7 +9139,7 @@ Partial match: X33 Partial match: X333 X3333\=ps Partial match: X3333 - + /X\D{2,4}b/ X\=ps Partial match: X @@ -9151,7 +9151,7 @@ Partial match: Xaa Partial match: Xaaa Xaaaa\=ps Partial match: Xaaaa - + /X\D{2,4}?b/ X\=ps Partial match: X @@ -9163,7 +9163,7 @@ Partial match: Xaa Partial match: Xaaa Xaaaa\=ps Partial match: Xaaaa - + /X\D{2,4}+b/ X\=ps Partial match: X @@ -9175,7 +9175,7 @@ Partial match: Xaa Partial match: Xaaa Xaaaa\=ps Partial match: Xaaaa - + /X[abc]{2,4}b/ X\=ps Partial match: X @@ -9187,7 +9187,7 @@ Partial match: Xaa Partial match: Xaaa Xaaaa\=ps Partial match: Xaaaa - + /X[abc]{2,4}?b/ X\=ps Partial match: X @@ -9199,7 +9199,7 @@ Partial match: Xaa Partial match: Xaaa Xaaaa\=ps Partial match: Xaaaa - + /X[abc]{2,4}+b/ X\=ps Partial match: X @@ -9211,7 +9211,7 @@ Partial match: Xaa Partial match: Xaaa Xaaaa\=ps Partial match: Xaaaa - + /X[^a]{2,4}b/ X\=ps Partial match: X @@ -9223,7 +9223,7 @@ Partial match: Xzz Partial match: Xzzz Xzzzz\=ps Partial match: Xzzzz - + /X[^a]{2,4}?b/ X\=ps Partial match: X @@ -9235,7 +9235,7 @@ Partial match: Xzz Partial match: Xzzz Xzzzz\=ps Partial match: Xzzzz - + /X[^a]{2,4}+b/ X\=ps Partial match: X @@ -9247,7 +9247,7 @@ Partial match: Xzz Partial match: Xzzz Xzzzz\=ps Partial match: Xzzzz - + /(Y)X\1{2,4}b/ YX\=ps Partial match: YX @@ -9259,7 +9259,7 @@ Partial match: YXYY Partial match: YXYYY YXYYYY\=ps Partial match: YXYYYY - + /(Y)X\1{2,4}?b/ YX\=ps Partial match: YX @@ -9271,7 +9271,7 @@ Partial match: YXYY Partial match: YXYYY YXYYYY\=ps Partial match: YXYYYY - + /(Y)X\1{2,4}+b/ YX\=ps Partial match: YX @@ -9283,7 +9283,7 @@ Partial match: YXYY Partial match: YXYYY YXYYYY\=ps Partial match: YXYYYY - + /\++\KZ|\d+X|9+Y/startchar ++++123999\=ps Partial match: 123999 @@ -9299,7 +9299,7 @@ Partial match: 123999 No match ZA\=ps No match - + /Z(?!)/ \= Expect no match Z\=ps @@ -9312,7 +9312,7 @@ No match 0: dog dogs\=ph Partial match: dogs - + /dog(sbody)??/ 
dogs\=ps 0: dog @@ -9324,7 +9324,7 @@ Partial match: dogs 0: dog dogs\=ph 0: dog - + /dogsbody|dog/ dogs\=ps 0: dog @@ -9342,7 +9342,7 @@ Partial match: the cat 0: abc abc\=ph 0: abc - + /abc\K123/startchar xyzabc123pqr 0: abc123 @@ -9351,9 +9351,9 @@ Partial match: the cat Partial match: abc12 xyzabc12\=ph Partial match: abc12 - + /(?<=abc)123/ - xyzabc123pqr + xyzabc123pqr 0: 123 xyzabc12\=ps Partial match: abc12 @@ -9458,7 +9458,7 @@ Partial match: +ab No match xyzabcdef\=notempty No match - + /^(?:(?=abc)|abc\K)/aftertext,startchar abcdef 0: @@ -9467,7 +9467,7 @@ No match 0: abc ^^^ 0+ def -\= Expect no match +\= Expect no match abcdef\=notempty No match @@ -9487,7 +9487,7 @@ No match xyz\=notempty_atstart 0: 0+ yz -\= Expect no match +\= Expect no match xyz\=notempty No match @@ -9498,7 +9498,7 @@ No match xyzabc 0: 0+ xyzabc -\= Expect no match +\= Expect no match xyzabc\=notempty No match xyzabc\=notempty_atstart @@ -9507,7 +9507,7 @@ No match No match xyz\=notempty No match - + /^(? 
a|b\g c)/ aaaa 0: a @@ -9515,7 +9515,7 @@ No match bacxxx 0: bac 1: bac - bbaccxxx + bbaccxxx 0: bbacc 1: bbacc bbbacccxx @@ -9529,7 +9529,7 @@ No match bacxxx 0: bac 1: bac - bbaccxxx + bbaccxxx 0: bbacc 1: bbacc bbbacccxx @@ -9543,7 +9543,7 @@ No match bacxxx 0: bac 1: bac - bbaccxxx + bbaccxxx 0: bbacc 1: bbacc bbbacccxx @@ -9557,7 +9557,7 @@ No match bacxxx 0: bac 1: bac - bbaccxxx + bbaccxxx 0: bbacc 1: bbacc bbbacccxx @@ -9571,7 +9571,7 @@ No match bacxxx 0: bac 1: bac - bbaccxxx + bbaccxxx 0: bbacc 1: bbacc bbbacccxx @@ -9587,7 +9587,7 @@ No match 0: bac 1: bac 2: bac - bbaccxxx + bbaccxxx 0: bbacc 1: bbacc 2: bbacc @@ -9600,7 +9600,7 @@ No match XaaX 0: aa 1: a - XAAX + XAAX 0: AA 1: A @@ -9608,15 +9608,15 @@ No match XaaX 0: aa 1: a -\= Expect no match - XAAX +\= Expect no match + XAAX No match /(?-i:\g<+1>)(?i:(a))/ XaaX 0: aa 1: a - XAAX + XAAX 0: AA 1: A @@ -9626,7 +9626,7 @@ No match abc 0: abc 1: a - accccbbb + accccbbb 0: accccbbb 1: a @@ -9645,7 +9645,7 @@ No match xbaax 0: a 1: a - xzzzax + xzzzax 0: a 1: a @@ -9762,7 +9762,7 @@ Subject length lower bound = 9 (?: [0-9a-f]{1,4} | # 1-4 hex digits or (?(1)0 | () ) ) # if null previously matched, fail; else null : # followed by colon - ){1,7} # end item; 1-7 of them required + ){1,7} # end item; 1-7 of them required [0-9a-f]{1,4} $ # final hex number at end of string (?(1)|.) # check that there was an empty component /Iix @@ -9792,7 +9792,7 @@ Subject length lower bound = 1 Failed: error 165 at offset 16: different names for subpatterns of the same number are not allowed /(?:a(? (?')|(? ")) | - b(? (?')|(? ")) ) + b(? (?')|(? ")) ) (?('quote')[a-z]+|[0-9]+)/Ix,dupnames Capturing subpattern count = 6 Max back reference = 4 @@ -9811,7 +9811,7 @@ Subject length lower bound = 3 1: " 2: 3: " - b"aaaaa + b"aaaaa 0: b"aaaaa 1: 2: @@ -9819,12 +9819,12 @@ Subject length lower bound = 3 4: " 5: 6: " -\= Expect no match +\= Expect no match b"11111 No match - a"11111 + a"11111 No match - + /^(?|(a)(b)(c)(? d)|(? 
e)) (?('D')X|Y)/IBx,dupnames ------------------------------------------------------------------ Bra @@ -9877,9 +9877,9 @@ Subject length lower bound = 2 \= Expect no match abcdY No match - ey + ey No match - + /(?a) (b)(c) (?d (?(R&A)$ | (?4)) )/IBx,dupnames ------------------------------------------------------------------ Bra @@ -9920,7 +9920,7 @@ Subject length lower bound = 4 3: c 4: dd \= Expect no match - abcdde + abcdde No match /abcd*/ @@ -9994,7 +9994,7 @@ First code unit = 'i' Subject length lower bound = 1 i 0: i - + /()i(?(1)a)/I Capturing subpattern count = 1 Max back reference = 1 @@ -10018,10 +10018,10 @@ Subject length lower bound = 1 0: ab XAbX 0: Ab - CcC + CcC 0: c \= Expect no match - XABX + XABX No match /(?i)a(?s)b|c/B @@ -10081,7 +10081,7 @@ No match 0: xabcxd 1: abcxd 2: cx - + /^(?&t)*+(?(DEFINE)(? .))$/B ------------------------------------------------------------------ Bra @@ -10122,9 +10122,9 @@ No match # This one is here because Perl gives the match as "b" rather than "ab". I # believe this to be a Perl bug. - + /(?>a\Kb)z|(ab)/ - ab\=startchar + ab\=startchar 0: ab 1: ab @@ -10133,7 +10133,7 @@ No match 0: 1: 2: - 0abc + 0abc 0: 0 1: 0 2: 0 @@ -10147,7 +10147,7 @@ Failed: error 166 at offset 6: (*MARK) must have an argument /abc(*FAIL:123)xyz/ Failed: error 159 at offset 10: an argument is not allowed for (*ACCEPT), (*FAIL), or (*COMMIT) -# This should, and does, fail. In Perl, it does not, which I think is a +# This should, and does, fail. In Perl, it does not, which I think is a # bug because replacing the B in the pattern by (B|D) does make it fail. /A(*COMMIT)B/aftertext,mark @@ -10166,7 +10166,7 @@ No match \= Expect no match AC No match - + # Mark names can be duplicated. Perl doesn't give a mark for this one, # though PCRE2 does. @@ -10174,35 +10174,35 @@ No match \= Expect no match XAQQ No match, mark = A - -# COMMIT at the start of a pattern should be the same as an anchor. 
Perl + +# COMMIT at the start of a pattern should be the same as an anchor. Perl # optimizations defeat this. So does the PCRE2 optimization unless we disable # it. /(*COMMIT)ABC/ ABCDEFG 0: ABC - + /(*COMMIT)ABC/no_start_optimize \= Expect no match DEFGABC No match - + /^(ab (c+(*THEN)cd) | xyz)/x \= Expect no match - abcccd + abcccd No match /^(ab (c+(*PRUNE)cd) | xyz)/x \= Expect no match - abcccd + abcccd No match /^(ab (c+(*FAIL)cd) | xyz)/x \= Expect no match - abcccd + abcccd No match - + # Perl gets some of these wrong /(?>.(*ACCEPT))*?5/ @@ -10239,7 +10239,7 @@ No match \= Expect no match A\nB No match - ACB\n + ACB\n No match /A\NB./Bs @@ -10254,19 +10254,19 @@ No match ------------------------------------------------------------------ ACBD 0: ACBD - ACB\n + ACB\n 0: ACB\x0a \= Expect no match - A\nB + A\nB No match - + /A\NB/newline=crlf A\nB 0: A\x0aB A\rB 0: A\x0dB \= Expect no match - A\r\nB + A\r\nB No match /\R+b/B @@ -10444,7 +10444,7 @@ No match \= Bad offsets abc\=offset=4 Failed: error -33: bad offset value - abc\=offset=-4 + abc\=offset=-4 ** Invalid value in 'offset=-4' \= Valid data abc\=offset=0 @@ -10507,18 +10507,18 @@ Failed: error 115 at offset 12: reference to non-existent subpattern End ------------------------------------------------------------------ -# These tests are here because Perl gets the first one wrong. +# These tests are here because Perl gets the first one wrong. 
/(\R*)(.)/s \r\n 0: \x0d 1: 2: \x0d - \r\r\n\n\r + \r\r\n\n\r 0: \x0d\x0d\x0a\x0a\x0d 1: \x0d\x0d\x0a\x0a 2: \x0d - \r\r\n\n\r\n + \r\r\n\n\r\n 0: \x0d\x0d\x0a\x0a\x0d 1: \x0d\x0d\x0a\x0a 2: \x0d @@ -10528,11 +10528,11 @@ Failed: error 115 at offset 12: reference to non-existent subpattern 0: \x0d 1: 2: \x0d - \r\r\n\n\r + \r\r\n\n\r 0: \x0d\x0d\x0a\x0a\x0d 1: \x0a 2: \x0d - \r\r\n\n\r\n + \r\r\n\n\r\n 0: \x0d\x0d\x0a\x0a\x0d 1: \x0a 2: \x0d @@ -10542,16 +10542,16 @@ Failed: error 115 at offset 12: reference to non-existent subpattern 0: \x0d 1: 2: \x0d - \r\r\n\n\r + \r\r\n\n\r 0: \x0d\x0d\x0a\x0a\x0d 1: \x0d\x0d\x0a\x0a 2: \x0d - \r\r\n\n\r\n + \r\r\n\n\r\n 0: \x0d\x0d\x0a\x0a\x0d 1: \x0d\x0d\x0a\x0a 2: \x0d -# ------------- +# ------------- /^abc$/B ------------------------------------------------------------------ @@ -10578,7 +10578,7 @@ Failed: error 115 at offset 12: reference to non-existent subpattern 0: aaaaX 1: a 2: X -\= Expect no match +\= Expect no match aaaa No match @@ -10586,7 +10586,7 @@ No match aaaaX 0: aaaaX 1: X -\= Expect no match +\= Expect no match aaaa No match @@ -10765,7 +10765,7 @@ Subject length lower bound = 1 /(abc)\1+/ -# Perl doesn't get these right IMO (the 3rd is PCRE2-specific) +# Perl doesn't get these right IMO (the 3rd is PCRE2-specific) /(?1)(?:(b(*ACCEPT))){0}/ b @@ -10774,8 +10774,8 @@ Subject length lower bound = 1 /(?1)(?:(b(*ACCEPT))){0}c/ bc 0: bc -\= Expect no match - b +\= Expect no match + b No match /(?1)(?:((*ACCEPT))){0}c/ @@ -10785,7 +10785,7 @@ No match 0: c /^.*?(?(?=a)a|b(*THEN)c)/ -\= Expect no match +\= Expect no match ba No match @@ -10794,20 +10794,20 @@ No match 0: ba /^.*?(?(?=a)a(*THEN)b|c)/ -\= Expect no match +\= Expect no match ac No match /^.*?(?(?=a)a(*THEN)b)c/ -\= Expect no match +\= Expect no match ac No match /^.*?(a(*THEN)b)c/ -\= Expect no match +\= Expect no match aabc No match - + /^.*? 
(?1) c (?(DEFINE)(a(*THEN)b))/x aabc 0: aabc @@ -10830,12 +10830,12 @@ No match 0: C 1: C MK: A -\= Expect no match +\= Expect no match D No match, mark = A - + /(*:A)A+(*SKIP:A)(B|Z)/mark -\= Expect no match +\= Expect no match AAAC No match, mark = A @@ -10846,17 +10846,17 @@ No match, mark = A 0: c c\=notempty 0: c - + /(?1)c(?(DEFINE)((*ACCEPT)b))/ c 0: c c\=notempty 0: c - + /(?>(*ACCEPT)b)c/ c 0: -\= Expect no match +\= Expect no match c\=notempty No match @@ -10871,7 +10871,7 @@ No match ac\=ovector=1 0: ac 0+ - + /(a)(b)x|abc/allaftertext abc\=ovector=2 0: abc @@ -10944,7 +10944,7 @@ Subject length lower bound = 6 1: 2: 3: baz - foobarbazX + foobarbazX 0: bazX 1: 2: @@ -11260,7 +11260,7 @@ Subject length lower bound = 0 0: aaaazzzzb 1: zzzz \= Expect no match - aazz + aazz No match /(.)(\1|a(?2))/ @@ -11268,15 +11268,15 @@ No match 0: bab 1: b 2: ab - + /\1|(.)(?R)\1/ cbbbc 0: cbbbc 1: c - + /(.)((?(1)c|a)|a(?2))/ \= Expect no match - baa + baa No match /(?P (?P=abn)xxx)/B @@ -11386,7 +11386,7 @@ No match /a[\NB]c/ Failed: error 171 at offset 4: \N is not supported in a class aNc - + /a[B-\Nc]/ Failed: error 150 at offset 6: invalid range in character class @@ -11400,14 +11400,14 @@ Failed: error 171 at offset 5: \N is not supported in a class # This test, with something more complicated than individual letters, causes # different behaviour in Perl. Perhaps it disables some optimization; no tag is # passed back for the failures, whereas in PCRE2 there is a tag. - + /(A|P)(*:A)(B|P) | (X|P)(X|P)(*:B)(Y|P)/x,mark AABC 0: AB 1: A 2: B MK: A - XXYZ + XXYZ 0: XXY 1: 2: @@ -11416,40 +11416,40 @@ MK: A 5: Y MK: B \= Expect no match - XAQQ + XAQQ No match, mark = A - XAQQXZZ + XAQQXZZ No match, mark = A - AXQQQ + AXQQQ No match, mark = A - AXXQQQ + AXXQQQ No match, mark = B # Perl doesn't give marks for these, though it does if the alternatives are -# replaced by single letters. - +# replaced by single letters. 
+ /(b|q)(*:m)f|a(*:n)w/mark - aw + aw 0: aw MK: n -\= Expect no match +\= Expect no match abc No match, mark = m /(q|b)(*:m)f|a(*:n)w/mark - aw + aw 0: aw MK: n -\= Expect no match +\= Expect no match abc No match, mark = m -# After a partial match, the behaviour is as for a failure. +# After a partial match, the behaviour is as for a failure. /^a(*:X)bcde/mark abc\=ps Partial match, mark=X: abc - + # These are here because Perl doesn't return a mark, except for the first. /(?=(*:x))(q|)/aftertext,mark @@ -11579,7 +11579,7 @@ Partial match: ababa abababx 0: abababx 1: ab - ababababx + ababababx 0: ababababx 1: ab @@ -11593,10 +11593,10 @@ Partial match: ababa abababx 0: abababx 1: ab - ababababx + ababababx 0: ababababx 1: ab - + /^(..)(\1{2,3})ab/ abababab 0: abababab @@ -11608,7 +11608,7 @@ Partial match: ababa 0: \x0d \r\=ph Partial match: \x0d - + /^\R{2,3}x/ \r\=ps Partial match: \x0d @@ -11624,7 +11624,7 @@ Partial match: \x0d\x0d\x0d Partial match: \x0d\x0d\x0d \r\rx 0: \x0d\x0dx - \r\r\rx + \r\r\rx 0: \x0d\x0d\x0dx /^\R{2,3}?x/ @@ -11642,9 +11642,9 @@ Partial match: \x0d\x0d\x0d Partial match: \x0d\x0d\x0d \r\rx 0: \x0d\x0dx - \r\r\rx + \r\r\rx 0: \x0d\x0d\x0dx - + /^\R?x/ \r\=ps Partial match: \x0d @@ -11652,7 +11652,7 @@ Partial match: \x0d Partial match: \x0d x 0: x - \rx + \rx 0: \x0dx /^\R+x/ @@ -11664,7 +11664,7 @@ Partial match: \x0d Partial match: \x0d\x0a \r\n\=ph Partial match: \x0d\x0a - \rx + \rx 0: \x0dx /^a$/newline=crlf @@ -11698,7 +11698,7 @@ Partial match: a\x0d 0: \x0d \r\=ph Partial match: \x0d - + /.{2,3}/newline=crlf \r\=ps Partial match: \x0d @@ -11731,9 +11731,9 @@ Partial match: \x0d\x0d ABCDGHI\=ovector=01 Matched, but too many substrings 0: ABCD - + # These are all run as real matches in test 1; here we are just checking the -# settings of the anchored and startline bits. +# settings of the anchored and startline bits. 
/(?>.*?a)(?<=ba)/I Capturing subpattern count = 0 @@ -11845,7 +11845,7 @@ Callout 2: last capture = 0 --->aab ^ ^ b 0: aab - + /(?:(a)++(?C1)bb|aa(?C2)b)/ aab\=callout_capture Callout 1: last capture = 1 @@ -11856,7 +11856,7 @@ Callout 2: last capture = 0 --->aab ^ ^ b 0: aab - + /(?:(?>(a))(?C1)bb|aa(?C2)b)/ aab\=callout_capture Callout 1: last capture = 1 @@ -11926,7 +11926,7 @@ Callout 2: last capture = 0 0: ab ab\=ovector=1 0: ab - + /(?<=123)(*MARK:xx)abc/mark xxxx123a\=ph Partial match, mark=xx: 123a @@ -11934,7 +11934,7 @@ Partial match, mark=xx: 123a xxxx123a\=ps Partial match, mark=xx: 123a <<< - + /123\Kabc/startchar xxxx123a\=ph Partial match: 123a @@ -11972,18 +11972,18 @@ Partial match: 123a /aaaaa(*COMMIT)(*PRUNE)b|a+c/ aaaaaac 0: aaaac - + # Here are some that Perl treats differently because of the way it handles -# backtracking verbs. +# backtracking verbs. /(?!a(*COMMIT)b)ac|ad/ ac 0: ac - ad + ad 0: ad /^(?!a(*THEN)b|ac)../ - ad + ad 0: ad \= Expect no match ac @@ -11992,7 +11992,7 @@ No match /^(?=a(*THEN)b|ac)/ ac 0: - + /\A.*?(?:a|b(*THEN)c)/ ba 0: ba @@ -12006,7 +12006,7 @@ No match 0: ba /(?:(a(*MARK:X)a+(*SKIP:X)b)){0}(?:(?1)|aac)/ - aac + aac 0: aac /\A.*?(a|b(*THEN)c)/ @@ -12015,10 +12015,10 @@ No match 1: a /^(A(*THEN)B|A(*THEN)D)/ - AD + AD 0: AD 1: AD - + /(?!b(*THEN)a)bn|bnn/ bnn 0: bn @@ -12031,7 +12031,7 @@ No match bnn 0: bn -# This test causes a segfault with Perl 5.18.0 +# This test causes a segfault with Perl 5.18.0 /^(?=(a)){0}b(?1)/ backgammon @@ -13162,7 +13162,7 @@ Starting code units: a b c d Last code unit = 'd' Subject length lower bound = 1 -# End of special auto-possessive tests +# End of special auto-possessive tests /^A\o{1239}B/ Failed: error 164 at offset 8: non-octal character in \o{} (closing brace missing?) 
@@ -13170,7 +13170,7 @@ Failed: error 164 at offset 8: non-octal character in \o{} (closing brace missin /^A\oB/ Failed: error 155 at offset 4: missing opening brace after \o - + /^A\x{zz}B/ Failed: error 167 at offset 5: non-hex character in \x{} (closing brace missing?) @@ -13332,18 +13332,18 @@ Failed: error 144 at offset 5: group name must start with a non-digit ------------------------------------------------------------------ little red riding hood 0: red - a /red/ thing + a /red/ thing 0: red red is a colour 0: red - put it all on red + put it all on red 0: red \= Expect no match no reduction No match Alfred Winifred No match - + /[a[:<:]] should give error/ Failed: error 130 at offset 4: unknown POSIX class name @@ -13357,7 +13357,7 @@ Start of matched string is beyond its end - displaying from end to start. \= Expect no match xx\nxabcd No match - + # Test stack guard external calls. /(((a)))/stackguard=1 @@ -13411,9 +13411,9 @@ Failed: error 115 at offset 2: reference to non-existent subpattern /A\9B/ Failed: error 115 at offset 2: reference to non-existent subpattern -# This one is here because Perl fails to match "12" for this pattern when the $ +# This one is here because Perl fails to match "12" for this pattern when the $ # is present. - + /^(?(?=abc)\w{3}:|\d\d)$/ abc: 0: abc: @@ -13422,10 +13422,10 @@ Failed: error 115 at offset 2: reference to non-existent subpattern \= Expect no match 123 No match - xyz + xyz No match -# Perl gets this one wrong, giving "a" as the after text for ca and failing to +# Perl gets this one wrong, giving "a" as the after text for ca and failing to # match for cd. /(?(?=ab)ab)/aftertext @@ -13435,11 +13435,11 @@ No match ca 0: 0+ ca - cd + cd 0: 0+ cd - -# This should test both paths for processing OP_RECURSE. + +# This should test both paths for processing OP_RECURSE. 
/(?(R)a+|(?R)b)/ aaaabcde @@ -13456,14 +13456,14 @@ No match 0: a ba 0: b - cb + cb 0: b /(*NOTEMPTY_ATSTART)a*?b*?/aftertext ab 0: a 0+ b - cdab + cdab 0: 0+ dab @@ -13472,7 +13472,7 @@ Capturing subpattern count = 0 Subject length lower bound = 2 yesno 0: yes - + /(?(VERSION=8)yes){3}/BI,aftertext ------------------------------------------------------------------ Bra @@ -13496,7 +13496,7 @@ Subject length lower bound = 6 yesnononoyes 0: nonono \= Expect no match - yesno + yesno No match /(?:(? abc)|xyz)(?(VERSION)yes|no)/I @@ -13514,7 +13514,7 @@ Subject length lower bound = 5 \= Expect no match abcno No match - xyzyes + xyzyes No match /(?(VERSION<10)yes|no)/ @@ -13548,7 +13548,7 @@ Subject length lower bound = 1 abd 0: abd 1: ab - xyd + xyd 0: d /(|ab)*?d/I,no_start_optimize @@ -13558,7 +13558,7 @@ Subject length lower bound = 0 abd 0: abd 1: ab - xyd + xyd 0: d /\k*(?aa)(?bb)/match_unset_backref,dupnames @@ -13645,7 +13645,7 @@ Failed: error -58 at offset 4 in replacement: expected closing curly bracket in /abc/replace=[9]XYZ 123abc123 Failed: error -48: no more memory - + /abc/replace=xyz 1abc2\=partial_hard Failed: error -34: bad option value @@ -13663,29 +13663,29 @@ Failed: error -34: bad option value /(?<=abc)(|def)/g,replace=<$0> 123abcxyzabcdef789abcpqr 4: 123abc<>xyzabc<> 789abc<>pqr - + /./replace=$0 a 1: a - + /(.)(.)/replace=$2+$1 abc 1: b+ac - + /(?.)(?.)/replace=$B+$A abc 1: b+ac - + /(.)(.)/g,replace=$2$1 - abcdefgh + abcdefgh 4: badcfehg - + /(*:pear)apple|(*:orange)lemon|(*:strawberry)blackberry/g,replace=${*MARK} apple lemon blackberry 3: pear orange strawberry apple strudel 1: pear strudel - fruitless + fruitless 0: fruitless /(*:pear)apple|(*:orange)lemon|(*:strawberry)blackberry/replace=${*MARK} sauce, @@ -13697,10 +13697,10 @@ Failed: error -34: bad option value 3: apple strudel 1: strudel - fruitless + fruitless 0: fruitless - -/(*:pear)apple/g,replace=${*MARKING} + +/(*:pear)apple/g,replace=${*MARKING} apple lemon blackberry Failed: 
error -35 at offset 11 in replacement: invalid replacement string @@ -13708,7 +13708,7 @@ Failed: error -35 at offset 11 in replacement: invalid replacement string apple lemon blackberry Failed: error -58 at offset 7 in replacement: expected closing curly bracket in replacement string -/(*:pear)apple/g,replace=${*mark} +/(*:pear)apple/g,replace=${*mark} apple lemon blackberry Failed: error -35 at offset 8 in replacement: invalid replacement string @@ -13772,13 +13772,13 @@ Get substring 3 failed (-54): requested value is not available Get substring 4 failed (-49): unknown substring 0L c 1L - + /x(?=ab\K)/ - xab\=get=0 + xab\=get=0 Start of matched string is beyond its end - displaying from end to start. 0: ab 0G (0) - xab\=copy=0 + xab\=copy=0 Start of matched string is beyond its end - displaying from end to start. 0: ab 0C (0) @@ -13944,7 +13944,7 @@ Failed: error 109 at offset 7: quantifier does not follow a repeatable item 456 0: 456 \= Expect no match - 356 + 356 No match '^(a)*+(\w)' @@ -13952,7 +13952,7 @@ No match 0: g 1: 2: g - g\=ovector=1 + g\=ovector=1 Matched, but too many substrings 0: g @@ -13960,10 +13960,10 @@ Matched, but too many substrings g 0: g 1: g - g\=ovector=1 + g\=ovector=1 Matched, but too many substrings 0: g - + # These two pattern showeds up compile-time bugs "((?2){0,1999}())?" 
@@ -14135,7 +14135,7 @@ Callout 25 (?= --->abcdefg 25 ^ (?= 0: abcd - xyz123 + xyz123 --->xyz123 25 ^ (?= 0: xyz @@ -14161,7 +14161,7 @@ Callout (7): $abc$ --->abcdefg ^ (?= 0: abcd - xyz123 + xyz123 Callout (7): $abc$ --->xyz123 ^ (?= @@ -14205,8 +14205,8 @@ Callout (5): 'x\x00z' /(?(?!)a|b)/ bbb 0: b -\= Expect no match - aaa +\= Expect no match + aaa No match # JIT gives a different error message for the infinite recursion @@ -14308,9 +14308,9 @@ Subject length lower bound = 0 \= Expect no match \[9x!xxx(]{9999} No match - + /(abc)*/ - \[abc]{5} + \[abc]{5} 0: abcabcabcabcabc 1: abc @@ -14356,7 +14356,7 @@ Failed: error 115 at offset 2: reference to non-existent subpattern /A\8B\9C/ Failed: error 115 at offset 2: reference to non-existent subpattern A8B9C - + /(?x:((?'a')) # comment (with parentheses) and | vertical (?-x:#not a comment (?'b')) # this is a comment () (?'c')) # not a comment (?'d')/info @@ -14384,14 +14384,14 @@ Subject length lower bound = 1 1: 2: 2 3: - B32A + B32A 0: 3 1: 2: 3: 3 # These are some patterns that used to cause buffer overflows or other errors -# while compiling. +# while compiling. 
/.((?2)(?R)|\1|$)()/B ------------------------------------------------------------------ @@ -14623,7 +14623,7 @@ Subject length lower bound = 0 0: {4,5a}bc /\x0{ab}/ - \0{ab} + \0{ab} 0: \x00{ab} /^(a(b))\1\g1\g{1}\g-1\g{-1}\g{-02}Z/ @@ -14683,9 +14683,9 @@ No match aacb No match -/(*MARK:a\zb)z/alt_verbnames +/(*MARK:a\zb)z/alt_verbnames Failed: error 140 at offset 10: invalid escape sequence in (*VERB) name - + /(*:ab\t(d\)c)xxx/ Failed: error 122 at offset 12: unmatched closing parenthesis @@ -14698,38 +14698,38 @@ MK: ab\x09(d)c x 0: x MK: Axx)xB - + /(*:A\ExxxB)x/alt_verbnames,mark - x + x 0: x MK: AxxxB - + /(*: A \ and #comment \ B)x/x,alt_verbnames,mark - x + x 0: x MK: A and B - + /(*: A \ and #comment \ B)x/alt_verbnames,mark - x + x 0: x MK: A and #comment\x0a B - + /(*: A \ and #comment \ B)x/x,mark - x + x 0: x MK: A \ and #comment\x0a \ B - + /(*: A \ and #comment \ B)x/mark - x + x 0: x MK: A \ and #comment\x0a \ B - + /(*:A -B)x/alt_verbnames,mark +B)x/alt_verbnames,mark x 0: x MK: A\x0aB @@ -14758,7 +14758,7 @@ No match \= Expect no match 1234abc\=offset_limit=6 No match - + /A/g,replace=-,use_offset_limit XAXAXAXAXA\=offset_limit=4 2: X-X-XAXAXA @@ -14777,20 +14777,20 @@ No match /abcd/null_context abcd\=null_context 0: abcd -\= Expect error +\= Expect error abcd\=null_context,find_limits ** Not allowed together: find_limits null_context - abcd\=allusedtext,startchar + abcd\=allusedtext,startchar ** Not allowed together: allusedtext startchar /abcd/replace=w\rx\x82y\o{333}z(\Q12\$34$$\x34\E5$$),substitute_extended abcd 1: w\x0dx\x82y\xdbz(12\$34$$\x345$) - + /a(bc)(DE)/replace=a\u$1\U$1\E$1\l$2\L$2\Eab\Uab\LYZ\EDone,substitute_extended abcDE 1: aBcBCbcdEdeabAByzDone - + /abcd/replace=xy\kz,substitute_extended abcd Failed: error -57 at offset 4 in replacement: bad escape sequence in replacement string @@ -14844,9 +14844,9 @@ Failed: error -49 at offset 10 in replacement: unknown substring /(?J)(?:(?a)|(?b))/replace=<$A> [a] 1: [] - [b] + [b] 1: [] 
-\= Expect error +\= Expect error (a)\=ovector=1 Failed: error -54 at offset 3 in replacement: requested value is not available @@ -14890,11 +14890,11 @@ Subject length lower bound = 1 /(?=a\K)/replace=z BaCaD Failed: error -60: match with end before start is not supported - + /(?'abcdefghijklmnopqrstuvwxyzABCDEFG'toolong)/ Failed: error 148 at offset 36: subpattern name is too long (maximum 32 characters) - -/(?'abcdefghijklmnopqrstuvwxyzABCDEF'justright)/ + +/(?'abcdefghijklmnopqrstuvwxyzABCDEF'justright)/ # These two use zero-termination /abcd/max_pattern_length=3 @@ -15246,7 +15246,7 @@ Failed: error 162 at offset 49: subpattern name expected /a|(b)c/replace=>$1<,substitute_unset_empty cat 1: c> b $X<,substitute_unset_empty cat 1: c> b $Y<,substitute_unset_empty cat Failed: error -49 at offset 3 in replacement: unknown substring - cat\=substitute_unknown_unset + cat\=substitute_unknown_unset 1: c> $2<,substitute_unset_empty cat Failed: error -49 at offset 3 in replacement: unknown substring - cat\=substitute_unknown_unset + cat\=substitute_unknown_unset 1: c> [ab])...(?<=\k'A')z/ a11az 0: a11az 1: a - b11bz + b11bz 0: b11bz 1: b \= Expect no match - b11az + b11az No match /(?[ab])...(?<=\k'A')(?)z/dupnames @@ -15439,8 +15439,8 @@ Failed: error 125 at offset 13: lookbehind assertion is not fixed length 1 ^ ^ c +8 ^ ^ 0: abc - -# Perl accepts these, but gives a warning. We can't warn, so give an error. + +# Perl accepts these, but gives a warning. We can't warn, so give an error. /[a-[:digit:]]+/ Failed: error 150 at offset 4: invalid range in character class @@ -15567,7 +15567,7 @@ Contains explicit CR or LF match Subject length lower bound = 1 # This checks that new code for handling groups that may match an empty string -# works on a very large number of alternatives. This pattern used to provoke a +# works on a very large number of alternatives. This pattern used to provoke a # complaint that it was too complicated. 
/(?:\[A|B|C|D|E|F|G|H|I|J|]{200}Z)/expand @@ -15630,14 +15630,14 @@ Subject length lower bound = 11 // \=ovector=7777777777 ** Invalid value in 'ovector=7777777777' - -# This is here because Perl matches, even though a COMMIT is encountered -# outside of the recursion. + +# This is here because Perl matches, even though a COMMIT is encountered +# outside of the recursion. /(?1)(A(*COMMIT)|B)D/ BAXBAD No match - + "(?1){2}(a)"B ------------------------------------------------------------------ Bra @@ -15673,22 +15673,22 @@ No match ------------------------------------------------------------------ # This test differs from Perl for the first subject. Perl ends up with -# $1 set to 'B'; PCRE2 has it unset (which I think is right). +# $1 set to 'B'; PCRE2 has it unset (which I think is right). /^(?: -(?:A| (?:B|B(*ACCEPT)) (?<=(.)) D) +(?:A| (?:B|B(*ACCEPT)) (?<=(.)) D) (Z) )+$/x AZB 0: AZB 1: 2: Z - AZBDZ + AZBDZ 0: AZBDZ 1: B 2: Z - -# The first of these, when run by Perl, gives the mark 'aa', which is wrong. + +# The first of these, when run by Perl, gives the mark 'aa', which is wrong. 
'(?>a(*:aa))b|ac' mark ac @@ -15697,7 +15697,7 @@ No match '(?:a(*:aa))b|ac' mark ac 0: ac - + /(R?){65}/ (R?){65} 0: @@ -15735,7 +15735,7 @@ Callout 1: last capture = 1 \= Expect no match abc No match - + # Perl gives no match for this one /(a(*MARK:m)(*ACCEPT)){0}(?1)/mark @@ -15838,7 +15838,7 @@ No match \= Expect no match bbb No match - cc + cc No match /^X\S/no_start_optimize,no_auto_possess @@ -15910,7 +15910,7 @@ No match \= Expect no match X No match - X\n + X\n No match /^X\D+?/no_start_optimize,no_auto_possess @@ -15924,21 +15924,21 @@ No match \= Expect no match X No match - X\n + X\n No match /^X\W+?/no_start_optimize,no_auto_possess \= Expect no match X No match - XX + XX No match /^X.+?Z/no_start_optimize,no_auto_possess \= Expect no match XY\n No match - + /(*CRLF)^X.+?Z/no_start_optimize,no_auto_possess \= Expect no match XY\r\=ps @@ -15954,9 +15954,9 @@ No match No match X\n\nY No match - X\n\x{0c}Y + X\n\x{0c}Y No match - + /(*BSR_ANYCRLF)^X\R+?Z/no_start_optimize,no_auto_possess \= Expect no match X\nX @@ -15967,14 +15967,14 @@ No match No match X\n\nY No match - X\n\x{0c}Y + X\n\x{0c}Y No match - + /^X\H+?Z/no_start_optimize,no_auto_possess \= Expect no match XY\t No match - XYY + XYY No match /^X\h+?Z/no_start_optimize,no_auto_possess @@ -15988,7 +15988,7 @@ No match \= Expect no match XY\n No match - XYY + XYY No match /^X\v+?Z/no_start_optimize,no_auto_possess @@ -16002,7 +16002,7 @@ No match \= Expect no match XY9 No match - XYY + XYY No match /^X\d+?Z/no_start_optimize,no_auto_possess @@ -16016,7 +16016,7 @@ No match \= Expect no match XY\n No match - XYY + XYY No match /^X\s+?Z/no_start_optimize,no_auto_possess @@ -16030,14 +16030,14 @@ No match \= Expect no match X.A No match - X++ + X++ No match /^X\w+?Z/no_start_optimize,no_auto_possess \= Expect no match Xa. 
No match - Xaa + Xaa No match /^X.{1,3}Z/s,no_start_optimize,no_auto_possess @@ -16056,14 +16056,14 @@ No match \= Expect no match XY\n No match - XYY + XYY No match /^(X(*THEN)Y|AB){0}(?1)/ ABX 0: AB \= Expect no match - XAB + XAB No match /^(?!A(?C1)B)C/ @@ -16272,7 +16272,7 @@ Failed: error 192 at offset 0: invalid option bits with PCRE2_LITERAL /cat|dog/match_word the cat sat 0: cat -\= Expect no match +\= Expect no match caterpillar No match snowcat @@ -16283,7 +16283,7 @@ No match /(cat)|dog/match_line,literal (cat)|dog 0: (cat)|dog -\= Expect no match +\= Expect no match the cat sat No match caterpillar @@ -16297,10 +16297,35 @@ No match Rhubarb \na whole line\n custard 0: a whole line \= Expect no match - Not a whole line + Not a whole line No match -# End of testinput2 +# Perl gets this wrong, failing to capture 'b' in group 1. + +/^(b+|a){1,2}?bc/ + bbc + 0: bbc + 1: b + +# And again here, for the "babc" subject string. + +/^(b*|ba){1,2}?bc/ + babc + 0: babc + 1: ba + bbabc + 0: bbabc + 1: ba + bababc + 0: bababc + 1: ba +\= Expect no match + bababbc +No match + babababc +No match + +# End of testinput2 Error -65: PCRE2_ERROR_BADDATA (unknown error number) Error -62: bad serialized data Error -2: partial match diff --git a/testdata/testoutput8-32-3 b/testdata/testoutput8-32-3 index 83e3086..30667a3 100644 --- a/testdata/testoutput8-32-3 +++ b/testdata/testoutput8-32-3 @@ -853,10 +853,8 @@ Memory allocation (code space): 28 # with link size - hence multiple tests with different values. 
 /(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000
-Failed: error 186 at offset 5813: regular expression is too complicated
 
 /(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000
-Failed: error 186 at offset 5820: regular expression is too complicated
 
 /(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000
 Failed: error 186 at offset 12820: regular expression is too complicated
diff --git a/testdata/testoutput8-32-4 b/testdata/testoutput8-32-4
index 83e3086..30667a3 100644
--- a/testdata/testoutput8-32-4
+++ b/testdata/testoutput8-32-4
@@ -853,10 +853,8 @@ Memory allocation (code space): 28
 # with link size - hence multiple tests with different values.
 
 /(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000
-Failed: error 186 at offset 5813: regular expression is too complicated
 
 /(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000
-Failed: error 186 at offset 5820: regular expression is too complicated
 
 /(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000
 Failed: error 186 at offset 12820: regular expression is too complicated