diff --git a/doc/pcre2api.3 b/doc/pcre2api.3 index a3fd0a8..bd1108d 100644 --- a/doc/pcre2api.3 +++ b/doc/pcre2api.3 @@ -1,4 +1,4 @@ -.TH PCRE2API 3 "13 December 2014" "PCRE2 10.00" +.TH PCRE2API 3 "14 December 2014" "PCRE2 10.00" .SH NAME PCRE2 - Perl-compatible regular expressions (revised API) .sp @@ -921,6 +921,16 @@ PCRE2_ZERO_TERMINATED. The function returns a pointer to a block of memory that contains the compiled pattern and related data. The caller must free the memory by calling \fBpcre2_code_free()\fP when it is no longer needed. .P +NOTE: When one of the matching functions is called, pointers to the compiled +pattern and the subject string are set in the match data block so that they can +be referenced by the extraction functions. After running a match, you must not +free a compiled pattern (or a subject string) until after all operations on the +.\" HTML +.\" +match data block +.\" +have taken place. +.P If the compile context argument \fIccontext\fP is NULL, memory for the compiled pattern is obtained by calling \fBmalloc()\fP. Otherwise, it is obtained from the same memory function that was used for the compile context. @@ -1683,7 +1693,7 @@ pattern with the JIT compiler does not alter the value returned by this option. .B void pcre2_match_data_free(pcre2_match_data *\fImatch_data\fP); .fi .P -Information about successful and unsuccessful matches is placed in a match +Information about a successful or unsuccessful match is placed in a match data block, which is an opaque structure that is accessed by function calls. In particular, the match data block contains a vector of offsets into the subject string that define the matched part of the subject and any substrings that were @@ -1713,10 +1723,8 @@ memory is obtained using the same allocator that was used for the compiled pattern (custom or default). .P A match data block can be used many times, with the same or different compiled -patterns. 
When it is no longer needed, it should be freed by calling -\fBpcre2_match_data_free()\fP. You can extract information from a match data -block after a match operation has finished, using functions that are described -in the sections on +patterns. You can extract information from a match data block after a match +operation has finished, using functions that are described in the sections on .\" HTML .\" matched strings @@ -1727,6 +1735,15 @@ and other match data .\" below. +.P +When one of the matching functions is called, pointers to the compiled pattern +and the subject string are set in the match data block so that they can be +referenced by the extraction functions. After running a match, you must not +free a compiled pattern or a subject string until after all operations on the +match data block (for that match) have taken place. +.P +When a match data block itself is no longer needed, it should be freed by +calling \fBpcre2_match_data_free()\fP. . . .SH "MATCHING A PATTERN: THE TRADITIONAL FUNCTION" @@ -2053,8 +2070,13 @@ returned value is 3. If there are no capturing subpatterns, the return value from a successful match is 1, indicating that just the first pair of offsets has been set. .P -If a capturing subpattern is matched repeatedly within a single match -operation, it is the last portion of the string that it matched that is +If a pattern uses the \eK escape sequence within a positive assertion, the +reported start of the match can be greater than the end of the match. For +example, if the pattern (?=ab\eK) is matched against "ab", the start and end +offset values for the match are 2 and 0. +.P +If a capturing subpattern group is matched repeatedly within a single match +operation, it is the last portion of the subject that it matched that is returned. .P If the ovector is too small to hold all the captured substring offsets, as much @@ -2268,23 +2290,31 @@ above. 
.\" For convenience, auxiliary functions are provided for extracting captured substrings as new, separate, zero-terminated strings. The functions in this -section identify substrings by number. The next section describes similar -functions for extracting substrings by name. A substring that contains a binary -zero is correctly extracted and has a further zero added on the end, but the -result is not, of course, a C string. +section identify substrings by number. The number zero refers to the entire +matched substring, with higher numbers referring to substrings captured by +parenthesized groups. The next section describes similar functions for +extracting captured substrings by name. A substring that contains a binary zero +is correctly extracted and has a further zero added on the end, but the result +is not, of course, a C string. +.P +If a pattern uses the \eK escape sequence within a positive assertion, the +reported start of the match can be greater than the end of the match. For +example, if the pattern (?=ab\eK) is matched against "ab", the start and end +offset values for the match are 2 and 0. In this situation, calling these +functions with a zero substring number extracts a zero-length empty string. .P You can find the length in code units of a captured substring without extracting it by calling \fBpcre2_substring_length_bynumber()\fP. The first argument is a pointer to the match data block, the second is the group number, -and the third is a pointer to a variable into which the length is placed. +and the third is a pointer to a variable into which the length is placed. If +you just want to know whether or not the substring has been captured, you can +pass the third argument as NULL. .P -The \fBpcre2_substring_copy_bynumber()\fP function copies one string into a -supplied buffer, whereas \fBpcre2_substring_get_bynumber()\fP copies it into -new memory, obtained using the same memory allocation function that was used -for the match data block. 
The first two arguments of these functions are a -pointer to the match data block and a capturing group number. A group number of -zero extracts the substring that matched the entire pattern, and higher values -extract the captured substrings. +The \fBpcre2_substring_copy_bynumber()\fP function copies a captured substring +into a supplied buffer, whereas \fBpcre2_substring_get_bynumber()\fP copies it +into new memory, obtained using the same memory allocation function that was +used for the match data block. The first two arguments of these functions are a +pointer to the match data block and a capturing group number. .P The final arguments of \fBpcre2_substring_copy_bynumber()\fP are a pointer to the buffer and a pointer to a variable that contains its length in code units. @@ -2297,8 +2327,9 @@ of code units that comprise the substring, again excluding the terminating zero. When the substring is no longer needed, the memory should be freed by calling \fBpcre2_substring_free()\fP. .P -The return value from these functions is zero for success, or one of these -error codes: +The return value from all these functions is zero for success, or a negative +error code. If the pattern match failed, the match failure code is returned. +Other possible error codes are: .sp PCRE2_ERROR_NOMEMORY .sp @@ -2319,7 +2350,8 @@ could not be captured. PCRE2_ERROR_UNSET .sp The substring did not participate in the match. For example, if the pattern is -(abc)|(def) and the subject is "def", substring number 1 is unset. +(abc)|(def) and the subject is "def", and the ovector contains at least two +capturing slots, substring number 1 is unset. . . .SH "EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS" @@ -2388,15 +2420,20 @@ calling \fBpcre2_substring_number_from_name()\fP. The first argument is the compiled pattern, and the second is the name. 
The yield of the function is the subpattern number, PCRE2_ERROR_NOSUBSTRING if there is no subpattern of that name, or PCRE2_ERROR_NOUNIQUESUBSTRING if there is more than one subpattern of -that name. +that name. Given the number, you can extract the substring directly, or use one +of the functions described above. .P -Given the number, you can extract the substring directly, or use one of the -functions described above. For convenience, there are also "byname" functions -that correspond to the "bynumber" functions, the only difference being that the -second argument is a name instead of a number. If PCRE2_DUPNAMES is -set and there are duplicate names, these functions return the first named -string that is set. PCRE2_ERROR_UNSET is returned only if all groups of the -same name are unset. +For convenience, there are also "byname" functions that correspond to the +"bynumber" functions, the only difference being that the second argument is a +name instead of a number. If PCRE2_DUPNAMES is set and there are duplicate +names, these functions scan all the groups with the given name, and return the +first named string that is set. +.P +If there are no groups with the given name, PCRE2_ERROR_NOSUBSTRING is +returned. If all groups with the name have numbers that are greater than the +number of slots in the ovector, PCRE2_ERROR_UNAVAILABLE is returned. If there +is at least one group with a slot in the ovector, but no group is found to be +set, PCRE2_ERROR_UNSET is returned. .P \fBWarning:\fP If the pattern uses the (?| feature to set up multiple subpatterns with the same number, as described in the @@ -2660,17 +2697,36 @@ is matched against the string .sp the three matched strings are .sp - - + + .sp On success, the yield of the function is a number greater than zero, which is the number of matched substrings. The offsets of the substrings are returned in -the ovector, and can be extracted in the same way as for \fBpcre2_match()\fP. 
-They are returned in reverse order of length; that is, the longest -matching string is given first. If there were too many matches to fit into -the ovector, the yield of the function is zero, and the vector is filled with -the longest matches. +the ovector, and can be extracted by number in the same way as for +\fBpcre2_match()\fP, but the numbers bear no relation to any capturing groups +that may exist in the pattern, because DFA matching does not support group +capture. +.P +Calls to the convenience functions that extract substrings by name +return the error PCRE2_ERROR_DFA_UFUNC (unsupported function) if used after a +DFA match. The convenience functions that extract substrings by number never +return PCRE2_ERROR_NOSUBSTRING, and the meanings of some other errors are +slightly different: +.sp + PCRE2_ERROR_UNAVAILABLE +.sp +The ovector is not big enough to include a slot for the given substring number. +.sp + PCRE2_ERROR_UNSET +.sp +There is a slot in the ovector for this substring, but there were insufficient +matches to fill it. +.P +The matched strings are stored in the ovector in reverse order of length; that +is, the longest matching string is first. If there were too many matches to fit +into the ovector, the yield of the function is zero, and the vector is filled +with the longest matches. .P NOTE: PCRE2's "auto-possessification" optimization usually applies to character repeats at the end of a pattern (as well as internally). For example, the @@ -2746,6 +2802,6 @@ Cambridge, England. .rs .sp .nf -Last updated: 13 December 2014 +Last updated: 14 December 2014 Copyright (c) 1997-2014 University of Cambridge. .fi diff --git a/src/pcre2.h.in b/src/pcre2.h.in index 866ac52..00c7460 100644 --- a/src/pcre2.h.in +++ b/src/pcre2.h.in @@ -212,20 +212,21 @@ context functions. 
*/ #define PCRE2_ERROR_DFA_BADRESTART (-38) #define PCRE2_ERROR_DFA_RECURSE (-39) #define PCRE2_ERROR_DFA_UCOND (-40) -#define PCRE2_ERROR_DFA_UITEM (-41) -#define PCRE2_ERROR_DFA_WSSIZE (-42) -#define PCRE2_ERROR_INTERNAL (-43) -#define PCRE2_ERROR_JIT_BADOPTION (-44) -#define PCRE2_ERROR_JIT_STACKLIMIT (-45) -#define PCRE2_ERROR_MATCHLIMIT (-46) -#define PCRE2_ERROR_NOMEMORY (-47) -#define PCRE2_ERROR_NOSUBSTRING (-48) -#define PCRE2_ERROR_NOUNIQUESUBSTRING (-49) -#define PCRE2_ERROR_NULL (-50) -#define PCRE2_ERROR_RECURSELOOP (-51) -#define PCRE2_ERROR_RECURSIONLIMIT (-52) -#define PCRE2_ERROR_UNAVAILABLE (-53) -#define PCRE2_ERROR_UNSET (-54) +#define PCRE2_ERROR_DFA_UFUNC (-41) +#define PCRE2_ERROR_DFA_UITEM (-42) +#define PCRE2_ERROR_DFA_WSSIZE (-43) +#define PCRE2_ERROR_INTERNAL (-44) +#define PCRE2_ERROR_JIT_BADOPTION (-45) +#define PCRE2_ERROR_JIT_STACKLIMIT (-46) +#define PCRE2_ERROR_MATCHLIMIT (-47) +#define PCRE2_ERROR_NOMEMORY (-48) +#define PCRE2_ERROR_NOSUBSTRING (-49) +#define PCRE2_ERROR_NOUNIQUESUBSTRING (-50) +#define PCRE2_ERROR_NULL (-51) +#define PCRE2_ERROR_RECURSELOOP (-52) +#define PCRE2_ERROR_RECURSIONLIMIT (-53) +#define PCRE2_ERROR_UNAVAILABLE (-54) +#define PCRE2_ERROR_UNSET (-55) /* Request types for pcre2_pattern_info() */ diff --git a/src/pcre2_dfa_match.c b/src/pcre2_dfa_match.c index 8b82115..ca57df9 100644 --- a/src/pcre2_dfa_match.c +++ b/src/pcre2_dfa_match.c @@ -3275,6 +3275,12 @@ if ((re->flags & PCRE2_LASTSET) != 0) } } +/* Fill in fields that are always returned in the match data. */ + +match_data->code = re; +match_data->subject = subject; +match_data->mark = NULL; +match_data->matchedby = PCRE2_MATCHEDBY_DFA_INTERPRETER; /* Call the main matching function, looping for a non-anchored regex after a failed match. 
If not restarting, perform certain optimizations at the start of diff --git a/src/pcre2_error.c b/src/pcre2_error.c index e0c6db8..ce72fda 100644 --- a/src/pcre2_error.c +++ b/src/pcre2_error.c @@ -212,18 +212,19 @@ static const char match_error_texts[] = "invalid data in workspace for DFA restart\0" "too much recursion for DFA matching\0" /* 40 */ - "backreference condition or recursion test not supported for DFA matching\0" - "item unsupported for DFA matching\0" + "backreference condition or recursion test is not supported for DFA matching\0" + "function is not supported for DFA matching\0" + "pattern contains an item that is not supported for DFA matching\0" "workspace size exceeded in DFA matching\0" "internal error - pattern overwritten?\0" - "bad JIT option\0" /* 45 */ + "bad JIT option\0" "JIT stack limit reached\0" "match limit exceeded\0" "no more memory\0" "unknown substring\0" - "non-unique substring name\0" /* 50 */ + "non-unique substring name\0" "NULL argument passed\0" "nested recursion at the same subject position\0" "recursion limit exceeded\0" diff --git a/src/pcre2_internal.h b/src/pcre2_internal.h index e174195..c8696bf 100644 --- a/src/pcre2_internal.h +++ b/src/pcre2_internal.h @@ -526,15 +526,16 @@ bytes in a code unit in that mode. */ #define PCRE2_MODE_MASK (PCRE2_MODE8 | PCRE2_MODE16 | PCRE2_MODE32) +/* Values for the matchedby field in a match data block. */ + +enum { PCRE2_MATCHEDBY_INTERPRETER, /* pcre2_match() */ + PCRE2_MATCHEDBY_DFA_INTERPRETER, /* pcre2_dfa_match() */ + PCRE2_MATCHEDBY_JIT }; /* pcre2_jit_match() */ + /* Magic number to provide a small check against being handed junk. */ #define MAGIC_NUMBER 0x50435245UL /* 'PCRE' */ -/* This value is used to detect a loaded regular expression in different -endianness. */ - -#define REVERSED_MAGIC_NUMBER 0x45524350UL /* 'ERCP' */ - /* The maximum remaining length of subject we are prepared to search for a req_unit match. 
*/ diff --git a/src/pcre2_intmodedep.h b/src/pcre2_intmodedep.h index bb8e6fc..eb713cb 100644 --- a/src/pcre2_intmodedep.h +++ b/src/pcre2_intmodedep.h @@ -616,12 +616,13 @@ typedef struct pcre2_real_match_data { pcre2_memctl memctl; const pcre2_real_code *code; /* The pattern used for the match */ PCRE2_SPTR subject; /* The subject that was matched */ - int rc; /* The return code from the match */ + PCRE2_SPTR mark; /* Pointer to last mark */ PCRE2_SIZE leftchar; /* Offset to leftmost code unit */ PCRE2_SIZE rightchar; /* Offset to rightmost code unit */ PCRE2_SIZE startchar; /* Offset to starting code unit */ - PCRE2_SPTR mark; /* Pointer to last mark */ + uint16_t matchedby; /* Type of match (normal, JIT, DFA) */ uint16_t oveccount; /* Number of pairs */ + int rc; /* The return code from the match */ PCRE2_SIZE ovector[1]; /* The first field */ } pcre2_real_match_data; diff --git a/src/pcre2_jit_match.c b/src/pcre2_jit_match.c index 89c06af..40a599a 100644 --- a/src/pcre2_jit_match.c +++ b/src/pcre2_jit_match.c @@ -180,6 +180,7 @@ match_data->startchar = arguments.startchar_ptr - subject; match_data->leftchar = 0; match_data->rightchar = 0; match_data->mark = arguments.mark_ptr; +match_data->matchedby = PCRE2_MATCHEDBY_JIT; return match_data->rc; diff --git a/src/pcre2_match.c b/src/pcre2_match.c index 438fa0a..23431aa 100644 --- a/src/pcre2_match.c +++ b/src/pcre2_match.c @@ -6995,6 +6995,7 @@ while (mb->ovecsave_chain != NULL) match_data->code = re; match_data->subject = subject; match_data->mark = mb->mark; +match_data->matchedby = PCRE2_MATCHEDBY_INTERPRETER; /* Handle a fully successful match. */ @@ -7026,14 +7027,15 @@ if (rc == MATCH_MATCH || rc == MATCH_ACCEPT) match_data->rc = ((mb->capture_last & OVFLBIT) != 0)? 0 : mb->end_offset_top/2; - /* If there is space in the offset vector, set any unused pairs at the end to - PCRE2_UNSET for backwards compatibility. It is documented that this happens. 
- In earlier versions, the whole set of potential capturing offsets was - initialized each time round the loop, but this is handled differently now. - "Gaps" are set to PCRE2_UNSET dynamically instead (this fixes a bug). Thus, - it is only those at the end that need setting here. We can't just set them - all at the start of the whole thing because they may get set in one branch - that is not the final matching branch. */ + /* If there is space in the offset vector, set any pairs that follow the + highest-numbered captured string but are less than the number of capturing + groups in the pattern (and are within the ovector) to PCRE2_UNSET. It is + documented that this happens. In earlier versions, the whole set of potential + capturing offsets was initialized each time round the loop, but this is + handled differently now. "Gaps" are set to PCRE2_UNSET dynamically instead + (this fixed a bug). Thus, it is only those at the end that need setting here. + We can't just mark them all unset at the start of the whole thing because + they may get set in one branch that is not the final matching branch. 
*/ if (mb->end_offset_top/2 <= re->top_bracket) { diff --git a/src/pcre2_substring.c b/src/pcre2_substring.c index f5c56a3..5299def 100644 --- a/src/pcre2_substring.c +++ b/src/pcre2_substring.c @@ -64,27 +64,34 @@ Arguments: Returns: if successful: zero if not successful, a negative error code: (1) an error from nametable_scan() - (2) an error from copy_bynumber() - (3) PCRE2_ERROR_UNSET: all named groups are unset + (2) an error from copy_bynumber() + (3) PCRE2_ERROR_UNAVAILABLE: no group is in ovector + (4) PCRE2_ERROR_UNSET: all named groups in ovector are unset */ PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION pcre2_substring_copy_byname(pcre2_match_data *match_data, PCRE2_SPTR stringname, PCRE2_UCHAR *buffer, PCRE2_SIZE *sizeptr) { -PCRE2_SPTR first; -PCRE2_SPTR last; -PCRE2_SPTR entry; -int entrysize = pcre2_substring_nametable_scan(match_data->code, stringname, +PCRE2_SPTR first, last, entry; +int failrc, entrysize; +if (match_data->matchedby == PCRE2_MATCHEDBY_DFA_INTERPRETER) + return PCRE2_ERROR_DFA_UFUNC; +entrysize = pcre2_substring_nametable_scan(match_data->code, stringname, &first, &last); if (entrysize < 0) return entrysize; +failrc = PCRE2_ERROR_UNAVAILABLE; for (entry = first; entry <= last; entry += entrysize) { uint32_t n = GET2(entry, 0); - if (n < match_data->oveccount && match_data->ovector[n*2] != PCRE2_UNSET) - return pcre2_substring_copy_bynumber(match_data, n, buffer, sizeptr); + if (n < match_data->oveccount) + { + if (match_data->ovector[n*2] != PCRE2_UNSET) + return pcre2_substring_copy_bynumber(match_data, n, buffer, sizeptr); + failrc = PCRE2_ERROR_UNSET; + } } -return PCRE2_ERROR_UNSET; +return failrc; } @@ -146,26 +153,33 @@ Returns: if successful: zero if not successful, a negative value: (1) an error from nametable_scan() (2) an error from get_bynumber() - (3) PCRE2_ERROR_UNSET: all named groups are unset + (3) PCRE2_ERROR_UNAVAILABLE: no group is in ovector + (4) PCRE2_ERROR_UNSET: all named groups in ovector are unset */ 
PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION pcre2_substring_get_byname(pcre2_match_data *match_data, PCRE2_SPTR stringname, PCRE2_UCHAR **stringptr, PCRE2_SIZE *sizeptr) { -PCRE2_SPTR first; -PCRE2_SPTR last; -PCRE2_SPTR entry; -int entrysize = pcre2_substring_nametable_scan(match_data->code, stringname, +PCRE2_SPTR first, last, entry; +int failrc, entrysize; +if (match_data->matchedby == PCRE2_MATCHEDBY_DFA_INTERPRETER) + return PCRE2_ERROR_DFA_UFUNC; +entrysize = pcre2_substring_nametable_scan(match_data->code, stringname, &first, &last); if (entrysize < 0) return entrysize; +failrc = PCRE2_ERROR_UNAVAILABLE; for (entry = first; entry <= last; entry += entrysize) { uint32_t n = GET2(entry, 0); - if (n < match_data->oveccount && match_data->ovector[n*2] != PCRE2_UNSET) - return pcre2_substring_get_bynumber(match_data, n, stringptr, sizeptr); + if (n < match_data->oveccount) + { + if (match_data->ovector[n*2] != PCRE2_UNSET) + return pcre2_substring_get_bynumber(match_data, n, stringptr, sizeptr); + failrc = PCRE2_ERROR_UNSET; + } } -return PCRE2_ERROR_UNSET; +return failrc; } @@ -251,19 +265,25 @@ PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION pcre2_substring_length_byname(pcre2_match_data *match_data, PCRE2_SPTR stringname, PCRE2_SIZE *sizeptr) { -PCRE2_SPTR first; -PCRE2_SPTR last; -PCRE2_SPTR entry; -int entrysize = pcre2_substring_nametable_scan(match_data->code, stringname, +PCRE2_SPTR first, last, entry; +int failrc, entrysize; +if (match_data->matchedby == PCRE2_MATCHEDBY_DFA_INTERPRETER) + return PCRE2_ERROR_DFA_UFUNC; +entrysize = pcre2_substring_nametable_scan(match_data->code, stringname, &first, &last); if (entrysize < 0) return entrysize; +failrc = PCRE2_ERROR_UNAVAILABLE; for (entry = first; entry <= last; entry += entrysize) { uint32_t n = GET2(entry, 0); - if (n < match_data->oveccount && match_data->ovector[n*2] != PCRE2_UNSET) - return pcre2_substring_length_bynumber(match_data, n, sizeptr); + if (n < match_data->oveccount) + { + if 
(match_data->ovector[n*2] != PCRE2_UNSET) + return pcre2_substring_length_bynumber(match_data, n, sizeptr); + failrc = PCRE2_ERROR_UNSET; + } } -return PCRE2_ERROR_UNSET; +return failrc; } @@ -292,13 +312,23 @@ PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION pcre2_substring_length_bynumber(pcre2_match_data *match_data, uint32_t stringnumber, PCRE2_SIZE *sizeptr) { +int count; PCRE2_SIZE left, right; -if (stringnumber > match_data->code->top_bracket) - return PCRE2_ERROR_NOSUBSTRING; -if (stringnumber >= match_data->oveccount) - return PCRE2_ERROR_UNAVAILABLE; -if (match_data->ovector[stringnumber*2] == PCRE2_UNSET) - return PCRE2_ERROR_UNSET; +if ((count = match_data->rc) < 0) return count; /* Match failed */ +if (match_data->matchedby != PCRE2_MATCHEDBY_DFA_INTERPRETER) + { + if (stringnumber > match_data->code->top_bracket) + return PCRE2_ERROR_NOSUBSTRING; + if (stringnumber >= match_data->oveccount) + return PCRE2_ERROR_UNAVAILABLE; + if (match_data->ovector[stringnumber*2] == PCRE2_UNSET) + return PCRE2_ERROR_UNSET; + } +else /* Matched using pcre2_dfa_match() */ + { + if (stringnumber >= match_data->oveccount) return PCRE2_ERROR_UNAVAILABLE; + if (count != 0 && stringnumber >= (uint32_t)count) return PCRE2_ERROR_UNSET; + } left = match_data->ovector[stringnumber*2]; right = match_data->ovector[stringnumber*2+1]; if (sizeptr != NULL) *sizeptr = (left > right)? 
0 : right - left; diff --git a/testdata/grepoutput b/testdata/grepoutput index 6f84141..7ba1320 100644 --- a/testdata/grepoutput +++ b/testdata/grepoutput @@ -384,15 +384,15 @@ aaaaa2 010203040506 RC=0 ======== STDERR ======== -pcre2grep: pcre2_match() gave error -46 while matching this text: +pcre2grep: pcre2_match() gave error -47 while matching this text: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa -pcre2grep: pcre2_match() gave error -46 while matching this text: +pcre2grep: pcre2_match() gave error -47 while matching this text: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa -pcre2grep: Error -45, -46 or -52 means that a resource limit was exceeded. +pcre2grep: Error -46, -47 or -53 means that a resource limit was exceeded. pcre2grep: Check your regex for nested unlimited loops. ---------------------------- Test 38 ------------------------------ This line contains a binary zero here >< for testing. @@ -510,23 +510,23 @@ In the middle of a line, PATTERN appears. Check up on PATTERN near the end. RC=0 ---------------------------- Test 62 ----------------------------- -pcre2grep: pcre2_match() gave error -46 while matching text that starts: +pcre2grep: pcre2_match() gave error -47 while matching text that starts: This is a file of miscellaneous text that is used as test data for checking that the pcregrep command is working correctly. The file must be more than 24K long so that it needs more than a single read -pcre2grep: Error -45, -46 or -52 means that a resource limit was exceeded. +pcre2grep: Error -46, -47 or -53 means that a resource limit was exceeded. pcre2grep: Check your regex for nested unlimited loops. 
RC=1 ---------------------------- Test 63 ----------------------------- -pcre2grep: pcre2_match() gave error -52 while matching text that starts: +pcre2grep: pcre2_match() gave error -53 while matching text that starts: This is a file of miscellaneous text that is used as test data for checking that the pcregrep command is working correctly. The file must be more than 24K long so that it needs more than a single read -pcre2grep: Error -45, -46 or -52 means that a resource limit was exceeded. +pcre2grep: Error -46, -47 or -53 means that a resource limit was exceeded. pcre2grep: Check your regex for nested unlimited loops. RC=1 ---------------------------- Test 64 ------------------------------ diff --git a/testdata/testinput2 b/testdata/testinput2 index 63eee98..d3ac233 100644 --- a/testdata/testinput2 +++ b/testdata/testinput2 @@ -4090,5 +4090,11 @@ a random value. /Ix /x(?=ab\K)/ xab\=get=0 xab\=copy=0 + xab\=getall + +/(?<A>a)|(?<A>b)/dupnames + a\=ovector=1,copy=A,get=A,get=2 + a\=ovector=2,copy=A,get=A,get=2 + b\=ovector=2,copy=A,get=A,get=2 # End of testinput2 diff --git a/testdata/testinput6 b/testdata/testinput6 index d748136..bb0dbcd 100644 --- a/testdata/testinput6 +++ b/testdata/testinput6 @@ -4797,4 +4797,15 @@ ab cdab +/(a)(b)|(c)/ + XcX\=ovector=2,get=1,get=2,get=3,get=4,getall + +/(?<A>aa)/ + aa\=get=A + aa\=copy=A + +/a+/no_auto_possess + a\=ovector=2,get=1,get=2,getall + aaa\=ovector=2,get=1,get=2,getall + # End of testinput6 diff --git a/testdata/testoutput14 b/testdata/testoutput14 index b57b24b..cdfd6f7 100644 --- a/testdata/testoutput14 +++ b/testdata/testoutput14 @@ -114,11 +114,11 @@ Subject length lower bound = 3 aaaaaaaaaaaaaz No match aaaaaaaaaaaaaz\=match_limit=3000 -Failed: error -46: match limit exceeded +Failed: error -47: match limit exceeded /(a+)*zz/ aaaaaaaaaaaaaz\=recursion_limit=10 -Failed: error -52: recursion limit exceeded +Failed: error -53: recursion limit exceeded /(*LIMIT_MATCH=3000)(a+)*zz/I Capturing subpattern count = 1 @@ -127,9
+127,9 @@ Starting code units: a z Last code unit = 'z' Subject length lower bound = 2 aaaaaaaaaaaaaz -Failed: error -46: match limit exceeded +Failed: error -47: match limit exceeded aaaaaaaaaaaaaz\=match_limit=60000 -Failed: error -46: match limit exceeded +Failed: error -47: match limit exceeded /(*LIMIT_MATCH=60000)(*LIMIT_MATCH=3000)(a+)*zz/I Capturing subpattern count = 1 @@ -138,7 +138,7 @@ Starting code units: a z Last code unit = 'z' Subject length lower bound = 2 aaaaaaaaaaaaaz -Failed: error -46: match limit exceeded +Failed: error -47: match limit exceeded /(*LIMIT_MATCH=60000)(a+)*zz/I Capturing subpattern count = 1 @@ -149,7 +149,7 @@ Subject length lower bound = 2 aaaaaaaaaaaaaz No match aaaaaaaaaaaaaz\=match_limit=3000 -Failed: error -46: match limit exceeded +Failed: error -47: match limit exceeded /(*LIMIT_RECURSION=10)(a+)*zz/I Capturing subpattern count = 1 @@ -158,9 +158,9 @@ Starting code units: a z Last code unit = 'z' Subject length lower bound = 2 aaaaaaaaaaaaaz -Failed: error -52: recursion limit exceeded +Failed: error -53: recursion limit exceeded aaaaaaaaaaaaaz\=recursion_limit=1000 -Failed: error -52: recursion limit exceeded +Failed: error -53: recursion limit exceeded /(*LIMIT_RECURSION=10)(*LIMIT_RECURSION=1000)(a+)*zz/I Capturing subpattern count = 1 @@ -180,21 +180,21 @@ Subject length lower bound = 2 aaaaaaaaaaaaaz No match aaaaaaaaaaaaaz\=recursion_limit=10 -Failed: error -52: recursion limit exceeded +Failed: error -53: recursion limit exceeded # These three have infinitely nested recursions. 
/((?2))((?1))/ abc -Failed: error -51: nested recursion at the same subject position +Failed: error -52: nested recursion at the same subject position /((?(R2)a+|(?1)b))/ aaaabcde -Failed: error -51: nested recursion at the same subject position +Failed: error -52: nested recursion at the same subject position /(?(R)a*(?1)|((?R))b)/ aaaabcde -Failed: error -51: nested recursion at the same subject position +Failed: error -52: nested recursion at the same subject position # The allusedtext modifier does not work with JIT, which does not maintain # the leftchar/rightchar data. diff --git a/testdata/testoutput16 b/testdata/testoutput16 index 2456815..fe1ca82 100644 --- a/testdata/testoutput16 +++ b/testdata/testoutput16 @@ -15,7 +15,7 @@ JIT compilation was not successful /(?(R)a*(?1)|((?R))b)/ aaaabcde -Failed: error -45: JIT stack limit reached +Failed: error -46: JIT stack limit reached /abcd/I Capturing subpattern count = 0 @@ -64,13 +64,13 @@ No match abcd 0: abcd (JIT) ab\=ps -Failed: error -44: bad JIT option +Failed: error -45: bad JIT option ab\=ph -Failed: error -44: bad JIT option +Failed: error -45: bad JIT option xyz No match (JIT) xyz\=ps -Failed: error -44: bad JIT option +Failed: error -45: bad JIT option /abcd/jit=2 abcd @@ -84,13 +84,13 @@ No match /abcd/jit=2,jitfast abcd -Failed: error -44: bad JIT option +Failed: error -45: bad JIT option ab\=ps Partial match: ab (JIT) ab\=ph -Failed: error -44: bad JIT option +Failed: error -45: bad JIT option xyz -Failed: error -44: bad JIT option +Failed: error -45: bad JIT option /abcd/jit=3 abcd @@ -256,7 +256,7 @@ Minimum match limit = 6 aaaaaaaaaaaaaz No match (JIT) aaaaaaaaaaaaaz\=match_limit=3000 -Failed: error -46: match limit exceeded +Failed: error -47: match limit exceeded /(*LIMIT_MATCH=3000)(a+)*zz/I Capturing subpattern count = 1 @@ -266,9 +266,9 @@ Last code unit = 'z' Subject length lower bound = 2 JIT compilation was successful aaaaaaaaaaaaaz -Failed: error -46: match limit exceeded +Failed: 
error -47: match limit exceeded aaaaaaaaaaaaaz\=match_limit=60000 -Failed: error -46: match limit exceeded +Failed: error -47: match limit exceeded /(*LIMIT_MATCH=60000)(*LIMIT_MATCH=3000)(a+)*zz/I Capturing subpattern count = 1 @@ -278,7 +278,7 @@ Last code unit = 'z' Subject length lower bound = 2 JIT compilation was successful aaaaaaaaaaaaaz -Failed: error -46: match limit exceeded +Failed: error -47: match limit exceeded /(*LIMIT_MATCH=60000)(a+)*zz/I Capturing subpattern count = 1 @@ -290,21 +290,21 @@ JIT compilation was successful aaaaaaaaaaaaaz No match (JIT) aaaaaaaaaaaaaz\=match_limit=3000 -Failed: error -46: match limit exceeded +Failed: error -47: match limit exceeded # These three have infinitely nested recursions. /((?2))((?1))/ abc -Failed: error -45: JIT stack limit reached +Failed: error -46: JIT stack limit reached /((?(R2)a+|(?1)b))/ aaaabcde -Failed: error -45: JIT stack limit reached +Failed: error -46: JIT stack limit reached /(?(R)a*(?1)|((?R))b)/ aaaabcde -Failed: error -45: JIT stack limit reached +Failed: error -46: JIT stack limit reached # Invalid options disable JIT when called via pcre2_match(), causing the # match to happen via the interpreter, but for fast JIT invalid options are diff --git a/testdata/testoutput2 b/testdata/testoutput2 index a0a9a4d..fd9944d 100644 --- a/testdata/testoutput2 +++ b/testdata/testoutput2 @@ -993,7 +993,7 @@ Subject length lower bound = 4 0: abcd 1: a 2: d -Copy substring 5 failed (-48): unknown substring +Copy substring 5 failed (-49): unknown substring /(.{20})/I Capturing subpattern count = 1 @@ -1047,9 +1047,9 @@ Subject length lower bound = 4 2: 3: f 1G a (1) -Get substring 2 failed (-54): requested value is not set +Get substring 2 failed (-55): requested value is not set 3G f (1) -Get substring 4 failed (-48): unknown substring +Get substring 4 failed (-49): unknown substring 0L adef 1L a 2L @@ -1062,7 +1062,7 @@ Get substring 4 failed (-48): unknown substring 1G bc (2) 2G bc (2) 3G f (1) -Get 
substring 4 failed (-48): unknown substring +Get substring 4 failed (-49): unknown substring 0L bcdef 1L bc 2L bc @@ -4363,7 +4363,7 @@ Subject length lower bound = 8 1: cd 2: gh Number not found for group 'three' -Copy substring 'three' failed (-48): unknown substring +Copy substring 'three' failed (-49): unknown substring /(?P)(?P)/IB ------------------------------------------------------------------ @@ -5731,7 +5731,7 @@ No match 1: a1 2: a1 Number not found for group 'Z' -Copy substring 'Z' failed (-48): unknown substring +Copy substring 'Z' failed (-49): unknown substring C a1 (2) A (non-unique) /(?|(?)(?)(?)|(?)(?)(?))/I,dupnames @@ -5772,7 +5772,7 @@ Subject length lower bound = 2 C a (1) A (non-unique) cd\=copy=A 0: cd -Copy substring 'A' failed (-54): requested value is not set +Copy substring 'A' failed (-55): requested value is not set /^(?Pa)(?Pb)|cd(?Pef)(?Pgh)/I,dupnames Capturing subpattern count = 4 @@ -5817,7 +5817,7 @@ No match 1: a1 2: a1 Number not found for group 'Z' -Get substring 'Z' failed (-48): unknown substring +Get substring 'Z' failed (-49): unknown substring G a1 (2) A (non-unique) /^(?Pa)(?Pb)/I,dupnames @@ -5848,7 +5848,7 @@ Subject length lower bound = 2 G a (1) A (non-unique) cd\=get=A 0: cd -Get substring 'A' failed (-54): requested value is not set +Get substring 'A' failed (-55): requested value is not set /^(?Pa)(?Pb)|cd(?Pef)(?Pgh)/I,dupnames Capturing subpattern count = 4 @@ -13659,11 +13659,11 @@ Failed: error -35: invalid replacement string /abc/replace=a$bad 123abc -Failed: error -48: unknown substring +Failed: error -49: unknown substring /abc/replace=a${A234567890123456789_123456789012}z 123abc -Failed: error -48: unknown substring +Failed: error -49: unknown substring /abc/replace=a${A23456789012345678901234567890123}z 123abc @@ -13683,7 +13683,7 @@ Failed: error -35: invalid replacement string /abc/replace=[9]XYZ 123abc123 -Failed: error -47: no more memory +Failed: error -48: no more memory /abc/replace=xyz 
1abc2\=partial_hard @@ -13720,10 +13720,10 @@ No match Matched, but too many substrings 0: c 1: -Get substring 1 failed (-54): requested value is not set -Get substring 2 failed (-53): requested value is not available -Get substring 3 failed (-53): requested value is not available -Get substring 4 failed (-48): unknown substring +Get substring 1 failed (-55): requested value is not set +Get substring 2 failed (-54): requested value is not available +Get substring 3 failed (-54): requested value is not available +Get substring 4 failed (-49): unknown substring 0L c 1L @@ -13736,5 +13736,30 @@ Start of matched string is beyond its end - displaying from end to start. Start of matched string is beyond its end - displaying from end to start. 0: ab 0C (0) + xab\=getall +Start of matched string is beyond its end - displaying from end to start. + 0: ab + 0L + +/(?a)|(?b)/dupnames + a\=ovector=1,copy=A,get=A,get=2 +Matched, but too many substrings + 0: a +Copy substring 'A' failed (-54): requested value is not available +Get substring 2 failed (-54): requested value is not available +Get substring 'A' failed (-54): requested value is not available + a\=ovector=2,copy=A,get=A,get=2 + 0: a + 1: a + C a (1) A (non-unique) +Get substring 2 failed (-54): requested value is not available + G a (1) A (non-unique) + b\=ovector=2,copy=A,get=A,get=2 +Matched, but too many substrings + 0: b + 1: +Copy substring 'A' failed (-55): requested value is not set +Get substring 2 failed (-54): requested value is not available +Get substring 'A' failed (-55): requested value is not set # End of testinput2 diff --git a/testdata/testoutput6 b/testdata/testoutput6 index 4603ccc..18594e9 100644 --- a/testdata/testoutput6 +++ b/testdata/testoutput6 @@ -6133,7 +6133,7 @@ No match /^(?(2)a|(1)(2))+$/ 123a -Failed: error -40: backreference condition or recursion test not supported for DFA matching +Failed: error -40: backreference condition or recursion test is not supported for DFA matching 
/(?<=a|bbbb)c/ ac @@ -7087,7 +7087,7 @@ Partial match: dogs /abc\K123/ xyzabc123pqr -Failed: error -41: item unsupported for DFA matching +Failed: error -42: pattern contains an item that is not supported for DFA matching /(?<=abc)123/ xyzabc123pqr @@ -7205,29 +7205,29 @@ No match /^(?!a(*SKIP)b)/ ac -Failed: error -41: item unsupported for DFA matching +Failed: error -42: pattern contains an item that is not supported for DFA matching /^(?=a(*SKIP)b|ac)/ ** Failers No match ac -Failed: error -41: item unsupported for DFA matching +Failed: error -42: pattern contains an item that is not supported for DFA matching /^(?=a(*THEN)b|ac)/ ac -Failed: error -41: item unsupported for DFA matching +Failed: error -42: pattern contains an item that is not supported for DFA matching /^(?=a(*PRUNE)b)/ ab -Failed: error -41: item unsupported for DFA matching +Failed: error -42: pattern contains an item that is not supported for DFA matching ** Failers No match ac -Failed: error -41: item unsupported for DFA matching +Failed: error -42: pattern contains an item that is not supported for DFA matching /^(?(?!a(*SKIP)b))/ ac -Failed: error -41: item unsupported for DFA matching +Failed: error -42: pattern contains an item that is not supported for DFA matching /(?<=abc)def/ abc\=ph @@ -7424,7 +7424,7 @@ No match /((?2))((?1))/ abc -Failed: error -51: nested recursion at the same subject position +Failed: error -52: nested recursion at the same subject position /(?(R)a+|(?R)b)/ aaaabcde @@ -7440,11 +7440,11 @@ Failed: error -51: nested recursion at the same subject position /((?(R2)a+|(?1)b))/ aaaabcde -Failed: error -40: backreference condition or recursion test not supported for DFA matching +Failed: error -40: backreference condition or recursion test is not supported for DFA matching /(?(R)a*(?1)|((?R))b)/ aaaabcde -Failed: error -51: nested recursion at the same subject position +Failed: error -52: nested recursion at the same subject position /(a+)/no_auto_possess 
aaaa\=ovector=3 @@ -7734,4 +7734,36 @@ Failed: error -38: invalid data in workspace for DFA restart 0: 0+ dab +/(a)(b)|(c)/ + XcX\=ovector=2,get=1,get=2,get=3,get=4,getall + 0: c +Get substring 1 failed (-55): requested value is not set +Get substring 2 failed (-54): requested value is not available +Get substring 3 failed (-54): requested value is not available +Get substring 4 failed (-54): requested value is not available + 0L c + +/(?aa)/ + aa\=get=A + 0: aa +Get substring 'A' failed (-41): function is not supported for DFA matching + aa\=copy=A + 0: aa +Copy substring 'A' failed (-41): function is not supported for DFA matching + +/a+/no_auto_possess + a\=ovector=2,get=1,get=2,getall + 0: a +Get substring 1 failed (-55): requested value is not set +Get substring 2 failed (-54): requested value is not available + 0L a + aaa\=ovector=2,get=1,get=2,getall +Matched, but offsets vector is too small to show all matches + 0: aaa + 1: aa + 1G aa (2) +Get substring 2 failed (-54): requested value is not available + 0L aaa + 1L aa + # End of testinput6 diff --git a/testdata/testoutput7 b/testdata/testoutput7 index 7760c93..a7f6a62 100644 --- a/testdata/testoutput7 +++ b/testdata/testoutput7 @@ -1218,7 +1218,7 @@ Partial match: the cat /ab\Cde/utf abXde -Failed: error -41: item unsupported for DFA matching +Failed: error -42: pattern contains an item that is not supported for DFA matching /(?<=ab\Cde)X/utf Failed: error 136 at offset 10: \C is not allowed in a lookbehind assertion