Update and improve substring handling and its documentation.

2014-12-14 17:17:06 +00:00 · 2014-12-14 17:17:06 +00:00 · cb8865d247
commit cb8865d247
parent a85d15cbd1
17 changed files with 337 additions and 164 deletions
--- a/doc/pcre2api.3
+++ b/doc/pcre2api.3
@ -1,4 +1,4 @@
-.TH PCRE2API 3 "13 December 2014" "PCRE2 10.00"
+.TH PCRE2API 3 "14 December 2014" "PCRE2 10.00"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@ -921,6 +921,16 @@ PCRE2_ZERO_TERMINATED. The function returns a pointer to a block of memory that
 contains the compiled pattern and related data. The caller must free the memory
 by calling \fBpcre2_code_free()\fP when it is no longer needed.
 .P
+NOTE: When one of the matching functions is called, pointers to the compiled
+pattern and the subject string are set in the match data block so that they can
+be referenced by the extraction functions. After running a match, you must not
+free a compiled pattern (or a subject string) until after all operations on the
+.\" HTML <a href="#matchdatablock">
+.\" </a>
+match data block 
+.\"
+have taken place.
+.P
 If the compile context argument \fIccontext\fP is NULL, memory for the compiled
 pattern is obtained by calling \fBmalloc()\fP. Otherwise, it is obtained from
 the same memory function that was used for the compile context.
@ -1683,7 +1693,7 @@ pattern with the JIT compiler does not alter the value returned by this option.
 .B void pcre2_match_data_free(pcre2_match_data *\fImatch_data\fP);
 .fi
 .P
-Information about successful and unsuccessful matches is placed in a match
+Information about a successful or unsuccessful match is placed in a match
 data block, which is an opaque structure that is accessed by function calls. In
 particular, the match data block contains a vector of offsets into the subject
 string that define the matched part of the subject and any substrings that were
@ -1713,10 +1723,8 @@ memory is obtained using the same allocator that was used for the compiled
 pattern (custom or default).
 .P
 A match data block can be used many times, with the same or different compiled
-patterns. When it is no longer needed, it should be freed by calling
-\fBpcre2_match_data_free()\fP. You can extract information from a match data
-block after a match operation has finished, using functions that are described
-in the sections on
+patterns. You can extract information from a match data block after a match
+operation has finished, using functions that are described in the sections on
 .\" HTML <a href="#matchedstrings">
 .\" </a>
 matched strings
@ -1727,6 +1735,15 @@ and
 other match data
 .\"
 below.
+.P
+When one of the matching functions is called, pointers to the compiled pattern
+and the subject string are set in the match data block so that they can be
+referenced by the extraction functions. After running a match, you must not
+free a compiled pattern or a subject string until after all operations on the
+match data block (for that match) have taken place.
+.P
+When a match data block itself is no longer needed, it should be freed by
+calling \fBpcre2_match_data_free()\fP.
 .
 .
 .SH "MATCHING A PATTERN: THE TRADITIONAL FUNCTION"
@ -2053,8 +2070,13 @@ returned value is 3. If there are no capturing subpatterns, the return value
 from a successful match is 1, indicating that just the first pair of offsets
 has been set.
 .P
-If a capturing subpattern is matched repeatedly within a single match
-operation, it is the last portion of the string that it matched that is
+If a pattern uses the \eK escape sequence within a positive assertion, the 
+reported start of the match can be greater than the end of the match. For 
+example, if the pattern (?=ab\eK) is matched against "ab", the start and end 
+offset values for the match are 2 and 0.
+.P
+If a capturing subpattern group is matched repeatedly within a single match
+operation, it is the last portion of the subject that it matched that is
 returned.
 .P
 If the ovector is too small to hold all the captured substring offsets, as much
@ -2268,23 +2290,31 @@ above.
 .\"
 For convenience, auxiliary functions are provided for extracting captured
 substrings as new, separate, zero-terminated strings. The functions in this
-section identify substrings by number. The next section describes similar
-functions for extracting substrings by name. A substring that contains a binary
-zero is correctly extracted and has a further zero added on the end, but the
-result is not, of course, a C string.
+section identify substrings by number. The number zero refers to the entire
+matched substring, with higher numbers referring to substrings captured by
+parenthesized groups. The next section describes similar functions for
+extracting captured substrings by name. A substring that contains a binary zero
+is correctly extracted and has a further zero added on the end, but the result
+is not, of course, a C string.
+.P
+If a pattern uses the \eK escape sequence within a positive assertion, the 
+reported start of the match can be greater than the end of the match. For 
+example, if the pattern (?=ab\eK) is matched against "ab", the start and end 
+offset values for the match are 2 and 0. In this situation, calling these 
+functions with a zero substring number extracts a zero-length empty string.
 .P
 You can find the length in code units of a captured substring without
 extracting it by calling \fBpcre2_substring_length_bynumber()\fP. The first
 argument is a pointer to the match data block, the second is the group number,
-and the third is a pointer to a variable into which the length is placed.
+and the third is a pointer to a variable into which the length is placed. If 
+you just want to know whether or not the substring has been captured, you can 
+pass the third argument as NULL.
 .P
-The \fBpcre2_substring_copy_bynumber()\fP function copies one string into a
-supplied buffer, whereas \fBpcre2_substring_get_bynumber()\fP copies it into
-new memory, obtained using the same memory allocation function that was used
-for the match data block. The first two arguments of these functions are a
-pointer to the match data block and a capturing group number. A group number of
-zero extracts the substring that matched the entire pattern, and higher values
-extract the captured substrings.
+The \fBpcre2_substring_copy_bynumber()\fP function copies a captured substring
+into a supplied buffer, whereas \fBpcre2_substring_get_bynumber()\fP copies it
+into new memory, obtained using the same memory allocation function that was
+used for the match data block. The first two arguments of these functions are a
+pointer to the match data block and a capturing group number.
 .P
 The final arguments of \fBpcre2_substring_copy_bynumber()\fP are a pointer to
 the buffer and a pointer to a variable that contains its length in code units.
@ -2297,8 +2327,9 @@ of code units that comprise the substring, again excluding the terminating
 zero. When the substring is no longer needed, the memory should be freed by
 calling \fBpcre2_substring_free()\fP.
 .P
-The return value from these functions is zero for success, or one of these
-error codes:
+The return value from all these functions is zero for success, or a negative
+error code. If the pattern match failed, the match failure code is returned.
+Other possible error codes are:
 .sp
  PCRE2_ERROR_NOMEMORY
 .sp
@ -2319,7 +2350,8 @@ could not be captured.
  PCRE2_ERROR_UNSET
 .sp
 The substring did not participate in the match. For example, if the pattern is
-(abc)|(def) and the subject is "def", substring number 1 is unset.  
+(abc)|(def), the subject is "def", and the ovector has room for at least two
+pairs of offsets, substring number 1 is unset.
 .
 .
 .SH "EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS"
@ -2388,15 +2420,20 @@ calling \fBpcre2_substring_number_from_name()\fP. The first argument is the
 compiled pattern, and the second is the name. The yield of the function is the
 subpattern number, PCRE2_ERROR_NOSUBSTRING if there is no subpattern of that
 name, or PCRE2_ERROR_NOUNIQUESUBSTRING if there is more than one subpattern of
-that name.
+that name. Given the number, you can extract the substring directly, or use one
+of the functions described above.
 .P
-Given the number, you can extract the substring directly, or use one of the
-functions described above. For convenience, there are also "byname" functions
-that correspond to the "bynumber" functions, the only difference being that the
-second argument is a name instead of a number. If PCRE2_DUPNAMES is
-set and there are duplicate names, these functions return the first named 
-string that is set. PCRE2_ERROR_UNSET is returned only if all groups of the 
-same name are unset.
+For convenience, there are also "byname" functions that correspond to the
+"bynumber" functions, the only difference being that the second argument is a
+name instead of a number. If PCRE2_DUPNAMES is set and there are duplicate
+names, these functions scan all the groups with the given name, and return the
+first named string that is set.
+.P
+If there are no groups with the given name, PCRE2_ERROR_NOSUBSTRING is
+returned. If none of the groups with the name has a slot in the ovector (that
+is, every such group number is greater than or equal to the number of pairs of
+offsets in the ovector), PCRE2_ERROR_UNAVAILABLE is returned. If at least one
+group with the name has a slot in the ovector, but none of them is set,
+PCRE2_ERROR_UNSET is returned.
 .P
 \fBWarning:\fP If the pattern uses the (?| feature to set up multiple
 subpatterns with the same number, as described in the
@ -2660,17 +2697,36 @@ is matched against the string
 .sp
 the three matched strings are
 .sp
-  <something>
-  <something> <something else>
  <something> <something else> <something further>
+  <something> <something else>
+  <something>
 .sp
 On success, the yield of the function is a number greater than zero, which is
 the number of matched substrings. The offsets of the substrings are returned in
-the ovector, and can be extracted in the same way as for \fBpcre2_match()\fP.
-They are returned in reverse order of length; that is, the longest
-matching string is given first. If there were too many matches to fit into
-the ovector, the yield of the function is zero, and the vector is filled with
-the longest matches.
+the ovector, and can be extracted by number in the same way as for
+\fBpcre2_match()\fP, but the numbers bear no relation to any capturing groups
+that may exist in the pattern, because DFA matching does not support group
+capture. 
+.P
+Calls to the convenience functions that extract substrings by name
+return the error PCRE2_ERROR_DFA_UFUNC (unsupported function) if used after a
+DFA match. The convenience functions that extract substrings by number never
+return PCRE2_ERROR_NOSUBSTRING, and the meanings of some other errors are
+slightly different:
+.sp
+  PCRE2_ERROR_UNAVAILABLE
+.sp
+The ovector is not big enough to include a slot for the given substring number.
+.sp
+  PCRE2_ERROR_UNSET
+.sp
+There is a slot in the ovector for this substring, but there were insufficient 
+matches to fill it.
+.P
+The matched strings are stored in the ovector in reverse order of length; that
+is, the longest matching string is first. If there were too many matches to fit
+into the ovector, the yield of the function is zero, and the vector is filled
+with the longest matches.
 .P
 NOTE: PCRE2's "auto-possessification" optimization usually applies to character
 repeats at the end of a pattern (as well as internally). For example, the
@ -2746,6 +2802,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 13 December 2014
+Last updated: 14 December 2014
 Copyright (c) 1997-2014 University of Cambridge.
 .fi
--- a/src/pcre2.h.in
+++ b/src/pcre2.h.in
@ -212,20 +212,21 @@ context functions. */
 #define PCRE2_ERROR_DFA_BADRESTART    (-38)
 #define PCRE2_ERROR_DFA_RECURSE       (-39)
 #define PCRE2_ERROR_DFA_UCOND         (-40)
-#define PCRE2_ERROR_DFA_UITEM         (-41)
-#define PCRE2_ERROR_DFA_WSSIZE        (-42)
-#define PCRE2_ERROR_INTERNAL          (-43)
-#define PCRE2_ERROR_JIT_BADOPTION     (-44)
-#define PCRE2_ERROR_JIT_STACKLIMIT    (-45)
-#define PCRE2_ERROR_MATCHLIMIT        (-46)
-#define PCRE2_ERROR_NOMEMORY          (-47)
-#define PCRE2_ERROR_NOSUBSTRING       (-48)
-#define PCRE2_ERROR_NOUNIQUESUBSTRING (-49)
-#define PCRE2_ERROR_NULL              (-50)
-#define PCRE2_ERROR_RECURSELOOP       (-51)
-#define PCRE2_ERROR_RECURSIONLIMIT    (-52)
-#define PCRE2_ERROR_UNAVAILABLE       (-53)
-#define PCRE2_ERROR_UNSET             (-54)
+#define PCRE2_ERROR_DFA_UFUNC         (-41)
+#define PCRE2_ERROR_DFA_UITEM         (-42)
+#define PCRE2_ERROR_DFA_WSSIZE        (-43)
+#define PCRE2_ERROR_INTERNAL          (-44)
+#define PCRE2_ERROR_JIT_BADOPTION     (-45)
+#define PCRE2_ERROR_JIT_STACKLIMIT    (-46)
+#define PCRE2_ERROR_MATCHLIMIT        (-47)
+#define PCRE2_ERROR_NOMEMORY          (-48)
+#define PCRE2_ERROR_NOSUBSTRING       (-49)
+#define PCRE2_ERROR_NOUNIQUESUBSTRING (-50)
+#define PCRE2_ERROR_NULL              (-51)
+#define PCRE2_ERROR_RECURSELOOP       (-52)
+#define PCRE2_ERROR_RECURSIONLIMIT    (-53)
+#define PCRE2_ERROR_UNAVAILABLE       (-54)
+#define PCRE2_ERROR_UNSET             (-55)

 /* Request types for pcre2_pattern_info() */

--- a/src/pcre2_dfa_match.c
+++ b/src/pcre2_dfa_match.c
@ -3275,6 +3275,12 @@ if ((re->flags & PCRE2_LASTSET) != 0)
    }
  }

+/* Fill in fields that are always returned in the match data. */
+
+match_data->code = re;
+match_data->subject = subject;
+match_data->mark = NULL;
+match_data->matchedby = PCRE2_MATCHEDBY_DFA_INTERPRETER;

 /* Call the main matching function, looping for a non-anchored regex after a
 failed match. If not restarting, perform certain optimizations at the start of
--- a/src/pcre2_error.c
+++ b/src/pcre2_error.c
@ -212,18 +212,19 @@ static const char match_error_texts[] =
  "invalid data in workspace for DFA restart\0"
  "too much recursion for DFA matching\0"
  /* 40 */
-  "backreference condition or recursion test not supported for DFA matching\0"
-  "item unsupported for DFA matching\0"
+  "backreference condition or recursion test is not supported for DFA matching\0"
+  "function is not supported for DFA matching\0"
+  "pattern contains an item that is not supported for DFA matching\0"
  "workspace size exceeded in DFA matching\0"
  "internal error - pattern overwritten?\0"
-  "bad JIT option\0"
  /* 45 */
+  "bad JIT option\0"
  "JIT stack limit reached\0"
  "match limit exceeded\0"
  "no more memory\0"
  "unknown substring\0"
-  "non-unique substring name\0"
  /* 50 */
+  "non-unique substring name\0"
  "NULL argument passed\0"
  "nested recursion at the same subject position\0"
  "recursion limit exceeded\0"
--- a/src/pcre2_internal.h
+++ b/src/pcre2_internal.h
@ -526,15 +526,16 @@ bytes in a code unit in that mode. */

 #define PCRE2_MODE_MASK     (PCRE2_MODE8 | PCRE2_MODE16 | PCRE2_MODE32)

+/* Values for the matchedby field in a match data block. */
+
+enum { PCRE2_MATCHEDBY_INTERPRETER,     /* pcre2_match() */
+       PCRE2_MATCHEDBY_DFA_INTERPRETER, /* pcre2_dfa_match() */
+       PCRE2_MATCHEDBY_JIT };           /* pcre2_jit_match() */ 
+
 /* Magic number to provide a small check against being handed junk. */

 #define MAGIC_NUMBER  0x50435245UL   /* 'PCRE' */

-/* This value is used to detect a loaded regular expression in different
-endianness. */
-
-#define REVERSED_MAGIC_NUMBER  0x45524350UL   /* 'ERCP' */
-
 /* The maximum remaining length of subject we are prepared to search for a
 req_unit match. */

--- a/src/pcre2_intmodedep.h
+++ b/src/pcre2_intmodedep.h
@ -616,12 +616,13 @@ typedef struct pcre2_real_match_data {
  pcre2_memctl     memctl;
  const pcre2_real_code *code;    /* The pattern used for the match */
  PCRE2_SPTR       subject;       /* The subject that was matched */
-  int              rc;            /* The return code from the match */
+  PCRE2_SPTR       mark;          /* Pointer to last mark */
  PCRE2_SIZE       leftchar;      /* Offset to leftmost code unit */
  PCRE2_SIZE       rightchar;     /* Offset to rightmost code unit */
  PCRE2_SIZE       startchar;     /* Offset to starting code unit */
-  PCRE2_SPTR       mark;          /* Pointer to last mark */
+  uint16_t         matchedby;     /* Type of match (normal, JIT, DFA) */ 
  uint16_t         oveccount;     /* Number of pairs */
+  int              rc;            /* The return code from the match */
  PCRE2_SIZE       ovector[1];    /* The first field */
 } pcre2_real_match_data;

--- a/src/pcre2_jit_match.c
+++ b/src/pcre2_jit_match.c
@ -180,6 +180,7 @@ match_data->startchar = arguments.startchar_ptr - subject;
 match_data->leftchar = 0;
 match_data->rightchar = 0;
 match_data->mark = arguments.mark_ptr;
+match_data->matchedby = PCRE2_MATCHEDBY_JIT;

 return match_data->rc;

--- a/src/pcre2_match.c
+++ b/src/pcre2_match.c
@ -6995,6 +6995,7 @@ while (mb->ovecsave_chain != NULL)
 match_data->code = re;
 match_data->subject = subject;
 match_data->mark = mb->mark;
+match_data->matchedby = PCRE2_MATCHEDBY_INTERPRETER;

 /* Handle a fully successful match. */

@ -7026,14 +7027,15 @@ if (rc == MATCH_MATCH || rc == MATCH_ACCEPT)
  match_data->rc = ((mb->capture_last & OVFLBIT) != 0)?
    0 : mb->end_offset_top/2;

-  /* If there is space in the offset vector, set any unused pairs at the end to
-  PCRE2_UNSET for backwards compatibility. It is documented that this happens.
-  In earlier versions, the whole set of potential capturing offsets was
-  initialized each time round the loop, but this is handled differently now.
-  "Gaps" are set to PCRE2_UNSET dynamically instead (this fixes a bug). Thus,
-  it is only those at the end that need setting here. We can't just set them
-  all at the start of the whole thing because they may get set in one branch
-  that is not the final matching branch. */
+  /* If there is space in the offset vector, set any pairs that follow the
+  highest-numbered captured string but are less than the number of capturing
+  groups in the pattern (and are within the ovector) to PCRE2_UNSET. It is
+  documented that this happens. In earlier versions, the whole set of potential
+  capturing offsets was initialized each time round the loop, but this is
+  handled differently now. "Gaps" are set to PCRE2_UNSET dynamically instead
+  (this fixed a bug). Thus, it is only those at the end that need setting here.
+  We can't just mark them all unset at the start of the whole thing because
+  they may get set in one branch that is not the final matching branch. */

  if (mb->end_offset_top/2 <= re->top_bracket)
    {
--- a/src/pcre2_substring.c
+++ b/src/pcre2_substring.c
@ -64,27 +64,34 @@ Arguments:
 Returns:         if successful: zero
                 if not successful, a negative error code:
                   (1) an error from nametable_scan()
-                   (2) an error from copy_bynumber()  
-                   (3) PCRE2_ERROR_UNSET: all named groups are unset
+                   (2) an error from copy_bynumber()
+                   (3) PCRE2_ERROR_UNAVAILABLE: no group is in ovector 
+                   (4) PCRE2_ERROR_UNSET: all named groups in ovector are unset
 */

 PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION
 pcre2_substring_copy_byname(pcre2_match_data *match_data, PCRE2_SPTR stringname,
  PCRE2_UCHAR *buffer, PCRE2_SIZE *sizeptr)
 {
-PCRE2_SPTR first;
-PCRE2_SPTR last;
-PCRE2_SPTR entry;
-int entrysize = pcre2_substring_nametable_scan(match_data->code, stringname,
+PCRE2_SPTR first, last, entry;
+int failrc, entrysize;
+if (match_data->matchedby == PCRE2_MATCHEDBY_DFA_INTERPRETER)
+  return PCRE2_ERROR_DFA_UFUNC;
+entrysize = pcre2_substring_nametable_scan(match_data->code, stringname,
  &first, &last);
 if (entrysize < 0) return entrysize;
+failrc = PCRE2_ERROR_UNAVAILABLE;
 for (entry = first; entry <= last; entry += entrysize)
  {
  uint32_t n = GET2(entry, 0);
-  if (n < match_data->oveccount && match_data->ovector[n*2] != PCRE2_UNSET)
-    return pcre2_substring_copy_bynumber(match_data, n, buffer, sizeptr);
+  if (n < match_data->oveccount)
+    {
+    if (match_data->ovector[n*2] != PCRE2_UNSET)
+      return pcre2_substring_copy_bynumber(match_data, n, buffer, sizeptr);
+    failrc = PCRE2_ERROR_UNSET;   
+    }   
  }
-return PCRE2_ERROR_UNSET;
+return failrc;
 }


@ -146,26 +153,33 @@ Returns:         if successful: zero
                 if not successful, a negative value:
                   (1) an error from nametable_scan()
                   (2) an error from get_bynumber()  
-                   (3) PCRE2_ERROR_UNSET: all named groups are unset
+                   (3) PCRE2_ERROR_UNAVAILABLE: no group is in ovector 
+                   (4) PCRE2_ERROR_UNSET: all named groups in ovector are unset
 */

 PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION
 pcre2_substring_get_byname(pcre2_match_data *match_data,
  PCRE2_SPTR stringname, PCRE2_UCHAR **stringptr, PCRE2_SIZE *sizeptr)
 {
-PCRE2_SPTR first;
-PCRE2_SPTR last;
-PCRE2_SPTR entry;
-int entrysize = pcre2_substring_nametable_scan(match_data->code, stringname,
+PCRE2_SPTR first, last, entry;
+int failrc, entrysize;
+if (match_data->matchedby == PCRE2_MATCHEDBY_DFA_INTERPRETER)
+  return PCRE2_ERROR_DFA_UFUNC;
+entrysize = pcre2_substring_nametable_scan(match_data->code, stringname,
  &first, &last);
 if (entrysize < 0) return entrysize;
+failrc = PCRE2_ERROR_UNAVAILABLE;
 for (entry = first; entry <= last; entry += entrysize)
  {
  uint32_t n = GET2(entry, 0);
-  if (n < match_data->oveccount && match_data->ovector[n*2] != PCRE2_UNSET)
-    return pcre2_substring_get_bynumber(match_data, n, stringptr, sizeptr);
+  if (n < match_data->oveccount)
+    {
+    if (match_data->ovector[n*2] != PCRE2_UNSET)
+      return pcre2_substring_get_bynumber(match_data, n, stringptr, sizeptr);
+    failrc = PCRE2_ERROR_UNSET;
+    }    
  }
-return PCRE2_ERROR_UNSET;
+return failrc;
 }


@ -251,19 +265,25 @@ PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION
 pcre2_substring_length_byname(pcre2_match_data *match_data,
  PCRE2_SPTR stringname, PCRE2_SIZE *sizeptr)
 {
-PCRE2_SPTR first;
-PCRE2_SPTR last;
-PCRE2_SPTR entry;
-int entrysize = pcre2_substring_nametable_scan(match_data->code, stringname,
+PCRE2_SPTR first, last, entry;
+int failrc, entrysize;
+if (match_data->matchedby == PCRE2_MATCHEDBY_DFA_INTERPRETER)
+  return PCRE2_ERROR_DFA_UFUNC;
+entrysize = pcre2_substring_nametable_scan(match_data->code, stringname,
  &first, &last);
 if (entrysize < 0) return entrysize;
+failrc = PCRE2_ERROR_UNAVAILABLE;
 for (entry = first; entry <= last; entry += entrysize)
  {
  uint32_t n = GET2(entry, 0);
-  if (n < match_data->oveccount && match_data->ovector[n*2] != PCRE2_UNSET)
-    return pcre2_substring_length_bynumber(match_data, n, sizeptr);
+  if (n < match_data->oveccount)
+    {
+    if (match_data->ovector[n*2] != PCRE2_UNSET)
+      return pcre2_substring_length_bynumber(match_data, n, sizeptr);
+    failrc = PCRE2_ERROR_UNSET;
+    }    
  }
-return PCRE2_ERROR_UNSET;
+return failrc;
 }


@ -292,13 +312,23 @@ PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION
 pcre2_substring_length_bynumber(pcre2_match_data *match_data,
  uint32_t stringnumber, PCRE2_SIZE *sizeptr)
 {
+int count;
 PCRE2_SIZE left, right;
-if (stringnumber > match_data->code->top_bracket) 
-  return PCRE2_ERROR_NOSUBSTRING;
-if (stringnumber >= match_data->oveccount) 
-  return PCRE2_ERROR_UNAVAILABLE;
-if (match_data->ovector[stringnumber*2] == PCRE2_UNSET)
-  return PCRE2_ERROR_UNSET;
+if ((count = match_data->rc) < 0) return count;   /* Match failed */
+if (match_data->matchedby != PCRE2_MATCHEDBY_DFA_INTERPRETER)
+  {
+  if (stringnumber > match_data->code->top_bracket) 
+    return PCRE2_ERROR_NOSUBSTRING;
+  if (stringnumber >= match_data->oveccount) 
+    return PCRE2_ERROR_UNAVAILABLE;
+  if (match_data->ovector[stringnumber*2] == PCRE2_UNSET)
+    return PCRE2_ERROR_UNSET;
+  }
+else  /* Matched using pcre2_dfa_match() */
+  {
+  if (stringnumber >= match_data->oveccount) return PCRE2_ERROR_UNAVAILABLE;
+  if (count != 0 && stringnumber >= (uint32_t)count) return PCRE2_ERROR_UNSET;
+  } 
 left = match_data->ovector[stringnumber*2];
 right = match_data->ovector[stringnumber*2+1];
 if (sizeptr != NULL) *sizeptr = (left > right)? 0 : right - left;
--- a/testdata/grepoutput
+++ b/testdata/grepoutput
@ -384,15 +384,15 @@ aaaaa2
 010203040506
 RC=0
 ======== STDERR ========
-pcre2grep: pcre2_match() gave error -46 while matching this text:
+pcre2grep: pcre2_match() gave error -47 while matching this text:

 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

-pcre2grep: pcre2_match() gave error -46 while matching this text:
+pcre2grep: pcre2_match() gave error -47 while matching this text:

 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

-pcre2grep: Error -45, -46 or -52 means that a resource limit was exceeded.
+pcre2grep: Error -46, -47 or -53 means that a resource limit was exceeded.
 pcre2grep: Check your regex for nested unlimited loops.
 ---------------------------- Test 38 ------------------------------
 This line contains a binary zero here >< for testing.
@ -510,23 +510,23 @@ In the middle of a line, PATTERN appears.
 Check up on PATTERN near the end.
 RC=0
 ---------------------------- Test 62 -----------------------------
-pcre2grep: pcre2_match() gave error -46 while matching text that starts:
+pcre2grep: pcre2_match() gave error -47 while matching text that starts:

 This is a file of miscellaneous text that is used as test data for checking
 that the pcregrep command is working correctly. The file must be more than 24K
 long so that it needs more than a single read

-pcre2grep: Error -45, -46 or -52 means that a resource limit was exceeded.
+pcre2grep: Error -46, -47 or -53 means that a resource limit was exceeded.
 pcre2grep: Check your regex for nested unlimited loops.
 RC=1
 ---------------------------- Test 63 -----------------------------
-pcre2grep: pcre2_match() gave error -52 while matching text that starts:
+pcre2grep: pcre2_match() gave error -53 while matching text that starts:

 This is a file of miscellaneous text that is used as test data for checking
 that the pcregrep command is working correctly. The file must be more than 24K
 long so that it needs more than a single read

-pcre2grep: Error -45, -46 or -52 means that a resource limit was exceeded.
+pcre2grep: Error -46, -47 or -53 means that a resource limit was exceeded.
 pcre2grep: Check your regex for nested unlimited loops.
 RC=1
 ---------------------------- Test 64 ------------------------------
--- a/testdata/testinput2
+++ b/testdata/testinput2
@ -4090,5 +4090,11 @@ a random value. /Ix
 /x(?=ab\K)/
    xab\=get=0 
    xab\=copy=0 
+    xab\=getall
+
+/(?<A>a)|(?<A>b)/dupnames
+    a\=ovector=1,copy=A,get=A,get=2
+    a\=ovector=2,copy=A,get=A,get=2
+    b\=ovector=2,copy=A,get=A,get=2

 # End of testinput2 
--- a/testdata/testinput6
+++ b/testdata/testinput6
@ -4797,4 +4797,15 @@
    ab
    cdab 

+/(a)(b)|(c)/
+    XcX\=ovector=2,get=1,get=2,get=3,get=4,getall
+
+/(?<A>aa)/
+    aa\=get=A
+    aa\=copy=A 
+
+/a+/no_auto_possess
+    a\=ovector=2,get=1,get=2,getall
+    aaa\=ovector=2,get=1,get=2,getall
+
 # End of testinput6
--- a/testdata/testoutput14
+++ b/testdata/testoutput14
@ -114,11 +114,11 @@ Subject length lower bound = 3
    aaaaaaaaaaaaaz
 No match
    aaaaaaaaaaaaaz\=match_limit=3000
-Failed: error -46: match limit exceeded
+Failed: error -47: match limit exceeded

 /(a+)*zz/
    aaaaaaaaaaaaaz\=recursion_limit=10
-Failed: error -52: recursion limit exceeded
+Failed: error -53: recursion limit exceeded

 /(*LIMIT_MATCH=3000)(a+)*zz/I
 Capturing subpattern count = 1
@ -127,9 +127,9 @@ Starting code units: a z
 Last code unit = 'z'
 Subject length lower bound = 2
    aaaaaaaaaaaaaz
-Failed: error -46: match limit exceeded
+Failed: error -47: match limit exceeded
    aaaaaaaaaaaaaz\=match_limit=60000
-Failed: error -46: match limit exceeded
+Failed: error -47: match limit exceeded

 /(*LIMIT_MATCH=60000)(*LIMIT_MATCH=3000)(a+)*zz/I
 Capturing subpattern count = 1
@ -138,7 +138,7 @@ Starting code units: a z
 Last code unit = 'z'
 Subject length lower bound = 2
    aaaaaaaaaaaaaz
-Failed: error -46: match limit exceeded
+Failed: error -47: match limit exceeded

 /(*LIMIT_MATCH=60000)(a+)*zz/I
 Capturing subpattern count = 1
@ -149,7 +149,7 @@ Subject length lower bound = 2
    aaaaaaaaaaaaaz
 No match
    aaaaaaaaaaaaaz\=match_limit=3000
-Failed: error -46: match limit exceeded
+Failed: error -47: match limit exceeded

 /(*LIMIT_RECURSION=10)(a+)*zz/I
 Capturing subpattern count = 1
@ -158,9 +158,9 @@ Starting code units: a z
 Last code unit = 'z'
 Subject length lower bound = 2
    aaaaaaaaaaaaaz
-Failed: error -52: recursion limit exceeded
+Failed: error -53: recursion limit exceeded
    aaaaaaaaaaaaaz\=recursion_limit=1000
-Failed: error -52: recursion limit exceeded
+Failed: error -53: recursion limit exceeded

 /(*LIMIT_RECURSION=10)(*LIMIT_RECURSION=1000)(a+)*zz/I
 Capturing subpattern count = 1
@ -180,21 +180,21 @@ Subject length lower bound = 2
    aaaaaaaaaaaaaz
 No match
    aaaaaaaaaaaaaz\=recursion_limit=10
-Failed: error -52: recursion limit exceeded
+Failed: error -53: recursion limit exceeded
    
 # These three have infinitely nested recursions. 
    
 /((?2))((?1))/
    abc
-Failed: error -51: nested recursion at the same subject position
+Failed: error -52: nested recursion at the same subject position

 /((?(R2)a+|(?1)b))/
    aaaabcde
-Failed: error -51: nested recursion at the same subject position
+Failed: error -52: nested recursion at the same subject position

 /(?(R)a*(?1)|((?R))b)/
    aaaabcde
-Failed: error -51: nested recursion at the same subject position
+Failed: error -52: nested recursion at the same subject position
    
 # The allusedtext modifier does not work with JIT, which does not maintain
 # the leftchar/rightchar data.
--- a/testdata/testoutput16
+++ b/testdata/testoutput16
@ -15,7 +15,7 @@ JIT compilation was not successful

 /(?(R)a*(?1)|((?R))b)/
    aaaabcde
-Failed: error -45: JIT stack limit reached
+Failed: error -46: JIT stack limit reached

 /abcd/I
 Capturing subpattern count = 0
@ -64,13 +64,13 @@ No match
    abcd
 0: abcd (JIT)
    ab\=ps
-Failed: error -44: bad JIT option
+Failed: error -45: bad JIT option
    ab\=ph
-Failed: error -44: bad JIT option
+Failed: error -45: bad JIT option
    xyz
 No match (JIT)
    xyz\=ps
-Failed: error -44: bad JIT option
+Failed: error -45: bad JIT option

 /abcd/jit=2
    abcd
@ -84,13 +84,13 @@ No match

 /abcd/jit=2,jitfast
    abcd
-Failed: error -44: bad JIT option
+Failed: error -45: bad JIT option
    ab\=ps
 Partial match: ab (JIT)
    ab\=ph
-Failed: error -44: bad JIT option
+Failed: error -45: bad JIT option
    xyz
-Failed: error -44: bad JIT option
+Failed: error -45: bad JIT option

 /abcd/jit=3
    abcd
@ -256,7 +256,7 @@ Minimum match limit = 6
    aaaaaaaaaaaaaz
 No match (JIT)
    aaaaaaaaaaaaaz\=match_limit=3000
-Failed: error -46: match limit exceeded
+Failed: error -47: match limit exceeded

 /(*LIMIT_MATCH=3000)(a+)*zz/I
 Capturing subpattern count = 1
@ -266,9 +266,9 @@ Last code unit = 'z'
 Subject length lower bound = 2
 JIT compilation was successful
    aaaaaaaaaaaaaz
-Failed: error -46: match limit exceeded
+Failed: error -47: match limit exceeded
    aaaaaaaaaaaaaz\=match_limit=60000
-Failed: error -46: match limit exceeded
+Failed: error -47: match limit exceeded

 /(*LIMIT_MATCH=60000)(*LIMIT_MATCH=3000)(a+)*zz/I
 Capturing subpattern count = 1
@ -278,7 +278,7 @@ Last code unit = 'z'
 Subject length lower bound = 2
 JIT compilation was successful
    aaaaaaaaaaaaaz
-Failed: error -46: match limit exceeded
+Failed: error -47: match limit exceeded

 /(*LIMIT_MATCH=60000)(a+)*zz/I
 Capturing subpattern count = 1
@ -290,21 +290,21 @@ JIT compilation was successful
    aaaaaaaaaaaaaz
 No match (JIT)
    aaaaaaaaaaaaaz\=match_limit=3000
-Failed: error -46: match limit exceeded
+Failed: error -47: match limit exceeded

 # These three have infinitely nested recursions. 
    
 /((?2))((?1))/
    abc
-Failed: error -45: JIT stack limit reached
+Failed: error -46: JIT stack limit reached

 /((?(R2)a+|(?1)b))/
    aaaabcde
-Failed: error -45: JIT stack limit reached
+Failed: error -46: JIT stack limit reached

 /(?(R)a*(?1)|((?R))b)/
    aaaabcde
-Failed: error -45: JIT stack limit reached
+Failed: error -46: JIT stack limit reached
    
 # Invalid options disable JIT when called via pcre2_match(), causing the
 # match to happen via the interpreter, but for fast JIT invalid options are
--- a/testdata/testoutput2
+++ b/testdata/testoutput2
@ -993,7 +993,7 @@ Subject length lower bound = 4
 0: abcd
 1: a
 2: d
-Copy substring 5 failed (-48): unknown substring
+Copy substring 5 failed (-49): unknown substring

 /(.{20})/I
 Capturing subpattern count = 1
@ -1047,9 +1047,9 @@ Subject length lower bound = 4
 2: <unset>
 3: f
 1G a (1)
-Get substring 2 failed (-54): requested value is not set
+Get substring 2 failed (-55): requested value is not set
 3G f (1)
-Get substring 4 failed (-48): unknown substring
+Get substring 4 failed (-49): unknown substring
 0L adef
 1L a
 2L 
@ -1062,7 +1062,7 @@ Get substring 4 failed (-48): unknown substring
 1G bc (2)
 2G bc (2)
 3G f (1)
-Get substring 4 failed (-48): unknown substring
+Get substring 4 failed (-49): unknown substring
 0L bcdef
 1L bc
 2L bc
@ -4363,7 +4363,7 @@ Subject length lower bound = 8
 1: cd
 2: gh
 Number not found for group 'three'
-Copy substring 'three' failed (-48): unknown substring
+Copy substring 'three' failed (-49): unknown substring

 /(?P<Tes>)(?P<Test>)/IB
 ------------------------------------------------------------------
@ -5731,7 +5731,7 @@ No match
 1: a1
 2: a1
 Number not found for group 'Z'
-Copy substring 'Z' failed (-48): unknown substring
+Copy substring 'Z' failed (-49): unknown substring
  C a1 (2) A (non-unique)
    
 /(?|(?<a>)(?<b>)(?<a>)|(?<a>)(?<b>)(?<a>))/I,dupnames
@ -5772,7 +5772,7 @@ Subject length lower bound = 2
  C a (1) A (non-unique)
    cd\=copy=A
 0: cd
-Copy substring 'A' failed (-54): requested value is not set
+Copy substring 'A' failed (-55): requested value is not set

 /^(?P<A>a)(?P<A>b)|cd(?P<A>ef)(?P<A>gh)/I,dupnames
 Capturing subpattern count = 4
@ -5817,7 +5817,7 @@ No match
 1: a1
 2: a1
 Number not found for group 'Z'
-Get substring 'Z' failed (-48): unknown substring
+Get substring 'Z' failed (-49): unknown substring
  G a1 (2) A (non-unique)

 /^(?P<A>a)(?P<A>b)/I,dupnames
@ -5848,7 +5848,7 @@ Subject length lower bound = 2
  G a (1) A (non-unique)
    cd\=get=A
 0: cd
-Get substring 'A' failed (-54): requested value is not set
+Get substring 'A' failed (-55): requested value is not set

 /^(?P<A>a)(?P<A>b)|cd(?P<A>ef)(?P<A>gh)/I,dupnames
 Capturing subpattern count = 4
@ -13659,11 +13659,11 @@ Failed: error -35: invalid replacement string

 /abc/replace=a$bad
    123abc
-Failed: error -48: unknown substring
+Failed: error -49: unknown substring

 /abc/replace=a${A234567890123456789_123456789012}z
    123abc
-Failed: error -48: unknown substring
+Failed: error -49: unknown substring

 /abc/replace=a${A23456789012345678901234567890123}z
    123abc
@ -13683,7 +13683,7 @@ Failed: error -35: invalid replacement string

 /abc/replace=[9]XYZ
    123abc123
-Failed: error -47: no more memory
+Failed: error -48: no more memory
    
 /abc/replace=xyz
    1abc2\=partial_hard
@ -13720,10 +13720,10 @@ No match
 Matched, but too many substrings
 0: c
 1: <unset>
-Get substring 1 failed (-54): requested value is not set
-Get substring 2 failed (-53): requested value is not available
-Get substring 3 failed (-53): requested value is not available
-Get substring 4 failed (-48): unknown substring
+Get substring 1 failed (-55): requested value is not set
+Get substring 2 failed (-54): requested value is not available
+Get substring 3 failed (-54): requested value is not available
+Get substring 4 failed (-49): unknown substring
 0L c
 1L 
    
@ -13736,5 +13736,30 @@ Start of matched string is beyond its end - displaying from end to start.
 Start of matched string is beyond its end - displaying from end to start.
 0: ab
 0C  (0)
+    xab\=getall
+Start of matched string is beyond its end - displaying from end to start.
+ 0: ab
+ 0L 
+
+/(?<A>a)|(?<A>b)/dupnames
+    a\=ovector=1,copy=A,get=A,get=2
+Matched, but too many substrings
+ 0: a
+Copy substring 'A' failed (-54): requested value is not available
+Get substring 2 failed (-54): requested value is not available
+Get substring 'A' failed (-54): requested value is not available
+    a\=ovector=2,copy=A,get=A,get=2
+ 0: a
+ 1: a
+  C a (1) A (non-unique)
+Get substring 2 failed (-54): requested value is not available
+  G a (1) A (non-unique)
+    b\=ovector=2,copy=A,get=A,get=2
+Matched, but too many substrings
+ 0: b
+ 1: <unset>
+Copy substring 'A' failed (-55): requested value is not set
+Get substring 2 failed (-54): requested value is not available
+Get substring 'A' failed (-55): requested value is not set

 # End of testinput2 
--- a/testdata/testoutput6
+++ b/testdata/testoutput6
@ -6133,7 +6133,7 @@ No match

 /^(?(2)a|(1)(2))+$/
    123a
-Failed: error -40: backreference condition or recursion test not supported for DFA matching
+Failed: error -40: backreference condition or recursion test is not supported for DFA matching

 /(?<=a|bbbb)c/
    ac
@ -7087,7 +7087,7 @@ Partial match: dogs

 /abc\K123/
    xyzabc123pqr
-Failed: error -41: item unsupported for DFA matching
+Failed: error -42: pattern contains an item that is not supported for DFA matching
    
 /(?<=abc)123/
    xyzabc123pqr 
@ -7205,29 +7205,29 @@ No match

 /^(?!a(*SKIP)b)/
    ac
-Failed: error -41: item unsupported for DFA matching
+Failed: error -42: pattern contains an item that is not supported for DFA matching
    
 /^(?=a(*SKIP)b|ac)/
    ** Failers
 No match
    ac
-Failed: error -41: item unsupported for DFA matching
+Failed: error -42: pattern contains an item that is not supported for DFA matching
    
 /^(?=a(*THEN)b|ac)/
    ac
-Failed: error -41: item unsupported for DFA matching
+Failed: error -42: pattern contains an item that is not supported for DFA matching
    
 /^(?=a(*PRUNE)b)/
    ab  
-Failed: error -41: item unsupported for DFA matching
+Failed: error -42: pattern contains an item that is not supported for DFA matching
    ** Failers 
 No match
    ac
-Failed: error -41: item unsupported for DFA matching
+Failed: error -42: pattern contains an item that is not supported for DFA matching

 /^(?(?!a(*SKIP)b))/
    ac
-Failed: error -41: item unsupported for DFA matching
+Failed: error -42: pattern contains an item that is not supported for DFA matching

 /(?<=abc)def/
    abc\=ph
@ -7424,7 +7424,7 @@ No match

 /((?2))((?1))/
    abc
-Failed: error -51: nested recursion at the same subject position
+Failed: error -52: nested recursion at the same subject position

 /(?(R)a+|(?R)b)/
    aaaabcde
@ -7440,11 +7440,11 @@ Failed: error -51: nested recursion at the same subject position

 /((?(R2)a+|(?1)b))/
    aaaabcde
-Failed: error -40: backreference condition or recursion test not supported for DFA matching
+Failed: error -40: backreference condition or recursion test is not supported for DFA matching

 /(?(R)a*(?1)|((?R))b)/
    aaaabcde
-Failed: error -51: nested recursion at the same subject position
+Failed: error -52: nested recursion at the same subject position

 /(a+)/no_auto_possess
    aaaa\=ovector=3
@ -7734,4 +7734,36 @@ Failed: error -38: invalid data in workspace for DFA restart
 0: 
 0+ dab

+/(a)(b)|(c)/
+    XcX\=ovector=2,get=1,get=2,get=3,get=4,getall
+ 0: c
+Get substring 1 failed (-55): requested value is not set
+Get substring 2 failed (-54): requested value is not available
+Get substring 3 failed (-54): requested value is not available
+Get substring 4 failed (-54): requested value is not available
+ 0L c
+
+/(?<A>aa)/
+    aa\=get=A
+ 0: aa
+Get substring 'A' failed (-41): function is not supported for DFA matching
+    aa\=copy=A 
+ 0: aa
+Copy substring 'A' failed (-41): function is not supported for DFA matching
+
+/a+/no_auto_possess
+    a\=ovector=2,get=1,get=2,getall
+ 0: a
+Get substring 1 failed (-55): requested value is not set
+Get substring 2 failed (-54): requested value is not available
+ 0L a
+    aaa\=ovector=2,get=1,get=2,getall
+Matched, but offsets vector is too small to show all matches
+ 0: aaa
+ 1: aa
+ 1G aa (2)
+Get substring 2 failed (-54): requested value is not available
+ 0L aaa
+ 1L aa
+
 # End of testinput6
--- a/testdata/testoutput7
+++ b/testdata/testoutput7
@ -1218,7 +1218,7 @@ Partial match: the cat

 /ab\Cde/utf
    abXde
-Failed: error -41: item unsupported for DFA matching
+Failed: error -42: pattern contains an item that is not supported for DFA matching

 /(?<=ab\Cde)X/utf
 Failed: error 136 at offset 10: \C is not allowed in a lookbehind assertion