Documentation update.

2017-03-24 16:53:38 +00:00 · 2017-03-24 16:53:38 +00:00 · 3aeb812180
parent 32bab50c01
commit 3aeb812180
24 changed files with 1293 additions and 1357 deletions
--- a/doc/html/NON-AUTOTOOLS-BUILD.txt
+++ b/doc/html/NON-AUTOTOOLS-BUILD.txt
@ -1,10 +1,6 @@
 Building PCRE2 without using autotools
 --------------------------------------

-This document has been converted from the PCRE1 document. I have removed a
-number of sections about building in various environments, as they applied only
-to PCRE1 and are probably out of date.
-
 This document contains the following sections:

  General
@ -183,21 +179,9 @@ can skip ahead to the CMake section.

 STACK SIZE IN WINDOWS ENVIRONMENTS

-The default processor stack size of 1Mb in some Windows environments is too
-small for matching patterns that need much recursion. In particular, test 2 may
-fail because of this. Normally, running out of stack causes a crash, but there
-have been cases where the test program has just died silently. See your linker
-documentation for how to increase stack size if you experience problems. If you
-are using CMake (see "BUILDING PCRE2 ON WINDOWS WITH CMAKE" below) and the gcc
-compiler, you can increase the stack size for pcre2test and pcre2grep by
-setting the CMAKE_EXE_LINKER_FLAGS variable to "-Wl,--stack,8388608" (for
-example). The Linux default of 8Mb is a reasonable choice for the stack, though
-even that can be too small for some pattern/subject combinations.
-
-PCRE2 has a compile configuration option to disable the use of stack for
-recursion so that heap is used instead. However, pattern matching is
-significantly slower when this is done. There is more about stack usage in the
-"pcre2stack" documentation.
+Prior to release 10.30 the default system stack size of 1Mb in some Windows 
+environments caused issues with some tests. This should no longer be the case 
+for 10.30 and later releases.


 LINKING PROGRAMS IN WINDOWS ENVIRONMENTS
@ -393,4 +377,4 @@ and executable, is in EBCDIC and native z/OS file formats and this is the
 recommended download site.

 =============================
-Last Updated: 13 October 2016
+Last Updated: 17 March 2017
--- a/doc/html/README.txt
+++ b/doc/html/README.txt
@ -15,8 +15,8 @@ subscribe or manage your subscription here:

   https://lists.exim.org/mailman/listinfo/pcre-dev

-Please read the NEWS file if you are upgrading from a previous release.
-The contents of this README file are:
+Please read the NEWS file if you are upgrading from a previous release. The
+contents of this README file are:

  The PCRE2 APIs
  Documentation for PCRE2
@ -44,8 +44,8 @@ wrappers.

 The distribution does contain a set of C wrapper functions for the 8-bit
 library that are based on the POSIX regular expression API (see the pcre2posix
-man page). These can be found in a library called libpcre2-posix. Note that this
-just provides a POSIX calling interface to PCRE2; the regular expressions
+man page). These can be found in a library called libpcre2-posix. Note that
+this just provides a POSIX calling interface to PCRE2; the regular expressions
 themselves still follow Perl syntax and semantics. The POSIX API is restricted,
 and does not give full access to all of PCRE2's facilities.

@ -95,10 +95,9 @@ PCRE2 documentation is supplied in two other forms:
 Building PCRE2 on non-Unix-like systems
 ---------------------------------------

-For a non-Unix-like system, please read the comments in the file
-NON-AUTOTOOLS-BUILD, though if your system supports the use of "configure" and
-"make" you may be able to build PCRE2 using autotools in the same way as for
-many Unix-like systems.
+For a non-Unix-like system, please read the file NON-AUTOTOOLS-BUILD, though if
+your system supports the use of "configure" and "make" you may be able to build
+PCRE2 using autotools in the same way as for many Unix-like systems.

 PCRE2 can also be configured using CMake, which can be run in various ways
 (command line, GUI, etc). This creates Makefiles, solution files, etc. The file
@ -174,19 +173,19 @@ library. They are also documented in the pcre2build man page.
  architectures. If you try to enable it on an unsupported architecture, there
  will be a compile time error.

-. If you do not want to make use of the support for UTF-8 Unicode character
-  strings in the 8-bit library, UTF-16 Unicode character strings in the 16-bit
-  library, or UTF-32 Unicode character strings in the 32-bit library, you can
-  add --disable-unicode to the "configure" command. This reduces the size of
-  the libraries. It is not possible to configure one library with Unicode
-  support, and another without, in the same configuration.
+. If you do not want to make use of the default support for UTF-8 Unicode
+  character strings in the 8-bit library, UTF-16 Unicode character strings in
+  the 16-bit library, or UTF-32 Unicode character strings in the 32-bit
+  library, you can add --disable-unicode to the "configure" command. This
+  reduces the size of the libraries. It is not possible to configure one
+  library with Unicode support, and another without, in the same configuration.
+  It is also not possible to use --enable-ebcdic (see below) with Unicode
+  support, so if this option is set, you must also use --disable-unicode.

  When Unicode support is available, the use of a UTF encoding still has to be
  enabled by setting the PCRE2_UTF option at run time or starting a pattern
  with (*UTF). When PCRE2 is compiled with Unicode support, its input can only
-  either be ASCII or UTF-8/16/32, even when running on EBCDIC platforms. It is
-  not possible to use both --enable-unicode and --enable-ebcdic at the same
-  time.
+  either be ASCII or UTF-8/16/32, even when running on EBCDIC platforms.

  As well as supporting UTF strings, Unicode support includes support for the
  \P, \p, and \X sequences that recognize Unicode character properties.
@ -232,18 +231,18 @@ library. They are also documented in the pcre2build man page.
  --with-match-limit=500000

  on the "configure" command. This is just the default; individual calls to
-  pcre2_match() can supply their own value. There is more discussion on the
-  pcre2api man page.
+  pcre2_match() can supply their own value. There is more discussion in the
+  pcre2api man page (search for pcre2_set_match_limit).

-. There is a separate counter that limits the depth of recursive function calls
-  during a matching process. This also has a default of ten million, which is
-  essentially "unlimited". You can change the default by setting, for example,
+. There is a separate counter that limits the depth of nested backtracking
+  during a matching process, which in turn limits the amount of memory that is
+  used. This also has a default of ten million, which is essentially
+  "unlimited". You can change the default by setting, for example,

-  --with-match-limit-recursion=500000
+  --with-match-limit-depth=5000

-  Recursive function calls use up the runtime stack; running out of stack can
-  cause programs to crash in strange ways. There is a discussion about stack
-  sizes in the pcre2stack man page.
+  There is more discussion in the pcre2api man page (search for
+  pcre2_set_depth_limit).

 . In the 8-bit library, the default maximum compiled pattern size is around
  64K bytes. You can increase this by adding --with-link-size=3 to the
@ -254,20 +253,6 @@ library. They are also documented in the pcre2build man page.
  performance in the 8-bit and 16-bit libraries. In the 32-bit library, the
  link size setting is ignored, as 4-byte offsets are always used.

-. You can build PCRE2 so that its internal match() function that is called from
-  pcre2_match() does not call itself recursively. Instead, it uses memory
-  blocks obtained from the heap to save data that would otherwise be saved on
-  the stack. To build PCRE2 like this, use
-
-  --disable-stack-for-recursion
-
-  on the "configure" command. PCRE2 runs more slowly in this mode, but it may
-  be necessary in environments with limited stack sizes. This applies only to
-  the normal execution of the pcre2_match() function; if JIT support is being
-  successfully used, it is not relevant. Equally, it does not apply to
-  pcre2_dfa_match(), which does not use deeply nested recursion. There is a
-  discussion about stack sizes in the pcre2stack man page.
-
 . For speed, PCRE2 uses four tables for manipulating and identifying characters
  whose code point values are less than 256. By default, it uses a set of
  tables for ASCII encoding that is part of the distribution. If you specify
@ -389,6 +374,13 @@ library. They are also documented in the pcre2build man page.
  string. Otherwise, it is assumed to be a file name, and the contents of the
  file are the test string.

+. Releases before 10.30 could be compiled with --disable-stack-for-recursion,
+  which caused pcre2_match() to use individual blocks on the heap for
+  backtracking instead of recursive function calls (which use the stack). This
+  is now obsolete since pcre2_match() was refactored always to use the heap (in
+  a much more efficient way than before). This option is retained for backwards
+  compatibility, but has no effect other than to output a warning.
+
 The "configure" script builds the following files for the basic C library:

 . Makefile             the makefile that builds the library
@ -662,25 +654,32 @@ Unicode support is enabled.
 Tests 9 and 10 are run only in 8-bit mode, and tests 11 and 12 are run only in
 16-bit and 32-bit modes. These are tests that generate different output in
 8-bit mode. Each pair is for general cases and Unicode support, respectively.
+
 Test 13 checks the handling of non-UTF characters greater than 255 by
 pcre2_dfa_match() in 16-bit and 32-bit modes.

-Test 14 contains a number of tests that must not be run with JIT. They check,
+Test 14 contains some special UTF and UCP tests that give different output for
+the different widths.
+
+Test 15 contains a number of tests that must not be run with JIT. They check,
 among other non-JIT things, the match-limiting features of the interpretive
 matcher.

-Test 15 is run only when JIT support is not available. It checks that an
+Test 16 is run only when JIT support is not available. It checks that an
 attempt to use JIT has the expected behaviour.

-Test 16 is run only when JIT support is available. It checks JIT complete and
+Test 17 is run only when JIT support is available. It checks JIT complete and
 partial modes, match-limiting under JIT, and other JIT-specific features.

-Tests 17 and 18 are run only in 8-bit mode. They check the POSIX interface to
+Tests 18 and 19 are run only in 8-bit mode. They check the POSIX interface to
 the 8-bit library, without and with Unicode support, respectively.

-Test 19 checks the serialization functions by writing a set of compiled
+Test 20 checks the serialization functions by writing a set of compiled
 patterns to a file, and then reloading and checking them.

+Tests 21 and 22 test \C support when the use of \C is not locked out, without
+and with UTF support, respectively. Test 23 tests \C when it is locked out.
+

 Character tables
 ----------------
@ -866,4 +865,4 @@ The distribution should contain the files listed below.
 Philip Hazel
 Email local part: ph10
 Email domain: cam.ac.uk
-Last updated: 01 November 2016
+Last updated: 17 March 2017
--- a/doc/html/pcre2.html
+++ b/doc/html/pcre2.html
@ -109,7 +109,7 @@ lose performance.
 One way of guarding against this possibility is to use the
 <b>pcre2_pattern_info()</b> function to check the compiled pattern's options for
 PCRE2_UTF. Alternatively, you can set the PCRE2_NEVER_UTF option when calling
-<b>pcre2_compile()</b>. This causes an compile time error if a pattern contains
+<b>pcre2_compile()</b>. This causes a compile time error if the pattern contains
 a UTF-setting sequence.
 </P>
 <P>
@ -137,7 +137,8 @@ large search tree against a string that will never match. Nested unlimited
 repeats in a pattern are a common example. PCRE2 provides some protection
 against this: see the <b>pcre2_set_match_limit()</b> function in the
 <a href="pcre2api.html"><b>pcre2api</b></a>
-page.
+page. There is a similar function called <b>pcre2_set_depth_limit()</b> that can 
+be used to restrict the amount of memory that is used.
 </P>
 <br><a name="SEC3" href="#TOC1">USER DOCUMENTATION</a><br>
 <P>
@ -166,7 +167,7 @@ listing), and the short pages for individual functions, are concatenated in
  pcre2perform       discussion of performance issues
  pcre2posix         the POSIX-compatible C API for the 8-bit library
  pcre2sample        discussion of the pcre2demo program
-  pcre2stack         discussion of stack usage
+  pcre2stack         discussion of stack and memory usage
  pcre2syntax        quick syntax reference
  pcre2test          description of the <b>pcre2test</b> command
  pcre2unicode       discussion of Unicode and UTF support
@ -189,9 +190,9 @@ use my two initials, followed by the two digits 10, at the domain cam.ac.uk.
 </P>
 <br><a name="SEC5" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 16 October 2015
+Last updated: 27 March 2017
 <br>
-Copyright &copy; 1997-2015 University of Cambridge.
+Copyright &copy; 1997-2017 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
--- a/doc/html/pcre2_callout_enumerate.html
+++ b/doc/html/pcre2_callout_enumerate.html
@ -36,20 +36,21 @@ for success and non-zero otherwise. The arguments are:
  <i>callout_data</i>   User data that is passed to the callback
 </pre>
 The <i>callback()</i> function is passed a pointer to a data block containing
-the following fields:
+the following fields (not necessarily in this order):
 <pre>
-  <i>version</i>                Block version number
-  <i>pattern_position</i>       Offset to next item in pattern
-  <i>next_item_length</i>       Length of next item in pattern
-  <i>callout_number</i>         Number for numbered callouts
-  <i>callout_string_offset</i>  Offset to string within pattern
-  <i>callout_string_length</i>  Length of callout string
-  <i>callout_string</i>         Points to callout string or is NULL
+  uint32_t   <i>version</i>                Block version number
+  uint32_t   <i>callout_number</i>         Number for numbered callouts
+  PCRE2_SIZE <i>pattern_position</i>       Offset to next item in pattern
+  PCRE2_SIZE <i>next_item_length</i>       Length of next item in pattern
+  PCRE2_SIZE <i>callout_string_offset</i>  Offset to string within pattern
+  PCRE2_SIZE <i>callout_string_length</i>  Length of callout string
+  PCRE2_SPTR <i>callout_string</i>         Points to callout string or is NULL
 </pre>
-The second argument is the callout data that was passed to
-<b>pcre2_callout_enumerate()</b>. The <b>callback()</b> function must return zero
-for success. Any other value causes the pattern scan to stop, with the value
-being passed back as the result of <b>pcre2_callout_enumerate()</b>.
+The second argument passed to the <b>callback()</b> function is the callout data
+that was passed to <b>pcre2_callout_enumerate()</b>. The <b>callback()</b>
+function must return zero for success. Any other value causes the pattern scan
+to stop, with the value being passed back as the result of
+<b>pcre2_callout_enumerate()</b>.
 </P>
 <P>
 There is a complete description of the PCRE2 native API in the
--- a/doc/html/pcre2_code_free.html
+++ b/doc/html/pcre2_code_free.html
@ -26,7 +26,9 @@ DESCRIPTION
 </b><br>
 <P>
 This function frees the memory used for a compiled pattern, including any
-memory used by the JIT compiler.
+memory used by the JIT compiler. If the compiled pattern was created by a call 
+to <b>pcre2_code_copy_with_tables()</b>, the memory for the character tables is 
+also freed.
 </P>
 <P>
 There is a complete description of the PCRE2 native API in the
--- a/doc/html/pcre2_compile.html
+++ b/doc/html/pcre2_compile.html
@ -37,19 +37,24 @@ arguments are:
  <i>erroffset</i>     Where to put an error offset
  <i>ccontext</i>      Pointer to a compile context or NULL
 </pre>
-The length of the string and any error offset that is returned are in code
-units, not characters. A compile context is needed only if you want to change
+The length of the pattern and any error offset that is returned are in code
+units, not characters. A compile context is needed only if you want to provide
+custom memory allocation functions, or to provide an external function for
+system stack size checking, or to change one or more of these parameters:
 <pre>
-  What \R matches (Unicode newlines or CR, LF, CRLF only)
-  PCRE2's character tables
-  The newline character sequence
-  The compile time nested parentheses limit
+  What \R matches (Unicode newlines, or CR, LF, CRLF only);
+  PCRE2's character tables;
+  The newline character sequence;
+  The compile time nested parentheses limit;
+  The maximum pattern length (in code units) that is allowed.
 </pre>
-or provide an external function for stack size checking. The option bits are:
+The option bits are:
 <pre>
  PCRE2_ANCHORED           Force pattern anchoring
+  PCRE2_ALLOW_EMPTY_CLASS  Allow empty classes
  PCRE2_ALT_BSUX           Alternative handling of \u, \U, and \x
  PCRE2_ALT_CIRCUMFLEX     Alternative handling of ^ in multiline mode
+  PCRE2_ALT_VERBNAMES      Process backslashes in verb names
  PCRE2_AUTO_CALLOUT       Compile automatic callouts
  PCRE2_CASELESS           Do caseless matching
  PCRE2_DOLLAR_ENDONLY     $ not to match newline at end
@ -71,19 +76,21 @@ or provide an external function for stack size checking. The option bits are:
                             (only relevant if PCRE2_UTF is set)
  PCRE2_UCP                Use Unicode properties for \d, \w, etc.
  PCRE2_UNGREEDY           Invert greediness of quantifiers
+  PCRE2_USE_OFFSET_LIMIT   Enable offset limit for unanchored matching
  PCRE2_UTF                Treat pattern and subjects as UTF strings
 </pre>
-PCRE2 must be built with Unicode support in order to use PCRE2_UTF, PCRE2_UCP
-and related options.
+PCRE2 must be built with Unicode support (the default) in order to use
+PCRE2_UTF, PCRE2_UCP and related options.
 </P>
 <P>
 The yield of the function is a pointer to a private data structure that
 contains the compiled pattern, or NULL if an error was detected.
 </P>
 <P>
-There is a complete description of the PCRE2 native API in the
+There is a complete description of the PCRE2 native API, with more detail on
+each option, in the
 <a href="pcre2api.html"><b>pcre2api</b></a>
-page and a description of the POSIX API in the
+page, and a description of the POSIX API in the
 <a href="pcre2posix.html"><b>pcre2posix</b></a>
 page.
 <p>
--- a/doc/html/pcre2_config.html
+++ b/doc/html/pcre2_config.html
@ -45,10 +45,9 @@ point to a uint32_t integer variable. The available codes are:
  PCRE2_CONFIG_BSR             Indicates what \R matches by default:
                                 PCRE2_BSR_UNICODE
                                 PCRE2_BSR_ANYCRLF
-  PCRE2_CONFIG_JIT             Availability of just-in-time compiler
-                                support (1=yes 0=no)
-  PCRE2_CONFIG_JITTARGET       Information about the target archi-
-                                 tecture for the JIT compiler
+  PCRE2_CONFIG_DEPTHLIMIT      Default backtracking depth limit
+  PCRE2_CONFIG_JIT             Availability of just-in-time compiler support (1=yes 0=no)
+  PCRE2_CONFIG_JITTARGET       Information (a string) about the target architecture for the JIT compiler
  PCRE2_CONFIG_LINKSIZE        Configured internal link size (2, 3, 4)
  PCRE2_CONFIG_MATCHLIMIT      Default internal resource limit
  PCRE2_CONFIG_NEWLINE         Code for the default newline sequence:
@ -58,11 +57,9 @@ point to a uint32_t integer variable. The available codes are:
                                 PCRE2_NEWLINE_ANY
                                 PCRE2_NEWLINE_ANYCRLF
  PCRE2_CONFIG_PARENSLIMIT     Default parentheses nesting limit
-  PCRE2_CONFIG_RECURSIONLIMIT  Internal recursion depth limit
-  PCRE2_CONFIG_STACKRECURSE    Recursion implementation (1=stack
-                                 0=heap)
-  PCRE2_CONFIG_UNICODE         Availability of Unicode support (1=yes
-                                 0=no)
+  PCRE2_CONFIG_RECURSIONLIMIT  Obsolete: use PCRE2_CONFIG_DEPTHLIMIT
+  PCRE2_CONFIG_STACKRECURSE    Obsolete: always returns 0
+  PCRE2_CONFIG_UNICODE         Availability of Unicode support (1=yes 0=no)
  PCRE2_CONFIG_UNICODE_VERSION The Unicode version (a string)
  PCRE2_CONFIG_VERSION         The PCRE2 version (a string)
 </pre>
--- a/doc/html/pcre2_dfa_match.html
+++ b/doc/html/pcre2_dfa_match.html
@ -31,8 +31,9 @@ DESCRIPTION
 <P>
 This function matches a compiled regular expression against a given subject
 string, using an alternative matching algorithm that scans the subject string
-just once (<i>not</i> Perl-compatible). (The Perl-compatible matching function
-is <b>pcre2_match()</b>.) The arguments for this function are:
+just once (except when processing lookaround assertions). This function is
+<i>not</i> Perl-compatible (the Perl-compatible matching function is
+<b>pcre2_match()</b>). The arguments for this function are:
 <pre>
  <i>code</i>         Points to the compiled pattern
  <i>subject</i>      Points to the subject string
@ -45,22 +46,18 @@ is <b>pcre2_match()</b>.) The arguments for this function are:
  <i>wscount</i>      Number of elements in the vector
 </pre>
 For <b>pcre2_dfa_match()</b>, a match context is needed only if you want to set
-up a callout function or specify the recursion limit. The <i>length</i> and
-<i>startoffset</i> values are code units, not characters. The options are:
+up a callout function or specify the recursion depth limit. The <i>length</i>
+and <i>startoffset</i> values are code units, not characters. The options are:
 <pre>
  PCRE2_ANCHORED          Match only at the first position
  PCRE2_NOTBOL            Subject is not the beginning of a line
  PCRE2_NOTEOL            Subject is not the end of a line
  PCRE2_NOTEMPTY          An empty string is not a valid match
-  PCRE2_NOTEMPTY_ATSTART  An empty string at the start of the subject
-                           is not a valid match
-  PCRE2_NO_UTF_CHECK      Do not check the subject for UTF
-                           validity (only relevant if PCRE2_UTF
+  PCRE2_NOTEMPTY_ATSTART  An empty string at the start of the subject is not a valid match
+  PCRE2_NO_UTF_CHECK      Do not check the subject for UTF validity (only relevant if PCRE2_UTF
                           was set at compile time)
-  PCRE2_PARTIAL_SOFT      Return PCRE2_ERROR_PARTIAL for a partial
-                            match if no full matches are found
-  PCRE2_PARTIAL_HARD      Return PCRE2_ERROR_PARTIAL for a partial match
-                           even if there is a full match as well
+  PCRE2_PARTIAL_HARD      Return PCRE2_ERROR_PARTIAL for a partial match even if there is a full match
+  PCRE2_PARTIAL_SOFT      Return PCRE2_ERROR_PARTIAL for a partial match if no full matches are found
  PCRE2_DFA_RESTART       Restart after a partial match
  PCRE2_DFA_SHORTEST      Return only the shortest match
 </pre>
--- a/doc/html/pcre2_get_error_message.html
+++ b/doc/html/pcre2_get_error_message.html
@ -34,11 +34,11 @@ errors are negative numbers. The arguments are:
  <i>buffer</i>      where to put the message
  <i>bufflen</i>     the length of the buffer (code units)
 </pre>
-The function returns the length of the message, excluding the trailing zero, or
-the negative error code PCRE2_ERROR_NOMEMORY if the buffer is too small. In
-this case, the returned message is truncated (but still with a trailing zero).
-If <i>errorcode</i> does not contain a recognized error code number, the
-negative value PCRE2_ERROR_BADDATA is returned.
+The function returns the length of the message in code units, excluding the
+trailing zero, or the negative error code PCRE2_ERROR_NOMEMORY if the buffer is
+too small. In this case, the returned message is truncated (but still with a
+trailing zero). If <i>errorcode</i> does not contain a recognized error code
+number, the negative value PCRE2_ERROR_BADDATA is returned.
 </P>
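As a reading aid for the return contract described above, here is a minimal sketch of a stand-in function with the same shape: it returns the message length (excluding the trailing zero) on success, or a negative code after truncating the message when the buffer is too small. The names toy_copy_message and TOY_ERROR_NOMEMORY, and the value -1, are hypothetical and invented for this illustration; this is not PCRE2's code, and it handles 8-bit code units only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative error value only; it is NOT the value PCRE2 uses. */
enum { TOY_ERROR_NOMEMORY = -1 };

/* Hypothetical stand-in that mimics the documented contract:
 * copy msg into buffer, whose size is bufflen code units (bytes here),
 * including room for the trailing zero. */
int toy_copy_message(const char *msg, char *buffer, size_t bufflen)
{
    size_t len = strlen(msg);

    assert(buffer != NULL && bufflen > 0);

    if (len < bufflen) {
        /* The whole message fits, trailing zero included. */
        memcpy(buffer, msg, len + 1);
        return (int)len;          /* length excludes the trailing zero */
    }

    /* Too small: truncate, but keep the buffer zero-terminated. */
    memcpy(buffer, msg, bufflen - 1);
    buffer[bufflen - 1] = '\0';
    return TOY_ERROR_NOMEMORY;
}
```

A caller following this contract would treat a negative return as "message truncated" and could retry with a larger buffer; a non-negative return is the number of code units written before the trailing zero.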
 <P>
 There is a complete description of the PCRE2 native API in the
--- a/doc/html/pcre2_jit_stack_create.html
+++ b/doc/html/pcre2_jit_stack_create.html
@ -32,10 +32,9 @@ maximum size to which it is allowed to grow. The final argument is a general
 context, for memory allocation functions, or NULL for standard memory
 allocation. The result can be passed to the JIT run-time code by calling
 <b>pcre2_jit_stack_assign()</b> to associate the stack with a compiled pattern,
-which can then be processed by <b>pcre2_match()</b>. If the "fast path" JIT
-matcher, <b>pcre2_jit_match()</b> is used, the stack can be passed directly as
-an argument. A maximum stack size of 512K to 1M should be more than enough for
-any pattern. For more details, see the
+which can then be processed by <b>pcre2_match()</b> or <b>pcre2_jit_match()</b>.
+A maximum stack size of 512K to 1M should be more than enough for any pattern.
+For more details, see the
 <a href="pcre2jit.html"><b>pcre2jit</b></a>
 page.
 </P>
--- a/doc/html/pcre2_maketables.html
+++ b/doc/html/pcre2_maketables.html
@ -25,10 +25,10 @@ SYNOPSIS
 DESCRIPTION
 </b><br>
 <P>
-This function builds a set of character tables for character values less than
-256. These can be passed to <b>pcre2_compile()</b> in a compile context in order
-to override the internal, built-in tables (which were either defaulted or made
-by <b>pcre2_maketables()</b> when PCRE2 was compiled). See the
+This function builds a set of character tables for character code points that 
+are less than 256. These can be passed to <b>pcre2_compile()</b> in a compile
+context in order to override the internal, built-in tables (which were either
+defaulted or made by <b>pcre2_maketables()</b> when PCRE2 was compiled). See the
 <a href="pcre2_set_character_tables.html"><b>pcre2_set_character_tables()</b></a>
 page. You might want to do this if you are using a non-standard locale.
 </P>
--- a/doc/html/pcre2api.html
+++ b/doc/html/pcre2api.html
@ -2575,8 +2575,8 @@ The internal recursion limit was reached.
 A text message for an error code from any PCRE2 function (compile, match, or
 auxiliary) can be obtained by calling <b>pcre2_get_error_message()</b>. The code
 is passed as the first argument, with the remaining two arguments specifying a
-code unit buffer and its length, into which the text message is placed. Note
-that the message is returned in code units of the appropriate width for the
+code unit buffer and its length in code units, into which the text message is
+placed. The message is returned in code units of the appropriate width for the
 library that is being used.
 </P>
 <P>
@ -3265,9 +3265,9 @@ Cambridge, England.
 </P>
 <br><a name="SEC41" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 23 December 2016
+Last updated: 21 March 2017
 <br>
-Copyright &copy; 1997-2016 University of Cambridge.
+Copyright &copy; 1997-2017 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
--- a/doc/html/pcre2grep.html
+++ b/doc/html/pcre2grep.html
@ -280,6 +280,10 @@ operating systems the effect of reading a directory like this is an immediate
 end-of-file; in others it may provoke an error.
 </P>
 <P>
+<b>--depth-limit</b>=<i>number</i>
+See <b>--match-limit</b> below.
+</P>
+<P>
 <b>-e</b> <i>pattern</i>, <b>--regex=</b><i>pattern</i>, <b>--regexp=</b><i>pattern</i>
 Specify a pattern to be matched. This option can be used multiple times in
 order to specify several patterns. It can also be used as a way of specifying a
@ -498,29 +502,22 @@ used. There is no short form for this option.
 </P>
 <P>
 <b>--match-limit</b>=<i>number</i>
-Processing some regular expression patterns can require a very large amount of
-memory, leading in some cases to a program crash if not enough is available.
-Other patterns may take a very long time to search for all possible matching
-strings. The <b>pcre2_match()</b> function that is called by <b>pcre2grep</b> to
-do the matching has two parameters that can limit the resources that it uses.
+Processing some regular expression patterns may take a very long time to search
+for all possible matching strings. Others may require a very large amount of
+memory. There are two options that set resource limits for matching.
 <br>
 <br>
-The <b>--match-limit</b> option provides a means of limiting resource usage
-when processing patterns that are not going to match, but which have a very
-large number of possibilities in their search trees. The classic example is a
-pattern that uses nested unlimited repeats. Internally, PCRE2 uses a function
-called <b>match()</b> which it calls repeatedly (sometimes recursively). The
-limit set by <b>--match-limit</b> is imposed on the number of times this
-function is called during a match, which has the effect of limiting the amount
-of backtracking that can take place.
+The <b>--match-limit</b> option provides a means of limiting computing resource
+usage when processing patterns that are not going to match, but which have a
+very large number of possibilities in their search trees. The classic example
+is a pattern that uses nested unlimited repeats. Internally, PCRE2 has a 
+counter that is incremented each time around its main processing loop. If the 
+value set by <b>--match-limit</b> is reached, an error occurs.
 <br>
 <br>
-The <b>--recursion-limit</b> option is similar to <b>--match-limit</b>, but
-instead of limiting the total number of times that <b>match()</b> is called, it
-limits the depth of recursive calls, which in turn limits the amount of memory
-that can be used. The recursion depth is a smaller number than the total number
-of calls, because not all calls to <b>match()</b> are recursive. This limit is
-of use only if it is set smaller than <b>--match-limit</b>.
+The <b>--depth-limit</b> option limits the depth of nested backtracking points,
+which in turn limits the amount of memory that is used. This limit is of use
+only if it is set smaller than <b>--match-limit</b>.
 <br>
 <br>
 There are no short forms for these options. The default settings are specified
@ -843,9 +840,9 @@ there are more than 20 such errors, <b>pcre2grep</b> gives up.
 </P>
 <P>
 The <b>--match-limit</b> option of <b>pcre2grep</b> can be used to set the
-overall resource limit; there is a second option called <b>--recursion-limit</b>
-that sets a limit on the amount of memory (usually stack) that is used (see the
-discussion of these options above).
+overall resource limit; there is a second option called <b>--depth-limit</b>
+that sets a limit on the amount of memory that is used (see the discussion of
+these options above).
 </P>
 <br><a name="SEC12" href="#TOC1">DIAGNOSTICS</a><br>
 <P>
@ -870,9 +867,9 @@ Cambridge, England.
 </P>
 <br><a name="SEC15" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 31 December 2016
+Last updated: 21 March 2017
 <br>
-Copyright &copy; 1997-2016 University of Cambridge.
+Copyright &copy; 1997-2017 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
--- a/doc/html/pcre2pattern.html
+++ b/doc/html/pcre2pattern.html
@ -170,20 +170,24 @@ the application to apply the JIT optimization by calling
 <b>pcre2_jit_compile()</b> is ignored.
 </P>
 <br><b>
-Setting match and recursion limits
+Setting match and backtracking depth limits
 </b><br>
 <P>
-The caller of <b>pcre2_match()</b> can set a limit on the number of times the
-internal <b>match()</b> function is called and on the maximum depth of
-recursive calls. These facilities are provided to catch runaway matches that
-are provoked by patterns with huge matching trees (a typical example is a
-pattern with nested unlimited repeats) and to avoid running out of system stack
-by too much recursion. When one of these limits is reached, <b>pcre2_match()</b>
-gives an error return. The limits can also be set by items at the start of the
-pattern of the form
+The <b>pcre2_match()</b> function contains a counter that is incremented every
+time it goes round its main loop. The caller of <b>pcre2_match()</b> can set a
+limit on this counter, which therefore limits the amount of computing resource
+used for a match. The maximum depth of nested backtracking can also be limited,
+and this restricts the amount of heap memory that is used.
+</P>
+<P>
+These facilities are provided to catch runaway matches that are provoked by
+patterns with huge matching trees (a typical example is a pattern with nested
+unlimited repeats applied to a long string that does not match). When one of
+these limits is reached, <b>pcre2_match()</b> gives an error return. The limits
+can also be set by items at the start of the pattern of the form
 <pre>
  (*LIMIT_MATCH=d)
-  (*LIMIT_RECURSION=d)
+  (*LIMIT_DEPTH=d)
 </pre>
 where d is any number of decimal digits. However, the value of the setting must
 be less than the value set (or defaulted) by the caller of <b>pcre2_match()</b>
@ -192,10 +196,15 @@ limits set by the programmer, but not raise them. If there is more than one
 setting of one of these limits, the lower value is used.
 </P>
 <P>
+Prior to release 10.30, LIMIT_DEPTH was called LIMIT_RECURSION. This name is 
+still recognized for backwards compatibility.
+</P>
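The rule that a pattern may lower a limit but never raise it can be sketched in Python. This is only an illustration under stated assumptions, not PCRE2's own code: the function name effective_limits and the numeric limits are invented for the example, and the sketch looks only at limit items at the very start of the pattern (a real pattern may begin with other start-of-pattern items as well).

```python
import re

# Matches one leading limit item. LIMIT_RECURSION is the pre-10.30 name
# for LIMIT_DEPTH and is treated here as a synonym.
_LIMIT_ITEM = re.compile(r"\(\*LIMIT_(MATCH|DEPTH|RECURSION)=(\d+)\)")


def effective_limits(pattern, caller_match_limit, caller_depth_limit):
    """Return (match_limit, depth_limit, rest_of_pattern).

    Hypothetical helper: it applies every leading limit item, keeping the
    lowest value seen, and never exceeds the limits given by the caller.
    """
    limits = {"MATCH": caller_match_limit, "DEPTH": caller_depth_limit}
    pos = 0
    while True:
        item = _LIMIT_ITEM.match(pattern, pos)
        if item is None:
            break
        name = "DEPTH" if item.group(1) == "RECURSION" else item.group(1)
        # min() gives both rules at once: a setting above the caller's
        # value has no effect, and of several settings the lowest wins.
        limits[name] = min(limits[name], int(item.group(2)))
        pos = item.end()
    return limits["MATCH"], limits["DEPTH"], pattern[pos:]
```

With a caller match limit of 1000 and depth limit of 30, the pattern "(*LIMIT_MATCH=500)(*LIMIT_RECURSION=20)(*LIMIT_DEPTH=50)a+b" ends up with limits 500 and 20, whereas "(*LIMIT_MATCH=5000)x" leaves both caller limits unchanged.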
+<P>
 The match limit is used (but in a different way) when JIT is being used, but it
 is not relevant, and is ignored, when matching with <b>pcre2_dfa_match()</b>.
-However, the recursion limit is relevant for DFA matching, which does use some
-function recursion, in particular, for recursions within the pattern.
+However, the depth limit is relevant for DFA matching, which uses function
+recursion for recursions within the pattern. In this case, the depth limit 
+controls the amount of system stack that is used.
 <a name="newlines"></a></P>
 <br><b>
 Newline conventions
@ -235,8 +244,8 @@ The newline convention affects where the circumflex and dollar assertions are
 true. It also affects the interpretation of the dot metacharacter when
 PCRE2_DOTALL is not set, and the behaviour of \N. However, it does not affect
 what the \R escape sequence matches. By default, this is any Unicode newline
-sequence, for Perl compatibility. However, this can be changed; see the
-description of \R in the section entitled
+sequence, for Perl compatibility. However, this can be changed; see the next 
+section and the description of \R in the section entitled
 <a href="#newlineseq">"Newline sequences"</a>
 below. A change of \R setting can be combined with a change of newline
 convention.
@ -254,7 +263,7 @@ corresponding to PCRE2_BSR_UNICODE.
 <br><a name="SEC3" href="#TOC1">EBCDIC CHARACTER CODES</a><br>
 <P>
 PCRE2 can be compiled to run in an environment that uses EBCDIC as its
-character code rather than ASCII or Unicode (typically a mainframe system). In
+character code instead of ASCII or Unicode (typically a mainframe system). In
 the sections below, character code values are ASCII or Unicode; in an EBCDIC
 environment these characters may have different code values, and there are no
 code points greater than 255.
@ -318,11 +327,11 @@ that character may have. This use of backslash as an escape character applies
 both inside and outside character classes.
 </P>
 <P>
-For example, if you want to match a * character, you write \* in the pattern.
-This escaping action applies whether or not the following character would
-otherwise be interpreted as a metacharacter, so it is always safe to precede a
-non-alphanumeric with backslash to specify that it stands for itself. In
-particular, if you want to match a backslash, you write \\.
+For example, if you want to match a * character, you must write \* in the
+pattern. This escaping action applies whether or not the following character
+would otherwise be interpreted as a metacharacter, so it is always safe to
+precede a non-alphanumeric with backslash to specify that it stands for itself.
+In particular, if you want to match a backslash, you write \\.
 </P>
 <P>
 In a UTF mode, only ASCII numbers and letters have any special meaning after a
@ -353,7 +362,7 @@ An isolated \E that is not preceded by \Q is ignored. If \Q is not followed
 by \E later in the pattern, the literal interpretation continues to the end of
 the pattern (that is, \E is assumed at the end). If the isolated \Q is inside
 a character class, this causes an error, because the character class is not
-terminated.
+terminated by a closing square bracket.
 <a name="digitsafterbackslash"></a></P>
 <br><b>
 Non-printing characters
@ -476,9 +485,9 @@ a hexadecimal digit appears between \x{ and }, or if there is no terminating
 <P>
 If the PCRE2_ALT_BSUX option is set, the interpretation of \x is as just
 described only when it is followed by two hexadecimal digits. Otherwise, it
-matches a literal "x" character. In this mode mode, support for code points
-greater than 256 is provided by \u, which must be followed by four hexadecimal
-digits; otherwise it matches a literal "u" character.
+matches a literal "x" character. In this mode, support for code points greater
+than 256 is provided by \u, which must be followed by four hexadecimal digits;
+otherwise it matches a literal "u" character.
 </P>
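The PCRE2_ALT_BSUX rule for \x and \u described above can be modelled with a few lines of Python. This is a sketch of the rule as stated in the text, not of PCRE2's parser; the function name alt_bsux_escape and its calling convention are assumptions made for the example.

```python
import re

_HEX2 = re.compile(r"[0-9A-Fa-f]{2}")
_HEX4 = re.compile(r"[0-9A-Fa-f]{4}")


def alt_bsux_escape(text, pos):
    """Interpret the escape whose letter ('x' or 'u') is at text[pos].

    The backslash itself is assumed to have been consumed already.
    Returns (character, index of the next unread character).
    """
    letter = text[pos]
    if letter == "x":
        digits = _HEX2.match(text, pos + 1)   # \x needs exactly two hex digits
    elif letter == "u":
        digits = _HEX4.match(text, pos + 1)   # \u needs exactly four hex digits
    else:
        raise ValueError("only \\x and \\u are handled in this sketch")
    if digits is None:
        # Not enough hex digits: the escape is just the literal letter.
        return letter, pos + 1
    return chr(int(digits.group(), 16)), digits.end()
```

For example, the text "x41z" yields the character "A" and continues at "z", while "xg1" yields a literal "x" and continues at "g".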
 <P>
 Characters whose value is less than 256 can be defined by either of the two
@ -493,12 +502,10 @@ Constraints on character values
 Characters that are specified using octal or hexadecimal numbers are
 limited to certain values, as follows:
 <pre>
-  8-bit non-UTF mode    less than 0x100
-  8-bit UTF-8 mode      less than 0x10ffff and a valid codepoint
-  16-bit non-UTF mode   less than 0x10000
-  16-bit UTF-16 mode    less than 0x10ffff and a valid codepoint
-  32-bit non-UTF mode   less than 0x100000000
-  32-bit UTF-32 mode    less than 0x10ffff and a valid codepoint
+  8-bit non-UTF mode    no greater than 0xff
+  16-bit non-UTF mode   no greater than 0xffff
+  32-bit non-UTF mode   no greater than 0xffffffff
+  All UTF modes         no greater than 0x10ffff and a valid codepoint
 </pre>
 Invalid Unicode codepoints are the range 0xd800 to 0xdfff (the so-called
 "surrogate" codepoints), and 0xffef.
@ -525,7 +532,7 @@ In Perl, the sequences \l, \L, \u, and \U are recognized by its string
 handler and used to modify the case of following characters. By default, PCRE2
 does not support these escape sequences. However, if the PCRE2_ALT_BSUX option
 is set, \U matches a "U" character, and \u can be used to define a character
-by code point, as described in the previous section.
+by code point, as described above.
 </P>
 <br><b>
 Absolute and relative back references
@ -714,7 +721,9 @@ When PCRE2 is built with Unicode support (the default), three additional escape
 sequences that match characters with specific properties are available. In
 8-bit non-UTF-8 mode, these sequences are of course limited to testing
 characters whose codepoints are less than 256, but they do work in this mode.
-The extra escape sequences are:
+In 32-bit non-UTF mode, codepoints greater than 0x10ffff (the Unicode limit)
+may be encountered. These are all treated as being in the Common script and
+with an unassigned type. The extra escape sequences are:
 <pre>
  \p{<i>xx</i>}   a character with the <i>xx</i> property
  \P{<i>xx</i>}   a character without the <i>xx</i> property
@ -2214,16 +2223,8 @@ except that it does not cause the current matching position to be changed.
 Assertion subpatterns are not capturing subpatterns. If such an assertion
 contains capturing subpatterns within it, these are counted for the purposes of
 numbering the capturing subpatterns in the whole pattern. However, substring
-capturing is carried out only for positive assertions. (Perl sometimes, but not
-always, does do capturing in negative assertions.)
-</P>
-<P>
-WARNING: If a positive assertion containing one or more capturing subpatterns
-succeeds, but failure to match later in the pattern causes backtracking over
-this assertion, the captures within the assertion are reset only if no higher
-numbered captures are already set. This is, unfortunately, a fundamental
-limitation of the current implementation; it may get removed in a future
-reworking.
+capturing is normally carried out only for positive assertions (but see the 
+discussion of conditional subpatterns below).
 </P>
 <P>
 For compatibility with Perl, most assertion subpatterns may be repeated; though
@ -2601,6 +2602,12 @@ presence of at least one letter in the subject. If a letter is found, the
 subject is matched against the first alternative; otherwise it is matched
 against the second. This pattern matches strings in one of the two forms
 dd-aaa-dd or dd-dd-dd, where aaa are letters and dd are digits.
+</P>
+<P>
+For Perl compatibility, if an assertion that is a condition contains capturing 
+subpatterns, any capturing that occurs is retained afterwards, for both 
+positive and negative assertions. (Compare non-conditional assertions, when 
+captures are retained only for positive assertions.)
 <a name="comments"></a></P>
 <br><a name="SEC22" href="#TOC1">COMMENTS</a><br>
 <P>
@ -2773,93 +2780,57 @@ is the actual recursive call.
 Differences in recursion processing between PCRE2 and Perl
 </b><br>
 <P>
-Recursion processing in PCRE2 differs from Perl in two important ways. In PCRE2
-(like Python, but unlike Perl), a recursive subpattern call is always treated
-as an atomic group. That is, once it has matched some of the subject string, it
-is never re-entered, even if it contains untried alternatives and there is a
-subsequent matching failure. This can be illustrated by the following pattern,
-which purports to match a palindromic string that contains an odd number of
-characters (for example, "a", "aba", "abcba", "abcdcba"):
-<pre>
-  ^(.|(.)(?1)\2)$
-</pre>
-The idea is that it either matches a single character, or two identical
-characters surrounding a sub-palindrome. In Perl, this pattern works; in PCRE2
-it does not if the pattern is longer than three characters. Consider the
-subject string "abcba":
+Some former differences between PCRE2 and Perl no longer exist.
 </P>
 <P>
-At the top level, the first character is matched, but as it is not at the end
-of the string, the first alternative fails; the second alternative is taken
-and the recursion kicks in. The recursive call to subpattern 1 successfully
-matches the next character ("b"). (Note that the beginning and end of line
-tests are not part of the recursion).
+Before release 10.30, recursion processing in PCRE2 differed from Perl in that
+a recursive subpattern call was always treated as an atomic group. That is,
+once it had matched some of the subject string, it was never re-entered, even
+if it contained untried alternatives and there was a subsequent matching
+failure. (Historical note: PCRE implemented recursion before Perl did.)
 </P>
 <P>
-Back at the top level, the next character ("c") is compared with what
-subpattern 2 matched, which was "a". This fails. Because the recursion is
-treated as an atomic group, there are now no backtracking points, and so the
-entire match fails. (Perl is able, at this point, to re-enter the recursion and
-try the second alternative.) However, if the pattern is written with the
-alternatives in the other order, things are different:
-<pre>
-  ^((.)(?1)\2|.)$
-</pre>
-This time, the recursing alternative is tried first, and continues to recurse
-until it runs out of characters, at which point the recursion fails. But this
-time we do have another alternative to try at the higher level. That is the big
-difference: in the previous case the remaining alternative is at a deeper
-recursion level, which PCRE2 cannot use.
+Starting with release 10.30, recursive subroutine calls are no longer treated 
+as atomic. That is, they can be re-entered to try unused alternatives if there 
+is a matching failure later in the pattern. This is now compatible with the way 
+Perl works. If you want a subroutine call to be atomic, you must explicitly
+enclose it in an atomic group.
 </P>
 <P>
-To change the pattern so that it matches all palindromic strings, not just
-those with an odd number of characters, it is tempting to change the pattern to
-this:
+Supporting backtracking into recursions simplifies certain types of recursive 
+pattern. For example, this pattern matches palindromic strings:
 <pre>
  ^((.)(?1)\2|.?)$
 </pre>
-Again, this works in Perl, but not in PCRE2, and for the same reason. When a
-deeper recursion has matched a single character, it cannot be entered again in
-order to match an empty string. The solution is to separate the two cases, and
-write out the odd and even cases as alternatives at the higher level:
+The second branch in the group matches a single central character in the
+palindrome when there is an odd number of characters, or nothing when there
+is an even number of characters, but in order to work it has to be able to try
+the second case when the rest of the pattern match fails. If you want to match
+typical palindromic phrases, the pattern has to ignore all non-word characters,
+which can be done like this:
 <pre>
-  ^(?:((.)(?1)\2|)|((.)(?3)\4|.))
-</pre>
-If you want to match typical palindromic phrases, the pattern has to ignore all
-non-word characters, which can be done like this:
-<pre>
-  ^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+\4|\W*+.\W*+))\W*+$
+  ^\W*+((.)\W*+(?1)\W*+\2|\W*+.?)\W*+$
 </pre>
 If run with the PCRE2_CASELESS option, this pattern matches phrases such as "A
-man, a plan, a canal: Panama!" and it works in both PCRE2 and Perl. Note the
-use of the possessive quantifier *+ to avoid backtracking into sequences of
-non-word characters. Without this, PCRE2 takes a great deal longer (ten times
-or more) to match typical phrases, and Perl takes so long that you think it has
-gone into a loop.
+man, a plan, a canal: Panama!". Note the use of the possessive quantifier *+ to
+avoid backtracking into sequences of non-word characters. Without this, PCRE2
+takes a great deal longer (ten times or more) to match typical phrases, and
+Perl takes so long that you think it has gone into a loop.
 </P>
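To see why the first palindrome pattern above, ^((.)(?1)\2|.?)$, depends on backtracking into the recursion, here is a rough Python model of it. It is an illustration only, not how PCRE2 is implemented: the names group1 and matches are invented, the subject is assumed to contain no newlines, and the atomic flag is a simplified stand-in for the pre-10.30 behaviour in which a recursive call could not be re-entered.

```python
from itertools import islice


def group1(subject, pos, atomic=False):
    """Yield each end position where group 1 can match, starting at pos.

    Positions come out in the order a backtracking matcher would try
    them. Asking the generator for another value corresponds to
    backtracking into the group.
    """
    if pos < len(subject):
        first = subject[pos]                      # (.)   captured as group 2
        inner = group1(subject, pos + 1, atomic)  # (?1)  recursive call
        if atomic:
            inner = islice(inner, 1)              # atomic: only the first result
        for mid in inner:
            if mid < len(subject) and subject[mid] == first:  # \2
                yield mid + 1
    # Second alternative, .? (greedy): one character first, then nothing.
    if pos < len(subject):
        yield pos + 1
    yield pos


def matches(subject, atomic=False):
    """True if the anchored pattern matches the whole subject."""
    return any(end == len(subject) for end in group1(subject, 0, atomic))
```

In this model matches("abba") is true, but matches("abba", atomic=True) is false: the innermost call first consumes a character, and with the atomic restriction it cannot be re-entered to match the empty string instead.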
 <P>
-<b>WARNING</b>: The palindrome-matching patterns above work only if the subject
-string does not start with a palindrome that is shorter than the entire string.
-For example, although "abcba" is correctly matched, if the subject is "ababa",
-PCRE2 finds the palindrome "aba" at the start, then fails at top level because
-the end of the string does not follow. Once again, it cannot jump back into the
-recursion to try other alternatives, so the entire match fails.
-</P>
-<P>
-The second way in which PCRE2 and Perl differ in their recursion processing is
-in the handling of captured values. In Perl, when a subpattern is called
-recursively or as a subpattern (see the next section), it has no access to any
-values that were captured outside the recursion, whereas in PCRE2 these values
-can be referenced. Consider this pattern:
+Another way in which PCRE2 and Perl used to differ in their recursion
+processing was in the handling of captured values. Formerly in Perl, when a
+subpattern was called recursively or as a subroutine (see the next section), it
+had no access to any values that were captured outside the recursion, whereas
+in PCRE2 these values can be referenced. Consider this pattern:
 <pre>
  ^(.)(\1|a(?2))
 </pre>
-In PCRE2, this pattern matches "bab". The first capturing parentheses match "b",
-then in the second group, when the back reference \1 fails to match "b", the
-second alternative matches "a" and then recurses. In the recursion, \1 does
-now match "b" and so the whole match succeeds. In Perl, the pattern fails to
-match because inside the recursive call \1 cannot access the externally set
-value.
+This pattern matches "bab". The first capturing parentheses match "b", then in
+the second group, when the back reference \1 fails to match "b", the second
+alternative matches "a" and then recurses. In the recursion, \1 does now match
+"b" and so the whole match succeeds. This match used to fail in Perl, but it
+works in later versions (I tried 5.024).
 <a name="subpatternsassubroutines"></a></P>
 <br><a name="SEC24" href="#TOC1">SUBPATTERNS AS SUBROUTINES</a><br>
 <P>
@ -2886,11 +2857,10 @@ is used, it does match "sense and responsibility" as well as the other two
 strings. Another example is given in the discussion of DEFINE above.
 </P>
 <P>
-All subroutine calls, whether recursive or not, are always treated as atomic
-groups. That is, once a subroutine has matched some of the subject string, it
-is never re-entered, even if it contains untried alternatives and there is a
-subsequent matching failure. Any capturing parentheses that are set during the
-subroutine call revert to their previous values afterwards.
+Like recursions, subroutine calls used to be treated as atomic, but this
+changed at PCRE2 release 10.30, so backtracking into subroutine calls can now
+occur. However, any capturing parentheses that are set during the subroutine
+call revert to their previous values afterwards.
 </P>
 <P>
 Processing options such as case-independence are fixed when a subpattern is
@ -2998,17 +2968,10 @@ The doubling is removed before the string is passed to the callout function.
 <a name="backtrackcontrol"></a></P>
 <br><a name="SEC27" href="#TOC1">BACKTRACKING CONTROL</a><br>
 <P>
-Perl 5.10 introduced a number of "Special Backtracking Control Verbs", which
-are still described in the Perl documentation as "experimental and subject to
-change or removal in a future version of Perl". It goes on to say: "Their usage
-in production code should be noted to avoid problems during upgrades." The same
-remarks apply to the PCRE2 features described in this section.
-</P>
-<P>
-The new verbs make use of what was previously invalid syntax: an opening
-parenthesis followed by an asterisk. They are generally of the form (*VERB) or
-(*VERB:NAME). Some verbs take either form, possibly behaving differently
-depending on whether or not a name is present.
+There are a number of special "Backtracking Control Verbs" (to use Perl's
+terminology) that modify the behaviour of backtracking during matching. They
+are generally of the form (*VERB) or (*VERB:NAME). Some verbs take either form,
+possibly behaving differently depending on whether or not a name is present.
 </P>
 <P>
 By default, for compatibility with Perl, a name is any sequence of characters
@ -3040,7 +3003,7 @@ not there. Any number of these verbs may occur in a pattern.
 <P>
 Since these verbs are specifically related to backtracking, most of them can be
 used only when the pattern is to be matched using the traditional matching
-function, because these use a backtracking algorithm. With the exception of
+function, because that uses a backtracking algorithm. With the exception of
 (*FAIL), which behaves like a failing negative assertion, the backtracking
 control verbs cause an error if encountered by the DFA matching function.
 </P>
@ -3178,11 +3141,11 @@ Verbs that act after backtracking
 The following verbs do nothing when they are encountered. Matching continues
 with what follows, but if there is no subsequent match, causing a backtrack to
 the verb, a failure is forced. That is, backtracking cannot pass to the left of
-the verb. However, when one of these verbs appears inside an atomic group
-(which includes any group that is called as a subroutine) or in an assertion
-that is true, its effect is confined to that group, because once the group has
-been matched, there is never any backtracking into it. In this situation,
-backtracking has to jump to the left of the entire atomic group or assertion.
+the verb. However, when one of these verbs appears inside an atomic group or in
+an assertion that is true, its effect is confined to that group, because once
+the group has been matched, there is never any backtracking into it. In this
+situation, backtracking has to jump to the left of the entire atomic group or
+assertion.
 </P>
 <P>
 These verbs differ in exactly what kind of failure occurs when backtracking
@ -3246,8 +3209,8 @@ expressed in any other way. In an anchored pattern (*PRUNE) has the same effect
 as (*COMMIT).
 </P>
 <P>
-The behaviour of (*PRUNE:NAME) is the not the same as (*MARK:NAME)(*PRUNE).
-It is like (*MARK:NAME) in that the name is remembered for passing back to the
+The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE). It is
+like (*MARK:NAME) in that the name is remembered for passing back to the
 caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
 ignoring those set by (*PRUNE) or (*THEN).
 <pre>
@ -3452,9 +3415,9 @@ Cambridge, England.
 </P>
 <br><a name="SEC30" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 27 December 2016
+Last updated: 18 March 2017
 <br>
-Copyright &copy; 1997-2016 University of Cambridge.
+Copyright &copy; 1997-2017 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
--- a/doc/html/pcre2serialize.html
+++ b/doc/html/pcre2serialize.html
@ -55,7 +55,10 @@ The facility for saving and restoring compiled patterns is intended for use
 within individual applications. As such, the data supplied to
 <b>pcre2_serialize_decode()</b> is expected to be trusted data, not data from
 arbitrary external sources. There is only some simple consistency checking, not
-complete validation of what is being re-loaded.
+complete validation of what is being re-loaded. Corrupted data may cause
+undefined results. For example, if the length field of a pattern in the
+serialized data is corrupted, the deserializing code may read beyond the end of
+the byte stream that is passed to it.
 </P>
 <br><a name="SEC3" href="#TOC1">SAVING COMPILED PATTERNS</a><br>
 <P>
@ -190,9 +193,9 @@ Cambridge, England.
 </P>
 <br><a name="SEC6" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 24 May 2016
+Last updated: 21 March 2017
 <br>
-Copyright &copy; 1997-2016 University of Cambridge.
+Copyright &copy; 1997-2017 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
--- a/doc/html/pcre2test.html
+++ b/doc/html/pcre2test.html
@ -126,12 +126,13 @@ character values up to 0x7fffffff. Each character is placed in one 16-bit or
 to occur).
 </P>
 <P>
-UTF-8 is not capable of encoding values greater than 0x7fffffff, but such
-values can be handled by the 32-bit library. When testing this library in
-non-UTF mode with <b>utf8_input</b> set, if any character is preceded by the
-byte 0xff (which is an illegal byte in UTF-8) 0x80000000 is added to the
-character's value. This is the only way of passing such code points in a
-pattern string. For subject strings, using an escape sequence is preferable.
+UTF-8 (in its original definition) is not capable of encoding values greater
+than 0x7fffffff, but such values can be handled by the 32-bit library. When
+testing this library in non-UTF mode with <b>utf8_input</b> set, if any
+character is preceded by the byte 0xff (which is an illegal byte in UTF-8)
+0x80000000 is added to the character's value. This is the only way of passing
+such code points in a pattern string. For subject strings, using an escape
+sequence is preferable.
 </P>
 <br><a name="SEC4" href="#TOC1">COMMAND LINE OPTIONS</a><br>
 <P>
@ -602,6 +603,7 @@ about the pattern:
  /B  bincode                   show binary code without lengths
      callout_info              show callout information
      debug                     same as info,fullbincode
+      framesize                 show matching frame size 
      fullbincode               show binary code with lengths
  /I  info                      show info about compiled pattern
      hex                       unquoted characters are hexadecimal
@ -689,6 +691,11 @@ not necessarily the last character. These lines are omitted if no starting or
 ending code units are recorded.
 </P>
 <P>
+The <b>framesize</b> modifier shows the size, in bytes, of the storage frames 
+used by <b>pcre2_match()</b> for handling backtracking. The size depends on the
+number of capturing parentheses in the pattern.
+</P>
+<P>
 The <b>callout_info</b> modifier requests information about all the callouts in
 the pattern. A list of them is output at the end of any other information that
 is requested. For each callout, either its number or string is given, followed
@ -1073,6 +1080,7 @@ pattern.
      callout_fail=&#60;n&#62;[:&#60;m&#62;]     control callout failure
      callout_none               do not supply a callout function
      copy=&#60;number or name&#62;      copy captured substring
+      depth_limit=&#60;n&#62;            set a depth limit
      dfa                        use <b>pcre2_dfa_match()</b>
      find_limits                find match and recursion limits
      get=&#60;number or name&#62;       extract captured substring
@ -1086,7 +1094,7 @@ pattern.
      offset=&#60;n&#62;                 set starting offset
      offset_limit=&#60;n&#62;           set offset limit
      ovector=&#60;n&#62;                set size of output vector
-      recursion_limit=&#60;n&#62;        set a recursion limit
+      recursion_limit=&#60;n&#62;        obsolete synonym for depth_limit
      replace=&#60;string&#62;           specify a replacement string
      startchar                  show startchar when relevant
      startoffset=&#60;n&#62;            same as offset=&#60;n&#62;
@ -1320,10 +1328,10 @@ stack that is larger than the default 32K is necessary only for very
 complicated patterns.
 </P>
 <br><b>
-Setting match and recursion limits
+Setting match and depth limits
 </b><br>
 <P>
-The <b>match_limit</b> and <b>recursion_limit</b> modifiers set the appropriate
+The <b>match_limit</b> and <b>depth_limit</b> modifiers set the appropriate
 limits in the match context. These values are ignored when the
 <b>find_limits</b> modifier is specified.
 </P>
@ -1333,23 +1341,23 @@ Finding minimum limits
 <P>
 If the <b>find_limits</b> modifier is present, <b>pcre2test</b> calls
 <b>pcre2_match()</b> several times, setting different values in the match
-context via <b>pcre2_set_match_limit()</b> and <b>pcre2_set_recursion_limit()</b>
+context via <b>pcre2_set_match_limit()</b> and <b>pcre2_set_depth_limit()</b>
 until it finds the minimum values for each parameter that allow
 <b>pcre2_match()</b> to complete without error.
 </P>
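One way such a search for a minimum limit could be organised is sketched below in Python. The strategy shown (doubling followed by bisection) and the names find_minimum_limit and succeeds are assumptions made for the illustration; the text above only says that the match is repeated with different limit values. The callable succeeds stands for "run the match with this limit and report whether it completed without a limit error", and it is assumed to be monotonic: if a limit is large enough, every larger limit is too.

```python
def find_minimum_limit(succeeds, upper=2**32 - 1):
    """Return the smallest limit in 1..upper for which succeeds(limit)
    is true, or None if even the largest limit fails."""
    lo, hi = 0, 1            # lo: largest value known (or assumed) to fail
    while not succeeds(hi):  # phase 1: double until the match completes
        if hi >= upper:
            return None
        lo, hi = hi, min(hi * 2, upper)
    while hi - lo > 1:       # phase 2: bisect between failure and success
        mid = (lo + hi) // 2
        if succeeds(mid):
            hi = mid
        else:
            lo = mid
    return hi
```

A stand-in such as succeeds = lambda limit: limit >= 37 makes the function return 37 after a handful of trial runs, far fewer than trying every value in turn.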
 <P>
 If JIT is being used, only the match limit is relevant. If DFA matching is
-being used, neither limit is relevant, and this modifier is ignored (with a
-warning message).
+being used, only the depth limit is relevant, but at present this modifier is
+ignored (with a warning message).
 </P>
 <P>
 The <i>match_limit</i> number is a measure of the amount of backtracking
 that takes place, and learning the minimum value can be instructive. For most
 simple matches, the number is quite small, but for patterns with very large
 numbers of matching possibilities, it can become large very quickly with
-increasing length of subject string. The <i>match_limit_recursion</i> number is
-a measure of how much stack (or, if PCRE2 is compiled with NO_RECURSE, how much
-heap) memory is needed to complete the match attempt.
+increasing length of subject string. The <i>depth_limit</i> number is
+a measure of how much memory for recording backtracking points is needed to
+complete the match attempt.
 </P>
 <br><b>
 Showing MARK names
@ -1466,7 +1474,7 @@ code unit offset of the start of the failing character is also output. Here is
 an example of an interactive <b>pcre2test</b> run.
 <pre>
  $ pcre2test
-  PCRE2 version 9.00 2014-05-10
+  PCRE2 version 10.22 2016-07-29

    re&#62; /^abc(\d+)/
  data&#62; abc123
@ -1779,9 +1787,9 @@ Cambridge, England.
 </P>
 <br><a name="SEC21" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 28 December 2016
+Last updated: 21 March 2017
 <br>
-Copyright &copy; 1997-2016 University of Cambridge.
+Copyright &copy; 1997-2017 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
--- a/doc/pcre2.txt
+++ b/doc/pcre2.txt
@ -89,8 +89,8 @@ SECURITY CONSIDERATIONS
       One  way  of guarding against this possibility is to use the pcre2_pat-
       tern_info() function  to  check  the  compiled  pattern's  options  for
       PCRE2_UTF.  Alternatively,  you can set the PCRE2_NEVER_UTF option when
-       calling pcre2_compile(). This causes an compile time error if a pattern
-       contains a UTF-setting sequence.
+       calling pcre2_compile(). This causes a compile time error if  the  pat-
+       tern contains a UTF-setting sequence.

       The  use  of Unicode properties for character types such as \d can also
       be enabled from within the pattern, by specifying "(*UCP)".  This  fea-
@ -112,7 +112,9 @@ SECURITY CONSIDERATIONS
       has a very large search tree against a string that  will  never  match.
       Nested  unlimited repeats in a pattern are a common example. PCRE2 pro-
       vides some protection against  this:  see  the  pcre2_set_match_limit()
-       function in the pcre2api page.
+       function  in  the  pcre2api  page.  There  is a similar function called
+       pcre2_set_depth_limit() that can be used to restrict the amount of mem-
+       ory that is used.


 USER DOCUMENTATION
@ -144,7 +146,7 @@ USER DOCUMENTATION
         pcre2perform       discussion of performance issues
         pcre2posix         the POSIX-compatible C API for the 8-bit library
         pcre2sample        discussion of the pcre2demo program
-         pcre2stack         discussion of stack usage
+         pcre2stack         discussion of stack and memory usage
         pcre2syntax        quick syntax reference
         pcre2test          description of the pcre2test command
         pcre2unicode       discussion of Unicode and UTF support
@ -166,8 +168,8 @@ AUTHOR

 REVISION

-       Last updated: 16 October 2015
-       Copyright (c) 1997-2015 University of Cambridge.
+       Last updated: 27 March 2017
+       Copyright (c) 1997-2017 University of Cambridge.
 ------------------------------------------------------------------------------
 
 
@ -2533,9 +2535,10 @@ OBTAINING A TEXTUAL ERROR MESSAGE
       A  text  message  for  an  error code from any PCRE2 function (compile,
       match, or auxiliary) can be obtained  by  calling  pcre2_get_error_mes-
       sage().  The  code  is passed as the first argument, with the remaining
-       two arguments specifying a code unit buffer and its length, into  which
-       the  text  message is placed. Note that the message is returned in code
-       units of the appropriate width for the library that is being used.
+       two arguments specifying a code unit buffer  and  its  length  in  code
+       units,  into  which the text message is placed. The message is returned
+       in code units of the appropriate width for the library  that  is  being
+       used.

       The  returned message is terminated with a trailing zero, and the func-
       tion returns the number of code  units  used,  excluding  the  trailing
@ -3178,8 +3181,8 @@ AUTHOR

 REVISION

-       Last updated: 23 December 2016
-       Copyright (c) 1997-2016 University of Cambridge.
+       Last updated: 21 March 2017
+       Copyright (c) 1997-2017 University of Cambridge.
 ------------------------------------------------------------------------------
 
 
@ -5519,19 +5522,24 @@ SPECIAL START-OF-PATTERN ITEMS
       attempt by the application to apply the  JIT  optimization  by  calling
       pcre2_jit_compile() is ignored.

-   Setting match and recursion limits
+   Setting match and backtracking depth limits

-       The  caller of pcre2_match() can set a limit on the number of times the
-       internal match() function is called and on the maximum depth of  recur-
-       sive calls. These facilities are provided to catch runaway matches that
-       are provoked by patterns with huge matching trees (a typical example is
-       a  pattern  with  nested unlimited repeats) and to avoid running out of
-       system stack by too  much  recursion.  When  one  of  these  limits  is
-       reached,  pcre2_match()  gives  an error return. The limits can also be
-       set by items at the start of the pattern of the form
+       The pcre2_match() function contains a counter that is incremented every
+       time it goes round its main loop. The caller of pcre2_match() can set a
+       limit  on  this counter, which therefore limits the amount of computing
+       resource used for a match. The maximum depth of nested backtracking can
+       also  be  limited, and this restricts the amount of heap memory that is
+       used.
+
+       These facilities are provided to catch runaway matches  that  are  pro-
+       voked by patterns with huge matching trees (a typical example is a pat-
+       tern with nested unlimited repeats applied to a long string  that  does
+       not match). When one of these limits is reached, pcre2_match() gives an
+       error return. The limits can also be set by items at the start  of  the
+       pattern of the form

         (*LIMIT_MATCH=d)
-         (*LIMIT_RECURSION=d)
+         (*LIMIT_DEPTH=d)

       where d is any number of decimal digits. However, the value of the set-
       ting must be less than the value set (or defaulted) by  the  caller  of
@ -5540,11 +5548,15 @@ SPECIAL START-OF-PATTERN ITEMS
       If  there  is  more  than one setting of one of these limits, the lower
       value is used.

+       Prior to release 10.30, LIMIT_DEPTH was  called  LIMIT_RECURSION.  This
+       name is still recognized for backwards compatibility.
+
       The  match  limit  is  used  (but in a different way) when JIT is being
       used, but it is not  relevant,  and  is  ignored,  when  matching  with
-       pcre2_dfa_match().   However,  the  recursion limit is relevant for DFA
-       matching, which does use some function recursion,  in  particular,  for
-       recursions within the pattern.
+       pcre2_dfa_match().  However, the depth limit is relevant for DFA match-
+       ing, which uses function recursion for recursions within  the  pattern.
+       In  this case, the depth limit controls the amount of system stack that
+       is used.
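
The interaction between caller-set limits and the start-of-pattern items described above can be sketched as follows. This is a minimal model, not PCRE2 code: the function name `effective_limits`, its arguments, and the example values are hypothetical, and it assumes the limit items appear contiguously at the very start of the pattern.

```python
import re

# Start-of-pattern limit items; LIMIT_RECURSION is the pre-10.30 name
# for LIMIT_DEPTH and is treated here as a synonym.
_LIMIT_ITEM = re.compile(r"\(\*(LIMIT_MATCH|LIMIT_DEPTH|LIMIT_RECURSION)=(\d+)\)")


def effective_limits(pattern, caller_match_limit, caller_depth_limit):
    """Return the limits in force after reading leading (*LIMIT_...=d) items.

    A pattern item can only lower a limit: a value above the caller's
    setting has no effect, and with repeated items the lowest value wins.
    """
    limits = {"match": caller_match_limit, "depth": caller_depth_limit}
    pos = 0
    while True:
        item = _LIMIT_ITEM.match(pattern, pos)
        if item is None:
            break  # first thing that is not a limit item: stop scanning
        name, value = item.group(1), int(item.group(2))
        key = "match" if name == "LIMIT_MATCH" else "depth"
        limits[key] = min(limits[key], value)
        pos = item.end()
    return limits
```

For instance, with hypothetical caller limits of 1000 each, the pattern `(*LIMIT_MATCH=500)(*LIMIT_MATCH=900)abc` would leave the match limit at 500, while `(*LIMIT_DEPTH=5000)x` would leave the depth limit unchanged.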

   Newline conventions

@ -5579,9 +5591,9 @@ SPECIAL START-OF-PATTERN ITEMS
       acter when PCRE2_DOTALL is not set, and the behaviour of  \N.  However,
       it  does  not  affect  what the \R escape sequence matches. By default,
       this is any Unicode newline sequence, for Perl compatibility.  However,
-       this can be changed; see the description of \R in the section  entitled
-       "Newline  sequences" below. A change of \R setting can be combined with
-       a change of newline convention.
+       this  can be changed; see the next section and the description of \R in
+       the section entitled "Newline sequences" below. A change of \R  setting
+       can be combined with a change of newline convention.

   Specifying what \R matches

@ -5595,7 +5607,7 @@ SPECIAL START-OF-PATTERN ITEMS
 EBCDIC CHARACTER CODES

       PCRE2  can be compiled to run in an environment that uses EBCDIC as its
-       character code rather than ASCII or Unicode (typically a mainframe sys-
+       character code instead of ASCII or Unicode (typically a mainframe  sys-
       tem).  In  the  sections below, character code values are ASCII or Uni-
       code; in an EBCDIC environment these characters may have different code
       values, and there are no code points greater than 255.
@ -5660,8 +5672,8 @@ BACKSLASH
       meaning  that  character  may  have. This use of backslash as an escape
       character applies both inside and outside character classes.

-       For  example,  if  you want to match a * character, you write \* in the
-       pattern.  This escaping action applies whether  or  not  the  following
+       For example, if you want to match a * character, you must write  \*  in
+       the  pattern. This escaping action applies whether or not the following
       character would otherwise be interpreted as a metacharacter, so  it  is
       always  safe  to  precede  a non-alphanumeric with backslash to specify
       that it stands for itself.  In particular, if you want to match a back-
@ -5695,7 +5707,8 @@ BACKSLASH
       is  not followed by \E later in the pattern, the literal interpretation
       continues to the end of the pattern (that is,  \E  is  assumed  at  the
       end).  If  the  isolated \Q is inside a character class, this causes an
-       error, because the character class is not terminated.
+       error, because the character class  is  not  terminated  by  a  closing
+       square bracket.

   Non-printing characters

@ -5810,10 +5823,10 @@ BACKSLASH

       If the PCRE2_ALT_BSUX option is set, the interpretation  of  \x  is  as
       just described only when it is followed by two hexadecimal digits. Oth-
-       erwise, it matches a literal "x" character. In this mode mode,  support
-       for  code points greater than 256 is provided by \u, which must be fol-
-       lowed by four hexadecimal digits; otherwise it matches  a  literal  "u"
-       character.
+       erwise, it matches a literal "x" character. In this mode,  support  for
+       code  points greater than 255 is provided by \u, which must be followed
+       by four hexadecimal digits; otherwise it matches a literal "u"  charac-
+       ter.

       Characters whose value is less than 256 can be defined by either of the
       two syntaxes for \x (or by \u in PCRE2_ALT_BSUX mode). There is no dif-
@ -5825,12 +5838,10 @@ BACKSLASH
       Characters that are specified using octal or  hexadecimal  numbers  are
       limited to certain values, as follows:

-         8-bit non-UTF mode    less than 0x100
-         8-bit UTF-8 mode      less than 0x10ffff and a valid codepoint
-         16-bit non-UTF mode   less than 0x10000
-         16-bit UTF-16 mode    less than 0x10ffff and a valid codepoint
-         32-bit non-UTF mode   less than 0x100000000
-         32-bit UTF-32 mode    less than 0x10ffff and a valid codepoint
+         8-bit non-UTF mode    no greater than 0xff
+         16-bit non-UTF mode   no greater than 0xffff
+         32-bit non-UTF mode   no greater than 0xffffffff
+         All UTF modes         no greater than 0x10ffff and a valid codepoint

       Invalid  Unicode  codepoints  are  the  range 0xd800 to 0xdfff (the so-
       called "surrogate" codepoints), and 0xffef.
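
The upper bounds in the revised table can be expressed as a small check. This is only an illustrative sketch: the names `NON_UTF_MAX`, `UNICODE_MAX`, and `escape_value_allowed` are hypothetical, and of the invalid code points only the surrogate range is modelled.

```python
# Largest value accepted in each non-UTF mode, keyed by code unit width.
NON_UTF_MAX = {8: 0xFF, 16: 0xFFFF, 32: 0xFFFFFFFF}

# Largest Unicode code point, which applies in every UTF mode.
UNICODE_MAX = 0x10FFFF


def escape_value_allowed(value, width, utf):
    """Return True if an octal or hexadecimal escape value is within range.

    width is the code unit width (8, 16 or 32); utf says whether a UTF
    mode is in force.
    """
    if value < 0:
        return False
    if not utf:
        return value <= NON_UTF_MAX[width]
    if value > UNICODE_MAX:
        return False
    # Surrogates are not valid code points in any UTF mode.
    return not (0xD800 <= value <= 0xDFFF)
```

Under these assumptions, 0x100 is rejected in 8-bit non-UTF mode but accepted in UTF-8 mode, and 0x110000 is accepted only in 32-bit non-UTF mode.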
@ -5852,8 +5863,7 @@ BACKSLASH
       handler and used  to  modify  the  case  of  following  characters.  By
       default, PCRE2 does not support these escape sequences. However, if the
       PCRE2_ALT_BSUX option is set, \U matches a "U" character, and \u can be
-       used  to define a character by code point, as described in the previous
-       section.
+       used to define a character by code point, as described above.

   Absolute and relative back references

@ -6022,7 +6032,10 @@ BACKSLASH
       tional  escape sequences that match characters with specific properties
       are available. In 8-bit non-UTF-8 mode, these sequences are  of  course
       limited  to  testing characters whose codepoints are less than 256, but
-       they do work in this mode.  The extra escape sequences are:
+       they do work in this mode.  In 32-bit non-UTF mode, codepoints  greater
+       than  0x10ffff  (the  Unicode  limit) may be encountered. These are all
+       treated as being in the Common script and with an unassigned type.  The
+       extra escape sequences are:

         \p{xx}   a character with the xx property
         \P{xx}   a character without the xx property
@ -7328,16 +7341,9 @@ ASSERTIONS
       Assertion subpatterns are not capturing subpatterns. If such an  asser-
       tion  contains  capturing  subpatterns within it, these are counted for
       the purposes of numbering the capturing subpatterns in the  whole  pat-
-       tern.  However,  substring  capturing  is carried out only for positive
-       assertions. (Perl sometimes, but not always, does do capturing in nega-
-       tive assertions.)
-
-       WARNING:  If a positive assertion containing one or more capturing sub-
-       patterns succeeds, but failure to match later  in  the  pattern  causes
-       backtracking over this assertion, the captures within the assertion are
-       reset only if no higher numbered captures are  already  set.  This  is,
-       unfortunately,  a fundamental limitation of the current implementation;
-       it may get removed in a future reworking.
+       tern.  However,  substring  capturing  is normally carried out only for
+       positive assertions (but see the discussion of conditional  subpatterns
+       below).

       For   compatibility  with  Perl,  most  assertion  subpatterns  may  be
       repeated; though it makes no sense to assert  the  same  thing  several
@ -7686,6 +7692,12 @@ CONDITIONAL SUBPATTERNS
       strings in one of the two forms dd-aaa-dd or dd-dd-dd,  where  aaa  are
       letters and dd are digits.

+       For  Perl  compatibility,  if an assertion that is a condition contains
+       capturing subpatterns, any capturing that  occurs  is  retained  after-
+       wards,  for  both positive and negative assertions. (Compare non-condi-
+       tional assertions, when captures are retained only for positive  asser-
+       tions.)
+

 COMMENTS

@ -7849,94 +7861,59 @@ RECURSIVE PATTERNS

   Differences in recursion processing between PCRE2 and Perl

-       Recursion  processing in PCRE2 differs from Perl in two important ways.
-       In PCRE2 (like Python, but unlike Perl), a recursive subpattern call is
-       always treated as an atomic group. That is, once it has matched some of
-       the subject string, it is never re-entered, even if it contains untried
-       alternatives  and  there  is a subsequent matching failure. This can be
-       illustrated by the following pattern, which purports to match a  palin-
-       dromic  string  that contains an odd number of characters (for example,
-       "a", "aba", "abcba", "abcdcba"):
+       Some former differences between PCRE2 and Perl no longer exist.

-         ^(.|(.)(?1)\2)$
+       Before release 10.30, recursion processing in PCRE2 differed from  Perl
+       in  that  a  recursive  subpattern call was always treated as an atomic
+       group. That is, once it had matched some of the subject string, it  was
+       never  re-entered,  even if it contained untried alternatives and there
+       was a subsequent matching failure. (Historical note:  PCRE  implemented
+       recursion before Perl did.)

-       The idea is that it either matches a single character, or two identical
-       characters  surrounding  a sub-palindrome. In Perl, this pattern works;
-       in PCRE2 it does not if the pattern is longer  than  three  characters.
-       Consider the subject string "abcba":
+       Starting  with  release 10.30, recursive subroutine calls are no longer
+       treated as atomic. That is, they can be re-entered to try unused alter-
+       natives  if  there  is a matching failure later in the pattern. This is
+       now compatible with the way Perl works. If you want a  subroutine  call
+       to be atomic, you must explicitly enclose it in an atomic group.

-       At  the  top level, the first character is matched, but as it is not at
-       the end of the string, the first alternative fails; the second alterna-
-       tive is taken and the recursion kicks in. The recursive call to subpat-
-       tern 1 successfully matches the next character ("b").  (Note  that  the
-       beginning and end of line tests are not part of the recursion).
-
-       Back  at  the top level, the next character ("c") is compared with what
-       subpattern 2 matched, which was "a". This fails. Because the  recursion
-       is  treated  as  an atomic group, there are now no backtracking points,
-       and so the entire match fails. (Perl is able, at  this  point,  to  re-
-       enter  the  recursion  and try the second alternative.) However, if the
-       pattern is written with the alternatives in the other order, things are
-       different:
-
-         ^((.)(?1)\2|.)$
-
-       This  time,  the recursing alternative is tried first, and continues to
-       recurse until it runs out of characters, at which point  the  recursion
-       fails.  But  this  time  we  do  have another alternative to try at the
-       higher level. That is the big difference:  in  the  previous  case  the
-       remaining  alternative is at a deeper recursion level, which PCRE2 can-
-       not use.
-
-       To change the pattern so that it matches all palindromic  strings,  not
-       just  those  with an odd number of characters, it is tempting to change
-       the pattern to this:
+       Supporting  backtracking  into  recursions  simplifies certain types of
+       recursive  pattern.  For  example,  this  pattern  matches  palindromic
+       strings:

         ^((.)(?1)\2|.?)$

-       Again, this works in Perl, but not in PCRE2, and for the  same  reason.
-       When  a  deeper  recursion has matched a single character, it cannot be
-       entered again in order to match an empty string.  The  solution  is  to
-       separate  the two cases, and write out the odd and even cases as alter-
-       natives at the higher level:
+       The  second  branch  in the group matches a single central character in
+       the palindrome when there are an odd number of characters,  or  nothing
+       when  there  are  an even number of characters, but in order to work it
+       has to be able to try the second case when  the  rest  of  the  pattern
+       match fails. If you want to match typical palindromic phrases, the pat-
+       tern has to ignore all non-word characters,  which  can  be  done  like
+       this:

-         ^(?:((.)(?1)\2|)|((.)(?3)\4|.))
-
-       If you want to match typical palindromic phrases, the  pattern  has  to
-       ignore all non-word characters, which can be done like this:
-
-         ^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+\4|\W*+.\W*+))\W*+$
+         ^\W*+((.)\W*+(?1)\W*+\2|\W*+.?)\W*+$

       If  run  with  the  PCRE2_CASELESS option, this pattern matches phrases
-       such as "A man, a plan, a canal: Panama!" and it works  in  both  PCRE2
-       and  Perl.  Note the use of the possessive quantifier *+ to avoid back-
-       tracking into sequences of non-word  characters.  Without  this,  PCRE2
-       takes a great deal longer (ten times or more) to match typical phrases,
-       and Perl takes so long that you think it has gone into a loop.
+       such as "A man, a plan, a canal: Panama!". Note the use of the  posses-
+       sive  quantifier  *+  to  avoid backtracking into sequences of non-word
+       characters. Without this, PCRE2 takes a great deal longer (ten times or
+       more)  to  match typical phrases, and Perl takes so long that you think
+       it has gone into a loop.

-       WARNING: The palindrome-matching patterns above work only if  the  sub-
-       ject  string  does not start with a palindrome that is shorter than the
-       entire string.  For example, although "abcba" is correctly matched,  if
-       the  subject is "ababa", PCRE2 finds the palindrome "aba" at the start,
-       then fails at top level because the end of the string does not  follow.
-       Once  again, it cannot jump back into the recursion to try other alter-
-       natives, so the entire match fails.
-
-       The second way in which PCRE2 and Perl differ in their  recursion  pro-
-       cessing  is in the handling of captured values. In Perl, when a subpat-
-       tern is called recursively or as a subpattern (see the  next  section),
-       it  has  no  access to any values that were captured outside the recur-
-       sion, whereas in PCRE2 these values can be  referenced.  Consider  this
-       pattern:
+       Another way in which PCRE2 and Perl used to differ in  their  recursion
+       processing  is  in  the  handling of captured values. Formerly in Perl,
+       when a subpattern was called recursively or as a  subroutine  (see  the
+       next  section),  it had no access to any values that were captured out-
+       side the recursion, whereas in PCRE2 these values  can  be  referenced.
+       Consider this pattern:

         ^(.)(\1|a(?2))

-       In  PCRE2,  this pattern matches "bab". The first capturing parentheses
-       match "b", then in the second group, when the back reference  \1  fails
-       to  match "b", the second alternative matches "a" and then recurses. In
-       the recursion, \1 does now match "b" and so the whole  match  succeeds.
-       In  Perl,  the pattern fails to match because inside the recursive call
-       \1 cannot access the externally set value.
+       This  pattern matches "bab". The first capturing parentheses match "b",
+       then in the second group, when the back reference  \1  fails  to  match
+       "b",  the  second  alternative  matches  "a"  and then recurses. In the
+       recursion, \1 does now match "b" and so the whole match succeeds.  This
+       match  used  to  fail in Perl, but in later versions (I tried 5.024) it
+       now works.
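
To see why backtracking into recursion matters for the palindrome pattern ^((.)(?1)\2|.?)$ discussed above, the following toy model may help. It is not how PCRE2 is implemented: `group1_ends` and `matches` are hypothetical names, the subject is assumed to contain no newlines, and the `atomic` flag only approximates the pre-10.30 behaviour by letting each recursive call offer a single result.

```python
from itertools import islice


def group1_ends(s, i, atomic):
    """Yield each offset at which group 1 can end when started at offset i.

    Results are produced in the order a backtracking matcher would try them.
    """
    if i < len(s):
        # First branch: (.)(?1)\2
        inner = group1_ends(s, i + 1, atomic)
        if atomic:
            # An atomic call keeps only its first match and cannot be re-entered.
            inner = islice(inner, 1)
        for j in inner:
            if j < len(s) and s[j] == s[i]:
                yield j + 1
        # Second branch, .? matching one character
        yield i + 1
    # Second branch, .? matching nothing
    yield i


def matches(s, atomic=False):
    """Return True if the whole of s is matched, as with ^...$ anchoring."""
    return any(j == len(s) for j in group1_ends(s, 0, atomic))
```

In this model `matches("abba")` succeeds, because the innermost call can be re-entered so that .? matches nothing, whereas `matches("abba", atomic=True)` fails, which mirrors the difference the text describes.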


 SUBPATTERNS AS SUBROUTINES
@ -7964,12 +7941,10 @@ SUBPATTERNS AS SUBROUTINES
       two strings. Another example is  given  in  the  discussion  of  DEFINE
       above.

-       All  subroutine  calls, whether recursive or not, are always treated as
-       atomic groups. That is, once a subroutine has matched some of the  sub-
-       ject string, it is never re-entered, even if it contains untried alter-
-       natives and there is  a  subsequent  matching  failure.  Any  capturing
-       parentheses  that  are  set  during the subroutine call revert to their
-       previous values afterwards.
+       Like  recursions,  subroutine  calls  used to be treated as atomic, but
+       this changed at PCRE2 release 10.30, so  backtracking  into  subroutine
+       calls  can  now  occur. However, any capturing parentheses that are set
+       during the subroutine call revert to their previous values afterwards.

       Processing options such as case-independence are fixed when  a  subpat-
       tern  is defined, so if it is used as a subroutine, such options cannot
@ -8076,17 +8051,11 @@ CALLOUTS

 BACKTRACKING CONTROL

-       Perl  5.10 introduced a number of "Special Backtracking Control Verbs",
-       which are still described in the Perl  documentation  as  "experimental
-       and  subject to change or removal in a future version of Perl". It goes
-       on to say: "Their usage in production code should  be  noted  to  avoid
-       problems during upgrades." The same remarks apply to the PCRE2 features
-       described in this section.
-
-       The new verbs make use of what was previously invalid syntax: an  open-
-       ing parenthesis followed by an asterisk. They are generally of the form
-       (*VERB) or (*VERB:NAME). Some verbs take either form, possibly behaving
-       differently depending on whether or not a name is present.
+       There  are  a  number  of  special "Backtracking Control Verbs" (to use
+       Perl's terminology) that modify the behaviour  of  backtracking  during
+       matching.  They are generally of the form (*VERB) or (*VERB:NAME). Some
+       verbs take either form,  possibly  behaving  differently  depending  on
+       whether or not a name is present.

       By  default,  for  compatibility  with  Perl, a name is any sequence of
       characters that does not include a closing parenthesis. The name is not
@ -8116,7 +8085,7 @@ BACKTRACKING CONTROL

       Since  these  verbs  are  specifically related to backtracking, most of
       them can be used only when the pattern is to be matched using the  tra-
-       ditional matching function, because these use a backtracking algorithm.
+       ditional matching function, because that uses a backtracking algorithm.
       With the exception of (*FAIL), which behaves like  a  failing  negative
       assertion, the backtracking control verbs cause an error if encountered
       by the DFA matching function.
@ -8236,11 +8205,11 @@ BACKTRACKING CONTROL
       tinues with what follows, but if there is no subsequent match,  causing
       a  backtrack  to  the  verb, a failure is forced. That is, backtracking
       cannot pass to the left of the verb. However, when one of  these  verbs
-       appears inside an atomic group (which includes any group that is called
-       as a subroutine) or in an assertion that is true, its  effect  is  con-
-       fined  to that group, because once the group has been matched, there is
-       never any backtracking into it. In this situation, backtracking has  to
-       jump to the left of the entire atomic group or assertion.
+       appears  inside  an  atomic  group or in an assertion that is true, its
+       effect is confined to that group,  because  once  the  group  has  been
+       matched,  there  is  never any backtracking into it. In this situation,
+       backtracking has to jump to the left of  the  entire  atomic  group  or
+       assertion.

       These  verbs  differ  in exactly what kind of failure occurs when back-
       tracking reaches them. The behaviour described below  is  what  happens
@ -8303,11 +8272,10 @@ BACKTRACKING CONTROL
       any  other  way. In an anchored pattern (*PRUNE) has the same effect as
       (*COMMIT).

-       The   behaviour   of   (*PRUNE:NAME)   is   the   not   the   same   as
-       (*MARK:NAME)(*PRUNE).   It  is  like  (*MARK:NAME)  in that the name is
-       remembered for  passing  back  to  the  caller.  However,  (*SKIP:NAME)
-       searches  only  for  names  set  with  (*MARK),  ignoring  those set by
-       (*PRUNE) or (*THEN).
+       The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE).
+       It is like (*MARK:NAME) in that the name is remembered for passing back
+       to the caller. However, (*SKIP:NAME) searches only for names  set  with
+       (*MARK), ignoring those set by (*PRUNE) or (*THEN).

         (*SKIP)

@ -8496,8 +8464,8 @@ AUTHOR

 REVISION

-       Last updated: 27 December 2016
-       Copyright (c) 1997-2016 University of Cambridge.
+       Last updated: 18 March 2017
+       Copyright (c) 1997-2017 University of Cambridge.
 ------------------------------------------------------------------------------
 
 
@ -9078,7 +9046,10 @@ SECURITY CONCERNS
       use within individual applications.  As  such,  the  data  supplied  to
       pcre2_serialize_decode()  is expected to be trusted data, not data from
       arbitrary external sources.  There  is  only  some  simple  consistency
-       checking, not complete validation of what is being re-loaded.
+       checking, not complete validation of what is being re-loaded. Corrupted
+       data may cause undefined results. For example, if the length field of a
+       pattern in the serialized data is corrupted, the deserializing code may
+       read beyond the end of the byte stream that is passed to it.


 SAVING COMPILED PATTERNS
@ -9211,8 +9182,8 @@ AUTHOR

 REVISION

-       Last updated: 24 May 2016
-       Copyright (c) 1997-2016 University of Cambridge.
+       Last updated: 21 March 2017
+       Copyright (c) 1997-2017 University of Cambridge.
 ------------------------------------------------------------------------------
 
 
--- a/doc/pcre2_config.3
+++ b/doc/pcre2_config.3
@ -1,4 +1,4 @@
-.TH PCRE2_CONFIG 3 "20 April 2014" "PCRE2 10.0"
+.TH PCRE2_CONFIG 3 "24 March 2017" "PCRE2 10.30"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@ -31,10 +31,13 @@ point to a uint32_t integer variable. The available codes are:
  PCRE2_CONFIG_BSR             Indicates what \eR matches by default:
                                 PCRE2_BSR_UNICODE
                                 PCRE2_BSR_ANYCRLF
+  PCRE2_CONFIG_DEPTHLIMIT      Default backtracking depth limit
+.\" JOIN
  PCRE2_CONFIG_JIT             Availability of just-in-time compiler
                                support (1=yes 0=no)
-  PCRE2_CONFIG_JITTARGET       Information about the target archi-
-                                 tecture for the JIT compiler
+.\" JOIN
+  PCRE2_CONFIG_JITTARGET       Information (a string) about the target
+                                 architecture for the JIT compiler
  PCRE2_CONFIG_LINKSIZE        Configured internal link size (2, 3, 4)
  PCRE2_CONFIG_MATCHLIMIT      Default internal resource limit
  PCRE2_CONFIG_NEWLINE         Code for the default newline sequence:
@ -44,9 +47,9 @@ point to a uint32_t integer variable. The available codes are:
                                 PCRE2_NEWLINE_ANY
                                 PCRE2_NEWLINE_ANYCRLF
  PCRE2_CONFIG_PARENSLIMIT     Default parentheses nesting limit
-  PCRE2_CONFIG_RECURSIONLIMIT  Internal recursion depth limit
-  PCRE2_CONFIG_STACKRECURSE    Recursion implementation (1=stack
-                                 0=heap)
+  PCRE2_CONFIG_RECURSIONLIMIT  Obsolete: use PCRE2_CONFIG_DEPTHLIMIT
+  PCRE2_CONFIG_STACKRECURSE    Obsolete: always returns 0
+.\" JOIN
  PCRE2_CONFIG_UNICODE         Availability of Unicode support (1=yes
                                 0=no)
  PCRE2_CONFIG_UNICODE_VERSION The Unicode version (a string)
--- a/doc/pcre2_dfa_match.3
+++ b/doc/pcre2_dfa_match.3
@ -1,4 +1,4 @@
-.TH PCRE2_DFA_MATCH 3 "23 December 2016" "PCRE2 10.23"
+.TH PCRE2_DFA_MATCH 3 "24 March 2017" "PCRE2 10.30"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@ -19,8 +19,9 @@ PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
 This function matches a compiled regular expression against a given subject
 string, using an alternative matching algorithm that scans the subject string
-just once (\fInot\fP Perl-compatible). (The Perl-compatible matching function
-is \fBpcre2_match()\fP.) The arguments for this function are:
+just once (except when processing lookaround assertions). This function is
+\fInot\fP Perl-compatible (the Perl-compatible matching function is
+\fBpcre2_match()\fP). The arguments for this function are:
 .sp
  \fIcode\fP         Points to the compiled pattern
  \fIsubject\fP      Points to the subject string
@ -33,22 +34,26 @@ is \fBpcre2_match()\fP.) The arguments for this function are:
  \fIwscount\fP      Number of elements in the vector
 .sp
 For \fBpcre2_dfa_match()\fP, a match context is needed only if you want to set
-up a callout function or specify the recursion limit. The \fIlength\fP and
-\fIstartoffset\fP values are code units, not characters. The options are:
+up a callout function or specify the recursion depth limit. The \fIlength\fP
+and \fIstartoffset\fP values are code units, not characters. The options are:
 .sp
  PCRE2_ANCHORED          Match only at the first position
  PCRE2_NOTBOL            Subject is not the beginning of a line
  PCRE2_NOTEOL            Subject is not the end of a line
  PCRE2_NOTEMPTY          An empty string is not a valid match
+.\" JOIN   
  PCRE2_NOTEMPTY_ATSTART  An empty string at the start of the subject
                           is not a valid match
+.\" JOIN   
  PCRE2_NO_UTF_CHECK      Do not check the subject for UTF
                           validity (only relevant if PCRE2_UTF
                           was set at compile time)
+.\" JOIN   
+  PCRE2_PARTIAL_HARD      Return PCRE2_ERROR_PARTIAL for a partial
+                           match even if there is a full match
+.\" JOIN   
  PCRE2_PARTIAL_SOFT      Return PCRE2_ERROR_PARTIAL for a partial
                           match if no full matches are found
-  PCRE2_PARTIAL_HARD      Return PCRE2_ERROR_PARTIAL for a partial match
-                           even if there is a full match as well
  PCRE2_DFA_RESTART       Restart after a partial match
  PCRE2_DFA_SHORTEST      Return only the shortest match
 .sp
--- a/doc/pcre2_get_error_message.3
+++ b/doc/pcre2_get_error_message.3
@ -1,4 +1,4 @@
-.TH PCRE2_GET_ERROR_MESSAGE 3 "17 June 2016" "PCRE2 10.22"
+.TH PCRE2_GET_ERROR_MESSAGE 3 "24 March 2017" "PCRE2 10.30"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@ -22,11 +22,11 @@ errors are negative numbers. The arguments are:
  \fIbuffer\fP      where to put the message
  \fIbufflen\fP     the length of the buffer (code units)
 .sp
-The function returns the length of the message, excluding the trailing zero, or
-the negative error code PCRE2_ERROR_NOMEMORY if the buffer is too small. In
-this case, the returned message is truncated (but still with a trailing zero).
-If \fIerrorcode\fP does not contain a recognized error code number, the
-negative value PCRE2_ERROR_BADDATA is returned.
+The function returns the length of the message in code units, excluding the
+trailing zero, or the negative error code PCRE2_ERROR_NOMEMORY if the buffer is
+too small. In this case, the returned message is truncated (but still with a
+trailing zero). If \fIerrorcode\fP does not contain a recognized error code
+number, the negative value PCRE2_ERROR_BADDATA is returned.
 .P
 There is a complete description of the PCRE2 native API in the
 .\" HREF
--- a/doc/pcre2_jit_stack_create.3
+++ b/doc/pcre2_jit_stack_create.3
@ -1,4 +1,4 @@
-.TH PCRE2_JIT_STACK_CREATE 3 "03 November 2014" "PCRE2 10.00"
+.TH PCRE2_JIT_STACK_CREATE 3 "24 March 2017" "PCRE2 10.30"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@ -20,10 +20,9 @@ maximum size to which it is allowed to grow. The final argument is a general
 context, for memory allocation functions, or NULL for standard memory
 allocation. The result can be passed to the JIT run-time code by calling
 \fBpcre2_jit_stack_assign()\fP to associate the stack with a compiled pattern,
-which can then be processed by \fBpcre2_match()\fP. If the "fast path" JIT
-matcher, \fBpcre2_jit_match()\fP is used, the stack can be passed directly as
-an argument. A maximum stack size of 512K to 1M should be more than enough for
-any pattern. For more details, see the
+which can then be processed by \fBpcre2_match()\fP or \fBpcre2_jit_match()\fP.
+A maximum stack size of 512K to 1M should be more than enough for any pattern.
+For more details, see the
 .\" HREF
 \fBpcre2jit\fP
 .\"
--- a/doc/pcre2_maketables.3
+++ b/doc/pcre2_maketables.3
@ -1,4 +1,4 @@
-.TH PCRE2_MAKETABLES 3 "21 October 2014" "PCRE2 10.00"
+.TH PCRE2_MAKETABLES 3 "24 March 2017" "PCRE2 10.30"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@ -12,10 +12,10 @@ PCRE2 - Perl-compatible regular expressions (revised API)
 .SH DESCRIPTION
 .rs
 .sp
-This function builds a set of character tables for character values less than
-256. These can be passed to \fBpcre2_compile()\fP in a compile context in order
-to override the internal, built-in tables (which were either defaulted or made
-by \fBpcre2_maketables()\fP when PCRE2 was compiled). See the
+This function builds a set of character tables for character code points that 
+are less than 256. These can be passed to \fBpcre2_compile()\fP in a compile
+context in order to override the internal, built-in tables (which were either
+defaulted or made by \fBpcre2_maketables()\fP when PCRE2 was compiled). See the
 .\" HREF
 \fBpcre2_set_character_tables()\fP
 .\"
--- a/doc/pcre2grep.txt
+++ b/doc/pcre2grep.txt
@ -255,6 +255,9 @@ OPTIONS
                 directory like this is an immediate end-of-file; in others it
                 may provoke an error.

+       --depth-limit=number
+                 See --match-limit below.
+
       -e pattern, --regex=pattern, --regexp=pattern
                 Specify a pattern to be matched. This option can be used mul-
                 tiple times in order to specify several patterns. It can also
@ -477,32 +480,24 @@ OPTIONS
                 no short form for this option.

       --match-limit=number
-                 Processing some regular expression  patterns  can  require  a
-                 very  large amount of memory, leading in some cases to a pro-
-                 gram crash if not enough is available.   Other  patterns  may
-                 take  a  very  long  time to search for all possible matching
-                 strings.  The  pcre2_match()  function  that  is  called   by
-                 pcre2grep  to  do  the  matching  has two parameters that can
-                 limit the resources that it uses.
+                 Processing some regular expression patterns may take  a  very
+                 long time to search for all possible matching strings. Others
+                 may require a very large amount  of  memory.  There  are  two
+                 options that set resource limits for matching.

-                 The  --match-limit  option  provides  a  means  of   limiting
-                 resource usage when processing patterns that are not going to
-                 match, but which have a very large number of possibilities in
-                 their  search  trees.  The  classic example is a pattern that
-                 uses nested unlimited repeats. Internally, PCRE2 uses a func-
-                 tion  called  match()  which  it  calls repeatedly (sometimes
-                 recursively). The limit set by --match-limit  is  imposed  on
-                 the  number  of times this function is called during a match,
-                 which has the effect of limiting the amount  of  backtracking
-                 that can take place.
+                 The --match-limit option provides a means of limiting comput-
+                 ing resource usage when  processing  patterns  that  are  not
+                 going  to match, but which have a very large number of possi-
+                 bilities in their search trees. The classic example is a pat-
+                 tern  that  uses  nested unlimited repeats. Internally, PCRE2
+                 has a counter that is incremented each time around  its  main
+                 processing  loop.  If  the  value  set  by  --match-limit  is
+                 reached, an error occurs.

-                 The --recursion-limit option is similar to --match-limit, but
-                 instead of limiting the total number of times that match() is
-                 called, it limits the depth of recursive calls, which in turn
-                 limits the amount of memory that can be used.  The  recursion
-                 depth  is  a  smaller  number than the total number of calls,
-                 because not all calls to match() are recursive. This limit is
-                 of use only if it is set smaller than --match-limit.
+                 The --depth-limit option limits the  depth  of  nested  back-
+                 tracking  points,  which  in turn limits the amount of memory
+                 that is used. This limit is of use only if it is set  smaller
+                 than --match-limit.

                 There  are no short forms for these options. The default set-
                 tings are specified when the PCRE2 library is compiled,  with
@ -834,9 +829,9 @@ MATCHING ERRORS
       such errors, pcre2grep gives up.

       The --match-limit option of pcre2grep can be used to  set  the  overall
-       resource  limit; there is a second option called --recursion-limit that
-       sets a limit on the amount of memory (usually stack) that is used  (see
-       the discussion of these options above).
+       resource limit; there is a second option called --depth-limit that sets
+       a limit on the amount of memory that is used  (see  the  discussion  of
+       these options above).


 DIAGNOSTICS
@ -862,5 +857,5 @@ AUTHOR

 REVISION

-       Last updated: 31 December 2016
-       Copyright (c) 1997-2016 University of Cambridge.
+       Last updated: 21 March 2017
+       Copyright (c) 1997-2017 University of Cambridge.
--- a/doc/pcre2test.txt
+++ b/doc/pcre2test.txt
@ -91,13 +91,13 @@ INPUT ENCODING
       ter is placed in one 16-bit or 32-bit code unit (in  the  16-bit  case,
       values greater than 0xffff cause an error to occur).

-       UTF-8  is  not  capable of encoding values greater than 0x7fffffff, but
-       such values can be handled by the 32-bit  library.  When  testing  this
-       library  in  non-UTF mode with utf8_input set, if any character is pre-
-       ceded by the byte 0xff (which is an illegal byte in  UTF-8)  0x80000000
-       is added to the character's value. This is the only way of passing such
-       code points in a pattern string. For subject strings, using  an  escape
-       sequence is preferable.
+       UTF-8  (in  its  original definition) is not capable of encoding values
+       greater than 0x7fffffff, but such values can be handled by  the  32-bit
+       library. When testing this library in non-UTF mode with utf8_input set,
+       if any character is preceded by the byte 0xff (which is an illegal byte
+       in  UTF-8)  0x80000000  is  added to the character's value. This is the
+       only way of passing such code points in a pattern string.  For  subject
+       strings, using an escape sequence is preferable.


 COMMAND LINE OPTIONS
@ -544,6 +544,7 @@ PATTERN MODIFIERS
         /B  bincode                   show binary code without lengths
             callout_info              show callout information
             debug                     same as info,fullbincode
+             framesize                 show matching frame size
             fullbincode               show binary code with lengths
         /I  info                      show info about compiled pattern
             hex                       unquoted characters are hexadecimal
@ -624,6 +625,10 @@ PATTERN MODIFIERS
       last  character.  These lines are omitted if no starting or ending code
       units are recorded.

+       The framesize modifier shows the size, in bytes, of the storage  frames
+       used  by  pcre2_match()  for handling backtracking. The size depends on
+       the number of capturing parentheses in the pattern.
+
       The callout_info modifier requests information about all  the  callouts
       in the pattern. A list of them is output at the end of any other infor-
       mation that is requested. For each callout, either its number or string
@ -959,6 +964,7 @@ SUBJECT MODIFIERS
             callout_fail=<n>[:<m>]     control callout failure
             callout_none               do not supply a callout function
             copy=<number or name>      copy captured substring
+             depth_limit=<n>            set a depth limit
             dfa                        use pcre2_dfa_match()
             find_limits                find match and recursion limits
             get=<number or name>       extract captured substring
@ -972,7 +978,7 @@ SUBJECT MODIFIERS
             offset=<n>                 set starting offset
             offset_limit=<n>           set offset limit
             ovector=<n>                set size of output vector
-             recursion_limit=<n>        set a recursion limit
+             recursion_limit=<n>        obsolete synonym for depth_limit
             replace=<string>           specify a replacement string
             startchar                  show startchar when relevant
             startoffset=<n>            same as offset=<n>
@ -1188,32 +1194,31 @@ SUBJECT MODIFIERS
       Providing a stack that is larger than the default 32K is necessary only
       for very complicated patterns.

-   Setting match and recursion limits
+   Setting match and depth limits

-       The match_limit and recursion_limit modifiers set the appropriate  lim-
-       its in the match context. These values are ignored when the find_limits
-       modifier is specified.
+       The match_limit and depth_limit modifiers set the appropriate limits in
+       the  match context. These values are ignored when the find_limits modi-
+       fier is specified.

   Finding minimum limits

       If the find_limits modifier is present, pcre2test  calls  pcre2_match()
       several  times,  setting  different  values  in  the  match context via
-       pcre2_set_match_limit() and pcre2_set_recursion_limit() until it  finds
-       the  minimum values for each parameter that allow pcre2_match() to com-
-       plete without error.
+       pcre2_set_match_limit() and pcre2_set_depth_limit() until it finds  the
+       minimum  values for each parameter that allow pcre2_match() to complete
+       without error.

       If JIT is being used, only the match limit is relevant. If DFA matching
-       is  being used, neither limit is relevant, and this modifier is ignored
-       (with a warning message).
+       is  being  used,  only the depth limit is relevant, but at present this
+       modifier is ignored (with a warning message).

       The match_limit number is a measure of the amount of backtracking  that
       takes  place,  and  learning  the minimum value can be instructive. For
       most simple matches, the number is quite small, but for  patterns  with
       very  large numbers of matching possibilities, it can become large very
-       quickly   with   increasing   length    of    subject    string.    The
-       match_limit_recursion  number  is  a  measure of how much stack (or, if
-       PCRE2 is compiled with NO_RECURSE, how much heap) memory is  needed  to
-       complete the match attempt.
+       quickly with increasing length of subject string. The depth_limit  num-
+       ber  is  a measure of how much memory for recording backtracking points
+       is needed to complete the match attempt.

   Showing MARK names

@ -1314,7 +1319,7 @@ DEFAULT OUTPUT FROM pcre2test
       also output. Here is an example of an interactive pcre2test run.

         $ pcre2test
-         PCRE2 version 9.00 2014-05-10
+         PCRE2 version 10.22 2016-07-29

           re> /^abc(\d+)/
         data> abc123
@ -1614,5 +1619,5 @@ AUTHOR

 REVISION

-       Last updated: 28 December 2016
-       Copyright (c) 1997-2016 University of Cambridge.
+       Last updated: 21 March 2017
+       Copyright (c) 1997-2017 University of Cambridge.