More documentation

2014-10-20 16:48:14 +00:00 · 2014-10-20 16:48:14 +00:00 · 4352f00bb9
parent 0dfe4e5e98
commit 4352f00bb9
11 changed files with 2649 additions and 12 deletions
--- a/Makefile.am
+++ b/Makefile.am
@ -36,6 +36,11 @@ dist_html_DATA = \
  doc/html/pcre2matching.html \
  doc/html/pcre2partial.html \
  doc/html/pcre2pattern.html \
  doc/html/pcre2perform.html \
  doc/html/pcre2posix.html \
  doc/html/pcre2sample.html \
  doc/html/pcre2stack.html \
  doc/html/pcre2syntax.html \
  doc/html/pcre2test.html \
  doc/html/pcre2unicode.html
@ -66,12 +71,7 @@ dist_html_DATA = \
 #  doc/html/pcre2_utf16_to_host_byte_order.html \
 #  doc/html/pcre2_utf32_to_host_byte_order.html \
 #  doc/html/pcre2_version.html \
-#  doc/html/pcre2perform.html \
+#  doc/html/pcre2precompile.html
 #  doc/html/pcre2posix.html \
 #  doc/html/pcre2precompile.html \
 #  doc/html/pcre2sample.html \
 #  doc/html/pcre2stack.html \
 #  doc/html/pcre2syntax.html
 # FIXME
 dist_man_MANS = \
@ -88,6 +88,11 @@ dist_man_MANS = \
  doc/pcre2matching.3 \
  doc/pcre2partial.3 \
  doc/pcre2pattern.3 \
  doc/pcre2perform.3 \
  doc/pcre2posix.3 \
  doc/pcre2sample.3 \
  doc/pcre2stack.3 \
  doc/pcre2syntax.3 \
  doc/pcre2test.1 \
  doc/pcre2unicode.3
@ -120,12 +125,7 @@ dist_man_MANS = \
 #  doc/pcre2_utf16_to_host_byte_order.3 \
 #  doc/pcre2_utf32_to_host_byte_order.3 \
 #  doc/pcre2_version.3 \
-#  doc/pcre2perform.3 \
+#  doc/pcre2precompile.3
 #  doc/pcre2posix.3 \
 #  doc/pcre2precompile.3 \
 #  doc/pcre2sample.3 \
 #  doc/pcre2stack.3 \
 #  doc/pcre2syntax.3
 # The Libtool libraries to install.  We'll add to this later.
--- a/doc/html/pcre2perform.html
+++ b/doc/html/pcre2perform.html
@ -0,0 +1,196 @@
 <html>
 <head>
 <title>pcre2perform specification</title>
 </head>
 <body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
 <h1>pcre2perform man page</h1>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
 </p>
 <p>
 This page is part of the PCRE2 HTML documentation. It was generated
 automatically from the original man page. If there is any nonsense in it,
 please consult the man page, in case the conversion went wrong.
 <br>
 <br><b>
 PCRE2 PERFORMANCE
 </b><br>
 <P>
 Two aspects of performance are discussed below: memory usage and processing
 time. The way you express your pattern as a regular expression can affect both
 of them.
 </P>
 <br><b>
 COMPILED PATTERN MEMORY USAGE
 </b><br>
 <P>
 Patterns are compiled by PCRE2 into a reasonably efficient interpretive code,
 so that most simple patterns do not use much memory. However, there is one case
 where the memory usage of a compiled pattern can be unexpectedly large. If a
 parenthesized subpattern has a quantifier with a minimum greater than 1 and/or
 a limited maximum, the whole subpattern is repeated in the compiled code. For
 example, the pattern
 <pre>
  (abc|def){2,4}
 </pre>
 is compiled as if it were
 <pre>
  (abc|def)(abc|def)((abc|def)(abc|def)?)?
 </pre>
 (Technical aside: It is done this way so that backtrack points within each of
 the repetitions can be independently maintained.)
 </P>
 <P>
 For regular expressions whose quantifiers use only small numbers, this is not
 usually a problem. However, if the numbers are large, and particularly if such
 repetitions are nested, the memory usage can become an embarrassment. For
 example, the very simple pattern
 <pre>
  ((ab){1,1000}c){1,3}
 </pre>
 uses 51K bytes when compiled using the 8-bit library. When PCRE2 is compiled
 with its default internal pointer size of two bytes, the size limit on a
 compiled pattern is 64K code units in the 8-bit and 16-bit libraries, and this
 is reached with the above pattern if the outer repetition is increased from 3
 to 4. PCRE2 can be compiled to use larger internal pointers and thus handle
 larger compiled patterns, but it is better to try to rewrite your pattern to
 use less memory if you can.
 </P>
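 <P>
 As a rough illustration of the replication described above, this C sketch
 counts how many copies of the innermost group end up in the compiled code when
 two quantifiers with limited maxima are nested. The helper name
 <b>replicated_copies()</b> is hypothetical, and the model assumes only what
 the text states: each limited maximum repeats the whole subpattern that many
 times. It does not attempt to predict the size in bytes.
 </P>

```c
/* Hypothetical helper: number of copies of the innermost group that appear
   in the compiled code when a group with maximum `inner_max` sits inside a
   group with maximum `outer_max`. Assumes both maxima are limited. */
unsigned long replicated_copies(unsigned long inner_max,
                                unsigned long outer_max)
{
    /* The inner group is written out inner_max times, and that whole
       expansion is then written out outer_max times. */
    return inner_max * outer_max;
}
```

 <P>
 Under this model, (abc|def){2,4} is a single level with four copies
 (<b>replicated_copies(4, 1)</b>), while ((ab){1,1000}c){1,3} yields 3000
 copies of (ab), which is why the nested form grows so quickly.
 </P>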
 <P>
 One way of reducing the memory usage for such patterns is to make use of
 PCRE2's
 <a href="pcre2pattern.html#subpatternsassubroutines">"subroutine"</a>
 facility. Re-writing the above pattern as
 <pre>
  ((ab)(?2){0,999}c)(?1){0,2}
 </pre>
 reduces the memory requirements to 18K, and indeed it remains under 20K even
 with the outer repetition increased to 100. However, this pattern is not
 exactly equivalent, because the "subroutine" calls are treated as
 <a href="pcre2pattern.html#atomicgroup">atomic groups</a>
 into which there can be no backtracking if there is a subsequent matching
 failure. Therefore, PCRE2 cannot do this kind of rewriting automatically.
 Furthermore, there is a noticeable loss of speed when executing the modified
 pattern. Nevertheless, if the atomic grouping is not a problem and the loss of
 speed is acceptable, this kind of rewriting will allow you to process patterns
 that PCRE2 cannot otherwise handle.
 </P>
 <br><b>
 STACK USAGE AT RUN TIME
 </b><br>
 <P>
 When <b>pcre2_match()</b> is used for matching, certain kinds of pattern can
 cause it to use large amounts of the process stack. In some environments the
 default process stack is quite small, and if it runs out the result is often
 SIGSEGV. Rewriting your pattern can often help. The
 <a href="pcre2stack.html"><b>pcre2stack</b></a>
 documentation discusses this issue in detail.
 </P>
 <br><b>
 PROCESSING TIME
 </b><br>
 <P>
 Certain items in regular expression patterns are processed more efficiently
 than others. It is more efficient to use a character class like [aeiou] than a
 set of single-character alternatives such as (a|e|i|o|u). In general, the
 simplest construction that provides the required behaviour is usually the most
 efficient. Jeffrey Friedl's book "Mastering Regular Expressions" contains a lot
 of useful general discussion about optimizing regular expressions for efficient
 performance. This document contains a few observations about PCRE2.
 </P>
 <P>
 Using Unicode character properties (the \p, \P, and \X escapes) is slow,
 because PCRE2 has to use a multi-stage table lookup whenever it needs a
 character's property. If you can find an alternative pattern that does not use
 character properties, it will probably be faster.
 </P>
 <P>
 By default, the escape sequences \b, \d, \s, and \w, and the POSIX
 character classes such as [:alpha:] do not use Unicode properties, partly for
 backwards compatibility, and partly for performance reasons. However, you can
 set the PCRE2_UCP option or start the pattern with (*UCP) if you want Unicode
 character properties to be used. This can double the matching time for items
 such as \d, when matched with <b>pcre2_match()</b>; the performance loss is
 less with a DFA matching function, and in both cases there is not much
 difference for \b.
 </P>
 <P>
 When a pattern begins with .* not in parentheses, or in parentheses that are
 not the subject of a backreference, and the PCRE2_DOTALL option is set, the
 pattern is implicitly anchored by PCRE2, since it can match only at the start
 of a subject string. However, if PCRE2_DOTALL is not set, PCRE2 cannot make
 this optimization, because the dot metacharacter does not then match a newline,
 and if the subject string contains newlines, the pattern may match from the
 character immediately following one of them instead of from the very start. For
 example, the pattern
 <pre>
  .*second
 </pre>
 matches the subject "first\nand second" (where \n stands for a newline
 character), with the match starting at the seventh character. In order to do
 this, PCRE2 has to retry the match starting after every newline in the subject.
 </P>
 <P>
 If you are using such a pattern with subject strings that do not contain
 newlines, the best performance is obtained by setting PCRE2_DOTALL, or starting
 the pattern with ^.* or ^.*? to indicate explicit anchoring. That saves PCRE2
 from having to scan along the subject looking for a newline to restart at.
 </P>
 <P>
 Beware of patterns that contain nested indefinite repeats. These can take a
 long time to run when applied to a string that does not match. Consider the
 pattern fragment
 <pre>
  ^(a+)*
 </pre>
 This can match "aaaa" in 16 different ways, and this number increases very
 rapidly as the string gets longer. (The * repeat can match 0, 1, 2, 3, or 4
 times, and for each of those cases other than 0 or 4, the + repeats can match
 different numbers of times.) When the remainder of the pattern is such that the
 entire match is going to fail, PCRE2 has in principle to try every possible
 variation, and this can take an extremely long time, even for relatively short
 strings.
 </P>
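 <P>
 The growth described above can be checked with a small counting sketch in C.
 The function name <b>count_match_paths()</b> is hypothetical, and the counting
 model is an interpretation of the text: a "way" is any split of some prefix of
 the run of "a" characters into one or more non-empty pieces, plus the empty
 match where the * repeat matches zero times.
 </P>

```c
#include <stdint.h>

/* Hypothetical helper: number of distinct ways ^(a+)* can consume a prefix
   of a run of n 'a' characters. Returns 0 if n is too large to count in
   64 bits. */
uint64_t count_match_paths(unsigned n)
{
    uint64_t comp[64] = {0};  /* comp[k]: ways to split exactly k 'a's */
    uint64_t total = 0;

    if (n > 63)
        return 0;

    comp[0] = 1;              /* the * repeat matches zero times */
    for (unsigned k = 1; k <= n; k++)
        for (unsigned j = 1; j <= k; j++)
            comp[k] += comp[k - j];   /* last a+ iteration takes j chars */

    for (unsigned k = 0; k <= n; k++)
        total += comp[k];     /* any prefix length is a candidate */

    return total;
}
```

 <P>
 For four characters this gives the 16 ways quoted above, and the count doubles
 with every additional character, so a run of 20 characters already has over a
 million candidate paths to reject.
 </P>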
 <P>
 An optimization catches some of the more simple cases such as
 <pre>
  (a+)*b
 </pre>
 where a literal character follows. Before embarking on the standard matching
 procedure, PCRE2 checks that there is a "b" later in the subject string, and if
 there is not, it fails the match immediately. However, when there is no
 following literal this optimization cannot be used. You can see the difference
 by comparing the behaviour of
 <pre>
  (a+)*\d
 </pre>
 with the pattern above. The pattern ending in the literal "b" fails almost
 instantly when applied to a whole line of "a" characters, whereas the pattern
 ending in \d takes an appreciable time with strings longer than about 20
 characters.
 </P>
 <P>
 In many cases, the solution to this kind of performance issue is to use an
 atomic group or a possessive quantifier.
 </P>
 <br><b>
 AUTHOR
 </b><br>
 <P>
 Philip Hazel
 <br>
 University Computing Service
 <br>
 Cambridge CB2 3QH, England.
 <br>
 </P>
 <br><b>
 REVISION
 </b><br>
 <P>
 Last updated: 20 October 2014
 <br>
 Copyright &copy; 1997-2014 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
 </p>
--- a/doc/html/pcre2posix.html
+++ b/doc/html/pcre2posix.html
@ -0,0 +1,292 @@
 <html>
 <head>
 <title>pcre2posix specification</title>
 </head>
 <body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
 <h1>pcre2posix man page</h1>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
 </p>
 <p>
 This page is part of the PCRE2 HTML documentation. It was generated
 automatically from the original man page. If there is any nonsense in it,
 please consult the man page, in case the conversion went wrong.
 <br>
 <ul>
 <li><a name="TOC1" href="#SEC1">SYNOPSIS</a>
 <li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
 <li><a name="TOC3" href="#SEC3">COMPILING A PATTERN</a>
 <li><a name="TOC4" href="#SEC4">MATCHING NEWLINE CHARACTERS</a>
 <li><a name="TOC5" href="#SEC5">MATCHING A PATTERN</a>
 <li><a name="TOC6" href="#SEC6">ERROR MESSAGES</a>
 <li><a name="TOC7" href="#SEC7">MEMORY USAGE</a>
 <li><a name="TOC8" href="#SEC8">AUTHOR</a>
 <li><a name="TOC9" href="#SEC9">REVISION</a>
 </ul>
 <br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
 <P>
 <b>#include &#60;pcre2posix.h&#62;</b>
 </P>
 <P>
 <b>int regcomp(regex_t *<i>preg</i>, const char *<i>pattern</i>,</b>
 <b>     int <i>cflags</i>);</b>
 <br>
 <br>
 <b>int regexec(const regex_t *<i>preg</i>, const char *<i>string</i>,</b>
 <b>     size_t <i>nmatch</i>, regmatch_t <i>pmatch</i>[], int <i>eflags</i>);</b>
 <br>
 <br>
 <b>size_t regerror(int <i>errcode</i>, const regex_t *<i>preg</i>,</b>
 <b>     char *<i>errbuf</i>, size_t <i>errbuf_size</i>);</b>
 <br>
 <br>
 <b>void regfree(regex_t *<i>preg</i>);</b>
 </P>
 <br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
 <P>
 This set of functions provides a POSIX-style API for the PCRE2 regular
 expression 8-bit library. See the
 <a href="pcre2api.html"><b>pcre2api</b></a>
 documentation for a description of PCRE2's native API, which contains much
 additional functionality. There is no POSIX-style wrapper for PCRE2's 16-bit
 and 32-bit libraries.
 </P>
 <P>
 The functions described here are just wrapper functions that ultimately call
 the PCRE2 native API. Their prototypes are defined in the <b>pcre2posix.h</b>
 header file, and on Unix systems the library itself is called
 <b>libpcre2-posix.a</b>, so can be accessed by adding <b>-lpcre2-posix</b> to the
 command for linking an application that uses them. Because the POSIX functions
 call the native ones, it is also necessary to add <b>-lpcre2-8</b>.
 </P>
 <P>
 Those POSIX option bits that can reasonably be mapped to PCRE2 native options
 have been implemented. In addition, the option REG_EXTENDED is defined with the
 value zero. This has no effect, but since programs that are written to the
 POSIX interface often use it, this makes it easier to slot in PCRE2 as a
 replacement library. Other POSIX options are not even defined.
 </P>
 <P>
 There are also some other options that are not defined by POSIX. These have
 been added at the request of users who want to make use of certain
 PCRE2-specific features via the POSIX calling interface.
 </P>
 <P>
 When PCRE2 is called via these functions, it is only the API that is POSIX-like
 in style. The syntax and semantics of the regular expressions themselves are
 still those of Perl, subject to the setting of various PCRE2 options, as
 described below. "POSIX-like in style" means that the API approximates to the
 POSIX definition; it is not fully POSIX-compatible, and in multi-unit encoding
 domains it is probably even less compatible.
 </P>
 <P>
 The header for these functions is supplied as <b>pcre2posix.h</b> to avoid any
 potential clash with other POSIX libraries. It can, of course, be renamed or
 aliased as <b>regex.h</b>, which is the "correct" name. It provides two
 structure types, <i>regex_t</i> for compiled internal forms, and
 <i>regmatch_t</i> for returning captured substrings. It also defines some
 constants whose names start with "REG_"; these are used for setting options and
 identifying error codes.
 </P>
 <br><a name="SEC3" href="#TOC1">COMPILING A PATTERN</a><br>
 <P>
 The function <b>regcomp()</b> is called to compile a pattern into an
 internal form. The pattern is a C string terminated by a binary zero, and
 is passed in the argument <i>pattern</i>. The <i>preg</i> argument is a pointer
 to a <b>regex_t</b> structure that is used as a base for storing information
 about the compiled regular expression.
 </P>
 <P>
 The argument <i>cflags</i> is either zero, or contains one or more of the bits
 defined by the following macros:
 <pre>
  REG_DOTALL
 </pre>
 The PCRE2_DOTALL option is set when the regular expression is passed for
 compilation to the native function. Note that REG_DOTALL is not part of the
 POSIX standard.
 <pre>
  REG_ICASE
 </pre>
 The PCRE2_CASELESS option is set when the regular expression is passed for
 compilation to the native function.
 <pre>
  REG_NEWLINE
 </pre>
 The PCRE2_MULTILINE option is set when the regular expression is passed for
 compilation to the native function. Note that this does <i>not</i> mimic the
 defined POSIX behaviour for REG_NEWLINE (see the following section).
 <pre>
  REG_NOSUB
 </pre>
 The PCRE2_NO_AUTO_CAPTURE option is set when the regular expression is passed
 for compilation to the native function. In addition, when a pattern that is
 compiled with this flag is passed to <b>regexec()</b> for matching, the
 <i>nmatch</i> and <i>pmatch</i> arguments are ignored, and no captured strings
 are returned.
 <pre>
  REG_UCP
 </pre>
 The PCRE2_UCP option is set when the regular expression is passed for
 compilation to the native function. This causes PCRE2 to use Unicode properties
 when matching \d, \w, etc., instead of just recognizing ASCII values. Note
 that REG_UCP is not part of the POSIX standard.
 <pre>
  REG_UNGREEDY
 </pre>
 The PCRE2_UNGREEDY option is set when the regular expression is passed for
 compilation to the native function. Note that REG_UNGREEDY is not part of the
 POSIX standard.
 <pre>
  REG_UTF
 </pre>
 The PCRE2_UTF option is set when the regular expression is passed for
 compilation to the native function. This causes the pattern itself and all data
 strings used for matching it to be treated as UTF-8 strings. Note that REG_UTF
 is not part of the POSIX standard.
 </P>
 <P>
 In the absence of these flags, no options are passed to the native function.
 This means that the regex is compiled with PCRE2 default semantics. In
 particular, the way it handles newline characters in the subject string is the
 Perl way, not the POSIX way. Note that setting PCRE2_MULTILINE has only
 <i>some</i> of the effects specified for REG_NEWLINE. It does not affect the way
 newlines are matched by the dot metacharacter (they are not) or by a negative
 class such as [^a] (they are).
 </P>
 <P>
 The yield of <b>regcomp()</b> is zero on success, and non-zero otherwise. The
 <i>preg</i> structure is filled in on success, and one member of the structure
 is public: <i>re_nsub</i> contains the number of capturing subpatterns in
 the regular expression. Various error codes are defined in the header file.
 </P>
 <P>
 NOTE: If the yield of <b>regcomp()</b> is non-zero, you must not attempt to
 use the contents of the <i>preg</i> structure. If, for example, you pass it to
 <b>regexec()</b>, the result is undefined and your program is likely to crash.
 </P>
 <br><a name="SEC4" href="#TOC1">MATCHING NEWLINE CHARACTERS</a><br>
 <P>
 This area is not simple, because POSIX and Perl take different views of things.
 It is not possible to get PCRE2 to obey POSIX semantics, but then PCRE2 was
 never intended to be a POSIX engine. The following table lists the different
 possibilities for matching newline characters in PCRE2:
 <pre>
                          Default   Change with
  . matches newline          no     PCRE2_DOTALL
  newline matches [^a]       yes    not changeable
  $ matches \n at end        yes    PCRE2_DOLLAR_ENDONLY
  $ matches \n in middle     no     PCRE2_MULTILINE
  ^ matches \n in middle     no     PCRE2_MULTILINE
 </pre>
 This is the equivalent table for POSIX:
 <pre>
                          Default   Change with
  . matches newline          yes    REG_NEWLINE
  newline matches [^a]       yes    REG_NEWLINE
  $ matches \n at end        no     REG_NEWLINE
  $ matches \n in middle     no     REG_NEWLINE
  ^ matches \n in middle     no     REG_NEWLINE
 </pre>
 PCRE2's behaviour is the same as Perl's, except that there is no equivalent for
 PCRE2_DOLLAR_ENDONLY in Perl. In both PCRE2 and Perl, there is no way to stop
 newline from matching [^a].
 </P>
 <P>
 The default POSIX newline handling can be obtained by setting PCRE2_DOTALL and
 PCRE2_DOLLAR_ENDONLY, but there is no way to make PCRE2 behave exactly as for
 the REG_NEWLINE action.
 </P>
 <br><a name="SEC5" href="#TOC1">MATCHING A PATTERN</a><br>
 <P>
 The function <b>regexec()</b> is called to match a compiled pattern <i>preg</i>
 against a given <i>string</i>, which is by default terminated by a zero byte
 (but see REG_STARTEND below), subject to the options in <i>eflags</i>. These can
 be:
 <pre>
  REG_NOTBOL
 </pre>
 The PCRE2_NOTBOL option is set when calling the underlying PCRE2 matching
 function.
 <pre>
  REG_NOTEMPTY
 </pre>
 The PCRE2_NOTEMPTY option is set when calling the underlying PCRE2 matching
 function. Note that REG_NOTEMPTY is not part of the POSIX standard. However,
 setting this option can give more POSIX-like behaviour in some situations.
 <pre>
  REG_NOTEOL
 </pre>
 The PCRE2_NOTEOL option is set when calling the underlying PCRE2 matching
 function.
 <pre>
  REG_STARTEND
 </pre>
 The string is considered to start at <i>string</i> + <i>pmatch[0].rm_so</i> and
 to have a terminating NUL located at <i>string</i> + <i>pmatch[0].rm_eo</i>
 (there need not actually be a NUL at that location), regardless of the value of
 <i>nmatch</i>. This is a BSD extension, compatible with but not specified by
 IEEE Standard 1003.2 (POSIX.2), and should be used with caution in software
 intended to be portable to other systems. Note that a non-zero <i>rm_so</i> does
 not imply REG_NOTBOL; REG_STARTEND affects only the location of the string, not
 how it is matched.
 </P>
 <P>
 If the pattern was compiled with the REG_NOSUB flag, no data about any matched
 strings is returned. The <i>nmatch</i> and <i>pmatch</i> arguments of
 <b>regexec()</b> are ignored.
 </P>
 <P>
 If the value of <i>nmatch</i> is zero, or if the value of <i>pmatch</i> is NULL,
 no data about any matched strings is returned.
 </P>
 <P>
 Otherwise, the portion of the string that was matched, and also any captured
 substrings, are returned via the <i>pmatch</i> argument, which points to an
 array of <i>nmatch</i> structures of type <i>regmatch_t</i>, containing the
 members <i>rm_so</i> and <i>rm_eo</i>. These contain the byte offset to the first
 character of each substring and the offset to the first character after the end
 of each substring, respectively. The 0th element of the vector relates to the
 entire portion of <i>string</i> that was matched; subsequent elements relate to
 the capturing subpatterns of the regular expression. Unused entries in the
 array have both structure members set to -1.
 </P>
 <P>
 A successful match yields a zero return; various error codes are defined in the
 header file, of which REG_NOMATCH is the "expected" failure code.
 </P>
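 <P>
 A minimal sketch of the calling sequence described above, written in C. It is
 an illustration rather than code from the PCRE2 distribution: the helper name
 <b>first_match()</b> is hypothetical, and the sketch includes the system
 &#60;regex.h&#62; as a stand-in so that it builds without PCRE2 installed. To
 use the wrapper described on this page, the assumption is that you include
 <b>pcre2posix.h</b> instead and link with <b>-lpcre2-posix -lpcre2-8</b>.
 REG_EXTENDED is passed because the system library needs it for alternation;
 in <b>pcre2posix.h</b> it is defined as zero and has no effect.
 </P>

```c
#include <regex.h>   /* stand-in; this page documents pcre2posix.h */

/* Hypothetical helper: find the first match of `pattern` in `subject`.
   Returns 0 and fills *whole with the offsets of the entire match,
   otherwise returns the non-zero code from regcomp() or regexec(). */
int first_match(const char *pattern, const char *subject, regmatch_t *whole)
{
    regex_t re;
    regmatch_t pmatch[1];     /* element 0 describes the whole match */
    int rc = regcomp(&re, pattern, REG_EXTENDED);

    if (rc != 0)
        return rc;            /* re must not be used after a failure */

    rc = regexec(&re, subject, 1, pmatch, 0);
    if (rc == 0)
        *whole = pmatch[0];   /* rm_so: first byte, rm_eo: one past the end */

    regfree(&re);             /* release memory attached to re */
    return rc;
}
```

 <P>
 Called with the pattern "cat|dog" and the subject "the cat sat on the mat",
 the helper reports a match from offset 4 up to, but not including, offset 7.
 A subject with no match yields REG_NOMATCH.
 </P>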
 <br><a name="SEC6" href="#TOC1">ERROR MESSAGES</a><br>
 <P>
 The <b>regerror()</b> function maps a non-zero errorcode from either
 <b>regcomp()</b> or <b>regexec()</b> to a printable message. If <i>preg</i> is not
 NULL, the error should have arisen from the use of that structure. A message
 terminated by a binary zero is placed in <i>errbuf</i>. The length of the
 message, including the zero, is limited to <i>errbuf_size</i>. The yield of the
 function is the size of buffer needed to hold the whole message.
 </P>
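 <P>
 The following C sketch shows one way to turn a failed compilation into a
 printable message. The helper name <b>compile_checked()</b> is hypothetical,
 and the system &#60;regex.h&#62; is again used as a stand-in for
 <b>pcre2posix.h</b> so that the sketch builds without PCRE2. It passes NULL
 for <i>preg</i> when calling <b>regerror()</b>, which avoids touching a
 structure that <b>regcomp()</b> failed to fill in.
 </P>

```c
#include <regex.h>   /* stand-in; this page documents pcre2posix.h */
#include <stddef.h>

/* Hypothetical helper: compile `pattern` and discard the result.
   On failure the error text is copied into errbuf (truncated to fit).
   Returns the regcomp() result, which is zero on success. */
int compile_checked(const char *pattern, char *errbuf, size_t errbuf_size)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED);

    if (rc != 0) {
        /* Map the error code to a zero-terminated message. */
        regerror(rc, NULL, errbuf, errbuf_size);
        return rc;
    }

    if (errbuf_size > 0)
        errbuf[0] = '\0';     /* no error: leave an empty message */

    regfree(&re);             /* free memory allocated by regcomp() */
    return 0;
}
```

 <P>
 Because <b>regerror()</b> returns the buffer size needed for the whole
 message, a caller that wants the untruncated text could compare that return
 value with <i>errbuf_size</i> and retry with a larger buffer.
 </P>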
 <br><a name="SEC7" href="#TOC1">MEMORY USAGE</a><br>
 <P>
 Compiling a regular expression causes memory to be allocated and associated
 with the <i>preg</i> structure. The function <b>regfree()</b> frees all such
 memory, after which <i>preg</i> may no longer be used as a compiled expression.
 </P>
 <br><a name="SEC8" href="#TOC1">AUTHOR</a><br>
 <P>
 Philip Hazel
 <br>
 University Computing Service
 <br>
 Cambridge CB2 3QH, England.
 <br>
 </P>
 <br><a name="SEC9" href="#TOC1">REVISION</a><br>
 <P>
 Last updated: 20 October 2014
 <br>
 Copyright &copy; 1997-2014 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
 </p>
--- a/doc/html/pcre2sample.html
+++ b/doc/html/pcre2sample.html
@ -0,0 +1,106 @@
 <html>
 <head>
 <title>pcre2sample specification</title>
 </head>
 <body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
 <h1>pcre2sample man page</h1>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
 </p>
 <p>
 This page is part of the PCRE2 HTML documentation. It was generated
 automatically from the original man page. If there is any nonsense in it,
 please consult the man page, in case the conversion went wrong.
 <br>
 <br><b>
 PCRE2 SAMPLE PROGRAM
 </b><br>
 <P>
 A simple, complete demonstration program to get you started with using PCRE2 is
 supplied in the file <i>pcre2demo.c</i> in the <b>src</b> directory in the PCRE2
 distribution. A listing of this program is given in the
 <a href="pcre2demo.html"><b>pcre2demo</b></a>
 documentation. If you do not have a copy of the PCRE2 distribution, you can
 save this listing to re-create the contents of <i>pcre2demo.c</i>.
 </P>
 <P>
 The demonstration program, which uses the PCRE2 8-bit library, compiles the
 regular expression that is its first argument, and matches it against the
 subject string in its second argument. No PCRE2 options are set, and default
 character tables are used. If matching succeeds, the program outputs the
 portion of the subject that matched, together with the contents of any captured
 substrings.
 </P>
 <P>
 If the -g option is given on the command line, the program then goes on to
 check for further matches of the same regular expression in the same subject
 string. The logic is a little bit tricky because of the possibility of matching
 an empty string. Comments in the code explain what is going on.
 </P>
 <P>
 If PCRE2 is installed in the standard include and library directories for your
 operating system, you should be able to compile the demonstration program using
 this command:
 <pre>
  gcc -o pcre2demo pcre2demo.c -lpcre2-8
 </pre>
 If PCRE2 is installed elsewhere, you may need to add additional options to the
 command line. For example, on a Unix-like system that has PCRE2 installed in
 <i>/usr/local</i>, you can compile the demonstration program using a command
 like this:
 <pre>
  gcc -o pcre2demo -I/usr/local/include pcre2demo.c -L/usr/local/lib -lpcre2-8
 </pre>
 </P>
 <P>
 Once you have compiled and linked the demonstration program, you can run simple
 tests like this:
 <pre>
  ./pcre2demo 'cat|dog' 'the cat sat on the mat'
  ./pcre2demo -g 'cat|dog' 'the dog sat on the cat'
 </pre>
 Note that there is a much more comprehensive test program, called
 <a href="pcre2test.html"><b>pcre2test</b>,</a>
 which supports many more facilities for testing regular expressions using the
 PCRE2 libraries. The
 <a href="pcre2demo.html"><b>pcre2demo</b></a>
 program is provided as a simple coding example.
 </P>
 <P>
 If you try to run
 <a href="pcre2demo.html"><b>pcre2demo</b></a>
 when PCRE2 is not installed in the standard library directory, you may get an
 error like this on some operating systems (e.g. Solaris):
 <pre>
  ld.so.1: a.out: fatal: libpcre2.so.0: open failed: No such file or directory
 </pre>
 This is caused by the way shared library support works on those systems. You
 need to add
 <pre>
  -R/usr/local/lib
 </pre>
 (for example) to the compile command to get round this problem.
 </P>
 <br><b>
 AUTHOR
 </b><br>
 <P>
 Philip Hazel
 <br>
 University Computing Service
 <br>
 Cambridge CB2 3QH, England.
 <br>
 </P>
 <br><b>
 REVISION
 </b><br>
 <P>
 Last updated: 20 October 2014
 <br>
 Copyright &copy; 1997-2014 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
 </p>
--- a/doc/html/pcre2stack.html
+++ b/doc/html/pcre2stack.html
@ -0,0 +1,203 @@
 <html>
 <head>
 <title>pcre2stack specification</title>
 </head>
 <body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
 <h1>pcre2stack man page</h1>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
 </p>
 <p>
 This page is part of the PCRE2 HTML documentation. It was generated
 automatically from the original man page. If there is any nonsense in it,
 please consult the man page, in case the conversion went wrong.
 <br>
 <br><b>
 PCRE2 DISCUSSION OF STACK USAGE
 </b><br>
 <P>
 When you call <b>pcre2_match()</b>, it makes use of an internal function called
 <b>match()</b>. This calls itself recursively at branch points in the pattern,
 in order to remember the state of the match so that it can back up and try a
 different alternative after a failure. As matching proceeds deeper and deeper
 into the tree of possibilities, the recursion depth increases. The
 <b>match()</b> function is also called in other circumstances, for example,
 whenever a parenthesized sub-pattern is entered, and in certain cases of
 repetition.
 </P>
 <P>
 Not all calls of <b>match()</b> increase the recursion depth; for an item such
 as a* it may be called several times at the same level, after matching
 different numbers of a's. Furthermore, in a number of cases where the result of
 the recursive call would immediately be passed back as the result of the
 current call (a "tail recursion"), the function is just restarted instead.
 </P>
 <P>
 The above comments apply when <b>pcre2_match()</b> is run in its normal
 interpretive manner. If the compiled pattern was processed by
 <b>pcre2_jit_compile()</b>, and just-in-time compiling was successful, and the
 options passed to <b>pcre2_match()</b> were not incompatible, the matching
 process uses the JIT-compiled code instead of the <b>match()</b> function. In
 this case, the memory requirements are handled entirely differently. See the
 <a href="pcre2jit.html"><b>pcre2jit</b></a>
 documentation for details.
 </P>
 <P>
 The <b>pcre2_dfa_match()</b> function operates in a different way to
 <b>pcre2_match()</b>, and uses recursion only when there is a regular expression
 recursion or subroutine call in the pattern. This includes the processing of
 assertion and "once-only" subpatterns, which are handled like subroutine calls.
 Normally, these are never very deep, and the limit on the complexity of
 <b>pcre2_dfa_match()</b> is controlled by the amount of workspace it is given.
 However, it is possible to write patterns with runaway infinite recursions;
 such patterns will cause <b>pcre2_dfa_match()</b> to run out of stack. At
 present, there is no protection against this.
 </P>
 <P>
 The comments that follow do NOT apply to <b>pcre2_dfa_match()</b>; they are
 relevant only for <b>pcre2_match()</b> without the JIT optimization.
 </P>
 <br><b>
 Reducing <b>pcre2_match()</b>'s stack usage
 </b><br>
 <P>
 Each time that the internal <b>match()</b> function is called recursively, it
 uses memory from the process stack. For certain kinds of pattern and data, very
 large amounts of stack may be needed, despite the recognition of "tail
 recursion". You can often reduce the amount of recursion, and therefore the
 amount of stack used, by modifying the pattern that is being matched. Consider,
 for example, this pattern:
 <pre>
  ([^&#60;]|&#60;(?!inet))+
 </pre>
 It matches from wherever it starts until it encounters "&#60;inet" or the end of
 the data, and is the kind of pattern that might be used when processing an XML
 file. Each iteration of the outer parentheses matches either one character that
 is not "&#60;" or a "&#60;" that is not followed by "inet". However, each time a
 parenthesis is processed, a recursion occurs, so this formulation uses a stack
 frame for each matched character. For a long string, a lot of stack is
 required. Consider now this rewritten pattern, which matches exactly the same
 strings:
 <pre>
  ([^&#60;]++|&#60;(?!inet))+
 </pre>
 This uses very much less stack, because runs of characters that do not contain
 "&#60;" are "swallowed" in one item inside the parentheses. Recursion happens only
 when a "&#60;" character that is not followed by "inet" is encountered (and we
 assume this is relatively rare). A possessive quantifier is used to stop any
 backtracking into the runs of non-"&#60;" characters, but that is not related to
 stack usage.
 </P>
 <P>
 This example shows that one way of avoiding stack problems when matching long
 subject strings is to write repeated parenthesized subpatterns to match more
 than one character whenever possible.
 </P>
 <br><b>
 Compiling PCRE2 to use heap instead of stack for <b>pcre2_match()</b>
 </b><br>
 <P>
 In environments where stack memory is constrained, you might want to compile
 PCRE2 to use heap memory instead of stack for remembering back-up points when
 <b>pcre2_match()</b> is running. This makes it run more slowly, however. Details
 of how to do this are given in the
 <a href="pcre2build.html"><b>pcre2build</b></a>
 documentation. When built in this way, instead of using the stack, PCRE2
 gets memory for remembering backup points from the heap. By default, the memory
 is obtained by calling the system <b>malloc()</b> function, but you can arrange
 to supply your own memory management function. For details, see the section
 entitled
 <a href="pcre2api.html#matchcontext">"The match context"</a>
 in the
 <a href="pcre2api.html"><b>pcre2api</b></a>
 documentation. Since the block sizes are always the same, it may be possible to
 implement a customized memory handler that is more efficient than the standard
 function. The memory blocks obtained for this purpose are retained and re-used
 if possible while <b>pcre2_match()</b> is running. They are all freed just
 before it exits.
 </P>
 <br><b>
 Limiting <b>pcre2_match()</b>'s stack usage
 </b><br>
 <P>
 You can set limits on the number of times the internal <b>match()</b> function
 is called, both in total and recursively. If a limit is exceeded,
 <b>pcre2_match()</b> returns an error code. Setting suitable limits should
 prevent it from running out of stack. The default values of the limits are very
 large, and unlikely ever to operate. They can be changed when PCRE2 is built,
 and they can also be set when <b>pcre2_match()</b> is called. For details of
 these interfaces, see the
 <a href="pcre2build.html"><b>pcre2build</b></a>
 documentation and the section entitled
 <a href="pcre2api.html#matchcontext">"The match context"</a>
 in the
 <a href="pcre2api.html"><b>pcre2api</b></a>
 documentation.
 </P>
 <P>
 As a very rough rule of thumb, you should reckon on about 500 bytes per
 recursion. Thus, if you want to limit your stack usage to 8Mb, you should set
 the limit at 16000 recursions. A 64Mb stack, on the other hand, can support
 around 128000 recursions.
 </P>
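 <P>
 The arithmetic behind this rule of thumb can be sketched in C. This is only an
 illustration: the function name and the 500-byte figure passed to it are
 assumptions taken from the estimate above, not part of the PCRE2 API, and the
 real cost per recursion depends on the compiler and platform.
 </P>

```c
#include <assert.h>

/* Hypothetical helper: the largest recursion limit that should keep stack
   usage within stack_bytes, given an assumed cost per recursion. */
unsigned long recursion_limit_for_stack(unsigned long stack_bytes,
                                        unsigned long bytes_per_recursion)
{
  assert(bytes_per_recursion > 0);   /* avoid division by zero */
  return stack_bytes / bytes_per_recursion;
}
```

 <P>
 With the estimate of 500 bytes, a budget of 8000000 bytes gives 16000 and a
 budget of 64000000 bytes gives 128000, matching the figures quoted above.
 </P>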
 <P>
 The <b>pcre2test</b> test program has a modifier called "find_limits" which, if
 applied to a subject line, causes it to find the smallest limits that allow a
 pattern to match. This is done by calling <b>pcre2_match()</b> repeatedly with
 different limits.
 </P>
 <br><b>
 Changing stack size in Unix-like systems
 </b><br>
 <P>
 In Unix-like environments, there is not often a problem with the stack unless
 very long strings are involved, though the default limit on stack size varies
 from system to system. Values from 8Mb to 64Mb are common. You can find your
 default limit by running the command:
 <pre>
  ulimit -s
 </pre>
 Unfortunately, the effect of running out of stack is often SIGSEGV, though
 sometimes a more explicit error message is given. You can normally increase the
 limit on stack size by code such as this:
 <pre>
  struct rlimit rlim;
  getrlimit(RLIMIT_STACK, &rlim);
  rlim.rlim_cur = 100*1024*1024;
  setrlimit(RLIMIT_STACK, &rlim);
 </pre>
 This reads the current limits (soft and hard) using <b>getrlimit()</b>, then
 attempts to increase the soft limit to 100Mb using <b>setrlimit()</b>. You must
 do this before calling <b>pcre2_match()</b>.
 </P>
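 <P>
 The fragment above omits the header and any error checking. A slightly fuller
 sketch, assuming a POSIX system, is shown here. The function name is
 hypothetical; it clamps the request to the hard limit, because an unprivileged
 process cannot normally raise the soft limit beyond it, and it leaves the
 limit alone if it is already large enough.
 </P>

```c
#include <assert.h>
#include <sys/resource.h>

/* Hypothetical helper: try to make the soft stack limit at least `wanted`
   bytes. Returns 0 on success, or -1 if getrlimit() or setrlimit() fails
   (errno is set by the failing call). */
int ensure_stack_soft_limit(rlim_t wanted)
{
  struct rlimit rl;

  assert(wanted > 0);
  if (getrlimit(RLIMIT_STACK, &rl) != 0) return -1;

  /* Do not ask for more than the hard limit allows. */
  if (rl.rlim_max != RLIM_INFINITY && wanted > rl.rlim_max)
    wanted = rl.rlim_max;

  /* Nothing to do if the current soft limit is already large enough. */
  if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur >= wanted) return 0;

  rl.rlim_cur = wanted;
  return setrlimit(RLIMIT_STACK, &rl);
}
```

 <P>
 A program might call ensure_stack_soft_limit(100*1024*1024) once, early in
 main(), before any call to <b>pcre2_match()</b>.
 </P>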
 <br><b>
 Changing stack size in Mac OS X
 </b><br>
 <P>
 Using <b>setrlimit()</b>, as described above, should also work on Mac OS X. It
 is also possible to set a stack size when linking a program. There is a
 discussion about stack sizes in Mac OS X at this web site:
 <a href="http://developer.apple.com/qa/qa2005/qa1419.html">http://developer.apple.com/qa/qa2005/qa1419.html</a>.
 </P>
 <br><b>
 AUTHOR
 </b><br>
 <P>
 Philip Hazel
 <br>
 University Computing Service
 <br>
 Cambridge CB2 3QH, England.
 <br>
 </P>
 <br><b>
 REVISION
 </b><br>
 <P>
 Last updated: 20 October 2014
 <br>
 Copyright &copy; 1997-2014 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
 </p>
--- a/doc/html/pcre2syntax.html
+++ b/doc/html/pcre2syntax.html
@ -0,0 +1,561 @@
 <html>
 <head>
 <title>pcre2syntax specification</title>
 </head>
 <body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
 <h1>pcre2syntax man page</h1>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
 </p>
 <p>
 This page is part of the PCRE2 HTML documentation. It was generated
 automatically from the original man page. If there is any nonsense in it,
 please consult the man page, in case the conversion went wrong.
 <br>
 <ul>
 <li><a name="TOC1" href="#SEC1">PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY</a>
 <li><a name="TOC2" href="#SEC2">QUOTING</a>
 <li><a name="TOC3" href="#SEC3">CHARACTERS</a>
 <li><a name="TOC4" href="#SEC4">CHARACTER TYPES</a>
 <li><a name="TOC5" href="#SEC5">GENERAL CATEGORY PROPERTIES FOR \p and \P</a>
 <li><a name="TOC6" href="#SEC6">PCRE2 SPECIAL CATEGORY PROPERTIES FOR \p and \P</a>
 <li><a name="TOC7" href="#SEC7">SCRIPT NAMES FOR \p AND \P</a>
 <li><a name="TOC8" href="#SEC8">CHARACTER CLASSES</a>
 <li><a name="TOC9" href="#SEC9">QUANTIFIERS</a>
 <li><a name="TOC10" href="#SEC10">ANCHORS AND SIMPLE ASSERTIONS</a>
 <li><a name="TOC11" href="#SEC11">MATCH POINT RESET</a>
 <li><a name="TOC12" href="#SEC12">ALTERNATION</a>
 <li><a name="TOC13" href="#SEC13">CAPTURING</a>
 <li><a name="TOC14" href="#SEC14">ATOMIC GROUPS</a>
 <li><a name="TOC15" href="#SEC15">COMMENT</a>
 <li><a name="TOC16" href="#SEC16">OPTION SETTING</a>
 <li><a name="TOC17" href="#SEC17">NEWLINE CONVENTION</a>
 <li><a name="TOC18" href="#SEC18">WHAT \R MATCHES</a>
 <li><a name="TOC19" href="#SEC19">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
 <li><a name="TOC20" href="#SEC20">BACKREFERENCES</a>
 <li><a name="TOC21" href="#SEC21">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
 <li><a name="TOC22" href="#SEC22">CONDITIONAL PATTERNS</a>
 <li><a name="TOC23" href="#SEC23">BACKTRACKING CONTROL</a>
 <li><a name="TOC24" href="#SEC24">CALLOUTS</a>
 <li><a name="TOC25" href="#SEC25">SEE ALSO</a>
 <li><a name="TOC26" href="#SEC26">AUTHOR</a>
 <li><a name="TOC27" href="#SEC27">REVISION</a>
 </ul>
 <br><a name="SEC1" href="#TOC1">PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY</a><br>
 <P>
 The full syntax and semantics of the regular expressions that are supported by
 PCRE2 are described in the
 <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
 documentation. This document contains a quick-reference summary of the syntax.
 </P>
 <br><a name="SEC2" href="#TOC1">QUOTING</a><br>
 <P>
 <pre>
  \x         where x is non-alphanumeric is a literal x
  \Q...\E    treat enclosed characters as literal
 </PRE>
 </P>
 <br><a name="SEC3" href="#TOC1">CHARACTERS</a><br>
 <P>
 <pre>
  \a         alarm, that is, the BEL character (hex 07)
  \cx        "control-x", where x is any ASCII character
  \e         escape (hex 1B)
  \f         form feed (hex 0C)
  \n         newline (hex 0A)
  \r         carriage return (hex 0D)
  \t         tab (hex 09)
  \0dd       character with octal code 0dd
  \ddd       character with octal code ddd, or backreference
  \o{ddd..}  character with octal code ddd..
  \xhh       character with hex code hh
  \x{hhh..}  character with hex code hhh..
 </pre>
 Note that \0dd is always an octal code, and that \8 and \9 are the literal
 characters "8" and "9".
 </P>
 <br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br>
 <P>
 <pre>
  .          any character except newline;
               in dotall mode, any character whatsoever
  \C         one data unit, even in UTF mode (best avoided)
  \d         a decimal digit
  \D         a character that is not a decimal digit
  \h         a horizontal white space character
  \H         a character that is not a horizontal white space character
  \N         a character that is not a newline
  \p{<i>xx</i>}     a character with the <i>xx</i> property
  \P{<i>xx</i>}     a character without the <i>xx</i> property
  \R         a newline sequence
  \s         a white space character
  \S         a character that is not a white space character
  \v         a vertical white space character
  \V         a character that is not a vertical white space character
  \w         a "word" character
  \W         a "non-word" character
  \X         a Unicode extended grapheme cluster
 </pre>
 By default, \d, \s, and \w match only ASCII characters, even in UTF-8 mode
 or in the 16-bit and 32-bit libraries. However, if locale-specific matching is
 happening, \s and \w may also match characters with code points in the range
 128-255. If the PCRE2_UCP option is set, the behaviour of these escape
 sequences is changed to use Unicode properties and they match many more
 characters.
 </P>
 <br><a name="SEC5" href="#TOC1">GENERAL CATEGORY PROPERTIES FOR \p and \P</a><br>
 <P>
 <pre>
  C          Other
  Cc         Control
  Cf         Format
  Cn         Unassigned
  Co         Private use
  Cs         Surrogate
  L          Letter
  Ll         Lower case letter
  Lm         Modifier letter
  Lo         Other letter
  Lt         Title case letter
  Lu         Upper case letter
  L&         Ll, Lu, or Lt
  M          Mark
  Mc         Spacing mark
  Me         Enclosing mark
  Mn         Non-spacing mark
  N          Number
  Nd         Decimal number
  Nl         Letter number
  No         Other number
  P          Punctuation
  Pc         Connector punctuation
  Pd         Dash punctuation
  Pe         Close punctuation
  Pf         Final punctuation
  Pi         Initial punctuation
  Po         Other punctuation
  Ps         Open punctuation
  S          Symbol
  Sc         Currency symbol
  Sk         Modifier symbol
  Sm         Mathematical symbol
  So         Other symbol
  Z          Separator
  Zl         Line separator
  Zp         Paragraph separator
  Zs         Space separator
 </PRE>
 </P>
 <br><a name="SEC6" href="#TOC1">PCRE2 SPECIAL CATEGORY PROPERTIES FOR \p and \P</a><br>
 <P>
 <pre>
  Xan        Alphanumeric: union of properties L and N
  Xps        POSIX space: property Z or tab, NL, VT, FF, CR
  Xsp        Perl space: property Z or tab, NL, VT, FF, CR
  Xuc        Universally-named character: one that can be
               represented by a Universal Character Name
  Xwd        Perl word: property Xan or underscore
 </pre>
 Perl and POSIX space are now the same. Perl added VT to its space character set
 at release 5.18.
 </P>
 <br><a name="SEC7" href="#TOC1">SCRIPT NAMES FOR \p AND \P</a><br>
 <P>
 Arabic,
 Armenian,
 Avestan,
 Balinese,
 Bamum,
 Bassa_Vah,
 Batak,
 Bengali,
 Bopomofo,
 Brahmi,
 Braille,
 Buginese,
 Buhid,
 Canadian_Aboriginal,
 Carian,
 Caucasian_Albanian,
 Chakma,
 Cham,
 Cherokee,
 Common,
 Coptic,
 Cuneiform,
 Cypriot,
 Cyrillic,
 Deseret,
 Devanagari,
 Duployan,
 Egyptian_Hieroglyphs,
 Elbasan,
 Ethiopic,
 Georgian,
 Glagolitic,
 Gothic,
 Grantha,
 Greek,
 Gujarati,
 Gurmukhi,
 Han,
 Hangul,
 Hanunoo,
 Hebrew,
 Hiragana,
 Imperial_Aramaic,
 Inherited,
 Inscriptional_Pahlavi,
 Inscriptional_Parthian,
 Javanese,
 Kaithi,
 Kannada,
 Katakana,
 Kayah_Li,
 Kharoshthi,
 Khmer,
 Khojki,
 Khudawadi,
 Lao,
 Latin,
 Lepcha,
 Limbu,
 Linear_A,
 Linear_B,
 Lisu,
 Lycian,
 Lydian,
 Mahajani,
 Malayalam,
 Mandaic,
 Manichaean,
 Meetei_Mayek,
 Mende_Kikakui,
 Meroitic_Cursive,
 Meroitic_Hieroglyphs,
 Miao,
 Modi,
 Mongolian,
 Mro,
 Myanmar,
 Nabataean,
 New_Tai_Lue,
 Nko,
 Ogham,
 Ol_Chiki,
 Old_Italic,
 Old_North_Arabian,
 Old_Permic,
 Old_Persian,
 Old_South_Arabian,
 Old_Turkic,
 Oriya,
 Osmanya,
 Pahawh_Hmong,
 Palmyrene,
 Pau_Cin_Hau,
 Phags_Pa,
 Phoenician,
 Psalter_Pahlavi,
 Rejang,
 Runic,
 Samaritan,
 Saurashtra,
 Sharada,
 Shavian,
 Siddham,
 Sinhala,
 Sora_Sompeng,
 Sundanese,
 Syloti_Nagri,
 Syriac,
 Tagalog,
 Tagbanwa,
 Tai_Le,
 Tai_Tham,
 Tai_Viet,
 Takri,
 Tamil,
 Telugu,
 Thaana,
 Thai,
 Tibetan,
 Tifinagh,
 Tirhuta,
 Ugaritic,
 Vai,
 Warang_Citi,
 Yi.
 </P>
 <br><a name="SEC8" href="#TOC1">CHARACTER CLASSES</a><br>
 <P>
 <pre>
  [...]       positive character class
  [^...]      negative character class
  [x-y]       range (can be used for hex characters)
  [[:xxx:]]   positive POSIX named set
  [[:^xxx:]]  negative POSIX named set
  alnum       alphanumeric
  alpha       alphabetic
  ascii       0-127
  blank       space or tab
  cntrl       control character
  digit       decimal digit
  graph       printing, excluding space
  lower       lower case letter
  print       printing, including space
  punct       printing, excluding alphanumeric
  space       white space
  upper       upper case letter
  word        same as \w
  xdigit      hexadecimal digit
 </pre>
 In PCRE2, POSIX character set names recognize only ASCII characters by default,
 but some of them use Unicode properties if PCRE2_UCP is set. You can use
 \Q...\E inside a character class.
 </P>
 <br><a name="SEC9" href="#TOC1">QUANTIFIERS</a><br>
 <P>
 <pre>
  ?           0 or 1, greedy
  ?+          0 or 1, possessive
  ??          0 or 1, lazy
  *           0 or more, greedy
  *+          0 or more, possessive
  *?          0 or more, lazy
  +           1 or more, greedy
  ++          1 or more, possessive
  +?          1 or more, lazy
  {n}         exactly n
  {n,m}       at least n, no more than m, greedy
  {n,m}+      at least n, no more than m, possessive
  {n,m}?      at least n, no more than m, lazy
  {n,}        n or more, greedy
  {n,}+       n or more, possessive
  {n,}?       n or more, lazy
 </PRE>
 </P>
 <br><a name="SEC10" href="#TOC1">ANCHORS AND SIMPLE ASSERTIONS</a><br>
 <P>
 <pre>
  \b          word boundary
  \B          not a word boundary
  ^           start of subject
               also after internal newline in multiline mode
  \A          start of subject
  $           end of subject
               also before newline at end of subject
               also before internal newline in multiline mode
  \Z          end of subject
               also before newline at end of subject
  \z          end of subject
  \G          first matching position in subject
 </PRE>
 </P>
 <br><a name="SEC11" href="#TOC1">MATCH POINT RESET</a><br>
 <P>
 <pre>
  \K          reset start of match
 </pre>
 \K is honoured in positive assertions, but ignored in negative ones.
 </P>
 <br><a name="SEC12" href="#TOC1">ALTERNATION</a><br>
 <P>
 <pre>
  expr|expr|expr...
 </PRE>
 </P>
 <br><a name="SEC13" href="#TOC1">CAPTURING</a><br>
 <P>
 <pre>
  (...)           capturing group
  (?&#60;name&#62;...)    named capturing group (Perl)
  (?'name'...)    named capturing group (Perl)
  (?P&#60;name&#62;...)   named capturing group (Python)
  (?:...)         non-capturing group
  (?|...)         non-capturing group; reset group numbers for
                   capturing groups in each alternative
 </PRE>
 </P>
 <br><a name="SEC14" href="#TOC1">ATOMIC GROUPS</a><br>
 <P>
 <pre>
  (?&#62;...)         atomic, non-capturing group
 </PRE>
 </P>
 <br><a name="SEC15" href="#TOC1">COMMENT</a><br>
 <P>
 <pre>
  (?#....)        comment (not nestable)
 </PRE>
 </P>
 <br><a name="SEC16" href="#TOC1">OPTION SETTING</a><br>
 <P>
 <pre>
  (?i)            caseless
  (?J)            allow duplicate names
  (?m)            multiline
  (?s)            single line (dotall)
  (?U)            default ungreedy (lazy)
  (?x)            extended (ignore white space)
  (?-...)         unset option(s)
 </pre>
 The following are recognized only at the very start of a pattern or after one
 of the newline or \R options with similar syntax. More than one of them may
 appear.
 <pre>
  (*LIMIT_MATCH=d) set the match limit to d (decimal number)
  (*LIMIT_RECURSION=d) set the recursion limit to d (decimal number)
  (*NOTEMPTY)     set PCRE2_NOTEMPTY when matching
  (*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching 
  (*NO_AUTO_POSSESS) no auto-possessification (PCRE2_NO_AUTO_POSSESS)
  (*NO_START_OPT) no start-match optimization (PCRE2_NO_START_OPTIMIZE)
  (*UTF)          set appropriate UTF mode for the library in use
  (*UCP)          set PCRE2_UCP (use Unicode properties for \d etc)
 </pre>
 Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of the
 limits set by the caller of pcre2_match(), not increase them.
 </P>
 <br><a name="SEC17" href="#TOC1">NEWLINE CONVENTION</a><br>
 <P>
 These are recognized only at the very start of the pattern or after option
 settings with a similar syntax.
 <pre>
  (*CR)           carriage return only
  (*LF)           linefeed only
  (*CRLF)         carriage return followed by linefeed
  (*ANYCRLF)      all three of the above
  (*ANY)          any Unicode newline sequence
 </PRE>
 </P>
 <br><a name="SEC18" href="#TOC1">WHAT \R MATCHES</a><br>
 <P>
 These are recognized only at the very start of the pattern or after option
 settings with a similar syntax.
 <pre>
  (*BSR_ANYCRLF)  CR, LF, or CRLF
  (*BSR_UNICODE)  any Unicode newline sequence
 </PRE>
 </P>
 <br><a name="SEC19" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
 <P>
 <pre>
  (?=...)         positive look ahead
  (?!...)         negative look ahead
  (?&#60;=...)        positive look behind
  (?&#60;!...)        negative look behind
 </pre>
 Each top-level branch of a look behind must be of a fixed length.
 </P>
 <br><a name="SEC20" href="#TOC1">BACKREFERENCES</a><br>
 <P>
 <pre>
  \n              reference by number (can be ambiguous)
  \gn             reference by number
  \g{n}           reference by number
  \g{-n}          relative reference by number
  \k&#60;name&#62;        reference by name (Perl)
  \k'name'        reference by name (Perl)
  \g{name}        reference by name (Perl)
  \k{name}        reference by name (.NET)
  (?P=name)       reference by name (Python)
 </PRE>
 </P>
 <br><a name="SEC21" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
 <P>
 <pre>
  (?R)            recurse whole pattern
  (?n)            call subpattern by absolute number
  (?+n)           call subpattern by relative number
  (?-n)           call subpattern by relative number
  (?&name)        call subpattern by name (Perl)
  (?P&#62;name)       call subpattern by name (Python)
  \g&#60;name&#62;        call subpattern by name (Oniguruma)
  \g'name'        call subpattern by name (Oniguruma)
  \g&#60;n&#62;           call subpattern by absolute number (Oniguruma)
  \g'n'           call subpattern by absolute number (Oniguruma)
  \g&#60;+n&#62;          call subpattern by relative number (PCRE2 extension)
  \g'+n'          call subpattern by relative number (PCRE2 extension)
  \g&#60;-n&#62;          call subpattern by relative number (PCRE2 extension)
  \g'-n'          call subpattern by relative number (PCRE2 extension)
 </PRE>
 </P>
 <br><a name="SEC22" href="#TOC1">CONDITIONAL PATTERNS</a><br>
 <P>
 <pre>
  (?(condition)yes-pattern)
  (?(condition)yes-pattern|no-pattern)
  (?(n)...        absolute reference condition
  (?(+n)...       relative reference condition
  (?(-n)...       relative reference condition
  (?(&#60;name&#62;)...   named reference condition (Perl)
  (?('name')...   named reference condition (Perl)
  (?(name)...     named reference condition (PCRE2)
  (?(R)...        overall recursion condition
  (?(Rn)...       specific group recursion condition
  (?(R&name)...   specific recursion condition
  (?(DEFINE)...   define subpattern for reference
  (?(assert)...   assertion condition
 </PRE>
 </P>
 <br><a name="SEC23" href="#TOC1">BACKTRACKING CONTROL</a><br>
 <P>
 The following act immediately they are reached:
 <pre>
  (*ACCEPT)       force successful match
  (*FAIL)         force backtrack; synonym (*F)
  (*MARK:NAME)    set name to be passed back; synonym (*:NAME)
 </pre>
 The following act only when a subsequent match failure causes a backtrack to
 reach them. They all force a match failure, but they differ in what happens
 afterwards. Those that advance the start-of-match point do so only if the
 pattern is not anchored.
 <pre>
  (*COMMIT)       overall failure, no advance of starting point
  (*PRUNE)        advance to next starting character
  (*PRUNE:NAME)   equivalent to (*MARK:NAME)(*PRUNE)
  (*SKIP)         advance to current matching position
  (*SKIP:NAME)    advance to position corresponding to an earlier
                  (*MARK:NAME); if not found, the (*SKIP) is ignored
  (*THEN)         local failure, backtrack to next alternation
  (*THEN:NAME)    equivalent to (*MARK:NAME)(*THEN)
 </PRE>
 </P>
 <br><a name="SEC24" href="#TOC1">CALLOUTS</a><br>
 <P>
 <pre>
  (?C)      callout
  (?Cn)     callout with data n
 </PRE>
 </P>
 <br><a name="SEC25" href="#TOC1">SEE ALSO</a><br>
 <P>
 <b>pcre2pattern</b>(3), <b>pcre2api</b>(3), <b>pcre2callout</b>(3),
 <b>pcre2matching</b>(3), <b>pcre2</b>(3).
 </P>
 <br><a name="SEC26" href="#TOC1">AUTHOR</a><br>
 <P>
 Philip Hazel
 <br>
 University Computing Service
 <br>
 Cambridge CB2 3QH, England.
 <br>
 </P>
 <br><a name="SEC27" href="#TOC1">REVISION</a><br>
 <P>
 Last updated: 20 October 2014
 <br>
 Copyright &copy; 1997-2014 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
 </p>
--- a/doc/pcre2perform.3
+++ b/doc/pcre2perform.3
@ -0,0 +1,178 @@
 .TH PCRE2PERFORM 3 "20 October 2014" "PCRE2 10.00"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 PERFORMANCE"
 .rs
 .sp
 Two aspects of performance are discussed below: memory usage and processing
 time. The way you express your pattern as a regular expression can affect both
 of them.
 .
 .SH "COMPILED PATTERN MEMORY USAGE"
 .rs
 .sp
 Patterns are compiled by PCRE2 into a reasonably efficient interpretive code,
 so that most simple patterns do not use much memory. However, there is one case
 where the memory usage of a compiled pattern can be unexpectedly large. If a
 parenthesized subpattern has a quantifier with a minimum greater than 1 and/or
 a limited maximum, the whole subpattern is repeated in the compiled code. For
 example, the pattern
 .sp
  (abc|def){2,4}
 .sp
 is compiled as if it were
 .sp
  (abc|def)(abc|def)((abc|def)(abc|def)?)?
 .sp
 (Technical aside: It is done this way so that backtrack points within each of
 the repetitions can be independently maintained.)
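 .P
 To make the growth in size concrete, the following C sketch builds the textual
 expansion shown above for a parenthesized group with a {min,max} quantifier.
 It is an illustration only: the function name is hypothetical, it mirrors the
 "as if" rewriting in this document rather than PCRE2's internal compiled form,
 and it assumes that the group argument is already enclosed in parentheses.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: write the expansion of group{min,max} into out.
   Returns 0 on success, or -1 if the result does not fit in out_size bytes. */
int expand_bounded_repeat(const char *group, unsigned min, unsigned max,
                          char *out, size_t out_size)
{
  size_t glen = strlen(group);
  size_t need;
  unsigned optional, i;
  char *p = out;

  assert(min <= max);
  optional = max - min;

  /* max copies of the group, the nesting characters, and the terminator */
  need = (size_t)max * glen
       + (optional > 0 ? 3u * (size_t)optional - 2u : 0u) + 1u;
  if (need > out_size) return -1;

  /* The mandatory copies come first. */
  for (i = 0; i < min; i++) { memcpy(p, group, glen); p += glen; }

  /* Each optional copy except the last opens a new nested group. */
  for (i = 0; i + 1 < optional; i++) {
    *p++ = '(';
    memcpy(p, group, glen); p += glen;
  }

  /* The innermost optional copy is simply followed by "?". */
  if (optional > 0) { memcpy(p, group, glen); p += glen; *p++ = '?'; }

  /* Close the nested groups, making each one optional. */
  for (i = 0; i + 1 < optional; i++) { *p++ = ')'; *p++ = '?'; }

  *p = 0;
  return 0;
}
```

 Called with the group (abc|def), a minimum of 2 and a maximum of 4, this
 produces the expansion given above. The output length grows in proportion to
 the maximum, which is why large or nested bounds are costly.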
 .P
 For regular expressions whose quantifiers use only small numbers, this is not
 usually a problem. However, if the numbers are large, and particularly if such
 repetitions are nested, the memory usage can become an embarrassment. For
 example, the very simple pattern
 .sp
  ((ab){1,1000}c){1,3}
 .sp
 uses 51K bytes when compiled using the 8-bit library. When PCRE2 is compiled
 with its default internal pointer size of two bytes, the size limit on a
 compiled pattern is 64K code units in the 8-bit and 16-bit libraries, and this
 is reached with the above pattern if the outer repetition is increased from 3
 to 4. PCRE2 can be compiled to use larger internal pointers and thus handle
 larger compiled patterns, but it is better to try to rewrite your pattern to
 use less memory if you can.
 .P
 One way of reducing the memory usage for such patterns is to make use of
 PCRE2's
 .\" HTML <a href="pcre2pattern.html#subpatternsassubroutines">
 .\" </a>
 "subroutine"
 .\"
 facility. Re-writing the above pattern as
 .sp
  ((ab)(?2){0,999}c)(?1){0,2}
 .sp
 reduces the memory requirements to 18K, and indeed it remains under 20K even
 with the outer repetition increased to 100. However, this pattern is not
 exactly equivalent, because the "subroutine" calls are treated as
 .\" HTML <a href="pcre2pattern.html#atomicgroup">
 .\" </a>
 atomic groups
 .\"
 into which there can be no backtracking if there is a subsequent matching
 failure. Therefore, PCRE2 cannot do this kind of rewriting automatically.
 Furthermore, there is a noticeable loss of speed when executing the modified
 pattern. Nevertheless, if the atomic grouping is not a problem and the loss of
 speed is acceptable, this kind of rewriting will allow you to process patterns
 that PCRE2 cannot otherwise handle.
 .
 .
 .SH "STACK USAGE AT RUN TIME"
 .rs
 .sp
 When \fBpcre2_match()\fP is used for matching, certain kinds of pattern can
 cause it to use large amounts of the process stack. In some environments the
 default process stack is quite small, and if it runs out the result is often
 SIGSEGV. Rewriting your pattern can often help. The
 .\" HREF
 \fBpcre2stack\fP
 .\"
 documentation discusses this issue in detail.
 .
 .
 .SH "PROCESSING TIME"
 .rs
 .sp
 Certain items in regular expression patterns are processed more efficiently
 than others. It is more efficient to use a character class like [aeiou] than a
 set of single-character alternatives such as (a|e|i|o|u). In general, the
 simplest construction that provides the required behaviour is usually the most
 efficient. Jeffrey Friedl's book contains a lot of useful general discussion
 about optimizing regular expressions for efficient performance. This document
 contains a few observations about PCRE2.
 .P
 Using Unicode character properties (the \ep, \eP, and \eX escapes) is slow,
 because PCRE2 has to use a multi-stage table lookup whenever it needs a
 character's property. If you can find an alternative pattern that does not use
 character properties, it will probably be faster.
 .P
 By default, the escape sequences \eb, \ed, \es, and \ew, and the POSIX
 character classes such as [:alpha:] do not use Unicode properties, partly for
 backwards compatibility, and partly for performance reasons. However, you can
 set the PCRE2_UCP option or start the pattern with (*UCP) if you want Unicode
 character properties to be used. This can double the matching time for items
 such as \ed, when matched with \fBpcre2_match()\fP; the performance loss is
 less with a DFA matching function, and in both cases there is not much
 difference for \eb.
 .P
 When a pattern begins with .* not in parentheses, or in parentheses that are
 not the subject of a backreference, and the PCRE2_DOTALL option is set, the
 pattern is implicitly anchored by PCRE2, since it can match only at the start
 of a subject string. However, if PCRE2_DOTALL is not set, PCRE2 cannot make
 this optimization, because the dot metacharacter does not then match a newline,
 and if the subject string contains newlines, the pattern may match from the
 character immediately following one of them instead of from the very start. For
 example, the pattern
 .sp
  .*second
 .sp
 matches the subject "first\enand second" (where \en stands for a newline
 character), with the match starting at the seventh character. In order to do
 this, PCRE2 has to retry the match starting after every newline in the subject.
 .P
 If you are using such a pattern with subject strings that do not contain
 newlines, the best performance is obtained by setting PCRE2_DOTALL, or starting
 the pattern with ^.* or ^.*? to indicate explicit anchoring. That saves PCRE2
 from having to scan along the subject looking for a newline to restart at.
 .P
 Beware of patterns that contain nested indefinite repeats. These can take a
 long time to run when applied to a string that does not match. Consider the
 pattern fragment
 .sp
  ^(a+)*
 .sp
 This can match "aaaa" in 16 different ways, and this number increases very
 rapidly as the string gets longer. (The * repeat can match 0, 1, 2, 3, or 4
 times, and for each of those cases other than 0 or 4, the + repeats can match
 different numbers of times.) When the remainder of the pattern is such that the
 entire match is going to fail, PCRE2 has in principle to try every possible
 variation, and this can take an extremely long time, even for relatively short
 strings.
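 .P
 One way to arrive at the figure of 16 is to count every way of splitting a
 matched prefix of the string into non-empty runs. The C sketch below does this
 by brute force. The function name is hypothetical and the counting model is an
 assumption made for illustration; it is not how PCRE2 itself works, though the
 deliberately naive recursion also takes exponential time, which is the point
 being made.

```c
#include <assert.h>

/* Count the ways ^(a+)* can match at the start of a run of n "a"
   characters. At each point the * repeat either stops (one way), or the
   inner a+ consumes len more characters and the count continues. */
unsigned long count_match_ways(unsigned n)
{
  unsigned long ways = 1;   /* the * repeat stops here */
  unsigned len;

  assert(n <= 24);          /* keep the exponential recursion quick */
  for (len = 1; len <= n; len++)
    ways += count_match_ways(n - len);
  return ways;
}
```

 For n = 4 the result is 16, and each additional character doubles it.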
 .P
 An optimization catches some of the more simple cases such as
 .sp
  (a+)*b
 .sp
 where a literal character follows. Before embarking on the standard matching
 procedure, PCRE2 checks that there is a "b" later in the subject string, and if
 there is not, it fails the match immediately. However, when there is no
 following literal this optimization cannot be used. You can see the difference
 by comparing the behaviour of
 .sp
  (a+)*\ed
 .sp
 with the pattern above. The pattern above fails almost instantly when
 applied to a whole line of "a" characters, whereas the version ending in \ed
 takes an appreciable time with strings longer than about 20 characters.
 .P
 In many cases, the solution to this kind of performance issue is to use an
 atomic group or a possessive quantifier.
 .
 .
 .SH AUTHOR
 .rs
 .sp
 .nf
 Philip Hazel
 University Computing Service
 Cambridge CB2 3QH, England.
 .fi
 .
 .
 .SH REVISION
 .rs
 .sp
 .nf
 Last updated: 20 October 2014
 Copyright (c) 1997-2014 University of Cambridge.
 .fi
--- a/doc/pcre2posix.3
+++ b/doc/pcre2posix.3
@ -0,0 +1,268 @@
 .TH PCRE2POSIX 3 "20 October 2014" "PCRE2 10.00"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "SYNOPSIS"
 .rs
 .sp
 .B #include <pcre2posix.h>
 .PP
 .nf
 .B int regcomp(regex_t *\fIpreg\fP, const char *\fIpattern\fP,
 .B "     int \fIcflags\fP);"
 .sp
 .B int regexec(const regex_t *\fIpreg\fP, const char *\fIstring\fP,
 .B "     size_t \fInmatch\fP, regmatch_t \fIpmatch\fP[], int \fIeflags\fP);"
 .sp
 .B "size_t regerror(int \fIerrcode\fP, const regex_t *\fIpreg\fP,"
 .B "     char *\fIerrbuf\fP, size_t \fIerrbuf_size\fP);"
 .sp
 .B void regfree(regex_t *\fIpreg\fP);
 .fi
 .
 .SH DESCRIPTION
 .rs
 .sp
 This set of functions provides a POSIX-style API for the PCRE2 regular
 expression 8-bit library. See the
 .\" HREF
 \fBpcre2api\fP
 .\"
 documentation for a description of PCRE2's native API, which contains much
 additional functionality. There is no POSIX-style wrapper for PCRE2's 16-bit
 and 32-bit libraries.
 .P
 The functions described here are just wrapper functions that ultimately call
 the PCRE2 native API. Their prototypes are defined in the \fBpcre2posix.h\fP
 header file, and on Unix systems the library itself is called
 \fBlibpcre2-posix.a\fP, so can be accessed by adding \fB-lpcre2-posix\fP to the
 command for linking an application that uses them. Because the POSIX functions
 call the native ones, it is also necessary to add \fB-lpcre2-8\fP.
 .P
 Those POSIX option bits that can reasonably be mapped to PCRE2 native options
 have been implemented. In addition, the option REG_EXTENDED is defined with the
 value zero. This has no effect, but since programs that are written to the
 POSIX interface often use it, this makes it easier to slot in PCRE2 as a
 replacement library. Other POSIX options are not even defined.
 .P
 There are also some other options that are not defined by POSIX. These have
 been added at the request of users who want to make use of certain
 PCRE2-specific features via the POSIX calling interface.
 .P
 When PCRE2 is called via these functions, it is only the API that is POSIX-like
 in style. The syntax and semantics of the regular expressions themselves are
 still those of Perl, subject to the setting of various PCRE2 options, as
 described below. "POSIX-like in style" means that the API approximates to the
 POSIX definition; it is not fully POSIX-compatible, and in multi-unit encoding
 domains it is probably even less compatible.
 .P
 The header for these functions is supplied as \fBpcre2posix.h\fP to avoid any
 potential clash with other POSIX libraries. It can, of course, be renamed or
 aliased as \fBregex.h\fP, which is the "correct" name. It provides two
 structure types, \fIregex_t\fP for compiled internal forms, and
 \fIregmatch_t\fP for returning captured substrings. It also defines some
 constants whose names start with "REG_"; these are used for setting options and
 identifying error codes.
 .
 .
 .SH "COMPILING A PATTERN"
 .rs
 .sp
 The function \fBregcomp()\fP is called to compile a pattern into an
 internal form. The pattern is a C string terminated by a binary zero, and
 is passed in the argument \fIpattern\fP. The \fIpreg\fP argument is a pointer
 to a \fBregex_t\fP structure that is used as a base for storing information
 about the compiled regular expression.
 .P
 The argument \fIcflags\fP is either zero, or contains one or more of the bits
 defined by the following macros:
 .sp
  REG_DOTALL
 .sp
 The PCRE2_DOTALL option is set when the regular expression is passed for
 compilation to the native function. Note that REG_DOTALL is not part of the
 POSIX standard.
 .sp
  REG_ICASE
 .sp
 The PCRE2_CASELESS option is set when the regular expression is passed for
 compilation to the native function.
 .sp
  REG_NEWLINE
 .sp
 The PCRE2_MULTILINE option is set when the regular expression is passed for
 compilation to the native function. Note that this does \fInot\fP mimic the
 defined POSIX behaviour for REG_NEWLINE (see the following section).
 .sp
  REG_NOSUB
 .sp
 The PCRE2_NO_AUTO_CAPTURE option is set when the regular expression is passed
 for compilation to the native function. In addition, when a pattern that is
 compiled with this flag is passed to \fBregexec()\fP for matching, the
 \fInmatch\fP and \fIpmatch\fP arguments are ignored, and no captured strings
 are returned.
 .sp
  REG_UCP
 .sp
 The PCRE2_UCP option is set when the regular expression is passed for
 compilation to the native function. This causes PCRE2 to use Unicode properties
 when matching \ed, \ew, etc., instead of just recognizing ASCII values. Note
 that REG_UCP is not part of the POSIX standard.
 .sp
  REG_UNGREEDY
 .sp
 The PCRE2_UNGREEDY option is set when the regular expression is passed for
 compilation to the native function. Note that REG_UNGREEDY is not part of the
 POSIX standard.
 .sp
  REG_UTF
 .sp
 The PCRE2_UTF option is set when the regular expression is passed for
 compilation to the native function. This causes the pattern itself and all data
 strings used for matching it to be treated as UTF-8 strings. Note that REG_UTF
 is not part of the POSIX standard.
 .P
 In the absence of these flags, no options are passed to the native function.
 This means that the regex is compiled with PCRE2 default semantics. In
 particular, the way it handles newline characters in the subject string is the
 Perl way, not the POSIX way. Note that setting PCRE2_MULTILINE has only
 \fIsome\fP of the effects specified for REG_NEWLINE. It does not affect the way
 newlines are matched by the dot metacharacter (they are not) or by a negative
 class such as [^a] (they are).
 .P
 The yield of \fBregcomp()\fP is zero on success, and non-zero otherwise. The
 \fIpreg\fP structure is filled in on success, and one member of the structure
 is public: \fIre_nsub\fP contains the number of capturing subpatterns in
 the regular expression. Various error codes are defined in the header file.
 .P
 NOTE: If the yield of \fBregcomp()\fP is non-zero, you must not attempt to
 use the contents of the \fIpreg\fP structure. If, for example, you pass it to
 \fBregexec()\fP, the result is undefined and your program is likely to crash.
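 .P
 As a minimal sketch of the sequence described above, the following C function
 compiles a pattern caselessly, reads \fIre_nsub\fP, and then frees the compiled
 form. It includes the system \fBregex.h\fP so that it builds without PCRE2
 installed; with the PCRE2 wrapper the header would be \fBpcre2posix.h\fP. The
 helper name \fBcount_captures()\fP is hypothetical, and REG_EXTENDED is passed
 only for the benefit of the system library; it is not one of the flags listed
 above.

```c
#include <stddef.h>
#include <regex.h>  /* assumption: system header; pcre2posix.h for the PCRE2 wrapper */

/* Hypothetical helper: compile `pattern` caselessly and report how many
   capturing subpatterns it contains. Returns 0 on success, otherwise the
   non-zero regcomp() error code. */
int count_captures(const char *pattern, size_t *nsub)
{
    regex_t preg;
    int rc = regcomp(&preg, pattern, REG_EXTENDED | REG_ICASE);

    if (rc != 0)
        return rc;          /* the contents of preg must not be used after a failure */

    *nsub = preg.re_nsub;   /* the one public member of the structure */
    regfree(&preg);         /* release the memory associated with preg */
    return 0;
}
```

 A caller might check the return value first and look at the capture count only
 when it is zero.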
 .
 .
 .SH "MATCHING NEWLINE CHARACTERS"
 .rs
 .sp
 This area is not simple, because POSIX and Perl take different views of things.
 It is not possible to get PCRE2 to obey POSIX semantics, but then PCRE2 was
 never intended to be a POSIX engine. The following table lists the different
 possibilities for matching newline characters in PCRE2:
 .sp
                          Default   Change with
 .sp
  . matches newline          no     PCRE2_DOTALL
  newline matches [^a]       yes    not changeable
  $ matches \en at end        yes    PCRE2_DOLLAR_ENDONLY
  $ matches \en in middle     no     PCRE2_MULTILINE
  ^ matches \en in middle     no     PCRE2_MULTILINE
 .sp
 This is the equivalent table for POSIX:
 .sp
                          Default   Change with
 .sp
  . matches newline          yes    REG_NEWLINE
  newline matches [^a]       yes    REG_NEWLINE
  $ matches \en at end        no     REG_NEWLINE
  $ matches \en in middle     no     REG_NEWLINE
  ^ matches \en in middle     no     REG_NEWLINE
 .sp
 PCRE2's behaviour is the same as Perl's, except that there is no equivalent for
 PCRE2_DOLLAR_ENDONLY in Perl. In both PCRE2 and Perl, there is no way to stop
 newline from matching [^a].
 .P
 The default POSIX newline handling can be obtained by setting PCRE2_DOTALL and
 PCRE2_DOLLAR_ENDONLY, but there is no way to make PCRE2 behave exactly as for
 the REG_NEWLINE action.
 .
 .
 .SH "MATCHING A PATTERN"
 .rs
 .sp
 The function \fBregexec()\fP is called to match a compiled pattern \fIpreg\fP
 against a given \fIstring\fP, which is by default terminated by a zero byte
 (but see REG_STARTEND below), subject to the options in \fIeflags\fP. These can
 be:
 .sp
  REG_NOTBOL
 .sp
 The PCRE2_NOTBOL option is set when calling the underlying PCRE2 matching
 function.
 .sp
  REG_NOTEMPTY
 .sp
 The PCRE2_NOTEMPTY option is set when calling the underlying PCRE2 matching
 function. Note that REG_NOTEMPTY is not part of the POSIX standard. However,
 setting this option can give more POSIX-like behaviour in some situations.
 .sp
  REG_NOTEOL
 .sp
 The PCRE2_NOTEOL option is set when calling the underlying PCRE2 matching
 function.
 .sp
  REG_STARTEND
 .sp
 The string is considered to start at \fIstring\fP + \fIpmatch[0].rm_so\fP and
 to have a terminating NUL located at \fIstring\fP + \fIpmatch[0].rm_eo\fP
 (there need not actually be a NUL at that location), regardless of the value of
 \fInmatch\fP. This is a BSD extension, compatible with but not specified by
 IEEE Standard 1003.2 (POSIX.2), and should be used with caution in software
 intended to be portable to other systems. Note that a non-zero \fIrm_so\fP does
 not imply REG_NOTBOL; REG_STARTEND affects only the location of the string, not
 how it is matched.
 .P
 If the pattern was compiled with the REG_NOSUB flag, no data about any matched
 strings is returned. The \fInmatch\fP and \fIpmatch\fP arguments of
 \fBregexec()\fP are ignored.
 .P
 If the value of \fInmatch\fP is zero, or if the value of \fIpmatch\fP is NULL,
 no data about any matched strings is returned.
 .P
 Otherwise, the portion of the string that was matched, and also any captured
 substrings, are returned via the \fIpmatch\fP argument, which points to an
 array of \fInmatch\fP structures of type \fIregmatch_t\fP, containing the
 members \fIrm_so\fP and \fIrm_eo\fP. These contain the byte offset to the first
 character of each substring and the offset to the first character after the end
 of each substring, respectively. The 0th element of the vector relates to the
 entire portion of \fIstring\fP that was matched; subsequent elements relate to
 the capturing subpatterns of the regular expression. Unused entries in the
 array have both structure members set to -1.
 .P
 A successful match yields a zero return; various error codes are defined in the
 header file, of which REG_NOMATCH is the "expected" failure code.
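 .P
 The way offsets come back through \fIpmatch\fP can be illustrated with a short
 sketch. It assumes the system \fBregex.h\fP (substitute \fBpcre2posix.h\fP when
 building against the PCRE2 wrapper), and the helper name
 \fBmatch_offsets()\fP is hypothetical.

```c
#include <stddef.h>
#include <regex.h>  /* assumption: system header; pcre2posix.h for the PCRE2 wrapper */

/* Hypothetical helper: match `pattern` against `subject` and store the byte
   offsets of the whole match and of the captured substrings in `pmatch`,
   which must have room for `nmatch` entries. Returns 0 on a match,
   REG_NOMATCH on the expected failure, or another non-zero code on error. */
int match_offsets(const char *pattern, const char *subject,
                  size_t nmatch, regmatch_t *pmatch)
{
    regex_t preg;
    int rc = regcomp(&preg, pattern, REG_EXTENDED);

    if (rc != 0)
        return rc;

    rc = regexec(&preg, subject, nmatch, pmatch, 0);
    regfree(&preg);
    return rc;
}
```

 With the pattern ([0-9]+)-([0-9]+) and the subject "range 10-20", element 0
 covers offsets 6 to 11, elements 1 and 2 cover the two numbers, and any
 further elements are set to -1.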
 .
 .
 .SH "ERROR MESSAGES"
 .rs
 .sp
 The \fBregerror()\fP function maps a non-zero errorcode from either
 \fBregcomp()\fP or \fBregexec()\fP to a printable message. If \fIpreg\fP is not
 NULL, the error should have arisen from the use of that structure. A message
 terminated by a binary zero is placed in \fIerrbuf\fP. The length of the
 message, including the zero, is limited to \fIerrbuf_size\fP. The yield of the
 function is the size of buffer needed to hold the whole message.
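 .P
 Because the yield is the size needed for the whole message, one common idiom
 is to call \fBregerror()\fP twice: once with a zero-sized buffer to learn the
 size, and once to fetch the text. The sketch below assumes the system
 \fBregex.h\fP (or \fBpcre2posix.h\fP with the PCRE2 wrapper); the helper name
 \fBcompile_error_message()\fP is hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <regex.h>  /* assumption: system header; pcre2posix.h for the PCRE2 wrapper */

/* Hypothetical helper: return a malloc'd copy of the message for the error
   that regcomp() reports for `pattern`, or NULL if the pattern compiles or
   memory cannot be obtained. The caller frees the result. */
char *compile_error_message(const char *pattern)
{
    regex_t preg;
    size_t needed;
    char *msg;
    int rc = regcomp(&preg, pattern, REG_EXTENDED);

    if (rc == 0) {
        regfree(&preg);
        return NULL;
    }

    /* With a zero-sized buffer nothing is written; the yield is the size
       needed, including the terminating zero. */
    needed = regerror(rc, &preg, NULL, 0);
    msg = malloc(needed);
    if (msg != NULL)
        regerror(rc, &preg, msg, needed);
    return msg;
}
```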
 .
 .
 .SH MEMORY USAGE
 .rs
 .sp
 Compiling a regular expression causes memory to be allocated and associated
 with the \fIpreg\fP structure. The function \fBregfree()\fP frees all such
 memory, after which \fIpreg\fP may no longer be used as a compiled expression.
 .
 .
 .SH AUTHOR
 .rs
 .sp
 .nf
 Philip Hazel
 University Computing Service
 Cambridge CB2 3QH, England.
 .fi
 .
 .
 .SH REVISION
 .rs
 .sp
 .nf
 Last updated: 20 October 2014
 Copyright (c) 1997-2014 University of Cambridge.
 .fi
--- a/doc/pcre2sample.3
+++ b/doc/pcre2sample.3
@ -0,0 +1,94 @@
 .TH PCRE2SAMPLE 3 "20 October 2014" "PCRE2 10.00"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 SAMPLE PROGRAM"
 .rs
 .sp
 A simple, complete demonstration program to get you started with using PCRE2 is
 supplied in the file \fIpcre2demo.c\fP in the \fBsrc\fP directory in the PCRE2
 distribution. A listing of this program is given in the
 .\" HREF
 \fBpcre2demo\fP
 .\"
 documentation. If you do not have a copy of the PCRE2 distribution, you can
 save this listing to re-create the contents of \fIpcre2demo.c\fP.
 .P
 The demonstration program, which uses the PCRE2 8-bit library, compiles the
 regular expression that is its first argument, and matches it against the
 subject string in its second argument. No PCRE2 options are set, and default
 character tables are used. If matching succeeds, the program outputs the
 portion of the subject that matched, together with the contents of any captured
 substrings.
 .P
 If the -g option is given on the command line, the program then goes on to
 check for further matches of the same regular expression in the same subject
 string. The logic is a little bit tricky because of the possibility of matching
 an empty string. Comments in the code explain what is going on.
 .P
 If PCRE2 is installed in the standard include and library directories for your
 operating system, you should be able to compile the demonstration program using
 this command:
 .sp
  gcc -o pcre2demo pcre2demo.c -lpcre2-8
 .sp
 If PCRE2 is installed elsewhere, you may need to add additional options to the
 command line. For example, on a Unix-like system that has PCRE2 installed in
 \fI/usr/local\fP, you can compile the demonstration program using a command
 like this:
 .sp
 .\" JOINSH
  gcc -o pcre2demo -I/usr/local/include pcre2demo.c \e
      -L/usr/local/lib -lpcre2-8
 .sp
 .P
 Once you have compiled and linked the demonstration program, you can run simple
 tests like this:
 .sp
  ./pcre2demo 'cat|dog' 'the cat sat on the mat'
  ./pcre2demo -g 'cat|dog' 'the dog sat on the cat'
 .sp
 Note that there is a much more comprehensive test program, called
 .\" HREF
 \fBpcre2test\fP,
 .\"
 which supports many more facilities for testing regular expressions using the
 PCRE2 libraries. The
 .\" HREF
 \fBpcre2demo\fP
 .\"
 program is provided as a simple coding example.
 .P
 If you try to run
 .\" HREF
 \fBpcre2demo\fP
 .\"
 when PCRE2 is not installed in the standard library directory, you may get an
 error like this on some operating systems (e.g. Solaris):
 .sp
  ld.so.1: pcre2demo: fatal: libpcre2-8.so.0: open failed: No such file or directory
 .sp
 This is caused by the way shared library support works on those systems. You
 need to add
 .sp
  -R/usr/local/lib
 .sp
 (for example) to the compile command to get round this problem.
 .
 .
 .SH AUTHOR
 .rs
 .sp
 .nf
 Philip Hazel
 University Computing Service
 Cambridge CB2 3QH, England.
 .fi
 .
 .
 .SH REVISION
 .rs
 .sp
 .nf
 Last updated: 20 October 2014
 Copyright (c) 1997-2014 University of Cambridge.
 .fi
--- a/doc/pcre2stack.3
+++ b/doc/pcre2stack.3
@ -0,0 +1,199 @@
 .TH PCRE2STACK 3 "20 October 2014" "PCRE2 10.00"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 DISCUSSION OF STACK USAGE"
 .rs
 .sp
 When you call \fBpcre2_match()\fP, it makes use of an internal function called
 \fBmatch()\fP. This calls itself recursively at branch points in the pattern,
 in order to remember the state of the match so that it can back up and try a
 different alternative after a failure. As matching proceeds deeper and deeper
 into the tree of possibilities, the recursion depth increases. The
 \fBmatch()\fP function is also called in other circumstances, for example,
 whenever a parenthesized sub-pattern is entered, and in certain cases of
 repetition.
 .P
 Not all calls of \fBmatch()\fP increase the recursion depth; for an item such
 as a* it may be called several times at the same level, after matching
 different numbers of a's. Furthermore, in a number of cases where the result of
 the recursive call would immediately be passed back as the result of the
 current call (a "tail recursion"), the function is just restarted instead.
 .P
 The above comments apply when \fBpcre2_match()\fP is run in its normal
 interpretive manner. If the compiled pattern was processed by
 \fBpcre2_jit_compile()\fP, and just-in-time compiling was successful, and the
 options passed to \fBpcre2_match()\fP were not incompatible, the matching
 process uses the JIT-compiled code instead of the \fBmatch()\fP function. In
 this case, the memory requirements are handled entirely differently. See the
 .\" HREF
 \fBpcre2jit\fP
 .\"
 documentation for details.
 .P
 The \fBpcre2_dfa_match()\fP function operates in a different way to
 \fBpcre2_match()\fP, and uses recursion only when there is a regular expression
 recursion or subroutine call in the pattern. This includes the processing of
 assertion and "once-only" subpatterns, which are handled like subroutine calls.
 Normally, these are never very deep, and the limit on the complexity of
 \fBpcre2_dfa_match()\fP is controlled by the amount of workspace it is given.
 However, it is possible to write patterns with runaway infinite recursions;
 such patterns will cause \fBpcre2_dfa_match()\fP to run out of stack. At
 present, there is no protection against this.
 .P
 The comments that follow do NOT apply to \fBpcre2_dfa_match()\fP; they are
 relevant only for \fBpcre2_match()\fP without the JIT optimization.
 .
 .
 .SS "Reducing \fBpcre2_match()\fP's stack usage"
 .rs
 .sp
 Each time that the internal \fBmatch()\fP function is called recursively, it
 uses memory from the process stack. For certain kinds of pattern and data, very
 large amounts of stack may be needed, despite the recognition of "tail
 recursion". You can often reduce the amount of recursion, and therefore the
 amount of stack used, by modifying the pattern that is being matched. Consider,
 for example, this pattern:
 .sp
  ([^<]|<(?!inet))+
 .sp
 It matches from wherever it starts until it encounters "<inet" or the end of
 the data, and is the kind of pattern that might be used when processing an XML
 file. Each iteration of the outer parentheses matches either one character that
 is not "<" or a "<" that is not followed by "inet". However, each time a
 parenthesis is processed, a recursion occurs, so this formulation uses a stack
 frame for each matched character. For a long string, a lot of stack is
 required. Consider now this rewritten pattern, which matches exactly the same
 strings:
 .sp
  ([^<]++|<(?!inet))+
 .sp
 This uses very much less stack, because runs of characters that do not contain
 "<" are "swallowed" in one item inside the parentheses. Recursion happens only
 when a "<" character that is not followed by "inet" is encountered (and we
 assume this is relatively rare). A possessive quantifier is used to stop any
 backtracking into the runs of non-"<" characters, but that is not related to
 stack usage.
 .P
 This example shows that one way of avoiding stack problems when matching long
 subject strings is to write repeated parenthesized subpatterns to match more
 than one character whenever possible.
 .
 .
 .SS "Compiling PCRE2 to use heap instead of stack for \fBpcre2_match()\fP"
 .rs
 .sp
 In environments where stack memory is constrained, you might want to compile
 PCRE2 to use heap memory instead of stack for remembering back-up points when
 \fBpcre2_match()\fP is running. This makes it run more slowly, however. Details
 of how to do this are given in the
 .\" HREF
 \fBpcre2build\fP
 .\"
 documentation. When built in this way, instead of using the stack, PCRE2
 gets memory for remembering backup points from the heap. By default, the memory 
 is obtained by calling the system \fBmalloc()\fP function, but you can arrange 
 to supply your own memory management function. For details, see the section 
 entitled 
 .\" HTML <a href="pcre2api.html#matchcontext">
 .\" </a>
 "The match context"
 .\"
 in the
 .\" HREF
 \fBpcre2api\fP
 .\"
 documentation. Since the block sizes are always the same, it may be possible to
 implement a customized memory handler that is more efficient than the standard
 function. The memory blocks obtained for this purpose are retained and re-used
 if possible while \fBpcre2_match()\fP is running. They are all freed just
 before it exits.
 .
 .
 .SS "Limiting \fBpcre2_match()\fP's stack usage"
 .rs
 .sp
 You can set limits on the number of times the internal \fBmatch()\fP function
 is called, both in total and recursively. If a limit is exceeded,
 \fBpcre2_match()\fP returns an error code. Setting suitable limits should
 prevent it from running out of stack. The default values of the limits are very
 large, and unlikely ever to operate. They can be changed when PCRE2 is built,
 and they can also be set when \fBpcre2_match()\fP is called. For details of
 these interfaces, see the
 .\" HREF
 \fBpcre2build\fP
 .\"
 documentation and the section entitled
 .\" HTML <a href="pcre2api.html#matchcontext">
 .\" </a>
 "The match context"
 .\"
 in the
 .\" HREF
 \fBpcre2api\fP
 .\"
 documentation.
 .P
 As a very rough rule of thumb, you should reckon on about 500 bytes per
 recursion. Thus, if you want to limit your stack usage to 8Mb, you should set
 the limit at 16000 recursions. A 64Mb stack, on the other hand, can support
 around 128000 recursions.
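 .P
 The arithmetic behind these figures is a single division, sketched here in C.
 The 500-byte constant is only the rough rule of thumb from the text, and the
 helper name \fBrecursion_limit_for_stack()\fP is hypothetical. The rounded
 figures above correspond to treating a megabyte as one million bytes.

```c
/* Assumed average stack cost of one match() recursion, taken from the rough
   rule of thumb in the text; real frame sizes vary by build and platform. */
#define BYTES_PER_RECURSION 500ul

/* Hypothetical helper: estimate how many recursions fit in `stack_bytes`. */
unsigned long recursion_limit_for_stack(unsigned long stack_bytes)
{
    return stack_bytes / BYTES_PER_RECURSION;
}
```

 A value computed this way is only a starting point; it may be worth leaving
 headroom for stack used by the rest of the program.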
 .P
 The \fBpcre2test\fP test program has a modifier called "find_limits" which, if
 applied to a subject line, causes it to find the smallest limits that allow a
 pattern to match. This is done by calling \fBpcre2_match()\fP repeatedly with
 different limits.
 .
 .
 .SS "Changing stack size in Unix-like systems"
 .rs
 .sp
 In Unix-like environments, there is not often a problem with the stack unless
 very long strings are involved, though the default limit on stack size varies
 from system to system. Values from 8Mb to 64Mb are common. You can find your
 default limit by running the command:
 .sp
  ulimit -s
 .sp
 Unfortunately, the effect of running out of stack is often SIGSEGV, though
 sometimes a more explicit error message is given. You can normally increase the
 limit on stack size by code such as this:
 .sp
  struct rlimit rlim;
  getrlimit(RLIMIT_STACK, &rlim);
  rlim.rlim_cur = 100*1024*1024;
  setrlimit(RLIMIT_STACK, &rlim);
 .sp
 This reads the current limits (soft and hard) using \fBgetrlimit()\fP, then
 attempts to increase the soft limit to 100Mb using \fBsetrlimit()\fP. You must
 do this before calling \fBpcre2_match()\fP.
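 .P
 A slightly more defensive variant of the same idea is sketched below. It
 checks the return values, leaves the limit alone when it is already large
 enough, and does not ask for more than the hard limit. The helper name
 \fBensure_stack_limit()\fP is hypothetical.

```c
#include <sys/resource.h>

/* Hypothetical helper: make sure the soft stack limit is at least `wanted`
   bytes, without exceeding the hard limit. Returns 0 on success and -1 if
   getrlimit() or setrlimit() fails. */
int ensure_stack_limit(rlim_t wanted)
{
    struct rlimit rlim;

    if (getrlimit(RLIMIT_STACK, &rlim) != 0)
        return -1;

    if (rlim.rlim_cur == RLIM_INFINITY || rlim.rlim_cur >= wanted)
        return 0;                   /* already large enough */

    if (rlim.rlim_max != RLIM_INFINITY && wanted > rlim.rlim_max)
        wanted = rlim.rlim_max;     /* the soft limit cannot exceed the hard limit */

    rlim.rlim_cur = wanted;
    return setrlimit(RLIMIT_STACK, &rlim);
}
```

 A program would typically call this once near the start of \fBmain()\fP, for
 example with 100*1024*1024 as the argument.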
 .
 .
 .SS "Changing stack size in Mac OS X"
 .rs
 .sp
 Using \fBsetrlimit()\fP, as described above, should also work on Mac OS X. It
 is also possible to set a stack size when linking a program. There is a
 discussion about stack sizes in Mac OS X at this web site:
 .\" HTML <a href="http://developer.apple.com/qa/qa2005/qa1419.html">
 .\" </a>
 http://developer.apple.com/qa/qa2005/qa1419.html.
 .\"
 .
 .
 .SH AUTHOR
 .rs
 .sp
 .nf
 Philip Hazel
 University Computing Service
 Cambridge CB2 3QH, England.
 .fi
 .
 .
 .SH REVISION
 .rs
 .sp
 .nf
 Last updated: 20 October 2014
 Copyright (c) 1997-2014 University of Cambridge.
 .fi
--- a/doc/pcre2syntax.3
+++ b/doc/pcre2syntax.3
@ -0,0 +1,540 @@
 .TH PCRE2SYNTAX 3 "20 October 2014" "PCRE2 10.00"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
 .rs
 .sp
 The full syntax and semantics of the regular expressions that are supported by
 PCRE2 are described in the
 .\" HREF
 \fBpcre2pattern\fP
 .\"
 documentation. This document contains a quick-reference summary of the syntax.
 .
 .
 .SH "QUOTING"
 .rs
 .sp
  \ex         where x is non-alphanumeric is a literal x
  \eQ...\eE    treat enclosed characters as literal
 .
 .
 .SH "CHARACTERS"
 .rs
 .sp
  \ea         alarm, that is, the BEL character (hex 07)
  \ecx        "control-x", where x is any ASCII character
  \ee         escape (hex 1B)
  \ef         form feed (hex 0C)
  \en         newline (hex 0A)
  \er         carriage return (hex 0D)
  \et         tab (hex 09)
  \e0dd       character with octal code 0dd
  \eddd       character with octal code ddd, or backreference
  \eo{ddd..}  character with octal code ddd..
  \exhh       character with hex code hh
  \ex{hhh..}  character with hex code hhh..
 .sp
 Note that \e0dd is always an octal code, and that \e8 and \e9 are the literal
 characters "8" and "9".
 .
 .
 .SH "CHARACTER TYPES"
 .rs
 .sp
  .          any character except newline;
               in dotall mode, any character whatsoever
  \eC         one data unit, even in UTF mode (best avoided)
  \ed         a decimal digit
  \eD         a character that is not a decimal digit
  \eh         a horizontal white space character
  \eH         a character that is not a horizontal white space character
  \eN         a character that is not a newline
  \ep{\fIxx\fP}     a character with the \fIxx\fP property
  \eP{\fIxx\fP}     a character without the \fIxx\fP property
  \eR         a newline sequence
  \es         a white space character
  \eS         a character that is not a white space character
  \ev         a vertical white space character
  \eV         a character that is not a vertical white space character
  \ew         a "word" character
  \eW         a "non-word" character
  \eX         a Unicode extended grapheme cluster
 .sp
 By default, \ed, \es, and \ew match only ASCII characters, even in UTF-8 mode
 or in the 16-bit and 32-bit libraries. However, if locale-specific matching is
 happening, \es and \ew may also match characters with code points in the range
 128-255. If the PCRE2_UCP option is set, the behaviour of these escape
 sequences is changed to use Unicode properties and they match many more
 characters.
 .
 .
 .SH "GENERAL CATEGORY PROPERTIES FOR \ep and \eP"
 .rs
 .sp
  C          Other
  Cc         Control
  Cf         Format
  Cn         Unassigned
  Co         Private use
  Cs         Surrogate
 .sp
  L          Letter
  Ll         Lower case letter
  Lm         Modifier letter
  Lo         Other letter
  Lt         Title case letter
  Lu         Upper case letter
  L&         Ll, Lu, or Lt
 .sp
  M          Mark
  Mc         Spacing mark
  Me         Enclosing mark
  Mn         Non-spacing mark
 .sp
  N          Number
  Nd         Decimal number
  Nl         Letter number
  No         Other number
 .sp
  P          Punctuation
  Pc         Connector punctuation
  Pd         Dash punctuation
  Pe         Close punctuation
  Pf         Final punctuation
  Pi         Initial punctuation
  Po         Other punctuation
  Ps         Open punctuation
 .sp
  S          Symbol
  Sc         Currency symbol
  Sk         Modifier symbol
  Sm         Mathematical symbol
  So         Other symbol
 .sp
  Z          Separator
  Zl         Line separator
  Zp         Paragraph separator
  Zs         Space separator
 .
 .
 .SH "PCRE2 SPECIAL CATEGORY PROPERTIES FOR \ep and \eP"
 .rs
 .sp
  Xan        Alphanumeric: union of properties L and N
  Xps        POSIX space: property Z or tab, NL, VT, FF, CR
  Xsp        Perl space: property Z or tab, NL, VT, FF, CR
  Xuc        Universally-named character: one that can be
               represented by a Universal Character Name
  Xwd        Perl word: property Xan or underscore
 .sp
 Perl and POSIX space are now the same. Perl added VT to its space character set
 at release 5.18.
 .
 .
 .SH "SCRIPT NAMES FOR \ep AND \eP"
 .rs
 .sp
 Arabic,
 Armenian,
 Avestan,
 Balinese,
 Bamum,
 Bassa_Vah,
 Batak,
 Bengali,
 Bopomofo,
 Brahmi,
 Braille,
 Buginese,
 Buhid,
 Canadian_Aboriginal,
 Carian,
 Caucasian_Albanian,
 Chakma,
 Cham,
 Cherokee,
 Common,
 Coptic,
 Cuneiform,
 Cypriot,
 Cyrillic,
 Deseret,
 Devanagari,
 Duployan,
 Egyptian_Hieroglyphs,
 Elbasan,
 Ethiopic,
 Georgian,
 Glagolitic,
 Gothic,
 Grantha,
 Greek,
 Gujarati,
 Gurmukhi,
 Han,
 Hangul,
 Hanunoo,
 Hebrew,
 Hiragana,
 Imperial_Aramaic,
 Inherited,
 Inscriptional_Pahlavi,
 Inscriptional_Parthian,
 Javanese,
 Kaithi,
 Kannada,
 Katakana,
 Kayah_Li,
 Kharoshthi,
 Khmer,
 Khojki,
 Khudawadi,
 Lao,
 Latin,
 Lepcha,
 Limbu,
 Linear_A,
 Linear_B,
 Lisu,
 Lycian,
 Lydian,
 Mahajani,
 Malayalam,
 Mandaic,
 Manichaean,
 Meetei_Mayek,
 Mende_Kikakui,
 Meroitic_Cursive,
 Meroitic_Hieroglyphs,
 Miao,
 Modi,
 Mongolian,
 Mro,
 Myanmar,
 Nabataean,
 New_Tai_Lue,
 Nko,
 Ogham,
 Ol_Chiki,
 Old_Italic,
 Old_North_Arabian,
 Old_Permic,
 Old_Persian,
 Old_South_Arabian,
 Old_Turkic,
 Oriya,
 Osmanya,
 Pahawh_Hmong,
 Palmyrene,
 Pau_Cin_Hau,
 Phags_Pa,
 Phoenician,
 Psalter_Pahlavi,
 Rejang,
 Runic,
 Samaritan,
 Saurashtra,
 Sharada,
 Shavian,
 Siddham,
 Sinhala,
 Sora_Sompeng,
 Sundanese,
 Syloti_Nagri,
 Syriac,
 Tagalog,
 Tagbanwa,
 Tai_Le,
 Tai_Tham,
 Tai_Viet,
 Takri,
 Tamil,
 Telugu,
 Thaana,
 Thai,
 Tibetan,
 Tifinagh,
 Tirhuta,
 Ugaritic,
 Vai,
 Warang_Citi,
 Yi.
 .
 .
 .SH "CHARACTER CLASSES"
 .rs
 .sp
  [...]       positive character class
  [^...]      negative character class
  [x-y]       range (can be used for hex characters)
  [[:xxx:]]   positive POSIX named set
  [[:^xxx:]]  negative POSIX named set
 .sp
  alnum       alphanumeric
  alpha       alphabetic
  ascii       0-127
  blank       space or tab
  cntrl       control character
  digit       decimal digit
  graph       printing, excluding space
  lower       lower case letter
  print       printing, including space
  punct       printing, excluding alphanumeric
  space       white space
  upper       upper case letter
  word        same as \ew
  xdigit      hexadecimal digit
 .sp
 In PCRE2, POSIX character set names recognize only ASCII characters by default,
 but some of them use Unicode properties if PCRE2_UCP is set. You can use
 \eQ...\eE inside a character class.
 .
 .
 .SH "QUANTIFIERS"
 .rs
 .sp
  ?           0 or 1, greedy
  ?+          0 or 1, possessive
  ??          0 or 1, lazy
  *           0 or more, greedy
  *+          0 or more, possessive
  *?          0 or more, lazy
  +           1 or more, greedy
  ++          1 or more, possessive
  +?          1 or more, lazy
  {n}         exactly n
  {n,m}       at least n, no more than m, greedy
  {n,m}+      at least n, no more than m, possessive
  {n,m}?      at least n, no more than m, lazy
  {n,}        n or more, greedy
  {n,}+       n or more, possessive
  {n,}?       n or more, lazy
 .
 .
 .SH "ANCHORS AND SIMPLE ASSERTIONS"
 .rs
 .sp
  \eb          word boundary
  \eB          not a word boundary
  ^           start of subject
               also after internal newline in multiline mode
  \eA          start of subject
  $           end of subject
               also before newline at end of subject
               also before internal newline in multiline mode
  \eZ          end of subject
               also before newline at end of subject
  \ez          end of subject
  \eG          first matching position in subject
 .
 .
 .SH "MATCH POINT RESET"
 .rs
 .sp
  \eK          reset start of match
 .sp
 \eK is honoured in positive assertions, but ignored in negative ones.
 .
 .
 .SH "ALTERNATION"
 .rs
 .sp
  expr|expr|expr...
 .
 .
 .SH "CAPTURING"
 .rs
 .sp
  (...)           capturing group
  (?<name>...)    named capturing group (Perl)
  (?'name'...)    named capturing group (Perl)
  (?P<name>...)   named capturing group (Python)
  (?:...)         non-capturing group
  (?|...)         non-capturing group; reset group numbers for
                   capturing groups in each alternative
 .
 .
 .SH "ATOMIC GROUPS"
 .rs
 .sp
  (?>...)         atomic, non-capturing group
 .
 .
 .
 .
 .SH "COMMENT"
 .rs
 .sp
  (?#....)        comment (not nestable)
 .
 .
 .SH "OPTION SETTING"
 .rs
 .sp
  (?i)            caseless
  (?J)            allow duplicate names
  (?m)            multiline
  (?s)            single line (dotall)
  (?U)            default ungreedy (lazy)
  (?x)            extended (ignore white space)
  (?-...)         unset option(s)
 .sp
 The following are recognized only at the very start of a pattern or after one
 of the newline or \eR options with similar syntax. More than one of them may
 appear.
 .sp
  (*LIMIT_MATCH=d) set the match limit to d (decimal number)
  (*LIMIT_RECURSION=d) set the recursion limit to d (decimal number)
  (*NOTEMPTY)     set PCRE2_NOTEMPTY when matching
  (*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching 
  (*NO_AUTO_POSSESS) no auto-possessification (PCRE2_NO_AUTO_POSSESS)
  (*NO_START_OPT) no start-match optimization (PCRE2_NO_START_OPTIMIZE)
  (*UTF)          set appropriate UTF mode for the library in use
  (*UCP)          set PCRE2_UCP (use Unicode properties for \ed etc)
 .sp
 Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of the
 limits set by the caller of pcre2_match(), not increase them.
 .
 .
 .SH "NEWLINE CONVENTION"
 .rs
 .sp
 These are recognized only at the very start of the pattern or after option
 settings with a similar syntax.
 .sp
  (*CR)           carriage return only
  (*LF)           linefeed only
  (*CRLF)         carriage return followed by linefeed
  (*ANYCRLF)      all three of the above
  (*ANY)          any Unicode newline sequence
 .
 .
 .SH "WHAT \eR MATCHES"
 .rs
 .sp
 These are recognized only at the very start of the pattern or after option
 settings with a similar syntax.
 .sp
  (*BSR_ANYCRLF)  CR, LF, or CRLF
  (*BSR_UNICODE)  any Unicode newline sequence
 .
 .
 .SH "LOOKAHEAD AND LOOKBEHIND ASSERTIONS"
 .rs
 .sp
  (?=...)         positive look ahead
  (?!...)         negative look ahead
  (?<=...)        positive look behind
  (?<!...)        negative look behind
 .sp
 Each top-level branch of a look behind must be of a fixed length.
 .
 .
 .SH "BACKREFERENCES"
 .rs
 .sp
  \en              reference by number (can be ambiguous)
  \egn             reference by number
  \eg{n}           reference by number
  \eg{-n}          relative reference by number
  \ek<name>        reference by name (Perl)
  \ek'name'        reference by name (Perl)
  \eg{name}        reference by name (Perl)
  \ek{name}        reference by name (.NET)
  (?P=name)       reference by name (Python)
 .
 .
 .SH "SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)"
 .rs
 .sp
  (?R)            recurse whole pattern
  (?n)            call subpattern by absolute number
  (?+n)           call subpattern by relative number
  (?-n)           call subpattern by relative number
  (?&name)        call subpattern by name (Perl)
  (?P>name)       call subpattern by name (Python)
  \eg<name>        call subpattern by name (Oniguruma)
  \eg'name'        call subpattern by name (Oniguruma)
  \eg<n>           call subpattern by absolute number (Oniguruma)
  \eg'n'           call subpattern by absolute number (Oniguruma)
  \eg<+n>          call subpattern by relative number (PCRE2 extension)
  \eg'+n'          call subpattern by relative number (PCRE2 extension)
  \eg<-n>          call subpattern by relative number (PCRE2 extension)
  \eg'-n'          call subpattern by relative number (PCRE2 extension)
 .
 .
 .SH "CONDITIONAL PATTERNS"
 .rs
 .sp
  (?(condition)yes-pattern)
  (?(condition)yes-pattern|no-pattern)
 .sp
  (?(n)...        absolute reference condition
  (?(+n)...       relative reference condition
  (?(-n)...       relative reference condition
  (?(<name>)...   named reference condition (Perl)
  (?('name')...   named reference condition (Perl)
  (?(name)...     named reference condition (PCRE2)
  (?(R)...        overall recursion condition
  (?(Rn)...       specific group recursion condition
  (?(R&name)...   specific recursion condition
  (?(DEFINE)...   define subpattern for reference
  (?(assert)...   assertion condition
 .
 .
 .SH "BACKTRACKING CONTROL"
 .rs
 .sp
 The following act immediately they are reached:
 .sp
  (*ACCEPT)       force successful match
  (*FAIL)         force backtrack; synonym (*F)
  (*MARK:NAME)    set name to be passed back; synonym (*:NAME)
 .sp
 The following act only when a subsequent match failure causes a backtrack to
 reach them. They all force a match failure, but they differ in what happens
 afterwards. Those that advance the start-of-match point do so only if the
 pattern is not anchored.
 .sp
  (*COMMIT)       overall failure, no advance of starting point
  (*PRUNE)        advance to next starting character
  (*PRUNE:NAME)   equivalent to (*MARK:NAME)(*PRUNE)
  (*SKIP)         advance to current matching position
  (*SKIP:NAME)    advance to position corresponding to an earlier
                  (*MARK:NAME); if not found, the (*SKIP) is ignored
  (*THEN)         local failure, backtrack to next alternation
  (*THEN:NAME)    equivalent to (*MARK:NAME)(*THEN)
 .
 .
 .SH "CALLOUTS"
 .rs
 .sp
  (?C)      callout
  (?Cn)     callout with data n
 .
 .
 .SH "SEE ALSO"
 .rs
 .sp
 \fBpcre2pattern\fP(3), \fBpcre2api\fP(3), \fBpcre2callout\fP(3),
 \fBpcre2matching\fP(3), \fBpcre2\fP(3).
 .
 .
 .SH AUTHOR
 .rs
 .sp
 .nf
 Philip Hazel
 University Computing Service
 Cambridge CB2 3QH, England.
 .fi
 .
 .
 .SH REVISION
 .rs
 .sp
 .nf
 Last updated: 20 October 2014
 Copyright (c) 1997-2014 University of Cambridge.
 .fi