Documentation for Bidi_Control and Bidi_Class

2021-12-08 16:37:34 +00:00 · 30abd0ac8d
parent 0246c6bf64
commit 30abd0ac8d
14 changed files with 1673 additions and 1448 deletions
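
The documentation added by this commit describes two new properties, \p{Bidi_Control} and \p{Bidi_Class:<class>}. As a rough illustration of what they select, the sketch below uses Python's standard unicodedata module, whose bidirectional() function returns the same short class names. This is not PCRE2 code and makes no claim about how PCRE2 implements the properties; the helper names has_bidi_class and is_bidi_control are hypothetical, and results depend on the Unicode version bundled with the Python interpreter.

```python
import unicodedata

# Short Bidi_Class names listed in the documentation added by this commit.
BIDI_CLASSES = frozenset(
    "AL AN B BN CS EN ES ET FSI L LRE LRI LRO NSM ON "
    "PDF PDI R RLE RLI RLO S WS".split()
)

# Code points that carry the Unicode Bidi_Control property.
BIDI_CONTROL = frozenset(
    [0x061C, 0x200E, 0x200F]        # ALM, LRM, RLM
    + list(range(0x202A, 0x202F))   # LRE, RLE, PDF, LRO, RLO
    + list(range(0x2066, 0x206A))   # LRI, RLI, FSI, PDI
)


def has_bidi_class(ch: str, name: str) -> bool:
    # Hypothetical helper: rough analogue of \p{Bidi_Class:<name>}
    # for one character. Class names are treated as case-insensitive.
    wanted = name.upper()
    if wanted not in BIDI_CLASSES:
        raise ValueError(f"unknown Bidi_Class: {name!r}")
    return unicodedata.bidirectional(ch) == wanted


def is_bidi_control(ch: str) -> bool:
    # Hypothetical helper: rough analogue of \p{Bidi_Control}.
    return ord(ch) in BIDI_CONTROL
```

For example, has_bidi_class("\u0627", "AL") is true for ARABIC LETTER ALEF, and is_bidi_control("\u202e") is true for the right-to-left override character.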
--- a/2
+++ b/2
@ -39,6 +39,8 @@ pcre2_substitute(), and the replacement argument of the latter, if the pointer
 is NULL and the length is zero, treat as an empty string. Apparently a number 
 of applications treat NULL/0 in this way.
 14. Added support for Bidi_Class and Bidi_Control Unicode properties.
 Version 10.39 29-October-2021
 -----------------------------
--- a/doc/html/pcre2api.html
+++ b/doc/html/pcre2api.html
@ -2055,8 +2055,8 @@ point. However, this applies only to characters whose code points are less than
 \d.
 </P>
 <P>
-When PCRE2 is built with Unicode support (the default), the Unicode properties
+When PCRE2 is built with Unicode support (the default), certain Unicode
-of all characters can be tested with \p and \P, or, alternatively, the
+character properties can be tested with \p and \P, or, alternatively, the
 PCRE2_UCP option can be set when a pattern is compiled; this causes \w and
 friends to use Unicode property support instead of the built-in tables.
 PCRE2_UCP also causes upper/lower casing operations on characters with code
@ -4018,7 +4018,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC42" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 30 November 2021
+Last updated: 08 December 2021
 <br>
 Copyright &copy; 1997-2021 University of Cambridge.
 <br>
--- a/doc/html/pcre2build.html
+++ b/doc/html/pcre2build.html
@ -142,8 +142,9 @@ locked this out by setting PCRE2_NEVER_UTF.
 UTF support allows the libraries to process character code points up to
 0x10ffff in the strings that they handle. Unicode support also gives access to
 the Unicode properties of characters, using pattern escapes such as \P, \p,
-and \X. Only the general category properties such as <i>Lu</i> and <i>Nd</i> are
+and \X. Only the general category properties such as <i>Lu</i> and <i>Nd</i>,
-supported. Details are given in the
+script names, and some bi-directional properties are supported. Details are
 given in the
 <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
 documentation.
 </P>
@ -615,9 +616,9 @@ Cambridge, England.
 </P>
 <br><a name="SEC26" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 20 March 2020
+Last updated: 08 December 2021
 <br>
-Copyright &copy; 1997-2020 University of Cambridge.
+Copyright &copy; 1997-2021 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
--- a/doc/html/pcre2compat.html
+++ b/doc/html/pcre2compat.html
@ -66,9 +66,9 @@ interprets them.
 6. The Perl escape sequences \p, \P, and \X are supported only if PCRE2 is
 built with Unicode support (the default). The properties that can be tested
 with \p and \P are limited to the general category properties such as Lu and
-Nd, script names such as Greek or Han, and the derived properties Any and L&.
+Nd, script names such as Greek or Han, Bidi_Class, Bidi_Control, and the
-Both PCRE2 and Perl support the Cs (surrogate) property, but in PCRE2 its use
+derived properties Any and LC (synonym L&). Both PCRE2 and Perl support the Cs
-is limited. See the
+(surrogate) property, but in PCRE2 its use is limited. See the
 <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
 documentation for details. The long synonyms for property names that Perl
 supports (such as \p{Letter}) are not supported by PCRE2, nor is it permitted
@ -257,7 +257,7 @@ Cambridge, England.
 REVISION
 </b><br>
 <P>
-Last updated: 01 December 2021
+Last updated: 08 December 2021
 <br>
 Copyright &copy; 1997-2021 University of Cambridge.
 <br>
--- a/doc/html/pcre2pattern.html
+++ b/doc/html/pcre2pattern.html
@ -783,11 +783,13 @@ escape sequences are:
  \P{<i>xx</i>}   a character without the <i>xx</i> property
  \X       a Unicode extended grapheme cluster
 </pre>
-The property names represented by <i>xx</i> above are case-sensitive. There is
+The property names represented by <i>xx</i> above are not case-sensitive, and in 
-support for Unicode script names, Unicode general category properties, "Any",
+accordance with Unicode's "loose matching" rules, spaces, hyphens, and
-which matches any character (including newline), and some special PCRE2
+underscores are ignored. There is support for Unicode script names, Unicode
-properties (described in the
+general category properties, "Any", which matches any character (including
-<a href="#extraprops">next section).</a>
+newline), Bidi_Control, Bidi_Class, and some special PCRE2 properties
 (described
 <a href="#extraprops">below).</a>
 Other Perl properties such as "InMusicalSymbols" are not supported by PCRE2.
 Note that \P{Any} does not match any characters, so always causes a match
 failure.
@ -1030,9 +1032,9 @@ The following general category property codes are supported:
  Zp    Paragraph separator
  Zs    Space separator
 </pre>
-The special property L& is also supported: it matches a character that has
+The special property LC, which has the synonym L&, is also supported: it
-the Lu, Ll, or Lt property, in other words, a letter that is not classified as
+matches a character that has the Lu, Ll, or Lt property, in other words, a
-a modifier or "other".
+letter that is not classified as a modifier or "other".
 </P>
 <P>
 The Cs (Surrogate) property applies only to characters whose code points are in
@ -1067,6 +1069,45 @@ properties in PCRE2 by default, though you can make them do so by setting the
 PCRE2_UCP option or by starting the pattern with (*UCP).
 </P>
 <br><b>
 Bi-directional properties for \p and \P
 </b><br>
 <P>
 Two properties relating to bi-directional text are supported:
 <pre>
  \p{Bidi_Control}         matches a Bidi control character
  \p{Bidi_Class:&#60;class&#62;}   matches a character with the given class
 </pre>
 The recognized classes are:
 <pre>
  AL          Arabic letter
  AN          Arabic number
  B           paragraph separator
  BN          boundary neutral
  CS          common separator
  EN          European number
  ES          European separator
  ET          European terminator
  FSI         first strong isolate
  L           left-to-right            
  LRE         left-to-right embedding
  LRI         left-to-right isolate
  LRO         left-to-right override
  NSM         non-spacing mark
  ON          other neutral
  PDF         pop directional format
  PDI         pop directional isolate
  R           right-to-left
  RLE         right-to-left embedding
  RLI         right-to-left isolate
  RLO         right-to-left override
  S           segment separator
  WS          white space
 </pre>
 For Bidi_Class, an equals sign may be used instead of a colon. The class names 
 are case-insensitive. As for other properties, only the short names are 
 recognized.
 </P>
 <br><b>
 Extended grapheme clusters
 </b><br>
 <P>
@ -3861,7 +3902,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC32" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 01 December 2021
+Last updated: 08 December 2021
 <br>
 Copyright &copy; 1997-2021 University of Cambridge.
 <br>
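
The pcre2pattern.html hunks above state that property names are not case-sensitive, that spaces, hyphens, and underscores are ignored, and that Bidi_Class accepts either a colon or an equals sign before the class name. A minimal sketch of that normalization follows. It is an illustration only, not PCRE2's parser; the function names loose_key and split_property are hypothetical, and applying the same folding to the class value is an assumption (the text only says class names are case-insensitive).

```python
def loose_key(name: str) -> str:
    # Fold case and drop spaces, hyphens, and underscores, in the
    # spirit of the "loose matching" rule described in the text.
    return "".join(c for c in name.lower() if c not in " -_")


def split_property(spec: str):
    # Split "Bidi_Class:AL" or "Bidi_Class=AL" into (property, value).
    # A spec with no separator, such as "Bidi_Control", has value None.
    for sep in (":", "="):
        if sep in spec:
            prop, value = spec.split(sep, 1)
            return loose_key(prop), loose_key(value)
    return loose_key(spec), None
```

Under these assumptions, "Bidi_Class:AL", "bidi-class=al", and "BIDI CLASS : Al" all reduce to the same pair, which is the behaviour the documentation describes for the property name.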
--- a/doc/html/pcre2syntax.html
+++ b/doc/html/pcre2syntax.html
@ -20,28 +20,29 @@ please consult the man page, in case the conversion went wrong.
 <li><a name="TOC5" href="#SEC5">GENERAL CATEGORY PROPERTIES FOR \p and \P</a>
 <li><a name="TOC6" href="#SEC6">PCRE2 SPECIAL CATEGORY PROPERTIES FOR \p and \P</a>
 <li><a name="TOC7" href="#SEC7">SCRIPT NAMES FOR \p AND \P</a>
-<li><a name="TOC8" href="#SEC8">CHARACTER CLASSES</a>
+<li><a name="TOC8" href="#SEC8">BIDI_PROPERTIES FOR \p AND \P</a>
-<li><a name="TOC9" href="#SEC9">QUANTIFIERS</a>
+<li><a name="TOC9" href="#SEC9">CHARACTER CLASSES</a>
-<li><a name="TOC10" href="#SEC10">ANCHORS AND SIMPLE ASSERTIONS</a>
+<li><a name="TOC10" href="#SEC10">QUANTIFIERS</a>
-<li><a name="TOC11" href="#SEC11">REPORTED MATCH POINT SETTING</a>
+<li><a name="TOC11" href="#SEC11">ANCHORS AND SIMPLE ASSERTIONS</a>
-<li><a name="TOC12" href="#SEC12">ALTERNATION</a>
+<li><a name="TOC12" href="#SEC12">REPORTED MATCH POINT SETTING</a>
-<li><a name="TOC13" href="#SEC13">CAPTURING</a>
+<li><a name="TOC13" href="#SEC13">ALTERNATION</a>
-<li><a name="TOC14" href="#SEC14">ATOMIC GROUPS</a>
+<li><a name="TOC14" href="#SEC14">CAPTURING</a>
-<li><a name="TOC15" href="#SEC15">COMMENT</a>
+<li><a name="TOC15" href="#SEC15">ATOMIC GROUPS</a>
-<li><a name="TOC16" href="#SEC16">OPTION SETTING</a>
+<li><a name="TOC16" href="#SEC16">COMMENT</a>
-<li><a name="TOC17" href="#SEC17">NEWLINE CONVENTION</a>
+<li><a name="TOC17" href="#SEC17">OPTION SETTING</a>
-<li><a name="TOC18" href="#SEC18">WHAT \R MATCHES</a>
+<li><a name="TOC18" href="#SEC18">NEWLINE CONVENTION</a>
-<li><a name="TOC19" href="#SEC19">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
+<li><a name="TOC19" href="#SEC19">WHAT \R MATCHES</a>
-<li><a name="TOC20" href="#SEC20">NON-ATOMIC LOOKAROUND ASSERTIONS</a>
+<li><a name="TOC20" href="#SEC20">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
-<li><a name="TOC21" href="#SEC21">SCRIPT RUNS</a>
+<li><a name="TOC21" href="#SEC21">NON-ATOMIC LOOKAROUND ASSERTIONS</a>
-<li><a name="TOC22" href="#SEC22">BACKREFERENCES</a>
+<li><a name="TOC22" href="#SEC22">SCRIPT RUNS</a>
-<li><a name="TOC23" href="#SEC23">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
+<li><a name="TOC23" href="#SEC23">BACKREFERENCES</a>
-<li><a name="TOC24" href="#SEC24">CONDITIONAL PATTERNS</a>
+<li><a name="TOC24" href="#SEC24">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
-<li><a name="TOC25" href="#SEC25">BACKTRACKING CONTROL</a>
+<li><a name="TOC25" href="#SEC25">CONDITIONAL PATTERNS</a>
-<li><a name="TOC26" href="#SEC26">CALLOUTS</a>
+<li><a name="TOC26" href="#SEC26">BACKTRACKING CONTROL</a>
-<li><a name="TOC27" href="#SEC27">SEE ALSO</a>
+<li><a name="TOC27" href="#SEC27">CALLOUTS</a>
-<li><a name="TOC28" href="#SEC28">AUTHOR</a>
+<li><a name="TOC28" href="#SEC28">SEE ALSO</a>
-<li><a name="TOC29" href="#SEC29">REVISION</a>
+<li><a name="TOC29" href="#SEC29">AUTHOR</a>
 <li><a name="TOC30" href="#SEC30">REVISION</a>
 </ul>
 <br><a name="SEC1" href="#TOC1">PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY</a><br>
 <P>
@ -362,7 +363,40 @@ Yezidi,
 Yi,
 Zanabazar_Square.
 </P>
-<br><a name="SEC8" href="#TOC1">CHARACTER CLASSES</a><br>
+<br><a name="SEC8" href="#TOC1">BIDI_PROPERTIES FOR \p AND \P</a><br>
 <P>
 <pre>
  \p{Bidi_Control}         matches a Bidi control character
  \p{Bidi_Class:&#60;class&#62;}   matches a character with the given class
 </pre>
 The recognized classes are:
 <pre>
  AL          Arabic letter
  AN          Arabic number
  B           paragraph separator
  BN          boundary neutral
  CS          common separator
  EN          European number
  ES          European separator
  ET          European terminator
  FSI         first strong isolate
  L           left-to-right            
  LRE         left-to-right embedding
  LRI         left-to-right isolate
  LRO         left-to-right override
  NSM         non-spacing mark
  ON          other neutral
  PDF         pop directional format
  PDI         pop directional isolate
  R           right-to-left
  RLE         right-to-left embedding
  RLI         right-to-left isolate
  RLO         right-to-left override
  S           segment separator
  WS          white space
 </PRE>
 </P>
 <br><a name="SEC9" href="#TOC1">CHARACTER CLASSES</a><br>
 <P>
 <pre>
  [...]       positive character class
@ -390,7 +424,7 @@ In PCRE2, POSIX character set names recognize only ASCII characters by default,
 but some of them use Unicode properties if PCRE2_UCP is set. You can use
 \Q...\E inside a character class.
 </P>
-<br><a name="SEC9" href="#TOC1">QUANTIFIERS</a><br>
+<br><a name="SEC10" href="#TOC1">QUANTIFIERS</a><br>
 <P>
 <pre>
  ?           0 or 1, greedy
@ -411,7 +445,7 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
  {n,}?       n or more, lazy
 </PRE>
 </P>
-<br><a name="SEC10" href="#TOC1">ANCHORS AND SIMPLE ASSERTIONS</a><br>
+<br><a name="SEC11" href="#TOC1">ANCHORS AND SIMPLE ASSERTIONS</a><br>
 <P>
 <pre>
  \b          word boundary
@ -429,7 +463,7 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
  \G          first matching position in subject
 </PRE>
 </P>
-<br><a name="SEC11" href="#TOC1">REPORTED MATCH POINT SETTING</a><br>
+<br><a name="SEC12" href="#TOC1">REPORTED MATCH POINT SETTING</a><br>
 <P>
 <pre>
  \K          set reported start of match
@ -439,13 +473,13 @@ for compatibility with Perl. However, if the PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
 option is set, the previous behaviour is re-enabled. When this option is set,
 \K is honoured in positive assertions, but ignored in negative ones.
 </P>
-<br><a name="SEC12" href="#TOC1">ALTERNATION</a><br>
+<br><a name="SEC13" href="#TOC1">ALTERNATION</a><br>
 <P>
 <pre>
  expr|expr|expr...
 </PRE>
 </P>
-<br><a name="SEC13" href="#TOC1">CAPTURING</a><br>
+<br><a name="SEC14" href="#TOC1">CAPTURING</a><br>
 <P>
 <pre>
  (...)           capture group
@ -460,20 +494,20 @@ In non-UTF modes, names may contain underscores and ASCII letters and digits;
 in UTF modes, any Unicode letters and Unicode decimal digits are permitted. In
 both cases, a name must not start with a digit.
 </P>
-<br><a name="SEC14" href="#TOC1">ATOMIC GROUPS</a><br>
+<br><a name="SEC15" href="#TOC1">ATOMIC GROUPS</a><br>
 <P>
 <pre>
  (?&#62;...)         atomic non-capture group
  (*atomic:...)   atomic non-capture group
 </PRE>
 </P>
-<br><a name="SEC15" href="#TOC1">COMMENT</a><br>
+<br><a name="SEC16" href="#TOC1">COMMENT</a><br>
 <P>
 <pre>
  (?#....)        comment (not nestable)
 </PRE>
 </P>
-<br><a name="SEC16" href="#TOC1">OPTION SETTING</a><br>
+<br><a name="SEC17" href="#TOC1">OPTION SETTING</a><br>
 <P>
 Changes of these options within a group are automatically cancelled at the end
 of the group.
@ -518,7 +552,7 @@ not increase them. LIMIT_RECURSION is an obsolete synonym for LIMIT_DEPTH. The
 application can lock out the use of (*UTF) and (*UCP) by setting the
 PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options, respectively, at compile time.
 </P>
-<br><a name="SEC17" href="#TOC1">NEWLINE CONVENTION</a><br>
+<br><a name="SEC18" href="#TOC1">NEWLINE CONVENTION</a><br>
 <P>
 These are recognized only at the very start of the pattern or after option
 settings with a similar syntax.
@ -531,7 +565,7 @@ settings with a similar syntax.
  (*NUL)          the NUL character (binary zero)
 </PRE>
 </P>
-<br><a name="SEC18" href="#TOC1">WHAT \R MATCHES</a><br>
+<br><a name="SEC19" href="#TOC1">WHAT \R MATCHES</a><br>
 <P>
 These are recognized only at the very start of the pattern or after option
 setting with a similar syntax.
@ -540,7 +574,7 @@ setting with a similar syntax.
  (*BSR_UNICODE)  any Unicode newline sequence
 </PRE>
 </P>
-<br><a name="SEC19" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
+<br><a name="SEC20" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
 <P>
 <pre>
  (?=...)                     )
@ -561,7 +595,7 @@ setting with a similar syntax.
 </pre>
 Each top-level branch of a lookbehind must be of a fixed length.
 </P>
-<br><a name="SEC20" href="#TOC1">NON-ATOMIC LOOKAROUND ASSERTIONS</a><br>
+<br><a name="SEC21" href="#TOC1">NON-ATOMIC LOOKAROUND ASSERTIONS</a><br>
 <P>
 These assertions are specific to PCRE2 and are not Perl-compatible.
 <pre>
@ -574,7 +608,7 @@ These assertions are specific to PCRE2 and are not Perl-compatible.
  (*non_atomic_positive_lookbehind:...)  )
 </PRE>
 </P>
-<br><a name="SEC21" href="#TOC1">SCRIPT RUNS</a><br>
+<br><a name="SEC22" href="#TOC1">SCRIPT RUNS</a><br>
 <P>
 <pre>
  (*script_run:...)           ) script run, can be backtracked into
@ -584,7 +618,7 @@ These assertions are specific to PCRE2 and are not Perl-compatible.
  (*asr:...)                  )
 </PRE>
 </P>
-<br><a name="SEC22" href="#TOC1">BACKREFERENCES</a><br>
+<br><a name="SEC23" href="#TOC1">BACKREFERENCES</a><br>
 <P>
 <pre>
  \n              reference by number (can be ambiguous)
@ -601,7 +635,7 @@ These assertions are specific to PCRE2 and are not Perl-compatible.
  (?P=name)       reference by name (Python)
 </PRE>
 </P>
-<br><a name="SEC23" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
+<br><a name="SEC24" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
 <P>
 <pre>
  (?R)            recurse whole pattern
@ -620,7 +654,7 @@ These assertions are specific to PCRE2 and are not Perl-compatible.
  \g'-n'          call subroutine by relative number (PCRE2 extension)
 </PRE>
 </P>
-<br><a name="SEC24" href="#TOC1">CONDITIONAL PATTERNS</a><br>
+<br><a name="SEC25" href="#TOC1">CONDITIONAL PATTERNS</a><br>
 <P>
 <pre>
  (?(condition)yes-pattern)
@ -643,7 +677,7 @@ Note the ambiguity of (?(R) and (?(Rn) which might be named reference
 conditions or recursion tests. Such a condition is interpreted as a reference
 condition if the relevant named group exists.
 </P>
-<br><a name="SEC25" href="#TOC1">BACKTRACKING CONTROL</a><br>
+<br><a name="SEC26" href="#TOC1">BACKTRACKING CONTROL</a><br>
 <P>
 All backtracking control verbs may be in the form (*VERB:NAME). For (*MARK) the
 name is mandatory, for the others it is optional. (*SKIP) changes its behaviour
@ -670,7 +704,7 @@ pattern is not anchored.
 The effect of one of these verbs in a group called as a subroutine is confined
 to the subroutine call.
 </P>
-<br><a name="SEC26" href="#TOC1">CALLOUTS</a><br>
+<br><a name="SEC27" href="#TOC1">CALLOUTS</a><br>
 <P>
 <pre>
  (?C)            callout (assumed number 0)
@ -681,12 +715,12 @@ The allowed string delimiters are ` ' " ^ % # $ (which are the same for the
 start and the end), and the starting delimiter { matched with the ending
 delimiter }. To encode the ending delimiter within the string, double it.
 </P>
-<br><a name="SEC27" href="#TOC1">SEE ALSO</a><br>
+<br><a name="SEC28" href="#TOC1">SEE ALSO</a><br>
 <P>
 <b>pcre2pattern</b>(3), <b>pcre2api</b>(3), <b>pcre2callout</b>(3),
 <b>pcre2matching</b>(3), <b>pcre2</b>(3).
 </P>
-<br><a name="SEC28" href="#TOC1">AUTHOR</a><br>
+<br><a name="SEC29" href="#TOC1">AUTHOR</a><br>
 <P>
 Philip Hazel
 <br>
@ -695,9 +729,9 @@ Retired from University Computing Service
 Cambridge, England.
 <br>
 </P>
-<br><a name="SEC29" href="#TOC1">REVISION</a><br>
+<br><a name="SEC30" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 30 August 2021
+Last updated: 08 December 2021
 <br>
 Copyright &copy; 1997-2021 University of Cambridge.
 <br>
--- a/doc/html/pcre2unicode.html
+++ b/doc/html/pcre2unicode.html
@ -52,13 +52,13 @@ When PCRE2 is built with Unicode support, the escape sequences \p{..},
 \P{..}, and \X can be used. This is not dependent on the PCRE2_UTF setting.
 The Unicode properties that can be tested are limited to the general category
 properties such as Lu for an upper case letter or Nd for a decimal number, the
-Unicode script names such as Arabic or Han, and the derived properties Any and
+Unicode script names such as Arabic or Han, Bidi_Class, Bidi_Control, and the
-L&. Full lists are given in the
+derived properties Any and LC (synonym L&). Full lists are given in the
 <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
 and
 <a href="pcre2syntax.html"><b>pcre2syntax</b></a>
 documentation. Only the short names for properties are supported. For example,
-\p{L} matches a letter. Its Perl synonym, \p{Letter}, is not supported.
+\p{L} matches a letter. Its longer synonym, \p{Letter}, is not supported.
 Furthermore, in Perl, many properties may optionally be prefixed by "Is", for
 compatibility with Perl 5.6. PCRE2 does not support this.
 </P>
@ -486,9 +486,9 @@ Cambridge, England.
 REVISION
 </b><br>
 <P>
-Last updated: 23 February 2020
+Last updated: 08 December 2021
 <br>
-Copyright &copy; 1997-2020 University of Cambridge.
+Copyright &copy; 1997-2021 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
--- a/doc/pcre2.txt
+++ b/doc/pcre2.txt
@ -2012,13 +2012,13 @@ LOCALE SUPPORT
       code  points  are  less than 256. By default, higher-valued code points
       never match escapes such as \w or \d.
-       When PCRE2 is built with Unicode support  (the  default),  the  Unicode
+       When PCRE2 is built with Unicode support (the default), certain Unicode
-       properties of all characters can be tested with \p and \P, or, alterna-
+       character  properties  can be tested with \p and \P, or, alternatively,
-       tively, the PCRE2_UCP option can be set when  a  pattern  is  compiled;
+       the PCRE2_UCP option can be set when a pattern is compiled; this causes
-       this  causes  \w and friends to use Unicode property support instead of
+       \w  and friends to use Unicode property support instead of the built-in
-       the built-in tables.  PCRE2_UCP also causes upper/lower  casing  opera-
+       tables.  PCRE2_UCP also causes upper/lower casing operations on charac-
-       tions  on  characters  with code points greater than 127 to use Unicode
+       ters with code points greater than 127 to use Unicode properties. These
-       properties. These effects apply even when PCRE2_UTF is not set.
+       effects apply even when PCRE2_UTF is not set.
       The use of locales with Unicode is discouraged.  If  you  are  handling
       characters  with  code  points  greater than 127, you should either use
@ -3857,7 +3857,7 @@ AUTHOR
 REVISION
-       Last updated: 30 November 2021
+       Last updated: 08 December 2021
       Copyright (c) 1997-2021 University of Cambridge.
 ------------------------------------------------------------------------------
@ -3970,8 +3970,8 @@ UNICODE AND UTF SUPPORT
       0x10ffff  in  the  strings that they handle. Unicode support also gives
       access to the Unicode properties of characters, using  pattern  escapes
       such as \P, \p, and \X. Only the general category properties such as Lu
-       and Nd are supported. Details are given in the pcre2pattern  documenta-
+       and Nd, script names, and some bi-directional properties are supported.
-       tion.
+       Details are given in the pcre2pattern documentation.
       Pattern escapes such as \d and \w do not by default make use of Unicode
       properties. The application can request that they  do  by  setting  the
@ -4453,8 +4453,8 @@ AUTHOR
 REVISION
-       Last updated: 20 March 2020
+       Last updated: 08 December 2021
-       Copyright (c) 1997-2020 University of Cambridge.
+       Copyright (c) 1997-2021 University of Cambridge.
 ------------------------------------------------------------------------------
@ -4941,12 +4941,13 @@ DIFFERENCES BETWEEN PCRE2 AND PERL
       6. The Perl escape sequences \p, \P, and \X are supported only if PCRE2
       is built with Unicode support (the default). The properties that can be
       tested with \p and \P are limited to the  general  category  properties
-       such  as  Lu and Nd, script names such as Greek or Han, and the derived
+       such  as  Lu  and  Nd,  script  names such as Greek or Han, Bidi_Class,
-       properties Any and L&.  Both PCRE2 and Perl support the Cs  (surrogate)
+       Bidi_Control, and the derived properties Any and LC (synonym L&).  Both
-       property,  but  in PCRE2 its use is limited. See the pcre2pattern docu-
+       PCRE2  and  Perl  support the Cs (surrogate) property, but in PCRE2 its
-       mentation for details. The long synonyms for property names  that  Perl
+       use is limited. See the pcre2pattern  documentation  for  details.  The
-       supports  (such  as  \p{Letter})  are not supported by PCRE2, nor is it
+       long  synonyms  for  property names that Perl supports (such as \p{Let-
-       permitted to prefix any of these properties with "Is".
+       ter}) are not supported by PCRE2, nor is it permitted to prefix any  of
       these properties with "Is".
       7. PCRE2 supports the \Q...\E escape for quoting substrings. Characters
       in between are treated as literals. However, this is slightly different
@ -5105,7 +5106,7 @@ AUTHOR
 REVISION
-       Last updated: 01 December 2021
+       Last updated: 08 December 2021
       Copyright (c) 1997-2021 University of Cambridge.
 ------------------------------------------------------------------------------
@ -6894,13 +6895,14 @@ BACKSLASH
         \P{xx}   a character without the xx property
         \X       a Unicode extended grapheme cluster
-       The property names represented by xx above are case-sensitive. There is
+       The  property names represented by xx above are not case-sensitive, and
-       support for Unicode script names, Unicode general category  properties,
+       in accordance with Unicode's "loose matching" rules,  spaces,  hyphens,
-       "Any",  which  matches any character (including newline), and some spe-
+       and underscores are ignored. There is support for Unicode script names,
-       cial PCRE2 properties (described in  the  next  section).   Other  Perl
+       Unicode general category properties, "Any", which matches any character
-       properties such as "InMusicalSymbols" are not supported by PCRE2.  Note
+       (including  newline),  Bidi_Control, Bidi_Class, and some special PCRE2
-       that \P{Any} does not match any characters, so always  causes  a  match
+       properties (described below).  Other Perl properties such  as  "InMusi-
-       failure.
+       calSymbols"  are  not  supported  by PCRE2.  Note that \P{Any} does not
       match any characters, so always causes a match failure.
       Sets of Unicode characters are defined as belonging to certain scripts.
       A  character from one of these sets can be matched using a script name.
@ -7000,9 +7002,9 @@ BACKSLASH
         Zp    Paragraph separator
         Zs    Space separator
-       The  special property L& is also supported: it matches a character that
+       The special property LC, which has the synonym L&, is  also  supported:
-       has the Lu, Ll, or Lt property, in other words, a letter  that  is  not
+       it  matches  a  character that has the Lu, Ll, or Lt property, in other
-       classified as a modifier or "other".
+       words, a letter that is not classified as a modifier or "other".
       The Cs (Surrogate) property  applies  only  to  characters  whose  code
       points  are in the range U+D800 to U+DFFF. These characters are no dif-
@ -7031,6 +7033,43 @@ BACKSLASH
       them do so by setting the PCRE2_UCP option or by starting  the  pattern
       with (*UCP).
   Bi-directional properties for \p and \P
       Two properties relating to bi-directional text are supported:
         \p{Bidi_Control}         matches a Bidi control character
         \p{Bidi_Class:<class>}   matches a character with the given class
       The recognized classes are:
         AL          Arabic letter
         AN          Arabic number
         B           paragraph separator
         BN          boundary neutral
         CS          common separator
         EN          European number
         ES          European separator
         ET          European terminator
         FSI         first strong isolate
         L           left-to-right
         LRE         left-to-right embedding
         LRI         left-to-right isolate
         LRO         left-to-right override
         NSM         non-spacing mark
         ON          other neutral
         PDF         pop directional format
         PDI         pop directional isolate
         R           right-to-left
         RLE         right-to-left embedding
         RLI         right-to-left isolate
         RLO         right-to-left override
         S           segment separator
         WS          white space
       For  Bidi_Class,  an  equals  sign  may be used instead of a colon. The
       class names are case-insensitive. As for  other  properties,  only  the
       short names are recognized.
   Extended grapheme clusters
       The  \X  escape  matches  any number of Unicode characters that form an
@ -9659,7 +9698,7 @@ AUTHOR
 REVISION
-       Last updated: 01 December 2021
+       Last updated: 08 December 2021
       Copyright (c) 1997-2021 University of Cambridge.
 ------------------------------------------------------------------------------
@ -10698,6 +10737,38 @@ SCRIPT NAMES FOR \p AND \P
       Warang_Citi, Yezidi, Yi, Zanabazar_Square.
 BIDI_PROPERTIES FOR \p AND \P
         \p{Bidi_Control}         matches a Bidi control character
         \p{Bidi_Class:<class>}   matches a character with the given class
       The recognized classes are:
         AL          Arabic letter
         AN          Arabic number
         B           paragraph separator
         BN          boundary neutral
         CS          common separator
         EN          European number
         ES          European separator
         ET          European terminator
         FSI         first strong isolate
         L           left-to-right
         LRE         left-to-right embedding
         LRI         left-to-right isolate
         LRO         left-to-right override
         NSM         non-spacing mark
         ON          other neutral
         PDF         pop directional format
         PDI         pop directional isolate
         R           right-to-left
         RLE         right-to-left embedding
         RLI         right-to-left isolate
         RLO         right-to-left override
         S           segment separator
         WS          white space
 CHARACTER CLASSES
         [...]       positive character class
@ -11027,7 +11098,7 @@ AUTHOR
 REVISION
-       Last updated: 30 August 2021
+       Last updated: 08 December 2021
       Copyright (c) 1997-2021 University of Cambridge.
 ------------------------------------------------------------------------------
@ -11073,12 +11144,13 @@ UNICODE PROPERTY SUPPORT
       ting.   The  Unicode  properties  that can be tested are limited to the
       general category properties such as Lu for an upper case letter  or  Nd
       for  a  decimal number, the Unicode script names such as Arabic or Han,
-       and the derived properties Any and L&. Full  lists  are  given  in  the
+       Bidi_Class, Bidi_Control, and the derived properties Any and  LC  (syn-
-       pcre2pattern  and  pcre2syntax  documentation. Only the short names for
+       onym L&). Full lists are given in the pcre2pattern and pcre2syntax doc-
-       properties are supported. For example, \p{L} matches a letter. Its Perl
+       umentation. Only the short names for properties are supported. For  ex-
-       synonym,  \p{Letter},  is  not  supported.   Furthermore, in Perl, many
+       ample,  \p{L}  matches a letter. Its longer synonym, \p{Letter}, is not
-       properties may optionally be prefixed by "Is", for  compatibility  with
+       supported.  Furthermore, in Perl, many  properties  may  optionally  be
-       Perl 5.6. PCRE2 does not support this.
+       prefixed  by "Is", for compatibility with Perl 5.6. PCRE2 does not sup-
       port this.
 WIDE CHARACTERS AND UTF MODES
@ -11462,8 +11534,8 @@ AUTHOR
 REVISION
-       Last updated: 23 February 2020
+       Last updated: 08 December 2021
-       Copyright (c) 1997-2020 University of Cambridge.
+       Copyright (c) 1997-2021 University of Cambridge.
 ------------------------------------------------------------------------------
--- a/doc/pcre2api.3
+++ b/doc/pcre2api.3
@ -1,4 +1,4 @@
-.TH PCRE2API 3 "30 November 2021" "PCRE2 10.40"
+.TH PCRE2API 3 "08 December 2021" "PCRE2 10.40"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@ -2015,8 +2015,8 @@ point. However, this applies only to characters whose code points are less than
 256. By default, higher-valued code points never match escapes such as \ew or
 \ed.
 .P
-When PCRE2 is built with Unicode support (the default), the Unicode properties
+When PCRE2 is built with Unicode support (the default), certain Unicode
-of all characters can be tested with \ep and \eP, or, alternatively, the
+character properties can be tested with \ep and \eP, or, alternatively, the
 PCRE2_UCP option can be set when a pattern is compiled; this causes \ew and
 friends to use Unicode property support instead of the built-in tables.
 PCRE2_UCP also causes upper/lower casing operations on characters with code
@ -4025,6 +4025,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 30 November 2021
+Last updated: 08 December 2021
 Copyright (c) 1997-2021 University of Cambridge.
 .fi
--- a/doc/pcre2build.3
+++ b/doc/pcre2build.3
@ -1,4 +1,4 @@
-.TH PCRE2BUILD 3 "20 March 2020" "PCRE2 10.35"
+.TH PCRE2BUILD 3 "08 December 2021" "PCRE2 10.40"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .
@ -122,8 +122,9 @@ locked this out by setting PCRE2_NEVER_UTF.
 UTF support allows the libraries to process character code points up to
 0x10ffff in the strings that they handle. Unicode support also gives access to
 the Unicode properties of characters, using pattern escapes such as \eP, \ep,
-and \eX. Only the general category properties such as \fILu\fP and \fINd\fP are
+and \eX. Only the general category properties such as \fILu\fP and \fINd\fP,
-supported. Details are given in the
+script names, and some bi-directional properties are supported. Details are
 given in the
 .\" HREF
 \fBpcre2pattern\fP
 .\"
@ -633,6 +634,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 20 March 2020
+Last updated: 08 December 2021
-Copyright (c) 1997-2020 University of Cambridge.
+Copyright (c) 1997-2021 University of Cambridge.
 .fi
--- a/doc/pcre2compat.3
+++ b/doc/pcre2compat.3
@ -1,4 +1,4 @@
-.TH PCRE2COMPAT 3 "01 December 2021" "PCRE2 10.40"
+.TH PCRE2COMPAT 3 "08 December 2021" "PCRE2 10.40"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "DIFFERENCES BETWEEN PCRE2 AND PERL"
@ -50,9 +50,9 @@ interprets them.
 6. The Perl escape sequences \ep, \eP, and \eX are supported only if PCRE2 is
 built with Unicode support (the default). The properties that can be tested
 with \ep and \eP are limited to the general category properties such as Lu and
-Nd, script names such as Greek or Han, and the derived properties Any and L&.
+Nd, script names such as Greek or Han, Bidi_Class, Bidi_Control, and the
-Both PCRE2 and Perl support the Cs (surrogate) property, but in PCRE2 its use
+derived properties Any and LC (synonym L&). Both PCRE2 and Perl support the Cs
-is limited. See the
+(surrogate) property, but in PCRE2 its use is limited. See the
 .\" HREF
 \fBpcre2pattern\fP
 .\"
@ -222,6 +222,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 01 December 2021
+Last updated: 08 December 2021
 Copyright (c) 1997-2021 University of Cambridge.
 .fi
--- a/doc/pcre2pattern.3
+++ b/doc/pcre2pattern.3
@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "01 December 2021" "PCRE2 10.40"
+.TH PCRE2PATTERN 3 "08 December 2021" "PCRE2 10.40"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@ -779,13 +779,15 @@ escape sequences are:
  \eP{\fIxx\fP}   a character without the \fIxx\fP property
  \eX       a Unicode extended grapheme cluster
 .sp
-The property names represented by \fIxx\fP above are case-sensitive. There is
+The property names represented by \fIxx\fP above are not case-sensitive, and in 
-support for Unicode script names, Unicode general category properties, "Any",
+accordance with Unicode's "loose matching" rules, spaces, hyphens, and
-which matches any character (including newline), and some special PCRE2
+underscores are ignored. There is support for Unicode script names, Unicode
-properties (described in the
+general category properties, "Any", which matches any character (including
 newline), Bidi_Control, Bidi_Class, and some special PCRE2 properties
 (described
 .\" HTML <a href="#extraprops">
 .\" </a>
-next section).
+below).
 .\"
 Other Perl properties such as "InMusicalSymbols" are not supported by PCRE2.
 Note that \eP{Any} does not match any characters, so always causes a match
@ -1025,9 +1027,9 @@ The following general category property codes are supported:
  Zp    Paragraph separator
  Zs    Space separator
 .sp
-The special property L& is also supported: it matches a character that has
+The special property LC, which has the synonym L&, is also supported: it
-the Lu, Ll, or Lt property, in other words, a letter that is not classified as
+matches a character that has the Lu, Ll, or Lt property, in other words, a
-a modifier or "other".
+letter that is not classified as a modifier or "other".
 .P
 The Cs (Surrogate) property applies only to characters whose code points are in
 the range U+D800 to U+DFFF. These characters are no different to any other
@ -1059,6 +1061,45 @@ properties in PCRE2 by default, though you can make them do so by setting the
 PCRE2_UCP option or by starting the pattern with (*UCP).
 .
 .
 .SS "Bi-directional properties for \ep and \eP"
 .rs
 .sp
 Two properties relating to bi-directional text are supported:
 .sp
  \ep{Bidi_Control}         matches a Bidi control character
  \ep{Bidi_Class:<class>}   matches a character with the given class
 .sp
 The recognized classes are:
 .sp
  AL          Arabic letter
  AN          Arabic number
  B           paragraph separator
  BN          boundary neutral
  CS          common separator
  EN          European number
  ES          European separator
  ET          European terminator
  FSI         first strong isolate
  L           left-to-right            
  LRE         left-to-right embedding
  LRI         left-to-right isolate
  LRO         left-to-right override
  NSM         non-spacing mark
  ON          other neutral
  PDF         pop directional format
  PDI         pop directional isolate
  R           right-to-left
  RLE         right-to-left embedding
  RLI         right-to-left isolate
  RLO         right-to-left override
  S           segment separator
  WS          white space
 .sp
 For Bidi_Class, an equals sign may be used instead of a colon. The class names 
 are case-insensitive. As for other properties, only the short names are 
 recognized.
 .
 .
 .SS Extended grapheme clusters
 .rs
 .sp
@ -3909,6 +3950,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 01 December 2021
+Last updated: 08 December 2021
 Copyright (c) 1997-2021 University of Cambridge.
 .fi
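The pcre2pattern.3 hunk above says that property names are not case-sensitive and that spaces, hyphens, and underscores are ignored. As a rough illustration of that rule only, the sketch below folds a name into a comparison key. The helper name loose_key is hypothetical, and this is not how PCRE2 implements the lookup; it simply mirrors the sentence in the documentation.

```python
def loose_key(name: str) -> str:
    """Fold a property name as the documentation describes:
    ignore case, and drop spaces, hyphens, and underscores."""
    return "".join(ch for ch in name.lower() if ch not in " -_")


if __name__ == "__main__":
    # Spellings that the documented rule treats as the same name
    # produce the same key.
    for spelling in ("Bidi_Control", "bidi control", "BIDI-CONTROL"):
        print(spelling, "->", loose_key(spelling))
```

Under this reading, any two spellings with the same key would name the same property. Whether a particular spelling is accepted still depends on the PCRE2 version in use.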
--- a/doc/pcre2syntax.3
+++ b/doc/pcre2syntax.3
@ -1,4 +1,4 @@
-.TH PCRE2SYNTAX 3 "30 August 2021" "PCRE2 10.38"
+.TH PCRE2SYNTAX 3 "08 December 2021" "PCRE2 10.40"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
@ -333,6 +333,39 @@ Yi,
 Zanabazar_Square.
 .
 .
 .SH "BIDI_PROPERTIES FOR \ep AND \eP"
 .rs
 .sp
  \ep{Bidi_Control}         matches a Bidi control character
  \ep{Bidi_Class:<class>}   matches a character with the given class
 .sp
 The recognized classes are:
 .sp
  AL          Arabic letter
  AN          Arabic number
  B           paragraph separator
  BN          boundary neutral
  CS          common separator
  EN          European number
  ES          European separator
  ET          European terminator
  FSI         first strong isolate
  L           left-to-right            
  LRE         left-to-right embedding
  LRI         left-to-right isolate
  LRO         left-to-right override
  NSM         non-spacing mark
  ON          other neutral
  PDF         pop directional format
  PDI         pop directional isolate
  R           right-to-left
  RLE         right-to-left embedding
  RLI         right-to-left isolate
  RLO         right-to-left override
  S           segment separator
  WS          white space
 .
 .
 .SH "CHARACTER CLASSES"
 .rs
 .sp
@ -684,6 +717,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 30 August 2021
+Last updated: 08 December 2021
 Copyright (c) 1997-2021 University of Cambridge.
 .fi
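To get a feel for the Bidi_Class codes listed in the hunks above, it can help to look up the class of a few well-known characters. The sketch below uses Python's standard unicodedata module rather than PCRE2 itself, so it assumes that the interpreter's bundled Unicode data agrees with PCRE2's tables for these characters. The function name bidi_class is illustrative only.

```python
import unicodedata


def bidi_class(ch: str) -> str:
    """Return the short Bidi_Class code for one character,
    as reported by Python's bundled Unicode database."""
    return unicodedata.bidirectional(ch)


if __name__ == "__main__":
    samples = [
        ("a", "LATIN SMALL LETTER A"),
        ("\u05d0", "HEBREW LETTER ALEF"),
        ("\u0627", "ARABIC LETTER ALEF"),
        ("1", "DIGIT ONE"),
        ("\u0661", "ARABIC-INDIC DIGIT ONE"),
        (" ", "SPACE"),
        ("\u202e", "RIGHT-TO-LEFT OVERRIDE"),
    ]
    for ch, label in samples:
        # Print the code point, a readable label, and the class code.
        print(f"U+{ord(ch):04X} {label}: {bidi_class(ch)}")
```

A character reported here as AL, for example, is one that a pattern such as \p{Bidi_Class:AL} is documented to match.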
--- a/doc/pcre2unicode.3
+++ b/doc/pcre2unicode.3
@ -1,4 +1,4 @@
-.TH PCRE2UNICODE 3 "23 February 2020" "PCRE2 10.35"
+.TH PCRE2UNICODE 3 "08 December 2021" "PCRE2 10.40"
 .SH NAME
 PCRE - Perl-compatible regular expressions (revised API)
 .SH "UNICODE AND UTF SUPPORT"
@ -42,8 +42,8 @@ When PCRE2 is built with Unicode support, the escape sequences \ep{..},
 \eP{..}, and \eX can be used. This is not dependent on the PCRE2_UTF setting.
 The Unicode properties that can be tested are limited to the general category
 properties such as Lu for an upper case letter or Nd for a decimal number, the
-Unicode script names such as Arabic or Han, and the derived properties Any and
+Unicode script names such as Arabic or Han, Bidi_Class, Bidi_Control, and the
-L&. Full lists are given in the
+derived properties Any and LC (synonym L&). Full lists are given in the
 .\" HREF
 \fBpcre2pattern\fP
 .\"
@ -52,7 +52,7 @@ and
 \fBpcre2syntax\fP
 .\"
 documentation. Only the short names for properties are supported. For example,
-\ep{L} matches a letter. Its Perl synonym, \ep{Letter}, is not supported.
+\ep{L} matches a letter. Its longer synonym, \ep{Letter}, is not supported.
 Furthermore, in Perl, many properties may optionally be prefixed by "Is", for
 compatibility with Perl 5.6. PCRE2 does not support this.
 .
@ -457,6 +457,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 23 February 2020
+Last updated: 08 December 2021
-Copyright (c) 1997-2020 University of Cambridge.
+Copyright (c) 1997-2021 University of Cambridge.
 .fi