Documentation for Bidi_Control and Bidi_Class

2021-12-08 16:37:34 +00:00 · 2021-12-08 16:37:34 +00:00 · 30abd0ac8d
parent 0246c6bf64
commit 30abd0ac8d
14 changed files with 1673 additions and 1448 deletions
--- a/2
+++ b/2
@ -39,6 +39,8 @@ pcre2_substitute(), and the replacement argument of the latter, if the pointer
 is NULL and the length is zero, treat as an empty string. Apparently a number 
 of applications treat NULL/0 in this way.

+14. Added support for Bidi_Class and Bidi_Control Unicode properties.
+

 Version 10.39 29-October-2021
 -----------------------------
--- a/doc/html/pcre2api.html
+++ b/doc/html/pcre2api.html
@ -2055,8 +2055,8 @@ point. However, this applies only to characters whose code points are less than
 \d.
 </P>
 <P>
-When PCRE2 is built with Unicode support (the default), the Unicode properties
-of all characters can be tested with \p and \P, or, alternatively, the
+When PCRE2 is built with Unicode support (the default), certain Unicode
+character properties can be tested with \p and \P, or, alternatively, the
 PCRE2_UCP option can be set when a pattern is compiled; this causes \w and
 friends to use Unicode property support instead of the built-in tables.
 PCRE2_UCP also causes upper/lower casing operations on characters with code
@ -4018,7 +4018,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC42" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 30 November 2021
+Last updated: 08 December 2021
 <br>
 Copyright &copy; 1997-2021 University of Cambridge.
 <br>
--- a/doc/html/pcre2build.html
+++ b/doc/html/pcre2build.html
@ -142,8 +142,9 @@ locked this out by setting PCRE2_NEVER_UTF.
 UTF support allows the libraries to process character code points up to
 0x10ffff in the strings that they handle. Unicode support also gives access to
 the Unicode properties of characters, using pattern escapes such as \P, \p,
-and \X. Only the general category properties such as <i>Lu</i> and <i>Nd</i> are
-supported. Details are given in the
+and \X. Only the general category properties such as <i>Lu</i> and <i>Nd</i>,
+script names, and some bi-directional properties are supported. Details are
+given in the
 <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
 documentation.
 </P>
@ -615,9 +616,9 @@ Cambridge, England.
 </P>
 <br><a name="SEC26" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 20 March 2020
+Last updated: 08 December 2021
 <br>
-Copyright &copy; 1997-2020 University of Cambridge.
+Copyright &copy; 1997-2021 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
--- a/doc/html/pcre2compat.html
+++ b/doc/html/pcre2compat.html
@ -66,9 +66,9 @@ interprets them.
 6. The Perl escape sequences \p, \P, and \X are supported only if PCRE2 is
 built with Unicode support (the default). The properties that can be tested
 with \p and \P are limited to the general category properties such as Lu and
-Nd, script names such as Greek or Han, and the derived properties Any and L&.
-Both PCRE2 and Perl support the Cs (surrogate) property, but in PCRE2 its use
-is limited. See the
+Nd, script names such as Greek or Han, Bidi_Class, Bidi_Control, and the
+derived properties Any and LC (synonym L&). Both PCRE2 and Perl support the Cs
+(surrogate) property, but in PCRE2 its use is limited. See the
 <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
 documentation for details. The long synonyms for property names that Perl
 supports (such as \p{Letter}) are not supported by PCRE2, nor is it permitted
@ -257,7 +257,7 @@ Cambridge, England.
 REVISION
 </b><br>
 <P>
-Last updated: 01 December 2021
+Last updated: 08 December 2021
 <br>
 Copyright &copy; 1997-2021 University of Cambridge.
 <br>
--- a/doc/html/pcre2pattern.html
+++ b/doc/html/pcre2pattern.html
@ -783,11 +783,13 @@ escape sequences are:
  \P{<i>xx</i>}   a character without the <i>xx</i> property
  \X       a Unicode extended grapheme cluster
 </pre>
-The property names represented by <i>xx</i> above are case-sensitive. There is
-support for Unicode script names, Unicode general category properties, "Any",
-which matches any character (including newline), and some special PCRE2
-properties (described in the
-<a href="#extraprops">next section).</a>
+The property names represented by <i>xx</i> above are not case-sensitive, and in 
+accordance with Unicode's "loose matching" rules, spaces, hyphens, and
+underscores are ignored. There is support for Unicode script names, Unicode
+general category properties, "Any", which matches any character (including
+newline), Bidi_Control, Bidi_Class, and some special PCRE2 properties
+(described
+<a href="#extraprops">below).</a>
 Other Perl properties such as "InMusicalSymbols" are not supported by PCRE2.
 Note that \P{Any} does not match any characters, so always causes a match
 failure.
@ -1030,9 +1032,9 @@ The following general category property codes are supported:
  Zp    Paragraph separator
  Zs    Space separator
 </pre>
-The special property L& is also supported: it matches a character that has
-the Lu, Ll, or Lt property, in other words, a letter that is not classified as
-a modifier or "other".
+The special property LC, which has the synonym L&, is also supported: it
+matches a character that has the Lu, Ll, or Lt property, in other words, a
+letter that is not classified as a modifier or "other".
 </P>
 <P>
 The Cs (Surrogate) property applies only to characters whose code points are in
@ -1067,6 +1069,45 @@ properties in PCRE2 by default, though you can make them do so by setting the
 PCRE2_UCP option or by starting the pattern with (*UCP).
 </P>
 <br><b>
+Bi-directional properties for \p and \P
+</b><br>
+<P>
+Two properties relating to bi-directional text are supported:
+<pre>
+  \p{Bidi_Control}         matches a Bidi control character
+  \p{Bidi_Class:&#60;class&#62;}   matches a character with the given class
+</pre>
+The recognized classes are:
+<pre>
+  AL          Arabic letter
+  AN          Arabic number
+  B           paragraph separator
+  BN          boundary neutral
+  CS          common separator
+  EN          European number
+  ES          European separator
+  ET          European terminator
+  FSI         first strong isolate
+  L           left-to-right            
+  LRE         left-to-right embedding
+  LRI         left-to-right isolate
+  LRO         left-to-right override
+  NSM         non-spacing mark
+  ON          other neutral
+  PDF         pop directional format
+  PDI         pop directional isolate
+  R           right-to-left
+  RLE         right-to-left embedding
+  RLI         right-to-left isolate
+  RLO         right-to-left override
+  S           segment separator
+  WS          white space
+</pre>
+For Bidi_Class, an equals sign may be used instead of a colon. The class names 
+are case-insensitive. As for other properties, only the short names are 
+recognized.
+</P>
+<br><b>
 Extended grapheme clusters
 </b><br>
 <P>
@ -3861,7 +3902,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC32" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 01 December 2021
+Last updated: 08 December 2021
 <br>
 Copyright &copy; 1997-2021 University of Cambridge.
 <br>
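
The pcre2pattern.html hunks above say that property names are not case-sensitive, that spaces, hyphens, and underscores are ignored, and that Bidi_Class accepts either a colon or an equals sign before the class name. The following is a minimal Python sketch of that rule only. It is not PCRE2's parser: the function names `normalize` and `split_property` are hypothetical, and applying the same normalization to the class name is an assumption (the documentation only states that class names are case-insensitive).

```python
def normalize(name):
    """Loose matching: drop spaces, hyphens and underscores, then fold case."""
    return "".join(ch for ch in name if ch not in " -_").lower()


def split_property(spec):
    """Split 'Name:Value' or 'Name=Value' into normalized (name, value).

    Returns (name, None) when there is no separator, as for Bidi_Control.
    Normalizing the value the same way as the name is an assumption.
    """
    for i, ch in enumerate(spec):
        if ch in ":=":
            return normalize(spec[:i]), normalize(spec[i + 1:])
    return normalize(spec), None


if __name__ == "__main__":
    for spec in ("Bidi_Class:AL", "bidi class=al", "BIDI-CONTROL"):
        print(spec, "->", split_property(spec))
```

Under this sketch, `Bidi_Class:AL` and `bidi class=al` reduce to the same (name, value) pair, which is the behaviour the documentation describes for the two spellings.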
--- a/doc/html/pcre2syntax.html
+++ b/doc/html/pcre2syntax.html
@ -20,28 +20,29 @@ please consult the man page, in case the conversion went wrong.
 <li><a name="TOC5" href="#SEC5">GENERAL CATEGORY PROPERTIES FOR \p and \P</a>
 <li><a name="TOC6" href="#SEC6">PCRE2 SPECIAL CATEGORY PROPERTIES FOR \p and \P</a>
 <li><a name="TOC7" href="#SEC7">SCRIPT NAMES FOR \p AND \P</a>
-<li><a name="TOC8" href="#SEC8">CHARACTER CLASSES</a>
-<li><a name="TOC9" href="#SEC9">QUANTIFIERS</a>
-<li><a name="TOC10" href="#SEC10">ANCHORS AND SIMPLE ASSERTIONS</a>
-<li><a name="TOC11" href="#SEC11">REPORTED MATCH POINT SETTING</a>
-<li><a name="TOC12" href="#SEC12">ALTERNATION</a>
-<li><a name="TOC13" href="#SEC13">CAPTURING</a>
-<li><a name="TOC14" href="#SEC14">ATOMIC GROUPS</a>
-<li><a name="TOC15" href="#SEC15">COMMENT</a>
-<li><a name="TOC16" href="#SEC16">OPTION SETTING</a>
-<li><a name="TOC17" href="#SEC17">NEWLINE CONVENTION</a>
-<li><a name="TOC18" href="#SEC18">WHAT \R MATCHES</a>
-<li><a name="TOC19" href="#SEC19">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
-<li><a name="TOC20" href="#SEC20">NON-ATOMIC LOOKAROUND ASSERTIONS</a>
-<li><a name="TOC21" href="#SEC21">SCRIPT RUNS</a>
-<li><a name="TOC22" href="#SEC22">BACKREFERENCES</a>
-<li><a name="TOC23" href="#SEC23">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
-<li><a name="TOC24" href="#SEC24">CONDITIONAL PATTERNS</a>
-<li><a name="TOC25" href="#SEC25">BACKTRACKING CONTROL</a>
-<li><a name="TOC26" href="#SEC26">CALLOUTS</a>
-<li><a name="TOC27" href="#SEC27">SEE ALSO</a>
-<li><a name="TOC28" href="#SEC28">AUTHOR</a>
-<li><a name="TOC29" href="#SEC29">REVISION</a>
+<li><a name="TOC8" href="#SEC8">BIDI_PROPERTIES FOR \p AND \P</a>
+<li><a name="TOC9" href="#SEC9">CHARACTER CLASSES</a>
+<li><a name="TOC10" href="#SEC10">QUANTIFIERS</a>
+<li><a name="TOC11" href="#SEC11">ANCHORS AND SIMPLE ASSERTIONS</a>
+<li><a name="TOC12" href="#SEC12">REPORTED MATCH POINT SETTING</a>
+<li><a name="TOC13" href="#SEC13">ALTERNATION</a>
+<li><a name="TOC14" href="#SEC14">CAPTURING</a>
+<li><a name="TOC15" href="#SEC15">ATOMIC GROUPS</a>
+<li><a name="TOC16" href="#SEC16">COMMENT</a>
+<li><a name="TOC17" href="#SEC17">OPTION SETTING</a>
+<li><a name="TOC18" href="#SEC18">NEWLINE CONVENTION</a>
+<li><a name="TOC19" href="#SEC19">WHAT \R MATCHES</a>
+<li><a name="TOC20" href="#SEC20">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
+<li><a name="TOC21" href="#SEC21">NON-ATOMIC LOOKAROUND ASSERTIONS</a>
+<li><a name="TOC22" href="#SEC22">SCRIPT RUNS</a>
+<li><a name="TOC23" href="#SEC23">BACKREFERENCES</a>
+<li><a name="TOC24" href="#SEC24">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
+<li><a name="TOC25" href="#SEC25">CONDITIONAL PATTERNS</a>
+<li><a name="TOC26" href="#SEC26">BACKTRACKING CONTROL</a>
+<li><a name="TOC27" href="#SEC27">CALLOUTS</a>
+<li><a name="TOC28" href="#SEC28">SEE ALSO</a>
+<li><a name="TOC29" href="#SEC29">AUTHOR</a>
+<li><a name="TOC30" href="#SEC30">REVISION</a>
 </ul>
 <br><a name="SEC1" href="#TOC1">PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY</a><br>
 <P>
@ -362,7 +363,40 @@ Yezidi,
 Yi,
 Zanabazar_Square.
 </P>
-<br><a name="SEC8" href="#TOC1">CHARACTER CLASSES</a><br>
+<br><a name="SEC8" href="#TOC1">BIDI_PROPERTIES FOR \p AND \P</a><br>
+<P>
+<pre>
+  \p{Bidi_Control}         matches a Bidi control character
+  \p{Bidi_Class:&#60;class&#62;}   matches a character with the given class
+</pre>
+The recognized classes are:
+<pre>
+  AL          Arabic letter
+  AN          Arabic number
+  B           paragraph separator
+  BN          boundary neutral
+  CS          common separator
+  EN          European number
+  ES          European separator
+  ET          European terminator
+  FSI         first strong isolate
+  L           left-to-right            
+  LRE         left-to-right embedding
+  LRI         left-to-right isolate
+  LRO         left-to-right override
+  NSM         non-spacing mark
+  ON          other neutral
+  PDF         pop directional format
+  PDI         pop directional isolate
+  R           right-to-left
+  RLE         right-to-left embedding
+  RLI         right-to-left isolate
+  RLO         right-to-left override
+  S           segment separator
+  WS          white space
+</PRE>
+</P>
+<br><a name="SEC9" href="#TOC1">CHARACTER CLASSES</a><br>
 <P>
 <pre>
  [...]       positive character class
@ -390,7 +424,7 @@ In PCRE2, POSIX character set names recognize only ASCII characters by default,
 but some of them use Unicode properties if PCRE2_UCP is set. You can use
 \Q...\E inside a character class.
 </P>
-<br><a name="SEC9" href="#TOC1">QUANTIFIERS</a><br>
+<br><a name="SEC10" href="#TOC1">QUANTIFIERS</a><br>
 <P>
 <pre>
  ?           0 or 1, greedy
@ -411,7 +445,7 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
  {n,}?       n or more, lazy
 </PRE>
 </P>
-<br><a name="SEC10" href="#TOC1">ANCHORS AND SIMPLE ASSERTIONS</a><br>
+<br><a name="SEC11" href="#TOC1">ANCHORS AND SIMPLE ASSERTIONS</a><br>
 <P>
 <pre>
  \b          word boundary
@ -429,7 +463,7 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
  \G          first matching position in subject
 </PRE>
 </P>
-<br><a name="SEC11" href="#TOC1">REPORTED MATCH POINT SETTING</a><br>
+<br><a name="SEC12" href="#TOC1">REPORTED MATCH POINT SETTING</a><br>
 <P>
 <pre>
  \K          set reported start of match
@ -439,13 +473,13 @@ for compatibility with Perl. However, if the PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
 option is set, the previous behaviour is re-enabled. When this option is set,
 \K is honoured in positive assertions, but ignored in negative ones.
 </P>
-<br><a name="SEC12" href="#TOC1">ALTERNATION</a><br>
+<br><a name="SEC13" href="#TOC1">ALTERNATION</a><br>
 <P>
 <pre>
  expr|expr|expr...
 </PRE>
 </P>
-<br><a name="SEC13" href="#TOC1">CAPTURING</a><br>
+<br><a name="SEC14" href="#TOC1">CAPTURING</a><br>
 <P>
 <pre>
  (...)           capture group
@ -460,20 +494,20 @@ In non-UTF modes, names may contain underscores and ASCII letters and digits;
 in UTF modes, any Unicode letters and Unicode decimal digits are permitted. In
 both cases, a name must not start with a digit.
 </P>
-<br><a name="SEC14" href="#TOC1">ATOMIC GROUPS</a><br>
+<br><a name="SEC15" href="#TOC1">ATOMIC GROUPS</a><br>
 <P>
 <pre>
  (?&#62;...)         atomic non-capture group
  (*atomic:...)   atomic non-capture group
 </PRE>
 </P>
-<br><a name="SEC15" href="#TOC1">COMMENT</a><br>
+<br><a name="SEC16" href="#TOC1">COMMENT</a><br>
 <P>
 <pre>
  (?#....)        comment (not nestable)
 </PRE>
 </P>
-<br><a name="SEC16" href="#TOC1">OPTION SETTING</a><br>
+<br><a name="SEC17" href="#TOC1">OPTION SETTING</a><br>
 <P>
 Changes of these options within a group are automatically cancelled at the end
 of the group.
@ -518,7 +552,7 @@ not increase them. LIMIT_RECURSION is an obsolete synonym for LIMIT_DEPTH. The
 application can lock out the use of (*UTF) and (*UCP) by setting the
 PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options, respectively, at compile time.
 </P>
-<br><a name="SEC17" href="#TOC1">NEWLINE CONVENTION</a><br>
+<br><a name="SEC18" href="#TOC1">NEWLINE CONVENTION</a><br>
 <P>
 These are recognized only at the very start of the pattern or after option
 settings with a similar syntax.
@ -531,7 +565,7 @@ settings with a similar syntax.
  (*NUL)          the NUL character (binary zero)
 </PRE>
 </P>
-<br><a name="SEC18" href="#TOC1">WHAT \R MATCHES</a><br>
+<br><a name="SEC19" href="#TOC1">WHAT \R MATCHES</a><br>
 <P>
 These are recognized only at the very start of the pattern or after option
 setting with a similar syntax.
@ -540,7 +574,7 @@ setting with a similar syntax.
  (*BSR_UNICODE)  any Unicode newline sequence
 </PRE>
 </P>
-<br><a name="SEC19" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
+<br><a name="SEC20" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
 <P>
 <pre>
  (?=...)                     )
@ -561,7 +595,7 @@ setting with a similar syntax.
 </pre>
 Each top-level branch of a lookbehind must be of a fixed length.
 </P>
-<br><a name="SEC20" href="#TOC1">NON-ATOMIC LOOKAROUND ASSERTIONS</a><br>
+<br><a name="SEC21" href="#TOC1">NON-ATOMIC LOOKAROUND ASSERTIONS</a><br>
 <P>
 These assertions are specific to PCRE2 and are not Perl-compatible.
 <pre>
@ -574,7 +608,7 @@ These assertions are specific to PCRE2 and are not Perl-compatible.
  (*non_atomic_positive_lookbehind:...)  )
 </PRE>
 </P>
-<br><a name="SEC21" href="#TOC1">SCRIPT RUNS</a><br>
+<br><a name="SEC22" href="#TOC1">SCRIPT RUNS</a><br>
 <P>
 <pre>
  (*script_run:...)           ) script run, can be backtracked into
@ -584,7 +618,7 @@ These assertions are specific to PCRE2 and are not Perl-compatible.
  (*asr:...)                  )
 </PRE>
 </P>
-<br><a name="SEC22" href="#TOC1">BACKREFERENCES</a><br>
+<br><a name="SEC23" href="#TOC1">BACKREFERENCES</a><br>
 <P>
 <pre>
  \n              reference by number (can be ambiguous)
@ -601,7 +635,7 @@ These assertions are specific to PCRE2 and are not Perl-compatible.
  (?P=name)       reference by name (Python)
 </PRE>
 </P>
-<br><a name="SEC23" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
+<br><a name="SEC24" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
 <P>
 <pre>
  (?R)            recurse whole pattern
@ -620,7 +654,7 @@ These assertions are specific to PCRE2 and are not Perl-compatible.
  \g'-n'          call subroutine by relative number (PCRE2 extension)
 </PRE>
 </P>
-<br><a name="SEC24" href="#TOC1">CONDITIONAL PATTERNS</a><br>
+<br><a name="SEC25" href="#TOC1">CONDITIONAL PATTERNS</a><br>
 <P>
 <pre>
  (?(condition)yes-pattern)
@ -643,7 +677,7 @@ Note the ambiguity of (?(R) and (?(Rn) which might be named reference
 conditions or recursion tests. Such a condition is interpreted as a reference
 condition if the relevant named group exists.
 </P>
-<br><a name="SEC25" href="#TOC1">BACKTRACKING CONTROL</a><br>
+<br><a name="SEC26" href="#TOC1">BACKTRACKING CONTROL</a><br>
 <P>
 All backtracking control verbs may be in the form (*VERB:NAME). For (*MARK) the
 name is mandatory, for the others it is optional. (*SKIP) changes its behaviour
@ -670,7 +704,7 @@ pattern is not anchored.
 The effect of one of these verbs in a group called as a subroutine is confined
 to the subroutine call.
 </P>
-<br><a name="SEC26" href="#TOC1">CALLOUTS</a><br>
+<br><a name="SEC27" href="#TOC1">CALLOUTS</a><br>
 <P>
 <pre>
  (?C)            callout (assumed number 0)
@ -681,12 +715,12 @@ The allowed string delimiters are ` ' " ^ % # $ (which are the same for the
 start and the end), and the starting delimiter { matched with the ending
 delimiter }. To encode the ending delimiter within the string, double it.
 </P>
-<br><a name="SEC27" href="#TOC1">SEE ALSO</a><br>
+<br><a name="SEC28" href="#TOC1">SEE ALSO</a><br>
 <P>
 <b>pcre2pattern</b>(3), <b>pcre2api</b>(3), <b>pcre2callout</b>(3),
 <b>pcre2matching</b>(3), <b>pcre2</b>(3).
 </P>
-<br><a name="SEC28" href="#TOC1">AUTHOR</a><br>
+<br><a name="SEC29" href="#TOC1">AUTHOR</a><br>
 <P>
 Philip Hazel
 <br>
@ -695,9 +729,9 @@ Retired from University Computing Service
 Cambridge, England.
 <br>
 </P>
-<br><a name="SEC29" href="#TOC1">REVISION</a><br>
+<br><a name="SEC30" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 30 August 2021
+Last updated: 08 December 2021
 <br>
 Copyright &copy; 1997-2021 University of Cambridge.
 <br>
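
To get a feel for which characters fall into the classes listed in the pcre2syntax.html hunk above, the class of a single character can be looked up outside PCRE2. The sketch below uses Python's standard `unicodedata.bidirectional()`, which returns the same short class names (AL, EN, WS, and so on). The `BIDI_CONTROL` set lists the code points that Unicode's PropList.txt assigns the Bidi_Control property. The helper names are hypothetical, and the results depend on the Unicode version shipped with the Python interpreter, which may differ from the one PCRE2 was built against.

```python
import unicodedata

# Code points with the Unicode Bidi_Control property:
# U+061C, U+200E..U+200F, U+202A..U+202E, U+2066..U+2069.
BIDI_CONTROL = frozenset(
    [0x061C, 0x200E, 0x200F, *range(0x202A, 0x202F), *range(0x2066, 0x206A)]
)


def bidi_class(ch):
    """Return the short Bidi_Class name of a single character, e.g. 'L' or 'AL'."""
    return unicodedata.bidirectional(ch)


def is_bidi_control(ch):
    """True if the character is one of the Bidi_Control code points."""
    return ord(ch) in BIDI_CONTROL


if __name__ == "__main__":
    for ch in ("a", "1", " ", "\u05d0", "\u0627", "\u202e"):
        print(f"U+{ord(ch):04X}", bidi_class(ch), is_bidi_control(ch))
```

Note that Bidi_Control and Bidi_Class are independent properties: U+200F (RIGHT-TO-LEFT MARK) is a Bidi control character whose class is R, while U+202E is a Bidi control character whose class is RLO.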
--- a/doc/html/pcre2unicode.html
+++ b/doc/html/pcre2unicode.html
@ -52,13 +52,13 @@ When PCRE2 is built with Unicode support, the escape sequences \p{..},
 \P{..}, and \X can be used. This is not dependent on the PCRE2_UTF setting.
 The Unicode properties that can be tested are limited to the general category
 properties such as Lu for an upper case letter or Nd for a decimal number, the
-Unicode script names such as Arabic or Han, and the derived properties Any and
-L&. Full lists are given in the
+Unicode script names such as Arabic or Han, Bidi_Class, Bidi_Control, and the
+derived properties Any and LC (synonym L&). Full lists are given in the
 <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
 and
 <a href="pcre2syntax.html"><b>pcre2syntax</b></a>
 documentation. Only the short names for properties are supported. For example,
-\p{L} matches a letter. Its Perl synonym, \p{Letter}, is not supported.
+\p{L} matches a letter. Its longer synonym, \p{Letter}, is not supported.
 Furthermore, in Perl, many properties may optionally be prefixed by "Is", for
 compatibility with Perl 5.6. PCRE2 does not support this.
 </P>
@ -486,9 +486,9 @@ Cambridge, England.
 REVISION
 </b><br>
 <P>
-Last updated: 23 February 2020
+Last updated: 08 December 2021
 <br>
-Copyright &copy; 1997-2020 University of Cambridge.
+Copyright &copy; 1997-2021 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
--- a/doc/pcre2.txt
+++ b/doc/pcre2.txt
@ -2012,13 +2012,13 @@ LOCALE SUPPORT
       code  points  are  less than 256. By default, higher-valued code points
       never match escapes such as \w or \d.

-       When PCRE2 is built with Unicode support  (the  default),  the  Unicode
-       properties of all characters can be tested with \p and \P, or, alterna-
-       tively, the PCRE2_UCP option can be set when  a  pattern  is  compiled;
-       this  causes  \w and friends to use Unicode property support instead of
-       the built-in tables.  PCRE2_UCP also causes upper/lower  casing  opera-
-       tions  on  characters  with code points greater than 127 to use Unicode
-       properties. These effects apply even when PCRE2_UTF is not set.
+       When PCRE2 is built with Unicode support (the default), certain Unicode
+       character  properties  can be tested with \p and \P, or, alternatively,
+       the PCRE2_UCP option can be set when a pattern is compiled; this causes
+       \w  and friends to use Unicode property support instead of the built-in
+       tables.  PCRE2_UCP also causes upper/lower casing operations on charac-
+       ters with code points greater than 127 to use Unicode properties. These
+       effects apply even when PCRE2_UTF is not set.

       The use of locales with Unicode is discouraged.  If  you  are  handling
       characters  with  code  points  greater than 127, you should either use
@ -3857,7 +3857,7 @@ AUTHOR

 REVISION

-       Last updated: 30 November 2021
+       Last updated: 08 December 2021
       Copyright (c) 1997-2021 University of Cambridge.
 ------------------------------------------------------------------------------
 
@ -3970,8 +3970,8 @@ UNICODE AND UTF SUPPORT
       0x10ffff  in  the  strings that they handle. Unicode support also gives
       access to the Unicode properties of characters, using  pattern  escapes
       such as \P, \p, and \X. Only the general category properties such as Lu
-       and Nd are supported. Details are given in the pcre2pattern  documenta-
-       tion.
+       and Nd, script names, and some bi-directional properties are supported.
+       Details are given in the pcre2pattern documentation.

       Pattern escapes such as \d and \w do not by default make use of Unicode
       properties. The application can request that they  do  by  setting  the
@ -4453,8 +4453,8 @@ AUTHOR

 REVISION

-       Last updated: 20 March 2020
-       Copyright (c) 1997-2020 University of Cambridge.
+       Last updated: 08 December 2021
+       Copyright (c) 1997-2021 University of Cambridge.
 ------------------------------------------------------------------------------
 
 
@ -4941,12 +4941,13 @@ DIFFERENCES BETWEEN PCRE2 AND PERL
       6. The Perl escape sequences \p, \P, and \X are supported only if PCRE2
       is built with Unicode support (the default). The properties that can be
       tested with \p and \P are limited to the  general  category  properties
-       such  as  Lu and Nd, script names such as Greek or Han, and the derived
-       properties Any and L&.  Both PCRE2 and Perl support the Cs  (surrogate)
-       property,  but  in PCRE2 its use is limited. See the pcre2pattern docu-
-       mentation for details. The long synonyms for property names  that  Perl
-       supports  (such  as  \p{Letter})  are not supported by PCRE2, nor is it
-       permitted to prefix any of these properties with "Is".
+       such  as  Lu  and  Nd,  script  names such as Greek or Han, Bidi_Class,
+       Bidi_Control, and the derived properties Any and LC (synonym L&).  Both
+       PCRE2  and  Perl  support the Cs (surrogate) property, but in PCRE2 its
+       use is limited. See the pcre2pattern  documentation  for  details.  The
+       long  synonyms  for  property names that Perl supports (such as \p{Let-
+       ter}) are not supported by PCRE2, nor is it permitted to prefix any  of
+       these properties with "Is".

       7. PCRE2 supports the \Q...\E escape for quoting substrings. Characters
       in between are treated as literals. However, this is slightly different
@ -5105,7 +5106,7 @@ AUTHOR

 REVISION

-       Last updated: 01 December 2021
+       Last updated: 08 December 2021
       Copyright (c) 1997-2021 University of Cambridge.
 ------------------------------------------------------------------------------
 
@ -6894,13 +6895,14 @@ BACKSLASH
         \P{xx}   a character without the xx property
         \X       a Unicode extended grapheme cluster

-       The property names represented by xx above are case-sensitive. There is
-       support for Unicode script names, Unicode general category  properties,
-       "Any",  which  matches any character (including newline), and some spe-
-       cial PCRE2 properties (described in  the  next  section).   Other  Perl
-       properties such as "InMusicalSymbols" are not supported by PCRE2.  Note
-       that \P{Any} does not match any characters, so always  causes  a  match
-       failure.
+       The  property names represented by xx above are not case-sensitive, and
+       in accordance with Unicode's "loose matching" rules,  spaces,  hyphens,
+       and underscores are ignored. There is support for Unicode script names,
+       Unicode general category properties, "Any", which matches any character
+       (including  newline),  Bidi_Control, Bidi_Class, and some special PCRE2
+       properties (described below).  Other Perl properties such  as  "InMusi-
+       calSymbols"  are  not  supported  by PCRE2.  Note that \P{Any} does not
+       match any characters, so always causes a match failure.

       Sets of Unicode characters are defined as belonging to certain scripts.
       A  character from one of these sets can be matched using a script name.
@ -7000,9 +7002,9 @@ BACKSLASH
         Zp    Paragraph separator
         Zs    Space separator

-       The  special property L& is also supported: it matches a character that
-       has the Lu, Ll, or Lt property, in other words, a letter  that  is  not
-       classified as a modifier or "other".
+       The special property LC, which has the synonym L&, is  also  supported:
+       it  matches  a  character that has the Lu, Ll, or Lt property, in other
+       words, a letter that is not classified as a modifier or "other".

       The Cs (Surrogate) property  applies  only  to  characters  whose  code
       points  are in the range U+D800 to U+DFFF. These characters are no dif-
@ -7031,6 +7033,43 @@ BACKSLASH
       them do so by setting the PCRE2_UCP option or by starting  the  pattern
       with (*UCP).

+   Bi-directional properties for \p and \P
+
+       Two properties relating to bi-directional text are supported:
+
+         \p{Bidi_Control}         matches a Bidi control character
+         \p{Bidi_Class:<class>}   matches a character with the given class
+
+       The recognized classes are:
+
+         AL          Arabic letter
+         AN          Arabic number
+         B           paragraph separator
+         BN          boundary neutral
+         CS          common separator
+         EN          European number
+         ES          European separator
+         ET          European terminator
+         FSI         first strong isolate
+         L           left-to-right
+         LRE         left-to-right embedding
+         LRI         left-to-right isolate
+         LRO         left-to-right override
+         NSM         non-spacing mark
+         ON          other neutral
+         PDF         pop directional format
+         PDI         pop directional isolate
+         R           right-to-left
+         RLE         right-to-left embedding
+         RLI         right-to-left isolate
+         RLO         right-to-left override
+         S           segment separator
+         WS          white space
+
+       For  Bidi_Class,  an  equals  sign  may be used instead of a colon. The
+       class names are case-insensitive. As for  other  properties,  only  the
+       short names are recognized.
+
   Extended grapheme clusters

       The  \X  escape  matches  any number of Unicode characters that form an
@ -9659,7 +9698,7 @@ AUTHOR

 REVISION

-       Last updated: 01 December 2021
+       Last updated: 08 December 2021
       Copyright (c) 1997-2021 University of Cambridge.
 ------------------------------------------------------------------------------
 
@ -10698,6 +10737,38 @@ SCRIPT NAMES FOR \p AND \P
       Warang_Citi, Yezidi, Yi, Zanabazar_Square.


+BIDI_PROPERTIES FOR \p AND \P
+
+         \p{Bidi_Control}         matches a Bidi control character
+         \p{Bidi_Class:<class>}   matches a character with the given class
+
+       The recognized classes are:
+
+         AL          Arabic letter
+         AN          Arabic number
+         B           paragraph separator
+         BN          boundary neutral
+         CS          common separator
+         EN          European number
+         ES          European separator
+         ET          European terminator
+         FSI         first strong isolate
+         L           left-to-right
+         LRE         left-to-right embedding
+         LRI         left-to-right isolate
+         LRO         left-to-right override
+         NSM         non-spacing mark
+         ON          other neutral
+         PDF         pop directional format
+         PDI         pop directional isolate
+         R           right-to-left
+         RLE         right-to-left embedding
+         RLI         right-to-left isolate
+         RLO         right-to-left override
+         S           segment separator
+         WS          white space
+
+
 CHARACTER CLASSES

         [...]       positive character class
@ -11027,7 +11098,7 @@ AUTHOR

 REVISION

-       Last updated: 30 August 2021
+       Last updated: 08 December 2021
       Copyright (c) 1997-2021 University of Cambridge.
 ------------------------------------------------------------------------------
 
@ -11073,12 +11144,13 @@ UNICODE PROPERTY SUPPORT
       ting.   The  Unicode  properties  that can be tested are limited to the
       general category properties such as Lu for an upper case letter  or  Nd
       for  a  decimal number, the Unicode script names such as Arabic or Han,
-       and the derived properties Any and L&. Full  lists  are  given  in  the
-       pcre2pattern  and  pcre2syntax  documentation. Only the short names for
-       properties are supported. For example, \p{L} matches a letter. Its Perl
-       synonym,  \p{Letter},  is  not  supported.   Furthermore, in Perl, many
-       properties may optionally be prefixed by "Is", for  compatibility  with
-       Perl 5.6. PCRE2 does not support this.
+       Bidi_Class, Bidi_Control, and the derived properties Any and  LC  (syn-
+       onym L&). Full lists are given in the pcre2pattern and pcre2syntax doc-
+       umentation. Only the short names for properties are supported. For  ex-
+       ample,  \p{L}  matches a letter. Its longer synonym, \p{Letter}, is not
+       supported.  Furthermore, in Perl, many  properties  may  optionally  be
+       prefixed  by "Is", for compatibility with Perl 5.6. PCRE2 does not sup-
+       port this.


 WIDE CHARACTERS AND UTF MODES
@ -11462,8 +11534,8 @@ AUTHOR

 REVISION

-       Last updated: 23 February 2020
-       Copyright (c) 1997-2020 University of Cambridge.
+       Last updated: 08 December 2021
+       Copyright (c) 1997-2021 University of Cambridge.
 ------------------------------------------------------------------------------
 
 
--- a/doc/pcre2api.3
+++ b/doc/pcre2api.3
@ -1,4 +1,4 @@
-.TH PCRE2API 3 "30 November 2021" "PCRE2 10.40"
+.TH PCRE2API 3 "08 December 2021" "PCRE2 10.40"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@ -2015,8 +2015,8 @@ point. However, this applies only to characters whose code points are less than
 256. By default, higher-valued code points never match escapes such as \ew or
 \ed.
 .P
-When PCRE2 is built with Unicode support (the default), the Unicode properties
-of all characters can be tested with \ep and \eP, or, alternatively, the
+When PCRE2 is built with Unicode support (the default), certain Unicode
+character properties can be tested with \ep and \eP, or, alternatively, the
 PCRE2_UCP option can be set when a pattern is compiled; this causes \ew and
 friends to use Unicode property support instead of the built-in tables.
 PCRE2_UCP also causes upper/lower casing operations on characters with code
@ -4025,6 +4025,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 30 November 2021
+Last updated: 08 December 2021
 Copyright (c) 1997-2021 University of Cambridge.
 .fi
--- a/doc/pcre2build.3
+++ b/doc/pcre2build.3
@ -1,4 +1,4 @@
-.TH PCRE2BUILD 3 "20 March 2020" "PCRE2 10.35"
+.TH PCRE2BUILD 3 "08 December 2021" "PCRE2 10.40"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .
@ -122,8 +122,9 @@ locked this out by setting PCRE2_NEVER_UTF.
 UTF support allows the libraries to process character code points up to
 0x10ffff in the strings that they handle. Unicode support also gives access to
 the Unicode properties of characters, using pattern escapes such as \eP, \ep,
-and \eX. Only the general category properties such as \fILu\fP and \fINd\fP are
-supported. Details are given in the
+and \eX. Only the general category properties such as \fILu\fP and \fINd\fP,
+script names, and some bi-directional properties are supported. Details are
+given in the
 .\" HREF
 \fBpcre2pattern\fP
 .\"
@ -633,6 +634,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 20 March 2020
-Copyright (c) 1997-2020 University of Cambridge.
+Last updated: 08 December 2021
+Copyright (c) 1997-2021 University of Cambridge.
 .fi
--- a/doc/pcre2compat.3
+++ b/doc/pcre2compat.3
@ -1,4 +1,4 @@
-.TH PCRE2COMPAT 3 "01 December 2021" "PCRE2 10.40"
+.TH PCRE2COMPAT 3 "08 December 2021" "PCRE2 10.40"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "DIFFERENCES BETWEEN PCRE2 AND PERL"
@ -50,9 +50,9 @@ interprets them.
 6. The Perl escape sequences \ep, \eP, and \eX are supported only if PCRE2 is
 built with Unicode support (the default). The properties that can be tested
 with \ep and \eP are limited to the general category properties such as Lu and
-Nd, script names such as Greek or Han, and the derived properties Any and L&.
-Both PCRE2 and Perl support the Cs (surrogate) property, but in PCRE2 its use
-is limited. See the
+Nd, script names such as Greek or Han, Bidi_Class, Bidi_Control, and the
+derived properties Any and LC (synonym L&). Both PCRE2 and Perl support the Cs
+(surrogate) property, but in PCRE2 its use is limited. See the
 .\" HREF
 \fBpcre2pattern\fP
 .\"
@ -222,6 +222,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 01 December 2021
+Last updated: 08 December 2021
 Copyright (c) 1997-2021 University of Cambridge.
 .fi
--- a/doc/pcre2pattern.3
+++ b/doc/pcre2pattern.3
@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "01 December 2021" "PCRE2 10.40"
+.TH PCRE2PATTERN 3 "08 December 2021" "PCRE2 10.40"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@ -779,13 +779,15 @@ escape sequences are:
  \eP{\fIxx\fP}   a character without the \fIxx\fP property
  \eX       a Unicode extended grapheme cluster
 .sp
-The property names represented by \fIxx\fP above are case-sensitive. There is
-support for Unicode script names, Unicode general category properties, "Any",
-which matches any character (including newline), and some special PCRE2
-properties (described in the
+The property names represented by \fIxx\fP above are not case-sensitive, and in 
+accordance with Unicode's "loose matching" rules, spaces, hyphens, and
+underscores are ignored. There is support for Unicode script names, Unicode
+general category properties, "Any", which matches any character (including
+newline), Bidi_Control, Bidi_Class, and some special PCRE2 properties
+(described
 .\" HTML <a href="#extraprops">
 .\" </a>
-next section).
+below).
 .\"
 Other Perl properties such as "InMusicalSymbols" are not supported by PCRE2.
 Note that \eP{Any} does not match any characters, so always causes a match
@ -1025,9 +1027,9 @@ The following general category property codes are supported:
  Zp    Paragraph separator
  Zs    Space separator
 .sp
-The special property L& is also supported: it matches a character that has
-the Lu, Ll, or Lt property, in other words, a letter that is not classified as
-a modifier or "other".
+The special property LC, which has the synonym L&, is also supported: it
+matches a character that has the Lu, Ll, or Lt property, in other words, a
+letter that is not classified as a modifier or "other".
 .P
 The Cs (Surrogate) property applies only to characters whose code points are in
 the range U+D800 to U+DFFF. These characters are no different to any other
@ -1059,6 +1061,45 @@ properties in PCRE2 by default, though you can make them do so by setting the
 PCRE2_UCP option or by starting the pattern with (*UCP).
 .
 .
+.SS "Bi-directional properties for \ep and \eP"
+.rs
+.sp
+Two properties relating to bi-directional text are supported:
+.sp
+  \ep{Bidi_Control}         matches a Bidi control character
+  \ep{Bidi_Class:<class>}   matches a character with the given class
+.sp
+The recognized classes are:
+.sp
+  AL          Arabic letter
+  AN          Arabic number
+  B           paragraph separator
+  BN          boundary neutral
+  CS          common separator
+  EN          European number
+  ES          European separator
+  ET          European terminator
+  FSI         first strong isolate
+  L           left-to-right            
+  LRE         left-to-right embedding
+  LRI         left-to-right isolate
+  LRO         left-to-right override
+  NSM         non-spacing mark
+  ON          other neutral
+  PDF         pop directional format
+  PDI         pop directional isolate
+  R           right-to-left
+  RLE         right-to-left embedding
+  RLI         right-to-left isolate
+  RLO         right-to-left override
+  S           segment separator
+  WS          white space
+.sp
+For Bidi_Class, an equals sign may be used instead of a colon. The class names 
+are case-insensitive. As for other properties, only the short names are 
+recognized.
+.
+.
 .SS Extended grapheme clusters
 .rs
 .sp
@ -3909,6 +3950,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 01 December 2021
+Last updated: 08 December 2021
 Copyright (c) 1997-2021 University of Cambridge.
 .fi
--- a/doc/pcre2syntax.3
+++ b/doc/pcre2syntax.3
@ -1,4 +1,4 @@
-.TH PCRE2SYNTAX 3 "30 August 2021" "PCRE2 10.38"
+.TH PCRE2SYNTAX 3 "08 December 2021" "PCRE2 10.40"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
@ -333,6 +333,39 @@ Yi,
 Zanabazar_Square.
 .
 .
+.SH "BIDI_PROPERTIES FOR \ep AND \eP"
+.rs
+.sp
+  \ep{Bidi_Control}         matches a Bidi control character
+  \ep{Bidi_Class:<class>}   matches a character with the given class
+.sp
+The recognized classes are:
+.sp
+  AL          Arabic letter
+  AN          Arabic number
+  B           paragraph separator
+  BN          boundary neutral
+  CS          common separator
+  EN          European number
+  ES          European separator
+  ET          European terminator
+  FSI         first strong isolate
+  L           left-to-right            
+  LRE         left-to-right embedding
+  LRI         left-to-right isolate
+  LRO         left-to-right override
+  NSM         non-spacing mark
+  ON          other neutral
+  PDF         pop directional format
+  PDI         pop directional isolate
+  R           right-to-left
+  RLE         right-to-left embedding
+  RLI         right-to-left isolate
+  RLO         right-to-left override
+  S           segment separator
+  WS          white space
+.
+.
 .SH "CHARACTER CLASSES"
 .rs
 .sp
@ -684,6 +717,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 30 August 2021
+Last updated: 08 December 2021
 Copyright (c) 1997-2021 University of Cambridge.
 .fi
--- a/doc/pcre2unicode.3
+++ b/doc/pcre2unicode.3
@ -1,4 +1,4 @@
-.TH PCRE2UNICODE 3 "23 February 2020" "PCRE2 10.35"
+.TH PCRE2UNICODE 3 "08 December 2021" "PCRE2 10.40"
 .SH NAME
 PCRE - Perl-compatible regular expressions (revised API)
 .SH "UNICODE AND UTF SUPPORT"
@ -42,8 +42,8 @@ When PCRE2 is built with Unicode support, the escape sequences \ep{..},
 \eP{..}, and \eX can be used. This is not dependent on the PCRE2_UTF setting.
 The Unicode properties that can be tested are limited to the general category
 properties such as Lu for an upper case letter or Nd for a decimal number, the
-Unicode script names such as Arabic or Han, and the derived properties Any and
-L&. Full lists are given in the
+Unicode script names such as Arabic or Han, Bidi_Class, Bidi_Control, and the
+derived properties Any and LC (synonym L&). Full lists are given in the
 .\" HREF
 \fBpcre2pattern\fP
 .\"
@ -52,7 +52,7 @@ and
 \fBpcre2syntax\fP
 .\"
 documentation. Only the short names for properties are supported. For example,
-\ep{L} matches a letter. Its Perl synonym, \ep{Letter}, is not supported.
+\ep{L} matches a letter. Its longer synonym, \ep{Letter}, is not supported.
 Furthermore, in Perl, many properties may optionally be prefixed by "Is", for
 compatibility with Perl 5.6. PCRE2 does not support this.
 .
@ -457,6 +457,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 23 February 2020
-Copyright (c) 1997-2020 University of Cambridge.
+Last updated: 08 December 2021
+Copyright (c) 1997-2021 University of Cambridge.
 .fi
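
Note on the documented behaviour: the class table and the "loose matching" rule added above can be cross-checked outside PCRE2. The sketch below is only an illustration and does not call PCRE2. It assumes Python's standard unicodedata module, whose Unicode version may differ from the one PCRE2 was built against. The helper names (bidi_class, is_bidi_control, loose_name) are hypothetical, and the Bidi_Control set is copied from the Unicode property data rather than from PCRE2's tables.

```python
import unicodedata

# Code points with the Unicode Bidi_Control property (assumption: taken from
# the Unicode property data, not read out of PCRE2 itself).
BIDI_CONTROL = {
    0x061C,                    # ARABIC LETTER MARK
    0x200E, 0x200F,            # LRM, RLM
    *range(0x202A, 0x202F),    # LRE, RLE, PDF, LRO, RLO
    *range(0x2066, 0x206A),    # LRI, RLI, FSI, PDI
}


def bidi_class(ch):
    """Return the short Bidi_Class name of one character, e.g. 'AL' or 'WS'."""
    return unicodedata.bidirectional(ch)


def is_bidi_control(ch):
    """Roughly what \\p{Bidi_Control} is documented to test."""
    return ord(ch) in BIDI_CONTROL


def loose_name(name):
    """Approximate loose matching of property names: ignore case, spaces,
    hyphens, and underscores."""
    for ignored in (" ", "-", "_"):
        name = name.replace(ignored, "")
    return name.lower()


if __name__ == "__main__":
    for ch in ("A", "\u0627", " ", "\u202e"):
        print(f"U+{ord(ch):04X}", bidi_class(ch), is_bidi_control(ch))
```

Under this reading, "Bidi_Class", "bidi class", and "BIDI-CLASS" all normalize to the same key, which is the effect the pcre2pattern text describes for property names.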