Update to Unicode 10.0.0 and add callout_no_where to pcre2test to aid testing.

2017-07-02 16:32:01 +00:00 · 41bb787fb3
parent b7d5cee61f
commit 41bb787fb3
22 changed files with 6797 additions and 3360 deletions
--- a/3
+++ b/3
@ -209,6 +209,9 @@ much faster.
 because this can give a fast "no match" without searching for a "required code 
 unit". Previously only non-anchored patterns did this.

+47. Upgraded the Unicode tables from Unicode 8.0.0 to Unicode 10.0.0.
+
+48. Add the callout_no_where modifier to pcre2test.


 Version 10.23 14-February-2017
--- a/doc/html/README.txt
+++ b/doc/html/README.txt
@ -171,7 +171,10 @@ library. They are also documented in the pcre2build man page.
  give large performance improvements on certain platforms, add --enable-jit to
  the "configure" command. This support is available only for certain hardware
  architectures. If you try to enable it on an unsupported architecture, there
-  will be a compile time error.
+  will be a compile time error. If you are running under SELinux you may also
+  want to add --enable-jit-sealloc, which enables the use of an execmem
+  allocator in JIT that is compatible with SELinux. This has no effect if JIT 
+  is not enabled.

 . If you do not want to make use of the default support for UTF-8 Unicode
  character strings in the 8-bit library, UTF-16 Unicode character strings in
@ -874,4 +877,4 @@ The distribution should contain the files listed below.
 Philip Hazel
 Email local part: ph10
 Email domain: cam.ac.uk
-Last updated: 11 April 2017
+Last updated: 17 June 2017
--- a/doc/html/pcre2build.html
+++ b/doc/html/pcre2build.html
@ -170,8 +170,13 @@ Just-in-time (JIT) compiler support is included in the build by specifying
  --enable-jit
 </pre>
 This support is available only for certain hardware architectures. If this
-option is set for an unsupported architecture, a building error occurs.
-See the
+option is set for an unsupported architecture, a building error occurs. If you
+are running under SELinux you may also want to add
+<pre>
+  --enable-jit-sealloc
+</pre>
+which enables the use of an execmem allocator in JIT that is compatible with
+SELinux. This has no effect if JIT is not enabled. See the
 <a href="pcre2jit.html"><b>pcre2jit</b></a>
 documentation for a discussion of JIT usage. When JIT support is enabled,
 pcre2grep automatically makes use of it, unless you add
@ -516,7 +521,7 @@ contains a single function called LLVMFuzzerTestOneInput() whose arguments are
 a pointer to a string and the length of the string. When called, this function
 tries to compile the string as a pattern, and if that succeeds, to match it.
 This is done both with no options and with some random options bits that are
-generated from the string. 
+generated from the string.
 </P>
 <P>
 Setting --enable-fuzz-support also causes a binary called <b>pcre2fuzzcheck</b>
@ -529,13 +534,13 @@ file are the test string.
 </P>
 <br><a name="SEC22" href="#TOC1">OBSOLETE OPTION</a><br>
 <P>
-In versions of PCRE2 prior to 10.30, there were two ways of handling 
-backtracking in the <b>pcre2_match()</b> function. The default was to use the 
+In versions of PCRE2 prior to 10.30, there were two ways of handling
+backtracking in the <b>pcre2_match()</b> function. The default was to use the
 system stack, but if
 <pre>
  --disable-stack-for-recursion
 </pre>
-was set, memory on the heap was used. From release 10.30 onwards this has 
+was set, memory on the heap was used. From release 10.30 onwards this has
 changed (the stack is no longer used) and this option now does nothing except
 give a warning.
 </P>
@ -554,7 +559,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC25" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 30 May 2017
+Last updated: 17 June 2017
 <br>
 Copyright &copy; 1997-2017 University of Cambridge.
 <br>
--- a/doc/html/pcre2pattern.html
+++ b/doc/html/pcre2pattern.html
@ -755,6 +755,7 @@ Those that are not part of an identified script are lumped together as
 "Common". The current list of scripts is:
 </P>
 <P>
+Adlam,
 Ahom,
 Anatolian_Hieroglyphs,
 Arabic,
@ -765,6 +766,7 @@ Bamum,
 Bassa_Vah,
 Batak,
 Bengali,
+Bhaiksuki,
 Bopomofo,
 Brahmi,
 Braille,
@ -826,6 +828,8 @@ Mahajani,
 Malayalam,
 Mandaic,
 Manichaean,
+Marchen,
+Masaram_Gondi,
 Meetei_Mayek,
 Mende_Kikakui,
 Meroitic_Cursive,
@ -838,7 +842,9 @@ Multani,
 Myanmar,
 Nabataean,
 New_Tai_Lue,
+Newa,
 Nko,
+Nushu,
 Ogham,
 Ol_Chiki,
 Old_Hungarian,
@ -849,6 +855,7 @@ Old_Persian,
 Old_South_Arabian,
 Old_Turkic,
 Oriya,
+Osage,
 Osmanya,
 Pahawh_Hmong,
 Palmyrene,
@ -866,6 +873,7 @@ Siddham,
 SignWriting,
 Sinhala,
 Sora_Sompeng,
+Soyombo,
 Sundanese,
 Syloti_Nagri,
 Syriac,
@ -876,6 +884,7 @@ Tai_Tham,
 Tai_Viet,
 Takri,
 Tamil,
+Tangut,
 Telugu,
 Thaana,
 Thai,
@ -885,7 +894,8 @@ Tirhuta,
 Ugaritic,
 Vai,
 Warang_Citi,
-Yi.
+Yi,
+Zanabazar_Square.
 </P>
 <P>
 Each character has exactly one Unicode general category property, specified by
@ -3445,7 +3455,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC30" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 30 May 2017
+Last updated: 02 July 2017
 <br>
 Copyright &copy; 1997-2017 University of Cambridge.
 <br>
--- a/doc/html/pcre2test.html
+++ b/doc/html/pcre2test.html
@ -568,7 +568,7 @@ Setting compilation options
 </b><br>
 <P>
 The following modifiers set options for <b>pcre2_compile()</b>. Most of them set
-bits in the options argument of that function, but those whose names start with 
+bits in the options argument of that function, but those whose names start with
 PCRE2_EXTRA are additional options that are set in the compile context. For the
 main options, there are some single-letter abbreviations that are the same as
 Perl options. There is special handling for /x: if a second x is present,
@ -579,25 +579,25 @@ way <b>pcre2_compile()</b> behaves. See
 for a description of the effects of these options.
 <pre>
      allow_empty_class         set PCRE2_ALLOW_EMPTY_CLASS
-      allow_surrogate_escapes   set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES 
+      allow_surrogate_escapes   set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
      alt_bsux                  set PCRE2_ALT_BSUX
      alt_circumflex            set PCRE2_ALT_CIRCUMFLEX
      alt_verbnames             set PCRE2_ALT_VERBNAMES
      anchored                  set PCRE2_ANCHORED
      auto_callout              set PCRE2_AUTO_CALLOUT
-      bad_escape_is_literal     set PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL 
+      bad_escape_is_literal     set PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
  /i  caseless                  set PCRE2_CASELESS
      dollar_endonly            set PCRE2_DOLLAR_ENDONLY
  /s  dotall                    set PCRE2_DOTALL
      dupnames                  set PCRE2_DUPNAMES
      endanchored               set PCRE2_ENDANCHORED
  /x  extended                  set PCRE2_EXTENDED
-  /xx extended_more             set PCRE2_EXTENDED_MORE 
+  /xx extended_more             set PCRE2_EXTENDED_MORE
      firstline                 set PCRE2_FIRSTLINE
-      literal                   set PCRE2_LITERAL 
-      match_line                set PCRE2_EXTRA_MATCH_LINE 
+      literal                   set PCRE2_LITERAL
+      match_line                set PCRE2_EXTRA_MATCH_LINE
      match_unset_backref       set PCRE2_MATCH_UNSET_BACKREF
-      match_word                set PCRE2_EXTRA_MATCH_WORD 
+      match_word                set PCRE2_EXTRA_MATCH_WORD
  /m  multiline                 set PCRE2_MULTILINE
      never_backslash_c         set PCRE2_NEVER_BACKSLASH_C
      never_ucp                 set PCRE2_NEVER_UCP
@ -631,7 +631,7 @@ heavily used in the test files.
  /B  bincode                   show binary code without lengths
      callout_info              show callout information
      debug                     same as info,fullbincode
-      framesize                 show matching frame size 
+      framesize                 show matching frame size
      fullbincode               show binary code with lengths
  /I  info                      show info about compiled pattern
      hex                       unquoted characters are hexadecimal
@ -649,7 +649,7 @@ heavily used in the test files.
      push                      push compiled pattern onto the stack
      pushcopy                  push a copy onto the stack
      stackguard=&#60;number&#62;       test the stackguard feature
-      subject_literal           treat all subject lines as literal 
+      subject_literal           treat all subject lines as literal
      tables=[0|1|2]            select internal tables
      use_length                do not zero-terminate the pattern
      utf8_input                treat input as UTF-8
@ -720,7 +720,7 @@ not necessarily the last character. These lines are omitted if no starting or
 ending code units are recorded.
 </P>
 <P>
-The <b>framesize</b> modifier shows the size, in bytes, of the storage frames 
+The <b>framesize</b> modifier shows the size, in bytes, of the storage frames
 used by <b>pcre2_match()</b> for handling backtracking. The size depends on the
 number of capturing parentheses in the pattern.
 </P>
@ -972,8 +972,8 @@ below. All other modifiers are either ignored, with a warning message, or cause
 an error.
 </P>
 <P>
-The pattern is passed to <b>regcomp()</b> as a zero-terminated string by 
-default, but if the <b>use_length</b> or <b>hex</b> modifiers are set, the 
+The pattern is passed to <b>regcomp()</b> as a zero-terminated string by
+default, but if the <b>use_length</b> or <b>hex</b> modifiers are set, the
 REG_PEND extension is used to pass it by length.
 </P>
 <br><b>
@ -1013,7 +1013,7 @@ are mutually exclusive.
 Setting certain match controls
 </b><br>
 <P>
-The following modifiers are really subject modifiers, and are described under 
+The following modifiers are really subject modifiers, and are described under
 "Subject Modifiers" below. However, they may be included in a pattern's
 modifier list, in which case they are applied to every subject line that is
 processed with that pattern. They may not appear in <b>#pattern</b> commands.
@ -1040,9 +1040,9 @@ defaults, set them in a <b>#subject</b> command.
 Specifying literal subject lines
 </b><br>
 <P>
-If the <b>subject_literal</b> modifier is present on a pattern, all the subject 
-lines that it matches are taken as literal strings, with no interpretation of 
-backslashes. It is not possible to set subject modifiers on such lines, but any 
+If the <b>subject_literal</b> modifier is present on a pattern, all the subject
+lines that it matches are taken as literal strings, with no interpretation of
+backslashes. It is not possible to set subject modifiers on such lines, but any
 that are set as defaults by a <b>#subject</b> command are recognized.
 </P>
 <br><b>
@ -1054,7 +1054,8 @@ pushed onto a stack of compiled patterns, and <b>pcre2test</b> expects the next
 line to contain a new pattern (or a command) instead of a subject line. This
 facility is used when saving compiled patterns to a file, as described in the
 section entitled "Saving and restoring compiled patterns"
-<a href="#saverestore">below. If <b>pushcopy</b> is used instead of <b>push</b>, a copy of the compiled</a>
+<a href="#saverestore">below.</a>
+If <b>pushcopy</b> is used instead of <b>push</b>, a copy of the compiled
 pattern is stacked, leaving the original as current, ready to match the
 following input lines. This provides a way of testing the
 <b>pcre2_code_copy()</b> function.
@ -1103,18 +1104,18 @@ causing REG_NOTBOL, REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to
 <b>regexec()</b>. The other modifiers are ignored, with a warning message.
 </P>
 <P>
-There is one additional modifier that can be used with the POSIX wrapper. It is 
+There is one additional modifier that can be used with the POSIX wrapper. It is
 ignored (with a warning) if used for non-POSIX matching.
 <pre>
-      posix_startend=&#60;n&#62;[:&#60;m&#62;] 
+      posix_startend=&#60;n&#62;[:&#60;m&#62;]
 </pre>
 This causes the subject string to be passed to <b>regexec()</b> using the
 REG_STARTEND option, which uses offsets to specify which part of the string is
 searched. If only one number is given, the end offset is passed as the end of
 the subject string. For more detail of REG_STARTEND, see the
 <a href="pcre2posix.html"><b>pcre2posix</b></a>
-documentation. If the subject string contains binary zeros (coded as escapes 
-such as \x{00} because <b>pcre2test</b> does not support actual binary zeros in 
+documentation. If the subject string contains binary zeros (coded as escapes
+such as \x{00} because <b>pcre2test</b> does not support actual binary zeros in
 its input), you must use <b>posix_startend</b> to specify its length.
 </P>
 <br><b>
@ -1135,6 +1136,7 @@ pattern.
      callout_data=&#60;n&#62;           set a value to pass via callouts
      callout_error=&#60;n&#62;[:&#60;m&#62;]    control callout error
      callout_fail=&#60;n&#62;[:&#60;m&#62;]     control callout failure
+      callout_no_where           do not show position of a callout
      callout_none               do not supply a callout function
      copy=&#60;number or name&#62;      copy captured substring
      depth_limit=&#60;n&#62;            set a depth limit
@ -1230,29 +1232,10 @@ Testing callouts
 </b><br>
 <P>
 A callout function is supplied when <b>pcre2test</b> calls the library matching
-functions, unless <b>callout_none</b> is specified. If <b>callout_capture</b> is
-set, the current captured groups are output when a callout occurs. The default
-return from the callout function is zero, which allows matching to continue.
-</P>
-<P>
-The <b>callout_fail</b> modifier can be given one or two numbers. If there is
-only one number, 1 is returned instead of 0 (causing matching to backtrack)
-when a callout of that number is reached. If two numbers (&#60;n&#62;:&#60;m&#62;) are given, 1
-is returned when callout &#60;n&#62; is reached and there have been at least &#60;m&#62;
-callouts. The <b>callout_error</b> modifier is similar, except that
-PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
-aborted. If both these modifiers are set for the same callout number,
-<b>callout_error</b> takes precedence.
-</P>
-<P>
-Note that callouts with string arguments are always given the number zero. See
-"Callouts" below for a description of the output when a callout it taken.
-</P>
-<P>
-The <b>callout_data</b> modifier can be given an unsigned or a negative number.
-This is set as the "user data" that is passed to the matching function, and
-passed back when the callout function is invoked. Any value other than zero is
-used as a return from <b>pcre2test</b>'s callout function.
+functions, unless <b>callout_none</b> is specified. Its behaviour can be
+controlled by various modifiers listed above whose names begin with
+<b>callout_</b>. Details are given in the section entitled "Callouts"
+<a href="#callouts">below.</a>
 </P>
 <br><b>
 Finding all matches in a string
@ -1384,7 +1367,7 @@ that is used by the just-in-time optimization code. It is ignored if JIT
 optimization is not being used. The value is a number of kilobytes. Setting
 zero reverts to the default of 32K. Providing a stack that is larger than the
 default is necessary only for very complicated patterns. If <b>jitstack</b> is
-set non-zero on a subject line it overrides any value that was set on the 
+set non-zero on a subject line it overrides any value that was set on the
 pattern.
 </P>
 <br><b>
@ -1414,7 +1397,7 @@ The <i>match_limit</i> number is a measure of the amount of backtracking
 that takes place, and learning the minimum value can be instructive. For most
 simple matches, the number is quite small, but for patterns with very large
 numbers of matching possibilities, it can become large very quickly with
-increasing length of subject string. 
+increasing length of subject string.
 </P>
 <P>
 For non-DFA matching, the minimum <i>depth_limit</i> number is a measure of how
@ -1660,7 +1643,7 @@ restart the match with additional subject data by means of the
 For further information about partial matching, see the
 <a href="pcre2partial.html"><b>pcre2partial</b></a>
 documentation.
-</P>
+<a name="callouts"></a></P>
 <br><a name="SEC16" href="#TOC1">CALLOUTS</a><br>
 <P>
 If the pattern contains any callout requests, <b>pcre2test</b>'s callout
@ -1669,8 +1652,33 @@ This works with both matching functions.
 </P>
 <P>
 The callout function in <b>pcre2test</b> returns zero (carry on matching) by
-default, but you can use a <b>callout_fail</b> modifier in a subject line (as
-described above) to change this and other parameters of the callout.
+default, but you can use a <b>callout_fail</b> modifier in a subject line to
+change this and other parameters of the callout.
+</P>
+<P>
+If <b>callout_capture</b> is set, the current captured groups are output when a
+callout occurs. By default, the callout function then generates output that
+indicates where the current match start and matching points are in the subject,
+and what the next pattern item is. This output is suppressed if the
+<b>callout_no_where</b> modifier is set.
+</P>
+<P>
+The default return from the callout function is zero, which allows matching to
+continue. The <b>callout_fail</b> modifier can be given one or two numbers. If
+there is only one number, 1 is returned instead of 0 (causing matching to
+backtrack) when a callout of that number is reached. If two numbers (&#60;n&#62;:&#60;m&#62;)
+are given, 1 is returned when callout &#60;n&#62; is reached and there have been at
+least &#60;m&#62; callouts. The <b>callout_error</b> modifier is similar, except that
+PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
+aborted. If both these modifiers are set for the same callout number,
+<b>callout_error</b> takes precedence. Note that callouts with string arguments
+are always given the number zero. See
+</P>
+<P>
+The <b>callout_data</b> modifier can be given an unsigned or a negative number.
+This is set as the "user data" that is passed to the matching function, and
+passed back when the callout function is invoked. Any value other than zero is
+used as a return from <b>pcre2test</b>'s callout function.
 </P>
 <P>
 Inserting callouts can be helpful when using <b>pcre2test</b> to check
@ -1858,7 +1866,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC21" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 16 June 2017
+Last updated: 02 July 2017
 <br>
 Copyright &copy; 1997-2017 University of Cambridge.
 <br>
--- a/doc/pcre2.txt
+++ b/doc/pcre2.txt
@ -3543,9 +3543,14 @@ JUST-IN-TIME COMPILER SUPPORT

       This support is available only for certain hardware  architectures.  If
       this  option  is  set for an unsupported architecture, a building error
-       occurs.  See the pcre2jit documentation for a discussion of JIT  usage.
-       When  JIT  support is enabled, pcre2grep automatically makes use of it,
-       unless you add
+       occurs. If you are running under SELinux you may also want to add
+
+         --enable-jit-sealloc
+
+       which enables the use of an execmem allocator in JIT that is compatible
+       with  SELinux.  This  has  no  effect  if  JIT  is not enabled. See the
+       pcre2jit documentation for a discussion of JIT usage. When JIT  support
+       is enabled, pcre2grep automatically makes use of it, unless you add

         --disable-pcre2grep-jit

@ -3554,14 +3559,14 @@ JUST-IN-TIME COMPILER SUPPORT

 NEWLINE RECOGNITION

-       By default, PCRE2 interprets the linefeed (LF) character as  indicating
-       the  end  of  a line. This is the normal newline character on Unix-like
-       systems. You can compile PCRE2 to use carriage return (CR) instead,  by
+       By  default, PCRE2 interprets the linefeed (LF) character as indicating
+       the end of a line. This is the normal newline  character  on  Unix-like
+       systems.  You can compile PCRE2 to use carriage return (CR) instead, by
       adding

         --enable-newline-is-cr

-       to  the  configure  command.  There  is  also an --enable-newline-is-lf
+       to the configure  command.  There  is  also  an  --enable-newline-is-lf
       option, which explicitly specifies linefeed as the newline character.

       Alternatively, you can specify that line endings are to be indicated by
@ -3574,104 +3579,104 @@ NEWLINE RECOGNITION

         --enable-newline-is-anycrlf

-       which causes PCRE2 to recognize any of the three sequences CR,  LF,  or
+       which  causes  PCRE2 to recognize any of the three sequences CR, LF, or
       CRLF as indicating a line ending. Finally, a fifth option, specified by

         --enable-newline-is-any

-       causes  PCRE2  to  recognize  any Unicode newline sequence. The Unicode
+       causes PCRE2 to recognize any Unicode  newline  sequence.  The  Unicode
       newline sequences are the three just mentioned, plus the single charac-
       ters VT (vertical tab, U+000B), FF (form feed, U+000C), NEL (next line,
-       U+0085), LS (line separator,  U+2028),  and  PS  (paragraph  separator,
+       U+0085),  LS  (line  separator,  U+2028),  and PS (paragraph separator,
       U+2029).

       Whatever default line ending convention is selected when PCRE2 is built
-       can be overridden by applications that use the library. At  build  time
+       can  be  overridden by applications that use the library. At build time
       it is conventional to use the standard for your operating system.


 WHAT \R MATCHES

-       By  default,  the  sequence \R in a pattern matches any Unicode newline
-       sequence, independently of what has been selected as  the  line  ending
+       By default, the sequence \R in a pattern matches  any  Unicode  newline
+       sequence,  independently  of  what has been selected as the line ending
       sequence. If you specify

         --enable-bsr-anycrlf

-       the  default  is changed so that \R matches only CR, LF, or CRLF. What-
-       ever is selected when PCRE2 is built can be overridden by  applications
+       the default is changed so that \R matches only CR, LF, or  CRLF.  What-
+       ever  is selected when PCRE2 is built can be overridden by applications
       that use the library.


 HANDLING VERY LARGE PATTERNS

-       Within  a  compiled  pattern,  offset values are used to point from one
-       part to another (for example, from an opening parenthesis to an  alter-
-       nation  metacharacter).  By default, in the 8-bit and 16-bit libraries,
-       two-byte values are used for these offsets, leading to a  maximum  size
-       for  a compiled pattern of around 64K code units. This is sufficient to
+       Within a compiled pattern, offset values are used  to  point  from  one
+       part  to another (for example, from an opening parenthesis to an alter-
+       nation metacharacter). By default, in the 8-bit and  16-bit  libraries,
+       two-byte  values  are used for these offsets, leading to a maximum size
+       for a compiled pattern of around 64K code units. This is sufficient  to
       handle all but the most gigantic patterns. Nevertheless, some people do
-       want  to  process truly enormous patterns, so it is possible to compile
-       PCRE2 to use three-byte or four-byte offsets by adding a  setting  such
+       want to process truly enormous patterns, so it is possible  to  compile
+       PCRE2  to  use three-byte or four-byte offsets by adding a setting such
       as

         --with-link-size=3

-       to  the  configure command. The value given must be 2, 3, or 4. For the
-       16-bit library, a value of 3 is rounded up to 4.  In  these  libraries,
-       using  longer  offsets slows down the operation of PCRE2 because it has
-       to load additional data when handling them. For the 32-bit library  the
-       value  is  always 4 and cannot be overridden; the value of --with-link-
+       to the configure command. The value given must be 2, 3, or 4.  For  the
+       16-bit  library,  a  value of 3 is rounded up to 4. In these libraries,
+       using longer offsets slows down the operation of PCRE2 because  it  has
+       to  load additional data when handling them. For the 32-bit library the
+       value is always 4 and cannot be overridden; the value  of  --with-link-
       size is ignored.


 LIMITING PCRE2 RESOURCE USAGE

       The pcre2_match() function increments a counter each time it goes round
-       its  main  loop. Putting a limit on this counter controls the amount of
-       computing resource used by a single call to  pcre2_match().  The  limit
+       its main loop. Putting a limit on this counter controls the  amount  of
+       computing  resource  used  by a single call to pcre2_match(). The limit
       can be changed at run time, as described in the pcre2api documentation.
-       The default is 10 million, but this can be changed by adding a  setting
+       The  default is 10 million, but this can be changed by adding a setting
       such as

         --with-match-limit=500000

-       to   the   configure   command.   This  setting  also  applies  to  the
-       pcre2_dfa_match() matching function, and to JIT  matching  (though  the
+       to  the  configure  command.  This  setting   also   applies   to   the
+       pcre2_dfa_match()  matching  function,  and to JIT matching (though the
       counting is done differently).

-       The  pcre2_match() function starts out using a 20K vector on the system
-       stack to record  backtracking  points.  The  more  nested  backtracking
+       The pcre2_match() function starts out using a 20K vector on the  system
+       stack  to  record  backtracking  points.  The  more nested backtracking
       points there are (that is, the deeper the search tree), the more memory
-       is needed. If the initial vector is not large enough,  heap  memory  is
+       is  needed.  If  the initial vector is not large enough, heap memory is
       used, up to a certain limit, which is specified in kilobytes. The limit
       can be changed at run time, as described in the pcre2api documentation.
-       The  default  limit (in effect unlimited) is 20 million. You can change
+       The default limit (in effect unlimited) is 20 million. You  can  change
       this by a setting such as

         --with-heap-limit=500

-       which limits the amount of heap to 500 kilobytes.  This  limit  applies
-       only  to interpretive matching in pcre2_match(). It does not apply when
-       JIT (which has its own memory arrangements) is used, nor does it  apply
+       which  limits  the  amount of heap to 500 kilobytes. This limit applies
+       only to interpretive matching in pcre2_match(). It does not apply  when
+       JIT  (which has its own memory arrangements) is used, nor does it apply
       to pcre2_dfa_match().

-       You  can  also explicitly limit the depth of nested backtracking in the
+       You can also explicitly limit the depth of nested backtracking  in  the
       pcre2_match() interpreter. This limit defaults to the value that is set
-       for  --with-match-limit.  You  can set a lower default limit by adding,
+       for --with-match-limit. You can set a lower default  limit  by  adding,
       for example,

         --with-match-limit_depth=10000

-       to the configure command. This value can be  overridden  at  run  time.
-       This  depth  limit  indirectly limits the amount of heap memory that is
-       used, but because the size of each backtracking "frame" depends on  the
-       number  of  capturing parentheses in a pattern, the amount of heap that
-       is used before the limit is reached varies  from  pattern  to  pattern.
-       This  limit  was  more  useful in versions before 10.30, where function
-       recursion was used for backtracking.  However, as well as  applying  to
+       to  the  configure  command.  This value can be overridden at run time.
+       This depth limit indirectly limits the amount of heap  memory  that  is
+       used,  but because the size of each backtracking "frame" depends on the
+       number of capturing parentheses in a pattern, the amount of  heap  that
+       is  used  before  the  limit is reached varies from pattern to pattern.
+       This limit was more useful in versions  before  10.30,  where  function
+       recursion  was  used for backtracking.  However, as well as applying to
       pcre2_match(), this limit also controls the depth of recursive function
-       calls in pcre2_dfa_match(). These are used for  lookaround  assertions,
+       calls  in  pcre2_dfa_match(). These are used for lookaround assertions,
       atomic groups, and recursion within patterns.  The limit does not apply
       to JIT matching.

@ -3680,45 +3685,45 @@ CREATING CHARACTER TABLES AT BUILD TIME

       PCRE2 uses fixed tables for processing characters whose code points are
       less than 256. By default, PCRE2 is built with a set of tables that are
-       distributed in the file src/pcre2_chartables.c.dist. These  tables  are
+       distributed  in  the file src/pcre2_chartables.c.dist. These tables are
       for ASCII codes only. If you add

         --enable-rebuild-chartables

-       to  the  configure  command, the distributed tables are no longer used.
-       Instead, a program called dftables is compiled and  run.  This  outputs
+       to the configure command, the distributed tables are  no  longer  used.
+       Instead,  a  program  called dftables is compiled and run. This outputs
       the source for new set of tables, created in the default locale of your
       C run-time system. This method of replacing the tables does not work if
-       you  are cross compiling, because dftables is run on the local host. If
-       you need to create alternative tables when cross  compiling,  you  will
+       you are cross compiling, because dftables is run on the local host.  If
+       you  need  to  create alternative tables when cross compiling, you will
       have to do so "by hand".


 USING EBCDIC CODE

-       PCRE2  assumes  by default that it will run in an environment where the
-       character code is ASCII or Unicode, which is a superset of ASCII.  This
+       PCRE2 assumes by default that it will run in an environment  where  the
+       character  code is ASCII or Unicode, which is a superset of ASCII. This
       is the case for most computer operating systems. PCRE2 can, however, be
       compiled to run in an 8-bit EBCDIC environment by adding

         --enable-ebcdic --disable-unicode

       to the configure command. This setting implies --enable-rebuild-charta-
-       bles.  You  should  only  use  it if you know that you are in an EBCDIC
+       bles. You should only use it if you know that  you  are  in  an  EBCDIC
       environment (for example, an IBM mainframe operating system).

-       It is not possible to support both EBCDIC and UTF-8 codes in  the  same
-       version  of  the  library. Consequently, --enable-unicode and --enable-
+       It  is  not possible to support both EBCDIC and UTF-8 codes in the same
+       version of the library. Consequently,  --enable-unicode  and  --enable-
       ebcdic are mutually exclusive.

       The EBCDIC character that corresponds to an ASCII LF is assumed to have
-       the  value  0x15 by default. However, in some EBCDIC environments, 0x25
+       the value 0x15 by default. However, in some EBCDIC  environments,  0x25
       is used. In such an environment you should use

         --enable-ebcdic-nl25

       as well as, or instead of, --enable-ebcdic. The EBCDIC character for CR
-       has  the  same  value  as in ASCII, namely, 0x0d. Whichever of 0x15 and
+       has the same value as in ASCII, namely, 0x0d.  Whichever  of  0x15  and
       0x25 is not chosen as LF is made to correspond to the Unicode NEL char-
       acter (which, in Unicode, is 0x85).

@ -3731,34 +3736,34 @@ PCRE2GREP SUPPORT FOR EXTERNAL SCRIPTS

       By default, on non-Windows systems, pcre2grep supports the use of call-
       outs with string arguments within the patterns it is matching, in order
-       to  run external scripts. For details, see the pcre2grep documentation.
-       This support can be disabled by adding  --disable-pcre2grep-callout  to
+       to run external scripts. For details, see the pcre2grep  documentation.
+       This  support  can be disabled by adding --disable-pcre2grep-callout to
       the configure command.


 PCRE2GREP OPTIONS FOR COMPRESSED FILE SUPPORT

-       By  default,  pcre2grep reads all files as plain text. You can build it
-       so that it recognizes files whose names end in .gz or .bz2,  and  reads
+       By default, pcre2grep reads all files as plain text. You can  build  it
+       so  that  it recognizes files whose names end in .gz or .bz2, and reads
       them with libz or libbz2, respectively, by adding one or both of

         --enable-pcre2grep-libz
         --enable-pcre2grep-libbz2

       to the configure command. These options naturally require that the rel-
-       evant libraries are installed on your system. Configuration  will  fail
+       evant  libraries  are installed on your system. Configuration will fail
       if they are not.


 PCRE2GREP BUFFER SIZE

-       pcre2grep  uses an internal buffer to hold a "window" on the file it is
+       pcre2grep uses an internal buffer to hold a "window" on the file it  is
       scanning, in order to be able to output "before" and "after" lines when
-       it  finds  a  match. The starting size of the buffer is controlled by a
-       parameter whose default value is 20K. The buffer itself is three  times
-       this  size,  but  because  of  the  way it is used for holding "before"
-       lines, the longest line that is guaranteed to  be  processable  is  the
-       parameter  size.  If  a longer line is encountered, pcre2grep automati-
+       it finds a match. The starting size of the buffer is  controlled  by  a
+       parameter  whose default value is 20K. The buffer itself is three times
+       this size, but because of the way  it  is  used  for  holding  "before"
+       lines,  the  longest  line  that is guaranteed to be processable is the
+       parameter size. If a longer line is  encountered,  pcre2grep  automati-
       cally expands the buffer, up to a specified maximum size, whose default
       is 1M or the starting size, whichever is the larger. You can change the
       default parameter values by adding, for example,
@ -3766,8 +3771,8 @@ PCRE2GREP BUFFER SIZE
         --with-pcre2grep-bufsize=51200
         --with-pcre2grep-max-bufsize=2097152

-       to the configure command. The caller of pcre2grep  can  override  these
-       values  by  using  --buffer-size  and  --max-buffer-size on the command
+       to  the  configure  command. The caller of pcre2grep can override these
+       values by using --buffer-size  and  --max-buffer-size  on  the  command
       line.


@ -3778,26 +3783,26 @@ PCRE2TEST OPTION FOR LIBREADLINE SUPPORT
         --enable-pcre2test-libreadline
         --enable-pcre2test-libedit

-       to the configure command, pcre2test  is  linked  with  the  libreadline
+       to  the  configure  command,  pcre2test  is linked with the libreadline
        or libedit library, respectively, and when its input is from a terminal,
-       it reads it using the readline() function. This  provides  line-editing
-       and  history  facilities.  Note that libreadline is GPL-licensed, so if
-       you distribute a binary of pcre2test linked in this way, there  may  be
+       it  reads  it using the readline() function. This provides line-editing
+       and history facilities. Note that libreadline is  GPL-licensed,  so  if
+       you  distribute  a binary of pcre2test linked in this way, there may be
       licensing issues. These can be avoided by linking instead with libedit,
       which has a BSD licence.

-       Setting --enable-pcre2test-libreadline causes the -lreadline option  to
-       be  added to the pcre2test build. In many operating environments with a
-       sytem-installed readline library this is sufficient. However,  in  some
+       Setting  --enable-pcre2test-libreadline causes the -lreadline option to
+       be added to the pcre2test build. In many operating environments with  a
+       system-installed readline  library this is sufficient. However, in some
       environments (e.g. if an unmodified distribution version of readline is
-       in use), some extra configuration may be necessary.  The  INSTALL  file
+       in  use),  some  extra configuration may be necessary. The INSTALL file
       for libreadline says this:

         "Readline uses the termcap functions, but does not link with
         the termcap or curses library itself, allowing applications
         which link with readline the to choose an appropriate library."

-       If  your environment has not been set up so that an appropriate library
+       If your environment has not been set up so that an appropriate  library
       is automatically included, you may need to add something like

          LIBS="-lncurses"
@ -3811,7 +3816,7 @@ INCLUDING DEBUGGING CODE

         --enable-debug

-       to the configure command, additional debugging code is included in  the
+       to  the configure command, additional debugging code is included in the
       build. This feature is intended for use by the PCRE2 maintainers.


@ -3821,15 +3826,15 @@ DEBUGGING WITH VALGRIND SUPPORT

         --enable-valgrind

-       to  the  configure command, PCRE2 will use valgrind annotations to mark
-       certain memory regions as  unaddressable.  This  allows  it  to  detect
-       invalid  memory  accesses,  and  is  mostly  useful for debugging PCRE2
+       to the configure command, PCRE2 will use valgrind annotations  to  mark
+       certain  memory  regions  as  unaddressable.  This  allows it to detect
+       invalid memory accesses, and  is  mostly  useful  for  debugging  PCRE2
       itself.


 CODE COVERAGE REPORTING

-       If your C compiler is gcc, you can build a version of  PCRE2  that  can
+       If  your  C  compiler is gcc, you can build a version of PCRE2 that can
       generate a code coverage report for its test suite. To enable this, you
       must install lcov version 1.6 or above. Then specify

@ -3838,20 +3843,20 @@ CODE COVERAGE REPORTING
       to the configure command and build PCRE2 in the usual way.

       Note that using ccache (a caching C compiler) is incompatible with code
-       coverage  reporting. If you have configured ccache to run automatically
+       coverage reporting. If you have configured ccache to run  automatically
       on your system, you must set the environment variable

         CCACHE_DISABLE=1

       before running make to build PCRE2, so that ccache is not used.

-       When --enable-coverage is used,  the  following  addition  targets  are
+       When --enable-coverage is  used,  the  following additional targets are
       added to the Makefile:

         make coverage

-       This  creates  a  fresh coverage report for the PCRE2 test suite. It is
-       equivalent to running "make coverage-reset", "make  coverage-baseline",
+       This creates a fresh coverage report for the PCRE2 test  suite.  It  is
+       equivalent  to running "make coverage-reset", "make coverage-baseline",
       "make check", and then "make coverage-report".

         make coverage-reset
@ -3868,56 +3873,56 @@ CODE COVERAGE REPORTING

         make coverage-clean-report

-       This  removes the generated coverage report without cleaning the cover-
+       This removes the generated coverage report without cleaning the  cover-
       age data itself.

         make coverage-clean-data

-       This removes the captured coverage data without removing  the  coverage
+       This  removes  the captured coverage data without removing the coverage
       files created at compile time (*.gcno).

         make coverage-clean

-       This  cleans all coverage data including the generated coverage report.
-       For more information about code coverage, see the gcov and  lcov  docu-
+       This cleans all coverage data including the generated coverage  report.
+       For  more  information about code coverage, see the gcov and lcov docu-
       mentation.


 SUPPORT FOR FUZZERS

-       There  is  a  special  option for use by people who want to run fuzzing
+       There is a special option for use by people who  want  to  run  fuzzing
       tests on PCRE2:

         --enable-fuzz-support

       At present this applies only to the 8-bit library. If set, it causes an
-       extra  library  called  libpcre2-fuzzsupport.a  to  be  built,  but not
-       installed. This contains a single function called  LLVMFuzzerTestOneIn-
-       put()  whose  arguments are a pointer to a string and the length of the
-       string. When called, this function tries to compile  the  string  as  a
-       pattern,  and if that succeeds, to match it.  This is done both with no
-       options and with some random options bits that are generated  from  the
+       extra library  called  libpcre2-fuzzsupport.a  to  be  built,  but  not
+       installed.  This contains a single function called LLVMFuzzerTestOneIn-
+       put() whose arguments are a pointer to a string and the length  of  the
+       string.  When  called,  this  function tries to compile the string as a
+       pattern, and if that succeeds, to match it.  This is done both with  no
+       options  and  with some random options bits that are generated from the
       string.

-       Setting  --enable-fuzz-support  also  causes  a binary called pcre2fuz-
-       zcheck to be created. This is normally run under valgrind or used  when
+       Setting --enable-fuzz-support also causes  a  binary  called  pcre2fuz-
+       zcheck  to be created. This is normally run under valgrind or used when
       PCRE2 is compiled with address sanitizing enabled. It calls the fuzzing
-       function and outputs information about it is doing. The  input  strings
-       are  specified by arguments: if an argument starts with "=" the rest of
-       it is a literal input string. Otherwise, it is assumed  to  be  a  file
+       function and outputs information about what it is doing. The input strings
+       are specified by arguments: if an argument starts with "=" the rest  of
+       it  is  a  literal  input string. Otherwise, it is assumed to be a file
       name, and the contents of the file are the test string.


 OBSOLETE OPTION

-       In  versions  of  PCRE2 prior to 10.30, there were two ways of handling
-       backtracking in the pcre2_match() function. The default was to use  the
+       In versions of PCRE2 prior to 10.30, there were two  ways  of  handling
+       backtracking  in the pcre2_match() function. The default was to use the
       system stack, but if

         --disable-stack-for-recursion

-       was  set,  memory on the heap was used. From release 10.30 onwards this
-       has changed (the stack is no longer used)  and  this  option  now  does
+       was set, memory on the heap was used. From release 10.30  onwards  this
+       has  changed  (the  stack  is  no longer used) and this option now does
       nothing except give a warning.


@ -3935,7 +3940,7 @@ AUTHOR

 REVISION

-       Last updated: 30 May 2017
+       Last updated: 17 June 2017
       Copyright (c) 1997-2017 University of Cambridge.
 ------------------------------------------------------------------------------
 
@ -6309,26 +6314,28 @@ BACKSLASH
       Those  that are not part of an identified script are lumped together as
       "Common". The current list of scripts is:

-       Ahom,  Anatolian_Hieroglyphs,  Arabic,  Armenian,  Avestan,   Balinese,
-       Bamum,  Bassa_Vah, Batak, Bengali, Bopomofo, Brahmi, Braille, Buginese,
-       Buhid, Canadian_Aboriginal, Carian, Caucasian_Albanian,  Chakma,  Cham,
-       Cherokee,   Common,  Coptic,  Cuneiform,  Cypriot,  Cyrillic,  Deseret,
-       Devanagari, Duployan, Egyptian_Hieroglyphs,  Elbasan,  Ethiopic,  Geor-
-       gian,  Glagolitic,  Gothic,  Grantha,  Greek,  Gujarati, Gurmukhi, Han,
-       Hangul, Hanunoo, Hatran, Hebrew, Hiragana, Imperial_Aramaic, Inherited,
-       Inscriptional_Pahlavi,  Inscriptional_Parthian,  Javanese, Kaithi, Kan-
-       nada, Katakana, Kayah_Li, Kharoshthi, Khmer,  Khojki,  Khudawadi,  Lao,
-       Latin,  Lepcha,  Limbu, Linear_A, Linear_B, Lisu, Lycian, Lydian, Maha-
-       jani,  Malayalam,  Mandaic,  Manichaean,  Meetei_Mayek,  Mende_Kikakui,
-       Meroitic_Cursive,  Meroitic_Hieroglyphs,  Miao,  Modi,  Mongolian, Mro,
-       Multani,  Myanmar,  Nabataean,  New_Tai_Lue,  Nko,   Ogham,   Ol_Chiki,
-       Old_Hungarian,  Old_Italic, Old_North_Arabian, Old_Permic, Old_Persian,
-       Old_South_Arabian, Old_Turkic, Oriya, Osmanya, Pahawh_Hmong, Palmyrene,
-       Pau_Cin_Hau,  Phags_Pa,  Phoenician,  Psalter_Pahlavi,  Rejang,  Runic,
-       Samaritan, Saurashtra, Sharada, Shavian, Siddham, SignWriting, Sinhala,
-       Sora_Sompeng,   Sundanese,  Syloti_Nagri,  Syriac,  Tagalog,  Tagbanwa,
-       Tai_Le,  Tai_Tham,  Tai_Viet,  Takri,  Tamil,  Telugu,  Thaana,   Thai,
-       Tibetan, Tifinagh, Tirhuta, Ugaritic, Vai, Warang_Citi, Yi.
+       Adlam, Ahom, Anatolian_Hieroglyphs, Arabic,  Armenian,  Avestan,  Bali-
+       nese,  Bamum,  Bassa_Vah,  Batak, Bengali, Bhaiksuki, Bopomofo, Brahmi,
+       Braille, Buginese, Buhid, Canadian_Aboriginal, Carian,  Caucasian_Alba-
+       nian,  Chakma,  Cham,  Cherokee,  Common,  Coptic,  Cuneiform, Cypriot,
+       Cyrillic, Deseret, Devanagari, Duployan, Egyptian_Hieroglyphs, Elbasan,
+       Ethiopic,  Georgian, Glagolitic, Gothic, Grantha, Greek, Gujarati, Gur-
+       mukhi, Han, Hangul, Hanunoo, Hatran,  Hebrew,  Hiragana,  Imperial_Ara-
+       maic,    Inherited,    Inscriptional_Pahlavi,   Inscriptional_Parthian,
+       Javanese, Kaithi, Kannada, Katakana, Kayah_Li, Kharoshthi, Khmer,  Kho-
+       jki,  Khudawadi,  Lao,  Latin, Lepcha, Limbu, Linear_A, Linear_B, Lisu,
+       Lycian, Lydian,  Mahajani,  Malayalam,  Mandaic,  Manichaean,  Marchen,
+       Masaram_Gondi,     Meetei_Mayek,    Mende_Kikakui,    Meroitic_Cursive,
+       Meroitic_Hieroglyphs, Miao, Modi,  Mongolian,  Mro,  Multani,  Myanmar,
+       Nabataean,  New_Tai_Lue, Newa, Nko, Nushu, Ogham, Ol_Chiki, Old_Hungar-
+       ian,   Old_Italic,    Old_North_Arabian,    Old_Permic,    Old_Persian,
+       Old_South_Arabian,  Old_Turkic,  Oriya,  Osage,  Osmanya, Pahawh_Hmong,
+       Palmyrene, Pau_Cin_Hau, Phags_Pa, Phoenician, Psalter_Pahlavi,  Rejang,
+       Runic,  Samaritan,  Saurashtra, Sharada, Shavian, Siddham, SignWriting,
+       Sinhala, Sora_Sompeng, Soyombo, Sundanese, Syloti_Nagri, Syriac,  Taga-
+       log,  Tagbanwa,  Tai_Le, Tai_Tham, Tai_Viet, Takri, Tamil, Tangut, Tel-
+       ugu,  Thaana,  Thai,  Tibetan,  Tifinagh,   Tirhuta,   Ugaritic,   Vai,
+       Warang_Citi, Yi, Zanabazar_Square.

       Each character has exactly one Unicode general category property, spec-
       ified by a two-letter abbreviation. For compatibility with Perl,  nega-
@ -8737,7 +8744,7 @@ AUTHOR

 REVISION

-       Last updated: 30 May 2017
+       Last updated: 02 July 2017
       Copyright (c) 1997-2017 University of Cambridge.
 ------------------------------------------------------------------------------
 
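The PCRE2GREP BUFFER SIZE text in the file above describes some simple arithmetic: the allocated buffer is three times the starting parameter, the longest line guaranteed to be processable equals the parameter, and the default maximum is 1M or the starting size, whichever is larger. The following is a minimal Python sketch of that arithmetic only, not pcre2grep's own code. The function name `buffer_plan` and its dictionary keys are hypothetical, and it assumes K means 1024 bytes, which is consistent with the example values 51200 and 2097152.

```python
# Illustrative model of the documented pcre2grep buffer arithmetic.
# All names here are hypothetical; K = 1024 bytes is an assumption.
DEFAULT_BUFSIZE = 20 * 1024        # documented default starting parameter (20K)
DEFAULT_MAX_BUFSIZE = 1024 * 1024  # documented default maximum (1M)


def buffer_plan(bufsize=DEFAULT_BUFSIZE, max_bufsize=None):
    """Return the sizes implied by the starting and maximum parameters."""
    if max_bufsize is None:
        # The default maximum is 1M or the starting size, whichever is larger.
        max_bufsize = max(DEFAULT_MAX_BUFSIZE, bufsize)
    return {
        "allocated_bytes": 3 * bufsize,    # the buffer is three times the parameter
        "guaranteed_line_bytes": bufsize,  # longest line guaranteed processable
        "max_parameter_bytes": max_bufsize,
    }


if __name__ == "__main__":
    print(buffer_plan())                # built-in defaults
    print(buffer_plan(51200, 2097152))  # the values from the configure example
```

How the buffer grows between the starting and maximum sizes is not stated in the text, so the sketch deliberately leaves that out.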
--- a/doc/pcre2pattern.3
+++ b/doc/pcre2pattern.3
@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "30 May 2017" "PCRE2 10.30"
+.TH PCRE2PATTERN 3 "02 July 2017" "PCRE2 10.30"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@ -754,6 +754,7 @@ example:
 Those that are not part of an identified script are lumped together as
 "Common". The current list of scripts is:
 .P
+Adlam,
 Ahom,
 Anatolian_Hieroglyphs,
 Arabic,
@ -764,6 +765,7 @@ Bamum,
 Bassa_Vah,
 Batak,
 Bengali,
+Bhaiksuki,
 Bopomofo,
 Brahmi,
 Braille,
@ -825,6 +827,8 @@ Mahajani,
 Malayalam,
 Mandaic,
 Manichaean,
+Marchen,
+Masaram_Gondi,
 Meetei_Mayek,
 Mende_Kikakui,
 Meroitic_Cursive,
@ -837,7 +841,9 @@ Multani,
 Myanmar,
 Nabataean,
 New_Tai_Lue,
+Newa,
 Nko,
+Nushu,
 Ogham,
 Ol_Chiki,
 Old_Hungarian,
@ -848,6 +854,7 @@ Old_Persian,
 Old_South_Arabian,
 Old_Turkic,
 Oriya,
+Osage,
 Osmanya,
 Pahawh_Hmong,
 Palmyrene,
@ -865,6 +872,7 @@ Siddham,
 SignWriting,
 Sinhala,
 Sora_Sompeng,
+Soyombo,
 Sundanese,
 Syloti_Nagri,
 Syriac,
@ -875,6 +883,7 @@ Tai_Tham,
 Tai_Viet,
 Takri,
 Tamil,
+Tangut,
 Telugu,
 Thaana,
 Thai,
@ -884,7 +893,8 @@ Tirhuta,
 Ugaritic,
 Vai,
 Warang_Citi,
-Yi.
+Yi,
+Zanabazar_Square.
 .P
 Each character has exactly one Unicode general category property, specified by
 a two-letter abbreviation. For compatibility with Perl, negation can be
@ -3475,6 +3485,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 30 May 2017
+Last updated: 02 July 2017
 Copyright (c) 1997-2017 University of Cambridge.
 .fi
--- a/doc/pcre2test.1
+++ b/doc/pcre2test.1
@ -1,4 +1,4 @@
-.TH PCRE2TEST 1 "16 June 2017" "PCRE 10.30"
+.TH PCRE2TEST 1 "02 July 2017" "PCRE 10.30"
 .SH NAME
 pcre2test - a program for testing Perl-compatible regular expressions.
 .SH SYNOPSIS
@ -527,7 +527,7 @@ by a previous \fB#pattern\fP command.
 .rs
 .sp
 The following modifiers set options for \fBpcre2_compile()\fP. Most of them set
-bits in the options argument of that function, but those whose names start with 
+bits in the options argument of that function, but those whose names start with
 PCRE2_EXTRA are additional options that are set in the compile context. For the
 main options, there are some single-letter abbreviations that are the same as
 Perl options. There is special handling for /x: if a second x is present,
@ -540,25 +540,25 @@ way \fBpcre2_compile()\fP behaves. See
 for a description of the effects of these options.
 .sp
      allow_empty_class         set PCRE2_ALLOW_EMPTY_CLASS
-      allow_surrogate_escapes   set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES 
+      allow_surrogate_escapes   set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
      alt_bsux                  set PCRE2_ALT_BSUX
      alt_circumflex            set PCRE2_ALT_CIRCUMFLEX
      alt_verbnames             set PCRE2_ALT_VERBNAMES
      anchored                  set PCRE2_ANCHORED
      auto_callout              set PCRE2_AUTO_CALLOUT
-      bad_escape_is_literal     set PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL 
+      bad_escape_is_literal     set PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
  /i  caseless                  set PCRE2_CASELESS
      dollar_endonly            set PCRE2_DOLLAR_ENDONLY
  /s  dotall                    set PCRE2_DOTALL
      dupnames                  set PCRE2_DUPNAMES
      endanchored               set PCRE2_ENDANCHORED
  /x  extended                  set PCRE2_EXTENDED
-  /xx extended_more             set PCRE2_EXTENDED_MORE 
+  /xx extended_more             set PCRE2_EXTENDED_MORE
      firstline                 set PCRE2_FIRSTLINE
-      literal                   set PCRE2_LITERAL 
-      match_line                set PCRE2_EXTRA_MATCH_LINE 
+      literal                   set PCRE2_LITERAL
+      match_line                set PCRE2_EXTRA_MATCH_LINE
      match_unset_backref       set PCRE2_MATCH_UNSET_BACKREF
-      match_word                set PCRE2_EXTRA_MATCH_WORD 
+      match_word                set PCRE2_EXTRA_MATCH_WORD
  /m  multiline                 set PCRE2_MULTILINE
      never_backslash_c         set PCRE2_NEVER_BACKSLASH_C
      never_ucp                 set PCRE2_NEVER_UCP
@ -593,7 +593,7 @@ heavily used in the test files.
  /B  bincode                   show binary code without lengths
      callout_info              show callout information
      debug                     same as info,fullbincode
-      framesize                 show matching frame size 
+      framesize                 show matching frame size
      fullbincode               show binary code with lengths
  /I  info                      show info about compiled pattern
      hex                       unquoted characters are hexadecimal
@ -611,7 +611,7 @@ heavily used in the test files.
      push                      push compiled pattern onto the stack
      pushcopy                  push a copy onto the stack
      stackguard=<number>       test the stackguard feature
-      subject_literal           treat all subject lines as literal 
+      subject_literal           treat all subject lines as literal
      tables=[0|1|2]            select internal tables
      use_length                do not zero-terminate the pattern
      utf8_input                treat input as UTF-8
@ -677,7 +677,7 @@ unit" is the last literal code unit that must be present in any match. This is
 not necessarily the last character. These lines are omitted if no starting or
 ending code units are recorded.
 .P
-The \fBframesize\fP modifier shows the size, in bytes, of the storage frames 
+The \fBframesize\fP modifier shows the size, in bytes, of the storage frames
 used by \fBpcre2_match()\fP for handling backtracking. The size depends on the
 number of capturing parentheses in the pattern.
 .P
@ -934,8 +934,8 @@ The \fBaftertext\fP and \fBallaftertext\fP subject modifiers work as described
 below. All other modifiers are either ignored, with a warning message, or cause
 an error.
 .P
-The pattern is passed to \fBregcomp()\fP as a zero-terminated string by 
-default, but if the \fBuse_length\fP or \fBhex\fP modifiers are set, the 
+The pattern is passed to \fBregcomp()\fP as a zero-terminated string by
+default, but if the \fBuse_length\fP or \fBhex\fP modifiers are set, the
 REG_PEND extension is used to pass it by length.
 .
 .
@ -977,7 +977,7 @@ are mutually exclusive.
 .SS "Setting certain match controls"
 .rs
 .sp
-The following modifiers are really subject modifiers, and are described under 
+The following modifiers are really subject modifiers, and are described under
 "Subject Modifiers" below. However, they may be included in a pattern's
 modifier list, in which case they are applied to every subject line that is
 processed with that pattern. They may not appear in \fB#pattern\fP commands.
@ -1004,9 +1004,9 @@ defaults, set them in a \fB#subject\fP command.
 .SS "Specifying literal subject lines"
 .rs
 .sp
-If the \fBsubject_literal\fP modifier is present on a pattern, all the subject 
-lines that it matches are taken as literal strings, with no interpretation of 
-backslashes. It is not possible to set subject modifiers on such lines, but any 
+If the \fBsubject_literal\fP modifier is present on a pattern, all the subject
+lines that it matches are taken as literal strings, with no interpretation of
+backslashes. It is not possible to set subject modifiers on such lines, but any
 that are set as defaults by a \fB#subject\fP command are recognized.
 .
 .
@ -1020,7 +1020,9 @@ facility is used when saving compiled patterns to a file, as described in the
 section entitled "Saving and restoring compiled patterns"
 .\" HTML <a href="#saverestore">
 .\" </a>
-below. If \fBpushcopy\fP is used instead of \fBpush\fP, a copy of the compiled
+below.
+.\"
+If \fBpushcopy\fP is used instead of \fBpush\fP, a copy of the compiled
 pattern is stacked, leaving the original as current, ready to match the
 following input lines. This provides a way of testing the
 \fBpcre2_code_copy()\fP function.
@ -1073,10 +1075,10 @@ that have any effect are \fBnotbol\fP, \fBnotempty\fP, and \fBnoteol\fP,
 causing REG_NOTBOL, REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to
 \fBregexec()\fP. The other modifiers are ignored, with a warning message.
 .P
-There is one additional modifier that can be used with the POSIX wrapper. It is 
+There is one additional modifier that can be used with the POSIX wrapper. It is
 ignored (with a warning) if used for non-POSIX matching.
 .sp
-      posix_startend=<n>[:<m>] 
+      posix_startend=<n>[:<m>]
 .sp
 This causes the subject string to be passed to \fBregexec()\fP using the
 REG_STARTEND option, which uses offsets to specify which part of the string is
@ -1085,8 +1087,8 @@ the subject string. For more detail of REG_STARTEND, see the
 .\" HREF
 \fBpcre2posix\fP
 .\"
-documentation. If the subject string contains binary zeros (coded as escapes 
-such as \ex{00} because \fBpcre2test\fP does not support actual binary zeros in 
+documentation. If the subject string contains binary zeros (coded as escapes
+such as \ex{00} because \fBpcre2test\fP does not support actual binary zeros in
 its input), you must use \fBposix_startend\fP to specify its length.
 .
 .
@ -1107,6 +1109,7 @@ pattern.
      callout_data=<n>           set a value to pass via callouts
      callout_error=<n>[:<m>]    control callout error
      callout_fail=<n>[:<m>]     control callout failure
+      callout_no_where           do not show position of a callout
      callout_none               do not supply a callout function
      copy=<number or name>      copy captured substring
      depth_limit=<n>            set a depth limit
@ -1200,26 +1203,13 @@ does no capturing); it is ignored, with a warning message, if present.
 .rs
 .sp
 A callout function is supplied when \fBpcre2test\fP calls the library matching
-functions, unless \fBcallout_none\fP is specified. If \fBcallout_capture\fP is
-set, the current captured groups are output when a callout occurs. The default
-return from the callout function is zero, which allows matching to continue.
-.P
-The \fBcallout_fail\fP modifier can be given one or two numbers. If there is
-only one number, 1 is returned instead of 0 (causing matching to backtrack)
-when a callout of that number is reached. If two numbers (<n>:<m>) are given, 1
-is returned when callout <n> is reached and there have been at least <m>
-callouts. The \fBcallout_error\fP modifier is similar, except that
-PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
-aborted. If both these modifiers are set for the same callout number,
-\fBcallout_error\fP takes precedence.
-.P
-Note that callouts with string arguments are always given the number zero. See
-"Callouts" below for a description of the output when a callout it taken.
-.P
-The \fBcallout_data\fP modifier can be given an unsigned or a negative number.
-This is set as the "user data" that is passed to the matching function, and
-passed back when the callout function is invoked. Any value other than zero is
-used as a return from \fBpcre2test\fP's callout function.
+functions, unless \fBcallout_none\fP is specified. Its behaviour can be
+controlled by various modifiers listed above whose names begin with
+\fBcallout_\fP. Details are given in the section entitled "Callouts"
+.\" HTML <a href="#callouts">
+.\" </a>
+below.
+.\"
 .
 .
 .SS "Finding all matches in a string"
@ -1344,7 +1334,7 @@ that is used by the just-in-time optimization code. It is ignored if JIT
 optimization is not being used. The value is a number of kilobytes. Setting
 zero reverts to the default of 32K. Providing a stack that is larger than the
 default is necessary only for very complicated patterns. If \fBjitstack\fP is
-set non-zero on a subject line it overrides any value that was set on the 
+set non-zero on a subject line it overrides any value that was set on the
 pattern.
 .
 .
@ -1372,7 +1362,7 @@ The \fImatch_limit\fP number is a measure of the amount of backtracking
 that takes place, and learning the minimum value can be instructive. For most
 simple matches, the number is quite small, but for patterns with very large
 numbers of matching possibilities, it can become large very quickly with
-increasing length of subject string. 
+increasing length of subject string.
 .P
 For non-DFA matching, the minimum \fIdepth_limit\fP number is a measure of how
 much nested backtracking happens (that is, how deeply the pattern's tree is
@ -1625,6 +1615,7 @@ For further information about partial matching, see the
 documentation.
 .
 .
+.\" HTML <a name="callouts"></a>
 .SH CALLOUTS
 .rs
 .sp
@ -1633,8 +1624,30 @@ function is called during matching unless \fBcallout_none\fP is specified.
 This works with both matching functions.
 .P
 The callout function in \fBpcre2test\fP returns zero (carry on matching) by
-default, but you can use a \fBcallout_fail\fP modifier in a subject line (as
-described above) to change this and other parameters of the callout.
+default, but you can use a \fBcallout_fail\fP modifier in a subject line to
+change this and other parameters of the callout.
+.P
+If \fBcallout_capture\fP is set, the current captured groups are output when a
+callout occurs. By default, the callout function then generates output that
+indicates where the current match start and matching points are in the subject,
+and what the next pattern item is. This output is suppressed if the
+\fBcallout_no_where\fP modifier is set.
+.P
+The default return from the callout function is zero, which allows matching to
+continue. The \fBcallout_fail\fP modifier can be given one or two numbers. If
+there is only one number, 1 is returned instead of 0 (causing matching to
+backtrack) when a callout of that number is reached. If two numbers (<n>:<m>)
+are given, 1 is returned when callout <n> is reached and there have been at
+least <m> callouts. The \fBcallout_error\fP modifier is similar, except that
+PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
+aborted. If both these modifiers are set for the same callout number,
+\fBcallout_error\fP takes precedence. Note that callouts with string arguments
+are always given the number zero.
+.P
+The \fBcallout_data\fP modifier can be given an unsigned or a negative number.
+This is set as the "user data" that is passed to the matching function, and
+passed back when the callout function is invoked. Any value other than zero is
+used as a return from \fBpcre2test\fP's callout function.
 .P
 Inserting callouts can be helpful when using \fBpcre2test\fP to check
 complicated regular expressions. For further information about callouts, see
@ -1837,6 +1850,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 16 June 2017
+Last updated: 02 July 2017
 Copyright (c) 1997-2017 University of Cambridge.
 .fi
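The CALLOUTS text added in the file above describes how callout_fail and callout_error select the value returned by the pcre2test callout function. The rules can be modelled in a few lines of Python. This is a sketch for reasoning about the modifiers, not code from pcre2test. The names `parse_spec` and `callout_return` are hypothetical, the numeric value of PCRE2_ERROR_CALLOUT is assumed to match pcre2.h (only the name appears in the text), and "at least <m> callouts" is interpreted as a running count of callouts that includes the current one. The callout_data modifier is left out because its interaction with the other two is not described.

```python
# Illustrative model of the documented callout_fail / callout_error rules.
# Assumption: -37 matches pcre2.h; the documentation text gives only the name.
PCRE2_ERROR_CALLOUT = -37


def parse_spec(text):
    """Parse a '<n>[:<m>]' modifier value into (n, m), with m None if absent."""
    n, _, m = text.partition(":")
    return int(n), (int(m) if m else None)


def callout_return(number, count, fail=None, error=None):
    """Return the value the callout function would hand back.

    number: the callout number that has been reached
    count:  callouts so far, including this one (an assumed interpretation)
    fail, error: None, or an (n, m) pair as produced by parse_spec()
    """
    def triggered(spec):
        if spec is None:
            return False
        n, m = spec
        return number == n and (m is None or count >= m)

    if triggered(error):  # callout_error takes precedence over callout_fail
        return PCRE2_ERROR_CALLOUT
    if triggered(fail):
        return 1          # non-zero: causes matching to backtrack
    return 0              # default: matching continues


if __name__ == "__main__":
    spec = parse_spec("3:2")
    print(callout_return(3, 1, fail=spec))  # callout 3 reached, fewer than 2 callouts
    print(callout_return(3, 2, fail=spec))  # callout 3 reached, at least 2 callouts
```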
--- a/doc/pcre2test.txt
+++ b/doc/pcre2test.txt
@ -943,7 +943,7 @@ PATTERN MODIFIERS
       next  line to contain a new pattern (or a command) instead of a subject
       line. This facility is used when saving compiled patterns to a file, as
       described  in  the section entitled "Saving and restoring compiled pat-
-       terns" below. If pushcopy is used instead of push, a copy of  the  com-
+       terns" below.  If pushcopy is used instead of push, a copy of the  com-
       piled  pattern  is  stacked,  leaving the original as current, ready to
       match the following input lines. This provides a  way  of  testing  the
       pcre2_code_copy()  function.   The  push  and  pushcopy   modifiers are
@ -1016,6 +1016,7 @@ SUBJECT MODIFIERS
             callout_data=<n>           set a value to pass via callouts
             callout_error=<n>[:<m>]    control callout error
             callout_fail=<n>[:<m>]     control callout failure
+             callout_no_where           do not show position of a callout
             callout_none               do not supply a callout function
             copy=<number or name>      copy captured substring
             depth_limit=<n>            set a depth limit
@ -1107,29 +1108,9 @@ SUBJECT MODIFIERS
   Testing callouts

       A  callout function is supplied when pcre2test calls the library match-
-       ing functions, unless callout_none is specified. If callout_capture  is
-       set,  the current captured groups are output when a callout occurs. The
-       default return from the callout function is zero, which allows matching
-       to continue.
-
-       The  callout_fail modifier can be given one or two numbers. If there is
-       only one number, 1 is returned instead of 0 (causing matching to  back-
-       track)  when  a  callout  of  that  number  is  reached. If two numbers
-       (<n>:<m>) are given, 1 is returned when  callout  <n>  is  reached  and
-       there  have  been  at least <m> callouts. The callout_error modifier is
-       similar, except  that  PCRE2_ERROR_CALLOUT  is  returned,  causing  the
-       entire  matching process to be aborted. If both these modifiers are set
-       for the same callout number, callout_error takes precedence.
-
-       Note that callouts with string arguments are always  given  the  number
-       zero. See "Callouts" below for a description of the output when a call-
-       out it taken.
-
-       The callout_data modifier can be given an unsigned or a  negative  num-
-       ber.   This  is  set  as the "user data" that is passed to the matching
-       function, and passed back when the callout  function  is  invoked.  Any
-       value  other  than  zero  is  used as a return from pcre2test's callout
-       function.
+       ing functions, unless callout_none is specified. Its behaviour  can  be
+       controlled  by  various  modifiers  listed above whose names begin with
+       callout_. Details are given in the section entitled "Callouts" below.

   Finding all matches in a string

@ -1511,8 +1492,32 @@ CALLOUTS
       works with both matching functions.

       The callout function in pcre2test returns zero (carry on  matching)  by
-       default,  but you can use a callout_fail modifier in a subject line (as
-       described above) to change this and other parameters of the callout.
+       default,  but  you can use a callout_fail modifier in a subject line to
+       change this and other parameters of the callout.
+
+       If callout_capture is set, the current captured groups are output  when
+       a  callout occurs. By default, the callout function then generates out-
+       put that indicates where the current match start  and  matching  points
+       are  in  the subject, and what the next pattern item is. This output is
+       suppressed if the callout_no_where modifier is set.
+
+       The default return from the callout  function  is  zero,  which  allows
+       matching to continue. The callout_fail modifier can be given one or two
+       numbers. If there is only one number, 1 is returned instead of 0 (caus-
+       ing matching to backtrack) when a callout of that number is reached. If
+       two numbers (<n>:<m>) are given, 1 is  returned  when  callout  <n>  is
+       reached  and  there  have been at least <m> callouts. The callout_error
+       modifier is similar, except that PCRE2_ERROR_CALLOUT is returned, caus-
+       ing  the entire matching process to be aborted. If both these modifiers
+       are set for the same callout number,  callout_error  takes  precedence.
+       Note  that  callouts  with string arguments are always given the number
+       zero.
+
+       The callout_data modifier can be given an unsigned or a  negative  num-
+       ber.   This  is  set  as the "user data" that is passed to the matching
+       function, and passed back when the callout  function  is  invoked.  Any
+       value  other  than  zero  is  used as a return from pcre2test's callout
+       function.

       Inserting callouts can be helpful when using pcre2test to check compli-
       cated  regular expressions. For further information about callouts, see
@ -1687,5 +1692,5 @@ AUTHOR

 REVISION

-       Last updated: 16 June 2017
+       Last updated: 02 July 2017
       Copyright (c) 1997-2017 University of Cambridge.
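
The callout return rules described in the pcre2test.txt hunks above can be modelled roughly as follows. This is a minimal Python sketch, not pcre2test's actual C code: the function name callout_return, its parameters, and the symbolic PCRE2_ERROR_CALLOUT placeholder are hypothetical, and checking callout_data before the other modifiers is an assumption that the documentation does not spell out.

```python
# Symbolic stand-in for the library's error code. The real numeric value is
# defined in pcre2.h and is deliberately not reproduced here.
PCRE2_ERROR_CALLOUT = "PCRE2_ERROR_CALLOUT"


def callout_return(number, count, fail=None, error=None, data=0):
    """Hypothetical model of the value pcre2test's callout function returns.

    number -- the number of the callout that was reached
    count  -- how many callouts have happened so far
    fail   -- None, or (n, m) as given by callout_fail=<n>[:<m>]
    error  -- None, or (n, m) as given by callout_error=<n>[:<m>]
    data   -- the value given by callout_data=<n> (0 when not set)
    """
    # Any non-zero callout_data value is used as the return value.
    # Assumption: this is checked before callout_error and callout_fail.
    if data != 0:
        return data
    # callout_error takes precedence over callout_fail for the same number.
    if error is not None and number == error[0] and count >= error[1]:
        return PCRE2_ERROR_CALLOUT
    # callout_fail returns 1, which makes matching backtrack.
    if fail is not None and number == fail[0] and count >= fail[1]:
        return 1
    # Default: zero, which lets matching carry on.
    return 0
```

In this model a single-number modifier such as callout_fail=2 corresponds to fail=(2, 0), so the count condition is always satisfied.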
--- a/maint/GenerateUtt.py
+++ b/maint/GenerateUtt.py
@ -23,6 +23,7 @@
 # Script updated to Python 3 by running it through the 2to3 converter.
 # Added script names for Unicode 7.0.0, 20-June-2014.
 # Added script names for Unicode 8.0.0, 19-June-2015.
+# Added script names for Unicode 10.0.0, 02-July-2017.

 script_names = ['Arabic', 'Armenian', 'Bengali', 'Bopomofo', 'Braille', 'Buginese', 'Buhid', 'Canadian_Aboriginal', \
 'Cherokee', 'Common', 'Coptic', 'Cypriot', 'Cyrillic', 'Deseret', 'Devanagari', 'Ethiopic', 'Georgian', \
@ -51,7 +52,10 @@ script_names = ['Arabic', 'Armenian', 'Bengali', 'Bopomofo', 'Braille', 'Bugines
 'Pau_Cin_Hau', 'Siddham', 'Tirhuta', 'Warang_Citi',
 # New for Unicode 8.0.0
 'Ahom', 'Anatolian_Hieroglyphs', 'Hatran', 'Multani', 'Old_Hungarian',
- 'SignWriting'
+ 'SignWriting',
+# New for Unicode 10.0.0
+ 'Adlam', 'Bhaiksuki', 'Marchen', 'Newa', 'Osage', 'Tangut', 'Masaram_Gondi',
+ 'Nushu', 'Soyombo', 'Zanabazar_Square'
 ]

 category_names = ['Cc', 'Cf', 'Cn', 'Co', 'Cs', 'Ll', 'Lm', 'Lo', 'Lt', 'Lu',
--- a/maint/MultiStage2.py
+++ b/maint/MultiStage2.py
@ -122,6 +122,7 @@
 # 20-June-2014:      Updated for Unicode 7.0.0
 # 12-August-2014:    Updated to put Unicode version into the file
 # 19-June-2015:      Updated for Unicode 8.0.0
+# 02-July-2017:      Updated for Unicode 10.0.0
 ##############################################################################


@ -335,7 +336,10 @@ script_names = ['Arabic', 'Armenian', 'Bengali', 'Bopomofo', 'Braille', 'Bugines
 'Pau_Cin_Hau', 'Siddham', 'Tirhuta', 'Warang_Citi',
 # New for Unicode 8.0.0
 'Ahom', 'Anatolian_Hieroglyphs', 'Hatran', 'Multani', 'Old_Hungarian',
- 'SignWriting'
+ 'SignWriting',
+# New for Unicode 10.0.0
+ 'Adlam', 'Bhaiksuki', 'Marchen', 'Newa', 'Osage', 'Tangut', 'Masaram_Gondi',
+ 'Nushu', 'Soyombo', 'Zanabazar_Square'
 ]
 
 category_names = ['Cc', 'Cf', 'Cn', 'Co', 'Cs', 'Ll', 'Lm', 'Lo', 'Lt', 'Lu',
@ -343,7 +347,8 @@ category_names = ['Cc', 'Cf', 'Cn', 'Co', 'Cs', 'Ll', 'Lm', 'Lo', 'Lt', 'Lu',
  'Sc', 'Sk', 'Sm', 'So', 'Zl', 'Zp', 'Zs' ]

 break_property_names = ['CR', 'LF', 'Control', 'Extend', 'Prepend',
-  'SpacingMark', 'L', 'V', 'T', 'LV', 'LVT', 'Regional_Indicator', 'Other' ]
+  'SpacingMark', 'L', 'V', 'T', 'LV', 'LVT', 'Regional_Indicator', 'Other',
+  'E_Base', 'E_Modifier', 'E_Base_GAZ', 'ZWJ', 'Glue_After_Zwj' ]

 test_record_size()
 unicode_version = ""
--- a/maint/Unicode.tables/CaseFolding.txt
+++ b/maint/Unicode.tables/CaseFolding.txt
@ -1,10 +1,11 @@
-# CaseFolding-8.0.0.txt
-# Date: 2015-01-13, 18:16:36 GMT [MD]
+# CaseFolding-10.0.0.txt
+# Date: 2017-04-14, 05:40:18 GMT
+# © 2017 Unicode®, Inc.
+# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
+# For terms of use, see http://www.unicode.org/terms_of_use.html
 #
 # Unicode Character Database
-# Copyright (c) 1991-2015 Unicode, Inc.
-# For terms of use, see http://www.unicode.org/terms_of_use.html
-# For documentation, see http://www.unicode.org/reports/tr44/
+#   For documentation, see http://www.unicode.org/reports/tr44/
 #
 # Case Folding Properties
 #
@ -23,7 +24,7 @@
 #
 # NOTE: case folding does not preserve normalization formats!
 #
-# For information on case folding, including how to have case folding 
+# For information on case folding, including how to have case folding
 # preserve normalization formats, see Section 3.13 Default Case Algorithms in
 # The Unicode Standard.
 #
@ -593,6 +594,15 @@
 13FB; C; 13F3; # CHEROKEE SMALL LETTER YU
 13FC; C; 13F4; # CHEROKEE SMALL LETTER YV
 13FD; C; 13F5; # CHEROKEE SMALL LETTER MV
+1C80; C; 0432; # CYRILLIC SMALL LETTER ROUNDED VE
+1C81; C; 0434; # CYRILLIC SMALL LETTER LONG-LEGGED DE
+1C82; C; 043E; # CYRILLIC SMALL LETTER NARROW O
+1C83; C; 0441; # CYRILLIC SMALL LETTER WIDE ES
+1C84; C; 0442; # CYRILLIC SMALL LETTER TALL TE
+1C85; C; 0442; # CYRILLIC SMALL LETTER THREE-LEGGED TE
+1C86; C; 044A; # CYRILLIC SMALL LETTER TALL HARD SIGN
+1C87; C; 0463; # CYRILLIC SMALL LETTER TALL YAT
+1C88; C; A64B; # CYRILLIC SMALL LETTER UNBLENDED UK
 1E00; C; 1E01; # LATIN CAPITAL LETTER A WITH RING BELOW
 1E02; C; 1E03; # LATIN CAPITAL LETTER B WITH DOT ABOVE
 1E04; C; 1E05; # LATIN CAPITAL LETTER B WITH DOT BELOW
@ -1163,6 +1173,7 @@ A7AA; C; 0266; # LATIN CAPITAL LETTER H WITH HOOK
 A7AB; C; 025C; # LATIN CAPITAL LETTER REVERSED OPEN E
 A7AC; C; 0261; # LATIN CAPITAL LETTER SCRIPT G
 A7AD; C; 026C; # LATIN CAPITAL LETTER L WITH BELT
+A7AE; C; 026A; # LATIN CAPITAL LETTER SMALL CAPITAL I
 A7B0; C; 029E; # LATIN CAPITAL LETTER TURNED K
 A7B1; C; 0287; # LATIN CAPITAL LETTER TURNED T
 A7B2; C; 029D; # LATIN CAPITAL LETTER J WITH CROSSED-TAIL
@ -1327,6 +1338,42 @@ FF3A; C; FF5A; # FULLWIDTH LATIN CAPITAL LETTER Z
 10425; C; 1044D; # DESERET CAPITAL LETTER ENG
 10426; C; 1044E; # DESERET CAPITAL LETTER OI
 10427; C; 1044F; # DESERET CAPITAL LETTER EW
+104B0; C; 104D8; # OSAGE CAPITAL LETTER A
+104B1; C; 104D9; # OSAGE CAPITAL LETTER AI
+104B2; C; 104DA; # OSAGE CAPITAL LETTER AIN
+104B3; C; 104DB; # OSAGE CAPITAL LETTER AH
+104B4; C; 104DC; # OSAGE CAPITAL LETTER BRA
+104B5; C; 104DD; # OSAGE CAPITAL LETTER CHA
+104B6; C; 104DE; # OSAGE CAPITAL LETTER EHCHA
+104B7; C; 104DF; # OSAGE CAPITAL LETTER E
+104B8; C; 104E0; # OSAGE CAPITAL LETTER EIN
+104B9; C; 104E1; # OSAGE CAPITAL LETTER HA
+104BA; C; 104E2; # OSAGE CAPITAL LETTER HYA
+104BB; C; 104E3; # OSAGE CAPITAL LETTER I
+104BC; C; 104E4; # OSAGE CAPITAL LETTER KA
+104BD; C; 104E5; # OSAGE CAPITAL LETTER EHKA
+104BE; C; 104E6; # OSAGE CAPITAL LETTER KYA
+104BF; C; 104E7; # OSAGE CAPITAL LETTER LA
+104C0; C; 104E8; # OSAGE CAPITAL LETTER MA
+104C1; C; 104E9; # OSAGE CAPITAL LETTER NA
+104C2; C; 104EA; # OSAGE CAPITAL LETTER O
+104C3; C; 104EB; # OSAGE CAPITAL LETTER OIN
+104C4; C; 104EC; # OSAGE CAPITAL LETTER PA
+104C5; C; 104ED; # OSAGE CAPITAL LETTER EHPA
+104C6; C; 104EE; # OSAGE CAPITAL LETTER SA
+104C7; C; 104EF; # OSAGE CAPITAL LETTER SHA
+104C8; C; 104F0; # OSAGE CAPITAL LETTER TA
+104C9; C; 104F1; # OSAGE CAPITAL LETTER EHTA
+104CA; C; 104F2; # OSAGE CAPITAL LETTER TSA
+104CB; C; 104F3; # OSAGE CAPITAL LETTER EHTSA
+104CC; C; 104F4; # OSAGE CAPITAL LETTER TSHA
+104CD; C; 104F5; # OSAGE CAPITAL LETTER DHA
+104CE; C; 104F6; # OSAGE CAPITAL LETTER U
+104CF; C; 104F7; # OSAGE CAPITAL LETTER WA
+104D0; C; 104F8; # OSAGE CAPITAL LETTER KHA
+104D1; C; 104F9; # OSAGE CAPITAL LETTER GHA
+104D2; C; 104FA; # OSAGE CAPITAL LETTER ZA
+104D3; C; 104FB; # OSAGE CAPITAL LETTER ZHA
 10C80; C; 10CC0; # OLD HUNGARIAN CAPITAL LETTER A
 10C81; C; 10CC1; # OLD HUNGARIAN CAPITAL LETTER AA
 10C82; C; 10CC2; # OLD HUNGARIAN CAPITAL LETTER EB
@ -1410,5 +1457,39 @@ FF3A; C; FF5A; # FULLWIDTH LATIN CAPITAL LETTER Z
 118BD; C; 118DD; # WARANG CITI CAPITAL LETTER SSUU
 118BE; C; 118DE; # WARANG CITI CAPITAL LETTER SII
 118BF; C; 118DF; # WARANG CITI CAPITAL LETTER VIYO
+1E900; C; 1E922; # ADLAM CAPITAL LETTER ALIF
+1E901; C; 1E923; # ADLAM CAPITAL LETTER DAALI
+1E902; C; 1E924; # ADLAM CAPITAL LETTER LAAM
+1E903; C; 1E925; # ADLAM CAPITAL LETTER MIIM
+1E904; C; 1E926; # ADLAM CAPITAL LETTER BA
+1E905; C; 1E927; # ADLAM CAPITAL LETTER SINNYIIYHE
+1E906; C; 1E928; # ADLAM CAPITAL LETTER PE
+1E907; C; 1E929; # ADLAM CAPITAL LETTER BHE
+1E908; C; 1E92A; # ADLAM CAPITAL LETTER RA
+1E909; C; 1E92B; # ADLAM CAPITAL LETTER E
+1E90A; C; 1E92C; # ADLAM CAPITAL LETTER FA
+1E90B; C; 1E92D; # ADLAM CAPITAL LETTER I
+1E90C; C; 1E92E; # ADLAM CAPITAL LETTER O
+1E90D; C; 1E92F; # ADLAM CAPITAL LETTER DHA
+1E90E; C; 1E930; # ADLAM CAPITAL LETTER YHE
+1E90F; C; 1E931; # ADLAM CAPITAL LETTER WAW
+1E910; C; 1E932; # ADLAM CAPITAL LETTER NUN
+1E911; C; 1E933; # ADLAM CAPITAL LETTER KAF
+1E912; C; 1E934; # ADLAM CAPITAL LETTER YA
+1E913; C; 1E935; # ADLAM CAPITAL LETTER U
+1E914; C; 1E936; # ADLAM CAPITAL LETTER JIIM
+1E915; C; 1E937; # ADLAM CAPITAL LETTER CHI
+1E916; C; 1E938; # ADLAM CAPITAL LETTER HA
+1E917; C; 1E939; # ADLAM CAPITAL LETTER QAAF
+1E918; C; 1E93A; # ADLAM CAPITAL LETTER GA
+1E919; C; 1E93B; # ADLAM CAPITAL LETTER NYA
+1E91A; C; 1E93C; # ADLAM CAPITAL LETTER TU
+1E91B; C; 1E93D; # ADLAM CAPITAL LETTER NHA
+1E91C; C; 1E93E; # ADLAM CAPITAL LETTER VA
+1E91D; C; 1E93F; # ADLAM CAPITAL LETTER KHA
+1E91E; C; 1E940; # ADLAM CAPITAL LETTER GBE
+1E91F; C; 1E941; # ADLAM CAPITAL LETTER ZAL
+1E920; C; 1E942; # ADLAM CAPITAL LETTER KPO
+1E921; C; 1E943; # ADLAM CAPITAL LETTER SHA
 #
 # EOF
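
The new CaseFolding.txt entries above all follow the "code; status; mapping; # name" layout. As a rough illustration of how such lines can be read, here is a minimal Python sketch; the helper name parse_simple_folds is hypothetical and this is not the code used by the maint scripts. It keeps only the single-code-point mappings (status C, plus S, which the full file also uses for simple folding) and ignores comment lines.

```python
def parse_simple_folds(lines):
    """Return {code point: folded code point} for status C and S entries."""
    folds = {}
    for line in lines:
        # Drop the trailing "# NAME" comment and skip comment-only lines.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        code, status, mapping = fields[0], fields[1], fields[2]
        if status in ("C", "S"):
            folds[int(code, 16)] = int(mapping, 16)
    return folds


# A few of the lines added by this commit, copied from the hunks above.
sample = [
    "# CaseFolding-10.0.0.txt",
    "1C80; C; 0432; # CYRILLIC SMALL LETTER ROUNDED VE",
    "1C84; C; 0442; # CYRILLIC SMALL LETTER TALL TE",
    "1C85; C; 0442; # CYRILLIC SMALL LETTER THREE-LEGGED TE",
    "104B0; C; 104D8; # OSAGE CAPITAL LETTER A",
    "1E900; C; 1E922; # ADLAM CAPITAL LETTER ALIF",
]
folds = parse_simple_folds(sample)
```

Note that 1C84 and 1C85 fold to the same code point, so the mapping is many-to-one and cannot simply be inverted to recover the other case.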
--- a/maint/Unicode.tables/DerivedGeneralCategory.txt
+++ b/maint/Unicode.tables/DerivedGeneralCategory.txt
@ -1,10 +1,11 @@
-# DerivedGeneralCategory-8.0.0.txt
-# Date: 2015-02-13, 13:47:11 GMT [MD]
+# DerivedGeneralCategory-10.0.0.txt
+# Date: 2017-03-08, 08:41:49 GMT
+# © 2017 Unicode®, Inc.
+# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
+# For terms of use, see http://www.unicode.org/terms_of_use.html
 #
 # Unicode Character Database
-# Copyright (c) 1991-2015 Unicode, Inc.
-# For terms of use, see http://www.unicode.org/terms_of_use.html
-# For documentation, see http://www.unicode.org/reports/tr44/
+#   For documentation, see http://www.unicode.org/reports/tr44/

 # ================================================

@ -36,8 +37,10 @@
 082E..082F    ; Cn #   [2] <reserved-082E>..<reserved-082F>
 083F          ; Cn #       <reserved-083F>
 085C..085D    ; Cn #   [2] <reserved-085C>..<reserved-085D>
-085F..089F    ; Cn #  [65] <reserved-085F>..<reserved-089F>
-08B5..08E2    ; Cn #  [46] <reserved-08B5>..<reserved-08E2>
+085F          ; Cn #       <reserved-085F>
+086B..089F    ; Cn #  [53] <reserved-086B>..<reserved-089F>
+08B5          ; Cn #       <reserved-08B5>
+08BE..08D3    ; Cn #  [22] <reserved-08BE>..<reserved-08D3>
 0984          ; Cn #       <reserved-0984>
 098D..098E    ; Cn #   [2] <reserved-098D>..<reserved-098E>
 0991..0992    ; Cn #   [2] <reserved-0991>..<reserved-0992>
@ -51,7 +54,7 @@
 09D8..09DB    ; Cn #   [4] <reserved-09D8>..<reserved-09DB>
 09DE          ; Cn #       <reserved-09DE>
 09E4..09E5    ; Cn #   [2] <reserved-09E4>..<reserved-09E5>
-09FC..0A00    ; Cn #   [5] <reserved-09FC>..<reserved-0A00>
+09FE..0A00    ; Cn #   [3] <reserved-09FE>..<reserved-0A00>
 0A04          ; Cn #       <reserved-0A04>
 0A0B..0A0E    ; Cn #   [4] <reserved-0A0B>..<reserved-0A0E>
 0A11..0A12    ; Cn #   [2] <reserved-0A11>..<reserved-0A12>
@ -81,7 +84,7 @@
 0AD1..0ADF    ; Cn #  [15] <reserved-0AD1>..<reserved-0ADF>
 0AE4..0AE5    ; Cn #   [2] <reserved-0AE4>..<reserved-0AE5>
 0AF2..0AF8    ; Cn #   [7] <reserved-0AF2>..<reserved-0AF8>
-0AFA..0B00    ; Cn #   [7] <reserved-0AFA>..<reserved-0B00>
+0B00          ; Cn #       <reserved-0B00>
 0B04          ; Cn #       <reserved-0B04>
 0B0D..0B0E    ; Cn #   [2] <reserved-0B0D>..<reserved-0B0E>
 0B11..0B12    ; Cn #   [2] <reserved-0B11>..<reserved-0B12>
@ -124,7 +127,6 @@
 0C5B..0C5F    ; Cn #   [5] <reserved-0C5B>..<reserved-0C5F>
 0C64..0C65    ; Cn #   [2] <reserved-0C64>..<reserved-0C65>
 0C70..0C77    ; Cn #   [8] <reserved-0C70>..<reserved-0C77>
-0C80          ; Cn #       <reserved-0C80>
 0C84          ; Cn #       <reserved-0C84>
 0C8D          ; Cn #       <reserved-0C8D>
 0C91          ; Cn #       <reserved-0C91>
@ -138,17 +140,14 @@
 0CDF          ; Cn #       <reserved-0CDF>
 0CE4..0CE5    ; Cn #   [2] <reserved-0CE4>..<reserved-0CE5>
 0CF0          ; Cn #       <reserved-0CF0>
-0CF3..0D00    ; Cn #  [14] <reserved-0CF3>..<reserved-0D00>
+0CF3..0CFF    ; Cn #  [13] <reserved-0CF3>..<reserved-0CFF>
 0D04          ; Cn #       <reserved-0D04>
 0D0D          ; Cn #       <reserved-0D0D>
 0D11          ; Cn #       <reserved-0D11>
-0D3B..0D3C    ; Cn #   [2] <reserved-0D3B>..<reserved-0D3C>
 0D45          ; Cn #       <reserved-0D45>
 0D49          ; Cn #       <reserved-0D49>
-0D4F..0D56    ; Cn #   [8] <reserved-0D4F>..<reserved-0D56>
-0D58..0D5E    ; Cn #   [7] <reserved-0D58>..<reserved-0D5E>
+0D50..0D53    ; Cn #   [4] <reserved-0D50>..<reserved-0D53>
 0D64..0D65    ; Cn #   [2] <reserved-0D64>..<reserved-0D65>
-0D76..0D78    ; Cn #   [3] <reserved-0D76>..<reserved-0D78>
 0D80..0D81    ; Cn #   [2] <reserved-0D80>..<reserved-0D81>
 0D84          ; Cn #       <reserved-0D84>
 0D97..0D99    ; Cn #   [3] <reserved-0D97>..<reserved-0D99>
@ -249,11 +248,10 @@
 1BF4..1BFB    ; Cn #   [8] <reserved-1BF4>..<reserved-1BFB>
 1C38..1C3A    ; Cn #   [3] <reserved-1C38>..<reserved-1C3A>
 1C4A..1C4C    ; Cn #   [3] <reserved-1C4A>..<reserved-1C4C>
-1C80..1CBF    ; Cn #  [64] <reserved-1C80>..<reserved-1CBF>
+1C89..1CBF    ; Cn #  [55] <reserved-1C89>..<reserved-1CBF>
 1CC8..1CCF    ; Cn #   [8] <reserved-1CC8>..<reserved-1CCF>
-1CF7          ; Cn #       <reserved-1CF7>
 1CFA..1CFF    ; Cn #   [6] <reserved-1CFA>..<reserved-1CFF>
-1DF6..1DFB    ; Cn #   [6] <reserved-1DF6>..<reserved-1DFB>
+1DFA          ; Cn #       <reserved-1DFA>
 1F16..1F17    ; Cn #   [2] <reserved-1F16>..<reserved-1F17>
 1F1E..1F1F    ; Cn #   [2] <reserved-1F1E>..<reserved-1F1F>
 1F46..1F47    ; Cn #   [2] <reserved-1F46>..<reserved-1F47>
@ -274,17 +272,16 @@
 2072..2073    ; Cn #   [2] <reserved-2072>..<reserved-2073>
 208F          ; Cn #       <reserved-208F>
 209D..209F    ; Cn #   [3] <reserved-209D>..<reserved-209F>
-20BF..20CF    ; Cn #  [17] <reserved-20BF>..<reserved-20CF>
+20C0..20CF    ; Cn #  [16] <reserved-20C0>..<reserved-20CF>
 20F1..20FF    ; Cn #  [15] <reserved-20F1>..<reserved-20FF>
 218C..218F    ; Cn #   [4] <reserved-218C>..<reserved-218F>
-23FB..23FF    ; Cn #   [5] <reserved-23FB>..<reserved-23FF>
 2427..243F    ; Cn #  [25] <reserved-2427>..<reserved-243F>
 244B..245F    ; Cn #  [21] <reserved-244B>..<reserved-245F>
 2B74..2B75    ; Cn #   [2] <reserved-2B74>..<reserved-2B75>
 2B96..2B97    ; Cn #   [2] <reserved-2B96>..<reserved-2B97>
 2BBA..2BBC    ; Cn #   [3] <reserved-2BBA>..<reserved-2BBC>
 2BC9          ; Cn #       <reserved-2BC9>
-2BD2..2BEB    ; Cn #  [26] <reserved-2BD2>..<reserved-2BEB>
+2BD3..2BEB    ; Cn #  [25] <reserved-2BD3>..<reserved-2BEB>
 2BF0..2BFF    ; Cn #  [16] <reserved-2BF0>..<reserved-2BFF>
 2C2F          ; Cn #       <reserved-2C2F>
 2C5F          ; Cn #       <reserved-2C5F>
@ -303,7 +300,7 @@
 2DCF          ; Cn #       <reserved-2DCF>
 2DD7          ; Cn #       <reserved-2DD7>
 2DDF          ; Cn #       <reserved-2DDF>
-2E43..2E7F    ; Cn #  [61] <reserved-2E43>..<reserved-2E7F>
+2E4A..2E7F    ; Cn #  [54] <reserved-2E4A>..<reserved-2E7F>
 2E9A          ; Cn #       <reserved-2E9A>
 2EF4..2EFF    ; Cn #  [12] <reserved-2EF4>..<reserved-2EFF>
 2FD6..2FEF    ; Cn #  [26] <reserved-2FD6>..<reserved-2FEF>
@ -311,24 +308,24 @@
 3040          ; Cn #       <reserved-3040>
 3097..3098    ; Cn #   [2] <reserved-3097>..<reserved-3098>
 3100..3104    ; Cn #   [5] <reserved-3100>..<reserved-3104>
-312E..3130    ; Cn #   [3] <reserved-312E>..<reserved-3130>
+312F..3130    ; Cn #   [2] <reserved-312F>..<reserved-3130>
 318F          ; Cn #       <reserved-318F>
 31BB..31BF    ; Cn #   [5] <reserved-31BB>..<reserved-31BF>
 31E4..31EF    ; Cn #  [12] <reserved-31E4>..<reserved-31EF>
 321F          ; Cn #       <reserved-321F>
 32FF          ; Cn #       <reserved-32FF>
 4DB6..4DBF    ; Cn #  [10] <reserved-4DB6>..<reserved-4DBF>
-9FD6..9FFF    ; Cn #  [42] <reserved-9FD6>..<reserved-9FFF>
+9FEB..9FFF    ; Cn #  [21] <reserved-9FEB>..<reserved-9FFF>
 A48D..A48F    ; Cn #   [3] <reserved-A48D>..<reserved-A48F>
 A4C7..A4CF    ; Cn #   [9] <reserved-A4C7>..<reserved-A4CF>
 A62C..A63F    ; Cn #  [20] <reserved-A62C>..<reserved-A63F>
 A6F8..A6FF    ; Cn #   [8] <reserved-A6F8>..<reserved-A6FF>
-A7AE..A7AF    ; Cn #   [2] <reserved-A7AE>..<reserved-A7AF>
+A7AF          ; Cn #       <reserved-A7AF>
 A7B8..A7F6    ; Cn #  [63] <reserved-A7B8>..<reserved-A7F6>
 A82C..A82F    ; Cn #   [4] <reserved-A82C>..<reserved-A82F>
 A83A..A83F    ; Cn #   [6] <reserved-A83A>..<reserved-A83F>
 A878..A87F    ; Cn #   [8] <reserved-A878>..<reserved-A87F>
-A8C5..A8CD    ; Cn #   [9] <reserved-A8C5>..<reserved-A8CD>
+A8C6..A8CD    ; Cn #   [8] <reserved-A8C6>..<reserved-A8CD>
 A8DA..A8DF    ; Cn #   [6] <reserved-A8DA>..<reserved-A8DF>
 A8FE..A8FF    ; Cn #   [2] <reserved-A8FE>..<reserved-A8FF>
 A954..A95E    ; Cn #  [11] <reserved-A954>..<reserved-A95E>
@ -390,21 +387,23 @@ FFFE..FFFF    ; Cn #   [2] <noncharacter-FFFE>..<noncharacter-FFFF>
 100FB..100FF  ; Cn #   [5] <reserved-100FB>..<reserved-100FF>
 10103..10106  ; Cn #   [4] <reserved-10103>..<reserved-10106>
 10134..10136  ; Cn #   [3] <reserved-10134>..<reserved-10136>
-1018D..1018F  ; Cn #   [3] <reserved-1018D>..<reserved-1018F>
+1018F         ; Cn #       <reserved-1018F>
 1019C..1019F  ; Cn #   [4] <reserved-1019C>..<reserved-1019F>
 101A1..101CF  ; Cn #  [47] <reserved-101A1>..<reserved-101CF>
 101FE..1027F  ; Cn # [130] <reserved-101FE>..<reserved-1027F>
 1029D..1029F  ; Cn #   [3] <reserved-1029D>..<reserved-1029F>
 102D1..102DF  ; Cn #  [15] <reserved-102D1>..<reserved-102DF>
 102FC..102FF  ; Cn #   [4] <reserved-102FC>..<reserved-102FF>
-10324..1032F  ; Cn #  [12] <reserved-10324>..<reserved-1032F>
+10324..1032C  ; Cn #   [9] <reserved-10324>..<reserved-1032C>
 1034B..1034F  ; Cn #   [5] <reserved-1034B>..<reserved-1034F>
 1037B..1037F  ; Cn #   [5] <reserved-1037B>..<reserved-1037F>
 1039E         ; Cn #       <reserved-1039E>
 103C4..103C7  ; Cn #   [4] <reserved-103C4>..<reserved-103C7>
 103D6..103FF  ; Cn #  [42] <reserved-103D6>..<reserved-103FF>
 1049E..1049F  ; Cn #   [2] <reserved-1049E>..<reserved-1049F>
-104AA..104FF  ; Cn #  [86] <reserved-104AA>..<reserved-104FF>
+104AA..104AF  ; Cn #   [6] <reserved-104AA>..<reserved-104AF>
+104D4..104D7  ; Cn #   [4] <reserved-104D4>..<reserved-104D7>
+104FC..104FF  ; Cn #   [4] <reserved-104FC>..<reserved-104FF>
 10528..1052F  ; Cn #   [8] <reserved-10528>..<reserved-1052F>
 10564..1056E  ; Cn #  [11] <reserved-10564>..<reserved-1056E>
 10570..105FF  ; Cn # [144] <reserved-10570>..<reserved-105FF>
@ -460,7 +459,7 @@ FFFE..FFFF    ; Cn #   [2] <noncharacter-FFFE>..<noncharacter-FFFF>
 111E0         ; Cn #       <reserved-111E0>
 111F5..111FF  ; Cn #  [11] <reserved-111F5>..<reserved-111FF>
 11212         ; Cn #       <reserved-11212>
-1123E..1127F  ; Cn #  [66] <reserved-1123E>..<reserved-1127F>
+1123F..1127F  ; Cn #  [65] <reserved-1123F>..<reserved-1127F>
 11287         ; Cn #       <reserved-11287>
 11289         ; Cn #       <reserved-11289>
 1128E         ; Cn #       <reserved-1128E>
@ -482,21 +481,43 @@ FFFE..FFFF    ; Cn #   [2] <noncharacter-FFFE>..<noncharacter-FFFF>
 11358..1135C  ; Cn #   [5] <reserved-11358>..<reserved-1135C>
 11364..11365  ; Cn #   [2] <reserved-11364>..<reserved-11365>
 1136D..1136F  ; Cn #   [3] <reserved-1136D>..<reserved-1136F>
-11375..1147F  ; Cn # [267] <reserved-11375>..<reserved-1147F>
+11375..113FF  ; Cn # [139] <reserved-11375>..<reserved-113FF>
+1145A         ; Cn #       <reserved-1145A>
+1145C         ; Cn #       <reserved-1145C>
+1145E..1147F  ; Cn #  [34] <reserved-1145E>..<reserved-1147F>
 114C8..114CF  ; Cn #   [8] <reserved-114C8>..<reserved-114CF>
 114DA..1157F  ; Cn # [166] <reserved-114DA>..<reserved-1157F>
 115B6..115B7  ; Cn #   [2] <reserved-115B6>..<reserved-115B7>
 115DE..115FF  ; Cn #  [34] <reserved-115DE>..<reserved-115FF>
 11645..1164F  ; Cn #  [11] <reserved-11645>..<reserved-1164F>
-1165A..1167F  ; Cn #  [38] <reserved-1165A>..<reserved-1167F>
+1165A..1165F  ; Cn #   [6] <reserved-1165A>..<reserved-1165F>
+1166D..1167F  ; Cn #  [19] <reserved-1166D>..<reserved-1167F>
 116B8..116BF  ; Cn #   [8] <reserved-116B8>..<reserved-116BF>
 116CA..116FF  ; Cn #  [54] <reserved-116CA>..<reserved-116FF>
 1171A..1171C  ; Cn #   [3] <reserved-1171A>..<reserved-1171C>
 1172C..1172F  ; Cn #   [4] <reserved-1172C>..<reserved-1172F>
 11740..1189F  ; Cn # [352] <reserved-11740>..<reserved-1189F>
 118F3..118FE  ; Cn #  [12] <reserved-118F3>..<reserved-118FE>
-11900..11ABF  ; Cn # [448] <reserved-11900>..<reserved-11ABF>
-11AF9..11FFF  ; Cn # [1287] <reserved-11AF9>..<reserved-11FFF>
+11900..119FF  ; Cn # [256] <reserved-11900>..<reserved-119FF>
+11A48..11A4F  ; Cn #   [8] <reserved-11A48>..<reserved-11A4F>
+11A84..11A85  ; Cn #   [2] <reserved-11A84>..<reserved-11A85>
+11A9D         ; Cn #       <reserved-11A9D>
+11AA3..11ABF  ; Cn #  [29] <reserved-11AA3>..<reserved-11ABF>
+11AF9..11BFF  ; Cn # [263] <reserved-11AF9>..<reserved-11BFF>
+11C09         ; Cn #       <reserved-11C09>
+11C37         ; Cn #       <reserved-11C37>
+11C46..11C4F  ; Cn #  [10] <reserved-11C46>..<reserved-11C4F>
+11C6D..11C6F  ; Cn #   [3] <reserved-11C6D>..<reserved-11C6F>
+11C90..11C91  ; Cn #   [2] <reserved-11C90>..<reserved-11C91>
+11CA8         ; Cn #       <reserved-11CA8>
+11CB7..11CFF  ; Cn #  [73] <reserved-11CB7>..<reserved-11CFF>
+11D07         ; Cn #       <reserved-11D07>
+11D0A         ; Cn #       <reserved-11D0A>
+11D37..11D39  ; Cn #   [3] <reserved-11D37>..<reserved-11D39>
+11D3B         ; Cn #       <reserved-11D3B>
+11D3E         ; Cn #       <reserved-11D3E>
+11D48..11D4F  ; Cn #   [8] <reserved-11D48>..<reserved-11D4F>
+11D5A..11FFF  ; Cn # [678] <reserved-11D5A>..<reserved-11FFF>
 1239A..123FF  ; Cn # [102] <reserved-1239A>..<reserved-123FF>
 1246F         ; Cn #       <reserved-1246F>
 12475..1247F  ; Cn #  [11] <reserved-12475>..<reserved-1247F>
@ -516,8 +537,12 @@ FFFE..FFFF    ; Cn #   [2] <noncharacter-FFFE>..<noncharacter-FFFF>
 16B90..16EFF  ; Cn # [880] <reserved-16B90>..<reserved-16EFF>
 16F45..16F4F  ; Cn #  [11] <reserved-16F45>..<reserved-16F4F>
 16F7F..16F8E  ; Cn #  [16] <reserved-16F7F>..<reserved-16F8E>
-16FA0..1AFFF  ; Cn # [16480] <reserved-16FA0>..<reserved-1AFFF>
-1B002..1BBFF  ; Cn # [3070] <reserved-1B002>..<reserved-1BBFF>
+16FA0..16FDF  ; Cn #  [64] <reserved-16FA0>..<reserved-16FDF>
+16FE2..16FFF  ; Cn #  [30] <reserved-16FE2>..<reserved-16FFF>
+187ED..187FF  ; Cn #  [19] <reserved-187ED>..<reserved-187FF>
+18AF3..1AFFF  ; Cn # [9485] <reserved-18AF3>..<reserved-1AFFF>
+1B11F..1B16F  ; Cn #  [81] <reserved-1B11F>..<reserved-1B16F>
+1B2FC..1BBFF  ; Cn # [2308] <reserved-1B2FC>..<reserved-1BBFF>
 1BC6B..1BC6F  ; Cn #   [5] <reserved-1BC6B>..<reserved-1BC6F>
 1BC7D..1BC7F  ; Cn #   [3] <reserved-1BC7D>..<reserved-1BC7F>
 1BC89..1BC8F  ; Cn #   [7] <reserved-1BC89>..<reserved-1BC8F>
@ -551,9 +576,17 @@ FFFE..FFFF    ; Cn #   [2] <noncharacter-FFFE>..<noncharacter-FFFF>
 1D7CC..1D7CD  ; Cn #   [2] <reserved-1D7CC>..<reserved-1D7CD>
 1DA8C..1DA9A  ; Cn #  [15] <reserved-1DA8C>..<reserved-1DA9A>
 1DAA0         ; Cn #       <reserved-1DAA0>
-1DAB0..1E7FF  ; Cn # [3408] <reserved-1DAB0>..<reserved-1E7FF>
+1DAB0..1DFFF  ; Cn # [1360] <reserved-1DAB0>..<reserved-1DFFF>
+1E007         ; Cn #       <reserved-1E007>
+1E019..1E01A  ; Cn #   [2] <reserved-1E019>..<reserved-1E01A>
+1E022         ; Cn #       <reserved-1E022>
+1E025         ; Cn #       <reserved-1E025>
+1E02B..1E7FF  ; Cn # [2005] <reserved-1E02B>..<reserved-1E7FF>
 1E8C5..1E8C6  ; Cn #   [2] <reserved-1E8C5>..<reserved-1E8C6>
-1E8D7..1EDFF  ; Cn # [1321] <reserved-1E8D7>..<reserved-1EDFF>
+1E8D7..1E8FF  ; Cn #  [41] <reserved-1E8D7>..<reserved-1E8FF>
+1E94B..1E94F  ; Cn #   [5] <reserved-1E94B>..<reserved-1E94F>
+1E95A..1E95D  ; Cn #   [4] <reserved-1E95A>..<reserved-1E95D>
+1E960..1EDFF  ; Cn # [1184] <reserved-1E960>..<reserved-1EDFF>
 1EE04         ; Cn #       <reserved-1EE04>
 1EE20         ; Cn #       <reserved-1EE20>
 1EE23         ; Cn #       <reserved-1EE23>
@ -597,30 +630,34 @@ FFFE..FFFF    ; Cn #   [2] <noncharacter-FFFE>..<noncharacter-FFFF>
 1F10D..1F10F  ; Cn #   [3] <reserved-1F10D>..<reserved-1F10F>
 1F12F         ; Cn #       <reserved-1F12F>
 1F16C..1F16F  ; Cn #   [4] <reserved-1F16C>..<reserved-1F16F>
-1F19B..1F1E5  ; Cn #  [75] <reserved-1F19B>..<reserved-1F1E5>
+1F1AD..1F1E5  ; Cn #  [57] <reserved-1F1AD>..<reserved-1F1E5>
 1F203..1F20F  ; Cn #  [13] <reserved-1F203>..<reserved-1F20F>
-1F23B..1F23F  ; Cn #   [5] <reserved-1F23B>..<reserved-1F23F>
+1F23C..1F23F  ; Cn #   [4] <reserved-1F23C>..<reserved-1F23F>
 1F249..1F24F  ; Cn #   [7] <reserved-1F249>..<reserved-1F24F>
-1F252..1F2FF  ; Cn # [174] <reserved-1F252>..<reserved-1F2FF>
-1F57A         ; Cn #       <reserved-1F57A>
-1F5A4         ; Cn #       <reserved-1F5A4>
-1F6D1..1F6DF  ; Cn #  [15] <reserved-1F6D1>..<reserved-1F6DF>
+1F252..1F25F  ; Cn #  [14] <reserved-1F252>..<reserved-1F25F>
+1F266..1F2FF  ; Cn # [154] <reserved-1F266>..<reserved-1F2FF>
+1F6D5..1F6DF  ; Cn #  [11] <reserved-1F6D5>..<reserved-1F6DF>
 1F6ED..1F6EF  ; Cn #   [3] <reserved-1F6ED>..<reserved-1F6EF>
-1F6F4..1F6FF  ; Cn #  [12] <reserved-1F6F4>..<reserved-1F6FF>
+1F6F9..1F6FF  ; Cn #   [7] <reserved-1F6F9>..<reserved-1F6FF>
 1F774..1F77F  ; Cn #  [12] <reserved-1F774>..<reserved-1F77F>
 1F7D5..1F7FF  ; Cn #  [43] <reserved-1F7D5>..<reserved-1F7FF>
 1F80C..1F80F  ; Cn #   [4] <reserved-1F80C>..<reserved-1F80F>
 1F848..1F84F  ; Cn #   [8] <reserved-1F848>..<reserved-1F84F>
 1F85A..1F85F  ; Cn #   [6] <reserved-1F85A>..<reserved-1F85F>
 1F888..1F88F  ; Cn #   [8] <reserved-1F888>..<reserved-1F88F>
-1F8AE..1F90F  ; Cn #  [98] <reserved-1F8AE>..<reserved-1F90F>
-1F919..1F97F  ; Cn # [103] <reserved-1F919>..<reserved-1F97F>
-1F985..1F9BF  ; Cn #  [59] <reserved-1F985>..<reserved-1F9BF>
-1F9C1..1FFFF  ; Cn # [1599] <reserved-1F9C1>..<noncharacter-1FFFF>
+1F8AE..1F8FF  ; Cn #  [82] <reserved-1F8AE>..<reserved-1F8FF>
+1F90C..1F90F  ; Cn #   [4] <reserved-1F90C>..<reserved-1F90F>
+1F93F         ; Cn #       <reserved-1F93F>
+1F94D..1F94F  ; Cn #   [3] <reserved-1F94D>..<reserved-1F94F>
+1F96C..1F97F  ; Cn #  [20] <reserved-1F96C>..<reserved-1F97F>
+1F998..1F9BF  ; Cn #  [40] <reserved-1F998>..<reserved-1F9BF>
+1F9C1..1F9CF  ; Cn #  [15] <reserved-1F9C1>..<reserved-1F9CF>
+1F9E7..1FFFF  ; Cn # [1561] <reserved-1F9E7>..<noncharacter-1FFFF>
 2A6D7..2A6FF  ; Cn #  [41] <reserved-2A6D7>..<reserved-2A6FF>
 2B735..2B73F  ; Cn #  [11] <reserved-2B735>..<reserved-2B73F>
 2B81E..2B81F  ; Cn #   [2] <reserved-2B81E>..<reserved-2B81F>
-2CEA2..2F7FF  ; Cn # [10590] <reserved-2CEA2>..<reserved-2F7FF>
+2CEA2..2CEAF  ; Cn #  [14] <reserved-2CEA2>..<reserved-2CEAF>
+2EBE1..2F7FF  ; Cn # [3103] <reserved-2EBE1>..<reserved-2F7FF>
 2FA1E..E0000  ; Cn # [722403] <reserved-2FA1E>..<reserved-E0000>
 E0002..E001F  ; Cn #  [30] <reserved-E0002>..<reserved-E001F>
 E0080..E00FF  ; Cn # [128] <reserved-E0080>..<reserved-E00FF>
@ -628,7 +665,7 @@ E01F0..EFFFF  ; Cn # [65040] <reserved-E01F0>..<noncharacter-EFFFF>
 FFFFE..FFFFF  ; Cn #   [2] <noncharacter-FFFFE>..<noncharacter-FFFFF>
 10FFFE..10FFFF; Cn #   [2] <noncharacter-10FFFE>..<noncharacter-10FFFF>

-# Total code points: 853859
+# Total code points: 837841

 # ================================================

@ -1221,11 +1258,12 @@ A7A2          ; Lu #       LATIN CAPITAL LETTER K WITH OBLIQUE STROKE
 A7A4          ; Lu #       LATIN CAPITAL LETTER N WITH OBLIQUE STROKE
 A7A6          ; Lu #       LATIN CAPITAL LETTER R WITH OBLIQUE STROKE
 A7A8          ; Lu #       LATIN CAPITAL LETTER S WITH OBLIQUE STROKE
-A7AA..A7AD    ; Lu #   [4] LATIN CAPITAL LETTER H WITH HOOK..LATIN CAPITAL LETTER L WITH BELT
+A7AA..A7AE    ; Lu #   [5] LATIN CAPITAL LETTER H WITH HOOK..LATIN CAPITAL LETTER SMALL CAPITAL I
 A7B0..A7B4    ; Lu #   [5] LATIN CAPITAL LETTER TURNED K..LATIN CAPITAL LETTER BETA
 A7B6          ; Lu #       LATIN CAPITAL LETTER OMEGA
 FF21..FF3A    ; Lu #  [26] FULLWIDTH LATIN CAPITAL LETTER A..FULLWIDTH LATIN CAPITAL LETTER Z
 10400..10427  ; Lu #  [40] DESERET CAPITAL LETTER LONG I..DESERET CAPITAL LETTER EW
+104B0..104D3  ; Lu #  [36] OSAGE CAPITAL LETTER A..OSAGE CAPITAL LETTER ZHA
 10C80..10CB2  ; Lu #  [51] OLD HUNGARIAN CAPITAL LETTER A..OLD HUNGARIAN CAPITAL LETTER US
 118A0..118BF  ; Lu #  [32] WARANG CITI CAPITAL LETTER NGAA..WARANG CITI CAPITAL LETTER VIYO
 1D400..1D419  ; Lu #  [26] MATHEMATICAL BOLD CAPITAL A..MATHEMATICAL BOLD CAPITAL Z
@ -1259,8 +1297,9 @@ FF21..FF3A    ; Lu #  [26] FULLWIDTH LATIN CAPITAL LETTER A..FULLWIDTH LATIN CAP
 1D756..1D76E  ; Lu #  [25] MATHEMATICAL SANS-SERIF BOLD CAPITAL ALPHA..MATHEMATICAL SANS-SERIF BOLD CAPITAL OMEGA
 1D790..1D7A8  ; Lu #  [25] MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL ALPHA..MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL OMEGA
 1D7CA         ; Lu #       MATHEMATICAL BOLD CAPITAL DIGAMMA
+1E900..1E921  ; Lu #  [34] ADLAM CAPITAL LETTER ALIF..ADLAM CAPITAL LETTER SHA

-# Total code points: 1631
+# Total code points: 1702

 # ================================================

@ -1537,6 +1576,7 @@ FF21..FF3A    ; Lu #  [26] FULLWIDTH LATIN CAPITAL LETTER A..FULLWIDTH LATIN CAP
 052F          ; Ll #       CYRILLIC SMALL LETTER EL WITH DESCENDER
 0561..0587    ; Ll #  [39] ARMENIAN SMALL LETTER AYB..ARMENIAN SMALL LIGATURE ECH YIWN
 13F8..13FD    ; Ll #   [6] CHEROKEE SMALL LETTER YE..CHEROKEE SMALL LETTER MV
+1C80..1C88    ; Ll #   [9] CYRILLIC SMALL LETTER ROUNDED VE..CYRILLIC SMALL LETTER UNBLENDED UK
 1D00..1D2B    ; Ll #  [44] LATIN LETTER SMALL CAPITAL A..CYRILLIC LETTER SMALL CAPITAL EL
 1D6B..1D77    ; Ll #  [13] LATIN SMALL LETTER UE..LATIN SMALL LETTER TURNED G
 1D79..1D9A    ; Ll #  [34] LATIN SMALL LETTER INSULAR G..LATIN SMALL LETTER EZH WITH RETROFLEX HOOK
@ -1866,6 +1906,7 @@ FB00..FB06    ; Ll #   [7] LATIN SMALL LIGATURE FF..LATIN SMALL LIGATURE ST
 FB13..FB17    ; Ll #   [5] ARMENIAN SMALL LIGATURE MEN NOW..ARMENIAN SMALL LIGATURE MEN XEH
 FF41..FF5A    ; Ll #  [26] FULLWIDTH LATIN SMALL LETTER A..FULLWIDTH LATIN SMALL LETTER Z
 10428..1044F  ; Ll #  [40] DESERET SMALL LETTER LONG I..DESERET SMALL LETTER EW
+104D8..104FB  ; Ll #  [36] OSAGE SMALL LETTER A..OSAGE SMALL LETTER ZHA
 10CC0..10CF2  ; Ll #  [51] OLD HUNGARIAN SMALL LETTER A..OLD HUNGARIAN SMALL LETTER US
 118C0..118DF  ; Ll #  [32] WARANG CITI SMALL LETTER NGAA..WARANG CITI SMALL LETTER VIYO
 1D41A..1D433  ; Ll #  [26] MATHEMATICAL BOLD SMALL A..MATHEMATICAL BOLD SMALL Z
@ -1896,8 +1937,9 @@ FF41..FF5A    ; Ll #  [26] FULLWIDTH LATIN SMALL LETTER A..FULLWIDTH LATIN SMALL
 1D7AA..1D7C2  ; Ll #  [25] MATHEMATICAL SANS-SERIF BOLD ITALIC SMALL ALPHA..MATHEMATICAL SANS-SERIF BOLD ITALIC SMALL OMEGA
 1D7C4..1D7C9  ; Ll #   [6] MATHEMATICAL SANS-SERIF BOLD ITALIC EPSILON SYMBOL..MATHEMATICAL SANS-SERIF BOLD ITALIC PI SYMBOL
 1D7CB         ; Ll #       MATHEMATICAL BOLD SMALL DIGAMMA
+1E922..1E943  ; Ll #  [34] ADLAM SMALL LETTER ALIF..ADLAM SMALL LETTER SHA

-# Total code points: 1984
+# Total code points: 2063

 # ================================================

@ -1976,8 +2018,9 @@ FF70          ; Lm #       HALFWIDTH KATAKANA-HIRAGANA PROLONGED SOUND MARK
 FF9E..FF9F    ; Lm #   [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAKANA SEMI-VOICED SOUND MARK
 16B40..16B43  ; Lm #   [4] PAHAWH HMONG SIGN VOS SEEV..PAHAWH HMONG SIGN IB YAM
 16F93..16F9F  ; Lm #  [13] MIAO LETTER TONE-2..MIAO LETTER REFORMED TONE-8
+16FE0..16FE1  ; Lm #   [2] TANGUT ITERATION MARK..NUSHU ITERATION MARK

-# Total code points: 248
+# Total code points: 250

 # ================================================

@ -2005,7 +2048,9 @@ FF9E..FF9F    ; Lm #   [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAK
 07CA..07EA    ; Lo #  [33] NKO LETTER A..NKO LETTER JONA RA
 0800..0815    ; Lo #  [22] SAMARITAN LETTER ALAF..SAMARITAN LETTER TAAF
 0840..0858    ; Lo #  [25] MANDAIC LETTER HALQA..MANDAIC LETTER AIN
+0860..086A    ; Lo #  [11] SYRIAC LETTER MALAYALAM NGA..SYRIAC LETTER MALAYALAM SSA
 08A0..08B4    ; Lo #  [21] ARABIC LETTER BEH WITH SMALL V BELOW..ARABIC LETTER KAF WITH DOT BELOW
+08B6..08BD    ; Lo #   [8] ARABIC LETTER BEH WITH SMALL MEEM ABOVE..ARABIC LETTER AFRICAN NOON
 0904..0939    ; Lo #  [54] DEVANAGARI LETTER SHORT A..DEVANAGARI LETTER HA
 093D          ; Lo #       DEVANAGARI SIGN AVAGRAHA
 0950          ; Lo #       DEVANAGARI OM
@ -2022,6 +2067,7 @@ FF9E..FF9F    ; Lm #   [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAK
 09DC..09DD    ; Lo #   [2] BENGALI LETTER RRA..BENGALI LETTER RHA
 09DF..09E1    ; Lo #   [3] BENGALI LETTER YYA..BENGALI LETTER VOCALIC LL
 09F0..09F1    ; Lo #   [2] BENGALI LETTER RA WITH MIDDLE DIAGONAL..BENGALI LETTER RA WITH LOWER DIAGONAL
+09FC          ; Lo #       BENGALI LETTER VEDIC ANUSVARA
 0A05..0A0A    ; Lo #   [6] GURMUKHI LETTER A..GURMUKHI LETTER UU
 0A0F..0A10    ; Lo #   [2] GURMUKHI LETTER EE..GURMUKHI LETTER AI
 0A13..0A28    ; Lo #  [22] GURMUKHI LETTER OO..GURMUKHI LETTER NA
@ -2070,6 +2116,7 @@ FF9E..FF9F    ; Lm #   [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAK
 0C3D          ; Lo #       TELUGU SIGN AVAGRAHA
 0C58..0C5A    ; Lo #   [3] TELUGU LETTER TSA..TELUGU LETTER RRRA
 0C60..0C61    ; Lo #   [2] TELUGU LETTER VOCALIC RR..TELUGU LETTER VOCALIC LL
+0C80          ; Lo #       KANNADA SIGN SPACING CANDRABINDU
 0C85..0C8C    ; Lo #   [8] KANNADA LETTER A..KANNADA LETTER VOCALIC L
 0C8E..0C90    ; Lo #   [3] KANNADA LETTER E..KANNADA LETTER AI
 0C92..0CA8    ; Lo #  [23] KANNADA LETTER O..KANNADA LETTER NA
@ -2084,6 +2131,7 @@ FF9E..FF9F    ; Lm #   [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAK
 0D12..0D3A    ; Lo #  [41] MALAYALAM LETTER O..MALAYALAM LETTER TTTA
 0D3D          ; Lo #       MALAYALAM SIGN AVAGRAHA
 0D4E          ; Lo #       MALAYALAM LETTER DOT REPH
+0D54..0D56    ; Lo #   [3] MALAYALAM LETTER CHILLU M..MALAYALAM LETTER CHILLU LLL
 0D5F..0D61    ; Lo #   [3] MALAYALAM LETTER ARCHAIC II..MALAYALAM LETTER VOCALIC LL
 0D7A..0D7F    ; Lo #   [6] MALAYALAM LETTER CHILLU NN..MALAYALAM LETTER CHILLU K
 0D85..0D96    ; Lo #  [18] SINHALA LETTER AYANNA..SINHALA LETTER AUYANNA
@ -2156,7 +2204,8 @@ FF9E..FF9F    ; Lm #   [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAK
 17DC          ; Lo #       KHMER SIGN AVAKRAHASANYA
 1820..1842    ; Lo #  [35] MONGOLIAN LETTER A..MONGOLIAN LETTER CHI
 1844..1877    ; Lo #  [52] MONGOLIAN LETTER TODO E..MONGOLIAN LETTER MANCHU ZHA
-1880..18A8    ; Lo #  [41] MONGOLIAN LETTER ALI GALI ANUSVARA ONE..MONGOLIAN LETTER MANCHU ALI GALI BHA
+1880..1884    ; Lo #   [5] MONGOLIAN LETTER ALI GALI ANUSVARA ONE..MONGOLIAN LETTER ALI GALI INVERTED UBADAMA
+1887..18A8    ; Lo #  [34] MONGOLIAN LETTER ALI GALI A..MONGOLIAN LETTER MANCHU ALI GALI BHA
 18AA          ; Lo #       MONGOLIAN LETTER MANCHU ALI GALI LHA
 18B0..18F5    ; Lo #  [70] CANADIAN SYLLABICS OY..CANADIAN SYLLABICS CARRIER DENTAL S
 1900..191E    ; Lo #  [31] LIMBU VOWEL-CARRIER LETTER..LIMBU LETTER TRA
@ -2194,12 +2243,12 @@ FF9E..FF9F    ; Lm #   [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAK
 309F          ; Lo #       HIRAGANA DIGRAPH YORI
 30A1..30FA    ; Lo #  [90] KATAKANA LETTER SMALL A..KATAKANA LETTER VO
 30FF          ; Lo #       KATAKANA DIGRAPH KOTO
-3105..312D    ; Lo #  [41] BOPOMOFO LETTER B..BOPOMOFO LETTER IH
+3105..312E    ; Lo #  [42] BOPOMOFO LETTER B..BOPOMOFO LETTER O WITH DOT ABOVE
 3131..318E    ; Lo #  [94] HANGUL LETTER KIYEOK..HANGUL LETTER ARAEAE
 31A0..31BA    ; Lo #  [27] BOPOMOFO LETTER BU..BOPOMOFO LETTER ZY
 31F0..31FF    ; Lo #  [16] KATAKANA LETTER SMALL KU..KATAKANA LETTER SMALL RO
 3400..4DB5    ; Lo # [6582] CJK UNIFIED IDEOGRAPH-3400..CJK UNIFIED IDEOGRAPH-4DB5
-4E00..9FD5    ; Lo # [20950] CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FD5
+4E00..9FEA    ; Lo # [20971] CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FEA
 A000..A014    ; Lo #  [21] YI SYLLABLE IT..YI SYLLABLE E
 A016..A48C    ; Lo # [1143] YI SYLLABLE BIT..YI SYLLABLE YYR
 A4D0..A4F7    ; Lo #  [40] LISU LETTER BA..LISU LETTER OE
@ -2283,7 +2332,7 @@ FFDA..FFDC    ; Lo #   [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
 10280..1029C  ; Lo #  [29] LYCIAN LETTER A..LYCIAN LETTER X
 102A0..102D0  ; Lo #  [49] CARIAN LETTER A..CARIAN LETTER UUU3
 10300..1031F  ; Lo #  [32] OLD ITALIC LETTER A..OLD ITALIC LETTER ESS
-10330..10340  ; Lo #  [17] GOTHIC LETTER AHSA..GOTHIC LETTER PAIRTHRA
+1032D..10340  ; Lo #  [20] OLD ITALIC LETTER YE..GOTHIC LETTER PAIRTHRA
 10342..10349  ; Lo #   [8] GOTHIC LETTER RAIDA..GOTHIC LETTER OTHAL
 10350..10375  ; Lo #  [38] OLD PERMIC LETTER AN..OLD PERMIC LETTER IA
 10380..1039D  ; Lo #  [30] UGARITIC LETTER ALPA..UGARITIC LETTER SSU
@ -2349,6 +2398,8 @@ FFDA..FFDC    ; Lo #   [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
 1133D         ; Lo #       GRANTHA SIGN AVAGRAHA
 11350         ; Lo #       GRANTHA OM
 1135D..11361  ; Lo #   [5] GRANTHA SIGN PLUTA..GRANTHA LETTER VOCALIC LL
+11400..11434  ; Lo #  [53] NEWA LETTER A..NEWA LETTER HA
+11447..1144A  ; Lo #   [4] NEWA SIGN AVAGRAHA..NEWA SIDDHI
 11480..114AF  ; Lo #  [48] TIRHUTA ANJI..TIRHUTA LETTER HA
 114C4..114C5  ; Lo #   [2] TIRHUTA SIGN AVAGRAHA..TIRHUTA GVANG
 114C7         ; Lo #       TIRHUTA OM
@ -2359,7 +2410,21 @@ FFDA..FFDC    ; Lo #   [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
 11680..116AA  ; Lo #  [43] TAKRI LETTER A..TAKRI LETTER RRA
 11700..11719  ; Lo #  [26] AHOM LETTER KA..AHOM LETTER JHA
 118FF         ; Lo #       WARANG CITI OM
+11A00         ; Lo #       ZANABAZAR SQUARE LETTER A
+11A0B..11A32  ; Lo #  [40] ZANABAZAR SQUARE LETTER KA..ZANABAZAR SQUARE LETTER KSSA
+11A3A         ; Lo #       ZANABAZAR SQUARE CLUSTER-INITIAL LETTER RA
+11A50         ; Lo #       SOYOMBO LETTER A
+11A5C..11A83  ; Lo #  [40] SOYOMBO LETTER KA..SOYOMBO LETTER KSSA
+11A86..11A89  ; Lo #   [4] SOYOMBO CLUSTER-INITIAL LETTER RA..SOYOMBO CLUSTER-INITIAL LETTER SA
 11AC0..11AF8  ; Lo #  [57] PAU CIN HAU LETTER PA..PAU CIN HAU GLOTTAL STOP FINAL
+11C00..11C08  ; Lo #   [9] BHAIKSUKI LETTER A..BHAIKSUKI LETTER VOCALIC L
+11C0A..11C2E  ; Lo #  [37] BHAIKSUKI LETTER E..BHAIKSUKI LETTER HA
+11C40         ; Lo #       BHAIKSUKI SIGN AVAGRAHA
+11C72..11C8F  ; Lo #  [30] MARCHEN LETTER KA..MARCHEN LETTER A
+11D00..11D06  ; Lo #   [7] MASARAM GONDI LETTER A..MASARAM GONDI LETTER E
+11D08..11D09  ; Lo #   [2] MASARAM GONDI LETTER AI..MASARAM GONDI LETTER O
+11D0B..11D30  ; Lo #  [38] MASARAM GONDI LETTER AU..MASARAM GONDI LETTER TRA
+11D46         ; Lo #       MASARAM GONDI REPHA
 12000..12399  ; Lo # [922] CUNEIFORM SIGN A..CUNEIFORM SIGN U U
 12480..12543  ; Lo # [196] CUNEIFORM SIGN AB TIMES NUN TENU..CUNEIFORM SIGN ZU5 TIMES THREE DISH TENU
 13000..1342E  ; Lo # [1071] EGYPTIAN HIEROGLYPH A001..EGYPTIAN HIEROGLYPH AA032
@ -2372,7 +2437,10 @@ FFDA..FFDC    ; Lo #   [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
 16B7D..16B8F  ; Lo #  [19] PAHAWH HMONG CLAN SIGN TSHEEJ..PAHAWH HMONG CLAN SIGN VWJ
 16F00..16F44  ; Lo #  [69] MIAO LETTER PA..MIAO LETTER HHA
 16F50         ; Lo #       MIAO LETTER NASALIZATION
-1B000..1B001  ; Lo #   [2] KATAKANA LETTER ARCHAIC E..HIRAGANA LETTER ARCHAIC YE
+17000..187EC  ; Lo # [6125] TANGUT IDEOGRAPH-17000..TANGUT IDEOGRAPH-187EC
+18800..18AF2  ; Lo # [755] TANGUT COMPONENT-001..TANGUT COMPONENT-755
+1B000..1B11E  ; Lo # [287] KATAKANA LETTER ARCHAIC E..HENTAIGANA LETTER N-MU-MO-2
+1B170..1B2FB  ; Lo # [396] NUSHU CHARACTER-1B170..NUSHU CHARACTER-1B2FB
 1BC00..1BC6A  ; Lo # [107] DUPLOYAN LETTER H..DUPLOYAN LETTER VOCALIC M
 1BC70..1BC7C  ; Lo #  [13] DUPLOYAN AFFIX LEFT HORIZONTAL SECANT..DUPLOYAN AFFIX ATTACHED TANGENT HOOK
 1BC80..1BC88  ; Lo #   [9] DUPLOYAN AFFIX HIGH ACUTE..DUPLOYAN AFFIX HIGH VERTICAL
@ -2415,9 +2483,10 @@ FFDA..FFDC    ; Lo #   [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
 2A700..2B734  ; Lo # [4149] CJK UNIFIED IDEOGRAPH-2A700..CJK UNIFIED IDEOGRAPH-2B734
 2B740..2B81D  ; Lo # [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D
 2B820..2CEA1  ; Lo # [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1
+2CEB0..2EBE0  ; Lo # [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0
 2F800..2FA1D  ; Lo # [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D

-# Total code points: 105697
+# Total code points: 121047

 # ================================================

@ -2446,6 +2515,7 @@ FFDA..FFDC    ; Lo #   [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
 0825..0827    ; Mn #   [3] SAMARITAN VOWEL SIGN SHORT A..SAMARITAN VOWEL SIGN U
 0829..082D    ; Mn #   [5] SAMARITAN VOWEL SIGN LONG I..SAMARITAN MARK NEQUDAA
 0859..085B    ; Mn #   [3] MANDAIC AFFRICATION MARK..MANDAIC GEMINATION MARK
+08D4..08E1    ; Mn #  [14] ARABIC SMALL HIGH WORD AR-RUB..ARABIC SMALL HIGH SIGN SAFHA
 08E3..0902    ; Mn #  [32] ARABIC TURNED DAMMA BELOW..DEVANAGARI SIGN ANUSVARA
 093A          ; Mn #       DEVANAGARI VOWEL SIGN OE
 093C          ; Mn #       DEVANAGARI SIGN NUKTA
@ -2472,6 +2542,7 @@ FFDA..FFDC    ; Lo #   [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
 0AC7..0AC8    ; Mn #   [2] GUJARATI VOWEL SIGN E..GUJARATI VOWEL SIGN AI
 0ACD          ; Mn #       GUJARATI SIGN VIRAMA
 0AE2..0AE3    ; Mn #   [2] GUJARATI VOWEL SIGN VOCALIC L..GUJARATI VOWEL SIGN VOCALIC LL
+0AFA..0AFF    ; Mn #   [6] GUJARATI SIGN SUKUN..GUJARATI SIGN TWO-CIRCLE NUKTA ABOVE
 0B01          ; Mn #       ORIYA SIGN CANDRABINDU
 0B3C          ; Mn #       ORIYA SIGN NUKTA
 0B3F          ; Mn #       ORIYA VOWEL SIGN I
@ -2494,7 +2565,8 @@ FFDA..FFDC    ; Lo #   [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
 0CC6          ; Mn #       KANNADA VOWEL SIGN E
 0CCC..0CCD    ; Mn #   [2] KANNADA VOWEL SIGN AU..KANNADA SIGN VIRAMA
 0CE2..0CE3    ; Mn #   [2] KANNADA VOWEL SIGN VOCALIC L..KANNADA VOWEL SIGN VOCALIC LL
-0D01          ; Mn #       MALAYALAM SIGN CANDRABINDU
+0D00..0D01    ; Mn #   [2] MALAYALAM SIGN COMBINING ANUSVARA ABOVE..MALAYALAM SIGN CANDRABINDU
+0D3B..0D3C    ; Mn #   [2] MALAYALAM SIGN VERTICAL BAR VIRAMA..MALAYALAM SIGN CIRCULAR VIRAMA
 0D41..0D44    ; Mn #   [4] MALAYALAM VOWEL SIGN U..MALAYALAM VOWEL SIGN VOCALIC RR
 0D4D          ; Mn #       MALAYALAM SIGN VIRAMA
 0D62..0D63    ; Mn #   [2] MALAYALAM VOWEL SIGN VOCALIC L..MALAYALAM VOWEL SIGN VOCALIC LL
@ -2540,6 +2612,7 @@ FFDA..FFDC    ; Lo #   [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
 17C9..17D3    ; Mn #  [11] KHMER SIGN MUUSIKATOAN..KHMER SIGN BATHAMASAT
 17DD          ; Mn #       KHMER SIGN ATTHACAN
 180B..180D    ; Mn #   [3] MONGOLIAN FREE VARIATION SELECTOR ONE..MONGOLIAN FREE VARIATION SELECTOR THREE
+1885..1886    ; Mn #   [2] MONGOLIAN LETTER ALI GALI BALUDA..MONGOLIAN LETTER ALI GALI THREE BALUDA
 18A9          ; Mn #       MONGOLIAN LETTER ALI GALI DAGALGA
 1920..1922    ; Mn #   [3] LIMBU VOWEL SIGN A..LIMBU VOWEL SIGN U
 1927..1928    ; Mn #   [2] LIMBU VOWEL SIGN E..LIMBU VOWEL SIGN O
@ -2577,8 +2650,8 @@ FFDA..FFDC    ; Lo #   [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
 1CED          ; Mn #       VEDIC SIGN TIRYAK
 1CF4          ; Mn #       VEDIC TONE CANDRA ABOVE
 1CF8..1CF9    ; Mn #   [2] VEDIC TONE RING ABOVE..VEDIC TONE DOUBLE RING ABOVE
-1DC0..1DF5    ; Mn #  [54] COMBINING DOTTED GRAVE ACCENT..COMBINING UP TACK ABOVE
-1DFC..1DFF    ; Mn #   [4] COMBINING DOUBLE INVERTED BREVE BELOW..COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW
+1DC0..1DF9    ; Mn #  [58] COMBINING DOTTED GRAVE ACCENT..COMBINING WIDE INVERTED BRIDGE BELOW
+1DFB..1DFF    ; Mn #   [5] COMBINING DELETION MARK..COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW
 20D0..20DC    ; Mn #  [13] COMBINING LEFT HARPOON ABOVE..COMBINING FOUR DOTS ABOVE
 20E1          ; Mn #       COMBINING LEFT RIGHT ARROW ABOVE
 20E5..20F0    ; Mn #  [12] COMBINING REVERSE SOLIDUS OVERLAY..COMBINING ASTERISK ABOVE
@ -2595,7 +2668,7 @@ A802          ; Mn #       SYLOTI NAGRI SIGN DVISVARA
 A806          ; Mn #       SYLOTI NAGRI SIGN HASANTA
 A80B          ; Mn #       SYLOTI NAGRI SIGN ANUSVARA
 A825..A826    ; Mn #   [2] SYLOTI NAGRI VOWEL SIGN U..SYLOTI NAGRI VOWEL SIGN E
-A8C4          ; Mn #       SAURASHTRA SIGN VIRAMA
+A8C4..A8C5    ; Mn #   [2] SAURASHTRA SIGN VIRAMA..SAURASHTRA SIGN CANDRABINDU
 A8E0..A8F1    ; Mn #  [18] COMBINING DEVANAGARI DIGIT ZERO..COMBINING DEVANAGARI SIGN AVAGRAHA
 A926..A92D    ; Mn #   [8] KAYAH LI VOWEL UE..KAYAH LI TONE CALYA PLOPHU
 A947..A951    ; Mn #  [11] REJANG VOWEL SIGN I..REJANG CONSONANT SIGN R
@ -2647,6 +2720,7 @@ FE20..FE2F    ; Mn #  [16] COMBINING LIGATURE LEFT HALF..COMBINING CYRILLIC TITL
 1122F..11231  ; Mn #   [3] KHOJKI VOWEL SIGN U..KHOJKI VOWEL SIGN AI
 11234         ; Mn #       KHOJKI SIGN ANUSVARA
 11236..11237  ; Mn #   [2] KHOJKI SIGN NUKTA..KHOJKI SIGN SHADDA
+1123E         ; Mn #       KHOJKI SIGN SUKUN
 112DF         ; Mn #       KHUDAWADI SIGN ANUSVARA
 112E3..112EA  ; Mn #   [8] KHUDAWADI VOWEL SIGN U..KHUDAWADI SIGN VIRAMA
 11300..11301  ; Mn #   [2] GRANTHA SIGN COMBINING ANUSVARA ABOVE..GRANTHA SIGN CANDRABINDU
@ -2654,6 +2728,9 @@ FE20..FE2F    ; Mn #  [16] COMBINING LIGATURE LEFT HALF..COMBINING CYRILLIC TITL
 11340         ; Mn #       GRANTHA VOWEL SIGN II
 11366..1136C  ; Mn #   [7] COMBINING GRANTHA DIGIT ZERO..COMBINING GRANTHA DIGIT SIX
 11370..11374  ; Mn #   [5] COMBINING GRANTHA LETTER A..COMBINING GRANTHA LETTER PA
+11438..1143F  ; Mn #   [8] NEWA VOWEL SIGN U..NEWA VOWEL SIGN AI
+11442..11444  ; Mn #   [3] NEWA SIGN VIRAMA..NEWA SIGN ANUSVARA
+11446         ; Mn #       NEWA SIGN NUKTA
 114B3..114B8  ; Mn #   [6] TIRHUTA VOWEL SIGN U..TIRHUTA VOWEL SIGN VOCALIC LL
 114BA         ; Mn #       TIRHUTA VOWEL SIGN SHORT E
 114BF..114C0  ; Mn #   [2] TIRHUTA SIGN CANDRABINDU..TIRHUTA SIGN ANUSVARA
@ -2672,6 +2749,27 @@ FE20..FE2F    ; Mn #  [16] COMBINING LIGATURE LEFT HALF..COMBINING CYRILLIC TITL
 1171D..1171F  ; Mn #   [3] AHOM CONSONANT SIGN MEDIAL LA..AHOM CONSONANT SIGN MEDIAL LIGATING RA
 11722..11725  ; Mn #   [4] AHOM VOWEL SIGN I..AHOM VOWEL SIGN UU
 11727..1172B  ; Mn #   [5] AHOM VOWEL SIGN AW..AHOM SIGN KILLER
+11A01..11A06  ; Mn #   [6] ZANABAZAR SQUARE VOWEL SIGN I..ZANABAZAR SQUARE VOWEL SIGN O
+11A09..11A0A  ; Mn #   [2] ZANABAZAR SQUARE VOWEL SIGN REVERSED I..ZANABAZAR SQUARE VOWEL LENGTH MARK
+11A33..11A38  ; Mn #   [6] ZANABAZAR SQUARE FINAL CONSONANT MARK..ZANABAZAR SQUARE SIGN ANUSVARA
+11A3B..11A3E  ; Mn #   [4] ZANABAZAR SQUARE CLUSTER-FINAL LETTER YA..ZANABAZAR SQUARE CLUSTER-FINAL LETTER VA
+11A47         ; Mn #       ZANABAZAR SQUARE SUBJOINER
+11A51..11A56  ; Mn #   [6] SOYOMBO VOWEL SIGN I..SOYOMBO VOWEL SIGN OE
+11A59..11A5B  ; Mn #   [3] SOYOMBO VOWEL SIGN VOCALIC R..SOYOMBO VOWEL LENGTH MARK
+11A8A..11A96  ; Mn #  [13] SOYOMBO FINAL CONSONANT SIGN G..SOYOMBO SIGN ANUSVARA
+11A98..11A99  ; Mn #   [2] SOYOMBO GEMINATION MARK..SOYOMBO SUBJOINER
+11C30..11C36  ; Mn #   [7] BHAIKSUKI VOWEL SIGN I..BHAIKSUKI VOWEL SIGN VOCALIC L
+11C38..11C3D  ; Mn #   [6] BHAIKSUKI VOWEL SIGN E..BHAIKSUKI SIGN ANUSVARA
+11C3F         ; Mn #       BHAIKSUKI SIGN VIRAMA
+11C92..11CA7  ; Mn #  [22] MARCHEN SUBJOINED LETTER KA..MARCHEN SUBJOINED LETTER ZA
+11CAA..11CB0  ; Mn #   [7] MARCHEN SUBJOINED LETTER RA..MARCHEN VOWEL SIGN AA
+11CB2..11CB3  ; Mn #   [2] MARCHEN VOWEL SIGN U..MARCHEN VOWEL SIGN E
+11CB5..11CB6  ; Mn #   [2] MARCHEN SIGN ANUSVARA..MARCHEN SIGN CANDRABINDU
+11D31..11D36  ; Mn #   [6] MASARAM GONDI VOWEL SIGN AA..MASARAM GONDI VOWEL SIGN VOCALIC R
+11D3A         ; Mn #       MASARAM GONDI VOWEL SIGN E
+11D3C..11D3D  ; Mn #   [2] MASARAM GONDI VOWEL SIGN AI..MASARAM GONDI VOWEL SIGN O
+11D3F..11D45  ; Mn #   [7] MASARAM GONDI VOWEL SIGN AU..MASARAM GONDI VIRAMA
+11D47         ; Mn #       MASARAM GONDI RA-KARA
 16AF0..16AF4  ; Mn #   [5] BASSA VAH COMBINING HIGH TONE..BASSA VAH COMBINING HIGH-LOW TONE
 16B30..16B36  ; Mn #   [7] PAHAWH HMONG MARK CIM TUB..PAHAWH HMONG MARK CIM TAUM
 16F8F..16F92  ; Mn #   [4] MIAO TONE RIGHT..MIAO TONE BELOW
@ -2687,10 +2785,16 @@ FE20..FE2F    ; Mn #  [16] COMBINING LIGATURE LEFT HALF..COMBINING CYRILLIC TITL
 1DA84         ; Mn #       SIGNWRITING LOCATION HEAD NECK
 1DA9B..1DA9F  ; Mn #   [5] SIGNWRITING FILL MODIFIER-2..SIGNWRITING FILL MODIFIER-6
 1DAA1..1DAAF  ; Mn #  [15] SIGNWRITING ROTATION MODIFIER-2..SIGNWRITING ROTATION MODIFIER-16
+1E000..1E006  ; Mn #   [7] COMBINING GLAGOLITIC LETTER AZU..COMBINING GLAGOLITIC LETTER ZHIVETE
+1E008..1E018  ; Mn #  [17] COMBINING GLAGOLITIC LETTER ZEMLJA..COMBINING GLAGOLITIC LETTER HERU
+1E01B..1E021  ; Mn #   [7] COMBINING GLAGOLITIC LETTER SHTA..COMBINING GLAGOLITIC LETTER YATI
+1E023..1E024  ; Mn #   [2] COMBINING GLAGOLITIC LETTER YU..COMBINING GLAGOLITIC LETTER SMALL YUS
+1E026..1E02A  ; Mn #   [5] COMBINING GLAGOLITIC LETTER YO..COMBINING GLAGOLITIC LETTER FITA
 1E8D0..1E8D6  ; Mn #   [7] MENDE KIKAKUI COMBINING NUMBER TEENS..MENDE KIKAKUI COMBINING NUMBER MILLIONS
+1E944..1E94A  ; Mn #   [7] ADLAM ALIF LENGTHENER..ADLAM NUKTA
 E0100..E01EF  ; Mn # [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256

-# Total code points: 1567
+# Total code points: 1763

 # ================================================

@ -2795,6 +2899,7 @@ A670..A672    ; Me #   [3] COMBINING CYRILLIC TEN MILLIONS SIGN..COMBINING CYRIL
 1C34..1C35    ; Mc #   [2] LEPCHA CONSONANT SIGN NYIN-DO..LEPCHA CONSONANT SIGN KANG
 1CE1          ; Mc #       VEDIC TONE ATHARVAVEDIC INDEPENDENT SVARITA
 1CF2..1CF3    ; Mc #   [2] VEDIC SIGN ARDHAVISARGA..VEDIC SIGN ROTATED ARDHAVISARGA
+1CF7          ; Mc #       VEDIC SIGN ATIKRAMA
 302E..302F    ; Mc #   [2] HANGUL SINGLE DOT TONE MARK..HANGUL DOUBLE DOT TONE MARK
 A823..A824    ; Mc #   [2] SYLOTI NAGRI VOWEL SIGN A..SYLOTI NAGRI VOWEL SIGN I
 A827          ; Mc #       SYLOTI NAGRI VOWEL SIGN OO
@ -2837,6 +2942,9 @@ ABEC          ; Mc #       MEETEI MAYEK LUM IYEK
 1134B..1134D  ; Mc #   [3] GRANTHA VOWEL SIGN OO..GRANTHA SIGN VIRAMA
 11357         ; Mc #       GRANTHA AU LENGTH MARK
 11362..11363  ; Mc #   [2] GRANTHA VOWEL SIGN VOCALIC L..GRANTHA VOWEL SIGN VOCALIC LL
+11435..11437  ; Mc #   [3] NEWA VOWEL SIGN AA..NEWA VOWEL SIGN II
+11440..11441  ; Mc #   [2] NEWA VOWEL SIGN O..NEWA VOWEL SIGN AU
+11445         ; Mc #       NEWA SIGN VISARGA
 114B0..114B2  ; Mc #   [3] TIRHUTA VOWEL SIGN AA..TIRHUTA VOWEL SIGN II
 114B9         ; Mc #       TIRHUTA VOWEL SIGN E
 114BB..114BE  ; Mc #   [4] TIRHUTA VOWEL SIGN AI..TIRHUTA VOWEL SIGN AU
@ -2852,11 +2960,20 @@ ABEC          ; Mc #       MEETEI MAYEK LUM IYEK
 116B6         ; Mc #       TAKRI SIGN VIRAMA
 11720..11721  ; Mc #   [2] AHOM VOWEL SIGN A..AHOM VOWEL SIGN AA
 11726         ; Mc #       AHOM VOWEL SIGN E
+11A07..11A08  ; Mc #   [2] ZANABAZAR SQUARE VOWEL SIGN AI..ZANABAZAR SQUARE VOWEL SIGN AU
+11A39         ; Mc #       ZANABAZAR SQUARE SIGN VISARGA
+11A57..11A58  ; Mc #   [2] SOYOMBO VOWEL SIGN AI..SOYOMBO VOWEL SIGN AU
+11A97         ; Mc #       SOYOMBO SIGN VISARGA
+11C2F         ; Mc #       BHAIKSUKI VOWEL SIGN AA
+11C3E         ; Mc #       BHAIKSUKI SIGN VISARGA
+11CA9         ; Mc #       MARCHEN SUBJOINED LETTER YA
+11CB1         ; Mc #       MARCHEN VOWEL SIGN I
+11CB4         ; Mc #       MARCHEN VOWEL SIGN O
 16F51..16F7E  ; Mc #  [46] MIAO SIGN ASPIRATION..MIAO VOWEL SIGN NG
 1D165..1D166  ; Mc #   [2] MUSICAL SYMBOL COMBINING STEM..MUSICAL SYMBOL COMBINING SPRECHGESANG STEM
 1D16D..1D172  ; Mc #   [6] MUSICAL SYMBOL COMBINING AUGMENTATION DOT..MUSICAL SYMBOL COMBINING FLAG-5

-# Total code points: 383
+# Total code points: 401

 # ================================================

@ -2905,16 +3022,20 @@ FF10..FF19    ; Nd #  [10] FULLWIDTH DIGIT ZERO..FULLWIDTH DIGIT NINE
 11136..1113F  ; Nd #  [10] CHAKMA DIGIT ZERO..CHAKMA DIGIT NINE
 111D0..111D9  ; Nd #  [10] SHARADA DIGIT ZERO..SHARADA DIGIT NINE
 112F0..112F9  ; Nd #  [10] KHUDAWADI DIGIT ZERO..KHUDAWADI DIGIT NINE
+11450..11459  ; Nd #  [10] NEWA DIGIT ZERO..NEWA DIGIT NINE
 114D0..114D9  ; Nd #  [10] TIRHUTA DIGIT ZERO..TIRHUTA DIGIT NINE
 11650..11659  ; Nd #  [10] MODI DIGIT ZERO..MODI DIGIT NINE
 116C0..116C9  ; Nd #  [10] TAKRI DIGIT ZERO..TAKRI DIGIT NINE
 11730..11739  ; Nd #  [10] AHOM DIGIT ZERO..AHOM DIGIT NINE
 118E0..118E9  ; Nd #  [10] WARANG CITI DIGIT ZERO..WARANG CITI DIGIT NINE
+11C50..11C59  ; Nd #  [10] BHAIKSUKI DIGIT ZERO..BHAIKSUKI DIGIT NINE
+11D50..11D59  ; Nd #  [10] MASARAM GONDI DIGIT ZERO..MASARAM GONDI DIGIT NINE
 16A60..16A69  ; Nd #  [10] MRO DIGIT ZERO..MRO DIGIT NINE
 16B50..16B59  ; Nd #  [10] PAHAWH HMONG DIGIT ZERO..PAHAWH HMONG DIGIT NINE
 1D7CE..1D7FF  ; Nd #  [50] MATHEMATICAL BOLD DIGIT ZERO..MATHEMATICAL MONOSPACE DIGIT NINE
+1E950..1E959  ; Nd #  [10] ADLAM DIGIT ZERO..ADLAM DIGIT NINE

-# Total code points: 550
+# Total code points: 590

 # ================================================

@ -2946,7 +3067,8 @@ A6E6..A6EF    ; Nl #  [10] BAMUM LETTER MO..BAMUM LETTER KOGHOM
 0B72..0B77    ; No #   [6] ORIYA FRACTION ONE QUARTER..ORIYA FRACTION THREE SIXTEENTHS
 0BF0..0BF2    ; No #   [3] TAMIL NUMBER TEN..TAMIL NUMBER ONE THOUSAND
 0C78..0C7E    ; No #   [7] TELUGU FRACTION DIGIT ZERO FOR ODD POWERS OF FOUR..TELUGU FRACTION DIGIT THREE FOR EVEN POWERS OF FOUR
-0D70..0D75    ; No #   [6] MALAYALAM NUMBER TEN..MALAYALAM FRACTION THREE QUARTERS
+0D58..0D5E    ; No #   [7] MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH..MALAYALAM FRACTION ONE FIFTH
+0D70..0D78    ; No #   [9] MALAYALAM NUMBER TEN..MALAYALAM FRACTION THREE SIXTEENTHS
 0F2A..0F33    ; No #  [10] TIBETAN DIGIT HALF ONE..TIBETAN DIGIT HALF ZERO
 1369..137C    ; No #  [20] ETHIOPIC DIGIT ONE..ETHIOPIC NUMBER TEN THOUSAND
 17F0..17F9    ; No #  [10] KHMER SYMBOL LEK ATTAK SON..KHMER SYMBOL LEK ATTAK PRAM-BUON
@ -2993,12 +3115,13 @@ A830..A835    ; No #   [6] NORTH INDIC FRACTION ONE QUARTER..NORTH INDIC FRACTIO
 111E1..111F4  ; No #  [20] SINHALA ARCHAIC DIGIT ONE..SINHALA ARCHAIC NUMBER ONE THOUSAND
 1173A..1173B  ; No #   [2] AHOM NUMBER TEN..AHOM NUMBER TWENTY
 118EA..118F2  ; No #   [9] WARANG CITI NUMBER TEN..WARANG CITI NUMBER NINETY
+11C5A..11C6C  ; No #  [19] BHAIKSUKI NUMBER ONE..BHAIKSUKI HUNDREDS UNIT MARK
 16B5B..16B61  ; No #   [7] PAHAWH HMONG NUMBER TENS..PAHAWH HMONG NUMBER TRILLIONS
 1D360..1D371  ; No #  [18] COUNTING ROD UNIT DIGIT ONE..COUNTING ROD TENS DIGIT NINE
 1E8C7..1E8CF  ; No #   [9] MENDE KIKAKUI DIGIT ONE..MENDE KIKAKUI DIGIT NINE
 1F100..1F10C  ; No #  [13] DIGIT ZERO FULL STOP..DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO

-# Total code points: 647
+# Total code points: 676

 # ================================================

@ -3048,6 +3171,7 @@ A830..A835    ; No #   [6] NORTH INDIC FRACTION ONE QUARTER..NORTH INDIC FRACTIO
 061C          ; Cf #       ARABIC LETTER MARK
 06DD          ; Cf #       ARABIC END OF AYAH
 070F          ; Cf #       SYRIAC ABBREVIATION MARK
+08E2          ; Cf #       ARABIC DISPUTED END OF AYAH
 180E          ; Cf #       MONGOLIAN VOWEL SEPARATOR
 200B..200F    ; Cf #   [5] ZERO WIDTH SPACE..RIGHT-TO-LEFT MARK
 202A..202E    ; Cf #   [5] LEFT-TO-RIGHT EMBEDDING..RIGHT-TO-LEFT OVERRIDE
@ -3061,7 +3185,7 @@ FFF9..FFFB    ; Cf #   [3] INTERLINEAR ANNOTATION ANCHOR..INTERLINEAR ANNOTATION
 E0001         ; Cf #       LANGUAGE TAG
 E0020..E007F  ; Cf #  [96] TAG SPACE..CANCEL TAG

-# Total code points: 150
+# Total code points: 151

 # ================================================

@ -3315,6 +3439,7 @@ FF3F          ; Pc #       FULLWIDTH LOW LINE
 085E          ; Po #       MANDAIC PUNCTUATION
 0964..0965    ; Po #   [2] DEVANAGARI DANDA..DEVANAGARI DOUBLE DANDA
 0970          ; Po #       DEVANAGARI ABBREVIATION SIGN
+09FD          ; Po #       BENGALI ABBREVIATION SIGN
 0AF0          ; Po #       GUJARATI ABBREVIATION SIGN
 0DF4          ; Po #       SINHALA PUNCTUATION KUNDDALIYA
 0E4F          ; Po #       THAI CHARACTER FONGMAN
@ -3366,6 +3491,7 @@ FF3F          ; Pc #       FULLWIDTH LOW LINE
 2E30..2E39    ; Po #  [10] RING POINT..TOP HALF SECTION SIGN
 2E3C..2E3F    ; Po #   [4] STENOGRAPHIC FULL STOP..CAPITULUM
 2E41          ; Po #       REVERSED COMMA
+2E43..2E49    ; Po #   [7] DASH WITH LEFT UPTURN..DOUBLE STACKED COMMA
 3001..3003    ; Po #   [3] IDEOGRAPHIC COMMA..DITTO MARK
 303D          ; Po #       PART ALTERNATION MARK
 30FB          ; Po #       KATAKANA MIDDLE DOT
@ -3429,10 +3555,19 @@ FF64..FF65    ; Po #   [2] HALFWIDTH IDEOGRAPHIC COMMA..HALFWIDTH KATAKANA MIDDL
 111DD..111DF  ; Po #   [3] SHARADA CONTINUATION SIGN..SHARADA SECTION MARK-2
 11238..1123D  ; Po #   [6] KHOJKI DANDA..KHOJKI ABBREVIATION SIGN
 112A9         ; Po #       MULTANI SECTION MARK
+1144B..1144F  ; Po #   [5] NEWA DANDA..NEWA ABBREVIATION SIGN
+1145B         ; Po #       NEWA PLACEHOLDER MARK
+1145D         ; Po #       NEWA INSERTION SIGN
 114C6         ; Po #       TIRHUTA ABBREVIATION SIGN
 115C1..115D7  ; Po #  [23] SIDDHAM SIGN SIDDHAM..SIDDHAM SECTION MARK WITH CIRCLES AND FOUR ENCLOSURES
 11641..11643  ; Po #   [3] MODI DANDA..MODI ABBREVIATION SIGN
+11660..1166C  ; Po #  [13] MONGOLIAN BIRGA WITH ORNAMENT..MONGOLIAN TURNED SWIRL BIRGA WITH DOUBLE ORNAMENT
 1173C..1173E  ; Po #   [3] AHOM SIGN SMALL SECTION..AHOM SIGN RULAI
+11A3F..11A46  ; Po #   [8] ZANABAZAR SQUARE INITIAL HEAD MARK..ZANABAZAR SQUARE CLOSING DOUBLE-LINED HEAD MARK
+11A9A..11A9C  ; Po #   [3] SOYOMBO MARK TSHEG..SOYOMBO MARK DOUBLE SHAD
+11A9E..11AA2  ; Po #   [5] SOYOMBO HEAD MARK WITH MOON AND SUN AND TRIPLE FLAME..SOYOMBO TERMINAL MARK-2
+11C41..11C45  ; Po #   [5] BHAIKSUKI DANDA..BHAIKSUKI GAP FILLER-2
+11C70..11C71  ; Po #   [2] MARCHEN HEAD MARK..MARCHEN MARK SHAD
 12470..12474  ; Po #   [5] CUNEIFORM PUNCTUATION SIGN OLD ASSYRIAN WORD DIVIDER..CUNEIFORM PUNCTUATION SIGN DIAGONAL QUADCOLON
 16A6E..16A6F  ; Po #   [2] MRO DANDA..MRO DOUBLE DANDA
 16AF5         ; Po #       BASSA VAH FULL STOP
@ -3440,8 +3575,9 @@ FF64..FF65    ; Po #   [2] HALFWIDTH IDEOGRAPHIC COMMA..HALFWIDTH KATAKANA MIDDL
 16B44         ; Po #       PAHAWH HMONG SIGN XAUS
 1BC9F         ; Po #       DUPLOYAN PUNCTUATION CHINOOK FULL STOP
 1DA87..1DA8B  ; Po #   [5] SIGNWRITING COMMA..SIGNWRITING PARENTHESIS
+1E95E..1E95F  ; Po #   [2] ADLAM INITIAL EXCLAMATION MARK..ADLAM INITIAL QUESTION MARK

-# Total code points: 513
+# Total code points: 566

 # ================================================

@ -3528,7 +3664,7 @@ FFE9..FFEC    ; Sm #   [4] HALFWIDTH LEFTWARDS ARROW..HALFWIDTH DOWNWARDS ARROW
 0BF9          ; Sc #       TAMIL RUPEE SIGN
 0E3F          ; Sc #       THAI CURRENCY SYMBOL BAHT
 17DB          ; Sc #       KHMER CURRENCY SYMBOL RIEL
-20A0..20BE    ; Sc #  [31] EURO-CURRENCY SIGN..LARI SIGN
+20A0..20BF    ; Sc #  [32] EURO-CURRENCY SIGN..BITCOIN SIGN
 A838          ; Sc #       NORTH INDIC RUPEE MARK
 FDFC          ; Sc #       RIAL SIGN
 FE69          ; Sc #       SMALL DOLLAR SIGN
@ -3536,7 +3672,7 @@ FF04          ; Sc #       FULLWIDTH DOLLAR SIGN
 FFE0..FFE1    ; Sc #   [2] FULLWIDTH CENT SIGN..FULLWIDTH POUND SIGN
 FFE5..FFE6    ; Sc #   [2] FULLWIDTH YEN SIGN..FULLWIDTH WON SIGN

-# Total code points: 53
+# Total code points: 54

 # ================================================

@ -3594,6 +3730,7 @@ FFE3          ; Sk #       FULLWIDTH MACRON
 0BF3..0BF8    ; So #   [6] TAMIL DAY SIGN..TAMIL AS ABOVE SIGN
 0BFA          ; So #       TAMIL NUMBER SIGN
 0C7F          ; So #       TELUGU SIGN TUUMU
+0D4F          ; So #       MALAYALAM SIGN PARA
 0D79          ; So #       MALAYALAM DATE MARK
 0F01..0F03    ; So #   [3] TIBETAN MARK GTER YIG MGO TRUNCATED A..TIBETAN MARK GTER YIG MGO -UM GTER TSHEG MA
 0F13          ; So #       TIBETAN MARK CARET -DZUD RTAGS ME LONG CAN
@ -3642,8 +3779,7 @@ FFE3          ; Sk #       FULLWIDTH MACRON
 232B..237B    ; So #  [81] ERASE TO THE LEFT..NOT CHECK MARK
 237D..239A    ; So #  [30] SHOULDERED OPEN BOX..CLEAR SCREEN SYMBOL
 23B4..23DB    ; So #  [40] TOP SQUARE BRACKET..FUSE
-23E2..23FA    ; So #  [25] WHITE TRAPEZIUM..BLACK CIRCLE FOR RECORD
-2400..2426    ; So #  [39] SYMBOL FOR NULL..SYMBOL FOR SUBSTITUTE FORM TWO
+23E2..2426    ; So #  [69] WHITE TRAPEZIUM..SYMBOL FOR SUBSTITUTE FORM TWO
 2440..244A    ; So #  [11] OCR HOOK..OCR DOUBLE BACKSLASH
 249C..24E9    ; So #  [78] PARENTHESIZED LATIN SMALL LETTER A..CIRCLED LATIN SMALL LETTER Z
 2500..25B6    ; So # [183] BOX DRAWINGS LIGHT HORIZONTAL..BLACK RIGHT-POINTING TRIANGLE
@ -3659,7 +3795,7 @@ FFE3          ; Sk #       FULLWIDTH MACRON
 2B76..2B95    ; So #  [32] NORTH WEST TRIANGLE-HEADED ARROW TO BAR..RIGHTWARDS BLACK ARROW
 2B98..2BB9    ; So #  [34] THREE-D TOP-LIGHTED LEFTWARDS EQUILATERAL ARROWHEAD..UP ARROWHEAD IN A RECTANGLE BOX
 2BBD..2BC8    ; So #  [12] BALLOT BOX WITH LIGHT X..BLACK MEDIUM RIGHT-POINTING TRIANGLE CENTRED
-2BCA..2BD1    ; So #   [8] TOP HALF BLACK CIRCLE..UNCERTAINTY SIGN
+2BCA..2BD2    ; So #   [9] TOP HALF BLACK CIRCLE..GROUP MARK
 2BEC..2BEF    ; So #   [4] LEFTWARDS TWO-HEADED ARROW WITH TRIANGLE ARROWHEADS..DOWNWARDS TWO-HEADED ARROW WITH TRIANGLE ARROWHEADS
 2CE5..2CEA    ; So #   [6] COPTIC SYMBOL MI RO..COPTIC SYMBOL SHIMA SIMA
 2E80..2E99    ; So #  [26] CJK RADICAL REPEAT..CJK RADICAL RAP
@ -3694,7 +3830,7 @@ FFED..FFEE    ; So #   [2] HALFWIDTH BLACK SQUARE..HALFWIDTH WHITE CIRCLE
 FFFC..FFFD    ; So #   [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHARACTER
 10137..1013F  ; So #   [9] AEGEAN WEIGHT BASE UNIT..AEGEAN MEASURE THIRD SUBUNIT
 10179..10189  ; So #  [17] GREEK YEAR SIGN..GREEK TRYBLION BASE SIGN
-1018C         ; So #       GREEK SINUSOID SIGN
+1018C..1018E  ; So #   [3] GREEK SINUSOID SIGN..NOMISMA SIGN
 10190..1019B  ; So #  [12] ROMAN SEXTANS SIGN..ROMAN CENTURIAL SIGN
 101A0         ; So #       GREEK SYMBOL TAU RHO
 101D0..101FC  ; So #  [45] PHAISTOS DISC SIGN PEDESTRIAN..PHAISTOS DISC SIGN WAVY BAND
@ -3727,17 +3863,16 @@ FFFC..FFFD    ; So #   [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHARACTER
 1F0D1..1F0F5  ; So #  [37] PLAYING CARD ACE OF CLUBS..PLAYING CARD TRUMP-21
 1F110..1F12E  ; So #  [31] PARENTHESIZED LATIN CAPITAL LETTER A..CIRCLED WZ
 1F130..1F16B  ; So #  [60] SQUARED LATIN CAPITAL LETTER A..RAISED MD SIGN
-1F170..1F19A  ; So #  [43] NEGATIVE SQUARED LATIN CAPITAL LETTER A..SQUARED VS
+1F170..1F1AC  ; So #  [61] NEGATIVE SQUARED LATIN CAPITAL LETTER A..SQUARED VOD
 1F1E6..1F202  ; So #  [29] REGIONAL INDICATOR SYMBOL LETTER A..SQUARED KATAKANA SA
-1F210..1F23A  ; So #  [43] SQUARED CJK UNIFIED IDEOGRAPH-624B..SQUARED CJK UNIFIED IDEOGRAPH-55B6
+1F210..1F23B  ; So #  [44] SQUARED CJK UNIFIED IDEOGRAPH-624B..SQUARED CJK UNIFIED IDEOGRAPH-914D
 1F240..1F248  ; So #   [9] TORTOISE SHELL BRACKETED CJK UNIFIED IDEOGRAPH-672C..TORTOISE SHELL BRACKETED CJK UNIFIED IDEOGRAPH-6557
 1F250..1F251  ; So #   [2] CIRCLED IDEOGRAPH ADVANTAGE..CIRCLED IDEOGRAPH ACCEPT
+1F260..1F265  ; So #   [6] ROUNDED SYMBOL FOR FU..ROUNDED SYMBOL FOR CAI
 1F300..1F3FA  ; So # [251] CYCLONE..AMPHORA
-1F400..1F579  ; So # [378] RAT..JOYSTICK
-1F57B..1F5A3  ; So #  [41] LEFT HAND TELEPHONE RECEIVER..BLACK DOWN POINTING BACKHAND INDEX
-1F5A5..1F6D0  ; So # [300] DESKTOP COMPUTER..PLACE OF WORSHIP
+1F400..1F6D4  ; So # [725] RAT..PAGODA
 1F6E0..1F6EC  ; So #  [13] HAMMER AND WRENCH..AIRPLANE ARRIVING
-1F6F0..1F6F3  ; So #   [4] SATELLITE..PASSENGER SHIP
+1F6F0..1F6F8  ; So #   [9] SATELLITE..FLYING SAUCER
 1F700..1F773  ; So # [116] ALCHEMICAL SYMBOL FOR QUINTESSENCE..ALCHEMICAL SYMBOL FOR HALF OUNCE
 1F780..1F7D4  ; So #  [85] BLACK LEFT-POINTING ISOSCELES RIGHT TRIANGLE..HEAVY TWELVE POINTED PINWHEEL STAR
 1F800..1F80B  ; So #  [12] LEFTWARDS ARROW WITH SMALL TRIANGLE ARROWHEAD..DOWNWARDS ARROW WITH LARGE TRIANGLE ARROWHEAD
@ -3745,11 +3880,15 @@ FFFC..FFFD    ; So #   [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHARACTER
 1F850..1F859  ; So #  [10] LEFTWARDS SANS-SERIF ARROW..UP DOWN SANS-SERIF ARROW
 1F860..1F887  ; So #  [40] WIDE-HEADED LEFTWARDS LIGHT BARB ARROW..WIDE-HEADED SOUTH WEST VERY HEAVY BARB ARROW
 1F890..1F8AD  ; So #  [30] LEFTWARDS TRIANGLE ARROWHEAD..WHITE ARROW SHAFT WIDTH TWO THIRDS
-1F910..1F918  ; So #   [9] ZIPPER-MOUTH FACE..SIGN OF THE HORNS
-1F980..1F984  ; So #   [5] CRAB..UNICORN FACE
+1F900..1F90B  ; So #  [12] CIRCLED CROSS FORMEE WITH FOUR DOTS..DOWNWARD FACING NOTCHED HOOK WITH DOT
+1F910..1F93E  ; So #  [47] ZIPPER-MOUTH FACE..HANDBALL
+1F940..1F94C  ; So #  [13] WILTED FLOWER..CURLING STONE
+1F950..1F96B  ; So #  [28] CROISSANT..CANNED FOOD
+1F980..1F997  ; So #  [24] CRAB..CRICKET
 1F9C0         ; So #       CHEESE WEDGE
+1F9D0..1F9E6  ; So #  [23] FACE WITH MONOCLE..SOCKS

-# Total code points: 5677
+# Total code points: 5855

 # ================================================

--- a/maint/Unicode.tables/GraphemeBreakProperty.txt
+++ b/maint/Unicode.tables/GraphemeBreakProperty.txt
@ -1,10 +1,11 @@
-# GraphemeBreakProperty-8.0.0.txt
-# Date: 2015-02-13, 13:47:14 GMT [MD]
+# GraphemeBreakProperty-10.0.0.txt
+# Date: 2017-03-12, 07:03:41 GMT
+# © 2017 Unicode®, Inc.
+# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
+# For terms of use, see http://www.unicode.org/terms_of_use.html
 #
 # Unicode Character Database
-# Copyright (c) 1991-2015 Unicode, Inc.
-# For terms of use, see http://www.unicode.org/terms_of_use.html
-# For documentation, see http://www.unicode.org/reports/tr44/
+#   For documentation, see http://www.unicode.org/reports/tr44/

 # ================================================

@ -17,6 +18,21 @@

 # ================================================

+0600..0605    ; Prepend # Cf   [6] ARABIC NUMBER SIGN..ARABIC NUMBER MARK ABOVE
+06DD          ; Prepend # Cf       ARABIC END OF AYAH
+070F          ; Prepend # Cf       SYRIAC ABBREVIATION MARK
+08E2          ; Prepend # Cf       ARABIC DISPUTED END OF AYAH
+0D4E          ; Prepend # Lo       MALAYALAM LETTER DOT REPH
+110BD         ; Prepend # Cf       KAITHI NUMBER SIGN
+111C2..111C3  ; Prepend # Lo   [2] SHARADA SIGN JIHVAMULIYA..SHARADA SIGN UPADHMANIYA
+11A3A         ; Prepend # Lo       ZANABAZAR SQUARE CLUSTER-INITIAL LETTER RA
+11A86..11A89  ; Prepend # Lo   [4] SOYOMBO CLUSTER-INITIAL LETTER RA..SOYOMBO CLUSTER-INITIAL LETTER SA
+11D46         ; Prepend # Lo       MASARAM GONDI REPHA
+
+# Total code points: 19
+
+# ================================================
+
 000D          ; CR # Cc       <control-000D>

 # Total code points: 1
@ -34,10 +50,7 @@
 000E..001F    ; Control # Cc  [18] <control-000E>..<control-001F>
 007F..009F    ; Control # Cc  [33] <control-007F>..<control-009F>
 00AD          ; Control # Cf       SOFT HYPHEN
-0600..0605    ; Control # Cf   [6] ARABIC NUMBER SIGN..ARABIC NUMBER MARK ABOVE
 061C          ; Control # Cf       ARABIC LETTER MARK
-06DD          ; Control # Cf       ARABIC END OF AYAH
-070F          ; Control # Cf       SYRIAC ABBREVIATION MARK
 180E          ; Control # Cf       MONGOLIAN VOWEL SEPARATOR
 200B          ; Control # Cf       ZERO WIDTH SPACE
 200E..200F    ; Control # Cf   [2] LEFT-TO-RIGHT MARK..RIGHT-TO-LEFT MARK
@ -51,17 +64,15 @@ D800..DFFF    ; Control # Cs [2048] <surrogate-D800>..<surrogate-DFFF>
 FEFF          ; Control # Cf       ZERO WIDTH NO-BREAK SPACE
 FFF0..FFF8    ; Control # Cn   [9] <reserved-FFF0>..<reserved-FFF8>
 FFF9..FFFB    ; Control # Cf   [3] INTERLINEAR ANNOTATION ANCHOR..INTERLINEAR ANNOTATION TERMINATOR
-110BD         ; Control # Cf       KAITHI NUMBER SIGN
 1BCA0..1BCA3  ; Control # Cf   [4] SHORTHAND FORMAT LETTER OVERLAP..SHORTHAND FORMAT UP STEP
 1D173..1D17A  ; Control # Cf   [8] MUSICAL SYMBOL BEGIN BEAM..MUSICAL SYMBOL END PHRASE
 E0000         ; Control # Cn       <reserved-E0000>
 E0001         ; Control # Cf       LANGUAGE TAG
 E0002..E001F  ; Control # Cn  [30] <reserved-E0002>..<reserved-E001F>
-E0020..E007F  ; Control # Cf  [96] TAG SPACE..CANCEL TAG
 E0080..E00FF  ; Control # Cn [128] <reserved-E0080>..<reserved-E00FF>
 E01F0..E0FFF  ; Control # Cn [3600] <reserved-E01F0>..<reserved-E0FFF>

-# Total code points: 6030
+# Total code points: 5925

 # ================================================

@ -89,6 +100,7 @@ E01F0..E0FFF  ; Control # Cn [3600] <reserved-E01F0>..<reserved-E0FFF>
 0825..0827    ; Extend # Mn   [3] SAMARITAN VOWEL SIGN SHORT A..SAMARITAN VOWEL SIGN U
 0829..082D    ; Extend # Mn   [5] SAMARITAN VOWEL SIGN LONG I..SAMARITAN MARK NEQUDAA
 0859..085B    ; Extend # Mn   [3] MANDAIC AFFRICATION MARK..MANDAIC GEMINATION MARK
+08D4..08E1    ; Extend # Mn  [14] ARABIC SMALL HIGH WORD AR-RUB..ARABIC SMALL HIGH SIGN SAFHA
 08E3..0902    ; Extend # Mn  [32] ARABIC TURNED DAMMA BELOW..DEVANAGARI SIGN ANUSVARA
 093A          ; Extend # Mn       DEVANAGARI VOWEL SIGN OE
 093C          ; Extend # Mn       DEVANAGARI SIGN NUKTA
@ -117,6 +129,7 @@ E01F0..E0FFF  ; Control # Cn [3600] <reserved-E01F0>..<reserved-E0FFF>
 0AC7..0AC8    ; Extend # Mn   [2] GUJARATI VOWEL SIGN E..GUJARATI VOWEL SIGN AI
 0ACD          ; Extend # Mn       GUJARATI SIGN VIRAMA
 0AE2..0AE3    ; Extend # Mn   [2] GUJARATI VOWEL SIGN VOCALIC L..GUJARATI VOWEL SIGN VOCALIC LL
+0AFA..0AFF    ; Extend # Mn   [6] GUJARATI SIGN SUKUN..GUJARATI SIGN TWO-CIRCLE NUKTA ABOVE
 0B01          ; Extend # Mn       ORIYA SIGN CANDRABINDU
 0B3C          ; Extend # Mn       ORIYA SIGN NUKTA
 0B3E          ; Extend # Mc       ORIYA VOWEL SIGN AA
@ -145,7 +158,8 @@ E01F0..E0FFF  ; Control # Cn [3600] <reserved-E01F0>..<reserved-E0FFF>
 0CCC..0CCD    ; Extend # Mn   [2] KANNADA VOWEL SIGN AU..KANNADA SIGN VIRAMA
 0CD5..0CD6    ; Extend # Mc   [2] KANNADA LENGTH MARK..KANNADA AI LENGTH MARK
 0CE2..0CE3    ; Extend # Mn   [2] KANNADA VOWEL SIGN VOCALIC L..KANNADA VOWEL SIGN VOCALIC LL
-0D01          ; Extend # Mn       MALAYALAM SIGN CANDRABINDU
+0D00..0D01    ; Extend # Mn   [2] MALAYALAM SIGN COMBINING ANUSVARA ABOVE..MALAYALAM SIGN CANDRABINDU
+0D3B..0D3C    ; Extend # Mn   [2] MALAYALAM SIGN VERTICAL BAR VIRAMA..MALAYALAM SIGN CIRCULAR VIRAMA
 0D3E          ; Extend # Mc       MALAYALAM VOWEL SIGN AA
 0D41..0D44    ; Extend # Mn   [4] MALAYALAM VOWEL SIGN U..MALAYALAM VOWEL SIGN VOCALIC RR
 0D4D          ; Extend # Mn       MALAYALAM SIGN VIRAMA
@ -195,6 +209,7 @@ E01F0..E0FFF  ; Control # Cn [3600] <reserved-E01F0>..<reserved-E0FFF>
 17C9..17D3    ; Extend # Mn  [11] KHMER SIGN MUUSIKATOAN..KHMER SIGN BATHAMASAT
 17DD          ; Extend # Mn       KHMER SIGN ATTHACAN
 180B..180D    ; Extend # Mn   [3] MONGOLIAN FREE VARIATION SELECTOR ONE..MONGOLIAN FREE VARIATION SELECTOR THREE
+1885..1886    ; Extend # Mn   [2] MONGOLIAN LETTER ALI GALI BALUDA..MONGOLIAN LETTER ALI GALI THREE BALUDA
 18A9          ; Extend # Mn       MONGOLIAN LETTER ALI GALI DAGALGA
 1920..1922    ; Extend # Mn   [3] LIMBU VOWEL SIGN A..LIMBU VOWEL SIGN U
 1927..1928    ; Extend # Mn   [2] LIMBU VOWEL SIGN E..LIMBU VOWEL SIGN O
@ -233,9 +248,9 @@ E01F0..E0FFF  ; Control # Cn [3600] <reserved-E01F0>..<reserved-E0FFF>
 1CED          ; Extend # Mn       VEDIC SIGN TIRYAK
 1CF4          ; Extend # Mn       VEDIC TONE CANDRA ABOVE
 1CF8..1CF9    ; Extend # Mn   [2] VEDIC TONE RING ABOVE..VEDIC TONE DOUBLE RING ABOVE
-1DC0..1DF5    ; Extend # Mn  [54] COMBINING DOTTED GRAVE ACCENT..COMBINING UP TACK ABOVE
-1DFC..1DFF    ; Extend # Mn   [4] COMBINING DOUBLE INVERTED BREVE BELOW..COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW
-200C..200D    ; Extend # Cf   [2] ZERO WIDTH NON-JOINER..ZERO WIDTH JOINER
+1DC0..1DF9    ; Extend # Mn  [58] COMBINING DOTTED GRAVE ACCENT..COMBINING WIDE INVERTED BRIDGE BELOW
+1DFB..1DFF    ; Extend # Mn   [5] COMBINING DELETION MARK..COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW
+200C          ; Extend # Cf       ZERO WIDTH NON-JOINER
 20D0..20DC    ; Extend # Mn  [13] COMBINING LEFT HARPOON ABOVE..COMBINING FOUR DOTS ABOVE
 20DD..20E0    ; Extend # Me   [4] COMBINING ENCLOSING CIRCLE..COMBINING ENCLOSING CIRCLE BACKSLASH
 20E1          ; Extend # Mn       COMBINING LEFT RIGHT ARROW ABOVE
@ -256,7 +271,7 @@ A802          ; Extend # Mn       SYLOTI NAGRI SIGN DVISVARA
 A806          ; Extend # Mn       SYLOTI NAGRI SIGN HASANTA
 A80B          ; Extend # Mn       SYLOTI NAGRI SIGN ANUSVARA
 A825..A826    ; Extend # Mn   [2] SYLOTI NAGRI VOWEL SIGN U..SYLOTI NAGRI VOWEL SIGN E
-A8C4          ; Extend # Mn       SAURASHTRA SIGN VIRAMA
+A8C4..A8C5    ; Extend # Mn   [2] SAURASHTRA SIGN VIRAMA..SAURASHTRA SIGN CANDRABINDU
 A8E0..A8F1    ; Extend # Mn  [18] COMBINING DEVANAGARI DIGIT ZERO..COMBINING DEVANAGARI SIGN AVAGRAHA
 A926..A92D    ; Extend # Mn   [8] KAYAH LI VOWEL UE..KAYAH LI TONE CALYA PLOPHU
 A947..A951    ; Extend # Mn  [11] REJANG VOWEL SIGN I..REJANG CONSONANT SIGN R
@ -309,6 +324,7 @@ FF9E..FF9F    ; Extend # Lm   [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDT
 1122F..11231  ; Extend # Mn   [3] KHOJKI VOWEL SIGN U..KHOJKI VOWEL SIGN AI
 11234         ; Extend # Mn       KHOJKI SIGN ANUSVARA
 11236..11237  ; Extend # Mn   [2] KHOJKI SIGN NUKTA..KHOJKI SIGN SHADDA
+1123E         ; Extend # Mn       KHOJKI SIGN SUKUN
 112DF         ; Extend # Mn       KHUDAWADI SIGN ANUSVARA
 112E3..112EA  ; Extend # Mn   [8] KHUDAWADI VOWEL SIGN U..KHUDAWADI SIGN VIRAMA
 11300..11301  ; Extend # Mn   [2] GRANTHA SIGN COMBINING ANUSVARA ABOVE..GRANTHA SIGN CANDRABINDU
@ -318,6 +334,9 @@ FF9E..FF9F    ; Extend # Lm   [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDT
 11357         ; Extend # Mc       GRANTHA AU LENGTH MARK
 11366..1136C  ; Extend # Mn   [7] COMBINING GRANTHA DIGIT ZERO..COMBINING GRANTHA DIGIT SIX
 11370..11374  ; Extend # Mn   [5] COMBINING GRANTHA LETTER A..COMBINING GRANTHA LETTER PA
+11438..1143F  ; Extend # Mn   [8] NEWA VOWEL SIGN U..NEWA VOWEL SIGN AI
+11442..11444  ; Extend # Mn   [3] NEWA SIGN VIRAMA..NEWA SIGN ANUSVARA
+11446         ; Extend # Mn       NEWA SIGN NUKTA
 114B0         ; Extend # Mc       TIRHUTA VOWEL SIGN AA
 114B3..114B8  ; Extend # Mn   [6] TIRHUTA VOWEL SIGN U..TIRHUTA VOWEL SIGN VOCALIC LL
 114BA         ; Extend # Mn       TIRHUTA VOWEL SIGN SHORT E
@ -339,6 +358,27 @@ FF9E..FF9F    ; Extend # Lm   [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDT
 1171D..1171F  ; Extend # Mn   [3] AHOM CONSONANT SIGN MEDIAL LA..AHOM CONSONANT SIGN MEDIAL LIGATING RA
 11722..11725  ; Extend # Mn   [4] AHOM VOWEL SIGN I..AHOM VOWEL SIGN UU
 11727..1172B  ; Extend # Mn   [5] AHOM VOWEL SIGN AW..AHOM SIGN KILLER
+11A01..11A06  ; Extend # Mn   [6] ZANABAZAR SQUARE VOWEL SIGN I..ZANABAZAR SQUARE VOWEL SIGN O
+11A09..11A0A  ; Extend # Mn   [2] ZANABAZAR SQUARE VOWEL SIGN REVERSED I..ZANABAZAR SQUARE VOWEL LENGTH MARK
+11A33..11A38  ; Extend # Mn   [6] ZANABAZAR SQUARE FINAL CONSONANT MARK..ZANABAZAR SQUARE SIGN ANUSVARA
+11A3B..11A3E  ; Extend # Mn   [4] ZANABAZAR SQUARE CLUSTER-FINAL LETTER YA..ZANABAZAR SQUARE CLUSTER-FINAL LETTER VA
+11A47         ; Extend # Mn       ZANABAZAR SQUARE SUBJOINER
+11A51..11A56  ; Extend # Mn   [6] SOYOMBO VOWEL SIGN I..SOYOMBO VOWEL SIGN OE
+11A59..11A5B  ; Extend # Mn   [3] SOYOMBO VOWEL SIGN VOCALIC R..SOYOMBO VOWEL LENGTH MARK
+11A8A..11A96  ; Extend # Mn  [13] SOYOMBO FINAL CONSONANT SIGN G..SOYOMBO SIGN ANUSVARA
+11A98..11A99  ; Extend # Mn   [2] SOYOMBO GEMINATION MARK..SOYOMBO SUBJOINER
+11C30..11C36  ; Extend # Mn   [7] BHAIKSUKI VOWEL SIGN I..BHAIKSUKI VOWEL SIGN VOCALIC L
+11C38..11C3D  ; Extend # Mn   [6] BHAIKSUKI VOWEL SIGN E..BHAIKSUKI SIGN ANUSVARA
+11C3F         ; Extend # Mn       BHAIKSUKI SIGN VIRAMA
+11C92..11CA7  ; Extend # Mn  [22] MARCHEN SUBJOINED LETTER KA..MARCHEN SUBJOINED LETTER ZA
+11CAA..11CB0  ; Extend # Mn   [7] MARCHEN SUBJOINED LETTER RA..MARCHEN VOWEL SIGN AA
+11CB2..11CB3  ; Extend # Mn   [2] MARCHEN VOWEL SIGN U..MARCHEN VOWEL SIGN E
+11CB5..11CB6  ; Extend # Mn   [2] MARCHEN SIGN ANUSVARA..MARCHEN SIGN CANDRABINDU
+11D31..11D36  ; Extend # Mn   [6] MASARAM GONDI VOWEL SIGN AA..MASARAM GONDI VOWEL SIGN VOCALIC R
+11D3A         ; Extend # Mn       MASARAM GONDI VOWEL SIGN E
+11D3C..11D3D  ; Extend # Mn   [2] MASARAM GONDI VOWEL SIGN AI..MASARAM GONDI VOWEL SIGN O
+11D3F..11D45  ; Extend # Mn   [7] MASARAM GONDI VOWEL SIGN AU..MASARAM GONDI VIRAMA
+11D47         ; Extend # Mn       MASARAM GONDI RA-KARA
 16AF0..16AF4  ; Extend # Mn   [5] BASSA VAH COMBINING HIGH TONE..BASSA VAH COMBINING HIGH-LOW TONE
 16B30..16B36  ; Extend # Mn   [7] PAHAWH HMONG MARK CIM TUB..PAHAWH HMONG MARK CIM TAUM
 16F8F..16F92  ; Extend # Mn   [4] MIAO TONE RIGHT..MIAO TONE BELOW
@ -356,10 +396,17 @@ FF9E..FF9F    ; Extend # Lm   [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDT
 1DA84         ; Extend # Mn       SIGNWRITING LOCATION HEAD NECK
 1DA9B..1DA9F  ; Extend # Mn   [5] SIGNWRITING FILL MODIFIER-2..SIGNWRITING FILL MODIFIER-6
 1DAA1..1DAAF  ; Extend # Mn  [15] SIGNWRITING ROTATION MODIFIER-2..SIGNWRITING ROTATION MODIFIER-16
+1E000..1E006  ; Extend # Mn   [7] COMBINING GLAGOLITIC LETTER AZU..COMBINING GLAGOLITIC LETTER ZHIVETE
+1E008..1E018  ; Extend # Mn  [17] COMBINING GLAGOLITIC LETTER ZEMLJA..COMBINING GLAGOLITIC LETTER HERU
+1E01B..1E021  ; Extend # Mn   [7] COMBINING GLAGOLITIC LETTER SHTA..COMBINING GLAGOLITIC LETTER YATI
+1E023..1E024  ; Extend # Mn   [2] COMBINING GLAGOLITIC LETTER YU..COMBINING GLAGOLITIC LETTER SMALL YUS
+1E026..1E02A  ; Extend # Mn   [5] COMBINING GLAGOLITIC LETTER YO..COMBINING GLAGOLITIC LETTER FITA
 1E8D0..1E8D6  ; Extend # Mn   [7] MENDE KIKAKUI COMBINING NUMBER TEENS..MENDE KIKAKUI COMBINING NUMBER MILLIONS
+1E944..1E94A  ; Extend # Mn   [7] ADLAM ALIF LENGTHENER..ADLAM NUKTA
+E0020..E007F  ; Extend # Cf  [96] TAG SPACE..CANCEL TAG
 E0100..E01EF  ; Extend # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256

-# Total code points: 1610
+# Total code points: 1901

 # ================================================

@ -444,6 +491,7 @@ E0100..E01EF  ; Extend # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256
 1C34..1C35    ; SpacingMark # Mc   [2] LEPCHA CONSONANT SIGN NYIN-DO..LEPCHA CONSONANT SIGN KANG
 1CE1          ; SpacingMark # Mc       VEDIC TONE ATHARVAVEDIC INDEPENDENT SVARITA
 1CF2..1CF3    ; SpacingMark # Mc   [2] VEDIC SIGN ARDHAVISARGA..VEDIC SIGN ROTATED ARDHAVISARGA
+1CF7          ; SpacingMark # Mc       VEDIC SIGN ATIKRAMA
 A823..A824    ; SpacingMark # Mc   [2] SYLOTI NAGRI VOWEL SIGN A..SYLOTI NAGRI VOWEL SIGN I
 A827          ; SpacingMark # Mc       SYLOTI NAGRI VOWEL SIGN OO
 A880..A881    ; SpacingMark # Mc   [2] SAURASHTRA SIGN ANUSVARA..SAURASHTRA SIGN VISARGA
@ -482,6 +530,9 @@ ABEC          ; SpacingMark # Mc       MEETEI MAYEK LUM IYEK
 11347..11348  ; SpacingMark # Mc   [2] GRANTHA VOWEL SIGN EE..GRANTHA VOWEL SIGN AI
 1134B..1134D  ; SpacingMark # Mc   [3] GRANTHA VOWEL SIGN OO..GRANTHA SIGN VIRAMA
 11362..11363  ; SpacingMark # Mc   [2] GRANTHA VOWEL SIGN VOCALIC L..GRANTHA VOWEL SIGN VOCALIC LL
+11435..11437  ; SpacingMark # Mc   [3] NEWA VOWEL SIGN AA..NEWA VOWEL SIGN II
+11440..11441  ; SpacingMark # Mc   [2] NEWA VOWEL SIGN O..NEWA VOWEL SIGN AU
+11445         ; SpacingMark # Mc       NEWA SIGN VISARGA
 114B1..114B2  ; SpacingMark # Mc   [2] TIRHUTA VOWEL SIGN I..TIRHUTA VOWEL SIGN II
 114B9         ; SpacingMark # Mc       TIRHUTA VOWEL SIGN E
 114BB..114BC  ; SpacingMark # Mc   [2] TIRHUTA VOWEL SIGN AI..TIRHUTA VOWEL SIGN O
@ -498,11 +549,20 @@ ABEC          ; SpacingMark # Mc       MEETEI MAYEK LUM IYEK
 116B6         ; SpacingMark # Mc       TAKRI SIGN VIRAMA
 11720..11721  ; SpacingMark # Mc   [2] AHOM VOWEL SIGN A..AHOM VOWEL SIGN AA
 11726         ; SpacingMark # Mc       AHOM VOWEL SIGN E
+11A07..11A08  ; SpacingMark # Mc   [2] ZANABAZAR SQUARE VOWEL SIGN AI..ZANABAZAR SQUARE VOWEL SIGN AU
+11A39         ; SpacingMark # Mc       ZANABAZAR SQUARE SIGN VISARGA
+11A57..11A58  ; SpacingMark # Mc   [2] SOYOMBO VOWEL SIGN AI..SOYOMBO VOWEL SIGN AU
+11A97         ; SpacingMark # Mc       SOYOMBO SIGN VISARGA
+11C2F         ; SpacingMark # Mc       BHAIKSUKI VOWEL SIGN AA
+11C3E         ; SpacingMark # Mc       BHAIKSUKI SIGN VISARGA
+11CA9         ; SpacingMark # Mc       MARCHEN SUBJOINED LETTER YA
+11CB1         ; SpacingMark # Mc       MARCHEN VOWEL SIGN I
+11CB4         ; SpacingMark # Mc       MARCHEN VOWEL SIGN O
 16F51..16F7E  ; SpacingMark # Mc  [46] MIAO SIGN ASPIRATION..MIAO VOWEL SIGN NG
 1D166         ; SpacingMark # Mc       MUSICAL SYMBOL COMBINING SPRECHGESANG STEM
 1D16D         ; SpacingMark # Mc       MUSICAL SYMBOL COMBINING AUGMENTATION DOT

-# Total code points: 330
+# Total code points: 348

 # ================================================

@ -1333,4 +1393,83 @@ D789..D7A3    ; LVT # Lo  [27] HANGUL SYLLABLE HIG..HANGUL SYLLABLE HIH

 # Total code points: 10773

+# ================================================
+
+261D          ; E_Base # So       WHITE UP POINTING INDEX
+26F9          ; E_Base # So       PERSON WITH BALL
+270A..270D    ; E_Base # So   [4] RAISED FIST..WRITING HAND
+1F385         ; E_Base # So       FATHER CHRISTMAS
+1F3C2..1F3C4  ; E_Base # So   [3] SNOWBOARDER..SURFER
+1F3C7         ; E_Base # So       HORSE RACING
+1F3CA..1F3CC  ; E_Base # So   [3] SWIMMER..GOLFER
+1F442..1F443  ; E_Base # So   [2] EAR..NOSE
+1F446..1F450  ; E_Base # So  [11] WHITE UP POINTING BACKHAND INDEX..OPEN HANDS SIGN
+1F46E         ; E_Base # So       POLICE OFFICER
+1F470..1F478  ; E_Base # So   [9] BRIDE WITH VEIL..PRINCESS
+1F47C         ; E_Base # So       BABY ANGEL
+1F481..1F483  ; E_Base # So   [3] INFORMATION DESK PERSON..DANCER
+1F485..1F487  ; E_Base # So   [3] NAIL POLISH..HAIRCUT
+1F4AA         ; E_Base # So       FLEXED BICEPS
+1F574..1F575  ; E_Base # So   [2] MAN IN BUSINESS SUIT LEVITATING..SLEUTH OR SPY
+1F57A         ; E_Base # So       MAN DANCING
+1F590         ; E_Base # So       RAISED HAND WITH FINGERS SPLAYED
+1F595..1F596  ; E_Base # So   [2] REVERSED HAND WITH MIDDLE FINGER EXTENDED..RAISED HAND WITH PART BETWEEN MIDDLE AND RING FINGERS
+1F645..1F647  ; E_Base # So   [3] FACE WITH NO GOOD GESTURE..PERSON BOWING DEEPLY
+1F64B..1F64F  ; E_Base # So   [5] HAPPY PERSON RAISING ONE HAND..PERSON WITH FOLDED HANDS
+1F6A3         ; E_Base # So       ROWBOAT
+1F6B4..1F6B6  ; E_Base # So   [3] BICYCLIST..PEDESTRIAN
+1F6C0         ; E_Base # So       BATH
+1F6CC         ; E_Base # So       SLEEPING ACCOMMODATION
+1F918..1F91C  ; E_Base # So   [5] SIGN OF THE HORNS..RIGHT-FACING FIST
+1F91E..1F91F  ; E_Base # So   [2] HAND WITH INDEX AND MIDDLE FINGERS CROSSED..I LOVE YOU HAND SIGN
+1F926         ; E_Base # So       FACE PALM
+1F930..1F939  ; E_Base # So  [10] PREGNANT WOMAN..JUGGLING
+1F93D..1F93E  ; E_Base # So   [2] WATER POLO..HANDBALL
+1F9D1..1F9DD  ; E_Base # So  [13] ADULT..ELF
+
+# Total code points: 98
+
+# ================================================
+
+1F3FB..1F3FF  ; E_Modifier # Sk   [5] EMOJI MODIFIER FITZPATRICK TYPE-1-2..EMOJI MODIFIER FITZPATRICK TYPE-6
+
+# Total code points: 5
+
+# ================================================
+
+200D          ; ZWJ # Cf       ZERO WIDTH JOINER
+
+# Total code points: 1
+
+# ================================================
+
+2640          ; Glue_After_Zwj # So       FEMALE SIGN
+2642          ; Glue_After_Zwj # So       MALE SIGN
+2695..2696    ; Glue_After_Zwj # So   [2] STAFF OF AESCULAPIUS..SCALES
+2708          ; Glue_After_Zwj # So       AIRPLANE
+2764          ; Glue_After_Zwj # So       HEAVY BLACK HEART
+1F308         ; Glue_After_Zwj # So       RAINBOW
+1F33E         ; Glue_After_Zwj # So       EAR OF RICE
+1F373         ; Glue_After_Zwj # So       COOKING
+1F393         ; Glue_After_Zwj # So       GRADUATION CAP
+1F3A4         ; Glue_After_Zwj # So       MICROPHONE
+1F3A8         ; Glue_After_Zwj # So       ARTIST PALETTE
+1F3EB         ; Glue_After_Zwj # So       SCHOOL
+1F3ED         ; Glue_After_Zwj # So       FACTORY
+1F48B         ; Glue_After_Zwj # So       KISS MARK
+1F4BB..1F4BC  ; Glue_After_Zwj # So   [2] PERSONAL COMPUTER..BRIEFCASE
+1F527         ; Glue_After_Zwj # So       WRENCH
+1F52C         ; Glue_After_Zwj # So       MICROSCOPE
+1F5E8         ; Glue_After_Zwj # So       LEFT SPEECH BUBBLE
+1F680         ; Glue_After_Zwj # So       ROCKET
+1F692         ; Glue_After_Zwj # So       FIRE ENGINE
+
+# Total code points: 22
+
+# ================================================
+
+1F466..1F469  ; E_Base_GAZ # So   [4] BOY..WOMAN
+
+# Total code points: 4
+
 # EOF
--- a/maint/Unicode.tables/Scripts.txt
+++ b/maint/Unicode.tables/Scripts.txt
@ -1,10 +1,11 @@
-# Scripts-8.0.0.txt
-# Date: 2015-03-11, 22:29:42 GMT [MD]
+# Scripts-10.0.0.txt
+# Date: 2017-03-11, 06:40:37 GMT
+# © 2017 Unicode®, Inc.
+# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
+# For terms of use, see http://www.unicode.org/terms_of_use.html
 #
 # Unicode Character Database
-# Copyright (c) 1991-2015 Unicode, Inc.
-# For terms of use, see http://www.unicode.org/terms_of_use.html
-# For documentation, see http://www.unicode.org/reports/tr44/
+#   For documentation, see http://www.unicode.org/reports/tr44/
 # For more information, see:
 #   UAX #24, Unicode Script Property: http://www.unicode.org/reports/tr24/
 #     Especially the sections:
@ -92,10 +93,10 @@
 0605          ; Common # Cf       ARABIC NUMBER MARK ABOVE
 060C          ; Common # Po       ARABIC COMMA
 061B          ; Common # Po       ARABIC SEMICOLON
-061C          ; Common # Cf       ARABIC LETTER MARK
 061F          ; Common # Po       ARABIC QUESTION MARK
 0640          ; Common # Lm       ARABIC TATWEEL
 06DD          ; Common # Cf       ARABIC END OF AYAH
+08E2          ; Common # Cf       ARABIC DISPUTED END OF AYAH
 0964..0965    ; Common # Po   [2] DEVANAGARI DANDA..DEVANAGARI DOUBLE DANDA
 0E3F          ; Common # Sc       THAI CURRENCY SYMBOL BAHT
 0FD5..0FD8    ; Common # So   [4] RIGHT-FACING SVASTI SIGN..LEFT-FACING SVASTI SIGN WITH DOTS
@ -110,6 +111,7 @@
 1CEE..1CF1    ; Common # Lo   [4] VEDIC SIGN HEXIFORM LONG ANUSVARA..VEDIC SIGN ANUSVARA UBHAYATO MUKHA
 1CF2..1CF3    ; Common # Mc   [2] VEDIC SIGN ARDHAVISARGA..VEDIC SIGN ROTATED ARDHAVISARGA
 1CF5..1CF6    ; Common # Lo   [2] VEDIC SIGN JIHVAMULIYA..VEDIC SIGN UPADHMANIYA
+1CF7          ; Common # Mc       VEDIC SIGN ATIKRAMA
 2000..200A    ; Common # Zs  [11] EN QUAD..HAIR SPACE
 200B          ; Common # Cf       ZERO WIDTH SPACE
 200E..200F    ; Common # Cf   [2] LEFT-TO-RIGHT MARK..RIGHT-TO-LEFT MARK
@ -153,7 +155,7 @@
 208A..208C    ; Common # Sm   [3] SUBSCRIPT PLUS SIGN..SUBSCRIPT EQUALS SIGN
 208D          ; Common # Ps       SUBSCRIPT LEFT PARENTHESIS
 208E          ; Common # Pe       SUBSCRIPT RIGHT PARENTHESIS
-20A0..20BE    ; Common # Sc  [31] EURO-CURRENCY SIGN..LARI SIGN
+20A0..20BF    ; Common # Sc  [32] EURO-CURRENCY SIGN..BITCOIN SIGN
 2100..2101    ; Common # So   [2] ACCOUNT OF..ADDRESSED TO THE SUBJECT
 2102          ; Common # L&       DOUBLE-STRUCK CAPITAL C
 2103..2106    ; Common # So   [4] DEGREE CELSIUS..CADA UNA
@ -223,8 +225,7 @@
 239B..23B3    ; Common # Sm  [25] LEFT PARENTHESIS UPPER HOOK..SUMMATION BOTTOM
 23B4..23DB    ; Common # So  [40] TOP SQUARE BRACKET..FUSE
 23DC..23E1    ; Common # Sm   [6] TOP PARENTHESIS..BOTTOM TORTOISE SHELL BRACKET
-23E2..23FA    ; Common # So  [25] WHITE TRAPEZIUM..BLACK CIRCLE FOR RECORD
-2400..2426    ; Common # So  [39] SYMBOL FOR NULL..SYMBOL FOR SUBSTITUTE FORM TWO
+23E2..2426    ; Common # So  [69] WHITE TRAPEZIUM..SYMBOL FOR SUBSTITUTE FORM TWO
 2440..244A    ; Common # So  [11] OCR HOOK..OCR DOUBLE BACKSLASH
 2460..249B    ; Common # No  [60] CIRCLED DIGIT ONE..NUMBER TWENTY FULL STOP
 249C..24E9    ; Common # So  [78] PARENTHESIZED LATIN SMALL LETTER A..CIRCLED LATIN SMALL LETTER Z
@ -309,7 +310,7 @@
 2B76..2B95    ; Common # So  [32] NORTH WEST TRIANGLE-HEADED ARROW TO BAR..RIGHTWARDS BLACK ARROW
 2B98..2BB9    ; Common # So  [34] THREE-D TOP-LIGHTED LEFTWARDS EQUILATERAL ARROWHEAD..UP ARROWHEAD IN A RECTANGLE BOX
 2BBD..2BC8    ; Common # So  [12] BALLOT BOX WITH LIGHT X..BLACK MEDIUM RIGHT-POINTING TRIANGLE CENTRED
-2BCA..2BD1    ; Common # So   [8] TOP HALF BLACK CIRCLE..UNCERTAINTY SIGN
+2BCA..2BD2    ; Common # So   [9] TOP HALF BLACK CIRCLE..GROUP MARK
 2BEC..2BEF    ; Common # So   [4] LEFTWARDS TWO-HEADED ARROW WITH TRIANGLE ARROWHEADS..DOWNWARDS TWO-HEADED ARROW WITH TRIANGLE ARROWHEADS
 2E00..2E01    ; Common # Po   [2] RIGHT ANGLE SUBSTITUTION MARKER..RIGHT ANGLE DOTTED SUBSTITUTION MARKER
 2E02          ; Common # Pi       LEFT SUBSTITUTION BRACKET
@ -348,6 +349,7 @@
 2E40          ; Common # Pd       DOUBLE HYPHEN
 2E41          ; Common # Po       REVERSED COMMA
 2E42          ; Common # Ps       DOUBLE LOW-REVERSED-9 QUOTATION MARK
+2E43..2E49    ; Common # Po   [7] DASH WITH LEFT UPTURN..DOUBLE STACKED COMMA
 2FF0..2FFB    ; Common # So  [12] IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT..IDEOGRAPHIC DESCRIPTION CHARACTER OVERLAID
 3000          ; Common # Zs       IDEOGRAPHIC SPACE
 3001..3003    ; Common # Po   [3] IDEOGRAPHIC COMMA..DITTO MARK
@ -572,19 +574,18 @@ FFFC..FFFD    ; Common # So   [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHAR
 1F100..1F10C  ; Common # No  [13] DIGIT ZERO FULL STOP..DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO
 1F110..1F12E  ; Common # So  [31] PARENTHESIZED LATIN CAPITAL LETTER A..CIRCLED WZ
 1F130..1F16B  ; Common # So  [60] SQUARED LATIN CAPITAL LETTER A..RAISED MD SIGN
-1F170..1F19A  ; Common # So  [43] NEGATIVE SQUARED LATIN CAPITAL LETTER A..SQUARED VS
+1F170..1F1AC  ; Common # So  [61] NEGATIVE SQUARED LATIN CAPITAL LETTER A..SQUARED VOD
 1F1E6..1F1FF  ; Common # So  [26] REGIONAL INDICATOR SYMBOL LETTER A..REGIONAL INDICATOR SYMBOL LETTER Z
 1F201..1F202  ; Common # So   [2] SQUARED KATAKANA KOKO..SQUARED KATAKANA SA
-1F210..1F23A  ; Common # So  [43] SQUARED CJK UNIFIED IDEOGRAPH-624B..SQUARED CJK UNIFIED IDEOGRAPH-55B6
+1F210..1F23B  ; Common # So  [44] SQUARED CJK UNIFIED IDEOGRAPH-624B..SQUARED CJK UNIFIED IDEOGRAPH-914D
 1F240..1F248  ; Common # So   [9] TORTOISE SHELL BRACKETED CJK UNIFIED IDEOGRAPH-672C..TORTOISE SHELL BRACKETED CJK UNIFIED IDEOGRAPH-6557
 1F250..1F251  ; Common # So   [2] CIRCLED IDEOGRAPH ADVANTAGE..CIRCLED IDEOGRAPH ACCEPT
+1F260..1F265  ; Common # So   [6] ROUNDED SYMBOL FOR FU..ROUNDED SYMBOL FOR CAI
 1F300..1F3FA  ; Common # So [251] CYCLONE..AMPHORA
 1F3FB..1F3FF  ; Common # Sk   [5] EMOJI MODIFIER FITZPATRICK TYPE-1-2..EMOJI MODIFIER FITZPATRICK TYPE-6
-1F400..1F579  ; Common # So [378] RAT..JOYSTICK
-1F57B..1F5A3  ; Common # So  [41] LEFT HAND TELEPHONE RECEIVER..BLACK DOWN POINTING BACKHAND INDEX
-1F5A5..1F6D0  ; Common # So [300] DESKTOP COMPUTER..PLACE OF WORSHIP
+1F400..1F6D4  ; Common # So [725] RAT..PAGODA
 1F6E0..1F6EC  ; Common # So  [13] HAMMER AND WRENCH..AIRPLANE ARRIVING
-1F6F0..1F6F3  ; Common # So   [4] SATELLITE..PASSENGER SHIP
+1F6F0..1F6F8  ; Common # So   [9] SATELLITE..FLYING SAUCER
 1F700..1F773  ; Common # So [116] ALCHEMICAL SYMBOL FOR QUINTESSENCE..ALCHEMICAL SYMBOL FOR HALF OUNCE
 1F780..1F7D4  ; Common # So  [85] BLACK LEFT-POINTING ISOSCELES RIGHT TRIANGLE..HEAVY TWELVE POINTED PINWHEEL STAR
 1F800..1F80B  ; Common # So  [12] LEFTWARDS ARROW WITH SMALL TRIANGLE ARROWHEAD..DOWNWARDS ARROW WITH LARGE TRIANGLE ARROWHEAD
@ -592,13 +593,17 @@ FFFC..FFFD    ; Common # So   [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHAR
 1F850..1F859  ; Common # So  [10] LEFTWARDS SANS-SERIF ARROW..UP DOWN SANS-SERIF ARROW
 1F860..1F887  ; Common # So  [40] WIDE-HEADED LEFTWARDS LIGHT BARB ARROW..WIDE-HEADED SOUTH WEST VERY HEAVY BARB ARROW
 1F890..1F8AD  ; Common # So  [30] LEFTWARDS TRIANGLE ARROWHEAD..WHITE ARROW SHAFT WIDTH TWO THIRDS
-1F910..1F918  ; Common # So   [9] ZIPPER-MOUTH FACE..SIGN OF THE HORNS
-1F980..1F984  ; Common # So   [5] CRAB..UNICORN FACE
+1F900..1F90B  ; Common # So  [12] CIRCLED CROSS FORMEE WITH FOUR DOTS..DOWNWARD FACING NOTCHED HOOK WITH DOT
+1F910..1F93E  ; Common # So  [47] ZIPPER-MOUTH FACE..HANDBALL
+1F940..1F94C  ; Common # So  [13] WILTED FLOWER..CURLING STONE
+1F950..1F96B  ; Common # So  [28] CROISSANT..CANNED FOOD
+1F980..1F997  ; Common # So  [24] CRAB..CRICKET
 1F9C0         ; Common # So       CHEESE WEDGE
+1F9D0..1F9E6  ; Common # So  [23] FACE WITH MONOCLE..SOCKS
 E0001         ; Common # Cf       LANGUAGE TAG
 E0020..E007F  ; Common # Cf  [96] TAG SPACE..CANCEL TAG

-# Total code points: 7179
+# Total code points: 7363

 # ================================================

@ -641,7 +646,7 @@ A770          ; Latin # Lm       MODIFIER LETTER US
 A771..A787    ; Latin # L&  [23] LATIN SMALL LETTER DUM..LATIN SMALL LETTER INSULAR T
 A78B..A78E    ; Latin # L&   [4] LATIN CAPITAL LETTER SALTILLO..LATIN SMALL LETTER L WITH RETROFLEX HOOK AND BELT
 A78F          ; Latin # Lo       LATIN LETTER SINOLOGICAL DOT
-A790..A7AD    ; Latin # L&  [30] LATIN CAPITAL LETTER N WITH DESCENDER..LATIN CAPITAL LETTER L WITH BELT
+A790..A7AE    ; Latin # L&  [31] LATIN CAPITAL LETTER N WITH DESCENDER..LATIN CAPITAL LETTER SMALL CAPITAL I
 A7B0..A7B7    ; Latin # L&   [8] LATIN CAPITAL LETTER TURNED K..LATIN SMALL LETTER OMEGA
 A7F7          ; Latin # Lo       LATIN EPIGRAPHIC LETTER SIDEWAYS I
 A7F8..A7F9    ; Latin # Lm   [2] MODIFIER LETTER CAPITAL H WITH STROKE..MODIFIER LETTER SMALL LIGATURE OE
@ -654,7 +659,7 @@ FB00..FB06    ; Latin # L&   [7] LATIN SMALL LIGATURE FF..LATIN SMALL LIGATURE S
 FF21..FF3A    ; Latin # L&  [26] FULLWIDTH LATIN CAPITAL LETTER A..FULLWIDTH LATIN CAPITAL LETTER Z
 FF41..FF5A    ; Latin # L&  [26] FULLWIDTH LATIN SMALL LETTER A..FULLWIDTH LATIN SMALL LETTER Z

-# Total code points: 1349
+# Total code points: 1350

 # ================================================

@ -708,13 +713,13 @@ AB65          ; Greek # L&       GREEK LETTER SMALL CAPITAL OMEGA
 10175..10178  ; Greek # No   [4] GREEK ONE HALF SIGN..GREEK THREE QUARTERS SIGN
 10179..10189  ; Greek # So  [17] GREEK YEAR SIGN..GREEK TRYBLION BASE SIGN
 1018A..1018B  ; Greek # No   [2] GREEK ZERO SIGN..GREEK ONE QUARTER SIGN
-1018C         ; Greek # So       GREEK SINUSOID SIGN
+1018C..1018E  ; Greek # So   [3] GREEK SINUSOID SIGN..NOMISMA SIGN
 101A0         ; Greek # So       GREEK SYMBOL TAU RHO
 1D200..1D241  ; Greek # So  [66] GREEK VOCAL NOTATION SYMBOL-1..GREEK INSTRUMENTAL NOTATION SYMBOL-54
 1D242..1D244  ; Greek # Mn   [3] COMBINING GREEK MUSICAL TRISEME..COMBINING GREEK MUSICAL PENTASEME
 1D245         ; Greek # So       GREEK MUSICAL LEIMMA

-# Total code points: 516
+# Total code points: 518

 # ================================================

@ -724,6 +729,7 @@ AB65          ; Greek # L&       GREEK LETTER SMALL CAPITAL OMEGA
 0487          ; Cyrillic # Mn       COMBINING CYRILLIC POKRYTIE
 0488..0489    ; Cyrillic # Me   [2] COMBINING CYRILLIC HUNDRED THOUSANDS SIGN..COMBINING CYRILLIC MILLIONS SIGN
 048A..052F    ; Cyrillic # L& [166] CYRILLIC CAPITAL LETTER SHORT I WITH TAIL..CYRILLIC SMALL LETTER EL WITH DESCENDER
+1C80..1C88    ; Cyrillic # L&   [9] CYRILLIC SMALL LETTER ROUNDED VE..CYRILLIC SMALL LETTER UNBLENDED UK
 1D2B          ; Cyrillic # L&       CYRILLIC LETTER SMALL CAPITAL EL
 1D78          ; Cyrillic # Lm       MODIFIER LETTER CYRILLIC EN
 2DE0..2DFF    ; Cyrillic # Mn  [32] COMBINING CYRILLIC LETTER BE..COMBINING CYRILLIC LETTER IOTIFIED BIG YUS
@ -740,7 +746,7 @@ A69C..A69D    ; Cyrillic # Lm   [2] MODIFIER LETTER CYRILLIC HARD SIGN..MODIFIER
 A69E..A69F    ; Cyrillic # Mn   [2] COMBINING CYRILLIC LETTER EF..COMBINING CYRILLIC LETTER IOTIFIED E
 FE2E..FE2F    ; Cyrillic # Mn   [2] COMBINING CYRILLIC TITLO LEFT HALF..COMBINING CYRILLIC TITLO RIGHT HALF

-# Total code points: 434
+# Total code points: 443

 # ================================================

@ -791,6 +797,7 @@ FB46..FB4F    ; Hebrew # Lo  [10] HEBREW LETTER TSADI WITH DAGESH..HEBREW LIGATU
 060D          ; Arabic # Po       ARABIC DATE SEPARATOR
 060E..060F    ; Arabic # So   [2] ARABIC POETIC VERSE SIGN..ARABIC SIGN MISRA
 0610..061A    ; Arabic # Mn  [11] ARABIC SIGN SALLALLAHOU ALAYHE WASSALLAM..ARABIC SMALL KASRA
+061C          ; Arabic # Cf       ARABIC LETTER MARK
 061E          ; Arabic # Po       ARABIC TRIPLE DOT PUNCTUATION MARK
 0620..063F    ; Arabic # Lo  [32] ARABIC LETTER KASHMIRI YEH..ARABIC LETTER FARSI YEH WITH THREE DOTS ABOVE
 0641..064A    ; Arabic # Lo  [10] ARABIC LETTER FEH..ARABIC LETTER YEH
@ -815,6 +822,8 @@ FB46..FB4F    ; Hebrew # Lo  [10] HEBREW LETTER TSADI WITH DAGESH..HEBREW LIGATU
 06FF          ; Arabic # Lo       ARABIC LETTER HEH WITH INVERTED V
 0750..077F    ; Arabic # Lo  [48] ARABIC LETTER BEH WITH THREE DOTS HORIZONTALLY BELOW..ARABIC LETTER KAF WITH TWO DOTS ABOVE
 08A0..08B4    ; Arabic # Lo  [21] ARABIC LETTER BEH WITH SMALL V BELOW..ARABIC LETTER KAF WITH DOT BELOW
+08B6..08BD    ; Arabic # Lo   [8] ARABIC LETTER BEH WITH SMALL MEEM ABOVE..ARABIC LETTER AFRICAN NOON
+08D4..08E1    ; Arabic # Mn  [14] ARABIC SMALL HIGH WORD AR-RUB..ARABIC SMALL HIGH SIGN SAFHA
 08E3..08FF    ; Arabic # Mn  [29] ARABIC TURNED DAMMA BELOW..ARABIC MARK SIDEWAYS NOON GHUNNA
 FB50..FBB1    ; Arabic # Lo  [98] ARABIC LETTER ALEF WASLA ISOLATED FORM..ARABIC LETTER YEH BARREE WITH HAMZA ABOVE FINAL FORM
 FBB2..FBC1    ; Arabic # Sk  [16] ARABIC SYMBOL DOT ABOVE..ARABIC SYMBOL SMALL TAH BELOW
@ -862,7 +871,7 @@ FE76..FEFC    ; Arabic # Lo [135] ARABIC FATHA ISOLATED FORM..ARABIC LIGATURE LA
 1EEAB..1EEBB  ; Arabic # Lo  [17] ARABIC MATHEMATICAL DOUBLE-STRUCK LAM..ARABIC MATHEMATICAL DOUBLE-STRUCK GHAIN
 1EEF0..1EEF1  ; Arabic # Sm   [2] ARABIC MATHEMATICAL OPERATOR MEEM WITH HAH WITH TATWEEL..ARABIC MATHEMATICAL OPERATOR HAH WITH DAL

-# Total code points: 1257
+# Total code points: 1280

 # ================================================

@ -873,8 +882,9 @@ FE76..FEFC    ; Arabic # Lo [135] ARABIC FATHA ISOLATED FORM..ARABIC LIGATURE LA
 0712..072F    ; Syriac # Lo  [30] SYRIAC LETTER BETH..SYRIAC LETTER PERSIAN DHALATH
 0730..074A    ; Syriac # Mn  [27] SYRIAC PTHAHA ABOVE..SYRIAC BARREKH
 074D..074F    ; Syriac # Lo   [3] SYRIAC LETTER SOGDIAN ZHAIN..SYRIAC LETTER SOGDIAN FE
+0860..086A    ; Syriac # Lo  [11] SYRIAC LETTER MALAYALAM NGA..SYRIAC LETTER MALAYALAM SSA

-# Total code points: 77
+# Total code points: 88

 # ================================================

@ -944,8 +954,10 @@ A8FD          ; Devanagari # Lo       DEVANAGARI JAIN OM
 09F4..09F9    ; Bengali # No   [6] BENGALI CURRENCY NUMERATOR ONE..BENGALI CURRENCY DENOMINATOR SIXTEEN
 09FA          ; Bengali # So       BENGALI ISSHAR
 09FB          ; Bengali # Sc       BENGALI GANDA MARK
+09FC          ; Bengali # Lo       BENGALI LETTER VEDIC ANUSVARA
+09FD          ; Bengali # Po       BENGALI ABBREVIATION SIGN

-# Total code points: 93
+# Total code points: 95

 # ================================================

@ -998,8 +1010,9 @@ A8FD          ; Devanagari # Lo       DEVANAGARI JAIN OM
 0AF0          ; Gujarati # Po       GUJARATI ABBREVIATION SIGN
 0AF1          ; Gujarati # Sc       GUJARATI RUPEE SIGN
 0AF9          ; Gujarati # Lo       GUJARATI LETTER ZHA
+0AFA..0AFF    ; Gujarati # Mn   [6] GUJARATI SIGN SUKUN..GUJARATI SIGN TWO-CIRCLE NUKTA ABOVE

-# Total code points: 85
+# Total code points: 91

 # ================================================

@ -1086,6 +1099,7 @@ A8FD          ; Devanagari # Lo       DEVANAGARI JAIN OM

 # ================================================

+0C80          ; Kannada # Lo       KANNADA SIGN SPACING CANDRABINDU
 0C81          ; Kannada # Mn       KANNADA SIGN CANDRABINDU
 0C82..0C83    ; Kannada # Mc   [2] KANNADA SIGN ANUSVARA..KANNADA SIGN VISARGA
 0C85..0C8C    ; Kannada # Lo   [8] KANNADA LETTER A..KANNADA LETTER VOCALIC L
@ -1109,15 +1123,16 @@ A8FD          ; Devanagari # Lo       DEVANAGARI JAIN OM
 0CE6..0CEF    ; Kannada # Nd  [10] KANNADA DIGIT ZERO..KANNADA DIGIT NINE
 0CF1..0CF2    ; Kannada # Lo   [2] KANNADA SIGN JIHVAMULIYA..KANNADA SIGN UPADHMANIYA

-# Total code points: 87
+# Total code points: 88

 # ================================================

-0D01          ; Malayalam # Mn       MALAYALAM SIGN CANDRABINDU
+0D00..0D01    ; Malayalam # Mn   [2] MALAYALAM SIGN COMBINING ANUSVARA ABOVE..MALAYALAM SIGN CANDRABINDU
 0D02..0D03    ; Malayalam # Mc   [2] MALAYALAM SIGN ANUSVARA..MALAYALAM SIGN VISARGA
 0D05..0D0C    ; Malayalam # Lo   [8] MALAYALAM LETTER A..MALAYALAM LETTER VOCALIC L
 0D0E..0D10    ; Malayalam # Lo   [3] MALAYALAM LETTER E..MALAYALAM LETTER AI
 0D12..0D3A    ; Malayalam # Lo  [41] MALAYALAM LETTER O..MALAYALAM LETTER TTTA
+0D3B..0D3C    ; Malayalam # Mn   [2] MALAYALAM SIGN VERTICAL BAR VIRAMA..MALAYALAM SIGN CIRCULAR VIRAMA
 0D3D          ; Malayalam # Lo       MALAYALAM SIGN AVAGRAHA
 0D3E..0D40    ; Malayalam # Mc   [3] MALAYALAM VOWEL SIGN AA..MALAYALAM VOWEL SIGN II
 0D41..0D44    ; Malayalam # Mn   [4] MALAYALAM VOWEL SIGN U..MALAYALAM VOWEL SIGN VOCALIC RR
@ -1125,15 +1140,18 @@ A8FD          ; Devanagari # Lo       DEVANAGARI JAIN OM
 0D4A..0D4C    ; Malayalam # Mc   [3] MALAYALAM VOWEL SIGN O..MALAYALAM VOWEL SIGN AU
 0D4D          ; Malayalam # Mn       MALAYALAM SIGN VIRAMA
 0D4E          ; Malayalam # Lo       MALAYALAM LETTER DOT REPH
+0D4F          ; Malayalam # So       MALAYALAM SIGN PARA
+0D54..0D56    ; Malayalam # Lo   [3] MALAYALAM LETTER CHILLU M..MALAYALAM LETTER CHILLU LLL
 0D57          ; Malayalam # Mc       MALAYALAM AU LENGTH MARK
+0D58..0D5E    ; Malayalam # No   [7] MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH..MALAYALAM FRACTION ONE FIFTH
 0D5F..0D61    ; Malayalam # Lo   [3] MALAYALAM LETTER ARCHAIC II..MALAYALAM LETTER VOCALIC LL
 0D62..0D63    ; Malayalam # Mn   [2] MALAYALAM VOWEL SIGN VOCALIC L..MALAYALAM VOWEL SIGN VOCALIC LL
 0D66..0D6F    ; Malayalam # Nd  [10] MALAYALAM DIGIT ZERO..MALAYALAM DIGIT NINE
-0D70..0D75    ; Malayalam # No   [6] MALAYALAM NUMBER TEN..MALAYALAM FRACTION THREE QUARTERS
+0D70..0D78    ; Malayalam # No   [9] MALAYALAM NUMBER TEN..MALAYALAM FRACTION THREE SIXTEENTHS
 0D79          ; Malayalam # So       MALAYALAM DATE MARK
 0D7A..0D7F    ; Malayalam # Lo   [6] MALAYALAM LETTER CHILLU NN..MALAYALAM LETTER CHILLU K

-# Total code points: 100
+# Total code points: 117

 # ================================================

@ -1436,21 +1454,24 @@ AB70..ABBF    ; Cherokee # L&  [80] CHEROKEE SMALL LETTER A..CHEROKEE SMALL LETT
 1820..1842    ; Mongolian # Lo  [35] MONGOLIAN LETTER A..MONGOLIAN LETTER CHI
 1843          ; Mongolian # Lm       MONGOLIAN LETTER TODO LONG VOWEL SIGN
 1844..1877    ; Mongolian # Lo  [52] MONGOLIAN LETTER TODO E..MONGOLIAN LETTER MANCHU ZHA
-1880..18A8    ; Mongolian # Lo  [41] MONGOLIAN LETTER ALI GALI ANUSVARA ONE..MONGOLIAN LETTER MANCHU ALI GALI BHA
+1880..1884    ; Mongolian # Lo   [5] MONGOLIAN LETTER ALI GALI ANUSVARA ONE..MONGOLIAN LETTER ALI GALI INVERTED UBADAMA
+1885..1886    ; Mongolian # Mn   [2] MONGOLIAN LETTER ALI GALI BALUDA..MONGOLIAN LETTER ALI GALI THREE BALUDA
+1887..18A8    ; Mongolian # Lo  [34] MONGOLIAN LETTER ALI GALI A..MONGOLIAN LETTER MANCHU ALI GALI BHA
 18A9          ; Mongolian # Mn       MONGOLIAN LETTER ALI GALI DAGALGA
 18AA          ; Mongolian # Lo       MONGOLIAN LETTER MANCHU ALI GALI LHA
+11660..1166C  ; Mongolian # Po  [13] MONGOLIAN BIRGA WITH ORNAMENT..MONGOLIAN TURNED SWIRL BIRGA WITH DOUBLE ORNAMENT

-# Total code points: 153
+# Total code points: 166

 # ================================================

 3041..3096    ; Hiragana # Lo  [86] HIRAGANA LETTER SMALL A..HIRAGANA LETTER SMALL KE
 309D..309E    ; Hiragana # Lm   [2] HIRAGANA ITERATION MARK..HIRAGANA VOICED ITERATION MARK
 309F          ; Hiragana # Lo       HIRAGANA DIGRAPH YORI
-1B001         ; Hiragana # Lo       HIRAGANA LETTER ARCHAIC YE
+1B001..1B11E  ; Hiragana # Lo [286] HIRAGANA LETTER ARCHAIC YE..HENTAIGANA LETTER N-MU-MO-2
 1F200         ; Hiragana # So       SQUARE HIRAGANA HOKA

-# Total code points: 91
+# Total code points: 376

 # ================================================

@ -1469,10 +1490,10 @@ FF71..FF9D    ; Katakana # Lo  [45] HALFWIDTH KATAKANA LETTER A..HALFWIDTH KATAK
 # ================================================

 02EA..02EB    ; Bopomofo # Sk   [2] MODIFIER LETTER YIN DEPARTING TONE MARK..MODIFIER LETTER YANG DEPARTING TONE MARK
-3105..312D    ; Bopomofo # Lo  [41] BOPOMOFO LETTER B..BOPOMOFO LETTER IH
+3105..312E    ; Bopomofo # Lo  [42] BOPOMOFO LETTER B..BOPOMOFO LETTER O WITH DOT ABOVE
 31A0..31BA    ; Bopomofo # Lo  [27] BOPOMOFO LETTER BU..BOPOMOFO LETTER ZY

-# Total code points: 70
+# Total code points: 71

 # ================================================

@ -1485,16 +1506,17 @@ FF71..FF9D    ; Katakana # Lo  [45] HALFWIDTH KATAKANA LETTER A..HALFWIDTH KATAK
 3038..303A    ; Han # Nl   [3] HANGZHOU NUMERAL TEN..HANGZHOU NUMERAL THIRTY
 303B          ; Han # Lm       VERTICAL IDEOGRAPHIC ITERATION MARK
 3400..4DB5    ; Han # Lo [6582] CJK UNIFIED IDEOGRAPH-3400..CJK UNIFIED IDEOGRAPH-4DB5
-4E00..9FD5    ; Han # Lo [20950] CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FD5
+4E00..9FEA    ; Han # Lo [20971] CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FEA
 F900..FA6D    ; Han # Lo [366] CJK COMPATIBILITY IDEOGRAPH-F900..CJK COMPATIBILITY IDEOGRAPH-FA6D
 FA70..FAD9    ; Han # Lo [106] CJK COMPATIBILITY IDEOGRAPH-FA70..CJK COMPATIBILITY IDEOGRAPH-FAD9
 20000..2A6D6  ; Han # Lo [42711] CJK UNIFIED IDEOGRAPH-20000..CJK UNIFIED IDEOGRAPH-2A6D6
 2A700..2B734  ; Han # Lo [4149] CJK UNIFIED IDEOGRAPH-2A700..CJK UNIFIED IDEOGRAPH-2B734
 2B740..2B81D  ; Han # Lo [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D
 2B820..2CEA1  ; Han # Lo [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1
+2CEB0..2EBE0  ; Han # Lo [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0
 2F800..2FA1D  ; Han # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D

-# Total code points: 81734
+# Total code points: 89228

 # ================================================

@ -1509,8 +1531,9 @@ A490..A4C6    ; Yi # So  [55] YI RADICAL QOT..YI RADICAL KE

 10300..1031F  ; Old_Italic # Lo  [32] OLD ITALIC LETTER A..OLD ITALIC LETTER ESS
 10320..10323  ; Old_Italic # No   [4] OLD ITALIC NUMERAL ONE..OLD ITALIC NUMERAL FIFTY
+1032D..1032F  ; Old_Italic # Lo   [3] OLD ITALIC LETTER YE..OLD ITALIC LETTER SOUTHERN TSE

-# Total code points: 36
+# Total code points: 39

 # ================================================

@ -1542,8 +1565,8 @@ A490..A4C6    ; Yi # So  [55] YI RADICAL QOT..YI RADICAL KE
 1CED          ; Inherited # Mn       VEDIC SIGN TIRYAK
 1CF4          ; Inherited # Mn       VEDIC TONE CANDRA ABOVE
 1CF8..1CF9    ; Inherited # Mn   [2] VEDIC TONE RING ABOVE..VEDIC TONE DOUBLE RING ABOVE
-1DC0..1DF5    ; Inherited # Mn  [54] COMBINING DOTTED GRAVE ACCENT..COMBINING UP TACK ABOVE
-1DFC..1DFF    ; Inherited # Mn   [4] COMBINING DOUBLE INVERTED BREVE BELOW..COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW
+1DC0..1DF9    ; Inherited # Mn  [58] COMBINING DOTTED GRAVE ACCENT..COMBINING WIDE INVERTED BRIDGE BELOW
+1DFB..1DFF    ; Inherited # Mn   [5] COMBINING DELETION MARK..COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW
 200C..200D    ; Inherited # Cf   [2] ZERO WIDTH NON-JOINER..ZERO WIDTH JOINER
 20D0..20DC    ; Inherited # Mn  [13] COMBINING LEFT HARPOON ABOVE..COMBINING FOUR DOTS ABOVE
 20DD..20E0    ; Inherited # Me   [4] COMBINING ENCLOSING CIRCLE..COMBINING ENCLOSING CIRCLE BACKSLASH
@ -1562,7 +1585,7 @@ FE20..FE2D    ; Inherited # Mn  [14] COMBINING LIGATURE LEFT HALF..COMBINING CON
 1D1AA..1D1AD  ; Inherited # Mn   [4] MUSICAL SYMBOL COMBINING DOWN BOW..MUSICAL SYMBOL COMBINING SNAP PIZZICATO
 E0100..E01EF  ; Inherited # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256

-# Total code points: 563
+# Total code points: 568

 # ================================================

@ -1705,8 +1728,13 @@ E0100..E01EF  ; Inherited # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-2

 2C00..2C2E    ; Glagolitic # L&  [47] GLAGOLITIC CAPITAL LETTER AZU..GLAGOLITIC CAPITAL LETTER LATINATE MYSLITE
 2C30..2C5E    ; Glagolitic # L&  [47] GLAGOLITIC SMALL LETTER AZU..GLAGOLITIC SMALL LETTER LATINATE MYSLITE
+1E000..1E006  ; Glagolitic # Mn   [7] COMBINING GLAGOLITIC LETTER AZU..COMBINING GLAGOLITIC LETTER ZHIVETE
+1E008..1E018  ; Glagolitic # Mn  [17] COMBINING GLAGOLITIC LETTER ZEMLJA..COMBINING GLAGOLITIC LETTER HERU
+1E01B..1E021  ; Glagolitic # Mn   [7] COMBINING GLAGOLITIC LETTER SHTA..COMBINING GLAGOLITIC LETTER YATI
+1E023..1E024  ; Glagolitic # Mn   [2] COMBINING GLAGOLITIC LETTER YU..COMBINING GLAGOLITIC LETTER SMALL YUS
+1E026..1E02A  ; Glagolitic # Mn   [5] COMBINING GLAGOLITIC LETTER YO..COMBINING GLAGOLITIC LETTER FITA

-# Total code points: 94
+# Total code points: 132

 # ================================================

@ -1872,11 +1900,11 @@ A62A..A62B    ; Vai # Lo   [2] VAI SYLLABLE NDOLE MA..VAI SYLLABLE NDOLE DO
 A880..A881    ; Saurashtra # Mc   [2] SAURASHTRA SIGN ANUSVARA..SAURASHTRA SIGN VISARGA
 A882..A8B3    ; Saurashtra # Lo  [50] SAURASHTRA LETTER A..SAURASHTRA LETTER LLA
 A8B4..A8C3    ; Saurashtra # Mc  [16] SAURASHTRA CONSONANT SIGN HAARU..SAURASHTRA VOWEL SIGN AU
-A8C4          ; Saurashtra # Mn       SAURASHTRA SIGN VIRAMA
+A8C4..A8C5    ; Saurashtra # Mn   [2] SAURASHTRA SIGN VIRAMA..SAURASHTRA SIGN CANDRABINDU
 A8CE..A8CF    ; Saurashtra # Po   [2] SAURASHTRA DANDA..SAURASHTRA DOUBLE DANDA
 A8D0..A8D9    ; Saurashtra # Nd  [10] SAURASHTRA DIGIT ZERO..SAURASHTRA DIGIT NINE

-# Total code points: 81
+# Total code points: 82

 # ================================================

@ -2314,8 +2342,9 @@ ABF0..ABF9    ; Meetei_Mayek # Nd  [10] MEETEI MAYEK DIGIT ZERO..MEETEI MAYEK DI
 11235         ; Khojki # Mc       KHOJKI SIGN VIRAMA
 11236..11237  ; Khojki # Mn   [2] KHOJKI SIGN NUKTA..KHOJKI SIGN SHADDA
 11238..1123D  ; Khojki # Po   [6] KHOJKI DANDA..KHOJKI ABBREVIATION SIGN
+1123E         ; Khojki # Mn       KHOJKI SIGN SUKUN

-# Total code points: 61
+# Total code points: 62

 # ================================================

@ -2536,4 +2565,129 @@ ABF0..ABF9    ; Meetei_Mayek # Nd  [10] MEETEI MAYEK DIGIT ZERO..MEETEI MAYEK DI

 # Total code points: 672

+# ================================================
+
+1E900..1E943  ; Adlam # L&  [68] ADLAM CAPITAL LETTER ALIF..ADLAM SMALL LETTER SHA
+1E944..1E94A  ; Adlam # Mn   [7] ADLAM ALIF LENGTHENER..ADLAM NUKTA
+1E950..1E959  ; Adlam # Nd  [10] ADLAM DIGIT ZERO..ADLAM DIGIT NINE
+1E95E..1E95F  ; Adlam # Po   [2] ADLAM INITIAL EXCLAMATION MARK..ADLAM INITIAL QUESTION MARK
+
+# Total code points: 87
+
+# ================================================
+
+11C00..11C08  ; Bhaiksuki # Lo   [9] BHAIKSUKI LETTER A..BHAIKSUKI LETTER VOCALIC L
+11C0A..11C2E  ; Bhaiksuki # Lo  [37] BHAIKSUKI LETTER E..BHAIKSUKI LETTER HA
+11C2F         ; Bhaiksuki # Mc       BHAIKSUKI VOWEL SIGN AA
+11C30..11C36  ; Bhaiksuki # Mn   [7] BHAIKSUKI VOWEL SIGN I..BHAIKSUKI VOWEL SIGN VOCALIC L
+11C38..11C3D  ; Bhaiksuki # Mn   [6] BHAIKSUKI VOWEL SIGN E..BHAIKSUKI SIGN ANUSVARA
+11C3E         ; Bhaiksuki # Mc       BHAIKSUKI SIGN VISARGA
+11C3F         ; Bhaiksuki # Mn       BHAIKSUKI SIGN VIRAMA
+11C40         ; Bhaiksuki # Lo       BHAIKSUKI SIGN AVAGRAHA
+11C41..11C45  ; Bhaiksuki # Po   [5] BHAIKSUKI DANDA..BHAIKSUKI GAP FILLER-2
+11C50..11C59  ; Bhaiksuki # Nd  [10] BHAIKSUKI DIGIT ZERO..BHAIKSUKI DIGIT NINE
+11C5A..11C6C  ; Bhaiksuki # No  [19] BHAIKSUKI NUMBER ONE..BHAIKSUKI HUNDREDS UNIT MARK
+
+# Total code points: 97
+
+# ================================================
+
+11C70..11C71  ; Marchen # Po   [2] MARCHEN HEAD MARK..MARCHEN MARK SHAD
+11C72..11C8F  ; Marchen # Lo  [30] MARCHEN LETTER KA..MARCHEN LETTER A
+11C92..11CA7  ; Marchen # Mn  [22] MARCHEN SUBJOINED LETTER KA..MARCHEN SUBJOINED LETTER ZA
+11CA9         ; Marchen # Mc       MARCHEN SUBJOINED LETTER YA
+11CAA..11CB0  ; Marchen # Mn   [7] MARCHEN SUBJOINED LETTER RA..MARCHEN VOWEL SIGN AA
+11CB1         ; Marchen # Mc       MARCHEN VOWEL SIGN I
+11CB2..11CB3  ; Marchen # Mn   [2] MARCHEN VOWEL SIGN U..MARCHEN VOWEL SIGN E
+11CB4         ; Marchen # Mc       MARCHEN VOWEL SIGN O
+11CB5..11CB6  ; Marchen # Mn   [2] MARCHEN SIGN ANUSVARA..MARCHEN SIGN CANDRABINDU
+
+# Total code points: 68
+
+# ================================================
+
+11400..11434  ; Newa # Lo  [53] NEWA LETTER A..NEWA LETTER HA
+11435..11437  ; Newa # Mc   [3] NEWA VOWEL SIGN AA..NEWA VOWEL SIGN II
+11438..1143F  ; Newa # Mn   [8] NEWA VOWEL SIGN U..NEWA VOWEL SIGN AI
+11440..11441  ; Newa # Mc   [2] NEWA VOWEL SIGN O..NEWA VOWEL SIGN AU
+11442..11444  ; Newa # Mn   [3] NEWA SIGN VIRAMA..NEWA SIGN ANUSVARA
+11445         ; Newa # Mc       NEWA SIGN VISARGA
+11446         ; Newa # Mn       NEWA SIGN NUKTA
+11447..1144A  ; Newa # Lo   [4] NEWA SIGN AVAGRAHA..NEWA SIDDHI
+1144B..1144F  ; Newa # Po   [5] NEWA DANDA..NEWA ABBREVIATION SIGN
+11450..11459  ; Newa # Nd  [10] NEWA DIGIT ZERO..NEWA DIGIT NINE
+1145B         ; Newa # Po       NEWA PLACEHOLDER MARK
+1145D         ; Newa # Po       NEWA INSERTION SIGN
+
+# Total code points: 92
+
+# ================================================
+
+104B0..104D3  ; Osage # L&  [36] OSAGE CAPITAL LETTER A..OSAGE CAPITAL LETTER ZHA
+104D8..104FB  ; Osage # L&  [36] OSAGE SMALL LETTER A..OSAGE SMALL LETTER ZHA
+
+# Total code points: 72
+
+# ================================================
+
+16FE0         ; Tangut # Lm       TANGUT ITERATION MARK
+17000..187EC  ; Tangut # Lo [6125] TANGUT IDEOGRAPH-17000..TANGUT IDEOGRAPH-187EC
+18800..18AF2  ; Tangut # Lo [755] TANGUT COMPONENT-001..TANGUT COMPONENT-755
+
+# Total code points: 6881
+
+# ================================================
+
+11D00..11D06  ; Masaram_Gondi # Lo   [7] MASARAM GONDI LETTER A..MASARAM GONDI LETTER E
+11D08..11D09  ; Masaram_Gondi # Lo   [2] MASARAM GONDI LETTER AI..MASARAM GONDI LETTER O
+11D0B..11D30  ; Masaram_Gondi # Lo  [38] MASARAM GONDI LETTER AU..MASARAM GONDI LETTER TRA
+11D31..11D36  ; Masaram_Gondi # Mn   [6] MASARAM GONDI VOWEL SIGN AA..MASARAM GONDI VOWEL SIGN VOCALIC R
+11D3A         ; Masaram_Gondi # Mn       MASARAM GONDI VOWEL SIGN E
+11D3C..11D3D  ; Masaram_Gondi # Mn   [2] MASARAM GONDI VOWEL SIGN AI..MASARAM GONDI VOWEL SIGN O
+11D3F..11D45  ; Masaram_Gondi # Mn   [7] MASARAM GONDI VOWEL SIGN AU..MASARAM GONDI VIRAMA
+11D46         ; Masaram_Gondi # Lo       MASARAM GONDI REPHA
+11D47         ; Masaram_Gondi # Mn       MASARAM GONDI RA-KARA
+11D50..11D59  ; Masaram_Gondi # Nd  [10] MASARAM GONDI DIGIT ZERO..MASARAM GONDI DIGIT NINE
+
+# Total code points: 75
+
+# ================================================
+
+16FE1         ; Nushu # Lm       NUSHU ITERATION MARK
+1B170..1B2FB  ; Nushu # Lo [396] NUSHU CHARACTER-1B170..NUSHU CHARACTER-1B2FB
+
+# Total code points: 397
+
+# ================================================
+
+11A50         ; Soyombo # Lo       SOYOMBO LETTER A
+11A51..11A56  ; Soyombo # Mn   [6] SOYOMBO VOWEL SIGN I..SOYOMBO VOWEL SIGN OE
+11A57..11A58  ; Soyombo # Mc   [2] SOYOMBO VOWEL SIGN AI..SOYOMBO VOWEL SIGN AU
+11A59..11A5B  ; Soyombo # Mn   [3] SOYOMBO VOWEL SIGN VOCALIC R..SOYOMBO VOWEL LENGTH MARK
+11A5C..11A83  ; Soyombo # Lo  [40] SOYOMBO LETTER KA..SOYOMBO LETTER KSSA
+11A86..11A89  ; Soyombo # Lo   [4] SOYOMBO CLUSTER-INITIAL LETTER RA..SOYOMBO CLUSTER-INITIAL LETTER SA
+11A8A..11A96  ; Soyombo # Mn  [13] SOYOMBO FINAL CONSONANT SIGN G..SOYOMBO SIGN ANUSVARA
+11A97         ; Soyombo # Mc       SOYOMBO SIGN VISARGA
+11A98..11A99  ; Soyombo # Mn   [2] SOYOMBO GEMINATION MARK..SOYOMBO SUBJOINER
+11A9A..11A9C  ; Soyombo # Po   [3] SOYOMBO MARK TSHEG..SOYOMBO MARK DOUBLE SHAD
+11A9E..11AA2  ; Soyombo # Po   [5] SOYOMBO HEAD MARK WITH MOON AND SUN AND TRIPLE FLAME..SOYOMBO TERMINAL MARK-2
+
+# Total code points: 80
+
+# ================================================
+
+11A00         ; Zanabazar_Square # Lo       ZANABAZAR SQUARE LETTER A
+11A01..11A06  ; Zanabazar_Square # Mn   [6] ZANABAZAR SQUARE VOWEL SIGN I..ZANABAZAR SQUARE VOWEL SIGN O
+11A07..11A08  ; Zanabazar_Square # Mc   [2] ZANABAZAR SQUARE VOWEL SIGN AI..ZANABAZAR SQUARE VOWEL SIGN AU
+11A09..11A0A  ; Zanabazar_Square # Mn   [2] ZANABAZAR SQUARE VOWEL SIGN REVERSED I..ZANABAZAR SQUARE VOWEL LENGTH MARK
+11A0B..11A32  ; Zanabazar_Square # Lo  [40] ZANABAZAR SQUARE LETTER KA..ZANABAZAR SQUARE LETTER KSSA
+11A33..11A38  ; Zanabazar_Square # Mn   [6] ZANABAZAR SQUARE FINAL CONSONANT MARK..ZANABAZAR SQUARE SIGN ANUSVARA
+11A39         ; Zanabazar_Square # Mc       ZANABAZAR SQUARE SIGN VISARGA
+11A3A         ; Zanabazar_Square # Lo       ZANABAZAR SQUARE CLUSTER-INITIAL LETTER RA
+11A3B..11A3E  ; Zanabazar_Square # Mn   [4] ZANABAZAR SQUARE CLUSTER-FINAL LETTER YA..ZANABAZAR SQUARE CLUSTER-FINAL LETTER VA
+11A3F..11A46  ; Zanabazar_Square # Po   [8] ZANABAZAR SQUARE INITIAL HEAD MARK..ZANABAZAR SQUARE CLOSING DOUBLE-LINED HEAD MARK
+11A47         ; Zanabazar_Square # Mn       ZANABAZAR SQUARE SUBJOINER
+
+# Total code points: 72
+
 # EOF
--- a/maint/Unicode.tables/UnicodeData.txt
+++ b/maint/Unicode.tables/UnicodeData.txt
--- a/src/pcre2_tables.c
+++ b/src/pcre2_tables.c
@ -192,7 +192,12 @@ const uint32_t PRIV(ucp_gbtable)[] = {

   (1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark)|(1<<ucp_gbT),   /* 10 LVT */
   (1<<ucp_gbRegionalIndicator),                            /* 11 RegionalIndicator */
-   (1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark)                 /* 12 Other */
+   (1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark),                /* 12 Other */
+   (1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark),                /* 13 E_Base */
+   (1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark),                /* 14 E_Modifier */
+   (1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark),                /* 15 E_Base_GAZ */
+   (1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark),                /* 16 ZWJ */
+   (1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark)                 /* 17 Glue_After_Zwj */
 };

 #ifdef SUPPORT_JIT
@ -227,6 +232,7 @@ version. Like all other character and string literals that are compared against
 the regular expression pattern, we must use STR_ macros instead of literal
 strings to make sure that UTF-8 support works on EBCDIC platforms. */

+#define STRING_Adlam0 STR_A STR_d STR_l STR_a STR_m "\0"
 #define STRING_Ahom0 STR_A STR_h STR_o STR_m "\0"
 #define STRING_Anatolian_Hieroglyphs0 STR_A STR_n STR_a STR_t STR_o STR_l STR_i STR_a STR_n STR_UNDERSCORE STR_H STR_i STR_e STR_r STR_o STR_g STR_l STR_y STR_p STR_h STR_s "\0"
 #define STRING_Any0 STR_A STR_n STR_y "\0"
@ -238,6 +244,7 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
 #define STRING_Bassa_Vah0 STR_B STR_a STR_s STR_s STR_a STR_UNDERSCORE STR_V STR_a STR_h "\0"
 #define STRING_Batak0 STR_B STR_a STR_t STR_a STR_k "\0"
 #define STRING_Bengali0 STR_B STR_e STR_n STR_g STR_a STR_l STR_i "\0"
+#define STRING_Bhaiksuki0 STR_B STR_h STR_a STR_i STR_k STR_s STR_u STR_k STR_i "\0"
 #define STRING_Bopomofo0 STR_B STR_o STR_p STR_o STR_m STR_o STR_f STR_o "\0"
 #define STRING_Brahmi0 STR_B STR_r STR_a STR_h STR_m STR_i "\0"
 #define STRING_Braille0 STR_B STR_r STR_a STR_i STR_l STR_l STR_e "\0"
@ -313,6 +320,8 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
 #define STRING_Malayalam0 STR_M STR_a STR_l STR_a STR_y STR_a STR_l STR_a STR_m "\0"
 #define STRING_Mandaic0 STR_M STR_a STR_n STR_d STR_a STR_i STR_c "\0"
 #define STRING_Manichaean0 STR_M STR_a STR_n STR_i STR_c STR_h STR_a STR_e STR_a STR_n "\0"
+#define STRING_Marchen0 STR_M STR_a STR_r STR_c STR_h STR_e STR_n "\0"
+#define STRING_Masaram_Gondi0 STR_M STR_a STR_s STR_a STR_r STR_a STR_m STR_UNDERSCORE STR_G STR_o STR_n STR_d STR_i "\0"
 #define STRING_Mc0 STR_M STR_c "\0"
 #define STRING_Me0 STR_M STR_e "\0"
 #define STRING_Meetei_Mayek0 STR_M STR_e STR_e STR_t STR_e STR_i STR_UNDERSCORE STR_M STR_a STR_y STR_e STR_k "\0"
@ -330,9 +339,11 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
 #define STRING_Nabataean0 STR_N STR_a STR_b STR_a STR_t STR_a STR_e STR_a STR_n "\0"
 #define STRING_Nd0 STR_N STR_d "\0"
 #define STRING_New_Tai_Lue0 STR_N STR_e STR_w STR_UNDERSCORE STR_T STR_a STR_i STR_UNDERSCORE STR_L STR_u STR_e "\0"
+#define STRING_Newa0 STR_N STR_e STR_w STR_a "\0"
 #define STRING_Nko0 STR_N STR_k STR_o "\0"
 #define STRING_Nl0 STR_N STR_l "\0"
 #define STRING_No0 STR_N STR_o "\0"
+#define STRING_Nushu0 STR_N STR_u STR_s STR_h STR_u "\0"
 #define STRING_Ogham0 STR_O STR_g STR_h STR_a STR_m "\0"
 #define STRING_Ol_Chiki0 STR_O STR_l STR_UNDERSCORE STR_C STR_h STR_i STR_k STR_i "\0"
 #define STRING_Old_Hungarian0 STR_O STR_l STR_d STR_UNDERSCORE STR_H STR_u STR_n STR_g STR_a STR_r STR_i STR_a STR_n "\0"
@ -343,6 +354,7 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
 #define STRING_Old_South_Arabian0 STR_O STR_l STR_d STR_UNDERSCORE STR_S STR_o STR_u STR_t STR_h STR_UNDERSCORE STR_A STR_r STR_a STR_b STR_i STR_a STR_n "\0"
 #define STRING_Old_Turkic0 STR_O STR_l STR_d STR_UNDERSCORE STR_T STR_u STR_r STR_k STR_i STR_c "\0"
 #define STRING_Oriya0 STR_O STR_r STR_i STR_y STR_a "\0"
+#define STRING_Osage0 STR_O STR_s STR_a STR_g STR_e "\0"
 #define STRING_Osmanya0 STR_O STR_s STR_m STR_a STR_n STR_y STR_a "\0"
 #define STRING_P0 STR_P "\0"
 #define STRING_Pahawh_Hmong0 STR_P STR_a STR_h STR_a STR_w STR_h STR_UNDERSCORE STR_H STR_m STR_o STR_n STR_g "\0"
@ -373,6 +385,7 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
 #define STRING_Sm0 STR_S STR_m "\0"
 #define STRING_So0 STR_S STR_o "\0"
 #define STRING_Sora_Sompeng0 STR_S STR_o STR_r STR_a STR_UNDERSCORE STR_S STR_o STR_m STR_p STR_e STR_n STR_g "\0"
+#define STRING_Soyombo0 STR_S STR_o STR_y STR_o STR_m STR_b STR_o "\0"
 #define STRING_Sundanese0 STR_S STR_u STR_n STR_d STR_a STR_n STR_e STR_s STR_e "\0"
 #define STRING_Syloti_Nagri0 STR_S STR_y STR_l STR_o STR_t STR_i STR_UNDERSCORE STR_N STR_a STR_g STR_r STR_i "\0"
 #define STRING_Syriac0 STR_S STR_y STR_r STR_i STR_a STR_c "\0"
@ -383,6 +396,7 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
 #define STRING_Tai_Viet0 STR_T STR_a STR_i STR_UNDERSCORE STR_V STR_i STR_e STR_t "\0"
 #define STRING_Takri0 STR_T STR_a STR_k STR_r STR_i "\0"
 #define STRING_Tamil0 STR_T STR_a STR_m STR_i STR_l "\0"
+#define STRING_Tangut0 STR_T STR_a STR_n STR_g STR_u STR_t "\0"
 #define STRING_Telugu0 STR_T STR_e STR_l STR_u STR_g STR_u "\0"
 #define STRING_Thaana0 STR_T STR_h STR_a STR_a STR_n STR_a "\0"
 #define STRING_Thai0 STR_T STR_h STR_a STR_i "\0"
@ -399,11 +413,13 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
 #define STRING_Xwd0 STR_X STR_w STR_d "\0"
 #define STRING_Yi0 STR_Y STR_i "\0"
 #define STRING_Z0 STR_Z "\0"
+#define STRING_Zanabazar_Square0 STR_Z STR_a STR_n STR_a STR_b STR_a STR_z STR_a STR_r STR_UNDERSCORE STR_S STR_q STR_u STR_a STR_r STR_e "\0"
 #define STRING_Zl0 STR_Z STR_l "\0"
 #define STRING_Zp0 STR_Z STR_p "\0"
 #define STRING_Zs0 STR_Z STR_s "\0"

 const char PRIV(utt_names)[] =
+  STRING_Adlam0
  STRING_Ahom0
  STRING_Anatolian_Hieroglyphs0
  STRING_Any0
@ -415,6 +431,7 @@ const char PRIV(utt_names)[] =
  STRING_Bassa_Vah0
  STRING_Batak0
  STRING_Bengali0
+  STRING_Bhaiksuki0
  STRING_Bopomofo0
  STRING_Brahmi0
  STRING_Braille0
@ -490,6 +507,8 @@ const char PRIV(utt_names)[] =
  STRING_Malayalam0
  STRING_Mandaic0
  STRING_Manichaean0
+  STRING_Marchen0
+  STRING_Masaram_Gondi0
  STRING_Mc0
  STRING_Me0
  STRING_Meetei_Mayek0
@ -507,9 +526,11 @@ const char PRIV(utt_names)[] =
  STRING_Nabataean0
  STRING_Nd0
  STRING_New_Tai_Lue0
+  STRING_Newa0
  STRING_Nko0
  STRING_Nl0
  STRING_No0
+  STRING_Nushu0
  STRING_Ogham0
  STRING_Ol_Chiki0
  STRING_Old_Hungarian0
@ -520,6 +541,7 @@ const char PRIV(utt_names)[] =
  STRING_Old_South_Arabian0
  STRING_Old_Turkic0
  STRING_Oriya0
+  STRING_Osage0
  STRING_Osmanya0
  STRING_P0
  STRING_Pahawh_Hmong0
@ -550,6 +572,7 @@ const char PRIV(utt_names)[] =
  STRING_Sm0
  STRING_So0
  STRING_Sora_Sompeng0
+  STRING_Soyombo0
  STRING_Sundanese0
  STRING_Syloti_Nagri0
  STRING_Syriac0
@ -560,6 +583,7 @@ const char PRIV(utt_names)[] =
  STRING_Tai_Viet0
  STRING_Takri0
  STRING_Tamil0
+  STRING_Tangut0
  STRING_Telugu0
  STRING_Thaana0
  STRING_Thai0
@ -576,186 +600,197 @@ const char PRIV(utt_names)[] =
  STRING_Xwd0
  STRING_Yi0
  STRING_Z0
+  STRING_Zanabazar_Square0
  STRING_Zl0
  STRING_Zp0
  STRING_Zs0;

 const ucp_type_table PRIV(utt)[] = {
-  {   0, PT_SC, ucp_Ahom },
-  {   5, PT_SC, ucp_Anatolian_Hieroglyphs },
-  {  27, PT_ANY, 0 },
-  {  31, PT_SC, ucp_Arabic },
-  {  38, PT_SC, ucp_Armenian },
-  {  47, PT_SC, ucp_Avestan },
-  {  55, PT_SC, ucp_Balinese },
-  {  64, PT_SC, ucp_Bamum },
-  {  70, PT_SC, ucp_Bassa_Vah },
-  {  80, PT_SC, ucp_Batak },
-  {  86, PT_SC, ucp_Bengali },
-  {  94, PT_SC, ucp_Bopomofo },
-  { 103, PT_SC, ucp_Brahmi },
-  { 110, PT_SC, ucp_Braille },
-  { 118, PT_SC, ucp_Buginese },
-  { 127, PT_SC, ucp_Buhid },
-  { 133, PT_GC, ucp_C },
-  { 135, PT_SC, ucp_Canadian_Aboriginal },
-  { 155, PT_SC, ucp_Carian },
-  { 162, PT_SC, ucp_Caucasian_Albanian },
-  { 181, PT_PC, ucp_Cc },
-  { 184, PT_PC, ucp_Cf },
-  { 187, PT_SC, ucp_Chakma },
-  { 194, PT_SC, ucp_Cham },
-  { 199, PT_SC, ucp_Cherokee },
-  { 208, PT_PC, ucp_Cn },
-  { 211, PT_PC, ucp_Co },
-  { 214, PT_SC, ucp_Common },
-  { 221, PT_SC, ucp_Coptic },
-  { 228, PT_PC, ucp_Cs },
-  { 231, PT_SC, ucp_Cuneiform },
-  { 241, PT_SC, ucp_Cypriot },
-  { 249, PT_SC, ucp_Cyrillic },
-  { 258, PT_SC, ucp_Deseret },
-  { 266, PT_SC, ucp_Devanagari },
-  { 277, PT_SC, ucp_Duployan },
-  { 286, PT_SC, ucp_Egyptian_Hieroglyphs },
-  { 307, PT_SC, ucp_Elbasan },
-  { 315, PT_SC, ucp_Ethiopic },
-  { 324, PT_SC, ucp_Georgian },
-  { 333, PT_SC, ucp_Glagolitic },
-  { 344, PT_SC, ucp_Gothic },
-  { 351, PT_SC, ucp_Grantha },
-  { 359, PT_SC, ucp_Greek },
-  { 365, PT_SC, ucp_Gujarati },
-  { 374, PT_SC, ucp_Gurmukhi },
-  { 383, PT_SC, ucp_Han },
-  { 387, PT_SC, ucp_Hangul },
-  { 394, PT_SC, ucp_Hanunoo },
-  { 402, PT_SC, ucp_Hatran },
-  { 409, PT_SC, ucp_Hebrew },
-  { 416, PT_SC, ucp_Hiragana },
-  { 425, PT_SC, ucp_Imperial_Aramaic },
-  { 442, PT_SC, ucp_Inherited },
-  { 452, PT_SC, ucp_Inscriptional_Pahlavi },
-  { 474, PT_SC, ucp_Inscriptional_Parthian },
-  { 497, PT_SC, ucp_Javanese },
-  { 506, PT_SC, ucp_Kaithi },
-  { 513, PT_SC, ucp_Kannada },
-  { 521, PT_SC, ucp_Katakana },
-  { 530, PT_SC, ucp_Kayah_Li },
-  { 539, PT_SC, ucp_Kharoshthi },
-  { 550, PT_SC, ucp_Khmer },
-  { 556, PT_SC, ucp_Khojki },
-  { 563, PT_SC, ucp_Khudawadi },
-  { 573, PT_GC, ucp_L },
-  { 575, PT_LAMP, 0 },
-  { 578, PT_SC, ucp_Lao },
-  { 582, PT_SC, ucp_Latin },
-  { 588, PT_SC, ucp_Lepcha },
-  { 595, PT_SC, ucp_Limbu },
-  { 601, PT_SC, ucp_Linear_A },
-  { 610, PT_SC, ucp_Linear_B },
-  { 619, PT_SC, ucp_Lisu },
-  { 624, PT_PC, ucp_Ll },
-  { 627, PT_PC, ucp_Lm },
-  { 630, PT_PC, ucp_Lo },
-  { 633, PT_PC, ucp_Lt },
-  { 636, PT_PC, ucp_Lu },
-  { 639, PT_SC, ucp_Lycian },
-  { 646, PT_SC, ucp_Lydian },
-  { 653, PT_GC, ucp_M },
-  { 655, PT_SC, ucp_Mahajani },
-  { 664, PT_SC, ucp_Malayalam },
-  { 674, PT_SC, ucp_Mandaic },
-  { 682, PT_SC, ucp_Manichaean },
-  { 693, PT_PC, ucp_Mc },
-  { 696, PT_PC, ucp_Me },
-  { 699, PT_SC, ucp_Meetei_Mayek },
-  { 712, PT_SC, ucp_Mende_Kikakui },
-  { 726, PT_SC, ucp_Meroitic_Cursive },
-  { 743, PT_SC, ucp_Meroitic_Hieroglyphs },
-  { 764, PT_SC, ucp_Miao },
-  { 769, PT_PC, ucp_Mn },
-  { 772, PT_SC, ucp_Modi },
-  { 777, PT_SC, ucp_Mongolian },
-  { 787, PT_SC, ucp_Mro },
-  { 791, PT_SC, ucp_Multani },
-  { 799, PT_SC, ucp_Myanmar },
-  { 807, PT_GC, ucp_N },
-  { 809, PT_SC, ucp_Nabataean },
-  { 819, PT_PC, ucp_Nd },
-  { 822, PT_SC, ucp_New_Tai_Lue },
-  { 834, PT_SC, ucp_Nko },
-  { 838, PT_PC, ucp_Nl },
-  { 841, PT_PC, ucp_No },
-  { 844, PT_SC, ucp_Ogham },
-  { 850, PT_SC, ucp_Ol_Chiki },
-  { 859, PT_SC, ucp_Old_Hungarian },
-  { 873, PT_SC, ucp_Old_Italic },
-  { 884, PT_SC, ucp_Old_North_Arabian },
-  { 902, PT_SC, ucp_Old_Permic },
-  { 913, PT_SC, ucp_Old_Persian },
-  { 925, PT_SC, ucp_Old_South_Arabian },
-  { 943, PT_SC, ucp_Old_Turkic },
-  { 954, PT_SC, ucp_Oriya },
-  { 960, PT_SC, ucp_Osmanya },
-  { 968, PT_GC, ucp_P },
-  { 970, PT_SC, ucp_Pahawh_Hmong },
-  { 983, PT_SC, ucp_Palmyrene },
-  { 993, PT_SC, ucp_Pau_Cin_Hau },
-  { 1005, PT_PC, ucp_Pc },
-  { 1008, PT_PC, ucp_Pd },
-  { 1011, PT_PC, ucp_Pe },
-  { 1014, PT_PC, ucp_Pf },
-  { 1017, PT_SC, ucp_Phags_Pa },
-  { 1026, PT_SC, ucp_Phoenician },
-  { 1037, PT_PC, ucp_Pi },
-  { 1040, PT_PC, ucp_Po },
-  { 1043, PT_PC, ucp_Ps },
-  { 1046, PT_SC, ucp_Psalter_Pahlavi },
-  { 1062, PT_SC, ucp_Rejang },
-  { 1069, PT_SC, ucp_Runic },
-  { 1075, PT_GC, ucp_S },
-  { 1077, PT_SC, ucp_Samaritan },
-  { 1087, PT_SC, ucp_Saurashtra },
-  { 1098, PT_PC, ucp_Sc },
-  { 1101, PT_SC, ucp_Sharada },
-  { 1109, PT_SC, ucp_Shavian },
-  { 1117, PT_SC, ucp_Siddham },
-  { 1125, PT_SC, ucp_SignWriting },
-  { 1137, PT_SC, ucp_Sinhala },
-  { 1145, PT_PC, ucp_Sk },
-  { 1148, PT_PC, ucp_Sm },
-  { 1151, PT_PC, ucp_So },
-  { 1154, PT_SC, ucp_Sora_Sompeng },
-  { 1167, PT_SC, ucp_Sundanese },
-  { 1177, PT_SC, ucp_Syloti_Nagri },
-  { 1190, PT_SC, ucp_Syriac },
-  { 1197, PT_SC, ucp_Tagalog },
-  { 1205, PT_SC, ucp_Tagbanwa },
-  { 1214, PT_SC, ucp_Tai_Le },
-  { 1221, PT_SC, ucp_Tai_Tham },
-  { 1230, PT_SC, ucp_Tai_Viet },
-  { 1239, PT_SC, ucp_Takri },
-  { 1245, PT_SC, ucp_Tamil },
-  { 1251, PT_SC, ucp_Telugu },
-  { 1258, PT_SC, ucp_Thaana },
-  { 1265, PT_SC, ucp_Thai },
-  { 1270, PT_SC, ucp_Tibetan },
-  { 1278, PT_SC, ucp_Tifinagh },
-  { 1287, PT_SC, ucp_Tirhuta },
-  { 1295, PT_SC, ucp_Ugaritic },
-  { 1304, PT_SC, ucp_Vai },
-  { 1308, PT_SC, ucp_Warang_Citi },
-  { 1320, PT_ALNUM, 0 },
-  { 1324, PT_PXSPACE, 0 },
-  { 1328, PT_SPACE, 0 },
-  { 1332, PT_UCNC, 0 },
-  { 1336, PT_WORD, 0 },
-  { 1340, PT_SC, ucp_Yi },
-  { 1343, PT_GC, ucp_Z },
-  { 1345, PT_PC, ucp_Zl },
-  { 1348, PT_PC, ucp_Zp },
-  { 1351, PT_PC, ucp_Zs }
+  {   0, PT_SC, ucp_Adlam },
+  {   6, PT_SC, ucp_Ahom },
+  {  11, PT_SC, ucp_Anatolian_Hieroglyphs },
+  {  33, PT_ANY, 0 },
+  {  37, PT_SC, ucp_Arabic },
+  {  44, PT_SC, ucp_Armenian },
+  {  53, PT_SC, ucp_Avestan },
+  {  61, PT_SC, ucp_Balinese },
+  {  70, PT_SC, ucp_Bamum },
+  {  76, PT_SC, ucp_Bassa_Vah },
+  {  86, PT_SC, ucp_Batak },
+  {  92, PT_SC, ucp_Bengali },
+  { 100, PT_SC, ucp_Bhaiksuki },
+  { 110, PT_SC, ucp_Bopomofo },
+  { 119, PT_SC, ucp_Brahmi },
+  { 126, PT_SC, ucp_Braille },
+  { 134, PT_SC, ucp_Buginese },
+  { 143, PT_SC, ucp_Buhid },
+  { 149, PT_GC, ucp_C },
+  { 151, PT_SC, ucp_Canadian_Aboriginal },
+  { 171, PT_SC, ucp_Carian },
+  { 178, PT_SC, ucp_Caucasian_Albanian },
+  { 197, PT_PC, ucp_Cc },
+  { 200, PT_PC, ucp_Cf },
+  { 203, PT_SC, ucp_Chakma },
+  { 210, PT_SC, ucp_Cham },
+  { 215, PT_SC, ucp_Cherokee },
+  { 224, PT_PC, ucp_Cn },
+  { 227, PT_PC, ucp_Co },
+  { 230, PT_SC, ucp_Common },
+  { 237, PT_SC, ucp_Coptic },
+  { 244, PT_PC, ucp_Cs },
+  { 247, PT_SC, ucp_Cuneiform },
+  { 257, PT_SC, ucp_Cypriot },
+  { 265, PT_SC, ucp_Cyrillic },
+  { 274, PT_SC, ucp_Deseret },
+  { 282, PT_SC, ucp_Devanagari },
+  { 293, PT_SC, ucp_Duployan },
+  { 302, PT_SC, ucp_Egyptian_Hieroglyphs },
+  { 323, PT_SC, ucp_Elbasan },
+  { 331, PT_SC, ucp_Ethiopic },
+  { 340, PT_SC, ucp_Georgian },
+  { 349, PT_SC, ucp_Glagolitic },
+  { 360, PT_SC, ucp_Gothic },
+  { 367, PT_SC, ucp_Grantha },
+  { 375, PT_SC, ucp_Greek },
+  { 381, PT_SC, ucp_Gujarati },
+  { 390, PT_SC, ucp_Gurmukhi },
+  { 399, PT_SC, ucp_Han },
+  { 403, PT_SC, ucp_Hangul },
+  { 410, PT_SC, ucp_Hanunoo },
+  { 418, PT_SC, ucp_Hatran },
+  { 425, PT_SC, ucp_Hebrew },
+  { 432, PT_SC, ucp_Hiragana },
+  { 441, PT_SC, ucp_Imperial_Aramaic },
+  { 458, PT_SC, ucp_Inherited },
+  { 468, PT_SC, ucp_Inscriptional_Pahlavi },
+  { 490, PT_SC, ucp_Inscriptional_Parthian },
+  { 513, PT_SC, ucp_Javanese },
+  { 522, PT_SC, ucp_Kaithi },
+  { 529, PT_SC, ucp_Kannada },
+  { 537, PT_SC, ucp_Katakana },
+  { 546, PT_SC, ucp_Kayah_Li },
+  { 555, PT_SC, ucp_Kharoshthi },
+  { 566, PT_SC, ucp_Khmer },
+  { 572, PT_SC, ucp_Khojki },
+  { 579, PT_SC, ucp_Khudawadi },
+  { 589, PT_GC, ucp_L },
+  { 591, PT_LAMP, 0 },
+  { 594, PT_SC, ucp_Lao },
+  { 598, PT_SC, ucp_Latin },
+  { 604, PT_SC, ucp_Lepcha },
+  { 611, PT_SC, ucp_Limbu },
+  { 617, PT_SC, ucp_Linear_A },
+  { 626, PT_SC, ucp_Linear_B },
+  { 635, PT_SC, ucp_Lisu },
+  { 640, PT_PC, ucp_Ll },
+  { 643, PT_PC, ucp_Lm },
+  { 646, PT_PC, ucp_Lo },
+  { 649, PT_PC, ucp_Lt },
+  { 652, PT_PC, ucp_Lu },
+  { 655, PT_SC, ucp_Lycian },
+  { 662, PT_SC, ucp_Lydian },
+  { 669, PT_GC, ucp_M },
+  { 671, PT_SC, ucp_Mahajani },
+  { 680, PT_SC, ucp_Malayalam },
+  { 690, PT_SC, ucp_Mandaic },
+  { 698, PT_SC, ucp_Manichaean },
+  { 709, PT_SC, ucp_Marchen },
+  { 717, PT_SC, ucp_Masaram_Gondi },
+  { 731, PT_PC, ucp_Mc },
+  { 734, PT_PC, ucp_Me },
+  { 737, PT_SC, ucp_Meetei_Mayek },
+  { 750, PT_SC, ucp_Mende_Kikakui },
+  { 764, PT_SC, ucp_Meroitic_Cursive },
+  { 781, PT_SC, ucp_Meroitic_Hieroglyphs },
+  { 802, PT_SC, ucp_Miao },
+  { 807, PT_PC, ucp_Mn },
+  { 810, PT_SC, ucp_Modi },
+  { 815, PT_SC, ucp_Mongolian },
+  { 825, PT_SC, ucp_Mro },
+  { 829, PT_SC, ucp_Multani },
+  { 837, PT_SC, ucp_Myanmar },
+  { 845, PT_GC, ucp_N },
+  { 847, PT_SC, ucp_Nabataean },
+  { 857, PT_PC, ucp_Nd },
+  { 860, PT_SC, ucp_New_Tai_Lue },
+  { 872, PT_SC, ucp_Newa },
+  { 877, PT_SC, ucp_Nko },
+  { 881, PT_PC, ucp_Nl },
+  { 884, PT_PC, ucp_No },
+  { 887, PT_SC, ucp_Nushu },
+  { 893, PT_SC, ucp_Ogham },
+  { 899, PT_SC, ucp_Ol_Chiki },
+  { 908, PT_SC, ucp_Old_Hungarian },
+  { 922, PT_SC, ucp_Old_Italic },
+  { 933, PT_SC, ucp_Old_North_Arabian },
+  { 951, PT_SC, ucp_Old_Permic },
+  { 962, PT_SC, ucp_Old_Persian },
+  { 974, PT_SC, ucp_Old_South_Arabian },
+  { 992, PT_SC, ucp_Old_Turkic },
+  { 1003, PT_SC, ucp_Oriya },
+  { 1009, PT_SC, ucp_Osage },
+  { 1015, PT_SC, ucp_Osmanya },
+  { 1023, PT_GC, ucp_P },
+  { 1025, PT_SC, ucp_Pahawh_Hmong },
+  { 1038, PT_SC, ucp_Palmyrene },
+  { 1048, PT_SC, ucp_Pau_Cin_Hau },
+  { 1060, PT_PC, ucp_Pc },
+  { 1063, PT_PC, ucp_Pd },
+  { 1066, PT_PC, ucp_Pe },
+  { 1069, PT_PC, ucp_Pf },
+  { 1072, PT_SC, ucp_Phags_Pa },
+  { 1081, PT_SC, ucp_Phoenician },
+  { 1092, PT_PC, ucp_Pi },
+  { 1095, PT_PC, ucp_Po },
+  { 1098, PT_PC, ucp_Ps },
+  { 1101, PT_SC, ucp_Psalter_Pahlavi },
+  { 1117, PT_SC, ucp_Rejang },
+  { 1124, PT_SC, ucp_Runic },
+  { 1130, PT_GC, ucp_S },
+  { 1132, PT_SC, ucp_Samaritan },
+  { 1142, PT_SC, ucp_Saurashtra },
+  { 1153, PT_PC, ucp_Sc },
+  { 1156, PT_SC, ucp_Sharada },
+  { 1164, PT_SC, ucp_Shavian },
+  { 1172, PT_SC, ucp_Siddham },
+  { 1180, PT_SC, ucp_SignWriting },
+  { 1192, PT_SC, ucp_Sinhala },
+  { 1200, PT_PC, ucp_Sk },
+  { 1203, PT_PC, ucp_Sm },
+  { 1206, PT_PC, ucp_So },
+  { 1209, PT_SC, ucp_Sora_Sompeng },
+  { 1222, PT_SC, ucp_Soyombo },
+  { 1230, PT_SC, ucp_Sundanese },
+  { 1240, PT_SC, ucp_Syloti_Nagri },
+  { 1253, PT_SC, ucp_Syriac },
+  { 1260, PT_SC, ucp_Tagalog },
+  { 1268, PT_SC, ucp_Tagbanwa },
+  { 1277, PT_SC, ucp_Tai_Le },
+  { 1284, PT_SC, ucp_Tai_Tham },
+  { 1293, PT_SC, ucp_Tai_Viet },
+  { 1302, PT_SC, ucp_Takri },
+  { 1308, PT_SC, ucp_Tamil },
+  { 1314, PT_SC, ucp_Tangut },
+  { 1321, PT_SC, ucp_Telugu },
+  { 1328, PT_SC, ucp_Thaana },
+  { 1335, PT_SC, ucp_Thai },
+  { 1340, PT_SC, ucp_Tibetan },
+  { 1348, PT_SC, ucp_Tifinagh },
+  { 1357, PT_SC, ucp_Tirhuta },
+  { 1365, PT_SC, ucp_Ugaritic },
+  { 1374, PT_SC, ucp_Vai },
+  { 1378, PT_SC, ucp_Warang_Citi },
+  { 1390, PT_ALNUM, 0 },
+  { 1394, PT_PXSPACE, 0 },
+  { 1398, PT_SPACE, 0 },
+  { 1402, PT_UCNC, 0 },
+  { 1406, PT_WORD, 0 },
+  { 1410, PT_SC, ucp_Yi },
+  { 1413, PT_GC, ucp_Z },
+  { 1415, PT_SC, ucp_Zanabazar_Square },
+  { 1432, PT_PC, ucp_Zl },
+  { 1435, PT_PC, ucp_Zp },
+  { 1438, PT_PC, ucp_Zs }
 };

 const size_t PRIV(utt_size) = sizeof(PRIV(utt)) / sizeof(ucp_type_table);
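The first field of each `utt` entry is the byte offset of that property name inside the NUL-separated `utt_names` string. That is why adding `Adlam` (five characters plus a terminator) moves `Ahom` from 0 to 6 and shifts every later entry as well. The following is a minimal sketch of how such offsets can be derived and cross-checked. It uses a hypothetical four-name string rather than the real table, and `demo_names` and `demo_name_offset` are illustrative names, not part of PCRE2.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of a name table: names in sorted order, each
followed by a NUL, concatenated into one string. The compiler supplies the
terminating NUL for the last name. */
const char demo_names[] =
  "Adlam\0"
  "Ahom\0"
  "Anatolian_Hieroglyphs\0"
  "Any";

/* Return the byte offset of the index'th name by stepping over each earlier
name together with its terminating NUL. The caller must pass an index that is
smaller than the number of names in the string. */
size_t
demo_name_offset(const char *names, size_t index)
{
size_t offset = 0;
assert(names != NULL);
while (index-- > 0) offset += strlen(names + offset) + 1;
return offset;
}
```

For these four names the helper yields 0, 6, 11 and 33, which agrees with the first four entries of the new table above. A check of this kind is one way to catch a name added to the string without the offsets being regenerated.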
--- a/src/pcre2_ucd.c
+++ b/src/pcre2_ucd.c
--- a/src/pcre2_ucp.h
+++ b/src/pcre2_ucp.h
@ -100,9 +100,7 @@ enum {
  ucp_Zs     /* Space separator */
 };

-/* These are grapheme break properties. Note that the code for processing them
-assumes that the values are less than 16. If more values are added that take
-the number to 16 or more, the code will have to be rewritten. */
+/* These are grapheme break properties. */

 enum {
  ucp_gbCR,                /*  0 */
@ -117,7 +115,12 @@ enum {
  ucp_gbLV,                /*  9 Hangul syllable type LV */
  ucp_gbLVT,               /* 10 Hangul syllable type LVT */
  ucp_gbRegionalIndicator, /* 11 */
-  ucp_gbOther              /* 12 */
+  ucp_gbOther,             /* 12 */
+  ucp_gbE_Base,            /* 13 */
+  ucp_gbE_Modifier,        /* 14 */
+  ucp_gbE_Base_GAZ,        /* 15 */
+  ucp_gbZWJ,               /* 16 */
+  ucp_gbGlue_After_Zwj     /* 17 */
 };

 /* These are the script identifications. */
@ -184,13 +187,13 @@ enum {
  ucp_Tifinagh,
  ucp_Ugaritic,
  ucp_Yi,
-  /* New for Unicode 5.0: */
+  /* New for Unicode 5.0 */
  ucp_Balinese,
  ucp_Cuneiform,
  ucp_Nko,
  ucp_Phags_Pa,
  ucp_Phoenician,
-  /* New for Unicode 5.1: */
+  /* New for Unicode 5.1 */
  ucp_Carian,
  ucp_Cham,
  ucp_Kayah_Li,
@ -202,7 +205,7 @@ enum {
  ucp_Saurashtra,
  ucp_Sundanese,
  ucp_Vai,
-  /* New for Unicode 5.2: */
+  /* New for Unicode 5.2 */
  ucp_Avestan,
  ucp_Bamum,
  ucp_Egyptian_Hieroglyphs,
@ -218,11 +221,11 @@ enum {
  ucp_Samaritan,
  ucp_Tai_Tham,
  ucp_Tai_Viet,
-  /* New for Unicode 6.0.0: */
+  /* New for Unicode 6.0.0 */
  ucp_Batak,
  ucp_Brahmi,
  ucp_Mandaic,
-  /* New for Unicode 6.1.0: */
+  /* New for Unicode 6.1.0 */
  ucp_Chakma,
  ucp_Meroitic_Cursive,
  ucp_Meroitic_Hieroglyphs,
@ -230,7 +233,7 @@ enum {
  ucp_Sharada,
  ucp_Sora_Sompeng,
  ucp_Takri,
-  /* New for Unicode 7.0.0: */
+  /* New for Unicode 7.0.0 */
  ucp_Bassa_Vah,
  ucp_Caucasian_Albanian,
  ucp_Duployan,
@ -254,13 +257,24 @@ enum {
  ucp_Siddham,
  ucp_Tirhuta,
  ucp_Warang_Citi,
-  /* New for Unicode 8.0.0: */
+  /* New for Unicode 8.0.0 */
  ucp_Ahom,
  ucp_Anatolian_Hieroglyphs,
  ucp_Hatran,
  ucp_Multani,
  ucp_Old_Hungarian,
-  ucp_SignWriting
+  ucp_SignWriting,
+  /* New for Unicode 10.0.0 (the tables were last updated for 8.0.0) */
+  ucp_Adlam,
+  ucp_Bhaiksuki,
+  ucp_Marchen,
+  ucp_Newa,
+  ucp_Osage,
+  ucp_Tangut,
+  ucp_Masaram_Gondi,
+  ucp_Nushu,
+  ucp_Soyombo,
+  ucp_Zanabazar_Square
 };

 #endif  /* PCRE2_UCP_H_IDEMPOTENT_GUARD */
--- a/src/pcre2test.c
+++ b/src/pcre2test.c
@ -473,6 +473,12 @@ so many of them that they are split into two fields. */
 #define CTL_UTF8_INPUT                   0x40000000u
 #define CTL_ZERO_TERMINATE               0x80000000u

+/* Combinations */
+
+#define CTL_DEBUG            (CTL_FULLBINCODE|CTL_INFO)  /* For setting */
+#define CTL_ANYINFO          (CTL_DEBUG|CTL_BINCODE|CTL_CALLOUT_INFO)
+#define CTL_ANYGLOB          (CTL_ALTGLOBAL|CTL_GLOBAL)
+
 /* Second control word */

 #define CTL2_SUBSTITUTE_EXTENDED         0x00000001u
@ -480,15 +486,10 @@ so many of them that they are split into two fields. */
 #define CTL2_SUBSTITUTE_UNKNOWN_UNSET    0x00000004u
 #define CTL2_SUBSTITUTE_UNSET_EMPTY      0x00000008u
 #define CTL2_SUBJECT_LITERAL             0x00000010u
+#define CTL2_CALLOUT_NO_WHERE            0x00000020u

-#define CTL_NL_SET                       0x40000000u  /* Informational */
-#define CTL_BSR_SET                      0x80000000u  /* Informational */
-
-/* Combinations */
-
-#define CTL_DEBUG            (CTL_FULLBINCODE|CTL_INFO)  /* For setting */
-#define CTL_ANYINFO          (CTL_DEBUG|CTL_BINCODE|CTL_CALLOUT_INFO)
-#define CTL_ANYGLOB          (CTL_ALTGLOBAL|CTL_GLOBAL)
+#define CTL2_NL_SET                      0x40000000u  /* Informational */
+#define CTL2_BSR_SET                     0x80000000u  /* Informational */

 /* These are the matching controls that may be set either on a pattern or on a
 data line. They are copied from the pattern controls as initial settings for
@ -601,6 +602,7 @@ static modstruct modlist[] = {
  { "callout_error",              MOD_DAT,  MOD_IN2, 0,                          DO(cerror) },
  { "callout_fail",               MOD_DAT,  MOD_IN2, 0,                          DO(cfail) },
  { "callout_info",               MOD_PAT,  MOD_CTL, CTL_CALLOUT_INFO,           PO(control) },
+  { "callout_no_where",           MOD_DAT,  MOD_CTL, CTL2_CALLOUT_NO_WHERE,      DO(control2) },
  { "callout_none",               MOD_DAT,  MOD_CTL, CTL_CALLOUT_NONE,           DO(control) },
  { "caseless",                   MOD_PATP, MOD_OPT, PCRE2_CASELESS,             PO(options) },
  { "convert",                    MOD_PAT,  MOD_CON, 0,                          PO(convert_type) },
@ -723,7 +725,7 @@ static modstruct modlist[] = {
  CTL_JITVERIFY|CTL_MEMORY|CTL_FRAMESIZE|CTL_PUSH|CTL_PUSHCOPY| \
  CTL_PUSHTABLESCOPY|CTL_USE_LENGTH)

-#define PUSH_SUPPORTED_COMPILE_CONTROLS2 (CTL_BSR_SET|CTL_NL_SET)
+#define PUSH_SUPPORTED_COMPILE_CONTROLS2 (CTL2_BSR_SET|CTL2_NL_SET)

 /* Controls that apply only at compile time with 'push'. */

@ -3688,8 +3690,8 @@ for (;;)
 #else
      *((uint16_t *)field) = PCRE2_BSR_UNICODE;
 #endif
-      if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 &= ~CTL_BSR_SET;
-        else dctl->control2 &= ~CTL_BSR_SET;
+      if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 &= ~CTL2_BSR_SET;
+        else dctl->control2 &= ~CTL2_BSR_SET;
      }
    else
      {
@ -3698,8 +3700,8 @@ for (;;)
      else if (len == 7 && strncmpic(pp, (const uint8_t *)"unicode", 7) == 0)
        *((uint16_t *)field) = PCRE2_BSR_UNICODE;
      else goto INVALID_VALUE;
-      if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 |= CTL_BSR_SET;
-        else dctl->control2 |= CTL_BSR_SET;
+      if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 |= CTL2_BSR_SET;
+        else dctl->control2 |= CTL2_BSR_SET;
      }
    pp = ep;
    break;
@ -3792,14 +3794,14 @@ for (;;)
    if (i == 0)
      {
      *((uint16_t *)field) = NEWLINE_DEFAULT;
-      if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 &= ~CTL_NL_SET;
-        else dctl->control2 &= ~CTL_NL_SET;
+      if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 &= ~CTL2_NL_SET;
+        else dctl->control2 &= ~CTL2_NL_SET;
      }
    else
      {
      *((uint16_t *)field) = i;
-      if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 |= CTL_NL_SET;
-        else dctl->control2 |= CTL_NL_SET;
+      if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 |= CTL2_NL_SET;
+        else dctl->control2 |= CTL2_NL_SET;
      }
    pp = ep;
    break;
@ -3971,7 +3973,7 @@ Returns:      nothing
 static void
 show_controls(uint32_t controls, uint32_t controls2, const char *before)
 {
-fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
+fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
  before,
  ((controls & CTL_AFTERTEXT) != 0)? " aftertext" : "",
  ((controls & CTL_ALLAFTERTEXT) != 0)? " allaftertext" : "",
@ -3979,10 +3981,11 @@ fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s
  ((controls & CTL_ALLUSEDTEXT) != 0)? " allusedtext" : "",
  ((controls & CTL_ALTGLOBAL) != 0)? " altglobal" : "",
  ((controls & CTL_BINCODE) != 0)? " bincode" : "",
-  ((controls2 & CTL_BSR_SET) != 0)? " bsr" : "",
+  ((controls2 & CTL2_BSR_SET) != 0)? " bsr" : "",
  ((controls & CTL_CALLOUT_CAPTURE) != 0)? " callout_capture" : "",
  ((controls & CTL_CALLOUT_INFO) != 0)? " callout_info" : "",
  ((controls & CTL_CALLOUT_NONE) != 0)? " callout_none" : "",
+  ((controls2 & CTL2_CALLOUT_NO_WHERE) != 0)? " callout_no_where" : "",
  ((controls & CTL_DFA) != 0)? " dfa" : "",
  ((controls & CTL_EXPAND) != 0)? " expand" : "",
  ((controls & CTL_FINDLIMITS) != 0)? " find_limits" : "",
@ -3996,7 +3999,7 @@ fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s
  ((controls & CTL_JITVERIFY) != 0)? " jitverify" : "",
  ((controls & CTL_MARK) != 0)? " mark" : "",
  ((controls & CTL_MEMORY) != 0)? " memory" : "",
-  ((controls2 & CTL_NL_SET) != 0)? " newline" : "",
+  ((controls2 & CTL2_NL_SET) != 0)? " newline" : "",
  ((controls & CTL_NULLCONTEXT) != 0)? " null_context" : "",
  ((controls & CTL_POSIX) != 0)? " posix" : "",
  ((controls & CTL_POSIX_NOSUB) != 0)? " posix_nosub" : "",
@ -4435,7 +4438,7 @@ if ((pat_patctl.control & CTL_INFO) != 0)

  if (jchanged) fprintf(outfile, "Duplicate name status changes\n");

-  if ((pat_patctl.control2 & CTL_BSR_SET) != 0 ||
+  if ((pat_patctl.control2 & CTL2_BSR_SET) != 0 ||
      (FLD(compiled_code, flags) & PCRE2_BSR_SET) != 0)
    fprintf(outfile, "\\R matches %s\n", (bsr_convention == PCRE2_BSR_UNICODE)?
      "any Unicode newline" : "CR, LF, or CRLF");
@ -5268,7 +5271,7 @@ if ((pat_patctl.control & CTL_POSIX) != 0)
  if ((pat_patctl.control & CTL_POSIX_NOSUB) != 0) cflags |= REG_NOSUB;
  if ((pat_patctl.options & PCRE2_UCP) != 0) cflags |= REG_UCP;
  if ((pat_patctl.options & PCRE2_CASELESS) != 0) cflags |= REG_ICASE;
-  if ((pat_patctl.options & PCRE2_LITERAL) != 0) cflags |= REG_NOSPEC; 
+  if ((pat_patctl.options & PCRE2_LITERAL) != 0) cflags |= REG_NOSPEC;
  if ((pat_patctl.options & PCRE2_MULTILINE) != 0) cflags |= REG_NEWLINE;
  if ((pat_patctl.options & PCRE2_DOTALL) != 0) cflags |= REG_DOTALL;
  if ((pat_patctl.options & PCRE2_UNGREEDY) != 0) cflags |= REG_UNGREEDY;
@ -5276,8 +5279,8 @@ if ((pat_patctl.control & CTL_POSIX) != 0)
  if ((pat_patctl.control & (CTL_HEXPAT|CTL_USE_LENGTH)) != 0)
    {
    preg.re_endp = (char *)pbuffer8 + patlen;
-    cflags |= REG_PEND;  
-    }  
+    cflags |= REG_PEND;
+    }

  rc = regcomp(&preg, (char *)pbuffer8, cflags);

@ -5530,7 +5533,7 @@ if (test_mode == PCRE32_MODE && pbuffer32 != NULL)
 appropriate default newline setting, local_newline_default will be non-zero. We
 use this if there is no explicit newline modifier. */

-if ((pat_patctl.control2 & CTL_NL_SET) == 0 && local_newline_default != 0)
+if ((pat_patctl.control2 & CTL2_NL_SET) == 0 && local_newline_default != 0)
  {
  SETFLD(pat_context, newline_convention, local_newline_default);
  }
@ -5540,11 +5543,11 @@ NULL context. */

 use_pat_context = ((pat_patctl.control & CTL_NULLCONTEXT) != 0)?
  NULL : PTR(pat_context);
-  
+
 /* If PCRE2_LITERAL is set, set use_forbid_utf zero because PCRE2_NEVER_UTF
 and PCRE2_NEVER_UCP are invalid with it. */

-if ((pat_patctl.options & PCRE2_LITERAL) != 0) use_forbid_utf = 0; 
+if ((pat_patctl.options & PCRE2_LITERAL) != 0) use_forbid_utf = 0;

 /* Compile many times when timing. */

@ -5556,7 +5559,7 @@ if (timeit > 0)
    {
    clock_t start_time = clock();
    PCRE2_COMPILE(compiled_code, pbuffer, patlen,
-      pat_patctl.options|use_forbid_utf, &errorcode, &erroroffset, 
+      pat_patctl.options|use_forbid_utf, &errorcode, &erroroffset,
        use_pat_context);
    time_taken += clock() - start_time;
    if (TEST(compiled_code, !=, NULL))
@ -5665,7 +5668,7 @@ if (pattern_info(PCRE2_INFO_MAXLOOKBEHIND, &maxlookbehind, FALSE) != 0)
 /* If an explicit newline modifier was given, set the information flag in the
 pattern so that it is preserved over push/pop. */

-if ((pat_patctl.control2 & CTL_NL_SET) != 0)
+if ((pat_patctl.control2 & CTL2_NL_SET) != 0)
  {
  SETFLD(compiled_code, flags, FLD(compiled_code, flags) | PCRE2_NL_SET);
  }
@ -5822,11 +5825,11 @@ return capcount;
 *************************************************/

 /* Called from a PCRE2 library as a result of the (?C) item. We print out where
-we are in the match. Yield zero unless more callouts than the fail count, or
-the callout data is not zero. The only differences in the callout block for
-different code unit widths are that the pointers to the subject, the most
-recent MARK, and a callout argument string point to strings of the appropriate
-width. Casts can be used to deal with this.
+we are in the match (unless suppressed). Yield zero unless more callouts than
+the fail count, or the callout data is not zero. The only differences in the
+callout block for different code unit widths are that the pointers to the
+subject, the most recent MARK, and a callout argument string point to strings
+of the appropriate width. Casts can be used to deal with this.

 Argument:  a pointer to a callout block
 Return:
@ -5839,6 +5842,7 @@ uint32_t i, pre_start, post_start, subject_length;
 PCRE2_SIZE current_position;
 BOOL utf = (FLD(compiled_code, overall_options) & PCRE2_UTF) != 0;
 BOOL callout_capture = (dat_datctl.control & CTL_CALLOUT_CAPTURE) != 0;
+BOOL callout_where = (dat_datctl.control2 & CTL2_CALLOUT_NO_WHERE) == 0;

 /* This FILE is used for echoing the subject. This is done only once in simple
 cases. */
@ -5887,75 +5891,82 @@ if (callout_capture)
    }
  }

-/* Re-print the subject in canonical form (with escapes for non-printing
-characters), the first time, or if giving full details. On subsequent calls in
-the same match, we use PCHARS() just to find the printed lengths of the
-substrings. */
+/* Unless suppressed, re-print the subject in canonical form (with escapes for
+non-printing characters), the first time, or if giving full details. On
+subsequent calls in the same match, we use PCHARS() just to find the printed
+lengths of the substrings. */

-if (f != NULL) fprintf(f, "--->");
-
-/* The subject before the match start. */
-
-PCHARS(pre_start, cb->subject, 0, cb->start_match, utf, f);
-
-/* If a lookbehind is involved, the current position may be earlier than the
-match start. If so, use the match start instead. */
-
-current_position = (cb->current_position >= cb->start_match)?
-  cb->current_position : cb->start_match;
-
-/* The subject between the match start and the current position. */
-
-PCHARS(post_start, cb->subject, cb->start_match,
-  current_position - cb->start_match, utf, f);
-
-/* Print from the current position to the end. */
-
-PCHARSV(cb->subject, current_position, cb->subject_length - current_position,
-  utf, f);
-
-/* Calculate the total subject printed length (no print). */
-
-PCHARS(subject_length, cb->subject, 0, cb->subject_length, utf, NULL);
-
-if (f != NULL) fprintf(f, "\n");
-
-/* For automatic callouts, show the pattern offset. Otherwise, for a numerical
-callout whose number has not already been shown with captured strings, show the
-number here. A callout with a string argument has been displayed above. */
-
-if (cb->callout_number == 255)
+if (callout_where)
  {
-  fprintf(outfile, "%+3d ", (int)cb->pattern_position);
-  if (cb->pattern_position > 99) fprintf(outfile, "\n    ");
-  }
-else
-  {
-  if (callout_capture || cb->callout_string != NULL) fprintf(outfile, "    ");
-    else fprintf(outfile, "%3d ", cb->callout_number);
-  }

-/* Now show position indicators */
+  if (f != NULL) fprintf(f, "--->");

-for (i = 0; i < pre_start; i++) fprintf(outfile, " ");
-fprintf(outfile, "^");
+  /* The subject before the match start. */

-if (post_start > 0)
-  {
-  for (i = 0; i < post_start - 1; i++) fprintf(outfile, " ");
+  PCHARS(pre_start, cb->subject, 0, cb->start_match, utf, f);
+
+  /* If a lookbehind is involved, the current position may be earlier than the
+  match start. If so, use the match start instead. */
+
+  current_position = (cb->current_position >= cb->start_match)?
+    cb->current_position : cb->start_match;
+
+  /* The subject between the match start and the current position. */
+
+  PCHARS(post_start, cb->subject, cb->start_match,
+    current_position - cb->start_match, utf, f);
+
+  /* Print from the current position to the end. */
+
+  PCHARSV(cb->subject, current_position, cb->subject_length - current_position,
+    utf, f);
+
+  /* Calculate the total subject printed length (no print). */
+
+  PCHARS(subject_length, cb->subject, 0, cb->subject_length, utf, NULL);
+
+  if (f != NULL) fprintf(f, "\n");
+
+  /* For automatic callouts, show the pattern offset. Otherwise, for a numerical
+  callout whose number has not already been shown with captured strings, show the
+  number here. A callout with a string argument has been displayed above. */
+
+  if (cb->callout_number == 255)
+    {
+    fprintf(outfile, "%+3d ", (int)cb->pattern_position);
+    if (cb->pattern_position > 99) fprintf(outfile, "\n    ");
+    }
+  else
+    {
+    if (callout_capture || cb->callout_string != NULL) fprintf(outfile, "    ");
+      else fprintf(outfile, "%3d ", cb->callout_number);
+    }
+
+  /* Now show position indicators */
+
+  for (i = 0; i < pre_start; i++) fprintf(outfile, " ");
  fprintf(outfile, "^");
+
+  if (post_start > 0)
+    {
+    for (i = 0; i < post_start - 1; i++) fprintf(outfile, " ");
+    fprintf(outfile, "^");
+    }
+
+  for (i = 0; i < subject_length - pre_start - post_start + 4; i++)
+    fprintf(outfile, " ");
+
+  if (cb->next_item_length != 0)
+    fprintf(outfile, "%.*s", (int)(cb->next_item_length),
+      pbuffer8 + cb->pattern_position);
+
+  fprintf(outfile, "\n");
  }

-for (i = 0; i < subject_length - pre_start - post_start + 4; i++)
-  fprintf(outfile, " ");
-
-if (cb->next_item_length != 0)
-  fprintf(outfile, "%.*s", (int)(cb->next_item_length),
-    pbuffer8 + cb->pattern_position);
-
-fprintf(outfile, "\n");
 first_callout = FALSE;

+/* Show any mark info */
+
 if (cb->mark != last_callout_mark)
  {
  if (cb->mark == NULL)
@ -5969,6 +5980,8 @@ if (cb->mark != last_callout_mark)
  last_callout_mark = cb->mark;
  }

+/* Show callout data */
+
 if (callout_data_ptr != NULL)
  {
  int callout_data = *((int32_t *)callout_data_ptr);
@ -5979,6 +5992,8 @@ if (callout_data_ptr != NULL)
    }
  }

+/* Keep count and give the appropriate return code */
+
 callout_count++;

 if (cb->callout_number == dat_datctl.cerror[0] &&
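In the pcre2test.c changes above, `callout_no_where` is stored as a bit in the second control word and tested once at the top of the callout function. When the bit is set, the subject and position display is skipped, while mark and callout data reporting still run. The following is a minimal sketch of that gating pattern under simplified assumptions: `demo_datctl`, `demo_callout` and the output text are hypothetical stand-ins, and only the bit value 0x00000020u is taken from the diff.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Same bit value as CTL2_CALLOUT_NO_WHERE in the diff; the name is local to
this sketch. */
#define DEMO_CTL2_CALLOUT_NO_WHERE 0x00000020u

/* Hypothetical cut-down control block with two 32-bit control words. */
typedef struct {
  uint32_t control;
  uint32_t control2;
} demo_datctl;

/* Simplified callout: prints the "where" line unless it is suppressed, then
always prints the mark if there is one. Returns the number of lines written
so that the behaviour is easy to check. */
int
demo_callout(const demo_datctl *ctl, FILE *out, const char *subject,
  const char *mark)
{
int lines = 0;
int callout_where = (ctl->control2 & DEMO_CTL2_CALLOUT_NO_WHERE) == 0;

assert(ctl != NULL && out != NULL && subject != NULL);

if (callout_where)
  {
  fprintf(out, "--->%s\n", subject);
  lines++;
  }

/* Not gated: this is reported whether or not the position is shown. */
if (mark != NULL)
  {
  fprintf(out, "mark: %s\n", mark);
  lines++;
  }

return lines;
}
```

Computing the boolean once and wrapping the whole display section in a single `if` keeps the suppressed path from touching any of the position variables, which is the same shape as the re-indented block in the diff.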
--- a/testdata/testinput5
+++ b/testdata/testinput5
@ -6,14 +6,16 @@
 #newline_default lf any anycrlf

 # PCRE2 and Perl disagree about the characteristics of certain Unicode
-# characters. For example, 061C is considered by Perl to be Arabic, though
-# is it not listed as such in the Unicode Scripts.txt file, and 2066-2069 are
-# graphic and printable according to Perl, though they are actually "isolate"
-# control characters. That is why the following tests are here rather than in
-# test 4.
+# characters. For example, 061C was considered by Perl to be Arabic, though
+# it was not listed as such in the Unicode Scripts.txt file for Unicode 8.
+# However, it *is* in that file for Unicode 10, but when I came to re-check,
+# Perl had changed in the meantime, with 5.026 not recognizing it as Arabic.
+
+# 2066-2069 are graphic and printable according to Perl, though they are
+# actually "isolate" control characters. That is why the following tests are
+# here rather than in test 4.

 /^[\p{Arabic}]/utf
-\= Expect no match
    \x{061c}

 /^[[:graph:]]+$/utf,ucp
@ -2022,5 +2024,21 @@

 /Aሴ+B/literal,utf,no_utf_check
    Aሴ+B
+    
+# These are here because I upgraded to Unicode 10.0.0 before Perl did, so it
+# doesn't recognize all these scripts. In time these three tests can be moved
+# to test 4.
+
+/^(\p{Adlam}+)(\p{Bhaiksuki}+)(\p{Marchen}+)(\p{Newa}+)(\p{Osage}+)
+ (\p{Tangut}+)(\p{Masaram_Gondi}+)(\p{Nushu}+)(\p{Soyombo}+)
+ (\p{Zanabazar_Square}+)/x,utf
+    \x{1E900}\x{1E924}\x{1E953}\x{11C00}\x{11C2D}\x{11C3E}\x{11C70}\x{11C77}\x{11CAB}\x{11400}\x{1142F}\x{11455}\x{104B0}\x{104D8}\x{104FB}\x{16FE0}\x{18800}\x{18AF2}\x{11D00}\x{11D3A}\x{11D59}\x{16FE1}\x{1B170}\x{1B2FB}\x{11A50}\x{11A58}\x{11AA2}\x{11A00}\x{11A07}\x{11A47} 
+
+/^\x{1E900}\x{104B0}/i,utf
+    \x{1E900}\x{104B0}
+    \x{1E922}\x{104D8}
+
+/^(?:(\X)(?C))+$/utf
+    \x{1E900}\x{1E924}\x{1E953}\x{11C00}\x{11C2D}\x{11C3E}\x{11C70}\x{11C77}\x{11CAB}\x{11400}\x{1142F}\x{11455}\x{104B0}\x{104D8}\x{104FB}\x{16FE0}\x{18800}\x{18AF2}\x{11D00}\x{11D3A}\x{11D59}\x{16FE1}\x{1B170}\x{1B2FB}\x{11A50}\x{11A58}\x{11AA2}\x{11A00}\x{11A07}\x{11A47}\=callout_capture,callout_no_where 

 # End of testinput5
--- a/testdata/testoutput5
+++ b/testdata/testoutput5
@ -6,16 +6,18 @@
 #newline_default lf any anycrlf

 # PCRE2 and Perl disagree about the characteristics of certain Unicode
-# characters. For example, 061C is considered by Perl to be Arabic, though
-# is it not listed as such in the Unicode Scripts.txt file, and 2066-2069 are
-# graphic and printable according to Perl, though they are actually "isolate"
-# control characters. That is why the following tests are here rather than in
-# test 4.
+# characters. For example, 061C was considered by Perl to be Arabic, though
+# it was not listed as such in the Unicode Scripts.txt file for Unicode 8.
+# However, it *is* in that file for Unicode 10, but when I came to re-check,
+# Perl had changed in the meantime, with 5.026 not recognizing it as Arabic.
+
+# 2066-2069 are graphic and printable according to Perl, though they are
+# actually "isolate" control characters. That is why the following tests are
+# here rather than in test 4.

 /^[\p{Arabic}]/utf
-\= Expect no match
    \x{061c}
-No match
+ 0: \x{61c}

 /^[[:graph:]]+$/utf,ucp
 \= Expect no match
@ -4585,5 +4587,84 @@ No match
 /Aሴ+B/literal,utf,no_utf_check
    Aሴ+B
 0: A\x{1234}+B
+    
+# These are here because I upgraded to Unicode 10.0.0 before Perl did, so it
+# doesn't recognize all these scripts. In time these three tests can be moved
+# to test 4.
+
+/^(\p{Adlam}+)(\p{Bhaiksuki}+)(\p{Marchen}+)(\p{Newa}+)(\p{Osage}+)
+ (\p{Tangut}+)(\p{Masaram_Gondi}+)(\p{Nushu}+)(\p{Soyombo}+)
+ (\p{Zanabazar_Square}+)/x,utf
+    \x{1E900}\x{1E924}\x{1E953}\x{11C00}\x{11C2D}\x{11C3E}\x{11C70}\x{11C77}\x{11CAB}\x{11400}\x{1142F}\x{11455}\x{104B0}\x{104D8}\x{104FB}\x{16FE0}\x{18800}\x{18AF2}\x{11D00}\x{11D3A}\x{11D59}\x{16FE1}\x{1B170}\x{1B2FB}\x{11A50}\x{11A58}\x{11AA2}\x{11A00}\x{11A07}\x{11A47} 
+ 0: \x{1e900}\x{1e924}\x{1e953}\x{11c00}\x{11c2d}\x{11c3e}\x{11c70}\x{11c77}\x{11cab}\x{11400}\x{1142f}\x{11455}\x{104b0}\x{104d8}\x{104fb}\x{16fe0}\x{18800}\x{18af2}\x{11d00}\x{11d3a}\x{11d59}\x{16fe1}\x{1b170}\x{1b2fb}\x{11a50}\x{11a58}\x{11aa2}\x{11a00}\x{11a07}\x{11a47}
+ 1: \x{1e900}\x{1e924}\x{1e953}
+ 2: \x{11c00}\x{11c2d}\x{11c3e}
+ 3: \x{11c70}\x{11c77}\x{11cab}
+ 4: \x{11400}\x{1142f}\x{11455}
+ 5: \x{104b0}\x{104d8}\x{104fb}
+ 6: \x{16fe0}\x{18800}\x{18af2}
+ 7: \x{11d00}\x{11d3a}\x{11d59}
+ 8: \x{16fe1}\x{1b170}\x{1b2fb}
+ 9: \x{11a50}\x{11a58}\x{11aa2}
+10: \x{11a00}\x{11a07}\x{11a47}
+
+/^\x{1E900}\x{104B0}/i,utf
+    \x{1E900}\x{104B0}
+ 0: \x{1e900}\x{104b0}
+    \x{1E922}\x{104D8}
+ 0: \x{1e922}\x{104d8}
+
+/^(?:(\X)(?C))+$/utf
+    \x{1E900}\x{1E924}\x{1E953}\x{11C00}\x{11C2D}\x{11C3E}\x{11C70}\x{11C77}\x{11CAB}\x{11400}\x{1142F}\x{11455}\x{104B0}\x{104D8}\x{104FB}\x{16FE0}\x{18800}\x{18AF2}\x{11D00}\x{11D3A}\x{11D59}\x{16FE1}\x{1B170}\x{1B2FB}\x{11A50}\x{11A58}\x{11AA2}\x{11A00}\x{11A07}\x{11A47}\=callout_capture,callout_no_where 
+Callout 0: last capture = 1
+ 1: \x{1e900}
+Callout 0: last capture = 1
+ 1: \x{1e924}
+Callout 0: last capture = 1
+ 1: \x{1e953}
+Callout 0: last capture = 1
+ 1: \x{11c00}
+Callout 0: last capture = 1
+ 1: \x{11c2d}\x{11c3e}
+Callout 0: last capture = 1
+ 1: \x{11c70}
+Callout 0: last capture = 1
+ 1: \x{11c77}\x{11cab}
+Callout 0: last capture = 1
+ 1: \x{11400}
+Callout 0: last capture = 1
+ 1: \x{1142f}
+Callout 0: last capture = 1
+ 1: \x{11455}
+Callout 0: last capture = 1
+ 1: \x{104b0}
+Callout 0: last capture = 1
+ 1: \x{104d8}
+Callout 0: last capture = 1
+ 1: \x{104fb}
+Callout 0: last capture = 1
+ 1: \x{16fe0}
+Callout 0: last capture = 1
+ 1: \x{18800}
+Callout 0: last capture = 1
+ 1: \x{18af2}
+Callout 0: last capture = 1
+ 1: \x{11d00}\x{11d3a}
+Callout 0: last capture = 1
+ 1: \x{11d59}
+Callout 0: last capture = 1
+ 1: \x{16fe1}
+Callout 0: last capture = 1
+ 1: \x{1b170}
+Callout 0: last capture = 1
+ 1: \x{1b2fb}
+Callout 0: last capture = 1
+ 1: \x{11a50}\x{11a58}
+Callout 0: last capture = 1
+ 1: \x{11aa2}
+Callout 0: last capture = 1
+ 1: \x{11a00}\x{11a07}\x{11a47}
+ 0: \x{1e900}\x{1e924}\x{1e953}\x{11c00}\x{11c2d}\x{11c3e}\x{11c70}\x{11c77}\x{11cab}\x{11400}\x{1142f}\x{11455}\x{104b0}\x{104d8}\x{104fb}\x{16fe0}\x{18800}\x{18af2}\x{11d00}\x{11d3a}\x{11d59}\x{16fe1}\x{1b170}\x{1b2fb}\x{11a50}\x{11a58}\x{11aa2}\x{11a00}\x{11a07}\x{11a47}
+ 1: \x{11a00}\x{11a07}\x{11a47}

 # End of testinput5