File tidies for 10.35-RC1 release candidate.

This commit is contained in:
Philip.Hazel 2020-04-15 16:34:36 +00:00
parent cf670e3bb9
commit 8b3f8af535
33 changed files with 232 additions and 207 deletions

View File

@ -8,7 +8,7 @@ Email domain: cam.ac.uk
University of Cambridge Computing Service, University of Cambridge Computing Service,
Cambridge, England. Cambridge, England.
Copyright (c) 1997-2019 University of Cambridge Copyright (c) 1997-2020 University of Cambridge
All rights reserved All rights reserved
@ -19,7 +19,7 @@ Written by: Zoltan Herczeg
Email local part: hzmester Email local part: hzmester
Emain domain: freemail.hu Emain domain: freemail.hu
Copyright(c) 2010-2019 Zoltan Herczeg Copyright(c) 2010-2020 Zoltan Herczeg
All rights reserved. All rights reserved.
@ -30,7 +30,7 @@ Written by: Zoltan Herczeg
Email local part: hzmester Email local part: hzmester
Emain domain: freemail.hu Emain domain: freemail.hu
Copyright(c) 2009-2019 Zoltan Herczeg Copyright(c) 2009-2020 Zoltan Herczeg
All rights reserved. All rights reserved.
#### ####

View File

@ -1,8 +1,8 @@
Change Log for PCRE2 Change Log for PCRE2
-------------------- --------------------
Version 10.35 Version 10.35 15-April-2020
------------- ---------------------------
1. Use PCRE2_MATCH_EMPTY flag to detect empty matches in JIT. 1. Use PCRE2_MATCH_EMPTY flag to detect empty matches in JIT.
@ -45,19 +45,19 @@ the minimum.
12. The JIT stack should be freed when the low-level stack allocation fails. 12. The JIT stack should be freed when the low-level stack allocation fails.
13. In pcre2grep, if the final line in a scanned file is output but does not 13. In pcre2grep, if the final line in a scanned file is output but does not
end with a newline sequence, add a newline according to the --newline setting. end with a newline sequence, add a newline according to the --newline setting.
14. (?(DEFINE)...) groups were not being handled correctly when checking for 14. (?(DEFINE)...) groups were not being handled correctly when checking for
the fixed length of a lookbehind assertion. Such a group within a lookbehind the fixed length of a lookbehind assertion. Such a group within a lookbehind
should be skipped, as it does not contribute to the length of the group. should be skipped, as it does not contribute to the length of the group.
Instead, the (DEFINE) group was being processed, and if at the end of the Instead, the (DEFINE) group was being processed, and if at the end of the
lookbehind, that end was not correctly recognized. Errors such as "lookbehind lookbehind, that end was not correctly recognized. Errors such as "lookbehind
assertion is not fixed length" and also "internal error: bad code value in assertion is not fixed length" and also "internal error: bad code value in
parsed_skip()" could result. parsed_skip()" could result.
15. Put a limit of 1000 on recursive calls in pcre2_study() when searching 15. Put a limit of 1000 on recursive calls in pcre2_study() when searching
nested groups for starting code units, in order to avoid stack overflow issues. nested groups for starting code units, in order to avoid stack overflow issues.
If the limit is reached, it just gives up trying for this optimization. If the limit is reached, it just gives up trying for this optimization.
16. The control verb chain list must always be restored when exiting from a 16. The control verb chain list must always be restored when exiting from a
@ -66,9 +66,9 @@ recurse function in JIT.
17. Fix a crash which occurs when the character type of an invalid UTF 17. Fix a crash which occurs when the character type of an invalid UTF
character is decoded in JIT. character is decoded in JIT.
18. Changes in many areas of the code so that when Unicode is supported and 18. Changes in many areas of the code so that when Unicode is supported and
PCRE2_UCP is set without PCRE2_UTF, Unicode character properties are used for PCRE2_UCP is set without PCRE2_UTF, Unicode character properties are used for
upper/lower case computations on characters whose code points are greater than upper/lower case computations on characters whose code points are greater than
127. 127.
19. The function for checking UTF-16 validity was returning an incorrect offset 19. The function for checking UTF-16 validity was returning an incorrect offset
@ -77,24 +77,24 @@ low surrogate. This caused incorrect behaviour, for example when
PCRE2_MATCH_INVALID_UTF was set and a match started immediately following the PCRE2_MATCH_INVALID_UTF was set and a match started immediately following the
invalid high surrogate, such as /aa/ matching "\x{d800}aa". invalid high surrogate, such as /aa/ matching "\x{d800}aa".
20. If a DEFINE group immediately preceded a lookbehind assertion, the pattern 20. If a DEFINE group immediately preceded a lookbehind assertion, the pattern
could be mis-compiled and therefore not match correctly. This is the example could be mis-compiled and therefore not match correctly. This is the example
that found this: /(?(DEFINE)(?<foo>bar))(?<![-a-z0-9])word/ which failed to that found this: /(?(DEFINE)(?<foo>bar))(?<![-a-z0-9])word/ which failed to
match "word" because the "move back" value was set to zero. match "word" because the "move back" value was set to zero.
21. Following a request from a user, some extensions and tidies to the 21. Following a request from a user, some extensions and tidies to the
character tables handling have been done: character tables handling have been done:
(a) The dftables auxiliary program is renamed pcre2_dftables, but it is still (a) The dftables auxiliary program is renamed pcre2_dftables, but it is still
not installed for public use. not installed for public use.
(b) There is now a -b option for pcre2_dftables, which causes the tables to (b) There is now a -b option for pcre2_dftables, which causes the tables to
be written in binary. There is also a -help option. be written in binary. There is also a -help option.
(c) PCRE2_CONFIG_TABLES_LENGTH is added to pcre2_config() so that an (c) PCRE2_CONFIG_TABLES_LENGTH is added to pcre2_config() so that an
application that wants to save tables in binary knows how long they are. application that wants to save tables in binary knows how long they are.
22. Changed setting of CMAKE_MODULE_PATH in CMakeLists.txt from SET to 22. Changed setting of CMAKE_MODULE_PATH in CMakeLists.txt from SET to
LIST(APPEND...) to allow a setting from the command line to be included. LIST(APPEND...) to allow a setting from the command line to be included.
23. Updated to Unicode 13.0.0. 23. Updated to Unicode 13.0.0.

View File

@ -26,7 +26,7 @@ Email domain: cam.ac.uk
University of Cambridge Computing Service, University of Cambridge Computing Service,
Cambridge, England. Cambridge, England.
Copyright (c) 1997-2019 University of Cambridge Copyright (c) 1997-2020 University of Cambridge
All rights reserved. All rights reserved.
@ -37,7 +37,7 @@ Written by: Zoltan Herczeg
Email local part: hzmester Email local part: hzmester
Email domain: freemail.hu Email domain: freemail.hu
Copyright(c) 2010-2019 Zoltan Herczeg Copyright(c) 2010-2020 Zoltan Herczeg
All rights reserved. All rights reserved.
@ -48,7 +48,7 @@ Written by: Zoltan Herczeg
Email local part: hzmester Email local part: hzmester
Email domain: freemail.hu Email domain: freemail.hu
Copyright(c) 2009-2019 Zoltan Herczeg Copyright(c) 2009-2020 Zoltan Herczeg
All rights reserved. All rights reserved.

21
NEWS
View File

@ -2,6 +2,27 @@ News about PCRE2 releases
------------------------- -------------------------
Version 10.35 15-April-2020
---------------------------
Bugfixes, tidies, and a few new enhancements.
1. Capturing groups that contain recursive backreferences to themselves are no
longer automatically atomic, because the restriction is no longer necessary
as a result of the 10.30 restructuring.
2. Several new options for pcre2_substitute().
3. When Unicode is supported and PCRE2_UCP is set without PCRE2_UTF, Unicode
character properties are used for upper/lower case computations on characters
whose code points are greater than 127.
4. The character tabless (for low-valued characters) can now more easily be
saved and restored in binary.
5. Updated to Unicode 13.0.0.
Version 10.34 21-November-2019 Version 10.34 21-November-2019
------------------------------ ------------------------------

4
README
View File

@ -562,7 +562,7 @@ not be a problem.
If you need to modify the character tables when cross-compiling, you should If you need to modify the character tables when cross-compiling, you should
move pcre2_chartables.c.dist out of the way, then compile pcre2_dftables.c by move pcre2_chartables.c.dist out of the way, then compile pcre2_dftables.c by
hand and run it on the local host to make a new version of hand and run it on the local host to make a new version of
pcre2_chartables.c.dist. See the pcre2build section "Creating character tables pcre2_chartables.c.dist. See the pcre2build section "Creating character tables
at build time" for more details. at build time" for more details.
@ -753,7 +753,7 @@ that represent character classes for code points less than 256. The final
1 white space character 1 white space character
2 letter 2 letter
4 lower case letter 4 lower case letter
8 decimal digit 8 decimal digit
16 alphanumeric or '_' 16 alphanumeric or '_'

View File

@ -11,15 +11,15 @@ dnl be defined as -RC2, for example. For real releases, it should be empty.
m4_define(pcre2_major, [10]) m4_define(pcre2_major, [10])
m4_define(pcre2_minor, [35]) m4_define(pcre2_minor, [35])
m4_define(pcre2_prerelease, [-RC1]) m4_define(pcre2_prerelease, [-RC1])
m4_define(pcre2_date, [2019-11-27]) m4_define(pcre2_date, [2020-04-15])
# NOTE: The CMakeLists.txt file searches for the above variables in the first # NOTE: The CMakeLists.txt file searches for the above variables in the first
# 50 lines of this file. Please update that if the variables above are moved. # 50 lines of this file. Please update that if the variables above are moved.
# Libtool shared library interface versions (current:revision:age) # Libtool shared library interface versions (current:revision:age)
m4_define(libpcre2_8_version, [9:0:9]) m4_define(libpcre2_8_version, [10:0:10])
m4_define(libpcre2_16_version, [9:0:9]) m4_define(libpcre2_16_version, [10:0:10])
m4_define(libpcre2_32_version, [9:0:9]) m4_define(libpcre2_32_version, [10:0:10])
m4_define(libpcre2_posix_version, [2:3:0]) m4_define(libpcre2_posix_version, [2:3:0])
AC_PREREQ(2.57) AC_PREREQ(2.57)

View File

@ -562,7 +562,7 @@ not be a problem.
If you need to modify the character tables when cross-compiling, you should If you need to modify the character tables when cross-compiling, you should
move pcre2_chartables.c.dist out of the way, then compile pcre2_dftables.c by move pcre2_chartables.c.dist out of the way, then compile pcre2_dftables.c by
hand and run it on the local host to make a new version of hand and run it on the local host to make a new version of
pcre2_chartables.c.dist. See the pcre2build section "Creating character tables pcre2_chartables.c.dist. See the pcre2build section "Creating character tables
at build time" for more details. at build time" for more details.
@ -753,7 +753,7 @@ that represent character classes for code points less than 256. The final
1 white space character 1 white space character
2 letter 2 letter
4 lower case letter 4 lower case letter
8 decimal digit 8 decimal digit
16 alphanumeric or '_' 16 alphanumeric or '_'

View File

@ -28,11 +28,11 @@ DESCRIPTION
<P> <P>
This function sets a pointer to custom character tables within a compile This function sets a pointer to custom character tables within a compile
context. The second argument must point to a set of PCRE2 character tables or context. The second argument must point to a set of PCRE2 character tables or
be NULL to request the default tables. The result is always zero. Character be NULL to request the default tables. The result is always zero. Character
tables can be created by calling <b>pcre2_maketables()</b> or by running the tables can be created by calling <b>pcre2_maketables()</b> or by running the
<b>pcre2_dftables</b> maintenance command in binary mode (see the <b>pcre2_dftables</b> maintenance command in binary mode (see the
<a href="pcre2build.html"><b>pcre2build</b></a> <a href="pcre2build.html"><b>pcre2build</b></a>
documentation). documentation).
</P> </P>
<P> <P>
There is a complete description of the PCRE2 native API in the There is a complete description of the PCRE2 native API in the

View File

@ -82,7 +82,7 @@ zero-terminated strings. The options are:
PCRE2_SUBSTITUTE_LITERAL The replacement string is literal PCRE2_SUBSTITUTE_LITERAL The replacement string is literal
PCRE2_SUBSTITUTE_MATCHED Use pre-existing match data for 1st match PCRE2_SUBSTITUTE_MATCHED Use pre-existing match data for 1st match
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH If overflow, compute needed length PCRE2_SUBSTITUTE_OVERFLOW_LENGTH If overflow, compute needed length
PCRE2_SUBSTITUTE_REPLACEMENT_ONLY Return only replacement string(s) PCRE2_SUBSTITUTE_REPLACEMENT_ONLY Return only replacement string(s)
PCRE2_SUBSTITUTE_UNKNOWN_UNSET Treat unknown group as unset PCRE2_SUBSTITUTE_UNKNOWN_UNSET Treat unknown group as unset
PCRE2_SUBSTITUTE_UNSET_EMPTY Simple unset insert = empty string PCRE2_SUBSTITUTE_UNSET_EMPTY Simple unset insert = empty string
</pre> </pre>

View File

@ -1228,7 +1228,7 @@ uint32_t integer that is always set to zero.
<pre> <pre>
PCRE2_CONFIG_TABLES_LENGTH PCRE2_CONFIG_TABLES_LENGTH
</pre> </pre>
The output is a uint32_t integer that gives the length of PCRE2's character The output is a uint32_t integer that gives the length of PCRE2's character
processing tables in bytes. For details of these tables see the processing tables in bytes. For details of these tables see the
<a href="#localesupport">section on locale support</a> <a href="#localesupport">section on locale support</a>
below. below.
@ -1489,7 +1489,7 @@ documentation.
</pre> </pre>
If this bit is set, letters in the pattern match both upper and lower case If this bit is set, letters in the pattern match both upper and lower case
letters in the subject. It is equivalent to Perl's /i option, and it can be letters in the subject. It is equivalent to Perl's /i option, and it can be
changed within a pattern by a (?i) option setting. If either PCRE2_UTF or changed within a pattern by a (?i) option setting. If either PCRE2_UTF or
PCRE2_UCP is set, Unicode properties are used for all characters with more than PCRE2_UCP is set, Unicode properties are used for all characters with more than
one other case, and for all characters whose code points are greater than one other case, and for all characters whose code points are greater than
U+007F. For lower valued characters with only one other case, a lookup table is U+007F. For lower valued characters with only one other case, a lookup table is
@ -1837,7 +1837,7 @@ the section on
in the in the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a> <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
page. If you set PCRE2_UCP, matching one of the items it affects takes much page. If you set PCRE2_UCP, matching one of the items it affects takes much
longer. longer.
</P> </P>
<P> <P>
The second effect of PCRE2_UCP is to force the use of Unicode properties for The second effect of PCRE2_UCP is to force the use of Unicode properties for
@ -2012,10 +2012,10 @@ PCRE2 handles caseless matching, and determines whether characters are letters,
digits, or whatever, by reference to a set of tables, indexed by character code digits, or whatever, by reference to a set of tables, indexed by character code
point. However, this applies only to characters whose code points are less than point. However, this applies only to characters whose code points are less than
256. By default, higher-valued code points never match escapes such as \w or 256. By default, higher-valued code points never match escapes such as \w or
\d. \d.
</P> </P>
<P> <P>
When PCRE2 is built with Unicode support (the default), the Unicode properties When PCRE2 is built with Unicode support (the default), the Unicode properties
of all characters can be tested with \p and \P, or, alternatively, the of all characters can be tested with \p and \P, or, alternatively, the
PCRE2_UCP option can be set when a pattern is compiled; this causes \w and PCRE2_UCP option can be set when a pattern is compiled; this causes \w and
friends to use Unicode property support instead of the built-in tables. friends to use Unicode property support instead of the built-in tables.
@ -3532,8 +3532,8 @@ terminating a \Q quoted sequence) reverts to no case forcing. The sequences
\u and \l force the next character (if it is a letter) to upper or lower \u and \l force the next character (if it is a letter) to upper or lower
case, respectively, and then the state automatically reverts to no case case, respectively, and then the state automatically reverts to no case
forcing. Case forcing applies to all inserted characters, including those from forcing. Case forcing applies to all inserted characters, including those from
capture groups and letters within \Q...\E quoted sequences. If either capture groups and letters within \Q...\E quoted sequences. If either
PCRE2_UTF or PCRE2_UCP was set when the pattern was compiled, Unicode PCRE2_UTF or PCRE2_UCP was set when the pattern was compiled, Unicode
properties are used for case forcing characters whose code points are greater properties are used for case forcing characters whose code points are greater
than 127. than 127.
</P> </P>

View File

@ -342,23 +342,23 @@ host and therefore not compiled with the cross compiler.
If you need to create alternative tables when cross compiling, you will have to If you need to create alternative tables when cross compiling, you will have to
do so "by hand". There may also be other reasons for creating tables manually. do so "by hand". There may also be other reasons for creating tables manually.
To cause <b>pcre2_dftables</b> to be built on the local host, run a normal To cause <b>pcre2_dftables</b> to be built on the local host, run a normal
compiling command, and then run the program with the output file as its compiling command, and then run the program with the output file as its
argument, for example: argument, for example:
<pre> <pre>
cc src/pcre2_dftables.c -o pcre2_dftables cc src/pcre2_dftables.c -o pcre2_dftables
./pcre2_dftables src/pcre2_chartables.c ./pcre2_dftables src/pcre2_chartables.c
</pre> </pre>
This builds the tables in the default locale of the local host. If you want to This builds the tables in the default locale of the local host. If you want to
specify a locale, you must use the -L option: specify a locale, you must use the -L option:
<pre> <pre>
LC_ALL=fr_FR ./pcre2_dftables -L src/pcre2_chartables.c LC_ALL=fr_FR ./pcre2_dftables -L src/pcre2_chartables.c
</pre> </pre>
You can also specify -b (with or without -L). This causes the tables to be You can also specify -b (with or without -L). This causes the tables to be
written in binary instead of as source code. A set of binary tables can be written in binary instead of as source code. A set of binary tables can be
loaded into memory by an application and passed to <b>pcre2_compile()</b> in the loaded into memory by an application and passed to <b>pcre2_compile()</b> in the
same way as tables created by calling <b>pcre2_maketables()</b>. The tables are same way as tables created by calling <b>pcre2_maketables()</b>. The tables are
just a string of bytes, independent of hardware characteristics such as just a string of bytes, independent of hardware characteristics such as
endianness. This means they can be bundled with an application that runs in endianness. This means they can be bundled with an application that runs in
different environments, to ensure consistent behaviour. different environments, to ensure consistent behaviour.
</P> </P>
<br><a name="SEC13" href="#TOC1">USING EBCDIC CODE</a><br> <br><a name="SEC13" href="#TOC1">USING EBCDIC CODE</a><br>

View File

@ -647,7 +647,7 @@ use of JIT at run time. It is provided for testing and working round problems.
It should never be needed in normal use. It should never be needed in normal use.
</P> </P>
<P> <P>
<b>-O</b> <i>text</i>, <b>--output</b>=<i>text</i> <b>-O</b> <i>text</i>, <b>--output</b>=<i>text</i>
When there is a match, instead of outputting the whole line that matched, When there is a match, instead of outputting the whole line that matched,
output just the given text, followed by an operating-system standard newline. output just the given text, followed by an operating-system standard newline.
The <b>--newline</b> option has no effect on this option, which is mutually The <b>--newline</b> option has no effect on this option, which is mutually

View File

@ -114,7 +114,7 @@ Another special sequence that may appear at the start of a pattern is (*UCP).
This has the same effect as setting the PCRE2_UCP option: it causes sequences This has the same effect as setting the PCRE2_UCP option: it causes sequences
such as \d and \w to use Unicode properties to determine character types, such as \d and \w to use Unicode properties to determine character types,
instead of recognizing only characters with codes less than 256 via a lookup instead of recognizing only characters with codes less than 256 via a lookup
table. If also causes upper/lower casing operations to use Unicode properties table. If also causes upper/lower casing operations to use Unicode properties
for characters with code points greater than 127, even when UTF is not set. for characters with code points greater than 127, even when UTF is not set.
</P> </P>
<P> <P>
@ -2664,8 +2664,8 @@ as before because nothing has changed, so using a non-atomic assertion just
wastes resources. wastes resources.
</P> </P>
<P> <P>
There is one exception to backtracking into a non-atomic assertion. If an There is one exception to backtracking into a non-atomic assertion. If an
(*ACCEPT) control verb is triggered, the assertion succeeds atomically. That (*ACCEPT) control verb is triggered, the assertion succeeds atomically. That
is, a subsequent match failure cannot backtrack into the assertion. is, a subsequent match failure cannot backtrack into the assertion.
</P> </P>
<P> <P>

View File

@ -379,7 +379,7 @@ described in the section entitled "Saving and restoring compiled patterns"
#loadtables &#60;filename&#62; #loadtables &#60;filename&#62;
</pre> </pre>
This command is used to load a set of binary character tables that can be This command is used to load a set of binary character tables that can be
accessed by the tables=3 qualifier. Such tables can be created by the accessed by the tables=3 qualifier. Such tables can be created by the
<b>pcre2_dftables</b> program with the -b option. <b>pcre2_dftables</b> program with the -b option.
<pre> <pre>
#newline_default [&#60;newline-list&#62;] #newline_default [&#60;newline-list&#62;]
@ -1041,7 +1041,7 @@ with different character tables. The digit specifies the tables as follows:
1 the default ASCII tables, as distributed in 1 the default ASCII tables, as distributed in
pcre2_chartables.c.dist pcre2_chartables.c.dist
2 a set of tables defining ISO 8859 characters 2 a set of tables defining ISO 8859 characters
3 a set of tables loaded by the #loadtables command 3 a set of tables loaded by the #loadtables command
</pre> </pre>
In tables 2, some characters whose codes are greater than 128 are identified as In tables 2, some characters whose codes are greater than 128 are identified as
letters, digits, spaces, etc. Tables 3 can be used only after a letters, digits, spaces, etc. Tables 3 can be used only after a
@ -1072,9 +1072,9 @@ process.
substitute_callout use substitution callouts substitute_callout use substitution callouts
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
substitute_literal use PCRE2_SUBSTITUTE_LITERAL substitute_literal use PCRE2_SUBSTITUTE_LITERAL
substitute_matched use PCRE2_SUBSTITUTE_MATCHED substitute_matched use PCRE2_SUBSTITUTE_MATCHED
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
substitute_skip=&#60;n&#62; skip substitution &#60;n&#62; substitute_skip=&#60;n&#62; skip substitution &#60;n&#62;
substitute_stop=&#60;n&#62; skip substitution &#60;n&#62; and following substitute_stop=&#60;n&#62; skip substitution &#60;n&#62; and following
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
@ -1244,10 +1244,10 @@ pattern.
startoffset=&#60;n&#62; same as offset=&#60;n&#62; startoffset=&#60;n&#62; same as offset=&#60;n&#62;
substitute_callout use substitution callouts substitute_callout use substitution callouts
substitute_extedded use PCRE2_SUBSTITUTE_EXTENDED substitute_extedded use PCRE2_SUBSTITUTE_EXTENDED
substitute_literal use PCRE2_SUBSTITUTE_LITERAL substitute_literal use PCRE2_SUBSTITUTE_LITERAL
substitute_matched use PCRE2_SUBSTITUTE_MATCHED substitute_matched use PCRE2_SUBSTITUTE_MATCHED
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
substitute_skip=&#60;n&#62; skip substitution number n substitute_skip=&#60;n&#62; skip substitution number n
substitute_stop=&#60;n&#62; skip substitution number n and greater substitute_stop=&#60;n&#62; skip substitution number n and greater
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
@ -1409,7 +1409,7 @@ Testing the substitution function
</b><br> </b><br>
<P> <P>
If the <b>replace</b> modifier is set, the <b>pcre2_substitute()</b> function is If the <b>replace</b> modifier is set, the <b>pcre2_substitute()</b> function is
called instead of one of the matching functions (or after one call of called instead of one of the matching functions (or after one call of
<b>pcre2_match()</b> in the case of PCRE2_SUBSTITUTE_MATCHED). Note that <b>pcre2_match()</b> in the case of PCRE2_SUBSTITUTE_MATCHED). Note that
replacement strings cannot contain commas, because a comma signifies the end of replacement strings cannot contain commas, because a comma signifies the end of
a modifier. This is not thought to be an issue in a test program. a modifier. This is not thought to be an issue in a test program.
@ -1428,10 +1428,10 @@ for <b>pcre2_substitute()</b>:
<pre> <pre>
global PCRE2_SUBSTITUTE_GLOBAL global PCRE2_SUBSTITUTE_GLOBAL
substitute_extended PCRE2_SUBSTITUTE_EXTENDED substitute_extended PCRE2_SUBSTITUTE_EXTENDED
substitute_literal PCRE2_SUBSTITUTE_LITERAL substitute_literal PCRE2_SUBSTITUTE_LITERAL
substitute_matched PCRE2_SUBSTITUTE_MATCHED substitute_matched PCRE2_SUBSTITUTE_MATCHED
substitute_overflow_length PCRE2_SUBSTITUTE_OVERFLOW_LENGTH substitute_overflow_length PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
substitute_replacement_only PCRE2_SUBSTITUTE_REPLACEMENT_ONLY substitute_replacement_only PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
substitute_unknown_unset PCRE2_SUBSTITUTE_UNKNOWN_UNSET substitute_unknown_unset PCRE2_SUBSTITUTE_UNKNOWN_UNSET
substitute_unset_empty PCRE2_SUBSTITUTE_UNSET_EMPTY substitute_unset_empty PCRE2_SUBSTITUTE_UNSET_EMPTY
</pre> </pre>

View File

@ -142,7 +142,7 @@ of Unicode properties except for characters whose code points are less than 128
and that have at most two case-equivalent values. For these, a direct table and that have at most two case-equivalent values. For these, a direct table
lookup is used for speed. A few Unicode characters such as Greek sigma have lookup is used for speed. A few Unicode characters such as Greek sigma have
more than two code points that are case-equivalent, and these are treated more than two code points that are case-equivalent, and these are treated
specially. Setting PCRE2_UCP without PCRE2_UTF allows Unicode-style case specially. Setting PCRE2_UCP without PCRE2_UTF allows Unicode-style case
processing for non-UTF character encodings such as UCS-2. processing for non-UTF character encodings such as UCS-2.
<a name="scriptruns"></a></P> <a name="scriptruns"></a></P>
<br><b> <br><b>

View File

@ -180,8 +180,8 @@ REVISION
Last updated: 17 September 2018 Last updated: 17 September 2018
Copyright (c) 1997-2018 University of Cambridge. Copyright (c) 1997-2018 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------
PCRE2API(3) Library Functions Manual PCRE2API(3) PCRE2API(3) Library Functions Manual PCRE2API(3)
@ -3796,8 +3796,8 @@ REVISION
Last updated: 19 March 2020 Last updated: 19 March 2020
Copyright (c) 1997-2020 University of Cambridge. Copyright (c) 1997-2020 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------
PCRE2BUILD(3) Library Functions Manual PCRE2BUILD(3) PCRE2BUILD(3) Library Functions Manual PCRE2BUILD(3)
@ -4390,8 +4390,8 @@ REVISION
Last updated: 20 March 2020 Last updated: 20 March 2020
Copyright (c) 1997-2020 University of Cambridge. Copyright (c) 1997-2020 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------
PCRE2CALLOUT(3) Library Functions Manual PCRE2CALLOUT(3) PCRE2CALLOUT(3) Library Functions Manual PCRE2CALLOUT(3)
@ -4820,8 +4820,8 @@ REVISION
Last updated: 03 February 2019 Last updated: 03 February 2019
Copyright (c) 1997-2019 University of Cambridge. Copyright (c) 1997-2019 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------
PCRE2COMPAT(3) Library Functions Manual PCRE2COMPAT(3) PCRE2COMPAT(3) Library Functions Manual PCRE2COMPAT(3)
@ -5029,8 +5029,8 @@ REVISION
Last updated: 13 July 2019 Last updated: 13 July 2019
Copyright (c) 1997-2019 University of Cambridge. Copyright (c) 1997-2019 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------
PCRE2JIT(3) Library Functions Manual PCRE2JIT(3) PCRE2JIT(3) Library Functions Manual PCRE2JIT(3)
@ -5454,8 +5454,8 @@ REVISION
Last updated: 23 May 2019 Last updated: 23 May 2019
Copyright (c) 1997-2019 University of Cambridge. Copyright (c) 1997-2019 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------
PCRE2LIMITS(3) Library Functions Manual PCRE2LIMITS(3) PCRE2LIMITS(3) Library Functions Manual PCRE2LIMITS(3)
@ -5524,8 +5524,8 @@ REVISION
Last updated: 02 February 2019 Last updated: 02 February 2019
Copyright (c) 1997-2019 University of Cambridge. Copyright (c) 1997-2019 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------
PCRE2MATCHING(3) Library Functions Manual PCRE2MATCHING(3) PCRE2MATCHING(3) Library Functions Manual PCRE2MATCHING(3)
@ -5748,8 +5748,8 @@ REVISION
Last updated: 23 May 2019 Last updated: 23 May 2019
Copyright (c) 1997-2019 University of Cambridge. Copyright (c) 1997-2019 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------
PCRE2PARTIAL(3) Library Functions Manual PCRE2PARTIAL(3) PCRE2PARTIAL(3) Library Functions Manual PCRE2PARTIAL(3)
@ -6128,8 +6128,8 @@ REVISION
Last updated: 04 September 2019 Last updated: 04 September 2019
Copyright (c) 1997-2019 University of Cambridge. Copyright (c) 1997-2019 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------
PCRE2PATTERN(3) Library Functions Manual PCRE2PATTERN(3) PCRE2PATTERN(3) Library Functions Manual PCRE2PATTERN(3)
@ -9562,8 +9562,8 @@ REVISION
Last updated: 24 February 2020 Last updated: 24 February 2020
Copyright (c) 1997-2020 University of Cambridge. Copyright (c) 1997-2020 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------
PCRE2PERFORM(3) Library Functions Manual PCRE2PERFORM(3) PCRE2PERFORM(3) Library Functions Manual PCRE2PERFORM(3)
@ -9797,8 +9797,8 @@ REVISION
Last updated: 03 February 2019 Last updated: 03 February 2019
Copyright (c) 1997-2019 University of Cambridge. Copyright (c) 1997-2019 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------
PCRE2POSIX(3) Library Functions Manual PCRE2POSIX(3) PCRE2POSIX(3) Library Functions Manual PCRE2POSIX(3)
@ -10127,8 +10127,8 @@ REVISION
Last updated: 30 January 2019 Last updated: 30 January 2019
Copyright (c) 1997-2019 University of Cambridge. Copyright (c) 1997-2019 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------
PCRE2SAMPLE(3) Library Functions Manual PCRE2SAMPLE(3) PCRE2SAMPLE(3) Library Functions Manual PCRE2SAMPLE(3)
@ -10406,8 +10406,8 @@ REVISION
Last updated: 27 June 2018 Last updated: 27 June 2018
Copyright (c) 1997-2018 University of Cambridge. Copyright (c) 1997-2018 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------
PCRE2SYNTAX(3) Library Functions Manual PCRE2SYNTAX(3) PCRE2SYNTAX(3) Library Functions Manual PCRE2SYNTAX(3)
@ -10922,8 +10922,8 @@ REVISION
Last updated: 28 December 2019 Last updated: 28 December 2019
Copyright (c) 1997-2019 University of Cambridge. Copyright (c) 1997-2019 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------
PCRE2UNICODE(3) Library Functions Manual PCRE2UNICODE(3) PCRE2UNICODE(3) Library Functions Manual PCRE2UNICODE(3)
@ -11357,5 +11357,5 @@ REVISION
Last updated: 23 February 2020 Last updated: 23 February 2020
Copyright (c) 1997-2020 University of Cambridge. Copyright (c) 1997-2020 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------

View File

@ -16,13 +16,13 @@ PCRE2 - Perl-compatible regular expressions (revised API)
.sp .sp
This function sets a pointer to custom character tables within a compile This function sets a pointer to custom character tables within a compile
context. The second argument must point to a set of PCRE2 character tables or context. The second argument must point to a set of PCRE2 character tables or
be NULL to request the default tables. The result is always zero. Character be NULL to request the default tables. The result is always zero. Character
tables can be created by calling \fBpcre2_maketables()\fP or by running the tables can be created by calling \fBpcre2_maketables()\fP or by running the
\fBpcre2_dftables\fP maintenance command in binary mode (see the \fBpcre2_dftables\fP maintenance command in binary mode (see the
.\" HREF .\" HREF
\fBpcre2build\fP \fBpcre2build\fP
.\" .\"
documentation). documentation).
.P .P
There is a complete description of the PCRE2 native API in the There is a complete description of the PCRE2 native API in the
.\" HREF .\" HREF

View File

@ -73,7 +73,7 @@ zero-terminated strings. The options are:
PCRE2_SUBSTITUTE_LITERAL The replacement string is literal PCRE2_SUBSTITUTE_LITERAL The replacement string is literal
PCRE2_SUBSTITUTE_MATCHED Use pre-existing match data for 1st match PCRE2_SUBSTITUTE_MATCHED Use pre-existing match data for 1st match
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH If overflow, compute needed length PCRE2_SUBSTITUTE_OVERFLOW_LENGTH If overflow, compute needed length
PCRE2_SUBSTITUTE_REPLACEMENT_ONLY Return only replacement string(s) PCRE2_SUBSTITUTE_REPLACEMENT_ONLY Return only replacement string(s)
PCRE2_SUBSTITUTE_UNKNOWN_UNSET Treat unknown group as unset PCRE2_SUBSTITUTE_UNKNOWN_UNSET Treat unknown group as unset
PCRE2_SUBSTITUTE_UNSET_EMPTY Simple unset insert = empty string PCRE2_SUBSTITUTE_UNSET_EMPTY Simple unset insert = empty string
.sp .sp

View File

@ -1156,7 +1156,7 @@ uint32_t integer that is always set to zero.
.sp .sp
PCRE2_CONFIG_TABLES_LENGTH PCRE2_CONFIG_TABLES_LENGTH
.sp .sp
The output is a uint32_t integer that gives the length of PCRE2's character The output is a uint32_t integer that gives the length of PCRE2's character
processing tables in bytes. For details of these tables see the processing tables in bytes. For details of these tables see the
.\" HTML <a href="#localesupport"> .\" HTML <a href="#localesupport">
.\" </a> .\" </a>
@ -1431,7 +1431,7 @@ documentation.
.sp .sp
If this bit is set, letters in the pattern match both upper and lower case If this bit is set, letters in the pattern match both upper and lower case
letters in the subject. It is equivalent to Perl's /i option, and it can be letters in the subject. It is equivalent to Perl's /i option, and it can be
changed within a pattern by a (?i) option setting. If either PCRE2_UTF or changed within a pattern by a (?i) option setting. If either PCRE2_UTF or
PCRE2_UCP is set, Unicode properties are used for all characters with more than PCRE2_UCP is set, Unicode properties are used for all characters with more than
one other case, and for all characters whose code points are greater than one other case, and for all characters whose code points are greater than
U+007F. For lower valued characters with only one other case, a lookup table is U+007F. For lower valued characters with only one other case, a lookup table is
@ -1794,7 +1794,7 @@ in the
\fBpcre2pattern\fP \fBpcre2pattern\fP
.\" .\"
page. If you set PCRE2_UCP, matching one of the items it affects takes much page. If you set PCRE2_UCP, matching one of the items it affects takes much
longer. longer.
.P .P
The second effect of PCRE2_UCP is to force the use of Unicode properties for The second effect of PCRE2_UCP is to force the use of Unicode properties for
upper/lower casing operations on characters with code points greater than 127, upper/lower casing operations on characters with code points greater than 127,
@ -1974,9 +1974,9 @@ PCRE2 handles caseless matching, and determines whether characters are letters,
digits, or whatever, by reference to a set of tables, indexed by character code digits, or whatever, by reference to a set of tables, indexed by character code
point. However, this applies only to characters whose code points are less than point. However, this applies only to characters whose code points are less than
256. By default, higher-valued code points never match escapes such as \ew or 256. By default, higher-valued code points never match escapes such as \ew or
\ed. \ed.
.P .P
When PCRE2 is built with Unicode support (the default), the Unicode properties When PCRE2 is built with Unicode support (the default), the Unicode properties
of all characters can be tested with \ep and \eP, or, alternatively, the of all characters can be tested with \ep and \eP, or, alternatively, the
PCRE2_UCP option can be set when a pattern is compiled; this causes \ew and PCRE2_UCP option can be set when a pattern is compiled; this causes \ew and
friends to use Unicode property support instead of the built-in tables. friends to use Unicode property support instead of the built-in tables.
@ -3537,8 +3537,8 @@ terminating a \eQ quoted sequence) reverts to no case forcing. The sequences
\eu and \el force the next character (if it is a letter) to upper or lower \eu and \el force the next character (if it is a letter) to upper or lower
case, respectively, and then the state automatically reverts to no case case, respectively, and then the state automatically reverts to no case
forcing. Case forcing applies to all inserted characters, including those from forcing. Case forcing applies to all inserted characters, including those from
capture groups and letters within \eQ...\eE quoted sequences. If either capture groups and letters within \eQ...\eE quoted sequences. If either
PCRE2_UTF or PCRE2_UCP was set when the pattern was compiled, Unicode PCRE2_UTF or PCRE2_UCP was set when the pattern was compiled, Unicode
properties are used for case forcing characters whose code points are greater properties are used for case forcing characters whose code points are greater
than 127. than 127.
.P .P

View File

@ -338,23 +338,23 @@ host and therefore not compiled with the cross compiler.
If you need to create alternative tables when cross compiling, you will have to If you need to create alternative tables when cross compiling, you will have to
do so "by hand". There may also be other reasons for creating tables manually. do so "by hand". There may also be other reasons for creating tables manually.
To cause \fBpcre2_dftables\fP to be built on the local host, run a normal To cause \fBpcre2_dftables\fP to be built on the local host, run a normal
compiling command, and then run the program with the output file as its compiling command, and then run the program with the output file as its
argument, for example: argument, for example:
.sp .sp
cc src/pcre2_dftables.c -o pcre2_dftables cc src/pcre2_dftables.c -o pcre2_dftables
./pcre2_dftables src/pcre2_chartables.c ./pcre2_dftables src/pcre2_chartables.c
.sp .sp
This builds the tables in the default locale of the local host. If you want to This builds the tables in the default locale of the local host. If you want to
specify a locale, you must use the -L option: specify a locale, you must use the -L option:
.sp .sp
LC_ALL=fr_FR ./pcre2_dftables -L src/pcre2_chartables.c LC_ALL=fr_FR ./pcre2_dftables -L src/pcre2_chartables.c
.sp .sp
You can also specify -b (with or without -L). This causes the tables to be You can also specify -b (with or without -L). This causes the tables to be
written in binary instead of as source code. A set of binary tables can be written in binary instead of as source code. A set of binary tables can be
loaded into memory by an application and passed to \fBpcre2_compile()\fP in the loaded into memory by an application and passed to \fBpcre2_compile()\fP in the
same way as tables created by calling \fBpcre2_maketables()\fP. The tables are same way as tables created by calling \fBpcre2_maketables()\fP. The tables are
just a string of bytes, independent of hardware characteristics such as just a string of bytes, independent of hardware characteristics such as
endianness. This means they can be bundled with an application that runs in endianness. This means they can be bundled with an application that runs in
different environments, to ensure consistent behaviour. different environments, to ensure consistent behaviour.
. .
. .

View File

@ -564,7 +564,7 @@ was explicitly disabled at build time. This option can be used to disable the
use of JIT at run time. It is provided for testing and working round problems. use of JIT at run time. It is provided for testing and working round problems.
It should never be needed in normal use. It should never be needed in normal use.
.TP .TP
\fB-O\fP \fItext\fP, \fB--output\fP=\fItext\fP \fB-O\fP \fItext\fP, \fB--output\fP=\fItext\fP
When there is a match, instead of outputting the whole line that matched, When there is a match, instead of outputting the whole line that matched,
output just the given text, followed by an operating-system standard newline. output just the given text, followed by an operating-system standard newline.
The \fB--newline\fP option has no effect on this option, which is mutually The \fB--newline\fP option has no effect on this option, which is mutually

View File

@ -75,7 +75,7 @@ Another special sequence that may appear at the start of a pattern is (*UCP).
This has the same effect as setting the PCRE2_UCP option: it causes sequences This has the same effect as setting the PCRE2_UCP option: it causes sequences
such as \ed and \ew to use Unicode properties to determine character types, such as \ed and \ew to use Unicode properties to determine character types,
instead of recognizing only characters with codes less than 256 via a lookup instead of recognizing only characters with codes less than 256 via a lookup
table. If also causes upper/lower casing operations to use Unicode properties table. If also causes upper/lower casing operations to use Unicode properties
for characters with code points greater than 127, even when UTF is not set. for characters with code points greater than 127, even when UTF is not set.
.P .P
Some applications that allow their users to supply patterns may wish to Some applications that allow their users to supply patterns may wish to
@ -2676,8 +2676,8 @@ pattern. If this is not the case, the rest of the pattern match fails exactly
as before because nothing has changed, so using a non-atomic assertion just as before because nothing has changed, so using a non-atomic assertion just
wastes resources. wastes resources.
.P .P
There is one exception to backtracking into a non-atomic assertion. If an There is one exception to backtracking into a non-atomic assertion. If an
(*ACCEPT) control verb is triggered, the assertion succeeds atomically. That (*ACCEPT) control verb is triggered, the assertion succeeds atomically. That
is, a subsequent match failure cannot backtrack into the assertion. is, a subsequent match failure cannot backtrack into the assertion.
.P .P
Non-atomic assertions are not supported by the alternative matching function Non-atomic assertions are not supported by the alternative matching function

View File

@ -538,7 +538,7 @@ These assertions are specific to PCRE2 and are not Perl-compatible.
(?*...) ) (?*...) )
(*napla:...) ) synonyms (*napla:...) ) synonyms
(*non_atomic_positive_lookahead:...) ) (*non_atomic_positive_lookahead:...) )
.sp .sp
(?<*...) ) (?<*...) )
(*naplb:...) ) synonyms (*naplb:...) ) synonyms
(*non_atomic_positive_lookbehind:...) ) (*non_atomic_positive_lookbehind:...) )

View File

@ -330,7 +330,7 @@ below.
#loadtables <filename> #loadtables <filename>
.sp .sp
This command is used to load a set of binary character tables that can be This command is used to load a set of binary character tables that can be
accessed by the tables=3 qualifier. Such tables can be created by the accessed by the tables=3 qualifier. Such tables can be created by the
\fBpcre2_dftables\fP program with the -b option. \fBpcre2_dftables\fP program with the -b option.
.sp .sp
#newline_default [<newline-list>] #newline_default [<newline-list>]
@ -1002,7 +1002,7 @@ with different character tables. The digit specifies the tables as follows:
1 the default ASCII tables, as distributed in 1 the default ASCII tables, as distributed in
pcre2_chartables.c.dist pcre2_chartables.c.dist
2 a set of tables defining ISO 8859 characters 2 a set of tables defining ISO 8859 characters
3 a set of tables loaded by the #loadtables command 3 a set of tables loaded by the #loadtables command
.sp .sp
In tables 2, some characters whose codes are greater than 128 are identified as In tables 2, some characters whose codes are greater than 128 are identified as
letters, digits, spaces, etc. Tables 3 can be used only after a letters, digits, spaces, etc. Tables 3 can be used only after a
@ -1033,9 +1033,9 @@ process.
substitute_callout use substitution callouts substitute_callout use substitution callouts
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
substitute_literal use PCRE2_SUBSTITUTE_LITERAL substitute_literal use PCRE2_SUBSTITUTE_LITERAL
substitute_matched use PCRE2_SUBSTITUTE_MATCHED substitute_matched use PCRE2_SUBSTITUTE_MATCHED
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
substitute_skip=<n> skip substitution <n> substitute_skip=<n> skip substitution <n>
substitute_stop=<n> skip substitution <n> and following substitute_stop=<n> skip substitution <n> and following
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
@ -1212,10 +1212,10 @@ pattern.
startoffset=<n> same as offset=<n> startoffset=<n> same as offset=<n>
substitute_callout use substitution callouts substitute_callout use substitution callouts
substitute_extedded use PCRE2_SUBSTITUTE_EXTENDED substitute_extedded use PCRE2_SUBSTITUTE_EXTENDED
substitute_literal use PCRE2_SUBSTITUTE_LITERAL substitute_literal use PCRE2_SUBSTITUTE_LITERAL
substitute_matched use PCRE2_SUBSTITUTE_MATCHED substitute_matched use PCRE2_SUBSTITUTE_MATCHED
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
substitute_skip=<n> skip substitution number n substitute_skip=<n> skip substitution number n
substitute_stop=<n> skip substitution number n and greater substitute_stop=<n> skip substitution number n and greater
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
@ -1379,7 +1379,7 @@ by name.
.rs .rs
.sp .sp
If the \fBreplace\fP modifier is set, the \fBpcre2_substitute()\fP function is If the \fBreplace\fP modifier is set, the \fBpcre2_substitute()\fP function is
called instead of one of the matching functions (or after one call of called instead of one of the matching functions (or after one call of
\fBpcre2_match()\fP in the case of PCRE2_SUBSTITUTE_MATCHED). Note that \fBpcre2_match()\fP in the case of PCRE2_SUBSTITUTE_MATCHED). Note that
replacement strings cannot contain commas, because a comma signifies the end of replacement strings cannot contain commas, because a comma signifies the end of
a modifier. This is not thought to be an issue in a test program. a modifier. This is not thought to be an issue in a test program.
@ -1396,10 +1396,10 @@ for \fBpcre2_substitute()\fP:
.sp .sp
global PCRE2_SUBSTITUTE_GLOBAL global PCRE2_SUBSTITUTE_GLOBAL
substitute_extended PCRE2_SUBSTITUTE_EXTENDED substitute_extended PCRE2_SUBSTITUTE_EXTENDED
substitute_literal PCRE2_SUBSTITUTE_LITERAL substitute_literal PCRE2_SUBSTITUTE_LITERAL
substitute_matched PCRE2_SUBSTITUTE_MATCHED substitute_matched PCRE2_SUBSTITUTE_MATCHED
substitute_overflow_length PCRE2_SUBSTITUTE_OVERFLOW_LENGTH substitute_overflow_length PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
substitute_replacement_only PCRE2_SUBSTITUTE_REPLACEMENT_ONLY substitute_replacement_only PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
substitute_unknown_unset PCRE2_SUBSTITUTE_UNKNOWN_UNSET substitute_unknown_unset PCRE2_SUBSTITUTE_UNKNOWN_UNSET
substitute_unset_empty PCRE2_SUBSTITUTE_UNSET_EMPTY substitute_unset_empty PCRE2_SUBSTITUTE_UNSET_EMPTY
.sp .sp

View File

@ -134,7 +134,7 @@ of Unicode properties except for characters whose code points are less than 128
and that have at most two case-equivalent values. For these, a direct table and that have at most two case-equivalent values. For these, a direct table
lookup is used for speed. A few Unicode characters such as Greek sigma have lookup is used for speed. A few Unicode characters such as Greek sigma have
more than two code points that are case-equivalent, and these are treated more than two code points that are case-equivalent, and these are treated
specially. Setting PCRE2_UCP without PCRE2_UTF allows Unicode-style case specially. Setting PCRE2_UCP without PCRE2_UTF allows Unicode-style case
processing for non-UTF character encodings such as UCS-2. processing for non-UTF character encodings such as UCS-2.
. .
. .

View File

@ -218,7 +218,7 @@ sure both macros are undefined; an emulation function will then be used. */
#define PACKAGE_NAME "PCRE2" #define PACKAGE_NAME "PCRE2"
/* Define to the full name and version of this package. */ /* Define to the full name and version of this package. */
#define PACKAGE_STRING "PCRE2 10.34" #define PACKAGE_STRING "PCRE2 10.35-RC1"
/* Define to the one symbol short name of this package. */ /* Define to the one symbol short name of this package. */
#define PACKAGE_TARNAME "pcre2" #define PACKAGE_TARNAME "pcre2"
@ -227,7 +227,7 @@ sure both macros are undefined; an emulation function will then be used. */
#define PACKAGE_URL "" #define PACKAGE_URL ""
/* Define to the version of this package. */ /* Define to the version of this package. */
#define PACKAGE_VERSION "10.34" #define PACKAGE_VERSION "10.35-RC1"
/* The value of PARENS_NEST_LIMIT specifies the maximum depth of nested /* The value of PARENS_NEST_LIMIT specifies the maximum depth of nested
parentheses (of any kind) in a pattern. This limits the amount of system parentheses (of any kind) in a pattern. This limits the amount of system
@ -352,7 +352,7 @@ sure both macros are undefined; an emulation function will then be used. */
#endif #endif
/* Version number of package */ /* Version number of package */
#define VERSION "10.34" #define VERSION "10.35-RC1"
/* Define to 1 if on MINIX. */ /* Define to 1 if on MINIX. */
/* #undef _MINIX */ /* #undef _MINIX */

View File

@ -5,7 +5,7 @@
/* This is the public header file for the PCRE library, second API, to be /* This is the public header file for the PCRE library, second API, to be
#included by applications that call PCRE2 functions. #included by applications that call PCRE2 functions.
Copyright (c) 2016-2019 University of Cambridge Copyright (c) 2016-2020 University of Cambridge
----------------------------------------------------------------------------- -----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without Redistribution and use in source and binary forms, with or without
@ -42,9 +42,9 @@ POSSIBILITY OF SUCH DAMAGE.
/* The current PCRE version information. */ /* The current PCRE version information. */
#define PCRE2_MAJOR 10 #define PCRE2_MAJOR 10
#define PCRE2_MINOR 34 #define PCRE2_MINOR 35
#define PCRE2_PRERELEASE #define PCRE2_PRERELEASE -RC1
#define PCRE2_DATE 2019-11-21 #define PCRE2_DATE 2020-04-15
/* When an application links to a PCRE DLL in Windows, the symbols that are /* When an application links to a PCRE DLL in Windows, the symbols that are
imported have to be identified as such. When building PCRE2, the appropriate imported have to be identified as such. When building PCRE2, the appropriate
@ -181,6 +181,9 @@ pcre2_jit_match() ignores the latter since it bypasses all sanity checks). */
#define PCRE2_SUBSTITUTE_OVERFLOW_LENGTH 0x00001000u /* pcre2_substitute() only */ #define PCRE2_SUBSTITUTE_OVERFLOW_LENGTH 0x00001000u /* pcre2_substitute() only */
#define PCRE2_NO_JIT 0x00002000u /* Not for pcre2_dfa_match() */ #define PCRE2_NO_JIT 0x00002000u /* Not for pcre2_dfa_match() */
#define PCRE2_COPY_MATCHED_SUBJECT 0x00004000u #define PCRE2_COPY_MATCHED_SUBJECT 0x00004000u
#define PCRE2_SUBSTITUTE_LITERAL 0x00008000u /* pcre2_substitute() only */
#define PCRE2_SUBSTITUTE_MATCHED 0x00010000u /* pcre2_substitute() only */
#define PCRE2_SUBSTITUTE_REPLACEMENT_ONLY 0x00020000u /* pcre2_substitute() only */
/* Options for pcre2_pattern_convert(). */ /* Options for pcre2_pattern_convert(). */
@ -445,6 +448,7 @@ released, the numbers must not be changed. */
#define PCRE2_CONFIG_HEAPLIMIT 12 #define PCRE2_CONFIG_HEAPLIMIT 12
#define PCRE2_CONFIG_NEVER_BACKSLASH_C 13 #define PCRE2_CONFIG_NEVER_BACKSLASH_C 13
#define PCRE2_CONFIG_COMPILED_WIDTHS 14 #define PCRE2_CONFIG_COMPILED_WIDTHS 14
#define PCRE2_CONFIG_TABLES_LENGTH 15
/* Types for code units in patterns and subject strings. */ /* Types for code units in patterns and subject strings. */

View File

@ -505,7 +505,7 @@ which case the base cannot be possessified.
Arguments: Arguments:
code points to the byte code code points to the byte code
utf TRUE in UTF mode utf TRUE in UTF mode
ucp TRUE in UCP mode ucp TRUE in UCP mode
cb compile data block cb compile data block
base_list the data list of the base opcode base_list the data list of the base opcode
base_end the end of the base opcode base_end the end of the base opcode
@ -675,7 +675,7 @@ for(;;)
/* The bracket content will be checked by the OP_BRA/OP_CBRA case above. */ /* The bracket content will be checked by the OP_BRA/OP_CBRA case above. */
next_code += 1 + LINK_SIZE; next_code += 1 + LINK_SIZE;
if (!compare_opcodes(next_code, utf, ucp, cb, base_list, base_end, if (!compare_opcodes(next_code, utf, ucp, cb, base_list, base_end,
rec_limit)) rec_limit))
return FALSE; return FALSE;
@ -1134,7 +1134,7 @@ for (;;)
get_chr_property_list(code, utf, ucp, cb->fcc, list) : NULL; get_chr_property_list(code, utf, ucp, cb->fcc, list) : NULL;
list[1] = c == OP_STAR || c == OP_PLUS || c == OP_QUERY || c == OP_UPTO; list[1] = c == OP_STAR || c == OP_PLUS || c == OP_QUERY || c == OP_UPTO;
if (end != NULL && compare_opcodes(end, utf, ucp, cb, list, end, if (end != NULL && compare_opcodes(end, utf, ucp, cb, list, end,
&rec_limit)) &rec_limit))
{ {
switch(c) switch(c)

View File

@ -3527,14 +3527,14 @@ if ((re->flags & PCRE2_FIRSTSET) != 0)
if ((re->flags & PCRE2_FIRSTCASELESS) != 0) if ((re->flags & PCRE2_FIRSTCASELESS) != 0)
{ {
first_cu2 = TABLE_GET(first_cu, mb->tables + fcc_offset, first_cu); first_cu2 = TABLE_GET(first_cu, mb->tables + fcc_offset, first_cu);
#ifdef SUPPORT_UNICODE #ifdef SUPPORT_UNICODE
#if PCRE2_CODE_UNIT_WIDTH == 8 #if PCRE2_CODE_UNIT_WIDTH == 8
if (first_cu > 127 && !utf && (re->overall_options & PCRE2_UCP) != 0) if (first_cu > 127 && !utf && (re->overall_options & PCRE2_UCP) != 0)
first_cu2 = (PCRE2_UCHAR)UCD_OTHERCASE(first_cu); first_cu2 = (PCRE2_UCHAR)UCD_OTHERCASE(first_cu);
#else #else
if (first_cu > 127 && (utf || (re->overall_options & PCRE2_UCP) != 0)) if (first_cu > 127 && (utf || (re->overall_options & PCRE2_UCP) != 0))
first_cu2 = (PCRE2_UCHAR)UCD_OTHERCASE(first_cu); first_cu2 = (PCRE2_UCHAR)UCD_OTHERCASE(first_cu);
#endif #endif
#endif /* SUPPORT_UNICODE */ #endif /* SUPPORT_UNICODE */
} }
} }
@ -3553,10 +3553,10 @@ if ((re->flags & PCRE2_LASTSET) != 0)
req_cu2 = TABLE_GET(req_cu, mb->tables + fcc_offset, req_cu); req_cu2 = TABLE_GET(req_cu, mb->tables + fcc_offset, req_cu);
#ifdef SUPPORT_UNICODE #ifdef SUPPORT_UNICODE
#if PCRE2_CODE_UNIT_WIDTH == 8 #if PCRE2_CODE_UNIT_WIDTH == 8
if (req_cu > 127 && !utf && (re->overall_options & PCRE2_UCP) != 0) if (req_cu > 127 && !utf && (re->overall_options & PCRE2_UCP) != 0)
req_cu2 = (PCRE2_UCHAR)UCD_OTHERCASE(req_cu); req_cu2 = (PCRE2_UCHAR)UCD_OTHERCASE(req_cu);
#else #else
if (req_cu > 127 && (utf || (re->overall_options & PCRE2_UCP) != 0)) if (req_cu > 127 && (utf || (re->overall_options & PCRE2_UCP) != 0))
req_cu2 = (PCRE2_UCHAR)UCD_OTHERCASE(req_cu); req_cu2 = (PCRE2_UCHAR)UCD_OTHERCASE(req_cu);
#endif #endif
#endif /* SUPPORT_UNICODE */ #endif /* SUPPORT_UNICODE */

View File

@ -65,23 +65,23 @@ given, they are written in binary. */
static char *classlist[] = static char *classlist[] =
{ {
"space", "xdigit", "digit", "upper", "lower", "space", "xdigit", "digit", "upper", "lower",
"word", "graph", "print", "punct", "cntrl" "word", "graph", "print", "punct", "cntrl"
}; };
/************************************************* /*************************************************
* Usage * * Usage *
*************************************************/ *************************************************/
static void static void
usage(void) usage(void)
{ {
(void)fprintf(stderr, (void)fprintf(stderr,
"Usage: pcre2_dftables [options] <output file>\n" "Usage: pcre2_dftables [options] <output file>\n"
" -b Write output in binary (default is source code)\n" " -b Write output in binary (default is source code)\n"
" -L Use locale from LC_ALL (default is \"C\" locale)\n" " -L Use locale from LC_ALL (default is \"C\" locale)\n"
); );
} }
@ -97,7 +97,7 @@ FILE *f;
int i; int i;
int nclass = 0; int nclass = 0;
BOOL binary = FALSE; BOOL binary = FALSE;
char *env = "C"; char *env = "C";
const unsigned char *tables; const unsigned char *tables;
const unsigned char *base_of_tables; const unsigned char *base_of_tables;
@ -107,40 +107,40 @@ for (i = 1; i < argc; i++)
{ {
unsigned char *arg = (unsigned char *)argv[i]; unsigned char *arg = (unsigned char *)argv[i];
if (*arg != '-') break; if (*arg != '-') break;
if (strcmp(arg, "-help") == 0 || strcmp(arg, "--help") == 0) if (strcmp(arg, "-help") == 0 || strcmp(arg, "--help") == 0)
{ {
usage(); usage();
return 0; return 0;
} }
else if (strcmp(arg, "-L") == 0) else if (strcmp(arg, "-L") == 0)
{ {
if (setlocale(LC_ALL, "") == NULL) if (setlocale(LC_ALL, "") == NULL)
{ {
(void)fprintf(stderr, "pcre2_dftables: setlocale() failed\n"); (void)fprintf(stderr, "pcre2_dftables: setlocale() failed\n");
return 1; return 1;
} }
env = getenv("LC_ALL"); env = getenv("LC_ALL");
} }
else if (strcmp(arg, "-b") == 0) else if (strcmp(arg, "-b") == 0)
binary = TRUE; binary = TRUE;
else else
{ {
(void)fprintf(stderr, "pcre2_dftables: unrecognized option %s\n", arg); (void)fprintf(stderr, "pcre2_dftables: unrecognized option %s\n", arg);
return 1; return 1;
} }
} }
if (i != argc - 1) if (i != argc - 1)
{ {
(void)fprintf(stderr, "pcre2_dftables: one filename argument is required\n"); (void)fprintf(stderr, "pcre2_dftables: one filename argument is required\n");
return 1; return 1;
} }
/* Make the tables */ /* Make the tables */
tables = maketables(); tables = maketables();
base_of_tables = tables; base_of_tables = tables;
@ -151,19 +151,19 @@ if (f == NULL)
fprintf(stderr, "pcre2_dftables: failed to open %s for writing\n", argv[1]); fprintf(stderr, "pcre2_dftables: failed to open %s for writing\n", argv[1]);
return 1; return 1;
} }
/* If -b was specified, we write the tables in binary. */ /* If -b was specified, we write the tables in binary. */
if (binary) if (binary)
{ {
int yield = 0; int yield = 0;
size_t len = fwrite(tables, 1, TABLES_LENGTH, f); size_t len = fwrite(tables, 1, TABLES_LENGTH, f);
if (len != TABLES_LENGTH) if (len != TABLES_LENGTH)
{ {
(void)fprintf(stderr, "pcre2_dftables: fwrite() returned wrong length %d " (void)fprintf(stderr, "pcre2_dftables: fwrite() returned wrong length %d "
"instead of %d\n", (int)len, TABLES_LENGTH); "instead of %d\n", (int)len, TABLES_LENGTH);
yield = 1; yield = 1;
} }
fclose(f); fclose(f);
free((void *)base_of_tables); free((void *)base_of_tables);
return yield; return yield;
@ -181,9 +181,9 @@ the very long string otherwise. */
"program. It contains character tables that are used when no external\n" "program. It contains character tables that are used when no external\n"
"tables are passed to PCRE2 by the application that calls it. The tables\n" "tables are passed to PCRE2 by the application that calls it. The tables\n"
"are used only for characters whose code values are less than 256. */\n\n"); "are used only for characters whose code values are less than 256. */\n\n");
(void)fprintf(f, (void)fprintf(f,
"/* This set of tables was written in the %s locale. */\n\n", env); "/* This set of tables was written in the %s locale. */\n\n", env);
(void)fprintf(f, (void)fprintf(f,
"/* The pcre2_ftables program (which is distributed with PCRE2) can be used\n" "/* The pcre2_ftables program (which is distributed with PCRE2) can be used\n"
@ -255,7 +255,7 @@ for (i = 0; i < cbit_length; i++)
if ((i & 7) == 0 && i != 0) if ((i & 7) == 0 && i != 0)
{ {
if ((i & 31) == 0) (void)fprintf(f, "\n"); if ((i & 31) == 0) (void)fprintf(f, "\n");
if ((i & 24) == 8) (void)fprintf(f, " /* %s */", classlist[nclass++]); if ((i & 24) == 8) (void)fprintf(f, " /* %s */", classlist[nclass++]);
(void)fprintf(f, "\n "); (void)fprintf(f, "\n ");
} }
(void)fprintf(f, "0x%02x", *tables++); (void)fprintf(f, "0x%02x", *tables++);

View File

@ -1174,7 +1174,7 @@ while (cc < ccend)
case OP_PRUNE_ARG: case OP_PRUNE_ARG:
if (cc < assert_na_end) if (cc < assert_na_end)
return FALSE; return FALSE;
/* Fall through */ /* Fall through */
case OP_MARK: case OP_MARK:
if (common->mark_ptr == 0) if (common->mark_ptr == 0)
{ {

View File

@ -382,7 +382,7 @@ if (caseless)
{ {
#if defined SUPPORT_UNICODE #if defined SUPPORT_UNICODE
BOOL utf = (mb->poptions & PCRE2_UTF) != 0; BOOL utf = (mb->poptions & PCRE2_UTF) != 0;
if (utf || (mb->poptions & PCRE2_UCP) != 0) if (utf || (mb->poptions & PCRE2_UCP) != 0)
{ {
PCRE2_SPTR endptr = p + length; PCRE2_SPTR endptr = p + length;
@ -395,24 +395,24 @@ if (caseless)
sequence of two of the latter. It is important, therefore, to check the sequence of two of the latter. It is important, therefore, to check the
length along the reference, not along the subject (earlier code did this length along the reference, not along the subject (earlier code did this
wrong). UCP without uses Unicode properties but without UTF encoding. */ wrong). UCP without uses Unicode properties but without UTF encoding. */
while (p < endptr) while (p < endptr)
{ {
uint32_t c, d; uint32_t c, d;
const ucd_record *ur; const ucd_record *ur;
if (eptr >= mb->end_subject) return 1; /* Partial match */ if (eptr >= mb->end_subject) return 1; /* Partial match */
if (utf) if (utf)
{ {
GETCHARINC(c, eptr); GETCHARINC(c, eptr);
GETCHARINC(d, p); GETCHARINC(d, p);
} }
else else
{ {
c = *eptr++; c = *eptr++;
d = *p++; d = *p++;
} }
ur = GET_UCD(d); ur = GET_UCD(d);
if (c != d && c != (uint32_t)((int)d + ur->other_case)) if (c != d && c != (uint32_t)((int)d + ur->other_case))
{ {

View File

@ -772,13 +772,13 @@ Arguments:
p points to the first code unit of the character p points to the first code unit of the character
caseless TRUE if caseless caseless TRUE if caseless
utf TRUE for UTF mode utf TRUE for UTF mode
ucp TRUE for UCP mode ucp TRUE for UCP mode
Returns: pointer after the character Returns: pointer after the character
*/ */
static PCRE2_SPTR static PCRE2_SPTR
set_table_bit(pcre2_real_code *re, PCRE2_SPTR p, BOOL caseless, BOOL utf, set_table_bit(pcre2_real_code *re, PCRE2_SPTR p, BOOL caseless, BOOL utf,
BOOL ucp) BOOL ucp)
{ {
uint32_t c = *p++; /* First code unit */ uint32_t c = *p++; /* First code unit */
@ -819,17 +819,17 @@ if (caseless)
c = UCD_OTHERCASE(c); c = UCD_OTHERCASE(c);
#if PCRE2_CODE_UNIT_WIDTH == 8 #if PCRE2_CODE_UNIT_WIDTH == 8
if (utf) if (utf)
{ {
PCRE2_UCHAR buff[6]; PCRE2_UCHAR buff[6];
(void)PRIV(ord2utf)(c, buff); (void)PRIV(ord2utf)(c, buff);
SET_BIT(buff[0]); SET_BIT(buff[0]);
} }
else if (c < 256) SET_BIT(c); else if (c < 256) SET_BIT(c);
#else /* 16-bit or 32-bit mode */ #else /* 16-bit or 32-bit mode */
if (c > 0xff) SET_BIT(0xff); else SET_BIT(c); if (c > 0xff) SET_BIT(0xff); else SET_BIT(c);
#endif #endif
} }
else else
#endif /* SUPPORT_UNICODE */ #endif /* SUPPORT_UNICODE */
@ -939,7 +939,7 @@ Arguments:
re points to the compiled regex block re points to the compiled regex block
code points to an expression code points to an expression
utf TRUE if in UTF mode utf TRUE if in UTF mode
ucp TRUE if in UCP mode ucp TRUE if in UCP mode
depthptr pointer to recurse depth depthptr pointer to recurse depth
Returns: SSB_FAIL => Failed to find any starting code units Returns: SSB_FAIL => Failed to find any starting code units
@ -1706,7 +1706,7 @@ if ((re->flags & (PCRE2_FIRSTSET|PCRE2_STARTLINE)) == 0)
int b = -1; int b = -1;
uint8_t *p = re->start_bitmap; uint8_t *p = re->start_bitmap;
uint32_t flags = PCRE2_FIRSTMAPSET; uint32_t flags = PCRE2_FIRSTMAPSET;
for (i = 0; i < 256; p++, i += 8) for (i = 0; i < 256; p++, i += 8)
{ {
uint8_t x = *p; uint8_t x = *p;
@ -1736,7 +1736,7 @@ if ((re->flags & (PCRE2_FIRSTSET|PCRE2_STARTLINE)) == 0)
} }
/* c contains the code unit value, in the range 0-255. In 8-bit UTF /* c contains the code unit value, in the range 0-255. In 8-bit UTF
mode, only values < 128 can be used. In all the other cases, c is a mode, only values < 128 can be used. In all the other cases, c is a
character value. */ character value. */
#if PCRE2_CODE_UNIT_WIDTH == 8 #if PCRE2_CODE_UNIT_WIDTH == 8
@ -1746,10 +1746,10 @@ if ((re->flags & (PCRE2_FIRSTSET|PCRE2_STARTLINE)) == 0)
else if (b < 0) /* Second one found */ else if (b < 0) /* Second one found */
{ {
int d = TABLE_GET((unsigned int)c, re->tables + fcc_offset, c); int d = TABLE_GET((unsigned int)c, re->tables + fcc_offset, c);
#ifdef SUPPORT_UNICODE #ifdef SUPPORT_UNICODE
if (utf || ucp) if (utf || ucp)
{ {
if (UCD_CASESET(c) != 0) goto DONE; /* Multiple case set */ if (UCD_CASESET(c) != 0) goto DONE; /* Multiple case set */
if (c > 127) d = UCD_OTHERCASE(c); if (c > 127) d = UCD_OTHERCASE(c);
} }