Code tidies for 10.30-RC1 release candidate.

This commit is contained in:
Philip.Hazel 2017-07-19 16:04:15 +00:00
parent e3052af6fd
commit 810d9b6da5
65 changed files with 1320 additions and 1152 deletions

View File

@ -429,7 +429,7 @@ SET(PCRE2_SOURCES
src/pcre2_compile.c
src/pcre2_config.c
src/pcre2_context.c
src/pcre2_convert.c
src/pcre2_convert.c
src/pcre2_dfa_match.c
src/pcre2_error.c
src/pcre2_find_bracket.c

172
ChangeLog
View File

@ -2,105 +2,105 @@ Change Log for PCRE2
--------------------
Version 10.30-DEV 09-March-2017
-------------------------------
Version 10.30-RC1 18-July-2017
------------------------------
1. The main interpreter, pcre2_match(), has been refactored into a new version
that does not use recursive function calls (and therefore the stack) for
remembering backtracking positions. This makes --disable-stack-for-recursion a
1. The main interpreter, pcre2_match(), has been refactored into a new version
that does not use recursive function calls (and therefore the stack) for
remembering backtracking positions. This makes --disable-stack-for-recursion a
NOOP. The new implementation allows backtracking into recursive group calls in
patterns, making it more compatible with Perl, and also fixes some other
hard-to-do issues such as #1887 in Bugzilla. The code is also cleaner because
the old code had a number of fudges to try to reduce stack usage. It seems to
patterns, making it more compatible with Perl, and also fixes some other
hard-to-do issues such as #1887 in Bugzilla. The code is also cleaner because
the old code had a number of fudges to try to reduce stack usage. It seems to
run no slower than the old code.
A number of bugs in the refactored code were subsequently fixed during testing
before release, but after the code was made available in the repository. These
bugs were never in fully released code, but are noted here for the record.
(a) If a pattern had fewer capturing parentheses than the ovector supplied in
(a) If a pattern had fewer capturing parentheses than the ovector supplied in
the match data block, a memory error (detectable by ASAN) occurred after
a match, because the external block was being set from non-existent
internal ovector fields. Fixes oss-fuzz issue 781.
(b) A pattern with very many capturing parentheses (when the internal frame
size was greater than the initial frame vector on the stack) caused a
crash. A vector on the heap is now set up at the start of matching if the
vector on the stack is not big enough to handle at least 10 frames.
Fixes oss-fuzz issue 783.
(b) A pattern with very many capturing parentheses (when the internal frame
size was greater than the initial frame vector on the stack) caused a
crash. A vector on the heap is now set up at the start of matching if the
vector on the stack is not big enough to handle at least 10 frames.
Fixes oss-fuzz issue 783.
(c) Handling of (*VERB)s in recursions was wrong in some cases.
(d) Captures in negative assertions that were used as conditions were not
happening if the assertion matched via (*ACCEPT).
(e) Mark values were not being passed out of recursions.
(f) Refactor some code in do_callout() to avoid picky compiler warnings about
happening if the assertion matched via (*ACCEPT).
(e) Mark values were not being passed out of recursions.
(f) Refactor some code in do_callout() to avoid picky compiler warnings about
negative indices. Fixes oss-fuzz issue 1454.
(g) Similarly refactor the way the variable length ovector is addressed for
similar reasons. Fixes oss-fuzz issue 1465.
2. Now that pcre2_match() no longer uses recursive function calls (see above),
the "match limit recursion" value seems misnamed. It still exists, and limits
the depth of tree that is searched. To avoid future confusion, it has been
renamed as "depth limit" in all relevant places (--with-depth-limit,
(*LIMIT_DEPTH), pcre2_set_depth_limit(), etc) but the old names are still
available for backwards compatibility.
the "match limit recursion" value seems misnamed. It still exists, and limits
the depth of tree that is searched. To avoid future confusion, it has been
renamed as "depth limit" in all relevant places (--with-depth-limit,
(*LIMIT_DEPTH), pcre2_set_depth_limit(), etc) but the old names are still
available for backwards compatibility.
3. Hardened pcre2test so as to reduce the number of bugs reported by fuzzers:
(a) Check for malloc failures when getting memory for the ovector (POSIX) or
the match data block (non-POSIX).
(a) Check for malloc failures when getting memory for the ovector (POSIX) or
the match data block (non-POSIX).
4. In the 32-bit library in non-UTF mode, an attempt to find a Unicode property
for a character with a code point greater than 0x10ffff (the Unicode maximum)
caused a crash.
5. If a lookbehind assertion that contained a back reference to a group
5. If a lookbehind assertion that contained a back reference to a group
appearing later in the pattern was compiled with the PCRE2_ANCHORED option,
undefined actions (often a segmentation fault) could occur, depending on what
other options were set. An example assertion is (?<!\1(abc)) where the
undefined actions (often a segmentation fault) could occur, depending on what
other options were set. An example assertion is (?<!\1(abc)) where the
reference \1 precedes the group (abc). This fixes oss-fuzz issue 865.
6. Added the PCRE2_INFO_FRAMESIZE item to pcre2_pattern_info() and arranged for
6. Added the PCRE2_INFO_FRAMESIZE item to pcre2_pattern_info() and arranged for
pcre2test to use it to output the frame size when the "framesize" modifier is
given.
7. Reworked the recursive pattern matching in the JIT compiler to follow the
interpreter changes.
8. When the zero_terminate modifier was specified on a pcre2test subject line
for global matching, unpredictable things could happen. For example, in UTF-8
mode, the pattern //g,zero_terminate read random memory when matched against an
8. When the zero_terminate modifier was specified on a pcre2test subject line
for global matching, unpredictable things could happen. For example, in UTF-8
mode, the pattern //g,zero_terminate read random memory when matched against an
empty string with zero_terminate. This was a bug in pcre2test, not the library.
9. Moved some Windows-specific code in pcre2grep (introduced in 10.23/13) out
of the section that is compiled when Unix-style directory scanning is
available, and into a new section that is always compiled for Windows.
10. In pcre2test, explicitly close the file after an error during serialization
10. In pcre2test, explicitly close the file after an error during serialization
or deserialization (the "load" or "save" commands).
11. Fix memory leak in pcre2_serialize_decode() when the input is invalid.
12. Fix potential NULL dereference in pcre2_callout_enumerate() if called with
12. Fix potential NULL dereference in pcre2_callout_enumerate() if called with
a NULL pattern pointer when Unicode support is available.
13. When the 32-bit library was being tested by pcre2test, error messages that
were longer than 64 code units could cause a buffer overflow. This was a bug in
13. When the 32-bit library was being tested by pcre2test, error messages that
were longer than 64 code units could cause a buffer overflow. This was a bug in
pcre2test.
14. The alternative matching function, pcre2_dfa_match() misbehaved if it
14. The alternative matching function, pcre2_dfa_match() misbehaved if it
encountered a character class with a possessive repeat, for example [a-f]{3}+.
15. The depth (formerly recursion) limit now applies to DFA matching (as
of 10.23/36); pcre2test has been upgraded so that \=find_limits works with DFA
15. The depth (formerly recursion) limit now applies to DFA matching (as
of 10.23/36); pcre2test has been upgraded so that \=find_limits works with DFA
matching to find the minimum value for this limit.
16. Since 10.21, if pcre2_match() was called with a null context, default
memory allocation functions were used instead of whatever was used when the
16. Since 10.21, if pcre2_match() was called with a null context, default
memory allocation functions were used instead of whatever was used when the
pattern was compiled.
17. Changes to the pcre2test "memory" modifier on a subject line. These apply
@ -108,33 +108,33 @@ only to pcre2_match():
(a) Warn if null_context is set on both pattern and subject, because the
memory details cannot then be shown.
(b) Remember (up to a certain number of) memory allocations and their
lengths, and list only the lengths, so as to be system-independent.
(In practice, the new interpreter never has more than 2 blocks allocated
simultaneously.)
(In practice, the new interpreter never has more than 2 blocks allocated
simultaneously.)
18. Make pcre2test detect an error return from pcre2_get_error_message(), give
a message, and abandon the run (this would have detected #13 above).
a message, and abandon the run (this would have detected #13 above).
19. Implemented PCRE2_ENDANCHORED.
20. Applied Jason Hood's patches (slightly modified) to pcre2grep, to implement
20. Applied Jason Hood's patches (slightly modified) to pcre2grep, to implement
the --output=text (-O) option and the inbuilt callout echo.
21. Extend auto-anchoring etc. to ignore groups with a zero qualifier and
single-branch conditions with a false condition (e.g. DEFINE) at the start of a
21. Extend auto-anchoring etc. to ignore groups with a zero qualifier and
single-branch conditions with a false condition (e.g. DEFINE) at the start of a
branch. For example, /(?(DEFINE)...)^A/ and /(...){0}^B/ are now flagged as
anchored.
22. Added an explicit limit on the amount of heap used by pcre2_match(), set by
pcre2_set_heap_limit() or (*LIMIT_HEAP=xxx). Upgraded pcre2test to show the
heap limit along with other pattern information, and to find the minimum when
22. Added an explicit limit on the amount of heap used by pcre2_match(), set by
pcre2_set_heap_limit() or (*LIMIT_HEAP=xxx). Upgraded pcre2test to show the
heap limit along with other pattern information, and to find the minimum when
the find_limits modifier is set.
23. Write to the last 8 bytes of the pcre2_real_code structure when a compiled
pattern is set up so as to initialize any padding the compiler might have
included. This avoids valgrind warnings when a compiled pattern is copied, in
23. Write to the last 8 bytes of the pcre2_real_code structure when a compiled
pattern is set up so as to initialize any padding the compiler might have
included. This avoids valgrind warnings when a compiled pattern is copied, in
particular when it is serialized.
24. Remove a redundant line of code left in accidentally a long time ago.
@ -143,80 +143,80 @@ particular when it is serialized.
26. Correct an incorrect cast in pcre2_valid_utf.c
27. Update pcre2test, remove some unused code in pcre2_match(), and upgrade the
27. Update pcre2test, remove some unused code in pcre2_match(), and upgrade the
tests to improve coverage.
28. Some fixes/tidies as a result of looking at Coverity Scan output:
(a) Typo: ">" should be ">=" in opcode check in pcre2_auto_possess.c.
(b) Added some casts to avoid "suspicious implicit sign extension".
(c) Resource leaks in pcre2test in rare error cases.
(b) Added some casts to avoid "suspicious implicit sign extension".
(c) Resource leaks in pcre2test in rare error cases.
(d) Avoid warning for never-use case OP_TABLE_LENGTH which is just a fudge
for checking at compile time that tables are the right size.
(e) Add missing "fall through" comment.
29. Implemented PCRE2_EXTENDED_MORE and related /xx and (?xx) features.
for checking at compile time that tables are the right size.
(e) Add missing "fall through" comment.
29. Implemented PCRE2_EXTENDED_MORE and related /xx and (?xx) features.
30. Implement (?n: for PCRE2_NO_AUTO_CAPTURE, because Perl now has this.
31. If more than one of "push", "pushcopy", or "pushtablescopy" were set in
31. If more than one of "push", "pushcopy", or "pushtablescopy" were set in
pcre2test, a crash could occur.
32. Make -bigstack in RunTest allocate a 64Mb stack (instead of 16 MB) so that
32. Make -bigstack in RunTest allocate a 64Mb stack (instead of 16 MB) so that
all the tests can run with clang's sanitizing options.
33. Implement extra compile options in the compile context and add the first
33. Implement extra compile options in the compile context and add the first
one: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES.
34. Implement newline type PCRE2_NEWLINE_NUL.
35. A lookbehind assertion that had a zero-length branch caused undefined
35. A lookbehind assertion that had a zero-length branch caused undefined
behaviour when processed by pcre2_dfa_match(). This is oss-fuzz issue 1859.
36. The match limit value now also applies to pcre2_dfa_match() as there are
patterns that can use up a lot of resources without necessarily recursing very
36. The match limit value now also applies to pcre2_dfa_match() as there are
patterns that can use up a lot of resources without necessarily recursing very
deeply. (Compare item 10.23/36.) This should fix oss-fuzz #1761.
37. Implement PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL.
38. Fix returned offsets from regexec() when REG_STARTEND is used with a
38. Fix returned offsets from regexec() when REG_STARTEND is used with a
starting offset greater than zero.
39. Implement REG_PEND (GNU extension) for the POSIX wrapper.
40. Implement the subject_literal modifier in pcre2test, and allow jitstack on
40. Implement the subject_literal modifier in pcre2test, and allow jitstack on
pattern lines.
41. Implement PCRE2_LITERAL and use it to support REG_NOSPEC.
42. Implement PCRE2_EXTRA_MATCH_LINE and PCRE2_EXTRA_MATCH_WORD for the benefit
42. Implement PCRE2_EXTRA_MATCH_LINE and PCRE2_EXTRA_MATCH_WORD for the benefit
of pcre2grep.
43. Re-implement pcre2grep's -F, -w, and -x options using PCRE2_LITERAL,
43. Re-implement pcre2grep's -F, -w, and -x options using PCRE2_LITERAL,
PCRE2_EXTRA_MATCH_WORD, and PCRE2_EXTRA_MATCH_LINE. This fixes two bugs:
(a) The -F option did not work for fixed strings containing \E.
(b) The -w option did not work for patterns with multiple branches.
(b) The -w option did not work for patterns with multiple branches.
44. Added configuration options for the SELinux compatible execmem allocator in
JIT.
45. Increased the limit for searching for a "must be present" code unit in
subjects from 1000 to 2000 for 8-bit searches, since they use memchr() and are
45. Increased the limit for searching for a "must be present" code unit in
subjects from 1000 to 2000 for 8-bit searches, since they use memchr() and are
much faster.
46. Arrange for anchored patterns to record and use "first code unit" data,
because this can give a fast "no match" without searching for a "required code
because this can give a fast "no match" without searching for a "required code
unit". Previously only non-anchored patterns did this.
47. Upgraded the Unicode tables from Unicode 8.0.0 to Unicode 10.0.0.
48. Add the callout_no_where modifier to pcre2test.
49. Update extended grapheme breaking rules to the latest set that are in
49. Update extended grapheme breaking rules to the latest set that are in
Unicode Standard Annex #29.
50. Added experimental foreign pattern conversion facilities
50. Added experimental foreign pattern conversion facilities
(pcre2_pattern_convert() and friends).

57
NEWS
View File

@ -1,6 +1,63 @@
News about PCRE2 releases
-------------------------
Version 10.30-RC1 18-July-2017
------------------------------
The full list of changes that includes bugfixes and tidies is, as always, in
ChangeLog. These are the most important new features:
1. The main interpreter, pcre2_match(), has been refactored into a new version
that does not use recursive function calls (and therefore the system stack) for
remembering backtracking positions. This makes --disable-stack-for-recursion a
NOOP. The new implementation allows backtracking into recursive group calls in
patterns, making it more compatible with Perl, and also fixes some other
previously hard-to-do issues. For patterns that have a lot of backtracking, the
heap is now used, and there is explicit limit on the amount, settable by
pcre2_set_heap_limit() or (*LIMIT_HEAP=xxx). The "recursion limit" is retained,
but is renamed as "depth limit" (though the old names remain for
compatibility).
There is also a change in the way callouts from pcre2_match() are handled. The
offset_vector field in the callout block is no longer a pointer to the
actual ovector that was passed to the matching function in the match data
block. Instead it points to an internal ovector of a size large enough to hold
all possible captured substrings in the pattern.
2. The new option PCRE2_ENDANCHORED insists that a pattern match must end at
the end of the subject.
3. The new option PCRE2_EXTENDED_MORE implements Perl's /xx feature, and
pcre2test is upgraded to support it. Setting within the pattern by (?xx) is
also supported.
4. (?n) can be used to set PCRE2_NO_AUTO_CAPTURE, because Perl now has this.
5. Additional compile options in the compile context are now available, and the
first two are: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES and
PCRE2_EXTRA_BAD_ESCAPE_IS LITERAL.
6. The newline type PCRE2_NEWLINE_NUL is now available.
7. The match limit value now also applies to pcre2_dfa_match() as there are
patterns that can use up a lot of resources without necessarily recursing very
deeply.
8. The option REG_PEND (a GNU extension) is now available for the POSIX
wrapper. Also there is a new option PCRE2_LITERAL which is used to support
REG_NOSPEC.
9. PCRE2_EXTRA_MATCH_LINE and PCRE2_EXTRA_MATCH_WORD are implemented for the
benefit of pcre2grep, and pcre2grep's -F, -w, and -x options are re-implemented
using PCRE2_LITERAL, PCRE2_EXTRA_MATCH_WORD, and PCRE2_EXTRA_MATCH_LINE. This
is tidier and also fixes some bugs.
10. The Unicode tables are upgraded from Unicode 8.0.0 to Unicode 10.0.0.
11. There are some experimental functions for converting foreign patterns
(globs and POSIX patterns) into PCRE2 patterns.
Version 10.23 14-February-2017
------------------------------

View File

@ -179,8 +179,8 @@ can skip ahead to the CMake section.
STACK SIZE IN WINDOWS ENVIRONMENTS
Prior to release 10.30 the default system stack size of 1Mb in some Windows
environments caused issues with some tests. This should no longer be the case
Prior to release 10.30 the default system stack size of 1Mb in some Windows
environments caused issues with some tests. This should no longer be the case
for 10.30 and later releases.

63
README
View File

@ -173,7 +173,7 @@ library. They are also documented in the pcre2build man page.
architectures. If you try to enable it on an unsupported architecture, there
will be a compile time error. If you are running under SELinux you may also
want to add --enable-jit-sealloc, which enables the use of an execmem
allocator in JIT that is compatible with SELinux. This has no effect if JIT
allocator in JIT that is compatible with SELinux. This has no effect if JIT
is not enabled.
. If you do not want to make use of the default support for UTF-8 Unicode
@ -198,13 +198,14 @@ library. They are also documented in the pcre2build man page.
or starting a pattern with (*UCP).
. You can build PCRE2 to recognize either CR or LF or the sequence CRLF, or any
of the preceding, or any of the Unicode newline sequences, as indicating the
end of a line. Whatever you specify at build time is the default; the caller
of PCRE2 can change the selection at run time. The default newline indicator
is a single LF character (the Unix standard). You can specify the default
newline indicator by adding --enable-newline-is-cr, --enable-newline-is-lf,
--enable-newline-is-crlf, --enable-newline-is-anycrlf, or
--enable-newline-is-any to the "configure" command, respectively.
of the preceding, or any of the Unicode newline sequences, or the NUL (zero)
character as indicating the end of a line. Whatever you specify at build time
is the default; the caller of PCRE2 can change the selection at run time. The
default newline indicator is a single LF character (the Unix standard). You
can specify the default newline indicator by adding --enable-newline-is-cr,
--enable-newline-is-lf, --enable-newline-is-crlf,
--enable-newline-is-anycrlf, --enable-newline-is-any, or
--enable-newline-is-nul to the "configure" command, respectively.
. By default, the sequence \R in a pattern matches any Unicode line ending
sequence. This is independent of the option specifying what PCRE2 considers
@ -227,15 +228,15 @@ library. They are also documented in the pcre2build man page.
--with-parens-nest-limit=500
. PCRE2 has a counter that can be set to limit the amount of computing resource
it uses when matching a pattern with the Perl-compatible matching function.
If the limit is exceeded during a match, the match fails. The default is ten
million. You can change the default by setting, for example,
it uses when matching a pattern. If the limit is exceeded during a match, the
match fails. The default is ten million. You can change the default by
setting, for example,
--with-match-limit=500000
on the "configure" command. This is just the default; individual calls to
pcre2_match() can supply their own value. There is more discussion in the
pcre2api man page (search for pcre2_set_match_limit).
pcre2_match() or pcre2_dfa_match() can supply their own value. There is more
discussion in the pcre2api man page (search for pcre2_set_match_limit).
. There is a separate counter that limits the depth of nested backtracking
during a matching process, which indirectly limits the amount of heap memory
@ -246,15 +247,15 @@ library. They are also documented in the pcre2build man page.
There is more discussion in the pcre2api man page (search for
pcre2_set_depth_limit).
. You can also set an explicit limit on the amount of heap memory used by
. You can also set an explicit limit on the amount of heap memory used by
the pcre2_match() interpreter:
--with-heap-limit=500
The units are kilobytes. This limit does not apply when the JIT optimization
(which has its own memory control features) is used. There is more discussion
on the pcre2api man page (search for pcre2_set_heap_limit).
The units are kilobytes. This limit does not apply when the JIT optimization
(which has its own memory control features) is used. There is more discussion
on the pcre2api man page (search for pcre2_set_heap_limit).
. In the 8-bit library, the default maximum compiled pattern size is around
64K bytes. You can increase this by adding --with-link-size=3 to the
@ -659,9 +660,10 @@ with the perltest.sh script, and test 5 checking PCRE2-specific things.
Tests 6 and 7 check the pcre2_dfa_match() alternative matching function, in
non-UTF mode and UTF-mode with Unicode property support, respectively.
Test 8 checks some internal offsets and code size features; it is run only when
the default "link size" of 2 is set (in other cases the sizes change) and when
Unicode support is enabled.
Test 8 checks some internal offsets and code size features, but it is run only
when Unicode support is enabled. The output is different in 8-bit, 16-bit, and
32-bit modes and for different link sizes, so there are different output files
for each mode and link size.
Tests 9 and 10 are run only in 8-bit mode, and tests 11 and 12 are run only in
16-bit and 32-bit modes. These are tests that generate different output in
@ -671,7 +673,7 @@ Test 13 checks the handling of non-UTF characters greater than 255 by
pcre2_dfa_match() in 16-bit and 32-bit modes.
Test 14 contains some special UTF and UCP tests that give different output for
the different widths.
different code unit widths.
Test 15 contains a number of tests that must not be run with JIT. They check,
among other non-JIT things, the match-limiting features of the intepretive
@ -692,6 +694,9 @@ patterns to a file, and then reloading and checking them.
Tests 21 and 22 test \C support when the use of \C is not locked out, without
and with UTF support, respectively. Test 23 tests \C when it is locked out.
Tests 24 and 25 test the experimental pattern conversion functions, without and
with UTF support, respectively.
Character tables
----------------
@ -710,7 +715,7 @@ specified for ./configure, a different version of pcre2_chartables.c is built
by the program dftables (compiled from dftables.c), which uses the ANSI C
character handling functions such as isalnum(), isalpha(), isupper(),
islower(), etc. to build the table sources. This means that the default C
locale which is set for your system will control the contents of these default
locale that is set for your system will control the contents of these default
tables. You can change the default tables by editing pcre2_chartables.c and
then re-building PCRE2. If you do this, you should take care to ensure that the
file does not get automatically re-generated. The best way to do this is to
@ -765,6 +770,7 @@ The distribution should contain the files listed below.
src/pcre2_compile.c )
src/pcre2_config.c )
src/pcre2_context.c )
src/pcre2_convert.c )
src/pcre2_dfa_match.c )
src/pcre2_error.c )
src/pcre2_find_bracket.c )
@ -804,7 +810,6 @@ The distribution should contain the files listed below.
src/pcre2demo.c simple demonstration of coding calls to PCRE2
src/pcre2grep.c source of a grep utility that uses PCRE2
src/pcre2test.c comprehensive test program
src/pcre2_printint.c part of pcre2test
src/pcre2_jit_test.c JIT test program
(C) Auxiliary files:
@ -869,12 +874,12 @@ The distribution should contain the files listed below.
(E) Auxiliary files for building PCRE2 "by hand"
pcre2.h.generic ) a version of the public PCRE2 header file
src/pcre2.h.generic ) a version of the public PCRE2 header file
) for use in non-"configure" environments
config.h.generic ) a version of config.h for use in non-"configure"
src/config.h.generic ) a version of config.h for use in non-"configure"
) environments
Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
Last updated: 17 June 2017
Last updated: 18 July 2017

View File

@ -830,7 +830,7 @@ for bmode in "$test8" "$test16" "$test32"; do
if [ $supportBSC -ne 0 ] ; then
echo " Skipped because \C is not disabled"
else
$sim $valgrind ${opt:+$vjs} ./pcre2test -q $setstack $bmode $opt $testdata/testinput23 testtry
$sim $valgrind ./pcre2test -q $setstack $bmode $testdata/testinput23 testtry
checkresult $? 23 ""
fi
fi
@ -839,7 +839,7 @@ for bmode in "$test8" "$test16" "$test32"; do
if [ "$do24" = yes ] ; then
echo $title24
$sim $valgrind ${opt:+$vjs} ./pcre2test -q $setstack $bmode $opt $testdata/testinput24 testtry
$sim $valgrind ./pcre2test -q $setstack $bmode $testdata/testinput24 testtry
checkresult $? 24 ""
fi
@ -850,7 +850,7 @@ for bmode in "$test8" "$test16" "$test32"; do
if [ $utf -eq 0 ] ; then
echo " Skipped because UTF-$bits support is not available"
else
$sim $valgrind ${opt:+$vjs} ./pcre2test -q $setstack $bmode $opt $testdata/testinput25 testtry
$sim $valgrind ./pcre2test -q $setstack $bmode $testdata/testinput25 testtry
checkresult $? 25 ""
fi
fi

View File

@ -10,17 +10,17 @@ dnl be defined as -RC2, for example. For real releases, it should be empty.
m4_define(pcre2_major, [10])
m4_define(pcre2_minor, [30])
m4_define(pcre2_prerelease, [-DEV])
m4_define(pcre2_date, [2017-03-05])
m4_define(pcre2_prerelease, [-RC1])
m4_define(pcre2_date, [2017-07-18])
# NOTE: The CMakeLists.txt file searches for the above variables in the first
# 50 lines of this file. Please update that if the variables above are moved.
# Libtool shared library interface versions (current:revision:age)
m4_define(libpcre2_8_version, [5:0:5])
m4_define(libpcre2_16_version, [5:0:5])
m4_define(libpcre2_32_version, [5:0:5])
m4_define(libpcre2_posix_version, [1:1:0])
m4_define(libpcre2_8_version, [6:0:6])
m4_define(libpcre2_16_version, [6:0:6])
m4_define(libpcre2_32_version, [6:0:6])
m4_define(libpcre2_posix_version, [2:0:0])
AC_PREREQ(2.57)
AC_INIT(PCRE2, pcre2_major.pcre2_minor[]pcre2_prerelease, , pcre2)
@ -277,7 +277,7 @@ AC_ARG_WITH(parens-nest-limit,
AC_ARG_WITH(heap-limit,
AS_HELP_STRING([--with-heap-limit=N],
[default limit on heap memory (kilobytes, default=20000000)]),
, with_heap_limit=20000000)
, with_heap_limit=20000000)
# Handle --with-match-limit=N
AC_ARG_WITH(match-limit,
@ -301,7 +301,7 @@ AC_ARG_WITH(match-limit-depth,
AC_ARG_WITH(match-limit-recursion,,
, with_match_limit_recursion=UNSET)
# Handle --enable-valgrind
AC_ARG_ENABLE(valgrind,
AS_HELP_STRING([--enable-valgrind],
@ -370,7 +370,7 @@ case "$enable_newline" in
crlf) ac_pcre2_newline_value=3 ;;
any) ac_pcre2_newline_value=4 ;;
anycrlf) ac_pcre2_newline_value=5 ;;
nul) ac_pcre2_newline_value=6 ;;
nul) ac_pcre2_newline_value=6 ;;
*)
AC_MSG_ERROR([invalid argument \"$enable_newline\" to --enable-newline option])
;;
@ -737,7 +737,7 @@ AC_DEFINE_UNQUOTED([MATCH_LIMIT_DEPTH], [$with_match_limit_depth], [
AC_DEFINE_UNQUOTED([HEAP_LIMIT], [$with_heap_limit], [
This limits the amount of memory that pcre2_match() may use while matching
a pattern. The value is in kilobytes.])
a pattern. The value is in kilobytes.])
AC_DEFINE([MAX_NAME_SIZE], [32], [
This limit is parameterized just in case anybody ever wants to

View File

@ -179,8 +179,8 @@ can skip ahead to the CMake section.
STACK SIZE IN WINDOWS ENVIRONMENTS
Prior to release 10.30 the default system stack size of 1Mb in some Windows
environments caused issues with some tests. This should no longer be the case
Prior to release 10.30 the default system stack size of 1Mb in some Windows
environments caused issues with some tests. This should no longer be the case
for 10.30 and later releases.

View File

@ -173,7 +173,7 @@ library. They are also documented in the pcre2build man page.
architectures. If you try to enable it on an unsupported architecture, there
will be a compile time error. If you are running under SELinux you may also
want to add --enable-jit-sealloc, which enables the use of an execmem
allocator in JIT that is compatible with SELinux. This has no effect if JIT
allocator in JIT that is compatible with SELinux. This has no effect if JIT
is not enabled.
. If you do not want to make use of the default support for UTF-8 Unicode
@ -198,13 +198,14 @@ library. They are also documented in the pcre2build man page.
or starting a pattern with (*UCP).
. You can build PCRE2 to recognize either CR or LF or the sequence CRLF, or any
of the preceding, or any of the Unicode newline sequences, as indicating the
end of a line. Whatever you specify at build time is the default; the caller
of PCRE2 can change the selection at run time. The default newline indicator
is a single LF character (the Unix standard). You can specify the default
newline indicator by adding --enable-newline-is-cr, --enable-newline-is-lf,
--enable-newline-is-crlf, --enable-newline-is-anycrlf, or
--enable-newline-is-any to the "configure" command, respectively.
of the preceding, or any of the Unicode newline sequences, or the NUL (zero)
character as indicating the end of a line. Whatever you specify at build time
is the default; the caller of PCRE2 can change the selection at run time. The
default newline indicator is a single LF character (the Unix standard). You
can specify the default newline indicator by adding --enable-newline-is-cr,
--enable-newline-is-lf, --enable-newline-is-crlf,
--enable-newline-is-anycrlf, --enable-newline-is-any, or
--enable-newline-is-nul to the "configure" command, respectively.
. By default, the sequence \R in a pattern matches any Unicode line ending
sequence. This is independent of the option specifying what PCRE2 considers
@ -227,15 +228,15 @@ library. They are also documented in the pcre2build man page.
--with-parens-nest-limit=500
. PCRE2 has a counter that can be set to limit the amount of computing resource
it uses when matching a pattern with the Perl-compatible matching function.
If the limit is exceeded during a match, the match fails. The default is ten
million. You can change the default by setting, for example,
it uses when matching a pattern. If the limit is exceeded during a match, the
match fails. The default is ten million. You can change the default by
setting, for example,
--with-match-limit=500000
on the "configure" command. This is just the default; individual calls to
pcre2_match() can supply their own value. There is more discussion in the
pcre2api man page (search for pcre2_set_match_limit).
pcre2_match() or pcre2_dfa_match() can supply their own value. There is more
discussion in the pcre2api man page (search for pcre2_set_match_limit).
. There is a separate counter that limits the depth of nested backtracking
during a matching process, which indirectly limits the amount of heap memory
@ -246,15 +247,15 @@ library. They are also documented in the pcre2build man page.
There is more discussion in the pcre2api man page (search for
pcre2_set_depth_limit).
. You can also set an explicit limit on the amount of heap memory used by
. You can also set an explicit limit on the amount of heap memory used by
the pcre2_match() interpreter:
--with-heap-limit=500
The units are kilobytes. This limit does not apply when the JIT optimization
(which has its own memory control features) is used. There is more discussion
on the pcre2api man page (search for pcre2_set_heap_limit).
The units are kilobytes. This limit does not apply when the JIT optimization
(which has its own memory control features) is used. There is more discussion
on the pcre2api man page (search for pcre2_set_heap_limit).
. In the 8-bit library, the default maximum compiled pattern size is around
64K bytes. You can increase this by adding --with-link-size=3 to the
@ -659,9 +660,10 @@ with the perltest.sh script, and test 5 checking PCRE2-specific things.
Tests 6 and 7 check the pcre2_dfa_match() alternative matching function, in
non-UTF mode and UTF-mode with Unicode property support, respectively.
Test 8 checks some internal offsets and code size features; it is run only when
the default "link size" of 2 is set (in other cases the sizes change) and when
Unicode support is enabled.
Test 8 checks some internal offsets and code size features, but it is run only
when Unicode support is enabled. The output is different in 8-bit, 16-bit, and
32-bit modes and for different link sizes, so there are different output files
for each mode and link size.
Tests 9 and 10 are run only in 8-bit mode, and tests 11 and 12 are run only in
16-bit and 32-bit modes. These are tests that generate different output in
@ -671,7 +673,7 @@ Test 13 checks the handling of non-UTF characters greater than 255 by
pcre2_dfa_match() in 16-bit and 32-bit modes.
Test 14 contains some special UTF and UCP tests that give different output for
the different widths.
different code unit widths.
Test 15 contains a number of tests that must not be run with JIT. They check,
among other non-JIT things, the match-limiting features of the intepretive
@ -692,6 +694,9 @@ patterns to a file, and then reloading and checking them.
Tests 21 and 22 test \C support when the use of \C is not locked out, without
and with UTF support, respectively. Test 23 tests \C when it is locked out.
Tests 24 and 25 test the experimental pattern conversion functions, without and
with UTF support, respectively.
Character tables
----------------
@ -710,7 +715,7 @@ specified for ./configure, a different version of pcre2_chartables.c is built
by the program dftables (compiled from dftables.c), which uses the ANSI C
character handling functions such as isalnum(), isalpha(), isupper(),
islower(), etc. to build the table sources. This means that the default C
locale which is set for your system will control the contents of these default
locale that is set for your system will control the contents of these default
tables. You can change the default tables by editing pcre2_chartables.c and
then re-building PCRE2. If you do this, you should take care to ensure that the
file does not get automatically re-generated. The best way to do this is to
@ -765,6 +770,7 @@ The distribution should contain the files listed below.
src/pcre2_compile.c )
src/pcre2_config.c )
src/pcre2_context.c )
src/pcre2_convert.c )
src/pcre2_dfa_match.c )
src/pcre2_error.c )
src/pcre2_find_bracket.c )
@ -804,7 +810,6 @@ The distribution should contain the files listed below.
src/pcre2demo.c simple demonstration of coding calls to PCRE2
src/pcre2grep.c source of a grep utility that uses PCRE2
src/pcre2test.c comprehensive test program
src/pcre2_printint.c part of pcre2test
src/pcre2_jit_test.c JIT test program
(C) Auxiliary files:
@ -869,12 +874,12 @@ The distribution should contain the files listed below.
(E) Auxiliary files for building PCRE2 "by hand"
pcre2.h.generic ) a version of the public PCRE2 header file
src/pcre2.h.generic ) a version of the public PCRE2 header file
) for use in non-"configure" environments
config.h.generic ) a version of config.h for use in non-"configure"
src/config.h.generic ) a version of config.h for use in non-"configure"
) environments
Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
Last updated: 17 June 2017
Last updated: 18 July 2017

View File

@ -137,7 +137,7 @@ large search tree against a string that will never match. Nested unlimited
repeats in a pattern are a common example. PCRE2 provides some protection
against this: see the <b>pcre2_set_match_limit()</b> function in the
<a href="pcre2api.html"><b>pcre2api</b></a>
page. There is a similar function called <b>pcre2_set_depth_limit()</b> that can
page. There is a similar function called <b>pcre2_set_depth_limit()</b> that can
be used to restrict the amount of memory that is used.
</P>
<br><a name="SEC3" href="#TOC1">USER DOCUMENTATION</a><br>

View File

@ -26,8 +26,8 @@ DESCRIPTION
</b><br>
<P>
This function frees the memory used for a compiled pattern, including any
memory used by the JIT compiler. If the compiled pattern was created by a call
to <b>pcre2_code_copy_with_tables()</b>, the memory for the character tables is
memory used by the JIT compiler. If the compiled pattern was created by a call
to <b>pcre2_code_copy_with_tables()</b>, the memory for the character tables is
also freed.
</P>
<P>

View File

@ -64,7 +64,7 @@ The option bits are:
PCRE2_ENDANCHORED Pattern can match only at end of subject
PCRE2_EXTENDED Ignore white space and # comments
PCRE2_FIRSTLINE Force matching to be before newline
PCRE2_LITERAL Pattern characters are all literal
PCRE2_LITERAL Pattern characters are all literal
PCRE2_MATCH_UNSET_BACKREF Match unset back references
PCRE2_MULTILINE ^ and $ match newlines within data
PCRE2_NEVER_BACKSLASH_C Lock out the use of \C in patterns

View File

@ -45,7 +45,7 @@ point to a uint32_t integer variable. The available codes are:
PCRE2_CONFIG_BSR Indicates what \R matches by default:
PCRE2_BSR_UNICODE
PCRE2_BSR_ANYCRLF
PCRE2_CONFIG_HEAPLIMIT Default heap memory limit
PCRE2_CONFIG_HEAPLIMIT Default heap memory limit
PCRE2_CONFIG_DEPTHLIMIT Default backtracking depth limit
PCRE2_CONFIG_JIT Availability of just-in-time compiler support (1=yes 0=no)
PCRE2_CONFIG_JITTARGET Information (a string) about the target architecture for the JIT compiler
@ -57,7 +57,7 @@ point to a uint32_t integer variable. The available codes are:
PCRE2_NEWLINE_CRLF
PCRE2_NEWLINE_ANY
PCRE2_NEWLINE_ANYCRLF
PCRE2_NEWLINE_NUL
PCRE2_NEWLINE_NUL
PCRE2_CONFIG_PARENSLIMIT Default parentheses nesting limit
PCRE2_CONFIG_RECURSIONLIMIT Obsolete: use PCRE2_CONFIG_DEPTHLIMIT
PCRE2_CONFIG_STACKRECURSE Obsolete: always returns 0

View File

@ -26,8 +26,8 @@ DESCRIPTION
</b><br>
<P>
This function is part of an experimental set of pattern conversion functions.
It frees the memory occupied by a converted pattern that was obtained by
calling <b>pcre2_pattern_convert()</b> with arguments that caused it to place
It frees the memory occupied by a converted pattern that was obtained by
calling <b>pcre2_pattern_convert()</b> with arguments that caused it to place
the converted pattern into newly obtained heap memory.
</P>
<P>

View File

@ -25,7 +25,7 @@ SYNOPSIS
DESCRIPTION
</b><br>
<P>
This function builds a set of character tables for character code points that
This function builds a set of character tables for character code points that
are less than 256. These can be passed to <b>pcre2_compile()</b> in a compile
context in order to override the internal, built-in tables (which were either
defaulted or made by <b>pcre2_maketables()</b> when PCRE2 was compiled). See the

View File

@ -43,14 +43,14 @@ offsets to captured substrings. Its arguments are:
A match context is needed only if you want to:
<pre>
Set up a callout function
Set a matching offset limit
Change the heap memory limit
Change the backtracking match limit
Set a matching offset limit
Change the heap memory limit
Change the backtracking match limit
Change the backtracking depth limit
Set custom memory management specifically for the match
</pre>
The <i>length</i> and <i>startoffset</i> values are code
units, not characters. The length may be given as PCRE2_ZERO_TERMINATE for a
units, not characters. The length may be given as PCRE2_ZERO_TERMINATE for a
subject that is terminated by a binary zero code unit. The options are:
<pre>
PCRE2_ANCHORED Match only at the first position
@ -59,7 +59,7 @@ subject that is terminated by a binary zero code unit. The options are:
PCRE2_NOTEOL Subject string is not the end of a line
PCRE2_NOTEMPTY An empty string is not a valid match
PCRE2_NOTEMPTY_ATSTART An empty string at the start of the subject is not a valid match
PCRE2_NO_JIT Do not use JIT matching
PCRE2_NO_JIT Do not use JIT matching
PCRE2_NO_UTF_CHECK Do not check the subject for UTF validity (only relevant if PCRE2_UTF
was set at compile time)
PCRE2_PARTIAL_HARD Return PCRE2_ERROR_PARTIAL for a partial match even if there is a full match

View File

@ -48,7 +48,7 @@ request are as follows:
1 first code unit is set
2 start of string or after newline
PCRE2_INFO_FIRSTCODEUNIT First code unit when type is 1
PCRE2_INFO_FRAMESIZE Size of backtracking frame
PCRE2_INFO_FRAMESIZE Size of backtracking frame
PCRE2_INFO_HASBACKSLASHC Return 1 if pattern contains \C
PCRE2_INFO_HASCRORLF Return 1 if explicit CR or LF matches exist in the pattern
PCRE2_INFO_HEAPLIMIT Heap memory limit if set, otherwise PCRE2_ERROR_UNSET
@ -71,7 +71,7 @@ request are as follows:
PCRE2_NEWLINE_CRLF
PCRE2_NEWLINE_ANY
PCRE2_NEWLINE_ANYCRLF
PCRE2_NEWLINE_NUL
PCRE2_NEWLINE_NUL
PCRE2_INFO_RECURSIONLIMIT Obsolete synonym for PCRE2_INFO_DEPTHLIMIT
PCRE2_INFO_SIZE Size of compiled pattern
</pre>

View File

@ -35,7 +35,7 @@ matching patterns. The second argument must be one of:
PCRE2_NEWLINE_CRLF CR followed by LF only
PCRE2_NEWLINE_ANYCRLF Any of the above
PCRE2_NEWLINE_ANY Any Unicode newline sequence
PCRE2_NEWLINE_NUL The NUL character (binary zero)
PCRE2_NEWLINE_NUL The NUL character (binary zero)
</pre>
The result is zero for success or PCRE2_ERROR_BADDATA if the second argument is
invalid.

View File

@ -26,7 +26,7 @@ SYNOPSIS
DESCRIPTION
</b><br>
<P>
This function is obsolete and should not be used in new code. Use
This function is obsolete and should not be used in new code. Use
<b>pcre2_set_depth_limit()</b> instead.
</P>
<P>

View File

@ -60,7 +60,7 @@ want to:
The <i>length</i>, <i>startoffset</i> and <i>rlength</i> values are code
units, not characters, as is the contents of the variable pointed at by
<i>outlengthptr</i>, which is updated to the actual length of the new string.
The subject and replacement lengths can be given as PCRE2_ZERO_TERMINATED for
The subject and replacement lengths can be given as PCRE2_ZERO_TERMINATED for
zero-terminated strings. The options are:
<pre>
PCRE2_ANCHORED Match only at the first position

View File

@ -87,10 +87,10 @@ Options that specify values have names that start with --with.
<br><a name="SEC3" href="#TOC1">BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a><br>
<P>
By default, a library called <b>libpcre2-8</b> is built, containing functions
that take string arguments contained in vectors of bytes, interpreted either as
that take string arguments contained in arrays of bytes, interpreted either as
single-byte characters, or UTF-8 strings. You can also build two other
libraries, called <b>libpcre2-16</b> and <b>libpcre2-32</b>, which process
strings that are contained in vectors of 16-bit and 32-bit code units,
strings that are contained in arrays of 16-bit and 32-bit code units,
respectively. These can be interpreted either as single-unit characters or
UTF-16/UTF-32 strings. To build these additional libraries, add one or both of
the following to the <b>configure</b> command:
@ -208,19 +208,23 @@ to the <b>configure</b> command. There is a fourth option, specified by
--enable-newline-is-anycrlf
</pre>
which causes PCRE2 to recognize any of the three sequences CR, LF, or CRLF as
indicating a line ending. Finally, a fifth option, specified by
indicating a line ending. A fifth option, specified by
<pre>
--enable-newline-is-any
</pre>
causes PCRE2 to recognize any Unicode newline sequence. The Unicode newline
sequences are the three just mentioned, plus the single characters VT (vertical
tab, U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line
separator, U+2028), and PS (paragraph separator, U+2029).
separator, U+2028), and PS (paragraph separator, U+2029). The final option is
<pre>
--enable-newline-is-nul
</pre>
which causes NUL (binary zero) is set as the default line-ending character.
</P>
<P>
Whatever default line ending convention is selected when PCRE2 is built can be
overridden by applications that use the library. At build time it is
conventional to use the standard for your operating system.
recommended to use the standard for your operating system.
</P>
<br><a name="SEC9" href="#TOC1">WHAT \R MATCHES</a><br>
<P>
@ -301,7 +305,9 @@ because the size of each backtracking "frame" depends on the number of
capturing parentheses in a pattern, the amount of heap that is used before the
limit is reached varies from pattern to pattern. This limit was more useful in
versions before 10.30, where function recursion was used for backtracking.
However, as well as applying to <b>pcre2_match()</b>, this limit also controls
</P>
<P>
As well as applying to <b>pcre2_match()</b>, the depth limit also controls
the depth of recursive function calls in <b>pcre2_dfa_match()</b>. These are
used for lookaround assertions, atomic groups, and recursion within patterns.
The limit does not apply to JIT matching.
@ -559,7 +565,7 @@ Cambridge, England.
</P>
<br><a name="SEC25" href="#TOC1">REVISION</a><br>
<P>
Last updated: 17 June 2017
Last updated: 18 July 2017
<br>
Copyright &copy; 1997-2017 University of Cambridge.
<br>

View File

@ -85,7 +85,7 @@ documentation for details.
<P>
8. Subroutine calls (whether recursive or not) were treated as atomic groups up
to PCRE2 release 10.23, but from release 10.30 this changed, and backtracking
into subroutine calls is now supported, as in Perl.
into subroutine calls is now supported, as in Perl.
</P>
<P>
9. If any of the backtracking control verbs are used in a subpattern that is

View File

@ -517,20 +517,20 @@ memory. There are three options that set resource limits for matching.
The <b>--match-limit</b> option provides a means of limiting computing resource
usage when processing patterns that are not going to match, but which have a
very large number of possibilities in their search trees. The classic example
is a pattern that uses nested unlimited repeats. Internally, PCRE2 has a
counter that is incremented each time around its main processing loop. If the
is a pattern that uses nested unlimited repeats. Internally, PCRE2 has a
counter that is incremented each time around its main processing loop. If the
value set by <b>--match-limit</b> is reached, an error occurs.
<br>
<br>
The <b>--heap-limit</b> option specifies, as a number of kilobytes, the amount
of heap memory that may be used for matching. Heap memory is needed only if
matching the pattern requires a significant number of nested backtracking
points to be remembered. This parameter can be set to zero to forbid the use of
points to be remembered. This parameter can be set to zero to forbid the use of
heap memory altogether.
<br>
<br>
The <b>--depth-limit</b> option limits the depth of nested backtracking points,
which indirectly limits the amount of memory that is used. The amount of memory
which indirectly limits the amount of memory that is used. The amount of memory
needed for each backtracking point depends on the number of capturing
parentheses in the pattern, so the amount of memory that is used before this
limit acts varies from pattern to pattern. This limit is of use only if it is
@ -538,7 +538,7 @@ set smaller than <b>--match-limit</b>.
<br>
<br>
There are no short forms for these options. The default settings are specified
when the PCRE2 library is compiled, with the default defaults being very large
when the PCRE2 library is compiled, with the default defaults being very large
and so effectively unlimited.
</P>
<P>
@ -841,7 +841,7 @@ patterns are ignored by <b>pcre2grep</b>.
A callout in a PCRE2 pattern is of the form (?C&#60;arg&#62;) where the argument is
either a number or a quoted string (see the
<a href="pcre2callout.html"><b>pcre2callout</b></a>
documentation for details). Numbered callouts are ignored by <b>pcre2grep</b>;
documentation for details). Numbered callouts are ignored by <b>pcre2grep</b>;
only callouts with string arguments are useful.
</P>
<br><b>
@ -892,10 +892,10 @@ Echoing a specific string
If the callout string starts with a pipe (vertical bar) character, the rest of
the string is written to the output, having been passed through the same escape
processing as text from the --output option. This provides a simple echoing
facility that avoids calling an external program or script. No terminator is
facility that avoids calling an external program or script. No terminator is
added to the string, so if you want a newline, you must include it explicitly.
Matching continues normally after the string is output. If you want to see only
the callout output but not any output from an actual match, you should end the
Matching continues normally after the string is output. If you want to see only
the callout output but not any output from an actual match, you should end the
relevant pattern with (*FAIL).
</P>
<br><a name="SEC11" href="#TOC1">MATCHING ERRORS</a><br>
@ -910,8 +910,8 @@ there are more than 20 such errors, <b>pcre2grep</b> gives up.
</P>
<P>
The <b>--match-limit</b> option of <b>pcre2grep</b> can be used to set the
overall resource limit. There are also other limits that affect the amount of
memory used during matching; see the discussion of <b>--heap-limit</b> and
overall resource limit. There are also other limits that affect the amount of
memory used during matching; see the discussion of <b>--heap-limit</b> and
<b>--depth-limit</b> above.
</P>
<br><a name="SEC12" href="#TOC1">DIAGNOSTICS</a><br>

View File

@ -29,7 +29,7 @@ of them.
<br><a name="SEC2" href="#TOC1">COMPILED PATTERN MEMORY USAGE</a><br>
<P>
Patterns are compiled by PCRE2 into a reasonably efficient interpretive code,
so that most simple patterns do not use much memory for storing the compiled
so that most simple patterns do not use much memory for storing the compiled
version. However, there is one case where the memory usage of a compiled
pattern can be unexpectedly large. If a parenthesized subpattern has a
quantifier with a minimum greater than 1 and/or a limited maximum, the whole
@ -91,7 +91,7 @@ vector is used. Rewriting patterns to be time-efficient, as described below,
may also reduce the memory requirements.
</P>
<P>
In contrast to <b>pcre2_match()</b>, <b>pcre2_dfa_match()</b> does use recursive
In contrast to <b>pcre2_match()</b>, <b>pcre2_dfa_match()</b> does use recursive
function calls, but only for processing atomic groups, lookaround assertions,
and recursion within the pattern. Too much nested recursion may cause stack
issues. The "match depth" parameter can be used to limit the depth of function
@ -184,7 +184,7 @@ appreciable time with strings longer than about 20 characters.
</P>
<P>
In many cases, the solution to this kind of performance issue is to use an
atomic group or a possessive quantifier. This can often reduce memory
atomic group or a possessive quantifier. This can often reduce memory
requirements as well. As another example, consider this pattern:
<pre>
([^&#60;]|&#60;(?!inet))+
@ -205,7 +205,7 @@ are "swallowed" in one item inside the parentheses, and a possessive quantifier
is used to stop any backtracking into the runs of non-"&#60;" characters. This
version also uses a lot less memory because entry to a new set of parentheses
happens only when a "&#60;" character that is not followed by "inet" is encountered
(and we assume this is relatively rare).
(and we assume this is relatively rare).
</P>
<P>
This example shows that one way of optimizing performance when matching long
@ -216,10 +216,10 @@ than one character whenever possible.
SETTING RESOURCE LIMITS
</b><br>
<P>
You can set limits on the amount of processing that takes place when matching,
You can set limits on the amount of processing that takes place when matching,
and on the amount of heap memory that is used. The default values of the limits
are very large, and unlikely ever to operate. They can be changed when PCRE2 is
built, and they can also be set when <b>pcre2_match()</b> or
built, and they can also be set when <b>pcre2_match()</b> or
<b>pcre2_dfa_match()</b> is called. For details of these interfaces, see the
<a href="pcre2build.html"><b>pcre2build</b></a>
documentation and the section entitled

View File

@ -430,11 +430,11 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
(?i) caseless
(?J) allow duplicate names
(?m) multiline
(?n) no auto capture
(?n) no auto capture
(?s) single line (dotall)
(?U) default ungreedy (lazy)
(?x) extended: ignore white space except in classes
(?xx) as (?x) but also ignore space and tab in classes
(?xx) as (?x) but also ignore space and tab in classes
(?-...) unset option(s)
</pre>
The following are recognized only at the very start of a pattern or after one

View File

@ -130,7 +130,7 @@ against this: see the \fBpcre2_set_match_limit()\fP function in the
.\" HREF
\fBpcre2api\fP
.\"
page. There is a similar function called \fBpcre2_set_depth_limit()\fP that can
page. There is a similar function called \fBpcre2_set_depth_limit()\fP that can
be used to restrict the amount of memory that is used.
.
.

View File

@ -170,8 +170,8 @@ REVISION
Last updated: 01 April 2017
Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------
PCRE2API(3) Library Functions Manual PCRE2API(3)
@ -3432,8 +3432,8 @@ REVISION
Last updated: 10 July 2017
Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------
PCRE2BUILD(3) Library Functions Manual PCRE2BUILD(3)
@ -3487,10 +3487,10 @@ PCRE2 BUILD-TIME OPTIONS
BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES
By default, a library called libpcre2-8 is built, containing functions
that take string arguments contained in vectors of bytes, interpreted
that take string arguments contained in arrays of bytes, interpreted
either as single-byte characters, or UTF-8 strings. You can also build
two other libraries, called libpcre2-16 and libpcre2-32, which process
strings that are contained in vectors of 16-bit and 32-bit code units,
strings that are contained in arrays of 16-bit and 32-bit code units,
respectively. These can be interpreted either as single-unit characters
or UTF-16/UTF-32 strings. To build these additional libraries, add one
or both of the following to the configure command:
@ -3609,7 +3609,7 @@ NEWLINE RECOGNITION
--enable-newline-is-anycrlf
which causes PCRE2 to recognize any of the three sequences CR, LF, or
CRLF as indicating a line ending. Finally, a fifth option, specified by
CRLF as indicating a line ending. A fifth option, specified by
--enable-newline-is-any
@ -3617,97 +3617,103 @@ NEWLINE RECOGNITION
newline sequences are the three just mentioned, plus the single charac-
ters VT (vertical tab, U+000B), FF (form feed, U+000C), NEL (next line,
U+0085), LS (line separator, U+2028), and PS (paragraph separator,
U+2029).
U+2029). The final option is
--enable-newline-is-nul
which causes NUL (binary zero) is set as the default line-ending char-
acter.
Whatever default line ending convention is selected when PCRE2 is built
can be overridden by applications that use the library. At build time
it is conventional to use the standard for your operating system.
can be overridden by applications that use the library. At build time
it is recommended to use the standard for your operating system.
WHAT \R MATCHES
By default, the sequence \R in a pattern matches any Unicode newline
sequence, independently of what has been selected as the line ending
By default, the sequence \R in a pattern matches any Unicode newline
sequence, independently of what has been selected as the line ending
sequence. If you specify
--enable-bsr-anycrlf
the default is changed so that \R matches only CR, LF, or CRLF. What-
ever is selected when PCRE2 is built can be overridden by applications
the default is changed so that \R matches only CR, LF, or CRLF. What-
ever is selected when PCRE2 is built can be overridden by applications
that use the library.
HANDLING VERY LARGE PATTERNS
Within a compiled pattern, offset values are used to point from one
part to another (for example, from an opening parenthesis to an alter-
nation metacharacter). By default, in the 8-bit and 16-bit libraries,
two-byte values are used for these offsets, leading to a maximum size
for a compiled pattern of around 64K code units. This is sufficient to
Within a compiled pattern, offset values are used to point from one
part to another (for example, from an opening parenthesis to an alter-
nation metacharacter). By default, in the 8-bit and 16-bit libraries,
two-byte values are used for these offsets, leading to a maximum size
for a compiled pattern of around 64K code units. This is sufficient to
handle all but the most gigantic patterns. Nevertheless, some people do
want to process truly enormous patterns, so it is possible to compile
PCRE2 to use three-byte or four-byte offsets by adding a setting such
want to process truly enormous patterns, so it is possible to compile
PCRE2 to use three-byte or four-byte offsets by adding a setting such
as
--with-link-size=3
to the configure command. The value given must be 2, 3, or 4. For the
16-bit library, a value of 3 is rounded up to 4. In these libraries,
using longer offsets slows down the operation of PCRE2 because it has
to load additional data when handling them. For the 32-bit library the
value is always 4 and cannot be overridden; the value of --with-link-
to the configure command. The value given must be 2, 3, or 4. For the
16-bit library, a value of 3 is rounded up to 4. In these libraries,
using longer offsets slows down the operation of PCRE2 because it has
to load additional data when handling them. For the 32-bit library the
value is always 4 and cannot be overridden; the value of --with-link-
size is ignored.
LIMITING PCRE2 RESOURCE USAGE
The pcre2_match() function increments a counter each time it goes round
its main loop. Putting a limit on this counter controls the amount of
computing resource used by a single call to pcre2_match(). The limit
its main loop. Putting a limit on this counter controls the amount of
computing resource used by a single call to pcre2_match(). The limit
can be changed at run time, as described in the pcre2api documentation.
The default is 10 million, but this can be changed by adding a setting
The default is 10 million, but this can be changed by adding a setting
such as
--with-match-limit=500000
to the configure command. This setting also applies to the
pcre2_dfa_match() matching function, and to JIT matching (though the
to the configure command. This setting also applies to the
pcre2_dfa_match() matching function, and to JIT matching (though the
counting is done differently).
The pcre2_match() function starts out using a 20K vector on the system
stack to record backtracking points. The more nested backtracking
The pcre2_match() function starts out using a 20K vector on the system
stack to record backtracking points. The more nested backtracking
points there are (that is, the deeper the search tree), the more memory
is needed. If the initial vector is not large enough, heap memory is
is needed. If the initial vector is not large enough, heap memory is
used, up to a certain limit, which is specified in kilobytes. The limit
can be changed at run time, as described in the pcre2api documentation.
The default limit (in effect unlimited) is 20 million. You can change
The default limit (in effect unlimited) is 20 million. You can change
this by a setting such as
--with-heap-limit=500
which limits the amount of heap to 500 kilobytes. This limit applies
only to interpretive matching in pcre2_match(). It does not apply when
JIT (which has its own memory arrangements) is used, nor does it apply
which limits the amount of heap to 500 kilobytes. This limit applies
only to interpretive matching in pcre2_match(). It does not apply when
JIT (which has its own memory arrangements) is used, nor does it apply
to pcre2_dfa_match().
You can also explicitly limit the depth of nested backtracking in the
You can also explicitly limit the depth of nested backtracking in the
pcre2_match() interpreter. This limit defaults to the value that is set
for --with-match-limit. You can set a lower default limit by adding,
for --with-match-limit. You can set a lower default limit by adding,
for example,
--with-match-limit_depth=10000
to the configure command. This value can be overridden at run time.
This depth limit indirectly limits the amount of heap memory that is
used, but because the size of each backtracking "frame" depends on the
number of capturing parentheses in a pattern, the amount of heap that
is used before the limit is reached varies from pattern to pattern.
This limit was more useful in versions before 10.30, where function
recursion was used for backtracking. However, as well as applying to
pcre2_match(), this limit also controls the depth of recursive function
calls in pcre2_dfa_match(). These are used for lookaround assertions,
atomic groups, and recursion within patterns. The limit does not apply
to JIT matching.
to the configure command. This value can be overridden at run time.
This depth limit indirectly limits the amount of heap memory that is
used, but because the size of each backtracking "frame" depends on the
number of capturing parentheses in a pattern, the amount of heap that
is used before the limit is reached varies from pattern to pattern.
This limit was more useful in versions before 10.30, where function
recursion was used for backtracking.
As well as applying to pcre2_match(), the depth limit also controls the
depth of recursive function calls in pcre2_dfa_match(). These are used
for lookaround assertions, atomic groups, and recursion within pat-
terns. The limit does not apply to JIT matching.
CREATING CHARACTER TABLES AT BUILD TIME
@ -3969,11 +3975,11 @@ AUTHOR
REVISION
Last updated: 17 June 2017
Last updated: 18 July 2017
Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------
PCRE2CALLOUT(3) Library Functions Manual PCRE2CALLOUT(3)
@ -4366,8 +4372,8 @@ REVISION
Last updated: 14 April 2017
Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------
PCRE2COMPAT(3) Library Functions Manual PCRE2COMPAT(3)
@ -4564,8 +4570,8 @@ REVISION
Last updated: 18 April 2017
Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------
PCRE2JIT(3) Library Functions Manual PCRE2JIT(3)
@ -4958,8 +4964,8 @@ REVISION
Last updated: 31 March 2017
Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------
PCRE2LIMITS(3) Library Functions Manual PCRE2LIMITS(3)
@ -5029,8 +5035,8 @@ REVISION
Last updated: 30 March 2017
Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------
PCRE2MATCHING(3) Library Functions Manual PCRE2MATCHING(3)
@ -5248,8 +5254,8 @@ REVISION
Last updated: 29 September 2014
Copyright (c) 1997-2014 University of Cambridge.
------------------------------------------------------------------------------
PCRE2PARTIAL(3) Library Functions Manual PCRE2PARTIAL(3)
@ -5688,8 +5694,8 @@ REVISION
Last updated: 22 December 2014
Copyright (c) 1997-2014 University of Cambridge.
------------------------------------------------------------------------------
PCRE2PATTERN(3) Library Functions Manual PCRE2PATTERN(3)
@ -8790,8 +8796,8 @@ REVISION
Last updated: 05 July 2017
Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------
PCRE2PERFORM(3) Library Functions Manual PCRE2PERFORM(3)
@ -9018,8 +9024,8 @@ REVISION
Last updated: 08 April 2017
Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------
PCRE2POSIX(3) Library Functions Manual PCRE2POSIX(3)
@ -9326,8 +9332,8 @@ REVISION
Last updated: 15 June 2017
Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------
PCRE2SAMPLE(3) Library Functions Manual PCRE2SAMPLE(3)
@ -9595,8 +9601,8 @@ REVISION
Last updated: 21 March 2017
Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------
PCRE2SYNTAX(3) Library Functions Manual PCRE2SYNTAX(3)
@ -10043,8 +10049,8 @@ REVISION
Last updated: 17 June 2017
Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------
PCRE2UNICODE(3) Library Functions Manual PCRE2UNICODE(3)
@ -10300,5 +10306,5 @@ REVISION
Last updated: 17 May 2017
Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------

View File

@ -14,8 +14,8 @@ PCRE2 - Perl-compatible regular expressions (revised API)
.rs
.sp
This function frees the memory used for a compiled pattern, including any
memory used by the JIT compiler. If the compiled pattern was created by a call
to \fBpcre2_code_copy_with_tables()\fP, the memory for the character tables is
memory used by the JIT compiler. If the compiled pattern was created by a call
to \fBpcre2_code_copy_with_tables()\fP, the memory for the character tables is
also freed.
.P
There is a complete description of the PCRE2 native API in the

View File

@ -52,7 +52,7 @@ The option bits are:
PCRE2_ENDANCHORED Pattern can match only at end of subject
PCRE2_EXTENDED Ignore white space and # comments
PCRE2_FIRSTLINE Force matching to be before newline
PCRE2_LITERAL Pattern characters are all literal
PCRE2_LITERAL Pattern characters are all literal
PCRE2_MATCH_UNSET_BACKREF Match unset back references
PCRE2_MULTILINE ^ and $ match newlines within data
PCRE2_NEVER_BACKSLASH_C Lock out the use of \eC in patterns

View File

@ -31,7 +31,7 @@ point to a uint32_t integer variable. The available codes are:
PCRE2_CONFIG_BSR Indicates what \eR matches by default:
PCRE2_BSR_UNICODE
PCRE2_BSR_ANYCRLF
PCRE2_CONFIG_HEAPLIMIT Default heap memory limit
PCRE2_CONFIG_HEAPLIMIT Default heap memory limit
PCRE2_CONFIG_DEPTHLIMIT Default backtracking depth limit
.\" JOIN
PCRE2_CONFIG_JIT Availability of just-in-time compiler
@ -47,7 +47,7 @@ point to a uint32_t integer variable. The available codes are:
PCRE2_NEWLINE_CRLF
PCRE2_NEWLINE_ANY
PCRE2_NEWLINE_ANYCRLF
PCRE2_NEWLINE_NUL
PCRE2_NEWLINE_NUL
PCRE2_CONFIG_PARENSLIMIT Default parentheses nesting limit
PCRE2_CONFIG_RECURSIONLIMIT Obsolete: use PCRE2_CONFIG_DEPTHLIMIT
PCRE2_CONFIG_STACKRECURSE Obsolete: always returns 0

View File

@ -14,8 +14,8 @@ PCRE2 - Perl-compatible regular expressions (revised API)
.rs
.sp
This function is part of an experimental set of pattern conversion functions.
It frees the memory occupied by a converted pattern that was obtained by
calling \fBpcre2_pattern_convert()\fP with arguments that caused it to place
It frees the memory occupied by a converted pattern that was obtained by
calling \fBpcre2_pattern_convert()\fP with arguments that caused it to place
the converted pattern into newly obtained heap memory.
.P
The pattern conversion functions are described in the

View File

@ -43,17 +43,17 @@ The options are:
PCRE2_NOTBOL Subject is not the beginning of a line
PCRE2_NOTEOL Subject is not the end of a line
PCRE2_NOTEMPTY An empty string is not a valid match
.\" JOIN
.\" JOIN
PCRE2_NOTEMPTY_ATSTART An empty string at the start of the subject
is not a valid match
.\" JOIN
.\" JOIN
PCRE2_NO_UTF_CHECK Do not check the subject for UTF
validity (only relevant if PCRE2_UTF
was set at compile time)
.\" JOIN
.\" JOIN
PCRE2_PARTIAL_HARD Return PCRE2_ERROR_PARTIAL for a partial
match even if there is a full match
.\" JOIN
.\" JOIN
PCRE2_PARTIAL_SOFT Return PCRE2_ERROR_PARTIAL for a partial
match if no full matches are found
PCRE2_DFA_RESTART Restart after a partial match

View File

@ -12,7 +12,7 @@ PCRE2 - Perl-compatible regular expressions (revised API)
.SH DESCRIPTION
.rs
.sp
This function builds a set of character tables for character code points that
This function builds a set of character tables for character code points that
are less than 256. These can be passed to \fBpcre2_compile()\fP in a compile
context in order to override the internal, built-in tables (which were either
defaulted or made by \fBpcre2_maketables()\fP when PCRE2 was compiled). See the

View File

@ -31,14 +31,14 @@ offsets to captured substrings. Its arguments are:
A match context is needed only if you want to:
.sp
Set up a callout function
Set a matching offset limit
Change the heap memory limit
Change the backtracking match limit
Set a matching offset limit
Change the heap memory limit
Change the backtracking match limit
Change the backtracking depth limit
Set custom memory management specifically for the match
.sp
The \fIlength\fP and \fIstartoffset\fP values are code
units, not characters. The length may be given as PCRE2_ZERO_TERMINATE for a
units, not characters. The length may be given as PCRE2_ZERO_TERMINATE for a
subject that is terminated by a binary zero code unit. The options are:
.sp
PCRE2_ANCHORED Match only at the first position
@ -49,7 +49,7 @@ subject that is terminated by a binary zero code unit. The options are:
.\" JOIN
PCRE2_NOTEMPTY_ATSTART An empty string at the start of the subject
is not a valid match
PCRE2_NO_JIT Do not use JIT matching
PCRE2_NO_JIT Do not use JIT matching
.\" JOIN
PCRE2_NO_UTF_CHECK Do not check the subject for UTF
validity (only relevant if PCRE2_UTF

View File

@ -38,7 +38,7 @@ request are as follows:
1 first code unit is set
2 start of string or after newline
PCRE2_INFO_FIRSTCODEUNIT First code unit when type is 1
PCRE2_INFO_FRAMESIZE Size of backtracking frame
PCRE2_INFO_FRAMESIZE Size of backtracking frame
PCRE2_INFO_HASBACKSLASHC Return 1 if pattern contains \eC
.\" JOIN
PCRE2_INFO_HASCRORLF Return 1 if explicit CR or LF matches
@ -71,7 +71,7 @@ request are as follows:
PCRE2_NEWLINE_CRLF
PCRE2_NEWLINE_ANY
PCRE2_NEWLINE_ANYCRLF
PCRE2_NEWLINE_NUL
PCRE2_NEWLINE_NUL
PCRE2_INFO_RECURSIONLIMIT Obsolete synonym for PCRE2_INFO_DEPTHLIMIT
PCRE2_INFO_SIZE Size of compiled pattern
.sp

View File

@ -23,7 +23,7 @@ matching patterns. The second argument must be one of:
PCRE2_NEWLINE_CRLF CR followed by LF only
PCRE2_NEWLINE_ANYCRLF Any of the above
PCRE2_NEWLINE_ANY Any Unicode newline sequence
PCRE2_NEWLINE_NUL The NUL character (binary zero)
PCRE2_NEWLINE_NUL The NUL character (binary zero)
.sp
The result is zero for success or PCRE2_ERROR_BADDATA if the second argument is
invalid.

View File

@ -14,7 +14,7 @@ PCRE2 - Perl-compatible regular expressions (revised API)
.SH DESCRIPTION
.rs
.sp
This function is obsolete and should not be used in new code. Use
This function is obsolete and should not be used in new code. Use
\fBpcre2_set_depth_limit()\fP instead.
.P
There is a complete description of the PCRE2 native API in the

View File

@ -48,7 +48,7 @@ want to:
The \fIlength\fP, \fIstartoffset\fP and \fIrlength\fP values are code
units, not characters, as is the contents of the variable pointed at by
\fIoutlengthptr\fP, which is updated to the actual length of the new string.
The subject and replacement lengths can be given as PCRE2_ZERO_TERMINATED for
The subject and replacement lengths can be given as PCRE2_ZERO_TERMINATED for
zero-terminated strings. The options are:
.sp
PCRE2_ANCHORED Match only at the first position

View File

@ -1,4 +1,4 @@
.TH PCRE2BUILD 3 "17 June 2017" "PCRE2 10.30"
.TH PCRE2BUILD 3 "18 July 2017" "PCRE2 10.30"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.
@ -66,10 +66,10 @@ Options that specify values have names that start with --with.
.rs
.sp
By default, a library called \fBlibpcre2-8\fP is built, containing functions
that take string arguments contained in vectors of bytes, interpreted either as
that take string arguments contained in arrays of bytes, interpreted either as
single-byte characters, or UTF-8 strings. You can also build two other
libraries, called \fBlibpcre2-16\fP and \fBlibpcre2-32\fP, which process
strings that are contained in vectors of 16-bit and 32-bit code units,
strings that are contained in arrays of 16-bit and 32-bit code units,
respectively. These can be interpreted either as single-unit characters or
UTF-16/UTF-32 strings. To build these additional libraries, add one or both of
the following to the \fBconfigure\fP command:
@ -197,18 +197,22 @@ to the \fBconfigure\fP command. There is a fourth option, specified by
--enable-newline-is-anycrlf
.sp
which causes PCRE2 to recognize any of the three sequences CR, LF, or CRLF as
indicating a line ending. Finally, a fifth option, specified by
indicating a line ending. A fifth option, specified by
.sp
--enable-newline-is-any
.sp
causes PCRE2 to recognize any Unicode newline sequence. The Unicode newline
sequences are the three just mentioned, plus the single characters VT (vertical
tab, U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line
separator, U+2028), and PS (paragraph separator, U+2029).
separator, U+2028), and PS (paragraph separator, U+2029). The final option is
.sp
--enable-newline-is-nul
.sp
which causes NUL (binary zero) is set as the default line-ending character.
.P
Whatever default line ending convention is selected when PCRE2 is built can be
overridden by applications that use the library. At build time it is
conventional to use the standard for your operating system.
recommended to use the standard for your operating system.
.
.
.SH "WHAT \eR MATCHES"
@ -297,7 +301,8 @@ because the size of each backtracking "frame" depends on the number of
capturing parentheses in a pattern, the amount of heap that is used before the
limit is reached varies from pattern to pattern. This limit was more useful in
versions before 10.30, where function recursion was used for backtracking.
However, as well as applying to \fBpcre2_match()\fP, this limit also controls
.P
As well as applying to \fBpcre2_match()\fP, the depth limit also controls
the depth of recursive function calls in \fBpcre2_dfa_match()\fP. These are
used for lookaround assertions, atomic groups, and recursion within patterns.
The limit does not apply to JIT matching.
@ -577,6 +582,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 17 June 2017
Last updated: 18 July 2017
Copyright (c) 1997-2017 University of Cambridge.
.fi

View File

@ -71,7 +71,7 @@ documentation for details.
.P
8. Subroutine calls (whether recursive or not) were treated as atomic groups up
to PCRE2 release 10.23, but from release 10.30 this changed, and backtracking
into subroutine calls is now supported, as in Perl.
into subroutine calls is now supported, as in Perl.
.P
9. If any of the backtracking control verbs are used in a subpattern that is
called as a subroutine (whether or not recursively), their effect is confined

View File

@ -446,25 +446,25 @@ memory. There are three options that set resource limits for matching.
The \fB--match-limit\fP option provides a means of limiting computing resource
usage when processing patterns that are not going to match, but which have a
very large number of possibilities in their search trees. The classic example
is a pattern that uses nested unlimited repeats. Internally, PCRE2 has a
counter that is incremented each time around its main processing loop. If the
is a pattern that uses nested unlimited repeats. Internally, PCRE2 has a
counter that is incremented each time around its main processing loop. If the
value set by \fB--match-limit\fP is reached, an error occurs.
.sp
The \fB--heap-limit\fP option specifies, as a number of kilobytes, the amount
of heap memory that may be used for matching. Heap memory is needed only if
matching the pattern requires a significant number of nested backtracking
points to be remembered. This parameter can be set to zero to forbid the use of
points to be remembered. This parameter can be set to zero to forbid the use of
heap memory altogether.
.sp
The \fB--depth-limit\fP option limits the depth of nested backtracking points,
which indirectly limits the amount of memory that is used. The amount of memory
which indirectly limits the amount of memory that is used. The amount of memory
needed for each backtracking point depends on the number of capturing
parentheses in the pattern, so the amount of memory that is used before this
limit acts varies from pattern to pattern. This limit is of use only if it is
set smaller than \fB--match-limit\fP.
.sp
There are no short forms for these options. The default settings are specified
when the PCRE2 library is compiled, with the default defaults being very large
when the PCRE2 library is compiled, with the default defaults being very large
and so effectively unlimited.
.TP
\fB--max-buffer-size=\fInumber\fP
@ -747,7 +747,7 @@ either a number or a quoted string (see the
.\" HREF
\fBpcre2callout\fP
.\"
documentation for details). Numbered callouts are ignored by \fBpcre2grep\fP;
documentation for details). Numbered callouts are ignored by \fBpcre2grep\fP;
only callouts with string arguments are useful.
.
.
@ -797,10 +797,10 @@ matcher backtracks in the normal way.
If the callout string starts with a pipe (vertical bar) character, the rest of
the string is written to the output, having been passed through the same escape
processing as text from the --output option. This provides a simple echoing
facility that avoids calling an external program or script. No terminator is
facility that avoids calling an external program or script. No terminator is
added to the string, so if you want a newline, you must include it explicitly.
Matching continues normally after the string is output. If you want to see only
the callout output but not any output from an actual match, you should end the
Matching continues normally after the string is output. If you want to see only
the callout output but not any output from an actual match, you should end the
relevant pattern with (*FAIL).
.
.
@ -816,8 +816,8 @@ message and the line that caused the problem to the standard error stream. If
there are more than 20 such errors, \fBpcre2grep\fP gives up.
.P
The \fB--match-limit\fP option of \fBpcre2grep\fP can be used to set the
overall resource limit. There are also other limits that affect the amount of
memory used during matching; see the discussion of \fB--heap-limit\fP and
overall resource limit. There are also other limits that affect the amount of
memory used during matching; see the discussion of \fB--heap-limit\fP and
\fB--depth-limit\fP above.
.
.

View File

@ -12,7 +12,7 @@ of them.
.rs
.sp
Patterns are compiled by PCRE2 into a reasonably efficient interpretive code,
so that most simple patterns do not use much memory for storing the compiled
so that most simple patterns do not use much memory for storing the compiled
version. However, there is one case where the memory usage of a compiled
pattern can be unexpectedly large. If a parenthesized subpattern has a
quantifier with a minimum greater than 1 and/or a limited maximum, the whole
@ -76,7 +76,7 @@ memory can be limited; if the limit is set to zero, only the initial stack
vector is used. Rewriting patterns to be time-efficient, as described below,
may also reduce the memory requirements.
.P
In contrast to \fBpcre2_match()\fP, \fBpcre2_dfa_match()\fP does use recursive
In contrast to \fBpcre2_match()\fP, \fBpcre2_dfa_match()\fP does use recursive
function calls, but only for processing atomic groups, lookaround assertions,
and recursion within the pattern. Too much nested recursion may cause stack
issues. The "match depth" parameter can be used to limit the depth of function
@ -163,7 +163,7 @@ applied to a whole line of "a" characters, whereas the latter takes an
appreciable time with strings longer than about 20 characters.
.P
In many cases, the solution to this kind of performance issue is to use an
atomic group or a possessive quantifier. This can often reduce memory
atomic group or a possessive quantifier. This can often reduce memory
requirements as well. As another example, consider this pattern:
.sp
([^<]|<(?!inet))+
@ -184,7 +184,7 @@ are "swallowed" in one item inside the parentheses, and a possessive quantifier
is used to stop any backtracking into the runs of non-"<" characters. This
version also uses a lot less memory because entry to a new set of parentheses
happens only when a "<" character that is not followed by "inet" is encountered
(and we assume this is relatively rare).
(and we assume this is relatively rare).
.P
This example shows that one way of optimizing performance when matching long
subject strings is to write repeated parenthesized subpatterns to match more
@ -194,10 +194,10 @@ than one character whenever possible.
.SS "SETTING RESOURCE LIMITS"
.rs
.sp
You can set limits on the amount of processing that takes place when matching,
You can set limits on the amount of processing that takes place when matching,
and on the amount of heap memory that is used. The default values of the limits
are very large, and unlikely ever to operate. They can be changed when PCRE2 is
built, and they can also be set when \fBpcre2_match()\fP or
built, and they can also be set when \fBpcre2_match()\fP or
\fBpcre2_dfa_match()\fP is called. For details of these interfaces, see the
.\" HREF
\fBpcre2build\fP

View File

@ -407,11 +407,11 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
(?i) caseless
(?J) allow duplicate names
(?m) multiline
(?n) no auto capture
(?n) no auto capture
(?s) single line (dotall)
(?U) default ungreedy (lazy)
(?x) extended: ignore white space except in classes
(?xx) as (?x) but also ignore space and tab in classes
(?xx) as (?x) but also ignore space and tab in classes
(?-...) unset option(s)
.sp
The following are recognized only at the very start of a pattern or after one

View File

@ -50,7 +50,7 @@ fi
# ucp sets Perl's /u modifier
# utf invoke UTF-8 functionality
#
# The data lines must not have any pcre2test modifiers. Unless
# The data lines must not have any pcre2test modifiers. Unless
# "subject_litersl" is on the pattern, data lines are processed as
# Perl double-quoted strings, so if they contain " $ or @ characters, these
# have to be escaped. For this reason, all such characters in the
@ -141,20 +141,20 @@ for (;;)
chomp($pattern);
$pattern =~ s/\s+$//;
# Split the pattern from the modifiers and adjust them as necessary.
$pattern =~ /^\s*((.).*\2)(.*)$/s;
$pat = $1;
$mod = $3;
# The private "aftertext" modifier means "print $' afterwards".
$showrest = ($mod =~ s/aftertext,?//);
# The "subject_literal" modifer disables escapes in subjects.
$subject_literal = ($mod =~ s/subject_literal,?//);
$subject_literal = ($mod =~ s/subject_literal,?//);
# "allaftertext" is used by pcre2test to print remainders after captures
@ -238,7 +238,7 @@ for (;;)
$x = $_;
}
else
{
{
$x = eval "\"$_\""; # To get escapes processed
}

View File

@ -132,6 +132,12 @@ sure both macros are undefined; an emulation function will then be used. */
/* Define to 1 if you have the <zlib.h> header file. */
/* #undef HAVE_ZLIB_H */
/* This limits the amount of memory that pcre2_match() may use while matching
a pattern. The value is in kilobytes. */
#ifndef HEAP_LIMIT
#define HEAP_LIMIT 20000000
#endif
/* The value of LINK_SIZE determines the number of bytes used to store links
as offsets within the compiled regex. The default is 2, which allows for
compiled patterns up to 64K long. This covers the vast majority of cases.
@ -148,7 +154,7 @@ sure both macros are undefined; an emulation function will then be used. */
#endif
/* The value of MATCH_LIMIT determines the default number of times the
internal match() function can record a backtrack position during a single
pcre2_match() function can record a backtrack position during a single
matching attempt. There is a runtime interface for setting a different
limit. The limit exists in order to catch runaway regular expressions that
take for ever to determine that they do not match. The default is set very
@ -188,8 +194,8 @@ sure both macros are undefined; an emulation function will then be used. */
/* The value of NEWLINE_DEFAULT determines the default newline character
sequence. PCRE2 client programs can override this by selecting other values
at run time. The valid values are 1 (CR), 2 (LF), 3 (CRLF), 4 (ANY), and 5
(ANYCRLF). */
at run time. The valid values are 1 (CR), 2 (LF), 3 (CRLF), 4 (ANY), 5
(ANYCRLF), and 6 (NUL). */
#ifndef NEWLINE_DEFAULT
#define NEWLINE_DEFAULT 2
#endif
@ -204,7 +210,7 @@ sure both macros are undefined; an emulation function will then be used. */
#define PACKAGE_NAME "PCRE2"
/* Define to the full name and version of this package. */
#define PACKAGE_STRING "PCRE2 10.30-DEV"
#define PACKAGE_STRING "PCRE2 10.30-RC1"
/* Define to the one symbol short name of this package. */
#define PACKAGE_TARNAME "pcre2"
@ -213,7 +219,7 @@ sure both macros are undefined; an emulation function will then be used. */
#define PACKAGE_URL ""
/* Define to the version of this package. */
#define PACKAGE_VERSION "10.30-DEV"
#define PACKAGE_VERSION "10.30-RC1"
/* The value of PARENS_NEST_LIMIT specifies the maximum depth of nested
parentheses (of any kind) in a pattern. This limits the amount of system
@ -261,6 +267,11 @@ sure both macros are undefined; an emulation function will then be used. */
your system. */
/* #undef PTHREAD_CREATE_JOINABLE */
/* Define to any non-zero number to enable support for SELinux compatible
executable memory allocator in JIT. Note that this will have no effect
unless SUPPORT_JIT is also defined. */
/* #undef SLJIT_PROT_EXECUTABLE_ALLOCATOR */
/* Define to 1 if you have the ANSI C header files. */
/* #undef STDC_HEADERS */
@ -328,7 +339,7 @@ sure both macros are undefined; an emulation function will then be used. */
#endif
/* Version number of package */
#define VERSION "10.30-DEV"
#define VERSION "10.30-RC1"
/* Define to 1 if on MINIX. */
/* #undef _MINIX */

View File

@ -43,8 +43,8 @@ POSSIBILITY OF SUCH DAMAGE.
#define PCRE2_MAJOR 10
#define PCRE2_MINOR 30
#define PCRE2_PRERELEASE -DEV
#define PCRE2_DATE 2017-03-05
#define PCRE2_PRERELEASE -RC1
#define PCRE2_DATE 2017-07-18
/* When an application links to a PCRE DLL in Windows, the symbols that are
imported have to be identified as such. When building PCRE2, the appropriate

View File

@ -43,8 +43,8 @@ POSSIBILITY OF SUCH DAMAGE.
#define PCRE2_MAJOR 10
#define PCRE2_MINOR 30
#define PCRE2_PRERELEASE -DEV
#define PCRE2_DATE 2017-03-05
#define PCRE2_PRERELEASE -RC1
#define PCRE2_DATE 2017-07-18
/* When an application links to a PCRE DLL in Windows, the symbols that are
imported have to be identified as such. When building PCRE2, the appropriate
@ -138,6 +138,14 @@ D is inspected during pcre2_dfa_match() execution
#define PCRE2_ALT_VERBNAMES 0x00400000u /* C */
#define PCRE2_USE_OFFSET_LIMIT 0x00800000u /* J M D */
#define PCRE2_EXTENDED_MORE 0x01000000u /* C */
#define PCRE2_LITERAL 0x02000000u /* C */
/* An additional compile options word is available in the compile context. */
#define PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES 0x00000001u /* C */
#define PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL 0x00000002u /* C */
#define PCRE2_EXTRA_MATCH_WORD 0x00000004u /* C */
#define PCRE2_EXTRA_MATCH_LINE 0x00000008u /* C */
/* These are for pcre2_jit_compile(). */
@ -176,6 +184,16 @@ ignored for pcre2_jit_match(). */
#define PCRE2_NO_JIT 0x00002000u
/* Options for pcre2_pattern_convert(). */
#define PCRE2_CONVERT_UTF 0x00000001u
#define PCRE2_CONVERT_NO_UTF_CHECK 0x00000002u
#define PCRE2_CONVERT_POSIX_BASIC 0x00000004u
#define PCRE2_CONVERT_POSIX_EXTENDED 0x00000008u
#define PCRE2_CONVERT_GLOB 0x00000010u
#define PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR 0x00000030u
#define PCRE2_CONVERT_GLOB_NO_STARSTAR 0x00000050u
/* Newline and \R settings, for use in compile contexts. The newline values
must be kept in step with values set in config.h and both sets must all be
greater than zero. */
@ -185,6 +203,7 @@ greater than zero. */
#define PCRE2_NEWLINE_CRLF 3
#define PCRE2_NEWLINE_ANY 4
#define PCRE2_NEWLINE_ANYCRLF 5
#define PCRE2_NEWLINE_NUL 6
#define PCRE2_BSR_UNICODE 1
#define PCRE2_BSR_ANYCRLF 2
@ -270,6 +289,8 @@ numbers must not be changed. */
#define PCRE2_ERROR_TOOMANYREPLACE (-61)
#define PCRE2_ERROR_BADSERIALIZEDDATA (-62)
#define PCRE2_ERROR_HEAPLIMIT (-63)
#define PCRE2_ERROR_CONVERT_SYNTAX (-64)
/* Request types for pcre2_pattern_info() */
@ -351,6 +372,9 @@ typedef struct pcre2_real_compile_context pcre2_compile_context; \
struct pcre2_real_match_context; \
typedef struct pcre2_real_match_context pcre2_match_context; \
\
struct pcre2_real_convert_context; \
typedef struct pcre2_real_convert_context pcre2_convert_context; \
\
struct pcre2_real_code; \
typedef struct pcre2_real_code pcre2_code; \
\
@ -434,6 +458,8 @@ PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
pcre2_set_bsr(pcre2_compile_context *, uint32_t); \
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
pcre2_set_character_tables(pcre2_compile_context *, const unsigned char *); \
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
pcre2_set_compile_extra_options(pcre2_compile_context *, uint32_t); \
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
pcre2_set_max_pattern_length(pcre2_compile_context *, PCRE2_SIZE); \
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
@ -466,6 +492,18 @@ PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
pcre2_set_recursion_memory_management(pcre2_match_context *, \
void *(*)(PCRE2_SIZE, void *), void (*)(void *, void *), void *);
#define PCRE2_CONVERT_CONTEXT_FUNCTIONS \
PCRE2_EXP_DECL pcre2_convert_context PCRE2_CALL_CONVENTION \
*pcre2_convert_context_copy(pcre2_convert_context *); \
PCRE2_EXP_DECL pcre2_convert_context PCRE2_CALL_CONVENTION \
*pcre2_convert_context_create(pcre2_general_context *); \
PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \
pcre2_convert_context_free(pcre2_convert_context *); \
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
pcre2_set_glob_escape(pcre2_convert_context *, uint32_t); \
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
pcre2_set_glob_separator(pcre2_convert_context *, uint32_t);
/* Functions concerned with compiling a pattern to PCRE internal code. */
@ -572,6 +610,16 @@ PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
PCRE2_SIZE, PCRE2_UCHAR *, PCRE2_SIZE *);
/* Functions for converting pattern source strings. */
#define PCRE2_CONVERT_FUNCTIONS \
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
pcre2_pattern_convert(PCRE2_SPTR, PCRE2_SIZE, uint32_t, PCRE2_UCHAR **, \
PCRE2_SIZE *, pcre2_convert_context *); \
PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \
pcre2_converted_pattern_free(PCRE2_UCHAR *);
/* Functions for JIT processing */
#define PCRE2_JIT_FUNCTIONS \
@ -623,6 +671,7 @@ pcre2_compile are called by application code. */
#define pcre2_real_code PCRE2_SUFFIX(pcre2_real_code_)
#define pcre2_real_general_context PCRE2_SUFFIX(pcre2_real_general_context_)
#define pcre2_real_compile_context PCRE2_SUFFIX(pcre2_real_compile_context_)
#define pcre2_real_convert_context PCRE2_SUFFIX(pcre2_real_convert_context_)
#define pcre2_real_match_context PCRE2_SUFFIX(pcre2_real_match_context_)
#define pcre2_real_jit_stack PCRE2_SUFFIX(pcre2_real_jit_stack_)
#define pcre2_real_match_data PCRE2_SUFFIX(pcre2_real_match_data_)
@ -634,6 +683,7 @@ pcre2_compile are called by application code. */
#define pcre2_callout_enumerate_block PCRE2_SUFFIX(pcre2_callout_enumerate_block_)
#define pcre2_general_context PCRE2_SUFFIX(pcre2_general_context_)
#define pcre2_compile_context PCRE2_SUFFIX(pcre2_compile_context_)
#define pcre2_convert_context PCRE2_SUFFIX(pcre2_convert_context_)
#define pcre2_match_context PCRE2_SUFFIX(pcre2_match_context_)
#define pcre2_match_data PCRE2_SUFFIX(pcre2_match_data_)
@ -649,6 +699,10 @@ pcre2_compile are called by application code. */
#define pcre2_compile_context_create PCRE2_SUFFIX(pcre2_compile_context_create_)
#define pcre2_compile_context_free PCRE2_SUFFIX(pcre2_compile_context_free_)
#define pcre2_config PCRE2_SUFFIX(pcre2_config_)
#define pcre2_convert_context_copy PCRE2_SUFFIX(pcre2_convert_context_copy_)
#define pcre2_convert_context_create PCRE2_SUFFIX(pcre2_convert_context_create_)
#define pcre2_convert_context_free PCRE2_SUFFIX(pcre2_convert_context_free_)
#define pcre2_converted_pattern_free PCRE2_SUFFIX(pcre2_converted_pattern_free_)
#define pcre2_dfa_match PCRE2_SUFFIX(pcre2_dfa_match_)
#define pcre2_general_context_copy PCRE2_SUFFIX(pcre2_general_context_copy_)
#define pcre2_general_context_create PCRE2_SUFFIX(pcre2_general_context_create_)
@ -672,6 +726,7 @@ pcre2_compile are called by application code. */
#define pcre2_match_data_create PCRE2_SUFFIX(pcre2_match_data_create_)
#define pcre2_match_data_create_from_pattern PCRE2_SUFFIX(pcre2_match_data_create_from_pattern_)
#define pcre2_match_data_free PCRE2_SUFFIX(pcre2_match_data_free_)
#define pcre2_pattern_convert PCRE2_SUFFIX(pcre2_pattern_convert_)
#define pcre2_pattern_info PCRE2_SUFFIX(pcre2_pattern_info_)
#define pcre2_serialize_decode PCRE2_SUFFIX(pcre2_serialize_decode_)
#define pcre2_serialize_encode PCRE2_SUFFIX(pcre2_serialize_encode_)
@ -680,8 +735,11 @@ pcre2_compile are called by application code. */
#define pcre2_set_bsr PCRE2_SUFFIX(pcre2_set_bsr_)
#define pcre2_set_callout PCRE2_SUFFIX(pcre2_set_callout_)
#define pcre2_set_character_tables PCRE2_SUFFIX(pcre2_set_character_tables_)
#define pcre2_set_compile_extra_options PCRE2_SUFFIX(pcre2_set_compile_extra_options_)
#define pcre2_set_compile_recursion_guard PCRE2_SUFFIX(pcre2_set_compile_recursion_guard_)
#define pcre2_set_depth_limit PCRE2_SUFFIX(pcre2_set_depth_limit_)
#define pcre2_set_glob_escape PCRE2_SUFFIX(pcre2_set_glob_escape_)
#define pcre2_set_glob_separator PCRE2_SUFFIX(pcre2_set_glob_separator_)
#define pcre2_set_heap_limit PCRE2_SUFFIX(pcre2_set_heap_limit_)
#define pcre2_set_match_limit PCRE2_SUFFIX(pcre2_set_match_limit_)
#define pcre2_set_max_pattern_length PCRE2_SUFFIX(pcre2_set_max_pattern_length_)
@ -716,6 +774,8 @@ PCRE2_STRUCTURE_LIST \
PCRE2_GENERAL_INFO_FUNCTIONS \
PCRE2_GENERAL_CONTEXT_FUNCTIONS \
PCRE2_COMPILE_CONTEXT_FUNCTIONS \
PCRE2_CONVERT_CONTEXT_FUNCTIONS \
PCRE2_CONVERT_FUNCTIONS \
PCRE2_MATCH_CONTEXT_FUNCTIONS \
PCRE2_COMPILE_FUNCTIONS \
PCRE2_PATTERN_INFO_FUNCTIONS \
@ -745,6 +805,7 @@ PCRE2_TYPES_STRUCTURES_AND_FUNCTIONS
#undef PCRE2_GENERAL_INFO_FUNCTIONS
#undef PCRE2_GENERAL_CONTEXT_FUNCTIONS
#undef PCRE2_COMPILE_CONTEXT_FUNCTIONS
#undef PCRE2_CONVERT_CONTEXT_FUNCTIONS
#undef PCRE2_MATCH_CONTEXT_FUNCTIONS
#undef PCRE2_COMPILE_FUNCTIONS
#undef PCRE2_PATTERN_INFO_FUNCTIONS

View File

@ -84,7 +84,7 @@ if (where == NULL) /* Requests a length */
return PCRE2_ERROR_BADOPTION;
case PCRE2_CONFIG_BSR:
case PCRE2_CONFIG_HEAPLIMIT:
case PCRE2_CONFIG_HEAPLIMIT:
case PCRE2_CONFIG_JIT:
case PCRE2_CONFIG_LINKSIZE:
case PCRE2_CONFIG_MATCHLIMIT:
@ -151,7 +151,7 @@ switch (what)
case PCRE2_CONFIG_DEPTHLIMIT:
*((uint32_t *)where) = MATCH_LIMIT_DEPTH;
break;
case PCRE2_CONFIG_NEWLINE:
*((uint32_t *)where) = NEWLINE_DEFAULT;
break;
@ -160,7 +160,7 @@ switch (what)
*((uint32_t *)where) = PARENS_NEST_LIMIT;
break;
/* This is now obsolete. The stack is no longer used via recursion for
/* This is now obsolete. The stack is no longer used via recursion for
handling backtracking in pcre2_match(). */
case PCRE2_CONFIG_STACKRECURSE:

View File

@ -198,7 +198,7 @@ const pcre2_convert_context PRIV(default_convert_context) = {
CHAR_BACKSLASH, /* Default path separator */
CHAR_GRAVE_ACCENT /* Default escape character */
#else /* Not Windows */
CHAR_SLASH, /* Default path separator */
CHAR_SLASH, /* Default path separator */
CHAR_BACKSLASH /* Default escape character */
#endif
};
@ -359,7 +359,7 @@ switch(newline)
case PCRE2_NEWLINE_CRLF:
case PCRE2_NEWLINE_ANY:
case PCRE2_NEWLINE_ANYCRLF:
case PCRE2_NEWLINE_NUL:
case PCRE2_NEWLINE_NUL:
ccontext->newline_convention = newline;
return 0;

View File

@ -177,7 +177,7 @@ static const unsigned char compile_error_texts[] =
/* 90 */
"internal error: bad code value in parsed_skip()\0"
"PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is not allowed in UTF-16 mode\0"
"invalid option bits with PCRE2_LITERAL\0"
"invalid option bits with PCRE2_LITERAL\0"
;
/* Match-time and UTF error texts are in the same format. */

View File

@ -240,7 +240,7 @@ not rely on this. */
#define COMPILE_ERROR_BASE 100
/* The initial frames vector for remembering backtracking points in
/* The initial frames vector for remembering backtracking points in
pcre2_match() is allocated on the system stack, of this size (bytes). The size
must be a multiple of sizeof(PCRE2_SPTR) in all environments, so making it a
multiple of 8 is best. Typical frame sizes are a few hundred bytes (it depends
@ -557,7 +557,7 @@ enum { PCRE2_MATCHEDBY_INTERPRETER, /* pcre2_match() */
#define MAGIC_NUMBER 0x50435245UL /* 'PCRE' */
/* The maximum remaining length of subject we are prepared to search for a
req_unit match. In 8-bit mode, memchr() is used and is much faster than the
req_unit match. In 8-bit mode, memchr() is used and is much faster than the
search loop that has to be used in 16-bit and 32-bit modes. */
#if PCRE2_CODE_UNIT_WIDTH == 8

View File

@ -3829,7 +3829,7 @@ while (TRUE)
{
case OP_CHARI:
caseless = TRUE;
/* Fall through */
/* Fall through */
case OP_CHAR:
last = FALSE;
cc++;
@ -3861,7 +3861,7 @@ while (TRUE)
case OP_MINPLUSI:
case OP_POSPLUSI:
caseless = TRUE;
/* Fall through */
/* Fall through */
case OP_PLUS:
case OP_MINPLUS:
case OP_POSPLUS:
@ -3870,7 +3870,7 @@ while (TRUE)
case OP_EXACTI:
caseless = TRUE;
/* Fall through */
/* Fall through */
case OP_EXACT:
repeat = GET2(cc, 1);
last = FALSE;
@ -3881,7 +3881,7 @@ while (TRUE)
case OP_MINQUERYI:
case OP_POSQUERYI:
caseless = TRUE;
/* Fall through */
/* Fall through */
case OP_QUERY:
case OP_MINQUERY:
case OP_POSQUERY:
@ -4351,7 +4351,7 @@ struct sljit_jump *quit;
struct sljit_jump *partial_quit[2];
sljit_u8 instruction[8];
sljit_s32 tmp1_ind = sljit_get_register_index(TMP1);
sljit_s32 tmp2_ind = sljit_get_register_index(TMP2);
// sljit_s32 tmp2_ind = sljit_get_register_index(TMP2);
sljit_s32 str_ptr_ind = sljit_get_register_index(STR_PTR);
sljit_s32 data_ind = 0;
sljit_s32 tmp_ind = 1;
@ -4376,7 +4376,9 @@ if (common->mode == PCRE2_JIT_COMPLETE)
OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, character_to_int32(char1 | bit));
SLJIT_ASSERT(tmp1_ind < 8 && tmp2_ind == 1);
// SLJIT_ASSERT(tmp1_ind < 8 && tmp2_ind == 1);
SLJIT_ASSERT(tmp1_ind < 8);
/* MOVD xmm, r/m32 */
instruction[0] = 0x66;

View File

@ -80,7 +80,7 @@ if (where == NULL) /* Requests field length */
case PCRE2_INFO_FIRSTCODEUNIT:
case PCRE2_INFO_HASBACKSLASHC:
case PCRE2_INFO_HASCRORLF:
case PCRE2_INFO_HEAPLIMIT:
case PCRE2_INFO_HEAPLIMIT:
case PCRE2_INFO_JCHANGED:
case PCRE2_INFO_LASTCODETYPE:
case PCRE2_INFO_LASTCODEUNIT:

View File

@ -167,15 +167,15 @@ are implementing).
6. Do not break after Prepend characters.
7. Do not break within emoji modifier sequences (E_Base or E_Base_GAZ followed
by E_Modifier). Extend characters are allowed before the modifier; this
by E_Modifier). Extend characters are allowed before the modifier; this
cannot be represented in this table, the code has to deal with it.
8. Do not break within emoji zwj sequences (ZWJ followed by Glue_After_Zwj or
E_Base_GAZ).
9. Do not break within emoji flag sequences. That is, do not break between
regional indicator (RI) symbols if there are an odd number of RI characters
before the break point. This table encodes "join RI characters"; the code
9. Do not break within emoji flag sequences. That is, do not break between
regional indicator (RI) symbols if there are an odd number of RI characters
before the break point. This table encodes "join RI characters"; the code
has to deal with checking for previous adjoining RIs.
10. Otherwise, break everywhere.

View File

@ -264,7 +264,7 @@ enum {
ucp_Multani,
ucp_Old_Hungarian,
ucp_SignWriting,
/* New for Unicode 10.0.0 (no update since 8.0.0) */
/* New for Unicode 10.0.0 (no update since 8.0.0) */
ucp_Adlam,
ucp_Bhaiksuki,
ucp_Marchen,
@ -272,7 +272,7 @@ enum {
ucp_Osage,
ucp_Tangut,
ucp_Masaram_Gondi,
ucp_Nushu,
ucp_Nushu,
ucp_Soyombo,
ucp_Zanabazar_Square
};

View File

@ -142,7 +142,7 @@ static const int eint2[] = {
32, REG_INVARG, /* this version of PCRE2 does not have Unicode support */
37, REG_EESCAPE, /* PCRE2 does not support \L, \l, \N{name}, \U, or \u */
56, REG_INVARG, /* internal error: unknown newline setting */
92, REG_INVARG, /* invalid option bits with PCRE2_LITERAL */
92, REG_INVARG, /* invalid option bits with PCRE2_LITERAL */
};
/* Table of texts corresponding to POSIX error codes */
@ -239,7 +239,7 @@ int re_nsub = 0;
patlen = ((cflags & REG_PEND) != 0)? (PCRE2_SIZE)(preg->re_endp - pattern) :
PCRE2_ZERO_TERMINATED;
if ((cflags & REG_ICASE) != 0) options |= PCRE2_CASELESS;
if ((cflags & REG_NEWLINE) != 0) options |= PCRE2_MULTILINE;
if ((cflags & REG_DOTALL) != 0) options |= PCRE2_DOTALL;
@ -249,7 +249,7 @@ if ((cflags & REG_UCP) != 0) options |= PCRE2_UCP;
if ((cflags & REG_UNGREEDY) != 0) options |= PCRE2_UNGREEDY;
preg->re_cflags = cflags;
preg->re_pcre2_code = pcre2_compile((PCRE2_SPTR)pattern, patlen, options,
preg->re_pcre2_code = pcre2_compile((PCRE2_SPTR)pattern, patlen, options,
&errorcode, &erroffset, NULL);
preg->re_erroffset = erroffset;
@ -262,7 +262,7 @@ if (preg->re_pcre2_code == NULL)
if (errorcode < COMPILE_ERROR_BASE) return REG_BADPAT;
errorcode -= COMPILE_ERROR_BASE;
if (errorcode < (int)(sizeof(eint1)/sizeof(const int)))
return eint1[errorcode];
for (i = 0; i < sizeof(eint2)/sizeof(const int); i += 2)

View File

@ -93,13 +93,13 @@ enum {
};
/* The structure representing a compiled regular expression. It is also used
/* The structure representing a compiled regular expression. It is also used
for passing the pattern end pointer when REG_PEND is set. */
typedef struct {
void *re_pcre2_code;
void *re_match_data;
const char *re_endp;
const char *re_endp;
size_t re_nsub;
size_t re_erroffset;
int re_cflags;

View File

@ -4073,7 +4073,8 @@ else fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%
* Show compile extra options *
*************************************************/
/* Called for unsupported POSIX options.
/* Called only for unsupported POSIX options at present, and therefore needed
only when the 8-bit library is being compiled.
Arguments:
options an options word
@ -4083,17 +4084,21 @@ Arguments:
Returns: nothing
*/
#ifdef SUPPORT_PCRE2_8
static void
show_compile_extra_options(uint32_t options, const char *before,
const char *after)
{
if (options == 0) fprintf(outfile, "%s <none>%s", before, after);
else fprintf(outfile, "%s%s%s%s",
else fprintf(outfile, "%s%s%s%s%s%s",
before,
((options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) != 0)? " allow_surrogate_escapes" : "",
((options & PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL) != 0)? " bad_escape_is_literal" : "",
((options & PCRE2_EXTRA_MATCH_WORD) != 0)? " match_word" : "",
((options & PCRE2_EXTRA_MATCH_LINE) != 0)? " match_line" : "",
after);
}
#endif

View File

@ -124,10 +124,10 @@
/* SLJIT_REWRITABLE_JUMP is 0x1000. */
#if (defined SLJIT_CONFIG_X86 && SLJIT_CONFIG_X86)
# define PATCH_MB 0x4
# define PATCH_MW 0x8
# define PATCH_MB 0x4
# define PATCH_MW 0x8
#if (defined SLJIT_CONFIG_X86_64 && SLJIT_CONFIG_X86_64)
# define PATCH_MD 0x10
# define PATCH_MD 0x10
#endif
#endif
@ -1555,6 +1555,7 @@ static SLJIT_INLINE CHECK_RETURN_TYPE check_sljit_emit_cmov(struct sljit_compile
sljit_s32 dst_reg,
sljit_s32 src, sljit_sw srcw)
{
(void)srcw; /* To stop compiler warning */
#if (defined SLJIT_ARGUMENT_CHECKS && SLJIT_ARGUMENT_CHECKS)
CHECK_ARGUMENT(!(type & ~(0xff | SLJIT_I32_OP)));
CHECK_ARGUMENT((type & 0xff) >= SLJIT_EQUAL && (type & 0xff) <= SLJIT_ORDERED_F64);

11
testdata/testinput1 vendored
View File

@ -95,17 +95,6 @@
aaac
abbbbbbbbbbbac
/^(b+|a){1,2}?bc/
bbc
/^(b*|ba){1,2}?bc/
babc
bbabc
bababc
\= Expect no match
bababbc
babababc
/^(ba|b*){1,2}?bc/
babc
bbabc

743
testdata/testinput2 vendored

File diff suppressed because it is too large Load Diff

21
testdata/testoutput1 vendored
View File

@ -183,27 +183,6 @@ No match
abbbbbbbbbbbac
No match
/^(b+|a){1,2}?bc/
bbc
0: bbc
1: b
/^(b*|ba){1,2}?bc/
babc
0: babc
1: ba
bbabc
0: bbabc
1: ba
bababc
0: bababc
1: ba
\= Expect no match
bababbc
No match
babababc
No match
/^(ba|b*){1,2}?bc/
babc
0: babc

753
testdata/testoutput2 vendored

File diff suppressed because it is too large Load Diff

View File

@ -853,10 +853,8 @@ Memory allocation (code space): 28
# with link size - hence multiple tests with different values.
/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000
Failed: error 186 at offset 5813: regular expression is too complicated
/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000
Failed: error 186 at offset 5820: regular expression is too complicated
/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000
Failed: error 186 at offset 12820: regular expression is too complicated

View File

@ -853,10 +853,8 @@ Memory allocation (code space): 28
# with link size - hence multiple tests with different values.
/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000
Failed: error 186 at offset 5813: regular expression is too complicated
/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000
Failed: error 186 at offset 5820: regular expression is too complicated
/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000
Failed: error 186 at offset 12820: regular expression is too complicated