Add serialization functions and tests with updated pcre2test. Fix
PCRE2_INFO_SIZE issues.
parent d4daaf966d
commit 5438fc8a6a

ChangeLog (22 lines changed)
@@ -1,8 +1,8 @@
 Change Log for PCRE2
 --------------------
 
-Version 10.10 13-January-2015
------------------------------
+Version 10.10 xx-xxx-2015
+-------------------------
 
 1. When a pattern is compiled, it remembers the highest back reference so that
 when matching, if the ovector is too small, extra memory can be obtained to
@@ -16,6 +16,19 @@ bug was that the condition was always treated as FALSE when the capture could
 not be consulted, leading to a incorrect behaviour by pcre2_match(). This bug
 has been fixed.
 
+2. Functions for serialization and deserialization of sets of compiled patterns
+have been added.
+
+3. The value that is returned by PCRE2_INFO_SIZE has been corrected to remove
+excess code units at the end of the data block that may occasionally occur if
+the code for calculating the size over-estimates. This change stops the
+serialization code copying uninitialized data, to which valgrind objects. The
+documentation of PCRE2_INFO_SIZE was incorrect in stating that the size did not
+include the general overhead. This has been corrected.
+
+4. All code units in every slot in the table of group names are now set, again
+in order to avoid accessing uninitialized data when serializing.
+
 
 Version 10.00 05-January-2015
 -----------------------------
@@ -30,8 +43,9 @@ logged. In addition to the API changes, the following changes were made. They
 are either new functionality, or bug fixes and other noticeable changes of
 behaviour that were implemented after the code had been forked.
 
-1. Unicode support is now enabled by default, but it can optionally be
-disabled.
+1. Including Unicode support at build time is now enabled by default, but it
+can optionally be disabled. It is not enabled by default at run time (no
+change).
 
 2. The test program, now called pcre2test, was re-specified and almost
 completely re-written. Its input is not compatible with input for pcretest.
|
Makefile.am (13 lines changed)
@@ -54,6 +54,10 @@ dist_html_DATA = \
 doc/html/pcre2_match_data_create_from_pattern.html \
 doc/html/pcre2_match_data_free.html \
 doc/html/pcre2_pattern_info.html \
+doc/html/pcre2_serialize_decode.html \
+doc/html/pcre2_serialize_encode.html \
+doc/html/pcre2_serialize_free.html \
+doc/html/pcre2_serialize_get_number_of_codes.html \
 doc/html/pcre2_set_bsr.html \
 doc/html/pcre2_set_callout.html \
 doc/html/pcre2_set_character_tables.html \
@@ -89,6 +93,7 @@ dist_html_DATA = \
 doc/html/pcre2perform.html \
 doc/html/pcre2posix.html \
 doc/html/pcre2sample.html \
+doc/html/pcre2serialize.html \
 doc/html/pcre2stack.html \
 doc/html/pcre2syntax.html \
 doc/html/pcre2test.html \
@@ -127,6 +132,10 @@ dist_man_MANS = \
 doc/pcre2_match_data_create_from_pattern.3 \
 doc/pcre2_match_data_free.3 \
 doc/pcre2_pattern_info.3 \
+doc/pcre2_serialize_decode.3 \
+doc/pcre2_serialize_encode.3 \
+doc/pcre2_serialize_free.3 \
+doc/pcre2_serialize_get_number_of_codes.3 \
 doc/pcre2_set_bsr.3 \
 doc/pcre2_set_callout.3 \
 doc/pcre2_set_character_tables.3 \
@@ -162,6 +171,7 @@ dist_man_MANS = \
 doc/pcre2perform.3 \
 doc/pcre2posix.3 \
 doc/pcre2sample.3 \
+doc/pcre2serialize.3 \
 doc/pcre2stack.3 \
 doc/pcre2syntax.3 \
 doc/pcre2test.1 \
@@ -316,6 +326,7 @@ COMMON_SOURCES = \
 src/pcre2_newline.c \
 src/pcre2_ord2utf.c \
 src/pcre2_pattern_info.c \
+src/pcre2_serialize.c \
 src/pcre2_string_utils.c \
 src/pcre2_study.c \
 src/pcre2_substitute.c \
@@ -573,6 +584,7 @@ EXTRA_DIST += \
 testdata/testinput16 \
 testdata/testinput17 \
 testdata/testinput18 \
+testdata/testinput19 \
 testdata/testinputEBC \
 testdata/testoutput1 \
 testdata/testoutput2 \
@@ -598,6 +610,7 @@ EXTRA_DIST += \
 testdata/testoutput16 \
 testdata/testoutput17 \
 testdata/testoutput18 \
+testdata/testoutput19 \
 testdata/testoutputEBC \
 perltest.sh
 
|
@@ -108,6 +108,7 @@ can skip ahead to the CMake section.
 pcre2_newline.c
 pcre2_ord2utf.c
 pcre2_pattern_info.c
+pcre2_serialize.c
 pcre2_string_utils.c
 pcre2_study.c
 pcre2_substitute.c
@@ -391,4 +392,4 @@ The site currently has ports for PCRE1 releases, but PCRE2 should follow in due
 course.
 
 =============================
-Last Updated: 05 January 2015
+Last Updated: 19 January 2015
|
README (101 lines changed)
@@ -527,11 +527,10 @@ Testing PCRE2
 ------------
 
 To test the basic PCRE2 library on a Unix-like system, run the RunTest script.
-There is another script called RunGrepTest that tests the options of the
-pcre2grep command. When JIT support is enabled, a third test program called
-pcre2_jit_test is built. Both the scripts and all the program tests are run if
-you obey "make check". For other environments, see the instructions in
-NON-AUTOTOOLS-BUILD.
+There is another script called RunGrepTest that tests the pcre2grep command.
+When JIT support is enabled, a third test program called pcre2_jit_test is
+built. Both the scripts and all the program tests are run if you obey "make
+check". For other environments, see the instructions in NON-AUTOTOOLS-BUILD.
 
 The RunTest script runs the pcre2test test program (which is documented in its
 own man page) on each of the relevant testinput files in the testdata
@@ -544,9 +543,9 @@ Some tests are relevant only when certain build-time options were selected. For
 example, the tests for UTF-8/16/32 features are run only when Unicode support
 is available. RunTest outputs a comment when it skips a test.
 
-Many of the tests that are not skipped are run twice if JIT support is
-available. On the second run, JIT compilation is forced. This testing can be
-suppressed by putting "nojit" on the RunTest command line.
+Many (but not all) of the tests that are not skipped are run twice if JIT
+support is available. On the second run, JIT compilation is forced. This
+testing can be suppressed by putting "nojit" on the RunTest command line.
 
 The entire set of tests is run once for each of the 8-bit, 16-bit and 32-bit
 libraries that are enabled. If you want to run just one set of tests, call
@@ -570,14 +569,20 @@ in numerical order.
 You can also call RunTest with the single argument "list" to cause it to output
 a list of tests.
 
-The first two tests can always be run, as they expect only plain text strings
-(not UTF) and make no use of Unicode properties. The first test file can be fed
+The test sequence starts with "test 0", which is a special test that has no
+input file, and whose output is not checked. This is because it will be
+different on different hardware and with different configurations. The test
+exists in order to exercise some of pcre2test's code that would not otherwise
+be run.
+
+Tests 1 and 2 can always be run, as they expect only plain text strings (not
+UTF) and make no use of Unicode properties. The first test file can be fed
 directly into the perltest.sh script to check that Perl gives the same results.
 The only difference you should see is in the first few lines, where the Perl
 version is given instead of the PCRE2 version. The second set of tests check
 auxiliary functions, error detection, and run-time flags that are specific to
-PCRE2, as well as the POSIX wrapper API. It also uses the debugging flags to
-check some of the internals of pcre2_compile().
+PCRE2. It also uses the debugging flags to check some of the internals of
+pcre2_compile().
 
 If you build PCRE2 with a locale setting that is not the standard C locale, the
 character tables may be different (see next paragraph). In some cases, this may
@@ -585,18 +590,17 @@ cause failures in the second set of tests. For example, in a locale where the
 isprint() function yields TRUE for characters in the range 128-255, the use of
 [:isascii:] inside a character class defines a different set of characters, and
 this shows up in this test as a difference in the compiled code, which is being
-listed for checking. Where the comparison test output contains [\x00-\x7f] the
-test will contain [\x00-\xff], and similarly in some other cases. This is not a
-bug in PCRE2.
+listed for checking. For example, where the comparison test output contains
+[\x00-\x7f] the test might contain [\x00-\xff], and similarly in some other
+cases. This is not a bug in PCRE2.
 
-The third set of tests checks pcre2_maketables(), the facility for building a
-set of character tables for a specific locale and using them instead of the
-default tables. The script uses the "locale" command to check for the
-availability of the "fr_FR", "french", or "fr" locale, and uses the first one
-that it finds. If the "locale" command fails, or if its output doesn't include
-"fr_FR", "french", or "fr" in the list of available locales, the third test
-cannot be run, and a comment is output to say why. If running this test
-produces an error like this
+Test 3 checks pcre2_maketables(), the facility for building a set of character
+tables for a specific locale and using them instead of the default tables. The
+script uses the "locale" command to check for the availability of the "fr_FR",
+"french", or "fr" locale, and uses the first one that it finds. If the "locale"
+command fails, or if its output doesn't include "fr_FR", "french", or "fr" in
+the list of available locales, the third test cannot be run, and a comment is
+output to say why. If running this test produces an error like this:
 
 ** Failed to set locale "fr_FR"
 
@@ -606,33 +610,37 @@ alternative output files for the third test, because three different versions
 of the French locale have been encountered. The test passes if its output
 matches any one of them.
 
-The fourth and fifth tests check UTF and Unicode property support, the fourth
-being compatible with the perltest.sh script, and the fifth checking
-PCRE2-specific things.
+Tests 4 and 5 check UTF and Unicode property support, test 4 being compatible
+with the perltest.sh script, and test 5 checking PCRE2-specific things.
 
-The sixth and seventh tests check the pcre2_dfa_match() alternative matching
-function, in non-UTF mode and UTF-mode with Unicode property support,
-respectively.
+Tests 6 and 7 check the pcre2_dfa_match() alternative matching function, in
+non-UTF mode and UTF-mode with Unicode property support, respectively.
 
-The eighth test checks some internal offsets and code size features; it is
-run only when the default "link size" of 2 is set (in other cases the sizes
-change) and when Unicode support is enabled.
+Test 8 checks some internal offsets and code size features; it is run only when
+the default "link size" of 2 is set (in other cases the sizes change) and when
+Unicode support is enabled.
 
-The ninth and tenth tests are run only in 8-bit mode, and the eleventh and
-twelfth tests are run only in 16-bit and 32-bit modes. These are tests that
-generate different output in 8-bit mode. Each pair are for general cases and
-Unicode support, respectively. The thirteenth test checks the handling of
-non-UTF characters greater than 255 by pcre2_dfa_match() in 16-bit and 32-bit
-modes.
+Tests 9 and 10 are run only in 8-bit mode, and tests 11 and 12 are run only in
+16-bit and 32-bit modes. These are tests that generate different output in
+8-bit mode. Each pair are for general cases and Unicode support, respectively.
+Test 13 checks the handling of non-UTF characters greater than 255 by
+pcre2_dfa_match() in 16-bit and 32-bit modes.
 
-The fourteenth test is run only when JIT support is not available, and the
-fifteenth test is run only when JIT support is available. They test some
-JIT-specific features such as information output from pcre2test about JIT
-compilation.
+Test 14 contains a number of tests that must not be run with JIT. They check,
+among other non-JIT things, the match-limiting features of the intepretive
+matcher.
 
-The sixteenth and seventeenth tests are run only in 8-bit mode. They check the
-POSIX interface to the 8-bit library, without and with Unicode support,
-respectively.
+Test 15 is run only when JIT support is not available. It checks that an
+attempt to use JIT has the expected behaviour.
+
+Test 16 is run only when JIT support is available. It checks JIT complete and
+partial modes, match-limiting under JIT, and other JIT-specific features.
+
+Tests 17 and 18 are run only in 8-bit mode. They check the POSIX interface to
+the 8-bit library, without and with Unicode support, respectively.
+
+Test 19 checks the serialization functions by writing a set of compiled
+patterns to a file, and then reloading and checking them.
 
 
 Character tables
@@ -718,6 +726,7 @@ The distribution should contain the files listed below.
 src/pcre2_newline.c )
 src/pcre2_ord2utf.c )
 src/pcre2_pattern_info.c )
+src/pcre2_serialize.c )
 src/pcre2_string_utils.c )
 src/pcre2_study.c )
 src/pcre2_substitute.c )
@@ -816,4 +825,4 @@ The distribution should contain the files listed below.
 Philip Hazel
 Email local part: ph10
 Email domain: cam.ac.uk
-Last updated: 05 January 2015
+Last updated: 20 January 2015
|
RunTest (17 lines changed)
@@ -65,6 +65,7 @@ title15="Test 15: JIT-specific features when JIT is not available"
 title16="Test 16: JIT-specific features when JIT is available"
 title17="Test 17: Tests of the POSIX interface, excluding UTF/UCP"
 title18="Test 18: Tests of the POSIX interface with UTF/UCP"
+title19="Test 19: Serialization tests"
 maxtest=18
 
 if [ $# -eq 1 -a "$1" = "list" ]; then
@@ -87,6 +88,7 @@ if [ $# -eq 1 -a "$1" = "list" ]; then
 echo $title16
 echo $title17
 echo $title18
+echo $title19
 exit 0
 fi
 
@@ -207,6 +209,7 @@ do15=no
 do16=no
 do17=no
 do18=no
+do19=no
 
 while [ $# -gt 0 ] ; do
 case $1 in
@@ -229,6 +232,7 @@ while [ $# -gt 0 ] ; do
 16) do16=yes;;
 17) do17=yes;;
 18) do18=yes;;
+19) do19=yes;;
 -8) arg8=yes;;
 -16) arg16=yes;;
 -32) arg32=yes;;
@@ -364,7 +368,7 @@ if [ $do0 = no -a $do1 = no -a $do2 = no -a $do3 = no -a \
 $do4 = no -a $do5 = no -a $do6 = no -a $do7 = no -a \
 $do8 = no -a $do9 = no -a $do10 = no -a $do11 = no -a \
 $do12 = no -a $do13 = no -a $do14 = no -a $do15 = no -a \
-$do16 = no -a $do17 = no -a $do18 = no \
+$do16 = no -a $do17 = no -a $do18 = no -a $do19 = no \
 ]; then
 do0=yes
 do1=yes
@@ -385,6 +389,7 @@ if [ $do0 = no -a $do1 = no -a $do2 = no -a $do3 = no -a \
 do16=yes
 do17=yes
 do18=yes
+do19=yes
 fi
 
 # Handle any explicit skips at this stage, so that an argument list may consist
@@ -721,10 +726,18 @@ for bmode in "$test8" "$test16" "$test32"; do
 fi
 fi
 
+# Serialization tests
+
+if [ $do19 = yes ] ; then
+echo $title19
+$sim $valgrind ./pcre2test -q $bmode $testdata/testinput19 testtry
+checkresult $? 19 ""
+fi
+
 # End of loop for 8/16/32-bit tests
 done
 
 # Clean up local working files
-rm -f testSinput test3input test3output test3outputA test3outputB teststdout testtry
+rm -f testSinput test3input testsaved1 testsaved2 test3output test3outputA test3outputB teststdout testtry
 
 # End
|
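The RunTest hunks above register test 19 everywhere the script declares, lists, parses, and defaults its tests. The most subtle change is the extra `$do19 = no` clause in the check that selects every test when none is named. The following is a hypothetical, cut-down POSIX sh model of that selection logic, not the real script. It models only tests 17 to 19, and the function name `select_tests` and the "would run" messages are invented for illustration.

```shell
# Hypothetical, cut-down model of RunTest's test selection (not the real
# script). Only tests 17-19 are modelled; RunTest itself covers tests 0-19
# and options such as -8, -16, -32 and nojit.

title17="Test 17: Tests of the POSIX interface, excluding UTF/UCP"
title18="Test 18: Tests of the POSIX interface with UTF/UCP"
title19="Test 19: Serialization tests"

select_tests() {
  # "list" on its own prints the test titles and stops.
  if [ $# -eq 1 ] && [ "$1" = "list" ]; then
    echo "$title17"
    echo "$title18"
    echo "$title19"
    return 0
  fi

  # Every test starts out deselected.
  do17=no
  do18=no
  do19=no

  # Each numeric argument selects one test.
  while [ $# -gt 0 ]; do
    case $1 in
      17) do17=yes;;
      18) do18=yes;;
      19) do19=yes;;
      *) echo "Unknown test number '$1'" >&2; return 1;;
    esac
    shift
  done

  # No test named: select them all. Because do19 is part of this check,
  # asking for test 19 alone does not fall through to "run everything".
  if [ "$do17" = no ] && [ "$do18" = no ] && [ "$do19" = no ]; then
    do17=yes
    do18=yes
    do19=yes
  fi

  # Report what would run, in numerical order.
  for t in 17 18 19; do
    eval "sel=\$do$t"
    if [ "$sel" = yes ]; then
      echo "would run test $t"
    fi
  done
}
```

In this model, `select_tests 19` prints only `would run test 19`. If the `do19` clause were dropped from the final check, the same call would treat "only test 19" as "nothing named" and print all three tests.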
|
@@ -108,6 +108,7 @@ can skip ahead to the CMake section.
 pcre2_newline.c
 pcre2_ord2utf.c
 pcre2_pattern_info.c
+pcre2_serialize.c
 pcre2_string_utils.c
 pcre2_study.c
 pcre2_substitute.c
@@ -391,4 +392,4 @@ The site currently has ports for PCRE1 releases, but PCRE2 should follow in due
 course.
 
 =============================
-Last Updated: 05 January 2015
+Last Updated: 19 January 2015
|
@@ -527,11 +527,10 @@ Testing PCRE2
 ------------
 
 To test the basic PCRE2 library on a Unix-like system, run the RunTest script.
-There is another script called RunGrepTest that tests the options of the
-pcre2grep command. When JIT support is enabled, a third test program called
-pcre2_jit_test is built. Both the scripts and all the program tests are run if
-you obey "make check". For other environments, see the instructions in
-NON-AUTOTOOLS-BUILD.
+There is another script called RunGrepTest that tests the pcre2grep command.
+When JIT support is enabled, a third test program called pcre2_jit_test is
+built. Both the scripts and all the program tests are run if you obey "make
+check". For other environments, see the instructions in NON-AUTOTOOLS-BUILD.
 
 The RunTest script runs the pcre2test test program (which is documented in its
 own man page) on each of the relevant testinput files in the testdata
@@ -544,9 +543,9 @@ Some tests are relevant only when certain build-time options were selected. For
 example, the tests for UTF-8/16/32 features are run only when Unicode support
 is available. RunTest outputs a comment when it skips a test.
 
-Many of the tests that are not skipped are run twice if JIT support is
-available. On the second run, JIT compilation is forced. This testing can be
-suppressed by putting "nojit" on the RunTest command line.
+Many (but not all) of the tests that are not skipped are run twice if JIT
+support is available. On the second run, JIT compilation is forced. This
+testing can be suppressed by putting "nojit" on the RunTest command line.
 
 The entire set of tests is run once for each of the 8-bit, 16-bit and 32-bit
 libraries that are enabled. If you want to run just one set of tests, call
@@ -570,14 +569,20 @@ in numerical order.
 You can also call RunTest with the single argument "list" to cause it to output
 a list of tests.
 
-The first two tests can always be run, as they expect only plain text strings
-(not UTF) and make no use of Unicode properties. The first test file can be fed
+The test sequence starts with "test 0", which is a special test that has no
+input file, and whose output is not checked. This is because it will be
+different on different hardware and with different configurations. The test
+exists in order to exercise some of pcre2test's code that would not otherwise
+be run.
+
+Tests 1 and 2 can always be run, as they expect only plain text strings (not
+UTF) and make no use of Unicode properties. The first test file can be fed
 directly into the perltest.sh script to check that Perl gives the same results.
 The only difference you should see is in the first few lines, where the Perl
 version is given instead of the PCRE2 version. The second set of tests check
 auxiliary functions, error detection, and run-time flags that are specific to
-PCRE2, as well as the POSIX wrapper API. It also uses the debugging flags to
-check some of the internals of pcre2_compile().
+PCRE2. It also uses the debugging flags to check some of the internals of
+pcre2_compile().
 
 If you build PCRE2 with a locale setting that is not the standard C locale, the
 character tables may be different (see next paragraph). In some cases, this may
@@ -585,18 +590,17 @@ cause failures in the second set of tests. For example, in a locale where the
 isprint() function yields TRUE for characters in the range 128-255, the use of
 [:isascii:] inside a character class defines a different set of characters, and
 this shows up in this test as a difference in the compiled code, which is being
-listed for checking. Where the comparison test output contains [\x00-\x7f] the
-test will contain [\x00-\xff], and similarly in some other cases. This is not a
-bug in PCRE2.
+listed for checking. For example, where the comparison test output contains
+[\x00-\x7f] the test might contain [\x00-\xff], and similarly in some other
+cases. This is not a bug in PCRE2.
 
-The third set of tests checks pcre2_maketables(), the facility for building a
-set of character tables for a specific locale and using them instead of the
-default tables. The script uses the "locale" command to check for the
-availability of the "fr_FR", "french", or "fr" locale, and uses the first one
-that it finds. If the "locale" command fails, or if its output doesn't include
-"fr_FR", "french", or "fr" in the list of available locales, the third test
-cannot be run, and a comment is output to say why. If running this test
-produces an error like this
+Test 3 checks pcre2_maketables(), the facility for building a set of character
+tables for a specific locale and using them instead of the default tables. The
+script uses the "locale" command to check for the availability of the "fr_FR",
+"french", or "fr" locale, and uses the first one that it finds. If the "locale"
+command fails, or if its output doesn't include "fr_FR", "french", or "fr" in
+the list of available locales, the third test cannot be run, and a comment is
+output to say why. If running this test produces an error like this:
 
 ** Failed to set locale "fr_FR"
 
|
@ -606,33 +610,37 @@ alternative output files for the third test, because three different versions
|
||||||
of the French locale have been encountered. The test passes if its output
|
of the French locale have been encountered. The test passes if its output
|
||||||
matches any one of them.
|
matches any one of them.
|
||||||
|
|
||||||
The fourth and fifth tests check UTF and Unicode property support, the fourth
|
Tests 4 and 5 check UTF and Unicode property support, test 4 being compatible
|
||||||
being compatible with the perltest.sh script, and the fifth checking
|
with the perltest.sh script, and test 5 checking PCRE2-specific things.
|
||||||
PCRE2-specific things.
|
|
||||||
|
|
||||||
The sixth and seventh tests check the pcre2_dfa_match() alternative matching
|
Tests 6 and 7 check the pcre2_dfa_match() alternative matching function, in
|
||||||
function, in non-UTF mode and UTF-mode with Unicode property support,
|
non-UTF mode and UTF-mode with Unicode property support, respectively.
|
||||||
respectively.
|
|
||||||
|
|
||||||
The eighth test checks some internal offsets and code size features; it is
|
Test 8 checks some internal offsets and code size features; it is run only when
|
||||||
run only when the default "link size" of 2 is set (in other cases the sizes
|
the default "link size" of 2 is set (in other cases the sizes change) and when
|
||||||
change) and when Unicode support is enabled.
|
Unicode support is enabled.
|
||||||
|
|
||||||
The ninth and tenth tests are run only in 8-bit mode, and the eleventh and
|
Tests 9 and 10 are run only in 8-bit mode, and tests 11 and 12 are run only in
|
||||||
twelfth tests are run only in 16-bit and 32-bit modes. These are tests that
|
16-bit and 32-bit modes. These are tests that generate different output in
|
||||||
generate different output in 8-bit mode. Each pair are for general cases and
|
8-bit mode. Each pair are for general cases and Unicode support, respectively.
|
||||||
Unicode support, respectively. The thirteenth test checks the handling of
|
Test 13 checks the handling of non-UTF characters greater than 255 by
|
||||||
non-UTF characters greater than 255 by pcre2_dfa_match() in 16-bit and 32-bit
|
pcre2_dfa_match() in 16-bit and 32-bit modes.
|
||||||
modes.
|
|
||||||
|
|
||||||
The fourteenth test is run only when JIT support is not available, and the
|
Test 14 contains a number of tests that must not be run with JIT. They check,
|
||||||
fifteenth test is run only when JIT support is available. They test some
|
among other non-JIT things, the match-limiting features of the interpretive
|
||||||
JIT-specific features such as information output from pcre2test about JIT
|
matcher.
|
||||||
compilation.
|
|
||||||
|
|
||||||
The sixteenth and seventeenth tests are run only in 8-bit mode. They check the
|
Test 15 is run only when JIT support is not available. It checks that an
|
||||||
POSIX interface to the 8-bit library, without and with Unicode support,
|
attempt to use JIT has the expected behaviour.
|
||||||
respectively.
|
|
||||||
|
Test 16 is run only when JIT support is available. It checks JIT complete and
|
||||||
|
partial modes, match-limiting under JIT, and other JIT-specific features.
|
||||||
|
|
||||||
|
Tests 17 and 18 are run only in 8-bit mode. They check the POSIX interface to
|
||||||
|
the 8-bit library, without and with Unicode support, respectively.
|
||||||
|
|
||||||
|
Test 19 checks the serialization functions by writing a set of compiled
|
||||||
|
patterns to a file, and then reloading and checking them.
|
||||||
|
|
||||||
|
|
||||||
Character tables
|
Character tables
|
||||||
|
@ -718,6 +726,7 @@ The distribution should contain the files listed below.
|
||||||
src/pcre2_newline.c )
|
src/pcre2_newline.c )
|
||||||
src/pcre2_ord2utf.c )
|
src/pcre2_ord2utf.c )
|
||||||
src/pcre2_pattern_info.c )
|
src/pcre2_pattern_info.c )
|
||||||
|
src/pcre2_serialize.c )
|
||||||
src/pcre2_string_utils.c )
|
src/pcre2_string_utils.c )
|
||||||
src/pcre2_study.c )
|
src/pcre2_study.c )
|
||||||
src/pcre2_substitute.c )
|
src/pcre2_substitute.c )
|
||||||
|
@ -816,4 +825,4 @@ The distribution should contain the files listed below.
|
||||||
Philip Hazel
|
Philip Hazel
|
||||||
Email local part: ph10
|
Email local part: ph10
|
||||||
Email domain: cam.ac.uk
|
Email domain: cam.ac.uk
|
||||||
Last updated: 05 January 2015
|
Last updated: 20 January 2015
|
||||||
|
|
|
@ -65,6 +65,9 @@ first.
|
||||||
<tr><td><a href="pcre2sample.html">pcre2sample</a></td>
|
<tr><td><a href="pcre2sample.html">pcre2sample</a></td>
|
||||||
<td> Discussion of the pcre2demo program</td></tr>
|
<td> Discussion of the pcre2demo program</td></tr>
|
||||||
|
|
||||||
|
<tr><td><a href="pcre2serialize.html">pcre2serialize</a></td>
|
||||||
|
<td> Serializing functions for saving precompiled patterns</td></tr>
|
||||||
|
|
||||||
<tr><td><a href="pcre2stack.html">pcre2stack</a></td>
|
<tr><td><a href="pcre2stack.html">pcre2stack</a></td>
|
||||||
<td> Discussion of PCRE2's stack usage</td></tr>
|
<td> Discussion of PCRE2's stack usage</td></tr>
|
||||||
|
|
||||||
|
@ -177,6 +180,18 @@ in the library.
|
||||||
<tr><td><a href="pcre2_pattern_info.html">pcre2_pattern_info</a></td>
|
<tr><td><a href="pcre2_pattern_info.html">pcre2_pattern_info</a></td>
|
||||||
<td> Extract information about a pattern</td></tr>
|
<td> Extract information about a pattern</td></tr>
|
||||||
|
|
||||||
|
<tr><td><a href="pcre2_serialize_decode.html">pcre2_serialize_decode</a></td>
|
||||||
|
<td> Decode serialized compiled patterns</td></tr>
|
||||||
|
|
||||||
|
<tr><td><a href="pcre2_serialize_encode.html">pcre2_serialize_encode</a></td>
|
||||||
|
<td> Serialize compiled patterns for save/restore</td></tr>
|
||||||
|
|
||||||
|
<tr><td><a href="pcre2_serialize_free.html">pcre2_serialize_free</a></td>
|
||||||
|
<td> Free serialized compiled patterns</td></tr>
|
||||||
|
|
||||||
|
<tr><td><a href="pcre2_serialize_get_number_of_codes.html">pcre2_serialize_get_number_of_codes</a></td>
|
||||||
|
<td> Get number of serialized compiled patterns</td></tr>
|
||||||
|
|
||||||
<tr><td><a href="pcre2_set_bsr.html">pcre2_set_bsr</a></td>
|
<tr><td><a href="pcre2_set_bsr.html">pcre2_set_bsr</a></td>
|
||||||
<td> Set \R convention</td></tr>
|
<td> Set \R convention</td></tr>
|
||||||
|
|
||||||
|
|
|
@ -0,0 +1,62 @@
|
||||||
|
<html>
|
||||||
|
<head>
|
||||||
|
<title>pcre2_serialize_decode specification</title>
|
||||||
|
</head>
|
||||||
|
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||||
|
<h1>pcre2_serialize_decode man page</h1>
|
||||||
|
<p>
|
||||||
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
This page is part of the PCRE2 HTML documentation. It was generated
|
||||||
|
automatically from the original man page. If there is any nonsense in it,
|
||||||
|
please consult the man page, in case the conversion went wrong.
|
||||||
|
<br>
|
||||||
|
<br><b>
|
||||||
|
SYNOPSIS
|
||||||
|
</b><br>
|
||||||
|
<P>
|
||||||
|
<b>#include <pcre2.h></b>
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
<b>int32_t pcre2_serialize_decode(pcre2_code **<i>codes</i>,</b>
|
||||||
|
<b> int32_t <i>number_of_codes</i>, const uint8_t *<i>bytes</i>,</b>
|
||||||
|
<b> pcre2_general_context *<i>gcontext</i>);</b>
|
||||||
|
</P>
|
||||||
|
<br><b>
|
||||||
|
DESCRIPTION
|
||||||
|
</b><br>
|
||||||
|
<P>
|
||||||
|
This function decodes a serialized set of compiled patterns back into a list of
|
||||||
|
individual patterns. Its arguments are:
|
||||||
|
<pre>
|
||||||
|
<i>codes</i> pointer to a vector in which to build the list
|
||||||
|
<i>number_of_codes</i> number of slots in the vector
|
||||||
|
<i>bytes</i> the serialized byte stream
|
||||||
|
<i>gcontext</i> pointer to a general context or NULL
|
||||||
|
</pre>
|
||||||
|
The <i>bytes</i> argument must point to a block of data that was originally
|
||||||
|
created by <b>pcre2_serialize_encode()</b>, though it may have been saved on
|
||||||
|
disc or elsewhere in the meantime. If there are more codes in the serialized
|
||||||
|
data than slots in the list, only those compiled patterns that will fit are
|
||||||
|
decoded. The yield of the function is the number of decoded patterns, or one of
|
||||||
|
the following negative error codes:
|
||||||
|
<pre>
|
||||||
|
PCRE2_ERROR_BADDATA <i>number_of_codes</i> is zero or less
|
||||||
|
PCRE2_ERROR_BADMAGIC mismatch of id bytes in <i>bytes</i>
|
||||||
|
PCRE2_ERROR_BADMODE mismatch of code unit size or PCRE2 version
|
||||||
|
PCRE2_ERROR_MEMORY memory allocation failed
|
||||||
|
PCRE2_ERROR_NULL <i>codes</i> or <i>bytes</i> is NULL
|
||||||
|
</pre>
|
||||||
|
PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that it was compiled
|
||||||
|
on a system with different endianness.
|
||||||
|
</P>
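Because the <i>bytes</i> argument may come from a file written earlier, one way to obtain it is to read the whole file into memory. The following is a minimal sketch in standard C; load_serialized() is a hypothetical helper, not a PCRE2 function. The buffer it returns comes from malloc(), so it should be released with free(), not with pcre2_serialize_free(), which is documented only for memory obtained by pcre2_serialize_encode(). The sketch assumes the decoded patterns do not refer back into this buffer once pcre2_serialize_decode() has returned.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read an entire file into a malloc()ed buffer so that
   it can be passed as the bytes argument of pcre2_serialize_decode().
   Returns NULL on failure. On success, *size_out receives the number of
   bytes read and the caller must free() the buffer. */
uint8_t *load_serialized(const char *path, size_t *size_out)
{
  FILE *f = fopen(path, "rb");             /* binary mode: no translation */
  if (f == NULL) return NULL;

  /* Find the file length by seeking to the end. */
  if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
  long len = ftell(f);
  if (len < 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); return NULL; }

  /* Allocate at least one byte so that an empty file still yields a buffer. */
  uint8_t *buf = malloc(len > 0 ? (size_t)len : 1);
  if (buf == NULL) { fclose(f); return NULL; }

  size_t got = fread(buf, 1, (size_t)len, f);
  fclose(f);
  if (got != (size_t)len) { free(buf); return NULL; }

  *size_out = got;
  return buf;
}
```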
|
||||||
|
<P>
|
||||||
|
There is a complete description of the PCRE2 native API in the
|
||||||
|
<a href="pcre2api.html"><b>pcre2api</b></a>
|
||||||
|
page and a description of the POSIX API in the
|
||||||
|
<a href="pcre2posix.html"><b>pcre2posix</b></a>
|
||||||
|
page.
|
||||||
|
<p>
|
||||||
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
</p>
|
|
@ -0,0 +1,61 @@
|
||||||
|
<html>
|
||||||
|
<head>
|
||||||
|
<title>pcre2_serialize_encode specification</title>
|
||||||
|
</head>
|
||||||
|
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||||
|
<h1>pcre2_serialize_encode man page</h1>
|
||||||
|
<p>
|
||||||
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
This page is part of the PCRE2 HTML documentation. It was generated
|
||||||
|
automatically from the original man page. If there is any nonsense in it,
|
||||||
|
please consult the man page, in case the conversion went wrong.
|
||||||
|
<br>
|
||||||
|
<br><b>
|
||||||
|
SYNOPSIS
|
||||||
|
</b><br>
|
||||||
|
<P>
|
||||||
|
<b>#include <pcre2.h></b>
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
<b>int32_t pcre2_serialize_encode(pcre2_code **<i>codes</i>,</b>
|
||||||
|
<b> int32_t <i>number_of_codes</i>, uint8_t **<i>serialized_bytes</i>,</b>
|
||||||
|
<b> PCRE2_SIZE *<i>serialized_size</i>, pcre2_general_context *<i>gcontext</i>);</b>
|
||||||
|
</P>
|
||||||
|
<br><b>
|
||||||
|
DESCRIPTION
|
||||||
|
</b><br>
|
||||||
|
<P>
|
||||||
|
This function encodes a list of compiled patterns into a byte stream that can
|
||||||
|
be saved on disc or elsewhere. Its arguments are:
|
||||||
|
<pre>
|
||||||
|
<i>codes</i> pointer to a vector containing the list
|
||||||
|
<i>number_of_codes</i> number of slots in the vector
|
||||||
|
<i>serialized_bytes</i> set to point to the serialized byte stream
|
||||||
|
<i>serialized_size</i> set to the number of bytes in the byte stream
|
||||||
|
<i>gcontext</i> pointer to a general context or NULL
|
||||||
|
</pre>
|
||||||
|
The <i>gcontext</i> argument is used to obtain memory for the byte stream. When the
|
||||||
|
serialized data is no longer needed, it must be freed by calling
|
||||||
|
<b>pcre2_serialize_free()</b>. The yield of the function is the number of
|
||||||
|
serialized patterns, or one of the following negative error codes:
|
||||||
|
<pre>
|
||||||
|
PCRE2_ERROR_BADDATA <i>number_of_codes</i> is zero or less
|
||||||
|
PCRE2_ERROR_BADMAGIC mismatch of id bytes in one of the patterns
|
||||||
|
PCRE2_ERROR_MEMORY memory allocation failed
|
||||||
|
PCRE2_ERROR_MIXEDTABLES the patterns do not all use the same tables
|
||||||
|
PCRE2_ERROR_NULL an argument other than <i>gcontext</i> is NULL
|
||||||
|
</pre>
|
||||||
|
PCRE2_ERROR_BADMAGIC means either that a pattern's code has been corrupted, or
|
||||||
|
that a slot in the vector does not point to a compiled pattern.
|
||||||
|
</P>
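The byte stream and size that <b>pcre2_serialize_encode()</b> returns can be written out with ordinary binary file I/O. The following is a minimal sketch in standard C; save_serialized() is a hypothetical helper, not part of PCRE2. After the data has been saved, the stream itself should still be released with <b>pcre2_serialize_free()</b>.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write a serialized byte stream (for example, the
   buffer and size set by pcre2_serialize_encode()) to a file.
   Returns 0 on success, -1 on failure. */
int save_serialized(const char *path, const uint8_t *bytes, size_t size)
{
  FILE *f = fopen(path, "wb");             /* binary mode: no translation */
  if (f == NULL) return -1;

  size_t written = fwrite(bytes, 1, size, f);

  /* fclose() is always called first; a failed close can mean lost data. */
  if (fclose(f) != 0 || written != size) return -1;
  return 0;
}
```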
|
||||||
|
<P>
|
||||||
|
There is a complete description of the PCRE2 native API in the
|
||||||
|
<a href="pcre2api.html"><b>pcre2api</b></a>
|
||||||
|
page and a description of the POSIX API in the
|
||||||
|
<a href="pcre2posix.html"><b>pcre2posix</b></a>
|
||||||
|
page.
|
||||||
|
<p>
|
||||||
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
</p>
|
|
@ -0,0 +1,40 @@
|
||||||
|
<html>
|
||||||
|
<head>
|
||||||
|
<title>pcre2_serialize_free specification</title>
|
||||||
|
</head>
|
||||||
|
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||||
|
<h1>pcre2_serialize_free man page</h1>
|
||||||
|
<p>
|
||||||
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
This page is part of the PCRE2 HTML documentation. It was generated
|
||||||
|
automatically from the original man page. If there is any nonsense in it,
|
||||||
|
please consult the man page, in case the conversion went wrong.
|
||||||
|
<br>
|
||||||
|
<br><b>
|
||||||
|
SYNOPSIS
|
||||||
|
</b><br>
|
||||||
|
<P>
|
||||||
|
<b>#include <pcre2.h></b>
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
<b>void pcre2_serialize_free(uint8_t *<i>bytes</i>);</b>
|
||||||
|
</P>
|
||||||
|
<br><b>
|
||||||
|
DESCRIPTION
|
||||||
|
</b><br>
|
||||||
|
<P>
|
||||||
|
This function frees the memory that was obtained by
|
||||||
|
<b>pcre2_serialize_encode()</b> to hold a serialized byte stream. The argument
|
||||||
|
must point to such a byte stream.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
There is a complete description of the PCRE2 native API in the
|
||||||
|
<a href="pcre2api.html"><b>pcre2api</b></a>
|
||||||
|
page and a description of the POSIX API in the
|
||||||
|
<a href="pcre2posix.html"><b>pcre2posix</b></a>
|
||||||
|
page.
|
||||||
|
<p>
|
||||||
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
</p>
|
|
@ -0,0 +1,49 @@
|
||||||
|
<html>
|
||||||
|
<head>
|
||||||
|
<title>pcre2_serialize_get_number_of_codes specification</title>
|
||||||
|
</head>
|
||||||
|
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||||
|
<h1>pcre2_serialize_get_number_of_codes man page</h1>
|
||||||
|
<p>
|
||||||
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
This page is part of the PCRE2 HTML documentation. It was generated
|
||||||
|
automatically from the original man page. If there is any nonsense in it,
|
||||||
|
please consult the man page, in case the conversion went wrong.
|
||||||
|
<br>
|
||||||
|
<br><b>
|
||||||
|
SYNOPSIS
|
||||||
|
</b><br>
|
||||||
|
<P>
|
||||||
|
<b>#include <pcre2.h></b>
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
<b>int32_t pcre2_serialize_get_number_of_codes(const uint8_t *<i>bytes</i>);</b>
|
||||||
|
</P>
|
||||||
|
<br><b>
|
||||||
|
DESCRIPTION
|
||||||
|
</b><br>
|
||||||
|
<P>
|
||||||
|
The <i>bytes</i> argument must point to a serialized byte stream that was
|
||||||
|
originally created by <b>pcre2_serialize_encode()</b> (though it may have been
|
||||||
|
saved on disc or elsewhere in the meantime). The function returns the number of
|
||||||
|
serialized patterns in the byte stream, or one of the following negative error
|
||||||
|
codes:
|
||||||
|
<pre>
|
||||||
|
PCRE2_ERROR_BADMAGIC mismatch of id bytes in <i>bytes</i>
|
||||||
|
PCRE2_ERROR_BADMODE mismatch of code unit size or PCRE2 version
|
||||||
|
PCRE2_ERROR_NULL the argument is NULL
|
||||||
|
</pre>
|
||||||
|
PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that it was compiled
|
||||||
|
on a system with different endianness.
|
||||||
|
</P>
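To see why a stream produced on a machine with the other byte order may be reported as PCRE2_ERROR_BADMAGIC, consider a four-byte identifier stored as a native uint32_t. The sketch below is illustrative only. DEMO_MAGIC, swap32(), check_magic() and the MAGIC_* results are hypothetical names, and the value is an assumption rather than PCRE2's actual internal identifier. PCRE2 itself does not distinguish the two failure cases; it simply returns PCRE2_ERROR_BADMAGIC.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative identifier (assumption): the ASCII codes of 'P','C','R','E'. */
#define DEMO_MAGIC 0x50435245u

enum magic_check { MAGIC_OK, MAGIC_SWAPPED, MAGIC_BAD };

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t x)
{
  return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
         ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Read the first four bytes as a native-order uint32_t, then compare the
   result with the identifier in both byte orders. */
enum magic_check check_magic(const uint8_t *bytes)
{
  uint32_t word;
  memcpy(&word, bytes, sizeof word);       /* avoids an unaligned load */
  if (word == DEMO_MAGIC) return MAGIC_OK;
  if (word == swap32(DEMO_MAGIC)) return MAGIC_SWAPPED;  /* other endianness */
  return MAGIC_BAD;                                      /* corrupt data */
}
```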
|
||||||
|
<P>
|
||||||
|
There is a complete description of the PCRE2 native API in the
|
||||||
|
<a href="pcre2api.html"><b>pcre2api</b></a>
|
||||||
|
page and a description of the POSIX API in the
|
||||||
|
<a href="pcre2posix.html"><b>pcre2posix</b></a>
|
||||||
|
page.
|
||||||
|
<p>
|
||||||
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
</p>
|
|
@ -21,35 +21,37 @@ please consult the man page, in case the conversion went wrong.
|
||||||
<li><a name="TOC6" href="#SEC6">PCRE2 NATIVE API STRING EXTRACTION FUNCTIONS</a>
|
<li><a name="TOC6" href="#SEC6">PCRE2 NATIVE API STRING EXTRACTION FUNCTIONS</a>
|
||||||
<li><a name="TOC7" href="#SEC7">PCRE2 NATIVE API STRING SUBSTITUTION FUNCTION</a>
|
<li><a name="TOC7" href="#SEC7">PCRE2 NATIVE API STRING SUBSTITUTION FUNCTION</a>
|
||||||
<li><a name="TOC8" href="#SEC8">PCRE2 NATIVE API JIT FUNCTIONS</a>
|
<li><a name="TOC8" href="#SEC8">PCRE2 NATIVE API JIT FUNCTIONS</a>
|
||||||
<li><a name="TOC9" href="#SEC9">PCRE2 NATIVE API AUXILIARY FUNCTIONS</a>
|
<li><a name="TOC9" href="#SEC9">PCRE2 NATIVE API SERIALIZATION FUNCTIONS</a>
|
||||||
<li><a name="TOC10" href="#SEC10">PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a>
|
<li><a name="TOC10" href="#SEC10">PCRE2 NATIVE API AUXILIARY FUNCTIONS</a>
|
||||||
<li><a name="TOC11" href="#SEC11">PCRE2 API OVERVIEW</a>
|
<li><a name="TOC11" href="#SEC11">PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a>
|
||||||
<li><a name="TOC12" href="#SEC12">STRING LENGTHS AND OFFSETS</a>
|
<li><a name="TOC12" href="#SEC12">PCRE2 API OVERVIEW</a>
|
||||||
<li><a name="TOC13" href="#SEC13">NEWLINES</a>
|
<li><a name="TOC13" href="#SEC13">STRING LENGTHS AND OFFSETS</a>
|
||||||
<li><a name="TOC14" href="#SEC14">MULTITHREADING</a>
|
<li><a name="TOC14" href="#SEC14">NEWLINES</a>
|
||||||
<li><a name="TOC15" href="#SEC15">PCRE2 CONTEXTS</a>
|
<li><a name="TOC15" href="#SEC15">MULTITHREADING</a>
|
||||||
<li><a name="TOC16" href="#SEC16">CHECKING BUILD-TIME OPTIONS</a>
|
<li><a name="TOC16" href="#SEC16">PCRE2 CONTEXTS</a>
|
||||||
<li><a name="TOC17" href="#SEC17">COMPILING A PATTERN</a>
|
<li><a name="TOC17" href="#SEC17">CHECKING BUILD-TIME OPTIONS</a>
|
||||||
<li><a name="TOC18" href="#SEC18">COMPILATION ERROR CODES</a>
|
<li><a name="TOC18" href="#SEC18">COMPILING A PATTERN</a>
|
||||||
<li><a name="TOC19" href="#SEC19">JUST-IN-TIME (JIT) COMPILATION</a>
|
<li><a name="TOC19" href="#SEC19">COMPILATION ERROR CODES</a>
|
||||||
<li><a name="TOC20" href="#SEC20">LOCALE SUPPORT</a>
|
<li><a name="TOC20" href="#SEC20">JUST-IN-TIME (JIT) COMPILATION</a>
|
||||||
<li><a name="TOC21" href="#SEC21">INFORMATION ABOUT A COMPILED PATTERN</a>
|
<li><a name="TOC21" href="#SEC21">LOCALE SUPPORT</a>
|
||||||
<li><a name="TOC22" href="#SEC22">THE MATCH DATA BLOCK</a>
|
<li><a name="TOC22" href="#SEC22">INFORMATION ABOUT A COMPILED PATTERN</a>
|
||||||
<li><a name="TOC23" href="#SEC23">MATCHING A PATTERN: THE TRADITIONAL FUNCTION</a>
|
<li><a name="TOC23" href="#SEC23">SERIALIZATION AND PRECOMPILING</a>
|
||||||
<li><a name="TOC24" href="#SEC24">NEWLINE HANDLING WHEN MATCHING</a>
|
<li><a name="TOC24" href="#SEC24">THE MATCH DATA BLOCK</a>
|
||||||
<li><a name="TOC25" href="#SEC25">HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS</a>
|
<li><a name="TOC25" href="#SEC25">MATCHING A PATTERN: THE TRADITIONAL FUNCTION</a>
|
||||||
<li><a name="TOC26" href="#SEC26">OTHER INFORMATION ABOUT A MATCH</a>
|
<li><a name="TOC26" href="#SEC26">NEWLINE HANDLING WHEN MATCHING</a>
|
||||||
<li><a name="TOC27" href="#SEC27">ERROR RETURNS FROM <b>pcre2_match()</b></a>
|
<li><a name="TOC27" href="#SEC27">HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS</a>
|
||||||
<li><a name="TOC28" href="#SEC28">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a>
|
<li><a name="TOC28" href="#SEC28">OTHER INFORMATION ABOUT A MATCH</a>
|
||||||
<li><a name="TOC29" href="#SEC29">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a>
|
<li><a name="TOC29" href="#SEC29">ERROR RETURNS FROM <b>pcre2_match()</b></a>
|
||||||
<li><a name="TOC30" href="#SEC30">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a>
|
<li><a name="TOC30" href="#SEC30">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a>
|
||||||
<li><a name="TOC31" href="#SEC31">CREATING A NEW STRING WITH SUBSTITUTIONS</a>
|
<li><a name="TOC31" href="#SEC31">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a>
|
||||||
<li><a name="TOC32" href="#SEC32">DUPLICATE SUBPATTERN NAMES</a>
|
<li><a name="TOC32" href="#SEC32">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a>
|
||||||
<li><a name="TOC33" href="#SEC33">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a>
|
<li><a name="TOC33" href="#SEC33">CREATING A NEW STRING WITH SUBSTITUTIONS</a>
|
||||||
<li><a name="TOC34" href="#SEC34">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a>
|
<li><a name="TOC34" href="#SEC34">DUPLICATE SUBPATTERN NAMES</a>
|
||||||
<li><a name="TOC35" href="#SEC35">SEE ALSO</a>
|
<li><a name="TOC35" href="#SEC35">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a>
|
||||||
<li><a name="TOC36" href="#SEC36">AUTHOR</a>
|
<li><a name="TOC36" href="#SEC36">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a>
|
||||||
<li><a name="TOC37" href="#SEC37">REVISION</a>
|
<li><a name="TOC37" href="#SEC37">SEE ALSO</a>
|
||||||
|
<li><a name="TOC38" href="#SEC38">AUTHOR</a>
|
||||||
|
<li><a name="TOC39" href="#SEC39">REVISION</a>
|
||||||
</ul>
|
</ul>
|
||||||
<P>
|
<P>
|
||||||
<b>#include <pcre2.h></b>
|
<b>#include <pcre2.h></b>
|
||||||
|
@ -260,7 +262,24 @@ document for an overview of all the PCRE2 documentation.
|
||||||
<br>
|
<br>
|
||||||
<b>void pcre2_jit_stack_free(pcre2_jit_stack *<i>jit_stack</i>);</b>
|
<b>void pcre2_jit_stack_free(pcre2_jit_stack *<i>jit_stack</i>);</b>
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC9" href="#TOC1">PCRE2 NATIVE API AUXILIARY FUNCTIONS</a><br>
|
<br><a name="SEC9" href="#TOC1">PCRE2 NATIVE API SERIALIZATION FUNCTIONS</a><br>
|
||||||
|
<P>
|
||||||
|
<b>int32_t pcre2_serialize_decode(pcre2_code **<i>codes</i>,</b>
|
||||||
|
<b> int32_t <i>number_of_codes</i>, const uint8_t *<i>bytes</i>,</b>
|
||||||
|
<b> pcre2_general_context *<i>gcontext</i>);</b>
|
||||||
|
<br>
|
||||||
|
<br>
|
||||||
|
<b>int32_t pcre2_serialize_encode(pcre2_code **<i>codes</i>,</b>
|
||||||
|
<b> int32_t <i>number_of_codes</i>, uint8_t **<i>serialized_bytes</i>,</b>
|
||||||
|
<b> PCRE2_SIZE *<i>serialized_size</i>, pcre2_general_context *<i>gcontext</i>);</b>
|
||||||
|
<br>
|
||||||
|
<br>
|
||||||
|
<b>void pcre2_serialize_free(uint8_t *<i>bytes</i>);</b>
|
||||||
|
<br>
|
||||||
|
<br>
|
||||||
|
<b>int32_t pcre2_serialize_get_number_of_codes(const uint8_t *<i>bytes</i>);</b>
|
||||||
|
</P>
|
||||||
|
<br><a name="SEC10" href="#TOC1">PCRE2 NATIVE API AUXILIARY FUNCTIONS</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>int pcre2_get_error_message(int <i>errorcode</i>, PCRE2_UCHAR *<i>buffer</i>,</b>
|
<b>int pcre2_get_error_message(int <i>errorcode</i>, PCRE2_UCHAR *<i>buffer</i>,</b>
|
||||||
<b> PCRE2_SIZE <i>bufflen</i>);</b>
|
<b> PCRE2_SIZE <i>bufflen</i>);</b>
|
||||||
|
@ -274,7 +293,7 @@ document for an overview of all the PCRE2 documentation.
|
||||||
<br>
|
<br>
|
||||||
<b>int pcre2_config(uint32_t <i>what</i>, void *<i>where</i>);</b>
|
<b>int pcre2_config(uint32_t <i>what</i>, void *<i>where</i>);</b>
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC10" href="#TOC1">PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a><br>
|
<br><a name="SEC11" href="#TOC1">PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a><br>
|
||||||
<P>
|
<P>
|
||||||
There are three PCRE2 libraries, supporting 8-bit, 16-bit, and 32-bit code
|
There are three PCRE2 libraries, supporting 8-bit, 16-bit, and 32-bit code
|
||||||
units, respectively. However, there is just one header file, <b>pcre2.h</b>.
|
units, respectively. However, there is just one header file, <b>pcre2.h</b>.
|
||||||
|
@ -335,7 +354,7 @@ In the function summaries above, and in the rest of this document and other
|
||||||
PCRE2 documents, functions and data types are described using their generic
|
PCRE2 documents, functions and data types are described using their generic
|
||||||
names, without the 8, 16, or 32 suffix.
|
names, without the 8, 16, or 32 suffix.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC11" href="#TOC1">PCRE2 API OVERVIEW</a><br>
|
<br><a name="SEC12" href="#TOC1">PCRE2 API OVERVIEW</a><br>
|
||||||
<P>
|
<P>
|
||||||
PCRE2 has its own native API, which is described in this document. There are
|
PCRE2 has its own native API, which is described in this document. There are
|
||||||
also some wrapper functions for the 8-bit library that correspond to the
|
also some wrapper functions for the 8-bit library that correspond to the
|
||||||
|
@ -426,7 +445,7 @@ Finally, there are functions for finding out information about a compiled
|
||||||
pattern (<b>pcre2_pattern_info()</b>) and about the configuration with which
|
pattern (<b>pcre2_pattern_info()</b>) and about the configuration with which
|
||||||
PCRE2 was built (<b>pcre2_config()</b>).
|
PCRE2 was built (<b>pcre2_config()</b>).
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC12" href="#TOC1">STRING LENGTHS AND OFFSETS</a><br>
|
<br><a name="SEC13" href="#TOC1">STRING LENGTHS AND OFFSETS</a><br>
|
||||||
<P>
|
<P>
|
||||||
The PCRE2 API uses string lengths and offsets into strings of code units in
|
The PCRE2 API uses string lengths and offsets into strings of code units in
|
||||||
several places. These values are always of type PCRE2_SIZE, which is an
|
several places. These values are always of type PCRE2_SIZE, which is an
|
||||||
|
@ -436,7 +455,7 @@ as a special indicator for zero-terminated strings and unset offsets.
|
||||||
Therefore, the longest string that can be handled is one less than this
|
Therefore, the longest string that can be handled is one less than this
|
||||||
maximum.
|
maximum.
|
||||||
<a name="newlines"></a></P>
|
<a name="newlines"></a></P>
|
||||||
<br><a name="SEC13" href="#TOC1">NEWLINES</a><br>
|
<br><a name="SEC14" href="#TOC1">NEWLINES</a><br>
|
||||||
<P>
|
<P>
|
||||||
PCRE2 supports five different conventions for indicating line breaks in
|
PCRE2 supports five different conventions for indicating line breaks in
|
||||||
strings: a single CR (carriage return) character, a single LF (linefeed)
|
strings: a single CR (carriage return) character, a single LF (linefeed)
|
||||||
|
@ -471,7 +490,7 @@ The choice of newline convention does not affect the interpretation of
|
||||||
the \n or \r escape sequences, nor does it affect what \R matches; this has
|
the \n or \r escape sequences, nor does it affect what \R matches; this has
|
||||||
its own separate convention.
|
its own separate convention.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC14" href="#TOC1">MULTITHREADING</a><br>
|
<br><a name="SEC15" href="#TOC1">MULTITHREADING</a><br>
|
||||||
<P>
|
<P>
|
||||||
In a multithreaded application it is important to keep thread-specific data
|
In a multithreaded application it is important to keep thread-specific data
|
||||||
separate from data that can be shared between threads. The PCRE2 library code
|
separate from data that can be shared between threads. The PCRE2 library code
|
||||||
|
@ -516,7 +535,7 @@ storing the results of a match. This includes details of what was matched, as
|
||||||
well as additional information such as the name of a (*MARK) setting. Each
|
well as additional information such as the name of a (*MARK) setting. Each
|
||||||
thread must provide its own version of this memory.
|
thread must provide its own version of this memory.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC15" href="#TOC1">PCRE2 CONTEXTS</a><br>
|
<br><a name="SEC16" href="#TOC1">PCRE2 CONTEXTS</a><br>
|
||||||
<P>
|
<P>
|
||||||
Some PCRE2 functions have a lot of parameters, many of which are used only by
|
Some PCRE2 functions have a lot of parameters, many of which are used only by
|
||||||
specialist applications, for example, those that use custom memory management
|
specialist applications, for example, those that use custom memory management
|
||||||
|
@ -797,7 +816,7 @@ exit so that they can be re-used when possible during the match. In the absence
|
||||||
of these functions, the normal custom memory management functions are used, if
|
of these functions, the normal custom memory management functions are used, if
|
||||||
supplied, otherwise the system functions.
|
supplied, otherwise the system functions.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC16" href="#TOC1">CHECKING BUILD-TIME OPTIONS</a><br>
|
<br><a name="SEC17" href="#TOC1">CHECKING BUILD-TIME OPTIONS</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>int pcre2_config(uint32_t <i>what</i>, void *<i>where</i>);</b>
|
<b>int pcre2_config(uint32_t <i>what</i>, void *<i>where</i>);</b>
|
||||||
</P>
|
</P>
|
||||||
|
@ -929,7 +948,7 @@ the PCRE2 version string, zero-terminated. The number of code units used is
|
||||||
returned. This is the length of the string plus one unit for the terminating
|
returned. This is the length of the string plus one unit for the terminating
|
||||||
zero.
|
zero.
|
||||||
<a name="compiling"></a></P>
|
<a name="compiling"></a></P>
|
||||||
<br><a name="SEC17" href="#TOC1">COMPILING A PATTERN</a><br>
|
<br><a name="SEC18" href="#TOC1">COMPILING A PATTERN</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>pcre2_code *pcre2_compile(PCRE2_SPTR <i>pattern</i>, PCRE2_SIZE <i>length</i>,</b>
|
<b>pcre2_code *pcre2_compile(PCRE2_SPTR <i>pattern</i>, PCRE2_SIZE <i>length</i>,</b>
|
||||||
<b> uint32_t <i>options</i>, int *<i>errorcode</i>, PCRE2_SIZE *<i>erroroffset,</i></b>
|
<b> uint32_t <i>options</i>, int *<i>errorcode</i>, PCRE2_SIZE *<i>erroroffset,</i></b>
|
||||||
|
@ -1305,7 +1324,7 @@ the behaviour of PCRE2 are given in the
|
||||||
<a href="pcre2unicode.html"><b>pcre2unicode</b></a>
|
<a href="pcre2unicode.html"><b>pcre2unicode</b></a>
|
||||||
page.
|
page.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC18" href="#TOC1">COMPILATION ERROR CODES</a><br>
|
<br><a name="SEC19" href="#TOC1">COMPILATION ERROR CODES</a><br>
|
||||||
<P>
|
<P>
|
||||||
There are over 80 positive error codes that <b>pcre2_compile()</b> may return if
|
There are over 80 positive error codes that <b>pcre2_compile()</b> may return if
|
||||||
it finds an error in the pattern. There are also some negative error codes that
|
it finds an error in the pattern. There are also some negative error codes that
|
||||||
|
@ -1315,7 +1334,7 @@ are used for invalid UTF strings. These are the same as given by
|
||||||
page. The <b>pcre2_get_error_message()</b> function can be called to obtain a
|
page. The <b>pcre2_get_error_message()</b> function can be called to obtain a
|
||||||
textual error message from any error code.
|
textual error message from any error code.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC19" href="#TOC1">JUST-IN-TIME (JIT) COMPILATION</a><br>
|
<br><a name="SEC20" href="#TOC1">JUST-IN-TIME (JIT) COMPILATION</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>int pcre2_jit_compile(pcre2_code *<i>code</i>, uint32_t <i>options</i>);</b>
|
<b>int pcre2_jit_compile(pcre2_code *<i>code</i>, uint32_t <i>options</i>);</b>
|
||||||
<br>
|
<br>
|
||||||
|
@ -1353,7 +1372,7 @@ patterns to be analyzed, and for one-off matches and simple patterns the
|
||||||
benefit of faster execution might be offset by a much slower compilation time.
|
benefit of faster execution might be offset by a much slower compilation time.
|
||||||
Most, but not all patterns can be optimized by the JIT compiler.
|
Most, but not all patterns can be optimized by the JIT compiler.
|
||||||
<a name="localesupport"></a></P>
|
<a name="localesupport"></a></P>
|
||||||
<br><a name="SEC20" href="#TOC1">LOCALE SUPPORT</a><br>
|
<br><a name="SEC21" href="#TOC1">LOCALE SUPPORT</a><br>
|
||||||
<P>
|
<P>
|
||||||
PCRE2 handles caseless matching, and determines whether characters are letters,
|
PCRE2 handles caseless matching, and determines whether characters are letters,
|
||||||
digits, or whatever, by reference to a set of tables, indexed by character code
|
digits, or whatever, by reference to a set of tables, indexed by character code
|
||||||
|
@ -1409,7 +1428,7 @@ is saved with the compiled pattern, and the same tables are used by
|
||||||
compilation, and matching all happen in the same locale, but different patterns
|
compilation, and matching all happen in the same locale, but different patterns
|
||||||
can be processed in different locales.
|
can be processed in different locales.
|
||||||
<a name="infoaboutpattern"></a></P>
|
<a name="infoaboutpattern"></a></P>
|
||||||
<br><a name="SEC21" href="#TOC1">INFORMATION ABOUT A COMPILED PATTERN</a><br>
|
<br><a name="SEC22" href="#TOC1">INFORMATION ABOUT A COMPILED PATTERN</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>int pcre2_pattern_info(const pcre2 *<i>code</i>, uint32_t <i>what</i>, void *<i>where</i>);</b>
|
<b>int pcre2_pattern_info(const pcre2_code *<i>code</i>, uint32_t <i>what</i>, void *<i>where</i>);</b>
|
||||||
</P>
|
</P>
|
||||||
|
@ -1478,8 +1497,12 @@ options returned for PCRE2_INFO_ALLOPTIONS.
|
||||||
PCRE2_INFO_BACKREFMAX
|
PCRE2_INFO_BACKREFMAX
|
||||||
</pre>
|
</pre>
|
||||||
Return the number of the highest back reference in the pattern. The third
|
Return the number of the highest back reference in the pattern. The third
|
||||||
argument should point to an <b>uint32_t</b> variable. Zero is returned if there
|
argument should point to a <b>uint32_t</b> variable. Named subpatterns acquire
|
||||||
are no back references.
|
numbers as well as names, and these count towards the highest back reference.
|
||||||
|
Back references such as \4 or \g{12} match the captured characters of the
|
||||||
|
given group, but in addition, the check that a capturing group is set in a
|
||||||
|
conditional subpattern such as (?(3)a|b) is also a back reference. Zero is
|
||||||
|
returned if there are no back references.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_INFO_BSR
|
PCRE2_INFO_BSR
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -1689,14 +1712,24 @@ set, the call to <b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET
|
||||||
PCRE2_INFO_SIZE
|
PCRE2_INFO_SIZE
|
||||||
</pre>
|
</pre>
|
||||||
Return the size of the compiled pattern in bytes (for all three libraries). The
|
Return the size of the compiled pattern in bytes (for all three libraries). The
|
||||||
third argument should point to a <b>size_t</b> variable. This value does not
|
third argument should point to a <b>size_t</b> variable. This value includes the
|
||||||
include the size of the <b>pcre2_code</b> structure that is returned by
|
size of the general data block that precedes the code units of the compiled
|
||||||
<b>pcre_compile()</b>. The value that is used when <b>pcre2_compile()</b> is
|
pattern itself. The value that is used when <b>pcre2_compile()</b> is getting
|
||||||
getting memory in which to place the compiled data is the value returned by
|
memory in which to place the compiled pattern may be slightly larger than the
|
||||||
this option plus the size of the <b>pcre2_code</b> structure. Processing a
|
value returned by this option, because there are cases where the code that
|
||||||
pattern with the JIT compiler does not alter the value returned by this option.
|
calculates the size has to over-estimate. Processing a pattern with the JIT
|
||||||
|
compiler does not alter the value returned by this option.
|
||||||
|
</P>
|
||||||
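The PCRE2_INFO_SIZE correction rests on a general point. When an allocation size comes from an over-estimate, the size that is reported (and copied) should be the part actually written, not the whole allocation. Below is a generic C sketch of that idea. It is not PCRE2's internal code, and the names sized_block, build_block() and copy_used_bytes() are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical illustration, not PCRE2 internals: a producer that cannot
   know the exact output size in advance allocates an upper-bound estimate,
   then records how many bytes it actually wrote. */
typedef struct {
  unsigned char *data;  /* block allocated with the estimated size */
  size_t allocated;     /* over-estimated size passed to malloc() */
  size_t used;          /* bytes actually written: the "true" size */
} sized_block;

/* Copy src into a block whose allocation is deliberately generous. */
static int build_block(sized_block *b, const char *src)
{
  size_t estimate = 2 * strlen(src) + 16;   /* pessimistic upper bound */
  b->data = malloc(estimate);
  if (b->data == NULL) return -1;
  b->allocated = estimate;
  b->used = strlen(src) + 1;                /* include the terminator */
  assert(b->used <= b->allocated);
  memcpy(b->data, src, b->used);
  return 0;
}

/* Serialize by copying only the bytes that were written. Copying
   b->allocated bytes instead would read the uninitialized tail, which is
   the kind of access that tools such as valgrind report. */
static unsigned char *copy_used_bytes(const sized_block *b)
{
  unsigned char *copy = malloc(b->used);
  if (copy != NULL) memcpy(copy, b->data, b->used);
  return copy;
}
```

The design choice is simply to keep the allocated size and the used size as separate fields, so that anything downstream (such as a serializer) can rely on the used size alone.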
|
<br><a name="SEC23" href="#TOC1">SERIALIZATION AND PRECOMPILING</a><br>
|
||||||
|
<P>
|
||||||
|
It is possible to save compiled patterns on disc or elsewhere, and reload them
|
||||||
|
later, subject to a number of restrictions. The functions whose names begin
|
||||||
|
with <b>pcre2_serialize_</b> are used for this purpose. They are described in
|
||||||
|
the
|
||||||
|
<a href="pcre2serialize.html"><b>pcre2serialize</b></a>
|
||||||
|
documentation.
|
||||||
<a name="matchdatablock"></a></P>
|
<a name="matchdatablock"></a></P>
|
||||||
<br><a name="SEC22" href="#TOC1">THE MATCH DATA BLOCK</a><br>
|
<br><a name="SEC24" href="#TOC1">THE MATCH DATA BLOCK</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>pcre2_match_data_create(uint32_t <i>ovecsize</i>,</b>
|
<b>pcre2_match_data_create(uint32_t <i>ovecsize</i>,</b>
|
||||||
<b> pcre2_general_context *<i>gcontext</i>);</b>
|
<b> pcre2_general_context *<i>gcontext</i>);</b>
|
||||||
|
@ -1767,7 +1800,7 @@ match data block (for that match) have taken place.
|
||||||
When a match data block itself is no longer needed, it should be freed by
|
When a match data block itself is no longer needed, it should be freed by
|
||||||
calling <b>pcre2_match_data_free()</b>.
|
calling <b>pcre2_match_data_free()</b>.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC23" href="#TOC1">MATCHING A PATTERN: THE TRADITIONAL FUNCTION</a><br>
|
<br><a name="SEC25" href="#TOC1">MATCHING A PATTERN: THE TRADITIONAL FUNCTION</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>int pcre2_match(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
|
<b>int pcre2_match(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
|
||||||
<b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
|
<b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
|
||||||
|
@ -1981,7 +2014,7 @@ examples, in the
|
||||||
<a href="pcre2partial.html"><b>pcre2partial</b></a>
|
<a href="pcre2partial.html"><b>pcre2partial</b></a>
|
||||||
documentation.
|
documentation.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC24" href="#TOC1">NEWLINE HANDLING WHEN MATCHING</a><br>
|
<br><a name="SEC26" href="#TOC1">NEWLINE HANDLING WHEN MATCHING</a><br>
|
||||||
<P>
|
<P>
|
||||||
When PCRE2 is built, a default newline convention is set; this is usually the
|
When PCRE2 is built, a default newline convention is set; this is usually the
|
||||||
standard convention for the operating system. The default can be overridden in
|
standard convention for the operating system. The default can be overridden in
|
||||||
|
@ -2016,7 +2049,7 @@ LF in the characters that it matches.
|
||||||
Notwithstanding the above, anomalous effects may still occur when CRLF is a
|
Notwithstanding the above, anomalous effects may still occur when CRLF is a
|
||||||
valid newline sequence and explicit \r or \n escapes appear in the pattern.
|
valid newline sequence and explicit \r or \n escapes appear in the pattern.
|
||||||
<a name="matchedstrings"></a></P>
|
<a name="matchedstrings"></a></P>
|
||||||
<br><a name="SEC25" href="#TOC1">HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS</a><br>
|
<br><a name="SEC27" href="#TOC1">HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>uint32_t pcre2_get_ovector_count(pcre2_match_data *<i>match_data</i>);</b>
|
<b>uint32_t pcre2_get_ovector_count(pcre2_match_data *<i>match_data</i>);</b>
|
||||||
<br>
|
<br>
|
||||||
|
@ -2118,7 +2151,7 @@ parentheses, no more than <i>ovector[0]</i> to <i>ovector[2n+1]</i> are set by
|
||||||
<b>pcre2_match()</b>. The other elements retain whatever values they previously
|
<b>pcre2_match()</b>. The other elements retain whatever values they previously
|
||||||
had.
|
had.
|
||||||
<a name="matchotherdata"></a></P>
|
<a name="matchotherdata"></a></P>
|
||||||
<br><a name="SEC26" href="#TOC1">OTHER INFORMATION ABOUT A MATCH</a><br>
|
<br><a name="SEC28" href="#TOC1">OTHER INFORMATION ABOUT A MATCH</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>PCRE2_SPTR pcre2_get_mark(pcre2_match_data *<i>match_data</i>);</b>
|
<b>PCRE2_SPTR pcre2_get_mark(pcre2_match_data *<i>match_data</i>);</b>
|
||||||
<br>
|
<br>
|
||||||
|
@ -2162,7 +2195,7 @@ the code unit offset of the invalid UTF character. Details are given in the
|
||||||
<a href="pcre2unicode.html"><b>pcre2unicode</b></a>
|
<a href="pcre2unicode.html"><b>pcre2unicode</b></a>
|
||||||
page.
|
page.
|
||||||
<a name="errorlist"></a></P>
|
<a name="errorlist"></a></P>
|
||||||
<br><a name="SEC27" href="#TOC1">ERROR RETURNS FROM <b>pcre2_match()</b></a><br>
|
<br><a name="SEC29" href="#TOC1">ERROR RETURNS FROM <b>pcre2_match()</b></a><br>
|
||||||
<P>
|
<P>
|
||||||
If <b>pcre2_match()</b> fails, it returns a negative number. This can be
|
If <b>pcre2_match()</b> fails, it returns a negative number. This can be
|
||||||
converted to a text string by calling <b>pcre2_get_error_message()</b>. Negative
|
converted to a text string by calling <b>pcre2_get_error_message()</b>. Negative
|
||||||
|
@ -2271,7 +2304,7 @@ is attempted.
|
||||||
</pre>
|
</pre>
|
||||||
The internal recursion limit was reached.
|
The internal recursion limit was reached.
|
||||||
<a name="extractbynumber"></a></P>
|
<a name="extractbynumber"></a></P>
|
||||||
<br><a name="SEC28" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a><br>
|
<br><a name="SEC30" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>int pcre2_substring_length_bynumber(pcre2_match_data *<i>match_data</i>,</b>
|
<b>int pcre2_substring_length_bynumber(pcre2_match_data *<i>match_data</i>,</b>
|
||||||
<b> uint32_t <i>number</i>, PCRE2_SIZE *<i>length</i>);</b>
|
<b> uint32_t <i>number</i>, PCRE2_SIZE *<i>length</i>);</b>
|
||||||
|
@ -2368,7 +2401,7 @@ The substring did not participate in the match. For example, if the pattern is
|
||||||
(abc)|(def) and the subject is "def", and the ovector contains at least two
|
(abc)|(def) and the subject is "def", and the ovector contains at least two
|
||||||
capturing slots, substring number 1 is unset.
|
capturing slots, substring number 1 is unset.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC29" href="#TOC1">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a><br>
|
<br><a name="SEC31" href="#TOC1">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>int pcre2_substring_list_get(pcre2_match_data *<i>match_data</i>,</b>
|
<b>int pcre2_substring_list_get(pcre2_match_data *<i>match_data</i>,</b>
|
||||||
<b>" PCRE2_UCHAR ***<i>listptr</i>, PCRE2_SIZE **<i>lengthsptr</i>);</b>
|
<b>" PCRE2_UCHAR ***<i>listptr</i>, PCRE2_SIZE **<i>lengthsptr</i>);</b>
|
||||||
|
@ -2407,7 +2440,7 @@ can be distinguished from a genuine zero-length substring by inspecting the
|
||||||
appropriate offset in the ovector, which contain PCRE2_UNSET for unset
|
appropriate offset in the ovector, which contains PCRE2_UNSET for unset
|
||||||
substrings, or by calling <b>pcre2_substring_length_bynumber()</b>.
|
substrings, or by calling <b>pcre2_substring_length_bynumber()</b>.
|
||||||
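The distinction between an unset substring and a genuine empty one can be sketched in plain C. So that the sketch builds without the library, it assumes that PCRE2_SIZE is size_t and that PCRE2_UNSET is all bits set, and uses the stand-in names offset_t and OFFSET_UNSET. The helper classify_group() is hypothetical. It relies only on the layout in which group n occupies ovector[2n] and ovector[2n+1].

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for illustration only: real code uses PCRE2_SIZE and
   PCRE2_UNSET from pcre2.h. They are assumed here to be size_t and
   ~(size_t)0 so that this sketch compiles without the library. */
typedef size_t offset_t;
#define OFFSET_UNSET (~(offset_t)0)

typedef enum { GROUP_UNSET, GROUP_EMPTY, GROUP_NONEMPTY } group_state;

/* Classify capture group n from an ovector laid out as start/end pairs:
   group n occupies ovector[2*n] and ovector[2*n+1]. */
static group_state classify_group(const offset_t *ovector, unsigned n)
{
  offset_t start = ovector[2 * n];
  offset_t end = ovector[2 * n + 1];
  if (start == OFFSET_UNSET) return GROUP_UNSET;
  assert(end >= start);
  return (end == start) ? GROUP_EMPTY : GROUP_NONEMPTY;
}
```

Take (abc)|(def) matched against "def", for example. Group 1 shows up as GROUP_UNSET, while group 2 covers offsets 0 to 3.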
<a name="extractbyname"></a></P>
|
<a name="extractbyname"></a></P>
|
||||||
<br><a name="SEC30" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a><br>
|
<br><a name="SEC32" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>int pcre2_substring_number_from_name(const pcre2_code *<i>code</i>,</b>
|
<b>int pcre2_substring_number_from_name(const pcre2_code *<i>code</i>,</b>
|
||||||
<b> PCRE2_SPTR <i>name</i>);</b>
|
<b> PCRE2_SPTR <i>name</i>);</b>
|
||||||
|
@ -2467,7 +2500,7 @@ names are not included in the compiled code. The matching process uses only
|
||||||
numbers. For this reason, the use of different names for subpatterns of the
|
numbers. For this reason, the use of different names for subpatterns of the
|
||||||
same number causes an error at compile time.
|
same number causes an error at compile time.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC31" href="#TOC1">CREATING A NEW STRING WITH SUBSTITUTIONS</a><br>
|
<br><a name="SEC33" href="#TOC1">CREATING A NEW STRING WITH SUBSTITUTIONS</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>int pcre2_substitute(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
|
<b>int pcre2_substitute(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
|
||||||
<b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
|
<b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
|
||||||
|
@ -2528,7 +2561,7 @@ straight back. PCRE2_ERROR_BADREPLACEMENT is returned for an invalid
|
||||||
replacement string (unrecognized sequence following a dollar sign), and
|
replacement string (unrecognized sequence following a dollar sign), and
|
||||||
PCRE2_ERROR_NOMEMORY is returned if the output buffer is not big enough.
|
PCRE2_ERROR_NOMEMORY is returned if the output buffer is not big enough.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC32" href="#TOC1">DUPLICATE SUBPATTERN NAMES</a><br>
|
<br><a name="SEC34" href="#TOC1">DUPLICATE SUBPATTERN NAMES</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>int pcre2_substring_nametable_scan(const pcre2_code *<i>code</i>,</b>
|
<b>int pcre2_substring_nametable_scan(const pcre2_code *<i>code</i>,</b>
|
||||||
<b> PCRE2_SPTR <i>name</i>, PCRE2_SPTR *<i>first</i>, PCRE2_SPTR *<i>last</i>);</b>
|
<b> PCRE2_SPTR <i>name</i>, PCRE2_SPTR *<i>first</i>, PCRE2_SPTR *<i>last</i>);</b>
|
||||||
|
@ -2573,7 +2606,7 @@ The format of the name table is described above in the section entitled
|
||||||
Given all the relevant entries for the name, you can extract each of their
|
Given all the relevant entries for the name, you can extract each of their
|
||||||
numbers, and hence the captured data.
|
numbers, and hence the captured data.
|
||||||
</P>
|
</P>
|
||||||
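Once the first and last entries for a duplicated name are known, each group number can be read from the name table. The sketch below assumes the 8-bit library's entry layout: two bytes of group number (most significant first), followed by the zero-terminated name, with every entry padded to the same size. The table is hand-built for illustration rather than obtained from the library, and the helper names entry_group_number() and collect_numbers() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode the group number at the start of one 8-bit name-table entry:
   two bytes, most significant byte first. The name follows it,
   zero-terminated. */
static unsigned entry_group_number(const uint8_t *entry)
{
  return ((unsigned)entry[0] << 8) | entry[1];
}

/* Collect the group numbers of every entry from first to last inclusive,
   stepping by the fixed entry size (the PCRE2_INFO_NAMEENTRYSIZE value in
   real use). Returns how many numbers were stored in out. */
static size_t collect_numbers(const uint8_t *first, const uint8_t *last,
                              size_t entry_size, unsigned *out, size_t max)
{
  size_t count = 0;
  assert(entry_size >= 3);  /* two number bytes plus a terminator */
  for (const uint8_t *e = first; e <= last && count < max; e += entry_size)
    out[count++] = entry_group_number(e);
  return count;
}
```

Because the table is sorted by name, all entries that share a name are contiguous. That is why stepping from first to last visits exactly those entries.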
<br><a name="SEC33" href="#TOC1">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a><br>
|
<br><a name="SEC35" href="#TOC1">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a><br>
|
||||||
<P>
|
<P>
|
||||||
The traditional matching function uses a similar algorithm to Perl, which stops
|
The traditional matching function uses a similar algorithm to Perl, which stops
|
||||||
when it finds the first match at a given point in the subject. If you want to
|
when it finds the first match at a given point in the subject. If you want to
|
||||||
|
@ -2591,7 +2624,7 @@ substring. Then return 1, which forces <b>pcre2_match()</b> to backtrack and try
|
||||||
other alternatives. Ultimately, when it runs out of matches,
|
other alternatives. Ultimately, when it runs out of matches,
|
||||||
<b>pcre2_match()</b> will yield PCRE2_ERROR_NOMATCH.
|
<b>pcre2_match()</b> will yield PCRE2_ERROR_NOMATCH.
|
||||||
<a name="dfamatch"></a></P>
|
<a name="dfamatch"></a></P>
|
||||||
<br><a name="SEC34" href="#TOC1">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a><br>
|
<br><a name="SEC36" href="#TOC1">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>int pcre2_dfa_match(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
|
<b>int pcre2_dfa_match(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
|
||||||
<b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
|
<b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
|
||||||
|
@ -2786,13 +2819,13 @@ some plausibility checks are made on the contents of the workspace, which
|
||||||
should contain data about the previous partial match. If any of these checks
|
should contain data about the previous partial match. If any of these checks
|
||||||
fail, this error is given.
|
fail, this error is given.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC35" href="#TOC1">SEE ALSO</a><br>
|
<br><a name="SEC37" href="#TOC1">SEE ALSO</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>pcre2build</b>(3), <b>pcre2callout</b>(3), <b>pcre2demo(3)</b>,
|
<b>pcre2build</b>(3), <b>pcre2callout</b>(3), <b>pcre2demo</b>(3),
|
||||||
<b>pcre2matching</b>(3), <b>pcre2partial</b>(3), <b>pcre2posix</b>(3),
|
<b>pcre2matching</b>(3), <b>pcre2partial</b>(3), <b>pcre2posix</b>(3),
|
||||||
<b>pcre2sample</b>(3), <b>pcre2stack</b>(3), <b>pcre2unicode</b>(3).
|
<b>pcre2sample</b>(3), <b>pcre2stack</b>(3), <b>pcre2unicode</b>(3).
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC36" href="#TOC1">AUTHOR</a><br>
|
<br><a name="SEC38" href="#TOC1">AUTHOR</a><br>
|
||||||
<P>
|
<P>
|
||||||
Philip Hazel
|
Philip Hazel
|
||||||
<br>
|
<br>
|
||||||
|
@ -2801,9 +2834,9 @@ University Computing Service
|
||||||
Cambridge, England.
|
Cambridge, England.
|
||||||
<br>
|
<br>
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC37" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC39" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 02 January 2015
|
Last updated: 23 January 2015
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2015 University of Cambridge.
|
Copyright © 1997-2015 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@ -0,0 +1,184 @@
|
||||||
|
<html>
|
||||||
|
<head>
|
||||||
|
<title>pcre2serialize specification</title>
|
||||||
|
</head>
|
||||||
|
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||||
|
<h1>pcre2serialize man page</h1>
|
||||||
|
<p>
|
||||||
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
This page is part of the PCRE2 HTML documentation. It was generated
|
||||||
|
automatically from the original man page. If there is any nonsense in it,
|
||||||
|
please consult the man page, in case the conversion went wrong.
|
||||||
|
<br>
|
||||||
|
<ul>
|
||||||
|
<li><a name="TOC1" href="#SEC1">SAVING AND RE-USING PRECOMPILED PCRE2 PATTERNS</a>
|
||||||
|
<li><a name="TOC2" href="#SEC2">SAVING COMPILED PATTERNS</a>
|
||||||
|
<li><a name="TOC3" href="#SEC3">RE-USING PRECOMPILED PATTERNS</a>
|
||||||
|
<li><a name="TOC4" href="#SEC4">AUTHOR</a>
|
||||||
|
<li><a name="TOC5" href="#SEC5">REVISION</a>
|
||||||
|
</ul>
|
||||||
|
<br><a name="SEC1" href="#TOC1">SAVING AND RE-USING PRECOMPILED PCRE2 PATTERNS</a><br>
|
||||||
|
<P>
|
||||||
|
<b>int32_t pcre2_serialize_decode(pcre2_code **<i>codes</i>,</b>
|
||||||
|
<b> int32_t <i>number_of_codes</i>, const uint8_t *<i>bytes</i>,</b>
|
||||||
|
<b> pcre2_general_context *<i>gcontext</i>);</b>
|
||||||
|
<br>
|
||||||
|
<br>
|
||||||
|
<b>int32_t pcre2_serialize_encode(pcre2_code **<i>codes</i>,</b>
|
||||||
|
<b> int32_t <i>number_of_codes</i>, uint8_t **<i>serialized_bytes</i>,</b>
|
||||||
|
<b> PCRE2_SIZE *<i>serialized_size</i>, pcre2_general_context *<i>gcontext</i>);</b>
|
||||||
|
<br>
|
||||||
|
<br>
|
||||||
|
<b>void pcre2_serialize_free(uint8_t *<i>bytes</i>);</b>
|
||||||
|
<br>
|
||||||
|
<br>
|
||||||
|
<b>int32_t pcre2_serialize_get_number_of_codes(const uint8_t *<i>bytes</i>);</b>
|
||||||
|
<br>
|
||||||
|
<br>
|
||||||
|
If you are running an application that uses a large number of regular
|
||||||
|
expression patterns, it may be useful to store them in a precompiled form
|
||||||
|
instead of having to compile them every time the application is run. However,
|
||||||
|
if you are using the just-in-time optimization feature, it is not possible to
|
||||||
|
save and reload the JIT data, because it is position-dependent. In addition,
|
||||||
|
the host on which the patterns are reloaded must be running the same version of
|
||||||
|
PCRE2, with the same code unit width, and must also have the same endianness,
|
||||||
|
pointer width and PCRE2_SIZE type. For example, patterns compiled on a 32-bit
|
||||||
|
system using PCRE2's 16-bit library cannot be reloaded on a 64-bit system, nor
|
||||||
|
can they be reloaded using the 8-bit library.
|
||||||
|
</P>
|
||||||
|
<br><a name="SEC2" href="#TOC1">SAVING COMPILED PATTERNS</a><br>
|
||||||
|
<P>
|
||||||
|
Before compiled patterns can be saved they must be serialized, that is,
|
||||||
|
converted to a stream of bytes. A single byte stream may contain any number of
|
||||||
|
compiled patterns, but they must all use the same character tables. A single
|
||||||
|
copy of the tables is included in the byte stream (its size is 1088 bytes). For
|
||||||
|
more details of character tables, see the
|
||||||
|
<a href="pcre2api.html#localesupport">section on locale support</a>
|
||||||
|
in the
|
||||||
|
<a href="pcre2api.html"><b>pcre2api</b></a>
|
||||||
|
documentation.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
The function <b>pcre2_serialize_encode()</b> creates a serialized byte stream
|
||||||
|
from a list of compiled patterns. Its first two arguments specify the list,
|
||||||
|
being a pointer to a vector of pointers to compiled patterns, and the length of
|
||||||
|
the vector. The third and fourth arguments point to variables which are set to
|
||||||
|
point to the created byte stream and its length, respectively. The final
|
||||||
|
argument is a pointer to a general context, which can be used to specify custom
|
||||||
|
memory management functions. If this argument is NULL, <b>malloc()</b> is used
|
||||||
|
to obtain memory for the byte stream. The yield of the function is the number
|
||||||
|
of serialized patterns, or one of the following negative error codes:
|
||||||
|
<pre>
|
||||||
|
PCRE2_ERROR_BADDATA the number of patterns is zero or less
|
||||||
|
PCRE2_ERROR_BADMAGIC mismatch of id bytes in one of the patterns
|
||||||
|
PCRE2_ERROR_NOMEMORY memory allocation failed
|
||||||
|
PCRE2_ERROR_MIXEDTABLES the patterns do not all use the same tables
|
||||||
|
PCRE2_ERROR_NULL the 1st, 3rd, or 4th argument is NULL
|
||||||
|
</pre>
|
||||||
|
PCRE2_ERROR_BADMAGIC means either that a pattern's code has been corrupted, or
|
||||||
|
that a slot in the vector does not point to a compiled pattern.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
Once a set of patterns has been serialized you can save the data in any
|
||||||
|
appropriate manner. Here is sample code that compiles two patterns and writes
|
||||||
|
them to a file. It assumes that the variable <i>fd</i> is a <b>FILE *</b> for a file that is
|
||||||
|
open for output. The error checking that should be present in a real
|
||||||
|
application has been omitted for simplicity.
|
||||||
|
<pre>
|
||||||
|
int errorcode;
|
||||||
|
uint8_t *bytes;
|
||||||
|
PCRE2_SIZE erroroffset;
|
||||||
|
PCRE2_SIZE bytescount;
|
||||||
|
pcre2_code *list_of_codes[2];
|
||||||
|
list_of_codes[0] = pcre2_compile("first pattern",
|
||||||
|
PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);
|
||||||
|
list_of_codes[1] = pcre2_compile("second pattern",
|
||||||
|
PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);
|
||||||
|
errorcode = pcre2_serialize_encode(list_of_codes, 2, &bytes,
|
||||||
|
&bytescount, NULL);
|
||||||
|
fwrite(bytes, 1, bytescount, fd);
|
||||||
|
</pre>
|
||||||
|
Note that the serialized data is binary data that may contain any of the 256
|
||||||
|
possible byte values. On systems that make a distinction between binary and
|
||||||
|
non-binary data, be sure that the file is opened for binary output.
|
||||||
|
</P>
|
||||||
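The byte stream has to survive a trip through a file in binary mode in both directions. Below is a minimal stdio sketch of that round trip. The helper names save_bytes() and load_bytes() are hypothetical. load_bytes() grows its buffer in chunks instead of asking for the file size, because the C standard does not require SEEK_END to be meaningful on binary streams.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: write count bytes to path in binary mode ("wb").
   Returns 0 on success, -1 on failure. */
static int save_bytes(const char *path, const uint8_t *bytes, size_t count)
{
  FILE *f = fopen(path, "wb");
  if (f == NULL) return -1;
  size_t written = fwrite(bytes, 1, count, f);
  if (fclose(f) != 0 || written != count) return -1;
  return 0;
}

/* Hypothetical helper: read a whole file in binary mode ("rb") into a
   malloc()ed block, doubling the buffer as needed. On success sets *count
   and returns the block, which the caller must free(); returns NULL on
   failure. */
static uint8_t *load_bytes(const char *path, size_t *count)
{
  assert(count != NULL);
  FILE *f = fopen(path, "rb");
  if (f == NULL) return NULL;
  size_t capacity = 4096, used = 0;
  uint8_t *buffer = malloc(capacity);
  while (buffer != NULL) {
    used += fread(buffer + used, 1, capacity - used, f);
    if (used < capacity) break;              /* end of file or error */
    uint8_t *bigger = realloc(buffer, capacity * 2);
    if (bigger == NULL) { free(buffer); buffer = NULL; break; }
    buffer = bigger;
    capacity *= 2;
  }
  if (buffer != NULL && ferror(f)) { free(buffer); buffer = NULL; }
  fclose(f);
  if (buffer != NULL) *count = used;
  return buffer;
}
```

With PCRE2, the <i>bytes</i> and <i>bytescount</i> produced by <b>pcre2_serialize_encode()</b> would be passed to save_bytes(). The block returned by load_bytes() would later be handed to <b>pcre2_serialize_decode()</b> and can be freed once decoding has finished.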
|
<P>
|
||||||
|
Serializing a set of patterns leaves the original data untouched, so they can
|
||||||
|
still be used for matching. Their memory must eventually be freed in the usual
|
||||||
|
way by calling <b>pcre2_code_free()</b>. When you have finished with the byte
|
||||||
|
stream, it too must be freed by calling <b>pcre2_serialize_free()</b>.
|
||||||
|
</P>
|
||||||
|
<br><a name="SEC3" href="#TOC1">RE-USING PRECOMPILED PATTERNS</a><br>
|
||||||
|
<P>
|
||||||
|
In order to re-use a set of saved patterns you must first make the serialized
|
||||||
|
byte stream available in main memory (for example, by reading from a file). The
|
||||||
|
management of this memory block is up to the application. You can use the
|
||||||
|
<b>pcre2_serialize_get_number_of_codes()</b> function to find out how many
|
||||||
|
compiled patterns are in the serialized data without actually decoding the
|
||||||
|
patterns:
|
||||||
|
<pre>
|
||||||
|
uint8_t *bytes = <serialized data>;
|
||||||
|
int32_t number_of_codes = pcre2_serialize_get_number_of_codes(bytes);
|
||||||
|
</pre>
|
||||||
|
The <b>pcre2_serialize_decode()</b> function reads a byte stream and recreates
|
||||||
|
the compiled patterns in new memory blocks, setting pointers to them in a
|
||||||
|
vector. The first two arguments are a pointer to a suitable vector and its
|
||||||
|
length, and the third argument points to a byte stream. The final argument is a
|
||||||
|
pointer to a general context, which can be used to specify custom memory
|
||||||
|
management functions for the decoded patterns. If this argument is NULL,
|
||||||
|
<b>malloc()</b> and <b>free()</b> are used. After deserialization, the byte
|
||||||
|
stream is no longer needed and can be discarded.
|
||||||
|
<pre>
|
||||||
|
int32_t number_of_codes;
|
||||||
|
pcre2_code *list_of_codes[2];
|
||||||
|
uint8_t *bytes = <serialized data>;
|
||||||
|
number_of_codes =
|
||||||
|
pcre2_serialize_decode(list_of_codes, 2, bytes, NULL);
|
||||||
|
</pre>
|
||||||
|
If the vector is not large enough for all the patterns in the byte stream, it
|
||||||
|
is filled with those that fit, and the remainder are ignored. The yield of the
|
||||||
|
function is the number of decoded patterns, or one of the following negative
|
||||||
|
error codes:
|
||||||
|
<pre>
|
||||||
|
PCRE2_ERROR_BADDATA second argument is zero or less
|
||||||
|
PCRE2_ERROR_BADMAGIC mismatch of id bytes in the data
|
||||||
|
PCRE2_ERROR_BADMODE mismatch of code unit size or PCRE2 version
|
||||||
|
PCRE2_ERROR_NOMEMORY memory allocation failed
|
||||||
|
PCRE2_ERROR_NULL first or third argument is NULL
|
||||||
|
</pre>
|
||||||
|
PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that it was compiled
|
||||||
|
on a system with different endianness.
|
||||||
|
</P>
|
||||||
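The endianness case can be detected by checking a stored magic number against both the expected value and its byte-swapped form. The sketch below shows the general technique only. It is not PCRE2's own check, and EXAMPLE_MAGIC, swap32() and check_magic() are hypothetical names with an illustrative magic value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical check, not PCRE2's code: a writer stores a 32-bit magic
   number in native byte order; a reader compares it with the expected
   value and with its byte-swapped form. */
#define EXAMPLE_MAGIC 0x12345678u   /* illustrative value only */

typedef enum { MAGIC_OK, MAGIC_SWAPPED, MAGIC_BAD } magic_check;

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t x)
{
  return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
         ((x << 8) & 0x00FF0000u) | (x << 24);
}

static magic_check check_magic(const uint8_t *bytes)
{
  uint32_t stored;
  assert(bytes != NULL);
  memcpy(&stored, bytes, sizeof stored);   /* avoids unaligned access */
  if (stored == EXAMPLE_MAGIC) return MAGIC_OK;
  if (stored == swap32(EXAMPLE_MAGIC)) return MAGIC_SWAPPED;
  return MAGIC_BAD;
}
```

A MAGIC_SWAPPED result points to data written on a host of the other byte order. MAGIC_BAD points to corruption or data that is not serialized patterns at all.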
|
<P>
|
||||||
|
Decoded patterns can be used for matching in the usual way, and must be freed
|
||||||
|
by calling <b>pcre2_code_free()</b> as normal. A single copy of the character
|
||||||
|
tables is used by all the decoded patterns. A reference count is used to
|
||||||
|
arrange for its memory to be automatically freed when the last pattern is
|
||||||
|
freed.
|
||||||
|
</P>
|
||||||
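Sharing one copy of the tables among several decoded patterns is a reference-counting arrangement. The sketch below shows the general idea. The structures shared_tables and decoded_pattern and the functions tables_create(), pattern_attach() and pattern_free() are hypothetical and do not reflect PCRE2's actual data layout.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted shared block, in the spirit of the
   shared character tables; not PCRE2's data layout. */
typedef struct {
  size_t refcount;
  unsigned char tables[1088];   /* table size stated for serialized data */
} shared_tables;

typedef struct {
  shared_tables *tables;        /* every decoded pattern points here */
} decoded_pattern;

static shared_tables *tables_create(const unsigned char *src)
{
  shared_tables *t = malloc(sizeof *t);
  if (t == NULL) return NULL;
  t->refcount = 0;
  memcpy(t->tables, src, sizeof t->tables);
  return t;
}

static void pattern_attach(decoded_pattern *p, shared_tables *t)
{
  p->tables = t;
  t->refcount++;
}

/* Detach a pattern; the tables are freed when the last user goes.
   Returns 1 if this call released the shared tables, 0 otherwise. */
static int pattern_free(decoded_pattern *p)
{
  shared_tables *t = p->tables;
  assert(t != NULL && t->refcount > 0);
  p->tables = NULL;
  if (--t->refcount == 0) {
    free(t);
    return 1;
  }
  return 0;
}
```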
|
<P>
|
||||||
|
If a pattern was processed by <b>pcre2_jit_compile()</b> before being
|
||||||
|
serialized, the JIT data is discarded and so is no longer available after a
|
||||||
|
save/restore cycle. You can, however, process a restored pattern with
|
||||||
|
<b>pcre2_jit_compile()</b> if you wish.
|
||||||
|
</P>
|
||||||
|
<br><a name="SEC4" href="#TOC1">AUTHOR</a><br>
|
||||||
|
<P>
|
||||||
|
Philip Hazel
|
||||||
|
<br>
|
||||||
|
University Computing Service
|
||||||
|
<br>
|
||||||
|
Cambridge, England.
|
||||||
|
<br>
|
||||||
|
</P>
|
||||||
|
<br><a name="SEC5" href="#TOC1">REVISION</a><br>
|
||||||
|
<P>
|
||||||
|
Last updated: 20 January 2015
|
||||||
|
<br>
|
||||||
|
Copyright © 1997-2015 University of Cambridge.
|
||||||
|
<br>
|
||||||
|
<p>
|
||||||
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
</p>
|
|
@ -30,9 +30,10 @@ please consult the man page, in case the conversion went wrong.
|
||||||
<li><a name="TOC15" href="#SEC15">RESTARTING AFTER A PARTIAL MATCH</a>
|
<li><a name="TOC15" href="#SEC15">RESTARTING AFTER A PARTIAL MATCH</a>
|
||||||
<li><a name="TOC16" href="#SEC16">CALLOUTS</a>
|
<li><a name="TOC16" href="#SEC16">CALLOUTS</a>
|
||||||
<li><a name="TOC17" href="#SEC17">NON-PRINTING CHARACTERS</a>
|
<li><a name="TOC17" href="#SEC17">NON-PRINTING CHARACTERS</a>
|
||||||
<li><a name="TOC18" href="#SEC18">SEE ALSO</a>
|
<li><a name="TOC18" href="#SEC18">SAVING AND RESTORING COMPILED PATTERNS</a>
|
||||||
<li><a name="TOC19" href="#SEC19">AUTHOR</a>
|
<li><a name="TOC19" href="#SEC19">SEE ALSO</a>
|
||||||
<li><a name="TOC20" href="#SEC20">REVISION</a>
|
<li><a name="TOC20" href="#SEC20">AUTHOR</a>
|
||||||
|
<li><a name="TOC21" href="#SEC21">REVISION</a>
|
||||||
</ul>
|
</ul>
|
||||||
<br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
|
<br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
|
||||||
<P>
|
<P>
|
||||||
|
@ -51,10 +52,11 @@ documentation.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The input for <b>pcre2test</b> is a sequence of regular expression patterns and
|
The input for <b>pcre2test</b> is a sequence of regular expression patterns and
|
||||||
subject strings to be matched. The output shows the result of each match
|
subject strings to be matched. There are also command lines for setting
|
||||||
attempt. Modifiers on the command line, the patterns, and the subject lines
|
defaults and controlling some special actions. The output shows the result of
|
||||||
specify PCRE2 function options, control how the subject is processed, and what
|
each match attempt. Modifiers on external or internal command lines, the
|
||||||
output is produced.
|
patterns, and the subject lines specify PCRE2 function options, control how the
|
||||||
|
subject is processed, and what output is produced.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
As the original fairly simple PCRE library evolved, it acquired many different
|
As the original fairly simple PCRE library evolved, it acquired many different
|
||||||
|
@ -227,9 +229,7 @@ If <b>pcre2test</b> is given two filename arguments, it reads from the first and
|
||||||
writes to the second. If the first name is "-", input is taken from the
|
writes to the second. If the first name is "-", input is taken from the
|
||||||
standard input. If <b>pcre2test</b> is given only one argument, it reads from
|
standard input. If <b>pcre2test</b> is given only one argument, it reads from
|
||||||
that file and writes to stdout. Otherwise, it reads from stdin and writes to
|
that file and writes to stdout. Otherwise, it reads from stdin and writes to
|
||||||
stdout. When the input is a terminal, it prompts for each line of input, using
|
stdout.
|
||||||
"re>" to prompt for regular expression patterns, and "data>" to prompt for
|
|
||||||
subject lines.
|
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
When <b>pcre2test</b> is built, a configuration option can specify that it
|
When <b>pcre2test</b> is built, a configuration option can specify that it
|
||||||
|
@ -242,10 +242,16 @@ the <b>-help</b> option states whether or not <b>readline()</b> will be used.
|
||||||
The program handles any number of tests, each of which consists of a set of
|
The program handles any number of tests, each of which consists of a set of
|
||||||
input lines. Each set starts with a regular expression pattern, followed by any
|
input lines. Each set starts with a regular expression pattern, followed by any
|
||||||
number of subject lines to be matched against that pattern. In between sets of
|
number of subject lines to be matched against that pattern. In between sets of
|
||||||
test data, command lines that begin with a hash (#) character may appear. This
|
test data, command lines that begin with # may appear. This file format, with
|
||||||
file format, with some restrictions, can also be processed by the
|
some restrictions, can also be processed by the <b>perltest.sh</b> script that
|
||||||
<b>perltest.sh</b> script that is distributed with PCRE2 as a means of checking
|
is distributed with PCRE2 as a means of checking that the behaviour of PCRE2
|
||||||
that the behaviour of PCRE2 and Perl is the same.
|
and Perl is the same.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
When the input is a terminal, <b>pcre2test</b> prompts for each line of input,
|
||||||
|
using "re>" to prompt for regular expression patterns, and "data>" to prompt
|
||||||
|
for subject lines. Command lines starting with # can be entered only in
|
||||||
|
response to the "re>" prompt.
|
||||||
</P>
|
</P>
|
||||||
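The COMMAND LINES section gives the rule for lines that start with #. If # is followed by white space or an exclamation mark, the line is a comment. Otherwise the text after # names a command such as #save or #load. The sketch below is a hypothetical rendering of that rule and is not taken from the pcre2test source. It also treats a lone # as a comment, which is an assumption. The names classify_line() and command_name() are illustrative.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical sketch of the pcre2test command-line rule; not pcre2test's
   own code. */
typedef enum { LINE_NOT_COMMAND, LINE_COMMENT, LINE_COMMAND } line_kind;

static line_kind classify_line(const char *line)
{
  assert(line != NULL);
  if (line[0] != '#') return LINE_NOT_COMMAND;
  /* "#" followed by white space (or nothing) or by "!" is a comment. */
  if (line[1] == '\0' || line[1] == '!' || isspace((unsigned char)line[1]))
    return LINE_COMMENT;
  return LINE_COMMAND;
}

/* Copy the command name (e.g. "save" from "#save file") into name,
   which has room for size bytes including the terminator. */
static void command_name(const char *line, char *name, size_t size)
{
  assert(size > 0);
  size_t n = strcspn(line + 1, " \t\r\n");
  if (n >= size) n = size - 1;
  memcpy(name, line + 1, n);
  name[n] = '\0';
}
```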
<P>
|
<P>
|
||||||
Each subject line is matched separately and independently. If you want to do
|
Each subject line is matched separately and independently. If you want to do
|
||||||
|
@ -263,21 +269,27 @@ still input to be read.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC6" href="#TOC1">COMMAND LINES</a><br>
|
<br><a name="SEC6" href="#TOC1">COMMAND LINES</a><br>
|
||||||
<P>
|
<P>
|
||||||
In between sets of test data, a line that begins with a hash (#) character is
|
In between sets of test data, a line that begins with # is interpreted as a
|
||||||
interpreted as a command line. If the first character is followed by white
|
command line. If the first character is followed by white space or an
|
||||||
space or an exclamation mark, the line is treated as a comment, and ignored.
|
exclamation mark, the line is treated as a comment, and ignored. Otherwise, the
|
||||||
Otherwise, the following commands are recognized:
|
following commands are recognized:
|
||||||
<pre>
|
<pre>
|
||||||
#forbid_utf
|
#forbid_utf
|
||||||
</pre>
|
</pre>
|
||||||
Subsequent patterns automatically have the PCRE2_NEVER_UTF and PCRE2_NEVER_UCP
|
Subsequent patterns automatically have the PCRE2_NEVER_UTF and PCRE2_NEVER_UCP
|
||||||
options set, which locks out the use of UTF and Unicode property features. This
|
options set, which locks out the use of UTF and Unicode property features. This
|
||||||
is a trigger guard that is used in test files to ensure that UTF/Unicode tests
|
is a trigger guard that is used in test files to ensure that UTF or Unicode
|
||||||
are not accidentally added to files that are used when UTF support is not
|
property tests are not accidentally added to files that are used when Unicode
|
||||||
included in the library. This effect can also be obtained by the use of
|
support is not included in the library. This effect can also be obtained by the
|
||||||
<b>#pattern</b>; the difference is that <b>#forbid_utf</b> cannot be unset, and
|
use of <b>#pattern</b>; the difference is that <b>#forbid_utf</b> cannot be
|
||||||
the automatic options are not displayed in pattern information, to avoid
|
unset, and the automatic options are not displayed in pattern information, to
|
||||||
cluttering up test output.
|
avoid cluttering up test output.
|
||||||
|
<pre>
|
||||||
|
#load <filename>
|
||||||
|
</pre>
|
||||||
|
This command is used to load a set of precompiled patterns from a file, as
|
||||||
|
described in the section entitled "Saving and restoring compiled patterns"
|
||||||
|
<a href="#saverestore">below.</a>
|
||||||
<pre>
|
<pre>
|
||||||
#pattern <modifier-list>
|
#pattern <modifier-list>
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -293,6 +305,18 @@ lines, none of the other command lines are permitted, because they and many
|
||||||
of the modifiers are specific to <b>pcre2test</b>, and should not be used in
|
of the modifiers are specific to <b>pcre2test</b>, and should not be used in
|
||||||
test files that are also processed by <b>perltest.sh</b>. The <b>#perltest</b>
|
test files that are also processed by <b>perltest.sh</b>. The <b>#perltest</b>
|
||||||
command helps detect tests that are accidentally put in the wrong file.
|
command helps detect tests that are accidentally put in the wrong file.
|
||||||
|
<pre>
|
||||||
|
#pop [<modifiers>]
|
||||||
|
</pre>
|
||||||
|
This command is used to manipulate the stack of compiled patterns, as described
|
||||||
|
in the section entitled "Saving and restoring compiled patterns"
|
||||||
|
<a href="#saverestore">below.</a>
|
||||||
|
<pre>
|
||||||
|
#save <filename>
|
||||||
|
</pre>
|
||||||
|
This command is used to save a set of compiled patterns to a file, as described
|
||||||
|
in the section entitled "Saving and restoring compiled patterns"
|
||||||
|
<a href="#saverestore">below.</a>
|
||||||
<pre>
|
<pre>
|
||||||
#subject <modifier-list>
|
#subject <modifier-list>
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -428,7 +452,7 @@ There are three types of modifier that can appear in pattern lines, two of
|
||||||
which may also be used in a <b>#pattern</b> command. A pattern's modifier list
|
which may also be used in a <b>#pattern</b> command. A pattern's modifier list
|
||||||
can add to or override default modifiers that were set by a previous
|
can add to or override default modifiers that were set by a previous
|
||||||
<b>#pattern</b> command.
|
<b>#pattern</b> command.
|
||||||
</P>
|
<a name="optionmodifiers"></a></P>
|
||||||
<br><b>
|
<br><b>
|
||||||
Setting compilation options
|
Setting compilation options
|
||||||
</b><br>
|
</b><br>
|
||||||
|
@ -465,7 +489,7 @@ As well as turning on the PCRE2_UTF option, the <b>utf</b> modifier causes all
|
||||||
non-printing characters in output strings to be printed using the \x{hh...}
|
non-printing characters in output strings to be printed using the \x{hh...}
|
||||||
notation. Otherwise, those less than 0x100 are output in hex without the curly
|
notation. Otherwise, those less than 0x100 are output in hex without the curly
|
||||||
brackets.
|
brackets.
|
||||||
</P>
|
<a name="controlmodifiers"></a></P>
|
||||||
<br><b>
|
<br><b>
|
||||||
Setting compilation controls
|
Setting compilation controls
|
||||||
</b><br>
|
</b><br>
|
||||||
|
@ -486,8 +510,8 @@ about the pattern:
|
||||||
memory show memory used
|
memory show memory used
|
||||||
newline=<type> set newline type
|
newline=<type> set newline type
|
||||||
parens_nest_limit=<n> set maximum parentheses depth
|
parens_nest_limit=<n> set maximum parentheses depth
|
||||||
perlcompat lock out non-Perl modifiers
|
|
||||||
posix use the POSIX API
|
posix use the POSIX API
|
||||||
|
push push compiled pattern onto the stack
|
||||||
stackguard=<number> test the stackguard feature
|
stackguard=<number> test the stackguard feature
|
||||||
tables=[0|1|2] select internal tables
|
tables=[0|1|2] select internal tables
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -726,6 +750,22 @@ not affect the compilation process.
|
||||||
These modifiers may not appear in a <b>#pattern</b> command. If you want them as
|
These modifiers may not appear in a <b>#pattern</b> command. If you want them as
|
||||||
defaults, set them in a <b>#subject</b> command.
|
defaults, set them in a <b>#subject</b> command.
|
||||||
</P>
|
</P>
|
||||||
|
<br><b>
|
||||||
|
Saving a compiled pattern
|
||||||
|
</b><br>
|
||||||
|
<P>
|
||||||
|
When a pattern with the <b>push</b> modifier is successfully compiled, it is
|
||||||
|
pushed onto a stack of compiled patterns, and <b>pcre2test</b> expects the next
|
||||||
|
line to contain a new pattern (or a command) instead of a subject line. This
|
||||||
|
facility is used when saving compiled patterns to a file, as described in the
|
||||||
|
section entitled "Saving and restoring compiled patterns"
|
||||||
|
<a href="#saverestore">below.</a>
|
||||||
|
The <b>push</b> modifier is incompatible with compilation modifiers such as
|
||||||
|
<b>global</b> that act at match time. Any that are specified are ignored, with a
|
||||||
|
warning message, except for <b>replace</b>, which causes an error. Note that
|
||||||
|
<b>jitverify</b>, which is allowed, does not carry through to any subsequent
|
||||||
|
matching that uses this pattern.
|
||||||
|
</P>
|
||||||
<br><a name="SEC11" href="#TOC1">SUBJECT MODIFIERS</a><br>
|
<br><a name="SEC11" href="#TOC1">SUBJECT MODIFIERS</a><br>
|
||||||
<P>
|
<P>
|
||||||
The modifiers that can appear in subject lines and the <b>#subject</b>
|
The modifiers that can appear in subject lines and the <b>#subject</b>
|
||||||
|
@ -1292,14 +1332,75 @@ string, it behaves in the same way, unless a different locale has been set for
|
||||||
the pattern (using the <b>/locale</b> modifier). In this case, the
|
the pattern (using the <b>/locale</b> modifier). In this case, the
|
||||||
<b>isprint()</b> function is used to distinguish printing and non-printing
|
<b>isprint()</b> function is used to distinguish printing and non-printing
|
||||||
characters.
|
characters.
|
||||||
|
<a name="saverestore"></a></P>
|
||||||
|
<br><a name="SEC18" href="#TOC1">SAVING AND RESTORING COMPILED PATTERNS</a><br>
|
||||||
|
<P>
|
||||||
|
It is possible to save compiled patterns on disc or elsewhere, and reload them
|
||||||
|
later, subject to a number of restrictions. JIT data cannot be saved. The host
|
||||||
|
on which the patterns are reloaded must be running the same version of PCRE2,
|
||||||
|
with the same code unit width, and must also have the same endianness, pointer
|
||||||
|
width and PCRE2_SIZE type. Before compiled patterns can be saved they must be
|
||||||
|
serialized, that is, converted to a stream of bytes. A single byte stream may
|
||||||
|
contain any number of compiled patterns, but they must all use the same
|
||||||
|
character tables. A single copy of the tables is included in the byte stream
|
||||||
|
(its size is 1088 bytes).
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC18" href="#TOC1">SEE ALSO</a><br>
|
<P>
|
||||||
|
The functions whose names begin with <b>pcre2_serialize_</b> are used
|
||||||
|
for serializing and de-serializing. They are described in the
|
||||||
|
<a href="pcre2serialize.html"><b>pcre2serialize</b></a>
|
||||||
|
documentation. In this section we describe the features of <b>pcre2test</b> that
|
||||||
|
can be used to test these functions.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
When a pattern with the <b>push</b> modifier is successfully compiled, it is pushed
|
||||||
|
onto a stack of compiled patterns, and <b>pcre2test</b> expects the next line to
|
||||||
|
contain a new pattern (or command) instead of a subject line. By this means, a
|
||||||
|
number of patterns can be compiled and retained. The <b>push</b> modifier is
|
||||||
|
incompatible with <b>posix</b>, and control modifiers that act at match time are
|
||||||
|
ignored (with a message). The <b>jitverify</b> modifier applies only at compile
|
||||||
|
time. The command
|
||||||
|
<pre>
|
||||||
|
#save <filename>
|
||||||
|
</pre>
|
||||||
|
causes all the stacked patterns to be serialized and the result written to the
|
||||||
|
named file. Afterwards, all the stacked patterns are freed. The command
|
||||||
|
<pre>
|
||||||
|
#load <filename>
|
||||||
|
</pre>
|
||||||
|
reads the data in the file, and then arranges for it to be de-serialized, with
|
||||||
|
the resulting compiled patterns added to the pattern stack. The pattern on the
|
||||||
|
top of the stack can be retrieved by the #pop command, which must be followed
|
||||||
|
by lines of subjects that are to be matched with the pattern, terminated as
|
||||||
|
usual by an empty line or end of file. This command may be followed by a
|
||||||
|
modifier list containing only
|
||||||
|
<a href="#controlmodifiers">control modifiers</a>
|
||||||
|
that act after a pattern has been compiled. In particular, <b>hex</b>,
|
||||||
|
<b>posix</b>, and <b>push</b> are not allowed, nor are any
|
||||||
|
<a href="#optionmodifiers">option-setting modifiers.</a>
|
||||||
|
The JIT modifiers are, however, permitted. Here is an example that saves and
|
||||||
|
reloads two patterns.
|
||||||
|
<pre>
|
||||||
|
/abc/push
|
||||||
|
/xyz/push
|
||||||
|
#save tempfile
|
||||||
|
#load tempfile
|
||||||
|
#pop info
|
||||||
|
xyz
|
||||||
|
|
||||||
|
#pop jit,bincode
|
||||||
|
abc
|
||||||
|
</pre>
|
||||||
|
If <b>jitverify</b> is used with #pop, it does not automatically imply
|
||||||
|
<b>jit</b>, which is different behaviour from when it is used on a pattern.
|
||||||
|
</P>
|
||||||
|
<br><a name="SEC19" href="#TOC1">SEE ALSO</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>pcre2</b>(3), <b>pcre2api</b>(3), <b>pcre2callout</b>(3),
|
<b>pcre2</b>(3), <b>pcre2api</b>(3), <b>pcre2callout</b>(3),
|
||||||
<b>pcre2jit</b>, <b>pcre2matching</b>(3), <b>pcre2partial</b>(d),
|
<b>pcre2jit</b>(3), <b>pcre2matching</b>(3), <b>pcre2partial</b>(3),
|
||||||
<b>pcre2pattern</b>(3).
|
<b>pcre2pattern</b>(3), <b>pcre2serialize</b>(3).
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC19" href="#TOC1">AUTHOR</a><br>
|
<br><a name="SEC20" href="#TOC1">AUTHOR</a><br>
|
||||||
<P>
|
<P>
|
||||||
Philip Hazel
|
Philip Hazel
|
||||||
<br>
|
<br>
|
||||||
|
@ -1308,9 +1409,9 @@ University Computing Service
|
||||||
Cambridge, England.
|
Cambridge, England.
|
||||||
<br>
|
<br>
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC20" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 02 January 2015
|
Last updated: 23 January 2015
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2015 University of Cambridge.
|
Copyright © 1997-2015 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@ -65,6 +65,9 @@ first.
|
||||||
<tr><td><a href="pcre2sample.html">pcre2sample</a></td>
|
<tr><td><a href="pcre2sample.html">pcre2sample</a></td>
|
||||||
<td> Discussion of the pcre2demo program</td></tr>
|
<td> Discussion of the pcre2demo program</td></tr>
|
||||||
|
|
||||||
|
<tr><td><a href="pcre2serialize.html">pcre2serialize</a></td>
|
||||||
|
<td> Serializing functions for saving precompiled patterns</td></tr>
|
||||||
|
|
||||||
<tr><td><a href="pcre2stack.html">pcre2stack</a></td>
|
<tr><td><a href="pcre2stack.html">pcre2stack</a></td>
|
||||||
<td> Discussion of PCRE2's stack usage</td></tr>
|
<td> Discussion of PCRE2's stack usage</td></tr>
|
||||||
|
|
||||||
|
@ -177,6 +180,18 @@ in the library.
|
||||||
<tr><td><a href="pcre2_pattern_info.html">pcre2_pattern_info</a></td>
|
<tr><td><a href="pcre2_pattern_info.html">pcre2_pattern_info</a></td>
|
||||||
<td> Extract information about a pattern</td></tr>
|
<td> Extract information about a pattern</td></tr>
|
||||||
|
|
||||||
|
<tr><td><a href="pcre2_serialize_decode.html">pcre2_serialize_decode</a></td>
|
||||||
|
<td> Decode serialized compiled patterns</td></tr>
|
||||||
|
|
||||||
|
<tr><td><a href="pcre2_serialize_encode.html">pcre2_serialize_encode</a></td>
|
||||||
|
<td> Serialize compiled patterns for save/restore</td></tr>
|
||||||
|
|
||||||
|
<tr><td><a href="pcre2_serialize_free.html">pcre2_serialize_free</a></td>
|
||||||
|
<td> Free serialized compiled patterns</td></tr>
|
||||||
|
|
||||||
|
<tr><td><a href="pcre2_serialize_get_number_of_codes.html">pcre2_serialize_get_number_of_codes</a></td>
|
||||||
|
<td> Get number of serialized compiled patterns</td></tr>
|
||||||
|
|
||||||
<tr><td><a href="pcre2_set_bsr.html">pcre2_set_bsr</a></td>
|
<tr><td><a href="pcre2_set_bsr.html">pcre2_set_bsr</a></td>
|
||||||
<td> Set \R convention</td></tr>
|
<td> Set \R convention</td></tr>
|
||||||
|
|
||||||
|
|
|
@ -343,6 +343,21 @@ PCRE2 NATIVE API JIT FUNCTIONS
|
||||||
void pcre2_jit_stack_free(pcre2_jit_stack *jit_stack);
|
void pcre2_jit_stack_free(pcre2_jit_stack *jit_stack);
|
||||||
|
|
||||||
|
|
||||||
|
PCRE2 NATIVE API SERIALIZATION FUNCTIONS
|
||||||
|
|
||||||
|
int32_t pcre2_serialize_decode(pcre2_code **codes,
|
||||||
|
int32_t number_of_codes, const uint8_t *bytes,
|
||||||
|
pcre2_general_context *gcontext);
|
||||||
|
|
||||||
|
int32_t pcre2_serialize_encode(pcre2_code **codes,
|
||||||
|
int32_t number_of_codes, uint8_t **serialized_bytes,
|
||||||
|
PCRE2_SIZE *serialized_size, pcre2_general_context *gcontext);
|
||||||
|
|
||||||
|
void pcre2_serialize_free(uint8_t *bytes);
|
||||||
|
|
||||||
|
int32_t pcre2_serialize_get_number_of_codes(const uint8_t *bytes);
|
||||||
|
|
||||||
|
|
||||||
PCRE2 NATIVE API AUXILIARY FUNCTIONS
|
PCRE2 NATIVE API AUXILIARY FUNCTIONS
|
||||||
|
|
||||||
int pcre2_get_error_message(int errorcode, PCRE2_UCHAR *buffer,
|
int pcre2_get_error_message(int errorcode, PCRE2_UCHAR *buffer,
|
||||||
|
@ -1504,8 +1519,13 @@ INFORMATION ABOUT A COMPILED PATTERN
|
||||||
PCRE2_INFO_BACKREFMAX
|
PCRE2_INFO_BACKREFMAX
|
||||||
|
|
||||||
Return the number of the highest back reference in the pattern. The
|
Return the number of the highest back reference in the pattern. The
|
||||||
third argument should point to an uint32_t variable. Zero is returned
|
third argument should point to a uint32_t variable. Named subpatterns
|
||||||
if there are no back references.
|
acquire numbers as well as names, and these count towards the highest
|
||||||
|
back reference. Back references such as \4 or \g{12} match the cap-
|
||||||
|
tured characters of the given group, but in addition, the check that a
|
||||||
|
capturing group is set in a conditional subpattern such as (?(3)a|b) is
|
||||||
|
also a back reference. Zero is returned if there are no back refer-
|
||||||
|
ences.
|
||||||
|
|
||||||
PCRE2_INFO_BSR
|
PCRE2_INFO_BSR
|
||||||
|
|
||||||
|
@ -1715,12 +1735,21 @@ INFORMATION ABOUT A COMPILED PATTERN
|
||||||
|
|
||||||
Return the size of the compiled pattern in bytes (for all three
|
Return the size of the compiled pattern in bytes (for all three
|
||||||
libraries). The third argument should point to a size_t variable. This
|
libraries). The third argument should point to a size_t variable. This
|
||||||
value does not include the size of the pcre2_code structure that is
|
value includes the size of the general data block that precedes the
|
||||||
returned by pcre_compile(). The value that is used when pcre2_compile()
|
code units of the compiled pattern itself. The value that is used when
|
||||||
is getting memory in which to place the compiled data is the value
|
pcre2_compile() is getting memory in which to place the compiled pat-
|
||||||
returned by this option plus the size of the pcre2_code structure. Pro-
|
tern may be slightly larger than the value returned by this option,
|
||||||
cessing a pattern with the JIT compiler does not alter the value
|
because there are cases where the code that calculates the size has to
|
||||||
returned by this option.
|
over-estimate. Processing a pattern with the JIT compiler does not
|
||||||
|
alter the value returned by this option.
|
||||||
|
|
||||||
|
|
||||||
|
SERIALIZATION AND PRECOMPILING
|
||||||
|
|
||||||
|
It is possible to save compiled patterns on disc or elsewhere, and
|
||||||
|
reload them later, subject to a number of restrictions. The functions
|
||||||
|
whose names begin with pcre2_serialize_ are used for this purpose. They
|
||||||
|
are described in the pcre2serialize documentation.
|
||||||
|
|
||||||
|
|
||||||
THE MATCH DATA BLOCK
|
THE MATCH DATA BLOCK
|
||||||
|
@ -2742,7 +2771,7 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 02 January 2015
|
Last updated: 23 January 2015
|
||||||
Copyright (c) 1997-2015 University of Cambridge.
|
Copyright (c) 1997-2015 University of Cambridge.
|
||||||
------------------------------------------------------------------------------
|
------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
|
@ -0,0 +1,50 @@
|
||||||
|
.TH PCRE2_SERIALIZE_DECODE 3 "19 January 2015" "PCRE2 10.10"
|
||||||
|
.SH NAME
|
||||||
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
|
.SH SYNOPSIS
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
.B #include <pcre2.h>
|
||||||
|
.PP
|
||||||
|
.nf
|
||||||
|
.B int32_t pcre2_serialize_decode(pcre2_code **\fIcodes\fP,
|
||||||
|
.B " int32_t \fInumber_of_codes\fP, const uint8_t *\fIbytes\fP,"
|
||||||
|
.B " pcre2_general_context *\fIgcontext\fP);"
|
||||||
|
.fi
|
||||||
|
.
|
||||||
|
.SH DESCRIPTION
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
This function decodes a serialized set of compiled patterns back into a list of
|
||||||
|
individual patterns. Its arguments are:
|
||||||
|
.sp
|
||||||
|
\fIcodes\fP pointer to a vector in which to build the list
|
||||||
|
\fInumber_of_codes\fP number of slots in the vector
|
||||||
|
\fIbytes\fP the serialized byte stream
|
||||||
|
\fIgcontext\fP pointer to a general context or NULL
|
||||||
|
.sp
|
||||||
|
The \fIbytes\fP argument must point to a block of data that was originally
|
||||||
|
created by \fBpcre2_serialize_encode()\fP, though it may have been saved on
|
||||||
|
disc or elsewhere in the meantime. If there are more codes in the serialized
|
||||||
|
data than slots in the list, only those compiled patterns that will fit are
|
||||||
|
decoded. The yield of the function is the number of decoded patterns, or one of
|
||||||
|
the following negative error codes:
|
||||||
|
.sp
|
||||||
|
PCRE2_ERROR_BADDATA \fInumber_of_codes\fP is zero or less
|
||||||
|
PCRE2_ERROR_BADMAGIC mismatch of id bytes in \fIbytes\fP
|
||||||
|
PCRE2_ERROR_BADMODE mismatch of variable unit size or PCRE2 version
|
||||||
|
PCRE2_ERROR_MEMORY memory allocation failed
|
||||||
|
PCRE2_ERROR_NULL \fIcodes\fP or \fIbytes\fP is NULL
|
||||||
|
.sp
|
||||||
|
PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that it was compiled
|
||||||
|
on a system with different endianness.
|
||||||
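To illustrate why an endianness mismatch can surface as PCRE2_ERROR_BADMAGIC, the sketch below reads a 32-bit id word in native byte order and compares it against a hypothetical magic value in both byte orders. EXAMPLE_MAGIC and classify_magic() are invented for this illustration only; this documentation does not describe the actual layout of PCRE2's serialized header.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical id value for illustration: the bytes "PCRE" as one
   32-bit word. It is NOT taken from PCRE2's internal header layout. */
#define EXAMPLE_MAGIC 0x50435245u

/* Reverse the byte order of a 32-bit word. */
static uint32_t swap32(uint32_t x)
{
  return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
         ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Classify the id word at the start of a byte stream:
   0 = matches in native order, 1 = matches only when byte-swapped
   (written on a host of opposite endianness), -1 = corrupt/unknown. */
static int classify_magic(const uint8_t *bytes)
{
  uint32_t word;
  memcpy(&word, bytes, sizeof word);   /* native-order read, no alignment needed */
  if (word == EXAMPLE_MAGIC) return 0;
  if (word == swap32(EXAMPLE_MAGIC)) return 1;
  return -1;
}
```

A reader that only checks for the native-order value, as the sketch's first comparison does, cannot distinguish the "opposite endianness" case from corruption, which is why both causes are listed for the same error code.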
|
.P
|
||||||
|
There is a complete description of the PCRE2 native API in the
|
||||||
|
.\" HREF
|
||||||
|
\fBpcre2api\fP
|
||||||
|
.\"
|
||||||
|
page and a description of the POSIX API in the
|
||||||
|
.\" HREF
|
||||||
|
\fBpcre2posix\fP
|
||||||
|
.\"
|
||||||
|
page.
|
|
@ -0,0 +1,49 @@
|
||||||
|
.TH PCRE2_SERIALIZE_ENCODE 3 "19 January 2015" "PCRE2 10.10"
|
||||||
|
.SH NAME
|
||||||
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
|
.SH SYNOPSIS
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
.B #include <pcre2.h>
|
||||||
|
.PP
|
||||||
|
.nf
|
||||||
|
.B int32_t pcre2_serialize_encode(pcre2_code **\fIcodes\fP,
|
||||||
|
.B " int32_t \fInumber_of_codes\fP, uint8_t **\fIserialized_bytes\fP,"
|
||||||
|
.B " PCRE2_SIZE *\fIserialized_size\fP, pcre2_general_context *\fIgcontext\fP);"
|
||||||
|
.fi
|
||||||
|
.
|
||||||
|
.SH DESCRIPTION
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
This function encodes a list of compiled patterns into a byte stream that can
|
||||||
|
be saved on disc or elsewhere. Its arguments are:
|
||||||
|
.sp
|
||||||
|
\fIcodes\fP pointer to a vector containing the list
|
||||||
|
\fInumber_of_codes\fP number of slots in the vector
|
||||||
|
\fIserialized_bytes\fP set to point to the serialized byte stream
|
||||||
|
\fIserialized_size\fP set to the number of bytes in the byte stream
|
||||||
|
\fIgcontext\fP pointer to a general context or NULL
|
||||||
|
.sp
|
||||||
|
The context argument is used to obtain memory for the byte stream. When the
|
||||||
|
serialized data is no longer needed, it must be freed by calling
|
||||||
|
\fBpcre2_serialize_free()\fP. The yield of the function is the number of
|
||||||
|
serialized patterns, or one of the following negative error codes:
|
||||||
|
.sp
|
||||||
|
PCRE2_ERROR_BADDATA \fInumber_of_codes\fP is zero or less
|
||||||
|
PCRE2_ERROR_BADMAGIC mismatch of id bytes in one of the patterns
|
||||||
|
PCRE2_ERROR_MEMORY memory allocation failed
|
||||||
|
PCRE2_ERROR_MIXEDTABLES the patterns do not all use the same tables
|
||||||
|
PCRE2_ERROR_NULL an argument other than \fIgcontext\fP is NULL
|
||||||
|
.sp
|
||||||
|
PCRE2_ERROR_BADMAGIC means either that a pattern's code has been corrupted, or
|
||||||
|
that a slot in the vector does not point to a compiled pattern.
|
||||||
|
.P
|
||||||
|
There is a complete description of the PCRE2 native API in the
|
||||||
|
.\" HREF
|
||||||
|
\fBpcre2api\fP
|
||||||
|
.\"
|
||||||
|
page and a description of the POSIX API in the
|
||||||
|
.\" HREF
|
||||||
|
\fBpcre2posix\fP
|
||||||
|
.\"
|
||||||
|
page.
|
|
@ -0,0 +1,28 @@
|
||||||
|
.TH PCRE2_SERIALIZE_FREE 3 "19 January 2015" "PCRE2 10.10"
|
||||||
|
.SH NAME
|
||||||
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
|
.SH SYNOPSIS
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
.B #include <pcre2.h>
|
||||||
|
.PP
|
||||||
|
.nf
|
||||||
|
.B void pcre2_serialize_free(uint8_t *\fIbytes\fP);
|
||||||
|
.fi
|
||||||
|
.
|
||||||
|
.SH DESCRIPTION
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
This function frees the memory that was obtained by
|
||||||
|
\fBpcre2_serialize_encode()\fP to hold a serialized byte stream. The argument
|
||||||
|
must point to such a byte stream.
|
||||||
|
.P
|
||||||
|
There is a complete description of the PCRE2 native API in the
|
||||||
|
.\" HREF
|
||||||
|
\fBpcre2api\fP
|
||||||
|
.\"
|
||||||
|
page and a description of the POSIX API in the
|
||||||
|
.\" HREF
|
||||||
|
\fBpcre2posix\fP
|
||||||
|
.\"
|
||||||
|
page.
|
|
@ -0,0 +1,37 @@
|
||||||
|
.TH PCRE2_SERIALIZE_GET_NUMBER_OF_CODES 3 "19 January 2015" "PCRE2 10.10"
|
||||||
|
.SH NAME
|
||||||
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
|
.SH SYNOPSIS
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
.B #include <pcre2.h>
|
||||||
|
.PP
|
||||||
|
.nf
|
||||||
|
.B int32_t pcre2_serialize_get_number_of_codes(const uint8_t *\fIbytes\fP);
|
||||||
|
.fi
|
||||||
|
.
|
||||||
|
.SH DESCRIPTION
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
The \fIbytes\fP argument must point to a serialized byte stream that was
|
||||||
|
originally created by \fBpcre2_serialize_encode()\fP (though it may have been
|
||||||
|
saved on disc or elsewhere in the meantime). The function returns the number of
|
||||||
|
serialized patterns in the byte stream, or one of the following negative error
|
||||||
|
codes:
|
||||||
|
.sp
|
||||||
|
PCRE2_ERROR_BADMAGIC mismatch of id bytes in \fIbytes\fP
|
||||||
|
PCRE2_ERROR_BADMODE mismatch of variable unit size or PCRE2 version
|
||||||
|
PCRE2_ERROR_NULL the argument is NULL
|
||||||
|
.sp
|
||||||
|
PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that it was compiled
|
||||||
|
on a system with different endianness.
|
||||||
|
.P
|
||||||
|
There is a complete description of the PCRE2 native API in the
|
||||||
|
.\" HREF
|
||||||
|
\fBpcre2api\fP
|
||||||
|
.\"
|
||||||
|
page and a description of the POSIX API in the
|
||||||
|
.\" HREF
|
||||||
|
\fBpcre2posix\fP
|
||||||
|
.\"
|
||||||
|
page.
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2API 3 "13 January 2015" "PCRE2 10.10"
|
.TH PCRE2API 3 "23 January 2015" "PCRE2 10.10"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.sp
|
.sp
|
||||||
|
@ -205,6 +205,24 @@ document for an overview of all the PCRE2 documentation.
|
||||||
.fi
|
.fi
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
|
.SH "PCRE2 NATIVE API SERIALIZATION FUNCTIONS"
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
.nf
|
||||||
|
.B int32_t pcre2_serialize_decode(pcre2_code **\fIcodes\fP,
|
||||||
|
.B " int32_t \fInumber_of_codes\fP, const uint8_t *\fIbytes\fP,"
|
||||||
|
.B " pcre2_general_context *\fIgcontext\fP);"
|
||||||
|
.sp
|
||||||
|
.B int32_t pcre2_serialize_encode(pcre2_code **\fIcodes\fP,
|
||||||
|
.B " int32_t \fInumber_of_codes\fP, uint8_t **\fIserialized_bytes\fP,"
|
||||||
|
.B " PCRE2_SIZE *\fIserialized_size\fP, pcre2_general_context *\fIgcontext\fP);"
|
||||||
|
.sp
|
||||||
|
.B void pcre2_serialize_free(uint8_t *\fIbytes\fP);
|
||||||
|
.sp
|
||||||
|
.B int32_t pcre2_serialize_get_number_of_codes(const uint8_t *\fIbytes\fP);
|
||||||
|
.fi
|
||||||
|
.
|
||||||
|
.
|
||||||
.SH "PCRE2 NATIVE API AUXILIARY FUNCTIONS"
|
.SH "PCRE2 NATIVE API AUXILIARY FUNCTIONS"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
|
@ -1689,12 +1707,26 @@ set, the call to \fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET.
|
||||||
PCRE2_INFO_SIZE
|
PCRE2_INFO_SIZE
|
||||||
.sp
|
.sp
|
||||||
Return the size of the compiled pattern in bytes (for all three libraries). The
|
Return the size of the compiled pattern in bytes (for all three libraries). The
|
||||||
third argument should point to a \fBsize_t\fP variable. This value does not
|
third argument should point to a \fBsize_t\fP variable. This value includes the
|
||||||
include the size of the \fBpcre2_code\fP structure that is returned by
|
size of the general data block that precedes the code units of the compiled
|
||||||
\fBpcre_compile()\fP. The value that is used when \fBpcre2_compile()\fP is
|
pattern itself. The value that is used when \fBpcre2_compile()\fP is getting
|
||||||
getting memory in which to place the compiled data is the value returned by
|
memory in which to place the compiled pattern may be slightly larger than the
|
||||||
this option plus the size of the \fBpcre2_code\fP structure. Processing a
|
value returned by this option, because there are cases where the code that
|
||||||
pattern with the JIT compiler does not alter the value returned by this option.
|
calculates the size has to over-estimate. Processing a pattern with the JIT
|
||||||
|
compiler does not alter the value returned by this option.
|
||||||
|
.
|
||||||
|
.
|
||||||
|
.SH "SERIALIZATION AND PRECOMPILING"
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
It is possible to save compiled patterns on disc or elsewhere, and reload them
|
||||||
|
later, subject to a number of restrictions. The functions whose names begin
|
||||||
|
with \fBpcre2_serialize_\fP are used for this purpose. They are described in
|
||||||
|
the
|
||||||
|
.\" HREF
|
||||||
|
\fBpcre2serialize\fP
|
||||||
|
.\"
|
||||||
|
documentation.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.\" HTML <a name="matchdatablock"></a>
|
.\" HTML <a name="matchdatablock"></a>
|
||||||
|
@ -2853,6 +2885,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 13 January 2015
|
Last updated: 23 January 2015
|
||||||
Copyright (c) 1997-2015 University of Cambridge.
|
Copyright (c) 1997-2015 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -0,0 +1,170 @@
|
||||||
|
.TH PCRE2SERIALIZE 3 "20 January 2015" "PCRE2 10.10"
|
||||||
|
.SH NAME
|
||||||
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
|
.SH "SAVING AND RE-USING PRECOMPILED PCRE2 PATTERNS"
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
.nf
|
||||||
|
.B int32_t pcre2_serialize_decode(pcre2_code **\fIcodes\fP,
|
||||||
|
.B " int32_t \fInumber_of_codes\fP, const uint8_t *\fIbytes\fP,"
|
||||||
|
.B " pcre2_general_context *\fIgcontext\fP);"
|
||||||
|
.sp
|
||||||
|
.B int32_t pcre2_serialize_encode(pcre2_code **\fIcodes\fP,
|
||||||
|
.B " int32_t \fInumber_of_codes\fP, uint8_t **\fIserialized_bytes\fP,"
|
||||||
|
.B " PCRE2_SIZE *\fIserialized_size\fP, pcre2_general_context *\fIgcontext\fP);"
|
||||||
|
.sp
|
||||||
|
.B void pcre2_serialize_free(uint8_t *\fIbytes\fP);
|
||||||
|
.sp
|
||||||
|
.B int32_t pcre2_serialize_get_number_of_codes(const uint8_t *\fIbytes\fP);
|
||||||
|
.fi
|
||||||
|
.sp
|
||||||
|
If you are running an application that uses a large number of regular
|
||||||
|
expression patterns, it may be useful to store them in a precompiled form
|
||||||
|
instead of having to compile them every time the application is run. However,
|
||||||
|
if you are using the just-in-time optimization feature, it is not possible to
|
||||||
|
save and reload the JIT data, because it is position-dependent. In addition,
|
||||||
|
the host on which the patterns are reloaded must be running the same version of
|
||||||
|
PCRE2, with the same code unit width, and must also have the same endianness,
|
||||||
|
pointer width and PCRE2_SIZE type. For example, patterns compiled on a 32-bit
|
||||||
|
system using PCRE2's 16-bit library cannot be reloaded on a 64-bit system, nor
|
||||||
|
can they be reloaded using the 8-bit library.
|
||||||
|
.
|
||||||
|
.
|
||||||
|
.SH "SAVING COMPILED PATTERNS"
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
Before compiled patterns can be saved they must be serialized, that is,
|
||||||
|
converted to a stream of bytes. A single byte stream may contain any number of
|
||||||
|
compiled patterns, but they must all use the same character tables. A single
|
||||||
|
copy of the tables is included in the byte stream (its size is 1088 bytes). For
|
||||||
|
more details of character tables, see the
|
||||||
|
.\" HTML <a href="pcre2api.html#localesupport">
|
||||||
|
.\" </a>
|
||||||
|
section on locale support
|
||||||
|
.\"
|
||||||
|
in the
|
||||||
|
.\" HREF
|
||||||
|
\fBpcre2api\fP
|
||||||
|
.\"
|
||||||
|
documentation.
|
||||||
|
.P
|
||||||
|
The function \fBpcre2_serialize_encode()\fP creates a serialized byte stream
|
||||||
|
from a list of compiled patterns. Its first two arguments specify the list,
|
||||||
|
being a pointer to a vector of pointers to compiled patterns, and the length of
|
||||||
|
the vector. The third and fourth arguments point to variables which are set to
|
||||||
|
point to the created byte stream and its length, respectively. The final
|
||||||
|
argument is a pointer to a general context, which can be used to specify custom
|
||||||
|
memory management functions. If this argument is NULL, \fBmalloc()\fP is used
|
||||||
|
to obtain memory for the byte stream. The yield of the function is the number
|
||||||
|
of serialized patterns, or one of the following negative error codes:
|
||||||
|
.sp
|
||||||
|
PCRE2_ERROR_BADDATA the number of patterns is zero or less
|
||||||
|
PCRE2_ERROR_BADMAGIC mismatch of id bytes in one of the patterns
|
||||||
|
PCRE2_ERROR_MEMORY memory allocation failed
|
||||||
|
PCRE2_ERROR_MIXEDTABLES the patterns do not all use the same tables
|
||||||
|
PCRE2_ERROR_NULL the 1st, 3rd, or 4th argument is NULL
|
||||||
|
.sp
|
||||||
|
PCRE2_ERROR_BADMAGIC means either that a pattern's code has been corrupted, or
|
||||||
|
that a slot in the vector does not point to a compiled pattern.
|
||||||
|
.P
|
||||||
|
Once a set of patterns has been serialized you can save the data in any
|
||||||
|
appropriate manner. Here is sample code that compiles two patterns and writes
|
||||||
|
them to a file. It assumes that the variable \fIfd\fP is a FILE pointer for a file that is
|
||||||
|
open for output. The error checking that should be present in a real
|
||||||
|
application has been omitted for simplicity.
|
||||||
|
.sp
|
||||||
|
int errorcode;
|
||||||
|
uint8_t *bytes;
|
||||||
|
PCRE2_SIZE erroroffset;
|
||||||
|
PCRE2_SIZE bytescount;
|
||||||
|
pcre2_code *list_of_codes[2];
|
||||||
|
list_of_codes[0] = pcre2_compile((PCRE2_SPTR)"first pattern",
|
||||||
|
PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);
|
||||||
|
list_of_codes[1] = pcre2_compile((PCRE2_SPTR)"second pattern",
|
||||||
|
PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);
|
||||||
|
errorcode = pcre2_serialize_encode(list_of_codes, 2, &bytes,
|
||||||
|
&bytescount, NULL);
|
||||||
|
fwrite(bytes, 1, bytescount, fd);
|
||||||
|
.sp
|
||||||
|
Note that the serialized data is binary data that may contain any of the 256
|
||||||
|
possible byte values. On systems that make a distinction between binary and
|
||||||
|
non-binary data, be sure that the file is opened for binary output.
|
||||||
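The binary-output requirement can be sketched as a small C helper. This is a minimal sketch, not part of PCRE2: the name save_byte_stream() is hypothetical, and it assumes that PCRE2_SIZE has its default definition of size_t, so the count set by pcre2_serialize_encode() can be passed straight through.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write a serialized byte stream to a file.
   "wb" matters on systems that distinguish text and binary files;
   in text mode a 0x0a byte might be rewritten as CR LF.
   Returns 0 on success, -1 on any failure. */
static int save_byte_stream(const char *path, const uint8_t *bytes,
  size_t count)
{
  FILE *f = fopen(path, "wb");
  if (f == NULL) return -1;
  size_t written = fwrite(bytes, 1, count, f);
  /* fclose() is always called; it can report a delayed write error
     when buffered data is flushed, so its result is checked too. */
  if (fclose(f) != 0 || written != count) return -1;
  return 0;
}
```

Checking both the fwrite() count and the fclose() result catches short writes as well as errors reported only at flush time. The byte stream still has to be released with pcre2_serialize_free() once it has been written.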
|
.P
|
||||||
|
Serializing a set of patterns leaves the original data untouched, so they can
|
||||||
|
still be used for matching. Their memory must eventually be freed in the usual
|
||||||
|
way by calling \fBpcre2_code_free()\fP. When you have finished with the byte
|
||||||
|
stream, it too must be freed by calling \fBpcre2_serialize_free()\fP.
|
||||||
|
.
|
||||||
|
.
|
||||||
|
.SH "RE-USING PRECOMPILED PATTERNS"
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
In order to re-use a set of saved patterns you must first make the serialized
|
||||||
|
byte stream available in main memory (for example, by reading from a file). The
|
||||||
|
management of this memory block is up to the application. You can use the
|
||||||
|
\fBpcre2_serialize_get_number_of_codes()\fP function to find out how many
|
||||||
|
compiled patterns are in the serialized data without actually decoding the
|
||||||
|
patterns:
|
||||||
|
.sp
|
||||||
|
uint8_t *bytes = <serialized data>;
|
||||||
|
int32_t number_of_codes = pcre2_serialize_get_number_of_codes(bytes);
|
||||||
|
.sp
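How the byte stream gets back into memory is up to the application. One possible sketch, again using only standard C and a hypothetical helper \fBload_serialized()\fP that is not part of PCRE2, reads the whole file in binary mode into a buffer that doubles in size as needed.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not part of PCRE2: read an entire file, such as one
   written from pcre2_serialize_encode() output, into a new buffer. On
   success the buffer is returned and its length is stored in *sizeptr; the
   caller must release it with free(). Returns NULL on failure. */
static uint8_t *load_serialized(const char *filename, size_t *sizeptr)
{
  FILE *f = fopen(filename, "rb");        /* binary mode: no byte translation */
  if (f == NULL) return NULL;

  uint8_t *buffer = NULL;
  size_t used = 0, capacity = 0;

  for (;;) {
    if (used == capacity) {               /* buffer full: double its size */
      size_t newcap = (capacity == 0) ? 4096 : capacity * 2;
      uint8_t *p = realloc(buffer, newcap);
      if (p == NULL) {
        free(buffer);
        fclose(f);
        return NULL;
      }
      buffer = p;
      capacity = newcap;
    }

    size_t want = capacity - used;
    size_t got = fread(buffer + used, 1, want, f);
    used += got;
    if (got < want) break;                /* end of file or read error */
  }

  int read_failed = ferror(f);
  fclose(f);
  if (read_failed) {
    free(buffer);
    return NULL;
  }

  *sizeptr = used;
  return buffer;
}
```

The returned buffer can be handed to \fBpcre2_serialize_get_number_of_codes()\fP and \fBpcre2_serialize_decode()\fP, and released with \fBfree()\fP (not \fBpcre2_serialize_free()\fP, because the application allocated it) once decoding is complete. Growing the buffer while reading, instead of asking for the file size with \fBfseek()\fP and \fBftell()\fP, avoids assuming that the stream is seekable.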
|
||||||
|
The \fBpcre2_serialize_decode()\fP function reads a byte stream and recreates
|
||||||
|
the compiled patterns in new memory blocks, setting pointers to them in a
|
||||||
|
vector. The first two arguments are a pointer to a suitable vector and its
|
||||||
|
length, and the third argument points to a byte stream. The final argument is a
|
||||||
|
pointer to a general context, which can be used to specify custom memory
|
||||||
|
management functions for the decoded patterns. If this argument is NULL,
|
||||||
|
\fBmalloc()\fP and \fBfree()\fP are used. After deserialization, the byte
|
||||||
|
stream is no longer needed and can be discarded.
|
||||||
|
.sp
|
||||||
|
int32_t number_of_codes;
|
||||||
|
pcre2_code *list_of_codes[2];
|
||||||
|
uint8_t *bytes = <serialized data>;
|
||||||
|
number_of_codes =
|
||||||
|
pcre2_serialize_decode(list_of_codes, 2, bytes, NULL);
|
||||||
|
.sp
|
||||||
|
If the vector is not large enough for all the patterns in the byte stream, it
|
||||||
|
is filled with those that fit, and the remainder are ignored. The yield of the
|
||||||
|
function is the number of decoded patterns, or one of the following negative
|
||||||
|
error codes:
|
||||||
|
.sp
|
||||||
|
PCRE2_ERROR_BADDATA second argument is zero or less
|
||||||
|
PCRE2_ERROR_BADMAGIC mismatch of id bytes in the data
|
||||||
|
PCRE2_ERROR_BADMODE mismatch of variable unit size or PCRE2 version
|
||||||
|
PCRE2_ERROR_MEMORY memory allocation failed
|
||||||
|
PCRE2_ERROR_NULL first or third argument is NULL
|
||||||
|
.sp
|
||||||
|
PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that it was compiled
|
||||||
|
on a system with different endianness.
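PCRE2 validates the data itself when decoding, as the error codes above show. An application that moves saved patterns between machines might nevertheless record the relevant host properties alongside the byte stream, so that a mismatch can be diagnosed before decoding. The sketch below is hypothetical and independent of PCRE2; it captures the byte order, the pointer width, and the width of \fBsize_t\fP, which is the type that PCRE2_SIZE is defined as.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record, not part of PCRE2, of host properties that must
   match between the host that saves compiled patterns and the host that
   reloads them. */
typedef struct {
  int little_endian;      /* 1 if the lowest-addressed byte is least significant */
  size_t pointer_bytes;   /* width of a pointer */
  size_t size_bytes;      /* width of size_t, which PCRE2_SIZE is defined as */
} host_profile;

static host_profile current_host_profile(void)
{
  host_profile p;
  uint32_t probe = 1;
  unsigned char first;
  memcpy(&first, &probe, 1);   /* copy out the lowest-addressed byte */
  p.little_endian = (first == 1);
  p.pointer_bytes = sizeof(void *);
  p.size_bytes = sizeof(size_t);
  return p;
}

/* Returns 1 if the two profiles agree on every property, otherwise 0. */
static int same_host_profile(host_profile a, host_profile b)
{
  return a.little_endian == b.little_endian &&
         a.pointer_bytes == b.pointer_bytes &&
         a.size_bytes == b.size_bytes;
}
```

If such a profile is stored on disc, it should itself be written in a fixed, byte-order-independent format, otherwise it suffers from the same portability problem it is meant to detect.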
|
||||||
|
.P
|
||||||
|
Decoded patterns can be used for matching in the usual way, and must be freed
|
||||||
|
by calling \fBpcre2_code_free()\fP as normal. A single copy of the character
|
||||||
|
tables is used by all the decoded patterns. A reference count is used to
|
||||||
|
arrange for its memory to be automatically freed when the last pattern is
|
||||||
|
freed.
|
||||||
|
.P
|
||||||
|
If a pattern was processed by \fBpcre2_jit_compile()\fP before being
|
||||||
|
serialized, the JIT data is discarded and so is no longer available after a
|
||||||
|
save/restore cycle. You can, however, process a restored pattern with
|
||||||
|
\fBpcre2_jit_compile()\fP if you wish.
|
||||||
|
.
|
||||||
|
.
|
||||||
|
.
|
||||||
|
.SH AUTHOR
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
.nf
|
||||||
|
Philip Hazel
|
||||||
|
University Computing Service
|
||||||
|
Cambridge, England.
|
||||||
|
.fi
|
||||||
|
.
|
||||||
|
.
|
||||||
|
.SH REVISION
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
.nf
|
||||||
|
Last updated: 20 January 2015
|
||||||
|
Copyright (c) 1997-2015 University of Cambridge.
|
||||||
|
.fi
|
173
doc/pcre2test.1
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2TEST 1 "02 January 2015" "PCRE 10.00"
|
.TH PCRE2TEST 1 "23 January 2015" "PCRE 10.10"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
pcre2test - a program for testing Perl-compatible regular expressions.
|
pcre2test - a program for testing Perl-compatible regular expressions.
|
||||||
.SH SYNOPSIS
|
.SH SYNOPSIS
|
||||||
|
@ -21,10 +21,11 @@ options, see the
|
||||||
documentation.
|
documentation.
|
||||||
.P
|
.P
|
||||||
The input for \fBpcre2test\fP is a sequence of regular expression patterns and
|
The input for \fBpcre2test\fP is a sequence of regular expression patterns and
|
||||||
subject strings to be matched. The output shows the result of each match
|
subject strings to be matched. There are also command lines for setting
|
||||||
attempt. Modifiers on the command line, the patterns, and the subject lines
|
defaults and controlling some special actions. The output shows the result of
|
||||||
specify PCRE2 function options, control how the subject is processed, and what
|
each match attempt. Modifiers on external or internal command lines, the
|
||||||
output is produced.
|
patterns, and the subject lines specify PCRE2 function options, control how the
|
||||||
|
subject is processed, and what output is produced.
|
||||||
.P
|
.P
|
||||||
As the original fairly simple PCRE library evolved, it acquired many different
|
As the original fairly simple PCRE library evolved, it acquired many different
|
||||||
features, and as a result, the original \fBpcretest\fP program ended up with a
|
features, and as a result, the original \fBpcretest\fP program ended up with a
|
||||||
|
@ -185,9 +186,7 @@ If \fBpcre2test\fP is given two filename arguments, it reads from the first and
|
||||||
writes to the second. If the first name is "-", input is taken from the
|
writes to the second. If the first name is "-", input is taken from the
|
||||||
standard input. If \fBpcre2test\fP is given only one argument, it reads from
|
standard input. If \fBpcre2test\fP is given only one argument, it reads from
|
||||||
that file and writes to stdout. Otherwise, it reads from stdin and writes to
|
that file and writes to stdout. Otherwise, it reads from stdin and writes to
|
||||||
stdout. When the input is a terminal, it prompts for each line of input, using
|
stdout.
|
||||||
"re>" to prompt for regular expression patterns, and "data>" to prompt for
|
|
||||||
subject lines.
|
|
||||||
.P
|
.P
|
||||||
When \fBpcre2test\fP is built, a configuration option can specify that it
|
When \fBpcre2test\fP is built, a configuration option can specify that it
|
||||||
should be linked with the \fBlibreadline\fP or \fBlibedit\fP library. When this
|
should be linked with the \fBlibreadline\fP or \fBlibedit\fP library. When this
|
||||||
|
@ -198,10 +197,15 @@ the \fB-help\fP option states whether or not \fBreadline()\fP will be used.
|
||||||
The program handles any number of tests, each of which consists of a set of
|
The program handles any number of tests, each of which consists of a set of
|
||||||
input lines. Each set starts with a regular expression pattern, followed by any
|
input lines. Each set starts with a regular expression pattern, followed by any
|
||||||
number of subject lines to be matched against that pattern. In between sets of
|
number of subject lines to be matched against that pattern. In between sets of
|
||||||
test data, command lines that begin with a hash (#) character may appear. This
|
test data, command lines that begin with # may appear. This file format, with
|
||||||
file format, with some restrictions, can also be processed by the
|
some restrictions, can also be processed by the \fBperltest.sh\fP script that
|
||||||
\fBperltest.sh\fP script that is distributed with PCRE2 as a means of checking
|
is distributed with PCRE2 as a means of checking that the behaviour of PCRE2
|
||||||
that the behaviour of PCRE2 and Perl is the same.
|
and Perl is the same.
|
||||||
|
.P
|
||||||
|
When the input is a terminal, \fBpcre2test\fP prompts for each line of input,
|
||||||
|
using "re>" to prompt for regular expression patterns, and "data>" to prompt
|
||||||
|
for subject lines. Command lines starting with # can be entered only in
|
||||||
|
response to the "re>" prompt.
|
||||||
.P
|
.P
|
||||||
Each subject line is matched separately and independently. If you want to do
|
Each subject line is matched separately and independently. If you want to do
|
||||||
multi-line matches, you have to use the \en escape sequence (or \er or \er\en,
|
multi-line matches, you have to use the \en escape sequence (or \er or \er\en,
|
||||||
|
@ -219,21 +223,30 @@ still input to be read.
|
||||||
.SH "COMMAND LINES"
|
.SH "COMMAND LINES"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
In between sets of test data, a line that begins with a hash (#) character is
|
In between sets of test data, a line that begins with # is interpreted as a
|
||||||
interpreted as a command line. If the first character is followed by white
|
command line. If the first character is followed by white space or an
|
||||||
space or an exclamation mark, the line is treated as a comment, and ignored.
|
exclamation mark, the line is treated as a comment, and ignored. Otherwise, the
|
||||||
Otherwise, the following commands are recognized:
|
following commands are recognized:
|
||||||
.sp
|
.sp
|
||||||
#forbid_utf
|
#forbid_utf
|
||||||
.sp
|
.sp
|
||||||
Subsequent patterns automatically have the PCRE2_NEVER_UTF and PCRE2_NEVER_UCP
|
Subsequent patterns automatically have the PCRE2_NEVER_UTF and PCRE2_NEVER_UCP
|
||||||
options set, which locks out the use of UTF and Unicode property features. This
|
options set, which locks out the use of UTF and Unicode property features. This
|
||||||
is a trigger guard that is used in test files to ensure that UTF/Unicode tests
|
is a trigger guard that is used in test files to ensure that UTF or Unicode
|
||||||
are not accidentally added to files that are used when UTF support is not
|
property tests are not accidentally added to files that are used when Unicode
|
||||||
included in the library. This effect can also be obtained by the use of
|
support is not included in the library. This effect can also be obtained by the
|
||||||
\fB#pattern\fP; the difference is that \fB#forbid_utf\fP cannot be unset, and
|
use of \fB#pattern\fP; the difference is that \fB#forbid_utf\fP cannot be
|
||||||
the automatic options are not displayed in pattern information, to avoid
|
unset, and the automatic options are not displayed in pattern information, to
|
||||||
cluttering up test output.
|
avoid cluttering up test output.
|
||||||
|
.sp
|
||||||
|
#load <filename>
|
||||||
|
.sp
|
||||||
|
This command is used to load a set of precompiled patterns from a file, as
|
||||||
|
described in the section entitled "Saving and restoring compiled patterns"
|
||||||
|
.\" HTML <a href="#saverestore">
|
||||||
|
.\" </a>
|
||||||
|
below.
|
||||||
|
.\"
|
||||||
.sp
|
.sp
|
||||||
#pattern <modifier-list>
|
#pattern <modifier-list>
|
||||||
.sp
|
.sp
|
||||||
|
@ -249,6 +262,24 @@ lines, none of the other command lines are permitted, because they and many
|
||||||
of the modifiers are specific to \fBpcre2test\fP, and should not be used in
|
of the modifiers are specific to \fBpcre2test\fP, and should not be used in
|
||||||
test files that are also processed by \fBperltest.sh\fP. The \fB#perltest\fP
|
test files that are also processed by \fBperltest.sh\fP. The \fB#perltest\fP
|
||||||
command helps detect tests that are accidentally put in the wrong file.
|
command helps detect tests that are accidentally put in the wrong file.
|
||||||
|
.sp
|
||||||
|
#pop [<modifiers>]
|
||||||
|
.sp
|
||||||
|
This command is used to manipulate the stack of compiled patterns, as described
|
||||||
|
in the section entitled "Saving and restoring compiled patterns"
|
||||||
|
.\" HTML <a href="#saverestore">
|
||||||
|
.\" </a>
|
||||||
|
below.
|
||||||
|
.\"
|
||||||
|
.sp
|
||||||
|
#save <filename>
|
||||||
|
.sp
|
||||||
|
This command is used to save a set of compiled patterns to a file, as described
|
||||||
|
in the section entitled "Saving and restoring compiled patterns"
|
||||||
|
.\" HTML <a href="#saverestore">
|
||||||
|
.\" </a>
|
||||||
|
below.
|
||||||
|
.\"
|
||||||
.sp
|
.sp
|
||||||
#subject <modifier-list>
|
#subject <modifier-list>
|
||||||
.sp
|
.sp
|
||||||
|
@ -387,6 +418,7 @@ can add to or override default modifiers that were set by a previous
|
||||||
\fB#pattern\fP command.
|
\fB#pattern\fP command.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
|
.\" HTML <a name="optionmodifiers"></a>
|
||||||
.SS "Setting compilation options"
|
.SS "Setting compilation options"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
|
@ -426,6 +458,7 @@ notation. Otherwise, those less than 0x100 are output in hex without the curly
|
||||||
brackets.
|
brackets.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
|
.\" HTML <a name="controlmodifiers"></a>
|
||||||
.SS "Setting compilation controls"
|
.SS "Setting compilation controls"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
|
@ -445,8 +478,8 @@ about the pattern:
|
||||||
memory show memory used
|
memory show memory used
|
||||||
newline=<type> set newline type
|
newline=<type> set newline type
|
||||||
parens_nest_limit=<n> set maximum parentheses depth
|
parens_nest_limit=<n> set maximum parentheses depth
|
||||||
perlcompat lock out non-Perl modifiers
|
|
||||||
posix use the POSIX API
|
posix use the POSIX API
|
||||||
|
push push compiled pattern onto the stack
|
||||||
stackguard=<number> test the stackguard feature
|
stackguard=<number> test the stackguard feature
|
||||||
tables=[0|1|2] select internal tables
|
tables=[0|1|2] select internal tables
|
||||||
.sp
|
.sp
|
||||||
|
@ -683,6 +716,25 @@ These modifiers may not appear in a \fB#pattern\fP command. If you want them as
|
||||||
defaults, set them in a \fB#subject\fP command.
|
defaults, set them in a \fB#subject\fP command.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
|
.SS "Saving a compiled pattern"
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
When a pattern with the \fBpush\fP modifier is successfully compiled, it is
|
||||||
|
pushed onto a stack of compiled patterns, and \fBpcre2test\fP expects the next
|
||||||
|
line to contain a new pattern (or a command) instead of a subject line. This
|
||||||
|
facility is used when saving compiled patterns to a file, as described in the
|
||||||
|
section entitled "Saving and restoring compiled patterns"
|
||||||
|
.\" HTML <a href="#saverestore">
|
||||||
|
.\" </a>
|
||||||
|
below.
|
||||||
|
.\"
|
||||||
|
The \fBpush\fP modifier is incompatible with compilation modifiers such as
|
||||||
|
\fBglobal\fP that act at match time. Any that are specified are ignored, with a
|
||||||
|
warning message, except for \fBreplace\fP, which causes an error. Note that
|
||||||
|
\fBjitverify\fP, which is allowed, does not carry through to any subsequent
|
||||||
|
matching that uses this pattern.
|
||||||
|
.
|
||||||
|
.
|
||||||
.SH "SUBJECT MODIFIERS"
|
.SH "SUBJECT MODIFIERS"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
|
@ -1253,12 +1305,83 @@ characters.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
|
.\" HTML <a name="saverestore"></a>
|
||||||
|
.SH "SAVING AND RESTORING COMPILED PATTERNS"
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
It is possible to save compiled patterns on disc or elsewhere, and reload them
|
||||||
|
later, subject to a number of restrictions. JIT data cannot be saved. The host
|
||||||
|
on which the patterns are reloaded must be running the same version of PCRE2,
|
||||||
|
with the same code unit width, and must also have the same endianness, pointer
|
||||||
|
width and PCRE2_SIZE type. Before compiled patterns can be saved they must be
|
||||||
|
serialized, that is, converted to a stream of bytes. A single byte stream may
|
||||||
|
contain any number of compiled patterns, but they must all use the same
|
||||||
|
character tables. A single copy of the tables is included in the byte stream
|
||||||
|
(its size is 1088 bytes).
|
||||||
|
.P
|
||||||
|
The functions whose names begin with \fBpcre2_serialize_\fP are used
|
||||||
|
for serializing and de-serializing. They are described in the
|
||||||
|
.\" HREF
|
||||||
|
\fBpcre2serialize\fP
|
||||||
|
.\"
|
||||||
|
documentation. In this section we describe the features of \fBpcre2test\fP that
|
||||||
|
can be used to test these functions.
|
||||||
|
.P
|
||||||
|
When a pattern with the \fBpush\fP modifier is successfully compiled, it is pushed
|
||||||
|
onto a stack of compiled patterns, and \fBpcre2test\fP expects the next line to
|
||||||
|
contain a new pattern (or command) instead of a subject line. By this means, a
|
||||||
|
number of patterns can be compiled and retained. The \fBpush\fP modifier is
|
||||||
|
incompatible with \fBposix\fP, and control modifiers that act at match time are
|
||||||
|
ignored (with a message). The \fBjitverify\fP modifier applies only at compile
|
||||||
|
time. The command
|
||||||
|
.sp
|
||||||
|
#save <filename>
|
||||||
|
.sp
|
||||||
|
causes all the stacked patterns to be serialized and the result written to the
|
||||||
|
named file. Afterwards, all the stacked patterns are freed. The command
|
||||||
|
.sp
|
||||||
|
#load <filename>
|
||||||
|
.sp
|
||||||
|
reads the data in the file, and then arranges for it to be de-serialized, with
|
||||||
|
the resulting compiled patterns added to the pattern stack. The pattern on the
|
||||||
|
top of the stack can be retrieved by the #pop command, which must be followed
|
||||||
|
by lines of subjects that are to be matched with the pattern, terminated as
|
||||||
|
usual by an empty line or end of file. This command may be followed by a
|
||||||
|
modifier list containing only
|
||||||
|
.\" HTML <a href="#controlmodifiers">
|
||||||
|
.\" </a>
|
||||||
|
control modifiers
|
||||||
|
.\"
|
||||||
|
that act after a pattern has been compiled. In particular, \fBhex\fP,
|
||||||
|
\fBposix\fP, and \fBpush\fP are not allowed, nor are any
|
||||||
|
.\" HTML <a href="#optionmodifiers">
|
||||||
|
.\" </a>
|
||||||
|
option-setting modifiers.
|
||||||
|
.\"
|
||||||
|
The JIT modifiers are, however, permitted. Here is an example that saves and
|
||||||
|
reloads two patterns.
|
||||||
|
.sp
|
||||||
|
/abc/push
|
||||||
|
/xyz/push
|
||||||
|
#save tempfile
|
||||||
|
#load tempfile
|
||||||
|
#pop info
|
||||||
|
xyz
|
||||||
|
.sp
|
||||||
|
#pop jit,bincode
|
||||||
|
abc
|
||||||
|
.sp
|
||||||
|
If \fBjitverify\fP is used with #pop, it does not automatically imply
|
||||||
|
\fBjit\fP, which is different behaviour from when it is used on a pattern.
|
||||||
|
.
|
||||||
|
.
|
||||||
|
.
|
||||||
.SH "SEE ALSO"
|
.SH "SEE ALSO"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
\fBpcre2\fP(3), \fBpcre2api\fP(3), \fBpcre2callout\fP(3),
|
\fBpcre2\fP(3), \fBpcre2api\fP(3), \fBpcre2callout\fP(3),
|
||||||
\fBpcre2jit\fP, \fBpcre2matching\fP(3), \fBpcre2partial\fP(d),
|
\fBpcre2jit\fP(3), \fBpcre2matching\fP(3), \fBpcre2partial\fP(3),
|
||||||
\fBpcre2pattern\fP(3).
|
\fBpcre2pattern\fP(3), \fBpcre2serialize\fP(3).
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SH AUTHOR
|
.SH AUTHOR
|
||||||
|
@ -1275,6 +1398,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 02 January 2015
|
Last updated: 23 January 2015
|
||||||
Copyright (c) 1997-2015 University of Cambridge.
|
Copyright (c) 1997-2015 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -17,10 +17,12 @@ SYNOPSIS
|
||||||
options, see the pcre2api documentation.
|
options, see the pcre2api documentation.
|
||||||
|
|
||||||
The input for pcre2test is a sequence of regular expression patterns
|
The input for pcre2test is a sequence of regular expression patterns
|
||||||
and subject strings to be matched. The output shows the result of each
|
and subject strings to be matched. There are also command lines for
|
||||||
match attempt. Modifiers on the command line, the patterns, and the
|
setting defaults and controlling some special actions. The output shows
|
||||||
subject lines specify PCRE2 function options, control how the subject
|
the result of each match attempt. Modifiers on external or internal
|
||||||
is processed, and what output is produced.
|
command lines, the patterns, and the subject lines specify PCRE2 func-
|
||||||
|
tion options, control how the subject is processed, and what output is
|
||||||
|
produced.
|
||||||
|
|
||||||
As the original fairly simple PCRE library evolved, it acquired many
|
As the original fairly simple PCRE library evolved, it acquired many
|
||||||
different features, and as a result, the original pcretest program
|
different features, and as a result, the original pcretest program
|
||||||
|
@ -173,9 +175,7 @@ DESCRIPTION
|
||||||
and writes to the second. If the first name is "-", input is taken from
|
and writes to the second. If the first name is "-", input is taken from
|
||||||
the standard input. If pcre2test is given only one argument, it reads
|
the standard input. If pcre2test is given only one argument, it reads
|
||||||
from that file and writes to stdout. Otherwise, it reads from stdin and
|
from that file and writes to stdout. Otherwise, it reads from stdin and
|
||||||
writes to stdout. When the input is a terminal, it prompts for each
|
writes to stdout.
|
||||||
line of input, using "re>" to prompt for regular expression patterns,
|
|
||||||
and "data>" to prompt for subject lines.
|
|
||||||
|
|
||||||
When pcre2test is built, a configuration option can specify that it
|
When pcre2test is built, a configuration option can specify that it
|
||||||
should be linked with the libreadline or libedit library. When this is
|
should be linked with the libreadline or libedit library. When this is
|
||||||
|
@ -186,11 +186,15 @@ DESCRIPTION
|
||||||
The program handles any number of tests, each of which consists of a
|
The program handles any number of tests, each of which consists of a
|
||||||
set of input lines. Each set starts with a regular expression pattern,
|
set of input lines. Each set starts with a regular expression pattern,
|
||||||
followed by any number of subject lines to be matched against that pat-
|
followed by any number of subject lines to be matched against that pat-
|
||||||
tern. In between sets of test data, command lines that begin with a
|
tern. In between sets of test data, command lines that begin with # may
|
||||||
hash (#) character may appear. This file format, with some restric-
|
appear. This file format, with some restrictions, can also be processed
|
||||||
tions, can also be processed by the perltest.sh script that is distrib-
|
by the perltest.sh script that is distributed with PCRE2 as a means of
|
||||||
uted with PCRE2 as a means of checking that the behaviour of PCRE2 and
|
checking that the behaviour of PCRE2 and Perl is the same.
|
||||||
Perl is the same.
|
|
||||||
|
When the input is a terminal, pcre2test prompts for each line of input,
|
||||||
|
using "re>" to prompt for regular expression patterns, and "data>" to
|
||||||
|
prompt for subject lines. Command lines starting with # can be entered
|
||||||
|
only in response to the "re>" prompt.
|
||||||
|
|
||||||
Each subject line is matched separately and independently. If you want
|
Each subject line is matched separately and independently. If you want
|
||||||
to do multi-line matches, you have to use the \n escape sequence (or \r
|
to do multi-line matches, you have to use the \n escape sequence (or \r
|
||||||
|
@ -207,22 +211,28 @@ DESCRIPTION
|
||||||
|
|
||||||
COMMAND LINES
|
COMMAND LINES
|
||||||
|
|
||||||
In between sets of test data, a line that begins with a hash (#) char-
|
In between sets of test data, a line that begins with # is interpreted
|
||||||
acter is interpreted as a command line. If the first character is fol-
|
as a command line. If the first character is followed by white space or
|
||||||
lowed by white space or an exclamation mark, the line is treated as a
|
an exclamation mark, the line is treated as a comment, and ignored.
|
||||||
comment, and ignored. Otherwise, the following commands are recog-
|
Otherwise, the following commands are recognized:
|
||||||
nized:
|
|
||||||
|
|
||||||
#forbid_utf
|
#forbid_utf
|
||||||
|
|
||||||
Subsequent patterns automatically have the PCRE2_NEVER_UTF and
|
Subsequent patterns automatically have the PCRE2_NEVER_UTF and
|
||||||
PCRE2_NEVER_UCP options set, which locks out the use of UTF and Unicode
|
PCRE2_NEVER_UCP options set, which locks out the use of UTF and Unicode
|
||||||
property features. This is a trigger guard that is used in test files
|
property features. This is a trigger guard that is used in test files
|
||||||
to ensure that UTF/Unicode tests are not accidentally added to files
|
to ensure that UTF or Unicode property tests are not accidentally added
|
||||||
that are used when UTF support is not included in the library. This
|
to files that are used when Unicode support is not included in the
|
||||||
effect can also be obtained by the use of #pattern; the difference is
|
library. This effect can also be obtained by the use of #pattern; the
|
||||||
that #forbid_utf cannot be unset, and the automatic options are not
|
difference is that #forbid_utf cannot be unset, and the automatic
|
||||||
displayed in pattern information, to avoid cluttering up test output.
|
options are not displayed in pattern information, to avoid cluttering
|
||||||
|
up test output.
|
||||||
|
|
||||||
|
#load <filename>
|
||||||
|
|
||||||
|
This command is used to load a set of precompiled patterns from a file,
|
||||||
|
as described in the section entitled "Saving and restoring compiled
|
||||||
|
patterns" below.
|
||||||
|
|
||||||
#pattern <modifier-list>
|
#pattern <modifier-list>
|
||||||
|
|
||||||
|
@ -240,6 +250,18 @@ COMMAND LINES
|
||||||
#perltest command helps detect tests that are accidentally put in the
|
#perltest command helps detect tests that are accidentally put in the
|
||||||
wrong file.
|
wrong file.
|
||||||
|
|
||||||
|
#pop [<modifiers>]
|
||||||
|
|
||||||
|
This command is used to manipulate the stack of compiled patterns, as
|
||||||
|
described in the section entitled "Saving and restoring compiled pat-
|
||||||
|
terns" below.
|
||||||
|
|
||||||
|
#save <filename>
|
||||||
|
|
||||||
|
This command is used to save a set of compiled patterns to a file, as
|
||||||
|
described in the section entitled "Saving and restoring compiled pat-
|
||||||
|
terns" below.
|
||||||
|
|
||||||
#subject <modifier-list>
|
#subject <modifier-list>
|
||||||
|
|
||||||
This command sets a default modifier list that applies to all subse-
|
This command sets a default modifier list that applies to all subse-
|
||||||
|
@ -432,8 +454,8 @@ PATTERN MODIFIERS
|
||||||
memory show memory used
|
memory show memory used
|
||||||
newline=<type> set newline type
|
newline=<type> set newline type
|
||||||
parens_nest_limit=<n> set maximum parentheses depth
|
parens_nest_limit=<n> set maximum parentheses depth
|
||||||
perlcompat lock out non-Perl modifiers
|
|
||||||
posix use the POSIX API
|
posix use the POSIX API
|
||||||
|
push push compiled pattern onto the stack
|
||||||
stackguard=<number> test the stackguard feature
|
stackguard=<number> test the stackguard feature
|
||||||
tables=[0|1|2] select internal tables
|
tables=[0|1|2] select internal tables
|
||||||
|
|
||||||
|
@ -644,6 +666,19 @@ PATTERN MODIFIERS
|
||||||
These modifiers may not appear in a #pattern command. If you want them
|
These modifiers may not appear in a #pattern command. If you want them
|
||||||
as defaults, set them in a #subject command.
|
as defaults, set them in a #subject command.
|
||||||
|
|
||||||
|
Saving a compiled pattern
|
||||||
|
|
||||||
|
When a pattern with the push modifier is successfully compiled, it is
|
||||||
|
pushed onto a stack of compiled patterns, and pcre2test expects the
|
||||||
|
next line to contain a new pattern (or a command) instead of a subject
|
||||||
|
line. This facility is used when saving compiled patterns to a file, as
|
||||||
|
described in the section entitled "Saving and restoring compiled pat-
|
||||||
|
terns" below. The push modifier is incompatible with compilation modi-
|
||||||
|
fiers such as global that act at match time. Any that are specified are
|
||||||
|
ignored, with a warning message, except for replace, which causes an
|
||||||
|
error. Note that jitverify, which is allowed, does not carry through
|
||||||
|
to any subsequent matching that uses this pattern.
|
||||||
|
|
||||||
|
|
||||||
SUBJECT MODIFIERS
|
SUBJECT MODIFIERS
|
||||||
|
|
||||||
|
@ -1171,10 +1206,69 @@ NON-PRINTING CHARACTERS
|
||||||
characters.
|
characters.
|
||||||
|
|
||||||
|
|
||||||
|
SAVING AND RESTORING COMPILED PATTERNS
|
||||||
|
|
||||||
|
It is possible to save compiled patterns on disc or elsewhere, and
|
||||||
|
reload them later, subject to a number of restrictions. JIT data cannot
|
||||||
|
be saved. The host on which the patterns are reloaded must be running
|
||||||
|
the same version of PCRE2, with the same code unit width, and must also
|
||||||
|
have the same endianness, pointer width and PCRE2_SIZE type. Before
|
||||||
|
compiled patterns can be saved they must be serialized, that is, con-
|
||||||
|
verted to a stream of bytes. A single byte stream may contain any num-
|
||||||
|
ber of compiled patterns, but they must all use the same character
|
||||||
|
tables. A single copy of the tables is included in the byte stream (its
|
||||||
|
size is 1088 bytes).
|
||||||
|
|
||||||
|
The functions whose names begin with pcre2_serialize_ are used for
|
||||||
|
serializing and de-serializing. They are described in the pcre2serial-
|
||||||
|
ize documentation. In this section we describe the features of
|
||||||
|
pcre2test that can be used to test these functions.
|
||||||
|
|
||||||
|
When a pattern with the push modifier is successfully compiled, it is
|
||||||
|
pushed onto a stack of compiled patterns, and pcre2test expects the
|
||||||
|
next line to contain a new pattern (or command) instead of a subject
|
||||||
|
line. By this means, a number of patterns can be compiled and retained.
|
||||||
|
The push modifier is incompatible with posix, and control modifiers
|
||||||
|
that act at match time are ignored (with a message). The jitverify mod-
|
||||||
|
ifier applies only at compile time. The command
|
||||||
|
|
||||||
|
#save <filename>
|
||||||
|
|
||||||
|
causes all the stacked patterns to be serialized and the result written
|
||||||
|
to the named file. Afterwards, all the stacked patterns are freed. The
|
||||||
|
command
|
||||||
|
|
||||||
|
#load <filename>
|
||||||
|
|
||||||
|
reads the data in the file, and then arranges for it to be de-serial-
|
||||||
|
ized, with the resulting compiled patterns added to the pattern stack.
|
||||||
|
The pattern on the top of the stack can be retrieved by the #pop com-
|
||||||
|
mand, which must be followed by lines of subjects that are to be
|
||||||
|
matched with the pattern, terminated as usual by an empty line or end
|
||||||
|
of file. This command may be followed by a modifier list containing
|
||||||
|
only control modifiers that act after a pattern has been compiled. In
|
||||||
|
particular, hex, posix, and push are not allowed, nor are any option-
|
||||||
|
setting modifiers. The JIT modifiers are, however, permitted. Here is
|
||||||
|
an example that saves and reloads two patterns.
|
||||||
|
|
||||||
|
/abc/push
|
||||||
|
/xyz/push
|
||||||
|
#save tempfile
|
||||||
|
#load tempfile
|
||||||
|
#pop info
|
||||||
|
xyz
|
||||||
|
|
||||||
|
#pop jit,bincode
|
||||||
|
abc
|
||||||
|
|
||||||
|
If jitverify is used with #pop, it does not automatically imply jit,
|
||||||
|
which is different behaviour from when it is used on a pattern.
|
||||||
|
|
||||||
|
|
||||||
SEE ALSO
|
SEE ALSO
|
||||||
|
|
||||||
pcre2(3), pcre2api(3), pcre2callout(3), pcre2jit, pcre2matching(3),
|
pcre2(3), pcre2api(3), pcre2callout(3), pcre2jit(3), pcre2matching(3),
|
||||||
pcre2partial(d), pcre2pattern(3).
|
pcre2partial(3), pcre2pattern(3), pcre2serialize(3).
|
||||||
|
|
||||||
|
|
||||||
AUTHOR
|
AUTHOR
|
||||||
|
@ -1186,5 +1280,5 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 02 January 2015
|
Last updated: 23 January 2015
|
||||||
Copyright (c) 1997-2015 University of Cambridge.
|
Copyright (c) 1997-2015 University of Cambridge.
|
||||||
|
|
|
@ -293,7 +293,7 @@ if [ $usevalgrind -ne 0 ]; then
|
||||||
|
|
||||||
for opts in \
|
for opts in \
|
||||||
"--disable-stack-for-recursion --disable-shared" \
|
"--disable-stack-for-recursion --disable-shared" \
|
||||||
"--with-link-size=3 --disable-shared" \
|
"--with-link-size=3 --enable-pcre2-16 --enable-pcre2-32 --disable-shared" \
|
||||||
"--disable-unicode --disable-shared"
|
"--disable-unicode --disable-shared"
|
||||||
do
|
do
|
||||||
opts="--enable-valgrind $opts"
|
opts="--enable-valgrind $opts"
|
||||||
|
|
|
@ -200,7 +200,7 @@ sure both macros are undefined; an emulation function will then be used. */
|
||||||
#define PACKAGE_NAME "PCRE2"
|
#define PACKAGE_NAME "PCRE2"
|
||||||
|
|
||||||
/* Define to the full name and version of this package. */
|
/* Define to the full name and version of this package. */
|
||||||
#define PACKAGE_STRING "PCRE2 10.00"
|
#define PACKAGE_STRING "PCRE2 10.10-RC1"
|
||||||
|
|
||||||
/* Define to the one symbol short name of this package. */
|
/* Define to the one symbol short name of this package. */
|
||||||
#define PACKAGE_TARNAME "pcre2"
|
#define PACKAGE_TARNAME "pcre2"
|
||||||
|
@ -209,7 +209,7 @@ sure both macros are undefined; an emulation function will then be used. */
|
||||||
#define PACKAGE_URL ""
|
#define PACKAGE_URL ""
|
||||||
|
|
||||||
/* Define to the version of this package. */
|
/* Define to the version of this package. */
|
||||||
#define PACKAGE_VERSION "10.00"
|
#define PACKAGE_VERSION "10.10-RC1"
|
||||||
|
|
||||||
/* The value of PARENS_NEST_LIMIT specifies the maximum depth of nested
|
/* The value of PARENS_NEST_LIMIT specifies the maximum depth of nested
|
||||||
parentheses (of any kind) in a pattern. This limits the amount of system
|
parentheses (of any kind) in a pattern. This limits the amount of system
|
||||||
|
@ -287,7 +287,7 @@ sure both macros are undefined; an emulation function will then be used. */
|
||||||
/* #undef SUPPORT_VALGRIND */
|
/* #undef SUPPORT_VALGRIND */
|
||||||
|
|
||||||
/* Version number of package */
|
/* Version number of package */
|
||||||
#define VERSION "10.00"
|
#define VERSION "10.10-RC1"
|
||||||
|
|
||||||
/* Define to empty if `const' does not conform to ANSI C. */
|
/* Define to empty if `const' does not conform to ANSI C. */
|
||||||
/* #undef const */
|
/* #undef const */
|
||||||
|
|
|
@ -42,9 +42,9 @@ POSSIBILITY OF SUCH DAMAGE.
|
||||||
/* The current PCRE version information. */
|
/* The current PCRE version information. */
|
||||||
|
|
||||||
#define PCRE2_MAJOR 10
|
#define PCRE2_MAJOR 10
|
||||||
#define PCRE2_MINOR 00
|
#define PCRE2_MINOR 10
|
||||||
#define PCRE2_PRERELEASE
|
#define PCRE2_PRERELEASE -RC1
|
||||||
#define PCRE2_DATE 2014-01-05
|
#define PCRE2_DATE 2015-01-13
|
||||||
|
|
||||||
/* When an application links to a PCRE DLL in Windows, the symbols that are
|
/* When an application links to a PCRE DLL in Windows, the symbols that are
|
||||||
imported have to be identified as such. When building PCRE2, the appropriate
|
imported have to be identified as such. When building PCRE2, the appropriate
|
||||||
|
@ -455,6 +455,18 @@ PCRE2_EXP_DECL void pcre2_substring_list_free(PCRE2_SPTR *); \
|
||||||
PCRE2_EXP_DECL int pcre2_substring_list_get(pcre2_match_data *, \
|
PCRE2_EXP_DECL int pcre2_substring_list_get(pcre2_match_data *, \
|
||||||
PCRE2_UCHAR ***, PCRE2_SIZE **);
|
PCRE2_UCHAR ***, PCRE2_SIZE **);
|
||||||
|
|
||||||
|
/* Functions for serializing / deserializing compiled patterns. */
|
||||||
|
|
||||||
|
#define PCRE2_SERIALIZE_FUNCTIONS \
|
||||||
|
PCRE2_EXP_DECL int32_t pcre2_serialize_encode(const pcre2_code **, \
|
||||||
|
int32_t, uint8_t **, PCRE2_SIZE *, \
|
||||||
|
pcre2_general_context *); \
|
||||||
|
PCRE2_EXP_DECL int32_t pcre2_serialize_decode(pcre2_code **, int32_t, \
|
||||||
|
const uint8_t *, pcre2_general_context *); \
|
||||||
|
PCRE2_EXP_DECL int32_t pcre2_serialize_get_number_of_codes(const uint8_t *); \
|
||||||
|
PCRE2_EXP_DECL void pcre2_serialize_free(uint8_t *);
|
||||||
|
|
||||||
|
|
||||||
/* Convenience function for match + substitute. */
|
/* Convenience function for match + substitute. */
|
||||||
|
|
||||||
|
@ -560,6 +572,10 @@ pcre2_compile are called by application code. */
|
||||||
#define pcre2_match_data_create_from_pattern PCRE2_SUFFIX(pcre2_match_data_create_from_pattern_)
|
#define pcre2_match_data_create_from_pattern PCRE2_SUFFIX(pcre2_match_data_create_from_pattern_)
|
||||||
#define pcre2_match_data_free PCRE2_SUFFIX(pcre2_match_data_free_)
|
#define pcre2_match_data_free PCRE2_SUFFIX(pcre2_match_data_free_)
|
||||||
#define pcre2_pattern_info PCRE2_SUFFIX(pcre2_pattern_info_)
|
#define pcre2_pattern_info PCRE2_SUFFIX(pcre2_pattern_info_)
|
||||||
|
#define pcre2_serialize_decode PCRE2_SUFFIX(pcre2_serialize_decode_)
|
||||||
|
#define pcre2_serialize_encode PCRE2_SUFFIX(pcre2_serialize_encode_)
|
||||||
|
#define pcre2_serialize_free PCRE2_SUFFIX(pcre2_serialize_free_)
|
||||||
|
#define pcre2_serialize_get_number_of_codes PCRE2_SUFFIX(pcre2_serialize_get_number_of_codes_)
|
||||||
#define pcre2_set_bsr PCRE2_SUFFIX(pcre2_set_bsr_)
|
#define pcre2_set_bsr PCRE2_SUFFIX(pcre2_set_bsr_)
|
||||||
#define pcre2_set_callout PCRE2_SUFFIX(pcre2_set_callout_)
|
#define pcre2_set_callout PCRE2_SUFFIX(pcre2_set_callout_)
|
||||||
#define pcre2_set_character_tables PCRE2_SUFFIX(pcre2_set_character_tables_)
|
#define pcre2_set_character_tables PCRE2_SUFFIX(pcre2_set_character_tables_)
|
||||||
|
@ -596,8 +612,9 @@ PCRE2_MATCH_CONTEXT_FUNCTIONS \
|
||||||
PCRE2_COMPILE_FUNCTIONS \
|
PCRE2_COMPILE_FUNCTIONS \
|
||||||
PCRE2_PATTERN_INFO_FUNCTIONS \
|
PCRE2_PATTERN_INFO_FUNCTIONS \
|
||||||
PCRE2_MATCH_FUNCTIONS \
|
PCRE2_MATCH_FUNCTIONS \
|
||||||
PCRE2_SUBSTITUTE_FUNCTION \
|
|
||||||
PCRE2_SUBSTRING_FUNCTIONS \
|
PCRE2_SUBSTRING_FUNCTIONS \
|
||||||
|
PCRE2_SERIALIZE_FUNCTIONS \
|
||||||
|
PCRE2_SUBSTITUTE_FUNCTION \
|
||||||
PCRE2_JIT_FUNCTIONS \
|
PCRE2_JIT_FUNCTIONS \
|
||||||
PCRE2_OTHER_FUNCTIONS
|
PCRE2_OTHER_FUNCTIONS
|
||||||
|
|
||||||
|
@ -625,6 +642,8 @@ PCRE2_TYPES_STRUCTURES_AND_FUNCTIONS
|
||||||
#undef PCRE2_PATTERN_INFO_FUNCTIONS
|
#undef PCRE2_PATTERN_INFO_FUNCTIONS
|
||||||
#undef PCRE2_MATCH_FUNCTIONS
|
#undef PCRE2_MATCH_FUNCTIONS
|
||||||
#undef PCRE2_SUBSTRING_FUNCTIONS
|
#undef PCRE2_SUBSTRING_FUNCTIONS
|
||||||
|
#undef PCRE2_SERIALIZE_FUNCTIONS
|
||||||
|
#undef PCRE2_SUBSTITUTE_FUNCTION
|
||||||
#undef PCRE2_JIT_FUNCTIONS
|
#undef PCRE2_JIT_FUNCTIONS
|
||||||
#undef PCRE2_OTHER_FUNCTIONS
|
#undef PCRE2_OTHER_FUNCTIONS
|
||||||
#undef PCRE2_TYPES_STRUCTURES_AND_FUNCTIONS
|
#undef PCRE2_TYPES_STRUCTURES_AND_FUNCTIONS
|
||||||
|
|
|
@ -198,11 +198,13 @@ greater than zero. */
|
||||||
#define PCRE2_ERROR_UTF32_ERR1 (-27)
|
#define PCRE2_ERROR_UTF32_ERR1 (-27)
|
||||||
#define PCRE2_ERROR_UTF32_ERR2 (-28)
|
#define PCRE2_ERROR_UTF32_ERR2 (-28)
|
||||||
|
|
||||||
/* Error codes for pcre2[_dfa]_match(), substring extraction functions, and
|
/* Error codes for pcre2[_dfa]_match(), substring extraction functions, context
|
||||||
context functions. */
|
functions, and serializing functions. They are in numerical order. Originally
|
||||||
|
they were in alphabetical order too, but now that PCRE2 is released, the
|
||||||
|
numbers must not be changed. */
|
||||||
|
|
||||||
#define PCRE2_ERROR_BADDATA (-29)
|
#define PCRE2_ERROR_BADDATA (-29)
|
||||||
#define PCRE2_ERROR_BADLENGTH (-30)
|
#define PCRE2_ERROR_MIXEDTABLES (-30) /* Name was changed */
|
||||||
#define PCRE2_ERROR_BADMAGIC (-31)
|
#define PCRE2_ERROR_BADMAGIC (-31)
|
||||||
#define PCRE2_ERROR_BADMODE (-32)
|
#define PCRE2_ERROR_BADMODE (-32)
|
||||||
#define PCRE2_ERROR_BADOFFSET (-33)
|
#define PCRE2_ERROR_BADOFFSET (-33)
|
||||||
|
@ -455,6 +457,17 @@ PCRE2_EXP_DECL void pcre2_substring_list_free(PCRE2_SPTR *); \
|
||||||
PCRE2_EXP_DECL int pcre2_substring_list_get(pcre2_match_data *, \
|
PCRE2_EXP_DECL int pcre2_substring_list_get(pcre2_match_data *, \
|
||||||
PCRE2_UCHAR ***, PCRE2_SIZE **);
|
PCRE2_UCHAR ***, PCRE2_SIZE **);
|
||||||
|
|
||||||
|
/* Functions for serializing / deserializing compiled patterns. */
|
||||||
|
|
||||||
|
#define PCRE2_SERIALIZE_FUNCTIONS \
|
||||||
|
PCRE2_EXP_DECL int32_t pcre2_serialize_encode(const pcre2_code **, \
|
||||||
|
int32_t, uint8_t **, PCRE2_SIZE *, \
|
||||||
|
pcre2_general_context *); \
|
||||||
|
PCRE2_EXP_DECL int32_t pcre2_serialize_decode(pcre2_code **, int32_t, \
|
||||||
|
const uint8_t *, pcre2_general_context *); \
|
||||||
|
PCRE2_EXP_DECL int32_t pcre2_serialize_get_number_of_codes(const uint8_t *); \
|
||||||
|
PCRE2_EXP_DECL void pcre2_serialize_free(uint8_t *);
|
||||||
|
|
||||||
|
|
||||||
/* Convenience function for match + substitute. */
|
/* Convenience function for match + substitute. */
|
||||||
|
|
||||||
|
@ -560,6 +573,10 @@ pcre2_compile are called by application code. */
|
||||||
#define pcre2_match_data_create_from_pattern PCRE2_SUFFIX(pcre2_match_data_create_from_pattern_)
|
#define pcre2_match_data_create_from_pattern PCRE2_SUFFIX(pcre2_match_data_create_from_pattern_)
|
||||||
#define pcre2_match_data_free PCRE2_SUFFIX(pcre2_match_data_free_)
|
#define pcre2_match_data_free PCRE2_SUFFIX(pcre2_match_data_free_)
|
||||||
#define pcre2_pattern_info PCRE2_SUFFIX(pcre2_pattern_info_)
|
#define pcre2_pattern_info PCRE2_SUFFIX(pcre2_pattern_info_)
|
||||||
|
#define pcre2_serialize_decode PCRE2_SUFFIX(pcre2_serialize_decode_)
|
||||||
|
#define pcre2_serialize_encode PCRE2_SUFFIX(pcre2_serialize_encode_)
|
||||||
|
#define pcre2_serialize_free PCRE2_SUFFIX(pcre2_serialize_free_)
|
||||||
|
#define pcre2_serialize_get_number_of_codes PCRE2_SUFFIX(pcre2_serialize_get_number_of_codes_)
|
||||||
#define pcre2_set_bsr PCRE2_SUFFIX(pcre2_set_bsr_)
|
#define pcre2_set_bsr PCRE2_SUFFIX(pcre2_set_bsr_)
|
||||||
#define pcre2_set_callout PCRE2_SUFFIX(pcre2_set_callout_)
|
#define pcre2_set_callout PCRE2_SUFFIX(pcre2_set_callout_)
|
||||||
#define pcre2_set_character_tables PCRE2_SUFFIX(pcre2_set_character_tables_)
|
#define pcre2_set_character_tables PCRE2_SUFFIX(pcre2_set_character_tables_)
|
||||||
|
@ -596,8 +613,9 @@ PCRE2_MATCH_CONTEXT_FUNCTIONS \
|
||||||
PCRE2_COMPILE_FUNCTIONS \
|
PCRE2_COMPILE_FUNCTIONS \
|
||||||
PCRE2_PATTERN_INFO_FUNCTIONS \
|
PCRE2_PATTERN_INFO_FUNCTIONS \
|
||||||
PCRE2_MATCH_FUNCTIONS \
|
PCRE2_MATCH_FUNCTIONS \
|
||||||
PCRE2_SUBSTITUTE_FUNCTION \
|
|
||||||
PCRE2_SUBSTRING_FUNCTIONS \
|
PCRE2_SUBSTRING_FUNCTIONS \
|
||||||
|
PCRE2_SERIALIZE_FUNCTIONS \
|
||||||
|
PCRE2_SUBSTITUTE_FUNCTION \
|
||||||
PCRE2_JIT_FUNCTIONS \
|
PCRE2_JIT_FUNCTIONS \
|
||||||
PCRE2_OTHER_FUNCTIONS
|
PCRE2_OTHER_FUNCTIONS
|
||||||
|
|
||||||
|
@ -625,6 +643,8 @@ PCRE2_TYPES_STRUCTURES_AND_FUNCTIONS
|
||||||
#undef PCRE2_PATTERN_INFO_FUNCTIONS
|
#undef PCRE2_PATTERN_INFO_FUNCTIONS
|
||||||
#undef PCRE2_MATCH_FUNCTIONS
|
#undef PCRE2_MATCH_FUNCTIONS
|
||||||
#undef PCRE2_SUBSTRING_FUNCTIONS
|
#undef PCRE2_SUBSTRING_FUNCTIONS
|
||||||
|
#undef PCRE2_SERIALIZE_FUNCTIONS
|
||||||
|
#undef PCRE2_SUBSTITUTE_FUNCTION
|
||||||
#undef PCRE2_JIT_FUNCTIONS
|
#undef PCRE2_JIT_FUNCTIONS
|
||||||
#undef PCRE2_OTHER_FUNCTIONS
|
#undef PCRE2_OTHER_FUNCTIONS
|
||||||
#undef PCRE2_TYPES_STRUCTURES_AND_FUNCTIONS
|
#undef PCRE2_TYPES_STRUCTURES_AND_FUNCTIONS
|
||||||
|
|
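The declarations above give only the signatures. In pcre2_serialize.c, which is added later in this commit, pcre2_serialize_encode() sizes its output as the header, plus one copy of the character tables, plus the blocksize of every pattern. Both the encoder and the decoder then step through the patterns using each one's blocksize. The packing step can be sketched on its own with a hypothetical record type that, like pcre2_real_code, stores its own total size. The header and tables are left out, and record, make_record(), pack_records() and record_id_at() are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable-length record: like pcre2_real_code, it stores its
   own total size in bytes (blocksize), including this header. */
typedef struct {
  size_t blocksize;
  int id;                      /* stands in for the compiled pattern */
} record;

/* Allocate a zero-filled record with 'extra' trailing bytes, so that no byte
   of the block is left uninitialized. */
static record *make_record(int id, size_t extra)
{
  size_t size = sizeof(record) + extra;
  record *r = calloc(1, size);
  if (r != NULL) { r->blocksize = size; r->id = id; }
  return r;
}

/* Concatenate n records into one malloc'd buffer and store the total byte
   count in *size. Returns NULL for an empty set or on allocation failure. */
static uint8_t *pack_records(const record *const *recs, int n, size_t *size)
{
  size_t total = 0;
  if (n <= 0) return NULL;
  for (int i = 0; i < n; i++) total += recs[i]->blocksize;
  uint8_t *buf = malloc(total);
  if (buf == NULL) return NULL;
  uint8_t *dst = buf;
  for (int i = 0; i < n; i++)
  {
    memcpy(dst, recs[i], recs[i]->blocksize);
    dst += recs[i]->blocksize;     /* the next record starts right after */
  }
  *size = total;
  return buf;
}

/* Return the id of record k (k must be less than the number packed) by
   stepping over the earlier blocks one blocksize at a time. */
static int record_id_at(const uint8_t *buf, int k)
{
  record r;
  for (;;)
  {
    memcpy(&r, buf, sizeof r);     /* memcpy avoids misaligned access */
    if (k-- == 0) return r.id;
    buf += r.blocksize;
  }
}
```

Because make_record() zero-fills each block with calloc(), pack_records() never copies uninitialized bytes. This is the same concern that motivated the PCRE2_INFO_SIZE and name-table changes in this commit.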
|
@ -683,10 +683,28 @@ static const uint8_t opcode_possessify[] = {
|
||||||
PCRE2_EXP_DEFN void PCRE2_CALL_CONVENTION
|
PCRE2_EXP_DEFN void PCRE2_CALL_CONVENTION
|
||||||
pcre2_code_free(pcre2_code *code)
|
pcre2_code_free(pcre2_code *code)
|
||||||
{
|
{
|
||||||
|
PCRE2_SIZE* ref_count;
|
||||||
|
|
||||||
if (code != NULL)
|
if (code != NULL)
|
||||||
{
|
{
|
||||||
if (code->executable_jit != NULL)
|
if (code->executable_jit != NULL)
|
||||||
PRIV(jit_free)(code->executable_jit, &code->memctl);
|
PRIV(jit_free)(code->executable_jit, &code->memctl);
|
||||||
|
|
||||||
|
if ((code->flags & PCRE2_DEREF_TABLES) != 0)
|
||||||
|
{
|
||||||
|
/* Decoded tables belong to the codes after deserialization, and they must
|
||||||
|
be freed when there are no more references to them. The *ref_count should
|
||||||
|
always be > 0. */
|
||||||
|
|
||||||
|
ref_count = (PCRE2_SIZE *)(code->tables + tables_length);
|
||||||
|
if (*ref_count > 0)
|
||||||
|
{
|
||||||
|
(*ref_count)--;
|
||||||
|
if (*ref_count == 0)
|
||||||
|
code->memctl.free((void *)code->tables, code->memctl.memory_data);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
code->memctl.free(code, code->memctl.memory_data);
|
code->memctl.free(code, code->memctl.memory_data);
|
||||||
}
|
}
|
||||||
}
|
}
|
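All the patterns produced by pcre2_serialize_decode() point at one shared copy of the character tables. A PCRE2_SIZE reference count is stored immediately after the table bytes, and pcre2_code_free() decrements it, freeing the tables only when the count reaches zero. The following is a minimal sketch of that scheme, using plain malloc() and free() in place of the memctl allocator. TABLES_LENGTH is a placeholder for PCRE2's internal tables_length. The count is read and written with memcpy() so that the sketch does not depend on the alignment of that offset.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder size; PCRE2 uses its internal tables_length. */
#define TABLES_LENGTH 64

/* Allocate a tables block followed by a reference count of 'refs', as the
   decoder does (it sets the count to the number of decoded patterns). */
static unsigned char *tables_alloc(size_t refs)
{
  unsigned char *t = malloc(TABLES_LENGTH + sizeof(size_t));
  if (t == NULL) return NULL;
  memset(t, 0, TABLES_LENGTH);
  memcpy(t + TABLES_LENGTH, &refs, sizeof refs);
  return t;
}

/* Drop one reference, mirroring the PCRE2_DEREF_TABLES branch of
   pcre2_code_free(). Frees the block when the count reaches zero and
   returns the number of references that remain. */
static size_t tables_release(unsigned char *t)
{
  size_t refs;
  memcpy(&refs, t + TABLES_LENGTH, sizeof refs);
  if (refs > 0)
  {
    refs--;
    if (refs == 0)
    {
      free(t);                     /* last user gone: release the tables */
      return 0;
    }
    memcpy(t + TABLES_LENGTH, &refs, sizeof refs);
  }
  return refs;
}
```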
||||||
|
@ -7317,8 +7335,14 @@ for (i = 0; i < cb->names_found; i++)
|
||||||
|
|
||||||
PUT2(slot, 0, groupno);
|
PUT2(slot, 0, groupno);
|
||||||
memcpy(slot + IMM2_SIZE, name, CU2BYTES(length));
|
memcpy(slot + IMM2_SIZE, name, CU2BYTES(length));
|
||||||
slot[IMM2_SIZE + length] = 0;
|
|
||||||
cb->names_found++;
|
cb->names_found++;
|
||||||
|
|
||||||
|
/* Add a terminating zero and fill the rest of the slot with zeroes so that
|
||||||
|
the memory is all initialized. Otherwise valgrind moans about uninitialized
|
||||||
|
memory when saving serialized compiled patterns. */
|
||||||
|
|
||||||
|
memset(slot + IMM2_SIZE + length, 0,
|
||||||
|
CU2BYTES(cb->name_entry_size - length - IMM2_SIZE));
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
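Each entry in the group-name table is name_entry_size code units long. It holds a two-unit group number, the name, a terminating zero, and padding. The new memset() zeroes everything from the end of the name to the end of the slot, so it covers both the terminator and the padding. The sketch below targets the 8-bit library, where a code unit is one byte, and assumes the high-byte-first layout of PCRE2's PUT2 macro in that mode. fill_name_slot() is a hypothetical helper, not a PCRE2 function.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define IMM2_SIZE 2   /* group number occupies two code units */

/* Fill one name-table entry of entry_size bytes: group number (high byte
   first), then the name, then zeroes to the end of the slot. Assumes
   entry_size >= IMM2_SIZE + length + 1, so a terminating zero always fits. */
static void fill_name_slot(uint8_t *slot, size_t entry_size, unsigned groupno,
  const char *name, size_t length)
{
  slot[0] = (uint8_t)(groupno >> 8);
  slot[1] = (uint8_t)(groupno & 255);
  memcpy(slot + IMM2_SIZE, name, length);
  /* Terminator plus padding: every byte of the slot is now initialized. */
  memset(slot + IMM2_SIZE + length, 0, entry_size - length - IMM2_SIZE);
}
```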
||||||
|
@ -7356,6 +7380,7 @@ PCRE2_SPTR codestart; /* Start of compiled code */
|
||||||
PCRE2_SPTR ptr; /* Current pointer in pattern */
|
PCRE2_SPTR ptr; /* Current pointer in pattern */
|
||||||
|
|
||||||
size_t length = 1; /* Allow or final END opcode */
|
size_t length = 1; /* Allow for final END opcode */
|
||||||
|
size_t usedlength; /* Actual length used */
|
||||||
size_t re_blocksize; /* Size of memory block */
|
size_t re_blocksize; /* Size of memory block */
|
||||||
|
|
||||||
int32_t firstcuflags, reqcuflags; /* Type of first/req code unit */
|
int32_t firstcuflags, reqcuflags; /* Type of first/req code unit */
|
||||||
|
@ -7754,13 +7779,16 @@ overflow. */
|
||||||
|
|
||||||
if (errorcode == 0 && ptr < cb.end_pattern) errorcode = ERR22;
|
if (errorcode == 0 && ptr < cb.end_pattern) errorcode = ERR22;
|
||||||
*code++ = OP_END;
|
*code++ = OP_END;
|
||||||
if ((size_t)(code - codestart) > length) errorcode = ERR23;
|
usedlength = code - codestart;
|
||||||
|
if (usedlength > length) errorcode = ERR23;
|
||||||
|
|
||||||
|
/* If the estimated length exceeds the length actually used, adjust the value of
|
||||||
|
re->blocksize, and if valgrind support is configured, mark the extra allocated
|
||||||
|
memory as unaddressable, so that any out-of-bounds reads can be detected. */
|
||||||
|
|
||||||
|
re->blocksize -= CU2BYTES(length - usedlength);
|
||||||
#ifdef SUPPORT_VALGRIND
|
#ifdef SUPPORT_VALGRIND
|
||||||
/* If the estimated length exceeds the really used length, mark the extra
|
VALGRIND_MAKE_MEM_NOACCESS(code, CU2BYTES(length - usedlength));
|
||||||
allocated memory as unaddressable, so that any out-of-bound reads can be
|
|
||||||
detected. */
|
|
||||||
VALGRIND_MAKE_MEM_NOACCESS(code, (length - (code - codestart)) * sizeof(PCRE2_UCHAR));
|
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
/* Fill in any forward references that are required. There may be repeated
|
/* Fill in any forward references that are required. There may be repeated
|
||||||
|
|
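Outside the valgrind block, the change above now also reduces re->blocksize by CU2BYTES(length - usedlength). As a result, PCRE2_INFO_SIZE and the serializer no longer include the unused tail of the length estimate. The arithmetic is shown below as a standalone sketch. The function names are hypothetical, and the sample numbers are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Convert code units to bytes; cu_width is 1, 2 or 4 for the 8-, 16- and
   32-bit libraries, which is what CU2BYTES() accounts for. */
static size_t cu2bytes(size_t code_units, size_t cu_width)
{
  return code_units * cu_width;
}

/* Return the block size after giving back code units that were estimated
   but not used. Returns 0 if more was used than estimated; the real code
   treats that case as error ERR23. */
static size_t trimmed_blocksize(size_t blocksize, size_t estimated_cu,
  size_t used_cu, size_t cu_width)
{
  if (used_cu > estimated_cu) return 0;
  return blocksize - cu2bytes(estimated_cu - used_cu, cu_width);
}
```

Consider a 16-bit pattern (2 bytes per code unit) that was estimated at 100 code units but used 90. For this case, trimmed_blocksize(1000, 100, 90, 2) gives back 20 bytes and returns 980.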
|
@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
|
||||||
|
|
||||||
Written by Philip Hazel
|
Written by Philip Hazel
|
||||||
Original API code Copyright (c) 1997-2012 University of Cambridge
|
Original API code Copyright (c) 1997-2012 University of Cambridge
|
||||||
New API code Copyright (c) 2014 University of Cambridge
|
New API code Copyright (c) 2015 University of Cambridge
|
||||||
|
|
||||||
-----------------------------------------------------------------------------
|
-----------------------------------------------------------------------------
|
||||||
Redistribution and use in source and binary forms, with or without
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
@ -200,7 +200,7 @@ static const char match_error_texts[] =
|
||||||
"UTF-32 error: code points greater than 0x10ffff are not defined\0"
|
"UTF-32 error: code points greater than 0x10ffff are not defined\0"
|
||||||
"bad data value\0"
|
"bad data value\0"
|
||||||
/* 30 */
|
/* 30 */
|
||||||
"bad length\0"
|
"patterns do not all use the same character tables\0"
|
||||||
"magic number missing\0"
|
"magic number missing\0"
|
||||||
"pattern compiled in wrong mode: 8/16/32-bit error\0"
|
"pattern compiled in wrong mode: 8/16/32-bit error\0"
|
||||||
"bad offset value\0"
|
"bad offset value\0"
|
||||||
|
|
|
@ -523,6 +523,7 @@ bytes in a code unit in that mode. */
|
||||||
#define PCRE2_NL_SET 0x00008000 /* newline was set in the pattern */
|
#define PCRE2_NL_SET 0x00008000 /* newline was set in the pattern */
|
||||||
#define PCRE2_NOTEMPTY_SET 0x00010000 /* (*NOTEMPTY) used ) keep */
|
#define PCRE2_NOTEMPTY_SET 0x00010000 /* (*NOTEMPTY) used ) keep */
|
||||||
#define PCRE2_NE_ATST_SET 0x00020000 /* (*NOTEMPTY_ATSTART) used) together */
|
#define PCRE2_NE_ATST_SET 0x00020000 /* (*NOTEMPTY_ATSTART) used) together */
|
||||||
|
#define PCRE2_DEREF_TABLES 0x00040000 /* Release character tables. */
|
||||||
|
|
||||||
#define PCRE2_MODE_MASK (PCRE2_MODE8 | PCRE2_MODE16 | PCRE2_MODE32)
|
#define PCRE2_MODE_MASK (PCRE2_MODE8 | PCRE2_MODE16 | PCRE2_MODE32)
|
||||||
|
|
||||||
|
@ -1763,6 +1764,15 @@ typedef struct {
|
||||||
#define UCD_CASESET(ch) GET_UCD(ch)->caseset
|
#define UCD_CASESET(ch) GET_UCD(ch)->caseset
|
||||||
#define UCD_OTHERCASE(ch) ((uint32_t)((int)ch + (int)(GET_UCD(ch)->other_case)))
|
#define UCD_OTHERCASE(ch) ((uint32_t)((int)ch + (int)(GET_UCD(ch)->other_case)))
|
||||||
|
|
||||||
|
/* Header for serialized pcre2 codes. */
|
||||||
|
|
||||||
|
typedef struct pcre2_serialized_data {
|
||||||
|
uint32_t magic;
|
||||||
|
uint32_t version;
|
||||||
|
uint32_t config;
|
||||||
|
int32_t number_of_codes;
|
||||||
|
} pcre2_serialized_data;
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
/* ----------------- Items that need PCRE2_CODE_UNIT_WIDTH ----------------- */
|
/* ----------------- Items that need PCRE2_CODE_UNIT_WIDTH ----------------- */
|
||||||
|
|
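The pcre2_serialized_data header lets the decoder reject data that it cannot use. In pcre2_serialize.c below, the magic number is 0x50523253, whose bytes are the ASCII codes of "PR2S". SERIALIZED_DATA_VERSION places PCRE2_MAJOR in the low 16 bits and PCRE2_MINOR above them. SERIALIZED_DATA_CONFIG packs the code-unit size, pointer size and PCRE2_SIZE size into successive bytes. The sketch below reproduces that packing and the corresponding checks. The helper names are hypothetical, and the error values are copied from pcre2.h in this commit. The test values assume version 10.10 of the 8-bit library on a platform with 8-byte pointers and size_t.

```c
#include <assert.h>
#include <stdint.h>

#define SERIALIZED_DATA_MAGIC 0x50523253u   /* "PR2S" */

/* Error values as defined in pcre2.h in this commit. */
#define ERROR_BADMAGIC (-31)
#define ERROR_BADMODE  (-32)

/* Same fields as pcre2_serialized_data. */
typedef struct {
  uint32_t magic;
  uint32_t version;
  uint32_t config;
  int32_t number_of_codes;
} serialized_header;

/* Pack the version as SERIALIZED_DATA_VERSION does. */
static uint32_t pack_version(uint32_t major, uint32_t minor)
{
  return major | (minor << 16);
}

/* Pack the build configuration as SERIALIZED_DATA_CONFIG does. */
static uint32_t pack_config(uint32_t cu_size, uint32_t ptr_size,
  uint32_t size_size)
{
  return cu_size | (ptr_size << 8) | (size_size << 16);
}

/* Mirror the checks in pcre2_serialize_get_number_of_codes(): return the
   number of codes, or a negative error code for foreign or junk data. */
static int32_t check_header(const serialized_header *h, uint32_t version,
  uint32_t config)
{
  if (h->magic != SERIALIZED_DATA_MAGIC) return ERROR_BADMAGIC;
  if (h->version != version || h->config != config) return ERROR_BADMODE;
  return h->number_of_codes;
}
```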
|
@ -0,0 +1,251 @@
|
||||||
|
/*************************************************
|
||||||
|
* Perl-Compatible Regular Expressions *
|
||||||
|
*************************************************/
|
||||||
|
|
||||||
|
/* PCRE is a library of functions to support regular expressions whose syntax
|
||||||
|
and semantics are as close as possible to those of the Perl 5 language.
|
||||||
|
|
||||||
|
Written by Philip Hazel
|
||||||
|
Original API code Copyright (c) 1997-2012 University of Cambridge
|
||||||
|
New API code Copyright (c) 2015 University of Cambridge
|
||||||
|
|
||||||
|
-----------------------------------------------------------------------------
|
||||||
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
modification, are permitted provided that the following conditions are met:
|
||||||
|
|
||||||
|
* Redistributions of source code must retain the above copyright notice,
|
||||||
|
this list of conditions and the following disclaimer.
|
||||||
|
|
||||||
|
* Redistributions in binary form must reproduce the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer in the
|
||||||
|
documentation and/or other materials provided with the distribution.
|
||||||
|
|
||||||
|
* Neither the name of the University of Cambridge nor the names of its
|
||||||
|
contributors may be used to endorse or promote products derived from
|
||||||
|
this software without specific prior written permission.
|
||||||
|
|
||||||
|
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
|
||||||
|
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
||||||
|
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
||||||
|
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
|
||||||
|
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
|
||||||
|
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
|
||||||
|
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
|
||||||
|
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
|
||||||
|
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
|
||||||
|
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
|
||||||
|
POSSIBILITY OF SUCH DAMAGE.
|
||||||
|
-----------------------------------------------------------------------------
|
||||||
|
*/
|
||||||
|
|
||||||
|
/* This module contains functions for serializing and deserializing
|
||||||
|
a sequence of compiled codes. */
|
||||||
|
|
||||||
|
|
||||||
|
#ifdef HAVE_CONFIG_H
|
||||||
|
#include "config.h"
|
||||||
|
#endif
|
||||||
|
|
||||||
|
|
||||||
|
#include "pcre2_internal.h"
|
||||||
|
|
||||||
|
/* Magic number to provide a small check against being handed junk. */
|
||||||
|
|
||||||
|
#define SERIALIZED_DATA_MAGIC 0x50523253u
|
||||||
|
|
||||||
|
/* Deserialization is limited to the current PCRE version, code unit width,
|
||||||
|
pointer size, and PCRE2_SIZE width. */
|
||||||
|
|
||||||
|
#define SERIALIZED_DATA_VERSION \
|
||||||
|
((PCRE2_MAJOR) | ((PCRE2_MINOR) << 16))
|
||||||
|
|
||||||
|
#define SERIALIZED_DATA_CONFIG \
|
||||||
|
(sizeof(PCRE2_UCHAR) | ((sizeof(void*)) << 8) | ((sizeof(PCRE2_SIZE)) << 16))
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
/*************************************************
|
||||||
|
* Serialize compiled patterns *
|
||||||
|
*************************************************/
|
||||||
|
|
||||||
|
PCRE2_EXP_DECL int32_t PCRE2_CALL_CONVENTION
|
||||||
|
pcre2_serialize_encode(const pcre2_code **codes, int32_t number_of_codes,
|
||||||
|
uint8_t **serialized_bytes, PCRE2_SIZE *serialized_size,
|
||||||
|
pcre2_general_context *gcontext)
|
||||||
|
{
|
||||||
|
uint8_t *bytes;
|
||||||
|
uint8_t *dst_bytes;
|
||||||
|
int32_t i;
|
||||||
|
PCRE2_SIZE total_size;
|
||||||
|
const pcre2_real_code *re;
|
||||||
|
const uint8_t *tables;
|
||||||
|
pcre2_serialized_data *data;
|
||||||
|
|
||||||
|
const pcre2_memctl *memctl = (gcontext != NULL) ?
|
||||||
|
&gcontext->memctl : &PRIV(default_compile_context).memctl;
|
||||||
|
|
||||||
|
if (codes == NULL || serialized_bytes == NULL || serialized_size == NULL)
|
||||||
|
return PCRE2_ERROR_NULL;
|
||||||
|
|
||||||
|
if (number_of_codes <= 0) return PCRE2_ERROR_BADDATA;
|
||||||
|
|
||||||
|
/* Compute total size. */
|
||||||
|
total_size = sizeof(pcre2_serialized_data) + tables_length;
|
||||||
|
tables = NULL;
|
||||||
|
|
||||||
|
for (i = 0; i < number_of_codes; i++)
|
||||||
|
{
|
||||||
|
if (codes[i] == NULL) return PCRE2_ERROR_NULL;
|
||||||
|
re = (const pcre2_real_code *)(codes[i]);
|
||||||
|
if (re->magic_number != MAGIC_NUMBER) return PCRE2_ERROR_BADMAGIC;
|
||||||
|
if (tables == NULL)
|
||||||
|
tables = re->tables;
|
||||||
|
else if (tables != re->tables)
|
||||||
|
return PCRE2_ERROR_MIXEDTABLES;
|
||||||
|
total_size += re->blocksize;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Initialize the byte stream. */
|
||||||
|
bytes = memctl->malloc(total_size + sizeof(pcre2_memctl), memctl->memory_data);
|
||||||
|
if (bytes == NULL) return PCRE2_ERROR_NOMEMORY;
|
||||||
|
|
||||||
|
/* Hidden prefix: pcre2_serialize_free() reads the allocator from here. */
|
||||||
|
memcpy(bytes, memctl, sizeof(pcre2_memctl));
|
||||||
|
bytes += sizeof(pcre2_memctl);
|
||||||
|
|
||||||
|
data = (pcre2_serialized_data *)bytes;
|
||||||
|
data->magic = SERIALIZED_DATA_MAGIC;
|
||||||
|
data->version = SERIALIZED_DATA_VERSION;
|
||||||
|
data->config = SERIALIZED_DATA_CONFIG;
|
||||||
|
data->number_of_codes = number_of_codes;
|
||||||
|
|
||||||
|
/* Copy all compiled code data. */
|
||||||
|
dst_bytes = bytes + sizeof(pcre2_serialized_data);
|
||||||
|
memcpy(dst_bytes, tables, tables_length);
|
||||||
|
dst_bytes += tables_length;
|
||||||
|
|
||||||
|
for (i = 0; i < number_of_codes; i++)
|
||||||
|
{
|
||||||
|
re = (const pcre2_real_code *)(codes[i]);
|
||||||
|
memcpy(dst_bytes, (char *)re, re->blocksize);
|
||||||
|
dst_bytes += re->blocksize;
|
||||||
|
}
|
||||||
|
|
||||||
|
*serialized_bytes = bytes;
|
||||||
|
*serialized_size = total_size;
|
||||||
|
return number_of_codes;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
/*************************************************
|
||||||
|
* Deserialize compiled patterns *
|
||||||
|
*************************************************/
|
||||||
|
|
||||||
|
PCRE2_EXP_DECL int32_t PCRE2_CALL_CONVENTION
|
||||||
|
pcre2_serialize_decode(pcre2_code **codes, int32_t number_of_codes,
|
||||||
|
const uint8_t *bytes, pcre2_general_context *gcontext)
|
||||||
|
{
|
||||||
|
const pcre2_serialized_data *data = (const pcre2_serialized_data *)bytes;
|
||||||
|
const pcre2_memctl *memctl = (gcontext != NULL) ?
|
||||||
|
&gcontext->memctl : &PRIV(default_compile_context).memctl;
|
||||||
|
|
||||||
|
const uint8_t *src_bytes;
|
||||||
|
pcre2_real_code *src_re;
|
||||||
|
pcre2_real_code *dst_re;
|
||||||
|
uint8_t *tables;
|
||||||
|
int32_t i, j;
|
||||||
|
|
||||||
|
/* Sanity checks. */
|
||||||
|
|
||||||
|
if (data == NULL || codes == NULL) return PCRE2_ERROR_NULL;
|
||||||
|
if (number_of_codes <= 0) return PCRE2_ERROR_BADDATA;
|
||||||
|
if (data->magic != SERIALIZED_DATA_MAGIC) return PCRE2_ERROR_BADMAGIC;
|
||||||
|
if (data->version != SERIALIZED_DATA_VERSION) return PCRE2_ERROR_BADMODE;
|
||||||
|
if (data->config != SERIALIZED_DATA_CONFIG) return PCRE2_ERROR_BADMODE;
|
||||||
|
|
||||||
|
if (number_of_codes > data->number_of_codes)
|
||||||
|
number_of_codes = data->number_of_codes;
|
||||||
|
|
||||||
|
src_bytes = bytes + sizeof(pcre2_serialized_data);
|
||||||
|
|
||||||
|
/* Decode tables. The reference count for the tables is stored immediately
|
||||||
|
following them. */
|
||||||
|
|
||||||
|
tables = memctl->malloc(tables_length + sizeof(PCRE2_SIZE), memctl->memory_data);
|
||||||
|
if (tables == NULL) return PCRE2_ERROR_NOMEMORY;
|
||||||
|
|
||||||
|
memcpy(tables, src_bytes, tables_length);
|
||||||
|
*(PCRE2_SIZE *)(tables + tables_length) = number_of_codes;
|
||||||
|
src_bytes += tables_length;
|
||||||
|
|
||||||
|
/* Decode byte stream. */
|
||||||
|
|
||||||
|
for (i = 0; i < number_of_codes; i++)
|
||||||
|
{
|
||||||
|
src_re = (pcre2_real_code *)src_bytes;
|
||||||
|
|
||||||
|
/* The allocator provided by gcontext replaces the original one. */
|
||||||
|
dst_re = (pcre2_real_code *)PRIV(memctl_malloc)
|
||||||
|
(src_re->blocksize, (pcre2_memctl *)gcontext);
|
||||||
|
if (dst_re == NULL)
|
||||||
|
{
|
||||||
|
memctl->free(tables, memctl->memory_data);
|
||||||
|
for (j = 0; j < i; j++)
|
||||||
|
{
|
||||||
|
memctl->free(codes[j], memctl->memory_data);
|
||||||
|
codes[j] = NULL;
|
||||||
|
}
|
||||||
|
return PCRE2_ERROR_NOMEMORY;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Skip the leading memctl so that the new allocator is preserved. */
|
||||||
|
memcpy(((uint8_t *)dst_re) + sizeof(pcre2_memctl),
|
||||||
|
src_bytes + sizeof(pcre2_memctl),
|
||||||
|
src_re->blocksize - sizeof(pcre2_memctl));
|
||||||
|
|
||||||
|
/* At the moment only one table is supported. */
|
||||||
|
dst_re->tables = tables;
|
||||||
|
dst_re->executable_jit = NULL;
|
||||||
|
dst_re->flags |= PCRE2_DEREF_TABLES;
|
||||||
|
|
||||||
|
codes[i] = dst_re;
|
||||||
|
src_bytes += src_re->blocksize;
|
||||||
|
}
|
||||||
|
|
||||||
|
return number_of_codes;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
/*************************************************
|
||||||
|
* Get the number of serialized patterns *
|
||||||
|
*************************************************/
|
||||||
|
|
||||||
|
PCRE2_EXP_DECL int32_t PCRE2_CALL_CONVENTION
|
||||||
|
pcre2_serialize_get_number_of_codes(const uint8_t *bytes)
|
||||||
|
{
|
||||||
|
const pcre2_serialized_data *data = (const pcre2_serialized_data *)bytes;
|
||||||
|
|
||||||
|
if (data == NULL) return PCRE2_ERROR_NULL;
|
||||||
|
if (data->magic != SERIALIZED_DATA_MAGIC) return PCRE2_ERROR_BADMAGIC;
|
||||||
|
if (data->version != SERIALIZED_DATA_VERSION) return PCRE2_ERROR_BADMODE;
|
||||||
|
if (data->config != SERIALIZED_DATA_CONFIG) return PCRE2_ERROR_BADMODE;
|
||||||
|
|
||||||
|
return data->number_of_codes;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
/*************************************************
|
||||||
|
* Free the allocated stream *
|
||||||
|
*************************************************/
|
||||||
|
|
||||||
|
PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION
|
||||||
|
pcre2_serialize_free(uint8_t *bytes)
|
||||||
|
{
|
||||||
|
if (bytes != NULL)
|
||||||
|
{
|
||||||
|
pcre2_memctl *memctl = (pcre2_memctl *)(bytes - sizeof(pcre2_memctl));
|
||||||
|
memctl->free(memctl, memctl->memory_data);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/* End of pcre2_serialize.c */
|
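pcre2_serialize_encode() allocates sizeof(pcre2_memctl) more bytes than it needs and copies the memory controller into the front of the block. It then returns a pointer just past that copy. pcre2_serialize_free() steps back by the same amount to recover the allocator, so callers never need to pass a context when freeing. The same trick appears below in a self-contained sketch. Here, allocator stands in for pcre2_memctl, and blob_alloc(), blob_free() and the counting allocator are hypothetical names. The counting allocator is there only to show that the block is released exactly once.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for pcre2_memctl: allocator functions plus the
   context pointer that is passed to them. */
typedef struct {
  void *(*malloc_fn)(size_t, void *);
  void (*free_fn)(void *, void *);
  void *memory_data;
} allocator;

/* Demonstration allocator: memory_data points at a count of live blocks. */
static void *counting_malloc(size_t size, void *data)
{
  void *p = malloc(size);
  if (p != NULL) (*(int *)data)++;
  return p;
}

static void counting_free(void *p, void *data)
{
  free(p);
  (*(int *)data)--;
}

/* Allocate 'size' usable bytes preceded by a hidden copy of the allocator,
   and return a pointer just past that copy. */
static uint8_t *blob_alloc(size_t size, const allocator *a)
{
  uint8_t *raw = a->malloc_fn(size + sizeof(allocator), a->memory_data);
  if (raw == NULL) return NULL;
  memcpy(raw, a, sizeof(allocator));
  return raw + sizeof(allocator);
}

/* Step back to the hidden copy and release the whole block with the
   allocator that created it. */
static void blob_free(uint8_t *bytes)
{
  if (bytes == NULL) return;
  allocator *a = (allocator *)(bytes - sizeof(allocator));
  a->free_fn(a, a->memory_data);
}
```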
498
src/pcre2test.c
|
@ -166,6 +166,7 @@ void vms_setsymbol( char *, char *, int );
|
||||||
#define JUNK_OFFSET 0xdeadbeef /* For initializing ovector */
|
#define JUNK_OFFSET 0xdeadbeef /* For initializing ovector */
|
||||||
#define LOCALESIZE 32 /* Size of locale name */
|
#define LOCALESIZE 32 /* Size of locale name */
|
||||||
#define LOOPREPEAT 500000 /* Default loop count for timing */
|
#define LOOPREPEAT 500000 /* Default loop count for timing */
|
||||||
|
#define PATSTACKSIZE 20 /* Pattern stack for save/restore testing */
|
||||||
#define REPLACE_MODSIZE 96 /* Field for reading 8-bit replacement */
|
#define REPLACE_MODSIZE 96 /* Field for reading 8-bit replacement */
|
||||||
#define VERSION_SIZE 64 /* Size of buffer for the version strings */
|
#define VERSION_SIZE 64 /* Size of buffer for the version strings */
|
||||||
|
|
||||||
|
@ -313,6 +314,26 @@ modes, so use the form of the first that is available. */
|
||||||
#define PCRE2_REAL_MATCH_CONTEXT pcre2_real_match_context_32
|
#define PCRE2_REAL_MATCH_CONTEXT pcre2_real_match_context_32
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
|
/* ------------- Structure and table for handling #-commands ------------- */
|
||||||
|
|
||||||
|
typedef struct cmdstruct {
|
||||||
|
const char *name;
|
||||||
|
int value;
|
||||||
|
} cmdstruct;
|
||||||
|
|
||||||
|
enum { CMD_FORBID_UTF, CMD_LOAD, CMD_PATTERN, CMD_PERLTEST, CMD_POP, CMD_SAVE,
|
||||||
|
CMD_SUBJECT, CMD_UNKNOWN };
|
||||||
|
|
||||||
|
static cmdstruct cmdlist[] = {
|
||||||
|
{ "forbid_utf", CMD_FORBID_UTF },
|
||||||
|
{ "load", CMD_LOAD },
|
||||||
|
{ "pattern", CMD_PATTERN },
|
||||||
|
{ "perltest", CMD_PERLTEST },
|
||||||
|
{ "pop", CMD_POP },
|
||||||
|
{ "save", CMD_SAVE },
|
||||||
|
{ "subject", CMD_SUBJECT }};
|
||||||
|
|
||||||
|
#define cmdlistcount sizeof(cmdlist)/sizeof(cmdstruct)
|
||||||
|
|
||||||
/* ------------- Structures and tables for handling modifiers -------------- */
|
/* ------------- Structures and tables for handling modifiers -------------- */
|
||||||
|
|
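The new #-command table pairs each command name with an enum value, and CMD_UNKNOWN covers anything that is not recognized. Below is a sketch of a lookup over that table. The enum, struct and table are copied from the hunk above, while find_command() is a hypothetical helper. How pcre2test itself isolates and matches the word after # is not shown in this excerpt.

```c
#include <assert.h>
#include <string.h>

/* Command table as declared in pcre2test.c in this commit. */
typedef struct cmdstruct {
  const char *name;
  int value;
} cmdstruct;

enum { CMD_FORBID_UTF, CMD_LOAD, CMD_PATTERN, CMD_PERLTEST, CMD_POP, CMD_SAVE,
  CMD_SUBJECT, CMD_UNKNOWN };

static cmdstruct cmdlist[] = {
  { "forbid_utf", CMD_FORBID_UTF },
  { "load",       CMD_LOAD },
  { "pattern",    CMD_PATTERN },
  { "perltest",   CMD_PERLTEST },
  { "pop",        CMD_POP },
  { "save",       CMD_SAVE },
  { "subject",    CMD_SUBJECT }};

#define cmdlistcount sizeof(cmdlist)/sizeof(cmdstruct)

/* Hypothetical lookup: map a command word to its enum value, or return
   CMD_UNKNOWN. A linear scan is enough for seven entries. */
static int find_command(const char *word)
{
  for (size_t i = 0; i < cmdlistcount; i++)
    if (strcmp(word, cmdlist[i].name) == 0) return cmdlist[i].value;
  return CMD_UNKNOWN;
}
```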
||||||
|
@ -367,8 +388,9 @@ either on a pattern or a data line, so they must all be distinct. */
|
||||||
#define CTL_MARK 0x00020000u
|
#define CTL_MARK 0x00020000u
|
||||||
#define CTL_MEMORY 0x00040000u
|
#define CTL_MEMORY 0x00040000u
|
||||||
#define CTL_POSIX 0x00080000u
|
#define CTL_POSIX 0x00080000u
|
||||||
#define CTL_STARTCHAR 0x00100000u
|
#define CTL_PUSH 0x00100000u
|
||||||
#define CTL_ZERO_TERMINATE 0x00200000u
|
#define CTL_STARTCHAR 0x00200000u
|
||||||
|
#define CTL_ZERO_TERMINATE 0x00400000u
|
||||||
|
|
||||||
#define CTL_BSR_SET 0x80000000u /* This is informational */
|
#define CTL_BSR_SET 0x80000000u /* This is informational */
|
||||||
#define CTL_NL_SET 0x40000000u /* This is informational */
|
#define CTL_NL_SET 0x40000000u /* This is informational */
|
@@ -426,6 +448,7 @@ typedef struct datctl { /* Structure for data line modifiers. */
 /* Ids for which context to modify. */
 
 enum { CTX_PAT, /* Active pattern context */
+CTX_POPPAT, /* Ditto, for a popped pattern */
 CTX_DEFPAT, /* Default pattern context */
 CTX_DAT, /* Active data (match) context */
 CTX_DEFDAT }; /* Default data (match) context */
@@ -513,6 +536,7 @@ static modstruct modlist[] = {
 { "ph", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_HARD, DO(options) },
 { "posix", MOD_PAT, MOD_CTL, CTL_POSIX, PO(control) },
 { "ps", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_SOFT, DO(options) },
+{ "push", MOD_PAT, MOD_CTL, CTL_PUSH, PO(control) },
 { "recursion_limit", MOD_CTM, MOD_INT, 0, MO(recursion_limit) },
 { "replace", MOD_PND, MOD_STR, REPLACE_MODSIZE, PO(replacement) },
 { "stackguard", MOD_PAT, MOD_INT, 0, PO(stackguard_test) },
@@ -544,6 +568,20 @@ static modstruct modlist[] = {
 
 #define EXCLUSIVE_DAT_CONTROLS (CTL_ALLUSEDTEXT|CTL_STARTCHAR)
 
+/* Control bits that are not ignored with 'push'. */
+
+#define PUSH_SUPPORTED_COMPILE_CONTROLS ( \
+CTL_BINCODE|CTL_FULLBINCODE|CTL_HEXPAT|CTL_INFO|CTL_JITVERIFY| \
+CTL_MEMORY|CTL_PUSH|CTL_BSR_SET|CTL_NL_SET)
+
+/* Controls that apply only at compile time with 'push'. */
+
+#define PUSH_COMPILE_ONLY_CONTROLS CTL_JITVERIFY
+
+/* Controls that are forbidden with #pop. */
+
+#define NOTPOP_CONTROLS (CTL_HEXPAT|CTL_POSIX|CTL_PUSH)
+
 /* Table of single-character abbreviated modifiers. The index field is
 initialized to -1, but the first time the modifier is encountered, it is filled
 in with the index of the full entry in modlist, to save repeated searching when
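These masks are used with ordinary bitwise operations. `control & ~PUSH_SUPPORTED_COMPILE_CONTROLS` gives the controls that are reported as ignored for a stacked pattern, and a nonzero `control & NOTPOP_CONTROLS` makes a `#pop` line invalid. The minimal sketch below uses the `CTL_` values from this diff but deliberately reduced masks, since the real ones contain further bits such as `CTL_HEXPAT` and `CTL_JITVERIFY`. The helper names and the `SKETCH_` macros are hypothetical.

```c
#include <stdint.h>

/* CTL_ values as defined in this diff. */
#define CTL_MARK   0x00020000u
#define CTL_MEMORY 0x00040000u
#define CTL_POSIX  0x00080000u
#define CTL_PUSH   0x00100000u

/* Reduced stand-ins for PUSH_SUPPORTED_COMPILE_CONTROLS and
   NOTPOP_CONTROLS; the real masks contain more bits. */
#define SKETCH_PUSH_SUPPORTED (CTL_MEMORY|CTL_PUSH)
#define SKETCH_NOTPOP         (CTL_POSIX|CTL_PUSH)

/* Controls that would be reported as ignored for a stacked pattern. */
static uint32_t ignored_with_push(uint32_t control)
{
  return control & ~SKETCH_PUSH_SUPPORTED;
}

/* Nonzero if any of the controls is not allowed on a #pop line. */
static int invalid_on_pop(uint32_t control)
{
  return (control & SKETCH_NOTPOP) != 0;
}
```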
@@ -671,6 +709,9 @@ static patctl pat_patctl;
 static datctl def_datctl;
 static datctl dat_datctl;
 
+static void *patstack[PATSTACKSIZE];
+static int patstacknext = 0;
+
 #ifdef SUPPORT_PCRE2_8
 static regex_t preg = { NULL, NULL, 0, 0 };
 #endif
@@ -928,6 +969,38 @@ are supported. */
 else \
 pcre2_printint_32(compiled_code32,outfile,a)
 
+#define PCRE2_SERIALIZE_DECODE(r,a,b,c,d) \
+if (test_mode == PCRE8_MODE) \
+r = pcre2_serialize_decode_8((pcre2_code_8 **)a,b,c,G(d,8)); \
+else if (test_mode == PCRE16_MODE) \
+r = pcre2_serialize_decode_16((pcre2_code_16 **)a,b,c,G(d,16)); \
+else \
+r = pcre2_serialize_decode_32((pcre2_code_32 **)a,b,c,G(d,32))
+
+#define PCRE2_SERIALIZE_ENCODE(r,a,b,c,d,e) \
+if (test_mode == PCRE8_MODE) \
+r = pcre2_serialize_encode_8((const pcre2_code_8 **)a,b,c,d,G(e,8)); \
+else if (test_mode == PCRE16_MODE) \
+r = pcre2_serialize_encode_16((const pcre2_code_16 **)a,b,c,d,G(e,16)); \
+else \
+r = pcre2_serialize_encode_32((const pcre2_code_32 **)a,b,c,d,G(e,32))
+
+#define PCRE2_SERIALIZE_FREE(a) \
+if (test_mode == PCRE8_MODE) \
+pcre2_serialize_free_8(a); \
+else if (test_mode == PCRE16_MODE) \
+pcre2_serialize_free_16(a); \
+else \
+pcre2_serialize_free_32(a)
+
+#define PCRE2_SERIALIZE_GET_NUMBER_OF_CODES(r,a) \
+if (test_mode == PCRE8_MODE) \
+r = pcre2_serialize_get_number_of_codes_8(a); \
+else if (test_mode == PCRE16_MODE) \
+r = pcre2_serialize_get_number_of_codes_16(a); \
+else \
+r = pcre2_serialize_get_number_of_codes_32(a)
+
 #define PCRE2_SET_CALLOUT(a,b,c) \
 if (test_mode == PCRE8_MODE) \
 pcre2_set_callout_8(G(a,8),(int (*)(pcre2_callout_block_8 *, void *))b,c); \
@@ -1302,6 +1375,30 @@ the three different cases. */
 else \
 G(pcre2_printint_,BITTWO)(G(compiled_code,BITTWO),outfile,a)
 
+#define PCRE2_SERIALIZE_DECODE(r,a,b,c,d) \
+if (test_mode == G(G(PCRE,BITONE),_MODE)) \
+r = G(pcre2_serialize_decode_,BITONE)((G(pcre2_code_,BITONE) **)a,b,c,G(d,BITONE)); \
+else \
+r = G(pcre2_serialize_decode_,BITTWO)((G(pcre2_code_,BITTWO) **)a,b,c,G(d,BITTWO))
+
+#define PCRE2_SERIALIZE_ENCODE(r,a,b,c,d,e) \
+if (test_mode == G(G(PCRE,BITONE),_MODE)) \
+r = G(pcre2_serialize_encode_,BITONE)((G(const pcre2_code_,BITONE) **)a,b,c,d,G(e,BITONE)); \
+else \
+r = G(pcre2_serialize_encode_,BITTWO)((G(const pcre2_code_,BITTWO) **)a,b,c,d,G(e,BITTWO))
+
+#define PCRE2_SERIALIZE_FREE(a) \
+if (test_mode == G(G(PCRE,BITONE),_MODE)) \
+G(pcre2_serialize_free_,BITONE)(a); \
+else \
+G(pcre2_serialize_free_,BITTWO)(a)
+
+#define PCRE2_SERIALIZE_GET_NUMBER_OF_CODES(r,a) \
+if (test_mode == G(G(PCRE,BITONE),_MODE)) \
+r = G(pcre2_serialize_get_number_of_codes_,BITONE)(a); \
+else \
+r = G(pcre2_serialize_get_number_of_codes_,BITTWO)(a)
+
 #define PCRE2_SET_CALLOUT(a,b,c) \
 if (test_mode == G(G(PCRE,BITONE),_MODE)) \
 G(pcre2_set_callout_,BITONE)(G(a,BITONE), \
@@ -1510,6 +1607,13 @@ the three different cases. */
 #define PCRE2_MATCH_DATA_FREE(a) pcre2_match_data_free_8(G(a,8))
 #define PCRE2_PATTERN_INFO(a,b,c,d) a = pcre2_pattern_info_8(G(b,8),c,d)
 #define PCRE2_PRINTINT(a) pcre2_printint_8(compiled_code8,outfile,a)
+#define PCRE2_SERIALIZE_DECODE(r,a,b,c,d) \
+r = pcre2_serialize_decode_8((pcre2_code_8 **)a,b,c,G(d,8))
+#define PCRE2_SERIALIZE_ENCODE(r,a,b,c,d,e) \
+r = pcre2_serialize_encode_8((const pcre2_code_8 **)a,b,c,d,G(e,8))
+#define PCRE2_SERIALIZE_FREE(a) pcre2_serialize_free_8(a)
+#define PCRE2_SERIALIZE_GET_NUMBER_OF_CODES(r,a) \
+r = pcre2_serialize_get_number_of_codes_8(a)
 #define PCRE2_SET_CALLOUT(a,b,c) \
 pcre2_set_callout_8(G(a,8),(int (*)(pcre2_callout_block_8 *, void *))b,c)
 #define PCRE2_SET_CHARACTER_TABLES(a,b) pcre2_set_character_tables_8(G(a,8),b)
@@ -1591,6 +1695,13 @@ the three different cases. */
 #define PCRE2_MATCH_DATA_FREE(a) pcre2_match_data_free_16(G(a,16))
 #define PCRE2_PATTERN_INFO(a,b,c,d) a = pcre2_pattern_info_16(G(b,16),c,d)
 #define PCRE2_PRINTINT(a) pcre2_printint_16(compiled_code16,outfile,a)
+#define PCRE2_SERIALIZE_DECODE(r,a,b,c,d) \
+r = pcre2_serialize_decode_16((pcre2_code_16 **)a,b,c,G(d,16))
+#define PCRE2_SERIALIZE_ENCODE(r,a,b,c,d,e) \
+r = pcre2_serialize_encode_16((const pcre2_code_16 **)a,b,c,d,G(e,16))
+#define PCRE2_SERIALIZE_FREE(a) pcre2_serialize_free_16(a)
+#define PCRE2_SERIALIZE_GET_NUMBER_OF_CODES(r,a) \
+r = pcre2_serialize_get_number_of_codes_16(a)
 #define PCRE2_SET_CALLOUT(a,b,c) \
 pcre2_set_callout_16(G(a,16),(int (*)(pcre2_callout_block_16 *, void *))b,c);
 #define PCRE2_SET_CHARACTER_TABLES(a,b) pcre2_set_character_tables_16(G(a,16),b)
@@ -1672,6 +1783,13 @@ the three different cases. */
 #define PCRE2_MATCH_DATA_FREE(a) pcre2_match_data_free_32(G(a,32))
 #define PCRE2_PATTERN_INFO(a,b,c,d) a = pcre2_pattern_info_32(G(b,32),c,d)
 #define PCRE2_PRINTINT(a) pcre2_printint_32(compiled_code32,outfile,a)
+#define PCRE2_SERIALIZE_DECODE(r,a,b,c,d) \
+r = pcre2_serialize_decode_32((pcre2_code_32 **)a,b,c,G(d,32))
+#define PCRE2_SERIALIZE_ENCODE(r,a,b,c,d,e) \
+r = pcre2_serialize_encode_32((const pcre2_code_32 **)a,b,c,d,G(e,32))
+#define PCRE2_SERIALIZE_FREE(a) pcre2_serialize_free_32(a)
+#define PCRE2_SERIALIZE_GET_NUMBER_OF_CODES(r,a) \
+r = pcre2_serialize_get_number_of_codes_32(a)
 #define PCRE2_SET_CALLOUT(a,b,c) \
 pcre2_set_callout_32(G(a,32),(int (*)(pcre2_callout_block_32 *, void *))b,c);
 #define PCRE2_SET_CHARACTER_TABLES(a,b) pcre2_set_character_tables_32(G(a,32),b)
@@ -2792,6 +2910,7 @@ it is allowed here and find the field that is to be changed.
 Arguments:
 m the modifier list entry
 ctx CTX_PAT => pattern context
+CTX_POPPAT => pattern context for popped pattern
 CTX_DEFPAT => default pattern context
 CTX_DAT => data context
 CTX_DEFDAT => default data context
@@ -2878,6 +2997,7 @@ modifiers that apply to contexts.
 Arguments:
 p point to modifier string
 ctx CTX_PAT => pattern context
+CTX_POPPAT => pattern context for popped pattern
 CTX_DEFPAT => default pattern context
 CTX_DAT => data context
 CTX_DEFDAT => default data context
@@ -2902,11 +3022,8 @@ for (;;)
 int index;
 char *endptr;
 
-/* Skip white space and commas; after a comma we have passed the first
-item. */
+/* Skip white space and commas. */
 
-while (isspace(*p)) p++;
-if (*p == ',') first = FALSE;
 while (isspace(*p) || *p == ',') p++;
 if (*p == 0) break;
 
@@ -3163,6 +3280,17 @@ for (;;)
 }
 
 p = pp;
+first = FALSE;
+
+if (ctx == CTX_POPPAT &&
+(pctl->options != 0 ||
+pctl->tables_id != 0 ||
+pctl->locale[0] != 0 ||
+(pctl->control & NOTPOP_CONTROLS) != 0))
+{
+fprintf(outfile, "** '%s' is not valid here\n", m->name);
+return FALSE;
+}
 }
 
 return TRUE;
@@ -3246,7 +3374,7 @@ Returns: nothing
 static void
 show_controls(uint32_t controls, const char *before)
 {
-fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
+fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
 before,
 ((controls & CTL_AFTERTEXT) != 0)? " aftertext" : "",
 ((controls & CTL_ALLAFTERTEXT) != 0)? " allaftertext" : "",
@@ -3268,6 +3396,7 @@ fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
 ((controls & CTL_MARK) != 0)? " mark" : "",
 ((controls & CTL_MEMORY) != 0)? " memory" : "",
 ((controls & CTL_POSIX) != 0)? " posix" : "",
+((controls & CTL_PUSH) != 0)? " push" : "",
 ((controls & CTL_STARTCHAR) != 0)? " startchar" : "",
 ((controls & CTL_ZERO_TERMINATE) != 0)? " zero_terminate" : "");
 }
@@ -3347,6 +3476,40 @@ fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s",
 
 
 
+/*************************************************
+* Show memory usage info for a pattern *
+*************************************************/
+
+static void
+show_memory_info(void)
+{
+uint32_t name_count, name_entry_size;
+size_t size, cblock_size;
+
+#ifdef SUPPORT_PCRE2_8
+if (test_mode == 8) cblock_size = sizeof(pcre2_real_code_8);
+#endif
+#ifdef SUPPORT_PCRE2_16
+if (test_mode == 16) cblock_size = sizeof(pcre2_real_code_16);
+#endif
+#ifdef SUPPORT_PCRE2_32
+if (test_mode == 32) cblock_size = sizeof(pcre2_real_code_32);
+#endif
+
+(void)pattern_info(PCRE2_INFO_SIZE, &size, FALSE);
+(void)pattern_info(PCRE2_INFO_NAMECOUNT, &name_count, FALSE);
+(void)pattern_info(PCRE2_INFO_NAMEENTRYSIZE, &name_entry_size, FALSE);
+fprintf(outfile, "Memory allocation (code space): %d\n",
+(int)(size - name_count*name_entry_size*code_unit_size - cblock_size));
+if (pat_patctl.jit != 0)
+{
+(void)pattern_info(PCRE2_INFO_JITSIZE, &size, FALSE);
+fprintf(outfile, "Memory allocation (JIT code): %d\n", (int)size);
+}
+}
+
+
+
 /*************************************************
 * Show information about a pattern *
 *************************************************/
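`show_memory_info()` reports code space as the `PCRE2_INFO_SIZE` value minus two things: the name table (`name_count` entries of `name_entry_size` code units each, converted to bytes by `code_unit_size`) and the fixed size of the compiled-code block header. The sketch below isolates that arithmetic in a hypothetical `code_space()` helper. The numbers in the usage note are illustrative only and are not taken from any real pattern.

```c
#include <stddef.h>
#include <stdint.h>

/* Same expression as show_memory_info(): total size from PCRE2_INFO_SIZE,
   minus the name table (entries are measured in code units, so they are
   scaled by the code unit size in bytes), minus the code block header. */
static size_t code_space(size_t total_size, uint32_t name_count,
                         uint32_t name_entry_size, size_t code_unit_size,
                         size_t cblock_size)
{
  size_t name_table = (size_t)name_count * name_entry_size * code_unit_size;
  return total_size - name_table - cblock_size;
}
```

For example, two names with 6-unit entries in 16-bit mode take 2 * 6 * 2 = 24 bytes. With a hypothetical 200-byte total and a 136-byte header, that leaves 40 bytes of code.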
@@ -3624,12 +3787,79 @@ return PR_OK;
 
 
 
+/*************************************************
+* Handle serialization error *
+*************************************************/
+
+/* Print an error message after a serialization failure.
+
+Arguments:
+rc the error code
+msg an initial message for what failed
+
+Returns: nothing
+*/
+
+static void
+serial_error(int rc, const char *msg)
+{
+fprintf(outfile, "%s failed: error %d: ", msg, rc);
+PCRE2_GET_ERROR_MESSAGE(rc, rc, pbuffer);
+PCHARSV(CASTVAR(void *, pbuffer), 0, rc, FALSE, outfile);
+fprintf(outfile, "\n");
+}
+
+
+
+/*************************************************
+* Open file for save/load commands *
+*************************************************/
+
+/* This function decodes the file name and opens the file.
+
+Arguments:
+buffptr point after the #command
+mode open mode
+fptr points to the FILE variable
+
+Returns: PR_OK or PR_ABEND
+*/
+
+static int
+open_file(uint8_t *buffptr, const char *mode, FILE **fptr)
+{
+char *endf;
+char *filename = (char *)buffptr;
+while (isspace(*filename)) filename++;
+endf = filename + strlen8(filename);
+while (endf > filename && isspace(endf[-1])) endf--;
+
+if (endf == filename)
+{
+fprintf(outfile, "** File name expected after #save\n");
+return PR_ABEND;
+}
+
+*endf = 0;
+*fptr = fopen((const char *)filename, mode);
+if (*fptr == NULL)
+{
+fprintf(outfile, "** Failed to open '%s'\n", filename);
+return PR_ABEND;
+}
+
+return PR_OK;
+}
+
+
+
 /*************************************************
 * Process command line *
 *************************************************/
 
 /* This function is called for lines beginning with # and a character that is
-not ! or whitespace, when encountered between tests. The line is in buffer.
+not ! or whitespace, when encountered between tests, which means that there is
+no compiled pattern (compiled_code is NULL). The line is in buffer.
 
 Arguments: none
 
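`open_file()` trims its argument in place before calling `fopen()`. It skips leading white space, cuts off trailing white space (including the newline that ends the input line), and treats an empty result as an error. The sketch below performs the same trimming without the file handling, as a hypothetical `trim_filename()` that returns `NULL` wherever `open_file()` would report a missing file name. It also casts to `unsigned char` before calling `isspace()`, which the C standard requires for byte values that may be negative.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Trim the argument of #save or #load in place, as open_file() does.
   Returns the start of the file name, or NULL if nothing is left. */
static char *trim_filename(char *buffptr)
{
  char *filename = buffptr;
  char *endf;

  while (isspace((unsigned char)*filename)) filename++;
  endf = filename + strlen(filename);
  while (endf > filename && isspace((unsigned char)endf[-1])) endf--;

  if (endf == filename) return NULL;  /* "File name expected" case */
  *endf = 0;
  return filename;
}
```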
@@ -3641,33 +3871,176 @@ Returns: PR_OK continue processing next line
 static int
 process_command(void)
 {
+FILE *f;
+PCRE2_SIZE serial_size;
+size_t i;
+int rc, cmd, cmdlen;
+const char *cmdname;
+uint8_t *argptr, *serial;
+
 if (restrict_for_perl_test)
 {
 fprintf(outfile, "** #-commands are not allowed after #perltest\n");
 return PR_ABEND;
 }
 
-if (strncmp((char *)buffer, "#forbid_utf", 11) == 0 && isspace(buffer[11]))
+cmd = CMD_UNKNOWN;
+cmdlen = 0;
+
+for (i = 0; i < cmdlistcount; i++)
 {
-forbid_utf = PCRE2_NEVER_UTF|PCRE2_NEVER_UCP;
+cmdname = cmdlist[i].name;
+cmdlen = strlen(cmdname);
+if (strncmp((char *)(buffer+1), cmdname, cmdlen) == 0 &&
+isspace(buffer[cmdlen+1]))
+{
+cmd = cmdlist[i].value;
+break;
+}
 }
-else if (strncmp((char *)buffer, "#pattern", 8) == 0 && isspace(buffer[8]))
+}
+
+argptr = buffer + cmdlen + 1;
+
+switch(cmd)
 {
-(void)decode_modifiers(buffer + 8, CTX_DEFPAT, &def_patctl, NULL);
+case CMD_UNKNOWN:
+fprintf(outfile, "** Unknown command: %s", buffer);
+break;
+
+case CMD_FORBID_UTF:
+forbid_utf = PCRE2_NEVER_UTF|PCRE2_NEVER_UCP;
+break;
+
+case CMD_PERLTEST:
+restrict_for_perl_test = TRUE;
+break;
+
+/* Set default pattern modifiers */
+
+case CMD_PATTERN:
+(void)decode_modifiers(argptr, CTX_DEFPAT, &def_patctl, NULL);
 if (def_patctl.jit == 0 && (def_patctl.control & CTL_JITVERIFY) != 0)
 def_patctl.jit = 7;
-}
-else if (strncmp((char *)buffer, "#perltest", 9) == 0 && isspace(buffer[9]))
+break;
+
+/* Set default subject modifiers */
+
+case CMD_SUBJECT:
+(void)decode_modifiers(argptr, CTX_DEFDAT, NULL, &def_datctl);
+break;
+
+/* Pop a compiled pattern off the stack. Modifiers that do not affect the
+compiled pattern (e.g. to give information) are permitted. The default
+pattern modifiers are ignored. */
+
+case CMD_POP:
+if (patstacknext <= 0)
 {
-restrict_for_perl_test = TRUE;
+fprintf(outfile, "** Can't pop off an empty stack\n");
+return PR_SKIP;
 }
-else if (strncmp((char *)buffer, "#subject", 8) == 0 && isspace(buffer[8]))
+memset(&pat_patctl, 0, sizeof(patctl)); /* Completely unset */
+if (!decode_modifiers(argptr, CTX_POPPAT, &pat_patctl, NULL))
+return PR_SKIP;
+SET(compiled_code, patstack[--patstacknext]);
+if (pat_patctl.jit != 0)
 {
-(void)decode_modifiers(buffer + 8, CTX_DEFDAT, NULL, &def_datctl);
+PCRE2_JIT_COMPILE(compiled_code, pat_patctl.jit);
 }
-else
+if ((pat_patctl.control & CTL_MEMORY) != 0) show_memory_info();
+if ((pat_patctl.control & CTL_ANYINFO) != 0)
 {
-fprintf(outfile, "** Unknown command: %s", buffer);
+rc = show_pattern_info();
+if (rc != PR_OK) return rc;
+}
+break;
+
+/* Save the stack of compiled patterns to a file, then empty the stack. */
+
+case CMD_SAVE:
+if (patstacknext <= 0)
+{
+fprintf(outfile, "** No stacked patterns to save\n");
+return PR_OK;
+}
+
+rc = open_file(argptr+1, OUTPUT_MODE, &f);
+if (rc != PR_OK) return rc;
+
+PCRE2_SERIALIZE_ENCODE(rc, patstack, patstacknext, &serial, &serial_size,
+general_context);
+if (rc < 0)
+{
+serial_error(rc, "Serialization");
+break;
+}
+
+/* Write the length at the start of the file to make it straightforward to
+get the right memory when re-loading. This saves having to read the file size
+in different operating systems. To allow for different endianness (even
+though reloading with the opposite endianness does not work), write the
+length byte-by-byte. */
+
+for (i = 0; i < 4; i++) fputc((serial_size >> (i*8)) & 255, f);
+if (fwrite(serial, 1, serial_size, f) != serial_size)
+{
+fprintf(outfile, "** Wrong return from fwrite()\n");
+return PR_ABEND;
+}
+
+fclose(f);
+PCRE2_SERIALIZE_FREE(serial);
+while(patstacknext > 0)
+{
+SET(compiled_code, patstack[--patstacknext]);
+SUB1(pcre2_code_free, compiled_code);
+}
+SET(compiled_code, NULL);
+break;
+
+/* Load a set of compiled patterns from a file onto the stack */
+
+case CMD_LOAD:
+rc = open_file(argptr+1, INPUT_MODE, &f);
+if (rc != PR_OK) return rc;
+
+serial_size = 0;
+for (i = 0; i < 4; i++) serial_size |= (PCRE2_SIZE)fgetc(f) << (i*8);
+
+serial = malloc(serial_size);
+if (serial == NULL)
+{
+fprintf(outfile, "** Failed to get memory (size %lu) for #load\n",
+(unsigned long)serial_size);
+return PR_ABEND;
+}
+
+if (fread(serial, 1, serial_size, f) != serial_size)
+{
+fprintf(outfile, "** Wrong return from fread()\n");
+return PR_ABEND;
+}
+fclose(f);
+
+PCRE2_SERIALIZE_GET_NUMBER_OF_CODES(rc, serial);
+if (rc < 0) serial_error(rc, "Get number of codes"); else
+{
+if (rc + patstacknext > PATSTACKSIZE)
+{
+fprintf(outfile, "** Not enough space on pattern stack for %d pattern%s\n",
+rc, (rc == 1)? "" : "s");
+rc = PATSTACKSIZE - patstacknext;
+fprintf(outfile, "** Decoding %d pattern%s\n", rc,
+(rc == 1)? "" : "s");
+}
+PCRE2_SERIALIZE_DECODE(rc, patstack + patstacknext, rc, serial,
+general_context);
+if (rc < 0) serial_error(rc, "Deserialization");
+else patstacknext += rc;
+}
+
+free(serial);
+break;
 }
 
 return PR_OK;
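`#save` prefixes the serialized data with its length, written as four bytes with the least significant byte first, and `#load` reassembles the value from four `fgetc()` calls. The sketch below does the same round trip on a memory buffer instead of a `FILE`. The names `put_length()` and `get_length()` are hypothetical. Each byte is widened before the shift because shifting an `int` byte value of 0x80 or more left by 24 overflows a signed `int`. Like the original, the sketch assumes the length fits in 32 bits.

```c
#include <stddef.h>

/* Store a length in four bytes, least significant first, as #save does
   with fputc(). Lengths above 0xFFFFFFFF cannot be represented. */
static void put_length(unsigned char out[4], size_t size)
{
  size_t i;
  for (i = 0; i < 4; i++) out[i] = (unsigned char)((size >> (i*8)) & 255);
}

/* Reassemble the length as #load does. Each byte is widened to size_t
   before shifting, so a high byte of 0x80 or more cannot overflow. */
static size_t get_length(const unsigned char in[4])
{
  size_t i, size = 0;
  for (i = 0; i < 4; i++) size |= (size_t)in[i] << (i*8);
  return size;
}
```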
@@ -3750,6 +4123,14 @@ if (pat_patctl.jit == 0 &&
 (pat_patctl.control & (CTL_JITVERIFY|CTL_JITFAST)) != 0)
 pat_patctl.jit = 7;
 
+/* POSIX and 'push' do not play together. */
+
+if ((pat_patctl.control & (CTL_POSIX|CTL_PUSH)) == (CTL_POSIX|CTL_PUSH))
+{
+fprintf(outfile, "** The POSIX interface is incompatible with 'push'\n");
+return PR_ABEND;
+}
+
 /* Now copy the pattern to pbuffer8 for use in 8-bit testing and for reflecting
 in callouts. Convert to binary if required. */
 
@@ -3897,8 +4278,31 @@ if ((pat_patctl.control & CTL_POSIX) != 0)
 #endif /* SUPPORT_PCRE2_8 */
 }
 
-/* Handle compiling via the native interface, converting the input in non-8-bit
-modes. */
+/* Handle compiling via the native interface. Controls that act later are
+ignored with "push". Replacements are locked out. */
+
+if ((pat_patctl.control & CTL_PUSH) != 0)
+{
+if (pat_patctl.replacement[0] != 0)
+{
+fprintf(outfile, "** Replacement text is not supported with 'push'.\n");
+return PR_OK;
+}
+if ((pat_patctl.control & ~PUSH_SUPPORTED_COMPILE_CONTROLS) != 0)
+{
+show_controls(pat_patctl.control & ~PUSH_SUPPORTED_COMPILE_CONTROLS,
+"** Ignored when compiled pattern is stacked with 'push':");
+fprintf(outfile, "\n");
+}
+if ((pat_patctl.control & PUSH_COMPILE_ONLY_CONTROLS) != 0)
+{
+show_controls(pat_patctl.control & PUSH_COMPILE_ONLY_CONTROLS,
+"** Applies only to compile when pattern is stacked with 'push':");
+fprintf(outfile, "\n");
+}
+}
+
+/* Convert the input in non-8-bit modes. */
 
 #ifdef SUPPORT_PCRE2_8
 if (test_mode == PCRE8_MODE) errorcode = 0;
@@ -4017,39 +4421,27 @@ if (pat_patctl.jit != 0)
 
 /* Output code size and other information if requested. */
 
-if ((pat_patctl.control & CTL_MEMORY) != 0)
-{
-uint32_t name_count, name_entry_size;
-size_t size, cblock_size;
-
-#ifdef SUPPORT_PCRE2_8
-if (test_mode == 8) cblock_size = sizeof(pcre2_real_code_8);
-#endif
-#ifdef SUPPORT_PCRE2_16
-if (test_mode == 16) cblock_size = sizeof(pcre2_real_code_16);
-#endif
-#ifdef SUPPORT_PCRE2_32
-if (test_mode == 32) cblock_size = sizeof(pcre2_real_code_32);
-#endif
-
-(void)pattern_info(PCRE2_INFO_SIZE, &size, FALSE);
-(void)pattern_info(PCRE2_INFO_NAMECOUNT, &name_count, FALSE);
-(void)pattern_info(PCRE2_INFO_NAMEENTRYSIZE, &name_entry_size, FALSE);
-fprintf(outfile, "Memory allocation (code space): %d\n",
-(int)(size - name_count*name_entry_size*code_unit_size - cblock_size));
-if (pat_patctl.jit != 0)
-{
-(void)pattern_info(PCRE2_INFO_JITSIZE, &size, FALSE);
-fprintf(outfile, "Memory allocation (JIT code): %d\n", (int)size);
-}
-}
-
+if ((pat_patctl.control & CTL_MEMORY) != 0) show_memory_info();
 if ((pat_patctl.control & CTL_ANYINFO) != 0)
 {
 int rc = show_pattern_info();
 if (rc != PR_OK) return rc;
 }
 
+/* The "push" control requests that the compiled pattern be remembered on a
+stack. This is mainly for testing the serialization functionality. */
+
+if ((pat_patctl.control & CTL_PUSH) != 0)
+{
+if (patstacknext >= PATSTACKSIZE)
+{
+fprintf(outfile, "** Too many pushed patterns (max %d)\n", PATSTACKSIZE);
+return PR_ABEND;
+}
+patstack[patstacknext++] = PTR(compiled_code);
+SET(compiled_code, NULL);
+}
+
 return PR_OK;
 }
 
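The `push` control and the `#pop` command treat `patstack` as a bounded LIFO stack indexed by `patstacknext`. A push fails when the stack is full, a pop fails when it is empty, and the most recently pushed pattern is the first one popped. The stripped-down sketch below uses a hypothetical capacity of 4 (the value of `PATSTACKSIZE` is not shown in this diff) and stores `void *` entries in place of compiled patterns. The function names are also hypothetical.

```c
#include <stddef.h>

/* Hypothetical capacity; the real PATSTACKSIZE is defined elsewhere. */
#define SKETCH_STACKSIZE 4

static void *stack[SKETCH_STACKSIZE];
static int stacknext = 0;

/* Like 'push': refuse when the stack is full. Returns 1 on success. */
static int push_code(void *code)
{
  if (stacknext >= SKETCH_STACKSIZE) return 0;
  stack[stacknext++] = code;
  return 1;
}

/* Like #pop: refuse when the stack is empty, otherwise return the most
   recently pushed entry. */
static void *pop_code(void)
{
  if (stacknext <= 0) return NULL;
  return stack[--stacknext];
}
```

In pcre2test, `#save` serializes the whole stack in one call and then empties it. `#load` pushes the decoded patterns back and decodes only as many as still fit.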
@@ -6253,7 +6645,7 @@ if (argc > 1 && strcmp(argv[op], "-") != 0)
 infile = fopen(argv[op], INPUT_MODE);
 if (infile == NULL)
 {
-printf("** Failed to open %s\n", argv[op]);
+printf("** Failed to open '%s'\n", argv[op]);
 yield = 1;
 goto EXIT;
 }
@@ -6264,7 +6656,7 @@ if (argc > 2)
 outfile = fopen(argv[op+1], OUTPUT_MODE);
 if (outfile == NULL)
 {
-printf("** Failed to open %s\n", argv[op+1]);
+printf("** Failed to open '%s'\n", argv[op+1]);
 yield = 1;
 goto EXIT;
 }
@@ -6399,6 +6791,12 @@ free((void *)locale_tables);
 PCRE2_MATCH_DATA_FREE(match_data);
 SUB1(pcre2_code_free, compiled_code);
 
+while(patstacknext-- > 0)
+{
+SET(compiled_code, patstack[patstacknext]);
+SUB1(pcre2_code_free, compiled_code);
+}
+
 PCRE2_JIT_FREE_UNUSED_MEMORY(general_context);
 if (jit_stack != NULL)
 {
||||||
|
|
|
@@ -6,4 +6,4 @@
|
||||||
|
|
||||||
/a*/I
|
/a*/I
|
||||||
|
|
||||||
# End of testinput14
|
# End of testinput15
|
||||||
|
|
|
@@ -161,7 +161,7 @@
|
||||||
# match to happen via the interpreter, but for fast JIT invalid options are
|
# match to happen via the interpreter, but for fast JIT invalid options are
|
||||||
# ignored, so an unanchored match happens.
|
# ignored, so an unanchored match happens.
|
||||||
|
|
||||||
/abcd/jit
|
/abcd/
|
||||||
abcd\=anchored
|
abcd\=anchored
|
||||||
fail abcd\=anchored
|
fail abcd\=anchored
|
||||||
|
|
||||||
|
@@ -169,4 +169,21 @@
|
||||||
abcd\=anchored
|
abcd\=anchored
|
||||||
succeed abcd\=anchored
|
succeed abcd\=anchored
|
||||||
|
|
||||||
|
# Push/pop does not lose the JIT information, though jitverify applies only to
|
||||||
|
# compilation, but serializing (save/load) discards JIT data completely.
|
||||||
|
|
||||||
|
/^abc\Kdef/info,push
|
||||||
|
#pop jitverify
|
||||||
|
abcdef
|
||||||
|
|
||||||
|
/^abc\Kdef/info,push
|
||||||
|
#save testsaved1
|
||||||
|
#load testsaved1
|
||||||
|
#pop jitverify
|
||||||
|
abcdef
|
||||||
|
|
||||||
|
#load testsaved1
|
||||||
|
#pop jit,jitverify
|
||||||
|
abcdef
|
||||||
|
|
||||||
# End of testinput16
|
# End of testinput16
|
||||||
|
|
|
@@ -0,0 +1,62 @@
|
||||||
|
# This set of tests exercises the serialization/deserialization functions in
|
||||||
|
# the library. It does not use UTF or JIT.
|
||||||
|
|
||||||
|
#forbid_utf
|
||||||
|
|
||||||
|
# Compile several patterns, push them onto the stack, and then write them
|
||||||
|
# all to a file.
|
||||||
|
|
||||||
|
#pattern push
|
||||||
|
|
||||||
|
/(?<NAME>(?&NAME_PAT))\s+(?<ADDR>(?&ADDRESS_PAT))
|
||||||
|
(?(DEFINE)
|
||||||
|
(?<NAME_PAT>[a-z]+)
|
||||||
|
(?<ADDRESS_PAT>\d+)
|
||||||
|
)/x
|
||||||
|
/^(?:((.)(?1)\2|)|((.)(?3)\4|.))$/i
|
||||||
|
|
||||||
|
#save testsaved1
|
||||||
|
|
||||||
|
# Do it again for some more patterns.
|
||||||
|
|
||||||
|
/(*MARK:A)(*SKIP:B)(C|X)/mark
|
||||||
|
/(?:(?<n>foo)|(?<n>bar))\k<n>/dupnames
|
||||||
|
|
||||||
|
#save testsaved2
|
||||||
|
#pattern -push
|
||||||
|
|
||||||
|
# Reload the patterns, then pop them one by one and check them.
|
||||||
|
|
||||||
|
#load testsaved1
|
||||||
|
#load testsaved2
|
||||||
|
|
||||||
|
#pop info
|
||||||
|
foofoo
|
||||||
|
barbar
|
||||||
|
|
||||||
|
#pop mark
|
||||||
|
C
|
||||||
|
D
|
||||||
|
|
||||||
|
#pop
|
||||||
|
AmanaplanacanalPanama
|
||||||
|
|
||||||
|
#pop info
|
||||||
|
metcalfe 33
|
||||||
|
|
||||||
|
# Check for an error when different tables are used.
|
||||||
|
|
||||||
|
/abc/push,tables=1
|
||||||
|
/xyz/push,tables=2
|
||||||
|
#save testsaved1
|
||||||
|
|
||||||
|
#pop
|
||||||
|
xyz
|
||||||
|
|
||||||
|
#pop
|
||||||
|
abc
|
||||||
|
|
||||||
|
#pop should give an error
|
||||||
|
pqr
|
||||||
|
|
||||||
|
# End of testinput19
|
|
@@ -14,4 +14,4 @@ Capturing subpattern count = 0
|
||||||
May match empty string
|
May match empty string
|
||||||
Subject length lower bound = 0
|
Subject length lower bound = 0
|
||||||
|
|
||||||
# End of testinput14
|
# End of testinput15
|
||||||
|
|
|
@@ -310,7 +310,7 @@ Failed: error -46: JIT stack limit reached
|
||||||
# match to happen via the interpreter, but for fast JIT invalid options are
|
# match to happen via the interpreter, but for fast JIT invalid options are
|
||||||
# ignored, so an unanchored match happens.
|
# ignored, so an unanchored match happens.
|
||||||
|
|
||||||
/abcd/jit
|
/abcd/
|
||||||
abcd\=anchored
|
abcd\=anchored
|
||||||
0: abcd
|
0: abcd
|
||||||
fail abcd\=anchored
|
fail abcd\=anchored
|
||||||
|
@@ -322,4 +322,36 @@ No match
|
||||||
succeed abcd\=anchored
|
succeed abcd\=anchored
|
||||||
0: abcd (JIT)
|
0: abcd (JIT)
|
||||||
|
|
||||||
|
# Push/pop does not lose the JIT information, though jitverify applies only to
|
||||||
|
# compilation, but serializing (save/load) discards JIT data completely.
|
||||||
|
|
||||||
|
/^abc\Kdef/info,push
|
||||||
|
** Applied only to compile when pattern is stacked with 'push': jitverify
|
||||||
|
Capturing subpattern count = 0
|
||||||
|
Compile options: <none>
|
||||||
|
Overall options: anchored
|
||||||
|
Subject length lower bound = 6
|
||||||
|
JIT compilation was successful
|
||||||
|
#pop jitverify
|
||||||
|
abcdef
|
||||||
|
0: def (JIT)
|
||||||
|
|
||||||
|
/^abc\Kdef/info,push
|
||||||
|
** Applied only to compile when pattern is stacked with 'push': jitverify
|
||||||
|
Capturing subpattern count = 0
|
||||||
|
Compile options: <none>
|
||||||
|
Overall options: anchored
|
||||||
|
Subject length lower bound = 6
|
||||||
|
JIT compilation was successful
|
||||||
|
#save testsaved1
|
||||||
|
#load testsaved1
|
||||||
|
#pop jitverify
|
||||||
|
abcdef
|
||||||
|
0: def
|
||||||
|
|
||||||
|
#load testsaved1
|
||||||
|
#pop jit,jitverify
|
||||||
|
abcdef
|
||||||
|
0: def (JIT)
|
||||||
|
|
||||||
# End of testinput16
|
# End of testinput16
|
||||||
|
|
|
@@ -0,0 +1,100 @@
|
||||||
|
# This set of tests exercises the serialization/deserialization functions in
|
||||||
|
# the library. It does not use UTF or JIT.
|
||||||
|
|
||||||
|
#forbid_utf
|
||||||
|
|
||||||
|
# Compile several patterns, push them onto the stack, and then write them
|
||||||
|
# all to a file.
|
||||||
|
|
||||||
|
#pattern push
|
||||||
|
|
||||||
|
/(?<NAME>(?&NAME_PAT))\s+(?<ADDR>(?&ADDRESS_PAT))
|
||||||
|
(?(DEFINE)
|
||||||
|
(?<NAME_PAT>[a-z]+)
|
||||||
|
(?<ADDRESS_PAT>\d+)
|
||||||
|
)/x
|
||||||
|
/^(?:((.)(?1)\2|)|((.)(?3)\4|.))$/i
|
||||||
|
|
||||||
|
#save testsaved1
|
||||||
|
|
||||||
|
# Do it again for some more patterns.
|
||||||
|
|
||||||
|
/(*MARK:A)(*SKIP:B)(C|X)/mark
|
||||||
|
** Ignored when compiled pattern is stacked with 'push': mark
|
||||||
|
/(?:(?<n>foo)|(?<n>bar))\k<n>/dupnames
|
||||||
|
|
||||||
|
#save testsaved2
|
||||||
|
#pattern -push
|
||||||
|
|
||||||
|
# Reload the patterns, then pop them one by one and check them.
|
||||||
|
|
||||||
|
#load testsaved1
|
||||||
|
#load testsaved2
|
||||||
|
|
||||||
|
#pop info
|
||||||
|
Capturing subpattern count = 2
|
||||||
|
Max back reference = 2
|
||||||
|
Named capturing subpatterns:
|
||||||
|
n 1
|
||||||
|
n 2
|
||||||
|
Options: dupnames
|
||||||
|
Starting code units: b f
|
||||||
|
Subject length lower bound = 6
|
||||||
|
foofoo
|
||||||
|
0: foofoo
|
||||||
|
1: foo
|
||||||
|
barbar
|
||||||
|
0: barbar
|
||||||
|
1: <unset>
|
||||||
|
2: bar
|
||||||
|
|
||||||
|
#pop mark
|
||||||
|
C
|
||||||
|
0: C
|
||||||
|
1: C
|
||||||
|
MK: A
|
||||||
|
D
|
||||||
|
No match, mark = A
|
||||||
|
|
||||||
|
#pop
|
||||||
|
AmanaplanacanalPanama
|
||||||
|
0: AmanaplanacanalPanama
|
||||||
|
1: <unset>
|
||||||
|
2: <unset>
|
||||||
|
3: AmanaplanacanalPanama
|
||||||
|
4: A
|
||||||
|
|
||||||
|
#pop info
|
||||||
|
Capturing subpattern count = 4
|
||||||
|
Named capturing subpatterns:
|
||||||
|
ADDR 2
|
||||||
|
ADDRESS_PAT 4
|
||||||
|
NAME 1
|
||||||
|
NAME_PAT 3
|
||||||
|
Options: extended
|
||||||
|
Subject length lower bound = 3
|
||||||
|
metcalfe 33
|
||||||
|
0: metcalfe 33
|
||||||
|
1: metcalfe
|
||||||
|
2: 33
|
||||||
|
|
||||||
|
# Check for an error when different tables are used.
|
||||||
|
|
||||||
|
/abc/push,tables=1
|
||||||
|
/xyz/push,tables=2
|
||||||
|
#save testsaved1
|
||||||
|
Serialization failed: error -30: patterns do not all use the same character tables
|
||||||
|
|
||||||
|
#pop
|
||||||
|
xyz
|
||||||
|
0: xyz
|
||||||
|
|
||||||
|
#pop
|
||||||
|
abc
|
||||||
|
0: abc
|
||||||
|
|
||||||
|
#pop should give an error
|
||||||
|
** Can't pop off an empty stack
|
||||||
|
pqr
|
||||||
|
|
||||||
|
# End of testinput19
|
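testoutput19 shows that saving a stack whose patterns were compiled with different character tables fails with error -30 ("patterns do not all use the same character tables"). A plausible reason, stated here as an assumption, is that a serialized set carries a single copy of the tables shared by all of its patterns. The sketch below shows only that precondition check. `compiled_t`, `check_same_tables` and `MIXED_TABLES_ERROR` are hypothetical names, and the -1 returned for an empty set or a NULL entry is an invented code, not a PCRE2 error value.

```c
#include <assert.h>
#include <stddef.h>

/* Error value printed by pcre2test in testoutput19 for mixed tables. */
#define MIXED_TABLES_ERROR (-30)

/* Hypothetical stand-in for a compiled pattern: only the pointer to the
   character tables it was compiled with matters for this check. */
typedef struct {
  const unsigned char *tables;
} compiled_t;

/* Return 0 if every pattern in codes[0..count-1] uses the same tables,
   MIXED_TABLES_ERROR if they differ, or -1 (a hypothetical "bad argument"
   code, not a PCRE2 value) for an empty set or a NULL entry. */
static int check_same_tables(const compiled_t **codes, int count)
{
  assert(codes != NULL);
  if (count <= 0 || codes[0] == NULL) return -1;
  for (int i = 1; i < count; i++)
    {
    if (codes[i] == NULL) return -1;
    if (codes[i]->tables != codes[0]->tables) return MIXED_TABLES_ERROR;
    }
  return 0;
}
```

Comparing table pointers, rather than table contents, mirrors the test above, where `tables=1` and `tables=2` select different table sets; whether the library compares pointers or contents is not shown in this diff.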