Add serialization functions and tests with updated pcre2test. Fix
PCRE2_INFO_SIZE issues.
parent d4daaf966d
commit 5438fc8a6a
22
ChangeLog
|
@ -1,8 +1,8 @@
|
|||
Change Log for PCRE2
|
||||
--------------------
|
||||
|
||||
Version 10.10 13-January-2015
|
||||
-----------------------------
|
||||
Version 10.10 xx-xxx-2015
|
||||
-------------------------
|
||||
|
||||
1. When a pattern is compiled, it remembers the highest back reference so that
|
||||
when matching, if the ovector is too small, extra memory can be obtained to
|
||||
|
@ -16,6 +16,19 @@ bug was that the condition was always treated as FALSE when the capture could
|
|||
not be consulted, leading to an incorrect behaviour by pcre2_match(). This bug
|
||||
has been fixed.
|
||||
|
||||
2. Functions for serialization and deserialization of sets of compiled patterns
|
||||
have been added.
|
||||
|
||||
3. The value that is returned by PCRE2_INFO_SIZE has been corrected to remove
|
||||
excess code units at the end of the data block that may occasionally occur if
|
||||
the code for calculating the size over-estimates. This change stops the
|
||||
serialization code copying uninitialized data, to which valgrind objects. The
|
||||
documentation of PCRE2_INFO_SIZE was incorrect in stating that the size did not
|
||||
include the general overhead. This has been corrected.
|
||||
|
||||
4. All code units in every slot in the table of group names are now set, again
|
||||
in order to avoid accessing uninitialized data when serializing.
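
The idea behind item 4 can be pictured with a small sketch. This is not PCRE2's
code: the slot width, the helper name fill_name_slot, and the use of 8-bit
units are all assumptions made for illustration. It only shows the general
technique of setting every unit of a fixed-width slot, so that a later bulk
copy of the whole table (as a serializer would do) never reads bytes that were
never written.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical slot width in code units; PCRE2 computes its own entry size. */
#define SKETCH_SLOT_SIZE 16

/* Store `name` in a fixed-width slot and set every unit of the slot.
   Over-long names are truncated, which is a simplification for this sketch. */
void fill_name_slot(unsigned char *slot, const char *name)
{
    size_t len = strlen(name);
    if (len > SKETCH_SLOT_SIZE - 1)
        len = SKETCH_SLOT_SIZE - 1;       /* leave room for a terminating zero */
    memset(slot, 0, SKETCH_SLOT_SIZE);    /* no unit is left uninitialized */
    memcpy(slot, name, len);              /* then copy the name itself */
}
```

With this approach a tool such as valgrind has nothing to object to when the
whole slot is copied, because the padding after a short name is defined data.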
|
||||
|
||||
|
||||
Version 10.00 05-January-2015
|
||||
-----------------------------
|
||||
|
@ -30,8 +43,9 @@ logged. In addition to the API changes, the following changes were made. They
|
|||
are either new functionality, or bug fixes and other noticeable changes of
|
||||
behaviour that were implemented after the code had been forked.
|
||||
|
||||
1. Unicode support is now enabled by default, but it can optionally be
|
||||
disabled.
|
||||
1. Including Unicode support at build time is now enabled by default, but it
|
||||
can optionally be disabled. It is not enabled by default at run time (no
|
||||
change).
|
||||
|
||||
2. The test program, now called pcre2test, was re-specified and almost
|
||||
completely re-written. Its input is not compatible with input for pcretest.
|
||||
|
|
13
Makefile.am
|
@ -54,6 +54,10 @@ dist_html_DATA = \
|
|||
doc/html/pcre2_match_data_create_from_pattern.html \
|
||||
doc/html/pcre2_match_data_free.html \
|
||||
doc/html/pcre2_pattern_info.html \
|
||||
doc/html/pcre2_serialize_decode.html \
|
||||
doc/html/pcre2_serialize_encode.html \
|
||||
doc/html/pcre2_serialize_free.html \
|
||||
doc/html/pcre2_serialize_get_number_of_codes.html \
|
||||
doc/html/pcre2_set_bsr.html \
|
||||
doc/html/pcre2_set_callout.html \
|
||||
doc/html/pcre2_set_character_tables.html \
|
||||
|
@ -89,6 +93,7 @@ dist_html_DATA = \
|
|||
doc/html/pcre2perform.html \
|
||||
doc/html/pcre2posix.html \
|
||||
doc/html/pcre2sample.html \
|
||||
doc/html/pcre2serialize.html \
|
||||
doc/html/pcre2stack.html \
|
||||
doc/html/pcre2syntax.html \
|
||||
doc/html/pcre2test.html \
|
||||
|
@ -127,6 +132,10 @@ dist_man_MANS = \
|
|||
doc/pcre2_match_data_create_from_pattern.3 \
|
||||
doc/pcre2_match_data_free.3 \
|
||||
doc/pcre2_pattern_info.3 \
|
||||
doc/pcre2_serialize_decode.3 \
|
||||
doc/pcre2_serialize_encode.3 \
|
||||
doc/pcre2_serialize_free.3 \
|
||||
doc/pcre2_serialize_get_number_of_codes.3 \
|
||||
doc/pcre2_set_bsr.3 \
|
||||
doc/pcre2_set_callout.3 \
|
||||
doc/pcre2_set_character_tables.3 \
|
||||
|
@ -162,6 +171,7 @@ dist_man_MANS = \
|
|||
doc/pcre2perform.3 \
|
||||
doc/pcre2posix.3 \
|
||||
doc/pcre2sample.3 \
|
||||
doc/pcre2serialize.3 \
|
||||
doc/pcre2stack.3 \
|
||||
doc/pcre2syntax.3 \
|
||||
doc/pcre2test.1 \
|
||||
|
@ -316,6 +326,7 @@ COMMON_SOURCES = \
|
|||
src/pcre2_newline.c \
|
||||
src/pcre2_ord2utf.c \
|
||||
src/pcre2_pattern_info.c \
|
||||
src/pcre2_serialize.c \
|
||||
src/pcre2_string_utils.c \
|
||||
src/pcre2_study.c \
|
||||
src/pcre2_substitute.c \
|
||||
|
@ -573,6 +584,7 @@ EXTRA_DIST += \
|
|||
testdata/testinput16 \
|
||||
testdata/testinput17 \
|
||||
testdata/testinput18 \
|
||||
testdata/testinput19 \
|
||||
testdata/testinputEBC \
|
||||
testdata/testoutput1 \
|
||||
testdata/testoutput2 \
|
||||
|
@ -598,6 +610,7 @@ EXTRA_DIST += \
|
|||
testdata/testoutput16 \
|
||||
testdata/testoutput17 \
|
||||
testdata/testoutput18 \
|
||||
testdata/testoutput19 \
|
||||
testdata/testoutputEBC \
|
||||
perltest.sh
|
||||
|
||||
|
|
|
@ -108,6 +108,7 @@ can skip ahead to the CMake section.
|
|||
pcre2_newline.c
|
||||
pcre2_ord2utf.c
|
||||
pcre2_pattern_info.c
|
||||
pcre2_serialize.c
|
||||
pcre2_string_utils.c
|
||||
pcre2_study.c
|
||||
pcre2_substitute.c
|
||||
|
@ -391,4 +392,4 @@ The site currently has ports for PCRE1 releases, but PCRE2 should follow in due
|
|||
course.
|
||||
|
||||
=============================
|
||||
Last Updated: 05 January 2015
|
||||
Last Updated: 19 January 2015
|
||||
|
|
101
README
|
@ -527,11 +527,10 @@ Testing PCRE2
|
|||
------------
|
||||
|
||||
To test the basic PCRE2 library on a Unix-like system, run the RunTest script.
|
||||
There is another script called RunGrepTest that tests the options of the
|
||||
pcre2grep command. When JIT support is enabled, a third test program called
|
||||
pcre2_jit_test is built. Both the scripts and all the program tests are run if
|
||||
you obey "make check". For other environments, see the instructions in
|
||||
NON-AUTOTOOLS-BUILD.
|
||||
There is another script called RunGrepTest that tests the pcre2grep command.
|
||||
When JIT support is enabled, a third test program called pcre2_jit_test is
|
||||
built. Both the scripts and all the program tests are run if you obey "make
|
||||
check". For other environments, see the instructions in NON-AUTOTOOLS-BUILD.
|
||||
|
||||
The RunTest script runs the pcre2test test program (which is documented in its
|
||||
own man page) on each of the relevant testinput files in the testdata
|
||||
|
@ -544,9 +543,9 @@ Some tests are relevant only when certain build-time options were selected. For
|
|||
example, the tests for UTF-8/16/32 features are run only when Unicode support
|
||||
is available. RunTest outputs a comment when it skips a test.
|
||||
|
||||
Many of the tests that are not skipped are run twice if JIT support is
|
||||
available. On the second run, JIT compilation is forced. This testing can be
|
||||
suppressed by putting "nojit" on the RunTest command line.
|
||||
Many (but not all) of the tests that are not skipped are run twice if JIT
|
||||
support is available. On the second run, JIT compilation is forced. This
|
||||
testing can be suppressed by putting "nojit" on the RunTest command line.
|
||||
|
||||
The entire set of tests is run once for each of the 8-bit, 16-bit and 32-bit
|
||||
libraries that are enabled. If you want to run just one set of tests, call
|
||||
|
@ -570,14 +569,20 @@ in numerical order.
|
|||
You can also call RunTest with the single argument "list" to cause it to output
|
||||
a list of tests.
|
||||
|
||||
The first two tests can always be run, as they expect only plain text strings
|
||||
(not UTF) and make no use of Unicode properties. The first test file can be fed
|
||||
The test sequence starts with "test 0", which is a special test that has no
|
||||
input file, and whose output is not checked. This is because it will be
|
||||
different on different hardware and with different configurations. The test
|
||||
exists in order to exercise some of pcre2test's code that would not otherwise
|
||||
be run.
|
||||
|
||||
Tests 1 and 2 can always be run, as they expect only plain text strings (not
|
||||
UTF) and make no use of Unicode properties. The first test file can be fed
|
||||
directly into the perltest.sh script to check that Perl gives the same results.
|
||||
The only difference you should see is in the first few lines, where the Perl
|
||||
version is given instead of the PCRE2 version. The second set of tests check
|
||||
auxiliary functions, error detection, and run-time flags that are specific to
|
||||
PCRE2, as well as the POSIX wrapper API. It also uses the debugging flags to
|
||||
check some of the internals of pcre2_compile().
|
||||
PCRE2. It also uses the debugging flags to check some of the internals of
|
||||
pcre2_compile().
|
||||
|
||||
If you build PCRE2 with a locale setting that is not the standard C locale, the
|
||||
character tables may be different (see next paragraph). In some cases, this may
|
||||
|
@ -585,18 +590,17 @@ cause failures in the second set of tests. For example, in a locale where the
|
|||
isprint() function yields TRUE for characters in the range 128-255, the use of
|
||||
[:isascii:] inside a character class defines a different set of characters, and
|
||||
this shows up in this test as a difference in the compiled code, which is being
|
||||
listed for checking. Where the comparison test output contains [\x00-\x7f] the
|
||||
test will contain [\x00-\xff], and similarly in some other cases. This is not a
|
||||
bug in PCRE2.
|
||||
listed for checking. For example, where the comparison test output contains
|
||||
[\x00-\x7f] the test might contain [\x00-\xff], and similarly in some other
|
||||
cases. This is not a bug in PCRE2.
|
||||
|
||||
The third set of tests checks pcre2_maketables(), the facility for building a
|
||||
set of character tables for a specific locale and using them instead of the
|
||||
default tables. The script uses the "locale" command to check for the
|
||||
availability of the "fr_FR", "french", or "fr" locale, and uses the first one
|
||||
that it finds. If the "locale" command fails, or if its output doesn't include
|
||||
"fr_FR", "french", or "fr" in the list of available locales, the third test
|
||||
cannot be run, and a comment is output to say why. If running this test
|
||||
produces an error like this
|
||||
Test 3 checks pcre2_maketables(), the facility for building a set of character
|
||||
tables for a specific locale and using them instead of the default tables. The
|
||||
script uses the "locale" command to check for the availability of the "fr_FR",
|
||||
"french", or "fr" locale, and uses the first one that it finds. If the "locale"
|
||||
command fails, or if its output doesn't include "fr_FR", "french", or "fr" in
|
||||
the list of available locales, the third test cannot be run, and a comment is
|
||||
output to say why. If running this test produces an error like this:
|
||||
|
||||
** Failed to set locale "fr_FR"
|
||||
|
||||
|
@ -606,33 +610,37 @@ alternative output files for the third test, because three different versions
|
|||
of the French locale have been encountered. The test passes if its output
|
||||
matches any one of them.
|
||||
|
||||
The fourth and fifth tests check UTF and Unicode property support, the fourth
|
||||
being compatible with the perltest.sh script, and the fifth checking
|
||||
PCRE2-specific things.
|
||||
Tests 4 and 5 check UTF and Unicode property support, test 4 being compatible
|
||||
with the perltest.sh script, and test 5 checking PCRE2-specific things.
|
||||
|
||||
The sixth and seventh tests check the pcre2_dfa_match() alternative matching
|
||||
function, in non-UTF mode and UTF-mode with Unicode property support,
|
||||
respectively.
|
||||
Tests 6 and 7 check the pcre2_dfa_match() alternative matching function, in
|
||||
non-UTF mode and UTF-mode with Unicode property support, respectively.
|
||||
|
||||
The eighth test checks some internal offsets and code size features; it is
|
||||
run only when the default "link size" of 2 is set (in other cases the sizes
|
||||
change) and when Unicode support is enabled.
|
||||
Test 8 checks some internal offsets and code size features; it is run only when
|
||||
the default "link size" of 2 is set (in other cases the sizes change) and when
|
||||
Unicode support is enabled.
|
||||
|
||||
The ninth and tenth tests are run only in 8-bit mode, and the eleventh and
|
||||
twelfth tests are run only in 16-bit and 32-bit modes. These are tests that
|
||||
generate different output in 8-bit mode. Each pair are for general cases and
|
||||
Unicode support, respectively. The thirteenth test checks the handling of
|
||||
non-UTF characters greater than 255 by pcre2_dfa_match() in 16-bit and 32-bit
|
||||
modes.
|
||||
Tests 9 and 10 are run only in 8-bit mode, and tests 11 and 12 are run only in
|
||||
16-bit and 32-bit modes. These are tests that generate different output in
|
||||
8-bit mode. Each pair are for general cases and Unicode support, respectively.
|
||||
Test 13 checks the handling of non-UTF characters greater than 255 by
|
||||
pcre2_dfa_match() in 16-bit and 32-bit modes.
|
||||
|
||||
The fourteenth test is run only when JIT support is not available, and the
|
||||
fifteenth test is run only when JIT support is available. They test some
|
||||
JIT-specific features such as information output from pcre2test about JIT
|
||||
compilation.
|
||||
Test 14 contains a number of tests that must not be run with JIT. They check,
|
||||
among other non-JIT things, the match-limiting features of the interpretive
|
||||
matcher.
|
||||
|
||||
The sixteenth and seventeenth tests are run only in 8-bit mode. They check the
|
||||
POSIX interface to the 8-bit library, without and with Unicode support,
|
||||
respectively.
|
||||
Test 15 is run only when JIT support is not available. It checks that an
|
||||
attempt to use JIT has the expected behaviour.
|
||||
|
||||
Test 16 is run only when JIT support is available. It checks JIT complete and
|
||||
partial modes, match-limiting under JIT, and other JIT-specific features.
|
||||
|
||||
Tests 17 and 18 are run only in 8-bit mode. They check the POSIX interface to
|
||||
the 8-bit library, without and with Unicode support, respectively.
|
||||
|
||||
Test 19 checks the serialization functions by writing a set of compiled
|
||||
patterns to a file, and then reloading and checking them.
|
||||
|
||||
|
||||
Character tables
|
||||
|
@ -718,6 +726,7 @@ The distribution should contain the files listed below.
|
|||
src/pcre2_newline.c )
|
||||
src/pcre2_ord2utf.c )
|
||||
src/pcre2_pattern_info.c )
|
||||
src/pcre2_serialize.c )
|
||||
src/pcre2_string_utils.c )
|
||||
src/pcre2_study.c )
|
||||
src/pcre2_substitute.c )
|
||||
|
@ -816,4 +825,4 @@ The distribution should contain the files listed below.
|
|||
Philip Hazel
|
||||
Email local part: ph10
|
||||
Email domain: cam.ac.uk
|
||||
Last updated: 05 January 2015
|
||||
Last updated: 20 January 2015
|
||||
|
|
17
RunTest
|
@ -65,6 +65,7 @@ title15="Test 15: JIT-specific features when JIT is not available"
|
|||
title16="Test 16: JIT-specific features when JIT is available"
|
||||
title17="Test 17: Tests of the POSIX interface, excluding UTF/UCP"
|
||||
title18="Test 18: Tests of the POSIX interface with UTF/UCP"
|
||||
title19="Test 19: Serialization tests"
|
||||
maxtest=18
|
||||
|
||||
if [ $# -eq 1 -a "$1" = "list" ]; then
|
||||
|
@ -87,6 +88,7 @@ if [ $# -eq 1 -a "$1" = "list" ]; then
|
|||
echo $title16
|
||||
echo $title17
|
||||
echo $title18
|
||||
echo $title19
|
||||
exit 0
|
||||
fi
|
||||
|
||||
|
@ -207,6 +209,7 @@ do15=no
|
|||
do16=no
|
||||
do17=no
|
||||
do18=no
|
||||
do19=no
|
||||
|
||||
while [ $# -gt 0 ] ; do
|
||||
case $1 in
|
||||
|
@ -229,6 +232,7 @@ while [ $# -gt 0 ] ; do
|
|||
16) do16=yes;;
|
||||
17) do17=yes;;
|
||||
18) do18=yes;;
|
||||
19) do19=yes;;
|
||||
-8) arg8=yes;;
|
||||
-16) arg16=yes;;
|
||||
-32) arg32=yes;;
|
||||
|
@ -364,7 +368,7 @@ if [ $do0 = no -a $do1 = no -a $do2 = no -a $do3 = no -a \
|
|||
$do4 = no -a $do5 = no -a $do6 = no -a $do7 = no -a \
|
||||
$do8 = no -a $do9 = no -a $do10 = no -a $do11 = no -a \
|
||||
$do12 = no -a $do13 = no -a $do14 = no -a $do15 = no -a \
|
||||
$do16 = no -a $do17 = no -a $do18 = no \
|
||||
$do16 = no -a $do17 = no -a $do18 = no -a $do19 = no \
|
||||
]; then
|
||||
do0=yes
|
||||
do1=yes
|
||||
|
@ -385,6 +389,7 @@ if [ $do0 = no -a $do1 = no -a $do2 = no -a $do3 = no -a \
|
|||
do16=yes
|
||||
do17=yes
|
||||
do18=yes
|
||||
do19=yes
|
||||
fi
|
||||
|
||||
# Handle any explicit skips at this stage, so that an argument list may consist
|
||||
|
@ -721,10 +726,18 @@ for bmode in "$test8" "$test16" "$test32"; do
|
|||
fi
|
||||
fi
|
||||
|
||||
# Serialization tests
|
||||
|
||||
if [ $do19 = yes ] ; then
|
||||
echo $title19
|
||||
$sim $valgrind ./pcre2test -q $bmode $testdata/testinput19 testtry
|
||||
checkresult $? 19 ""
|
||||
fi
|
||||
|
||||
# End of loop for 8/16/32-bit tests
|
||||
done
|
||||
|
||||
# Clean up local working files
|
||||
rm -f testSinput test3input test3output test3outputA test3outputB teststdout testtry
|
||||
rm -f testSinput test3input testsaved1 testsaved2 test3output test3outputA test3outputB teststdout testtry
|
||||
|
||||
# End
|
||||
|
|
|
@ -108,6 +108,7 @@ can skip ahead to the CMake section.
|
|||
pcre2_newline.c
|
||||
pcre2_ord2utf.c
|
||||
pcre2_pattern_info.c
|
||||
pcre2_serialize.c
|
||||
pcre2_string_utils.c
|
||||
pcre2_study.c
|
||||
pcre2_substitute.c
|
||||
|
@ -391,4 +392,4 @@ The site currently has ports for PCRE1 releases, but PCRE2 should follow in due
|
|||
course.
|
||||
|
||||
=============================
|
||||
Last Updated: 05 January 2015
|
||||
Last Updated: 19 January 2015
|
||||
|
|
|
@ -527,11 +527,10 @@ Testing PCRE2
|
|||
------------
|
||||
|
||||
To test the basic PCRE2 library on a Unix-like system, run the RunTest script.
|
||||
There is another script called RunGrepTest that tests the options of the
|
||||
pcre2grep command. When JIT support is enabled, a third test program called
|
||||
pcre2_jit_test is built. Both the scripts and all the program tests are run if
|
||||
you obey "make check". For other environments, see the instructions in
|
||||
NON-AUTOTOOLS-BUILD.
|
||||
There is another script called RunGrepTest that tests the pcre2grep command.
|
||||
When JIT support is enabled, a third test program called pcre2_jit_test is
|
||||
built. Both the scripts and all the program tests are run if you obey "make
|
||||
check". For other environments, see the instructions in NON-AUTOTOOLS-BUILD.
|
||||
|
||||
The RunTest script runs the pcre2test test program (which is documented in its
|
||||
own man page) on each of the relevant testinput files in the testdata
|
||||
|
@ -544,9 +543,9 @@ Some tests are relevant only when certain build-time options were selected. For
|
|||
example, the tests for UTF-8/16/32 features are run only when Unicode support
|
||||
is available. RunTest outputs a comment when it skips a test.
|
||||
|
||||
Many of the tests that are not skipped are run twice if JIT support is
|
||||
available. On the second run, JIT compilation is forced. This testing can be
|
||||
suppressed by putting "nojit" on the RunTest command line.
|
||||
Many (but not all) of the tests that are not skipped are run twice if JIT
|
||||
support is available. On the second run, JIT compilation is forced. This
|
||||
testing can be suppressed by putting "nojit" on the RunTest command line.
|
||||
|
||||
The entire set of tests is run once for each of the 8-bit, 16-bit and 32-bit
|
||||
libraries that are enabled. If you want to run just one set of tests, call
|
||||
|
@ -570,14 +569,20 @@ in numerical order.
|
|||
You can also call RunTest with the single argument "list" to cause it to output
|
||||
a list of tests.
|
||||
|
||||
The first two tests can always be run, as they expect only plain text strings
|
||||
(not UTF) and make no use of Unicode properties. The first test file can be fed
|
||||
The test sequence starts with "test 0", which is a special test that has no
|
||||
input file, and whose output is not checked. This is because it will be
|
||||
different on different hardware and with different configurations. The test
|
||||
exists in order to exercise some of pcre2test's code that would not otherwise
|
||||
be run.
|
||||
|
||||
Tests 1 and 2 can always be run, as they expect only plain text strings (not
|
||||
UTF) and make no use of Unicode properties. The first test file can be fed
|
||||
directly into the perltest.sh script to check that Perl gives the same results.
|
||||
The only difference you should see is in the first few lines, where the Perl
|
||||
version is given instead of the PCRE2 version. The second set of tests check
|
||||
auxiliary functions, error detection, and run-time flags that are specific to
|
||||
PCRE2, as well as the POSIX wrapper API. It also uses the debugging flags to
|
||||
check some of the internals of pcre2_compile().
|
||||
PCRE2. It also uses the debugging flags to check some of the internals of
|
||||
pcre2_compile().
|
||||
|
||||
If you build PCRE2 with a locale setting that is not the standard C locale, the
|
||||
character tables may be different (see next paragraph). In some cases, this may
|
||||
|
@ -585,18 +590,17 @@ cause failures in the second set of tests. For example, in a locale where the
|
|||
isprint() function yields TRUE for characters in the range 128-255, the use of
|
||||
[:isascii:] inside a character class defines a different set of characters, and
|
||||
this shows up in this test as a difference in the compiled code, which is being
|
||||
listed for checking. Where the comparison test output contains [\x00-\x7f] the
|
||||
test will contain [\x00-\xff], and similarly in some other cases. This is not a
|
||||
bug in PCRE2.
|
||||
listed for checking. For example, where the comparison test output contains
|
||||
[\x00-\x7f] the test might contain [\x00-\xff], and similarly in some other
|
||||
cases. This is not a bug in PCRE2.
|
||||
|
||||
The third set of tests checks pcre2_maketables(), the facility for building a
|
||||
set of character tables for a specific locale and using them instead of the
|
||||
default tables. The script uses the "locale" command to check for the
|
||||
availability of the "fr_FR", "french", or "fr" locale, and uses the first one
|
||||
that it finds. If the "locale" command fails, or if its output doesn't include
|
||||
"fr_FR", "french", or "fr" in the list of available locales, the third test
|
||||
cannot be run, and a comment is output to say why. If running this test
|
||||
produces an error like this
|
||||
Test 3 checks pcre2_maketables(), the facility for building a set of character
|
||||
tables for a specific locale and using them instead of the default tables. The
|
||||
script uses the "locale" command to check for the availability of the "fr_FR",
|
||||
"french", or "fr" locale, and uses the first one that it finds. If the "locale"
|
||||
command fails, or if its output doesn't include "fr_FR", "french", or "fr" in
|
||||
the list of available locales, the third test cannot be run, and a comment is
|
||||
output to say why. If running this test produces an error like this:
|
||||
|
||||
** Failed to set locale "fr_FR"
|
||||
|
||||
|
@ -606,33 +610,37 @@ alternative output files for the third test, because three different versions
|
|||
of the French locale have been encountered. The test passes if its output
|
||||
matches any one of them.
|
||||
|
||||
The fourth and fifth tests check UTF and Unicode property support, the fourth
|
||||
being compatible with the perltest.sh script, and the fifth checking
|
||||
PCRE2-specific things.
|
||||
Tests 4 and 5 check UTF and Unicode property support, test 4 being compatible
|
||||
with the perltest.sh script, and test 5 checking PCRE2-specific things.
|
||||
|
||||
The sixth and seventh tests check the pcre2_dfa_match() alternative matching
|
||||
function, in non-UTF mode and UTF-mode with Unicode property support,
|
||||
respectively.
|
||||
Tests 6 and 7 check the pcre2_dfa_match() alternative matching function, in
|
||||
non-UTF mode and UTF-mode with Unicode property support, respectively.
|
||||
|
||||
The eighth test checks some internal offsets and code size features; it is
|
||||
run only when the default "link size" of 2 is set (in other cases the sizes
|
||||
change) and when Unicode support is enabled.
|
||||
Test 8 checks some internal offsets and code size features; it is run only when
|
||||
the default "link size" of 2 is set (in other cases the sizes change) and when
|
||||
Unicode support is enabled.
|
||||
|
||||
The ninth and tenth tests are run only in 8-bit mode, and the eleventh and
|
||||
twelfth tests are run only in 16-bit and 32-bit modes. These are tests that
|
||||
generate different output in 8-bit mode. Each pair are for general cases and
|
||||
Unicode support, respectively. The thirteenth test checks the handling of
|
||||
non-UTF characters greater than 255 by pcre2_dfa_match() in 16-bit and 32-bit
|
||||
modes.
|
||||
Tests 9 and 10 are run only in 8-bit mode, and tests 11 and 12 are run only in
|
||||
16-bit and 32-bit modes. These are tests that generate different output in
|
||||
8-bit mode. Each pair are for general cases and Unicode support, respectively.
|
||||
Test 13 checks the handling of non-UTF characters greater than 255 by
|
||||
pcre2_dfa_match() in 16-bit and 32-bit modes.
|
||||
|
||||
The fourteenth test is run only when JIT support is not available, and the
|
||||
fifteenth test is run only when JIT support is available. They test some
|
||||
JIT-specific features such as information output from pcre2test about JIT
|
||||
compilation.
|
||||
Test 14 contains a number of tests that must not be run with JIT. They check,
|
||||
among other non-JIT things, the match-limiting features of the interpretive
|
||||
matcher.
|
||||
|
||||
The sixteenth and seventeenth tests are run only in 8-bit mode. They check the
|
||||
POSIX interface to the 8-bit library, without and with Unicode support,
|
||||
respectively.
|
||||
Test 15 is run only when JIT support is not available. It checks that an
|
||||
attempt to use JIT has the expected behaviour.
|
||||
|
||||
Test 16 is run only when JIT support is available. It checks JIT complete and
|
||||
partial modes, match-limiting under JIT, and other JIT-specific features.
|
||||
|
||||
Tests 17 and 18 are run only in 8-bit mode. They check the POSIX interface to
|
||||
the 8-bit library, without and with Unicode support, respectively.
|
||||
|
||||
Test 19 checks the serialization functions by writing a set of compiled
|
||||
patterns to a file, and then reloading and checking them.
|
||||
|
||||
|
||||
Character tables
|
||||
|
@ -718,6 +726,7 @@ The distribution should contain the files listed below.
|
|||
src/pcre2_newline.c )
|
||||
src/pcre2_ord2utf.c )
|
||||
src/pcre2_pattern_info.c )
|
||||
src/pcre2_serialize.c )
|
||||
src/pcre2_string_utils.c )
|
||||
src/pcre2_study.c )
|
||||
src/pcre2_substitute.c )
|
||||
|
@ -816,4 +825,4 @@ The distribution should contain the files listed below.
|
|||
Philip Hazel
|
||||
Email local part: ph10
|
||||
Email domain: cam.ac.uk
|
||||
Last updated: 05 January 2015
|
||||
Last updated: 20 January 2015
|
||||
|
|
|
@ -65,6 +65,9 @@ first.
|
|||
<tr><td><a href="pcre2sample.html">pcre2sample</a></td>
|
||||
<td> Discussion of the pcre2demo program</td></tr>
|
||||
|
||||
<tr><td><a href="pcre2serialize.html">pcre2serialize</a></td>
|
||||
<td> Serializing functions for saving precompiled patterns</td></tr>
|
||||
|
||||
<tr><td><a href="pcre2stack.html">pcre2stack</a></td>
|
||||
<td> Discussion of PCRE2's stack usage</td></tr>
|
||||
|
||||
|
@ -177,6 +180,18 @@ in the library.
|
|||
<tr><td><a href="pcre2_pattern_info.html">pcre2_pattern_info</a></td>
|
||||
<td> Extract information about a pattern</td></tr>
|
||||
|
||||
<tr><td><a href="pcre2_serialize_decode.html">pcre2_serialize_decode</a></td>
|
||||
<td> Decode serialized compiled patterns</td></tr>
|
||||
|
||||
<tr><td><a href="pcre2_serialize_encode.html">pcre2_serialize_encode</a></td>
|
||||
<td> Serialize compiled patterns for save/restore</td></tr>
|
||||
|
||||
<tr><td><a href="pcre2_serialize_free.html">pcre2_serialize_free</a></td>
|
||||
<td> Free serialized compiled patterns</td></tr>
|
||||
|
||||
<tr><td><a href="pcre2_serialize_get_number_of_codes.html">pcre2_serialize_get_number_of_codes</a></td>
|
||||
<td> Get number of serialized compiled patterns</td></tr>
|
||||
|
||||
<tr><td><a href="pcre2_set_bsr.html">pcre2_set_bsr</a></td>
|
||||
<td> Set \R convention</td></tr>
|
||||
|
||||
|
|
|
@ -0,0 +1,62 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>pcre2_serialize_decode specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre2_serialize_decode man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE2 HTML documentation. It was generated
|
||||
automatically from the original man page. If there is any nonsense in it,
|
||||
please consult the man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include &lt;pcre2.h&gt;</b>
|
||||
</P>
|
||||
<P>
|
||||
<b>int32_t pcre2_serialize_decode(pcre2_code **<i>codes</i>,</b>
|
||||
<b> int32_t <i>number_of_codes</i>, const uint32_t *<i>bytes</i>,</b>
|
||||
<b> pcre2_general_context *<i>gcontext</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This function decodes a serialized set of compiled patterns back into a list of
|
||||
individual patterns. Its arguments are:
|
||||
<pre>
|
||||
<i>codes</i> pointer to a vector in which to build the list
|
||||
<i>number_of_codes</i> number of slots in the vector
|
||||
<i>bytes</i> the serialized byte stream
|
||||
<i>gcontext</i> pointer to a general context or NULL
|
||||
</pre>
|
||||
The <i>bytes</i> argument must point to a block of data that was originally
|
||||
created by <b>pcre2_serialize_encode()</b>, though it may have been saved on
|
||||
disc or elsewhere in the meantime. If there are more codes in the serialized
|
||||
data than slots in the list, only those compiled patterns that will fit are
|
||||
decoded. The yield of the function is the number of decoded patterns, or one of
|
||||
the following negative error codes:
|
||||
<pre>
|
||||
PCRE2_ERROR_BADDATA <i>number_of_codes</i> is zero or less
|
||||
PCRE2_ERROR_BADMAGIC mismatch of id bytes in <i>bytes</i>
|
||||
PCRE2_ERROR_BADMODE mismatch of variable unit size or PCRE version
|
||||
PCRE2_ERROR_MEMORY memory allocation failed
|
||||
PCRE2_ERROR_NULL <i>codes</i> or <i>bytes</i> is NULL
|
||||
</pre>
|
||||
PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that it was compiled
|
||||
on a system with different endianness.
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE2 native API in the
|
||||
<a href="pcre2api.html"><b>pcre2api</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcre2posix.html"><b>pcre2posix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||
</p>
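
The return-value contract described on this page can be modelled with a short
sketch. It does not call PCRE2 and does not reproduce its serialized format:
the header layout, the id word, the error values, and every name prefixed with
sketch_ or SKETCH_ are assumptions for illustration, as is the order in which
the checks are made. It shows two documented behaviours: an id word written on
a machine of the opposite endianness no longer matches, and when the data
holds more patterns than there are slots, only those that fit are counted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values: the real error codes are defined in pcre2.h. */
enum {
    SKETCH_ERROR_BADDATA  = -1,
    SKETCH_ERROR_BADMAGIC = -2,
    SKETCH_ERROR_NULL     = -3
};

/* Hypothetical id word; the real serialized header may differ. */
#define SKETCH_MAGIC 0x01020304u

/* Hypothetical header: an id word followed by the number of stored patterns. */
typedef struct {
    uint32_t magic;
    int32_t  stored_codes;
} sketch_header;

/* Reverse the byte order of a 32-bit word, which is how an opposite-endian
   reader would see a word that was written in native order. */
uint32_t sketch_swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Return how many patterns a decoder would produce, or a negative error. */
int32_t sketch_decode_count(const sketch_header *hdr, int32_t number_of_codes)
{
    if (hdr == NULL) return SKETCH_ERROR_NULL;
    if (number_of_codes <= 0) return SKETCH_ERROR_BADDATA;
    if (hdr->magic != SKETCH_MAGIC) return SKETCH_ERROR_BADMAGIC;
    /* More stored patterns than slots: only those that fit are decoded. */
    return hdr->stored_codes < number_of_codes ? hdr->stored_codes
                                               : number_of_codes;
}
```

A caller of the real function would treat any negative yield as an error and
any positive yield as the number of slots that were filled.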
|
|
@ -0,0 +1,61 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>pcre2_serialize_encode specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre2_serialize_encode man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE2 HTML documentation. It was generated
|
||||
automatically from the original man page. If there is any nonsense in it,
|
||||
please consult the man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include &lt;pcre2.h&gt;</b>
|
||||
</P>
|
||||
<P>
|
||||
<b>int32_t pcre2_serialize_encode(pcre2_code **<i>codes</i>,</b>
|
||||
<b> int32_t <i>number_of_codes</i>, uint32_t **<i>serialized_bytes</i>,</b>
|
||||
<b> PCRE2_SIZE *<i>serialized_size</i>, pcre2_general_context *<i>gcontext</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This function encodes a list of compiled patterns into a byte stream that can
|
||||
be saved on disc or elsewhere. Its arguments are:
|
||||
<pre>
|
||||
<i>codes</i> pointer to a vector containing the list
|
||||
<i>number_of_codes</i> number of slots in the vector
|
||||
<i>serialized_bytes</i> set to point to the serialized byte stream
|
||||
<i>serialized_size</i> set to the number of bytes in the byte stream
|
||||
<i>gcontext</i> pointer to a general context or NULL
|
||||
</pre>
|
||||
The context argument is used to obtain memory for the byte stream. When the
|
||||
serialized data is no longer needed, it must be freed by calling
|
||||
<b>pcre2_serialize_free()</b>. The yield of the function is the number of
|
||||
serialized patterns, or one of the following negative error codes:
|
||||
<pre>
|
||||
PCRE2_ERROR_BADDATA <i>number_of_codes</i> is zero or less
|
||||
PCRE2_ERROR_BADMAGIC mismatch of id bytes in one of the patterns
|
||||
PCRE2_ERROR_MEMORY memory allocation failed
|
||||
PCRE2_ERROR_MIXEDTABLES the patterns do not all use the same tables
|
||||
PCRE2_ERROR_NULL an argument other than <i>gcontext</i> is NULL
|
||||
</pre>
|
||||
PCRE2_ERROR_BADMAGIC means either that a pattern's code has been corrupted, or
|
||||
that a slot in the vector does not point to a compiled pattern.
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE2 native API in the
|
||||
<a href="pcre2api.html"><b>pcre2api</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcre2posix.html"><b>pcre2posix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||
</p>
|
|
@ -0,0 +1,40 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>pcre2_serialize_free specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre2_serialize_free man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE2 HTML documentation. It was generated
|
||||
automatically from the original man page. If there is any nonsense in it,
|
||||
please consult the man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre2.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>void pcre2_serialize_free(uint8_t *<i>bytes</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This function frees the memory that was obtained by
|
||||
<b>pcre2_serialize_encode()</b> to hold a serialized byte stream. The argument
|
||||
must point to such a byte stream.
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE2 native API in the
|
||||
<a href="pcre2api.html"><b>pcre2api</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcre2posix.html"><b>pcre2posix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||
</p>
|
|
@ -0,0 +1,49 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>pcre2_serialize_get_number_of_codes specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre2_serialize_get_number_of_codes man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE2 HTML documentation. It was generated
|
||||
automatically from the original man page. If there is any nonsense in it,
|
||||
please consult the man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre2.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>int32_t pcre2_serialize_get_number_of_codes(const uint8_t *<i>bytes</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
The <i>bytes</i> argument must point to a serialized byte stream that was
|
||||
originally created by <b>pcre2_serialize_encode()</b> (though it may have been
|
||||
saved on disc or elsewhere in the meantime). The function returns the number of
|
||||
serialized patterns in the byte stream, or one of the following negative error
|
||||
codes:
|
||||
<pre>
|
||||
PCRE2_ERROR_BADMAGIC mismatch of id bytes in <i>bytes</i>
|
||||
PCRE2_ERROR_BADMODE mismatch of variable unit size or PCRE2 version
|
||||
PCRE2_ERROR_NULL the argument is NULL
|
||||
</pre>
|
||||
PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that it was compiled
|
||||
on a system with different endianness.
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE2 native API in the
|
||||
<a href="pcre2api.html"><b>pcre2api</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcre2posix.html"><b>pcre2posix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||
</p>
|
|
@ -21,35 +21,37 @@ please consult the man page, in case the conversion went wrong.
|
|||
<li><a name="TOC6" href="#SEC6">PCRE2 NATIVE API STRING EXTRACTION FUNCTIONS</a>
|
||||
<li><a name="TOC7" href="#SEC7">PCRE2 NATIVE API STRING SUBSTITUTION FUNCTION</a>
|
||||
<li><a name="TOC8" href="#SEC8">PCRE2 NATIVE API JIT FUNCTIONS</a>
|
||||
<li><a name="TOC9" href="#SEC9">PCRE2 NATIVE API AUXILIARY FUNCTIONS</a>
|
||||
<li><a name="TOC10" href="#SEC10">PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a>
|
||||
<li><a name="TOC11" href="#SEC11">PCRE2 API OVERVIEW</a>
|
||||
<li><a name="TOC12" href="#SEC12">STRING LENGTHS AND OFFSETS</a>
|
||||
<li><a name="TOC13" href="#SEC13">NEWLINES</a>
|
||||
<li><a name="TOC14" href="#SEC14">MULTITHREADING</a>
|
||||
<li><a name="TOC15" href="#SEC15">PCRE2 CONTEXTS</a>
|
||||
<li><a name="TOC16" href="#SEC16">CHECKING BUILD-TIME OPTIONS</a>
|
||||
<li><a name="TOC17" href="#SEC17">COMPILING A PATTERN</a>
|
||||
<li><a name="TOC18" href="#SEC18">COMPILATION ERROR CODES</a>
|
||||
<li><a name="TOC19" href="#SEC19">JUST-IN-TIME (JIT) COMPILATION</a>
|
||||
<li><a name="TOC20" href="#SEC20">LOCALE SUPPORT</a>
|
||||
<li><a name="TOC21" href="#SEC21">INFORMATION ABOUT A COMPILED PATTERN</a>
|
||||
<li><a name="TOC22" href="#SEC22">THE MATCH DATA BLOCK</a>
|
||||
<li><a name="TOC23" href="#SEC23">MATCHING A PATTERN: THE TRADITIONAL FUNCTION</a>
|
||||
<li><a name="TOC24" href="#SEC24">NEWLINE HANDLING WHEN MATCHING</a>
|
||||
<li><a name="TOC25" href="#SEC25">HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS</a>
|
||||
<li><a name="TOC26" href="#SEC26">OTHER INFORMATION ABOUT A MATCH</a>
|
||||
<li><a name="TOC27" href="#SEC27">ERROR RETURNS FROM <b>pcre2_match()</b></a>
|
||||
<li><a name="TOC28" href="#SEC28">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a>
|
||||
<li><a name="TOC29" href="#SEC29">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a>
|
||||
<li><a name="TOC30" href="#SEC30">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a>
|
||||
<li><a name="TOC31" href="#SEC31">CREATING A NEW STRING WITH SUBSTITUTIONS</a>
|
||||
<li><a name="TOC32" href="#SEC32">DUPLICATE SUBPATTERN NAMES</a>
|
||||
<li><a name="TOC33" href="#SEC33">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a>
|
||||
<li><a name="TOC34" href="#SEC34">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a>
|
||||
<li><a name="TOC35" href="#SEC35">SEE ALSO</a>
|
||||
<li><a name="TOC36" href="#SEC36">AUTHOR</a>
|
||||
<li><a name="TOC37" href="#SEC37">REVISION</a>
|
||||
<li><a name="TOC9" href="#SEC9">PCRE2 NATIVE API SERIALIZATION FUNCTIONS</a>
|
||||
<li><a name="TOC10" href="#SEC10">PCRE2 NATIVE API AUXILIARY FUNCTIONS</a>
|
||||
<li><a name="TOC11" href="#SEC11">PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a>
|
||||
<li><a name="TOC12" href="#SEC12">PCRE2 API OVERVIEW</a>
|
||||
<li><a name="TOC13" href="#SEC13">STRING LENGTHS AND OFFSETS</a>
|
||||
<li><a name="TOC14" href="#SEC14">NEWLINES</a>
|
||||
<li><a name="TOC15" href="#SEC15">MULTITHREADING</a>
|
||||
<li><a name="TOC16" href="#SEC16">PCRE2 CONTEXTS</a>
|
||||
<li><a name="TOC17" href="#SEC17">CHECKING BUILD-TIME OPTIONS</a>
|
||||
<li><a name="TOC18" href="#SEC18">COMPILING A PATTERN</a>
|
||||
<li><a name="TOC19" href="#SEC19">COMPILATION ERROR CODES</a>
|
||||
<li><a name="TOC20" href="#SEC20">JUST-IN-TIME (JIT) COMPILATION</a>
|
||||
<li><a name="TOC21" href="#SEC21">LOCALE SUPPORT</a>
|
||||
<li><a name="TOC22" href="#SEC22">INFORMATION ABOUT A COMPILED PATTERN</a>
|
||||
<li><a name="TOC23" href="#SEC23">SERIALIZATION AND PRECOMPILING</a>
|
||||
<li><a name="TOC24" href="#SEC24">THE MATCH DATA BLOCK</a>
|
||||
<li><a name="TOC25" href="#SEC25">MATCHING A PATTERN: THE TRADITIONAL FUNCTION</a>
|
||||
<li><a name="TOC26" href="#SEC26">NEWLINE HANDLING WHEN MATCHING</a>
|
||||
<li><a name="TOC27" href="#SEC27">HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS</a>
|
||||
<li><a name="TOC28" href="#SEC28">OTHER INFORMATION ABOUT A MATCH</a>
|
||||
<li><a name="TOC29" href="#SEC29">ERROR RETURNS FROM <b>pcre2_match()</b></a>
|
||||
<li><a name="TOC30" href="#SEC30">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a>
|
||||
<li><a name="TOC31" href="#SEC31">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a>
|
||||
<li><a name="TOC32" href="#SEC32">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a>
|
||||
<li><a name="TOC33" href="#SEC33">CREATING A NEW STRING WITH SUBSTITUTIONS</a>
|
||||
<li><a name="TOC34" href="#SEC34">DUPLICATE SUBPATTERN NAMES</a>
|
||||
<li><a name="TOC35" href="#SEC35">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a>
|
||||
<li><a name="TOC36" href="#SEC36">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a>
|
||||
<li><a name="TOC37" href="#SEC37">SEE ALSO</a>
|
||||
<li><a name="TOC38" href="#SEC38">AUTHOR</a>
|
||||
<li><a name="TOC39" href="#SEC39">REVISION</a>
|
||||
</ul>
|
||||
<P>
|
||||
<b>#include <pcre2.h></b>
|
||||
|
@ -260,7 +262,24 @@ document for an overview of all the PCRE2 documentation.
|
|||
<br>
|
||||
<b>void pcre2_jit_stack_free(pcre2_jit_stack *<i>jit_stack</i>);</b>
|
||||
</P>
|
||||
<br><a name="SEC9" href="#TOC1">PCRE2 NATIVE API AUXILIARY FUNCTIONS</a><br>
|
||||
<br><a name="SEC9" href="#TOC1">PCRE2 NATIVE API SERIALIZATION FUNCTIONS</a><br>
|
||||
<P>
|
||||
<b>int32_t pcre2_serialize_decode(pcre2_code **<i>codes</i>,</b>
|
||||
<b> int32_t <i>number_of_codes</i>, const uint8_t *<i>bytes</i>,</b>
|
||||
<b> pcre2_general_context *<i>gcontext</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int32_t pcre2_serialize_encode(pcre2_code **<i>codes</i>,</b>
|
||||
<b> int32_t <i>number_of_codes</i>, uint8_t **<i>serialized_bytes</i>,</b>
|
||||
<b> PCRE2_SIZE *<i>serialized_size</i>, pcre2_general_context *<i>gcontext</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>void pcre2_serialize_free(uint8_t *<i>bytes</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int32_t pcre2_serialize_get_number_of_codes(const uint8_t *<i>bytes</i>);</b>
|
||||
</P>
|
||||
<br><a name="SEC10" href="#TOC1">PCRE2 NATIVE API AUXILIARY FUNCTIONS</a><br>
|
||||
<P>
|
||||
<b>int pcre2_get_error_message(int <i>errorcode</i>, PCRE2_UCHAR *<i>buffer</i>,</b>
|
||||
<b> PCRE2_SIZE <i>bufflen</i>);</b>
|
||||
|
@ -274,7 +293,7 @@ document for an overview of all the PCRE2 documentation.
|
|||
<br>
|
||||
<b>int pcre2_config(uint32_t <i>what</i>, void *<i>where</i>);</b>
|
||||
</P>
|
||||
<br><a name="SEC10" href="#TOC1">PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a><br>
|
||||
<br><a name="SEC11" href="#TOC1">PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a><br>
|
||||
<P>
|
||||
There are three PCRE2 libraries, supporting 8-bit, 16-bit, and 32-bit code
|
||||
units, respectively. However, there is just one header file, <b>pcre2.h</b>.
|
||||
|
@ -335,7 +354,7 @@ In the function summaries above, and in the rest of this document and other
|
|||
PCRE2 documents, functions and data types are described using their generic
|
||||
names, without the 8, 16, or 32 suffix.
|
||||
</P>
|
||||
<br><a name="SEC11" href="#TOC1">PCRE2 API OVERVIEW</a><br>
|
||||
<br><a name="SEC12" href="#TOC1">PCRE2 API OVERVIEW</a><br>
|
||||
<P>
|
||||
PCRE2 has its own native API, which is described in this document. There are
|
||||
also some wrapper functions for the 8-bit library that correspond to the
|
||||
|
@ -426,7 +445,7 @@ Finally, there are functions for finding out information about a compiled
|
|||
pattern (<b>pcre2_pattern_info()</b>) and about the configuration with which
|
||||
PCRE2 was built (<b>pcre2_config()</b>).
|
||||
</P>
|
||||
<br><a name="SEC12" href="#TOC1">STRING LENGTHS AND OFFSETS</a><br>
|
||||
<br><a name="SEC13" href="#TOC1">STRING LENGTHS AND OFFSETS</a><br>
|
||||
<P>
|
||||
The PCRE2 API uses string lengths and offsets into strings of code units in
|
||||
several places. These values are always of type PCRE2_SIZE, which is an
|
||||
|
@ -436,7 +455,7 @@ as a special indicator for zero-terminated strings and unset offsets.
|
|||
Therefore, the longest string that can be handled is one less than this
|
||||
maximum.
|
||||
<a name="newlines"></a></P>
|
||||
<br><a name="SEC13" href="#TOC1">NEWLINES</a><br>
|
||||
<br><a name="SEC14" href="#TOC1">NEWLINES</a><br>
|
||||
<P>
|
||||
PCRE2 supports five different conventions for indicating line breaks in
|
||||
strings: a single CR (carriage return) character, a single LF (linefeed)
|
||||
|
@ -471,7 +490,7 @@ The choice of newline convention does not affect the interpretation of
|
|||
the \n or \r escape sequences, nor does it affect what \R matches; this has
|
||||
its own separate convention.
|
||||
</P>
|
||||
<br><a name="SEC14" href="#TOC1">MULTITHREADING</a><br>
|
||||
<br><a name="SEC15" href="#TOC1">MULTITHREADING</a><br>
|
||||
<P>
|
||||
In a multithreaded application it is important to keep thread-specific data
|
||||
separate from data that can be shared between threads. The PCRE2 library code
|
||||
|
@ -516,7 +535,7 @@ storing the results of a match. This includes details of what was matched, as
|
|||
well as additional information such as the name of a (*MARK) setting. Each
|
||||
thread must provide its own version of this memory.
|
||||
</P>
|
||||
<br><a name="SEC15" href="#TOC1">PCRE2 CONTEXTS</a><br>
|
||||
<br><a name="SEC16" href="#TOC1">PCRE2 CONTEXTS</a><br>
|
||||
<P>
|
||||
Some PCRE2 functions have a lot of parameters, many of which are used only by
|
||||
specialist applications, for example, those that use custom memory management
|
||||
|
@ -797,7 +816,7 @@ exit so that they can be re-used when possible during the match. In the absence
|
|||
of these functions, the normal custom memory management functions are used, if
|
||||
supplied, otherwise the system functions.
|
||||
</P>
|
||||
<br><a name="SEC16" href="#TOC1">CHECKING BUILD-TIME OPTIONS</a><br>
|
||||
<br><a name="SEC17" href="#TOC1">CHECKING BUILD-TIME OPTIONS</a><br>
|
||||
<P>
|
||||
<b>int pcre2_config(uint32_t <i>what</i>, void *<i>where</i>);</b>
|
||||
</P>
|
||||
|
@ -929,7 +948,7 @@ the PCRE2 version string, zero-terminated. The number of code units used is
|
|||
returned. This is the length of the string plus one unit for the terminating
|
||||
zero.
|
||||
<a name="compiling"></a></P>
|
||||
<br><a name="SEC17" href="#TOC1">COMPILING A PATTERN</a><br>
|
||||
<br><a name="SEC18" href="#TOC1">COMPILING A PATTERN</a><br>
|
||||
<P>
|
||||
<b>pcre2_code *pcre2_compile(PCRE2_SPTR <i>pattern</i>, PCRE2_SIZE <i>length</i>,</b>
|
||||
<b> uint32_t <i>options</i>, int *<i>errorcode</i>, PCRE2_SIZE *<i>erroroffset,</i></b>
|
||||
|
@ -1305,7 +1324,7 @@ the behaviour of PCRE2 are given in the
|
|||
<a href="pcre2unicode.html"><b>pcre2unicode</b></a>
|
||||
page.
|
||||
</P>
|
||||
<br><a name="SEC18" href="#TOC1">COMPILATION ERROR CODES</a><br>
|
||||
<br><a name="SEC19" href="#TOC1">COMPILATION ERROR CODES</a><br>
|
||||
<P>
|
||||
There are over 80 positive error codes that <b>pcre2_compile()</b> may return if
|
||||
it finds an error in the pattern. There are also some negative error codes that
|
||||
|
@ -1315,7 +1334,7 @@ are used for invalid UTF strings. These are the same as given by
|
|||
page. The <b>pcre2_get_error_message()</b> function can be called to obtain a
|
||||
textual error message from any error code.
|
||||
</P>
|
||||
<br><a name="SEC19" href="#TOC1">JUST-IN-TIME (JIT) COMPILATION</a><br>
|
||||
<br><a name="SEC20" href="#TOC1">JUST-IN-TIME (JIT) COMPILATION</a><br>
|
||||
<P>
|
||||
<b>int pcre2_jit_compile(pcre2_code *<i>code</i>, uint32_t <i>options</i>);</b>
|
||||
<br>
|
||||
|
@ -1353,7 +1372,7 @@ patterns to be analyzed, and for one-off matches and simple patterns the
|
|||
benefit of faster execution might be offset by a much slower compilation time.
|
||||
Most, but not all patterns can be optimized by the JIT compiler.
|
||||
<a name="localesupport"></a></P>
|
||||
<br><a name="SEC20" href="#TOC1">LOCALE SUPPORT</a><br>
|
||||
<br><a name="SEC21" href="#TOC1">LOCALE SUPPORT</a><br>
|
||||
<P>
|
||||
PCRE2 handles caseless matching, and determines whether characters are letters,
|
||||
digits, or whatever, by reference to a set of tables, indexed by character code
|
||||
|
@ -1409,7 +1428,7 @@ is saved with the compiled pattern, and the same tables are used by
|
|||
compilation, and matching all happen in the same locale, but different patterns
|
||||
can be processed in different locales.
|
||||
<a name="infoaboutpattern"></a></P>
|
||||
<br><a name="SEC21" href="#TOC1">INFORMATION ABOUT A COMPILED PATTERN</a><br>
|
||||
<br><a name="SEC22" href="#TOC1">INFORMATION ABOUT A COMPILED PATTERN</a><br>
|
||||
<P>
|
||||
<b>int pcre2_pattern_info(const pcre2_code *<i>code</i>, uint32_t <i>what</i>, void *<i>where</i>);</b>
|
||||
</P>
|
||||
|
@ -1478,8 +1497,12 @@ options returned for PCRE2_INFO_ALLOPTIONS.
|
|||
PCRE2_INFO_BACKREFMAX
|
||||
</pre>
|
||||
Return the number of the highest back reference in the pattern. The third
|
||||
argument should point to an <b>uint32_t</b> variable. Zero is returned if there
|
||||
are no back references.
|
||||
argument should point to an <b>uint32_t</b> variable. Named subpatterns acquire
|
||||
numbers as well as names, and these count towards the highest back reference.
|
||||
Back references such as \4 or \g{12} match the captured characters of the
|
||||
given group, but in addition, the check that a capturing group is set in a
|
||||
conditional subpattern such as (?(3)a|b) is also a back reference. Zero is
|
||||
returned if there are no back references.
|
||||
<pre>
|
||||
PCRE2_INFO_BSR
|
||||
</pre>
|
||||
|
@ -1689,14 +1712,24 @@ set, the call to <b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET
|
|||
PCRE2_INFO_SIZE
|
||||
</pre>
|
||||
Return the size of the compiled pattern in bytes (for all three libraries). The
|
||||
third argument should point to a <b>size_t</b> variable. This value does not
|
||||
include the size of the <b>pcre2_code</b> structure that is returned by
|
||||
<b>pcre_compile()</b>. The value that is used when <b>pcre2_compile()</b> is
|
||||
getting memory in which to place the compiled data is the value returned by
|
||||
this option plus the size of the <b>pcre2_code</b> structure. Processing a
|
||||
pattern with the JIT compiler does not alter the value returned by this option.
|
||||
third argument should point to a <b>size_t</b> variable. This value includes the
|
||||
size of the general data block that precedes the code units of the compiled
|
||||
pattern itself. The value that is used when <b>pcre2_compile()</b> is getting
|
||||
memory in which to place the compiled pattern may be slightly larger than the
|
||||
value returned by this option, because there are cases where the code that
|
||||
calculates the size has to over-estimate. Processing a pattern with the JIT
|
||||
compiler does not alter the value returned by this option.
|
||||
</P>
|
||||
<br><a name="SEC23" href="#TOC1">SERIALIZATION AND PRECOMPILING</a><br>
|
||||
<P>
|
||||
It is possible to save compiled patterns on disc or elsewhere, and reload them
|
||||
later, subject to a number of restrictions. The functions whose names begin
|
||||
with <b>pcre2_serialize_</b> are used for this purpose. They are described in
|
||||
the
|
||||
<a href="pcre2serialize.html"><b>pcre2serialize</b></a>
|
||||
documentation.
|
||||
<a name="matchdatablock"></a></P>
|
||||
<br><a name="SEC22" href="#TOC1">THE MATCH DATA BLOCK</a><br>
|
||||
<br><a name="SEC24" href="#TOC1">THE MATCH DATA BLOCK</a><br>
|
||||
<P>
|
||||
<b>pcre2_match_data_create(uint32_t <i>ovecsize</i>,</b>
|
||||
<b> pcre2_general_context *<i>gcontext</i>);</b>
|
||||
|
@ -1767,7 +1800,7 @@ match data block (for that match) have taken place.
|
|||
When a match data block itself is no longer needed, it should be freed by
|
||||
calling <b>pcre2_match_data_free()</b>.
|
||||
</P>
|
||||
<br><a name="SEC23" href="#TOC1">MATCHING A PATTERN: THE TRADITIONAL FUNCTION</a><br>
|
||||
<br><a name="SEC25" href="#TOC1">MATCHING A PATTERN: THE TRADITIONAL FUNCTION</a><br>
|
||||
<P>
|
||||
<b>int pcre2_match(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
|
||||
<b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
|
||||
|
@ -1981,7 +2014,7 @@ examples, in the
|
|||
<a href="pcre2partial.html"><b>pcre2partial</b></a>
|
||||
documentation.
|
||||
</P>
|
||||
<br><a name="SEC24" href="#TOC1">NEWLINE HANDLING WHEN MATCHING</a><br>
|
||||
<br><a name="SEC26" href="#TOC1">NEWLINE HANDLING WHEN MATCHING</a><br>
|
||||
<P>
|
||||
When PCRE2 is built, a default newline convention is set; this is usually the
|
||||
standard convention for the operating system. The default can be overridden in
|
||||
|
@ -2016,7 +2049,7 @@ LF in the characters that it matches.
|
|||
Notwithstanding the above, anomalous effects may still occur when CRLF is a
|
||||
valid newline sequence and explicit \r or \n escapes appear in the pattern.
|
||||
<a name="matchedstrings"></a></P>
|
||||
<br><a name="SEC25" href="#TOC1">HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS</a><br>
|
||||
<br><a name="SEC27" href="#TOC1">HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS</a><br>
|
||||
<P>
|
||||
<b>uint32_t pcre2_get_ovector_count(pcre2_match_data *<i>match_data</i>);</b>
|
||||
<br>
|
||||
|
@ -2118,7 +2151,7 @@ parentheses, no more than <i>ovector[0]</i> to <i>ovector[2n+1]</i> are set by
|
|||
<b>pcre2_match()</b>. The other elements retain whatever values they previously
|
||||
had.
|
||||
<a name="matchotherdata"></a></P>
|
||||
<br><a name="SEC26" href="#TOC1">OTHER INFORMATION ABOUT A MATCH</a><br>
|
||||
<br><a name="SEC28" href="#TOC1">OTHER INFORMATION ABOUT A MATCH</a><br>
|
||||
<P>
|
||||
<b>PCRE2_SPTR pcre2_get_mark(pcre2_match_data *<i>match_data</i>);</b>
|
||||
<br>
|
||||
|
@ -2162,7 +2195,7 @@ the code unit offset of the invalid UTF character. Details are given in the
|
|||
<a href="pcre2unicode.html"><b>pcre2unicode</b></a>
|
||||
page.
|
||||
<a name="errorlist"></a></P>
|
||||
<br><a name="SEC27" href="#TOC1">ERROR RETURNS FROM <b>pcre2_match()</b></a><br>
|
||||
<br><a name="SEC29" href="#TOC1">ERROR RETURNS FROM <b>pcre2_match()</b></a><br>
|
||||
<P>
|
||||
If <b>pcre2_match()</b> fails, it returns a negative number. This can be
|
||||
converted to a text string by calling <b>pcre2_get_error_message()</b>. Negative
|
||||
|
@ -2271,7 +2304,7 @@ is attempted.
|
|||
</pre>
|
||||
The internal recursion limit was reached.
|
||||
<a name="extractbynumber"></a></P>
|
||||
<br><a name="SEC28" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a><br>
|
||||
<br><a name="SEC30" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a><br>
|
||||
<P>
|
||||
<b>int pcre2_substring_length_bynumber(pcre2_match_data *<i>match_data</i>,</b>
|
||||
<b> uint32_t <i>number</i>, PCRE2_SIZE *<i>length</i>);</b>
|
||||
|
@ -2368,7 +2401,7 @@ The substring did not participate in the match. For example, if the pattern is
|
|||
(abc)|(def) and the subject is "def", and the ovector contains at least two
|
||||
capturing slots, substring number 1 is unset.
|
||||
</P>
|
||||
<br><a name="SEC29" href="#TOC1">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a><br>
|
||||
<br><a name="SEC31" href="#TOC1">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a><br>
|
||||
<P>
|
||||
<b>int pcre2_substring_list_get(pcre2_match_data *<i>match_data</i>,</b>
|
||||
<b> PCRE2_UCHAR ***<i>listptr</i>, PCRE2_SIZE **<i>lengthsptr</i>);</b>
|
||||
|
@ -2407,7 +2440,7 @@ can be distinguished from a genuine zero-length substring by inspecting the
|
|||
appropriate offset in the ovector, which contain PCRE2_UNSET for unset
|
||||
substrings, or by calling <b>pcre2_substring_length_bynumber()</b>.
|
||||
<a name="extractbyname"></a></P>
|
||||
<br><a name="SEC30" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a><br>
|
||||
<br><a name="SEC32" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a><br>
|
||||
<P>
|
||||
<b>int pcre2_substring_number_from_name(const pcre2_code *<i>code</i>,</b>
|
||||
<b> PCRE2_SPTR <i>name</i>);</b>
|
||||
|
@ -2467,7 +2500,7 @@ names are not included in the compiled code. The matching process uses only
|
|||
numbers. For this reason, the use of different names for subpatterns of the
|
||||
same number causes an error at compile time.
|
||||
</P>
|
||||
<br><a name="SEC31" href="#TOC1">CREATING A NEW STRING WITH SUBSTITUTIONS</a><br>
|
||||
<br><a name="SEC33" href="#TOC1">CREATING A NEW STRING WITH SUBSTITUTIONS</a><br>
|
||||
<P>
|
||||
<b>int pcre2_substitute(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
|
||||
<b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
|
||||
|
@ -2528,7 +2561,7 @@ straight back. PCRE2_ERROR_BADREPLACEMENT is returned for an invalid
|
|||
replacement string (unrecognized sequence following a dollar sign), and
|
||||
PCRE2_ERROR_NOMEMORY is returned if the output buffer is not big enough.
|
||||
</P>
|
||||
<br><a name="SEC32" href="#TOC1">DUPLICATE SUBPATTERN NAMES</a><br>
|
||||
<br><a name="SEC34" href="#TOC1">DUPLICATE SUBPATTERN NAMES</a><br>
|
||||
<P>
|
||||
<b>int pcre2_substring_nametable_scan(const pcre2_code *<i>code</i>,</b>
|
||||
<b> PCRE2_SPTR <i>name</i>, PCRE2_SPTR *<i>first</i>, PCRE2_SPTR *<i>last</i>);</b>
|
||||
|
@ -2573,7 +2606,7 @@ The format of the name table is described above in the section entitled
|
|||
Given all the relevant entries for the name, you can extract each of their
|
||||
numbers, and hence the captured data.
|
||||
</P>
|
||||
<br><a name="SEC33" href="#TOC1">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a><br>
|
||||
<br><a name="SEC35" href="#TOC1">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a><br>
|
||||
<P>
|
||||
The traditional matching function uses a similar algorithm to Perl, which stops
|
||||
when it finds the first match at a given point in the subject. If you want to
|
||||
|
@ -2591,7 +2624,7 @@ substring. Then return 1, which forces <b>pcre2_match()</b> to backtrack and try
|
|||
other alternatives. Ultimately, when it runs out of matches,
|
||||
<b>pcre2_match()</b> will yield PCRE2_ERROR_NOMATCH.
|
||||
<a name="dfamatch"></a></P>
|
||||
<br><a name="SEC34" href="#TOC1">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a><br>
|
||||
<br><a name="SEC36" href="#TOC1">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a><br>
|
||||
<P>
|
||||
<b>int pcre2_dfa_match(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
|
||||
<b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
|
||||
|
@ -2786,13 +2819,13 @@ some plausibility checks are made on the contents of the workspace, which
|
|||
should contain data about the previous partial match. If any of these checks
|
||||
fail, this error is given.
|
||||
</P>
|
||||
<br><a name="SEC35" href="#TOC1">SEE ALSO</a><br>
|
||||
<br><a name="SEC37" href="#TOC1">SEE ALSO</a><br>
|
||||
<P>
|
||||
<b>pcre2build</b>(3), <b>pcre2callout</b>(3), <b>pcre2demo</b>(3),
|
||||
<b>pcre2matching</b>(3), <b>pcre2partial</b>(3), <b>pcre2posix</b>(3),
|
||||
<b>pcre2sample</b>(3), <b>pcre2stack</b>(3), <b>pcre2unicode</b>(3).
|
||||
</P>
|
||||
<br><a name="SEC36" href="#TOC1">AUTHOR</a><br>
|
||||
<br><a name="SEC38" href="#TOC1">AUTHOR</a><br>
|
||||
<P>
|
||||
Philip Hazel
|
||||
<br>
|
||||
|
@ -2801,9 +2834,9 @@ University Computing Service
|
|||
Cambridge, England.
|
||||
<br>
|
||||
</P>
|
||||
<br><a name="SEC37" href="#TOC1">REVISION</a><br>
|
||||
<br><a name="SEC39" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 02 January 2015
|
||||
Last updated: 23 January 2015
|
||||
<br>
|
||||
Copyright © 1997-2015 University of Cambridge.
|
||||
<br>
|
||||
|
|
|
@ -0,0 +1,184 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>pcre2serialize specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre2serialize man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE2 HTML documentation. It was generated
|
||||
automatically from the original man page. If there is any nonsense in it,
|
||||
please consult the man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<ul>
|
||||
<li><a name="TOC1" href="#SEC1">SAVING AND RE-USING PRECOMPILED PCRE2 PATTERNS</a>
|
||||
<li><a name="TOC2" href="#SEC2">SAVING COMPILED PATTERNS</a>
|
||||
<li><a name="TOC3" href="#SEC3">RE-USING PRECOMPILED PATTERNS</a>
|
||||
<li><a name="TOC4" href="#SEC4">AUTHOR</a>
|
||||
<li><a name="TOC5" href="#SEC5">REVISION</a>
|
||||
</ul>
|
||||
<br><a name="SEC1" href="#TOC1">SAVING AND RE-USING PRECOMPILED PCRE2 PATTERNS</a><br>
|
||||
<P>
|
||||
<b>int32_t pcre2_serialize_decode(pcre2_code **<i>codes</i>,</b>
|
||||
<b> int32_t <i>number_of_codes</i>, const uint8_t *<i>bytes</i>,</b>
|
||||
<b> pcre2_general_context *<i>gcontext</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int32_t pcre2_serialize_encode(pcre2_code **<i>codes</i>,</b>
|
||||
<b> int32_t <i>number_of_codes</i>, uint8_t **<i>serialized_bytes</i>,</b>
|
||||
<b> PCRE2_SIZE *<i>serialized_size</i>, pcre2_general_context *<i>gcontext</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>void pcre2_serialize_free(uint8_t *<i>bytes</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>int32_t pcre2_serialize_get_number_of_codes(const uint8_t *<i>bytes</i>);</b>
|
||||
<br>
|
||||
<br>
|
||||
If you are running an application that uses a large number of regular
|
||||
expression patterns, it may be useful to store them in a precompiled form
|
||||
instead of having to compile them every time the application is run. However,
|
||||
if you are using the just-in-time optimization feature, it is not possible to
|
||||
save and reload the JIT data, because it is position-dependent. In addition,
|
||||
the host on which the patterns are reloaded must be running the same version of
|
||||
PCRE2, with the same code unit width, and must also have the same endianness,
|
||||
pointer width and PCRE2_SIZE type. For example, patterns compiled on a 32-bit
|
||||
system using PCRE2's 16-bit library cannot be reloaded on a 64-bit system, nor
|
||||
can they be reloaded using the 8-bit library.
|
||||
</P>
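The host properties listed above can be gathered and compared with a few lines of standard C. The following is only a sketch: the host_profile structure and both functions are hypothetical helpers, not part of the PCRE2 API, and size_t stands in for PCRE2_SIZE on the assumption that PCRE2_SIZE is defined as size_t. It does not cover the PCRE2 version or the code unit width, which must also match.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record of the host properties that must match. */
typedef struct host_profile {
  uint8_t little_endian;   /* 1 on a little-endian host, otherwise 0 */
  uint8_t pointer_bytes;   /* sizeof(void *) */
  uint8_t size_bytes;      /* sizeof(size_t), assumed to be PCRE2_SIZE */
} host_profile;

/* Fill in the profile of the host this code is running on. */
static host_profile current_host_profile(void)
{
  host_profile p;
  uint16_t probe = 1;
  uint8_t first;
  memcpy(&first, &probe, 1);          /* lowest-addressed byte of probe */
  p.little_endian = (uint8_t)(first == 1);
  p.pointer_bytes = (uint8_t)sizeof(void *);
  p.size_bytes = (uint8_t)sizeof(size_t);
  return p;
}

/* Returns 1 if data saved under "saved" may be reloaded on "now",
   as far as these three properties are concerned, otherwise 0. */
static int profiles_match(const host_profile *saved, const host_profile *now)
{
  return saved->little_endian == now->little_endian &&
         saved->pointer_bytes == now->pointer_bytes &&
         saved->size_bytes == now->size_bytes;
}
```

An application might store such a profile next to its saved byte stream and compare it with current_host_profile() before attempting to reload. In the example given above, a profile recorded on a typical 32-bit system (4-byte pointers) would not match one taken on a typical 64-bit system (8-byte pointers).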
|
||||
<br><a name="SEC2" href="#TOC1">SAVING COMPILED PATTERNS</a><br>
|
||||
<P>
|
||||
Before compiled patterns can be saved they must be serialized, that is,
|
||||
converted to a stream of bytes. A single byte stream may contain any number of
|
||||
compiled patterns, but they must all use the same character tables. A single
|
||||
copy of the tables is included in the byte stream (its size is 1088 bytes). For
|
||||
more details of character tables, see the
|
||||
<a href="pcre2api.html#localesupport">section on locale support</a>
|
||||
in the
|
||||
<a href="pcre2api.html"><b>pcre2api</b></a>
|
||||
documentation.
|
||||
</P>
|
||||
<P>
|
||||
The function <b>pcre2_serialize_encode()</b> creates a serialized byte stream
|
||||
from a list of compiled patterns. Its first two arguments specify the list,
|
||||
being a pointer to a vector of pointers to compiled patterns, and the length of
|
||||
the vector. The third and fourth arguments point to variables which are set to
|
||||
point to the created byte stream and its length, respectively. The final
|
||||
argument is a pointer to a general context, which can be used to specify custom
|
||||
memory management functions. If this argument is NULL, <b>malloc()</b> is used
|
||||
to obtain memory for the byte stream. The yield of the function is the number
|
||||
of serialized patterns, or one of the following negative error codes:
|
||||
<pre>
|
||||
PCRE2_ERROR_BADDATA the number of patterns is zero or less
|
||||
PCRE2_ERROR_BADMAGIC mismatch of id bytes in one of the patterns
|
||||
PCRE2_ERROR_MEMORY memory allocation failed
|
||||
PCRE2_ERROR_MIXEDTABLES the patterns do not all use the same tables
|
||||
PCRE2_ERROR_NULL the 1st, 3rd, or 4th argument is NULL
|
||||
</pre>
|
||||
PCRE2_ERROR_BADMAGIC means either that a pattern's code has been corrupted, or
|
||||
that a slot in the vector does not point to a compiled pattern.
|
||||
</P>
|
||||
<P>
|
||||
Once a set of patterns has been serialized you can save the data in any
|
||||
appropriate manner. Here is sample code that compiles two patterns and writes
|
||||
them to a file. It assumes that the variable <i>fd</i> refers to a file that is
|
||||
open for output. The error checking that should be present in a real
|
||||
application has been omitted for simplicity.
|
||||
<pre>
|
||||
int errorcode;
|
||||
uint8_t *bytes;
|
||||
PCRE2_SIZE erroroffset;
|
||||
PCRE2_SIZE bytescount;
|
||||
pcre2_code *list_of_codes[2];
|
||||
list_of_codes[0] = pcre2_compile("first pattern",
|
||||
PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);
|
||||
list_of_codes[1] = pcre2_compile("second pattern",
|
||||
PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);
|
||||
errorcode = pcre2_serialize_encode(list_of_codes, 2, &bytes,
|
||||
&bytescount, NULL);
|
||||
errorcode = fwrite(bytes, 1, bytescount, fd);
|
||||
</pre>
|
||||
Note that the serialized data is binary data that may contain any of the 256
|
||||
possible byte values. On systems that make a distinction between binary and
|
||||
non-binary data, be sure that the file is opened for binary output.
|
||||
</P>
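The point about binary output can be made concrete with a pair of helper functions. This is a sketch under stated assumptions: save_bytes() and load_bytes() are hypothetical names that do not exist in PCRE2, and they use only standard C file functions. The "b" in the mode strings is what requests binary treatment on systems that distinguish it, and the result of fwrite() is compared with the requested count instead of being ignored.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Write a block of bytes to a file opened for binary output.
   Returns 0 on success and -1 on failure. */
int save_bytes(const char *path, const uint8_t *bytes, size_t count) {
  FILE *f;
  size_t written;

  assert(path != NULL && bytes != NULL);
  f = fopen(path, "wb");  /* "b" matters where text and binary files differ */
  if (f == NULL) return -1;
  written = fwrite(bytes, 1, count, f);
  if (fclose(f) != 0 || written != count) return -1;
  return 0;
}

/* Read a whole file into a block obtained from malloc(). The caller frees
   the block. Returns NULL on failure or if the file is empty. */
uint8_t *load_bytes(const char *path, size_t *count) {
  FILE *f;
  uint8_t *block = NULL;
  long length;

  assert(path != NULL && count != NULL);
  f = fopen(path, "rb");
  if (f == NULL) return NULL;
  if (fseek(f, 0, SEEK_END) == 0 && (length = ftell(f)) > 0 &&
      fseek(f, 0, SEEK_SET) == 0 &&
      (block = malloc((size_t)length)) != NULL) {
    if (fread(block, 1, (size_t)length, f) != (size_t)length) {
      free(block);
      block = NULL;
    } else {
      *count = (size_t)length;
    }
  }
  fclose(f);
  return block;
}
```

The block returned by load_bytes() is ordinary application memory, so it is released with free() and not with a PCRE2 function.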
|
||||
<P>
|
||||
Serializing a set of patterns leaves the original data untouched, so they can
|
||||
still be used for matching. Their memory must eventually be freed in the usual
|
||||
way by calling <b>pcre2_code_free()</b>. When you have finished with the byte
|
||||
stream, it too must be freed by calling <b>pcre2_serialize_free()</b>.
|
||||
</P>
|
||||
<br><a name="SEC3" href="#TOC1">RE-USING PRECOMPILED PATTERNS</a><br>
|
||||
<P>
|
||||
In order to re-use a set of saved patterns you must first make the serialized
|
||||
byte stream available in main memory (for example, by reading from a file). The
|
||||
management of this memory block is up to the application. You can use the
|
||||
<b>pcre2_serialize_get_number_of_codes()</b> function to find out how many
|
||||
compiled patterns are in the serialized data without actually decoding the
|
||||
patterns:
|
||||
<pre>
|
||||
uint8_t *bytes = <serialized data>;
|
||||
int32_t number_of_codes = pcre2_serialize_get_number_of_codes(bytes);
|
||||
</pre>
|
||||
The <b>pcre2_serialize_decode()</b> function reads a byte stream and recreates
|
||||
the compiled patterns in new memory blocks, setting pointers to them in a
|
||||
vector. The first two arguments are a pointer to a suitable vector and its
|
||||
length, and the third argument points to a byte stream. The final argument is a
|
||||
pointer to a general context, which can be used to specify custom memory
|
||||
management functions for the decoded patterns. If this argument is NULL,
|
||||
<b>malloc()</b> and <b>free()</b> are used. After deserialization, the byte
|
||||
stream is no longer needed and can be discarded.
|
||||
<pre>
|
||||
int32_t number_of_codes;
|
||||
pcre2_code *list_of_codes[2];
|
||||
uint8_t *bytes = <serialized data>;
|
||||
number_of_codes =
|
||||
pcre2_serialize_decode(list_of_codes, 2, bytes, NULL);
|
||||
</pre>
|
||||
If the vector is not large enough for all the patterns in the byte stream, it
|
||||
is filled with those that fit, and the remainder are ignored. The yield of the
|
||||
function is the number of decoded patterns, or one of the following negative
|
||||
error codes:
|
||||
<pre>
|
||||
PCRE2_ERROR_BADDATA second argument is zero or less
|
||||
PCRE2_ERROR_BADMAGIC mismatch of id bytes in the data
|
||||
PCRE2_ERROR_BADMODE mismatch of variable unit size or PCRE2 version
|
||||
PCRE2_ERROR_MEMORY memory allocation failed
|
||||
PCRE2_ERROR_NULL first or third argument is NULL
|
||||
</pre>
|
||||
PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that it was compiled
|
||||
on a system with different endianness.
|
||||
</P>
|
||||
<P>
|
||||
Decoded patterns can be used for matching in the usual way, and must be freed
|
||||
by calling <b>pcre2_code_free()</b> as normal. A single copy of the character
|
||||
tables is used by all the decoded patterns. A reference count is used to
|
||||
arrange for its memory to be automatically freed when the last pattern is
|
||||
freed.
|
||||
</P>
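The reference-counting arrangement can be pictured with a toy model. This is not PCRE2's internal code: shared_tables, toy_pattern, tables_create(), pattern_create(), pattern_free() and the live_tables counter are all hypothetical names invented for the illustration. The only detail taken from the text is the table size of 1088 bytes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TABLES_LENGTH 1088  /* size of the character tables quoted in the text */

/* Toy stand-in for one shared copy of the character tables. */
typedef struct shared_tables {
  unsigned refcount;                  /* number of patterns using this copy */
  unsigned char data[TABLES_LENGTH];
} shared_tables;

/* Toy stand-in for a decoded pattern that points at the shared tables. */
typedef struct toy_pattern {
  shared_tables *tables;
} toy_pattern;

int live_tables = 0;  /* demonstration only: counts table blocks not yet freed */

shared_tables *tables_create(void) {
  shared_tables *t = malloc(sizeof *t);
  if (t == NULL) return NULL;
  t->refcount = 0;
  memset(t->data, 0, sizeof t->data);
  live_tables++;
  return t;
}

/* Each new pattern takes one reference to the tables. */
toy_pattern *pattern_create(shared_tables *t) {
  toy_pattern *p;
  assert(t != NULL);
  p = malloc(sizeof *p);
  if (p == NULL) return NULL;
  p->tables = t;
  t->refcount++;
  return p;
}

/* Freeing a pattern drops its reference; the last one frees the tables. */
void pattern_free(toy_pattern *p) {
  if (p == NULL) return;
  if (--p->tables->refcount == 0) {
    free(p->tables);
    live_tables--;
  }
  free(p);
}
```

In this model the application never frees the tables itself, which mirrors the statement that their memory is released automatically when the last pattern is freed.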
|
||||
<P>
|
||||
If a pattern was processed by <b>pcre2_jit_compile()</b> before being
|
||||
serialized, the JIT data is discarded and so is no longer available after a
|
||||
save/restore cycle. You can, however, process a restored pattern with
|
||||
<b>pcre2_jit_compile()</b> if you wish.
|
||||
</P>
|
||||
<br><a name="SEC4" href="#TOC1">AUTHOR</a><br>
|
||||
<P>
|
||||
Philip Hazel
|
||||
<br>
|
||||
University Computing Service
|
||||
<br>
|
||||
Cambridge, England.
|
||||
<br>
|
||||
</P>
|
||||
<br><a name="SEC5" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 20 January 2015
|
||||
<br>
|
||||
Copyright © 1997-2015 University of Cambridge.
|
||||
<br>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||
</p>
|
|
@ -30,9 +30,10 @@ please consult the man page, in case the conversion went wrong.
|
|||
<li><a name="TOC15" href="#SEC15">RESTARTING AFTER A PARTIAL MATCH</a>
|
||||
<li><a name="TOC16" href="#SEC16">CALLOUTS</a>
|
||||
<li><a name="TOC17" href="#SEC17">NON-PRINTING CHARACTERS</a>
|
||||
<li><a name="TOC18" href="#SEC18">SEE ALSO</a>
|
||||
<li><a name="TOC19" href="#SEC19">AUTHOR</a>
|
||||
<li><a name="TOC20" href="#SEC20">REVISION</a>
|
||||
<li><a name="TOC18" href="#SEC18">SAVING AND RESTORING COMPILED PATTERNS</a>
|
||||
<li><a name="TOC19" href="#SEC19">SEE ALSO</a>
|
||||
<li><a name="TOC20" href="#SEC20">AUTHOR</a>
|
||||
<li><a name="TOC21" href="#SEC21">REVISION</a>
|
||||
</ul>
|
||||
<br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
|
||||
<P>
|
||||
|
@ -51,10 +52,11 @@ documentation.
|
|||
</P>
|
||||
<P>
|
||||
The input for <b>pcre2test</b> is a sequence of regular expression patterns and
|
||||
subject strings to be matched. The output shows the result of each match
|
||||
attempt. Modifiers on the command line, the patterns, and the subject lines
|
||||
specify PCRE2 function options, control how the subject is processed, and what
|
||||
output is produced.
|
||||
subject strings to be matched. There are also command lines for setting
|
||||
defaults and controlling some special actions. The output shows the result of
|
||||
each match attempt. Modifiers on external or internal command lines, the
|
||||
patterns, and the subject lines specify PCRE2 function options, control how the
|
||||
subject is processed, and what output is produced.
|
||||
</P>
|
||||
<P>
|
||||
As the original fairly simple PCRE library evolved, it acquired many different
|
||||
|
@ -227,9 +229,7 @@ If <b>pcre2test</b> is given two filename arguments, it reads from the first and
|
|||
writes to the second. If the first name is "-", input is taken from the
|
||||
standard input. If <b>pcre2test</b> is given only one argument, it reads from
|
||||
that file and writes to stdout. Otherwise, it reads from stdin and writes to
|
||||
stdout. When the input is a terminal, it prompts for each line of input, using
|
||||
"re>" to prompt for regular expression patterns, and "data>" to prompt for
|
||||
subject lines.
|
||||
stdout.
|
||||
</P>
|
||||
<P>
|
||||
When <b>pcre2test</b> is built, a configuration option can specify that it
|
||||
|
@ -242,10 +242,16 @@ the <b>-help</b> option states whether or not <b>readline()</b> will be used.
|
|||
The program handles any number of tests, each of which consists of a set of
|
||||
input lines. Each set starts with a regular expression pattern, followed by any
|
||||
number of subject lines to be matched against that pattern. In between sets of
|
||||
test data, command lines that begin with a hash (#) character may appear. This
|
||||
file format, with some restrictions, can also be processed by the
|
||||
<b>perltest.sh</b> script that is distributed with PCRE2 as a means of checking
|
||||
that the behaviour of PCRE2 and Perl is the same.
|
||||
test data, command lines that begin with # may appear. This file format, with
|
||||
some restrictions, can also be processed by the <b>perltest.sh</b> script that
|
||||
is distributed with PCRE2 as a means of checking that the behaviour of PCRE2
|
||||
and Perl is the same.
|
||||
</P>
|
||||
<P>
|
||||
When the input is a terminal, <b>pcre2test</b> prompts for each line of input,
|
||||
using "re>" to prompt for regular expression patterns, and "data>" to prompt
|
||||
for subject lines. Command lines starting with # can be entered only in
|
||||
response to the "re>" prompt.
|
||||
</P>
|
||||
<P>
|
||||
Each subject line is matched separately and independently. If you want to do
|
||||
|
@ -263,21 +269,27 @@ still input to be read.
|
|||
</P>
|
||||
<br><a name="SEC6" href="#TOC1">COMMAND LINES</a><br>
|
||||
<P>
|
||||
In between sets of test data, a line that begins with a hash (#) character is
|
||||
interpreted as a command line. If the first character is followed by white
|
||||
space or an exclamation mark, the line is treated as a comment, and ignored.
|
||||
Otherwise, the following commands are recognized:
|
||||
In between sets of test data, a line that begins with # is interpreted as a
|
||||
command line. If the first character is followed by white space or an
|
||||
exclamation mark, the line is treated as a comment, and ignored. Otherwise, the
|
||||
following commands are recognized:
|
||||
<pre>
|
||||
#forbid_utf
|
||||
</pre>
|
||||
Subsequent patterns automatically have the PCRE2_NEVER_UTF and PCRE2_NEVER_UCP
|
||||
options set, which locks out the use of UTF and Unicode property features. This
|
||||
is a trigger guard that is used in test files to ensure that UTF/Unicode tests
|
||||
are not accidentally added to files that are used when UTF support is not
|
||||
included in the library. This effect can also be obtained by the use of
|
||||
<b>#pattern</b>; the difference is that <b>#forbid_utf</b> cannot be unset, and
|
||||
the automatic options are not displayed in pattern information, to avoid
|
||||
cluttering up test output.
|
||||
is a trigger guard that is used in test files to ensure that UTF or Unicode
|
||||
property tests are not accidentally added to files that are used when Unicode
|
||||
support is not included in the library. This effect can also be obtained by the
|
||||
use of <b>#pattern</b>; the difference is that <b>#forbid_utf</b> cannot be
|
||||
unset, and the automatic options are not displayed in pattern information, to
|
||||
avoid cluttering up test output.
|
||||
<pre>
|
||||
#load <filename>
|
||||
</pre>
|
||||
This command is used to load a set of precompiled patterns from a file, as
|
||||
described in the section entitled "Saving and restoring compiled patterns"
|
||||
<a href="#saverestore">below.</a>
|
||||
<pre>
|
||||
#pattern <modifier-list>
|
||||
</pre>
|
||||
|
@ -293,6 +305,18 @@ lines, none of the other command lines are permitted, because they and many
|
|||
of the modifiers are specific to <b>pcre2test</b>, and should not be used in
|
||||
test files that are also processed by <b>perltest.sh</b>. The <b>#perltest</b>
|
||||
command helps detect tests that are accidentally put in the wrong file.
|
||||
<pre>
|
||||
#pop [<modifiers>]
|
||||
</pre>
|
||||
This command is used to manipulate the stack of compiled patterns, as described
|
||||
in the section entitled "Saving and restoring compiled patterns"
|
||||
<a href="#saverestore">below.</a>
|
||||
<pre>
|
||||
#save <filename>
|
||||
</pre>
|
||||
This command is used to save a set of compiled patterns to a file, as described
|
||||
in the section entitled "Saving and restoring compiled patterns"
|
||||
<a href="#saverestore">below.</a>
|
||||
<pre>
|
||||
#subject <modifier-list>
|
||||
</pre>
|
||||
|
@ -428,7 +452,7 @@ There are three types of modifier that can appear in pattern lines, two of
|
|||
which may also be used in a <b>#pattern</b> command. A pattern's modifier list
|
||||
can add to or override default modifiers that were set by a previous
|
||||
<b>#pattern</b> command.
|
||||
</P>
|
||||
<a name="optionmodifiers"></a></P>
|
||||
<br><b>
|
||||
Setting compilation options
|
||||
</b><br>
|
||||
|
@ -465,7 +489,7 @@ As well as turning on the PCRE2_UTF option, the <b>utf</b> modifier causes all
|
|||
non-printing characters in output strings to be printed using the \x{hh...}
|
||||
notation. Otherwise, those less than 0x100 are output in hex without the curly
|
||||
brackets.
|
||||
</P>
|
||||
<a name="controlmodifiers"></a></P>
|
||||
<br><b>
|
||||
Setting compilation controls
|
||||
</b><br>
|
||||
|
@ -486,8 +510,8 @@ about the pattern:
|
|||
memory show memory used
|
||||
newline=<type> set newline type
|
||||
parens_nest_limit=<n> set maximum parentheses depth
|
||||
perlcompat lock out non-Perl modifiers
|
||||
posix use the POSIX API
|
||||
push push compiled pattern onto the stack
|
||||
stackguard=<number> test the stackguard feature
|
||||
tables=[0|1|2] select internal tables
|
||||
</pre>
|
||||
|
@ -726,6 +750,22 @@ not affect the compilation process.
|
|||
These modifiers may not appear in a <b>#pattern</b> command. If you want them as
|
||||
defaults, set them in a <b>#subject</b> command.
|
||||
</P>
|
||||
<br><b>
|
||||
Saving a compiled pattern
|
||||
</b><br>
|
||||
<P>
|
||||
When a pattern with the <b>push</b> modifier is successfully compiled, it is
|
||||
pushed onto a stack of compiled patterns, and <b>pcre2test</b> expects the next
|
||||
line to contain a new pattern (or a command) instead of a subject line. This
|
||||
facility is used when saving compiled patterns to a file, as described in the
|
||||
section entitled "Saving and restoring compiled patterns"
|
||||
<a href="#saverestore">below.</a>
|
||||
The <b>push</b> modifier is incompatible with compilation modifiers such as
|
||||
<b>global</b> that act at match time. Any that are specified are ignored, with a
|
||||
warning message, except for <b>replace</b>, which causes an error. Note that
|
||||
<b>jitverify</b>, which is allowed, does not carry through to any subsequent
|
||||
matching that uses this pattern.
|
||||
</P>
|
||||
<br><a name="SEC11" href="#TOC1">SUBJECT MODIFIERS</a><br>
|
||||
<P>
|
||||
The modifiers that can appear in subject lines and the <b>#subject</b>
|
||||
|
@ -1292,14 +1332,75 @@ string, it behaves in the same way, unless a different locale has been set for
|
|||
the pattern (using the <b>/locale</b> modifier). In this case, the
|
||||
<b>isprint()</b> function is used to distinguish printing and non-printing
|
||||
characters.
|
||||
<a name="saverestore"></a></P>
|
||||
<br><a name="SEC18" href="#TOC1">SAVING AND RESTORING COMPILED PATTERNS</a><br>
|
||||
<P>
|
||||
It is possible to save compiled patterns on disc or elsewhere, and reload them
|
||||
later, subject to a number of restrictions. JIT data cannot be saved. The host
|
||||
on which the patterns are reloaded must be running the same version of PCRE2,
|
||||
with the same code unit width, and must also have the same endianness, pointer
|
||||
width and PCRE2_SIZE type. Before compiled patterns can be saved they must be
|
||||
serialized, that is, converted to a stream of bytes. A single byte stream may
|
||||
contain any number of compiled patterns, but they must all use the same
|
||||
character tables. A single copy of the tables is included in the byte stream
|
||||
(its size is 1088 bytes).
|
||||
</P>
|
||||
<br><a name="SEC18" href="#TOC1">SEE ALSO</a><br>
|
||||
<P>
|
||||
The functions whose names begin with <b>pcre2_serialize_</b> are used
|
||||
for serializing and de-serializing. They are described in the
|
||||
<a href="pcre2serialize.html"><b>pcre2serialize</b></a>
|
||||
documentation. In this section we describe the features of <b>pcre2test</b> that
|
||||
can be used to test these functions.
|
||||
</P>
|
||||
<P>
|
||||
When a pattern with the <b>push</b> modifier is successfully compiled, it is pushed
|
||||
onto a stack of compiled patterns, and <b>pcre2test</b> expects the next line to
|
||||
contain a new pattern (or command) instead of a subject line. By this means, a
|
||||
number of patterns can be compiled and retained. The <b>push</b> modifier is
|
||||
incompatible with <b>posix</b>, and control modifiers that act at match time are
|
||||
ignored (with a message). The <b>jitverify</b> modifier applies only at compile
|
||||
time. The command
|
||||
<pre>
|
||||
#save <filename>
|
||||
</pre>
|
||||
causes all the stacked patterns to be serialized and the result written to the
|
||||
named file. Afterwards, all the stacked patterns are freed. The command
|
||||
<pre>
|
||||
#load <filename>
|
||||
</pre>
|
||||
reads the data in the file, and then arranges for it to be de-serialized, with
|
||||
the resulting compiled patterns added to the pattern stack. The pattern on the
|
||||
top of the stack can be retrieved by the #pop command, which must be followed
|
||||
by lines of subjects that are to be matched with the pattern, terminated as
|
||||
usual by an empty line or end of file. This command may be followed by a
|
||||
modifier list containing only
|
||||
<a href="#controlmodifiers">control modifiers</a>
|
||||
that act after a pattern has been compiled. In particular, <b>hex</b>,
|
||||
<b>posix</b>, and <b>push</b> are not allowed, nor are any
|
||||
<a href="#optionmodifiers">option-setting modifiers.</a>
|
||||
The JIT modifiers are, however, permitted. Here is an example that saves and
|
||||
reloads two patterns.
|
||||
<pre>
|
||||
/abc/push
|
||||
/xyz/push
|
||||
#save tempfile
|
||||
#load tempfile
|
||||
#pop info
|
||||
xyz
|
||||
|
||||
#pop jit,bincode
|
||||
abc
|
||||
</pre>
|
||||
If <b>jitverify</b> is used with #pop, it does not automatically imply
|
||||
<b>jit</b>, which is different behaviour from when it is used on a pattern.
|
||||
</P>
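The order in which the example retrieves its patterns (xyz first, then abc) is the usual last-in, first-out behaviour of a stack. The toy model here shows only that ordering; it is not the code of <b>pcre2test</b>, and pattern_stack, stack_push(), stack_pop() and the limit of 16 entries are hypothetical names and values chosen for the illustration.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_MAX 16  /* arbitrary limit for the illustration */

/* Toy stack that holds pattern strings in place of compiled patterns. */
typedef struct pattern_stack {
  const char *items[STACK_MAX];
  size_t depth;
} pattern_stack;

/* Push a pattern; returns 0 on success, -1 if the stack is full. */
int stack_push(pattern_stack *s, const char *pattern) {
  assert(s != NULL);
  if (s->depth == STACK_MAX) return -1;
  s->items[s->depth++] = pattern;
  return 0;
}

/* Pop the most recently pushed pattern; returns NULL if the stack is empty. */
const char *stack_pop(pattern_stack *s) {
  assert(s != NULL);
  return (s->depth == 0) ? NULL : s->items[--s->depth];
}
```

Pushing "abc" and then "xyz" and popping twice yields "xyz" followed by "abc", which matches the subject lines that follow each #pop in the example.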
|
||||
<br><a name="SEC19" href="#TOC1">SEE ALSO</a><br>
|
||||
<P>
|
||||
<b>pcre2</b>(3), <b>pcre2api</b>(3), <b>pcre2callout</b>(3),
|
||||
<b>pcre2jit</b>(3), <b>pcre2matching</b>(3), <b>pcre2partial</b>(3),
|
||||
<b>pcre2pattern</b>(3).
|
||||
<b>pcre2pattern</b>(3), <b>pcre2serialize</b>(3).
|
||||
</P>
|
||||
<br><a name="SEC19" href="#TOC1">AUTHOR</a><br>
|
||||
<br><a name="SEC20" href="#TOC1">AUTHOR</a><br>
|
||||
<P>
|
||||
Philip Hazel
|
||||
<br>
|
||||
|
@ -1308,9 +1409,9 @@ University Computing Service
|
|||
Cambridge, England.
|
||||
<br>
|
||||
</P>
|
||||
<br><a name="SEC20" href="#TOC1">REVISION</a><br>
|
||||
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 02 January 2015
|
||||
Last updated: 23 January 2015
|
||||
<br>
|
||||
Copyright © 1997-2015 University of Cambridge.
|
||||
<br>
|
||||
|
|
|
@ -65,6 +65,9 @@ first.
|
|||
<tr><td><a href="pcre2sample.html">pcre2sample</a></td>
|
||||
<td> Discussion of the pcre2demo program</td></tr>
|
||||
|
||||
<tr><td><a href="pcre2serialize.html">pcre2serialize</a></td>
|
||||
<td> Serializing functions for saving precompiled patterns</td></tr>
|
||||
|
||||
<tr><td><a href="pcre2stack.html">pcre2stack</a></td>
|
||||
<td> Discussion of PCRE2's stack usage</td></tr>
|
||||
|
||||
|
@ -177,6 +180,18 @@ in the library.
|
|||
<tr><td><a href="pcre2_pattern_info.html">pcre2_pattern_info</a></td>
|
||||
<td> Extract information about a pattern</td></tr>
|
||||
|
||||
<tr><td><a href="pcre2_serialize_decode.html">pcre2_serialize_decode</a></td>
|
||||
<td> Decode serialized compiled patterns</td></tr>
|
||||
|
||||
<tr><td><a href="pcre2_serialize_encode.html">pcre2_serialize_encode</a></td>
|
||||
<td> Serialize compiled patterns for save/restore</td></tr>
|
||||
|
||||
<tr><td><a href="pcre2_serialize_free.html">pcre2_serialize_free</a></td>
|
||||
<td> Free serialized compiled patterns</td></tr>
|
||||
|
||||
<tr><td><a href="pcre2_serialize_get_number_of_codes.html">pcre2_serialize_get_number_of_codes</a></td>
|
||||
<td> Get number of serialized compiled patterns</td></tr>
|
||||
|
||||
<tr><td><a href="pcre2_set_bsr.html">pcre2_set_bsr</a></td>
|
||||
<td> Set \R convention</td></tr>
|
||||
|
||||
|
|
971
doc/pcre2.txt
File diff suppressed because it is too large
|
@ -0,0 +1,50 @@
|
|||
.TH PCRE2_SERIALIZE_DECODE 3 "19 January 2015" "PCRE2 10.10"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre2.h>
|
||||
.PP
|
||||
.nf
|
||||
.B int32_t pcre2_serialize_decode(pcre2_code **\fIcodes\fP,
|
||||
.B " int32_t \fInumber_of_codes\fP, const uint8_t *\fIbytes\fP,"
|
||||
.B " pcre2_general_context *\fIgcontext\fP);"
|
||||
.fi
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This function decodes a serialized set of compiled patterns back into a list of
|
||||
individual patterns. Its arguments are:
|
||||
.sp
|
||||
\fIcodes\fP pointer to a vector in which to build the list
|
||||
\fInumber_of_codes\fP number of slots in the vector
|
||||
\fIbytes\fP the serialized byte stream
|
||||
\fIgcontext\fP pointer to a general context or NULL
|
||||
.sp
|
||||
The \fIbytes\fP argument must point to a block of data that was originally
|
||||
created by \fBpcre2_serialize_encode()\fP, though it may have been saved on
|
||||
disc or elsewhere in the meantime. If there are more codes in the serialized
|
||||
data than slots in the list, only those compiled patterns that will fit are
|
||||
decoded. The yield of the function is the number of decoded patterns, or one of
|
||||
the following negative error codes:
|
||||
.sp
|
||||
PCRE2_ERROR_BADDATA \fInumber_of_codes\fP is zero or less
|
||||
PCRE2_ERROR_BADMAGIC mismatch of id bytes in \fIbytes\fP
|
||||
PCRE2_ERROR_BADMODE mismatch of variable unit size or PCRE2 version
|
||||
PCRE2_ERROR_MEMORY memory allocation failed
|
||||
PCRE2_ERROR_NULL \fIcodes\fP or \fIbytes\fP is NULL
|
||||
.sp
|
||||
PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that it was compiled
|
||||
on a system with different endianness.
|
||||
.P
|
||||
There is a complete description of the PCRE2 native API in the
|
||||
.\" HREF
|
||||
\fBpcre2api\fP
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcre2posix\fP
|
||||
.\"
|
||||
page.
|
|
@ -0,0 +1,49 @@
|
|||
.TH PCRE2_SERIALIZE_ENCODE 3 "19 January 2015" "PCRE2 10.10"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre2.h>
|
||||
.PP
|
||||
.nf
|
||||
.B int32_t pcre2_serialize_encode(pcre2_code **\fIcodes\fP,
|
||||
.B " int32_t \fInumber_of_codes\fP, uint8_t **\fIserialized_bytes\fP,"
|
||||
.B " PCRE2_SIZE *\fIserialized_size\fP, pcre2_general_context *\fIgcontext\fP);"
|
||||
.fi
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This function encodes a list of compiled patterns into a byte stream that can
|
||||
be saved on disc or elsewhere. Its arguments are:
|
||||
.sp
|
||||
\fIcodes\fP pointer to a vector containing the list
|
||||
\fInumber_of_codes\fP number of slots in the vector
|
||||
\fIserialized_bytes\fP set to point to the serialized byte stream
|
||||
\fIserialized_size\fP set to the number of bytes in the byte stream
|
||||
\fIgcontext\fP pointer to a general context or NULL
|
||||
.sp
|
||||
The context argument is used to obtain memory for the byte stream. When the
|
||||
serialized data is no longer needed, it must be freed by calling
|
||||
\fBpcre2_serialize_free()\fP. The yield of the function is the number of
|
||||
serialized patterns, or one of the following negative error codes:
|
||||
.sp
|
||||
PCRE2_ERROR_BADDATA \fInumber_of_codes\fP is zero or less
|
||||
PCRE2_ERROR_BADMAGIC mismatch of id bytes in one of the patterns
|
||||
PCRE2_ERROR_MEMORY memory allocation failed
|
||||
PCRE2_ERROR_MIXEDTABLES the patterns do not all use the same tables
|
||||
PCRE2_ERROR_NULL an argument other than \fIgcontext\fP is NULL
|
||||
.sp
|
||||
PCRE2_ERROR_BADMAGIC means either that a pattern's code has been corrupted, or
|
||||
that a slot in the vector does not point to a compiled pattern.
|
||||
.P
|
||||
There is a complete description of the PCRE2 native API in the
|
||||
.\" HREF
|
||||
\fBpcre2api\fP
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcre2posix\fP
|
||||
.\"
|
||||
page.
|
|
@ -0,0 +1,28 @@
|
|||
.TH PCRE2_SERIALIZE_FREE 3 "19 January 2015" "PCRE2 10.10"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre2.h>
|
||||
.PP
|
||||
.nf
|
||||
.B void pcre2_serialize_free(uint8_t *\fIbytes\fP);
|
||||
.fi
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This function frees the memory that was obtained by
|
||||
\fBpcre2_serialize_encode()\fP to hold a serialized byte stream. The argument
|
||||
must point to such a byte stream.
|
||||
.P
|
||||
There is a complete description of the PCRE2 native API in the
|
||||
.\" HREF
|
||||
\fBpcre2api\fP
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcre2posix\fP
|
||||
.\"
|
||||
page.
|
|
@ -0,0 +1,37 @@
|
|||
.TH PCRE2_SERIALIZE_GET_NUMBER_OF_CODES 3 "19 January 2015" "PCRE2 10.10"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre2.h>
|
||||
.PP
|
||||
.nf
|
||||
.B int32_t pcre2_serialize_get_number_of_codes(const uint8_t *\fIbytes\fP);
|
||||
.fi
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
The \fIbytes\fP argument must point to a serialized byte stream that was
|
||||
originally created by \fBpcre2_serialize_encode()\fP (though it may have been
|
||||
saved on disc or elsewhere in the meantime). The function returns the number of
|
||||
serialized patterns in the byte stream, or one of the following negative error
|
||||
codes:
|
||||
.sp
|
||||
PCRE2_ERROR_BADMAGIC mismatch of id bytes in \fIbytes\fP
|
||||
PCRE2_ERROR_BADMODE mismatch of variable unit size or PCRE2 version
|
||||
PCRE2_ERROR_NULL the argument is NULL
|
||||
.sp
|
||||
PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that it was compiled
|
||||
on a system with different endianness.
|
||||
.P
|
||||
There is a complete description of the PCRE2 native API in the
|
||||
.\" HREF
|
||||
\fBpcre2api\fP
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcre2posix\fP
|
||||
.\"
|
||||
page.
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2API 3 "13 January 2015" "PCRE2 10.10"
|
||||
.TH PCRE2API 3 "23 January 2015" "PCRE2 10.10"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.sp
|
||||
|
@ -205,6 +205,24 @@ document for an overview of all the PCRE2 documentation.
|
|||
.fi
|
||||
.
|
||||
.
|
||||
.SH "PCRE2 NATIVE API SERIALIZATION FUNCTIONS"
|
||||
.rs
|
||||
.sp
|
||||
.nf
|
||||
.B int32_t pcre2_serialize_decode(pcre2_code **\fIcodes\fP,
|
||||
.B " int32_t \fInumber_of_codes\fP, const uint8_t *\fIbytes\fP,"
|
||||
.B " pcre2_general_context *\fIgcontext\fP);"
|
||||
.sp
|
||||
.B int32_t pcre2_serialize_encode(pcre2_code **\fIcodes\fP,
|
||||
.B " int32_t \fInumber_of_codes\fP, uint8_t **\fIserialized_bytes\fP,"
|
||||
.B " PCRE2_SIZE *\fIserialized_size\fP, pcre2_general_context *\fIgcontext\fP);"
|
||||
.sp
|
||||
.B void pcre2_serialize_free(uint8_t *\fIbytes\fP);
|
||||
.sp
|
||||
.B int32_t pcre2_serialize_get_number_of_codes(const uint8_t *\fIbytes\fP);
|
||||
.fi
|
||||
.
|
||||
.
|
||||
.SH "PCRE2 NATIVE API AUXILIARY FUNCTIONS"
|
||||
.rs
|
||||
.sp
|
||||
|
@ -1689,12 +1707,26 @@ set, the call to \fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET.
|
|||
PCRE2_INFO_SIZE
|
||||
.sp
|
||||
Return the size of the compiled pattern in bytes (for all three libraries). The
|
||||
third argument should point to a \fBsize_t\fP variable. This value does not
|
||||
include the size of the \fBpcre2_code\fP structure that is returned by
|
||||
\fBpcre_compile()\fP. The value that is used when \fBpcre2_compile()\fP is
|
||||
getting memory in which to place the compiled data is the value returned by
|
||||
this option plus the size of the \fBpcre2_code\fP structure. Processing a
|
||||
pattern with the JIT compiler does not alter the value returned by this option.
|
||||
third argument should point to a \fBsize_t\fP variable. This value includes the
|
||||
size of the general data block that precedes the code units of the compiled
|
||||
pattern itself. The value that is used when \fBpcre2_compile()\fP is getting
|
||||
memory in which to place the compiled pattern may be slightly larger than the
|
||||
value returned by this option, because there are cases where the code that
|
||||
calculates the size has to over-estimate. Processing a pattern with the JIT
|
||||
compiler does not alter the value returned by this option.
|
||||
.
|
||||
.
|
||||
.SH "SERIALIZATION AND PRECOMPILING"
|
||||
.rs
|
||||
.sp
|
||||
It is possible to save compiled patterns on disc or elsewhere, and reload them
|
||||
later, subject to a number of restrictions. The functions whose names begin
|
||||
with \fBpcre2_serialize_\fP are used for this purpose. They are described in
|
||||
the
|
||||
.\" HREF
|
||||
\fBpcre2serialize\fP
|
||||
.\"
|
||||
documentation.
|
||||
.
|
||||
.
|
||||
.\" HTML <a name="matchdatablock"></a>
|
||||
|
@ -2853,6 +2885,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 13 January 2015
|
||||
Last updated: 23 January 2015
|
||||
Copyright (c) 1997-2015 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -0,0 +1,170 @@
|
|||
.TH PCRE2SERIALIZE 3 "20 January 2015" "PCRE2 10.10"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.SH "SAVING AND RE-USING PRECOMPILED PCRE2 PATTERNS"
|
||||
.rs
|
||||
.sp
|
||||
.nf
|
||||
.B int32_t pcre2_serialize_decode(pcre2_code **\fIcodes\fP,
|
||||
.B " int32_t \fInumber_of_codes\fP, const uint8_t *\fIbytes\fP,"
|
||||
.B " pcre2_general_context *\fIgcontext\fP);"
|
||||
.sp
|
||||
.B int32_t pcre2_serialize_encode(pcre2_code **\fIcodes\fP,
|
||||
.B " int32_t \fInumber_of_codes\fP, uint8_t **\fIserialized_bytes\fP,"
|
||||
.B " PCRE2_SIZE *\fIserialized_size\fP, pcre2_general_context *\fIgcontext\fP);"
|
||||
.sp
|
||||
.B void pcre2_serialize_free(uint8_t *\fIbytes\fP);
|
||||
.sp
|
||||
.B int32_t pcre2_serialize_get_number_of_codes(const uint8_t *\fIbytes\fP);
|
||||
.fi
|
||||
.sp
|
||||
If you are running an application that uses a large number of regular
|
||||
expression patterns, it may be useful to store them in a precompiled form
|
||||
instead of having to compile them every time the application is run. However,
|
||||
if you are using the just-in-time optimization feature, it is not possible to
|
||||
save and reload the JIT data, because it is position-dependent. In addition,
|
||||
the host on which the patterns are reloaded must be running the same version of
|
||||
PCRE2, with the same code unit width, and must also have the same endianness,
|
||||
pointer width and PCRE2_SIZE type. For example, patterns compiled on a 32-bit
|
||||
system using PCRE2's 16-bit library cannot be reloaded on a 64-bit system, nor
|
||||
can they be reloaded using the 8-bit library.
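The restrictions above can be captured by an application-level check. The following is a minimal sketch, not part of the PCRE2 API: the names host_fingerprint, host_fingerprint_get() and host_fingerprint_matches() are hypothetical, and the sketch assumes that PCRE2_SIZE has the same width as size_t. An application might store such a record next to the saved byte stream and compare it before reloading. A real check would also need to record the PCRE2 version, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical application-level record; not part of the PCRE2 API. */
typedef struct host_fingerprint {
  uint8_t little_endian;   /* 1 on a little-endian host, 0 otherwise */
  uint8_t pointer_bytes;   /* sizeof(void *) */
  uint8_t size_bytes;      /* sizeof(size_t), assumed to match PCRE2_SIZE */
  uint8_t code_unit_bits;  /* 8, 16, or 32: the library width in use */
} host_fingerprint;

/* Fill in a fingerprint for the running host. */
host_fingerprint host_fingerprint_get(unsigned code_unit_bits)
{
  host_fingerprint fp;
  uint16_t probe = 1;
  uint8_t first_byte;

  assert(code_unit_bits == 8 || code_unit_bits == 16 || code_unit_bits == 32);
  memcpy(&first_byte, &probe, 1);   /* lowest-addressed byte of 0x0001 */

  fp.little_endian = (uint8_t)(first_byte == 1);
  fp.pointer_bytes = (uint8_t)sizeof(void *);
  fp.size_bytes = (uint8_t)sizeof(size_t);
  fp.code_unit_bits = (uint8_t)code_unit_bits;
  return fp;
}

/* Return 1 if data saved under 'saved' may be offered to this host. */
int host_fingerprint_matches(host_fingerprint saved, host_fingerprint current)
{
  return saved.little_endian == current.little_endian &&
         saved.pointer_bytes == current.pointer_bytes &&
         saved.size_bytes == current.size_bytes &&
         saved.code_unit_bits == current.code_unit_bits;
}
```

PCRE2 reports some mismatches itself when decoding, so a guard like this is optional. Its main use is to give an earlier and more specific diagnostic.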
|
||||
.
|
||||
.
|
||||
.SH "SAVING COMPILED PATTERNS"
|
||||
.rs
|
||||
.sp
|
||||
Before compiled patterns can be saved they must be serialized, that is,
|
||||
converted to a stream of bytes. A single byte stream may contain any number of
|
||||
compiled patterns, but they must all use the same character tables. A single
|
||||
copy of the tables is included in the byte stream (its size is 1088 bytes). For
|
||||
more details of character tables, see the
|
||||
.\" HTML <a href="pcre2api.html#localesupport">
|
||||
.\" </a>
|
||||
section on locale support
|
||||
.\"
|
||||
in the
|
||||
.\" HREF
|
||||
\fBpcre2api\fP
|
||||
.\"
|
||||
documentation.
|
||||
.P
|
||||
The function \fBpcre2_serialize_encode()\fP creates a serialized byte stream
|
||||
from a list of compiled patterns. Its first two arguments specify the list,
|
||||
being a pointer to a vector of pointers to compiled patterns, and the length of
|
||||
the vector. The third and fourth arguments point to variables which are set to
|
||||
point to the created byte stream and its length, respectively. The final
|
||||
argument is a pointer to a general context, which can be used to specify custom
|
||||
memory management functions. If this argument is NULL, \fBmalloc()\fP is used
|
||||
to obtain memory for the byte stream. The yield of the function is the number
|
||||
of serialized patterns, or one of the following negative error codes:
|
||||
.sp
|
||||
PCRE2_ERROR_BADDATA the number of patterns is zero or less
|
||||
PCRE2_ERROR_BADMAGIC mismatch of id bytes in one of the patterns
|
||||
PCRE2_ERROR_MEMORY memory allocation failed
|
||||
PCRE2_ERROR_MIXEDTABLES the patterns do not all use the same tables
|
||||
PCRE2_ERROR_NULL the 1st, 3rd, or 4th argument is NULL
|
||||
.sp
|
||||
PCRE2_ERROR_BADMAGIC means either that a pattern's code has been corrupted, or
|
||||
that a slot in the vector does not point to a compiled pattern.
|
||||
.P
|
||||
Once a set of patterns has been serialized you can save the data in any
|
||||
appropriate manner. Here is sample code that compiles two patterns and writes
|
||||
them to a file. It assumes that the variable \fIfd\fP refers to a file that is
|
||||
open for output. The error checking that should be present in a real
|
||||
application has been omitted for simplicity.
|
||||
.sp
|
||||
int errorcode;
|
||||
uint8_t *bytes;
|
||||
PCRE2_SIZE erroroffset;
|
||||
PCRE2_SIZE bytescount;
|
||||
pcre2_code *list_of_codes[2];
|
||||
list_of_codes[0] = pcre2_compile("first pattern",
|
||||
PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);
|
||||
list_of_codes[1] = pcre2_compile("second pattern",
|
||||
PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);
|
||||
errorcode = pcre2_serialize_encode(list_of_codes, 2, &bytes,
|
||||
&bytescount, NULL);
|
||||
errorcode = fwrite(bytes, 1, bytescount, fd);
|
||||
.sp
|
||||
Note that the serialized data is binary data that may contain any of the 256
|
||||
possible byte values. On systems that make a distinction between binary and
|
||||
non-binary data, be sure that the file is opened for binary output.
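As an illustration of the point about binary output, here is a minimal sketch that writes a byte stream to a file opened in binary mode and checks that every byte was written. The helper name write_bytes_to_file() is hypothetical. It uses only the standard C library, so it applies to any byte stream, including one obtained from \fBpcre2_serialize_encode()\fP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write 'length' bytes to 'path' in binary mode.
   Returns 0 on success and -1 on failure. */
int write_bytes_to_file(const char *path, const uint8_t *bytes, size_t length)
{
  FILE *f;
  size_t written;
  int close_result;

  assert(path != NULL && bytes != NULL);

  f = fopen(path, "wb");             /* "b" matters on some systems */
  if (f == NULL) return -1;

  written = fwrite(bytes, 1, length, f);
  close_result = fclose(f);          /* buffered write errors may surface here */

  return (written == length && close_result == 0) ? 0 : -1;
}
```

Checking the result of fclose() as well as fwrite() is a deliberate choice, because buffered data may not reach the file until the stream is closed.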
|
||||
.P
|
||||
Serializing a set of patterns leaves the original data untouched, so they can
|
||||
still be used for matching. Their memory must eventually be freed in the usual
|
||||
way by calling \fBpcre2_code_free()\fP. When you have finished with the byte
|
||||
stream, it too must be freed by calling \fBpcre2_serialize_free()\fP.
|
||||
.
|
||||
.
|
||||
.SH "RE-USING PRECOMPILED PATTERNS"
|
||||
.rs
|
||||
.sp
|
||||
In order to re-use a set of saved patterns you must first make the serialized
|
||||
byte stream available in main memory (for example, by reading from a file). The
|
||||
management of this memory block is up to the application. You can use the
|
||||
\fBpcre2_serialize_get_number_of_codes()\fP function to find out how many
|
||||
compiled patterns are in the serialized data without actually decoding the
|
||||
patterns:
|
||||
.sp
|
||||
uint8_t *bytes = <serialized data>;
|
||||
int32_t number_of_codes = pcre2_serialize_get_number_of_codes(bytes);
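Making the byte stream available in main memory might look like the following sketch, which reads a whole file in binary mode into a block obtained from malloc(). The helper name read_bytes_from_file() is hypothetical and the code uses only the standard C library. The resulting block is what the \fI<serialized data>\fP placeholder above stands for, and the application remains responsible for freeing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read all of 'path' into a malloc()ed block.
   On success, returns the block and sets *length; the caller releases
   the block with free(). Returns NULL on failure. */
uint8_t *read_bytes_from_file(const char *path, size_t *length)
{
  FILE *f;
  uint8_t *block = NULL;
  size_t used = 0, capacity = 0;

  assert(path != NULL && length != NULL);

  f = fopen(path, "rb");             /* binary mode, as for writing */
  if (f == NULL) return NULL;

  for (;;)
    {
    size_t got;
    if (used == capacity)            /* grow the block when it is full */
      {
      size_t new_capacity = (capacity == 0) ? 4096 : capacity * 2;
      uint8_t *new_block = realloc(block, new_capacity);
      if (new_block == NULL) { free(block); fclose(f); return NULL; }
      block = new_block;
      capacity = new_capacity;
      }
    got = fread(block + used, 1, capacity - used, f);
    used += got;
    if (got == 0) break;             /* end of file or read error */
    }

  if (ferror(f)) { free(block); fclose(f); return NULL; }
  fclose(f);
  *length = used;
  return block;
}
```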
|
||||
.sp
|
||||
The \fBpcre2_serialize_decode()\fP function reads a byte stream and recreates
|
||||
the compiled patterns in new memory blocks, setting pointers to them in a
|
||||
vector. The first two arguments are a pointer to a suitable vector and its
|
||||
length, and the third argument points to a byte stream. The final argument is a
|
||||
pointer to a general context, which can be used to specify custom memory
|
||||
management functions for the decoded patterns. If this argument is NULL,
|
||||
\fBmalloc()\fP and \fBfree()\fP are used. After deserialization, the byte
|
||||
stream is no longer needed and can be discarded.
|
||||
.sp
|
||||
int32_t number_of_codes;
|
||||
pcre2_code *list_of_codes[2];
|
||||
uint8_t *bytes = <serialized data>;
|
||||
number_of_codes =
|
||||
pcre2_serialize_decode(list_of_codes, 2, bytes, NULL);
|
||||
.sp
|
||||
If the vector is not large enough for all the patterns in the byte stream, it
|
||||
is filled with those that fit, and the remainder are ignored. The yield of the
|
||||
function is the number of decoded patterns, or one of the following negative
|
||||
error codes:
|
||||
.sp
|
||||
PCRE2_ERROR_BADDATA second argument is zero or less
|
||||
PCRE2_ERROR_BADMAGIC mismatch of id bytes in the data
|
||||
PCRE2_ERROR_BADMODE mismatch of variable unit size or PCRE2 version
|
||||
PCRE2_ERROR_MEMORY memory allocation failed
|
||||
PCRE2_ERROR_NULL first or third argument is NULL
|
||||
.sp
|
||||
PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that it was compiled
|
||||
on a system with different endianness.
|
||||
.P
|
||||
Decoded patterns can be used for matching in the usual way, and must be freed
|
||||
by calling \fBpcre2_code_free()\fP as normal. A single copy of the character
|
||||
tables is used by all the decoded patterns. A reference count is used to
|
||||
arrange for its memory to be automatically freed when the last pattern is
|
||||
freed.
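The reference-counting arrangement described above can be pictured with a small sketch. This is an illustration of the general technique only, not PCRE2's internal code: the types shared_tables and toy_pattern, the functions, and the live_table_blocks counter are all hypothetical. Only the 1088-byte table size is taken from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical types for illustration; these are not PCRE2's internals. */
typedef struct shared_tables {
  size_t ref_count;            /* number of patterns using these tables */
  unsigned char data[1088];    /* table size quoted in the text */
} shared_tables;

typedef struct toy_pattern {
  shared_tables *tables;       /* shared, not owned by a single pattern */
} toy_pattern;

int live_table_blocks = 0;     /* demo-only count of allocated tables */

/* Allocate one zeroed table block with no users yet. */
shared_tables *tables_create(void)
{
  shared_tables *t = calloc(1, sizeof(shared_tables));
  if (t != NULL) live_table_blocks++;
  return t;
}

/* Create a pattern that shares 't', taking one reference to it. */
toy_pattern *toy_pattern_create(shared_tables *t)
{
  toy_pattern *p;
  assert(t != NULL);
  p = malloc(sizeof(toy_pattern));
  if (p == NULL) return NULL;
  p->tables = t;
  t->ref_count++;
  return p;
}

/* Free a pattern; the tables go with the last pattern that uses them. */
void toy_pattern_free(toy_pattern *p)
{
  if (p == NULL) return;
  if (--p->tables->ref_count == 0)
    {
    free(p->tables);
    live_table_blocks--;
    }
  free(p);
}
```

With this arrangement the caller never frees the tables directly. Freeing every pattern is sufficient, which matches the behaviour described for decoded patterns.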
|
||||
.P
|
||||
If a pattern was processed by \fBpcre2_jit_compile()\fP before being
|
||||
serialized, the JIT data is discarded and so is no longer available after a
|
||||
save/restore cycle. You can, however, process a restored pattern with
|
||||
\fBpcre2_jit_compile()\fP if you wish.
|
||||
.
|
||||
.
|
||||
.
|
||||
.SH AUTHOR
|
||||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Philip Hazel
|
||||
University Computing Service
|
||||
Cambridge, England.
|
||||
.fi
|
||||
.
|
||||
.
|
||||
.SH REVISION
|
||||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 20 January 2015
|
||||
Copyright (c) 1997-2015 University of Cambridge.
|
||||
.fi
|
173
doc/pcre2test.1
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2TEST 1 "02 January 2015" "PCRE 10.00"
|
||||
.TH PCRE2TEST 1 "23 January 2015" "PCRE 10.10"
|
||||
.SH NAME
|
||||
pcre2test - a program for testing Perl-compatible regular expressions.
|
||||
.SH SYNOPSIS
|
||||
|
@ -21,10 +21,11 @@ options, see the
|
|||
documentation.
|
||||
.P
|
||||
The input for \fBpcre2test\fP is a sequence of regular expression patterns and
|
||||
subject strings to be matched. The output shows the result of each match
|
||||
attempt. Modifiers on the command line, the patterns, and the subject lines
|
||||
specify PCRE2 function options, control how the subject is processed, and what
|
||||
output is produced.
|
||||
subject strings to be matched. There are also command lines for setting
|
||||
defaults and controlling some special actions. The output shows the result of
|
||||
each match attempt. Modifiers on external or internal command lines, the
|
||||
patterns, and the subject lines specify PCRE2 function options, control how the
|
||||
subject is processed, and what output is produced.
|
||||
.P
|
||||
As the original fairly simple PCRE library evolved, it acquired many different
|
||||
features, and as a result, the original \fBpcretest\fP program ended up with a
|
||||
|
@ -185,9 +186,7 @@ If \fBpcre2test\fP is given two filename arguments, it reads from the first and
|
|||
writes to the second. If the first name is "-", input is taken from the
|
||||
standard input. If \fBpcre2test\fP is given only one argument, it reads from
|
||||
that file and writes to stdout. Otherwise, it reads from stdin and writes to
|
||||
stdout. When the input is a terminal, it prompts for each line of input, using
|
||||
"re>" to prompt for regular expression patterns, and "data>" to prompt for
|
||||
subject lines.
|
||||
stdout.
|
||||
.P
|
||||
When \fBpcre2test\fP is built, a configuration option can specify that it
|
||||
should be linked with the \fBlibreadline\fP or \fBlibedit\fP library. When this
|
||||
|
@ -198,10 +197,15 @@ the \fB-help\fP option states whether or not \fBreadline()\fP will be used.
|
|||
The program handles any number of tests, each of which consists of a set of
|
||||
input lines. Each set starts with a regular expression pattern, followed by any
|
||||
number of subject lines to be matched against that pattern. In between sets of
|
||||
test data, command lines that begin with a hash (#) character may appear. This
|
||||
file format, with some restrictions, can also be processed by the
|
||||
\fBperltest.sh\fP script that is distributed with PCRE2 as a means of checking
|
||||
that the behaviour of PCRE2 and Perl is the same.
|
||||
test data, command lines that begin with # may appear. This file format, with
|
||||
some restrictions, can also be processed by the \fBperltest.sh\fP script that
|
||||
is distributed with PCRE2 as a means of checking that the behaviour of PCRE2
|
||||
and Perl is the same.
|
||||
.P
|
||||
When the input is a terminal, \fBpcre2test\fP prompts for each line of input,
|
||||
using "re>" to prompt for regular expression patterns, and "data>" to prompt
|
||||
for subject lines. Command lines starting with # can be entered only in
|
||||
response to the "re>" prompt.
|
||||
.P
|
||||
Each subject line is matched separately and independently. If you want to do
|
||||
multi-line matches, you have to use the \en escape sequence (or \er or \er\en,
|
||||
|
@ -219,21 +223,30 @@ still input to be read.
|
|||
.SH "COMMAND LINES"
|
||||
.rs
|
||||
.sp
|
||||
In between sets of test data, a line that begins with a hash (#) character is
|
||||
interpreted as a command line. If the first character is followed by white
|
||||
space or an exclamation mark, the line is treated as a comment, and ignored.
|
||||
Otherwise, the following commands are recognized:
|
||||
In between sets of test data, a line that begins with # is interpreted as a
|
||||
command line. If the first character is followed by white space or an
|
||||
exclamation mark, the line is treated as a comment, and ignored. Otherwise, the
|
||||
following commands are recognized:
|
||||
.sp
|
||||
#forbid_utf
|
||||
.sp
|
||||
Subsequent patterns automatically have the PCRE2_NEVER_UTF and PCRE2_NEVER_UCP
|
||||
options set, which locks out the use of UTF and Unicode property features. This
|
||||
is a trigger guard that is used in test files to ensure that UTF/Unicode tests
|
||||
are not accidentally added to files that are used when UTF support is not
|
||||
included in the library. This effect can also be obtained by the use of
|
||||
\fB#pattern\fP; the difference is that \fB#forbid_utf\fP cannot be unset, and
|
||||
the automatic options are not displayed in pattern information, to avoid
|
||||
cluttering up test output.
|
||||
is a trigger guard that is used in test files to ensure that UTF or Unicode
|
||||
property tests are not accidentally added to files that are used when Unicode
|
||||
support is not included in the library. This effect can also be obtained by the
|
||||
use of \fB#pattern\fP; the difference is that \fB#forbid_utf\fP cannot be
|
||||
unset, and the automatic options are not displayed in pattern information, to
|
||||
avoid cluttering up test output.
|
||||
.sp
|
||||
#load <filename>
|
||||
.sp
|
||||
This command is used to load a set of precompiled patterns from a file, as
|
||||
described in the section entitled "Saving and restoring compiled patterns"
|
||||
.\" HTML <a href="#saverestore">
|
||||
.\" </a>
|
||||
below.
|
||||
.\"
|
||||
.sp
|
||||
#pattern <modifier-list>
|
||||
.sp
|
||||
|
@ -249,6 +262,24 @@ lines, none of the other command lines are permitted, because they and many
|
|||
of the modifiers are specific to \fBpcre2test\fP, and should not be used in
|
||||
test files that are also processed by \fBperltest.sh\fP. The \fB#perltest\fP
|
||||
command helps detect tests that are accidentally put in the wrong file.
|
||||
.sp
|
||||
#pop [<modifiers>]
|
||||
.sp
|
||||
This command is used to manipulate the stack of compiled patterns, as described
|
||||
in the section entitled "Saving and restoring compiled patterns"
|
||||
.\" HTML <a href="#saverestore">
|
||||
.\" </a>
|
||||
below.
|
||||
.\"
|
||||
.sp
|
||||
#save <filename>
|
||||
.sp
|
||||
This command is used to save a set of compiled patterns to a file, as described
|
||||
in the section entitled "Saving and restoring compiled patterns"
|
||||
.\" HTML <a href="#saverestore">
|
||||
.\" </a>
|
||||
below.
|
||||
.\"
|
||||
.sp
|
||||
#subject <modifier-list>
|
||||
.sp
|
||||
|
@ -387,6 +418,7 @@ can add to or override default modifiers that were set by a previous
|
|||
\fB#pattern\fP command.
|
||||
.
|
||||
.
|
||||
.\" HTML <a name="optionmodifiers"></a>
|
||||
.SS "Setting compilation options"
|
||||
.rs
|
||||
.sp
|
||||
|
@ -426,6 +458,7 @@ notation. Otherwise, those less than 0x100 are output in hex without the curly
|
|||
brackets.
|
||||
.
|
||||
.
|
||||
.\" HTML <a name="controlmodifiers"></a>
|
||||
.SS "Setting compilation controls"
|
||||
.rs
|
||||
.sp
|
||||
|
@ -445,8 +478,8 @@ about the pattern:
|
|||
memory show memory used
|
||||
newline=<type> set newline type
|
||||
parens_nest_limit=<n> set maximum parentheses depth
|
||||
perlcompat lock out non-Perl modifiers
|
||||
posix use the POSIX API
|
||||
push push compiled pattern onto the stack
|
||||
stackguard=<number> test the stackguard feature
|
||||
tables=[0|1|2] select internal tables
|
||||
.sp
|
||||
|
@ -683,6 +716,25 @@ These modifiers may not appear in a \fB#pattern\fP command. If you want them as
|
|||
defaults, set them in a \fB#subject\fP command.
|
||||
.
|
||||
.
|
||||
.SS "Saving a compiled pattern"
|
||||
.rs
|
||||
.sp
|
||||
When a pattern with the \fBpush\fP modifier is successfully compiled, it is
|
||||
pushed onto a stack of compiled patterns, and \fBpcre2test\fP expects the next
|
||||
line to contain a new pattern (or a command) instead of a subject line. This
|
||||
facility is used when saving compiled patterns to a file, as described in the
|
||||
section entitled "Saving and restoring compiled patterns"
|
||||
.\" HTML <a href="#saverestore">
|
||||
.\" </a>
|
||||
below.
|
||||
.\"
|
||||
The \fBpush\fP modifier is incompatible with compilation modifiers such as
|
||||
\fBglobal\fP that act at match time. Any that are specified are ignored, with a
|
||||
warning message, except for \fBreplace\fP, which causes an error. Note that
|
||||
\fBjitverify\fP, which is allowed, does not carry through to any subsequent
|
||||
matching that uses this pattern.
|
||||
.
|
||||
.
|
||||
.SH "SUBJECT MODIFIERS"
|
||||
.rs
|
||||
.sp
|
||||
|
@ -1253,12 +1305,83 @@ characters.
|
|||
.
|
||||
.
|
||||
.
|
||||
.\" HTML <a name="saverestore"></a>
|
||||
.SH "SAVING AND RESTORING COMPILED PATTERNS"
|
||||
.rs
|
||||
.sp
|
||||
It is possible to save compiled patterns on disc or elsewhere, and reload them
|
||||
later, subject to a number of restrictions. JIT data cannot be saved. The host
|
||||
on which the patterns are reloaded must be running the same version of PCRE2,
|
||||
with the same code unit width, and must also have the same endianness, pointer
|
||||
width and PCRE2_SIZE type. Before compiled patterns can be saved they must be
|
||||
serialized, that is, converted to a stream of bytes. A single byte stream may
|
||||
contain any number of compiled patterns, but they must all use the same
|
||||
character tables. A single copy of the tables is included in the byte stream
|
||||
(its size is 1088 bytes).
|
||||
.P
|
||||
The functions whose names begin with \fBpcre2_serialize_\fP are used
|
||||
for serializing and de-serializing. They are described in the
|
||||
.\" HREF
|
||||
\fBpcre2serialize\fP
|
||||
.\"
|
||||
documentation. In this section we describe the features of \fBpcre2test\fP that
|
||||
can be used to test these functions.
|
||||
.P
|
||||
When a pattern with the \fBpush\fP modifier is successfully compiled, it is pushed
|
||||
onto a stack of compiled patterns, and \fBpcre2test\fP expects the next line to
|
||||
contain a new pattern (or command) instead of a subject line. By this means, a
|
||||
number of patterns can be compiled and retained. The \fBpush\fP modifier is
|
||||
incompatible with \fBposix\fP, and control modifiers that act at match time are
|
||||
ignored (with a message). The \fBjitverify\fP modifier applies only at compile
|
||||
time. The command
|
||||
.sp
|
||||
#save <filename>
|
||||
.sp
|
||||
causes all the stacked patterns to be serialized and the result written to the
|
||||
named file. Afterwards, all the stacked patterns are freed. The command
|
||||
.sp
|
||||
#load <filename>
|
||||
.sp
|
||||
reads the data in the file, and then arranges for it to be de-serialized, with
|
||||
the resulting compiled patterns added to the pattern stack. The pattern on the
|
||||
top of the stack can be retrieved by the #pop command, which must be followed
|
||||
by lines of subjects that are to be matched with the pattern, terminated as
|
||||
usual by an empty line or end of file. This command may be followed by a
|
||||
modifier list containing only
|
||||
.\" HTML <a href="#controlmodifiers">
|
||||
.\" </a>
|
||||
control modifiers
|
||||
.\"
|
||||
that act after a pattern has been compiled. In particular, \fBhex\fP,
|
||||
\fBposix\fP, and \fBpush\fP are not allowed, nor are any
|
||||
.\" HTML <a href="#optionmodifiers">
|
||||
.\" </a>
|
||||
option-setting modifiers.
|
||||
.\"
|
||||
The JIT modifiers are, however, permitted. Here is an example that saves and
|
||||
reloads two patterns.
|
||||
.sp
|
||||
/abc/push
|
||||
/xyz/push
|
||||
#save tempfile
|
||||
#load tempfile
|
||||
#pop info
|
||||
xyz
|
||||
.sp
|
||||
#pop jit,bincode
|
||||
abc
|
||||
.sp
|
||||
If \fBjitverify\fP is used with #pop, it does not automatically imply
|
||||
\fBjit\fP, which is different behaviour from when it is used on a pattern.
|
||||
.
|
||||
.
|
||||
.
|
||||
.SH "SEE ALSO"
|
||||
.rs
|
||||
.sp
|
||||
\fBpcre2\fP(3), \fBpcre2api\fP(3), \fBpcre2callout\fP(3),
|
||||
\fBpcre2jit\fP, \fBpcre2matching\fP(3), \fBpcre2partial\fP(3),
|
||||
\fBpcre2pattern\fP(3).
|
||||
\fBpcre2pattern\fP(3), \fBpcre2serialize\fP(3).
|
||||
.
|
||||
.
|
||||
.SH AUTHOR
|
||||
|
@ -1275,6 +1398,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 02 January 2015
|
||||
Last updated: 23 January 2015
|
||||
Copyright (c) 1997-2015 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -17,10 +17,12 @@ SYNOPSIS
|
|||
options, see the pcre2api documentation.
|
||||
|
||||
The input for pcre2test is a sequence of regular expression patterns
|
||||
and subject strings to be matched. The output shows the result of each
|
||||
match attempt. Modifiers on the command line, the patterns, and the
|
||||
subject lines specify PCRE2 function options, control how the subject
|
||||
is processed, and what output is produced.
|
||||
and subject strings to be matched. There are also command lines for
|
||||
setting defaults and controlling some special actions. The output shows
|
||||
the result of each match attempt. Modifiers on external or internal
|
||||
command lines, the patterns, and the subject lines specify PCRE2 func-
|
||||
tion options, control how the subject is processed, and what output is
|
||||
produced.
|
||||
|
||||
As the original fairly simple PCRE library evolved, it acquired many
|
||||
different features, and as a result, the original pcretest program
|
||||
|
@ -173,9 +175,7 @@ DESCRIPTION
|
|||
and writes to the second. If the first name is "-", input is taken from
|
||||
the standard input. If pcre2test is given only one argument, it reads
|
||||
from that file and writes to stdout. Otherwise, it reads from stdin and
|
||||
writes to stdout. When the input is a terminal, it prompts for each
|
||||
line of input, using "re>" to prompt for regular expression patterns,
|
||||
and "data>" to prompt for subject lines.
|
||||
writes to stdout.
|
||||
|
||||
When pcre2test is built, a configuration option can specify that it
|
||||
should be linked with the libreadline or libedit library. When this is
|
||||
|
@ -186,11 +186,15 @@ DESCRIPTION
|
|||
The program handles any number of tests, each of which consists of a
|
||||
set of input lines. Each set starts with a regular expression pattern,
|
||||
followed by any number of subject lines to be matched against that pat-
|
||||
tern. In between sets of test data, command lines that begin with a
|
||||
hash (#) character may appear. This file format, with some restric-
|
||||
tions, can also be processed by the perltest.sh script that is distrib-
|
||||
uted with PCRE2 as a means of checking that the behaviour of PCRE2 and
|
||||
Perl is the same.
|
||||
tern. In between sets of test data, command lines that begin with # may
|
||||
appear. This file format, with some restrictions, can also be processed
|
||||
by the perltest.sh script that is distributed with PCRE2 as a means of
|
||||
checking that the behaviour of PCRE2 and Perl is the same.
|
||||
|
||||
When the input is a terminal, pcre2test prompts for each line of input,
|
||||
using "re>" to prompt for regular expression patterns, and "data>" to
|
||||
prompt for subject lines. Command lines starting with # can be entered
|
||||
only in response to the "re>" prompt.
|
||||
|
||||
Each subject line is matched separately and independently. If you want
|
||||
to do multi-line matches, you have to use the \n escape sequence (or \r
|
||||
|
@ -207,22 +211,28 @@ DESCRIPTION
|
|||
|
||||
COMMAND LINES
|
||||
|
||||
In between sets of test data, a line that begins with a hash (#) char-
|
||||
acter is interpreted as a command line. If the first character is fol-
|
||||
lowed by white space or an exclamation mark, the line is treated as a
|
||||
comment, and ignored. Otherwise, the following commands are recog-
|
||||
nized:
|
||||
In between sets of test data, a line that begins with # is interpreted
|
||||
as a command line. If the first character is followed by white space or
|
||||
an exclamation mark, the line is treated as a comment, and ignored.
|
||||
Otherwise, the following commands are recognized:
|
||||
|
||||
#forbid_utf
|
||||
|
||||
Subsequent patterns automatically have the PCRE2_NEVER_UTF and
|
||||
Subsequent patterns automatically have the PCRE2_NEVER_UTF and
|
||||
PCRE2_NEVER_UCP options set, which locks out the use of UTF and Unicode
|
||||
property features. This is a trigger guard that is used in test files
|
||||
to ensure that UTF/Unicode tests are not accidentally added to files
|
||||
that are used when UTF support is not included in the library. This
|
||||
effect can also be obtained by the use of #pattern; the difference is
|
||||
that #forbid_utf cannot be unset, and the automatic options are not
|
||||
displayed in pattern information, to avoid cluttering up test output.
|
||||
property features. This is a trigger guard that is used in test files
|
||||
to ensure that UTF or Unicode property tests are not accidentally added
|
||||
to files that are used when Unicode support is not included in the
|
||||
library. This effect can also be obtained by the use of #pattern; the
|
||||
difference is that #forbid_utf cannot be unset, and the automatic
|
||||
options are not displayed in pattern information, to avoid cluttering
|
||||
up test output.
|
||||
|
||||
#load <filename>
|
||||
|
||||
This command is used to load a set of precompiled patterns from a file,
|
||||
as described in the section entitled "Saving and restoring compiled
|
||||
patterns" below.
|
||||
|
||||
#pattern <modifier-list>
|
||||
|
||||
|
@ -240,6 +250,18 @@ COMMAND LINES
|
|||
#perltest command helps detect tests that are accidentally put in the
|
||||
wrong file.
|
||||
|
||||
#pop [<modifiers>]
|
||||
|
||||
This command is used to manipulate the stack of compiled patterns, as
|
||||
described in the section entitled "Saving and restoring compiled pat-
|
||||
terns" below.
|
||||
|
||||
#save <filename>
|
||||
|
||||
This command is used to save a set of compiled patterns to a file, as
|
||||
described in the section entitled "Saving and restoring compiled pat-
|
||||
terns" below.
|
||||
|
||||
#subject <modifier-list>
|
||||
|
||||
This command sets a default modifier list that applies to all subse-
|
||||
|
@ -432,8 +454,8 @@ PATTERN MODIFIERS
|
|||
memory show memory used
|
||||
newline=<type> set newline type
|
||||
parens_nest_limit=<n> set maximum parentheses depth
|
||||
perlcompat lock out non-Perl modifiers
|
||||
posix use the POSIX API
|
||||
push push compiled pattern onto the stack
|
||||
stackguard=<number> test the stackguard feature
|
||||
tables=[0|1|2] select internal tables
|
||||
|
||||
|
@ -644,6 +666,19 @@ PATTERN MODIFIERS
|
|||
These modifiers may not appear in a #pattern command. If you want them
|
||||
as defaults, set them in a #subject command.
|
||||
|
||||
Saving a compiled pattern
|
||||
|
||||
When a pattern with the push modifier is successfully compiled, it is
|
||||
pushed onto a stack of compiled patterns, and pcre2test expects the
|
||||
next line to contain a new pattern (or a command) instead of a subject
|
||||
line. This facility is used when saving compiled patterns to a file, as
|
||||
described in the section entitled "Saving and restoring compiled pat-
|
||||
terns" below. The push modifier is incompatible with compilation modi-
|
||||
fiers such as global that act at match time. Any that are specified are
|
||||
ignored, with a warning message, except for replace, which causes an
|
||||
error. Note that jitverify, which is allowed, does not carry through
|
||||
to any subsequent matching that uses this pattern.
|
||||
|
||||
|
||||
SUBJECT MODIFIERS
|
||||
|
||||
|
@ -652,7 +687,7 @@ SUBJECT MODIFIERS
|
|||
|
||||
Setting match options
|
||||
|
||||
The following modifiers set options for pcre2_match() or
|
||||
The following modifiers set options for pcre2_match() or
|
||||
pcre2_dfa_match(). See pcre2api for a description of their effects.
|
||||
|
||||
anchored set PCRE2_ANCHORED
|
||||
|
@ -666,20 +701,20 @@ SUBJECT MODIFIERS
|
|||
partial_hard (or ph) set PCRE2_PARTIAL_HARD
|
||||
partial_soft (or ps) set PCRE2_PARTIAL_SOFT
|
||||
|
||||
The partial matching modifiers are provided with abbreviations because
|
||||
The partial matching modifiers are provided with abbreviations because
|
||||
they appear frequently in tests.
|
||||
|
||||
If the /posix modifier was present on the pattern, causing the POSIX
|
||||
If the /posix modifier was present on the pattern, causing the POSIX
|
||||
wrapper API to be used, the only option-setting modifiers that have any
|
||||
effect are notbol, notempty, and noteol, causing REG_NOTBOL,
|
||||
REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to regexec().
|
||||
effect are notbol, notempty, and noteol, causing REG_NOTBOL,
|
||||
REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to regexec().
|
||||
Any other modifiers cause an error.
|
||||
|
||||
Setting match controls
|
||||
|
||||
The following modifiers affect the matching process or request addi-
|
||||
tional information. Some of them may also be specified on a pattern
|
||||
line (see above), in which case they apply to every subject line that
|
||||
The following modifiers affect the matching process or request addi-
|
||||
tional information. Some of them may also be specified on a pattern
|
||||
line (see above), in which case they apply to every subject line that
|
||||
is matched against that pattern.
|
||||
|
||||
aftertext show text after match
|
||||
|
@ -712,23 +747,23 @@ SUBJECT MODIFIERS
|
|||
|
||||
Showing more text
|
||||
|
||||
The aftertext modifier requests that as well as outputting the part of
|
||||
The aftertext modifier requests that as well as outputting the part of
|
||||
the subject string that matched the entire pattern, pcre2test should in
|
||||
addition output the remainder of the subject string. This is useful for
|
||||
tests where the subject contains multiple copies of the same substring.
|
||||
The allaftertext modifier requests the same action for captured sub-
|
||||
The allaftertext modifier requests the same action for captured sub-
|
||||
strings as well as the main matched substring. In each case the remain-
|
||||
der is output on the following line with a plus character following the
|
||||
capture number.
|
||||
|
||||
The allusedtext modifier requests that all the text that was consulted
|
||||
during a successful pattern match by the interpreter should be shown.
|
||||
This feature is not supported for JIT matching, and if requested with
|
||||
JIT it is ignored (with a warning message). Setting this modifier
|
||||
The allusedtext modifier requests that all the text that was consulted
|
||||
during a successful pattern match by the interpreter should be shown.
|
||||
This feature is not supported for JIT matching, and if requested with
|
||||
JIT it is ignored (with a warning message). Setting this modifier
|
||||
affects the output if there is a lookbehind at the start of a match, or
|
||||
a lookahead at the end, or if \K is used in the pattern. Characters
|
||||
that precede or follow the start and end of the actual match are indi-
|
||||
cated in the output by '<' or '>' characters underneath them. Here is
|
||||
a lookahead at the end, or if \K is used in the pattern. Characters
|
||||
that precede or follow the start and end of the actual match are indi-
|
||||
cated in the output by '<' or '>' characters underneath them. Here is
|
||||
an example:
|
||||
|
||||
re> /(?<=pqr)abc(?=xyz)/
|
||||
|
@ -736,16 +771,16 @@ SUBJECT MODIFIERS
|
|||
0: pqrabcxyz
|
||||
<<< >>>
|
||||
|
||||
This shows that the matched string is "abc", with the preceding and
|
||||
following strings "pqr" and "xyz" having been consulted during the
|
||||
This shows that the matched string is "abc", with the preceding and
|
||||
following strings "pqr" and "xyz" having been consulted during the
|
||||
match (when processing the assertions).
|
||||
|
||||
The startchar modifier requests that the starting character for the
|
||||
match be indicated, if it is different to the start of the matched
|
||||
The startchar modifier requests that the starting character for the
|
||||
match be indicated, if it is different to the start of the matched
|
||||
string. The only time when this occurs is when \K has been processed as
|
||||
part of the match. In this situation, the output for the matched string
|
||||
is displayed from the starting character instead of from the match
|
||||
point, with circumflex characters under the earlier characters. For
|
||||
is displayed from the starting character instead of from the match
|
||||
point, with circumflex characters under the earlier characters. For
|
||||
example:
|
||||
|
||||
re> /abc\Kxyz/
|
||||
|
@ -753,7 +788,7 @@ SUBJECT MODIFIERS
|
|||
0: abcxyz
|
||||
^^^
|
||||
|
||||
Unlike allusedtext, the startchar modifier can be used with JIT. How-
|
||||
Unlike allusedtext, the startchar modifier can be used with JIT. How-
|
||||
ever, these two modifiers are mutually exclusive.
|
||||
|
||||
Showing the value of all capture groups
|
||||
|
@ -761,84 +796,84 @@ SUBJECT MODIFIERS
|
|||
The allcaptures modifier requests that the values of all potential cap-
|
||||
tured parentheses be output after a match. By default, only those up to
|
||||
the highest one actually used in the match are output (corresponding to
|
||||
the return code from pcre2_match()). Groups that did not take part in
|
||||
the return code from pcre2_match()). Groups that did not take part in
|
||||
the match are output as "<unset>".
|
||||
|
||||
Testing callouts
|
||||
|
||||
A callout function is supplied when pcre2test calls the library match-
|
||||
ing functions, unless callout_none is specified. If callout_capture is
|
||||
A callout function is supplied when pcre2test calls the library match-
|
||||
ing functions, unless callout_none is specified. If callout_capture is
|
||||
set, the current captured groups are output when a callout occurs.
|
||||
|
||||
The callout_fail modifier can be given one or two numbers. If there is
|
||||
The callout_fail modifier can be given one or two numbers. If there is
|
||||
only one number, 1 is returned instead of 0 when a callout of that num-
|
||||
ber is reached. If two numbers are given, 1 is returned when callout
|
||||
ber is reached. If two numbers are given, 1 is returned when callout
|
||||
<n> is reached for the <m>th time.
|
||||
|
||||
The callout_data modifier can be given an unsigned or a negative num-
|
||||
ber. Any value other than zero is used as a return from pcre2test's
|
||||
The callout_data modifier can be given an unsigned or a negative num-
|
||||
ber. Any value other than zero is used as a return from pcre2test's
|
||||
callout function.
|
||||
|
||||
Finding all matches in a string
|
||||
|
||||
Searching for all possible matches within a subject can be requested by
|
||||
the global or /altglobal modifier. After finding a match, the matching
|
||||
function is called again to search the remainder of the subject. The
|
||||
difference between global and altglobal is that the former uses the
|
||||
start_offset argument to pcre2_match() or pcre2_dfa_match() to start
|
||||
searching at a new point within the entire string (which is what Perl
|
||||
the global or /altglobal modifier. After finding a match, the matching
|
||||
function is called again to search the remainder of the subject. The
|
||||
difference between global and altglobal is that the former uses the
|
||||
start_offset argument to pcre2_match() or pcre2_dfa_match() to start
|
||||
searching at a new point within the entire string (which is what Perl
|
||||
does), whereas the latter passes over a shortened subject. This makes a
|
||||
difference to the matching process if the pattern begins with a lookbe-
|
||||
hind assertion (including \b or \B).
|
||||
|
||||
If an empty string is matched, the next match is done with the
|
||||
If an empty string is matched, the next match is done with the
|
||||
PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search
|
||||
for another, non-empty, match at the same point in the subject. If this
|
||||
match fails, the start offset is advanced, and the normal match is
|
||||
retried. This imitates the way Perl handles such cases when using the
|
||||
/g modifier or the split() function. Normally, the start offset is
|
||||
advanced by one character, but if the newline convention recognizes
|
||||
CRLF as a newline, and the current character is CR followed by LF, an
|
||||
match fails, the start offset is advanced, and the normal match is
|
||||
retried. This imitates the way Perl handles such cases when using the
|
||||
/g modifier or the split() function. Normally, the start offset is
|
||||
advanced by one character, but if the newline convention recognizes
|
||||
CRLF as a newline, and the current character is CR followed by LF, an
|
||||
advance of two characters occurs.
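The advance rule in the paragraph above can be sketched as a small C function. It is a simplified illustration rather than the code pcre2test uses: the name next_start_offset and the crlf_is_newline flag are assumptions, and an 8-bit, non-UTF subject is assumed so that one character is one code unit.

```c
#include <stddef.h>

/* Hypothetical helper, not part of pcre2test. Returns the start offset for
   the next normal match attempt after an empty match whose anchored,
   non-empty retry at the same point has failed. */
size_t next_start_offset(const unsigned char *subject, size_t length,
                         size_t offset, int crlf_is_newline)
{
    /* When CRLF counts as a newline and we are sitting on CR followed by LF,
       step over both so the next attempt does not start inside the newline. */
    if (crlf_is_newline && offset + 1 < length &&
        subject[offset] == '\r' && subject[offset + 1] == '\n')
        return offset + 2;

    return offset + 1;   /* normal case: advance by one character */
}
```

With the subject "a\r\nb" and an empty match at offset 1, the function returns 3 when CRLF is a recognized newline and 2 otherwise.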
|
||||
|
||||
Testing substring extraction functions
|
||||
|
||||
The copy and get modifiers can be used to test the pcre2_sub-
|
||||
The copy and get modifiers can be used to test the pcre2_sub-
|
||||
string_copy_xxx() and pcre2_substring_get_xxx() functions. They can be
|
||||
given more than once, and each can specify a group name or number, for
|
||||
given more than once, and each can specify a group name or number, for
|
||||
example:
|
||||
|
||||
abcd\=copy=1,copy=3,get=G1
|
||||
|
||||
If the #subject command is used to set default copy and/or get lists,
|
||||
these can be unset by specifying a negative number to cancel all num-
|
||||
If the #subject command is used to set default copy and/or get lists,
|
||||
these can be unset by specifying a negative number to cancel all num-
|
||||
bered groups and an empty name to cancel all named groups.
|
||||
|
||||
The getall modifier tests pcre2_substring_list_get(), which extracts
|
||||
The getall modifier tests pcre2_substring_list_get(), which extracts
|
||||
all captured substrings.
|
||||
|
||||
If the subject line is successfully matched, the substrings extracted
|
||||
by the convenience functions are output with C, G, or L after the
|
||||
string number instead of a colon. This is in addition to the normal
|
||||
full list. The string length (that is, the return from the extraction
|
||||
If the subject line is successfully matched, the substrings extracted
|
||||
by the convenience functions are output with C, G, or L after the
|
||||
string number instead of a colon. This is in addition to the normal
|
||||
full list. The string length (that is, the return from the extraction
|
||||
function) is given in parentheses after each substring, followed by the
|
||||
name when the extraction was by name.
|
||||
|
||||
Testing the substitution function
|
||||
|
||||
If the replace modifier is set, the pcre2_substitute() function is
|
||||
called instead of one of the matching functions. Unlike subject
|
||||
strings, pcre2test does not process replacement strings for escape
|
||||
If the replace modifier is set, the pcre2_substitute() function is
|
||||
called instead of one of the matching functions. Unlike subject
|
||||
strings, pcre2test does not process replacement strings for escape
|
||||
sequences. In UTF mode, a replacement string is checked to see if it is
|
||||
a valid UTF-8 string. If so, it is correctly converted to a UTF string
|
||||
of the appropriate code unit width. If it is not a valid UTF-8 string,
|
||||
of the appropriate code unit width. If it is not a valid UTF-8 string,
|
||||
the individual code units are copied directly. This provides a means of
|
||||
passing an invalid UTF-8 string for testing purposes.
|
||||
|
||||
If the global modifier is set, PCRE2_SUBSTITUTE_GLOBAL is passed to
|
||||
If the global modifier is set, PCRE2_SUBSTITUTE_GLOBAL is passed to
|
||||
pcre2_substitute(). After a successful substitution, the modified
|
||||
string is output, preceded by the number of replacements. This may be
|
||||
zero if there were no matches. Here is a simple example of a substitu-
|
||||
string is output, preceded by the number of replacements. This may be
|
||||
zero if there were no matches. Here is a simple example of a substitu-
|
||||
tion test:
|
||||
|
||||
/abc/replace=xxx
|
||||
|
@ -847,11 +882,11 @@ SUBJECT MODIFIERS
|
|||
=abc=abc=\=global
|
||||
2: =xxx=xxx=
|
||||
|
||||
Subject and replacement strings should be kept relatively short for
|
||||
substitution tests, as fixed-size buffers are used. To make it easy to
|
||||
test for buffer overflow, if the replacement string starts with a num-
|
||||
ber in square brackets, that number is passed to pcre2_substitute() as
|
||||
the size of the output buffer, with the replacement string starting at
|
||||
Subject and replacement strings should be kept relatively short for
|
||||
substitution tests, as fixed-size buffers are used. To make it easy to
|
||||
test for buffer overflow, if the replacement string starts with a num-
|
||||
ber in square brackets, that number is passed to pcre2_substitute() as
|
||||
the size of the output buffer, with the replacement string starting at
|
||||
the next character. Here is an example that tests the edge case:
|
||||
|
||||
/abc/
|
||||
|
@ -861,123 +896,123 @@ SUBJECT MODIFIERS
|
|||
Failed: error -47: no more memory
|
||||
|
||||
A replacement string is ignored with POSIX and DFA matching. Specifying
|
||||
partial matching provokes an error return ("bad option value") from
|
||||
partial matching provokes an error return ("bad option value") from
|
||||
pcre2_substitute().
|
||||
|
||||
Setting the JIT stack size
|
||||
|
||||
The jitstack modifier provides a way of setting the maximum stack size
|
||||
that is used by the just-in-time optimization code. It is ignored if
|
||||
The jitstack modifier provides a way of setting the maximum stack size
|
||||
that is used by the just-in-time optimization code. It is ignored if
|
||||
JIT optimization is not being used. The value is a number of kilobytes.
|
||||
Providing a stack that is larger than the default 32K is necessary only
|
||||
for very complicated patterns.
|
||||
|
||||
Setting match and recursion limits
|
||||
|
||||
The match_limit and recursion_limit modifiers set the appropriate lim-
|
||||
The match_limit and recursion_limit modifiers set the appropriate lim-
|
||||
its in the match context. These values are ignored when the find_limits
|
||||
modifier is specified.
|
||||
|
||||
Finding minimum limits
|
||||
|
||||
If the find_limits modifier is present, pcre2test calls pcre2_match()
|
||||
several times, setting different values in the match context via
|
||||
pcre2_set_match_limit() and pcre2_set_recursion_limit() until it finds
|
||||
the minimum values for each parameter that allow pcre2_match() to com-
|
||||
If the find_limits modifier is present, pcre2test calls pcre2_match()
|
||||
several times, setting different values in the match context via
|
||||
pcre2_set_match_limit() and pcre2_set_recursion_limit() until it finds
|
||||
the minimum values for each parameter that allow pcre2_match() to com-
|
||||
plete without error.
|
||||
|
||||
If JIT is being used, only the match limit is relevant. If DFA matching
|
||||
is being used, neither limit is relevant, and this modifier is ignored
|
||||
is being used, neither limit is relevant, and this modifier is ignored
|
||||
(with a warning message).
|
||||
|
||||
The match_limit number is a measure of the amount of backtracking that
|
||||
takes place, and learning the minimum value can be instructive. For
|
||||
most simple matches, the number is quite small, but for patterns with
|
||||
very large numbers of matching possibilities, it can become large very
|
||||
quickly with increasing length of subject string. The
|
||||
match_limit_recursion number is a measure of how much stack (or, if
|
||||
PCRE2 is compiled with NO_RECURSE, how much heap) memory is needed to
|
||||
The match_limit number is a measure of the amount of backtracking that
|
||||
takes place, and learning the minimum value can be instructive. For
|
||||
most simple matches, the number is quite small, but for patterns with
|
||||
very large numbers of matching possibilities, it can become large very
|
||||
quickly with increasing length of subject string. The
|
||||
match_limit_recursion number is a measure of how much stack (or, if
|
||||
PCRE2 is compiled with NO_RECURSE, how much heap) memory is needed to
|
||||
complete the match attempt.
|
||||
|
||||
Showing MARK names
|
||||
|
||||
|
||||
The mark modifier causes the names from backtracking control verbs that
|
||||
are returned from calls to pcre2_match() to be displayed. If a mark is
|
||||
returned for a match, non-match, or partial match, pcre2test shows it.
|
||||
For a match, it is on a line by itself, tagged with "MK:". Otherwise,
|
||||
are returned from calls to pcre2_match() to be displayed. If a mark is
|
||||
returned for a match, non-match, or partial match, pcre2test shows it.
|
||||
For a match, it is on a line by itself, tagged with "MK:". Otherwise,
|
||||
it is added to the non-match message.
|
||||
|
||||
Showing memory usage
|
||||
|
||||
The memory modifier causes pcre2test to log all memory allocation and
|
||||
The memory modifier causes pcre2test to log all memory allocation and
|
||||
freeing calls that occur during a match operation.
|
||||
|
||||
Setting a starting offset
|
||||
|
||||
The offset modifier sets an offset in the subject string at which
|
||||
The offset modifier sets an offset in the subject string at which
|
||||
matching starts. Its value is a number of code units, not characters.
|
||||
|
||||
Setting the size of the output vector
|
||||
|
||||
The ovector modifier applies only to the subject line in which it
|
||||
appears, though of course it can also be used to set a default in a
|
||||
#subject command. It specifies the number of pairs of offsets that are
|
||||
The ovector modifier applies only to the subject line in which it
|
||||
appears, though of course it can also be used to set a default in a
|
||||
#subject command. It specifies the number of pairs of offsets that are
|
||||
available for storing matching information. The default is 15.
|
||||
|
||||
A value of zero is useful when testing the POSIX API because it causes
|
||||
A value of zero is useful when testing the POSIX API because it causes
|
||||
regexec() to be called with a NULL capture vector. When not testing the
|
||||
POSIX API, a value of zero is used to cause pcre2_match_data_cre-
|
||||
ate_from_pattern() to be called, in order to create a match block of
|
||||
POSIX API, a value of zero is used to cause pcre2_match_data_cre-
|
||||
ate_from_pattern() to be called, in order to create a match block of
|
||||
exactly the right size for the pattern. (It is not possible to create a
|
||||
match block with a zero-length ovector; there is always at least one
|
||||
match block with a zero-length ovector; there is always at least one
|
||||
pair of offsets.)
|
||||
|
||||
Passing the subject as zero-terminated
|
||||
|
||||
By default, the subject string is passed to a native API matching func-
|
||||
tion with its correct length. In order to test the facility for passing
|
||||
a zero-terminated string, the zero_terminate modifier is provided. It
|
||||
a zero-terminated string, the zero_terminate modifier is provided. It
|
||||
causes the length to be passed as PCRE2_ZERO_TERMINATED. (When matching
|
||||
via the POSIX interface, this modifier has no effect, as there is no
|
||||
via the POSIX interface, this modifier has no effect, as there is no
|
||||
facility for passing a length.)
|
||||
|
||||
When testing pcre2_substitute(), this modifier also has the effect of
|
||||
When testing pcre2_substitute(), this modifier also has the effect of
|
||||
passing the replacement string as zero-terminated.
|
||||
|
||||
|
||||
THE ALTERNATIVE MATCHING FUNCTION
|
||||
|
||||
By default, pcre2test uses the standard PCRE2 matching function,
|
||||
By default, pcre2test uses the standard PCRE2 matching function,
|
||||
pcre2_match() to match each subject line. PCRE2 also supports an alter-
|
||||
native matching function, pcre2_dfa_match(), which operates in a dif-
|
||||
ferent way, and has some restrictions. The differences between the two
|
||||
native matching function, pcre2_dfa_match(), which operates in a dif-
|
||||
ferent way, and has some restrictions. The differences between the two
|
||||
functions are described in the pcre2matching documentation.
|
||||
|
||||
If the dfa modifier is set, the alternative matching function is used.
|
||||
This function finds all possible matches at a given point in the sub-
|
||||
ject. If, however, the dfa_shortest modifier is set, processing stops
|
||||
after the first match is found. This is always the shortest possible
|
||||
If the dfa modifier is set, the alternative matching function is used.
|
||||
This function finds all possible matches at a given point in the sub-
|
||||
ject. If, however, the dfa_shortest modifier is set, processing stops
|
||||
after the first match is found. This is always the shortest possible
|
||||
match.
|
||||
|
||||
|
||||
DEFAULT OUTPUT FROM pcre2test
|
||||
|
||||
This section describes the output when the normal matching function,
|
||||
This section describes the output when the normal matching function,
|
||||
pcre2_match(), is being used.
|
||||
|
||||
When a match succeeds, pcre2test outputs the list of captured sub-
|
||||
strings, starting with number 0 for the string that matched the whole
|
||||
pattern. Otherwise, it outputs "No match" when the return is
|
||||
PCRE2_ERROR_NOMATCH, or "Partial match:" followed by the partially
|
||||
matching substring when the return is PCRE2_ERROR_PARTIAL. (Note that
|
||||
this is the entire substring that was inspected during the partial
|
||||
match; it may include characters before the actual match start if a
|
||||
When a match succeeds, pcre2test outputs the list of captured sub-
|
||||
strings, starting with number 0 for the string that matched the whole
|
||||
pattern. Otherwise, it outputs "No match" when the return is
|
||||
PCRE2_ERROR_NOMATCH, or "Partial match:" followed by the partially
|
||||
matching substring when the return is PCRE2_ERROR_PARTIAL. (Note that
|
||||
this is the entire substring that was inspected during the partial
|
||||
match; it may include characters before the actual match start if a
|
||||
lookbehind assertion, \K, \b, or \B was involved.)
|
||||
|
||||
For any other return, pcre2test outputs the PCRE2 negative error number
|
||||
and a short descriptive phrase. If the error is a failed UTF string
|
||||
check, the code unit offset of the start of the failing character is
|
||||
and a short descriptive phrase. If the error is a failed UTF string
|
||||
check, the code unit offset of the start of the failing character is
|
||||
also output. Here is an example of an interactive pcre2test run.
|
||||
|
||||
$ pcre2test
|
||||
|
@ -993,8 +1028,8 @@ DEFAULT OUTPUT FROM pcre2test
|
|||
Unset capturing substrings that are not followed by one that is set are
|
||||
not shown by pcre2test unless the allcaptures modifier is specified. In
|
||||
the following example, there are two capturing substrings, but when the
|
||||
first data line is matched, the second, unset substring is not shown.
|
||||
An "internal" unset substring is shown as "<unset>", as for the second
|
||||
first data line is matched, the second, unset substring is not shown.
|
||||
An "internal" unset substring is shown as "<unset>", as for the second
|
||||
data line.
|
||||
|
||||
re> /(a)|(b)/
|
||||
|
@ -1006,11 +1041,11 @@ DEFAULT OUTPUT FROM pcre2test
|
|||
1: <unset>
|
||||
2: b
|
||||
|
||||
If the strings contain any non-printing characters, they are output as
|
||||
\xhh escapes if the value is less than 256 and UTF mode is not set.
|
||||
If the strings contain any non-printing characters, they are output as
|
||||
\xhh escapes if the value is less than 256 and UTF mode is not set.
|
||||
Otherwise they are output as \x{hh...} escapes. See below for the defi-
|
||||
nition of non-printing characters. If the /aftertext modifier is set,
|
||||
the output for substring 0 is followed by the rest of the subject
|
||||
nition of non-printing characters. If the /aftertext modifier is set,
|
||||
the output for substring 0 is followed by the rest of the subject
|
||||
string, identified by "0+" like this:
|
||||
|
||||
re> /cat/aftertext
|
||||
|
@ -1018,7 +1053,7 @@ DEFAULT OUTPUT FROM pcre2test
|
|||
0: cat
|
||||
0+ aract
|
||||
|
||||
If global matching is requested, the results of successive matching
|
||||
If global matching is requested, the results of successive matching
|
||||
attempts are output in sequence, like this:
|
||||
|
||||
re> /\Bi(\w\w)/g
|
||||
|
@ -1030,8 +1065,8 @@ DEFAULT OUTPUT FROM pcre2test
|
|||
0: ipp
|
||||
1: pp
|
||||
|
||||
"No match" is output only if the first match attempt fails. Here is an
|
||||
example of a failure message (the offset 4 that is specified by the
|
||||
"No match" is output only if the first match attempt fails. Here is an
|
||||
example of a failure message (the offset 4 that is specified by the
|
||||
offset modifier is past the end of the subject string):
|
||||
|
||||
re> /xyz/
|
||||
|
@ -1039,7 +1074,7 @@ DEFAULT OUTPUT FROM pcre2test
|
|||
Error -24 (bad offset value)
|
||||
|
||||
Note that whereas patterns can be continued over several lines (a plain
|
||||
">" prompt is used for continuations), subject lines may not. However
|
||||
">" prompt is used for continuations), subject lines may not. However
|
||||
newlines can be included in a subject by means of the \n escape (or \r,
|
||||
\r\n, etc., depending on the newline sequence setting).
|
||||
|
||||
|
@ -1047,7 +1082,7 @@ DEFAULT OUTPUT FROM pcre2test
|
|||
OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
|
||||
|
||||
When the alternative matching function, pcre2_dfa_match(), is used, the
|
||||
output consists of a list of all the matches that start at the first
|
||||
output consists of a list of all the matches that start at the first
|
||||
point in the subject where there is at least one match. For example:
|
||||
|
||||
re> /(tang|tangerine|tan)/
|
||||
|
@ -1056,11 +1091,11 @@ OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
|
|||
1: tang
|
||||
2: tan
|
||||
|
||||
Using the normal matching function on this data finds only "tang". The
|
||||
longest matching string is always given first (and numbered zero).
|
||||
After a PCRE2_ERROR_PARTIAL return, the output is "Partial match:",
|
||||
followed by the partially matching substring. Note that this is the
|
||||
entire substring that was inspected during the partial match; it may
|
||||
Using the normal matching function on this data finds only "tang". The
|
||||
longest matching string is always given first (and numbered zero).
|
||||
After a PCRE2_ERROR_PARTIAL return, the output is "Partial match:",
|
||||
followed by the partially matching substring. Note that this is the
|
||||
entire substring that was inspected during the partial match; it may
|
||||
include characters before the actual match start if a lookbehind asser-
|
||||
tion, \b, or \B was involved. (\K is not supported for DFA matching.)
|
||||
|
||||
|
@ -1076,16 +1111,16 @@ OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
|
|||
1: tan
|
||||
0: tan
|
||||
|
||||
The alternative matching function does not support substring capture,
|
||||
so the modifiers that are concerned with captured substrings are not
|
||||
The alternative matching function does not support substring capture,
|
||||
so the modifiers that are concerned with captured substrings are not
|
||||
relevant.
|
||||
|
||||
|
||||
RESTARTING AFTER A PARTIAL MATCH
|
||||
|
||||
When the alternative matching function has given the PCRE2_ERROR_PAR-
|
||||
When the alternative matching function has given the PCRE2_ERROR_PAR-
|
||||
TIAL return, indicating that the subject partially matched the pattern,
|
||||
you can restart the match with additional subject data by means of the
|
||||
you can restart the match with additional subject data by means of the
|
||||
dfa_restart modifier. For example:
|
||||
|
||||
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
|
||||
|
@ -1094,29 +1129,29 @@ RESTARTING AFTER A PARTIAL MATCH
|
|||
data> n05\=dfa,dfa_restart
|
||||
0: n05
|
||||
|
||||
For further information about partial matching, see the pcre2partial
|
||||
For further information about partial matching, see the pcre2partial
|
||||
documentation.
|
||||
|
||||
|
||||
CALLOUTS
|
||||
|
||||
If the pattern contains any callout requests, pcre2test's callout func-
|
||||
tion is called during matching. This works with both matching func-
|
||||
tion is called during matching. This works with both matching func-
|
||||
tions. By default, the called function displays the callout number, the
|
||||
start and current positions in the text at the callout time, and the
|
||||
start and current positions in the text at the callout time, and the
|
||||
next pattern item to be tested. For example:
|
||||
|
||||
--->pqrabcdef
|
||||
0 ^ ^ \d
|
||||
|
||||
This output indicates that callout number 0 occurred for a match
|
||||
attempt starting at the fourth character of the subject string, when
|
||||
the pointer was at the seventh character, and when the next pattern
|
||||
item was \d. Just one circumflex is output if the start and current
|
||||
This output indicates that callout number 0 occurred for a match
|
||||
attempt starting at the fourth character of the subject string, when
|
||||
the pointer was at the seventh character, and when the next pattern
|
||||
item was \d. Just one circumflex is output if the start and current
|
||||
positions are the same.
|
||||
|
||||
Callouts numbered 255 are assumed to be automatic callouts, inserted as
|
||||
a result of the /auto_callout pattern modifier. In this case, instead
|
||||
a result of the /auto_callout pattern modifier. In this case, instead
|
||||
of showing the callout number, the offset in the pattern, preceded by a
|
||||
plus, is output. For example:
|
||||
|
||||
|
@ -1130,7 +1165,7 @@ CALLOUTS
|
|||
0: E*
|
||||
|
||||
If a pattern contains (*MARK) items, an additional line is output when-
|
||||
ever a change of latest mark is passed to the callout function. For
|
||||
ever a change of latest mark is passed to the callout function. For
|
||||
example:
|
||||
|
||||
re> /a(*MARK:X)bc/auto_callout
|
||||
|
@ -1144,37 +1179,96 @@ CALLOUTS
|
|||
+12 ^ ^
|
||||
0: abc
|
||||
|
||||
The mark changes between matching "a" and "b", but stays the same for
|
||||
the rest of the match, so nothing more is output. If, as a result of
|
||||
backtracking, the mark reverts to being unset, the text "<unset>" is
|
||||
The mark changes between matching "a" and "b", but stays the same for
|
||||
the rest of the match, so nothing more is output. If, as a result of
|
||||
backtracking, the mark reverts to being unset, the text "<unset>" is
|
||||
output.
|
||||
|
||||
The callout function in pcre2test returns zero (carry on matching) by
|
||||
default, but you can use a callout_fail modifier in a subject line (as
|
||||
The callout function in pcre2test returns zero (carry on matching) by
|
||||
default, but you can use a callout_fail modifier in a subject line (as
|
||||
described above) to change this and other parameters of the callout.
|
||||
|
||||
Inserting callouts can be helpful when using pcre2test to check compli-
|
||||
cated regular expressions. For further information about callouts, see
|
||||
cated regular expressions. For further information about callouts, see
|
||||
the pcre2callout documentation.
|
||||
|
||||
|
||||
NON-PRINTING CHARACTERS
|
||||
|
||||
When pcre2test is outputting text in the compiled version of a pattern,
|
||||
bytes other than 32-126 are always treated as non-printing characters
|
||||
bytes other than 32-126 are always treated as non-printing characters
|
||||
and are therefore shown as hex escapes.
|
||||
|
||||
When pcre2test is outputting text that is a matched part of a subject
|
||||
string, it behaves in the same way, unless a different locale has been
|
||||
set for the pattern (using the /locale modifier). In this case, the
|
||||
isprint() function is used to distinguish printing and non-printing
|
||||
When pcre2test is outputting text that is a matched part of a subject
|
||||
string, it behaves in the same way, unless a different locale has been
|
||||
set for the pattern (using the /locale modifier). In this case, the
|
||||
isprint() function is used to distinguish printing and non-printing
|
||||
characters.
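The default rule (no locale set) can be sketched in C together with the \xhh and \x{hh...} forms described earlier for matched strings. This is a minimal sketch under stated assumptions: format_char is a hypothetical name, the lower-case hex digits and two-digit minimum width are choices made here rather than confirmed pcre2test behaviour, and the isprint() case for a non-default locale is not covered.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical helper, not part of pcre2test. Writes the display form of one
   character value into out (capacity n) and returns snprintf's result. */
int format_char(uint32_t c, int utf, char *out, size_t n)
{
    if (c >= 32 && c <= 126)                  /* printing range: show as is */
        return snprintf(out, n, "%c", (int)c);

    if (c < 256 && !utf)                      /* small value, non-UTF: \xhh */
        return snprintf(out, n, "\\x%02x", (unsigned)c);

    return snprintf(out, n, "\\x{%02x}", (unsigned)c);   /* otherwise \x{hh...} */
}
```

A caller would loop over the characters of a matched string and append each formatted piece to the output line.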
|
||||
|
||||
|
||||
SAVING AND RESTORING COMPILED PATTERNS
|
||||
|
||||
It is possible to save compiled patterns on disc or elsewhere, and
|
||||
reload them later, subject to a number of restrictions. JIT data cannot
|
||||
be saved. The host on which the patterns are reloaded must be running
|
||||
the same version of PCRE2, with the same code unit width, and must also
|
||||
have the same endianness, pointer width and PCRE2_SIZE type. Before
|
||||
compiled patterns can be saved they must be serialized, that is, con-
|
||||
verted to a stream of bytes. A single byte stream may contain any num-
|
||||
ber of compiled patterns, but they must all use the same character
|
||||
tables. A single copy of the tables is included in the byte stream (its
|
||||
size is 1088 bytes).
|
||||
|
||||
The functions whose names begin with pcre2_serialize_ are used for
|
||||
serializing and de-serializing. They are described in the pcre2serial-
|
||||
ize documentation. In this section we describe the features of
|
||||
pcre2test that can be used to test these functions.
|
||||
|
||||
When a pattern with the push modifier is successfully compiled, it is
|
||||
pushed onto a stack of compiled patterns, and pcre2test expects the
|
||||
next line to contain a new pattern (or command) instead of a subject
|
||||
line. By this means, a number of patterns can be compiled and retained.
|
||||
The push modifier is incompatible with posix, and control modifiers
|
||||
that act at match time are ignored (with a message). The jitverify mod-
|
||||
ifier applies only at compile time. The command
|
||||
|
||||
#save <filename>
|
||||
|
||||
causes all the stacked patterns to be serialized and the result written
|
||||
to the named file. Afterwards, all the stacked patterns are freed. The
|
||||
command
|
||||
|
||||
#load <filename>
|
||||
|
||||
reads the data in the file, and then arranges for it to be de-serial-
|
||||
ized, with the resulting compiled patterns added to the pattern stack.
|
||||
The pattern on the top of the stack can be retrieved by the #pop com-
|
||||
mand, which must be followed by lines of subjects that are to be
|
||||
matched with the pattern, terminated as usual by an empty line or end
|
||||
of file. This command may be followed by a modifier list containing
|
||||
only control modifiers that act after a pattern has been compiled. In
|
||||
particular, hex, posix, and push are not allowed, nor are any option-
|
||||
setting modifiers. The JIT modifiers are, however, permitted. Here is
|
||||
an example that saves and reloads two patterns.
|
||||
|
||||
/abc/push
|
||||
/xyz/push
|
||||
#save tempfile
|
||||
#load tempfile
|
||||
#pop info
|
||||
xyz
|
||||
|
||||
#pop jit,bincode
|
||||
abc
|
||||
|
||||
If jitverify is used with #pop, it does not automatically imply jit,
|
||||
which is different behaviour from when it is used on a pattern.
|
||||
|
||||
|
||||
SEE ALSO
|
||||
|
||||
pcre2(3), pcre2api(3), pcre2callout(3), pcre2jit, pcre2matching(3),
|
||||
pcre2partial(d), pcre2pattern(3).
|
||||
pcre2partial(3), pcre2pattern(3), pcre2serialize(3).
|
||||
|
||||
|
||||
AUTHOR
|
||||
|
@ -1186,5 +1280,5 @@ AUTHOR
|
|||
|
||||
REVISION
|
||||
|
||||
Last updated: 02 January 2015
|
||||
Last updated: 23 January 2015
|
||||
Copyright (c) 1997-2015 University of Cambridge.
|
||||
|
|
|
@ -293,7 +293,7 @@ if [ $usevalgrind -ne 0 ]; then
|
|||
|
||||
for opts in \
|
||||
"--disable-stack-for-recursion --disable-shared" \
|
||||
"--with-link-size=3 --disable-shared" \
|
||||
"--with-link-size=3 --enable-pcre2-16 --enable-pcre2-32 --disable-shared" \
|
||||
"--disable-unicode --disable-shared"
|
||||
do
|
||||
opts="--enable-valgrind $opts"
|
||||
|
|
|
@ -200,7 +200,7 @@ sure both macros are undefined; an emulation function will then be used. */
|
|||
#define PACKAGE_NAME "PCRE2"
|
||||
|
||||
/* Define to the full name and version of this package. */
|
||||
#define PACKAGE_STRING "PCRE2 10.00"
|
||||
#define PACKAGE_STRING "PCRE2 10.10-RC1"
|
||||
|
||||
/* Define to the one symbol short name of this package. */
|
||||
#define PACKAGE_TARNAME "pcre2"
|
||||
|
@ -209,7 +209,7 @@ sure both macros are undefined; an emulation function will then be used. */
|
|||
#define PACKAGE_URL ""
|
||||
|
||||
/* Define to the version of this package. */
|
||||
#define PACKAGE_VERSION "10.00"
|
||||
#define PACKAGE_VERSION "10.10-RC1"
|
||||
|
||||
/* The value of PARENS_NEST_LIMIT specifies the maximum depth of nested
|
||||
parentheses (of any kind) in a pattern. This limits the amount of system
|
||||
|
@ -287,7 +287,7 @@ sure both macros are undefined; an emulation function will then be used. */
|
|||
/* #undef SUPPORT_VALGRIND */
|
||||
|
||||
/* Version number of package */
|
||||
#define VERSION "10.00"
|
||||
#define VERSION "10.10-RC1"
|
||||
|
||||
/* Define to empty if `const' does not conform to ANSI C. */
|
||||
/* #undef const */
|
||||
|
|
|
@ -42,9 +42,9 @@ POSSIBILITY OF SUCH DAMAGE.
|
|||
/* The current PCRE version information. */
|
||||
|
||||
#define PCRE2_MAJOR 10
|
||||
#define PCRE2_MINOR 00
|
||||
#define PCRE2_PRERELEASE
|
||||
#define PCRE2_DATE 2014-01-05
|
||||
#define PCRE2_MINOR 10
|
||||
#define PCRE2_PRERELEASE -RC1
|
||||
#define PCRE2_DATE 2014-01-13
|
||||
|
||||
/* When an application links to a PCRE DLL in Windows, the symbols that are
|
||||
imported have to be identified as such. When building PCRE2, the appropriate
|
||||
|
@ -455,6 +455,18 @@ PCRE2_EXP_DECL void pcre2_substring_list_free(PCRE2_SPTR *); \
|
|||
PCRE2_EXP_DECL int pcre2_substring_list_get(pcre2_match_data *, \
|
||||
PCRE2_UCHAR ***, PCRE2_SIZE **);
|
||||
|
||||
/* Functions for serializing / deserializing compiled patterns. */
|
||||
|
||||
#define PCRE2_SERIALIZE_FUNCTIONS \
|
||||
PCRE2_EXP_DECL int pcre2_serialize_encode(const pcre2_code **, \
|
||||
PCRE2_SIZE, uint8_t **, PCRE2_SIZE *, \
|
||||
pcre2_general_context *); \
|
||||
PCRE2_EXP_DECL int pcre2_serialize_decode(pcre2_code **, PCRE2_SIZE, \
|
||||
const uint8_t *, pcre2_general_context *); \
|
||||
PCRE2_EXP_DECL int pcre2_serialize_get_number_of_codes(const uint8_t *, \
|
||||
PCRE2_SIZE *); \
|
||||
PCRE2_EXP_DECL void pcre2_serialize_free(uint8_t *);
|
||||
|
||||
|
||||
/* Convenience function for match + substitute. */
|
||||
|
||||
|
@ -560,6 +572,10 @@ pcre2_compile are called by application code. */
|
|||
#define pcre2_match_data_create_from_pattern PCRE2_SUFFIX(pcre2_match_data_create_from_pattern_)
|
||||
#define pcre2_match_data_free PCRE2_SUFFIX(pcre2_match_data_free_)
|
||||
#define pcre2_pattern_info PCRE2_SUFFIX(pcre2_pattern_info_)
|
||||
#define pcre2_serialize_decode PCRE2_SUFFIX(pcre2_serialize_decode_)
|
||||
#define pcre2_serialize_encode PCRE2_SUFFIX(pcre2_serialize_encode_)
|
||||
#define pcre2_serialize_free PCRE2_SUFFIX(pcre2_serialize_free_)
|
||||
#define pcre2_serialize_get_number_of_codes PCRE2_SUFFIX(pcre2_serialize_get_number_of_codes_)
|
||||
#define pcre2_set_bsr PCRE2_SUFFIX(pcre2_set_bsr_)
|
||||
#define pcre2_set_callout PCRE2_SUFFIX(pcre2_set_callout_)
|
||||
#define pcre2_set_character_tables PCRE2_SUFFIX(pcre2_set_character_tables_)
|
||||
|
@ -596,8 +612,9 @@ PCRE2_MATCH_CONTEXT_FUNCTIONS \
|
|||
PCRE2_COMPILE_FUNCTIONS \
|
||||
PCRE2_PATTERN_INFO_FUNCTIONS \
|
||||
PCRE2_MATCH_FUNCTIONS \
|
||||
PCRE2_SUBSTITUTE_FUNCTION \
|
||||
PCRE2_SUBSTRING_FUNCTIONS \
|
||||
PCRE2_SERIALIZE_FUNCTIONS \
|
||||
PCRE2_SUBSTITUTE_FUNCTION \
|
||||
PCRE2_JIT_FUNCTIONS \
|
||||
PCRE2_OTHER_FUNCTIONS
|
||||
|
||||
|
@ -625,6 +642,8 @@ PCRE2_TYPES_STRUCTURES_AND_FUNCTIONS
|
|||
#undef PCRE2_PATTERN_INFO_FUNCTIONS
|
||||
#undef PCRE2_MATCH_FUNCTIONS
|
||||
#undef PCRE2_SUBSTRING_FUNCTIONS
|
||||
#undef PCRE2_SERIALIZE_FUNCTIONS
|
||||
#undef PCRE2_SUBSTITUTE_FUNCTION
|
||||
#undef PCRE2_JIT_FUNCTIONS
|
||||
#undef PCRE2_OTHER_FUNCTIONS
|
||||
#undef PCRE2_TYPES_STRUCTURES_AND_FUNCTIONS
|
||||
|
|
|
@ -198,11 +198,13 @@ greater than zero. */
|
|||
#define PCRE2_ERROR_UTF32_ERR1 (-27)
|
||||
#define PCRE2_ERROR_UTF32_ERR2 (-28)
|
||||
|
||||
/* Error codes for pcre2[_dfa]_match(), substring extraction functions, and
|
||||
context functions. */
|
||||
/* Error codes for pcre2[_dfa]_match(), substring extraction functions, context
|
||||
functions, and serializing functions. They are in numerical order. Originally
|
||||
they were in alphabetical order too, but now that PCRE2 is released, the
|
||||
numbers must not be changed. */
|
||||
|
||||
#define PCRE2_ERROR_BADDATA (-29)
|
||||
#define PCRE2_ERROR_BADLENGTH (-30)
|
||||
#define PCRE2_ERROR_MIXEDTABLES (-30) /* Name was changed */
|
||||
#define PCRE2_ERROR_BADMAGIC (-31)
|
||||
#define PCRE2_ERROR_BADMODE (-32)
|
||||
#define PCRE2_ERROR_BADOFFSET (-33)
|
||||
|
@ -455,6 +457,17 @@ PCRE2_EXP_DECL void pcre2_substring_list_free(PCRE2_SPTR *); \
|
|||
PCRE2_EXP_DECL int pcre2_substring_list_get(pcre2_match_data *, \
|
||||
PCRE2_UCHAR ***, PCRE2_SIZE **);
|
||||
|
||||
/* Functions for serializing / deserializing compiled patterns. */
|
||||
|
||||
#define PCRE2_SERIALIZE_FUNCTIONS \
|
||||
PCRE2_EXP_DECL int32_t pcre2_serialize_encode(const pcre2_code **, \
|
||||
int32_t, uint8_t **, PCRE2_SIZE *, \
|
||||
pcre2_general_context *); \
|
||||
PCRE2_EXP_DECL int32_t pcre2_serialize_decode(pcre2_code **, int32_t, \
|
||||
const uint8_t *, pcre2_general_context *); \
|
||||
PCRE2_EXP_DECL int32_t pcre2_serialize_get_number_of_codes(const uint8_t *); \
|
||||
PCRE2_EXP_DECL void pcre2_serialize_free(uint8_t *);
|
||||
|
||||
|
||||
/* Convenience function for match + substitute. */
|
||||
|
||||
|
@ -560,6 +573,10 @@ pcre2_compile are called by application code. */
|
|||
#define pcre2_match_data_create_from_pattern PCRE2_SUFFIX(pcre2_match_data_create_from_pattern_)
|
||||
#define pcre2_match_data_free PCRE2_SUFFIX(pcre2_match_data_free_)
|
||||
#define pcre2_pattern_info PCRE2_SUFFIX(pcre2_pattern_info_)
|
||||
#define pcre2_serialize_decode PCRE2_SUFFIX(pcre2_serialize_decode_)
|
||||
#define pcre2_serialize_encode PCRE2_SUFFIX(pcre2_serialize_encode_)
|
||||
#define pcre2_serialize_free PCRE2_SUFFIX(pcre2_serialize_free_)
|
||||
#define pcre2_serialize_get_number_of_codes PCRE2_SUFFIX(pcre2_serialize_get_number_of_codes_)
|
||||
#define pcre2_set_bsr PCRE2_SUFFIX(pcre2_set_bsr_)
|
||||
#define pcre2_set_callout PCRE2_SUFFIX(pcre2_set_callout_)
|
||||
#define pcre2_set_character_tables PCRE2_SUFFIX(pcre2_set_character_tables_)
|
||||
|
@ -596,8 +613,9 @@ PCRE2_MATCH_CONTEXT_FUNCTIONS \
|
|||
PCRE2_COMPILE_FUNCTIONS \
|
||||
PCRE2_PATTERN_INFO_FUNCTIONS \
|
||||
PCRE2_MATCH_FUNCTIONS \
|
||||
PCRE2_SUBSTITUTE_FUNCTION \
|
||||
PCRE2_SUBSTRING_FUNCTIONS \
|
||||
PCRE2_SERIALIZE_FUNCTIONS \
|
||||
PCRE2_SUBSTITUTE_FUNCTION \
|
||||
PCRE2_JIT_FUNCTIONS \
|
||||
PCRE2_OTHER_FUNCTIONS
|
||||
|
||||
|
@ -625,6 +643,8 @@ PCRE2_TYPES_STRUCTURES_AND_FUNCTIONS
|
|||
#undef PCRE2_PATTERN_INFO_FUNCTIONS
|
||||
#undef PCRE2_MATCH_FUNCTIONS
|
||||
#undef PCRE2_SUBSTRING_FUNCTIONS
|
||||
#undef PCRE2_SERIALIZE_FUNCTIONS
|
||||
#undef PCRE2_SUBSTITUTE_FUNCTION
|
||||
#undef PCRE2_JIT_FUNCTIONS
|
||||
#undef PCRE2_OTHER_FUNCTIONS
|
||||
#undef PCRE2_TYPES_STRUCTURES_AND_FUNCTIONS
|
||||
|
|
|
@ -683,10 +683,28 @@ static const uint8_t opcode_possessify[] = {
|
|||
PCRE2_EXP_DEFN void PCRE2_CALL_CONVENTION
|
||||
pcre2_code_free(pcre2_code *code)
|
||||
{
|
||||
PCRE2_SIZE* ref_count;
|
||||
|
||||
if (code != NULL)
|
||||
{
|
||||
if (code->executable_jit != NULL)
|
||||
PRIV(jit_free)(code->executable_jit, &code->memctl);
|
||||
|
||||
if ((code->flags & PCRE2_DEREF_TABLES) != 0)
|
||||
{
|
||||
/* Decoded tables belong to the codes after deserialization, and they must
|
||||
be freed when there are no more references to them. The *ref_count should
|
||||
always be > 0. */
|
||||
|
||||
ref_count = (PCRE2_SIZE *)(code->tables + tables_length);
|
||||
if (*ref_count > 0)
|
||||
{
|
||||
(*ref_count)--;
|
||||
if (*ref_count == 0)
|
||||
code->memctl.free((void *)code->tables, code->memctl.memory_data);
|
||||
}
|
||||
}
|
||||
|
||||
code->memctl.free(code, code->memctl.memory_data);
|
||||
}
|
||||
}
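The reference counting used in pcre2_code_free() above can be illustrated in isolation. The following is only a sketch under stated assumptions, not the library's code: the names tables_create and tables_release and the TABLES_LENGTH value are hypothetical, and plain malloc()/free() stand in for the memctl functions. It shows a counter stored immediately after a shared table block, with the block released when the last user lets go.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TABLES_LENGTH 1088   /* assumed table size, for illustration only */

/* Copy the tables into a new block that has a size_t reference count stored
immediately after them. Returns NULL if memory cannot be obtained. */
uint8_t *tables_create(const uint8_t *src, size_t initial_refs)
{
uint8_t *tables = malloc(TABLES_LENGTH + sizeof(size_t));
if (tables == NULL) return NULL;
memcpy(tables, src, TABLES_LENGTH);
memcpy(tables + TABLES_LENGTH, &initial_refs, sizeof(size_t));
return tables;
}

/* Drop one reference. The block is freed when the count reaches zero. The
return value is the number of references that remain. */
size_t tables_release(uint8_t *tables)
{
size_t refs;
memcpy(&refs, tables + TABLES_LENGTH, sizeof(size_t));
if (refs == 0) return 0;            /* Should not happen; nothing to do */
refs--;
if (refs == 0)
  {
  free(tables);
  return 0;
  }
memcpy(tables + TABLES_LENGTH, &refs, sizeof(size_t));
return refs;
}
```

The sketch uses memcpy() to read and write the counter so that it makes no assumption about the alignment of the address just past the tables.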
|
||||
|
@ -7317,8 +7335,14 @@ for (i = 0; i < cb->names_found; i++)
|
|||
|
||||
PUT2(slot, 0, groupno);
|
||||
memcpy(slot + IMM2_SIZE, name, CU2BYTES(length));
|
||||
slot[IMM2_SIZE + length] = 0;
|
||||
cb->names_found++;
|
||||
|
||||
/* Add a terminating zero and fill the rest of the slot with zeroes so that
|
||||
the memory is all initialized. Otherwise valgrind moans about uninitialized
|
||||
memory when saving serialized compiled patterns. */
|
||||
|
||||
memset(slot + IMM2_SIZE + length, 0,
|
||||
CU2BYTES(cb->name_entry_size - length - IMM2_SIZE));
|
||||
}
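As a rough illustration of the slot initialization above, here is a minimal sketch for the 8-bit case only. The function name fill_name_slot is hypothetical, and the big-endian two-byte group number is an assumption about what PUT2 does with 8-bit code units. The point is that every byte of the fixed-size slot is written, so a later byte-wise copy of the name table never reads uninitialized memory.

```c
#include <stdint.h>
#include <string.h>

#define IMM2_SIZE 2   /* Two bytes hold the group number in this 8-bit sketch */

/* Write one name-table entry: the group number, the name, and then zeroes up
to entry_size. The caller must ensure entry_size >= IMM2_SIZE + length + 1 so
that at least one terminating zero is written. */
void fill_name_slot(uint8_t *slot, size_t entry_size, unsigned groupno,
  const char *name, size_t length)
{
slot[0] = (uint8_t)(groupno >> 8);
slot[1] = (uint8_t)(groupno & 0xff);
memcpy(slot + IMM2_SIZE, name, length);
memset(slot + IMM2_SIZE + length, 0, entry_size - length - IMM2_SIZE);
}
```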
|
||||
|
||||
|
||||
|
@ -7356,6 +7380,7 @@ PCRE2_SPTR codestart; /* Start of compiled code */
|
|||
PCRE2_SPTR ptr; /* Current pointer in pattern */
|
||||
|
||||
size_t length = 1; /* Allow for final END opcode */
|
||||
size_t usedlength; /* Actual length used */
|
||||
size_t re_blocksize; /* Size of memory block */
|
||||
|
||||
int32_t firstcuflags, reqcuflags; /* Type of first/req code unit */
|
||||
|
@ -7754,13 +7779,16 @@ overflow. */
|
|||
|
||||
if (errorcode == 0 && ptr < cb.end_pattern) errorcode = ERR22;
|
||||
*code++ = OP_END;
|
||||
if ((size_t)(code - codestart) > length) errorcode = ERR23;
|
||||
usedlength = code - codestart;
|
||||
if (usedlength > length) errorcode = ERR23;
|
||||
|
||||
/* If the estimated length exceeds the really used length, adjust the value of
|
||||
re->blocksize, and if valgrind support is configured, mark the extra allocated
|
||||
memory as unaddressable, so that any out-of-bound reads can be detected. */
|
||||
|
||||
re->blocksize -= CU2BYTES(length - usedlength);
|
||||
#ifdef SUPPORT_VALGRIND
|
||||
/* If the estimated length exceeds the really used length, mark the extra
|
||||
allocated memory as unaddressable, so that any out-of-bound reads can be
|
||||
detected. */
|
||||
VALGRIND_MAKE_MEM_NOACCESS(code, (length - (code - codestart)) * sizeof(PCRE2_UCHAR));
|
||||
VALGRIND_MAKE_MEM_NOACCESS(code, CU2BYTES(length - usedlength));
|
||||
#endif
|
||||
|
||||
/* Fill in any forward references that are required. There may be repeated
|
||||
|
|
|
@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
|
|||
|
||||
Written by Philip Hazel
|
||||
Original API code Copyright (c) 1997-2012 University of Cambridge
|
||||
New API code Copyright (c) 2014 University of Cambridge
|
||||
New API code Copyright (c) 2015 University of Cambridge
|
||||
|
||||
-----------------------------------------------------------------------------
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
|
@ -200,7 +200,7 @@ static const char match_error_texts[] =
|
|||
"UTF-32 error: code points greater than 0x10ffff are not defined\0"
|
||||
"bad data value\0"
|
||||
/* 30 */
|
||||
"bad length\0"
|
||||
"patterns do not all use the same character tables\0"
|
||||
"magic number missing\0"
|
||||
"pattern compiled in wrong mode: 8/16/32-bit error\0"
|
||||
"bad offset value\0"
|
||||
|
|
|
@ -523,6 +523,7 @@ bytes in a code unit in that mode. */
|
|||
#define PCRE2_NL_SET 0x00008000 /* newline was set in the pattern */
|
||||
#define PCRE2_NOTEMPTY_SET 0x00010000 /* (*NOTEMPTY) used ) keep */
|
||||
#define PCRE2_NE_ATST_SET 0x00020000 /* (*NOTEMPTY_ATSTART) used) together */
|
||||
#define PCRE2_DEREF_TABLES 0x00040000 /* Release character tables. */
|
||||
|
||||
#define PCRE2_MODE_MASK (PCRE2_MODE8 | PCRE2_MODE16 | PCRE2_MODE32)
|
||||
|
||||
|
@ -1763,6 +1764,15 @@ typedef struct {
|
|||
#define UCD_CASESET(ch) GET_UCD(ch)->caseset
|
||||
#define UCD_OTHERCASE(ch) ((uint32_t)((int)ch + (int)(GET_UCD(ch)->other_case)))
|
||||
|
||||
/* Header for serialized pcre2 codes. */
|
||||
|
||||
typedef struct pcre2_serialized_data {
|
||||
uint32_t magic;
|
||||
uint32_t version;
|
||||
uint32_t config;
|
||||
int32_t number_of_codes;
|
||||
} pcre2_serialized_data;
|
||||
|
||||
|
||||
|
||||
/* ----------------- Items that need PCRE2_CODE_UNIT_WIDTH ----------------- */
|
||||
|
|
|
@ -0,0 +1,251 @@
|
|||
/*************************************************
|
||||
* Perl-Compatible Regular Expressions *
|
||||
*************************************************/
|
||||
|
||||
/* PCRE is a library of functions to support regular expressions whose syntax
|
||||
and semantics are as close as possible to those of the Perl 5 language.
|
||||
|
||||
Written by Philip Hazel
|
||||
Original API code Copyright (c) 1997-2012 University of Cambridge
|
||||
New API code Copyright (c) 2015 University of Cambridge
|
||||
|
||||
-----------------------------------------------------------------------------
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions are met:
|
||||
|
||||
* Redistributions of source code must retain the above copyright notice,
|
||||
this list of conditions and the following disclaimer.
|
||||
|
||||
* Redistributions in binary form must reproduce the above copyright
|
||||
notice, this list of conditions and the following disclaimer in the
|
||||
documentation and/or other materials provided with the distribution.
|
||||
|
||||
* Neither the name of the University of Cambridge nor the names of its
|
||||
contributors may be used to endorse or promote products derived from
|
||||
this software without specific prior written permission.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
|
||||
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
||||
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
||||
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
|
||||
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
|
||||
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
|
||||
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
|
||||
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
|
||||
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
|
||||
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
|
||||
POSSIBILITY OF SUCH DAMAGE.
|
||||
-----------------------------------------------------------------------------
|
||||
*/
|
||||
|
||||
/* This module contains functions for serializing and deserializing
|
||||
a sequence of compiled codes. */
|
||||
|
||||
|
||||
#ifdef HAVE_CONFIG_H
|
||||
#include "config.h"
|
||||
#endif
|
||||
|
||||
|
||||
#include "pcre2_internal.h"
|
||||
|
||||
/* Magic number to provide a small check against being handed junk. */
|
||||
|
||||
#define SERIALIZED_DATA_MAGIC 0x50523253u
|
||||
|
||||
/* Deserialization is limited to the current PCRE version and
|
||||
character width. */
|
||||
|
||||
#define SERIALIZED_DATA_VERSION \
|
||||
((PCRE2_MAJOR) | ((PCRE2_MINOR) << 16))
|
||||
|
||||
#define SERIALIZED_DATA_CONFIG \
|
||||
(sizeof(PCRE2_UCHAR) | ((sizeof(void*)) << 8) | ((sizeof(PCRE2_SIZE)) << 16))
|
||||
|
||||
|
||||
|
||||
/*************************************************
|
||||
* Serialize compiled patterns *
|
||||
*************************************************/
|
||||
|
||||
PCRE2_EXP_DECL int32_t PCRE2_CALL_CONVENTION
|
||||
pcre2_serialize_encode(const pcre2_code **codes, int32_t number_of_codes,
|
||||
uint8_t **serialized_bytes, PCRE2_SIZE *serialized_size,
|
||||
pcre2_general_context *gcontext)
|
||||
{
|
||||
uint8_t *bytes;
|
||||
uint8_t *dst_bytes;
|
||||
int32_t i;
|
||||
PCRE2_SIZE total_size;
|
||||
const pcre2_real_code *re;
|
||||
const uint8_t *tables;
|
||||
pcre2_serialized_data *data;
|
||||
|
||||
const pcre2_memctl *memctl = (gcontext != NULL) ?
|
||||
&gcontext->memctl : &PRIV(default_compile_context).memctl;
|
||||
|
||||
if (codes == NULL || serialized_bytes == NULL || serialized_size == NULL)
|
||||
return PCRE2_ERROR_NULL;
|
||||
|
||||
if (number_of_codes <= 0) return PCRE2_ERROR_BADDATA;
|
||||
|
||||
/* Compute total size. */
|
||||
total_size = sizeof(pcre2_serialized_data) + tables_length;
|
||||
tables = NULL;
|
||||
|
||||
for (i = 0; i < number_of_codes; i++)
|
||||
{
|
||||
if (codes[i] == NULL) return PCRE2_ERROR_NULL;
|
||||
re = (const pcre2_real_code *)(codes[i]);
|
||||
if (re->magic_number != MAGIC_NUMBER) return PCRE2_ERROR_BADMAGIC;
|
||||
if (tables == NULL)
|
||||
tables = re->tables;
|
||||
else if (tables != re->tables)
|
||||
return PCRE2_ERROR_MIXEDTABLES;
|
||||
total_size += re->blocksize;
|
||||
}
|
||||
|
||||
/* Initialize the byte stream. */
|
||||
bytes = memctl->malloc(total_size + sizeof(pcre2_memctl), memctl->memory_data);
|
||||
if (bytes == NULL) return PCRE2_ERROR_NOMEMORY;
|
||||
|
||||
/* The controller is stored as a hidden parameter. */
|
||||
memcpy(bytes, memctl, sizeof(pcre2_memctl));
|
||||
bytes += sizeof(pcre2_memctl);
|
||||
|
||||
data = (pcre2_serialized_data *)bytes;
|
||||
data->magic = SERIALIZED_DATA_MAGIC;
|
||||
data->version = SERIALIZED_DATA_VERSION;
|
||||
data->config = SERIALIZED_DATA_CONFIG;
|
||||
data->number_of_codes = number_of_codes;
|
||||
|
||||
/* Copy all compiled code data. */
|
||||
dst_bytes = bytes + sizeof(pcre2_serialized_data);
|
||||
memcpy(dst_bytes, tables, tables_length);
|
||||
dst_bytes += tables_length;
|
||||
|
||||
for (i = 0; i < number_of_codes; i++)
|
||||
{
|
||||
re = (const pcre2_real_code *)(codes[i]);
|
||||
memcpy(dst_bytes, (char *)re, re->blocksize);
|
||||
dst_bytes += re->blocksize;
|
||||
}
|
||||
|
||||
*serialized_bytes = bytes;
|
||||
*serialized_size = total_size;
|
||||
return number_of_codes;
|
||||
}
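The encoder above works in two passes: it first sums the block sizes, then allocates once and copies each block in order. A self-contained sketch of that pattern follows. The sketch_block type and sketch_concat function are hypothetical stand-ins, plain malloc() replaces the memctl allocator, and the header, tables, and checks for magic numbers or mixed tables are left out.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a compiled pattern: each block records its own total size. */
typedef struct sketch_block {
  size_t blocksize;      /* Size of this block in bytes, this field included */
  uint8_t payload[8];
} sketch_block;

/* Concatenate the blocks into one newly allocated buffer. Pass 1 computes the
total size; pass 2 copies. Returns NULL on bad arguments or lack of memory. */
uint8_t *sketch_concat(const sketch_block **blocks, int count,
  size_t *total_size)
{
size_t total = 0;
uint8_t *bytes, *dst;
int i;

if (blocks == NULL || total_size == NULL || count <= 0) return NULL;

for (i = 0; i < count; i++)
  {
  if (blocks[i] == NULL) return NULL;
  total += blocks[i]->blocksize;
  }

bytes = malloc(total);
if (bytes == NULL) return NULL;

dst = bytes;
for (i = 0; i < count; i++)
  {
  memcpy(dst, blocks[i], blocks[i]->blocksize);
  dst += blocks[i]->blocksize;
  }

*total_size = total;
return bytes;
}
```

Because each block carries its own size, a reader can walk the buffer block by block without a separate index, which is how the decoder above advances src_bytes.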
|
||||
|
||||
|
||||
/*************************************************
|
||||
* Deserialize compiled patterns *
|
||||
*************************************************/
|
||||
|
||||
PCRE2_EXP_DECL int32_t PCRE2_CALL_CONVENTION
|
||||
pcre2_serialize_decode(pcre2_code **codes, int32_t number_of_codes,
|
||||
const uint8_t *bytes, pcre2_general_context *gcontext)
|
||||
{
|
||||
const pcre2_serialized_data *data = (const pcre2_serialized_data *)bytes;
|
||||
const pcre2_memctl *memctl = (gcontext != NULL) ?
|
||||
&gcontext->memctl : &PRIV(default_compile_context).memctl;
|
||||
|
||||
const uint8_t *src_bytes;
|
||||
pcre2_real_code *src_re;
|
||||
pcre2_real_code *dst_re;
|
||||
uint8_t *tables;
|
||||
int32_t i, j;
|
||||
|
||||
/* Sanity checks. */
|
||||
|
||||
if (data == NULL || codes == NULL) return PCRE2_ERROR_NULL;
|
||||
if (number_of_codes <= 0) return PCRE2_ERROR_BADDATA;
|
||||
if (data->magic != SERIALIZED_DATA_MAGIC) return PCRE2_ERROR_BADMAGIC;
|
||||
if (data->version != SERIALIZED_DATA_VERSION) return PCRE2_ERROR_BADMODE;
|
||||
if (data->config != SERIALIZED_DATA_CONFIG) return PCRE2_ERROR_BADMODE;
|
||||
|
||||
if (number_of_codes > data->number_of_codes)
|
||||
number_of_codes = data->number_of_codes;
|
||||
|
||||
src_bytes = bytes + sizeof(pcre2_serialized_data);
|
||||
|
||||
/* Decode tables. The reference count for the tables is stored immediately
|
||||
following them. */
|
||||
|
||||
tables = memctl->malloc(tables_length + sizeof(PCRE2_SIZE), memctl->memory_data);
|
||||
if (tables == NULL) return PCRE2_ERROR_NOMEMORY;
|
||||
|
||||
memcpy(tables, src_bytes, tables_length);
|
||||
*(PCRE2_SIZE *)(tables + tables_length) = number_of_codes;
|
||||
src_bytes += tables_length;
|
||||
|
||||
/* Decode byte stream. */
|
||||
|
||||
for (i = 0; i < number_of_codes; i++)
|
||||
{
|
||||
src_re = (pcre2_real_code *)src_bytes;
|
||||
|
||||
/* The allocator provided by gcontext replaces the original one. */
|
||||
dst_re = (pcre2_real_code *)PRIV(memctl_malloc)
|
||||
(src_re->blocksize, (pcre2_memctl *)gcontext);
|
||||
if (dst_re == NULL)
|
||||
{
|
||||
memctl->free(tables, memctl->memory_data);
|
||||
for (j = 0; j < i; j++)
|
||||
{
|
||||
memctl->free(codes[j], memctl->memory_data);
|
||||
codes[j] = NULL;
|
||||
}
|
||||
return PCRE2_ERROR_NOMEMORY;
|
||||
}
|
||||
|
||||
/* The new allocator must be preserved. */
|
||||
memcpy(((uint8_t *)dst_re) + sizeof(pcre2_memctl),
|
||||
src_bytes + sizeof(pcre2_memctl),
|
||||
src_re->blocksize - sizeof(pcre2_memctl));
|
||||
|
||||
/* At the moment only one table is supported. */
|
||||
dst_re->tables = tables;
|
||||
dst_re->executable_jit = NULL;
|
||||
dst_re->flags |= PCRE2_DEREF_TABLES;
|
||||
|
||||
codes[i] = dst_re;
|
||||
src_bytes += src_re->blocksize;
|
||||
}
|
||||
|
||||
return number_of_codes;
|
||||
}
|
||||
|
||||
|
||||
/*************************************************
|
||||
* Get the number of serialized patterns *
|
||||
*************************************************/
|
||||
|
||||
PCRE2_EXP_DECL int32_t PCRE2_CALL_CONVENTION
|
||||
pcre2_serialize_get_number_of_codes(const uint8_t *bytes)
|
||||
{
|
||||
const pcre2_serialized_data *data = (const pcre2_serialized_data *)bytes;
|
||||
|
||||
if (data == NULL) return PCRE2_ERROR_NULL;
|
||||
if (data->magic != SERIALIZED_DATA_MAGIC) return PCRE2_ERROR_BADMAGIC;
|
||||
if (data->version != SERIALIZED_DATA_VERSION) return PCRE2_ERROR_BADMODE;
|
||||
if (data->config != SERIALIZED_DATA_CONFIG) return PCRE2_ERROR_BADMODE;
|
||||
|
||||
return data->number_of_codes;
|
||||
}
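The header checks shared by the decode and count functions can be sketched on their own. This is an illustration under assumptions rather than the library code: the sketch_ names and the negative return values are hypothetical, and the unit size is passed as a parameter instead of coming from PCRE2_UCHAR. The version and config words are packed the same way as the SERIALIZED_DATA_VERSION and SERIALIZED_DATA_CONFIG macros in this file.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_MAGIC 0x50523253u

typedef struct sketch_header {
  uint32_t magic;
  uint32_t version;
  uint32_t config;
  int32_t  number_of_codes;
} sketch_header;

/* Major version in the low 16 bits, minor version in the high 16 bits. */
uint32_t sketch_version(uint32_t major, uint32_t minor)
{
return major | (minor << 16);
}

/* Code unit size, pointer size and size_t size, one byte each. */
uint32_t sketch_config(size_t unit_size)
{
return (uint32_t)(unit_size | (sizeof(void *) << 8) | (sizeof(size_t) << 16));
}

/* Returns the number of codes, or a negative value: -1 for a NULL pointer,
-2 for a bad magic number, -3 for a version or configuration mismatch. */
int32_t sketch_check_header(const uint8_t *bytes, uint32_t version,
  uint32_t config)
{
sketch_header h;
if (bytes == NULL) return -1;
memcpy(&h, bytes, sizeof(h));
if (h.magic != SKETCH_MAGIC) return -2;
if (h.version != version || h.config != config) return -3;
return h.number_of_codes;
}
```

Since the config word encodes pointer and size_t widths, data written on one kind of host is rejected on another, which matches the comment above that deserialization is limited to the current version and character width.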
|
||||
|
||||
|
||||
/*************************************************
|
||||
* Free the allocated stream *
|
||||
*************************************************/
|
||||
|
||||
PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION
|
||||
pcre2_serialize_free(uint8_t *bytes)
|
||||
{
|
||||
if (bytes != NULL)
|
||||
{
|
||||
pcre2_memctl *memctl = (pcre2_memctl *)(bytes - sizeof(pcre2_memctl));
|
||||
memctl->free(memctl, memctl->memory_data);
|
||||
}
|
||||
}
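pcre2_serialize_free() takes no context argument because the encoder stores the memory controller in front of the bytes it hands back. A minimal sketch of that "hidden header" technique follows. All names here (sketch_memctl, sketch_alloc, sketch_free, and the counting allocator used for demonstration) are hypothetical and only mirror the shape of the code above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct sketch_memctl {
  void *(*malloc_fn)(size_t, void *);
  void  (*free_fn)(void *, void *);
  void  *memory_data;
} sketch_memctl;

/* Example allocator pair that counts live blocks through memory_data. */
void *counting_malloc(size_t size, void *data)
{
void *p = malloc(size);
if (p != NULL) (*(int *)data)++;
return p;
}

void counting_free(void *block, void *data)
{
if (block != NULL) (*(int *)data)--;
free(block);
}

/* Allocate size usable bytes, preceded by a copy of the controller. The
returned pointer addresses the usable bytes, not the start of the block. */
uint8_t *sketch_alloc(size_t size, const sketch_memctl *memctl)
{
uint8_t *block = memctl->malloc_fn(size + sizeof(sketch_memctl),
  memctl->memory_data);
if (block == NULL) return NULL;
memcpy(block, memctl, sizeof(sketch_memctl));
return block + sizeof(sketch_memctl);
}

/* Recover the controller from just before the bytes and use it to free the
whole block. The controller is copied out first because it lives inside the
memory that is about to be released. */
void sketch_free(uint8_t *bytes)
{
if (bytes != NULL)
  {
  sketch_memctl memctl;
  uint8_t *block = bytes - sizeof(sketch_memctl);
  memcpy(&memctl, block, sizeof(sketch_memctl));
  memctl.free_fn(block, memctl.memory_data);
  }
}
```

One consequence of this layout is that only a pointer obtained from the matching allocation function may be passed to the free function; a buffer read from a file has no controller in front of it.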
|
||||
|
||||
/* End of pcre2_serialize.c */
|
514
src/pcre2test.c
|
@ -166,6 +166,7 @@ void vms_setsymbol( char *, char *, int );
|
|||
#define JUNK_OFFSET 0xdeadbeef /* For initializing ovector */
|
||||
#define LOCALESIZE 32 /* Size of locale name */
|
||||
#define LOOPREPEAT 500000 /* Default loop count for timing */
|
||||
#define PATSTACKSIZE 20 /* Pattern stack for save/restore testing */
|
||||
#define REPLACE_MODSIZE 96 /* Field for reading 8-bit replacement */
|
||||
#define VERSION_SIZE 64 /* Size of buffer for the version strings */
|
||||
|
||||
|
@ -313,6 +314,26 @@ modes, so use the form of the first that is available. */
|
|||
#define PCRE2_REAL_MATCH_CONTEXT pcre2_real_match_context_32
|
||||
#endif
|
||||
|
||||
/* ------------- Structure and table for handling #-commands ------------- */
|
||||
|
||||
typedef struct cmdstruct {
|
||||
const char *name;
|
||||
int value;
|
||||
} cmdstruct;
|
||||
|
||||
enum { CMD_FORBID_UTF, CMD_LOAD, CMD_PATTERN, CMD_PERLTEST, CMD_POP, CMD_SAVE,
|
||||
CMD_SUBJECT, CMD_UNKNOWN };
|
||||
|
||||
static cmdstruct cmdlist[] = {
|
||||
{ "forbid_utf", CMD_FORBID_UTF },
|
||||
{ "load", CMD_LOAD },
|
||||
{ "pattern", CMD_PATTERN },
|
||||
{ "perltest", CMD_PERLTEST },
|
||||
{ "pop", CMD_POP },
|
||||
{ "save", CMD_SAVE },
|
||||
{ "subject", CMD_SUBJECT }};
|
||||
|
||||
#define cmdlistcount sizeof(cmdlist)/sizeof(cmdstruct)
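A table like cmdlist is typically searched with a simple linear scan. The sketch below shows one way that lookup might be written; it is an assumption about usage, not the code in pcre2test. The function lookup_command is hypothetical, the table is cut down to three entries, and the input is taken to be the text just after the '#' character.

```c
#include <ctype.h>
#include <string.h>

typedef struct cmdstruct {
  const char *name;
  int value;
} cmdstruct;

enum { CMD_LOAD, CMD_POP, CMD_SAVE, CMD_UNKNOWN };

static const cmdstruct cmdlist[] = {
  { "load", CMD_LOAD },
  { "pop",  CMD_POP },
  { "save", CMD_SAVE }};

#define cmdlistcount (sizeof(cmdlist)/sizeof(cmdstruct))

/* Return the command value for the word at the start of line, which must be
followed by white space or the end of the string; otherwise CMD_UNKNOWN. */
int lookup_command(const char *line)
{
size_t i;
for (i = 0; i < cmdlistcount; i++)
  {
  size_t len = strlen(cmdlist[i].name);
  if (strncmp(line, cmdlist[i].name, len) == 0 &&
      (line[len] == 0 || isspace((unsigned char)line[len])))
    return cmdlist[i].value;
  }
return CMD_UNKNOWN;
}
```

Checking the character after the name prevents a prefix such as "pop" from matching a longer word.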
|
||||
|
||||
/* ------------- Structures and tables for handling modifiers -------------- */
|
||||
|
||||
|
@ -367,8 +388,9 @@ either on a pattern or a data line, so they must all be distinct. */
|
|||
#define CTL_MARK 0x00020000u
|
||||
#define CTL_MEMORY 0x00040000u
|
||||
#define CTL_POSIX 0x00080000u
|
||||
#define CTL_STARTCHAR 0x00100000u
|
||||
#define CTL_ZERO_TERMINATE 0x00200000u
|
||||
#define CTL_PUSH 0x00100000u
|
||||
#define CTL_STARTCHAR 0x00200000u
|
||||
#define CTL_ZERO_TERMINATE 0x00400000u
|
||||
|
||||
#define CTL_BSR_SET 0x80000000u /* This is informational */
|
||||
#define CTL_NL_SET 0x40000000u /* This is informational */
|
||||
|
@ -426,6 +448,7 @@ typedef struct datctl { /* Structure for data line modifiers. */
|
|||
/* Ids for which context to modify. */
|
||||
|
||||
enum { CTX_PAT, /* Active pattern context */
|
||||
CTX_POPPAT, /* Ditto, for a popped pattern */
|
||||
CTX_DEFPAT, /* Default pattern context */
|
||||
CTX_DAT, /* Active data (match) context */
|
||||
CTX_DEFDAT }; /* Default data (match) context */
|
||||
|
@ -513,6 +536,7 @@ static modstruct modlist[] = {
|
|||
{ "ph", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_HARD, DO(options) },
|
||||
{ "posix", MOD_PAT, MOD_CTL, CTL_POSIX, PO(control) },
|
||||
{ "ps", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_SOFT, DO(options) },
|
||||
{ "push", MOD_PAT, MOD_CTL, CTL_PUSH, PO(control) },
|
||||
{ "recursion_limit", MOD_CTM, MOD_INT, 0, MO(recursion_limit) },
|
||||
{ "replace", MOD_PND, MOD_STR, REPLACE_MODSIZE, PO(replacement) },
|
||||
{ "stackguard", MOD_PAT, MOD_INT, 0, PO(stackguard_test) },
|
||||
|
@ -544,6 +568,20 @@ static modstruct modlist[] = {
|
|||
|
||||
#define EXCLUSIVE_DAT_CONTROLS (CTL_ALLUSEDTEXT|CTL_STARTCHAR)
|
||||
|
||||
/* Control bits that are not ignored with 'push'. */
|
||||
|
||||
#define PUSH_SUPPORTED_COMPILE_CONTROLS ( \
|
||||
CTL_BINCODE|CTL_FULLBINCODE|CTL_HEXPAT|CTL_INFO|CTL_JITVERIFY| \
|
||||
CTL_MEMORY|CTL_PUSH|CTL_BSR_SET|CTL_NL_SET)
|
||||
|
||||
/* Controls that apply only at compile time with 'push'. */
|
||||
|
||||
#define PUSH_COMPILE_ONLY_CONTROLS CTL_JITVERIFY
|
||||
|
||||
/* Controls that are forbidden with #pop. */
|
||||
|
||||
#define NOTPOP_CONTROLS (CTL_HEXPAT|CTL_POSIX|CTL_PUSH)
|
||||
|
||||
/* Table of single-character abbreviated modifiers. The index field is
|
||||
initialized to -1, but the first time the modifier is encountered, it is filled
|
||||
in with the index of the full entry in modlist, to save repeated searching when
|
||||
|
@ -671,6 +709,9 @@ static patctl pat_patctl;
|
|||
static datctl def_datctl;
|
||||
static datctl dat_datctl;
|
||||
|
||||
static void *patstack[PATSTACKSIZE];
|
||||
static int patstacknext = 0;
|
||||
|
||||
#ifdef SUPPORT_PCRE2_8
|
||||
static regex_t preg = { NULL, NULL, 0, 0 };
|
||||
#endif
|
||||
|
@ -928,6 +969,38 @@ are supported. */
|
|||
else \
|
||||
pcre2_printint_32(compiled_code32,outfile,a)
|
||||
|
||||
#define PCRE2_SERIALIZE_DECODE(r,a,b,c,d) \
|
||||
if (test_mode == PCRE8_MODE) \
|
||||
r = pcre2_serialize_decode_8((pcre2_code_8 **)a,b,c,G(d,8)); \
|
||||
else if (test_mode == PCRE16_MODE) \
|
||||
r = pcre2_serialize_decode_16((pcre2_code_16 **)a,b,c,G(d,16)); \
|
||||
else \
|
||||
r = pcre2_serialize_decode_32((pcre2_code_32 **)a,b,c,G(d,32))
|
||||
|
||||
#define PCRE2_SERIALIZE_ENCODE(r,a,b,c,d,e) \
|
||||
if (test_mode == PCRE8_MODE) \
|
||||
r = pcre2_serialize_encode_8((const pcre2_code_8 **)a,b,c,d,G(e,8)); \
|
||||
else if (test_mode == PCRE16_MODE) \
|
||||
r = pcre2_serialize_encode_16((const pcre2_code_16 **)a,b,c,d,G(e,16)); \
|
||||
else \
|
||||
r = pcre2_serialize_encode_32((const pcre2_code_32 **)a,b,c,d,G(e,32))
|
||||
|
||||
#define PCRE2_SERIALIZE_FREE(a) \
|
||||
if (test_mode == PCRE8_MODE) \
|
||||
pcre2_serialize_free_8(a); \
|
||||
else if (test_mode == PCRE16_MODE) \
|
||||
pcre2_serialize_free_16(a); \
|
||||
else \
|
||||
pcre2_serialize_free_32(a)
|
||||
|
||||
#define PCRE2_SERIALIZE_GET_NUMBER_OF_CODES(r,a) \
|
||||
if (test_mode == PCRE8_MODE) \
|
||||
r = pcre2_serialize_get_number_of_codes_8(a); \
|
||||
else if (test_mode == PCRE16_MODE) \
|
||||
r = pcre2_serialize_get_number_of_codes_16(a); \
|
||||
else \
|
||||
r = pcre2_serialize_get_number_of_codes_32(a)
|
||||
|
||||
#define PCRE2_SET_CALLOUT(a,b,c) \
|
||||
if (test_mode == PCRE8_MODE) \
|
||||
pcre2_set_callout_8(G(a,8),(int (*)(pcre2_callout_block_8 *, void *))b,c); \
|
||||
|
@ -1297,11 +1370,35 @@ the three different cases. */
|
|||
a = G(pcre2_pattern_info_,BITTWO)(G(b,BITTWO),c,d)
|
||||
|
||||
#define PCRE2_PRINTINT(a) \
|
||||
if (test_mode == G(G(PCRE,BITONE),_MODE)) \
|
||||
if (test_mode == G(G(PCRE,BITONE),_MODE)) \
|
||||
G(pcre2_printint_,BITONE)(G(compiled_code,BITONE),outfile,a); \
|
||||
else \
|
||||
G(pcre2_printint_,BITTWO)(G(compiled_code,BITTWO),outfile,a)
|
||||
|
||||
#define PCRE2_SERIALIZE_DECODE(r,a,b,c,d) \
|
||||
if (test_mode == G(G(PCRE,BITONE),_MODE)) \
|
||||
r = G(pcre2_serialize_decode_,BITONE)((G(pcre2_code_,BITONE) **)a,b,c,G(d,BITONE)); \
|
||||
else \
|
||||
r = G(pcre2_serialize_decode_,BITTWO)((G(pcre2_code_,BITTWO) **)a,b,c,G(d,BITTWO))
|
||||
|
||||
#define PCRE2_SERIALIZE_ENCODE(r,a,b,c,d,e) \
|
||||
if (test_mode == G(G(PCRE,BITONE),_MODE)) \
|
||||
r = G(pcre2_serialize_encode_,BITONE)((G(const pcre2_code_,BITONE) **)a,b,c,d,G(e,BITONE)); \
|
||||
else \
|
||||
r = G(pcre2_serialize_encode_,BITTWO)((G(const pcre2_code_,BITTWO) **)a,b,c,d,G(e,BITTWO))
|
||||
|
||||
#define PCRE2_SERIALIZE_FREE(a) \
|
||||
if (test_mode == G(G(PCRE,BITONE),_MODE)) \
|
||||
G(pcre2_serialize_free_,BITONE)(a); \
|
||||
else \
|
||||
G(pcre2_serialize_free_,BITTWO)(a)
|
||||
|
||||
#define PCRE2_SERIALIZE_GET_NUMBER_OF_CODES(r,a) \
|
||||
if (test_mode == G(G(PCRE,BITONE),_MODE)) \
|
||||
r = G(pcre2_serialize_get_number_of_codes_,BITONE)(a); \
|
||||
else \
|
||||
r = G(pcre2_serialize_get_number_of_codes_,BITTWO)(a)
|
||||
|
||||
#define PCRE2_SET_CALLOUT(a,b,c) \
|
||||
if (test_mode == G(G(PCRE,BITONE),_MODE)) \
|
||||
G(pcre2_set_callout_,BITONE)(G(a,BITONE), \
|
||||
|
@ -1510,6 +1607,13 @@ the three different cases. */
|
|||
#define PCRE2_MATCH_DATA_FREE(a) pcre2_match_data_free_8(G(a,8))
|
||||
#define PCRE2_PATTERN_INFO(a,b,c,d) a = pcre2_pattern_info_8(G(b,8),c,d)
|
||||
#define PCRE2_PRINTINT(a) pcre2_printint_8(compiled_code8,outfile,a)
|
||||
#define PCRE2_SERIALIZE_DECODE(r,a,b,c,d) \
|
||||
r = pcre2_serialize_decode_8((pcre2_code_8 **)a,b,c,G(d,8))
|
||||
#define PCRE2_SERIALIZE_ENCODE(r,a,b,c,d,e) \
|
||||
r = pcre2_serialize_encode_8((const pcre2_code_8 **)a,b,c,d,G(e,8))
|
||||
#define PCRE2_SERIALIZE_FREE(a) pcre2_serialize_free_8(a)
|
||||
#define PCRE2_SERIALIZE_GET_NUMBER_OF_CODES(r,a) \
|
||||
r = pcre2_serialize_get_number_of_codes_8(a)
|
||||
#define PCRE2_SET_CALLOUT(a,b,c) \
|
||||
pcre2_set_callout_8(G(a,8),(int (*)(pcre2_callout_block_8 *, void *))b,c)
|
||||
#define PCRE2_SET_CHARACTER_TABLES(a,b) pcre2_set_character_tables_8(G(a,8),b)
|
||||
|
@ -1591,6 +1695,13 @@ the three different cases. */
|
|||
#define PCRE2_MATCH_DATA_FREE(a) pcre2_match_data_free_16(G(a,16))
|
||||
#define PCRE2_PATTERN_INFO(a,b,c,d) a = pcre2_pattern_info_16(G(b,16),c,d)
|
||||
#define PCRE2_PRINTINT(a) pcre2_printint_16(compiled_code16,outfile,a)
|
||||
#define PCRE2_SERIALIZE_DECODE(r,a,b,c,d) \
|
||||
r = pcre2_serialize_decode_16((pcre2_code_16 **)a,b,c,G(d,16))
|
||||
#define PCRE2_SERIALIZE_ENCODE(r,a,b,c,d,e) \
|
||||
r = pcre2_serialize_encode_16((const pcre2_code_16 **)a,b,c,d,G(e,16))
|
||||
#define PCRE2_SERIALIZE_FREE(a) pcre2_serialize_free_16(a)
|
||||
#define PCRE2_SERIALIZE_GET_NUMBER_OF_CODES(r,a) \
|
||||
r = pcre2_serialize_get_number_of_codes_16(a)
|
||||
#define PCRE2_SET_CALLOUT(a,b,c) \
|
||||
pcre2_set_callout_16(G(a,16),(int (*)(pcre2_callout_block_16 *, void *))b,c);
|
||||
#define PCRE2_SET_CHARACTER_TABLES(a,b) pcre2_set_character_tables_16(G(a,16),b)
|
||||
|
@ -1672,6 +1783,13 @@ the three different cases. */
|
|||
#define PCRE2_MATCH_DATA_FREE(a) pcre2_match_data_free_32(G(a,32))
|
||||
#define PCRE2_PATTERN_INFO(a,b,c,d) a = pcre2_pattern_info_32(G(b,32),c,d)
|
||||
#define PCRE2_PRINTINT(a) pcre2_printint_32(compiled_code32,outfile,a)
|
||||
#define PCRE2_SERIALIZE_DECODE(r,a,b,c,d) \
|
||||
r = pcre2_serialize_decode_32((pcre2_code_32 **)a,b,c,G(d,32))
|
||||
#define PCRE2_SERIALIZE_ENCODE(r,a,b,c,d,e) \
|
||||
r = pcre2_serialize_encode_32((const pcre2_code_32 **)a,b,c,d,G(e,32))
|
||||
#define PCRE2_SERIALIZE_FREE(a) pcre2_serialize_free_32(a)
|
||||
#define PCRE2_SERIALIZE_GET_NUMBER_OF_CODES(r,a) \
|
||||
r = pcre2_serialize_get_number_of_codes_32(a)
|
||||
#define PCRE2_SET_CALLOUT(a,b,c) \
|
||||
pcre2_set_callout_32(G(a,32),(int (*)(pcre2_callout_block_32 *, void *))b,c);
|
||||
#define PCRE2_SET_CHARACTER_TABLES(a,b) pcre2_set_character_tables_32(G(a,32),b)
|
||||
|
@ -2792,6 +2910,7 @@ it is allowed here and find the field that is to be changed.
|
|||
Arguments:
|
||||
m the modifier list entry
|
||||
ctx CTX_PAT => pattern context
|
||||
CTX_POPPAT => pattern context for popped pattern
|
||||
CTX_DEFPAT => default pattern context
|
||||
CTX_DAT => data context
|
||||
CTX_DEFDAT => default data context
|
||||
|
@ -2837,8 +2956,8 @@ switch (m->which)
|
|||
if (dctl != NULL) field = dctl;
|
||||
break;
|
||||
|
||||
case MOD_PAT: /* Pattern modifier */
|
||||
case MOD_PATP: /* Allowed for Perl test */
|
||||
case MOD_PAT: /* Pattern modifier */
|
||||
case MOD_PATP: /* Allowed for Perl test */
|
||||
if (pctl != NULL) field = pctl;
|
||||
break;
|
||||
|
||||
|
@ -2878,6 +2997,7 @@ modifiers that apply to contexts.
|
|||
Arguments:
|
||||
p point to modifier string
|
||||
ctx CTX_PAT => pattern context
|
||||
CTX_POPPAT => pattern context for popped pattern
|
||||
CTX_DEFPAT => default pattern context
|
||||
CTX_DAT => data context
|
||||
CTX_DEFDAT => default data context
|
||||
|
@ -2902,11 +3022,8 @@ for (;;)
|
|||
int index;
|
||||
char *endptr;
|
||||
|
||||
/* Skip white space and commas; after a comma we have passed the first
|
||||
item. */
|
||||
/* Skip white space and commas. */
|
||||
|
||||
while (isspace(*p)) p++;
|
||||
if (*p == ',') first = FALSE;
|
||||
while (isspace(*p) || *p == ',') p++;
|
||||
if (*p == 0) break;
|
||||
|
||||
|
@ -3163,6 +3280,17 @@ for (;;)
|
|||
}
|
||||
|
||||
p = pp;
|
||||
first = FALSE;
|
||||
|
||||
if (ctx == CTX_POPPAT &&
|
||||
(pctl->options != 0 ||
|
||||
pctl->tables_id != 0 ||
|
||||
pctl->locale[0] != 0 ||
|
||||
(pctl->control & NOTPOP_CONTROLS) != 0))
|
||||
{
|
||||
fprintf(outfile, "** '%s' is not valid here\n", m->name);
|
||||
return FALSE;
|
||||
}
|
||||
}
|
||||
|
||||
return TRUE;
|
||||
|
@ -3246,7 +3374,7 @@ Returns: nothing
|
|||
static void
|
||||
show_controls(uint32_t controls, const char *before)
|
||||
{
|
||||
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
|
||||
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
|
||||
before,
|
||||
((controls & CTL_AFTERTEXT) != 0)? " aftertext" : "",
|
||||
((controls & CTL_ALLAFTERTEXT) != 0)? " allaftertext" : "",
|
||||
|
@ -3268,6 +3396,7 @@ fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
|
|||
((controls & CTL_MARK) != 0)? " mark" : "",
|
||||
((controls & CTL_MEMORY) != 0)? " memory" : "",
|
||||
((controls & CTL_POSIX) != 0)? " posix" : "",
|
||||
((controls & CTL_PUSH) != 0)? " push" : "",
|
||||
((controls & CTL_STARTCHAR) != 0)? " startchar" : "",
|
||||
((controls & CTL_ZERO_TERMINATE) != 0)? " zero_terminate" : "");
|
||||
}
|
||||
|
@ -3347,6 +3476,40 @@ fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s",
|
|||
|
||||
|
||||
|
||||
/*************************************************
|
||||
* Show memory usage info for a pattern *
|
||||
*************************************************/
|
||||
|
||||
static void
|
||||
show_memory_info(void)
|
||||
{
|
||||
uint32_t name_count, name_entry_size;
|
||||
size_t size, cblock_size;
|
||||
|
||||
#ifdef SUPPORT_PCRE2_8
|
||||
if (test_mode == 8) cblock_size = sizeof(pcre2_real_code_8);
|
||||
#endif
|
||||
#ifdef SUPPORT_PCRE2_16
|
||||
if (test_mode == 16) cblock_size = sizeof(pcre2_real_code_16);
|
||||
#endif
|
||||
#ifdef SUPPORT_PCRE2_32
|
||||
if (test_mode == 32) cblock_size = sizeof(pcre2_real_code_32);
|
||||
#endif
|
||||
|
||||
(void)pattern_info(PCRE2_INFO_SIZE, &size, FALSE);
|
||||
(void)pattern_info(PCRE2_INFO_NAMECOUNT, &name_count, FALSE);
|
||||
(void)pattern_info(PCRE2_INFO_NAMEENTRYSIZE, &name_entry_size, FALSE);
|
||||
fprintf(outfile, "Memory allocation (code space): %d\n",
|
||||
(int)(size - name_count*name_entry_size*code_unit_size - cblock_size));
|
||||
if (pat_patctl.jit != 0)
|
||||
{
|
||||
(void)pattern_info(PCRE2_INFO_JITSIZE, &size, FALSE);
|
||||
fprintf(outfile, "Memory allocation (JIT code): %d\n", (int)size);
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
|
||||
/*************************************************
|
||||
* Show information about a pattern *
|
||||
*************************************************/
|
||||
|
@ -3624,12 +3787,79 @@ return PR_OK;
|
|||
|
||||
|
||||
|
||||
/*************************************************
|
||||
* Handle serialization error *
|
||||
*************************************************/
|
||||
|
||||
/* Print an error message after a serialization failure.
|
||||
|
||||
Arguments:
|
||||
rc the error code
|
||||
msg an initial message for what failed
|
||||
|
||||
Returns: nothing
|
||||
*/
|
||||
|
||||
static void
|
||||
serial_error(int rc, const char *msg)
|
||||
{
|
||||
fprintf(outfile, "%s failed: error %d: ", msg, rc);
|
||||
PCRE2_GET_ERROR_MESSAGE(rc, rc, pbuffer);
|
||||
PCHARSV(CASTVAR(void *, pbuffer), 0, rc, FALSE, outfile);
|
||||
fprintf(outfile, "\n");
|
||||
}
|
||||
|
||||
|
||||
|
||||
/*************************************************
|
||||
* Open file for save/load commands *
|
||||
*************************************************/
|
||||
|
||||
/* This function decodes the file name and opens the file.
|
||||
|
||||
Arguments:
|
||||
buffptr point after the #command
|
||||
mode open mode
|
||||
fptr points to the FILE variable
|
||||
|
||||
Returns: PR_OK or PR_ABEND
|
||||
*/
|
||||
|
||||
static int
|
||||
open_file(uint8_t *buffptr, const char *mode, FILE **fptr)
|
||||
{
|
||||
char *endf;
|
||||
char *filename = (char *)buffptr;
|
||||
while (isspace(*filename)) filename++;
|
||||
endf = filename + strlen8(filename);
|
||||
while (endf > filename && isspace(endf[-1])) endf--;
|
||||
|
||||
if (endf == filename)
|
||||
{
|
||||
fprintf(outfile, "** File name expected after #save or #load\n");
|
||||
return PR_ABEND;
|
||||
}
|
||||
|
||||
*endf = 0;
|
||||
*fptr = fopen((const char *)filename, mode);
|
||||
if (*fptr == NULL)
|
||||
{
|
||||
fprintf(outfile, "** Failed to open '%s'\n", filename);
|
||||
return PR_ABEND;
|
||||
}
|
||||
|
||||
return PR_OK;
|
||||
}
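The file-name handling in open_file() above amounts to trimming white space from both ends of the command argument in place. A minimal standalone sketch of that step follows; the helper name trim_argument is hypothetical and not part of pcre2test, and the casts to unsigned char are an addition to keep isspace() well defined for any byte value.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper (not in pcre2test): trim leading and trailing white
space from a writable string. Returns a pointer to the first non-space
character, or NULL if nothing is left after trimming. */

char *
trim_argument(char *s)
{
char *end;
while (isspace((unsigned char)*s)) s++;        /* Skip leading white space */
end = s + strlen(s);
while (end > s && isspace((unsigned char)end[-1])) end--;  /* Back over trailing */
if (end == s) return NULL;                     /* Empty argument */
*end = 0;                                      /* Terminate in place */
return s;
}
```

A caller would pass the text that follows the #-command and treat a NULL return as "file name expected".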
|
||||
|
||||
|
||||
|
||||
/*************************************************
|
||||
* Process command line *
|
||||
*************************************************/
|
||||
|
||||
/* This function is called for lines beginning with # and a character that is
|
||||
not ! or whitespace, when encountered between tests. The line is in buffer.
|
||||
not ! or whitespace, when encountered between tests, which means that there is
|
||||
no compiled pattern (compiled_code is NULL). The line is in buffer.
|
||||
|
||||
Arguments: none
|
||||
|
||||
|
@ -3641,33 +3871,176 @@ Returns: PR_OK continue processing next line
|
|||
static int
|
||||
process_command(void)
|
||||
{
|
||||
FILE *f;
|
||||
PCRE2_SIZE serial_size;
|
||||
size_t i;
|
||||
int rc, cmd, cmdlen;
|
||||
const char *cmdname;
|
||||
uint8_t *argptr, *serial;
|
||||
|
||||
if (restrict_for_perl_test)
|
||||
{
|
||||
fprintf(outfile, "** #-commands are not allowed after #perltest\n");
|
||||
return PR_ABEND;
|
||||
}
|
||||
|
||||
if (strncmp((char *)buffer, "#forbid_utf", 11) == 0 && isspace(buffer[11]))
|
||||
cmd = CMD_UNKNOWN;
|
||||
cmdlen = 0;
|
||||
|
||||
for (i = 0; i < cmdlistcount; i++)
|
||||
{
|
||||
forbid_utf = PCRE2_NEVER_UTF|PCRE2_NEVER_UCP;
|
||||
cmdname = cmdlist[i].name;
|
||||
cmdlen = strlen(cmdname);
|
||||
if (strncmp((char *)(buffer+1), cmdname, cmdlen) == 0 &&
|
||||
isspace(buffer[cmdlen+1]))
|
||||
{
|
||||
cmd = cmdlist[i].value;
|
||||
break;
|
||||
}
|
||||
}
|
||||
else if (strncmp((char *)buffer, "#pattern", 8) == 0 && isspace(buffer[8]))
|
||||
|
||||
argptr = buffer + cmdlen + 1;
|
||||
|
||||
switch(cmd)
|
||||
{
|
||||
(void)decode_modifiers(buffer + 8, CTX_DEFPAT, &def_patctl, NULL);
|
||||
case CMD_UNKNOWN:
|
||||
fprintf(outfile, "** Unknown command: %s", buffer);
|
||||
break;
|
||||
|
||||
case CMD_FORBID_UTF:
|
||||
forbid_utf = PCRE2_NEVER_UTF|PCRE2_NEVER_UCP;
|
||||
break;
|
||||
|
||||
case CMD_PERLTEST:
|
||||
restrict_for_perl_test = TRUE;
|
||||
break;
|
||||
|
||||
/* Set default pattern modifiers */
|
||||
|
||||
case CMD_PATTERN:
|
||||
(void)decode_modifiers(argptr, CTX_DEFPAT, &def_patctl, NULL);
|
||||
if (def_patctl.jit == 0 && (def_patctl.control & CTL_JITVERIFY) != 0)
|
||||
def_patctl.jit = 7;
|
||||
}
|
||||
else if (strncmp((char *)buffer, "#perltest", 9) == 0 && isspace(buffer[9]))
|
||||
{
|
||||
restrict_for_perl_test = TRUE;
|
||||
}
|
||||
else if (strncmp((char *)buffer, "#subject", 8) == 0 && isspace(buffer[8]))
|
||||
{
|
||||
(void)decode_modifiers(buffer + 8, CTX_DEFDAT, NULL, &def_datctl);
|
||||
}
|
||||
else
|
||||
{
|
||||
fprintf(outfile, "** Unknown command: %s", buffer);
|
||||
break;
|
||||
|
||||
/* Set default subject modifiers */
|
||||
|
||||
case CMD_SUBJECT:
|
||||
(void)decode_modifiers(argptr, CTX_DEFDAT, NULL, &def_datctl);
|
||||
break;
|
||||
|
||||
/* Pop a compiled pattern off the stack. Modifiers that do not affect the
|
||||
compiled pattern (e.g. to give information) are permitted. The default
|
||||
pattern modifiers are ignored. */
|
||||
|
||||
case CMD_POP:
|
||||
if (patstacknext <= 0)
|
||||
{
|
||||
fprintf(outfile, "** Can't pop off an empty stack\n");
|
||||
return PR_SKIP;
|
||||
}
|
||||
memset(&pat_patctl, 0, sizeof(patctl)); /* Completely unset */
|
||||
if (!decode_modifiers(argptr, CTX_POPPAT, &pat_patctl, NULL))
|
||||
return PR_SKIP;
|
||||
SET(compiled_code, patstack[--patstacknext]);
|
||||
if (pat_patctl.jit != 0)
|
||||
{
|
||||
PCRE2_JIT_COMPILE(compiled_code, pat_patctl.jit);
|
||||
}
|
||||
if ((pat_patctl.control & CTL_MEMORY) != 0) show_memory_info();
|
||||
if ((pat_patctl.control & CTL_ANYINFO) != 0)
|
||||
{
|
||||
rc = show_pattern_info();
|
||||
if (rc != PR_OK) return rc;
|
||||
}
|
||||
break;
|
||||
|
||||
/* Save the stack of compiled patterns to a file, then empty the stack. */
|
||||
|
||||
case CMD_SAVE:
|
||||
if (patstacknext <= 0)
|
||||
{
|
||||
fprintf(outfile, "** No stacked patterns to save\n");
|
||||
return PR_OK;
|
||||
}
|
||||
|
||||
rc = open_file(argptr+1, OUTPUT_MODE, &f);
|
||||
if (rc != PR_OK) return rc;
|
||||
|
||||
PCRE2_SERIALIZE_ENCODE(rc, patstack, patstacknext, &serial, &serial_size,
|
||||
general_context);
|
||||
if (rc < 0)
|
||||
{
|
||||
serial_error(rc, "Serialization");
|
||||
break;
|
||||
}
|
||||
|
||||
/* Write the length at the start of the file to make it straightforward to
|
||||
get the right memory when re-loading. This saves having to read the file size
|
||||
in different operating systems. To allow for different endianness (even
|
||||
though reloading with the opposite endianness does not work), write the
|
||||
length byte-by-byte. */
|
||||
|
||||
for (i = 0; i < 4; i++) fputc((serial_size >> (i*8)) & 255, f);
|
||||
if (fwrite(serial, 1, serial_size, f) != serial_size)
|
||||
{
|
||||
fprintf(outfile, "** Wrong return from fwrite()\n");
|
||||
return PR_ABEND;
|
||||
}
|
||||
|
||||
fclose(f);
|
||||
PCRE2_SERIALIZE_FREE(serial);
|
||||
while(patstacknext > 0)
|
||||
{
|
||||
SET(compiled_code, patstack[--patstacknext]);
|
||||
SUB1(pcre2_code_free, compiled_code);
|
||||
}
|
||||
SET(compiled_code, NULL);
|
||||
break;
|
||||
|
||||
/* Load a set of compiled patterns from a file onto the stack */
|
||||
|
||||
case CMD_LOAD:
|
||||
rc = open_file(argptr+1, INPUT_MODE, &f);
|
||||
if (rc != PR_OK) return rc;
|
||||
|
||||
serial_size = 0;
|
||||
for (i = 0; i < 4; i++) serial_size |= (PCRE2_SIZE)fgetc(f) << (i*8);
|
||||
|
||||
serial = malloc(serial_size);
|
||||
if (serial == NULL)
|
||||
{
|
||||
fprintf(outfile, "** Failed to get memory (size %ld) for #load\n",
|
||||
(long)serial_size);
|
||||
return PR_ABEND;
|
||||
}
|
||||
|
||||
if (fread(serial, 1, serial_size, f) != serial_size)
|
||||
{
|
||||
fprintf(outfile, "** Wrong return from fread()\n");
|
||||
return PR_ABEND;
|
||||
}
|
||||
fclose(f);
|
||||
|
||||
PCRE2_SERIALIZE_GET_NUMBER_OF_CODES(rc, serial);
|
||||
if (rc < 0) serial_error(rc, "Get number of codes"); else
|
||||
{
|
||||
if (rc + patstacknext > PATSTACKSIZE)
|
||||
{
|
||||
fprintf(outfile, "** Not enough space on pattern stack for %d pattern%s\n",
|
||||
rc, (rc == 1)? "" : "s");
|
||||
rc = PATSTACKSIZE - patstacknext;
|
||||
fprintf(outfile, "** Decoding %d pattern%s\n", rc,
|
||||
(rc == 1)? "" : "s");
|
||||
}
|
||||
PCRE2_SERIALIZE_DECODE(rc, patstack + patstacknext, rc, serial,
|
||||
general_context);
|
||||
if (rc < 0) serial_error(rc, "Deserialization");
|
||||
else patstacknext += rc;
|
||||
}
|
||||
|
||||
free(serial);
|
||||
break;
|
||||
}
|
||||
|
||||
return PR_OK;
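The #save and #load cases above store the serialized size as four bytes, least significant first, ahead of the data. The sketch below isolates that length prefix as a pair of standalone functions. The names write_length and read_length are hypothetical, and the explicit EOF checks are an addition for illustration; they are not present in the code in this commit.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical helper: write a 32-bit length as four bytes, least
significant byte first. Returns 0 on success, -1 on a write error. */

int
write_length(FILE *f, uint32_t length)
{
int i;
for (i = 0; i < 4; i++)
  if (fputc((int)((length >> (i*8)) & 255u), f) == EOF) return -1;
return 0;
}

/* Hypothetical helper: read the four bytes back in the same order.
Returns 0 on success, -1 if the file ends early. */

int
read_length(FILE *f, uint32_t *length)
{
int i;
uint32_t value = 0;
for (i = 0; i < 4; i++)
  {
  int c = fgetc(f);
  if (c == EOF) return -1;
  value |= (uint32_t)c << (i*8);   /* Widen before shifting */
  }
*length = value;
return 0;
}
```

Because the bytes are written one at a time, the prefix reads back the same way whatever the byte order of the host, even though the serialized data that follows it is not portable across endianness.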
|
||||
|
@ -3750,6 +4123,14 @@ if (pat_patctl.jit == 0 &&
|
|||
(pat_patctl.control & (CTL_JITVERIFY|CTL_JITFAST)) != 0)
|
||||
pat_patctl.jit = 7;
|
||||
|
||||
/* POSIX and 'push' do not play together. */
|
||||
|
||||
if ((pat_patctl.control & (CTL_POSIX|CTL_PUSH)) == (CTL_POSIX|CTL_PUSH))
|
||||
{
|
||||
fprintf(outfile, "** The POSIX interface is incompatible with 'push'\n");
|
||||
return PR_ABEND;
|
||||
}
|
||||
|
||||
/* Now copy the pattern to pbuffer8 for use in 8-bit testing and for reflecting
|
||||
in callouts. Convert to binary if required. */
|
||||
|
||||
|
@ -3897,8 +4278,31 @@ if ((pat_patctl.control & CTL_POSIX) != 0)
|
|||
#endif /* SUPPORT_PCRE2_8 */
|
||||
}
|
||||
|
||||
/* Handle compiling via the native interface, converting the input in non-8-bit
|
||||
modes. */
|
||||
/* Handle compiling via the native interface. Controls that act later are
|
||||
ignored with "push". Replacements are locked out. */
|
||||
|
||||
if ((pat_patctl.control & CTL_PUSH) != 0)
|
||||
{
|
||||
if (pat_patctl.replacement[0] != 0)
|
||||
{
|
||||
fprintf(outfile, "** Replacement text is not supported with 'push'.\n");
|
||||
return PR_OK;
|
||||
}
|
||||
if ((pat_patctl.control & ~PUSH_SUPPORTED_COMPILE_CONTROLS) != 0)
|
||||
{
|
||||
show_controls(pat_patctl.control & ~PUSH_SUPPORTED_COMPILE_CONTROLS,
|
||||
"** Ignored when compiled pattern is stacked with 'push':");
|
||||
fprintf(outfile, "\n");
|
||||
}
|
||||
if ((pat_patctl.control & PUSH_COMPILE_ONLY_CONTROLS) != 0)
|
||||
{
|
||||
show_controls(pat_patctl.control & PUSH_COMPILE_ONLY_CONTROLS,
|
||||
"** Applies only to compile when pattern is stacked with 'push':");
|
||||
fprintf(outfile, "\n");
|
||||
}
|
||||
}
|
||||
|
||||
/* Convert the input in non-8-bit modes. */
|
||||
|
||||
#ifdef SUPPORT_PCRE2_8
|
||||
if (test_mode == PCRE8_MODE) errorcode = 0;
|
||||
|
@ -4017,39 +4421,27 @@ if (pat_patctl.jit != 0)
|
|||
|
||||
/* Output code size and other information if requested. */
|
||||
|
||||
if ((pat_patctl.control & CTL_MEMORY) != 0)
|
||||
{
|
||||
uint32_t name_count, name_entry_size;
|
||||
size_t size, cblock_size;
|
||||
|
||||
#ifdef SUPPORT_PCRE2_8
|
||||
if (test_mode == 8) cblock_size = sizeof(pcre2_real_code_8);
|
||||
#endif
|
||||
#ifdef SUPPORT_PCRE2_16
|
||||
if (test_mode == 16) cblock_size = sizeof(pcre2_real_code_16);
|
||||
#endif
|
||||
#ifdef SUPPORT_PCRE2_32
|
||||
if (test_mode == 32) cblock_size = sizeof(pcre2_real_code_32);
|
||||
#endif
|
||||
|
||||
(void)pattern_info(PCRE2_INFO_SIZE, &size, FALSE);
|
||||
(void)pattern_info(PCRE2_INFO_NAMECOUNT, &name_count, FALSE);
|
||||
(void)pattern_info(PCRE2_INFO_NAMEENTRYSIZE, &name_entry_size, FALSE);
|
||||
fprintf(outfile, "Memory allocation (code space): %d\n",
|
||||
(int)(size - name_count*name_entry_size*code_unit_size - cblock_size));
|
||||
if (pat_patctl.jit != 0)
|
||||
{
|
||||
(void)pattern_info(PCRE2_INFO_JITSIZE, &size, FALSE);
|
||||
fprintf(outfile, "Memory allocation (JIT code): %d\n", (int)size);
|
||||
}
|
||||
}
|
||||
|
||||
if ((pat_patctl.control & CTL_MEMORY) != 0) show_memory_info();
|
||||
if ((pat_patctl.control & CTL_ANYINFO) != 0)
|
||||
{
|
||||
int rc = show_pattern_info();
|
||||
if (rc != PR_OK) return rc;
|
||||
}
|
||||
|
||||
/* The "push" control requests that the compiled pattern be remembered on a
|
||||
stack. This is mainly for testing the serialization functionality. */
|
||||
|
||||
if ((pat_patctl.control & CTL_PUSH) != 0)
|
||||
{
|
||||
if (patstacknext >= PATSTACKSIZE)
|
||||
{
|
||||
fprintf(outfile, "** Too many pushed patterns (max %d)\n", PATSTACKSIZE);
|
||||
return PR_ABEND;
|
||||
}
|
||||
patstack[patstacknext++] = PTR(compiled_code);
|
||||
SET(compiled_code, NULL);
|
||||
}
|
||||
|
||||
return PR_OK;
|
||||
}
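The 'push' control and the #pop command treat patstack as a fixed-size stack indexed by patstacknext, with an explicit check for overflow on push and for an empty stack on pop. A minimal sketch of that discipline, separated from pcre2test, is shown below. The type ptr_stack, the functions stack_push and stack_pop, and the value of STACK_SIZE are all hypothetical; the actual value of PATSTACKSIZE does not appear in this part of the diff.

```c
#include <stddef.h>

#define STACK_SIZE 20      /* Illustrative only; not PATSTACKSIZE's real value */

typedef struct {
  void *items[STACK_SIZE];
  int next;                /* Index of the next free slot */
} ptr_stack;

/* Push a pointer. Returns 0 on success, -1 if the stack is full. */

int
stack_push(ptr_stack *s, void *item)
{
if (s->next >= STACK_SIZE) return -1;
s->items[s->next++] = item;
return 0;
}

/* Pop the most recently pushed pointer, or NULL if the stack is empty. */

void *
stack_pop(ptr_stack *s)
{
if (s->next <= 0) return NULL;
return s->items[--s->next];
}
```

Patterns come off in the reverse of the order in which they were pushed, which is why the tests in testinput19 pop the most recently loaded pattern first.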
|
||||
|
||||
|
@ -6253,7 +6645,7 @@ if (argc > 1 && strcmp(argv[op], "-") != 0)
|
|||
infile = fopen(argv[op], INPUT_MODE);
|
||||
if (infile == NULL)
|
||||
{
|
||||
printf("** Failed to open %s\n", argv[op]);
|
||||
printf("** Failed to open '%s'\n", argv[op]);
|
||||
yield = 1;
|
||||
goto EXIT;
|
||||
}
|
||||
|
@ -6264,7 +6656,7 @@ if (argc > 2)
|
|||
outfile = fopen(argv[op+1], OUTPUT_MODE);
|
||||
if (outfile == NULL)
|
||||
{
|
||||
printf("** Failed to open %s\n", argv[op+1]);
|
||||
printf("** Failed to open '%s'\n", argv[op+1]);
|
||||
yield = 1;
|
||||
goto EXIT;
|
||||
}
|
||||
|
@ -6399,6 +6791,12 @@ free((void *)locale_tables);
|
|||
PCRE2_MATCH_DATA_FREE(match_data);
|
||||
SUB1(pcre2_code_free, compiled_code);
|
||||
|
||||
while(patstacknext-- > 0)
|
||||
{
|
||||
SET(compiled_code, patstack[patstacknext]);
|
||||
SUB1(pcre2_code_free, compiled_code);
|
||||
}
|
||||
|
||||
PCRE2_JIT_FREE_UNUSED_MEMORY(general_context);
|
||||
if (jit_stack != NULL)
|
||||
{
|
||||
|
|
|
@ -6,4 +6,4 @@
|
|||
|
||||
/a*/I
|
||||
|
||||
# End of testinput14
|
||||
# End of testinput15
|
||||
|
|
|
@ -161,7 +161,7 @@
|
|||
# match to happen via the interpreter, but for fast JIT invalid options are
|
||||
# ignored, so an unanchored match happens.
|
||||
|
||||
/abcd/jit
|
||||
/abcd/
|
||||
abcd\=anchored
|
||||
fail abcd\=anchored
|
||||
|
||||
|
@ -169,4 +169,21 @@
|
|||
abcd\=anchored
|
||||
succeed abcd\=anchored
|
||||
|
||||
# Push/pop does not lose the JIT information, though jitverify applies only to
|
||||
# compilation, but serializing (save/load) discards JIT data completely.
|
||||
|
||||
/^abc\Kdef/info,push
|
||||
#pop jitverify
|
||||
abcdef
|
||||
|
||||
/^abc\Kdef/info,push
|
||||
#save testsaved1
|
||||
#load testsaved1
|
||||
#pop jitverify
|
||||
abcdef
|
||||
|
||||
#load testsaved1
|
||||
#pop jit,jitverify
|
||||
abcdef
|
||||
|
||||
# End of testinput16
|
||||
|
|
|
@ -0,0 +1,62 @@
# This set of tests exercises the serialization/deserialization functions in
# the library. It does not use UTF or JIT.

#forbid_utf

# Compile several patterns, push them onto the stack, and then write them
# all to a file.

#pattern push

/(?<NAME>(?&NAME_PAT))\s+(?<ADDR>(?&ADDRESS_PAT))
(?(DEFINE)
(?<NAME_PAT>[a-z]+)
(?<ADDRESS_PAT>\d+)
)/x
/^(?:((.)(?1)\2|)|((.)(?3)\4|.))$/i

#save testsaved1

# Do it again for some more patterns.

/(*MARK:A)(*SKIP:B)(C|X)/mark
/(?:(?<n>foo)|(?<n>bar))\k<n>/dupnames

#save testsaved2
#pattern -push

# Reload the patterns, then pop them one by one and check them.

#load testsaved1
#load testsaved2

#pop info
foofoo
barbar

#pop mark
C
D

#pop
AmanaplanacanalPanama

#pop info
metcalfe 33

# Check for an error when different tables are used.

/abc/push,tables=1
/xyz/push,tables=2
#save testsaved1

#pop
xyz

#pop
abc

#pop should give an error
pqr

# End of testinput19
|
|
@ -14,4 +14,4 @@ Capturing subpattern count = 0
|
|||
May match empty string
|
||||
Subject length lower bound = 0
|
||||
|
||||
# End of testinput14
|
||||
# End of testinput15
|
||||
|
|
|
@ -310,7 +310,7 @@ Failed: error -46: JIT stack limit reached
|
|||
# match to happen via the interpreter, but for fast JIT invalid options are
|
||||
# ignored, so an unanchored match happens.
|
||||
|
||||
/abcd/jit
|
||||
/abcd/
|
||||
abcd\=anchored
|
||||
0: abcd
|
||||
fail abcd\=anchored
|
||||
|
@ -322,4 +322,36 @@ No match
|
|||
succeed abcd\=anchored
|
||||
0: abcd (JIT)
|
||||
|
||||
# Push/pop does not lose the JIT information, though jitverify applies only to
|
||||
# compilation, but serializing (save/load) discards JIT data completely.
|
||||
|
||||
/^abc\Kdef/info,push
|
||||
** Applies only to compile when pattern is stacked with 'push': jitverify
|
||||
Capturing subpattern count = 0
|
||||
Compile options: <none>
|
||||
Overall options: anchored
|
||||
Subject length lower bound = 6
|
||||
JIT compilation was successful
|
||||
#pop jitverify
|
||||
abcdef
|
||||
0: def (JIT)
|
||||
|
||||
/^abc\Kdef/info,push
|
||||
** Applies only to compile when pattern is stacked with 'push': jitverify
|
||||
Capturing subpattern count = 0
|
||||
Compile options: <none>
|
||||
Overall options: anchored
|
||||
Subject length lower bound = 6
|
||||
JIT compilation was successful
|
||||
#save testsaved1
|
||||
#load testsaved1
|
||||
#pop jitverify
|
||||
abcdef
|
||||
0: def
|
||||
|
||||
#load testsaved1
|
||||
#pop jit,jitverify
|
||||
abcdef
|
||||
0: def (JIT)
|
||||
|
||||
# End of testinput16
|
||||
|
|
|
@ -0,0 +1,100 @@
# This set of tests exercises the serialization/deserialization functions in
# the library. It does not use UTF or JIT.

#forbid_utf

# Compile several patterns, push them onto the stack, and then write them
# all to a file.

#pattern push

/(?<NAME>(?&NAME_PAT))\s+(?<ADDR>(?&ADDRESS_PAT))
(?(DEFINE)
(?<NAME_PAT>[a-z]+)
(?<ADDRESS_PAT>\d+)
)/x
/^(?:((.)(?1)\2|)|((.)(?3)\4|.))$/i

#save testsaved1

# Do it again for some more patterns.

/(*MARK:A)(*SKIP:B)(C|X)/mark
** Ignored when compiled pattern is stacked with 'push': mark
/(?:(?<n>foo)|(?<n>bar))\k<n>/dupnames

#save testsaved2
#pattern -push

# Reload the patterns, then pop them one by one and check them.

#load testsaved1
#load testsaved2

#pop info
Capturing subpattern count = 2
Max back reference = 2
Named capturing subpatterns:
n 1
n 2
Options: dupnames
Starting code units: b f
Subject length lower bound = 6
foofoo
0: foofoo
1: foo
barbar
0: barbar
1: <unset>
2: bar

#pop mark
C
0: C
1: C
MK: A
D
No match, mark = A

#pop
AmanaplanacanalPanama
0: AmanaplanacanalPanama
1: <unset>
2: <unset>
3: AmanaplanacanalPanama
4: A

#pop info
Capturing subpattern count = 4
Named capturing subpatterns:
ADDR 2
ADDRESS_PAT 4
NAME 1
NAME_PAT 3
Options: extended
Subject length lower bound = 3
metcalfe 33
0: metcalfe 33
1: metcalfe
2: 33

# Check for an error when different tables are used.

/abc/push,tables=1
/xyz/push,tables=2
#save testsaved1
Serialization failed: error -30: patterns do not all use the same character tables

#pop
xyz
0: xyz

#pop
abc
0: abc

#pop should give an error
** Can't pop off an empty stack
pqr

# End of testinput19
|