Add serialization functions and tests with updated pcre2test. Fix
PCRE2_INFO_SIZE issues.
parent d4daaf966d
commit 5438fc8a6a
22
ChangeLog
|
@ -1,8 +1,8 @@
|
||||||
Change Log for PCRE2
|
Change Log for PCRE2
|
||||||
--------------------
|
--------------------
|
||||||
|
|
||||||
Version 10.10 13-January-2015
|
Version 10.10 xx-xxx-2015
|
||||||
-----------------------------
|
-------------------------
|
||||||
|
|
||||||
1. When a pattern is compiled, it remembers the highest back reference so that
|
1. When a pattern is compiled, it remembers the highest back reference so that
|
||||||
when matching, if the ovector is too small, extra memory can be obtained to
|
when matching, if the ovector is too small, extra memory can be obtained to
|
||||||
|
@ -16,6 +16,19 @@ bug was that the condition was always treated as FALSE when the capture could
|
||||||
not be consulted, leading to an incorrect behaviour by pcre2_match(). This bug
|
not be consulted, leading to an incorrect behaviour by pcre2_match(). This bug
|
||||||
has been fixed.
|
has been fixed.
|
||||||
|
|
||||||
|
2. Functions for serialization and deserialization of sets of compiled patterns
|
||||||
|
have been added.
|
||||||
|
|
||||||
|
3. The value that is returned by PCRE2_INFO_SIZE has been corrected to remove
|
||||||
|
excess code units at the end of the data block that may occasionally occur if
|
||||||
|
the code for calculating the size over-estimates. This change stops the
|
||||||
|
serialization code copying uninitialized data, to which valgrind objects. The
|
||||||
|
documentation of PCRE2_INFO_SIZE was incorrect in stating that the size did not
|
||||||
|
include the general overhead. This has been corrected.
|
||||||
|
|
||||||
|
4. All code units in every slot in the table of group names are now set, again
|
||||||
|
in order to avoid accessing uninitialized data when serializing.
|
||||||
|
|
||||||
|
|
||||||
Version 10.00 05-January-2015
|
Version 10.00 05-January-2015
|
||||||
-----------------------------
|
-----------------------------
|
||||||
|
@ -30,8 +43,9 @@ logged. In addition to the API changes, the following changes were made. They
|
||||||
are either new functionality, or bug fixes and other noticeable changes of
|
are either new functionality, or bug fixes and other noticeable changes of
|
||||||
behaviour that were implemented after the code had been forked.
|
behaviour that were implemented after the code had been forked.
|
||||||
|
|
||||||
1. Unicode support is now enabled by default, but it can optionally be
|
1. Including Unicode support at build time is now enabled by default, but it
|
||||||
disabled.
|
can optionally be disabled. It is not enabled by default at run time (no
|
||||||
|
change).
|
||||||
|
|
||||||
2. The test program, now called pcre2test, was re-specified and almost
|
2. The test program, now called pcre2test, was re-specified and almost
|
||||||
completely re-written. Its input is not compatible with input for pcretest.
|
completely re-written. Its input is not compatible with input for pcretest.
|
||||||
|
|
13
Makefile.am
|
@ -54,6 +54,10 @@ dist_html_DATA = \
|
||||||
doc/html/pcre2_match_data_create_from_pattern.html \
|
doc/html/pcre2_match_data_create_from_pattern.html \
|
||||||
doc/html/pcre2_match_data_free.html \
|
doc/html/pcre2_match_data_free.html \
|
||||||
doc/html/pcre2_pattern_info.html \
|
doc/html/pcre2_pattern_info.html \
|
||||||
|
doc/html/pcre2_serialize_decode.html \
|
||||||
|
doc/html/pcre2_serialize_encode.html \
|
||||||
|
doc/html/pcre2_serialize_free.html \
|
||||||
|
doc/html/pcre2_serialize_get_number_of_codes.html \
|
||||||
doc/html/pcre2_set_bsr.html \
|
doc/html/pcre2_set_bsr.html \
|
||||||
doc/html/pcre2_set_callout.html \
|
doc/html/pcre2_set_callout.html \
|
||||||
doc/html/pcre2_set_character_tables.html \
|
doc/html/pcre2_set_character_tables.html \
|
||||||
|
@ -89,6 +93,7 @@ dist_html_DATA = \
|
||||||
doc/html/pcre2perform.html \
|
doc/html/pcre2perform.html \
|
||||||
doc/html/pcre2posix.html \
|
doc/html/pcre2posix.html \
|
||||||
doc/html/pcre2sample.html \
|
doc/html/pcre2sample.html \
|
||||||
|
doc/html/pcre2serialize.html \
|
||||||
doc/html/pcre2stack.html \
|
doc/html/pcre2stack.html \
|
||||||
doc/html/pcre2syntax.html \
|
doc/html/pcre2syntax.html \
|
||||||
doc/html/pcre2test.html \
|
doc/html/pcre2test.html \
|
||||||
|
@ -127,6 +132,10 @@ dist_man_MANS = \
|
||||||
doc/pcre2_match_data_create_from_pattern.3 \
|
doc/pcre2_match_data_create_from_pattern.3 \
|
||||||
doc/pcre2_match_data_free.3 \
|
doc/pcre2_match_data_free.3 \
|
||||||
doc/pcre2_pattern_info.3 \
|
doc/pcre2_pattern_info.3 \
|
||||||
|
doc/pcre2_serialize_decode.3 \
|
||||||
|
doc/pcre2_serialize_encode.3 \
|
||||||
|
doc/pcre2_serialize_free.3 \
|
||||||
|
doc/pcre2_serialize_get_number_of_codes.3 \
|
||||||
doc/pcre2_set_bsr.3 \
|
doc/pcre2_set_bsr.3 \
|
||||||
doc/pcre2_set_callout.3 \
|
doc/pcre2_set_callout.3 \
|
||||||
doc/pcre2_set_character_tables.3 \
|
doc/pcre2_set_character_tables.3 \
|
||||||
|
@ -162,6 +171,7 @@ dist_man_MANS = \
|
||||||
doc/pcre2perform.3 \
|
doc/pcre2perform.3 \
|
||||||
doc/pcre2posix.3 \
|
doc/pcre2posix.3 \
|
||||||
doc/pcre2sample.3 \
|
doc/pcre2sample.3 \
|
||||||
|
doc/pcre2serialize.3 \
|
||||||
doc/pcre2stack.3 \
|
doc/pcre2stack.3 \
|
||||||
doc/pcre2syntax.3 \
|
doc/pcre2syntax.3 \
|
||||||
doc/pcre2test.1 \
|
doc/pcre2test.1 \
|
||||||
|
@ -316,6 +326,7 @@ COMMON_SOURCES = \
|
||||||
src/pcre2_newline.c \
|
src/pcre2_newline.c \
|
||||||
src/pcre2_ord2utf.c \
|
src/pcre2_ord2utf.c \
|
||||||
src/pcre2_pattern_info.c \
|
src/pcre2_pattern_info.c \
|
||||||
|
src/pcre2_serialize.c \
|
||||||
src/pcre2_string_utils.c \
|
src/pcre2_string_utils.c \
|
||||||
src/pcre2_study.c \
|
src/pcre2_study.c \
|
||||||
src/pcre2_substitute.c \
|
src/pcre2_substitute.c \
|
||||||
|
@ -573,6 +584,7 @@ EXTRA_DIST += \
|
||||||
testdata/testinput16 \
|
testdata/testinput16 \
|
||||||
testdata/testinput17 \
|
testdata/testinput17 \
|
||||||
testdata/testinput18 \
|
testdata/testinput18 \
|
||||||
|
testdata/testinput19 \
|
||||||
testdata/testinputEBC \
|
testdata/testinputEBC \
|
||||||
testdata/testoutput1 \
|
testdata/testoutput1 \
|
||||||
testdata/testoutput2 \
|
testdata/testoutput2 \
|
||||||
|
@ -598,6 +610,7 @@ EXTRA_DIST += \
|
||||||
testdata/testoutput16 \
|
testdata/testoutput16 \
|
||||||
testdata/testoutput17 \
|
testdata/testoutput17 \
|
||||||
testdata/testoutput18 \
|
testdata/testoutput18 \
|
||||||
|
testdata/testoutput19 \
|
||||||
testdata/testoutputEBC \
|
testdata/testoutputEBC \
|
||||||
perltest.sh
|
perltest.sh
|
||||||
|
|
||||||
|
|
|
@ -108,6 +108,7 @@ can skip ahead to the CMake section.
|
||||||
pcre2_newline.c
|
pcre2_newline.c
|
||||||
pcre2_ord2utf.c
|
pcre2_ord2utf.c
|
||||||
pcre2_pattern_info.c
|
pcre2_pattern_info.c
|
||||||
|
pcre2_serialize.c
|
||||||
pcre2_string_utils.c
|
pcre2_string_utils.c
|
||||||
pcre2_study.c
|
pcre2_study.c
|
||||||
pcre2_substitute.c
|
pcre2_substitute.c
|
||||||
|
@ -391,4 +392,4 @@ The site currently has ports for PCRE1 releases, but PCRE2 should follow in due
|
||||||
course.
|
course.
|
||||||
|
|
||||||
=============================
|
=============================
|
||||||
Last Updated: 05 January 2015
|
Last Updated: 19 January 2015
|
||||||
|
|
101
README
|
@ -527,11 +527,10 @@ Testing PCRE2
|
||||||
------------
|
------------
|
||||||
|
|
||||||
To test the basic PCRE2 library on a Unix-like system, run the RunTest script.
|
To test the basic PCRE2 library on a Unix-like system, run the RunTest script.
|
||||||
There is another script called RunGrepTest that tests the options of the
|
There is another script called RunGrepTest that tests the pcre2grep command.
|
||||||
pcre2grep command. When JIT support is enabled, a third test program called
|
When JIT support is enabled, a third test program called pcre2_jit_test is
|
||||||
pcre2_jit_test is built. Both the scripts and all the program tests are run if
|
built. Both the scripts and all the program tests are run if you obey "make
|
||||||
you obey "make check". For other environments, see the instructions in
|
check". For other environments, see the instructions in NON-AUTOTOOLS-BUILD.
|
||||||
NON-AUTOTOOLS-BUILD.
|
|
||||||
|
|
||||||
The RunTest script runs the pcre2test test program (which is documented in its
|
The RunTest script runs the pcre2test test program (which is documented in its
|
||||||
own man page) on each of the relevant testinput files in the testdata
|
own man page) on each of the relevant testinput files in the testdata
|
||||||
|
@ -544,9 +543,9 @@ Some tests are relevant only when certain build-time options were selected. For
|
||||||
example, the tests for UTF-8/16/32 features are run only when Unicode support
|
example, the tests for UTF-8/16/32 features are run only when Unicode support
|
||||||
is available. RunTest outputs a comment when it skips a test.
|
is available. RunTest outputs a comment when it skips a test.
|
||||||
|
|
||||||
Many of the tests that are not skipped are run twice if JIT support is
|
Many (but not all) of the tests that are not skipped are run twice if JIT
|
||||||
available. On the second run, JIT compilation is forced. This testing can be
|
support is available. On the second run, JIT compilation is forced. This
|
||||||
suppressed by putting "nojit" on the RunTest command line.
|
testing can be suppressed by putting "nojit" on the RunTest command line.
|
||||||
|
|
||||||
The entire set of tests is run once for each of the 8-bit, 16-bit and 32-bit
|
The entire set of tests is run once for each of the 8-bit, 16-bit and 32-bit
|
||||||
libraries that are enabled. If you want to run just one set of tests, call
|
libraries that are enabled. If you want to run just one set of tests, call
|
||||||
|
@ -570,14 +569,20 @@ in numerical order.
|
||||||
You can also call RunTest with the single argument "list" to cause it to output
|
You can also call RunTest with the single argument "list" to cause it to output
|
||||||
a list of tests.
|
a list of tests.
|
||||||
|
|
||||||
The first two tests can always be run, as they expect only plain text strings
|
The test sequence starts with "test 0", which is a special test that has no
|
||||||
(not UTF) and make no use of Unicode properties. The first test file can be fed
|
input file, and whose output is not checked. This is because it will be
|
||||||
|
different on different hardware and with different configurations. The test
|
||||||
|
exists in order to exercise some of pcre2test's code that would not otherwise
|
||||||
|
be run.
|
||||||
|
|
||||||
|
Tests 1 and 2 can always be run, as they expect only plain text strings (not
|
||||||
|
UTF) and make no use of Unicode properties. The first test file can be fed
|
||||||
directly into the perltest.sh script to check that Perl gives the same results.
|
directly into the perltest.sh script to check that Perl gives the same results.
|
||||||
The only difference you should see is in the first few lines, where the Perl
|
The only difference you should see is in the first few lines, where the Perl
|
||||||
version is given instead of the PCRE2 version. The second set of tests check
|
version is given instead of the PCRE2 version. The second set of tests check
|
||||||
auxiliary functions, error detection, and run-time flags that are specific to
|
auxiliary functions, error detection, and run-time flags that are specific to
|
||||||
PCRE2, as well as the POSIX wrapper API. It also uses the debugging flags to
|
PCRE2. It also uses the debugging flags to check some of the internals of
|
||||||
check some of the internals of pcre2_compile().
|
pcre2_compile().
|
||||||
|
|
||||||
If you build PCRE2 with a locale setting that is not the standard C locale, the
|
If you build PCRE2 with a locale setting that is not the standard C locale, the
|
||||||
character tables may be different (see next paragraph). In some cases, this may
|
character tables may be different (see next paragraph). In some cases, this may
|
||||||
|
@ -585,18 +590,17 @@ cause failures in the second set of tests. For example, in a locale where the
|
||||||
isprint() function yields TRUE for characters in the range 128-255, the use of
|
isprint() function yields TRUE for characters in the range 128-255, the use of
|
||||||
[:isascii:] inside a character class defines a different set of characters, and
|
[:isascii:] inside a character class defines a different set of characters, and
|
||||||
this shows up in this test as a difference in the compiled code, which is being
|
this shows up in this test as a difference in the compiled code, which is being
|
||||||
listed for checking. Where the comparison test output contains [\x00-\x7f] the
|
listed for checking. For example, where the comparison test output contains
|
||||||
test will contain [\x00-\xff], and similarly in some other cases. This is not a
|
[\x00-\x7f] the test might contain [\x00-\xff], and similarly in some other
|
||||||
bug in PCRE2.
|
cases. This is not a bug in PCRE2.
|
||||||
|
|
||||||
The third set of tests checks pcre2_maketables(), the facility for building a
|
Test 3 checks pcre2_maketables(), the facility for building a set of character
|
||||||
set of character tables for a specific locale and using them instead of the
|
tables for a specific locale and using them instead of the default tables. The
|
||||||
default tables. The script uses the "locale" command to check for the
|
script uses the "locale" command to check for the availability of the "fr_FR",
|
||||||
availability of the "fr_FR", "french", or "fr" locale, and uses the first one
|
"french", or "fr" locale, and uses the first one that it finds. If the "locale"
|
||||||
that it finds. If the "locale" command fails, or if its output doesn't include
|
command fails, or if its output doesn't include "fr_FR", "french", or "fr" in
|
||||||
"fr_FR", "french", or "fr" in the list of available locales, the third test
|
the list of available locales, the third test cannot be run, and a comment is
|
||||||
cannot be run, and a comment is output to say why. If running this test
|
output to say why. If running this test produces an error like this:
|
||||||
produces an error like this
|
|
||||||
|
|
||||||
** Failed to set locale "fr_FR"
|
** Failed to set locale "fr_FR"
|
||||||
|
|
||||||
|
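The README text for test 3 says that the script looks for the "fr_FR", "french", or "fr" locale and uses the first one it finds. A minimal sketch of that selection follows. It is not the code in RunTest: the helper name pick_french_locale is hypothetical, and it assumes the available locales arrive one per line, as "locale -a" prints them.

```shell
#!/bin/sh
# Illustrative sketch only: pick_french_locale is a hypothetical helper, not
# the code in RunTest. It reads locale names on standard input, one per line
# (the format printed by "locale -a"), and prints the first preferred name.
pick_french_locale() {
  available=$(cat)
  for candidate in fr_FR french fr; do
    # -x matches whole lines only, so "fr" does not match "fr_FR.UTF-8".
    if printf '%s\n' "$available" | grep -qx "$candidate"; then
      echo "$candidate"
      return 0
    fi
  done
  return 1
}

# Typical use, assuming a "locale" command is installed:
if chosen=$(locale -a 2>/dev/null | pick_french_locale); then
  echo "Test 3 would use locale $chosen"
else
  echo "Test 3 would be skipped: no French locale found"
fi
```

The preference order is fixed by the order of the names in the for loop, so "fr_FR" wins over "french" and "fr" even when all three are installed.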
@ -606,33 +610,37 @@ alternative output files for the third test, because three different versions
|
||||||
of the French locale have been encountered. The test passes if its output
|
of the French locale have been encountered. The test passes if its output
|
||||||
matches any one of them.
|
matches any one of them.
|
||||||
|
|
||||||
The fourth and fifth tests check UTF and Unicode property support, the fourth
|
Tests 4 and 5 check UTF and Unicode property support, test 4 being compatible
|
||||||
being compatible with the perltest.sh script, and the fifth checking
|
with the perltest.sh script, and test 5 checking PCRE2-specific things.
|
||||||
PCRE2-specific things.
|
|
||||||
|
|
||||||
The sixth and seventh tests check the pcre2_dfa_match() alternative matching
|
Tests 6 and 7 check the pcre2_dfa_match() alternative matching function, in
|
||||||
function, in non-UTF mode and UTF-mode with Unicode property support,
|
non-UTF mode and UTF-mode with Unicode property support, respectively.
|
||||||
respectively.
|
|
||||||
|
|
||||||
The eighth test checks some internal offsets and code size features; it is
|
Test 8 checks some internal offsets and code size features; it is run only when
|
||||||
run only when the default "link size" of 2 is set (in other cases the sizes
|
the default "link size" of 2 is set (in other cases the sizes change) and when
|
||||||
change) and when Unicode support is enabled.
|
Unicode support is enabled.
|
||||||
|
|
||||||
The ninth and tenth tests are run only in 8-bit mode, and the eleventh and
|
Tests 9 and 10 are run only in 8-bit mode, and tests 11 and 12 are run only in
|
||||||
twelfth tests are run only in 16-bit and 32-bit modes. These are tests that
|
16-bit and 32-bit modes. These are tests that generate different output in
|
||||||
generate different output in 8-bit mode. Each pair are for general cases and
|
8-bit mode. Each pair are for general cases and Unicode support, respectively.
|
||||||
Unicode support, respectively. The thirteenth test checks the handling of
|
Test 13 checks the handling of non-UTF characters greater than 255 by
|
||||||
non-UTF characters greater than 255 by pcre2_dfa_match() in 16-bit and 32-bit
|
pcre2_dfa_match() in 16-bit and 32-bit modes.
|
||||||
modes.
|
|
||||||
|
|
||||||
The fourteenth test is run only when JIT support is not available, and the
|
Test 14 contains a number of tests that must not be run with JIT. They check,
|
||||||
fifteenth test is run only when JIT support is available. They test some
|
among other non-JIT things, the match-limiting features of the interpretive
|
||||||
JIT-specific features such as information output from pcre2test about JIT
|
matcher.
|
||||||
compilation.
|
|
||||||
|
|
||||||
The sixteenth and seventeenth tests are run only in 8-bit mode. They check the
|
Test 15 is run only when JIT support is not available. It checks that an
|
||||||
POSIX interface to the 8-bit library, without and with Unicode support,
|
attempt to use JIT has the expected behaviour.
|
||||||
respectively.
|
|
||||||
|
Test 16 is run only when JIT support is available. It checks JIT complete and
|
||||||
|
partial modes, match-limiting under JIT, and other JIT-specific features.
|
||||||
|
|
||||||
|
Tests 17 and 18 are run only in 8-bit mode. They check the POSIX interface to
|
||||||
|
the 8-bit library, without and with Unicode support, respectively.
|
||||||
|
|
||||||
|
Test 19 checks the serialization functions by writing a set of compiled
|
||||||
|
patterns to a file, and then reloading and checking them.
|
||||||
|
|
||||||
|
|
||||||
Character tables
|
Character tables
|
||||||
|
@ -718,6 +726,7 @@ The distribution should contain the files listed below.
|
||||||
src/pcre2_newline.c )
|
src/pcre2_newline.c )
|
||||||
src/pcre2_ord2utf.c )
|
src/pcre2_ord2utf.c )
|
||||||
src/pcre2_pattern_info.c )
|
src/pcre2_pattern_info.c )
|
||||||
|
src/pcre2_serialize.c )
|
||||||
src/pcre2_string_utils.c )
|
src/pcre2_string_utils.c )
|
||||||
src/pcre2_study.c )
|
src/pcre2_study.c )
|
||||||
src/pcre2_substitute.c )
|
src/pcre2_substitute.c )
|
||||||
|
@ -816,4 +825,4 @@ The distribution should contain the files listed below.
|
||||||
Philip Hazel
|
Philip Hazel
|
||||||
Email local part: ph10
|
Email local part: ph10
|
||||||
Email domain: cam.ac.uk
|
Email domain: cam.ac.uk
|
||||||
Last updated: 05 January 2015
|
Last updated: 20 January 2015
|
||||||
|
|
17
RunTest
|
@ -65,6 +65,7 @@ title15="Test 15: JIT-specific features when JIT is not available"
|
||||||
title16="Test 16: JIT-specific features when JIT is available"
|
title16="Test 16: JIT-specific features when JIT is available"
|
||||||
title17="Test 17: Tests of the POSIX interface, excluding UTF/UCP"
|
title17="Test 17: Tests of the POSIX interface, excluding UTF/UCP"
|
||||||
title18="Test 18: Tests of the POSIX interface with UTF/UCP"
|
title18="Test 18: Tests of the POSIX interface with UTF/UCP"
|
||||||
|
title19="Test 19: Serialization tests"
|
||||||
maxtest=18
|
maxtest=18
|
||||||
|
|
||||||
if [ $# -eq 1 -a "$1" = "list" ]; then
|
if [ $# -eq 1 -a "$1" = "list" ]; then
|
||||||
|
@ -87,6 +88,7 @@ if [ $# -eq 1 -a "$1" = "list" ]; then
|
||||||
echo $title16
|
echo $title16
|
||||||
echo $title17
|
echo $title17
|
||||||
echo $title18
|
echo $title18
|
||||||
|
echo $title19
|
||||||
exit 0
|
exit 0
|
||||||
fi
|
fi
|
||||||
|
|
||||||
|
@ -207,6 +209,7 @@ do15=no
|
||||||
do16=no
|
do16=no
|
||||||
do17=no
|
do17=no
|
||||||
do18=no
|
do18=no
|
||||||
|
do19=no
|
||||||
|
|
||||||
while [ $# -gt 0 ] ; do
|
while [ $# -gt 0 ] ; do
|
||||||
case $1 in
|
case $1 in
|
||||||
|
@ -229,6 +232,7 @@ while [ $# -gt 0 ] ; do
|
||||||
16) do16=yes;;
|
16) do16=yes;;
|
||||||
17) do17=yes;;
|
17) do17=yes;;
|
||||||
18) do18=yes;;
|
18) do18=yes;;
|
||||||
|
19) do19=yes;;
|
||||||
-8) arg8=yes;;
|
-8) arg8=yes;;
|
||||||
-16) arg16=yes;;
|
-16) arg16=yes;;
|
||||||
-32) arg32=yes;;
|
-32) arg32=yes;;
|
||||||
|
@ -364,7 +368,7 @@ if [ $do0 = no -a $do1 = no -a $do2 = no -a $do3 = no -a \
|
||||||
$do4 = no -a $do5 = no -a $do6 = no -a $do7 = no -a \
|
$do4 = no -a $do5 = no -a $do6 = no -a $do7 = no -a \
|
||||||
$do8 = no -a $do9 = no -a $do10 = no -a $do11 = no -a \
|
$do8 = no -a $do9 = no -a $do10 = no -a $do11 = no -a \
|
||||||
$do12 = no -a $do13 = no -a $do14 = no -a $do15 = no -a \
|
$do12 = no -a $do13 = no -a $do14 = no -a $do15 = no -a \
|
||||||
$do16 = no -a $do17 = no -a $do18 = no \
|
$do16 = no -a $do17 = no -a $do18 = no -a $do19 = no \
|
||||||
]; then
|
]; then
|
||||||
do0=yes
|
do0=yes
|
||||||
do1=yes
|
do1=yes
|
||||||
|
@ -385,6 +389,7 @@ if [ $do0 = no -a $do1 = no -a $do2 = no -a $do3 = no -a \
|
||||||
do16=yes
|
do16=yes
|
||||||
do17=yes
|
do17=yes
|
||||||
do18=yes
|
do18=yes
|
||||||
|
do19=yes
|
||||||
fi
|
fi
|
||||||
|
|
||||||
# Handle any explicit skips at this stage, so that an argument list may consist
|
# Handle any explicit skips at this stage, so that an argument list may consist
|
||||||
|
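The hunks above thread do19 through RunTest's selection flags: each test number on the command line sets its flag, and if no flag was set, every test is selected. The same pattern can be sketched in reduced form. This is an illustration, not RunTest's code: the function name select_tests is hypothetical, and the range is cut down to tests 17 to 19 to keep it short.

```shell
#!/bin/sh
# Illustrative sketch only: select_tests is a hypothetical helper, and the
# range 17..19 is shortened from RunTest's real 0..19 for brevity.
select_tests() {
  do17=no; do18=no; do19=no

  # Mark each test number named in the arguments.
  for arg in "$@"; do
    case $arg in
      17) do17=yes;;
      18) do18=yes;;
      19) do19=yes;;
    esac
  done

  # If nothing was selected explicitly, select everything.
  if [ $do17 = no ] && [ $do18 = no ] && [ $do19 = no ]; then
    do17=yes; do18=yes; do19=yes
  fi

  # Report the selection as a space-separated list.
  selected=""
  for n in 17 18 19; do
    eval "flag=\$do$n"
    if [ "$flag" = yes ]; then
      selected="${selected:+$selected }$n"
    fi
  done
  echo "$selected"
}

select_tests        # no arguments: all three tests are selected
select_tests 19     # only the serialization test is selected
```

With this structure, adding a test means touching four places (the initial flag, the case arm, the "nothing selected" condition, and the default assignment), which matches the four RunTest hunks that mention do19.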
@ -721,10 +726,18 @@ for bmode in "$test8" "$test16" "$test32"; do
|
||||||
fi
|
fi
|
||||||
fi
|
fi
|
||||||
|
|
||||||
|
# Serialization tests
|
||||||
|
|
||||||
|
if [ $do19 = yes ] ; then
|
||||||
|
echo $title19
|
||||||
|
$sim $valgrind ./pcre2test -q $bmode $testdata/testinput19 testtry
|
||||||
|
checkresult $? 19 ""
|
||||||
|
fi
|
||||||
|
|
||||||
# End of loop for 8/16/32-bit tests
|
# End of loop for 8/16/32-bit tests
|
||||||
done
|
done
|
||||||
|
|
||||||
# Clean up local working files
|
# Clean up local working files
|
||||||
rm -f testSinput test3input test3output test3outputA test3outputB teststdout testtry
|
rm -f testSinput test3input testsaved1 testsaved2 test3output test3outputA test3outputB teststdout testtry
|
||||||
|
|
||||||
# End
|
# End
|
||||||
|
|
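The new serialization block in RunTest runs pcre2test on testinput19, writes the result to testtry, and passes the exit status to checkresult. The body of checkresult is not shown in this diff, so the following is only a sketch of what a check of this kind might do, under the assumption that it rejects a non-zero exit status and then compares the output with an expected file. The name compare_result and its arguments are hypothetical.

```shell
#!/bin/sh
# Illustrative sketch only: compare_result is a hypothetical stand-in for a
# result check; it is not RunTest's checkresult function.
# Arguments: exit status of the test program, expected-output file,
# actual-output file.
compare_result() {
  if [ "$1" -ne 0 ]; then
    echo "** test program failed with status $1"
    return 1
  fi
  if ! cmp -s "$2" "$3"; then
    echo "** output differs from $2"
    return 1
  fi
  echo "  OK"
}

# Demonstration with temporary files in place of real test data.
workdir=$(mktemp -d)
printf 'No match\n' > "$workdir/expected"
printf 'No match\n' > "$workdir/actual"
compare_result 0 "$workdir/expected" "$workdir/actual"
rm -rf "$workdir"
```

Checking the exit status before comparing files means that a crash is reported as a failure of the program and not as an output mismatch.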
|
@ -108,6 +108,7 @@ can skip ahead to the CMake section.
|
||||||
pcre2_newline.c
|
pcre2_newline.c
|
||||||
pcre2_ord2utf.c
|
pcre2_ord2utf.c
|
||||||
pcre2_pattern_info.c
|
pcre2_pattern_info.c
|
||||||
|
pcre2_serialize.c
|
||||||
pcre2_string_utils.c
|
pcre2_string_utils.c
|
||||||
pcre2_study.c
|
pcre2_study.c
|
||||||
pcre2_substitute.c
|
pcre2_substitute.c
|
||||||
|
@ -391,4 +392,4 @@ The site currently has ports for PCRE1 releases, but PCRE2 should follow in due
|
||||||
course.
|
course.
|
||||||
|
|
||||||
=============================
|
=============================
|
||||||
Last Updated: 05 January 2015
|
Last Updated: 19 January 2015
|
||||||
|
|
|
@ -527,11 +527,10 @@ Testing PCRE2
|
||||||
------------
|
------------
|
||||||
|
|
||||||
To test the basic PCRE2 library on a Unix-like system, run the RunTest script.
|
To test the basic PCRE2 library on a Unix-like system, run the RunTest script.
|
||||||
There is another script called RunGrepTest that tests the options of the
|
There is another script called RunGrepTest that tests the pcre2grep command.
|
||||||
pcre2grep command. When JIT support is enabled, a third test program called
|
When JIT support is enabled, a third test program called pcre2_jit_test is
|
||||||
pcre2_jit_test is built. Both the scripts and all the program tests are run if
|
built. Both the scripts and all the program tests are run if you obey "make
|
||||||
you obey "make check". For other environments, see the instructions in
|
check". For other environments, see the instructions in NON-AUTOTOOLS-BUILD.
|
||||||
NON-AUTOTOOLS-BUILD.
|
|
||||||
|
|
||||||
The RunTest script runs the pcre2test test program (which is documented in its
|
The RunTest script runs the pcre2test test program (which is documented in its
|
||||||
own man page) on each of the relevant testinput files in the testdata
|
own man page) on each of the relevant testinput files in the testdata
|
||||||
|
@ -544,9 +543,9 @@ Some tests are relevant only when certain build-time options were selected. For
|
||||||
example, the tests for UTF-8/16/32 features are run only when Unicode support
|
example, the tests for UTF-8/16/32 features are run only when Unicode support
|
||||||
is available. RunTest outputs a comment when it skips a test.
|
is available. RunTest outputs a comment when it skips a test.
|
||||||
|
|
||||||
Many of the tests that are not skipped are run twice if JIT support is
|
Many (but not all) of the tests that are not skipped are run twice if JIT
|
||||||
available. On the second run, JIT compilation is forced. This testing can be
|
support is available. On the second run, JIT compilation is forced. This
|
||||||
suppressed by putting "nojit" on the RunTest command line.
|
testing can be suppressed by putting "nojit" on the RunTest command line.
|
||||||
|
|
||||||
The entire set of tests is run once for each of the 8-bit, 16-bit and 32-bit
|
The entire set of tests is run once for each of the 8-bit, 16-bit and 32-bit
|
||||||
libraries that are enabled. If you want to run just one set of tests, call
|
libraries that are enabled. If you want to run just one set of tests, call
|
||||||
|
@ -570,14 +569,20 @@ in numerical order.
|
||||||
You can also call RunTest with the single argument "list" to cause it to output
|
You can also call RunTest with the single argument "list" to cause it to output
|
||||||
a list of tests.
|
a list of tests.
|
||||||
|
|
||||||
The first two tests can always be run, as they expect only plain text strings
|
The test sequence starts with "test 0", which is a special test that has no
|
||||||
(not UTF) and make no use of Unicode properties. The first test file can be fed
|
input file, and whose output is not checked. This is because it will be
|
||||||
|
different on different hardware and with different configurations. The test
|
||||||
|
exists in order to exercise some of pcre2test's code that would not otherwise
|
||||||
|
be run.
|
||||||
|
|
||||||
|
Tests 1 and 2 can always be run, as they expect only plain text strings (not
|
||||||
|
UTF) and make no use of Unicode properties. The first test file can be fed
|
||||||
directly into the perltest.sh script to check that Perl gives the same results.
|
directly into the perltest.sh script to check that Perl gives the same results.
|
||||||
The only difference you should see is in the first few lines, where the Perl
|
The only difference you should see is in the first few lines, where the Perl
|
||||||
version is given instead of the PCRE2 version. The second set of tests check
|
version is given instead of the PCRE2 version. The second set of tests check
|
||||||
auxiliary functions, error detection, and run-time flags that are specific to
|
auxiliary functions, error detection, and run-time flags that are specific to
|
||||||
PCRE2, as well as the POSIX wrapper API. It also uses the debugging flags to
|
PCRE2. It also uses the debugging flags to check some of the internals of
|
||||||
check some of the internals of pcre2_compile().
|
pcre2_compile().
|
||||||
|
|
||||||
If you build PCRE2 with a locale setting that is not the standard C locale, the
|
If you build PCRE2 with a locale setting that is not the standard C locale, the
|
||||||
character tables may be different (see next paragraph). In some cases, this may
|
character tables may be different (see next paragraph). In some cases, this may
|
||||||
|
@ -585,18 +590,17 @@ cause failures in the second set of tests. For example, in a locale where the
|
||||||
isprint() function yields TRUE for characters in the range 128-255, the use of
|
isprint() function yields TRUE for characters in the range 128-255, the use of
|
||||||
[:isascii:] inside a character class defines a different set of characters, and
|
[:isascii:] inside a character class defines a different set of characters, and
|
||||||
this shows up in this test as a difference in the compiled code, which is being
|
this shows up in this test as a difference in the compiled code, which is being
|
||||||
listed for checking. Where the comparison test output contains [\x00-\x7f] the
|
listed for checking. For example, where the comparison test output contains
|
||||||
test will contain [\x00-\xff], and similarly in some other cases. This is not a
|
[\x00-\x7f] the test might contain [\x00-\xff], and similarly in some other
|
||||||
bug in PCRE2.
|
cases. This is not a bug in PCRE2.
|
||||||
|
|
||||||
The third set of tests checks pcre2_maketables(), the facility for building a
|
Test 3 checks pcre2_maketables(), the facility for building a set of character
|
||||||
set of character tables for a specific locale and using them instead of the
|
tables for a specific locale and using them instead of the default tables. The
|
||||||
default tables. The script uses the "locale" command to check for the
|
script uses the "locale" command to check for the availability of the "fr_FR",
|
||||||
availability of the "fr_FR", "french", or "fr" locale, and uses the first one
|
"french", or "fr" locale, and uses the first one that it finds. If the "locale"
|
||||||
that it finds. If the "locale" command fails, or if its output doesn't include
|
command fails, or if its output doesn't include "fr_FR", "french", or "fr" in
|
||||||
"fr_FR", "french", or "fr" in the list of available locales, the third test
|
the list of available locales, the third test cannot be run, and a comment is
|
||||||
cannot be run, and a comment is output to say why. If running this test
|
output to say why. If running this test produces an error like this:
|
||||||
produces an error like this
|
|
||||||
|
|
||||||
** Failed to set locale "fr_FR"
|
** Failed to set locale "fr_FR"
|
||||||
|
|
||||||
|
@ -606,33 +610,37 @@ alternative output files for the third test, because three different versions
|
||||||
of the French locale have been encountered. The test passes if its output
|
of the French locale have been encountered. The test passes if its output
|
||||||
matches any one of them.
|
matches any one of them.
|
||||||
|
|
||||||
The fourth and fifth tests check UTF and Unicode property support, the fourth
|
Tests 4 and 5 check UTF and Unicode property support, test 4 being compatible
|
||||||
being compatible with the perltest.sh script, and the fifth checking
|
with the perltest.sh script, and test 5 checking PCRE2-specific things.
|
||||||
PCRE2-specific things.
|
|
||||||
|
|
||||||
The sixth and seventh tests check the pcre2_dfa_match() alternative matching
|
Tests 6 and 7 check the pcre2_dfa_match() alternative matching function, in
|
||||||
function, in non-UTF mode and UTF-mode with Unicode property support,
|
non-UTF mode and UTF-mode with Unicode property support, respectively.
|
||||||
respectively.
|
|
||||||
|
|
||||||
The eighth test checks some internal offsets and code size features; it is
|
Test 8 checks some internal offsets and code size features; it is run only when
|
||||||
run only when the default "link size" of 2 is set (in other cases the sizes
|
the default "link size" of 2 is set (in other cases the sizes change) and when
|
||||||
change) and when Unicode support is enabled.
|
Unicode support is enabled.
|
||||||
|
|
||||||
The ninth and tenth tests are run only in 8-bit mode, and the eleventh and
|
Tests 9 and 10 are run only in 8-bit mode, and tests 11 and 12 are run only in
|
||||||
twelfth tests are run only in 16-bit and 32-bit modes. These are tests that
|
16-bit and 32-bit modes. These are tests that generate different output in
|
||||||
generate different output in 8-bit mode. Each pair are for general cases and
|
8-bit mode. Each pair are for general cases and Unicode support, respectively.
|
||||||
Unicode support, respectively. The thirteenth test checks the handling of
|
Test 13 checks the handling of non-UTF characters greater than 255 by
|
||||||
non-UTF characters greater than 255 by pcre2_dfa_match() in 16-bit and 32-bit
|
pcre2_dfa_match() in 16-bit and 32-bit modes.
|
||||||
modes.
|
|
||||||
|
|
||||||
The fourteenth test is run only when JIT support is not available, and the
|
Test 14 contains a number of tests that must not be run with JIT. They check,
|
||||||
fifteenth test is run only when JIT support is available. They test some
|
among other non-JIT things, the match-limiting features of the interpretive
|
||||||
JIT-specific features such as information output from pcre2test about JIT
|
matcher.
|
||||||
compilation.
|
|
||||||
|
|
||||||
The sixteenth and seventeenth tests are run only in 8-bit mode. They check the
|
Test 15 is run only when JIT support is not available. It checks that an
|
||||||
POSIX interface to the 8-bit library, without and with Unicode support,
|
attempt to use JIT has the expected behaviour.
|
||||||
respectively.
|
|
||||||
|
Test 16 is run only when JIT support is available. It checks JIT complete and
|
||||||
|
partial modes, match-limiting under JIT, and other JIT-specific features.
|
||||||
|
|
||||||
|
Tests 17 and 18 are run only in 8-bit mode. They check the POSIX interface to
|
||||||
|
the 8-bit library, without and with Unicode support, respectively.
|
||||||
|
|
||||||
|
Test 19 checks the serialization functions by writing a set of compiled
|
||||||
|
patterns to a file, and then reloading and checking them.
|
||||||
|
|
||||||
|
|
||||||
Character tables
|
Character tables
|
||||||
|
@ -718,6 +726,7 @@ The distribution should contain the files listed below.
|
||||||
src/pcre2_newline.c )
|
src/pcre2_newline.c )
|
||||||
src/pcre2_ord2utf.c )
|
src/pcre2_ord2utf.c )
|
||||||
src/pcre2_pattern_info.c )
|
src/pcre2_pattern_info.c )
|
||||||
|
src/pcre2_serialize.c )
|
||||||
src/pcre2_string_utils.c )
|
src/pcre2_string_utils.c )
|
||||||
src/pcre2_study.c )
|
src/pcre2_study.c )
|
||||||
src/pcre2_substitute.c )
|
src/pcre2_substitute.c )
|
||||||
|
@ -816,4 +825,4 @@ The distribution should contain the files listed below.
|
||||||
Philip Hazel
|
Philip Hazel
|
||||||
Email local part: ph10
|
Email local part: ph10
|
||||||
Email domain: cam.ac.uk
|
Email domain: cam.ac.uk
|
||||||
Last updated: 05 January 2015
|
Last updated: 20 January 2015
|
||||||
|
|
|
@ -65,6 +65,9 @@ first.
|
||||||
<tr><td><a href="pcre2sample.html">pcre2sample</a></td>
|
<tr><td><a href="pcre2sample.html">pcre2sample</a></td>
|
||||||
<td> Discussion of the pcre2demo program</td></tr>
|
<td> Discussion of the pcre2demo program</td></tr>
|
||||||
|
|
||||||
|
<tr><td><a href="pcre2serialize.html">pcre2serialize</a></td>
|
||||||
|
<td> Serializing functions for saving precompiled patterns</td></tr>
|
||||||
|
|
||||||
<tr><td><a href="pcre2stack.html">pcre2stack</a></td>
|
<tr><td><a href="pcre2stack.html">pcre2stack</a></td>
|
||||||
<td> Discussion of PCRE2's stack usage</td></tr>
|
<td> Discussion of PCRE2's stack usage</td></tr>
|
||||||
|
|
||||||
|
@ -177,6 +180,18 @@ in the library.
|
||||||
<tr><td><a href="pcre2_pattern_info.html">pcre2_pattern_info</a></td>
|
<tr><td><a href="pcre2_pattern_info.html">pcre2_pattern_info</a></td>
|
||||||
<td> Extract information about a pattern</td></tr>
|
<td> Extract information about a pattern</td></tr>
|
||||||
|
|
||||||
|
<tr><td><a href="pcre2_serialize_decode.html">pcre2_serialize_decode</a></td>
|
||||||
|
<td> Decode serialized compiled patterns</td></tr>
|
||||||
|
|
||||||
|
<tr><td><a href="pcre2_serialize_encode.html">pcre2_serialize_encode</a></td>
|
||||||
|
<td> Serialize compiled patterns for save/restore</td></tr>
|
||||||
|
|
||||||
|
<tr><td><a href="pcre2_serialize_free.html">pcre2_serialize_free</a></td>
|
||||||
|
<td> Free serialized compiled patterns</td></tr>
|
||||||
|
|
||||||
|
<tr><td><a href="pcre2_serialize_get_number_of_codes.html">pcre2_serialize_get_number_of_codes</a></td>
|
||||||
|
<td> Get number of serialized compiled patterns</td></tr>
|
||||||
|
|
||||||
<tr><td><a href="pcre2_set_bsr.html">pcre2_set_bsr</a></td>
|
<tr><td><a href="pcre2_set_bsr.html">pcre2_set_bsr</a></td>
|
||||||
<td> Set \R convention</td></tr>
|
<td> Set \R convention</td></tr>
|
||||||
|
|
||||||
|
|
|
@ -0,0 +1,62 @@
|
||||||
|
<html>
|
||||||
|
<head>
|
||||||
|
<title>pcre2_serialize_decode specification</title>
|
||||||
|
</head>
|
||||||
|
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||||
|
<h1>pcre2_serialize_decode man page</h1>
|
||||||
|
<p>
|
||||||
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
This page is part of the PCRE2 HTML documentation. It was generated
|
||||||
|
automatically from the original man page. If there is any nonsense in it,
|
||||||
|
please consult the man page, in case the conversion went wrong.
|
||||||
|
<br>
|
||||||
|
<br><b>
|
||||||
|
SYNOPSIS
|
||||||
|
</b><br>
|
||||||
|
<P>
|
||||||
|
<b>#include &lt;pcre2.h&gt;</b>
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
<b>int32_t pcre2_serialize_decode(pcre2_code **<i>codes</i>,</b>
|
||||||
|
<b> int32_t <i>number_of_codes</i>, const uint8_t *<i>bytes</i>,</b>
|
||||||
|
<b> pcre2_general_context *<i>gcontext</i>);</b>
|
||||||
|
</P>
|
||||||
|
<br><b>
|
||||||
|
DESCRIPTION
|
||||||
|
</b><br>
|
||||||
|
<P>
|
||||||
|
This function decodes a serialized set of compiled patterns back into a list of
|
||||||
|
individual patterns. Its arguments are:
|
||||||
|
<pre>
|
||||||
|
<i>codes</i> pointer to a vector in which to build the list
|
||||||
|
<i>number_of_codes</i> number of slots in the vector
|
||||||
|
<i>bytes</i> the serialized byte stream
|
||||||
|
<i>gcontext</i> pointer to a general context or NULL
|
||||||
|
</pre>
|
||||||
|
The <i>bytes</i> argument must point to a block of data that was originally
|
||||||
|
created by <b>pcre2_serialize_encode()</b>, though it may have been saved on
|
||||||
|
disc or elsewhere in the meantime. If there are more codes in the serialized
|
||||||
|
data than slots in the list, only those compiled patterns that will fit are
|
||||||
|
decoded. The yield of the function is the number of decoded patterns, or one of
|
||||||
|
the following negative error codes:
|
||||||
|
<pre>
|
||||||
|
PCRE2_ERROR_BADDATA <i>number_of_codes</i> is zero or less
|
||||||
|
PCRE2_ERROR_BADMAGIC mismatch of id bytes in <i>bytes</i>
|
||||||
|
PCRE2_ERROR_BADMODE mismatch of variable unit size or PCRE version
|
||||||
|
PCRE2_ERROR_MEMORY memory allocation failed
|
||||||
|
PCRE2_ERROR_NULL <i>codes</i> or <i>bytes</i> is NULL
|
||||||
|
</pre>
|
||||||
|
PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that it was compiled
|
||||||
|
on a system with different endianness.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
There is a complete description of the PCRE2 native API in the
|
||||||
|
<a href="pcre2api.html"><b>pcre2api</b></a>
|
||||||
|
page and a description of the POSIX API in the
|
||||||
|
<a href="pcre2posix.html"><b>pcre2posix</b></a>
|
||||||
|
page.
|
||||||
|
<p>
|
||||||
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
</p>
|
|
@ -0,0 +1,61 @@
|
||||||
|
<html>
|
||||||
|
<head>
|
||||||
|
<title>pcre2_serialize_encode specification</title>
|
||||||
|
</head>
|
||||||
|
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||||
|
<h1>pcre2_serialize_encode man page</h1>
|
||||||
|
<p>
|
||||||
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
This page is part of the PCRE2 HTML documentation. It was generated
|
||||||
|
automatically from the original man page. If there is any nonsense in it,
|
||||||
|
please consult the man page, in case the conversion went wrong.
|
||||||
|
<br>
|
||||||
|
<br><b>
|
||||||
|
SYNOPSIS
|
||||||
|
</b><br>
|
||||||
|
<P>
|
||||||
|
<b>#include &lt;pcre2.h&gt;</b>
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
<b>int32_t pcre2_serialize_encode(pcre2_code **<i>codes</i>,</b>
|
||||||
|
<b> int32_t <i>number_of_codes</i>, uint8_t **<i>serialized_bytes</i>,</b>
|
||||||
|
<b> PCRE2_SIZE *<i>serialized_size</i>, pcre2_general_context *<i>gcontext</i>);</b>
|
||||||
|
</P>
|
||||||
|
<br><b>
|
||||||
|
DESCRIPTION
|
||||||
|
</b><br>
|
||||||
|
<P>
|
||||||
|
This function encodes a list of compiled patterns into a byte stream that can
|
||||||
|
be saved on disc or elsewhere. Its arguments are:
|
||||||
|
<pre>
|
||||||
|
<i>codes</i> pointer to a vector containing the list
|
||||||
|
<i>number_of_codes</i> number of slots in the vector
|
||||||
|
<i>serialized_bytes</i> set to point to the serialized byte stream
|
||||||
|
<i>serialized_size</i> set to the number of bytes in the byte stream
|
||||||
|
<i>gcontext</i> pointer to a general context or NULL
|
||||||
|
</pre>
|
||||||
|
The context argument is used to obtain memory for the byte stream. When the
|
||||||
|
serialized data is no longer needed, it must be freed by calling
|
||||||
|
<b>pcre2_serialize_free()</b>. The yield of the function is the number of
|
||||||
|
serialized patterns, or one of the following negative error codes:
|
||||||
|
<pre>
|
||||||
|
PCRE2_ERROR_BADDATA <i>number_of_codes</i> is zero or less
|
||||||
|
PCRE2_ERROR_BADMAGIC mismatch of id bytes in one of the patterns
|
||||||
|
PCRE2_ERROR_MEMORY memory allocation failed
|
||||||
|
PCRE2_ERROR_MIXEDTABLES the patterns do not all use the same tables
|
||||||
|
PCRE2_ERROR_NULL an argument other than <i>gcontext</i> is NULL
|
||||||
|
</pre>
|
||||||
|
PCRE2_ERROR_BADMAGIC means either that a pattern's code has been corrupted, or
|
||||||
|
that a slot in the vector does not point to a compiled pattern.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
There is a complete description of the PCRE2 native API in the
|
||||||
|
<a href="pcre2api.html"><b>pcre2api</b></a>
|
||||||
|
page and a description of the POSIX API in the
|
||||||
|
<a href="pcre2posix.html"><b>pcre2posix</b></a>
|
||||||
|
page.
|
||||||
|
<p>
|
||||||
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
</p>
|
|
@ -0,0 +1,40 @@
|
||||||
|
<html>
|
||||||
|
<head>
|
||||||
|
<title>pcre2_serialize_free specification</title>
|
||||||
|
</head>
|
||||||
|
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||||
|
<h1>pcre2_serialize_free man page</h1>
|
||||||
|
<p>
|
||||||
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
This page is part of the PCRE2 HTML documentation. It was generated
|
||||||
|
automatically from the original man page. If there is any nonsense in it,
|
||||||
|
please consult the man page, in case the conversion went wrong.
|
||||||
|
<br>
|
||||||
|
<br><b>
|
||||||
|
SYNOPSIS
|
||||||
|
</b><br>
|
||||||
|
<P>
|
||||||
|
<b>#include &lt;pcre2.h&gt;</b>
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
<b>void pcre2_serialize_free(uint8_t *<i>bytes</i>);</b>
|
||||||
|
</P>
|
||||||
|
<br><b>
|
||||||
|
DESCRIPTION
|
||||||
|
</b><br>
|
||||||
|
<P>
|
||||||
|
This function frees the memory that was obtained by
|
||||||
|
<b>pcre2_serialize_encode()</b> to hold a serialized byte stream. The argument
|
||||||
|
must point to such a byte stream.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
There is a complete description of the PCRE2 native API in the
|
||||||
|
<a href="pcre2api.html"><b>pcre2api</b></a>
|
||||||
|
page and a description of the POSIX API in the
|
||||||
|
<a href="pcre2posix.html"><b>pcre2posix</b></a>
|
||||||
|
page.
|
||||||
|
<p>
|
||||||
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
</p>
|
|
@ -0,0 +1,49 @@
|
||||||
|
<html>
|
||||||
|
<head>
|
||||||
|
<title>pcre2_serialize_get_number_of_codes specification</title>
|
||||||
|
</head>
|
||||||
|
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||||
|
<h1>pcre2_serialize_get_number_of_codes man page</h1>
|
||||||
|
<p>
|
||||||
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
This page is part of the PCRE2 HTML documentation. It was generated
|
||||||
|
automatically from the original man page. If there is any nonsense in it,
|
||||||
|
please consult the man page, in case the conversion went wrong.
|
||||||
|
<br>
|
||||||
|
<br><b>
|
||||||
|
SYNOPSIS
|
||||||
|
</b><br>
|
||||||
|
<P>
|
||||||
|
<b>#include &lt;pcre2.h&gt;</b>
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
<b>int32_t pcre2_serialize_get_number_of_codes(const uint8_t *<i>bytes</i>);</b>
|
||||||
|
</P>
|
||||||
|
<br><b>
|
||||||
|
DESCRIPTION
|
||||||
|
</b><br>
|
||||||
|
<P>
|
||||||
|
The <i>bytes</i> argument must point to a serialized byte stream that was
|
||||||
|
originally created by <b>pcre2_serialize_encode()</b> (though it may have been
|
||||||
|
saved on disc or elsewhere in the meantime). The function returns the number of
|
||||||
|
serialized patterns in the byte stream, or one of the following negative error
|
||||||
|
codes:
|
||||||
|
<pre>
|
||||||
|
PCRE2_ERROR_BADMAGIC mismatch of id bytes in <i>bytes</i>
|
||||||
|
PCRE2_ERROR_BADMODE mismatch of variable unit size or PCRE version
|
||||||
|
PCRE2_ERROR_NULL the argument is NULL
|
||||||
|
</pre>
|
||||||
|
PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that it was compiled
|
||||||
|
on a system with different endianness.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
There is a complete description of the PCRE2 native API in the
|
||||||
|
<a href="pcre2api.html"><b>pcre2api</b></a>
|
||||||
|
page and a description of the POSIX API in the
|
||||||
|
<a href="pcre2posix.html"><b>pcre2posix</b></a>
|
||||||
|
page.
|
||||||
|
<p>
|
||||||
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
</p>
|
|
@ -21,35 +21,37 @@ please consult the man page, in case the conversion went wrong.
|
||||||
<li><a name="TOC6" href="#SEC6">PCRE2 NATIVE API STRING EXTRACTION FUNCTIONS</a>
|
<li><a name="TOC6" href="#SEC6">PCRE2 NATIVE API STRING EXTRACTION FUNCTIONS</a>
|
||||||
<li><a name="TOC7" href="#SEC7">PCRE2 NATIVE API STRING SUBSTITUTION FUNCTION</a>
|
<li><a name="TOC7" href="#SEC7">PCRE2 NATIVE API STRING SUBSTITUTION FUNCTION</a>
|
||||||
<li><a name="TOC8" href="#SEC8">PCRE2 NATIVE API JIT FUNCTIONS</a>
|
<li><a name="TOC8" href="#SEC8">PCRE2 NATIVE API JIT FUNCTIONS</a>
|
||||||
<li><a name="TOC9" href="#SEC9">PCRE2 NATIVE API AUXILIARY FUNCTIONS</a>
|
<li><a name="TOC9" href="#SEC9">PCRE2 NATIVE API SERIALIZATION FUNCTIONS</a>
|
||||||
<li><a name="TOC10" href="#SEC10">PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a>
|
<li><a name="TOC10" href="#SEC10">PCRE2 NATIVE API AUXILIARY FUNCTIONS</a>
|
||||||
<li><a name="TOC11" href="#SEC11">PCRE2 API OVERVIEW</a>
|
<li><a name="TOC11" href="#SEC11">PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a>
|
||||||
<li><a name="TOC12" href="#SEC12">STRING LENGTHS AND OFFSETS</a>
|
<li><a name="TOC12" href="#SEC12">PCRE2 API OVERVIEW</a>
|
||||||
<li><a name="TOC13" href="#SEC13">NEWLINES</a>
|
<li><a name="TOC13" href="#SEC13">STRING LENGTHS AND OFFSETS</a>
|
||||||
<li><a name="TOC14" href="#SEC14">MULTITHREADING</a>
|
<li><a name="TOC14" href="#SEC14">NEWLINES</a>
|
||||||
<li><a name="TOC15" href="#SEC15">PCRE2 CONTEXTS</a>
|
<li><a name="TOC15" href="#SEC15">MULTITHREADING</a>
|
||||||
<li><a name="TOC16" href="#SEC16">CHECKING BUILD-TIME OPTIONS</a>
|
<li><a name="TOC16" href="#SEC16">PCRE2 CONTEXTS</a>
|
||||||
<li><a name="TOC17" href="#SEC17">COMPILING A PATTERN</a>
|
<li><a name="TOC17" href="#SEC17">CHECKING BUILD-TIME OPTIONS</a>
|
||||||
<li><a name="TOC18" href="#SEC18">COMPILATION ERROR CODES</a>
|
<li><a name="TOC18" href="#SEC18">COMPILING A PATTERN</a>
|
||||||
<li><a name="TOC19" href="#SEC19">JUST-IN-TIME (JIT) COMPILATION</a>
|
<li><a name="TOC19" href="#SEC19">COMPILATION ERROR CODES</a>
|
||||||
<li><a name="TOC20" href="#SEC20">LOCALE SUPPORT</a>
|
<li><a name="TOC20" href="#SEC20">JUST-IN-TIME (JIT) COMPILATION</a>
|
||||||
<li><a name="TOC21" href="#SEC21">INFORMATION ABOUT A COMPILED PATTERN</a>
|
<li><a name="TOC21" href="#SEC21">LOCALE SUPPORT</a>
|
||||||
<li><a name="TOC22" href="#SEC22">THE MATCH DATA BLOCK</a>
|
<li><a name="TOC22" href="#SEC22">INFORMATION ABOUT A COMPILED PATTERN</a>
|
||||||
<li><a name="TOC23" href="#SEC23">MATCHING A PATTERN: THE TRADITIONAL FUNCTION</a>
|
<li><a name="TOC23" href="#SEC23">SERIALIZATION AND PRECOMPILING</a>
|
||||||
<li><a name="TOC24" href="#SEC24">NEWLINE HANDLING WHEN MATCHING</a>
|
<li><a name="TOC24" href="#SEC24">THE MATCH DATA BLOCK</a>
|
||||||
<li><a name="TOC25" href="#SEC25">HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS</a>
|
<li><a name="TOC25" href="#SEC25">MATCHING A PATTERN: THE TRADITIONAL FUNCTION</a>
|
||||||
<li><a name="TOC26" href="#SEC26">OTHER INFORMATION ABOUT A MATCH</a>
|
<li><a name="TOC26" href="#SEC26">NEWLINE HANDLING WHEN MATCHING</a>
|
||||||
<li><a name="TOC27" href="#SEC27">ERROR RETURNS FROM <b>pcre2_match()</b></a>
|
<li><a name="TOC27" href="#SEC27">HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS</a>
|
||||||
<li><a name="TOC28" href="#SEC28">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a>
|
<li><a name="TOC28" href="#SEC28">OTHER INFORMATION ABOUT A MATCH</a>
|
||||||
<li><a name="TOC29" href="#SEC29">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a>
|
<li><a name="TOC29" href="#SEC29">ERROR RETURNS FROM <b>pcre2_match()</b></a>
|
||||||
<li><a name="TOC30" href="#SEC30">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a>
|
<li><a name="TOC30" href="#SEC30">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a>
|
||||||
<li><a name="TOC31" href="#SEC31">CREATING A NEW STRING WITH SUBSTITUTIONS</a>
|
<li><a name="TOC31" href="#SEC31">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a>
|
||||||
<li><a name="TOC32" href="#SEC32">DUPLICATE SUBPATTERN NAMES</a>
|
<li><a name="TOC32" href="#SEC32">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a>
|
||||||
<li><a name="TOC33" href="#SEC33">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a>
|
<li><a name="TOC33" href="#SEC33">CREATING A NEW STRING WITH SUBSTITUTIONS</a>
|
||||||
<li><a name="TOC34" href="#SEC34">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a>
|
<li><a name="TOC34" href="#SEC34">DUPLICATE SUBPATTERN NAMES</a>
|
||||||
<li><a name="TOC35" href="#SEC35">SEE ALSO</a>
|
<li><a name="TOC35" href="#SEC35">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a>
|
||||||
<li><a name="TOC36" href="#SEC36">AUTHOR</a>
|
<li><a name="TOC36" href="#SEC36">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a>
|
||||||
<li><a name="TOC37" href="#SEC37">REVISION</a>
|
<li><a name="TOC37" href="#SEC37">SEE ALSO</a>
|
||||||
|
<li><a name="TOC38" href="#SEC38">AUTHOR</a>
|
||||||
|
<li><a name="TOC39" href="#SEC39">REVISION</a>
|
||||||
</ul>
|
</ul>
|
||||||
<P>
|
<P>
|
||||||
<b>#include <pcre2.h></b>
|
<b>#include <pcre2.h></b>
|
||||||
|
@ -260,7 +262,24 @@ document for an overview of all the PCRE2 documentation.
|
||||||
<br>
|
<br>
|
||||||
<b>void pcre2_jit_stack_free(pcre2_jit_stack *<i>jit_stack</i>);</b>
|
<b>void pcre2_jit_stack_free(pcre2_jit_stack *<i>jit_stack</i>);</b>
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC9" href="#TOC1">PCRE2 NATIVE API AUXILIARY FUNCTIONS</a><br>
|
<br><a name="SEC9" href="#TOC1">PCRE2 NATIVE API SERIALIZATION FUNCTIONS</a><br>
|
||||||
|
<P>
|
||||||
|
<b>int32_t pcre2_serialize_decode(pcre2_code **<i>codes</i>,</b>
|
||||||
|
<b> int32_t <i>number_of_codes</i>, const uint8_t *<i>bytes</i>,</b>
|
||||||
|
<b> pcre2_general_context *<i>gcontext</i>);</b>
|
||||||
|
<br>
|
||||||
|
<br>
|
||||||
|
<b>int32_t pcre2_serialize_encode(pcre2_code **<i>codes</i>,</b>
|
||||||
|
<b> int32_t <i>number_of_codes</i>, uint8_t **<i>serialized_bytes</i>,</b>
|
||||||
|
<b> PCRE2_SIZE *<i>serialized_size</i>, pcre2_general_context *<i>gcontext</i>);</b>
|
||||||
|
<br>
|
||||||
|
<br>
|
||||||
|
<b>void pcre2_serialize_free(uint8_t *<i>bytes</i>);</b>
|
||||||
|
<br>
|
||||||
|
<br>
|
||||||
|
<b>int32_t pcre2_serialize_get_number_of_codes(const uint8_t *<i>bytes</i>);</b>
|
||||||
|
</P>
|
||||||
|
<br><a name="SEC10" href="#TOC1">PCRE2 NATIVE API AUXILIARY FUNCTIONS</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>int pcre2_get_error_message(int <i>errorcode</i>, PCRE2_UCHAR *<i>buffer</i>,</b>
|
<b>int pcre2_get_error_message(int <i>errorcode</i>, PCRE2_UCHAR *<i>buffer</i>,</b>
|
||||||
<b> PCRE2_SIZE <i>bufflen</i>);</b>
|
<b> PCRE2_SIZE <i>bufflen</i>);</b>
|
||||||
|
@ -274,7 +293,7 @@ document for an overview of all the PCRE2 documentation.
|
||||||
<br>
|
<br>
|
||||||
<b>int pcre2_config(uint32_t <i>what</i>, void *<i>where</i>);</b>
|
<b>int pcre2_config(uint32_t <i>what</i>, void *<i>where</i>);</b>
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC10" href="#TOC1">PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a><br>
|
<br><a name="SEC11" href="#TOC1">PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a><br>
|
||||||
<P>
|
<P>
|
||||||
There are three PCRE2 libraries, supporting 8-bit, 16-bit, and 32-bit code
|
There are three PCRE2 libraries, supporting 8-bit, 16-bit, and 32-bit code
|
||||||
units, respectively. However, there is just one header file, <b>pcre2.h</b>.
|
units, respectively. However, there is just one header file, <b>pcre2.h</b>.
|
||||||
|
@ -335,7 +354,7 @@ In the function summaries above, and in the rest of this document and other
|
||||||
PCRE2 documents, functions and data types are described using their generic
|
PCRE2 documents, functions and data types are described using their generic
|
||||||
names, without the 8, 16, or 32 suffix.
|
names, without the 8, 16, or 32 suffix.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC11" href="#TOC1">PCRE2 API OVERVIEW</a><br>
|
<br><a name="SEC12" href="#TOC1">PCRE2 API OVERVIEW</a><br>
|
||||||
<P>
|
<P>
|
||||||
PCRE2 has its own native API, which is described in this document. There are
|
PCRE2 has its own native API, which is described in this document. There are
|
||||||
also some wrapper functions for the 8-bit library that correspond to the
|
also some wrapper functions for the 8-bit library that correspond to the
|
||||||
|
@ -426,7 +445,7 @@ Finally, there are functions for finding out information about a compiled
|
||||||
pattern (<b>pcre2_pattern_info()</b>) and about the configuration with which
|
pattern (<b>pcre2_pattern_info()</b>) and about the configuration with which
|
||||||
PCRE2 was built (<b>pcre2_config()</b>).
|
PCRE2 was built (<b>pcre2_config()</b>).
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC12" href="#TOC1">STRING LENGTHS AND OFFSETS</a><br>
|
<br><a name="SEC13" href="#TOC1">STRING LENGTHS AND OFFSETS</a><br>
|
||||||
<P>
|
<P>
|
||||||
The PCRE2 API uses string lengths and offsets into strings of code units in
|
The PCRE2 API uses string lengths and offsets into strings of code units in
|
||||||
several places. These values are always of type PCRE2_SIZE, which is an
|
several places. These values are always of type PCRE2_SIZE, which is an
|
||||||
|
@ -436,7 +455,7 @@ as a special indicator for zero-terminated strings and unset offsets.
|
||||||
Therefore, the longest string that can be handled is one less than this
|
Therefore, the longest string that can be handled is one less than this
|
||||||
maximum.
|
maximum.
|
||||||
<a name="newlines"></a></P>
|
<a name="newlines"></a></P>
|
||||||
<br><a name="SEC13" href="#TOC1">NEWLINES</a><br>
|
<br><a name="SEC14" href="#TOC1">NEWLINES</a><br>
|
||||||
<P>
|
<P>
|
||||||
PCRE2 supports five different conventions for indicating line breaks in
|
PCRE2 supports five different conventions for indicating line breaks in
|
||||||
strings: a single CR (carriage return) character, a single LF (linefeed)
|
strings: a single CR (carriage return) character, a single LF (linefeed)
|
||||||
|
@ -471,7 +490,7 @@ The choice of newline convention does not affect the interpretation of
|
||||||
the \n or \r escape sequences, nor does it affect what \R matches; this has
|
the \n or \r escape sequences, nor does it affect what \R matches; this has
|
||||||
its own separate convention.
|
its own separate convention.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC14" href="#TOC1">MULTITHREADING</a><br>
|
<br><a name="SEC15" href="#TOC1">MULTITHREADING</a><br>
|
||||||
<P>
|
<P>
|
||||||
In a multithreaded application it is important to keep thread-specific data
|
In a multithreaded application it is important to keep thread-specific data
|
||||||
separate from data that can be shared between threads. The PCRE2 library code
|
separate from data that can be shared between threads. The PCRE2 library code
|
||||||
|
@ -516,7 +535,7 @@ storing the results of a match. This includes details of what was matched, as
|
||||||
well as additional information such as the name of a (*MARK) setting. Each
|
well as additional information such as the name of a (*MARK) setting. Each
|
||||||
thread must provide its own version of this memory.
|
thread must provide its own version of this memory.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC15" href="#TOC1">PCRE2 CONTEXTS</a><br>
|
<br><a name="SEC16" href="#TOC1">PCRE2 CONTEXTS</a><br>
|
||||||
<P>
|
<P>
|
||||||
Some PCRE2 functions have a lot of parameters, many of which are used only by
|
Some PCRE2 functions have a lot of parameters, many of which are used only by
|
||||||
specialist applications, for example, those that use custom memory management
|
specialist applications, for example, those that use custom memory management
|
||||||
|
@ -797,7 +816,7 @@ exit so that they can be re-used when possible during the match. In the absence
|
||||||
of these functions, the normal custom memory management functions are used, if
|
of these functions, the normal custom memory management functions are used, if
|
||||||
supplied, otherwise the system functions.
|
supplied, otherwise the system functions.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC16" href="#TOC1">CHECKING BUILD-TIME OPTIONS</a><br>
|
<br><a name="SEC17" href="#TOC1">CHECKING BUILD-TIME OPTIONS</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>int pcre2_config(uint32_t <i>what</i>, void *<i>where</i>);</b>
|
<b>int pcre2_config(uint32_t <i>what</i>, void *<i>where</i>);</b>
|
||||||
</P>
|
</P>
|
||||||
|
@ -929,7 +948,7 @@ the PCRE2 version string, zero-terminated. The number of code units used is
|
||||||
returned. This is the length of the string plus one unit for the terminating
|
returned. This is the length of the string plus one unit for the terminating
|
||||||
zero.
|
zero.
|
||||||
<a name="compiling"></a></P>
|
<a name="compiling"></a></P>
|
||||||
<br><a name="SEC17" href="#TOC1">COMPILING A PATTERN</a><br>
|
<br><a name="SEC18" href="#TOC1">COMPILING A PATTERN</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>pcre2_code *pcre2_compile(PCRE2_SPTR <i>pattern</i>, PCRE2_SIZE <i>length</i>,</b>
|
<b>pcre2_code *pcre2_compile(PCRE2_SPTR <i>pattern</i>, PCRE2_SIZE <i>length</i>,</b>
|
||||||
<b> uint32_t <i>options</i>, int *<i>errorcode</i>, PCRE2_SIZE *<i>erroroffset,</i></b>
|
<b> uint32_t <i>options</i>, int *<i>errorcode</i>, PCRE2_SIZE *<i>erroroffset,</i></b>
|
||||||
|
@ -1305,7 +1324,7 @@ the behaviour of PCRE2 are given in the
|
||||||
<a href="pcre2unicode.html"><b>pcre2unicode</b></a>
|
<a href="pcre2unicode.html"><b>pcre2unicode</b></a>
|
||||||
page.
|
page.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC18" href="#TOC1">COMPILATION ERROR CODES</a><br>
|
<br><a name="SEC19" href="#TOC1">COMPILATION ERROR CODES</a><br>
|
||||||
<P>
|
<P>
|
||||||
There are over 80 positive error codes that <b>pcre2_compile()</b> may return if
|
There are over 80 positive error codes that <b>pcre2_compile()</b> may return if
|
||||||
it finds an error in the pattern. There are also some negative error codes that
|
it finds an error in the pattern. There are also some negative error codes that
|
||||||
|
@ -1315,7 +1334,7 @@ are used for invalid UTF strings. These are the same as given by
|
||||||
page. The <b>pcre2_get_error_message()</b> function can be called to obtain a
|
page. The <b>pcre2_get_error_message()</b> function can be called to obtain a
|
||||||
textual error message from any error code.
|
textual error message from any error code.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC19" href="#TOC1">JUST-IN-TIME (JIT) COMPILATION</a><br>
|
<br><a name="SEC20" href="#TOC1">JUST-IN-TIME (JIT) COMPILATION</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>int pcre2_jit_compile(pcre2_code *<i>code</i>, uint32_t <i>options</i>);</b>
|
<b>int pcre2_jit_compile(pcre2_code *<i>code</i>, uint32_t <i>options</i>);</b>
|
||||||
<br>
|
<br>
|
||||||
|
@ -1353,7 +1372,7 @@ patterns to be analyzed, and for one-off matches and simple patterns the
|
||||||
benefit of faster execution might be offset by a much slower compilation time.
|
benefit of faster execution might be offset by a much slower compilation time.
|
||||||
Most, but not all patterns can be optimized by the JIT compiler.
|
Most, but not all patterns can be optimized by the JIT compiler.
|
||||||
<a name="localesupport"></a></P>
|
<a name="localesupport"></a></P>
|
||||||
<br><a name="SEC20" href="#TOC1">LOCALE SUPPORT</a><br>
|
<br><a name="SEC21" href="#TOC1">LOCALE SUPPORT</a><br>
|
||||||
<P>
|
<P>
|
||||||
PCRE2 handles caseless matching, and determines whether characters are letters,
|
PCRE2 handles caseless matching, and determines whether characters are letters,
|
||||||
digits, or whatever, by reference to a set of tables, indexed by character code
|
digits, or whatever, by reference to a set of tables, indexed by character code
|
||||||
|
@ -1409,7 +1428,7 @@ is saved with the compiled pattern, and the same tables are used by
|
||||||
compilation, and matching all happen in the same locale, but different patterns
|
compilation, and matching all happen in the same locale, but different patterns
|
||||||
can be processed in different locales.
|
can be processed in different locales.
|
||||||
<a name="infoaboutpattern"></a></P>
|
<a name="infoaboutpattern"></a></P>
|
||||||
<br><a name="SEC21" href="#TOC1">INFORMATION ABOUT A COMPILED PATTERN</a><br>
|
<br><a name="SEC22" href="#TOC1">INFORMATION ABOUT A COMPILED PATTERN</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>int pcre2_pattern_info(const pcre2_code *<i>code</i>, uint32_t <i>what</i>, void *<i>where</i>);</b>
|
<b>int pcre2_pattern_info(const pcre2_code *<i>code</i>, uint32_t <i>what</i>, void *<i>where</i>);</b>
|
||||||
</P>
|
</P>
|
||||||
|
@ -1478,8 +1497,12 @@ options returned for PCRE2_INFO_ALLOPTIONS.
|
||||||
PCRE2_INFO_BACKREFMAX
|
PCRE2_INFO_BACKREFMAX
|
||||||
</pre>
|
</pre>
|
||||||
Return the number of the highest back reference in the pattern. The third
|
Return the number of the highest back reference in the pattern. The third
|
||||||
argument should point to an <b>uint32_t</b> variable. Zero is returned if there
|
argument should point to an <b>uint32_t</b> variable. Named subpatterns acquire
|
||||||
are no back references.
|
numbers as well as names, and these count towards the highest back reference.
|
||||||
|
Back references such as \4 or \g{12} match the captured characters of the
|
||||||
|
given group, but in addition, the check that a capturing group is set in a
|
||||||
|
conditional subpattern such as (?(3)a|b) is also a back reference. Zero is
|
||||||
|
returned if there are no back references.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_INFO_BSR
|
PCRE2_INFO_BSR
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -1689,14 +1712,24 @@ set, the call to <b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET
|
||||||
PCRE2_INFO_SIZE
|
PCRE2_INFO_SIZE
|
||||||
</pre>
|
</pre>
|
||||||
Return the size of the compiled pattern in bytes (for all three libraries). The
|
Return the size of the compiled pattern in bytes (for all three libraries). The
|
||||||
third argument should point to a <b>size_t</b> variable. This value does not
|
third argument should point to a <b>size_t</b> variable. This value includes the
|
||||||
include the size of the <b>pcre2_code</b> structure that is returned by
|
size of the general data block that precedes the code units of the compiled
|
||||||
<b>pcre_compile()</b>. The value that is used when <b>pcre2_compile()</b> is
|
pattern itself. The value that is used when <b>pcre2_compile()</b> is getting
|
||||||
getting memory in which to place the compiled data is the value returned by
|
memory in which to place the compiled pattern may be slightly larger than the
|
||||||
this option plus the size of the <b>pcre2_code</b> structure. Processing a
|
value returned by this option, because there are cases where the code that
|
||||||
pattern with the JIT compiler does not alter the value returned by this option.
|
calculates the size has to over-estimate. Processing a pattern with the JIT
|
||||||
|
compiler does not alter the value returned by this option.
|
||||||
|
</P>
|
||||||
|
<br><a name="SEC23" href="#TOC1">SERIALIZATION AND PRECOMPILING</a><br>
|
||||||
|
<P>
|
||||||
|
It is possible to save compiled patterns on disc or elsewhere, and reload them
|
||||||
|
later, subject to a number of restrictions. The functions whose names begin
|
||||||
|
with <b>pcre2_serialize_</b> are used for this purpose. They are described in
|
||||||
|
the
|
||||||
|
<a href="pcre2serialize.html"><b>pcre2serialize</b></a>
|
||||||
|
documentation.
|
||||||
<a name="matchdatablock"></a></P>
|
<a name="matchdatablock"></a></P>
|
||||||
<br><a name="SEC22" href="#TOC1">THE MATCH DATA BLOCK</a><br>
|
<br><a name="SEC24" href="#TOC1">THE MATCH DATA BLOCK</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>pcre2_match_data_create(uint32_t <i>ovecsize</i>,</b>
|
<b>pcre2_match_data_create(uint32_t <i>ovecsize</i>,</b>
|
||||||
<b> pcre2_general_context *<i>gcontext</i>);</b>
|
<b> pcre2_general_context *<i>gcontext</i>);</b>
|
||||||
|
@ -1767,7 +1800,7 @@ match data block (for that match) have taken place.
|
||||||
When a match data block itself is no longer needed, it should be freed by
|
When a match data block itself is no longer needed, it should be freed by
|
||||||
calling <b>pcre2_match_data_free()</b>.
|
calling <b>pcre2_match_data_free()</b>.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC23" href="#TOC1">MATCHING A PATTERN: THE TRADITIONAL FUNCTION</a><br>
|
<br><a name="SEC25" href="#TOC1">MATCHING A PATTERN: THE TRADITIONAL FUNCTION</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>int pcre2_match(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
|
<b>int pcre2_match(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
|
||||||
<b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
|
<b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
|
||||||
|
@ -1981,7 +2014,7 @@ examples, in the
|
||||||
<a href="pcre2partial.html"><b>pcre2partial</b></a>
|
<a href="pcre2partial.html"><b>pcre2partial</b></a>
|
||||||
documentation.
|
documentation.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC24" href="#TOC1">NEWLINE HANDLING WHEN MATCHING</a><br>
|
<br><a name="SEC26" href="#TOC1">NEWLINE HANDLING WHEN MATCHING</a><br>
|
||||||
<P>
|
<P>
|
||||||
When PCRE2 is built, a default newline convention is set; this is usually the
|
When PCRE2 is built, a default newline convention is set; this is usually the
|
||||||
standard convention for the operating system. The default can be overridden in
|
standard convention for the operating system. The default can be overridden in
|
||||||
|
@ -2016,7 +2049,7 @@ LF in the characters that it matches.
|
||||||
Notwithstanding the above, anomalous effects may still occur when CRLF is a
|
Notwithstanding the above, anomalous effects may still occur when CRLF is a
|
||||||
valid newline sequence and explicit \r or \n escapes appear in the pattern.
|
valid newline sequence and explicit \r or \n escapes appear in the pattern.
|
||||||
<a name="matchedstrings"></a></P>
|
<a name="matchedstrings"></a></P>
|
||||||
<br><a name="SEC25" href="#TOC1">HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS</a><br>
|
<br><a name="SEC27" href="#TOC1">HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>uint32_t pcre2_get_ovector_count(pcre2_match_data *<i>match_data</i>);</b>
|
<b>uint32_t pcre2_get_ovector_count(pcre2_match_data *<i>match_data</i>);</b>
|
||||||
<br>
|
<br>
|
||||||
|
@ -2118,7 +2151,7 @@ parentheses, no more than <i>ovector[0]</i> to <i>ovector[2n+1]</i> are set by
|
||||||
<b>pcre2_match()</b>. The other elements retain whatever values they previously
|
<b>pcre2_match()</b>. The other elements retain whatever values they previously
|
||||||
had.
|
had.
|
||||||
<a name="matchotherdata"></a></P>
|
<a name="matchotherdata"></a></P>
|
||||||
<br><a name="SEC26" href="#TOC1">OTHER INFORMATION ABOUT A MATCH</a><br>
|
<br><a name="SEC28" href="#TOC1">OTHER INFORMATION ABOUT A MATCH</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>PCRE2_SPTR pcre2_get_mark(pcre2_match_data *<i>match_data</i>);</b>
|
<b>PCRE2_SPTR pcre2_get_mark(pcre2_match_data *<i>match_data</i>);</b>
|
||||||
<br>
|
<br>
|
||||||
|
@ -2162,7 +2195,7 @@ the code unit offset of the invalid UTF character. Details are given in the
|
||||||
<a href="pcre2unicode.html"><b>pcre2unicode</b></a>
|
<a href="pcre2unicode.html"><b>pcre2unicode</b></a>
|
||||||
page.
|
page.
|
||||||
<a name="errorlist"></a></P>
|
<a name="errorlist"></a></P>
|
||||||
<br><a name="SEC27" href="#TOC1">ERROR RETURNS FROM <b>pcre2_match()</b></a><br>
|
<br><a name="SEC29" href="#TOC1">ERROR RETURNS FROM <b>pcre2_match()</b></a><br>
|
||||||
<P>
|
<P>
|
||||||
If <b>pcre2_match()</b> fails, it returns a negative number. This can be
|
If <b>pcre2_match()</b> fails, it returns a negative number. This can be
|
||||||
converted to a text string by calling <b>pcre2_get_error_message()</b>. Negative
|
converted to a text string by calling <b>pcre2_get_error_message()</b>. Negative
|
||||||
|
@ -2271,7 +2304,7 @@ is attempted.
|
||||||
</pre>
|
</pre>
|
||||||
The internal recursion limit was reached.
|
The internal recursion limit was reached.
|
||||||
<a name="extractbynumber"></a></P>
|
<a name="extractbynumber"></a></P>
|
||||||
<br><a name="SEC28" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a><br>
|
<br><a name="SEC30" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>int pcre2_substring_length_bynumber(pcre2_match_data *<i>match_data</i>,</b>
|
<b>int pcre2_substring_length_bynumber(pcre2_match_data *<i>match_data</i>,</b>
|
||||||
<b> uint32_t <i>number</i>, PCRE2_SIZE *<i>length</i>);</b>
|
<b> uint32_t <i>number</i>, PCRE2_SIZE *<i>length</i>);</b>
|
||||||
|
@ -2368,7 +2401,7 @@ The substring did not participate in the match. For example, if the pattern is
|
||||||
(abc)|(def) and the subject is "def", and the ovector contains at least two
|
(abc)|(def) and the subject is "def", and the ovector contains at least two
|
||||||
capturing slots, substring number 1 is unset.
|
capturing slots, substring number 1 is unset.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC29" href="#TOC1">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a><br>
|
<br><a name="SEC31" href="#TOC1">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>int pcre2_substring_list_get(pcre2_match_data *<i>match_data</i>,</b>
|
<b>int pcre2_substring_list_get(pcre2_match_data *<i>match_data</i>,</b>
|
||||||
<b> PCRE2_UCHAR ***<i>listptr</i>, PCRE2_SIZE **<i>lengthsptr</i>);</b>
|
<b> PCRE2_UCHAR ***<i>listptr</i>, PCRE2_SIZE **<i>lengthsptr</i>);</b>
|
||||||
|
@ -2407,7 +2440,7 @@ can be distinguished from a genuine zero-length substring by inspecting the
|
||||||
appropriate offset in the ovector, which contains PCRE2_UNSET for unset
|
appropriate offset in the ovector, which contains PCRE2_UNSET for unset
|
||||||
substrings, or by calling <b>pcre2_substring_length_bynumber()</b>.
|
substrings, or by calling <b>pcre2_substring_length_bynumber()</b>.
|
||||||
<a name="extractbyname"></a></P>
|
<a name="extractbyname"></a></P>
|
||||||
<br><a name="SEC30" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a><br>
|
<br><a name="SEC32" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>int pcre2_substring_number_from_name(const pcre2_code *<i>code</i>,</b>
|
<b>int pcre2_substring_number_from_name(const pcre2_code *<i>code</i>,</b>
|
||||||
<b> PCRE2_SPTR <i>name</i>);</b>
|
<b> PCRE2_SPTR <i>name</i>);</b>
|
||||||
|
@ -2467,7 +2500,7 @@ names are not included in the compiled code. The matching process uses only
|
||||||
numbers. For this reason, the use of different names for subpatterns of the
|
numbers. For this reason, the use of different names for subpatterns of the
|
||||||
same number causes an error at compile time.
|
same number causes an error at compile time.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC31" href="#TOC1">CREATING A NEW STRING WITH SUBSTITUTIONS</a><br>
|
<br><a name="SEC33" href="#TOC1">CREATING A NEW STRING WITH SUBSTITUTIONS</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>int pcre2_substitute(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
|
<b>int pcre2_substitute(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
|
||||||
<b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
|
<b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
|
||||||
|
@ -2528,7 +2561,7 @@ straight back. PCRE2_ERROR_BADREPLACEMENT is returned for an invalid
|
||||||
replacement string (unrecognized sequence following a dollar sign), and
|
replacement string (unrecognized sequence following a dollar sign), and
|
||||||
PCRE2_ERROR_NOMEMORY is returned if the output buffer is not big enough.
|
PCRE2_ERROR_NOMEMORY is returned if the output buffer is not big enough.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC32" href="#TOC1">DUPLICATE SUBPATTERN NAMES</a><br>
|
<br><a name="SEC34" href="#TOC1">DUPLICATE SUBPATTERN NAMES</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>int pcre2_substring_nametable_scan(const pcre2_code *<i>code</i>,</b>
|
<b>int pcre2_substring_nametable_scan(const pcre2_code *<i>code</i>,</b>
|
||||||
<b> PCRE2_SPTR <i>name</i>, PCRE2_SPTR *<i>first</i>, PCRE2_SPTR *<i>last</i>);</b>
|
<b> PCRE2_SPTR <i>name</i>, PCRE2_SPTR *<i>first</i>, PCRE2_SPTR *<i>last</i>);</b>
|
||||||
|
@ -2573,7 +2606,7 @@ The format of the name table is described above in the section entitled
|
||||||
Given all the relevant entries for the name, you can extract each of their
|
Given all the relevant entries for the name, you can extract each of their
|
||||||
numbers, and hence the captured data.
|
numbers, and hence the captured data.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC33" href="#TOC1">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a><br>
|
<br><a name="SEC35" href="#TOC1">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a><br>
|
||||||
<P>
|
<P>
|
||||||
The traditional matching function uses a similar algorithm to Perl, which stops
|
The traditional matching function uses a similar algorithm to Perl, which stops
|
||||||
when it finds the first match at a given point in the subject. If you want to
|
when it finds the first match at a given point in the subject. If you want to
|
||||||
|
@ -2591,7 +2624,7 @@ substring. Then return 1, which forces <b>pcre2_match()</b> to backtrack and try
|
||||||
other alternatives. Ultimately, when it runs out of matches,
|
other alternatives. Ultimately, when it runs out of matches,
|
||||||
<b>pcre2_match()</b> will yield PCRE2_ERROR_NOMATCH.
|
<b>pcre2_match()</b> will yield PCRE2_ERROR_NOMATCH.
|
||||||
<a name="dfamatch"></a></P>
|
<a name="dfamatch"></a></P>
|
||||||
<br><a name="SEC34" href="#TOC1">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a><br>
|
<br><a name="SEC36" href="#TOC1">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>int pcre2_dfa_match(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
|
<b>int pcre2_dfa_match(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
|
||||||
<b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
|
<b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
|
||||||
|
@ -2786,13 +2819,13 @@ some plausibility checks are made on the contents of the workspace, which
|
||||||
should contain data about the previous partial match. If any of these checks
|
should contain data about the previous partial match. If any of these checks
|
||||||
fail, this error is given.
|
fail, this error is given.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC35" href="#TOC1">SEE ALSO</a><br>
|
<br><a name="SEC37" href="#TOC1">SEE ALSO</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>pcre2build</b>(3), <b>pcre2callout</b>(3), <b>pcre2demo(3)</b>,
|
<b>pcre2build</b>(3), <b>pcre2callout</b>(3), <b>pcre2demo(3)</b>,
|
||||||
<b>pcre2matching</b>(3), <b>pcre2partial</b>(3), <b>pcre2posix</b>(3),
|
<b>pcre2matching</b>(3), <b>pcre2partial</b>(3), <b>pcre2posix</b>(3),
|
||||||
<b>pcre2sample</b>(3), <b>pcre2stack</b>(3), <b>pcre2unicode</b>(3).
|
<b>pcre2sample</b>(3), <b>pcre2stack</b>(3), <b>pcre2unicode</b>(3).
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC36" href="#TOC1">AUTHOR</a><br>
|
<br><a name="SEC38" href="#TOC1">AUTHOR</a><br>
|
||||||
<P>
|
<P>
|
||||||
Philip Hazel
|
Philip Hazel
|
||||||
<br>
|
<br>
|
||||||
|
@ -2801,9 +2834,9 @@ University Computing Service
|
||||||
Cambridge, England.
|
Cambridge, England.
|
||||||
<br>
|
<br>
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC37" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC39" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 02 January 2015
|
Last updated: 23 January 2015
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2015 University of Cambridge.
|
Copyright © 1997-2015 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@ -0,0 +1,184 @@
|
||||||
|
<html>
|
||||||
|
<head>
|
||||||
|
<title>pcre2serialize specification</title>
|
||||||
|
</head>
|
||||||
|
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||||
|
<h1>pcre2serialize man page</h1>
|
||||||
|
<p>
|
||||||
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
This page is part of the PCRE2 HTML documentation. It was generated
|
||||||
|
automatically from the original man page. If there is any nonsense in it,
|
||||||
|
please consult the man page, in case the conversion went wrong.
|
||||||
|
<br>
|
||||||
|
<ul>
|
||||||
|
<li><a name="TOC1" href="#SEC1">SAVING AND RE-USING PRECOMPILED PCRE2 PATTERNS</a>
|
||||||
|
<li><a name="TOC2" href="#SEC2">SAVING COMPILED PATTERNS</a>
|
||||||
|
<li><a name="TOC3" href="#SEC3">RE-USING PRECOMPILED PATTERNS</a>
|
||||||
|
<li><a name="TOC4" href="#SEC4">AUTHOR</a>
|
||||||
|
<li><a name="TOC5" href="#SEC5">REVISION</a>
|
||||||
|
</ul>
|
||||||
|
<br><a name="SEC1" href="#TOC1">SAVING AND RE-USING PRECOMPILED PCRE2 PATTERNS</a><br>
|
||||||
|
<P>
|
||||||
|
<b>int32_t pcre2_serialize_decode(pcre2_code **<i>codes</i>,</b>
|
||||||
|
<b> int32_t <i>number_of_codes</i>, const uint8_t *<i>bytes</i>,</b>
|
||||||
|
<b> pcre2_general_context *<i>gcontext</i>);</b>
|
||||||
|
<br>
|
||||||
|
<br>
|
||||||
|
<b>int32_t pcre2_serialize_encode(pcre2_code **<i>codes</i>,</b>
|
||||||
|
<b> int32_t <i>number_of_codes</i>, uint8_t **<i>serialized_bytes</i>,</b>
|
||||||
|
<b> PCRE2_SIZE *<i>serialized_size</i>, pcre2_general_context *<i>gcontext</i>);</b>
|
||||||
|
<br>
|
||||||
|
<br>
|
||||||
|
<b>void pcre2_serialize_free(uint8_t *<i>bytes</i>);</b>
|
||||||
|
<br>
|
||||||
|
<br>
|
||||||
|
<b>int32_t pcre2_serialize_get_number_of_codes(const uint8_t *<i>bytes</i>);</b>
|
||||||
|
<br>
|
||||||
|
<br>
|
||||||
|
If you are running an application that uses a large number of regular
|
||||||
|
expression patterns, it may be useful to store them in a precompiled form
|
||||||
|
instead of having to compile them every time the application is run. However,
|
||||||
|
if you are using the just-in-time optimization feature, it is not possible to
|
||||||
|
save and reload the JIT data, because it is position-dependent. In addition,
|
||||||
|
the host on which the patterns are reloaded must be running the same version of
|
||||||
|
PCRE2, with the same code unit width, and must also have the same endianness,
|
||||||
|
pointer width and PCRE2_SIZE type. For example, patterns compiled on a 32-bit
|
||||||
|
system using PCRE2's 16-bit library cannot be reloaded on a 64-bit system, nor
|
||||||
|
can they be reloaded using the 8-bit library.
|
||||||
|
</P>
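<P>
The restrictions above can be checked by the application itself before it
tries to reload anything. The following is a minimal sketch, assuming the
application stores a small header of its own next to the saved byte stream.
The <b>env_fingerprint</b> structure and both functions are hypothetical names
invented for this illustration; they are not part of the PCRE2 API, and
<b>size_t</b> is used here as a stand-in for PCRE2_SIZE.
</P>

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical application-level header; NOT part of the PCRE2 API. */
typedef struct {
  uint8_t code_unit_bits;   /* 8, 16 or 32: which PCRE2 library was used */
  uint8_t pointer_bytes;    /* sizeof(void *) on the saving host */
  uint8_t size_bytes;       /* sizeof(size_t), standing in for PCRE2_SIZE */
  uint8_t little_endian;    /* 1 if the saving host is little-endian */
  uint16_t version_major;   /* PCRE2 version as recorded by the application */
  uint16_t version_minor;
} env_fingerprint;

/* Describe the host on which this code is running. The code unit width and
   the version numbers are supplied by the caller. */
env_fingerprint env_fingerprint_here(uint8_t code_unit_bits,
  uint16_t version_major, uint16_t version_minor)
{
  env_fingerprint fp;
  uint16_t probe = 1;

  memset(&fp, 0, sizeof(fp));
  fp.code_unit_bits = code_unit_bits;
  fp.pointer_bytes = (uint8_t)sizeof(void *);
  fp.size_bytes = (uint8_t)sizeof(size_t);
  /* The low-order byte comes first in memory on a little-endian host. */
  fp.little_endian = (uint8_t)(*(const uint8_t *)&probe == 1);
  fp.version_major = version_major;
  fp.version_minor = version_minor;
  return fp;
}

/* Return 1 if data saved under "saved" may be reloaded under "current",
   otherwise 0. Every field has to match. */
int env_fingerprint_compatible(const env_fingerprint *saved,
  const env_fingerprint *current)
{
  return saved->code_unit_bits == current->code_unit_bits &&
         saved->pointer_bytes == current->pointer_bytes &&
         saved->size_bytes == current->size_bytes &&
         saved->little_endian == current->little_endian &&
         saved->version_major == current->version_major &&
         saved->version_minor == current->version_minor;
}
```

<P>
An application might write such a header in front of the serialized data and
refuse to load a file whose header does not match the running host. The fields
are compared one by one rather than with <b>memcmp()</b> so that any padding
inside the structure cannot affect the result.
</P>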
|
||||||
|
<br><a name="SEC2" href="#TOC1">SAVING COMPILED PATTERNS</a><br>
|
||||||
|
<P>
|
||||||
|
Before compiled patterns can be saved they must be serialized, that is,
|
||||||
|
converted to a stream of bytes. A single byte stream may contain any number of
|
||||||
|
compiled patterns, but they must all use the same character tables. A single
|
||||||
|
copy of the tables is included in the byte stream (its size is 1088 bytes). For
|
||||||
|
more details of character tables, see the
|
||||||
|
<a href="pcre2api.html#localesupport">section on locale support</a>
|
||||||
|
in the
|
||||||
|
<a href="pcre2api.html"><b>pcre2api</b></a>
|
||||||
|
documentation.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
The function <b>pcre2_serialize_encode()</b> creates a serialized byte stream
|
||||||
|
from a list of compiled patterns. Its first two arguments specify the list,
|
||||||
|
being a pointer to a vector of pointers to compiled patterns, and the length of
|
||||||
|
the vector. The third and fourth arguments point to variables which are set to
|
||||||
|
point to the created byte stream and its length, respectively. The final
|
||||||
|
argument is a pointer to a general context, which can be used to specify custom
|
||||||
|
memory management functions. If this argument is NULL, <b>malloc()</b> is used
|
||||||
|
to obtain memory for the byte stream. The yield of the function is the number
|
||||||
|
of serialized patterns, or one of the following negative error codes:
|
||||||
|
<pre>
|
||||||
|
PCRE2_ERROR_BADDATA the number of patterns is zero or less
|
||||||
|
PCRE2_ERROR_BADMAGIC mismatch of id bytes in one of the patterns
|
||||||
|
PCRE2_ERROR_NOMEMORY memory allocation failed
|
||||||
|
PCRE2_ERROR_MIXEDTABLES the patterns do not all use the same tables
|
||||||
|
PCRE2_ERROR_NULL the 1st, 3rd, or 4th argument is NULL
|
||||||
|
</pre>
|
||||||
|
PCRE2_ERROR_BADMAGIC means either that a pattern's code has been corrupted, or
|
||||||
|
that a slot in the vector does not point to a compiled pattern.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
Once a set of patterns has been serialized you can save the data in any
|
||||||
|
appropriate manner. Here is sample code that compiles two patterns and writes
|
||||||
|
them to a file. It assumes that the variable <i>fd</i> refers to a file that is
|
||||||
|
open for output. The error checking that should be present in a real
|
||||||
|
application has been omitted for simplicity.
|
||||||
|
<pre>
|
||||||
|
int errorcode;
|
||||||
|
uint8_t *bytes;
|
||||||
|
PCRE2_SIZE erroroffset;
|
||||||
|
PCRE2_SIZE bytescount;
|
||||||
|
pcre2_code *list_of_codes[2];
|
||||||
|
list_of_codes[0] = pcre2_compile("first pattern",
|
||||||
|
PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);
|
||||||
|
list_of_codes[1] = pcre2_compile("second pattern",
|
||||||
|
PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);
|
||||||
|
errorcode = pcre2_serialize_encode(list_of_codes, 2, &bytes,
|
||||||
|
&bytescount, NULL);
|
||||||
|
errorcode = fwrite(bytes, 1, bytescount, fd);
|
||||||
|
</pre>
|
||||||
|
Note that the serialized data is binary data that may contain any of the 256
|
||||||
|
possible byte values. On systems that make a distinction between binary and
|
||||||
|
non-binary data, be sure that the file is opened for binary output.
|
||||||
|
</P>
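<P>
The point about binary output can be illustrated without PCRE2 at all. The
sketch below writes an arbitrary byte buffer to a file opened in binary mode
and reads it back. The function names <b>save_bytes()</b> and
<b>load_bytes()</b> are hypothetical helpers, not PCRE2 functions; in a real
application the buffer and length would be the values set by
<b>pcre2_serialize_encode()</b>.
</P>

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write "length" bytes to "path" in binary mode. Returns 0 on success and
   -1 on any failure. */
int save_bytes(const char *path, const unsigned char *bytes, size_t length)
{
  FILE *f = fopen(path, "wb");   /* "b" matters on systems that distinguish */
  size_t written;

  if (f == NULL) return -1;
  written = fwrite(bytes, 1, length, f);
  if (fclose(f) != 0 || written != length) return -1;
  return 0;
}

/* Read a whole file in binary mode into memory obtained from malloc().
   Returns NULL on failure. On success *length is set to the number of bytes
   read, and the caller must free() the result. */
unsigned char *load_bytes(const char *path, size_t *length)
{
  FILE *f = fopen(path, "rb");
  unsigned char *buffer = NULL;
  long size;

  if (f == NULL) return NULL;
  if (fseek(f, 0, SEEK_END) == 0 && (size = ftell(f)) >= 0 &&
      fseek(f, 0, SEEK_SET) == 0)
    {
    buffer = malloc(size > 0 ? (size_t)size : 1);
    if (buffer != NULL && fread(buffer, 1, (size_t)size, f) != (size_t)size)
      {
      free(buffer);
      buffer = NULL;
      }
    if (buffer != NULL) *length = (size_t)size;
    }
  fclose(f);
  return buffer;
}
```

<P>
If the file were opened in text mode on a system that translates line endings,
bytes with the values 0x0A or 0x0D could be altered on the way through, and the
reloaded data would no longer match what was saved.
</P>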
|
||||||
|
<P>
|
||||||
|
Serializing a set of patterns leaves the original data untouched, so they can
|
||||||
|
still be used for matching. Their memory must eventually be freed in the usual
|
||||||
|
way by calling <b>pcre2_code_free()</b>. When you have finished with the byte
|
||||||
|
stream, it too must be freed by calling <b>pcre2_serialize_free()</b>.
|
||||||
|
</P>
|
||||||
|
<br><a name="SEC3" href="#TOC1">RE-USING PRECOMPILED PATTERNS</a><br>
|
||||||
|
<P>
|
||||||
|
In order to re-use a set of saved patterns you must first make the serialized
|
||||||
|
byte stream available in main memory (for example, by reading from a file). The
|
||||||
|
management of this memory block is up to the application. You can use the
|
||||||
|
<b>pcre2_serialize_get_number_of_codes()</b> function to find out how many
|
||||||
|
compiled patterns are in the serialized data without actually decoding the
|
||||||
|
patterns:
|
||||||
|
<pre>
|
||||||
|
uint8_t *bytes = <serialized data>;
|
||||||
|
int32_t number_of_codes = pcre2_serialize_get_number_of_codes(bytes);
|
||||||
|
</pre>
|
||||||
|
The <b>pcre2_serialize_decode()</b> function reads a byte stream and recreates
|
||||||
|
the compiled patterns in new memory blocks, setting pointers to them in a
|
||||||
|
vector. The first two arguments are a pointer to a suitable vector and its
|
||||||
|
length, and the third argument points to a byte stream. The final argument is a
|
||||||
|
pointer to a general context, which can be used to specify custom memory
|
||||||
|
management functions for the decoded patterns. If this argument is NULL,
|
||||||
|
<b>malloc()</b> and <b>free()</b> are used. After deserialization, the byte
|
||||||
|
stream is no longer needed and can be discarded.
|
||||||
|
<pre>
|
||||||
|
int32_t number_of_codes;
|
||||||
|
pcre2_code *list_of_codes[2];
|
||||||
|
uint8_t *bytes = <serialized data>;
|
||||||
|
number_of_codes =
|
||||||
|
pcre2_serialize_decode(list_of_codes, 2, bytes, NULL);
|
||||||
|
</pre>
|
||||||
|
If the vector is not large enough for all the patterns in the byte stream, it
|
||||||
|
is filled with those that fit, and the remainder are ignored. The yield of the
|
||||||
|
function is the number of decoded patterns, or one of the following negative
|
||||||
|
error codes:
|
||||||
|
<pre>
|
||||||
|
PCRE2_ERROR_BADDATA second argument is zero or less
|
||||||
|
PCRE2_ERROR_BADMAGIC mismatch of id bytes in the data
|
||||||
|
PCRE2_ERROR_BADMODE mismatch of variable unit size or PCRE2 version
|
||||||
|
PCRE2_ERROR_NOMEMORY memory allocation failed
|
||||||
|
PCRE2_ERROR_NULL first or third argument is NULL
|
||||||
|
</pre>
|
||||||
|
PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that it was compiled
|
||||||
|
on a system with different endianness.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
Decoded patterns can be used for matching in the usual way, and must be freed
|
||||||
|
by calling <b>pcre2_code_free()</b> as normal. A single copy of the character
|
||||||
|
tables is used by all the decoded patterns. A reference count is used to
|
||||||
|
arrange for its memory to be automatically freed when the last pattern is
|
||||||
|
freed.
|
||||||
|
</P>
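<P>
The reference counting mentioned above follows a common pattern, which can be
sketched as below. This is an illustration only and makes no claim about how
PCRE2 implements it internally: the types <b>shared_tables</b> and
<b>decoded_pattern</b> and all three functions are hypothetical. The table
length of 1088 bytes is the figure quoted earlier in this document.
</P>

```c
#include <stdlib.h>
#include <string.h>

#define TABLES_LENGTH 1088   /* size of one set of character tables */

/* One shared copy of the tables plus a count of its users (hypothetical). */
typedef struct {
  unsigned int refcount;
  unsigned char data[TABLES_LENGTH];
} shared_tables;

/* A stand-in for a decoded pattern; only the tables pointer is modelled. */
typedef struct {
  shared_tables *tables;
} decoded_pattern;

/* Make the single shared copy. It starts with no users. */
shared_tables *shared_tables_create(const unsigned char *source)
{
  shared_tables *t = malloc(sizeof(shared_tables));
  if (t == NULL) return NULL;
  t->refcount = 0;
  memcpy(t->data, source, TABLES_LENGTH);
  return t;
}

/* Point a pattern at the shared copy and count it as a user. */
void pattern_attach(decoded_pattern *p, shared_tables *t)
{
  p->tables = t;
  t->refcount++;
}

/* Detach a pattern from the shared copy. The copy is freed when its last
   user goes away. Returns 1 if the tables were freed, otherwise 0. */
int pattern_release(decoded_pattern *p)
{
  shared_tables *t = p->tables;
  p->tables = NULL;
  if (t == NULL) return 0;
  if (--t->refcount > 0) return 0;
  free(t);
  return 1;
}
```

<P>
With this arrangement the application never frees the tables directly; it only
releases patterns, and the order in which it does so does not matter.
</P>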
|
||||||
|
<P>
|
||||||
|
If a pattern was processed by <b>pcre2_jit_compile()</b> before being
|
||||||
|
serialized, the JIT data is discarded and so is no longer available after a
|
||||||
|
save/restore cycle. You can, however, process a restored pattern with
|
||||||
|
<b>pcre2_jit_compile()</b> if you wish.
|
||||||
|
</P>
|
||||||
|
<br><a name="SEC4" href="#TOC1">AUTHOR</a><br>
|
||||||
|
<P>
|
||||||
|
Philip Hazel
|
||||||
|
<br>
|
||||||
|
University Computing Service
|
||||||
|
<br>
|
||||||
|
Cambridge, England.
|
||||||
|
<br>
|
||||||
|
</P>
|
||||||
|
<br><a name="SEC5" href="#TOC1">REVISION</a><br>
|
||||||
|
<P>
|
||||||
|
Last updated: 20 January 2015
|
||||||
|
<br>
|
||||||
|
Copyright © 1997-2015 University of Cambridge.
|
||||||
|
<br>
|
||||||
|
<p>
|
||||||
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
</p>
|
|
@ -30,9 +30,10 @@ please consult the man page, in case the conversion went wrong.
|
||||||
<li><a name="TOC15" href="#SEC15">RESTARTING AFTER A PARTIAL MATCH</a>
|
<li><a name="TOC15" href="#SEC15">RESTARTING AFTER A PARTIAL MATCH</a>
|
||||||
<li><a name="TOC16" href="#SEC16">CALLOUTS</a>
|
<li><a name="TOC16" href="#SEC16">CALLOUTS</a>
|
||||||
<li><a name="TOC17" href="#SEC17">NON-PRINTING CHARACTERS</a>
|
<li><a name="TOC17" href="#SEC17">NON-PRINTING CHARACTERS</a>
|
||||||
<li><a name="TOC18" href="#SEC18">SEE ALSO</a>
|
<li><a name="TOC18" href="#SEC18">SAVING AND RESTORING COMPILED PATTERNS</a>
|
||||||
<li><a name="TOC19" href="#SEC19">AUTHOR</a>
|
<li><a name="TOC19" href="#SEC19">SEE ALSO</a>
|
||||||
<li><a name="TOC20" href="#SEC20">REVISION</a>
|
<li><a name="TOC20" href="#SEC20">AUTHOR</a>
|
||||||
|
<li><a name="TOC21" href="#SEC21">REVISION</a>
|
||||||
</ul>
|
</ul>
|
||||||
<br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
|
<br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
|
||||||
<P>
|
<P>
|
||||||
|
@ -51,10 +52,11 @@ documentation.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The input for <b>pcre2test</b> is a sequence of regular expression patterns and
|
The input for <b>pcre2test</b> is a sequence of regular expression patterns and
|
||||||
subject strings to be matched. The output shows the result of each match
|
subject strings to be matched. There are also command lines for setting
|
||||||
attempt. Modifiers on the command line, the patterns, and the subject lines
|
defaults and controlling some special actions. The output shows the result of
|
||||||
specify PCRE2 function options, control how the subject is processed, and what
|
each match attempt. Modifiers on external or internal command lines, the
|
||||||
output is produced.
|
patterns, and the subject lines specify PCRE2 function options, control how the
|
||||||
|
subject is processed, and what output is produced.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
As the original fairly simple PCRE library evolved, it acquired many different
|
As the original fairly simple PCRE library evolved, it acquired many different
|
||||||
|
@ -227,9 +229,7 @@ If <b>pcre2test</b> is given two filename arguments, it reads from the first and
|
||||||
writes to the second. If the first name is "-", input is taken from the
|
writes to the second. If the first name is "-", input is taken from the
|
||||||
standard input. If <b>pcre2test</b> is given only one argument, it reads from
|
standard input. If <b>pcre2test</b> is given only one argument, it reads from
|
||||||
that file and writes to stdout. Otherwise, it reads from stdin and writes to
|
that file and writes to stdout. Otherwise, it reads from stdin and writes to
|
||||||
stdout. When the input is a terminal, it prompts for each line of input, using
|
stdout.
|
||||||
"re>" to prompt for regular expression patterns, and "data>" to prompt for
|
|
||||||
subject lines.
|
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
When <b>pcre2test</b> is built, a configuration option can specify that it
|
When <b>pcre2test</b> is built, a configuration option can specify that it
|
||||||
|
@ -242,10 +242,16 @@ the <b>-help</b> option states whether or not <b>readline()</b> will be used.
|
||||||
The program handles any number of tests, each of which consists of a set of
|
The program handles any number of tests, each of which consists of a set of
|
||||||
input lines. Each set starts with a regular expression pattern, followed by any
|
input lines. Each set starts with a regular expression pattern, followed by any
|
||||||
number of subject lines to be matched against that pattern. In between sets of
|
number of subject lines to be matched against that pattern. In between sets of
|
||||||
test data, command lines that begin with a hash (#) character may appear. This
|
test data, command lines that begin with # may appear. This file format, with
|
||||||
file format, with some restrictions, can also be processed by the
|
some restrictions, can also be processed by the <b>perltest.sh</b> script that
|
||||||
<b>perltest.sh</b> script that is distributed with PCRE2 as a means of checking
|
is distributed with PCRE2 as a means of checking that the behaviour of PCRE2
|
||||||
that the behaviour of PCRE2 and Perl is the same.
|
and Perl is the same.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
When the input is a terminal, <b>pcre2test</b> prompts for each line of input,
|
||||||
|
using "re>" to prompt for regular expression patterns, and "data>" to prompt
|
||||||
|
for subject lines. Command lines starting with # can be entered only in
|
||||||
|
response to the "re>" prompt.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
Each subject line is matched separately and independently. If you want to do
|
Each subject line is matched separately and independently. If you want to do
|
||||||
|
@ -263,21 +269,27 @@ still input to be read.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC6" href="#TOC1">COMMAND LINES</a><br>
|
<br><a name="SEC6" href="#TOC1">COMMAND LINES</a><br>
|
||||||
<P>
|
<P>
|
||||||
In between sets of test data, a line that begins with a hash (#) character is
|
In between sets of test data, a line that begins with # is interpreted as a
|
||||||
interpreted as a command line. If the first character is followed by white
|
command line. If the first character is followed by white space or an
|
||||||
space or an exclamation mark, the line is treated as a comment, and ignored.
|
exclamation mark, the line is treated as a comment, and ignored. Otherwise, the
|
||||||
Otherwise, the following commands are recognized:
|
following commands are recognized:
|
||||||
<pre>
|
<pre>
|
||||||
#forbid_utf
|
#forbid_utf
|
||||||
</pre>
|
</pre>
|
||||||
Subsequent patterns automatically have the PCRE2_NEVER_UTF and PCRE2_NEVER_UCP
|
Subsequent patterns automatically have the PCRE2_NEVER_UTF and PCRE2_NEVER_UCP
|
||||||
options set, which locks out the use of UTF and Unicode property features. This
|
options set, which locks out the use of UTF and Unicode property features. This
|
||||||
is a trigger guard that is used in test files to ensure that UTF/Unicode tests
|
is a trigger guard that is used in test files to ensure that UTF or Unicode
|
||||||
are not accidentally added to files that are used when UTF support is not
|
property tests are not accidentally added to files that are used when Unicode
|
||||||
included in the library. This effect can also be obtained by the use of
|
support is not included in the library. This effect can also be obtained by the
|
||||||
<b>#pattern</b>; the difference is that <b>#forbid_utf</b> cannot be unset, and
|
use of <b>#pattern</b>; the difference is that <b>#forbid_utf</b> cannot be
|
||||||
the automatic options are not displayed in pattern information, to avoid
|
unset, and the automatic options are not displayed in pattern information, to
|
||||||
cluttering up test output.
|
avoid cluttering up test output.
|
||||||
|
<pre>
|
||||||
|
#load <filename>
|
||||||
|
</pre>
|
||||||
|
This command is used to load a set of precompiled patterns from a file, as
|
||||||
|
described in the section entitled "Saving and restoring compiled patterns"
|
||||||
|
<a href="#saverestore">below.</a>
|
||||||
<pre>
|
<pre>
|
||||||
#pattern <modifier-list>
|
#pattern <modifier-list>
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -293,6 +305,18 @@ lines, none of the other command lines are permitted, because they and many
|
||||||
of the modifiers are specific to <b>pcre2test</b>, and should not be used in
|
of the modifiers are specific to <b>pcre2test</b>, and should not be used in
|
||||||
test files that are also processed by <b>perltest.sh</b>. The <b>#perltest</b>
|
test files that are also processed by <b>perltest.sh</b>. The <b>#perltest</b>
|
||||||
command helps detect tests that are accidentally put in the wrong file.
|
command helps detect tests that are accidentally put in the wrong file.
|
||||||
|
<pre>
|
||||||
|
#pop [<modifiers>]
|
||||||
|
</pre>
|
||||||
|
This command is used to manipulate the stack of compiled patterns, as described
|
||||||
|
in the section entitled "Saving and restoring compiled patterns"
|
||||||
|
<a href="#saverestore">below.</a>
|
||||||
|
<pre>
|
||||||
|
#save <filename>
|
||||||
|
</pre>
|
||||||
|
This command is used to save a set of compiled patterns to a file, as described
|
||||||
|
in the section entitled "Saving and restoring compiled patterns"
|
||||||
|
<a href="#saverestore">below.</a>
|
||||||
<pre>
|
<pre>
|
||||||
#subject <modifier-list>
|
#subject <modifier-list>
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -428,7 +452,7 @@ There are three types of modifier that can appear in pattern lines, two of
|
||||||
which may also be used in a <b>#pattern</b> command. A pattern's modifier list
|
which may also be used in a <b>#pattern</b> command. A pattern's modifier list
|
||||||
can add to or override default modifiers that were set by a previous
|
can add to or override default modifiers that were set by a previous
|
||||||
<b>#pattern</b> command.
|
<b>#pattern</b> command.
|
||||||
</P>
|
<a name="optionmodifiers"></a></P>
|
||||||
<br><b>
|
<br><b>
|
||||||
Setting compilation options
|
Setting compilation options
|
||||||
</b><br>
|
</b><br>
|
||||||
|
@ -465,7 +489,7 @@ As well as turning on the PCRE2_UTF option, the <b>utf</b> modifier causes all
|
||||||
non-printing characters in output strings to be printed using the \x{hh...}
|
non-printing characters in output strings to be printed using the \x{hh...}
|
||||||
notation. Otherwise, those less than 0x100 are output in hex without the curly
|
notation. Otherwise, those less than 0x100 are output in hex without the curly
|
||||||
brackets.
|
brackets.
|
||||||
</P>
|
<a name="controlmodifiers"></a></P>
|
||||||
<br><b>
|
<br><b>
|
||||||
Setting compilation controls
|
Setting compilation controls
|
||||||
</b><br>
|
</b><br>
|
||||||
|
@ -486,8 +510,8 @@ about the pattern:
|
||||||
memory show memory used
|
memory show memory used
|
||||||
newline=<type> set newline type
|
newline=<type> set newline type
|
||||||
parens_nest_limit=<n> set maximum parentheses depth
|
parens_nest_limit=<n> set maximum parentheses depth
|
||||||
perlcompat lock out non-Perl modifiers
|
|
||||||
posix use the POSIX API
|
posix use the POSIX API
|
||||||
|
push push compiled pattern onto the stack
|
||||||
stackguard=<number> test the stackguard feature
|
stackguard=<number> test the stackguard feature
|
||||||
tables=[0|1|2] select internal tables
|
tables=[0|1|2] select internal tables
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -726,6 +750,22 @@ not affect the compilation process.
|
||||||
These modifiers may not appear in a <b>#pattern</b> command. If you want them as
|
These modifiers may not appear in a <b>#pattern</b> command. If you want them as
|
||||||
defaults, set them in a <b>#subject</b> command.
|
defaults, set them in a <b>#subject</b> command.
|
||||||
</P>
|
</P>
|
||||||
|
<br><b>
|
||||||
|
Saving a compiled pattern
|
||||||
|
</b><br>
|
||||||
|
<P>
|
||||||
|
When a pattern with the <b>push</b> modifier is successfully compiled, it is
|
||||||
|
pushed onto a stack of compiled patterns, and <b>pcre2test</b> expects the next
|
||||||
|
line to contain a new pattern (or a command) instead of a subject line. This
|
||||||
|
facility is used when saving compiled patterns to a file, as described in the
|
||||||
|
section entitled "Saving and restoring compiled patterns"
|
||||||
|
<a href="#saverestore">below.</a>
|
||||||
|
The <b>push</b> modifier is incompatible with compilation modifiers such as
|
||||||
|
<b>global</b> that act at match time. Any that are specified are ignored, with a
|
||||||
|
warning message, except for <b>replace</b>, which causes an error. Note that
|
||||||
|
<b>jitverify</b>, which is allowed, does not carry through to any subsequent
|
||||||
|
matching that uses this pattern.
|
||||||
|
</P>
|
||||||
<br><a name="SEC11" href="#TOC1">SUBJECT MODIFIERS</a><br>
|
<br><a name="SEC11" href="#TOC1">SUBJECT MODIFIERS</a><br>
|
||||||
<P>
|
<P>
|
||||||
The modifiers that can appear in subject lines and the <b>#subject</b>
|
The modifiers that can appear in subject lines and the <b>#subject</b>
|
||||||
|
@ -1292,14 +1332,75 @@ string, it behaves in the same way, unless a different locale has been set for
|
||||||
the pattern (using the <b>/locale</b> modifier). In this case, the
|
the pattern (using the <b>/locale</b> modifier). In this case, the
|
||||||
<b>isprint()</b> function is used to distinguish printing and non-printing
|
<b>isprint()</b> function is used to distinguish printing and non-printing
|
||||||
characters.
|
characters.
|
||||||
|
<a name="saverestore"></a></P>
|
||||||
|
<br><a name="SEC18" href="#TOC1">SAVING AND RESTORING COMPILED PATTERNS</a><br>
|
||||||
|
<P>
|
||||||
|
It is possible to save compiled patterns on disc or elsewhere, and reload them
|
||||||
|
later, subject to a number of restrictions. JIT data cannot be saved. The host
|
||||||
|
on which the patterns are reloaded must be running the same version of PCRE2,
|
||||||
|
with the same code unit width, and must also have the same endianness, pointer
|
||||||
|
width and PCRE2_SIZE type. Before compiled patterns can be saved they must be
|
||||||
|
serialized, that is, converted to a stream of bytes. A single byte stream may
|
||||||
|
contain any number of compiled patterns, but they must all use the same
|
||||||
|
character tables. A single copy of the tables is included in the byte stream
|
||||||
|
(its size is 1088 bytes).
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC18" href="#TOC1">SEE ALSO</a><br>
|
<P>
|
||||||
|
The functions whose names begin with <b>pcre2_serialize_</b> are used
|
||||||
|
for serializing and de-serializing. They are described in the
|
||||||
|
<a href="pcre2serialize.html"><b>pcre2serialize</b></a>
|
||||||
|
documentation. In this section we describe the features of <b>pcre2test</b> that
|
||||||
|
can be used to test these functions.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
When a pattern with the <b>push</b> modifier is successfully compiled, it is pushed
|
||||||
|
onto a stack of compiled patterns, and <b>pcre2test</b> expects the next line to
|
||||||
|
contain a new pattern (or command) instead of a subject line. By this means, a
|
||||||
|
number of patterns can be compiled and retained. The <b>push</b> modifier is
|
||||||
|
incompatible with <b>posix</b>, and control modifiers that act at match time are
|
||||||
|
ignored (with a message). The <b>jitverify</b> modifier applies only at compile
|
||||||
|
time. The command
|
||||||
|
<pre>
|
||||||
|
#save <filename>
|
||||||
|
</pre>
|
||||||
|
causes all the stacked patterns to be serialized and the result written to the
|
||||||
|
named file. Afterwards, all the stacked patterns are freed. The command
|
||||||
|
<pre>
|
||||||
|
#load <filename>
|
||||||
|
</pre>
|
||||||
|
reads the data in the file, and then arranges for it to be de-serialized, with
|
||||||
|
the resulting compiled patterns added to the pattern stack. The pattern on the
|
||||||
|
top of the stack can be retrieved by the #pop command, which must be followed
|
||||||
|
by lines of subjects that are to be matched with the pattern, terminated as
|
||||||
|
usual by an empty line or end of file. This command may be followed by a
|
||||||
|
modifier list containing only
|
||||||
|
<a href="#controlmodifiers">control modifiers</a>
|
||||||
|
that act after a pattern has been compiled. In particular, <b>hex</b>,
|
||||||
|
<b>posix</b>, and <b>push</b> are not allowed, nor are any
|
||||||
|
<a href="#optionmodifiers">option-setting modifiers.</a>
|
||||||
|
The JIT modifiers are, however, permitted. Here is an example that saves and
|
||||||
|
reloads two patterns.
|
||||||
|
<pre>
|
||||||
|
/abc/push
|
||||||
|
/xyz/push
|
||||||
|
#save tempfile
|
||||||
|
#load tempfile
|
||||||
|
#pop info
|
||||||
|
xyz
|
||||||
|
|
||||||
|
#pop jit,bincode
|
||||||
|
abc
|
||||||
|
</pre>
|
||||||
|
If <b>jitverify</b> is used with #pop, it does not automatically imply
|
||||||
|
<b>jit</b>, which is different behaviour from when it is used on a pattern.
|
||||||
|
</P>
|
||||||
|
<br><a name="SEC19" href="#TOC1">SEE ALSO</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>pcre2</b>(3), <b>pcre2api</b>(3), <b>pcre2callout</b>(3),
|
<b>pcre2</b>(3), <b>pcre2api</b>(3), <b>pcre2callout</b>(3),
|
||||||
<b>pcre2jit</b>, <b>pcre2matching</b>(3), <b>pcre2partial</b>(d),
|
<b>pcre2jit</b>, <b>pcre2matching</b>(3), <b>pcre2partial</b>(d),
|
||||||
<b>pcre2pattern</b>(3).
|
<b>pcre2pattern</b>(3), <b>pcre2serialize</b>(3).
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC19" href="#TOC1">AUTHOR</a><br>
|
<br><a name="SEC20" href="#TOC1">AUTHOR</a><br>
|
||||||
<P>
|
<P>
|
||||||
Philip Hazel
|
Philip Hazel
|
||||||
<br>
|
<br>
|
||||||
|
@ -1308,9 +1409,9 @@ University Computing Service
|
||||||
Cambridge, England.
|
Cambridge, England.
|
||||||
<br>
|
<br>
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC20" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 02 January 2015
|
Last updated: 23 January 2015
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2015 University of Cambridge.
|
Copyright © 1997-2015 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@ -65,6 +65,9 @@ first.
|
||||||
<tr><td><a href="pcre2sample.html">pcre2sample</a></td>
|
<tr><td><a href="pcre2sample.html">pcre2sample</a></td>
|
||||||
<td> Discussion of the pcre2demo program</td></tr>
|
<td> Discussion of the pcre2demo program</td></tr>
|
||||||
|
|
||||||
|
<tr><td><a href="pcre2serialize.html">pcre2serialize</a></td>
|
||||||
|
<td> Serializing functions for saving precompiled patterns</td></tr>
|
||||||
|
|
||||||
<tr><td><a href="pcre2stack.html">pcre2stack</a></td>
|
<tr><td><a href="pcre2stack.html">pcre2stack</a></td>
|
||||||
<td> Discussion of PCRE2's stack usage</td></tr>
|
<td> Discussion of PCRE2's stack usage</td></tr>
|
||||||
|
|
||||||
|
@ -177,6 +180,18 @@ in the library.
|
||||||
<tr><td><a href="pcre2_pattern_info.html">pcre2_pattern_info</a></td>
|
<tr><td><a href="pcre2_pattern_info.html">pcre2_pattern_info</a></td>
|
||||||
<td> Extract information about a pattern</td></tr>
|
<td> Extract information about a pattern</td></tr>
|
||||||
|
|
||||||
|
<tr><td><a href="pcre2_serialize_decode.html">pcre2_serialize_decode</a></td>
|
||||||
|
<td> Decode serialized compiled patterns</td></tr>
|
||||||
|
|
||||||
|
<tr><td><a href="pcre2_serialize_encode.html">pcre2_serialize_encode</a></td>
|
||||||
|
<td> Serialize compiled patterns for save/restore</td></tr>
|
||||||
|
|
||||||
|
<tr><td><a href="pcre2_serialize_free.html">pcre2_serialize_free</a></td>
|
||||||
|
<td> Free serialized compiled patterns</td></tr>
|
||||||
|
|
||||||
|
<tr><td><a href="pcre2_serialize_get_number_of_codes.html">pcre2_serialize_get_number_of_codes</a></td>
|
||||||
|
<td> Get number of serialized compiled patterns</td></tr>
|
||||||
|
|
||||||
<tr><td><a href="pcre2_set_bsr.html">pcre2_set_bsr</a></td>
|
<tr><td><a href="pcre2_set_bsr.html">pcre2_set_bsr</a></td>
|
||||||
<td> Set \R convention</td></tr>
|
<td> Set \R convention</td></tr>
|
||||||
|
|
||||||
|
|
971
doc/pcre2.txt
File diff suppressed because it is too large
|
@ -0,0 +1,50 @@
|
||||||
|
.TH PCRE2_SERIALIZE_DECODE 3 "19 January 2015" "PCRE2 10.10"
|
||||||
|
.SH NAME
|
||||||
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
|
.SH SYNOPSIS
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
.B #include <pcre2.h>
|
||||||
|
.PP
|
||||||
|
.nf
|
||||||
|
.B int32_t pcre2_serialize_decode(pcre2_code **\fIcodes\fP,
|
||||||
|
.B " int32_t \fInumber_of_codes\fP, const uint8_t *\fIbytes\fP,"
|
||||||
|
.B " pcre2_general_context *\fIgcontext\fP);"
|
||||||
|
.fi
|
||||||
|
.
|
||||||
|
.SH DESCRIPTION
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
This function decodes a serialized set of compiled patterns back into a list of
|
||||||
|
individual patterns. Its arguments are:
|
||||||
|
.sp
|
||||||
|
\fIcodes\fP pointer to a vector in which to build the list
|
||||||
|
\fInumber_of_codes\fP number of slots in the vector
|
||||||
|
\fIbytes\fP the serialized byte stream
|
||||||
|
\fIgcontext\fP pointer to a general context or NULL
|
||||||
|
.sp
|
||||||
|
The \fIbytes\fP argument must point to a block of data that was originally
|
||||||
|
created by \fBpcre2_serialize_encode()\fP, though it may have been saved on
|
||||||
|
disc or elsewhere in the meantime. If there are more codes in the serialized
|
||||||
|
data than slots in the list, only those compiled patterns that will fit are
|
||||||
|
decoded. The yield of the function is the number of decoded patterns, or one of
|
||||||
|
the following negative error codes:
|
||||||
|
.sp
|
||||||
|
PCRE2_ERROR_BADDATA \fInumber_of_codes\fP is zero or less
|
||||||
|
PCRE2_ERROR_BADMAGIC mismatch of id bytes in \fIbytes\fP
|
||||||
|
PCRE2_ERROR_BADMODE mismatch of variable unit size or PCRE2 version
|
||||||
|
PCRE2_ERROR_MEMORY memory allocation failed
|
||||||
|
PCRE2_ERROR_NULL \fIcodes\fP or \fIbytes\fP is NULL
|
||||||
|
.sp
|
||||||
|
PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that it was compiled
|
||||||
|
on a system with different endianness.
|
||||||
|
.P
|
||||||
|
There is a complete description of the PCRE2 native API in the
|
||||||
|
.\" HREF
|
||||||
|
\fBpcre2api\fP
|
||||||
|
.\"
|
||||||
|
page and a description of the POSIX API in the
|
||||||
|
.\" HREF
|
||||||
|
\fBpcre2posix\fP
|
||||||
|
.\"
|
||||||
|
page.
|
|
@ -0,0 +1,49 @@
|
||||||
|
.TH PCRE2_SERIALIZE_ENCODE 3 "19 January 2015" "PCRE2 10.10"
|
||||||
|
.SH NAME
|
||||||
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
|
.SH SYNOPSIS
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
.B #include <pcre2.h>
|
||||||
|
.PP
|
||||||
|
.nf
|
||||||
|
.B int32_t pcre2_serialize_encode(pcre2_code **\fIcodes\fP,
|
||||||
|
.B " int32_t \fInumber_of_codes\fP, uint8_t **\fIserialized_bytes\fP,"
|
||||||
|
.B " PCRE2_SIZE *\fIserialized_size\fP, pcre2_general_context *\fIgcontext\fP);"
|
||||||
|
.fi
|
||||||
|
.
|
||||||
|
.SH DESCRIPTION
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
This function encodes a list of compiled patterns into a byte stream that can
|
||||||
|
be saved on disc or elsewhere. Its arguments are:
|
||||||
|
.sp
|
||||||
|
\fIcodes\fP pointer to a vector containing the list
|
||||||
|
\fInumber_of_codes\fP number of slots in the vector
|
||||||
|
\fIserialized_bytes\fP set to point to the serialized byte stream
|
||||||
|
\fIserialized_size\fP set to the number of bytes in the byte stream
|
||||||
|
\fIgcontext\fP pointer to a general context or NULL
|
||||||
|
.sp
|
||||||
|
The context argument is used to obtain memory for the byte stream. When the
|
||||||
|
serialized data is no longer needed, it must be freed by calling
|
||||||
|
\fBpcre2_serialize_free()\fP. The yield of the function is the number of
|
||||||
|
serialized patterns, or one of the following negative error codes:
|
||||||
|
.sp
|
||||||
|
PCRE2_ERROR_BADDATA \fInumber_of_codes\fP is zero or less
|
||||||
|
PCRE2_ERROR_BADMAGIC mismatch of id bytes in one of the patterns
|
||||||
|
PCRE2_ERROR_MEMORY memory allocation failed
|
||||||
|
PCRE2_ERROR_MIXEDTABLES the patterns do not all use the same tables
|
||||||
|
PCRE2_ERROR_NULL an argument other than \fIgcontext\fP is NULL
|
||||||
|
.sp
|
||||||
|
PCRE2_ERROR_BADMAGIC means either that a pattern's code has been corrupted, or
|
||||||
|
that a slot in the vector does not point to a compiled pattern.
|
||||||
|
.P
|
||||||
|
There is a complete description of the PCRE2 native API in the
|
||||||
|
.\" HREF
|
||||||
|
\fBpcre2api\fP
|
||||||
|
.\"
|
||||||
|
page and a description of the POSIX API in the
|
||||||
|
.\" HREF
|
||||||
|
\fBpcre2posix\fP
|
||||||
|
.\"
|
||||||
|
page.
|
|
@ -0,0 +1,28 @@
|
||||||
|
.TH PCRE2_SERIALIZE_FREE 3 "19 January 2015" "PCRE2 10.10"
|
||||||
|
.SH NAME
|
||||||
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
|
.SH SYNOPSIS
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
.B #include <pcre2.h>
|
||||||
|
.PP
|
||||||
|
.nf
|
||||||
|
.B void pcre2_serialize_free(uint8_t *\fIbytes\fP);
|
||||||
|
.fi
|
||||||
|
.
|
||||||
|
.SH DESCRIPTION
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
This function frees the memory that was obtained by
|
||||||
|
\fBpcre2_serialize_encode()\fP to hold a serialized byte stream. The argument
|
||||||
|
must point to such a byte stream.
|
||||||
|
.P
|
||||||
|
There is a complete description of the PCRE2 native API in the
|
||||||
|
.\" HREF
|
||||||
|
\fBpcre2api\fP
|
||||||
|
.\"
|
||||||
|
page and a description of the POSIX API in the
|
||||||
|
.\" HREF
|
||||||
|
\fBpcre2posix\fP
|
||||||
|
.\"
|
||||||
|
page.
|
|
@ -0,0 +1,37 @@
|
||||||
|
.TH PCRE2_SERIALIZE_GET_NUMBER_OF_CODES 3 "19 January 2015" "PCRE2 10.10"
|
||||||
|
.SH NAME
|
||||||
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
|
.SH SYNOPSIS
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
.B #include <pcre2.h>
|
||||||
|
.PP
|
||||||
|
.nf
|
||||||
|
.B int32_t pcre2_serialize_get_number_of_codes(const uint8_t *\fIbytes\fP);
|
||||||
|
.fi
|
||||||
|
.
|
||||||
|
.SH DESCRIPTION
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
The \fIbytes\fP argument must point to a serialized byte stream that was
|
||||||
|
originally created by \fBpcre2_serialize_encode()\fP (though it may have been
|
||||||
|
saved on disc or elsewhere in the meantime). The function returns the number of
|
||||||
|
serialized patterns in the byte stream, or one of the following negative error
|
||||||
|
codes:
|
||||||
|
.sp
|
||||||
|
PCRE2_ERROR_BADMAGIC mismatch of id bytes in \fIbytes\fP
|
||||||
|
PCRE2_ERROR_BADMODE mismatch of variable unit size or PCRE2 version
|
||||||
|
PCRE2_ERROR_NULL the argument is NULL
|
||||||
|
.sp
|
||||||
|
PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that it was compiled
|
||||||
|
on a system with different endianness.
|
||||||
|
.P
|
||||||
|
There is a complete description of the PCRE2 native API in the
|
||||||
|
.\" HREF
|
||||||
|
\fBpcre2api\fP
|
||||||
|
.\"
|
||||||
|
page and a description of the POSIX API in the
|
||||||
|
.\" HREF
|
||||||
|
\fBpcre2posix\fP
|
||||||
|
.\"
|
||||||
|
page.
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2API 3 "13 January 2015" "PCRE2 10.10"
|
.TH PCRE2API 3 "23 January 2015" "PCRE2 10.10"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.sp
|
.sp
|
||||||
|
@ -205,6 +205,24 @@ document for an overview of all the PCRE2 documentation.
|
||||||
.fi
|
.fi
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
|
.SH "PCRE2 NATIVE API SERIALIZATION FUNCTIONS"
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
.nf
|
||||||
|
.B int32_t pcre2_serialize_decode(pcre2_code **\fIcodes\fP,
|
||||||
|
.B " int32_t \fInumber_of_codes\fP, const uint8_t *\fIbytes\fP,"
|
||||||
|
.B " pcre2_general_context *\fIgcontext\fP);"
|
||||||
|
.sp
|
||||||
|
.B int32_t pcre2_serialize_encode(pcre2_code **\fIcodes\fP,
|
||||||
|
.B " int32_t \fInumber_of_codes\fP, uint8_t **\fIserialized_bytes\fP,"
|
||||||
|
.B " PCRE2_SIZE *\fIserialized_size\fP, pcre2_general_context *\fIgcontext\fP);"
|
||||||
|
.sp
|
||||||
|
.B void pcre2_serialize_free(uint8_t *\fIbytes\fP);
|
||||||
|
.sp
|
||||||
|
.B int32_t pcre2_serialize_get_number_of_codes(const uint8_t *\fIbytes\fP);
|
||||||
|
.fi
|
||||||
|
.
|
||||||
|
.
|
||||||
.SH "PCRE2 NATIVE API AUXILIARY FUNCTIONS"
|
.SH "PCRE2 NATIVE API AUXILIARY FUNCTIONS"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
|
@ -1689,12 +1707,26 @@ set, the call to \fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET.
|
||||||
PCRE2_INFO_SIZE
|
PCRE2_INFO_SIZE
|
||||||
.sp
|
.sp
|
||||||
Return the size of the compiled pattern in bytes (for all three libraries). The
|
Return the size of the compiled pattern in bytes (for all three libraries). The
|
||||||
third argument should point to a \fBsize_t\fP variable. This value does not
|
third argument should point to a \fBsize_t\fP variable. This value includes the
|
||||||
include the size of the \fBpcre2_code\fP structure that is returned by
|
size of the general data block that precedes the code units of the compiled
|
||||||
\fBpcre_compile()\fP. The value that is used when \fBpcre2_compile()\fP is
|
pattern itself. The value that is used when \fBpcre2_compile()\fP is getting
|
||||||
getting memory in which to place the compiled data is the value returned by
|
memory in which to place the compiled pattern may be slightly larger than the
|
||||||
this option plus the size of the \fBpcre2_code\fP structure. Processing a
|
value returned by this option, because there are cases where the code that
|
||||||
pattern with the JIT compiler does not alter the value returned by this option.
|
calculates the size has to over-estimate. Processing a pattern with the JIT
|
||||||
|
compiler does not alter the value returned by this option.
|
||||||
|
.
|
||||||
|
.
|
||||||
|
.SH "SERIALIZATION AND PRECOMPILING"
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
It is possible to save compiled patterns on disc or elsewhere, and reload them
|
||||||
|
later, subject to a number of restrictions. The functions whose names begin
|
||||||
|
with \fBpcre2_serialize_\fP are used for this purpose. They are described in
|
||||||
|
the
|
||||||
|
.\" HREF
|
||||||
|
\fBpcre2serialize\fP
|
||||||
|
.\"
|
||||||
|
documentation.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.\" HTML <a name="matchdatablock"></a>
|
.\" HTML <a name="matchdatablock"></a>
|
||||||
|
@ -2853,6 +2885,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 13 January 2015
|
Last updated: 23 January 2015
|
||||||
Copyright (c) 1997-2015 University of Cambridge.
|
Copyright (c) 1997-2015 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -0,0 +1,170 @@
|
||||||
|
.TH PCRE2SERIALIZE 3 "20 January 2015" "PCRE2 10.10"
|
||||||
|
.SH NAME
|
||||||
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
|
.SH "SAVING AND RE-USING PRECOMPILED PCRE2 PATTERNS"
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
.nf
|
||||||
|
.B int32_t pcre2_serialize_decode(pcre2_code **\fIcodes\fP,
|
||||||
|
.B " int32_t \fInumber_of_codes\fP, const uint8_t *\fIbytes\fP,"
|
||||||
|
.B " pcre2_general_context *\fIgcontext\fP);"
|
||||||
|
.sp
|
||||||
|
.B int32_t pcre2_serialize_encode(pcre2_code **\fIcodes\fP,
|
||||||
|
.B " int32_t \fInumber_of_codes\fP, uint8_t **\fIserialized_bytes\fP,"
|
||||||
|
.B " PCRE2_SIZE *\fIserialized_size\fP, pcre2_general_context *\fIgcontext\fP);"
|
||||||
|
.sp
|
||||||
|
.B void pcre2_serialize_free(uint8_t *\fIbytes\fP);
|
||||||
|
.sp
|
||||||
|
.B int32_t pcre2_serialize_get_number_of_codes(const uint8_t *\fIbytes\fP);
|
||||||
|
.fi
|
||||||
|
.sp
|
||||||
|
If you are running an application that uses a large number of regular
|
||||||
|
expression patterns, it may be useful to store them in a precompiled form
|
||||||
|
instead of having to compile them every time the application is run. However,
|
||||||
|
if you are using the just-in-time optimization feature, it is not possible to
|
||||||
|
save and reload the JIT data, because it is position-dependent. In addition,
|
||||||
|
the host on which the patterns are reloaded must be running the same version of
|
||||||
|
PCRE2, with the same code unit width, and must also have the same endianness,
|
||||||
|
pointer width and PCRE2_SIZE type. For example, patterns compiled on a 32-bit
|
||||||
|
system using PCRE2's 16-bit library cannot be reloaded on a 64-bit system, nor
|
||||||
|
can they be reloaded using the 8-bit library.
|
||||||
|
.
|
||||||
|
.
|
||||||
|
.SH "SAVING COMPILED PATTERNS"
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
Before compiled patterns can be saved they must be serialized, that is,
|
||||||
|
converted to a stream of bytes. A single byte stream may contain any number of
|
||||||
|
compiled patterns, but they must all use the same character tables. A single
|
||||||
|
copy of the tables is included in the byte stream (its size is 1088 bytes). For
|
||||||
|
more details of character tables, see the
|
||||||
|
.\" HTML <a href="pcre2api.html#localesupport">
|
||||||
|
.\" </a>
|
||||||
|
section on locale support
|
||||||
|
.\"
|
||||||
|
in the
|
||||||
|
.\" HREF
|
||||||
|
\fBpcre2api\fP
|
||||||
|
.\"
|
||||||
|
documentation.
|
||||||
|
.P
|
||||||
|
The function \fBpcre2_serialize_encode()\fP creates a serialized byte stream
|
||||||
|
from a list of compiled patterns. Its first two arguments specify the list,
|
||||||
|
being a pointer to a vector of pointers to compiled patterns, and the length of
|
||||||
|
the vector. The third and fourth arguments point to variables which are set to
|
||||||
|
point to the created byte stream and its length, respectively. The final
|
||||||
|
argument is a pointer to a general context, which can be used to specify custom
|
||||||
|
memory management functions. If this argument is NULL, \fBmalloc()\fP is used
|
||||||
|
to obtain memory for the byte stream. The yield of the function is the number
|
||||||
|
of serialized patterns, or one of the following negative error codes:
|
||||||
|
.sp
|
||||||
|
PCRE2_ERROR_BADDATA the number of patterns is zero or less
|
||||||
|
PCRE2_ERROR_BADMAGIC mismatch of id bytes in one of the patterns
|
||||||
|
PCRE2_ERROR_MEMORY memory allocation failed
|
||||||
|
PCRE2_ERROR_MIXEDTABLES the patterns do not all use the same tables
|
||||||
|
PCRE2_ERROR_NULL the 1st, 3rd, or 4th argument is NULL
|
||||||
|
.sp
|
||||||
|
PCRE2_ERROR_BADMAGIC means either that a pattern's code has been corrupted, or
|
||||||
|
that a slot in the vector does not point to a compiled pattern.
|
||||||
|
.P
|
||||||
|
Once a set of patterns has been serialized you can save the data in any
|
||||||
|
appropriate manner. Here is sample code that compiles two patterns and writes
|
||||||
|
them to a file. It assumes that the variable \fIfd\fP refers to a file that is
|
||||||
|
open for output. The error checking that should be present in a real
|
||||||
|
application has been omitted for simplicity.
|
||||||
|
.sp
|
||||||
|
int errorcode;
|
||||||
|
uint8_t *bytes;
|
||||||
|
PCRE2_SIZE erroroffset;
|
||||||
|
PCRE2_SIZE bytescount;
|
||||||
|
pcre2_code *list_of_codes[2];
|
||||||
|
list_of_codes[0] = pcre2_compile("first pattern",
|
||||||
|
PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);
|
||||||
|
list_of_codes[1] = pcre2_compile("second pattern",
|
||||||
|
PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);
|
||||||
|
errorcode = pcre2_serialize_encode(list_of_codes, 2, &bytes,
|
||||||
|
&bytescount, NULL);
|
||||||
|
errorcode = fwrite(bytes, 1, bytescount, fd);
|
||||||
|
.sp
|
||||||
|
Note that the serialized data is binary data that may contain any of the 256
|
||||||
|
possible byte values. On systems that make a distinction between binary and
|
||||||
|
non-binary data, be sure that the file is opened for binary output.
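The point about binary output can be illustrated with a minimal sketch that
uses only the standard C library. The helper name \fIsave_byte_stream\fP is
hypothetical and is not part of PCRE2; it assumes that the byte stream and its
length have already been obtained from \fBpcre2_serialize_encode()\fP.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper, not part of PCRE2: write "count" bytes to the file
   named "path". The "b" in the mode string requests binary output, which
   matters on systems that distinguish binary from non-binary files.
   Returns 0 on success and -1 on any failure. */
int save_byte_stream(const char *path, const unsigned char *bytes,
                     size_t count)
{
  FILE *f = fopen(path, "wb");
  size_t written;
  int close_failed;

  if (f == NULL) return -1;
  written = fwrite(bytes, 1, count, f);
  close_failed = (fclose(f) != 0);   /* buffered data is flushed here */
  return (written == count && !close_failed) ? 0 : -1;
}
```

An application might call this as save_byte_stream("patterns.bin", bytes,
bytescount), where the file name is only an illustration, and then release the
stream with \fBpcre2_serialize_free()\fP.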
|
||||||
|
.P
|
||||||
|
Serializing a set of patterns leaves the original data untouched, so they can
|
||||||
|
still be used for matching. Their memory must eventually be freed in the usual
|
||||||
|
way by calling \fBpcre2_code_free()\fP. When you have finished with the byte
|
||||||
|
stream, it too must be freed by calling \fBpcre2_serialize_free()\fP.
|
||||||
|
.
|
||||||
|
.
|
||||||
|
.SH "RE-USING PRECOMPILED PATTERNS"
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
In order to re-use a set of saved patterns you must first make the serialized
|
||||||
|
byte stream available in main memory (for example, by reading from a file). The
|
||||||
|
management of this memory block is up to the application. You can use the
|
||||||
|
\fBpcre2_serialize_get_number_of_codes()\fP function to find out how many
|
||||||
|
compiled patterns are in the serialized data without actually decoding the
|
||||||
|
patterns:
|
||||||
|
.sp
|
||||||
|
uint8_t *bytes = <serialized data>;
|
||||||
|
int32_t number_of_codes = pcre2_serialize_get_number_of_codes(bytes);
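One way of making a saved byte stream available in main memory is sketched
here, using only the standard C library. The helper name
\fIload_byte_stream\fP is hypothetical and is not part of PCRE2; the block it
returns is obtained with \fBrealloc()\fP, so its management remains the
application's responsibility.

```c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

/* Hypothetical helper, not part of PCRE2: read the whole of the file named
   "path" into a heap block, growing the block as needed. On success the
   block is returned and its length is stored in *size_out; the caller must
   release it with free(). On failure NULL is returned. */
uint8_t *load_byte_stream(const char *path, size_t *size_out)
{
  FILE *f = fopen(path, "rb");   /* binary mode, to match how it was saved */
  uint8_t *buf = NULL;
  size_t used = 0, cap = 0;

  if (f == NULL) return NULL;
  for (;;)
    {
    size_t n;
    if (used == cap)
      {
      size_t newcap = (cap == 0)? 4096 : cap * 2;
      uint8_t *tmp = realloc(buf, newcap);
      if (tmp == NULL) { free(buf); fclose(f); return NULL; }
      buf = tmp;
      cap = newcap;
      }
    n = fread(buf + used, 1, cap - used, f);
    used += n;
    if (n == 0) break;           /* end of file or read error */
    }
  if (ferror(f)) { free(buf); fclose(f); return NULL; }
  fclose(f);
  *size_out = used;
  return buf;
}
```

The returned pointer can be passed to
\fBpcre2_serialize_get_number_of_codes()\fP and \fBpcre2_serialize_decode()\fP,
and freed with \fBfree()\fP once decoding is complete.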
|
||||||
|
.sp
|
||||||
|
The \fBpcre2_serialize_decode()\fP function reads a byte stream and recreates
|
||||||
|
the compiled patterns in new memory blocks, setting pointers to them in a
|
||||||
|
vector. The first two arguments are a pointer to a suitable vector and its
|
||||||
|
length, and the third argument points to a byte stream. The final argument is a
|
||||||
|
pointer to a general context, which can be used to specify custom memory
|
||||||
|
management functions for the decoded patterns. If this argument is NULL,
|
||||||
|
\fBmalloc()\fP and \fBfree()\fP are used. After deserialization, the byte
|
||||||
|
stream is no longer needed and can be discarded.
|
||||||
|
.sp
|
||||||
|
int32_t number_of_codes;
|
||||||
|
pcre2_code *list_of_codes[2];
|
||||||
|
uint8_t *bytes = <serialized data>;
|
||||||
|
number_of_codes =
|
||||||
|
pcre2_serialize_decode(list_of_codes, 2, bytes, NULL);
|
||||||
|
.sp
|
||||||
|
If the vector is not large enough for all the patterns in the byte stream, it
|
||||||
|
is filled with those that fit, and the remainder are ignored. The yield of the
|
||||||
|
function is the number of decoded patterns, or one of the following negative
|
||||||
|
error codes:
|
||||||
|
.sp
|
||||||
|
PCRE2_ERROR_BADDATA second argument is zero or less
|
||||||
|
PCRE2_ERROR_BADMAGIC mismatch of id bytes in the data
|
||||||
|
PCRE2_ERROR_BADMODE mismatch of variable unit size or PCRE2 version
|
||||||
|
PCRE2_ERROR_MEMORY memory allocation failed
|
||||||
|
PCRE2_ERROR_NULL first or third argument is NULL
|
||||||
|
.sp
|
||||||
|
PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that it was compiled
|
||||||
|
on a system with different endianness.
|
||||||
|
.P
|
||||||
|
Decoded patterns can be used for matching in the usual way, and must be freed
|
||||||
|
by calling \fBpcre2_code_free()\fP as normal. A single copy of the character
|
||||||
|
tables is used by all the decoded patterns. A reference count is used to
|
||||||
|
arrange for its memory to be automatically freed when the last pattern is
|
||||||
|
freed.
|
||||||
|
.P
|
||||||
|
If a pattern was processed by \fBpcre2_jit_compile()\fP before being
|
||||||
|
serialized, the JIT data is discarded and so is no longer available after a
|
||||||
|
save/restore cycle. You can, however, process a restored pattern with
|
||||||
|
\fBpcre2_jit_compile()\fP if you wish.
|
||||||
|
.
|
||||||
|
.
|
||||||
|
.
|
||||||
|
.SH AUTHOR
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
.nf
|
||||||
|
Philip Hazel
|
||||||
|
University Computing Service
|
||||||
|
Cambridge, England.
|
||||||
|
.fi
|
||||||
|
.
|
||||||
|
.
|
||||||
|
.SH REVISION
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
.nf
|
||||||
|
Last updated: 20 January 2015
|
||||||
|
Copyright (c) 1997-2015 University of Cambridge.
|
||||||
|
.fi
|
173
doc/pcre2test.1
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2TEST 1 "02 January 2015" "PCRE 10.00"
|
.TH PCRE2TEST 1 "23 January 2015" "PCRE 10.10"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
pcre2test - a program for testing Perl-compatible regular expressions.
|
pcre2test - a program for testing Perl-compatible regular expressions.
|
||||||
.SH SYNOPSIS
|
.SH SYNOPSIS
|
||||||
|
@ -21,10 +21,11 @@ options, see the
|
||||||
documentation.
|
documentation.
|
||||||
.P
|
.P
|
||||||
The input for \fBpcre2test\fP is a sequence of regular expression patterns and
|
The input for \fBpcre2test\fP is a sequence of regular expression patterns and
|
||||||
subject strings to be matched. The output shows the result of each match
|
subject strings to be matched. There are also command lines for setting
|
||||||
attempt. Modifiers on the command line, the patterns, and the subject lines
|
defaults and controlling some special actions. The output shows the result of
|
||||||
specify PCRE2 function options, control how the subject is processed, and what
|
each match attempt. Modifiers on external or internal command lines, the
|
||||||
output is produced.
|
patterns, and the subject lines specify PCRE2 function options, control how the
|
||||||
|
subject is processed, and what output is produced.
|
||||||
.P
|
.P
|
||||||
As the original fairly simple PCRE library evolved, it acquired many different
|
As the original fairly simple PCRE library evolved, it acquired many different
|
||||||
features, and as a result, the original \fBpcretest\fP program ended up with a
|
features, and as a result, the original \fBpcretest\fP program ended up with a
|
||||||
|
@ -185,9 +186,7 @@ If \fBpcre2test\fP is given two filename arguments, it reads from the first and
|
||||||
writes to the second. If the first name is "-", input is taken from the
|
writes to the second. If the first name is "-", input is taken from the
|
||||||
standard input. If \fBpcre2test\fP is given only one argument, it reads from
|
standard input. If \fBpcre2test\fP is given only one argument, it reads from
|
||||||
that file and writes to stdout. Otherwise, it reads from stdin and writes to
|
that file and writes to stdout. Otherwise, it reads from stdin and writes to
|
||||||
stdout. When the input is a terminal, it prompts for each line of input, using
|
stdout.
|
||||||
"re>" to prompt for regular expression patterns, and "data>" to prompt for
|
|
||||||
subject lines.
|
|
||||||
.P
|
.P
|
||||||
When \fBpcre2test\fP is built, a configuration option can specify that it
|
When \fBpcre2test\fP is built, a configuration option can specify that it
|
||||||
should be linked with the \fBlibreadline\fP or \fBlibedit\fP library. When this
|
should be linked with the \fBlibreadline\fP or \fBlibedit\fP library. When this
|
||||||
|
@ -198,10 +197,15 @@ the \fB-help\fP option states whether or not \fBreadline()\fP will be used.
|
||||||
The program handles any number of tests, each of which consists of a set of
|
The program handles any number of tests, each of which consists of a set of
|
||||||
input lines. Each set starts with a regular expression pattern, followed by any
|
input lines. Each set starts with a regular expression pattern, followed by any
|
||||||
number of subject lines to be matched against that pattern. In between sets of
|
number of subject lines to be matched against that pattern. In between sets of
|
||||||
test data, command lines that begin with a hash (#) character may appear. This
|
test data, command lines that begin with # may appear. This file format, with
|
||||||
file format, with some restrictions, can also be processed by the
|
some restrictions, can also be processed by the \fBperltest.sh\fP script that
|
||||||
\fBperltest.sh\fP script that is distributed with PCRE2 as a means of checking
|
is distributed with PCRE2 as a means of checking that the behaviour of PCRE2
|
||||||
that the behaviour of PCRE2 and Perl is the same.
|
and Perl is the same.
|
||||||
|
.P
|
||||||
|
When the input is a terminal, \fBpcre2test\fP prompts for each line of input,
|
||||||
|
using "re>" to prompt for regular expression patterns, and "data>" to prompt
|
||||||
|
for subject lines. Command lines starting with # can be entered only in
|
||||||
|
response to the "re>" prompt.
|
||||||
.P
|
.P
|
||||||
Each subject line is matched separately and independently. If you want to do
|
Each subject line is matched separately and independently. If you want to do
|
||||||
multi-line matches, you have to use the \en escape sequence (or \er or \er\en,
|
multi-line matches, you have to use the \en escape sequence (or \er or \er\en,
|
||||||
|
@ -219,21 +223,30 @@ still input to be read.
|
||||||
.SH "COMMAND LINES"
|
.SH "COMMAND LINES"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
In between sets of test data, a line that begins with a hash (#) character is
|
In between sets of test data, a line that begins with # is interpreted as a
|
||||||
interpreted as a command line. If the first character is followed by white
|
command line. If the first character is followed by white space or an
|
||||||
space or an exclamation mark, the line is treated as a comment, and ignored.
|
exclamation mark, the line is treated as a comment, and ignored. Otherwise, the
|
||||||
Otherwise, the following commands are recognized:
|
following commands are recognized:
|
||||||
.sp
|
.sp
|
||||||
#forbid_utf
|
#forbid_utf
|
||||||
.sp
|
.sp
|
||||||
Subsequent patterns automatically have the PCRE2_NEVER_UTF and PCRE2_NEVER_UCP
|
Subsequent patterns automatically have the PCRE2_NEVER_UTF and PCRE2_NEVER_UCP
|
||||||
options set, which locks out the use of UTF and Unicode property features. This
|
options set, which locks out the use of UTF and Unicode property features. This
|
||||||
is a trigger guard that is used in test files to ensure that UTF/Unicode tests
|
is a trigger guard that is used in test files to ensure that UTF or Unicode
|
||||||
are not accidentally added to files that are used when UTF support is not
|
property tests are not accidentally added to files that are used when Unicode
|
||||||
included in the library. This effect can also be obtained by the use of
|
support is not included in the library. This effect can also be obtained by the
|
||||||
\fB#pattern\fP; the difference is that \fB#forbid_utf\fP cannot be unset, and
|
use of \fB#pattern\fP; the difference is that \fB#forbid_utf\fP cannot be
|
||||||
the automatic options are not displayed in pattern information, to avoid
|
unset, and the automatic options are not displayed in pattern information, to
|
||||||
cluttering up test output.
|
avoid cluttering up test output.
|
||||||
|
.sp
|
||||||
|
#load <filename>
|
||||||
|
.sp
|
||||||
|
This command is used to load a set of precompiled patterns from a file, as
|
||||||
|
described in the section entitled "Saving and restoring compiled patterns"
|
||||||
|
.\" HTML <a href="#saverestore">
|
||||||
|
.\" </a>
|
||||||
|
below.
|
||||||
|
.\"
|
||||||
.sp
|
.sp
|
||||||
#pattern <modifier-list>
|
#pattern <modifier-list>
|
||||||
.sp
|
.sp
|
||||||
|
@ -249,6 +262,24 @@ lines, none of the other command lines are permitted, because they and many
|
||||||
of the modifiers are specific to \fBpcre2test\fP, and should not be used in
|
of the modifiers are specific to \fBpcre2test\fP, and should not be used in
|
||||||
test files that are also processed by \fBperltest.sh\fP. The \fB#perltest\fP
|
test files that are also processed by \fBperltest.sh\fP. The \fB#perltest\fP
|
||||||
command helps detect tests that are accidentally put in the wrong file.
|
command helps detect tests that are accidentally put in the wrong file.
|
||||||
|
.sp
|
||||||
|
#pop [<modifiers>]
|
||||||
|
.sp
|
||||||
|
This command is used to manipulate the stack of compiled patterns, as described
|
||||||
|
in the section entitled "Saving and restoring compiled patterns"
|
||||||
|
.\" HTML <a href="#saverestore">
|
||||||
|
.\" </a>
|
||||||
|
below.
|
||||||
|
.\"
|
||||||
|
.sp
|
||||||
|
#save <filename>
|
||||||
|
.sp
|
||||||
|
This command is used to save a set of compiled patterns to a file, as described
|
||||||
|
in the section entitled "Saving and restoring compiled patterns"
|
||||||
|
.\" HTML <a href="#saverestore">
|
||||||
|
.\" </a>
|
||||||
|
below.
|
||||||
|
.\"
|
||||||
.sp
|
.sp
|
||||||
#subject <modifier-list>
|
#subject <modifier-list>
|
||||||
.sp
|
.sp
|
||||||
|
@ -387,6 +418,7 @@ can add to or override default modifiers that were set by a previous
|
||||||
\fB#pattern\fP command.
|
\fB#pattern\fP command.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
|
.\" HTML <a name="optionmodifiers"></a>
|
||||||
.SS "Setting compilation options"
|
.SS "Setting compilation options"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
|
@ -426,6 +458,7 @@ notation. Otherwise, those less than 0x100 are output in hex without the curly
|
||||||
brackets.
|
brackets.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
|
.\" HTML <a name="controlmodifiers"></a>
|
||||||
.SS "Setting compilation controls"
|
.SS "Setting compilation controls"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
|
@ -445,8 +478,8 @@ about the pattern:
|
||||||
memory show memory used
|
memory show memory used
|
||||||
newline=<type> set newline type
|
newline=<type> set newline type
|
||||||
parens_nest_limit=<n> set maximum parentheses depth
|
parens_nest_limit=<n> set maximum parentheses depth
|
||||||
perlcompat lock out non-Perl modifiers
|
|
||||||
posix use the POSIX API
|
posix use the POSIX API
|
||||||
|
push push compiled pattern onto the stack
|
||||||
stackguard=<number> test the stackguard feature
|
stackguard=<number> test the stackguard feature
|
||||||
tables=[0|1|2] select internal tables
|
tables=[0|1|2] select internal tables
|
||||||
.sp
|
.sp
|
||||||
|
@ -683,6 +716,25 @@ These modifiers may not appear in a \fB#pattern\fP command. If you want them as
|
||||||
defaults, set them in a \fB#subject\fP command.
|
defaults, set them in a \fB#subject\fP command.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
|
.SS "Saving a compiled pattern"
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
When a pattern with the \fBpush\fP modifier is successfully compiled, it is
|
||||||
|
pushed onto a stack of compiled patterns, and \fBpcre2test\fP expects the next
|
||||||
|
line to contain a new pattern (or a command) instead of a subject line. This
|
||||||
|
facility is used when saving compiled patterns to a file, as described in the
|
||||||
|
section entitled "Saving and restoring compiled patterns"
|
||||||
|
.\" HTML <a href="#saverestore">
|
||||||
|
.\" </a>
|
||||||
|
below.
|
||||||
|
.\"
|
||||||
|
The \fBpush\fP modifier is incompatible with compilation modifiers such as
|
||||||
|
\fBglobal\fP that act at match time. Any that are specified are ignored, with a
|
||||||
|
warning message, except for \fBreplace\fP, which causes an error. Note that
|
||||||
|
\fBjitverify\fP, which is allowed, does not carry through to any subsequent
|
||||||
|
matching that uses this pattern.
|
||||||
|
.
|
||||||
|
.
|
||||||
.SH "SUBJECT MODIFIERS"
|
.SH "SUBJECT MODIFIERS"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
|
@ -1253,12 +1305,83 @@ characters.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
|
.\" HTML <a name="saverestore"></a>
|
||||||
|
.SH "SAVING AND RESTORING COMPILED PATTERNS"
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
It is possible to save compiled patterns on disc or elsewhere, and reload them
|
||||||
|
later, subject to a number of restrictions. JIT data cannot be saved. The host
|
||||||
|
on which the patterns are reloaded must be running the same version of PCRE2,
|
||||||
|
with the same code unit width, and must also have the same endianness, pointer
|
||||||
|
width and PCRE2_SIZE type. Before compiled patterns can be saved they must be
|
||||||
|
serialized, that is, converted to a stream of bytes. A single byte stream may
|
||||||
|
contain any number of compiled patterns, but they must all use the same
|
||||||
|
character tables. A single copy of the tables is included in the byte stream
|
||||||
|
(its size is 1088 bytes).
|
||||||
|
.P
|
||||||
|
The functions whose names begin with \fBpcre2_serialize_\fP are used
|
||||||
|
for serializing and de-serializing. They are described in the
|
||||||
|
.\" HREF
|
||||||
|
\fBpcre2serialize\fP
|
||||||
|
.\"
|
||||||
|
documentation. In this section we describe the features of \fBpcre2test\fP that
|
||||||
|
can be used to test these functions.
|
||||||
|
.P
|
||||||
|
When a pattern with the \fBpush\fP modifier is successfully compiled, it is pushed
|
||||||
|
onto a stack of compiled patterns, and \fBpcre2test\fP expects the next line to
|
||||||
|
contain a new pattern (or command) instead of a subject line. By this means, a
|
||||||
|
number of patterns can be compiled and retained. The \fBpush\fP modifier is
|
||||||
|
incompatible with \fBposix\fP, and control modifiers that act at match time are
|
||||||
|
ignored (with a message). The \fBjitverify\fP modifier applies only at compile
|
||||||
|
time. The command
|
||||||
|
.sp
|
||||||
|
#save <filename>
|
||||||
|
.sp
|
||||||
|
causes all the stacked patterns to be serialized and the result written to the
|
||||||
|
named file. Afterwards, all the stacked patterns are freed. The command
|
||||||
|
.sp
|
||||||
|
#load <filename>
|
||||||
|
.sp
|
||||||
|
reads the data in the file, and then arranges for it to be de-serialized, with
|
||||||
|
the resulting compiled patterns added to the pattern stack. The pattern on the
|
||||||
|
top of the stack can be retrieved by the #pop command, which must be followed
|
||||||
|
by lines of subjects that are to be matched with the pattern, terminated as
|
||||||
|
usual by an empty line or end of file. This command may be followed by a
|
||||||
|
modifier list containing only
|
||||||
|
.\" HTML <a href="#controlmodifiers">
|
||||||
|
.\" </a>
|
||||||
|
control modifiers
|
||||||
|
.\"
|
||||||
|
that act after a pattern has been compiled. In particular, \fBhex\fP,
|
||||||
|
\fBposix\fP, and \fBpush\fP are not allowed, nor are any
|
||||||
|
.\" HTML <a href="#optionmodifiers">
|
||||||
|
.\" </a>
|
||||||
|
option-setting modifiers.
|
||||||
|
.\"
|
||||||
|
The JIT modifiers are, however, permitted. Here is an example that saves and
|
||||||
|
reloads two patterns.
|
||||||
|
.sp
|
||||||
|
/abc/push
|
||||||
|
/xyz/push
|
||||||
|
#save tempfile
|
||||||
|
#load tempfile
|
||||||
|
#pop info
|
||||||
|
xyz
|
||||||
|
.sp
|
||||||
|
#pop jit,bincode
|
||||||
|
abc
|
||||||
|
.sp
|
||||||
|
If \fBjitverify\fP is used with #pop, it does not automatically imply
|
||||||
|
\fBjit\fP, which is different behaviour from when it is used on a pattern.
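The order in which the example above retrieves its patterns follows from the stack discipline: the last pattern pushed is the first one popped, which is why the first #pop is followed by a subject for /xyz/. The following sketch models only that ordering. It is not how pcre2test is implemented; the pattern_stack type and the stack_* functions are hypothetical, patterns are represented by plain strings, and the "file" is just a second in-memory array.

```c
#include <assert.h>
#include <string.h>

#define STACK_MAX 8

/* Hypothetical stack of patterns, represented here by their source text. */
typedef struct pattern_stack {
  const char *items[STACK_MAX];
  int count;
} pattern_stack;

/* Model of the push modifier: add a pattern to the top of the stack. */
static int stack_push(pattern_stack *s, const char *pat)
{
  assert(s != NULL);
  if (s->count >= STACK_MAX) return -1;
  s->items[s->count++] = pat;
  return 0;
}

/* Model of #pop: remove and return the top pattern, or NULL if empty. */
static const char *stack_pop(pattern_stack *s)
{
  assert(s != NULL);
  return (s->count > 0) ? s->items[--s->count] : NULL;
}

/* Model of #save: copy every stacked pattern, bottom to top, into the
   "file", then empty the stack. Returns the number saved. */
static int stack_save(pattern_stack *s, pattern_stack *file)
{
  int n;
  assert(s != NULL && file != NULL);
  n = s->count;
  memcpy(file->items, s->items, (size_t)n * sizeof s->items[0]);
  file->count = n;
  s->count = 0;
  return n;
}

/* Model of #load: push the saved patterns back in their original order,
   so the last one saved ends up on top. Returns the number loaded. */
static int stack_load(pattern_stack *s, const pattern_stack *file)
{
  int i;
  assert(s != NULL && file != NULL);
  for (i = 0; i < file->count; i++)
    if (stack_push(s, file->items[i]) != 0) return -1;
  return file->count;
}
```

Pushing "abc" and then "xyz", saving, loading, and popping twice yields "xyz" followed by "abc", which matches the order of the subject lines in the example.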
|
||||||
|
.
|
||||||
|
.
|
||||||
|
.
|
||||||
.SH "SEE ALSO"
|
.SH "SEE ALSO"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
\fBpcre2\fP(3), \fBpcre2api\fP(3), \fBpcre2callout\fP(3),
|
\fBpcre2\fP(3), \fBpcre2api\fP(3), \fBpcre2callout\fP(3),
|
||||||
\fBpcre2jit\fP(3), \fBpcre2matching\fP(3), \fBpcre2partial\fP(3),
|
\fBpcre2jit\fP(3), \fBpcre2matching\fP(3), \fBpcre2partial\fP(3),
|
||||||
\fBpcre2pattern\fP(3).
|
\fBpcre2pattern\fP(3), \fBpcre2serialize\fP(3).
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SH AUTHOR
|
.SH AUTHOR
|
||||||
|
@ -1275,6 +1398,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 02 January 2015
|
Last updated: 23 January 2015
|
||||||
Copyright (c) 1997-2015 University of Cambridge.
|
Copyright (c) 1997-2015 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -17,10 +17,12 @@ SYNOPSIS
|
||||||
options, see the pcre2api documentation.
|
options, see the pcre2api documentation.
|
||||||
|
|
||||||
The input for pcre2test is a sequence of regular expression patterns
|
The input for pcre2test is a sequence of regular expression patterns
|
||||||
and subject strings to be matched. The output shows the result of each
|
and subject strings to be matched. There are also command lines for
|
||||||
match attempt. Modifiers on the command line, the patterns, and the
|
setting defaults and controlling some special actions. The output shows
|
||||||
subject lines specify PCRE2 function options, control how the subject
|
the result of each match attempt. Modifiers on external or internal
|
||||||
is processed, and what output is produced.
|
command lines, the patterns, and the subject lines specify PCRE2 func-
|
||||||
|
tion options, control how the subject is processed, and what output is
|
||||||
|
produced.
|
||||||
|
|
||||||
As the original fairly simple PCRE library evolved, it acquired many
|
As the original fairly simple PCRE library evolved, it acquired many
|
||||||
different features, and as a result, the original pcretest program
|
different features, and as a result, the original pcretest program
|
||||||
|
@ -173,9 +175,7 @@ DESCRIPTION
|
||||||
and writes to the second. If the first name is "-", input is taken from
|
and writes to the second. If the first name is "-", input is taken from
|
||||||
the standard input. If pcre2test is given only one argument, it reads
|
the standard input. If pcre2test is given only one argument, it reads
|
||||||
from that file and writes to stdout. Otherwise, it reads from stdin and
|
from that file and writes to stdout. Otherwise, it reads from stdin and
|
||||||
writes to stdout. When the input is a terminal, it prompts for each
|
writes to stdout.
|
||||||
line of input, using "re>" to prompt for regular expression patterns,
|
|
||||||
and "data>" to prompt for subject lines.
|
|
||||||
|
|
||||||
When pcre2test is built, a configuration option can specify that it
|
When pcre2test is built, a configuration option can specify that it
|
||||||
should be linked with the libreadline or libedit library. When this is
|
should be linked with the libreadline or libedit library. When this is
|
||||||
|
@ -186,11 +186,15 @@ DESCRIPTION
|
||||||
The program handles any number of tests, each of which consists of a
|
The program handles any number of tests, each of which consists of a
|
||||||
set of input lines. Each set starts with a regular expression pattern,
|
set of input lines. Each set starts with a regular expression pattern,
|
||||||
followed by any number of subject lines to be matched against that pat-
|
followed by any number of subject lines to be matched against that pat-
|
||||||
tern. In between sets of test data, command lines that begin with a
|
tern. In between sets of test data, command lines that begin with # may
|
||||||
hash (#) character may appear. This file format, with some restric-
|
appear. This file format, with some restrictions, can also be processed
|
||||||
tions, can also be processed by the perltest.sh script that is distrib-
|
by the perltest.sh script that is distributed with PCRE2 as a means of
|
||||||
uted with PCRE2 as a means of checking that the behaviour of PCRE2 and
|
checking that the behaviour of PCRE2 and Perl is the same.
|
||||||
Perl is the same.
|
|
||||||
|
When the input is a terminal, pcre2test prompts for each line of input,
|
||||||
|
using "re>" to prompt for regular expression patterns, and "data>" to
|
||||||
|
prompt for subject lines. Command lines starting with # can be entered
|
||||||
|
only in response to the "re>" prompt.
|
||||||
|
|
||||||
Each subject line is matched separately and independently. If you want
|
Each subject line is matched separately and independently. If you want
|
||||||
to do multi-line matches, you have to use the \n escape sequence (or \r
|
to do multi-line matches, you have to use the \n escape sequence (or \r
|
||||||
|
@ -207,22 +211,28 @@ DESCRIPTION
|
||||||
|
|
||||||
COMMAND LINES
|
COMMAND LINES
|
||||||
|
|
||||||
In between sets of test data, a line that begins with a hash (#) char-
|
In between sets of test data, a line that begins with # is interpreted
|
||||||
acter is interpreted as a command line. If the first character is fol-
|
as a command line. If the first character is followed by white space or
|
||||||
lowed by white space or an exclamation mark, the line is treated as a
|
an exclamation mark, the line is treated as a comment, and ignored.
|
||||||
comment, and ignored. Otherwise, the following commands are recog-
|
Otherwise, the following commands are recognized:
|
||||||
nized:
|
|
||||||
|
|
||||||
#forbid_utf
|
#forbid_utf
|
||||||
|
|
||||||
Subsequent patterns automatically have the PCRE2_NEVER_UTF and
|
Subsequent patterns automatically have the PCRE2_NEVER_UTF and
|
||||||
PCRE2_NEVER_UCP options set, which locks out the use of UTF and Unicode
|
PCRE2_NEVER_UCP options set, which locks out the use of UTF and Unicode
|
||||||
property features. This is a trigger guard that is used in test files
|
property features. This is a trigger guard that is used in test files
|
||||||
to ensure that UTF/Unicode tests are not accidentally added to files
|
to ensure that UTF or Unicode property tests are not accidentally added
|
||||||
that are used when UTF support is not included in the library. This
|
to files that are used when Unicode support is not included in the
|
||||||
effect can also be obtained by the use of #pattern; the difference is
|
library. This effect can also be obtained by the use of #pattern; the
|
||||||
that #forbid_utf cannot be unset, and the automatic options are not
|
difference is that #forbid_utf cannot be unset, and the automatic
|
||||||
displayed in pattern information, to avoid cluttering up test output.
|
options are not displayed in pattern information, to avoid cluttering
|
||||||
|
up test output.
|
||||||
|
|
||||||
|
#load <filename>
|
||||||
|
|
||||||
|
This command is used to load a set of precompiled patterns from a file,
|
||||||
|
as described in the section entitled "Saving and restoring compiled
|
||||||
|
patterns" below.
|
||||||
|
|
||||||
#pattern <modifier-list>
|
#pattern <modifier-list>
|
||||||
|
|
||||||
|
@ -240,6 +250,18 @@ COMMAND LINES
|
||||||
#perltest command helps detect tests that are accidentally put in the
|
#perltest command helps detect tests that are accidentally put in the
|
||||||
wrong file.
|
wrong file.
|
||||||
|
|
||||||
|
#pop [<modifiers>]
|
||||||
|
|
||||||
|
This command is used to manipulate the stack of compiled patterns, as
|
||||||
|
described in the section entitled "Saving and restoring compiled pat-
|
||||||
|
terns" below.
|
||||||
|
|
||||||
|
#save <filename>
|
||||||
|
|
||||||
|
This command is used to save a set of compiled patterns to a file, as
|
||||||
|
described in the section entitled "Saving and restoring compiled pat-
|
||||||
|
terns" below.
|
||||||
|
|
||||||
#subject <modifier-list>
|
#subject <modifier-list>
|
||||||
|
|
||||||
This command sets a default modifier list that applies to all subse-
|
This command sets a default modifier list that applies to all subse-
|
||||||
|
@ -432,8 +454,8 @@ PATTERN MODIFIERS
|
||||||
memory show memory used
|
memory show memory used
|
||||||
newline=<type> set newline type
|
newline=<type> set newline type
|
||||||
parens_nest_limit=<n> set maximum parentheses depth
|
parens_nest_limit=<n> set maximum parentheses depth
|
||||||
perlcompat lock out non-Perl modifiers
|
|
||||||
posix use the POSIX API
|
posix use the POSIX API
|
||||||
|
push push compiled pattern onto the stack
|
||||||
stackguard=<number> test the stackguard feature
|
stackguard=<number> test the stackguard feature
|
||||||
tables=[0|1|2] select internal tables
|
tables=[0|1|2] select internal tables
|
||||||
|
|
||||||
|
@ -644,6 +666,19 @@ PATTERN MODIFIERS
|
||||||
These modifiers may not appear in a #pattern command. If you want them
|
These modifiers may not appear in a #pattern command. If you want them
|
||||||
as defaults, set them in a #subject command.
|
as defaults, set them in a #subject command.
|
||||||
|
|
||||||
|
Saving a compiled pattern
|
||||||
|
|
||||||
|
When a pattern with the push modifier is successfully compiled, it is
|
||||||
|
pushed onto a stack of compiled patterns, and pcre2test expects the
|
||||||
|
next line to contain a new pattern (or a command) instead of a subject
|
||||||
|
line. This facility is used when saving compiled patterns to a file, as
|
||||||
|
described in the section entitled "Saving and restoring compiled pat-
|
||||||
|
terns" below. The push modifier is incompatible with compilation modi-
|
||||||
|
fiers such as global that act at match time. Any that are specified are
|
||||||
|
ignored, with a warning message, except for replace, which causes an
|
||||||
|
error. Note that jitverify, which is allowed, does not carry through
|
||||||
|
to any subsequent matching that uses this pattern.
|
||||||
|
|
||||||
|
|
||||||
SUBJECT MODIFIERS
|
SUBJECT MODIFIERS
|
||||||
|
|
||||||
|
@ -652,7 +687,7 @@ SUBJECT MODIFIERS
|
||||||
|
|
||||||
Setting match options
|
Setting match options
|
||||||
|
|
||||||
The following modifiers set options for pcre2_match() or
|
The following modifiers set options for pcre2_match() or
|
||||||
pcre2_dfa_match(). See pcre2api for a description of their effects.
|
pcre2_dfa_match(). See pcre2api for a description of their effects.
|
||||||
|
|
||||||
anchored set PCRE2_ANCHORED
|
anchored set PCRE2_ANCHORED
|
||||||
|
@ -666,20 +701,20 @@ SUBJECT MODIFIERS
|
||||||
partial_hard (or ph) set PCRE2_PARTIAL_HARD
|
partial_hard (or ph) set PCRE2_PARTIAL_HARD
|
||||||
partial_soft (or ps) set PCRE2_PARTIAL_SOFT
|
partial_soft (or ps) set PCRE2_PARTIAL_SOFT
|
||||||
|
|
||||||
The partial matching modifiers are provided with abbreviations because
|
The partial matching modifiers are provided with abbreviations because
|
||||||
they appear frequently in tests.
|
they appear frequently in tests.
|
||||||
|
|
||||||
If the /posix modifier was present on the pattern, causing the POSIX
|
If the /posix modifier was present on the pattern, causing the POSIX
|
||||||
wrapper API to be used, the only option-setting modifiers that have any
|
wrapper API to be used, the only option-setting modifiers that have any
|
||||||
effect are notbol, notempty, and noteol, causing REG_NOTBOL,
|
effect are notbol, notempty, and noteol, causing REG_NOTBOL,
|
||||||
REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to regexec().
|
REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to regexec().
|
||||||
Any other modifiers cause an error.
|
Any other modifiers cause an error.
|
||||||
|
|
||||||
Setting match controls
|
Setting match controls
|
||||||
|
|
||||||
The following modifiers affect the matching process or request addi-
|
The following modifiers affect the matching process or request addi-
|
||||||
tional information. Some of them may also be specified on a pattern
|
tional information. Some of them may also be specified on a pattern
|
||||||
line (see above), in which case they apply to every subject line that
|
line (see above), in which case they apply to every subject line that
|
||||||
is matched against that pattern.
|
is matched against that pattern.
|
||||||
|
|
||||||
aftertext show text after match
|
aftertext show text after match
|
||||||
|
@ -712,23 +747,23 @@ SUBJECT MODIFIERS
|
||||||
|
|
||||||
Showing more text
|
Showing more text
|
||||||
|
|
||||||
The aftertext modifier requests that as well as outputting the part of
|
The aftertext modifier requests that as well as outputting the part of
|
||||||
the subject string that matched the entire pattern, pcre2test should in
|
the subject string that matched the entire pattern, pcre2test should in
|
||||||
addition output the remainder of the subject string. This is useful for
|
addition output the remainder of the subject string. This is useful for
|
||||||
tests where the subject contains multiple copies of the same substring.
|
tests where the subject contains multiple copies of the same substring.
|
||||||
The allaftertext modifier requests the same action for captured sub-
|
The allaftertext modifier requests the same action for captured sub-
|
||||||
strings as well as the main matched substring. In each case the remain-
|
strings as well as the main matched substring. In each case the remain-
|
||||||
der is output on the following line with a plus character following the
|
der is output on the following line with a plus character following the
|
||||||
capture number.
|
capture number.
|
||||||
|
|
||||||
The allusedtext modifier requests that all the text that was consulted
|
The allusedtext modifier requests that all the text that was consulted
|
||||||
during a successful pattern match by the interpreter should be shown.
|
during a successful pattern match by the interpreter should be shown.
|
||||||
This feature is not supported for JIT matching, and if requested with
|
This feature is not supported for JIT matching, and if requested with
|
||||||
JIT it is ignored (with a warning message). Setting this modifier
|
JIT it is ignored (with a warning message). Setting this modifier
|
||||||
affects the output if there is a lookbehind at the start of a match, or
|
affects the output if there is a lookbehind at the start of a match, or
|
||||||
a lookahead at the end, or if \K is used in the pattern. Characters
|
a lookahead at the end, or if \K is used in the pattern. Characters
|
||||||
that precede or follow the start and end of the actual match are indi-
|
that precede or follow the start and end of the actual match are indi-
|
||||||
cated in the output by '<' or '>' characters underneath them. Here is
|
cated in the output by '<' or '>' characters underneath them. Here is
|
||||||
an example:
|
an example:
|
||||||
|
|
||||||
re> /(?<=pqr)abc(?=xyz)/
|
re> /(?<=pqr)abc(?=xyz)/
|
||||||
|
@ -736,16 +771,16 @@ SUBJECT MODIFIERS
|
||||||
0: pqrabcxyz
|
0: pqrabcxyz
|
||||||
<<< >>>
|
<<< >>>
|
||||||
|
|
||||||
This shows that the matched string is "abc", with the preceding and
|
This shows that the matched string is "abc", with the preceding and
|
||||||
following strings "pqr" and "xyz" having been consulted during the
|
following strings "pqr" and "xyz" having been consulted during the
|
||||||
match (when processing the assertions).
|
match (when processing the assertions).
|
||||||
|
|
||||||
The startchar modifier requests that the starting character for the
|
The startchar modifier requests that the starting character for the
|
||||||
match be indicated, if it is different to the start of the matched
|
match be indicated, if it is different to the start of the matched
|
||||||
string. The only time when this occurs is when \K has been processed as
|
string. The only time when this occurs is when \K has been processed as
|
||||||
part of the match. In this situation, the output for the matched string
|
part of the match. In this situation, the output for the matched string
|
||||||
is displayed from the starting character instead of from the match
|
is displayed from the starting character instead of from the match
|
||||||
point, with circumflex characters under the earlier characters. For
|
point, with circumflex characters under the earlier characters. For
|
||||||
example:
|
example:
|
||||||
|
|
||||||
re> /abc\Kxyz/
|
re> /abc\Kxyz/
|
||||||
|
@ -753,7 +788,7 @@ SUBJECT MODIFIERS
|
||||||
0: abcxyz
|
0: abcxyz
|
||||||
^^^
|
^^^
|
||||||
|
|
||||||
Unlike allusedtext, the startchar modifier can be used with JIT. How-
|
Unlike allusedtext, the startchar modifier can be used with JIT. How-
|
||||||
ever, these two modifiers are mutually exclusive.
|
ever, these two modifiers are mutually exclusive.
|
||||||
|
|
||||||
Showing the value of all capture groups
|
Showing the value of all capture groups
|
||||||
|
@ -761,84 +796,84 @@ SUBJECT MODIFIERS
|
||||||
The allcaptures modifier requests that the values of all potential cap-
|
The allcaptures modifier requests that the values of all potential cap-
|
||||||
tured parentheses be output after a match. By default, only those up to
|
tured parentheses be output after a match. By default, only those up to
|
||||||
the highest one actually used in the match are output (corresponding to
|
the highest one actually used in the match are output (corresponding to
|
||||||
the return code from pcre2_match()). Groups that did not take part in
|
the return code from pcre2_match()). Groups that did not take part in
|
||||||
the match are output as "<unset>".
|
the match are output as "<unset>".
|
||||||
|
|
||||||
Testing callouts
|
Testing callouts
|
||||||
|
|
||||||
A callout function is supplied when pcre2test calls the library match-
|
A callout function is supplied when pcre2test calls the library match-
|
||||||
ing functions, unless callout_none is specified. If callout_capture is
|
ing functions, unless callout_none is specified. If callout_capture is
|
||||||
set, the current captured groups are output when a callout occurs.
|
set, the current captured groups are output when a callout occurs.
|
||||||
|
|
||||||
The callout_fail modifier can be given one or two numbers. If there is
|
The callout_fail modifier can be given one or two numbers. If there is
|
||||||
only one number, 1 is returned instead of 0 when a callout of that num-
|
only one number, 1 is returned instead of 0 when a callout of that num-
|
||||||
ber is reached. If two numbers are given, 1 is returned when callout
|
ber is reached. If two numbers are given, 1 is returned when callout
|
||||||
<n> is reached for the <m>th time.
|
<n> is reached for the <m>th time.
|
||||||
|
|
||||||
The callout_data modifier can be given an unsigned or a negative num-
|
The callout_data modifier can be given an unsigned or a negative num-
|
||||||
ber. Any value other than zero is used as a return from pcre2test's
|
ber. Any value other than zero is used as a return from pcre2test's
|
||||||
callout function.
|
callout function.
|
||||||
|
|
||||||
Finding all matches in a string
|
Finding all matches in a string
|
||||||
|
|
||||||
Searching for all possible matches within a subject can be requested by
|
Searching for all possible matches within a subject can be requested by
|
||||||
the global or altglobal modifier. After finding a match, the matching
|
the global or altglobal modifier. After finding a match, the matching
|
||||||
function is called again to search the remainder of the subject. The
|
function is called again to search the remainder of the subject. The
|
||||||
difference between global and altglobal is that the former uses the
|
difference between global and altglobal is that the former uses the
|
||||||
start_offset argument to pcre2_match() or pcre2_dfa_match() to start
|
start_offset argument to pcre2_match() or pcre2_dfa_match() to start
|
||||||
searching at a new point within the entire string (which is what Perl
|
searching at a new point within the entire string (which is what Perl
|
||||||
does), whereas the latter passes over a shortened subject. This makes a
|
does), whereas the latter passes over a shortened subject. This makes a
|
||||||
difference to the matching process if the pattern begins with a lookbe-
|
difference to the matching process if the pattern begins with a lookbe-
|
||||||
hind assertion (including \b or \B).
|
hind assertion (including \b or \B).
|
||||||
|
|
||||||
If an empty string is matched, the next match is done with the
|
If an empty string is matched, the next match is done with the
|
||||||
PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search
|
PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search
|
||||||
for another, non-empty, match at the same point in the subject. If this
|
for another, non-empty, match at the same point in the subject. If this
|
||||||
match fails, the start offset is advanced, and the normal match is
|
match fails, the start offset is advanced, and the normal match is
|
||||||
retried. This imitates the way Perl handles such cases when using the
|
retried. This imitates the way Perl handles such cases when using the
|
||||||
/g modifier or the split() function. Normally, the start offset is
|
/g modifier or the split() function. Normally, the start offset is
|
||||||
advanced by one character, but if the newline convention recognizes
|
advanced by one character, but if the newline convention recognizes
|
||||||
CRLF as a newline, and the current character is CR followed by LF, an
|
CRLF as a newline, and the current character is CR followed by LF, an
|
||||||
advance of two characters occurs.
|
advance of two characters occurs.
|
||||||
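The advance rule described above can be sketched in C. This is an illustrative helper, not code taken from pcre2test: the function name and the crlf_is_newline flag are hypothetical, and the sketch assumes an 8-bit, non-UTF subject in which one character is one code unit.

```c
#include <stddef.h>

/* Return the offset at which the normal match is retried after the
   anchored, non-empty retry at `offset` has failed.
   Assumption: 8-bit, non-UTF subject (one character == one code unit). */
size_t advance_after_empty_match(const unsigned char *subject, size_t length,
                                 size_t offset, int crlf_is_newline)
{
    if (offset >= length)
        return length;                      /* nothing left to search */
    if (crlf_is_newline &&
        offset + 1 < length &&
        subject[offset] == '\r' && subject[offset + 1] == '\n')
        return offset + 2;                  /* step over CRLF as one unit */
    return offset + 1;                      /* ordinary one-character advance */
}
```

In UTF mode a real implementation would also have to step over all the code units of a multi-unit character, which this sketch does not attempt.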
|
|
||||||
Testing substring extraction functions
|
Testing substring extraction functions
|
||||||
|
|
||||||
The copy and get modifiers can be used to test the pcre2_sub-
|
The copy and get modifiers can be used to test the pcre2_sub-
|
||||||
string_copy_xxx() and pcre2_substring_get_xxx() functions. They can be
|
string_copy_xxx() and pcre2_substring_get_xxx() functions. They can be
|
||||||
given more than once, and each can specify a group name or number, for
|
given more than once, and each can specify a group name or number, for
|
||||||
example:
|
example:
|
||||||
|
|
||||||
abcd\=copy=1,copy=3,get=G1
|
abcd\=copy=1,copy=3,get=G1
|
||||||
|
|
||||||
If the #subject command is used to set default copy and/or get lists,
|
If the #subject command is used to set default copy and/or get lists,
|
||||||
these can be unset by specifying a negative number to cancel all num-
|
these can be unset by specifying a negative number to cancel all num-
|
||||||
bered groups and an empty name to cancel all named groups.
|
bered groups and an empty name to cancel all named groups.
|
||||||
|
|
||||||
The getall modifier tests pcre2_substring_list_get(), which extracts
|
The getall modifier tests pcre2_substring_list_get(), which extracts
|
||||||
all captured substrings.
|
all captured substrings.
|
||||||
|
|
||||||
If the subject line is successfully matched, the substrings extracted
|
If the subject line is successfully matched, the substrings extracted
|
||||||
by the convenience functions are output with C, G, or L after the
|
by the convenience functions are output with C, G, or L after the
|
||||||
string number instead of a colon. This is in addition to the normal
|
string number instead of a colon. This is in addition to the normal
|
||||||
full list. The string length (that is, the return from the extraction
|
full list. The string length (that is, the return from the extraction
|
||||||
function) is given in parentheses after each substring, followed by the
|
function) is given in parentheses after each substring, followed by the
|
||||||
name when the extraction was by name.
|
name when the extraction was by name.
|
||||||
|
|
||||||
Testing the substitution function
|
Testing the substitution function
|
||||||
|
|
||||||
If the replace modifier is set, the pcre2_substitute() function is
|
If the replace modifier is set, the pcre2_substitute() function is
|
||||||
called instead of one of the matching functions. Unlike subject
|
called instead of one of the matching functions. Unlike subject
|
||||||
strings, pcre2test does not process replacement strings for escape
|
strings, pcre2test does not process replacement strings for escape
|
||||||
sequences. In UTF mode, a replacement string is checked to see if it is
|
sequences. In UTF mode, a replacement string is checked to see if it is
|
||||||
a valid UTF-8 string. If so, it is correctly converted to a UTF string
|
a valid UTF-8 string. If so, it is correctly converted to a UTF string
|
||||||
of the appropriate code unit width. If it is not a valid UTF-8 string,
|
of the appropriate code unit width. If it is not a valid UTF-8 string,
|
||||||
the individual code units are copied directly. This provides a means of
|
the individual code units are copied directly. This provides a means of
|
||||||
passing an invalid UTF-8 string for testing purposes.
|
passing an invalid UTF-8 string for testing purposes.
|
||||||
|
|
||||||
If the global modifier is set, PCRE2_SUBSTITUTE_GLOBAL is passed to
|
If the global modifier is set, PCRE2_SUBSTITUTE_GLOBAL is passed to
|
||||||
pcre2_substitute(). After a successful substitution, the modified
|
pcre2_substitute(). After a successful substitution, the modified
|
||||||
string is output, preceded by the number of replacements. This may be
|
string is output, preceded by the number of replacements. This may be
|
||||||
zero if there were no matches. Here is a simple example of a substitu-
|
zero if there were no matches. Here is a simple example of a substitu-
|
||||||
tion test:
|
tion test:
|
||||||
|
|
||||||
/abc/replace=xxx
|
/abc/replace=xxx
|
||||||
|
@ -847,11 +882,11 @@ SUBJECT MODIFIERS
|
||||||
=abc=abc=\=global
|
=abc=abc=\=global
|
||||||
2: =xxx=xxx=
|
2: =xxx=xxx=
|
||||||
|
|
||||||
Subject and replacement strings should be kept relatively short for
|
Subject and replacement strings should be kept relatively short for
|
||||||
substitution tests, as fixed-size buffers are used. To make it easy to
|
substitution tests, as fixed-size buffers are used. To make it easy to
|
||||||
test for buffer overflow, if the replacement string starts with a num-
|
test for buffer overflow, if the replacement string starts with a num-
|
||||||
ber in square brackets, that number is passed to pcre2_substitute() as
|
ber in square brackets, that number is passed to pcre2_substitute() as
|
||||||
the size of the output buffer, with the replacement string starting at
|
the size of the output buffer, with the replacement string starting at
|
||||||
the next character. Here is an example that tests the edge case:
|
the next character. Here is an example that tests the edge case:
|
||||||
|
|
||||||
/abc/
|
/abc/
|
||||||
|
@ -861,123 +896,123 @@ SUBJECT MODIFIERS
|
||||||
Failed: error -47: no more memory
|
Failed: error -47: no more memory
|
||||||
|
|
||||||
A replacement string is ignored with POSIX and DFA matching. Specifying
|
A replacement string is ignored with POSIX and DFA matching. Specifying
|
||||||
partial matching provokes an error return ("bad option value") from
|
partial matching provokes an error return ("bad option value") from
|
||||||
pcre2_substitute().
|
pcre2_substitute().
|
||||||
|
|
||||||
Setting the JIT stack size
|
Setting the JIT stack size
|
||||||
|
|
||||||
The jitstack modifier provides a way of setting the maximum stack size
|
The jitstack modifier provides a way of setting the maximum stack size
|
||||||
that is used by the just-in-time optimization code. It is ignored if
|
that is used by the just-in-time optimization code. It is ignored if
|
||||||
JIT optimization is not being used. The value is a number of kilobytes.
|
JIT optimization is not being used. The value is a number of kilobytes.
|
||||||
Providing a stack that is larger than the default 32K is necessary only
|
Providing a stack that is larger than the default 32K is necessary only
|
||||||
for very complicated patterns.
|
for very complicated patterns.
|
||||||
|
|
||||||
Setting match and recursion limits
|
Setting match and recursion limits
|
||||||
|
|
||||||
The match_limit and recursion_limit modifiers set the appropriate lim-
|
The match_limit and recursion_limit modifiers set the appropriate lim-
|
||||||
its in the match context. These values are ignored when the find_limits
|
its in the match context. These values are ignored when the find_limits
|
||||||
modifier is specified.
|
modifier is specified.
|
||||||
|
|
||||||
Finding minimum limits
|
Finding minimum limits
|
||||||
|
|
||||||
If the find_limits modifier is present, pcre2test calls pcre2_match()
|
If the find_limits modifier is present, pcre2test calls pcre2_match()
|
||||||
several times, setting different values in the match context via
|
several times, setting different values in the match context via
|
||||||
pcre2_set_match_limit() and pcre2_set_recursion_limit() until it finds
|
pcre2_set_match_limit() and pcre2_set_recursion_limit() until it finds
|
||||||
the minimum values for each parameter that allow pcre2_match() to com-
|
the minimum values for each parameter that allow pcre2_match() to com-
|
||||||
plete without error.
|
plete without error.
|
||||||
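One way such a search for a minimum could be organised is doubling followed by bisection. The sketch below is an assumption about the general technique, not a copy of what pcre2test does: match_ok_fn, find_min_limit, and needs_at_least are hypothetical names, and the callback stands in for "set the limit in the match context, call pcre2_match(), and report whether it completed without a limit error".

```c
#include <stdint.h>

/* Hypothetical callback: nonzero if the match completes without a limit
   error when the limit is set to `limit`. */
typedef int (*match_ok_fn)(uint32_t limit, void *ctx);

/* Stand-in for a real matching call: succeeds once the limit reaches the
   value pointed to by ctx. */
int needs_at_least(uint32_t limit, void *ctx)
{
    return limit >= *(const uint32_t *)ctx;
}

/* Find the smallest limit (>= 1) for which ok() succeeds, assuming that
   success is monotonic in the limit. Returns 0 if even UINT32_MAX fails. */
uint32_t find_min_limit(match_ok_fn ok, void *ctx)
{
    uint32_t lo = 0;                 /* largest value known (or assumed) to fail */
    uint32_t hi = 64;                /* first guess */

    while (!ok(hi, ctx)) {           /* double until a working limit is found */
        if (hi == UINT32_MAX)
            return 0;
        lo = hi;
        hi = (hi > UINT32_MAX / 2) ? UINT32_MAX : hi * 2;
    }
    while (hi - lo > 1) {            /* bisect: lo fails, hi succeeds */
        uint32_t mid = lo + (hi - lo) / 2;
        if (ok(mid, ctx))
            hi = mid;
        else
            lo = mid;
    }
    return hi;
}
```

The same routine could be run once per parameter, holding the other limit at its maximum while one is being searched.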
|
|
||||||
If JIT is being used, only the match limit is relevant. If DFA matching
|
If JIT is being used, only the match limit is relevant. If DFA matching
|
||||||
is being used, neither limit is relevant, and this modifier is ignored
|
is being used, neither limit is relevant, and this modifier is ignored
|
||||||
(with a warning message).
|
(with a warning message).
|
||||||
|
|
||||||
The match_limit number is a measure of the amount of backtracking that
|
The match_limit number is a measure of the amount of backtracking that
|
||||||
takes place, and learning the minimum value can be instructive. For
|
takes place, and learning the minimum value can be instructive. For
|
||||||
most simple matches, the number is quite small, but for patterns with
|
most simple matches, the number is quite small, but for patterns with
|
||||||
very large numbers of matching possibilities, it can become large very
|
very large numbers of matching possibilities, it can become large very
|
||||||
quickly with increasing length of subject string. The
|
quickly with increasing length of subject string. The
|
||||||
recursion_limit number is a measure of how much stack (or, if
|
recursion_limit number is a measure of how much stack (or, if
|
||||||
PCRE2 is compiled with NO_RECURSE, how much heap) memory is needed to
|
PCRE2 is compiled with NO_RECURSE, how much heap) memory is needed to
|
||||||
complete the match attempt.
|
complete the match attempt.
|
||||||
|
|
||||||
Showing MARK names
|
Showing MARK names
|
||||||
|
|
||||||
|
|
||||||
The mark modifier causes the names from backtracking control verbs that
|
The mark modifier causes the names from backtracking control verbs that
|
||||||
are returned from calls to pcre2_match() to be displayed. If a mark is
|
are returned from calls to pcre2_match() to be displayed. If a mark is
|
||||||
returned for a match, non-match, or partial match, pcre2test shows it.
|
returned for a match, non-match, or partial match, pcre2test shows it.
|
||||||
For a match, it is on a line by itself, tagged with "MK:". Otherwise,
|
For a match, it is on a line by itself, tagged with "MK:". Otherwise,
|
||||||
it is added to the non-match message.
|
it is added to the non-match message.
|
||||||
|
|
||||||
Showing memory usage
|
Showing memory usage
|
||||||
|
|
||||||
The memory modifier causes pcre2test to log all memory allocation and
|
The memory modifier causes pcre2test to log all memory allocation and
|
||||||
freeing calls that occur during a match operation.
|
freeing calls that occur during a match operation.
|
||||||
|
|
||||||
Setting a starting offset
|
Setting a starting offset
|
||||||
|
|
||||||
The offset modifier sets an offset in the subject string at which
|
The offset modifier sets an offset in the subject string at which
|
||||||
matching starts. Its value is a number of code units, not characters.
|
matching starts. Its value is a number of code units, not characters.
|
||||||
|
|
||||||
Setting the size of the output vector
|
Setting the size of the output vector
|
||||||
|
|
||||||
The ovector modifier applies only to the subject line in which it
|
The ovector modifier applies only to the subject line in which it
|
||||||
appears, though of course it can also be used to set a default in a
|
appears, though of course it can also be used to set a default in a
|
||||||
#subject command. It specifies the number of pairs of offsets that are
|
#subject command. It specifies the number of pairs of offsets that are
|
||||||
available for storing matching information. The default is 15.
|
available for storing matching information. The default is 15.
|
||||||
|
|
||||||
A value of zero is useful when testing the POSIX API because it causes
|
A value of zero is useful when testing the POSIX API because it causes
|
||||||
regexec() to be called with a NULL capture vector. When not testing the
|
regexec() to be called with a NULL capture vector. When not testing the
|
||||||
POSIX API, a value of zero is used to cause pcre2_match_data_cre-
|
POSIX API, a value of zero is used to cause pcre2_match_data_cre-
|
||||||
ate_from_pattern() to be called, in order to create a match block of
|
ate_from_pattern() to be called, in order to create a match block of
|
||||||
exactly the right size for the pattern. (It is not possible to create a
|
exactly the right size for the pattern. (It is not possible to create a
|
||||||
match block with a zero-length ovector; there is always at least one
|
match block with a zero-length ovector; there is always at least one
|
||||||
pair of offsets.)
|
pair of offsets.)
|
||||||
|
|
||||||
Passing the subject as zero-terminated
|
Passing the subject as zero-terminated
|
||||||
|
|
||||||
By default, the subject string is passed to a native API matching func-
|
By default, the subject string is passed to a native API matching func-
|
||||||
tion with its correct length. In order to test the facility for passing
|
tion with its correct length. In order to test the facility for passing
|
||||||
a zero-terminated string, the zero_terminate modifier is provided. It
|
a zero-terminated string, the zero_terminate modifier is provided. It
|
||||||
causes the length to be passed as PCRE2_ZERO_TERMINATED. (When matching
|
causes the length to be passed as PCRE2_ZERO_TERMINATED. (When matching
|
||||||
via the POSIX interface, this modifier has no effect, as there is no
|
via the POSIX interface, this modifier has no effect, as there is no
|
||||||
facility for passing a length.)
|
facility for passing a length.)
|
||||||
|
|
||||||
When testing pcre2_substitute(), this modifier also has the effect of
|
When testing pcre2_substitute(), this modifier also has the effect of
|
||||||
passing the replacement string as zero-terminated.
|
passing the replacement string as zero-terminated.
|
||||||
|
|
||||||
|
|
||||||
THE ALTERNATIVE MATCHING FUNCTION
|
THE ALTERNATIVE MATCHING FUNCTION
|
||||||
|
|
||||||
By default, pcre2test uses the standard PCRE2 matching function,
|
By default, pcre2test uses the standard PCRE2 matching function,
|
||||||
pcre2_match() to match each subject line. PCRE2 also supports an alter-
|
pcre2_match() to match each subject line. PCRE2 also supports an alter-
|
||||||
native matching function, pcre2_dfa_match(), which operates in a dif-
|
native matching function, pcre2_dfa_match(), which operates in a dif-
|
||||||
ferent way, and has some restrictions. The differences between the two
|
ferent way, and has some restrictions. The differences between the two
|
||||||
functions are described in the pcre2matching documentation.
|
functions are described in the pcre2matching documentation.
|
||||||
|
|
||||||
If the dfa modifier is set, the alternative matching function is used.
|
If the dfa modifier is set, the alternative matching function is used.
|
||||||
This function finds all possible matches at a given point in the sub-
|
This function finds all possible matches at a given point in the sub-
|
||||||
ject. If, however, the dfa_shortest modifier is set, processing stops
|
ject. If, however, the dfa_shortest modifier is set, processing stops
|
||||||
after the first match is found. This is always the shortest possible
|
after the first match is found. This is always the shortest possible
|
||||||
match.
|
match.
|
||||||
|
|
||||||
|
|
||||||
DEFAULT OUTPUT FROM pcre2test
|
DEFAULT OUTPUT FROM pcre2test
|
||||||
|
|
||||||
This section describes the output when the normal matching function,
|
This section describes the output when the normal matching function,
|
||||||
pcre2_match(), is being used.
|
pcre2_match(), is being used.
|
||||||
|
|
||||||
When a match succeeds, pcre2test outputs the list of captured sub-
|
When a match succeeds, pcre2test outputs the list of captured sub-
|
||||||
strings, starting with number 0 for the string that matched the whole
|
strings, starting with number 0 for the string that matched the whole
|
||||||
pattern. Otherwise, it outputs "No match" when the return is
|
pattern. Otherwise, it outputs "No match" when the return is
|
||||||
PCRE2_ERROR_NOMATCH, or "Partial match:" followed by the partially
|
PCRE2_ERROR_NOMATCH, or "Partial match:" followed by the partially
|
||||||
matching substring when the return is PCRE2_ERROR_PARTIAL. (Note that
|
matching substring when the return is PCRE2_ERROR_PARTIAL. (Note that
|
||||||
this is the entire substring that was inspected during the partial
|
this is the entire substring that was inspected during the partial
|
||||||
match; it may include characters before the actual match start if a
|
match; it may include characters before the actual match start if a
|
||||||
lookbehind assertion, \K, \b, or \B was involved.)
|
lookbehind assertion, \K, \b, or \B was involved.)
|
||||||
|
|
||||||
For any other return, pcre2test outputs the PCRE2 negative error number
|
For any other return, pcre2test outputs the PCRE2 negative error number
|
||||||
and a short descriptive phrase. If the error is a failed UTF string
|
and a short descriptive phrase. If the error is a failed UTF string
|
||||||
check, the code unit offset of the start of the failing character is
|
check, the code unit offset of the start of the failing character is
|
||||||
also output. Here is an example of an interactive pcre2test run.
|
also output. Here is an example of an interactive pcre2test run.
|
||||||
|
|
||||||
$ pcre2test
|
$ pcre2test
|
||||||
|
@ -993,8 +1028,8 @@ DEFAULT OUTPUT FROM pcre2test
|
||||||
Unset capturing substrings that are not followed by one that is set are
|
Unset capturing substrings that are not followed by one that is set are
|
||||||
not shown by pcre2test unless the allcaptures modifier is specified. In
|
not shown by pcre2test unless the allcaptures modifier is specified. In
|
||||||
the following example, there are two capturing substrings, but when the
|
the following example, there are two capturing substrings, but when the
|
||||||
first data line is matched, the second, unset substring is not shown.
|
first data line is matched, the second, unset substring is not shown.
|
||||||
An "internal" unset substring is shown as "<unset>", as for the second
|
An "internal" unset substring is shown as "<unset>", as for the second
|
||||||
data line.
|
data line.
|
||||||
|
|
||||||
re> /(a)|(b)/
|
re> /(a)|(b)/
|
||||||
|
@ -1006,11 +1041,11 @@ DEFAULT OUTPUT FROM pcre2test
|
||||||
1: <unset>
|
1: <unset>
|
||||||
2: b
|
2: b
|
||||||
|
|
||||||
If the strings contain any non-printing characters, they are output as
|
If the strings contain any non-printing characters, they are output as
|
||||||
\xhh escapes if the value is less than 256 and UTF mode is not set.
|
\xhh escapes if the value is less than 256 and UTF mode is not set.
|
||||||
Otherwise they are output as \x{hh...} escapes. See below for the defi-
|
Otherwise they are output as \x{hh...} escapes. See below for the defi-
|
||||||
nition of non-printing characters. If the /aftertext modifier is set,
|
nition of non-printing characters. If the /aftertext modifier is set,
|
||||||
the output for substring 0 is followed by the rest of the subject
|
the output for substring 0 is followed by the rest of the subject
|
||||||
string, identified by "0+" like this:
|
string, identified by "0+" like this:
|
||||||
|
|
||||||
re> /cat/aftertext
|
re> /cat/aftertext
|
||||||
|
@ -1018,7 +1053,7 @@ DEFAULT OUTPUT FROM pcre2test
|
||||||
0: cat
|
0: cat
|
||||||
0+ aract
|
0+ aract
|
||||||
|
|
||||||
If global matching is requested, the results of successive matching
|
If global matching is requested, the results of successive matching
|
||||||
attempts are output in sequence, like this:
|
attempts are output in sequence, like this:
|
||||||
|
|
||||||
re> /\Bi(\w\w)/g
|
re> /\Bi(\w\w)/g
|
||||||
|
@ -1030,8 +1065,8 @@ DEFAULT OUTPUT FROM pcre2test
|
||||||
0: ipp
|
0: ipp
|
||||||
1: pp
|
1: pp
|
||||||
|
|
||||||
"No match" is output only if the first match attempt fails. Here is an
|
"No match" is output only if the first match attempt fails. Here is an
|
||||||
example of a failure message (the offset 4 that is specified by the
|
example of a failure message (the offset 4 that is specified by the
|
||||||
offset modifier is past the end of the subject string):
|
offset modifier is past the end of the subject string):
|
||||||
|
|
||||||
re> /xyz/
|
re> /xyz/
|
||||||
|
@ -1039,7 +1074,7 @@ DEFAULT OUTPUT FROM pcre2test
|
||||||
Error -24 (bad offset value)
|
Error -24 (bad offset value)
|
||||||
|
|
||||||
Note that whereas patterns can be continued over several lines (a plain
|
Note that whereas patterns can be continued over several lines (a plain
|
||||||
">" prompt is used for continuations), subject lines may not. However,
|
">" prompt is used for continuations), subject lines may not. However,
|
||||||
newlines can be included in a subject by means of the \n escape (or \r,
|
newlines can be included in a subject by means of the \n escape (or \r,
|
||||||
\r\n, etc., depending on the newline sequence setting).
|
\r\n, etc., depending on the newline sequence setting).
|
||||||
|
|
||||||
|
@ -1047,7 +1082,7 @@ DEFAULT OUTPUT FROM pcre2test
|
||||||
OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
|
OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
|
||||||
|
|
||||||
When the alternative matching function, pcre2_dfa_match(), is used, the
|
When the alternative matching function, pcre2_dfa_match(), is used, the
|
||||||
output consists of a list of all the matches that start at the first
|
output consists of a list of all the matches that start at the first
|
||||||
point in the subject where there is at least one match. For example:
|
point in the subject where there is at least one match. For example:
|
||||||
|
|
||||||
re> /(tang|tangerine|tan)/
|
re> /(tang|tangerine|tan)/
|
||||||
|
@ -1056,11 +1091,11 @@ OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
|
||||||
1: tang
|
1: tang
|
||||||
2: tan
|
2: tan
|
||||||
|
|
||||||
Using the normal matching function on this data finds only "tang". The
|
Using the normal matching function on this data finds only "tang". The
|
||||||
longest matching string is always given first (and numbered zero).
|
longest matching string is always given first (and numbered zero).
|
||||||
After a PCRE2_ERROR_PARTIAL return, the output is "Partial match:",
|
After a PCRE2_ERROR_PARTIAL return, the output is "Partial match:",
|
||||||
followed by the partially matching substring. Note that this is the
|
followed by the partially matching substring. Note that this is the
|
||||||
entire substring that was inspected during the partial match; it may
|
entire substring that was inspected during the partial match; it may
|
||||||
include characters before the actual match start if a lookbehind asser-
|
include characters before the actual match start if a lookbehind asser-
|
||||||
tion, \b, or \B was involved. (\K is not supported for DFA matching.)
|
tion, \b, or \B was involved. (\K is not supported for DFA matching.)
|
||||||
|
|
||||||
|
@ -1076,16 +1111,16 @@ OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
|
||||||
1: tan
|
1: tan
|
||||||
0: tan
|
0: tan
|
||||||
|
|
||||||
The alternative matching function does not support substring capture,
|
The alternative matching function does not support substring capture,
|
||||||
so the modifiers that are concerned with captured substrings are not
|
so the modifiers that are concerned with captured substrings are not
|
||||||
relevant.
|
relevant.
|
||||||
|
|
||||||
|
|
||||||
RESTARTING AFTER A PARTIAL MATCH
|
RESTARTING AFTER A PARTIAL MATCH
|
||||||
|
|
||||||
When the alternative matching function has given the PCRE2_ERROR_PAR-
|
When the alternative matching function has given the PCRE2_ERROR_PAR-
|
||||||
TIAL return, indicating that the subject partially matched the pattern,
|
TIAL return, indicating that the subject partially matched the pattern,
|
||||||
you can restart the match with additional subject data by means of the
|
you can restart the match with additional subject data by means of the
|
||||||
dfa_restart modifier. For example:
|
dfa_restart modifier. For example:
|
||||||
|
|
||||||
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
|
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
|
||||||
|
@ -1094,29 +1129,29 @@ RESTARTING AFTER A PARTIAL MATCH
|
||||||
data> n05\=dfa,dfa_restart
|
data> n05\=dfa,dfa_restart
|
||||||
0: n05
|
0: n05
|
||||||
|
|
||||||
For further information about partial matching, see the pcre2partial
|
For further information about partial matching, see the pcre2partial
|
||||||
documentation.
|
documentation.
|
||||||
|
|
||||||
|
|
||||||
CALLOUTS
|
CALLOUTS
|
||||||
|
|
||||||
If the pattern contains any callout requests, pcre2test's callout func-
|
If the pattern contains any callout requests, pcre2test's callout func-
|
||||||
tion is called during matching. This works with both matching func-
|
tion is called during matching. This works with both matching func-
|
||||||
tions. By default, the called function displays the callout number, the
|
tions. By default, the called function displays the callout number, the
|
||||||
start and current positions in the text at the callout time, and the
|
start and current positions in the text at the callout time, and the
|
||||||
next pattern item to be tested. For example:
|
next pattern item to be tested. For example:
|
||||||
|
|
||||||
--->pqrabcdef
|
--->pqrabcdef
|
||||||
0 ^ ^ \d
|
0 ^ ^ \d
|
||||||
|
|
||||||
This output indicates that callout number 0 occurred for a match
|
This output indicates that callout number 0 occurred for a match
|
||||||
attempt starting at the fourth character of the subject string, when
|
attempt starting at the fourth character of the subject string, when
|
||||||
the pointer was at the seventh character, and when the next pattern
|
the pointer was at the seventh character, and when the next pattern
|
||||||
item was \d. Just one circumflex is output if the start and current
|
item was \d. Just one circumflex is output if the start and current
|
||||||
positions are the same.
|
positions are the same.
|
||||||
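The placement of the circumflexes can be illustrated with a small C helper. This is only a sketch: callout_markers is a hypothetical name, the offsets are zero-based, and the leading columns that pcre2test prints for the callout number and the trailing pattern item are left out.

```c
#include <stddef.h>

/* Fill `out` with spaces and put a circumflex at the zero-based offsets
   `start` and `current` (start <= current). When the two are equal, only
   one circumflex appears. Returns 0 on success, -1 if the arguments do
   not fit in the buffer. */
int callout_markers(size_t start, size_t current, char *out, size_t outsize)
{
    if (outsize < 2 || start > current || current > outsize - 2)
        return -1;
    for (size_t i = 0; i < current; i++)
        out[i] = ' ';
    out[start] = '^';
    out[current] = '^';
    out[current + 1] = '\0';
    return 0;
}
```

For the example above, a start at the fourth character and a current position at the seventh correspond to offsets 3 and 6.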
|
|
||||||
Callouts numbered 255 are assumed to be automatic callouts, inserted as
|
Callouts numbered 255 are assumed to be automatic callouts, inserted as
|
||||||
a result of the /auto_callout pattern modifier. In this case, instead
|
a result of the /auto_callout pattern modifier. In this case, instead
|
||||||
of showing the callout number, the offset in the pattern, preceded by a
|
of showing the callout number, the offset in the pattern, preceded by a
|
||||||
plus, is output. For example:
|
plus, is output. For example:
|
||||||
|
|
||||||
|
@ -1130,7 +1165,7 @@ CALLOUTS
|
||||||
0: E*
|
0: E*
|
||||||
|
|
||||||
If a pattern contains (*MARK) items, an additional line is output when-
|
If a pattern contains (*MARK) items, an additional line is output when-
|
||||||
ever a change of latest mark is passed to the callout function. For
|
ever a change of latest mark is passed to the callout function. For
|
||||||
example:
|
example:
|
||||||
|
|
||||||
re> /a(*MARK:X)bc/auto_callout
|
re> /a(*MARK:X)bc/auto_callout
|
||||||
|
@ -1144,37 +1179,96 @@ CALLOUTS
|
||||||
+12 ^ ^
|
+12 ^ ^
|
||||||
0: abc
|
0: abc
|
||||||
|
|
||||||
The mark changes between matching "a" and "b", but stays the same for
|
The mark changes between matching "a" and "b", but stays the same for
|
||||||
the rest of the match, so nothing more is output. If, as a result of
|
the rest of the match, so nothing more is output. If, as a result of
|
||||||
backtracking, the mark reverts to being unset, the text "<unset>" is
|
backtracking, the mark reverts to being unset, the text "<unset>" is
|
||||||
output.
|
output.
|
||||||
|
|
||||||
The callout function in pcre2test returns zero (carry on matching) by
|
The callout function in pcre2test returns zero (carry on matching) by
|
||||||
default, but you can use a callout_fail modifier in a subject line (as
|
default, but you can use a callout_fail modifier in a subject line (as
|
||||||
described above) to change this and other parameters of the callout.
|
described above) to change this and other parameters of the callout.
|
||||||
|
|
||||||
Inserting callouts can be helpful when using pcre2test to check compli-
|
Inserting callouts can be helpful when using pcre2test to check compli-
|
||||||
cated regular expressions. For further information about callouts, see
|
cated regular expressions. For further information about callouts, see
|
||||||
the pcre2callout documentation.
|
the pcre2callout documentation.
|
||||||
|
|
||||||
|
|
||||||
NON-PRINTING CHARACTERS
|
NON-PRINTING CHARACTERS
|
||||||
|
|
||||||
When pcre2test is outputting text in the compiled version of a pattern,
|
When pcre2test is outputting text in the compiled version of a pattern,
|
||||||
bytes other than 32-126 are always treated as non-printing characters
|
bytes other than 32-126 are always treated as non-printing characters
|
||||||
and are therefore shown as hex escapes.
|
and are therefore shown as hex escapes.
|
||||||
|
|
||||||
When pcre2test is outputting text that is a matched part of a subject
|
When pcre2test is outputting text that is a matched part of a subject
|
||||||
string, it behaves in the same way, unless a different locale has been
|
string, it behaves in the same way, unless a different locale has been
|
||||||
set for the pattern (using the /locale modifier). In this case, the
|
set for the pattern (using the /locale modifier). In this case, the
|
||||||
isprint() function is used to distinguish printing and non-printing
|
isprint() function is used to distinguish printing and non-printing
|
||||||
characters.
|
characters.
|
||||||
|
|
||||||
|
|
||||||
|
SAVING AND RESTORING COMPILED PATTERNS
|
||||||
|
|
||||||
|
It is possible to save compiled patterns on disc or elsewhere, and
|
||||||
|
reload them later, subject to a number of restrictions. JIT data cannot
|
||||||
|
be saved. The host on which the patterns are reloaded must be running
|
||||||
|
the same version of PCRE2, with the same code unit width, and must also
|
||||||
|
have the same endianness, pointer width and PCRE2_SIZE type. Before
|
||||||
|
compiled patterns can be saved they must be serialized, that is, con-
|
||||||
|
verted to a stream of bytes. A single byte stream may contain any num-
|
||||||
|
ber of compiled patterns, but they must all use the same character
|
||||||
|
tables. A single copy of the tables is included in the byte stream (its
|
||||||
|
size is 1088 bytes).
|
||||||
|
|
||||||
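The compatibility restrictions above can be pictured as a small fixed header that is compared field by field on reload. The following is only a sketch under stated assumptions: `stream_header`, `make_header`, `check_header`, the `SKETCH_*` macros and the `HDR_*` codes are hypothetical names, and `size_t` stands in for the PCRE2_SIZE type. It is not the library's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only. */
#define SKETCH_MAGIC 0x50523253u
#define SKETCH_VERSION(major, minor) \
  ((uint32_t)(major) | ((uint32_t)(minor) << 16))
/* Pack code unit width, pointer width and size-type width into one word. */
#define SKETCH_CONFIG(cu_bytes) \
  ((uint32_t)(cu_bytes) | ((uint32_t)sizeof(void *) << 8) | \
   ((uint32_t)sizeof(size_t) << 16))

enum { HDR_OK = 0, HDR_BADMAGIC = -1, HDR_BADMODE = -2 };

typedef struct stream_header {
  uint32_t magic;
  uint32_t version;
  uint32_t config;
  int32_t number_of_codes;
} stream_header;

/* Build the header that would precede the serialized patterns. */
stream_header make_header(uint32_t major, uint32_t minor, uint32_t cu_bytes,
  int32_t number_of_codes)
{
  stream_header h;
  h.magic = SKETCH_MAGIC;
  h.version = SKETCH_VERSION(major, minor);
  h.config = SKETCH_CONFIG(cu_bytes);
  h.number_of_codes = number_of_codes;
  return h;
}

/* Accept the header only if it was written by a matching build. */
int check_header(const stream_header *h, uint32_t major, uint32_t minor,
  uint32_t cu_bytes)
{
  assert(h != NULL);
  if (h->magic != SKETCH_MAGIC) return HDR_BADMAGIC;
  if (h->version != SKETCH_VERSION(major, minor)) return HDR_BADMODE;
  if (h->config != SKETCH_CONFIG(cu_bytes)) return HDR_BADMODE;
  return HDR_OK;
}
```

Because the magic number is read back as a 32-bit integer, a stream written on a host of the opposite endianness would fail the first comparison in this sketch.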
|
The functions whose names begin with pcre2_serialize_ are used for
|
||||||
|
serializing and de-serializing. They are described in the pcre2serial-
|
||||||
|
ize documentation. In this section we describe the features of
|
||||||
|
pcre2test that can be used to test these functions.
|
||||||
|
|
||||||
|
When a pattern with the push modifier is successfully compiled, it is
|
||||||
|
pushed onto a stack of compiled patterns, and pcre2test expects the
|
||||||
|
next line to contain a new pattern (or command) instead of a subject
|
||||||
|
line. By this means, a number of patterns can be compiled and retained.
|
||||||
|
The push modifier is incompatible with posix, and control modifiers
|
||||||
|
that act at match time are ignored (with a message). The jitverify mod-
|
||||||
|
ifier applies only at compile time. The command
|
||||||
|
|
||||||
|
#save <filename>
|
||||||
|
|
||||||
|
causes all the stacked patterns to be serialized and the result written
|
||||||
|
to the named file. Afterwards, all the stacked patterns are freed. The
|
||||||
|
command
|
||||||
|
|
||||||
|
#load <filename>
|
||||||
|
|
||||||
|
reads the data in the file, and then arranges for it to be de-serial-
|
||||||
|
ized, with the resulting compiled patterns added to the pattern stack.
|
||||||
|
The pattern on the top of the stack can be retrieved by the #pop com-
|
||||||
|
mand, which must be followed by lines of subjects that are to be
|
||||||
|
matched with the pattern, terminated as usual by an empty line or end
|
||||||
|
of file. This command may be followed by a modifier list containing
|
||||||
|
only control modifiers that act after a pattern has been compiled. In
|
||||||
|
particular, hex, posix, and push are not allowed, nor are any option-
|
||||||
|
setting modifiers. The JIT modifiers are, however, permitted. Here is
|
||||||
|
an example that saves and reloads two patterns.
|
||||||
|
|
||||||
|
/abc/push
|
||||||
|
/xyz/push
|
||||||
|
#save tempfile
|
||||||
|
#load tempfile
|
||||||
|
#pop info
|
||||||
|
xyz
|
||||||
|
|
||||||
|
#pop jit,bincode
|
||||||
|
abc
|
||||||
|
|
||||||
|
If jitverify is used with #pop, it does not automatically imply jit,
|
||||||
|
which is different behaviour from when it is used on a pattern.
|
||||||
|
|
||||||
|
|
||||||
SEE ALSO
|
SEE ALSO
|
||||||
|
|
||||||
pcre2(3), pcre2api(3), pcre2callout(3), pcre2jit, pcre2matching(3),
|
pcre2(3), pcre2api(3), pcre2callout(3), pcre2jit, pcre2matching(3),
|
||||||
pcre2partial(3), pcre2pattern(3).
|
pcre2partial(3), pcre2pattern(3), pcre2serialize(3).
|
||||||
|
|
||||||
|
|
||||||
AUTHOR
|
AUTHOR
|
||||||
|
@ -1186,5 +1280,5 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 02 January 2015
|
Last updated: 23 January 2015
|
||||||
Copyright (c) 1997-2015 University of Cambridge.
|
Copyright (c) 1997-2015 University of Cambridge.
|
||||||
|
|
|
@ -293,7 +293,7 @@ if [ $usevalgrind -ne 0 ]; then
|
||||||
|
|
||||||
for opts in \
|
for opts in \
|
||||||
"--disable-stack-for-recursion --disable-shared" \
|
"--disable-stack-for-recursion --disable-shared" \
|
||||||
"--with-link-size=3 --disable-shared" \
|
"--with-link-size=3 --enable-pcre2-16 --enable-pcre2-32 --disable-shared" \
|
||||||
"--disable-unicode --disable-shared"
|
"--disable-unicode --disable-shared"
|
||||||
do
|
do
|
||||||
opts="--enable-valgrind $opts"
|
opts="--enable-valgrind $opts"
|
||||||
|
|
|
@ -200,7 +200,7 @@ sure both macros are undefined; an emulation function will then be used. */
|
||||||
#define PACKAGE_NAME "PCRE2"
|
#define PACKAGE_NAME "PCRE2"
|
||||||
|
|
||||||
/* Define to the full name and version of this package. */
|
/* Define to the full name and version of this package. */
|
||||||
#define PACKAGE_STRING "PCRE2 10.00"
|
#define PACKAGE_STRING "PCRE2 10.10-RC1"
|
||||||
|
|
||||||
/* Define to the one symbol short name of this package. */
|
/* Define to the one symbol short name of this package. */
|
||||||
#define PACKAGE_TARNAME "pcre2"
|
#define PACKAGE_TARNAME "pcre2"
|
||||||
|
@ -209,7 +209,7 @@ sure both macros are undefined; an emulation function will then be used. */
|
||||||
#define PACKAGE_URL ""
|
#define PACKAGE_URL ""
|
||||||
|
|
||||||
/* Define to the version of this package. */
|
/* Define to the version of this package. */
|
||||||
#define PACKAGE_VERSION "10.00"
|
#define PACKAGE_VERSION "10.10-RC1"
|
||||||
|
|
||||||
/* The value of PARENS_NEST_LIMIT specifies the maximum depth of nested
|
/* The value of PARENS_NEST_LIMIT specifies the maximum depth of nested
|
||||||
parentheses (of any kind) in a pattern. This limits the amount of system
|
parentheses (of any kind) in a pattern. This limits the amount of system
|
||||||
|
@ -287,7 +287,7 @@ sure both macros are undefined; an emulation function will then be used. */
|
||||||
/* #undef SUPPORT_VALGRIND */
|
/* #undef SUPPORT_VALGRIND */
|
||||||
|
|
||||||
/* Version number of package */
|
/* Version number of package */
|
||||||
#define VERSION "10.00"
|
#define VERSION "10.10-RC1"
|
||||||
|
|
||||||
/* Define to empty if `const' does not conform to ANSI C. */
|
/* Define to empty if `const' does not conform to ANSI C. */
|
||||||
/* #undef const */
|
/* #undef const */
|
||||||
|
|
|
@ -42,9 +42,9 @@ POSSIBILITY OF SUCH DAMAGE.
|
||||||
/* The current PCRE version information. */
|
/* The current PCRE version information. */
|
||||||
|
|
||||||
#define PCRE2_MAJOR 10
|
#define PCRE2_MAJOR 10
|
||||||
#define PCRE2_MINOR 00
|
#define PCRE2_MINOR 10
|
||||||
#define PCRE2_PRERELEASE
|
#define PCRE2_PRERELEASE -RC1
|
||||||
#define PCRE2_DATE 2014-01-05
|
#define PCRE2_DATE 2014-01-13
|
||||||
|
|
||||||
/* When an application links to a PCRE DLL in Windows, the symbols that are
|
/* When an application links to a PCRE DLL in Windows, the symbols that are
|
||||||
imported have to be identified as such. When building PCRE2, the appropriate
|
imported have to be identified as such. When building PCRE2, the appropriate
|
||||||
|
@ -455,6 +455,18 @@ PCRE2_EXP_DECL void pcre2_substring_list_free(PCRE2_SPTR *); \
|
||||||
PCRE2_EXP_DECL int pcre2_substring_list_get(pcre2_match_data *, \
|
PCRE2_EXP_DECL int pcre2_substring_list_get(pcre2_match_data *, \
|
||||||
PCRE2_UCHAR ***, PCRE2_SIZE **);
|
PCRE2_UCHAR ***, PCRE2_SIZE **);
|
||||||
|
|
||||||
|
/* Functions for serializing / deserializing compiled patterns. */
|
||||||
|
|
||||||
|
#define PCRE2_SERIALIZE_FUNCTIONS \
|
||||||
|
PCRE2_EXP_DECL int pcre2_serialize_encode(const pcre2_code **, \
|
||||||
|
PCRE2_SIZE, uint8_t **, PCRE2_SIZE *, \
|
||||||
|
pcre2_general_context *); \
|
||||||
|
PCRE2_EXP_DECL int pcre2_serialize_decode(pcre2_code **, PCRE2_SIZE, \
|
||||||
|
const uint8_t *, pcre2_general_context *); \
|
||||||
|
PCRE2_EXP_DECL int pcre2_serialize_get_number_of_codes(const uint8_t *, \
|
||||||
|
PCRE2_SIZE *); \
|
||||||
|
PCRE2_EXP_DECL void pcre2_serialize_free(uint8_t *);
|
||||||
|
|
||||||
|
|
||||||
/* Convenience function for match + substitute. */
|
/* Convenience function for match + substitute. */
|
||||||
|
|
||||||
|
@ -560,6 +572,10 @@ pcre2_compile are called by application code. */
|
||||||
#define pcre2_match_data_create_from_pattern PCRE2_SUFFIX(pcre2_match_data_create_from_pattern_)
|
#define pcre2_match_data_create_from_pattern PCRE2_SUFFIX(pcre2_match_data_create_from_pattern_)
|
||||||
#define pcre2_match_data_free PCRE2_SUFFIX(pcre2_match_data_free_)
|
#define pcre2_match_data_free PCRE2_SUFFIX(pcre2_match_data_free_)
|
||||||
#define pcre2_pattern_info PCRE2_SUFFIX(pcre2_pattern_info_)
|
#define pcre2_pattern_info PCRE2_SUFFIX(pcre2_pattern_info_)
|
||||||
|
#define pcre2_serialize_decode PCRE2_SUFFIX(pcre2_serialize_decode_)
|
||||||
|
#define pcre2_serialize_encode PCRE2_SUFFIX(pcre2_serialize_encode_)
|
||||||
|
#define pcre2_serialize_free PCRE2_SUFFIX(pcre2_serialize_free_)
|
||||||
|
#define pcre2_serialize_get_number_of_codes PCRE2_SUFFIX(pcre2_serialize_get_number_of_codes_)
|
||||||
#define pcre2_set_bsr PCRE2_SUFFIX(pcre2_set_bsr_)
|
#define pcre2_set_bsr PCRE2_SUFFIX(pcre2_set_bsr_)
|
||||||
#define pcre2_set_callout PCRE2_SUFFIX(pcre2_set_callout_)
|
#define pcre2_set_callout PCRE2_SUFFIX(pcre2_set_callout_)
|
||||||
#define pcre2_set_character_tables PCRE2_SUFFIX(pcre2_set_character_tables_)
|
#define pcre2_set_character_tables PCRE2_SUFFIX(pcre2_set_character_tables_)
|
||||||
|
@ -596,8 +612,9 @@ PCRE2_MATCH_CONTEXT_FUNCTIONS \
|
||||||
PCRE2_COMPILE_FUNCTIONS \
|
PCRE2_COMPILE_FUNCTIONS \
|
||||||
PCRE2_PATTERN_INFO_FUNCTIONS \
|
PCRE2_PATTERN_INFO_FUNCTIONS \
|
||||||
PCRE2_MATCH_FUNCTIONS \
|
PCRE2_MATCH_FUNCTIONS \
|
||||||
PCRE2_SUBSTITUTE_FUNCTION \
|
|
||||||
PCRE2_SUBSTRING_FUNCTIONS \
|
PCRE2_SUBSTRING_FUNCTIONS \
|
||||||
|
PCRE2_SERIALIZE_FUNCTIONS \
|
||||||
|
PCRE2_SUBSTITUTE_FUNCTION \
|
||||||
PCRE2_JIT_FUNCTIONS \
|
PCRE2_JIT_FUNCTIONS \
|
||||||
PCRE2_OTHER_FUNCTIONS
|
PCRE2_OTHER_FUNCTIONS
|
||||||
|
|
||||||
|
@ -625,6 +642,8 @@ PCRE2_TYPES_STRUCTURES_AND_FUNCTIONS
|
||||||
#undef PCRE2_PATTERN_INFO_FUNCTIONS
|
#undef PCRE2_PATTERN_INFO_FUNCTIONS
|
||||||
#undef PCRE2_MATCH_FUNCTIONS
|
#undef PCRE2_MATCH_FUNCTIONS
|
||||||
#undef PCRE2_SUBSTRING_FUNCTIONS
|
#undef PCRE2_SUBSTRING_FUNCTIONS
|
||||||
|
#undef PCRE2_SERIALIZE_FUNCTIONS
|
||||||
|
#undef PCRE2_SUBSTITUTE_FUNCTION
|
||||||
#undef PCRE2_JIT_FUNCTIONS
|
#undef PCRE2_JIT_FUNCTIONS
|
||||||
#undef PCRE2_OTHER_FUNCTIONS
|
#undef PCRE2_OTHER_FUNCTIONS
|
||||||
#undef PCRE2_TYPES_STRUCTURES_AND_FUNCTIONS
|
#undef PCRE2_TYPES_STRUCTURES_AND_FUNCTIONS
|
||||||
|
|
|
@ -198,11 +198,13 @@ greater than zero. */
|
||||||
#define PCRE2_ERROR_UTF32_ERR1 (-27)
|
#define PCRE2_ERROR_UTF32_ERR1 (-27)
|
||||||
#define PCRE2_ERROR_UTF32_ERR2 (-28)
|
#define PCRE2_ERROR_UTF32_ERR2 (-28)
|
||||||
|
|
||||||
/* Error codes for pcre2[_dfa]_match(), substring extraction functions, and
|
/* Error codes for pcre2[_dfa]_match(), substring extraction functions, context
|
||||||
context functions. */
|
functions, and serializing functions. They are in numerical order. Originally
|
||||||
|
they were in alphabetical order too, but now that PCRE2 is released, the
|
||||||
|
numbers must not be changed. */
|
||||||
|
|
||||||
#define PCRE2_ERROR_BADDATA (-29)
|
#define PCRE2_ERROR_BADDATA (-29)
|
||||||
#define PCRE2_ERROR_BADLENGTH (-30)
|
#define PCRE2_ERROR_MIXEDTABLES (-30) /* Name was changed */
|
||||||
#define PCRE2_ERROR_BADMAGIC (-31)
|
#define PCRE2_ERROR_BADMAGIC (-31)
|
||||||
#define PCRE2_ERROR_BADMODE (-32)
|
#define PCRE2_ERROR_BADMODE (-32)
|
||||||
#define PCRE2_ERROR_BADOFFSET (-33)
|
#define PCRE2_ERROR_BADOFFSET (-33)
|
||||||
|
@ -455,6 +457,17 @@ PCRE2_EXP_DECL void pcre2_substring_list_free(PCRE2_SPTR *); \
|
||||||
PCRE2_EXP_DECL int pcre2_substring_list_get(pcre2_match_data *, \
|
PCRE2_EXP_DECL int pcre2_substring_list_get(pcre2_match_data *, \
|
||||||
PCRE2_UCHAR ***, PCRE2_SIZE **);
|
PCRE2_UCHAR ***, PCRE2_SIZE **);
|
||||||
|
|
||||||
|
/* Functions for serializing / deserializing compiled patterns. */
|
||||||
|
|
||||||
|
#define PCRE2_SERIALIZE_FUNCTIONS \
|
||||||
|
PCRE2_EXP_DECL int32_t pcre2_serialize_encode(const pcre2_code **, \
|
||||||
|
int32_t, uint8_t **, PCRE2_SIZE *, \
|
||||||
|
pcre2_general_context *); \
|
||||||
|
PCRE2_EXP_DECL int32_t pcre2_serialize_decode(pcre2_code **, int32_t, \
|
||||||
|
const uint8_t *, pcre2_general_context *); \
|
||||||
|
PCRE2_EXP_DECL int32_t pcre2_serialize_get_number_of_codes(const uint8_t *); \
|
||||||
|
PCRE2_EXP_DECL void pcre2_serialize_free(uint8_t *);
|
||||||
|
|
||||||
|
|
||||||
/* Convenience function for match + substitute. */
|
/* Convenience function for match + substitute. */
|
||||||
|
|
||||||
|
@ -560,6 +573,10 @@ pcre2_compile are called by application code. */
|
||||||
#define pcre2_match_data_create_from_pattern PCRE2_SUFFIX(pcre2_match_data_create_from_pattern_)
|
#define pcre2_match_data_create_from_pattern PCRE2_SUFFIX(pcre2_match_data_create_from_pattern_)
|
||||||
#define pcre2_match_data_free PCRE2_SUFFIX(pcre2_match_data_free_)
|
#define pcre2_match_data_free PCRE2_SUFFIX(pcre2_match_data_free_)
|
||||||
#define pcre2_pattern_info PCRE2_SUFFIX(pcre2_pattern_info_)
|
#define pcre2_pattern_info PCRE2_SUFFIX(pcre2_pattern_info_)
|
||||||
|
#define pcre2_serialize_decode PCRE2_SUFFIX(pcre2_serialize_decode_)
|
||||||
|
#define pcre2_serialize_encode PCRE2_SUFFIX(pcre2_serialize_encode_)
|
||||||
|
#define pcre2_serialize_free PCRE2_SUFFIX(pcre2_serialize_free_)
|
||||||
|
#define pcre2_serialize_get_number_of_codes PCRE2_SUFFIX(pcre2_serialize_get_number_of_codes_)
|
||||||
#define pcre2_set_bsr PCRE2_SUFFIX(pcre2_set_bsr_)
|
#define pcre2_set_bsr PCRE2_SUFFIX(pcre2_set_bsr_)
|
||||||
#define pcre2_set_callout PCRE2_SUFFIX(pcre2_set_callout_)
|
#define pcre2_set_callout PCRE2_SUFFIX(pcre2_set_callout_)
|
||||||
#define pcre2_set_character_tables PCRE2_SUFFIX(pcre2_set_character_tables_)
|
#define pcre2_set_character_tables PCRE2_SUFFIX(pcre2_set_character_tables_)
|
||||||
|
@ -596,8 +613,9 @@ PCRE2_MATCH_CONTEXT_FUNCTIONS \
|
||||||
PCRE2_COMPILE_FUNCTIONS \
|
PCRE2_COMPILE_FUNCTIONS \
|
||||||
PCRE2_PATTERN_INFO_FUNCTIONS \
|
PCRE2_PATTERN_INFO_FUNCTIONS \
|
||||||
PCRE2_MATCH_FUNCTIONS \
|
PCRE2_MATCH_FUNCTIONS \
|
||||||
PCRE2_SUBSTITUTE_FUNCTION \
|
|
||||||
PCRE2_SUBSTRING_FUNCTIONS \
|
PCRE2_SUBSTRING_FUNCTIONS \
|
||||||
|
PCRE2_SERIALIZE_FUNCTIONS \
|
||||||
|
PCRE2_SUBSTITUTE_FUNCTION \
|
||||||
PCRE2_JIT_FUNCTIONS \
|
PCRE2_JIT_FUNCTIONS \
|
||||||
PCRE2_OTHER_FUNCTIONS
|
PCRE2_OTHER_FUNCTIONS
|
||||||
|
|
||||||
|
@ -625,6 +643,8 @@ PCRE2_TYPES_STRUCTURES_AND_FUNCTIONS
|
||||||
#undef PCRE2_PATTERN_INFO_FUNCTIONS
|
#undef PCRE2_PATTERN_INFO_FUNCTIONS
|
||||||
#undef PCRE2_MATCH_FUNCTIONS
|
#undef PCRE2_MATCH_FUNCTIONS
|
||||||
#undef PCRE2_SUBSTRING_FUNCTIONS
|
#undef PCRE2_SUBSTRING_FUNCTIONS
|
||||||
|
#undef PCRE2_SERIALIZE_FUNCTIONS
|
||||||
|
#undef PCRE2_SUBSTITUTE_FUNCTION
|
||||||
#undef PCRE2_JIT_FUNCTIONS
|
#undef PCRE2_JIT_FUNCTIONS
|
||||||
#undef PCRE2_OTHER_FUNCTIONS
|
#undef PCRE2_OTHER_FUNCTIONS
|
||||||
#undef PCRE2_TYPES_STRUCTURES_AND_FUNCTIONS
|
#undef PCRE2_TYPES_STRUCTURES_AND_FUNCTIONS
|
||||||
|
|
|
@ -683,10 +683,28 @@ static const uint8_t opcode_possessify[] = {
|
||||||
PCRE2_EXP_DEFN void PCRE2_CALL_CONVENTION
|
PCRE2_EXP_DEFN void PCRE2_CALL_CONVENTION
|
||||||
pcre2_code_free(pcre2_code *code)
|
pcre2_code_free(pcre2_code *code)
|
||||||
{
|
{
|
||||||
|
PCRE2_SIZE* ref_count;
|
||||||
|
|
||||||
if (code != NULL)
|
if (code != NULL)
|
||||||
{
|
{
|
||||||
if (code->executable_jit != NULL)
|
if (code->executable_jit != NULL)
|
||||||
PRIV(jit_free)(code->executable_jit, &code->memctl);
|
PRIV(jit_free)(code->executable_jit, &code->memctl);
|
||||||
|
|
||||||
|
if ((code->flags & PCRE2_DEREF_TABLES) != 0)
|
||||||
|
{
|
||||||
|
/* Decoded tables belong to the codes after deserialization, and they must
|
||||||
|
be freed when there are no more references to them. The *ref_count should
|
||||||
|
always be > 0. */
|
||||||
|
|
||||||
|
ref_count = (PCRE2_SIZE *)(code->tables + tables_length);
|
||||||
|
if (*ref_count > 0)
|
||||||
|
{
|
||||||
|
(*ref_count)--;
|
||||||
|
if (*ref_count == 0)
|
||||||
|
code->memctl.free((void *)code->tables, code->memctl.memory_data);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
code->memctl.free(code, code->memctl.memory_data);
|
code->memctl.free(code, code->memctl.memory_data);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
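The hunk above relies on a reference count stored directly after the shared character tables. A minimal stand-alone sketch of that layout follows. The names `shared_tables_new`, `shared_tables_refs` and `shared_tables_release` are hypothetical, plain `malloc`/`free` replace the memory controller, and `memcpy` is used to read and write the counter instead of the pointer cast in the diff, so the sketch makes no alignment assumption. The length 1088 is the table size quoted in the pcre2test documentation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_TABLES_LENGTH 1088  /* size quoted in the documentation */

/* Copy the tables and append a reference count for `users` patterns. */
unsigned char *shared_tables_new(const unsigned char *src, size_t users)
{
  unsigned char *t = malloc(SKETCH_TABLES_LENGTH + sizeof(size_t));
  if (t == NULL) return NULL;
  memcpy(t, src, SKETCH_TABLES_LENGTH);
  memcpy(t + SKETCH_TABLES_LENGTH, &users, sizeof users);
  return t;
}

/* Read the current reference count. */
size_t shared_tables_refs(const unsigned char *t)
{
  size_t n;
  assert(t != NULL);
  memcpy(&n, t + SKETCH_TABLES_LENGTH, sizeof n);
  return n;
}

/* Drop one reference; returns 1 if the tables were freed, 0 otherwise. */
int shared_tables_release(unsigned char *t)
{
  size_t n = shared_tables_refs(t);
  if (n == 0) return 0;          /* defensive, as in the diff */
  n--;
  if (n == 0)
    {
    free(t);
    return 1;
    }
  memcpy(t + SKETCH_TABLES_LENGTH, &n, sizeof n);
  return 0;
}
```

Each deserialized pattern would call the release function once when it is freed, so the tables live exactly as long as the last pattern that points at them.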
|
@ -7317,8 +7335,14 @@ for (i = 0; i < cb->names_found; i++)
|
||||||
|
|
||||||
PUT2(slot, 0, groupno);
|
PUT2(slot, 0, groupno);
|
||||||
memcpy(slot + IMM2_SIZE, name, CU2BYTES(length));
|
memcpy(slot + IMM2_SIZE, name, CU2BYTES(length));
|
||||||
slot[IMM2_SIZE + length] = 0;
|
|
||||||
cb->names_found++;
|
cb->names_found++;
|
||||||
|
|
||||||
|
/* Add a terminating zero and fill the rest of the slot with zeroes so that
|
||||||
|
the memory is all initialized. Otherwise valgrind moans about uninitialized
|
||||||
|
memory when saving serialized compiled patterns. */
|
||||||
|
|
||||||
|
memset(slot + IMM2_SIZE + length, 0,
|
||||||
|
CU2BYTES(cb->name_entry_size - length - IMM2_SIZE));
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
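The effect of the `memset` in this hunk can be seen in isolation with 8-bit code units. The sketch below is an assumption-laden illustration: `fill_name_slot` and `SKETCH_IMM2_SIZE` are hypothetical names, and the two-byte, high-byte-first group number is assumed for the 8-bit case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_IMM2_SIZE 2   /* assumed: two code units hold the group number */

/* Write one name-table entry and zero every remaining code unit in the
slot, so that no byte of the slot is left uninitialized. */
void fill_name_slot(uint8_t *slot, size_t entry_size, unsigned groupno,
  const char *name, size_t length)
{
  assert(SKETCH_IMM2_SIZE + length < entry_size);  /* room for terminator */
  slot[0] = (uint8_t)(groupno >> 8);
  slot[1] = (uint8_t)(groupno & 0xff);
  memcpy(slot + SKETCH_IMM2_SIZE, name, length);
  memset(slot + SKETCH_IMM2_SIZE + length, 0,
    entry_size - length - SKETCH_IMM2_SIZE);
}
```

With a single terminating zero, any slot longer than its name keeps whatever bytes the allocator returned, and those bytes would then be copied into the serialized stream.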
|
@ -7356,6 +7380,7 @@ PCRE2_SPTR codestart; /* Start of compiled code */
|
||||||
PCRE2_SPTR ptr; /* Current pointer in pattern */
|
PCRE2_SPTR ptr; /* Current pointer in pattern */
|
||||||
|
|
||||||
size_t length = 1; /* Allow for final END opcode */
|
size_t length = 1; /* Allow for final END opcode */
|
||||||
|
size_t usedlength; /* Actual length used */
|
||||||
size_t re_blocksize; /* Size of memory block */
|
size_t re_blocksize; /* Size of memory block */
|
||||||
|
|
||||||
int32_t firstcuflags, reqcuflags; /* Type of first/req code unit */
|
int32_t firstcuflags, reqcuflags; /* Type of first/req code unit */
|
||||||
|
@ -7754,13 +7779,16 @@ overflow. */
|
||||||
|
|
||||||
if (errorcode == 0 && ptr < cb.end_pattern) errorcode = ERR22;
|
if (errorcode == 0 && ptr < cb.end_pattern) errorcode = ERR22;
|
||||||
*code++ = OP_END;
|
*code++ = OP_END;
|
||||||
if ((size_t)(code - codestart) > length) errorcode = ERR23;
|
usedlength = code - codestart;
|
||||||
|
if (usedlength > length) errorcode = ERR23;
|
||||||
|
|
||||||
|
/* If the estimated length exceeds the really used length, adjust the value of
|
||||||
|
re->blocksize, and if valgrind support is configured, mark the extra allocated
|
||||||
|
memory as unaddressable, so that any out-of-bound reads can be detected. */
|
||||||
|
|
||||||
|
re->blocksize -= CU2BYTES(length - usedlength);
|
||||||
#ifdef SUPPORT_VALGRIND
|
#ifdef SUPPORT_VALGRIND
|
||||||
/* If the estimated length exceeds the really used length, mark the extra
|
VALGRIND_MAKE_MEM_NOACCESS(code, CU2BYTES(length - usedlength));
|
||||||
allocated memory as unaddressable, so that any out-of-bound reads can be
|
|
||||||
detected. */
|
|
||||||
VALGRIND_MAKE_MEM_NOACCESS(code, (length - (code - codestart)) * sizeof(PCRE2_UCHAR));
|
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
/* Fill in any forward references that are required. There may be repeated
|
/* Fill in any forward references that are required. There may be repeated
|
||||||
|
|
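The block size adjustment in this hunk is plain arithmetic: the over-estimate, counted in code units, is converted to bytes and subtracted. A small sketch, in which `trimmed_blocksize` and its parameter names are hypothetical and `cu_bytes` plays the role of the bytes-per-code-unit conversion:

```c
#include <assert.h>
#include <stddef.h>

/* Return the block size after removing the unused tail. `estimated` and
`used` are lengths in code units; `cu_bytes` is the code unit size. */
size_t trimmed_blocksize(size_t blocksize, size_t estimated, size_t used,
  size_t cu_bytes)
{
  assert(used <= estimated);
  assert((estimated - used) * cu_bytes <= blocksize);
  return blocksize - (estimated - used) * cu_bytes;
}
```

For example, a 200-byte block with an estimate of 50 code units, 42 of them used, and 2-byte code units shrinks by 16 bytes to 184.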
|
@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
|
||||||
|
|
||||||
Written by Philip Hazel
|
Written by Philip Hazel
|
||||||
Original API code Copyright (c) 1997-2012 University of Cambridge
|
Original API code Copyright (c) 1997-2012 University of Cambridge
|
||||||
New API code Copyright (c) 2014 University of Cambridge
|
New API code Copyright (c) 2015 University of Cambridge
|
||||||
|
|
||||||
-----------------------------------------------------------------------------
|
-----------------------------------------------------------------------------
|
||||||
Redistribution and use in source and binary forms, with or without
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
@ -200,7 +200,7 @@ static const char match_error_texts[] =
|
||||||
"UTF-32 error: code points greater than 0x10ffff are not defined\0"
|
"UTF-32 error: code points greater than 0x10ffff are not defined\0"
|
||||||
"bad data value\0"
|
"bad data value\0"
|
||||||
/* 30 */
|
/* 30 */
|
||||||
"bad length\0"
|
"patterns do not all use the same character tables\0"
|
||||||
"magic number missing\0"
|
"magic number missing\0"
|
||||||
"pattern compiled in wrong mode: 8/16/32-bit error\0"
|
"pattern compiled in wrong mode: 8/16/32-bit error\0"
|
||||||
"bad offset value\0"
|
"bad offset value\0"
|
||||||
|
|
|
@ -523,6 +523,7 @@ bytes in a code unit in that mode. */
|
||||||
#define PCRE2_NL_SET 0x00008000 /* newline was set in the pattern */
|
#define PCRE2_NL_SET 0x00008000 /* newline was set in the pattern */
|
||||||
#define PCRE2_NOTEMPTY_SET 0x00010000 /* (*NOTEMPTY) used ) keep */
|
#define PCRE2_NOTEMPTY_SET 0x00010000 /* (*NOTEMPTY) used ) keep */
|
||||||
#define PCRE2_NE_ATST_SET 0x00020000 /* (*NOTEMPTY_ATSTART) used) together */
|
#define PCRE2_NE_ATST_SET 0x00020000 /* (*NOTEMPTY_ATSTART) used) together */
|
||||||
|
#define PCRE2_DEREF_TABLES 0x00040000 /* Release character tables. */
|
||||||
|
|
||||||
#define PCRE2_MODE_MASK (PCRE2_MODE8 | PCRE2_MODE16 | PCRE2_MODE32)
|
#define PCRE2_MODE_MASK (PCRE2_MODE8 | PCRE2_MODE16 | PCRE2_MODE32)
|
||||||
|
|
||||||
|
@ -1763,6 +1764,15 @@ typedef struct {
|
||||||
#define UCD_CASESET(ch) GET_UCD(ch)->caseset
|
#define UCD_CASESET(ch) GET_UCD(ch)->caseset
|
||||||
#define UCD_OTHERCASE(ch) ((uint32_t)((int)ch + (int)(GET_UCD(ch)->other_case)))
|
#define UCD_OTHERCASE(ch) ((uint32_t)((int)ch + (int)(GET_UCD(ch)->other_case)))
|
||||||
|
|
||||||
|
/* Header for serialized pcre2 codes. */
|
||||||
|
|
||||||
|
typedef struct pcre2_serialized_data {
|
||||||
|
uint32_t magic;
|
||||||
|
uint32_t version;
|
||||||
|
uint32_t config;
|
||||||
|
int32_t number_of_codes;
|
||||||
|
} pcre2_serialized_data;
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
/* ----------------- Items that need PCRE2_CODE_UNIT_WIDTH ----------------- */
|
/* ----------------- Items that need PCRE2_CODE_UNIT_WIDTH ----------------- */
|
||||||
|
|
|
@ -0,0 +1,251 @@
|
||||||
|
/*************************************************
|
||||||
|
* Perl-Compatible Regular Expressions *
|
||||||
|
*************************************************/
|
||||||
|
|
||||||
|
/* PCRE is a library of functions to support regular expressions whose syntax
|
||||||
|
and semantics are as close as possible to those of the Perl 5 language.
|
||||||
|
|
||||||
|
Written by Philip Hazel
|
||||||
|
Original API code Copyright (c) 1997-2012 University of Cambridge
|
||||||
|
New API code Copyright (c) 2015 University of Cambridge
|
||||||
|
|
||||||
|
-----------------------------------------------------------------------------
|
||||||
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
modification, are permitted provided that the following conditions are met:
|
||||||
|
|
||||||
|
* Redistributions of source code must retain the above copyright notice,
|
||||||
|
this list of conditions and the following disclaimer.
|
||||||
|
|
||||||
|
* Redistributions in binary form must reproduce the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer in the
|
||||||
|
documentation and/or other materials provided with the distribution.
|
||||||
|
|
||||||
|
* Neither the name of the University of Cambridge nor the names of its
|
||||||
|
contributors may be used to endorse or promote products derived from
|
||||||
|
this software without specific prior written permission.
|
||||||
|
|
||||||
|
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
|
||||||
|
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
||||||
|
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
||||||
|
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
|
||||||
|
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
|
||||||
|
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
|
||||||
|
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
|
||||||
|
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
|
||||||
|
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
|
||||||
|
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
|
||||||
|
POSSIBILITY OF SUCH DAMAGE.
|
||||||
|
-----------------------------------------------------------------------------
|
||||||
|
*/
|
||||||
|
|
||||||
|
/* This module contains functions for serializing and deserializing
|
||||||
|
a sequence of compiled codes. */
|
||||||
|
|
||||||
|
|
||||||
|
#ifdef HAVE_CONFIG_H
|
||||||
|
#include "config.h"
|
||||||
|
#endif
|
||||||
|
|
||||||
|
|
||||||
|
#include "pcre2_internal.h"
|
||||||
|
|
||||||
|
/* Magic number to provide a small check against being handed junk. */
|
||||||
|
|
||||||
|
#define SERIALIZED_DATA_MAGIC 0x50523253u
|
||||||
|
|
||||||
|
/* Deserialization is limited to the current PCRE version and
|
||||||
|
character width. */
|
||||||
|
|
||||||
|
#define SERIALIZED_DATA_VERSION \
|
||||||
|
((PCRE2_MAJOR) | ((PCRE2_MINOR) << 16))
|
||||||
|
|
||||||
|
#define SERIALIZED_DATA_CONFIG \
|
||||||
|
(sizeof(PCRE2_UCHAR) | ((sizeof(void*)) << 8) | ((sizeof(PCRE2_SIZE)) << 16))
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
/*************************************************
|
||||||
|
* Serialize compiled patterns *
|
||||||
|
*************************************************/
|
||||||
|
|
||||||
|
PCRE2_EXP_DEFN int32_t PCRE2_CALL_CONVENTION
|
||||||
|
pcre2_serialize_encode(const pcre2_code **codes, int32_t number_of_codes,
|
||||||
|
uint8_t **serialized_bytes, PCRE2_SIZE *serialized_size,
|
||||||
|
pcre2_general_context *gcontext)
|
||||||
|
{
|
||||||
|
uint8_t *bytes;
|
||||||
|
uint8_t *dst_bytes;
|
||||||
|
int32_t i;
|
||||||
|
PCRE2_SIZE total_size;
|
||||||
|
const pcre2_real_code *re;
|
||||||
|
const uint8_t *tables;
|
||||||
|
pcre2_serialized_data *data;
|
||||||
|
|
||||||
|
const pcre2_memctl *memctl = (gcontext != NULL) ?
|
||||||
|
&gcontext->memctl : &PRIV(default_compile_context).memctl;
|
||||||
|
|
||||||
|
if (codes == NULL || serialized_bytes == NULL || serialized_size == NULL)
|
||||||
|
return PCRE2_ERROR_NULL;
|
||||||
|
|
||||||
|
if (number_of_codes <= 0) return PCRE2_ERROR_BADDATA;
|
||||||
|
|
||||||
|
/* Compute total size. */
|
||||||
|
total_size = sizeof(pcre2_serialized_data) + tables_length;
|
||||||
|
tables = NULL;
|
||||||
|
|
||||||
|
for (i = 0; i < number_of_codes; i++)
|
||||||
|
{
|
||||||
|
if (codes[i] == NULL) return PCRE2_ERROR_NULL;
|
||||||
|
re = (const pcre2_real_code *)(codes[i]);
|
||||||
|
if (re->magic_number != MAGIC_NUMBER) return PCRE2_ERROR_BADMAGIC;
|
||||||
|
if (tables == NULL)
|
||||||
|
tables = re->tables;
|
||||||
|
else if (tables != re->tables)
|
||||||
|
return PCRE2_ERROR_MIXEDTABLES;
|
||||||
|
total_size += re->blocksize;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Initialize the byte stream. */
|
||||||
|
bytes = memctl->malloc(total_size + sizeof(pcre2_memctl), memctl->memory_data);
|
||||||
|
if (bytes == NULL) return PCRE2_ERROR_NOMEMORY;
|
||||||
|
|
||||||
|
/* The controller is stored as a hidden parameter. */
|
||||||
|
memcpy(bytes, memctl, sizeof(pcre2_memctl));
|
||||||
|
bytes += sizeof(pcre2_memctl);
|
||||||
|
|
||||||
|
data = (pcre2_serialized_data *)bytes;
|
||||||
|
data->magic = SERIALIZED_DATA_MAGIC;
|
||||||
|
data->version = SERIALIZED_DATA_VERSION;
|
||||||
|
data->config = SERIALIZED_DATA_CONFIG;
|
||||||
|
data->number_of_codes = number_of_codes;
|
||||||
|
|
||||||
|
/* Copy all compiled code data. */
|
||||||
|
dst_bytes = bytes + sizeof(pcre2_serialized_data);
|
||||||
|
memcpy(dst_bytes, tables, tables_length);
|
||||||
|
dst_bytes += tables_length;
|
||||||
|
|
||||||
|
for (i = 0; i < number_of_codes; i++)
|
||||||
|
{
|
||||||
|
re = (const pcre2_real_code *)(codes[i]);
|
||||||
|
memcpy(dst_bytes, (char *)re, re->blocksize);
|
||||||
|
dst_bytes += re->blocksize;
|
||||||
|
}
|
||||||
|
|
||||||
|
*serialized_bytes = bytes;
|
||||||
|
*serialized_size = total_size;
|
||||||
|
return number_of_codes;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
/*************************************************
|
||||||
|
* Deserialize compiled patterns *
|
||||||
|
*************************************************/
|
||||||
|
|
||||||
|
PCRE2_EXP_DEFN int32_t PCRE2_CALL_CONVENTION
|
||||||
|
pcre2_serialize_decode(pcre2_code **codes, int32_t number_of_codes,
|
||||||
|
const uint8_t *bytes, pcre2_general_context *gcontext)
|
||||||
|
{
|
||||||
|
const pcre2_serialized_data *data = (const pcre2_serialized_data *)bytes;
|
||||||
|
const pcre2_memctl *memctl = (gcontext != NULL) ?
|
||||||
|
&gcontext->memctl : &PRIV(default_compile_context).memctl;
|
||||||
|
|
||||||
|
const uint8_t *src_bytes;
|
||||||
|
pcre2_real_code *src_re;
|
||||||
|
pcre2_real_code *dst_re;
|
||||||
|
uint8_t *tables;
|
||||||
|
int32_t i, j;
|
||||||
|
|
||||||
|
/* Sanity checks. */
|
||||||
|
|
||||||
|
if (data == NULL || codes == NULL) return PCRE2_ERROR_NULL;
|
||||||
|
if (number_of_codes <= 0) return PCRE2_ERROR_BADDATA;
|
||||||
|
if (data->magic != SERIALIZED_DATA_MAGIC) return PCRE2_ERROR_BADMAGIC;
|
||||||
|
if (data->version != SERIALIZED_DATA_VERSION) return PCRE2_ERROR_BADMODE;
|
||||||
|
if (data->config != SERIALIZED_DATA_CONFIG) return PCRE2_ERROR_BADMODE;
|
||||||
|
|
||||||
|
if (number_of_codes > data->number_of_codes)
|
||||||
|
number_of_codes = data->number_of_codes;
|
||||||
|
|
||||||
|
src_bytes = bytes + sizeof(pcre2_serialized_data);
|
||||||
|
|
||||||
|
/* Decode tables. The reference count for the tables is stored immediately
|
||||||
|
following them. */
|
||||||
|
|
||||||
|
tables = memctl->malloc(tables_length + sizeof(PCRE2_SIZE), memctl->memory_data);
|
||||||
|
if (tables == NULL) return PCRE2_ERROR_NOMEMORY;
|
||||||
|
|
||||||
|
memcpy(tables, src_bytes, tables_length);
|
||||||
|
*(PCRE2_SIZE *)(tables + tables_length) = number_of_codes;
|
||||||
|
src_bytes += tables_length;
|
||||||
|
|
||||||
|
/* Decode byte stream. */
|
||||||
|
|
||||||
|
for (i = 0; i < number_of_codes; i++)
|
||||||
|
{
|
||||||
|
src_re = (pcre2_real_code *)src_bytes;
|
||||||
|
|
||||||
|
/* The allocator provided by gcontext replaces the original one. */
|
||||||
|
dst_re = (pcre2_real_code *)PRIV(memctl_malloc)
|
||||||
|
(src_re->blocksize, (pcre2_memctl *)gcontext);
|
||||||
|
if (dst_re == NULL)
|
||||||
|
{
|
||||||
|
memctl->free(tables, memctl->memory_data);
|
||||||
|
for (j = 0; j < i; j++)
|
||||||
|
{
|
||||||
|
memctl->free(codes[j], memctl->memory_data);
|
||||||
|
codes[j] = NULL;
|
||||||
|
}
|
||||||
|
return PCRE2_ERROR_NOMEMORY;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* The new allocator must be preserved. */
|
||||||
|
memcpy(((uint8_t *)dst_re) + sizeof(pcre2_memctl),
|
||||||
|
src_bytes + sizeof(pcre2_memctl),
|
||||||
|
src_re->blocksize - sizeof(pcre2_memctl));
|
||||||
|
|
||||||
|
/* At the moment only one table is supported. */
|
||||||
|
dst_re->tables = tables;
|
||||||
|
dst_re->executable_jit = NULL;
|
||||||
|
dst_re->flags |= PCRE2_DEREF_TABLES;
|
||||||
|
|
||||||
|
codes[i] = dst_re;
|
||||||
|
src_bytes += src_re->blocksize;
|
||||||
|
}
|
||||||
|
|
||||||
|
return number_of_codes;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
/*************************************************
|
||||||
|
* Get the number of serialized patterns *
|
||||||
|
*************************************************/
|
||||||
|
|
||||||
|
PCRE2_EXP_DECL int32_t PCRE2_CALL_CONVENTION
|
||||||
|
pcre2_serialize_get_number_of_codes(const uint8_t *bytes)
|
||||||
|
{
|
||||||
|
const pcre2_serialized_data *data = (const pcre2_serialized_data *)bytes;
|
||||||
|
|
||||||
|
if (data == NULL) return PCRE2_ERROR_NULL;
|
||||||
|
if (data->magic != SERIALIZED_DATA_MAGIC) return PCRE2_ERROR_BADMAGIC;
|
||||||
|
if (data->version != SERIALIZED_DATA_VERSION) return PCRE2_ERROR_BADMODE;
|
||||||
|
if (data->config != SERIALIZED_DATA_CONFIG) return PCRE2_ERROR_BADMODE;
|
||||||
|
|
||||||
|
return data->number_of_codes;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
/*************************************************
|
||||||
|
* Free the allocated stream *
|
||||||
|
*************************************************/
|
||||||
|
|
||||||
|
PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION
|
||||||
|
pcre2_serialize_free(uint8_t *bytes)
|
||||||
|
{
|
||||||
|
if (bytes != NULL)
|
||||||
|
{
|
||||||
|
pcre2_memctl *memctl = (pcre2_memctl *)(bytes - sizeof(pcre2_memctl));
|
||||||
|
memctl->free(memctl, memctl->memory_data);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/* End of pcre2_serialize.c */
|
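The `pcre2_serialize_free()` function above takes only the byte pointer, yet still finds the right allocator: it steps back `sizeof(pcre2_memctl)` bytes to a header stored in front of the payload. The following is a minimal standalone sketch of that layout, not the PCRE2 code itself. All names beginning with `demo_` are hypothetical, and the header fields are called `alloc_fn`/`free_fn` here only to keep the sketch independent of the real `pcre2_memctl` definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for pcre2_memctl: the allocator travels with the block. */
typedef struct demo_memctl {
  void *(*alloc_fn)(size_t, void *);
  void  (*free_fn)(void *, void *);
  void  *memory_data;
} demo_memctl;

static int demo_live_blocks = 0;   /* Number of blocks not yet freed */

static void *demo_sys_malloc(size_t size, void *data)
{
void *block = malloc(size);
(void)data;
if (block != NULL) demo_live_blocks++;
return block;
}

static void demo_sys_free(void *block, void *data)
{
(void)data;
demo_live_blocks--;
free(block);
}

/* Allocate header and payload as one block; hand out only the payload. */
static uint8_t *demo_alloc(size_t payload_size)
{
demo_memctl *memctl = demo_sys_malloc(sizeof(demo_memctl) + payload_size, NULL);
if (memctl == NULL) return NULL;
memctl->alloc_fn = demo_sys_malloc;
memctl->free_fn = demo_sys_free;
memctl->memory_data = NULL;
return (uint8_t *)memctl + sizeof(demo_memctl);
}

/* Recover the header from the payload pointer, as pcre2_serialize_free() does. */
static void demo_free(uint8_t *bytes)
{
if (bytes != NULL)
  {
  demo_memctl *memctl = (demo_memctl *)(bytes - sizeof(demo_memctl));
  memctl->free_fn(memctl, memctl->memory_data);
  }
}

/* Usage: no allocator is passed to demo_free(); it is found via the header. */
static void demo_roundtrip(void)
{
uint8_t *bytes = demo_alloc(16);
assert(bytes != NULL);
memset(bytes, 0, 16);
demo_free(bytes);
assert(demo_live_blocks == 0);
}
```

The design choice is that a caller can release the stream without remembering which context created it, at the cost of a few header bytes per allocation.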
514
src/pcre2test.c
|
@ -166,6 +166,7 @@ void vms_setsymbol( char *, char *, int );
|
||||||
#define JUNK_OFFSET 0xdeadbeef /* For initializing ovector */
|
#define JUNK_OFFSET 0xdeadbeef /* For initializing ovector */
|
||||||
#define LOCALESIZE 32 /* Size of locale name */
|
#define LOCALESIZE 32 /* Size of locale name */
|
||||||
#define LOOPREPEAT 500000 /* Default loop count for timing */
|
#define LOOPREPEAT 500000 /* Default loop count for timing */
|
||||||
|
#define PATSTACKSIZE 20 /* Pattern stack for save/restore testing */
|
||||||
#define REPLACE_MODSIZE 96 /* Field for reading 8-bit replacement */
|
#define REPLACE_MODSIZE 96 /* Field for reading 8-bit replacement */
|
||||||
#define VERSION_SIZE 64 /* Size of buffer for the version strings */
|
#define VERSION_SIZE 64 /* Size of buffer for the version strings */
|
||||||
|
|
||||||
|
@ -313,6 +314,26 @@ modes, so use the form of the first that is available. */
|
||||||
#define PCRE2_REAL_MATCH_CONTEXT pcre2_real_match_context_32
|
#define PCRE2_REAL_MATCH_CONTEXT pcre2_real_match_context_32
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
|
/* ------------- Structure and table for handling #-commands ------------- */
|
||||||
|
|
||||||
|
typedef struct cmdstruct {
|
||||||
|
const char *name;
|
||||||
|
int value;
|
||||||
|
} cmdstruct;
|
||||||
|
|
||||||
|
enum { CMD_FORBID_UTF, CMD_LOAD, CMD_PATTERN, CMD_PERLTEST, CMD_POP, CMD_SAVE,
|
||||||
|
CMD_SUBJECT, CMD_UNKNOWN };
|
||||||
|
|
||||||
|
static cmdstruct cmdlist[] = {
|
||||||
|
{ "forbid_utf", CMD_FORBID_UTF },
|
||||||
|
{ "load", CMD_LOAD },
|
||||||
|
{ "pattern", CMD_PATTERN },
|
||||||
|
{ "perltest", CMD_PERLTEST },
|
||||||
|
{ "pop", CMD_POP },
|
||||||
|
{ "save", CMD_SAVE },
|
||||||
|
{ "subject", CMD_SUBJECT }};
|
||||||
|
|
||||||
|
#define cmdlistcount sizeof(cmdlist)/sizeof(cmdstruct)
|
||||||
|
|
||||||
/* ------------- Structures and tables for handling modifiers -------------- */
|
/* ------------- Structures and tables for handling modifiers -------------- */
|
||||||
|
|
||||||
|
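The `cmdlist` table added in this hunk replaces a chain of `strncmp()` tests with a table search. As a rough standalone sketch of how such a table can be searched (the table and enum are copied from the hunk; the helper `lookup_command()` and its `arg` parameter are hypothetical and do not appear in pcre2test):

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

typedef struct cmdstruct {
  const char *name;
  int value;
} cmdstruct;

enum { CMD_FORBID_UTF, CMD_LOAD, CMD_PATTERN, CMD_PERLTEST, CMD_POP, CMD_SAVE,
  CMD_SUBJECT, CMD_UNKNOWN };

static const cmdstruct cmdlist[] = {
  { "forbid_utf", CMD_FORBID_UTF },
  { "load",       CMD_LOAD },
  { "pattern",    CMD_PATTERN },
  { "perltest",   CMD_PERLTEST },
  { "pop",        CMD_POP },
  { "save",       CMD_SAVE },
  { "subject",    CMD_SUBJECT }};

#define cmdlistcount (sizeof(cmdlist)/sizeof(cmdstruct))

/* Hypothetical helper: identify the #-command at the start of line. A name
matches only if it is followed by white space, so "#popx" is not "#pop". On a
match, *arg points just after the command name; otherwise it points at line. */

static int lookup_command(const char *line, const char **arg)
{
size_t i;
assert(line != NULL && arg != NULL);
*arg = line;
if (line[0] != '#') return CMD_UNKNOWN;
for (i = 0; i < cmdlistcount; i++)
  {
  size_t len = strlen(cmdlist[i].name);
  if (strncmp(line + 1, cmdlist[i].name, len) == 0 &&
      isspace((unsigned char)line[len + 1]))
    {
    *arg = line + 1 + len;
    return cmdlist[i].value;
    }
  }
return CMD_UNKNOWN;
}
```

For example, `lookup_command("#save file\n", &arg)` yields `CMD_SAVE` with `arg` pointing at `" file\n"`. Adding a new command then needs only one table row and one `case` in the dispatching `switch`.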
@ -367,8 +388,9 @@ either on a pattern or a data line, so they must all be distinct. */
|
||||||
#define CTL_MARK 0x00020000u
|
#define CTL_MARK 0x00020000u
|
||||||
#define CTL_MEMORY 0x00040000u
|
#define CTL_MEMORY 0x00040000u
|
||||||
#define CTL_POSIX 0x00080000u
|
#define CTL_POSIX 0x00080000u
|
||||||
#define CTL_STARTCHAR 0x00100000u
|
#define CTL_PUSH 0x00100000u
|
||||||
#define CTL_ZERO_TERMINATE 0x00200000u
|
#define CTL_STARTCHAR 0x00200000u
|
||||||
|
#define CTL_ZERO_TERMINATE 0x00400000u
|
||||||
|
|
||||||
#define CTL_BSR_SET 0x80000000u /* This is informational */
|
#define CTL_BSR_SET 0x80000000u /* This is informational */
|
||||||
#define CTL_NL_SET 0x40000000u /* This is informational */
|
#define CTL_NL_SET 0x40000000u /* This is informational */
|
||||||
|
@ -426,6 +448,7 @@ typedef struct datctl { /* Structure for data line modifiers. */
|
||||||
/* Ids for which context to modify. */
|
/* Ids for which context to modify. */
|
||||||
|
|
||||||
enum { CTX_PAT, /* Active pattern context */
|
enum { CTX_PAT, /* Active pattern context */
|
||||||
|
CTX_POPPAT, /* Ditto, for a popped pattern */
|
||||||
CTX_DEFPAT, /* Default pattern context */
|
CTX_DEFPAT, /* Default pattern context */
|
||||||
CTX_DAT, /* Active data (match) context */
|
CTX_DAT, /* Active data (match) context */
|
||||||
CTX_DEFDAT }; /* Default data (match) context */
|
CTX_DEFDAT }; /* Default data (match) context */
|
||||||
|
@ -513,6 +536,7 @@ static modstruct modlist[] = {
|
||||||
{ "ph", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_HARD, DO(options) },
|
{ "ph", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_HARD, DO(options) },
|
||||||
{ "posix", MOD_PAT, MOD_CTL, CTL_POSIX, PO(control) },
|
{ "posix", MOD_PAT, MOD_CTL, CTL_POSIX, PO(control) },
|
||||||
{ "ps", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_SOFT, DO(options) },
|
{ "ps", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_SOFT, DO(options) },
|
||||||
|
{ "push", MOD_PAT, MOD_CTL, CTL_PUSH, PO(control) },
|
||||||
{ "recursion_limit", MOD_CTM, MOD_INT, 0, MO(recursion_limit) },
|
{ "recursion_limit", MOD_CTM, MOD_INT, 0, MO(recursion_limit) },
|
||||||
{ "replace", MOD_PND, MOD_STR, REPLACE_MODSIZE, PO(replacement) },
|
{ "replace", MOD_PND, MOD_STR, REPLACE_MODSIZE, PO(replacement) },
|
||||||
{ "stackguard", MOD_PAT, MOD_INT, 0, PO(stackguard_test) },
|
{ "stackguard", MOD_PAT, MOD_INT, 0, PO(stackguard_test) },
|
||||||
|
@ -544,6 +568,20 @@ static modstruct modlist[] = {
|
||||||
|
|
||||||
#define EXCLUSIVE_DAT_CONTROLS (CTL_ALLUSEDTEXT|CTL_STARTCHAR)
|
#define EXCLUSIVE_DAT_CONTROLS (CTL_ALLUSEDTEXT|CTL_STARTCHAR)
|
||||||
|
|
||||||
|
/* Control bits that are not ignored with 'push'. */
|
||||||
|
|
||||||
|
#define PUSH_SUPPORTED_COMPILE_CONTROLS ( \
|
||||||
|
CTL_BINCODE|CTL_FULLBINCODE|CTL_HEXPAT|CTL_INFO|CTL_JITVERIFY| \
|
||||||
|
CTL_MEMORY|CTL_PUSH|CTL_BSR_SET|CTL_NL_SET)
|
||||||
|
|
||||||
|
/* Controls that apply only at compile time with 'push'. */
|
||||||
|
|
||||||
|
#define PUSH_COMPILE_ONLY_CONTROLS CTL_JITVERIFY
|
||||||
|
|
||||||
|
/* Controls that are forbidden with #pop. */
|
||||||
|
|
||||||
|
#define NOTPOP_CONTROLS (CTL_HEXPAT|CTL_POSIX|CTL_PUSH)
|
||||||
|
|
||||||
/* Table of single-character abbreviated modifiers. The index field is
|
/* Table of single-character abbreviated modifiers. The index field is
|
||||||
initialized to -1, but the first time the modifier is encountered, it is filled
|
initialized to -1, but the first time the modifier is encountered, it is filled
|
||||||
in with the index of the full entry in modlist, to save repeated searching when
|
in with the index of the full entry in modlist, to save repeated searching when
|
||||||
|
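The masks added in this hunk are used in two ways: `control & ~PUSH_SUPPORTED_COMPILE_CONTROLS` selects the bits that will be reported as ignored, and `control & NOTPOP_CONTROLS` selects the bits that make a `#pop` line invalid. A small standalone sketch of both tests follows. The `CTL_` values are the ones shown in this diff; the two `DEMO_` masks are deliberately trimmed, hypothetical versions of the real ones, because the values of `CTL_HEXPAT`, `CTL_INFO` and the other bits are not shown here.

```c
#include <assert.h>
#include <stdint.h>

#define CTL_MARK            0x00020000u
#define CTL_MEMORY          0x00040000u
#define CTL_POSIX           0x00080000u
#define CTL_PUSH            0x00100000u
#define CTL_STARTCHAR       0x00200000u

/* Trimmed, hypothetical masks for illustration only. */
#define DEMO_PUSH_SUPPORTED (CTL_MEMORY|CTL_PUSH)
#define DEMO_NOTPOP         (CTL_POSIX|CTL_PUSH)

/* Bits that would be reported as ignored when 'push' is set; 0 otherwise. */
static uint32_t ignored_with_push(uint32_t controls)
{
if ((controls & CTL_PUSH) == 0) return 0;
return controls & ~DEMO_PUSH_SUPPORTED;
}

/* Bits that are not allowed on a #pop line; 0 means the line is acceptable. */
static uint32_t forbidden_with_pop(uint32_t controls)
{
return controls & DEMO_NOTPOP;
}

/* Usage: 'mark' acts at match time, so it is flagged when combined with push. */
static void demo_masks(void)
{
assert(ignored_with_push(CTL_PUSH|CTL_MEMORY|CTL_MARK) == CTL_MARK);
assert(forbidden_with_pop(CTL_MEMORY) == 0);
}
```

Returning the offending bits rather than a boolean lets the caller pass them straight to a reporting function, which is how `show_controls()` is used with these masks.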
@ -671,6 +709,9 @@ static patctl pat_patctl;
|
||||||
static datctl def_datctl;
|
static datctl def_datctl;
|
||||||
static datctl dat_datctl;
|
static datctl dat_datctl;
|
||||||
|
|
||||||
|
static void *patstack[PATSTACKSIZE];
|
||||||
|
static int patstacknext = 0;
|
||||||
|
|
||||||
#ifdef SUPPORT_PCRE2_8
|
#ifdef SUPPORT_PCRE2_8
|
||||||
static regex_t preg = { NULL, NULL, 0, 0 };
|
static regex_t preg = { NULL, NULL, 0, 0 };
|
||||||
#endif
|
#endif
|
||||||
|
@ -928,6 +969,38 @@ are supported. */
|
||||||
else \
|
else \
|
||||||
pcre2_printint_32(compiled_code32,outfile,a)
|
pcre2_printint_32(compiled_code32,outfile,a)
|
||||||
|
|
||||||
|
#define PCRE2_SERIALIZE_DECODE(r,a,b,c,d) \
|
||||||
|
if (test_mode == PCRE8_MODE) \
|
||||||
|
r = pcre2_serialize_decode_8((pcre2_code_8 **)a,b,c,G(d,8)); \
|
||||||
|
else if (test_mode == PCRE16_MODE) \
|
||||||
|
r = pcre2_serialize_decode_16((pcre2_code_16 **)a,b,c,G(d,16)); \
|
||||||
|
else \
|
||||||
|
r = pcre2_serialize_decode_32((pcre2_code_32 **)a,b,c,G(d,32))
|
||||||
|
|
||||||
|
#define PCRE2_SERIALIZE_ENCODE(r,a,b,c,d,e) \
|
||||||
|
if (test_mode == PCRE8_MODE) \
|
||||||
|
r = pcre2_serialize_encode_8((const pcre2_code_8 **)a,b,c,d,G(e,8)); \
|
||||||
|
else if (test_mode == PCRE16_MODE) \
|
||||||
|
r = pcre2_serialize_encode_16((const pcre2_code_16 **)a,b,c,d,G(e,16)); \
|
||||||
|
else \
|
||||||
|
r = pcre2_serialize_encode_32((const pcre2_code_32 **)a,b,c,d,G(e,32))
|
||||||
|
|
||||||
|
#define PCRE2_SERIALIZE_FREE(a) \
|
||||||
|
if (test_mode == PCRE8_MODE) \
|
||||||
|
pcre2_serialize_free_8(a); \
|
||||||
|
else if (test_mode == PCRE16_MODE) \
|
||||||
|
pcre2_serialize_free_16(a); \
|
||||||
|
else \
|
||||||
|
pcre2_serialize_free_32(a)
|
||||||
|
|
||||||
|
#define PCRE2_SERIALIZE_GET_NUMBER_OF_CODES(r,a) \
|
||||||
|
if (test_mode == PCRE8_MODE) \
|
||||||
|
r = pcre2_serialize_get_number_of_codes_8(a); \
|
||||||
|
else if (test_mode == PCRE16_MODE) \
|
||||||
|
r = pcre2_serialize_get_number_of_codes_16(a); \
|
||||||
|
else \
|
||||||
|
r = pcre2_serialize_get_number_of_codes_32(a)
|
||||||
|
|
||||||
#define PCRE2_SET_CALLOUT(a,b,c) \
|
#define PCRE2_SET_CALLOUT(a,b,c) \
|
||||||
if (test_mode == PCRE8_MODE) \
|
if (test_mode == PCRE8_MODE) \
|
||||||
pcre2_set_callout_8(G(a,8),(int (*)(pcre2_callout_block_8 *, void *))b,c); \
|
pcre2_set_callout_8(G(a,8),(int (*)(pcre2_callout_block_8 *, void *))b,c); \
|
||||||
|
@ -1297,11 +1370,35 @@ the three different cases. */
|
||||||
a = G(pcre2_pattern_info_,BITTWO)(G(b,BITTWO),c,d)
|
a = G(pcre2_pattern_info_,BITTWO)(G(b,BITTWO),c,d)
|
||||||
|
|
||||||
#define PCRE2_PRINTINT(a) \
|
#define PCRE2_PRINTINT(a) \
|
||||||
if (test_mode == G(G(PCRE,BITONE),_MODE)) \
|
if (test_mode == G(G(PCRE,BITONE),_MODE)) \
|
||||||
G(pcre2_printint_,BITONE)(G(compiled_code,BITONE),outfile,a); \
|
G(pcre2_printint_,BITONE)(G(compiled_code,BITONE),outfile,a); \
|
||||||
else \
|
else \
|
||||||
G(pcre2_printint_,BITTWO)(G(compiled_code,BITTWO),outfile,a)
|
G(pcre2_printint_,BITTWO)(G(compiled_code,BITTWO),outfile,a)
|
||||||
|
|
||||||
|
#define PCRE2_SERIALIZE_DECODE(r,a,b,c,d) \
|
||||||
|
if (test_mode == G(G(PCRE,BITONE),_MODE)) \
|
||||||
|
r = G(pcre2_serialize_decode_,BITONE)((G(pcre2_code_,BITONE) **)a,b,c,G(d,BITONE)); \
|
||||||
|
else \
|
||||||
|
r = G(pcre2_serialize_decode_,BITTWO)((G(pcre2_code_,BITTWO) **)a,b,c,G(d,BITTWO))
|
||||||
|
|
||||||
|
#define PCRE2_SERIALIZE_ENCODE(r,a,b,c,d,e) \
|
||||||
|
if (test_mode == G(G(PCRE,BITONE),_MODE)) \
|
||||||
|
r = G(pcre2_serialize_encode_,BITONE)((G(const pcre2_code_,BITONE) **)a,b,c,d,G(e,BITONE)); \
|
||||||
|
else \
|
||||||
|
r = G(pcre2_serialize_encode_,BITTWO)((G(const pcre2_code_,BITTWO) **)a,b,c,d,G(e,BITTWO))
|
||||||
|
|
||||||
|
#define PCRE2_SERIALIZE_FREE(a) \
|
||||||
|
if (test_mode == G(G(PCRE,BITONE),_MODE)) \
|
||||||
|
G(pcre2_serialize_free_,BITONE)(a); \
|
||||||
|
else \
|
||||||
|
G(pcre2_serialize_free_,BITTWO)(a)
|
||||||
|
|
||||||
|
#define PCRE2_SERIALIZE_GET_NUMBER_OF_CODES(r,a) \
|
||||||
|
if (test_mode == G(G(PCRE,BITONE),_MODE)) \
|
||||||
|
r = G(pcre2_serialize_get_number_of_codes_,BITONE)(a); \
|
||||||
|
else \
|
||||||
|
r = G(pcre2_serialize_get_number_of_codes_,BITTWO)(a)
|
||||||
|
|
||||||
#define PCRE2_SET_CALLOUT(a,b,c) \
|
#define PCRE2_SET_CALLOUT(a,b,c) \
|
||||||
if (test_mode == G(G(PCRE,BITONE),_MODE)) \
|
if (test_mode == G(G(PCRE,BITONE),_MODE)) \
|
||||||
G(pcre2_set_callout_,BITONE)(G(a,BITONE), \
|
G(pcre2_set_callout_,BITONE)(G(a,BITONE), \
|
||||||
|
@ -1510,6 +1607,13 @@ the three different cases. */
|
||||||
#define PCRE2_MATCH_DATA_FREE(a) pcre2_match_data_free_8(G(a,8))
|
#define PCRE2_MATCH_DATA_FREE(a) pcre2_match_data_free_8(G(a,8))
|
||||||
#define PCRE2_PATTERN_INFO(a,b,c,d) a = pcre2_pattern_info_8(G(b,8),c,d)
|
#define PCRE2_PATTERN_INFO(a,b,c,d) a = pcre2_pattern_info_8(G(b,8),c,d)
|
||||||
#define PCRE2_PRINTINT(a) pcre2_printint_8(compiled_code8,outfile,a)
|
#define PCRE2_PRINTINT(a) pcre2_printint_8(compiled_code8,outfile,a)
|
||||||
|
#define PCRE2_SERIALIZE_DECODE(r,a,b,c,d) \
|
||||||
|
r = pcre2_serialize_decode_8((pcre2_code_8 **)a,b,c,G(d,8))
|
||||||
|
#define PCRE2_SERIALIZE_ENCODE(r,a,b,c,d,e) \
|
||||||
|
r = pcre2_serialize_encode_8((const pcre2_code_8 **)a,b,c,d,G(e,8))
|
||||||
|
#define PCRE2_SERIALIZE_FREE(a) pcre2_serialize_free_8(a)
|
||||||
|
#define PCRE2_SERIALIZE_GET_NUMBER_OF_CODES(r,a) \
|
||||||
|
r = pcre2_serialize_get_number_of_codes_8(a)
|
||||||
#define PCRE2_SET_CALLOUT(a,b,c) \
|
#define PCRE2_SET_CALLOUT(a,b,c) \
|
||||||
pcre2_set_callout_8(G(a,8),(int (*)(pcre2_callout_block_8 *, void *))b,c)
|
pcre2_set_callout_8(G(a,8),(int (*)(pcre2_callout_block_8 *, void *))b,c)
|
||||||
#define PCRE2_SET_CHARACTER_TABLES(a,b) pcre2_set_character_tables_8(G(a,8),b)
|
#define PCRE2_SET_CHARACTER_TABLES(a,b) pcre2_set_character_tables_8(G(a,8),b)
|
||||||
|
@ -1591,6 +1695,13 @@ the three different cases. */
|
||||||
#define PCRE2_MATCH_DATA_FREE(a) pcre2_match_data_free_16(G(a,16))
|
#define PCRE2_MATCH_DATA_FREE(a) pcre2_match_data_free_16(G(a,16))
|
||||||
#define PCRE2_PATTERN_INFO(a,b,c,d) a = pcre2_pattern_info_16(G(b,16),c,d)
|
#define PCRE2_PATTERN_INFO(a,b,c,d) a = pcre2_pattern_info_16(G(b,16),c,d)
|
||||||
#define PCRE2_PRINTINT(a) pcre2_printint_16(compiled_code16,outfile,a)
|
#define PCRE2_PRINTINT(a) pcre2_printint_16(compiled_code16,outfile,a)
|
||||||
|
#define PCRE2_SERIALIZE_DECODE(r,a,b,c,d) \
|
||||||
|
r = pcre2_serialize_decode_16((pcre2_code_16 **)a,b,c,G(d,16))
|
||||||
|
#define PCRE2_SERIALIZE_ENCODE(r,a,b,c,d,e) \
|
||||||
|
r = pcre2_serialize_encode_16((const pcre2_code_16 **)a,b,c,d,G(e,16))
|
||||||
|
#define PCRE2_SERIALIZE_FREE(a) pcre2_serialize_free_16(a)
|
||||||
|
#define PCRE2_SERIALIZE_GET_NUMBER_OF_CODES(r,a) \
|
||||||
|
r = pcre2_serialize_get_number_of_codes_16(a)
|
||||||
#define PCRE2_SET_CALLOUT(a,b,c) \
|
#define PCRE2_SET_CALLOUT(a,b,c) \
|
||||||
pcre2_set_callout_16(G(a,16),(int (*)(pcre2_callout_block_16 *, void *))b,c);
|
pcre2_set_callout_16(G(a,16),(int (*)(pcre2_callout_block_16 *, void *))b,c);
|
||||||
#define PCRE2_SET_CHARACTER_TABLES(a,b) pcre2_set_character_tables_16(G(a,16),b)
|
#define PCRE2_SET_CHARACTER_TABLES(a,b) pcre2_set_character_tables_16(G(a,16),b)
|
||||||
|
@ -1672,6 +1783,13 @@ the three different cases. */
|
||||||
#define PCRE2_MATCH_DATA_FREE(a) pcre2_match_data_free_32(G(a,32))
|
#define PCRE2_MATCH_DATA_FREE(a) pcre2_match_data_free_32(G(a,32))
|
||||||
#define PCRE2_PATTERN_INFO(a,b,c,d) a = pcre2_pattern_info_32(G(b,32),c,d)
|
#define PCRE2_PATTERN_INFO(a,b,c,d) a = pcre2_pattern_info_32(G(b,32),c,d)
|
||||||
#define PCRE2_PRINTINT(a) pcre2_printint_32(compiled_code32,outfile,a)
|
#define PCRE2_PRINTINT(a) pcre2_printint_32(compiled_code32,outfile,a)
|
||||||
|
#define PCRE2_SERIALIZE_DECODE(r,a,b,c,d) \
|
||||||
|
r = pcre2_serialize_decode_32((pcre2_code_32 **)a,b,c,G(d,32))
|
||||||
|
#define PCRE2_SERIALIZE_ENCODE(r,a,b,c,d,e) \
|
||||||
|
r = pcre2_serialize_encode_32((const pcre2_code_32 **)a,b,c,d,G(e,32))
|
||||||
|
#define PCRE2_SERIALIZE_FREE(a) pcre2_serialize_free_32(a)
|
||||||
|
#define PCRE2_SERIALIZE_GET_NUMBER_OF_CODES(r,a) \
|
||||||
|
r = pcre2_serialize_get_number_of_codes_32(a)
|
||||||
#define PCRE2_SET_CALLOUT(a,b,c) \
|
#define PCRE2_SET_CALLOUT(a,b,c) \
|
||||||
pcre2_set_callout_32(G(a,32),(int (*)(pcre2_callout_block_32 *, void *))b,c);
|
pcre2_set_callout_32(G(a,32),(int (*)(pcre2_callout_block_32 *, void *))b,c);
|
||||||
#define PCRE2_SET_CHARACTER_TABLES(a,b) pcre2_set_character_tables_32(G(a,32),b)
|
#define PCRE2_SET_CHARACTER_TABLES(a,b) pcre2_set_character_tables_32(G(a,32),b)
|
||||||
|
@ -2792,6 +2910,7 @@ it is allowed here and find the field that is to be changed.
|
||||||
Arguments:
|
Arguments:
|
||||||
m the modifier list entry
|
m the modifier list entry
|
||||||
ctx CTX_PAT => pattern context
|
ctx CTX_PAT => pattern context
|
||||||
|
CTX_POPPAT => pattern context for popped pattern
|
||||||
CTX_DEFPAT => default pattern context
|
CTX_DEFPAT => default pattern context
|
||||||
CTX_DAT => data context
|
CTX_DAT => data context
|
||||||
CTX_DEFDAT => default data context
|
CTX_DEFDAT => default data context
|
||||||
|
@ -2837,8 +2956,8 @@ switch (m->which)
|
||||||
if (dctl != NULL) field = dctl;
|
if (dctl != NULL) field = dctl;
|
||||||
break;
|
break;
|
||||||
|
|
||||||
case MOD_PAT: /* Pattern modifier */
|
case MOD_PAT: /* Pattern modifier */
|
||||||
case MOD_PATP: /* Allowed for Perl test */
|
case MOD_PATP: /* Allowed for Perl test */
|
||||||
if (pctl != NULL) field = pctl;
|
if (pctl != NULL) field = pctl;
|
||||||
break;
|
break;
|
||||||
|
|
||||||
|
@ -2878,6 +2997,7 @@ modifiers that apply to contexts.
|
||||||
Arguments:
|
Arguments:
|
||||||
p point to modifier string
|
p point to modifier string
|
||||||
ctx CTX_PAT => pattern context
|
ctx CTX_PAT => pattern context
|
||||||
|
CTX_POPPAT => pattern context for popped pattern
|
||||||
CTX_DEFPAT => default pattern context
|
CTX_DEFPAT => default pattern context
|
||||||
CTX_DAT => data context
|
CTX_DAT => data context
|
||||||
CTX_DEFDAT => default data context
|
CTX_DEFDAT => default data context
|
||||||
|
@ -2902,11 +3022,8 @@ for (;;)
|
||||||
int index;
|
int index;
|
||||||
char *endptr;
|
char *endptr;
|
||||||
|
|
||||||
/* Skip white space and commas; after a comma we have passed the first
|
/* Skip white space and commas. */
|
||||||
item. */
|
|
||||||
|
|
||||||
while (isspace(*p)) p++;
|
|
||||||
if (*p == ',') first = FALSE;
|
|
||||||
while (isspace(*p) || *p == ',') p++;
|
while (isspace(*p) || *p == ',') p++;
|
||||||
if (*p == 0) break;
|
if (*p == 0) break;
|
||||||
|
|
||||||
|
@ -3163,6 +3280,17 @@ for (;;)
|
||||||
}
|
}
|
||||||
|
|
||||||
p = pp;
|
p = pp;
|
||||||
|
first = FALSE;
|
||||||
|
|
||||||
|
if (ctx == CTX_POPPAT &&
|
||||||
|
(pctl->options != 0 ||
|
||||||
|
pctl->tables_id != 0 ||
|
||||||
|
pctl->locale[0] != 0 ||
|
||||||
|
(pctl->control & NOTPOP_CONTROLS) != 0))
|
||||||
|
{
|
||||||
|
fprintf(outfile, "** '%s' is not valid here\n", m->name);
|
||||||
|
return FALSE;
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
return TRUE;
|
return TRUE;
|
||||||
|
@ -3246,7 +3374,7 @@ Returns: nothing
|
||||||
static void
|
static void
|
||||||
show_controls(uint32_t controls, const char *before)
|
show_controls(uint32_t controls, const char *before)
|
||||||
{
|
{
|
||||||
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
|
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
|
||||||
before,
|
before,
|
||||||
((controls & CTL_AFTERTEXT) != 0)? " aftertext" : "",
|
((controls & CTL_AFTERTEXT) != 0)? " aftertext" : "",
|
||||||
((controls & CTL_ALLAFTERTEXT) != 0)? " allaftertext" : "",
|
((controls & CTL_ALLAFTERTEXT) != 0)? " allaftertext" : "",
|
||||||
|
@ -3268,6 +3396,7 @@ fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
|
||||||
((controls & CTL_MARK) != 0)? " mark" : "",
|
((controls & CTL_MARK) != 0)? " mark" : "",
|
||||||
((controls & CTL_MEMORY) != 0)? " memory" : "",
|
((controls & CTL_MEMORY) != 0)? " memory" : "",
|
||||||
((controls & CTL_POSIX) != 0)? " posix" : "",
|
((controls & CTL_POSIX) != 0)? " posix" : "",
|
||||||
|
((controls & CTL_PUSH) != 0)? " push" : "",
|
||||||
((controls & CTL_STARTCHAR) != 0)? " startchar" : "",
|
((controls & CTL_STARTCHAR) != 0)? " startchar" : "",
|
||||||
((controls & CTL_ZERO_TERMINATE) != 0)? " zero_terminate" : "");
|
((controls & CTL_ZERO_TERMINATE) != 0)? " zero_terminate" : "");
|
||||||
}
|
}
|
||||||
|
@ -3347,6 +3476,40 @@ fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s",
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
/*************************************************
|
||||||
|
* Show memory usage info for a pattern *
|
||||||
|
*************************************************/
|
||||||
|
|
||||||
|
static void
|
||||||
|
show_memory_info(void)
|
||||||
|
{
|
||||||
|
uint32_t name_count, name_entry_size;
|
||||||
|
size_t size, cblock_size;
|
||||||
|
|
||||||
|
#ifdef SUPPORT_PCRE2_8
|
||||||
|
if (test_mode == 8) cblock_size = sizeof(pcre2_real_code_8);
|
||||||
|
#endif
|
||||||
|
#ifdef SUPPORT_PCRE2_16
|
||||||
|
if (test_mode == 16) cblock_size = sizeof(pcre2_real_code_16);
|
||||||
|
#endif
|
||||||
|
#ifdef SUPPORT_PCRE2_32
|
||||||
|
if (test_mode == 32) cblock_size = sizeof(pcre2_real_code_32);
|
||||||
|
#endif
|
||||||
|
|
||||||
|
(void)pattern_info(PCRE2_INFO_SIZE, &size, FALSE);
|
||||||
|
(void)pattern_info(PCRE2_INFO_NAMECOUNT, &name_count, FALSE);
|
||||||
|
(void)pattern_info(PCRE2_INFO_NAMEENTRYSIZE, &name_entry_size, FALSE);
|
||||||
|
fprintf(outfile, "Memory allocation (code space): %d\n",
|
||||||
|
(int)(size - name_count*name_entry_size*code_unit_size - cblock_size));
|
||||||
|
if (pat_patctl.jit != 0)
|
||||||
|
{
|
||||||
|
(void)pattern_info(PCRE2_INFO_JITSIZE, &size, FALSE);
|
||||||
|
fprintf(outfile, "Memory allocation (JIT code): %d\n", (int)size);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
/*************************************************
|
/*************************************************
|
||||||
* Show information about a pattern *
|
* Show information about a pattern *
|
||||||
*************************************************/
|
*************************************************/
|
||||||
|
@ -3624,12 +3787,79 @@ return PR_OK;
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
/*************************************************
|
||||||
|
* Handle serialization error *
|
||||||
|
*************************************************/
|
||||||
|
|
||||||
|
/* Print an error message after a serialization failure.
|
||||||
|
|
||||||
|
Arguments:
|
||||||
|
rc the error code
|
||||||
|
msg an initial message for what failed
|
||||||
|
|
||||||
|
Returns: nothing
|
||||||
|
*/
|
||||||
|
|
||||||
|
static void
|
||||||
|
serial_error(int rc, const char *msg)
|
||||||
|
{
|
||||||
|
fprintf(outfile, "%s failed: error %d: ", msg, rc);
|
||||||
|
PCRE2_GET_ERROR_MESSAGE(rc, rc, pbuffer);
|
||||||
|
PCHARSV(CASTVAR(void *, pbuffer), 0, rc, FALSE, outfile);
|
||||||
|
fprintf(outfile, "\n");
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
/*************************************************
|
||||||
|
* Open file for save/load commands *
|
||||||
|
*************************************************/
|
||||||
|
|
||||||
|
/* This function decodes the file name and opens the file.
|
||||||
|
|
||||||
|
Arguments:
|
||||||
|
buffptr point after the #command
|
||||||
|
mode open mode
|
||||||
|
fptr points to the FILE variable
|
||||||
|
|
||||||
|
Returns: PR_OK or PR_ABEND
|
||||||
|
*/
|
||||||
|
|
||||||
|
static int
|
||||||
|
open_file(uint8_t *buffptr, const char *mode, FILE **fptr)
|
||||||
|
{
|
||||||
|
char *endf;
|
||||||
|
char *filename = (char *)buffptr;
|
||||||
|
while (isspace(*filename)) filename++;
|
||||||
|
endf = filename + strlen8(filename);
|
||||||
|
while (endf > filename && isspace(endf[-1])) endf--;
|
||||||
|
|
||||||
|
if (endf == filename)
|
||||||
|
{
|
||||||
|
fprintf(outfile, "** File name expected after #save or #load\n");
|
||||||
|
return PR_ABEND;
|
||||||
|
}
|
||||||
|
|
||||||
|
*endf = 0;
|
||||||
|
*fptr = fopen((const char *)filename, mode);
|
||||||
|
if (*fptr == NULL)
|
||||||
|
{
|
||||||
|
fprintf(outfile, "** Failed to open '%s'\n", filename);
|
||||||
|
return PR_ABEND;
|
||||||
|
}
|
||||||
|
|
||||||
|
return PR_OK;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
/*************************************************
|
/*************************************************
|
||||||
* Process command line *
|
* Process command line *
|
||||||
*************************************************/
|
*************************************************/
|
||||||
|
|
||||||
/* This function is called for lines beginning with # and a character that is
|
/* This function is called for lines beginning with # and a character that is
|
||||||
not ! or whitespace, when encountered between tests. The line is in buffer.
|
not ! or whitespace, when encountered between tests, which means that there is
|
||||||
|
no compiled pattern (compiled_code is NULL). The line is in buffer.
|
||||||
|
|
||||||
Arguments: none
|
Arguments: none
|
||||||
|
|
||||||
|
@ -3641,33 +3871,176 @@ Returns: PR_OK continue processing next line
|
||||||
static int
|
static int
|
||||||
process_command(void)
|
process_command(void)
|
||||||
{
|
{
|
||||||
|
FILE *f;
|
||||||
|
PCRE2_SIZE serial_size;
|
||||||
|
size_t i;
|
||||||
|
int rc, cmd, cmdlen;
|
||||||
|
const char *cmdname;
|
||||||
|
uint8_t *argptr, *serial;
|
||||||
|
|
||||||
if (restrict_for_perl_test)
|
if (restrict_for_perl_test)
|
||||||
{
|
{
|
||||||
fprintf(outfile, "** #-commands are not allowed after #perltest\n");
|
fprintf(outfile, "** #-commands are not allowed after #perltest\n");
|
||||||
return PR_ABEND;
|
return PR_ABEND;
|
||||||
}
|
}
|
||||||
|
|
||||||
if (strncmp((char *)buffer, "#forbid_utf", 11) == 0 && isspace(buffer[11]))
|
cmd = CMD_UNKNOWN;
|
||||||
|
cmdlen = 0;
|
||||||
|
|
||||||
|
for (i = 0; i < cmdlistcount; i++)
|
||||||
{
|
{
|
||||||
forbid_utf = PCRE2_NEVER_UTF|PCRE2_NEVER_UCP;
|
cmdname = cmdlist[i].name;
|
||||||
|
cmdlen = strlen(cmdname);
|
||||||
|
if (strncmp((char *)(buffer+1), cmdname, cmdlen) == 0 &&
|
||||||
|
isspace(buffer[cmdlen+1]))
|
||||||
|
{
|
||||||
|
cmd = cmdlist[i].value;
|
||||||
|
break;
|
||||||
|
}
|
||||||
}
|
}
|
||||||
else if (strncmp((char *)buffer, "#pattern", 8) == 0 && isspace(buffer[8]))
|
|
||||||
|
argptr = buffer + cmdlen + 1;
|
||||||
|
|
||||||
|
switch(cmd)
|
||||||
{
|
{
|
||||||
(void)decode_modifiers(buffer + 8, CTX_DEFPAT, &def_patctl, NULL);
|
case CMD_UNKNOWN:
|
||||||
|
fprintf(outfile, "** Unknown command: %s", buffer);
|
||||||
|
break;
|
||||||
|
|
||||||
|
case CMD_FORBID_UTF:
|
||||||
|
forbid_utf = PCRE2_NEVER_UTF|PCRE2_NEVER_UCP;
|
||||||
|
break;
|
||||||
|
|
||||||
|
case CMD_PERLTEST:
|
||||||
|
restrict_for_perl_test = TRUE;
|
||||||
|
break;
|
||||||
|
|
||||||
|
/* Set default pattern modifiers */
|
||||||
|
|
||||||
|
case CMD_PATTERN:
|
||||||
|
(void)decode_modifiers(argptr, CTX_DEFPAT, &def_patctl, NULL);
|
||||||
if (def_patctl.jit == 0 && (def_patctl.control & CTL_JITVERIFY) != 0)
|
if (def_patctl.jit == 0 && (def_patctl.control & CTL_JITVERIFY) != 0)
|
||||||
def_patctl.jit = 7;
|
def_patctl.jit = 7;
|
||||||
}
|
break;
|
||||||
else if (strncmp((char *)buffer, "#perltest", 9) == 0 && isspace(buffer[9]))
|
|
||||||
{
|
/* Set default subject modifiers */
|
||||||
restrict_for_perl_test = TRUE;
|
|
||||||
}
|
case CMD_SUBJECT:
|
||||||
else if (strncmp((char *)buffer, "#subject", 8) == 0 && isspace(buffer[8]))
|
(void)decode_modifiers(argptr, CTX_DEFDAT, NULL, &def_datctl);
|
||||||
{
|
break;
|
||||||
(void)decode_modifiers(buffer + 8, CTX_DEFDAT, NULL, &def_datctl);
|
|
||||||
}
|
/* Pop a compiled pattern off the stack. Modifiers that do not affect the
|
||||||
else
|
compiled pattern (e.g. to give information) are permitted. The default
|
||||||
{
|
pattern modifiers are ignored. */
|
||||||
fprintf(outfile, "** Unknown command: %s", buffer);
|
|
||||||
|
case CMD_POP:
|
||||||
|
if (patstacknext <= 0)
|
||||||
|
{
|
||||||
|
fprintf(outfile, "** Can't pop off an empty stack\n");
|
||||||
|
return PR_SKIP;
|
||||||
|
}
|
||||||
|
memset(&pat_patctl, 0, sizeof(patctl)); /* Completely unset */
|
||||||
|
if (!decode_modifiers(argptr, CTX_POPPAT, &pat_patctl, NULL))
|
||||||
|
return PR_SKIP;
|
||||||
|
SET(compiled_code, patstack[--patstacknext]);
|
||||||
|
if (pat_patctl.jit != 0)
|
||||||
|
{
|
||||||
|
PCRE2_JIT_COMPILE(compiled_code, pat_patctl.jit);
|
||||||
|
}
|
||||||
|
if ((pat_patctl.control & CTL_MEMORY) != 0) show_memory_info();
|
||||||
|
if ((pat_patctl.control & CTL_ANYINFO) != 0)
|
||||||
|
{
|
||||||
|
rc = show_pattern_info();
|
||||||
|
if (rc != PR_OK) return rc;
|
||||||
|
}
|
||||||
|
break;
|
||||||
|
|
||||||
|
/* Save the stack of compiled patterns to a file, then empty the stack. */
|
||||||
|
|
||||||
|
case CMD_SAVE:
|
||||||
|
if (patstacknext <= 0)
|
||||||
|
{
|
||||||
|
fprintf(outfile, "** No stacked patterns to save\n");
|
||||||
|
return PR_OK;
|
||||||
|
}
|
||||||
|
|
||||||
|
rc = open_file(argptr+1, OUTPUT_MODE, &f);
|
||||||
|
if (rc != PR_OK) return rc;
|
||||||
|
|
||||||
|
PCRE2_SERIALIZE_ENCODE(rc, patstack, patstacknext, &serial, &serial_size,
|
||||||
|
general_context);
|
||||||
|
if (rc < 0)
|
||||||
|
{
|
||||||
|
serial_error(rc, "Serialization");
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Write the length at the start of the file to make it straightforward to
|
||||||
|
get the right memory when re-loading. This saves having to read the file size
|
||||||
|
in different operating systems. To allow for different endianness (even
|
||||||
|
though reloading with the opposite endianness does not work), write the
|
||||||
|
length byte-by-byte. */
|
||||||
|
|
||||||
|
for (i = 0; i < 4; i++) fputc((serial_size >> (i*8)) & 255, f);
|
||||||
|
if (fwrite(serial, 1, serial_size, f) != serial_size)
|
||||||
|
{
|
||||||
|
fprintf(outfile, "** Wrong return from fwrite()\n");
|
||||||
|
return PR_ABEND;
|
||||||
|
}
|
||||||
|
|
||||||
|
fclose(f);
|
||||||
|
PCRE2_SERIALIZE_FREE(serial);
|
||||||
|
while(patstacknext > 0)
|
||||||
|
{
|
||||||
|
SET(compiled_code, patstack[--patstacknext]);
|
||||||
|
SUB1(pcre2_code_free, compiled_code);
|
||||||
|
}
|
||||||
|
SET(compiled_code, NULL);
|
||||||
|
break;
|
||||||
|
|
||||||
|
/* Load a set of compiled patterns from a file onto the stack */
|
||||||
|
|
||||||
|
case CMD_LOAD:
|
||||||
|
rc = open_file(argptr+1, INPUT_MODE, &f);
|
||||||
|
if (rc != PR_OK) return rc;
|
||||||
|
|
||||||
|
serial_size = 0;
|
||||||
|
for (i = 0; i < 4; i++) serial_size |= (PCRE2_SIZE)(fgetc(f) & 255) << (i*8);
|
||||||
|
|
||||||
|
serial = malloc(serial_size);
|
||||||
|
if (serial == NULL)
|
||||||
|
{
|
||||||
|
fprintf(outfile, "** Failed to get memory (size %lu) for #load\n",
|
||||||
|
(unsigned long)serial_size);
|
||||||
|
return PR_ABEND;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (fread(serial, 1, serial_size, f) != serial_size)
|
||||||
|
{
|
||||||
|
fprintf(outfile, "** Wrong return from fread()\n");
|
||||||
|
return PR_ABEND;
|
||||||
|
}
|
||||||
|
fclose(f);
|
||||||
|
|
||||||
|
PCRE2_SERIALIZE_GET_NUMBER_OF_CODES(rc, serial);
|
||||||
|
if (rc < 0) serial_error(rc, "Get number of codes"); else
|
||||||
|
{
|
||||||
|
if (rc + patstacknext > PATSTACKSIZE)
|
||||||
|
{
|
||||||
|
fprintf(outfile, "** Not enough space on pattern stack for %d pattern%s\n",
|
||||||
|
rc, (rc == 1)? "" : "s");
|
||||||
|
rc = PATSTACKSIZE - patstacknext;
|
||||||
|
fprintf(outfile, "** Decoding %d pattern%s\n", rc,
|
||||||
|
(rc == 1)? "" : "s");
|
||||||
|
}
|
||||||
|
PCRE2_SERIALIZE_DECODE(rc, patstack + patstacknext, rc, serial,
|
||||||
|
general_context);
|
||||||
|
if (rc < 0) serial_error(rc, "Deserialization");
|
||||||
|
else patstacknext += rc;
|
||||||
|
}
|
||||||
|
|
||||||
|
free(serial);
|
||||||
|
break;
|
||||||
}
|
}
|
||||||
|
|
||||||
return PR_OK;
|
return PR_OK;
|
||||||
|
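The `#save` and `#load` code in this hunk prefixes the serialized data with its length, written as four bytes, least significant first, so that the file layout does not depend on the byte order of the writing host. A minimal standalone sketch of that prefix follows. The helper names `write_length_prefix()` and `read_length_prefix()` are hypothetical, and unlike the code in the diff the sketch checks for `EOF` and rejects sizes that do not fit in 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Write size as 4 bytes, least significant byte first. Returns 0 on success,
-1 if the size does not fit in 32 bits or a write fails. */

static int write_length_prefix(FILE *f, size_t size)
{
int i;
assert(f != NULL);
if (size > 0xffffffffu) return -1;
for (i = 0; i < 4; i++)
  if (fputc((int)((size >> (i*8)) & 255), f) == EOF) return -1;
return 0;
}

/* Read the 4-byte prefix back. Each byte is widened to size_t before it is
shifted, so a byte of 128 or more in the top position cannot overflow an int.
Returns 0 on success, -1 on a short read. */

static int read_length_prefix(FILE *f, size_t *size)
{
int i;
size_t result = 0;
assert(f != NULL && size != NULL);
for (i = 0; i < 4; i++)
  {
  int c = fgetc(f);
  if (c == EOF) return -1;
  result |= (size_t)c << (i*8);
  }
*size = result;
return 0;
}
```

A caller would write the prefix, then the data with `fwrite()`, and on loading read the prefix first to know how much memory to obtain before calling `fread()`.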
@ -3750,6 +4123,14 @@ if (pat_patctl.jit == 0 &&
|
||||||
(pat_patctl.control & (CTL_JITVERIFY|CTL_JITFAST)) != 0)
|
(pat_patctl.control & (CTL_JITVERIFY|CTL_JITFAST)) != 0)
|
||||||
pat_patctl.jit = 7;
|
pat_patctl.jit = 7;
|
||||||
|
|
||||||
|
/* POSIX and 'push' do not play together. */
|
||||||
|
|
||||||
|
if ((pat_patctl.control & (CTL_POSIX|CTL_PUSH)) == (CTL_POSIX|CTL_PUSH))
|
||||||
|
{
|
||||||
|
fprintf(outfile, "** The POSIX interface is incompatible with 'push'\n");
|
||||||
|
return PR_ABEND;
|
||||||
|
}
|
||||||
|
|
||||||
/* Now copy the pattern to pbuffer8 for use in 8-bit testing and for reflecting
|
/* Now copy the pattern to pbuffer8 for use in 8-bit testing and for reflecting
|
||||||
in callouts. Convert to binary if required. */
|
in callouts. Convert to binary if required. */
|
||||||
|
|
||||||
|
@ -3897,8 +4278,31 @@ if ((pat_patctl.control & CTL_POSIX) != 0)
|
||||||
#endif /* SUPPORT_PCRE2_8 */
|
#endif /* SUPPORT_PCRE2_8 */
|
||||||
}
|
}
|
||||||
|
|
||||||
/* Handle compiling via the native interface, converting the input in non-8-bit
|
/* Handle compiling via the native interface. Controls that act later are
|
||||||
modes. */
|
ignored with "push". Replacements are locked out. */
|
||||||
|
|
||||||
|
if ((pat_patctl.control & CTL_PUSH) != 0)
|
||||||
|
{
|
||||||
|
if (pat_patctl.replacement[0] != 0)
|
||||||
|
{
|
||||||
|
fprintf(outfile, "** Replacement text is not supported with 'push'.\n");
|
||||||
|
return PR_OK;
|
||||||
|
}
|
||||||
|
if ((pat_patctl.control & ~PUSH_SUPPORTED_COMPILE_CONTROLS) != 0)
|
||||||
|
{
|
||||||
|
show_controls(pat_patctl.control & ~PUSH_SUPPORTED_COMPILE_CONTROLS,
|
||||||
|
"** Ignored when compiled pattern is stacked with 'push':");
|
||||||
|
fprintf(outfile, "\n");
|
||||||
|
}
|
||||||
|
if ((pat_patctl.control & PUSH_COMPILE_ONLY_CONTROLS) != 0)
|
||||||
|
{
|
||||||
|
show_controls(pat_patctl.control & PUSH_COMPILE_ONLY_CONTROLS,
|
||||||
|
"** Applies only to compile when pattern is stacked with 'push':");
|
||||||
|
fprintf(outfile, "\n");
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Convert the input in non-8-bit modes. */
|
||||||
|
|
||||||
#ifdef SUPPORT_PCRE2_8
|
#ifdef SUPPORT_PCRE2_8
|
||||||
if (test_mode == PCRE8_MODE) errorcode = 0;
|
if (test_mode == PCRE8_MODE) errorcode = 0;
|
||||||
|
@ -4017,39 +4421,27 @@ if (pat_patctl.jit != 0)
|
||||||
|
|
||||||
/* Output code size and other information if requested. */
|
/* Output code size and other information if requested. */
|
||||||
|
|
||||||
if ((pat_patctl.control & CTL_MEMORY) != 0)
|
if ((pat_patctl.control & CTL_MEMORY) != 0) show_memory_info();
|
||||||
{
|
|
||||||
uint32_t name_count, name_entry_size;
|
|
||||||
size_t size, cblock_size;
|
|
||||||
|
|
||||||
#ifdef SUPPORT_PCRE2_8
|
|
||||||
if (test_mode == 8) cblock_size = sizeof(pcre2_real_code_8);
|
|
||||||
#endif
|
|
||||||
#ifdef SUPPORT_PCRE2_16
|
|
||||||
if (test_mode == 16) cblock_size = sizeof(pcre2_real_code_16);
|
|
||||||
#endif
|
|
||||||
#ifdef SUPPORT_PCRE2_32
|
|
||||||
if (test_mode == 32) cblock_size = sizeof(pcre2_real_code_32);
|
|
||||||
#endif
|
|
||||||
|
|
||||||
(void)pattern_info(PCRE2_INFO_SIZE, &size, FALSE);
|
|
||||||
(void)pattern_info(PCRE2_INFO_NAMECOUNT, &name_count, FALSE);
|
|
||||||
(void)pattern_info(PCRE2_INFO_NAMEENTRYSIZE, &name_entry_size, FALSE);
|
|
||||||
fprintf(outfile, "Memory allocation (code space): %d\n",
|
|
||||||
(int)(size - name_count*name_entry_size*code_unit_size - cblock_size));
|
|
||||||
if (pat_patctl.jit != 0)
|
|
||||||
{
|
|
||||||
(void)pattern_info(PCRE2_INFO_JITSIZE, &size, FALSE);
|
|
||||||
fprintf(outfile, "Memory allocation (JIT code): %d\n", (int)size);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
if ((pat_patctl.control & CTL_ANYINFO) != 0)
|
if ((pat_patctl.control & CTL_ANYINFO) != 0)
|
||||||
{
|
{
|
||||||
int rc = show_pattern_info();
|
int rc = show_pattern_info();
|
||||||
if (rc != PR_OK) return rc;
|
if (rc != PR_OK) return rc;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/* The "push" control requests that the compiled pattern be remembered on a
|
||||||
|
stack. This is mainly for testing the serialization functionality. */
|
||||||
|
|
||||||
|
if ((pat_patctl.control & CTL_PUSH) != 0)
|
||||||
|
{
|
||||||
|
if (patstacknext >= PATSTACKSIZE)
|
||||||
|
{
|
||||||
|
fprintf(outfile, "** Too many pushed patterns (max %d)\n", PATSTACKSIZE);
|
||||||
|
return PR_ABEND;
|
||||||
|
}
|
||||||
|
patstack[patstacknext++] = PTR(compiled_code);
|
||||||
|
SET(compiled_code, NULL);
|
||||||
|
}
|
||||||
|
|
||||||
return PR_OK;
|
return PR_OK;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -6253,7 +6645,7 @@ if (argc > 1 && strcmp(argv[op], "-") != 0)
|
||||||
infile = fopen(argv[op], INPUT_MODE);
|
infile = fopen(argv[op], INPUT_MODE);
|
||||||
if (infile == NULL)
|
if (infile == NULL)
|
||||||
{
|
{
|
||||||
printf("** Failed to open %s\n", argv[op]);
|
printf("** Failed to open '%s'\n", argv[op]);
|
||||||
yield = 1;
|
yield = 1;
|
||||||
goto EXIT;
|
goto EXIT;
|
||||||
}
|
}
|
||||||
|
@ -6264,7 +6656,7 @@ if (argc > 2)
|
||||||
outfile = fopen(argv[op+1], OUTPUT_MODE);
|
outfile = fopen(argv[op+1], OUTPUT_MODE);
|
||||||
if (outfile == NULL)
|
if (outfile == NULL)
|
||||||
{
|
{
|
||||||
printf("** Failed to open %s\n", argv[op+1]);
|
printf("** Failed to open '%s'\n", argv[op+1]);
|
||||||
yield = 1;
|
yield = 1;
|
||||||
goto EXIT;
|
goto EXIT;
|
||||||
}
|
}
|
||||||
|
@ -6399,6 +6791,12 @@ free((void *)locale_tables);
|
||||||
PCRE2_MATCH_DATA_FREE(match_data);
|
PCRE2_MATCH_DATA_FREE(match_data);
|
||||||
SUB1(pcre2_code_free, compiled_code);
|
SUB1(pcre2_code_free, compiled_code);
|
||||||
|
|
||||||
|
while(patstacknext-- > 0)
|
||||||
|
{
|
||||||
|
SET(compiled_code, patstack[patstacknext]);
|
||||||
|
SUB1(pcre2_code_free, compiled_code);
|
||||||
|
}
|
||||||
|
|
||||||
PCRE2_JIT_FREE_UNUSED_MEMORY(general_context);
|
PCRE2_JIT_FREE_UNUSED_MEMORY(general_context);
|
||||||
if (jit_stack != NULL)
|
if (jit_stack != NULL)
|
||||||
{
|
{
|
||||||
|
|
|
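The `#load` handler in the pcre2test.c hunk above reads a 4-byte length, least significant byte first, and then reads the serialized block in one `fread()` call. What follows is a minimal standalone sketch of that framing, not the pcre2test code itself. The helper names `read_le32`, `write_le32` and `read_block` are hypothetical, and `write_le32` assumes the writer uses the same byte order as the reader. Unlike the loop shown in the hunk, the sketch checks each `fgetc()` result for `EOF` before shifting it into the size.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Read a 4-byte little-endian length. Returns 0 on success, -1 on a short
   read. Hypothetical helper, not part of pcre2test. */
static int read_le32(FILE *f, uint32_t *out)
{
  uint32_t value = 0;
  assert(f != NULL && out != NULL);
  for (int i = 0; i < 4; i++)
    {
    int c = fgetc(f);
    if (c == EOF) return -1;            /* truncated file */
    value |= (uint32_t)c << (i * 8);    /* byte i goes to bits 8i..8i+7 */
    }
  *out = value;
  return 0;
}

/* Write a 4-byte little-endian length. Assumed counterpart of read_le32. */
static int write_le32(FILE *f, uint32_t value)
{
  assert(f != NULL);
  for (int i = 0; i < 4; i++)
    if (fputc((int)((value >> (i * 8)) & 0xff), f) == EOF) return -1;
  return 0;
}

/* Read one length-prefixed block. On success returns a malloc'd buffer that
   the caller must free, and stores its length in *size. Returns NULL if the
   length or the data is truncated, or if memory cannot be obtained. */
static unsigned char *read_block(FILE *f, uint32_t *size)
{
  unsigned char *buf;
  if (read_le32(f, size) != 0) return NULL;
  buf = malloc(*size > 0 ? *size : 1);  /* avoid a zero-byte request */
  if (buf == NULL) return NULL;
  if (fread(buf, 1, *size, f) != *size)
    {
    free(buf);
    return NULL;
    }
  return buf;
}
```

A caller would open the file in binary mode, call `read_block()`, hand the buffer to the deserializer, and then `free()` it, which is the same order of operations as the `CMD_LOAD` case.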
@ -6,4 +6,4 @@
|
||||||
|
|
||||||
/a*/I
|
/a*/I
|
||||||
|
|
||||||
# End of testinput14
|
# End of testinput15
|
||||||
|
|
|
@ -161,7 +161,7 @@
|
||||||
# match to happen via the interpreter, but for fast JIT invalid options are
|
# match to happen via the interpreter, but for fast JIT invalid options are
|
||||||
# ignored, so an unanchored match happens.
|
# ignored, so an unanchored match happens.
|
||||||
|
|
||||||
/abcd/jit
|
/abcd/
|
||||||
abcd\=anchored
|
abcd\=anchored
|
||||||
fail abcd\=anchored
|
fail abcd\=anchored
|
||||||
|
|
||||||
|
@ -169,4 +169,21 @@
|
||||||
abcd\=anchored
|
abcd\=anchored
|
||||||
succeed abcd\=anchored
|
succeed abcd\=anchored
|
||||||
|
|
||||||
|
# Push/pop does not lose the JIT information, though jitverify applies only to
|
||||||
|
# compilation, but serializing (save/load) discards JIT data completely.
|
||||||
|
|
||||||
|
/^abc\Kdef/info,push
|
||||||
|
#pop jitverify
|
||||||
|
abcdef
|
||||||
|
|
||||||
|
/^abc\Kdef/info,push
|
||||||
|
#save testsaved1
|
||||||
|
#load testsaved1
|
||||||
|
#pop jitverify
|
||||||
|
abcdef
|
||||||
|
|
||||||
|
#load testsaved1
|
||||||
|
#pop jit,jitverify
|
||||||
|
abcdef
|
||||||
|
|
||||||
# End of testinput16
|
# End of testinput16
|
||||||
|
|
|
@ -0,0 +1,62 @@
|
||||||
|
# This set of tests exercises the serialization/deserialization functions in
|
||||||
|
# the library. It does not use UTF or JIT.
|
||||||
|
|
||||||
|
#forbid_utf
|
||||||
|
|
||||||
|
# Compile several patterns, push them onto the stack, and then write them
|
||||||
|
# all to a file.
|
||||||
|
|
||||||
|
#pattern push
|
||||||
|
|
||||||
|
/(?<NAME>(?&NAME_PAT))\s+(?<ADDR>(?&ADDRESS_PAT))
|
||||||
|
(?(DEFINE)
|
||||||
|
(?<NAME_PAT>[a-z]+)
|
||||||
|
(?<ADDRESS_PAT>\d+)
|
||||||
|
)/x
|
||||||
|
/^(?:((.)(?1)\2|)|((.)(?3)\4|.))$/i
|
||||||
|
|
||||||
|
#save testsaved1
|
||||||
|
|
||||||
|
# Do it again for some more patterns.
|
||||||
|
|
||||||
|
/(*MARK:A)(*SKIP:B)(C|X)/mark
|
||||||
|
/(?:(?<n>foo)|(?<n>bar))\k<n>/dupnames
|
||||||
|
|
||||||
|
#save testsaved2
|
||||||
|
#pattern -push
|
||||||
|
|
||||||
|
# Reload the patterns, then pop them one by one and check them.
|
||||||
|
|
||||||
|
#load testsaved1
|
||||||
|
#load testsaved2
|
||||||
|
|
||||||
|
#pop info
|
||||||
|
foofoo
|
||||||
|
barbar
|
||||||
|
|
||||||
|
#pop mark
|
||||||
|
C
|
||||||
|
D
|
||||||
|
|
||||||
|
#pop
|
||||||
|
AmanaplanacanalPanama
|
||||||
|
|
||||||
|
#pop info
|
||||||
|
metcalfe 33
|
||||||
|
|
||||||
|
# Check for an error when different tables are used.
|
||||||
|
|
||||||
|
/abc/push,tables=1
|
||||||
|
/xyz/push,tables=2
|
||||||
|
#save testsaved1
|
||||||
|
|
||||||
|
#pop
|
||||||
|
xyz
|
||||||
|
|
||||||
|
#pop
|
||||||
|
abc
|
||||||
|
|
||||||
|
#pop should give an error
|
||||||
|
pqr
|
||||||
|
|
||||||
|
# End of testinput19
|
|
@ -14,4 +14,4 @@ Capturing subpattern count = 0
|
||||||
May match empty string
|
May match empty string
|
||||||
Subject length lower bound = 0
|
Subject length lower bound = 0
|
||||||
|
|
||||||
# End of testinput14
|
# End of testinput15
|
||||||
|
|
|
@ -310,7 +310,7 @@ Failed: error -46: JIT stack limit reached
|
||||||
# match to happen via the interpreter, but for fast JIT invalid options are
|
# match to happen via the interpreter, but for fast JIT invalid options are
|
||||||
# ignored, so an unanchored match happens.
|
# ignored, so an unanchored match happens.
|
||||||
|
|
||||||
/abcd/jit
|
/abcd/
|
||||||
abcd\=anchored
|
abcd\=anchored
|
||||||
0: abcd
|
0: abcd
|
||||||
fail abcd\=anchored
|
fail abcd\=anchored
|
||||||
|
@ -322,4 +322,36 @@ No match
|
||||||
succeed abcd\=anchored
|
succeed abcd\=anchored
|
||||||
0: abcd (JIT)
|
0: abcd (JIT)
|
||||||
|
|
||||||
|
# Push/pop does not lose the JIT information, though jitverify applies only to
|
||||||
|
# compilation, but serializing (save/load) discards JIT data completely.
|
||||||
|
|
||||||
|
/^abc\Kdef/info,push
|
||||||
|
** Applied only to compile when pattern is stacked with 'push': jitverify
|
||||||
|
Capturing subpattern count = 0
|
||||||
|
Compile options: <none>
|
||||||
|
Overall options: anchored
|
||||||
|
Subject length lower bound = 6
|
||||||
|
JIT compilation was successful
|
||||||
|
#pop jitverify
|
||||||
|
abcdef
|
||||||
|
0: def (JIT)
|
||||||
|
|
||||||
|
/^abc\Kdef/info,push
|
||||||
|
** Applied only to compile when pattern is stacked with 'push': jitverify
|
||||||
|
Capturing subpattern count = 0
|
||||||
|
Compile options: <none>
|
||||||
|
Overall options: anchored
|
||||||
|
Subject length lower bound = 6
|
||||||
|
JIT compilation was successful
|
||||||
|
#save testsaved1
|
||||||
|
#load testsaved1
|
||||||
|
#pop jitverify
|
||||||
|
abcdef
|
||||||
|
0: def
|
||||||
|
|
||||||
|
#load testsaved1
|
||||||
|
#pop jit,jitverify
|
||||||
|
abcdef
|
||||||
|
0: def (JIT)
|
||||||
|
|
||||||
# End of testinput16
|
# End of testinput16
|
||||||
|
|
|
@ -0,0 +1,100 @@
|
||||||
|
# This set of tests exercises the serialization/deserialization functions in
|
||||||
|
# the library. It does not use UTF or JIT.
|
||||||
|
|
||||||
|
#forbid_utf
|
||||||
|
|
||||||
|
# Compile several patterns, push them onto the stack, and then write them
|
||||||
|
# all to a file.
|
||||||
|
|
||||||
|
#pattern push
|
||||||
|
|
||||||
|
/(?<NAME>(?&NAME_PAT))\s+(?<ADDR>(?&ADDRESS_PAT))
|
||||||
|
(?(DEFINE)
|
||||||
|
(?<NAME_PAT>[a-z]+)
|
||||||
|
(?<ADDRESS_PAT>\d+)
|
||||||
|
)/x
|
||||||
|
/^(?:((.)(?1)\2|)|((.)(?3)\4|.))$/i
|
||||||
|
|
||||||
|
#save testsaved1
|
||||||
|
|
||||||
|
# Do it again for some more patterns.
|
||||||
|
|
||||||
|
/(*MARK:A)(*SKIP:B)(C|X)/mark
|
||||||
|
** Ignored when compiled pattern is stacked with 'push': mark
|
||||||
|
/(?:(?<n>foo)|(?<n>bar))\k<n>/dupnames
|
||||||
|
|
||||||
|
#save testsaved2
|
||||||
|
#pattern -push
|
||||||
|
|
||||||
|
# Reload the patterns, then pop them one by one and check them.
|
||||||
|
|
||||||
|
#load testsaved1
|
||||||
|
#load testsaved2
|
||||||
|
|
||||||
|
#pop info
|
||||||
|
Capturing subpattern count = 2
|
||||||
|
Max back reference = 2
|
||||||
|
Named capturing subpatterns:
|
||||||
|
n 1
|
||||||
|
n 2
|
||||||
|
Options: dupnames
|
||||||
|
Starting code units: b f
|
||||||
|
Subject length lower bound = 6
|
||||||
|
foofoo
|
||||||
|
0: foofoo
|
||||||
|
1: foo
|
||||||
|
barbar
|
||||||
|
0: barbar
|
||||||
|
1: <unset>
|
||||||
|
2: bar
|
||||||
|
|
||||||
|
#pop mark
|
||||||
|
C
|
||||||
|
0: C
|
||||||
|
1: C
|
||||||
|
MK: A
|
||||||
|
D
|
||||||
|
No match, mark = A
|
||||||
|
|
||||||
|
#pop
|
||||||
|
AmanaplanacanalPanama
|
||||||
|
0: AmanaplanacanalPanama
|
||||||
|
1: <unset>
|
||||||
|
2: <unset>
|
||||||
|
3: AmanaplanacanalPanama
|
||||||
|
4: A
|
||||||
|
|
||||||
|
#pop info
|
||||||
|
Capturing subpattern count = 4
|
||||||
|
Named capturing subpatterns:
|
||||||
|
ADDR 2
|
||||||
|
ADDRESS_PAT 4
|
||||||
|
NAME 1
|
||||||
|
NAME_PAT 3
|
||||||
|
Options: extended
|
||||||
|
Subject length lower bound = 3
|
||||||
|
metcalfe 33
|
||||||
|
0: metcalfe 33
|
||||||
|
1: metcalfe
|
||||||
|
2: 33
|
||||||
|
|
||||||
|
# Check for an error when different tables are used.
|
||||||
|
|
||||||
|
/abc/push,tables=1
|
||||||
|
/xyz/push,tables=2
|
||||||
|
#save testsaved1
|
||||||
|
Serialization failed: error -30: patterns do not all use the same character tables
|
||||||
|
|
||||||
|
#pop
|
||||||
|
xyz
|
||||||
|
0: xyz
|
||||||
|
|
||||||
|
#pop
|
||||||
|
abc
|
||||||
|
0: abc
|
||||||
|
|
||||||
|
#pop should give an error
|
||||||
|
** Can't pop off an empty stack
|
||||||
|
pqr
|
||||||
|
|
||||||
|
# End of testinput19
|
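The serialization test above depends on the pattern stack being last-in, first-out: after the two `#load` commands the patterns come back off in the reverse of the order in which they were pushed, and one `#pop` too many produces `** Can't pop off an empty stack`. The following is a minimal sketch of such a bounded stack. The names `ptr_stack`, `stack_push`, `stack_pop` and `stack_clamp` are hypothetical, and the capacity is a made-up value standing in for `PATSTACKSIZE`.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_SIZE 20   /* hypothetical capacity, standing in for PATSTACKSIZE */

typedef struct {
  void *items[STACK_SIZE];
  int next;             /* index of the next free slot; 0 means empty */
} ptr_stack;

/* Push an item. Returns 0 on success, -1 if the stack is full. */
static int stack_push(ptr_stack *s, void *item)
{
  assert(s != NULL);
  if (s->next >= STACK_SIZE) return -1;
  s->items[s->next++] = item;
  return 0;
}

/* Pop the most recently pushed item. Returns NULL if the stack is empty. */
static void *stack_pop(ptr_stack *s)
{
  assert(s != NULL);
  if (s->next <= 0) return NULL;
  return s->items[--s->next];
}

/* Given a request to add `wanted` items, return how many actually fit. This
   mirrors the way the load code reduces the count it decodes when the file
   holds more patterns than there is room for. */
static int stack_clamp(const ptr_stack *s, int wanted)
{
  int room;
  assert(s != NULL);
  room = STACK_SIZE - s->next;
  return (wanted > room)? room : wanted;
}
```

Returning a status rather than printing keeps the sketch testable; pcre2test itself reports these conditions on its output file.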