More documentation - initial drafts.

This commit is contained in:
Philip.Hazel 2014-09-28 11:31:21 +00:00
parent 81525e6007
commit 22fafb8a6f
4 changed files with 1281 additions and 3 deletions

37
AUTHORS
View File

@ -1 +1,36 @@
This is a placeholder AUTHORS file for a work in progress.
THE MAIN PCRE2 LIBRARY CODE
---------------------------
Written by: Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
University of Cambridge Computing Service,
Cambridge, England.
Copyright (c) 1997-2014 University of Cambridge
All rights reserved
PCRE JUST-IN-TIME COMPILATION SUPPORT
-------------------------------------
Written by: Zoltan Herczeg
Email local part: hzmester
Emain domain: freemail.hu
Copyright(c) 2010-2014 Zoltan Herczeg
All rights reserved.
STACK-LESS JUST-IN-TIME COMPILER
--------------------------------
Written by: Zoltan Herczeg
Email local part: hzmester
Emain domain: freemail.hu
Copyright(c) 2009-2014 Zoltan Herczeg
All rights reserved.
####

12
NEWS
View File

@ -1 +1,11 @@
This is a placeholder NEWS file for a work in progress.
News about PCRE2 releases
-------------------------
Version 10.0 xx-xxxx-2014
-------------------------
Version 10.0 is the first release of PCRE2, a revised API for the PCRE library.
Changes prior to 10.0 are logged in the ChangeLog file for the old API, up to
item 20 for release 8.36.
****

402
NON-AUTOTOOLS-BUILD Normal file
View File

@ -0,0 +1,402 @@
Building PCRE2 without using autotools
--------------------------------------
This document has been converted from the PCRE1 document, but is not yet
complete. I have removed a number of quite old sections about building in
various environments, as they applied only to PCRE1 and are probably out of
date.
This document contains the following sections:
General
Generic instructions for the PCRE2 C library
Building for virtual Pascal
Stack size in Windows environments
Linking programs in Windows environments
Calling conventions in Windows environments
Comments about Win32 builds
Building PCRE2 on Windows with CMake
Testing with RunTest.bat
Building PCRE2 on native z/OS and z/VM
GENERAL
I (Philip Hazel) have no experience of Windows or VMS sytems and how their
libraries work. The items in the PCRE2 distribution and Makefile that relate to
anything other than Linux systems are untested by me.
The basic PCRE2 library consists entirely of code written in Standard C, and so
should compile successfully on any system that has a Standard C compiler and
library.
The PCRE2 distribution includes a "configure" file for use by the
configure/make (autotools) build system, as found in many Unix-like
environments. The README file contains information about the options for
"configure".
There is also support for CMake, which some users prefer, especially in Windows
environments, though it can also be run in Unix-like environments. See the
section entitled "Building PCRE2 on Windows with CMake" below.
Versions of src/config.h and src/pcre2.h are distributed in the PCRE2 tarballs
under the names src/config.h.generic and src/pcre2.h.generic. These are
provided for those who build PCRE2 without using "configure" or CMake. If you
use "configure" or CMake, the .generic versions are not used.
GENERIC INSTRUCTIONS FOR THE PCRE2 C LIBRARY
The following are generic instructions for building the PCRE2 C library "by
hand". If you are going to use CMake, this section does not apply to you; you
can skip ahead to the CMake section.
(1) Copy or rename the file src/config.h.generic as src/config.h, and edit the
macro settings that it contains to whatever is appropriate for your
environment. In particular, you can alter the definition of the NEWLINE
macro to specify what character(s) you want to be interpreted as line
terminators.
When you compile any of the PCRE2 modules, you must specify
-DHAVE_CONFIG_H to your compiler so that src/config.h is included in the
sources.
An alternative approach is not to edit src/config.h, but to use -D on the
compiler command line to make any changes that you need to the
configuration options. In this case -DHAVE_CONFIG_H must not be set.
NOTE: There have been occasions when the way in which certain parameters
in src/config.h are used has changed between releases. (In the
configure/make world, this is handled automatically.) When upgrading to a
new release, you are strongly advised to review src/config.h.generic
before re-using what you had previously.
(2) Copy or rename the file src/pcre2.h.generic as src/pcre2.h.
(3) EITHER:
Copy or rename file src/pcre2_chartables.c.dist as
src/pcre2_chartables.c.
OR:
Compile src/dftables.c as a stand-alone program (using -DHAVE_CONFIG_H
if you have set up src/config.h), and then run it with the single
argument "src/pcre2_chartables.c". This generates a set of standard
character tables and writes them to that file. The tables are generated
using the default C locale for your system. If you want to use a locale
that is specified by LC_xxx environment variables, add the -L option to
the dftables command. You must use this method if you are building on a
system that uses EBCDIC code.
The tables in src/pcre2_chartables.c are defaults. The caller of PCRE2 can
specify alternative tables at run time.
(4) For an 8-bit library, compile the following source files, setting
-DPCRE2_CODE_UNIT_WIDTH=8 as a compiler option. Also set -DHAVE_CONFIG_H
if you have set up src/config.h with your configuration, or else use other
-D settings to change the configuration as required.
pcre2_auto_possess.c
pcre2_chartables.c
pcre2_compile.c
pcre2_config.c
pcre2_context.c
pcre2_dfa_match.c
pcre2_error.c
pcre2_jit_compile.c
pcre2_jit_match.c
pcre2_jit_misc.c
pcre2_maketables.c
pcre2_match.c
pcre2_match_data.c
pcre2_newline.c
pcre2_ord2utf.c
pcre2_pattern_info.c
pcre2_string_utils.c
pcre2_study.c
pcre2_substring.c
pcre2_tables.c
pcre2_ucd.c
pcre2_valid_utf.c
pcre2_xclass.c
Make sure that you include -I. in the compiler command (or equivalent for
an unusual compiler) so that all included PCRE2 header files are first
sought in the src directory under the current directory. Otherwise you run
the risk of picking up a previously-installed file from somewhere else.
Note that you must compile pcre2_jit_xxx.c, even if you have not defined
SUPPORT_JIT in src/config.h, because when JIT support is not configured,
dummy functions are compiled. When JIT support IS configured, the JIT
sources #include other files from the sljit subdirectory, where there
should be 16 files, all of whose names begin with "sljit".
(5) Now link all the compiled code into an object library in whichever form
your system keeps such libraries. This is the basic PCRE2 C 8-bit library.
If your system has static and shared libraries, you may have to do this
once for each type.
(6) If you want to build a 16-bit library or 32-bit library (as well as, or
instead of the 8-bit library) just supply 16 or 32 as the value of
-DPCRE2_CODE_UNIT_WIDTH when you are compiling.
(7) If you want to build the POSIX wrapper functions (which apply only to the
8-bit library), ensure that you have the pcre2posix.h file and then
compile pcre2posix.c. Link the result (on its own) as the pcre2posix
library.
(8) The pcre2test program can be linked with any combination of the 8-bit,
16-bit and 32-bit libraries (depending on what you selected in
src/config.h). Compile pcre2test.c; don't forget -DHAVE_CONFIG_H if
necessary, but do NOT define PCRE2_CODE_UNIT_WIDTH. Then link with the
appropriate library/ies. If you compiled an 8-bit library, pcre2test also
needs the pcre2posix wrapper library.
(9) Run pcre2test on the testinput files in the testdata directory, and check
that the output matches the corresponding testoutput files. There are
comments about what each test does in the section entitled "Testing PCRE2"
in the README file. If you compiled more than one of the 8-bit, 16-bit and
32-bit libraries, you need to run pcre2test with the -16 option to do
16-bit tests and with the -32 option to do 32-bit tests.
Some tests are relevant only when certain build-time options are selected.
For example, test 4 is for Unicode support, and will not run if you have
built PCRE2 without it. See the comments at the start of each testinput
file. If you have a suitable Unix-like shell, the RunTest script will run
the appropriate tests for you. The command "RunTest list" will output a
list of all the tests.
Note that the supplied files are in Unix format, with just LF characters
as line terminators. You may need to edit them to change this if your
system uses a different convention.
(10) If you have built PCRE2 with SUPPORT_JIT, the JIT features can be tested
by running pcre2test with the -jit option. This is done automatically by
the RunTest script. You might also like to build and run the freestanding
JIT test program, pcre2_jit_test.c.
(11) If you want to use the pcre2grep command, compile and link pcre2grep.c; it
uses only the basic 8-bit PCRE2 library (it does not need the pcre2posix
library).
BUILDING FOR VIRTUAL PASCAL
FIXME FOR PCRE2
A script for building PCRE2 using Borland's C++ compiler for use with VPASCAL
was contributed by Alexander Tokarev. Stefan Weber updated the script and added
additional files. The following files in the distribution are for building
PCRE2 for use with VP/Borland: makevp_c.txt, makevp_l.txt, makevp.bat,
pcre2gexp.pas.
STACK SIZE IN WINDOWS ENVIRONMENTS
The default processor stack size of 1Mb in some Windows environments is too
small for matching patterns that need much recursion. In particular, test 2 may
fail because of this. Normally, running out of stack causes a crash, but there
have been cases where the test program has just died silently. See your linker
documentation for how to increase stack size if you experience problems. The
Linux default of 8Mb is a reasonable choice for the stack, though even that can
be too small for some pattern/subject combinations.
PCRE2 has a compile configuration option to disable the use of stack for
recursion so that heap is used instead. However, pattern matching is
significantly slower when this is done. There is more about stack usage in the
"pcre2stack" documentation.
LINKING PROGRAMS IN WINDOWS ENVIRONMENTS
If you want to statically link a program against a PCRE2 library in the form of
a non-dll .a file, you must define PCRE2_STATIC before including src/pcre2.h.
CALLING CONVENTIONS IN WINDOWS ENVIRONMENTS
It is possible to compile programs to use different calling conventions using
MSVC. Search the web for "calling conventions" for more information. To make it
easier to change the calling convention for the exported functions in the
PCRE2 library, the macro PCRE2_CALL_CONVENTION is present in all the external
definitions. It can be set externally when compiling (e.g. in CFLAGS). If it is
not set, it defaults to empty; the default calling convention is then used
(which is what is wanted most of the time).
COMMENTS ABOUT WIN32 BUILDS (see also "BUILDING PCRE2 ON WINDOWS WITH CMAKE")
There are two ways of building PCRE2 using the "configure, make, make install"
paradigm on Windows systems: using MinGW or using Cygwin. These are not at all
the same thing; they are completely different from each other. There is also
support for building using CMake, which some users find a more straightforward
way of building PCRE2 under Windows.
The MinGW home page (http://www.mingw.org/) says this:
MinGW: A collection of freely available and freely distributable Windows
specific header files and import libraries combined with GNU toolsets that
allow one to produce native Windows programs that do not rely on any
3rd-party C runtime DLLs.
The Cygwin home page (http://www.cygwin.com/) says this:
Cygwin is a Linux-like environment for Windows. It consists of two parts:
. A DLL (cygwin1.dll) which acts as a Linux API emulation layer providing
substantial Linux API functionality
. A collection of tools which provide Linux look and feel.
On both MinGW and Cygwin, PCRE2 should build correctly using:
./configure && make && make install
This should create two libraries called libpcre2-8 and libpcre2-posix. These
are independent libraries: when you link with libpcre2-posix you must also link
with libpcre2-8, which contains the basic functions.
Using Cygwin's compiler generates libraries and executables that depend on
cygwin1.dll. If a library that is generated this way is distributed,
cygwin1.dll has to be distributed as well. Since cygwin1.dll is under the GPL
licence, this forces not only PCRE2 to be under the GPL, but also the entire
application. A distributor who wants to keep their own code proprietary must
purchase an appropriate Cygwin licence.
MinGW has no such restrictions. The MinGW compiler generates a library or
executable that can run standalone on Windows without any third party dll or
licensing issues.
But there is more complication:
If a Cygwin user uses the -mno-cygwin Cygwin gcc flag, what that really does is
to tell Cygwin's gcc to use the MinGW gcc. Cygwin's gcc is only acting as a
front end to MinGW's gcc (if you install Cygwin's gcc, you get both Cygwin's
gcc and MinGW's gcc). So, a user can:
. Build native binaries by using MinGW or by getting Cygwin and using
-mno-cygwin.
. Build binaries that depend on cygwin1.dll by using Cygwin with the normal
compiler flags.
The test files that are supplied with PCRE2 are in UNIX format, with LF
characters as line terminators. Unless your PCRE2 library uses a default
newline option that includes LF as a valid newline, it may be necessary to
change the line terminators in the test files to get some of the tests to work.
BUILDING PCRE2 ON WINDOWS WITH CMAKE
CMake is an alternative configuration facility that can be used instead of
"configure". CMake creates project files (make files, solution files, etc.)
tailored to numerous development environments, including Visual Studio,
Borland, Msys, MinGW, NMake, and Unix. If possible, use short paths with no
spaces in the names for your CMake installation and your PCRE2 source and build
directories.
The following instructions were contributed by a PCRE1 user, but they should
also work for PCRE2. If they are not followed exactly, errors may occur. In the
event that errors do occur, it is recommended that you delete the CMake cache
before attempting to repeat the CMake build process. In the CMake GUI, the
cache can be deleted by selecting "File > Delete Cache".
1. Install the latest CMake version available from http://www.cmake.org/, and
ensure that cmake\bin is on your path.
2. Unzip (retaining folder structure) the PCRE2 source tree into a source
directory such as C:\pcre2. You should ensure your local date and time
is not earlier than the file dates in your source dir if the release is
very new.
3. Create a new, empty build directory, preferably a subdirectory of the
source dir. For example, C:\pcre2\pcre2-xx\build.
4. Run cmake-gui from the Shell envirornment of your build tool, for example,
Msys for Msys/MinGW or Visual Studio Command Prompt for VC/VC++. Do not try
to start Cmake from the Windows Start menu, as this can lead to errors.
5. Enter C:\pcre2\pcre2-xx and C:\pcre2\pcre2-xx\build for the source and
build directories, respectively.
6. Hit the "Configure" button.
7. Select the particular IDE / build tool that you are using (Visual
Studio, MSYS makefiles, MinGW makefiles, etc.)
8. The GUI will then list several configuration options. This is where
you can enable Unicode support or other PCRE2 optional features.
9. Hit "Configure" again. The adjacent "Generate" button should now be
active.
10. Hit "Generate".
11. The build directory should now contain a usable build system, be it a
solution file for Visual Studio, makefiles for MinGW, etc. Exit from
cmake-gui and use the generated build system with your compiler or IDE.
E.g., for MinGW you can run "make", or for Visual Studio, open the PCRE2
solution, select the desired configuration (Debug, or Release, etc.) and
build the ALL_BUILD project.
12. If during configuration with cmake-gui you've elected to build the test
programs, you can execute them by building the test project. E.g., for
MinGW: "make check"; for Visual Studio build the RUN_TESTS project. The
most recent build configuration is targeted by the tests. A summary of
test results is presented. Complete test output is subsequently
available for review in Testing\Temporary under your build dir.
TESTING WITH RUNTEST.BAT FIXME FIXME NOT YET TESTED/UPDATED FIXME
If configured with CMake, building the test project ("make check" or building
ALL_TESTS in Visual Studio) creates (and runs) pcre2_test.bat (and depending
on your configuration options, possibly other test programs) in the build
directory. Pcre_test.bat runs RunTest.Bat with correct source and exe paths.
For manual testing with RunTest.bat, provided the build dir is a subdirectory
of the source directory: Open command shell window. Chdir to the location
of your pcre2test.exe and pcre2grep.exe programs. Call RunTest.bat with
"..\RunTest.Bat" or "..\..\RunTest.bat" as appropriate.
To run only a particular test with RunTest.Bat provide a test number argument.
Otherwise:
1. Copy RunTest.bat into the directory where pcre2test.exe and pcre2grep.exe
have been created.
2. Edit RunTest.bat to indentify the full or relative location of
the pcre2 source (wherein which the testdata folder resides), e.g.:
set srcdir=C:\pcre2\pcre2-10.00
3. In a Windows command environment, chdir to the location of your bat and
exe programs.
4. Run RunTest.bat. Test outputs will automatically be compared to expected
results, and discrepancies will be identified in the console output.
To independently test the just-in-time compiler, run pcre2_jit_test.exe.
BUILDING PCRE2 ON NATIVE Z/OS AND Z/VM
z/OS and z/VM are operating systems for mainframe computers, produced by IBM.
The character code used is EBCDIC, not ASCII or Unicode. In z/OS, UNIX APIs and
applications can be supported through UNIX System Services, and in such an
environment PCRE2 can be built in the same way as in other systems. However, in
native z/OS (without UNIX System Services) and in z/VM, special ports are
required. For details, please see this web site:
http://www.zaconsultants.net
There is also a mirror here:
http://www.vsoft-software.com/downloads.html
The site currently has ports for PCRE1 releases, but PCRE2 should follow in due
course.
==========================
Last Updated: 28 September 2014

833
README
View File

@ -1 +1,832 @@
This is a placeholder README file for a work in progress.
README file for PCRE2 (Perl-compatible regular expression library)
------------------------------------------------------------------
PCRE2 is a re-implementation of the original PCRE library with an entirely new
API. The latest release of PCRE2 is always available in three alternative
formats from:
FIXME: THIS WILL NOT BE THE CASE UNTIL THERE IS A FORMAL RELEASE.
ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre2/pcre2-xxx.tar.gz
ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre2/pcre2-xxx.tar.bz2
ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre2/pcre2-xxx.zip
There is a mailing list for discussion about the development of PCRE (both the
original and new APIs) at pcre-dev@exim.org. You can access the archives and
subscribe or manage your subscription here:
https://lists.exim.org/mailman/listinfo/pcre-dev
Please read the NEWS file if you are upgrading from a previous release.
The contents of this README file are:
The PCRE2 APIs
Documentation for PCRE2
Contributions by users of PCRE2
Building PCRE2 on non-Unix-like systems
Building PCRE2 without using autotools
Building PCRE2 using autotools
Retrieving configuration information
Shared libraries
Cross-compiling using autotools
Making new tarballs
Testing PCRE2
Character tables
File manifest
The PCRE2 APIs
--------------
PCRE2 is written in C, and it has its own API. There are three sets of
functions, one for the 8-bit library, which processes strings of bytes, one for
the 16-bit library, which processes strings of 16-bit values, and one for the
32-bit library, which processes strings of 32-bit values. As this is a new API,
there as yet no C++ wrappers.
The distribution does contain a set of C wrapper functions for the 8-bit
library that are based on the POSIX regular expression API (see the pcre2posix
man page). These end up in the library called libpcre2posix. Note that this
just provides a POSIX calling interface to PCRE2; the regular expressions
themselves still follow Perl syntax and semantics. The POSIX API is restricted,
and does not give full access to all of PCRE2's facilities.
The header file for the POSIX-style functions is called pcre2posix.h. The
official POSIX name is regex.h, but I did not want to risk possible problems
with existing files of that name by distributing it that way. To use PCRE2 with
an existing program that uses the POSIX API, pcre2posix.h will have to be
renamed or pointed at by a link.
If you are using the POSIX interface to PCRE2 and there is already a POSIX
regex library installed on your system, as well as worrying about the regex.h
header file (as mentioned above), you must also take care when linking programs
to ensure that they link with PCRE2's libpcre2posix library. Otherwise they may
pick up the POSIX functions of the same name from the other library.
One way of avoiding this confusion is to compile PCRE2 with the addition of
-Dregcomp=PCRE2regcomp (and similarly for the other POSIX functions) to the
compiler flags (CFLAGS if you are using "configure" -- see below). This has the
effect of renaming the functions so that the names no longer clash. Of course,
you have to do the same thing for your applications, or write them using the
new names.
Documentation for PCRE2
----------------------
If you install PCRE2 in the normal way on a Unix-like system, you will end up
with a set of man pages whose names all start with "pcre2". The one that is
just called "pcre2" lists all the others. In addition to these man pages, the
PCRE2 documentation is supplied in two other forms:
1. There are files called doc/pcre2.txt, doc/pcre2grep.txt, and
doc/pcre2test.txt in the source distribution. The first of these is a
concatenation of the text forms of all the section 3 man pages except the
listing of pcre2demo.c and those that summarize individual functions. The
other two are the text forms of the section 1 man pages for the pcre2grep
and pcre2test commands. These text forms are provided for ease of scanning
with text editors or similar tools. They are installed in
<prefix>/share/doc/pcre2, where <prefix> is the installation prefix
(defaulting to /usr/local).
2. A set of files containing all the documentation in HTML form, hyperlinked
in various ways, and rooted in a file called index.html, is distributed in
doc/html and installed in <prefix>/share/doc/pcre2/html.
Building PCRE2 on non-Unix-like systems
--------------------------------------
For a non-Unix-like system, please read the comments in the file
NON-AUTOTOOLS-BUILD, though if your system supports the use of "configure" and
"make" you may be able to build PCRE2 using autotools in the same way as for
many Unix-like systems.
PCRE2 can also be configured using CMake, which can be run in various ways
(command line, GUI, etc). This creates Makefiles, solution files, etc. The file
NON-AUTOTOOLS-BUILD has information about CMake.
PCRE2 has been compiled on many different operating systems. It should be
straightforward to build PCRE2 on any system that has a Standard C compiler and
library, because it uses only Standard C functions.
Building PCRE2 without using autotools
-------------------------------------
The use of autotools (in particular, libtool) is problematic in some
environments, even some that are Unix or Unix-like. See the NON-AUTOTOOLS-BUILD
file for ways of building PCRE2 without using autotools.
Building PCRE2 using autotools
-----------------------------
The following instructions assume the use of the widely used "configure; make;
make install" (autotools) process.
To build PCRE2 on system that supports autotools, first run the "configure"
command from the PCRE2 distribution directory, with your current directory set
to the directory where you want the files to be created. This command is a
standard GNU "autoconf" configuration script, for which generic instructions
are supplied in the file INSTALL.
Most commonly, people build PCRE2 within its own distribution directory, and in
this case, on many systems, just running "./configure" is sufficient. However,
the usual methods of changing standard defaults are available. For example:
CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local
This command specifies that the C compiler should be run with the flags '-O2
-Wall' instead of the default, and that "make install" should install PCRE2
under /opt/local instead of the default /usr/local.
If you want to build in a different directory, just run "configure" with that
directory as current. For example, suppose you have unpacked the PCRE2 source
into /source/pcre2/pcre2-xxx, but you want to build it in
/build/pcre2/pcre2-xxx:
cd /build/pcre2/pcre2-xxx
/source/pcre2/pcre2-xxx/configure
PCRE2 is written in C and is normally compiled as a C library. However, it is
possible to build it as a C++ library, though the provided building apparatus
does not have any features to support this.
There are some optional features that can be included or omitted from the PCRE2
library. They are also documented in the pcre2build man page.
. By default, both shared and static libraries are built. You can change this
by adding one of these options to the "configure" command:
--disable-shared
--disable-static
(See also "Shared libraries on Unix-like systems" below.)
. By default, only the 8-bit library is built. If you add --enable-pcre16 to
the "configure" command, the 16-bit library is also built. If you add
--enable-pcre32 to the "configure" command, the 32-bit library is also built.
If you want only the 16-bit or 32-bit library, use --disable-pcre8 to disable
building the 8-bit library.
. If you want to include support for just-in-time compiling, which can give
large performance improvements on certain platforms, add --enable-jit to the
"configure" command. This support is available only for certain hardware
architectures. If you try to enable it on an unsupported architecture, there
will be a compile time error. FIXME: NOT YET IMPLEMENTED.
. When JIT support is enabled, pcre2grep automatically makes use of it, unless
you add --disable-pcre2grep-jit to the "configure" command.
. If you want to make use of the support for UTF-8 Unicode character strings in
the 8-bit library, UTF-16 Unicode character strings in the 16-bit library,
and UTF-32 Unicode character strings in the 32-bit library, you must add
--enable-unicode to the "configure" command. Without it, the code for
handling UTF-8, UTF-16 and UTF-8 is not included. It is not possible to
configure one library with UTF support and the other without in the same
configuration.
Even when --enable-unicode is included, the use of a UTF encoding still has
to be enabled by an option at run time. When PCRE2 is compiled with this
option, its input can only either be ASCII or UTF-8/16/32, even when running
on EBCDIC platforms. It is not possible to use both --enable-unicode and
--enable-ebcdic at the same time.
When --enable-unicode is specified, as well as supporting UTF strings, PCRE2
includes support for the \P, \p, and \X sequences that recognize Unicode
character properties. However, only the basic two-letter properties such as
Lu are supported.
. You can build PCRE2 to recognize either CR or LF or the sequence CRLF or any
of the preceding, or any of the Unicode newline sequences as indicating the
end of a line. Whatever you specify at build time is the default; the caller
of PCRE2 can change the selection at run time. The default newline indicator
is a single LF character (the Unix standard). You can specify the default
newline indicator by adding --enable-newline-is-cr or --enable-newline-is-lf
or --enable-newline-is-crlf or --enable-newline-is-anycrlf or
--enable-newline-is-any to the "configure" command, respectively.
If you specify --enable-newline-is-cr or --enable-newline-is-crlf, some of
the standard tests will fail, because the lines in the test files end with
LF. Even if the files are edited to change the line endings, there are likely
to be some failures. With --enable-newline-is-anycrlf or
--enable-newline-is-any, many tests should succeed, but there may be some
failures.
. By default, the sequence \R in a pattern matches any Unicode line ending
sequence. This is independent of the option specifying what PCRE2 considers
to be the end of a line (see above). However, the caller of PCRE2 can
restrict \R to match only CR, LF, or CRLF. You can make this the default by
adding --enable-bsr-anycrlf to the "configure" command (bsr = "backslash R").
. PCRE2 has a counter that limits the depth of nesting of parentheses in a
pattern. This limits the amount of system stack that a pattern uses when it
is compiled. The default is 250, but you can change it by setting, for
example,
--with-parens-nest-limit=500
. PCRE2 has a counter that can be set to limit the amount of resources it uses
when matching a pattern. If the limit is exceeded during a match, the match
fails. The default is ten million. You can change the default by setting, for
example,
--with-match-limit=500000
on the "configure" command. This is just the default; individual calls to
pcre2_match() can supply their own value. There is more discussion on the
pcre2api man page.
. There is a separate counter that limits the depth of recursive function calls
during a matching process. This also has a default of ten million, which is
essentially "unlimited". You can change the default by setting, for example,
--with-match-limit-recursion=500000
Recursive function calls use up the runtime stack; running out of stack can
cause programs to crash in strange ways. There is a discussion about stack
sizes in the pcre2stack man page.
. In the 8-bit library, the default maximum compiled pattern size is around
64K. You can increase this by adding --with-link-size=3 to the "configure"
command. PCRE2 then uses three bytes instead of two for offsets to different
parts of the compiled pattern. In the 16-bit library, --with-link-size=3 is
the same as --with-link-size=4, which (in both libraries) uses four-byte
offsets. Increasing the internal link size reduces performance. In the 32-bit
library, the link size setting is ignored, as 4-byte offsets are always used.
. You can build PCRE2 so that its internal match() function that is called from
pcre2_match() does not call itself recursively. Instead, it uses memory
blocks obtained from the heap to save data that would otherwise be saved on
the stack. To build PCRE2 like this, use
--disable-stack-for-recursion
on the "configure" command. PCRE2 runs more slowly in this mode, but it may
be necessary in environments with limited stack sizes. This applies only to
the normal execution of the pcre2_match() function; if JIT support is being
successfully used, it is not relevant. Equally, it does not apply to
pcre2_dfa_match(), which does not use deeply nested recursion. There is a
discussion about stack sizes in the pcre2stack man page.
. For speed, PCRE2 uses four tables for manipulating and identifying characters
whose code point values are less than 256. By default, it uses a set of
tables for ASCII encoding that is part of the distribution. If you specify
--enable-rebuild-chartables
a program called dftables is compiled and run in the default C locale when
you obey "make". It builds a source file called pcre2_chartables.c. If you do
not specify this option, pcre2_chartables.c is created as a copy of
pcre2_chartables.c.dist. See "Character tables" below for further
information.
. It is possible to compile PCRE2 for use on systems that use EBCDIC as their
character code (as opposed to ASCII/Unicode) by specifying
--enable-ebcdic
This automatically implies --enable-rebuild-chartables (see above). However,
when PCRE2 is built this way, it always operates in EBCDIC. It cannot support
both EBCDIC and UTF-8/16/32. There is a second option, --enable-ebcdic-nl25,
which specifies that the code value for the EBCDIC NL character is 0x25
instead of the default 0x15.
. In environments where valgrind is installed, if you specify
--enable-valgrind
PCRE2 will use valgrind annotations to mark certain memory regions as
unaddressable. This allows it to detect invalid memory accesses, and is
mostly useful for debugging PCRE2 itself.
. In environments where the gcc compiler is used and lcov version 1.6 or above
is installed, if you specify
--enable-coverage
the build process implements a code coverage report for the test suite. The
report is generated by running "make coverage". If ccache is installed on
your system, it must be disabled when building PCRE2 for coverage reporting.
You can do this by setting the environment variable CCACHE_DISABLE=1 before
running "make" to build PCRE2. There is more information about coverage
reporting in the "pcre2build" documentation.
. The pcre2grep program currently supports only 8-bit data files, and so
requires the 8-bit PCRE2 library. It is possible to compile pcre2grep to use
libz and/or libbz2, in order to read .gz and .bz2 files (respectively), by
specifying one or both of
--enable-pcre2grep-libz
--enable-pcre2grep-libbz2
Of course, the relevant libraries must be installed on your system.
. The default size (in bytes) of the internal buffer used by pcre2grep can be
set by, for example:
--with-pcre2grep-bufsize=51200
The value must be a plain integer. The default is 20480.
. It is possible to compile pcre2test so that it links with the libreadline
or libedit libraries, by specifying, respectively,
--enable-pcre2test-libreadline or --enable-pcre2test-libedit
If this is done, when pcre2test's input is from a terminal, it reads it using
the readline() function. This provides line-editing and history facilities.
Note that libreadline is GPL-licenced, so if you distribute a binary of
pcre2test linked in this way, there may be licensing issues. These can be
avoided by linking with libedit (which has a BSD licence) instead.
Enabling libreadline causes the -lreadline option to be added to the
pcre2test build. In many operating environments with a sytem-installed
readline library this is sufficient. However, in some environments (e.g. if
an unmodified distribution version of readline is in use), it may be
necessary to specify something like LIBS="-lncurses" as well. This is
because, to quote the readline INSTALL, "Readline uses the termcap functions,
but does not link with the termcap or curses library itself, allowing
applications which link with readline the to choose an appropriate library."
If you get error messages about missing functions tgetstr, tgetent, tputs,
tgetflag, or tgoto, this is the problem, and linking with the ncurses library
should fix it.
The "configure" script builds the following files for the basic C library:
. Makefile the makefile that builds the library
. src/config.h build-time configuration options for the library
. src/pcre2.h the public PCRE2 header file
. pcre2-config script that shows the building settings such as CFLAGS
that were set for "configure"
. libpcre2-8.pc )
. libpcre2-16.pc ) data for the pkg-config command
. libpcre2-32.pc )
. libpcre2-posix.pc )
. libtool script that builds shared and/or static libraries
Versions of config.h and pcre2.h are distributed in the src directory of PCRE2
tarballs under the names config.h.generic and pcre2.h.generic. These are
provided for those who have to build PCRE2 without using "configure" or CMake.
If you use "configure" or CMake, the .generic versions are not used.
The "configure" script also creates config.status, which is an executable
script that can be run to recreate the configuration, and config.log, which
contains compiler output from tests that "configure" runs.
Once "configure" has run, you can run "make". This builds whichever of the
libraries libpcre2-8, libpcre2-16 and libpcre2-32 are configured, and a test
program called pcre2test. If you enabled JIT support with --enable-jit, another
test program called pcre2_jit_test is built as well. FIXME: still to be
implemented. If the 8-bit library is built, libpcre2-posix and the pcre2grep
command are also built.
The command "make check" runs all the appropriate tests. Details of the PCRE2
tests are given below in a separate section of this document.
You can use "make install" to install PCRE2 into live directories on your
system. The following are installed (file names are all relative to the
<prefix> that is set when "configure" is run):
Commands (bin):
pcre2test
pcre2grep (if 8-bit support is enabled)
pcre2-config
Libraries (lib):
libpcre2-8 (if 8-bit support is enabled)
libpcre2-16 (if 16-bit support is enabled)
libpcre2-32 (if 32-bit support is enabled)
libpcre2-posix (if 8-bit support is enabled)
Configuration information (lib/pkgconfig):
libpcre2-8.pc
libpcre2-16.pc
libpcre2-32.pc
libpcre2-posix.pc
Header files (include):
pcre2.h
pcre2posix.h
Man pages (share/man/man{1,3}):
pcre2grep.1
pcre2test.1
pcre2-config.1
pcre2.3
pcre2*.3 (lots more pages, all starting "pcre2")
HTML documentation (share/doc/pcre2/html):
index.html
*.html (lots more pages, hyperlinked from index.html)
Text file documentation (share/doc/pcre2):
AUTHORS
COPYING
ChangeLog
LICENCE
NEWS
README
pcre2.txt (a concatenation of the man(3) pages)
pcre2test.txt the pcre2test man page
pcre2grep.txt the pcre2grep man page
pcre2-config.txt the pcre2-config man page
If you want to remove PCRE2 from your system, you can run "make uninstall".
This removes all the files that "make install" installed. However, it does not
remove any directories, because these are often shared with other programs.
Retrieving configuration information
------------------------------------
Running "make install" installs the command pcre2-config, which can be used to
recall information about the PCRE2 configuration and installation. For example:
pcre2-config --version
prints the version number, and
pcre2-config --libs8
outputs information about where the 8-bit library is installed. This command
can be included in makefiles for programs that use PCRE2, saving the programmer
from having to remember too many details. Run pcre2-config with no arguments to
obtain a list of possible arguments.
The pkg-config command is another system for saving and retrieving information
about installed libraries. Instead of separate commands for each library, a
single command is used. For example:
pkg-config --libs libpcre2-16
The data is held in *.pc files that are installed in a directory called
<prefix>/lib/pkgconfig.
Shared libraries
----------------
The default distribution builds PCRE2 as shared libraries and static libraries,
as long as the operating system supports shared libraries. Shared library
support relies on the "libtool" script which is built as part of the
"configure" process.
The libtool script is used to compile and link both shared and static
libraries. They are placed in a subdirectory called .libs when they are newly
built. The programs pcre2test and pcre2grep are built to use these uninstalled
libraries (by means of wrapper scripts in the case of shared libraries). When
you use "make install" to install shared libraries, pcre2grep and pcre2test are
automatically re-built to use the newly installed shared libraries before being
installed themselves. However, the versions left in the build directory still
use the uninstalled libraries.
To build PCRE2 using static libraries only you must use --disable-shared when
configuring it. For example:
./configure --prefix=/usr/gnu --disable-shared
Then run "make" in the usual way. Similarly, you can use --disable-static to
build only shared libraries.
Cross-compiling using autotools
-------------------------------
You can specify CC and CFLAGS in the normal way to the "configure" command, in
order to cross-compile PCRE2 for some other host. However, you should NOT
specify --enable-rebuild-chartables, because if you do, the dftables.c source
file is compiled and run on the local host, in order to generate the inbuilt
character tables (the pcre2_chartables.c file). This will probably not work,
because dftables.c needs to be compiled with the local compiler, not the cross
compiler.
When --enable-rebuild-chartables is not specified, pcre2_chartables.c is
created by making a copy of pcre2_chartables.c.dist, which is a default set of
tables that assumes ASCII code. Cross-compiling with the default tables should
not be a problem.
If you need to modify the character tables when cross-compiling, you should
move pcre2_chartables.c.dist out of the way, then compile dftables.c by hand
and run it on the local host to make a new version of pcre2_chartables.c.dist.
Then when you cross-compile PCRE2 this new version of the tables will be used.
Making new tarballs
-------------------
The command "make dist" creates three PCRE2 tarballs, in tar.gz, tar.bz2, and
zip formats. The command "make distcheck" does the same, but then does a trial
build of the new distribution to ensure that it works.
If you have modified any of the man page sources in the doc directory, you
should first run the PrepareRelease script before making a distribution. This
script creates the .txt and HTML forms of the documentation from the man pages.
Testing PCRE2
------------
To test the basic PCRE2 library on a Unix-like system, run the RunTest script.
There is another script called RunGrepTest that tests the options of the
pcre2grep command. When JIT support is enabled, another test program called
pcre2_jit_test is built. Both the scripts and all the program tests are run if
you obey "make check". For other environments, see the instructions in
NON-AUTOTOOLS-BUILD.
The RunTest script runs the pcre2test test program (which is documented in its
own man page) on each of the relevant testinput files in the testdata
directory, and compares the output with the contents of the corresponding
testoutput files. RunTest uses a file called testtry to hold the main output
from pcre2test. Other files whose names begin with "test" are used as working
files in some tests.
Some tests are relevant only when certain build-time options were selected. For
example, the tests for UTF-8/16/32 support are run only if --enable-unicode was
used. RunTest outputs a comment when it skips a test.
Many of the tests that are not skipped are run twice if JIT support is
available. On the second run, JIT compilation is forced. This testing can be
suppressed by putting "nojit" on the RunTest command line.
The entire set of tests is run once for each of the 8-bit, 16-bit and 32-bit
libraries that are enabled. If you want to run just one set of tests, call
RunTest with either the -8, -16 or -32 option.
If valgrind is installed, you can run the tests under it by putting "valgrind"
on the RunTest command line. To run pcre2test on just one or more specific test
files, give their numbers as arguments to RunTest, for example:
RunTest 2 7 11
You can also specify ranges of tests such as 3-6 or 3- (meaning 3 to the
end), or a number preceded by ~ to exclude a test. For example:
Runtest 3-15 ~10
This runs tests 3 to 15, excluding test 10, and just ~13 runs all the tests
except test 13. Whatever order the arguments are in, the tests are always run
in numerical order.
You can also call RunTest with the single argument "list" to cause it to output
a list of tests.
The first two tests can always be run, as they expect only plain text strings
(not UTF) and make no use of Unicode properties. The first test file can be fed
directly into the perltest.pl script to check that Perl gives the same results.
The only difference you should see is in the first few lines, where the Perl
version is given instead of the PCRE2 version. The second set of tests check
auxiliary functions, error detection, and run-time flags that are specific to
PCRE2, as well as the POSIX wrapper API. It also uses the debugging flags to
check some of the internals of pcre2_compile().
If you build PCRE2 with a locale setting that is not the standard C locale, the
character tables may be different (see next paragraph). In some cases, this may
cause failures in the second set of tests. For example, in a locale where the
isprint() function yields TRUE for characters in the range 128-255, the use of
[:isascii:] inside a character class defines a different set of characters, and
this shows up in this test as a difference in the compiled code, which is being
listed for checking. Where the comparison test output contains [\x00-\x7f] the
test will contain [\x00-\xff], and similarly in some other cases. This is not a
bug in PCRE2.
The third set of tests checks pcre2_maketables(), the facility for building a
set of character tables for a specific locale and using them instead of the
default tables. The script uses the "locale" command to check for the
availability of the "fr_FR", "french", or "fr" locale, and uses the first one
that it finds. If the "locale" command fails, or if its output doesn't include
"fr_FR", "french", or "fr" in the list of available locales, the third test
cannot be run, and a comment is output to say why. If running this test
produces an error like this
** Failed to set locale "fr_FR"
it means that the given locale is not available on your system, despite being
listed by "locale". This does not mean that PCRE2 is broken. There are three
alternative output files for the third test, because three different versions
of the French locale have been encountered. The test passes if its output
matches any one of them.
The fourth and fifth tests check UTF and Unicode property support, the fourth
being compatible with the perltest.pl script, and the fifth checking
PCRE2-specific things.
The sixth and seventh tests check the pcre2_dfa_match() alternative matching
function, in non-UTF mode and UTF-mode with Unicode property support,
respectively.
The eighth test checks some internal offsets and code size features; it is
run only when the default "link size" of 2 is set (in other cases the sizes
change) and when Unicode support is enabled.
The ninth and tenth tests are run only in 8-bit mode, and the eleventh and
twelfth tests are run only in 16-bit and 32-bit modes. These are tests that
generate different output in 8-bit mode. Each pair are for general cases and
Unicode support, respectively. The thirteenth test checks the handling of
non-UTF characters greater than 255 by pcre2_dfa_match() in 16-bit and 32-bit
modes.
The fourteenth test is run only when JIT support is not available, and the
fifteenth test is run only when JIT support is available. They test some
JIT-specific features such as information output from pcre2test about JIT
compilation.
The sixteenth and seventeenth tests are run only in 8-bit mode. They check the
POSIX interface to the 8-bit library, withouth and with Unicode support,
respectively.
Character tables
----------------
For speed, PCRE2 uses four tables for manipulating and identifying characters
whose code point values are less than 256. By default, a set of tables that is
built into the library is used. The pcre2_maketables() function can be called
by an application to create a new set of tables in the current locale. This are
passed to PCRE2 by calling pcre2_set_character_tables() to put a pointer into a
compile context.
The source file called pcre2_chartables.c contains the default set of tables.
By default, this is created as a copy of pcre2_chartables.c.dist, which
contains tables for ASCII coding. However, if --enable-rebuild-chartables is
specified for ./configure, a different version of pcre2_chartables.c is built
by the program dftables (compiled from dftables.c), which uses the ANSI C
character handling functions such as isalnum(), isalpha(), isupper(),
islower(), etc. to build the table sources. This means that the default C
locale which is set for your system will control the contents of these default
tables. You can change the default tables by editing pcre2_chartables.c and
then re-building PCRE2. If you do this, you should take care to ensure that the
file does not get automatically re-generated. The best way to do this is to
move pcre2_chartables.c.dist out of the way and replace it with your customized
tables.
When the dftables program is run as a result of --enable-rebuild-chartables,
it uses the default C locale that is set on your system. It does not pay
attention to the LC_xxx environment variables. In other words, it uses the
system's default locale rather than whatever the compiling user happens to have
set. If you really do want to build a source set of character tables in a
locale that is specified by the LC_xxx variables, you can run the dftables
program by hand with the -L option. For example:
./dftables -L pcre2_chartables.c.special
The first two 256-byte tables provide lower casing and case flipping functions,
respectively. The next table consists of three 32-byte bit maps which identify
digits, "word" characters, and white space, respectively. These are used when
building 32-byte bit maps that represent character classes for code points less
than 256. The final 256-byte table has bits indicating various character types,
as follows:
1 white space character
2 letter
4 decimal digit
8 hexadecimal digit
16 alphanumeric or '_'
128 regular expression metacharacter or binary zero
You should not alter the set of characters that contain the 128 bit, as that
will cause PCRE2 to malfunction.
File manifest
-------------
The distribution should contain the files listed below.
(A) Source files for the PCRE2 library functions and their headers are found in
the src directory:
src/dftables.c auxiliary program for building pcre2_chartables.c
when --enable-rebuild-chartables is specified
src/pcre2_chartables.c.dist a default set of character tables that assume
ASCII coding; unless --enable-rebuild-chartables is
specified, used by copying to pcre2_chartables.c
src/pcre2posix.c )
src/pcre2_auto_possess.c )
src/pcre2_compile.c )
src/pcre2_config.c )
src/pcre2_context.c )
src/pcre2_dfa_match.c )
src/pcre2_error.c )
src/pcre2_exec.c )
src/pcre2_jit_compile.c )
src/pcre2_jit_match.c ) sources for the functions in the library,
src/pcre2_jit_misc.c ) and some internal functions that they use
src/pcre2_maketables.c )
src/pcre2_match.c )
src/pcre2_match_data.c )
src/pcre2_newline.c )
src/pcre2_ord2utf.c )
src/pcre2_pattern_info.c )
src/pcre2_string_utils.c )
src/pcre2_study.c )
src/pcre2_substring.c )
src/pcre2_tables.c )
src/pcre2_ucd.c )
src/pcre2_valid_utf.c )
src/pcre2_xclass.c )
src/pcre2_printint.c debugging function that is used by pcre2test,
src/config.h.in template for config.h, when built by "configure"
src/pcre2.h.in template for pcre2.h when built by "configure"
src/pcre2posix.h header for the external POSIX wrapper API
src/pcre2_internal.h header for internal use
src/pcre2_intmodedep.h a mode-specific internal header
src/pcre2_ucp.h header for Unicode property handling
sljit/* 16 files that make up the JIT compiler FIXME
(B) Source files for programs that use PCRE2:
src/pcre2demo.c simple demonstration of coding calls to PCRE2
src/pcre2grep.c source of a grep utility that uses PCRE2
src/pcre2test.c comprehensive test program
(C) Auxiliary files:
132html script to turn "man" pages into HTML
AUTHORS information about the author of PCRE2
ChangeLog log of changes to the code
CleanTxt script to clean nroff output for txt man pages
Detrail script to remove trailing spaces
HACKING some notes about the internals of PCRE2
INSTALL generic installation instructions
LICENCE conditions for the use of PCRE2
COPYING the same, using GNU's standard name
Makefile.in ) template for Unix Makefile, which is built by
) "configure"
Makefile.am ) the automake input that was used to create
) Makefile.in
NEWS important changes in this release
NON-AUTOTOOLS-BUILD notes on building PCRE2 without using autotools
PrepareRelease script to make preparations for "make dist"
README this file
RunTest a Unix shell script for running tests
RunGrepTest a Unix shell script for pcre2grep tests
aclocal.m4 m4 macros (generated by "aclocal")
config.guess ) files used by libtool,
config.sub ) used only when building a shared library
configure a configuring shell script (built by autoconf)
configure.ac ) the autoconf input that was used to build
) "configure" and config.h
depcomp ) script to find program dependencies, generated by
) automake
doc/*.3 man page sources for PCRE2
doc/*.1 man page sources for pcre2grep and pcre2test
doc/index.html.src the base HTML page
doc/html/* HTML documentation
doc/pcre2.txt plain text version of the man pages
doc/pcre2test.txt plain text documentation of test program
doc/perltest.txt plain text documentation of Perl test program
install-sh a shell script for installing files
libpcre2-8.pc.in template for libpcre2-8.pc for pkg-config
libpcre2-16.pc.in template for libpcre2-16.pc for pkg-config
libpcre2-32.pc.in template for libpcre2-32.pc for pkg-config
libpcre2posix.pc.in template for libpcre2posix.pc for pkg-config
ltmain.sh file used to build a libtool script
missing ) common stub for a few missing GNU programs while
) installing, generated by automake
mkinstalldirs script for making install directories
perltest.pl Perl test program
pcre2-config.in source of script which retains PCRE2 information
pcre2_jit_test.c test program for the JIT compiler
testdata/testinput* test data for main library tests
testdata/testoutput* expected test results
testdata/grep* input and output for pcre2grep tests
testdata/* other supporting test files
(D) Auxiliary files for cmake support
cmake/COPYING-CMAKE-SCRIPTS
cmake/FindPackageHandleStandardArgs.cmake
cmake/FindEditline.cmake
cmake/FindReadline.cmake
CMakeLists.txt
config-cmake.h.in
(E) Auxiliary files for VPASCAL FIXME FIXME
makevp.bat
makevp_c.txt
makevp_l.txt
pcre2gexp.pas
(F) Auxiliary files for building PCRE2 "by hand"
pcre2.h.generic ) a version of the public PCRE2 header file
) for use in non-"configure" environments
config.h.generic ) a version of config.h for use in non-"configure"
) environments
(F) Miscellaneous
RunTest.bat a script for running tests under Windows FIXME
Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
Last updated: 27 October 2014