Update HTML docs with new URLs etc.
parent 5c0d38b3a8
commit 4ccef1697a
|
@@ -6,25 +6,18 @@ API. Since its initial release in 2015, there has been further development of
|
||||||
the code and it now differs from PCRE1 in more than just the API. There are new
|
the code and it now differs from PCRE1 in more than just the API. There are new
|
||||||
features, and the internals have been improved. The original PCRE1 library is
|
features, and the internals have been improved. The original PCRE1 library is
|
||||||
now obsolete and should not be used in new projects. The latest release of
|
now obsolete and should not be used in new projects. The latest release of
|
||||||
PCRE2 is available in three alternative formats from:
|
PCRE2 is available in .tar.gz or .zip form from its GitHub repository:
|
||||||
|
|
||||||
=============================================================================
|
https://github.com/PhilipHazel/pcre2/releases
|
||||||
This information is still current (21 August 2021), but the PCRE2 project is in
|
|
||||||
the process of moving to different infrastructure, so in the near future there
|
|
||||||
will be new URLs here. The mailing list will also change.
|
|
||||||
|
|
||||||
https://ftp.pcre.org/pub/pcre/pcre2-10.xx.tar.gz
|
There is a mailing list for discussion about the development of PCRE2 at
|
||||||
https://ftp.pcre.org/pub/pcre/pcre2-10.xx.tar.bz2
|
pcre2-dev@googlegroups.com. You can subscribe by sending an email to
|
||||||
https://ftp.pcre.org/pub/pcre/pcre2-10.xx.tar.zip
|
pcre2-dev+subscribe@googlegroups.com.
|
||||||
|
|
||||||
There is a mailing list for discussion about the development of PCRE at
|
You can access the archives and also subscribe or manage your subscription
|
||||||
pcre-dev@exim.org. You can access the archives and subscribe or manage your
|
here:
|
||||||
subscription here:
|
|
||||||
|
|
||||||
https://lists.exim.org/mailman/listinfo/pcre-dev
|
|
||||||
|
|
||||||
=============================================================================
|
|
||||||
|
|
||||||
|
https://groups.google.com/pcre2-dev
|
||||||
|
|
||||||
Please read the NEWS file if you are upgrading from a previous release. The
|
Please read the NEWS file if you are upgrading from a previous release. The
|
||||||
contents of this README file are:
|
contents of this README file are:
|
||||||
|
@@ -387,7 +380,7 @@ library. They are also documented in the pcre2build man page.
|
||||||
defined and has a value greater than or equal to 199901L (indicating C99).
|
defined and has a value greater than or equal to 199901L (indicating C99).
|
||||||
However, there is at least one environment that claims to be C99 but does not
|
However, there is at least one environment that claims to be C99 but does not
|
||||||
support these modifiers. If --disable-percent-zt is specified, no use is made
|
support these modifiers. If --disable-percent-zt is specified, no use is made
|
||||||
of the z or t modifiers. Instead or %td or %zu, %lu is used, with a cast for
|
of the z or t modifiers. Instead of %td or %zu, %lu is used, with a cast for
|
||||||
size_t values.
|
size_t values.
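The fallback described above can be sketched in a few lines of C. This is only an illustration of the technique, not code taken from PCRE2: the function name format_size and the macro EXAMPLE_DISABLE_PERCENT_ZT are hypothetical names invented for this sketch.

```c
#include <stddef.h>
#include <stdio.h>

/* Format a size_t value into buf and return the number of characters that
   the value needs. EXAMPLE_DISABLE_PERCENT_ZT is a hypothetical macro that
   stands in for whatever a build defines when the z and t modifiers must
   be avoided. */
static int format_size(char *buf, size_t buflen, size_t value)
{
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L && \
    !defined(EXAMPLE_DISABLE_PERCENT_ZT)
    /* C99 path: the z modifier matches size_t exactly. */
    return snprintf(buf, buflen, "%zu", value);
#else
    /* Fallback path: cast to unsigned long so that %lu is well defined. */
    return snprintf(buf, buflen, "%lu", (unsigned long)value);
#endif
}
```

The cast matters because passing a size_t where %lu expects an unsigned long is undefined when the two types differ in width. On platforms where size_t is wider than unsigned long, the fallback can truncate very large values.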
|
||||||
|
|
||||||
. There is a special option called --enable-fuzz-support for use by people who
|
. There is a special option called --enable-fuzz-support for use by people who
|
||||||
|
@@ -578,9 +571,9 @@ at build time" for more details.
|
||||||
Making new tarballs
|
Making new tarballs
|
||||||
-------------------
|
-------------------
|
||||||
|
|
||||||
The command "make dist" creates three PCRE2 tarballs, in tar.gz, tar.bz2, and
|
The command "make dist" creates two PCRE2 tarballs, in tar.gz and zip formats.
|
||||||
zip formats. The command "make distcheck" does the same, but then does a trial
|
The command "make distcheck" does the same, but then does a trial build of the
|
||||||
build of the new distribution to ensure that it works.
|
new distribution to ensure that it works.
|
||||||
|
|
||||||
If you have modified any of the man page sources in the doc directory, you
|
If you have modified any of the man page sources in the doc directory, you
|
||||||
should first run the PrepareRelease script before making a distribution. This
|
should first run the PrepareRelease script before making a distribution. This
|
||||||
|
@@ -912,4 +905,4 @@ The distribution should contain the files listed below.
|
||||||
Philip Hazel
|
Philip Hazel
|
||||||
Email local part: Philip.Hazel
|
Email local part: Philip.Hazel
|
||||||
Email domain: gmail.com
|
Email domain: gmail.com
|
||||||
Last updated: 28 April 2021
|
Last updated: 25 August 2021
|
||||||
|
|
|
@@ -28,7 +28,8 @@ nearly two decades, the limitations of the original API were making development
|
||||||
increasingly difficult. The new API is more extensible, and it was simplified
|
increasingly difficult. The new API is more extensible, and it was simplified
|
||||||
by abolishing the separate "study" optimizing function; in PCRE2, patterns are
|
by abolishing the separate "study" optimizing function; in PCRE2, patterns are
|
||||||
automatically optimized where possible. Since forking from PCRE1, the code has
|
automatically optimized where possible. Since forking from PCRE1, the code has
|
||||||
been extensively refactored and new features introduced.
|
been extensively refactored and new features introduced. The old library is now
|
||||||
|
obsolete and is no longer maintained.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
As well as Perl-style regular expression patterns, some features that appeared
|
As well as Perl-style regular expression patterns, some features that appeared
|
||||||
|
@@ -193,18 +194,18 @@ function, listing its arguments and results.
|
||||||
<P>
|
<P>
|
||||||
Philip Hazel
|
Philip Hazel
|
||||||
<br>
|
<br>
|
||||||
University Computing Service
|
Retired from University Computing Service
|
||||||
<br>
|
<br>
|
||||||
Cambridge, England.
|
Cambridge, England.
|
||||||
<br>
|
<br>
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
Putting an actual email address here is a spam magnet. If you want to email me,
|
Putting an actual email address here is a spam magnet. If you want to email me,
|
||||||
use my two initials, followed by the two digits 10, at the domain cam.ac.uk.
|
use my two names separated by a dot at gmail.com.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC5" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC5" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 28 April 2021
|
Last updated: 25 August 2021
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2021 University of Cambridge.
|
Copyright © 1997-2021 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
134
doc/pcre2.txt
|
@@ -25,121 +25,122 @@ INTRODUCTION
|
||||||
API is more extensible, and it was simplified by abolishing the sepa-
|
API is more extensible, and it was simplified by abolishing the sepa-
|
||||||
rate "study" optimizing function; in PCRE2, patterns are automatically
|
rate "study" optimizing function; in PCRE2, patterns are automatically
|
||||||
optimized where possible. Since forking from PCRE1, the code has been
|
optimized where possible. Since forking from PCRE1, the code has been
|
||||||
extensively refactored and new features introduced.
|
extensively refactored and new features introduced. The old library is
|
||||||
|
now obsolete and is no longer maintained.
|
||||||
|
|
||||||
As well as Perl-style regular expression patterns, some features that
|
As well as Perl-style regular expression patterns, some features that
|
||||||
appeared in Python and the original PCRE before they appeared in Perl
|
appeared in Python and the original PCRE before they appeared in Perl
|
||||||
are available using the Python syntax. There is also some support for
|
are available using the Python syntax. There is also some support for
|
||||||
one or two .NET and Oniguruma syntax items, and there are options for
|
one or two .NET and Oniguruma syntax items, and there are options for
|
||||||
requesting some minor changes that give better ECMAScript (aka Java-
|
requesting some minor changes that give better ECMAScript (aka Java-
|
||||||
Script) compatibility.
|
Script) compatibility.
|
||||||
|
|
||||||
The source code for PCRE2 can be compiled to support strings of 8-bit,
|
The source code for PCRE2 can be compiled to support strings of 8-bit,
|
||||||
16-bit, or 32-bit code units, which means that up to three separate li-
|
16-bit, or 32-bit code units, which means that up to three separate li-
|
||||||
braries may be installed, one for each code unit size. The size of code
|
braries may be installed, one for each code unit size. The size of code
|
||||||
unit is not related to the bit size of the underlying hardware. In a
|
unit is not related to the bit size of the underlying hardware. In a
|
||||||
64-bit environment that also supports 32-bit applications, versions of
|
64-bit environment that also supports 32-bit applications, versions of
|
||||||
PCRE2 that are compiled in both 64-bit and 32-bit modes may be needed.
|
PCRE2 that are compiled in both 64-bit and 32-bit modes may be needed.
|
||||||
|
|
||||||
The original work to extend PCRE to 16-bit and 32-bit code units was
|
The original work to extend PCRE to 16-bit and 32-bit code units was
|
||||||
done by Zoltan Herczeg and Christian Persch, respectively. In all three
|
done by Zoltan Herczeg and Christian Persch, respectively. In all three
|
||||||
cases, strings can be interpreted either as one character per code
|
cases, strings can be interpreted either as one character per code
|
||||||
unit, or as UTF-encoded Unicode, with support for Unicode general cate-
|
unit, or as UTF-encoded Unicode, with support for Unicode general cate-
|
||||||
gory properties. Unicode support is optional at build time (but is the
|
gory properties. Unicode support is optional at build time (but is the
|
||||||
default). However, processing strings as UTF code units must be enabled
|
default). However, processing strings as UTF code units must be enabled
|
||||||
explicitly at run time. The version of Unicode in use can be discovered
|
explicitly at run time. The version of Unicode in use can be discovered
|
||||||
by running
|
by running
|
||||||
|
|
||||||
pcre2test -C
|
pcre2test -C
|
||||||
|
|
||||||
The three libraries contain identical sets of functions, with names
|
The three libraries contain identical sets of functions, with names
|
||||||
ending in _8, _16, or _32, respectively (for example, pcre2_com-
|
ending in _8, _16, or _32, respectively (for example, pcre2_com-
|
||||||
pile_8()). However, by defining PCRE2_CODE_UNIT_WIDTH to be 8, 16, or
|
pile_8()). However, by defining PCRE2_CODE_UNIT_WIDTH to be 8, 16, or
|
||||||
32, a program that uses just one code unit width can be written using
|
32, a program that uses just one code unit width can be written using
|
||||||
generic names such as pcre2_compile(), and the documentation is written
|
generic names such as pcre2_compile(), and the documentation is written
|
||||||
assuming that this is the case.
|
assuming that this is the case.
|
||||||
|
|
||||||
In addition to the Perl-compatible matching function, PCRE2 contains an
|
In addition to the Perl-compatible matching function, PCRE2 contains an
|
||||||
alternative function that matches the same compiled patterns in a dif-
|
alternative function that matches the same compiled patterns in a dif-
|
||||||
ferent way. In certain circumstances, the alternative function has some
|
ferent way. In certain circumstances, the alternative function has some
|
||||||
advantages. For a discussion of the two matching algorithms, see the
|
advantages. For a discussion of the two matching algorithms, see the
|
||||||
pcre2matching page.
|
pcre2matching page.
|
||||||
|
|
||||||
Details of exactly which Perl regular expression features are and are
|
Details of exactly which Perl regular expression features are and are
|
||||||
not supported by PCRE2 are given in separate documents. See the
|
not supported by PCRE2 are given in separate documents. See the
|
||||||
pcre2pattern and pcre2compat pages. There is a syntax summary in the
|
pcre2pattern and pcre2compat pages. There is a syntax summary in the
|
||||||
pcre2syntax page.
|
pcre2syntax page.
|
||||||
|
|
||||||
Some features of PCRE2 can be included, excluded, or changed when the
|
Some features of PCRE2 can be included, excluded, or changed when the
|
||||||
library is built. The pcre2_config() function makes it possible for a
|
library is built. The pcre2_config() function makes it possible for a
|
||||||
client to discover which features are available. The features them-
|
client to discover which features are available. The features them-
|
||||||
selves are described in the pcre2build page. Documentation about build-
|
selves are described in the pcre2build page. Documentation about build-
|
||||||
ing PCRE2 for various operating systems can be found in the README and
|
ing PCRE2 for various operating systems can be found in the README and
|
||||||
NON-AUTOTOOLS_BUILD files in the source distribution.
|
NON-AUTOTOOLS_BUILD files in the source distribution.
|
||||||
|
|
||||||
The libraries contain a number of undocumented internal functions and
|
The libraries contain a number of undocumented internal functions and
|
||||||
data tables that are used by more than one of the exported external
|
data tables that are used by more than one of the exported external
|
||||||
functions, but which are not intended for use by external callers.
|
functions, but which are not intended for use by external callers.
|
||||||
Their names all begin with "_pcre2", which hopefully will not provoke
|
Their names all begin with "_pcre2", which hopefully will not provoke
|
||||||
any name clashes. In some environments, it is possible to control which
|
any name clashes. In some environments, it is possible to control which
|
||||||
external symbols are exported when a shared library is built, and in
|
external symbols are exported when a shared library is built, and in
|
||||||
these cases the undocumented symbols are not exported.
|
these cases the undocumented symbols are not exported.
|
||||||
|
|
||||||
|
|
||||||
SECURITY CONSIDERATIONS
|
SECURITY CONSIDERATIONS
|
||||||
|
|
||||||
If you are using PCRE2 in a non-UTF application that permits users to
|
If you are using PCRE2 in a non-UTF application that permits users to
|
||||||
supply arbitrary patterns for compilation, you should be aware of a
|
supply arbitrary patterns for compilation, you should be aware of a
|
||||||
feature that allows users to turn on UTF support from within a pattern.
|
feature that allows users to turn on UTF support from within a pattern.
|
||||||
For example, an 8-bit pattern that begins with "(*UTF)" turns on UTF-8
|
For example, an 8-bit pattern that begins with "(*UTF)" turns on UTF-8
|
||||||
mode, which interprets patterns and subjects as strings of UTF-8 code
|
mode, which interprets patterns and subjects as strings of UTF-8 code
|
||||||
units instead of individual 8-bit characters. This causes both the pat-
|
units instead of individual 8-bit characters. This causes both the pat-
|
||||||
tern and any data against which it is matched to be checked for UTF-8
|
tern and any data against which it is matched to be checked for UTF-8
|
||||||
validity. If the data string is very long, such a check might use suf-
|
validity. If the data string is very long, such a check might use suf-
|
||||||
ficiently many resources as to cause your application to lose perfor-
|
ficiently many resources as to cause your application to lose perfor-
|
||||||
mance.
|
mance.
|
||||||
|
|
||||||
One way of guarding against this possibility is to use the pcre2_pat-
|
One way of guarding against this possibility is to use the pcre2_pat-
|
||||||
tern_info() function to check the compiled pattern's options for
|
tern_info() function to check the compiled pattern's options for
|
||||||
PCRE2_UTF. Alternatively, you can set the PCRE2_NEVER_UTF option when
|
PCRE2_UTF. Alternatively, you can set the PCRE2_NEVER_UTF option when
|
||||||
calling pcre2_compile(). This causes a compile time error if the pat-
|
calling pcre2_compile(). This causes a compile time error if the pat-
|
||||||
tern contains a UTF-setting sequence.
|
tern contains a UTF-setting sequence.
|
||||||
|
|
||||||
The use of Unicode properties for character types such as \d can also
|
The use of Unicode properties for character types such as \d can also
|
||||||
be enabled from within the pattern, by specifying "(*UCP)". This fea-
|
be enabled from within the pattern, by specifying "(*UCP)". This fea-
|
||||||
ture can be disallowed by setting the PCRE2_NEVER_UCP option.
|
ture can be disallowed by setting the PCRE2_NEVER_UCP option.
|
||||||
|
|
||||||
If your application is one that supports UTF, be aware that validity
|
If your application is one that supports UTF, be aware that validity
|
||||||
checking can take time. If the same data string is to be matched many
|
checking can take time. If the same data string is to be matched many
|
||||||
times, you can use the PCRE2_NO_UTF_CHECK option for the second and
|
times, you can use the PCRE2_NO_UTF_CHECK option for the second and
|
||||||
subsequent matches to avoid running redundant checks.
|
subsequent matches to avoid running redundant checks.
|
||||||
|
|
||||||
The use of the \C escape sequence in a UTF-8 or UTF-16 pattern can lead
|
The use of the \C escape sequence in a UTF-8 or UTF-16 pattern can lead
|
||||||
to problems, because it may leave the current matching point in the
|
to problems, because it may leave the current matching point in the
|
||||||
middle of a multi-code-unit character. The PCRE2_NEVER_BACKSLASH_C op-
|
middle of a multi-code-unit character. The PCRE2_NEVER_BACKSLASH_C op-
|
||||||
tion can be used by an application to lock out the use of \C, causing a
|
tion can be used by an application to lock out the use of \C, causing a
|
||||||
compile-time error if it is encountered. It is also possible to build
|
compile-time error if it is encountered. It is also possible to build
|
||||||
PCRE2 with the use of \C permanently disabled.
|
PCRE2 with the use of \C permanently disabled.
|
||||||
|
|
||||||
Another way that performance can be hit is by running a pattern that
|
Another way that performance can be hit is by running a pattern that
|
||||||
has a very large search tree against a string that will never match.
|
has a very large search tree against a string that will never match.
|
||||||
Nested unlimited repeats in a pattern are a common example. PCRE2 pro-
|
Nested unlimited repeats in a pattern are a common example. PCRE2 pro-
|
||||||
vides some protection against this: see the pcre2_set_match_limit()
|
vides some protection against this: see the pcre2_set_match_limit()
|
||||||
function in the pcre2api page. There is a similar function called
|
function in the pcre2api page. There is a similar function called
|
||||||
pcre2_set_depth_limit() that can be used to restrict the amount of mem-
|
pcre2_set_depth_limit() that can be used to restrict the amount of mem-
|
||||||
ory that is used.
|
ory that is used.
|
||||||
|
|
||||||
|
|
||||||
USER DOCUMENTATION
|
USER DOCUMENTATION
|
||||||
|
|
||||||
The user documentation for PCRE2 comprises a number of different sec-
|
The user documentation for PCRE2 comprises a number of different sec-
|
||||||
tions. In the "man" format, each of these is a separate "man page". In
|
tions. In the "man" format, each of these is a separate "man page". In
|
||||||
the HTML format, each is a separate page, linked from the index page.
|
the HTML format, each is a separate page, linked from the index page.
|
||||||
In the plain text format, the descriptions of the pcre2grep and
|
In the plain text format, the descriptions of the pcre2grep and
|
||||||
pcre2test programs are in files called pcre2grep.txt and pcre2test.txt,
|
pcre2test programs are in files called pcre2grep.txt and pcre2test.txt,
|
||||||
respectively. The remaining sections, except for the pcre2demo section
|
respectively. The remaining sections, except for the pcre2demo section
|
||||||
(which is a program listing), and the short pages for individual func-
|
(which is a program listing), and the short pages for individual func-
|
||||||
tions, are concatenated in pcre2.txt, for ease of searching. The sec-
|
tions, are concatenated in pcre2.txt, for ease of searching. The sec-
|
||||||
tions are as follows:
|
tions are as follows:
|
||||||
|
|
||||||
pcre2 this document
|
pcre2 this document
|
||||||
|
@@ -165,24 +166,23 @@ USER DOCUMENTATION
|
||||||
pcre2test description of the pcre2test command
|
pcre2test description of the pcre2test command
|
||||||
pcre2unicode discussion of Unicode and UTF support
|
pcre2unicode discussion of Unicode and UTF support
|
||||||
|
|
||||||
In the "man" and HTML formats, there is also a short page for each C
|
In the "man" and HTML formats, there is also a short page for each C
|
||||||
library function, listing its arguments and results.
|
library function, listing its arguments and results.
|
||||||
|
|
||||||
|
|
||||||
AUTHOR
|
AUTHOR
|
||||||
|
|
||||||
Philip Hazel
|
Philip Hazel
|
||||||
University Computing Service
|
Retired from University Computing Service
|
||||||
Cambridge, England.
|
Cambridge, England.
|
||||||
|
|
||||||
Putting an actual email address here is a spam magnet. If you want to
|
Putting an actual email address here is a spam magnet. If you want to
|
||||||
email me, use my two initials, followed by the two digits 10, at the
|
email me, use my two names separated by a dot at gmail.com.
|
||||||
domain cam.ac.uk.
|
|
||||||
|
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 28 April 2021
|
Last updated: 25 August 2021
|
||||||
Copyright (c) 1997-2021 University of Cambridge.
|
Copyright (c) 1997-2021 University of Cambridge.
|
||||||
------------------------------------------------------------------------------
|
------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|