File tidies for 10.23-RC1

This commit is contained in:
Philip.Hazel 2017-01-16 17:40:47 +00:00
parent 31c62034fe
commit ca1ecd41a1
44 changed files with 1519 additions and 1351 deletions

View File

@ -8,7 +8,7 @@ Email domain: cam.ac.uk
University of Cambridge Computing Service,
Cambridge, England.
Copyright (c) 1997-2016 University of Cambridge
Copyright (c) 1997-2017 University of Cambridge
All rights reserved
@ -19,7 +19,7 @@ Written by: Zoltan Herczeg
Email local part: hzmester
Emain domain: freemail.hu
Copyright(c) 2010-2016 Zoltan Herczeg
Copyright(c) 2010-2017 Zoltan Herczeg
All rights reserved.
@ -30,7 +30,7 @@ Written by: Zoltan Herczeg
Email local part: hzmester
Emain domain: freemail.hu
Copyright(c) 2009-2016 Zoltan Herczeg
Copyright(c) 2009-2017 Zoltan Herczeg
All rights reserved.
####

View File

@ -2,8 +2,8 @@ Change Log for PCRE2
--------------------
Version 10.23 xx-xxxxxx-2016
----------------------------
Version 10.23 16-January-2017
-----------------------------
1. Extended pcre2test with the utf8_input modifier so that it is able to
generate all possible 16-bit and 32-bit code unit values in non-UTF modes.
@ -37,12 +37,12 @@ some minor bugs and Perl incompatibilities were fixed, including:
(f) When testing zero-terminated patterns under valgrind, the terminating
zero is now marked "no access". This catches bugs that would otherwise
show up only with non-zero-terminated patterns.
(g) A hyphen appearing immediately after a POSIX character class (for example
/[[:ascii:]-z]/) now generates an error. Perl does accept this as a
literal, but gives a warning, so it seems best to fail it in PCRE.
(h) An empty \Q\E sequence may appear after a callout that precedes an
(g) A hyphen appearing immediately after a POSIX character class (for example
/[[:ascii:]-z]/) now generates an error. Perl does accept this as a
literal, but gives a warning, so it seems best to fail it in PCRE.
(h) An empty \Q\E sequence may appear after a callout that precedes an
assertion condition (it is, of course, ignored).
One effect of the refactoring is that some error numbers and messages have
@ -125,9 +125,9 @@ fully released code, but are noted here for the record.
(r) If a character whose code point was greater than 0xffff appeared within
a lookbehind that was within another lookbehind, the calculation of the
lookbehind length went wrong and could provoke an internal error.
(t) The sequence \E- or \Q\E- after a POSIX class in a character class caused
an internal error. Now the hyphen is treated as a literal.
an internal error. Now the hyphen is treated as a literal.
4. Back references are now permitted in lookbehind assertions when there are
no duplicated group numbers (that is, (?| has not been used), and, if the
@ -269,14 +269,14 @@ auto_callout".
pattern with PCRE2_UCP set without PCRE2_UTF if a class required all wide
characters to match (for example, /[\s[:^ascii:]]/).
40. The callout_error modifier has been added to pcre2test to make it possible
40. The callout_error modifier has been added to pcre2test to make it possible
to return PCRE2_ERROR_CALLOUT from a callout.
41. A minor change to pcre2grep: colour reset is now "<esc>[0m" instead of
41. A minor change to pcre2grep: colour reset is now "<esc>[0m" instead of
"<esc>[00m".
42. The limit in the auto-possessification code that was intended to catch
overly-complicated patterns and not spend too much time auto-possessifying was
42. The limit in the auto-possessification code that was intended to catch
overly-complicated patterns and not spend too much time auto-possessifying was
being reset too often, resulting in very long compile times for some patterns.
Now such patterns are no longer completely auto-possessified.
@ -284,10 +284,10 @@ Now such patterns are no longer completely auto-possessified.
44. Added a new Windows script RunGrepTest.bat, courtesy of Jason Hood.
45. Minor cosmetic fix to pcre2test: move a variable that is not used under
45. Minor cosmetic fix to pcre2test: move a variable that is not used under
Windows into the "not Windows" code.
46. Applied Jason Hood's patches to upgrade pcre2grep under Windows and tidy
46. Applied Jason Hood's patches to upgrade pcre2grep under Windows and tidy
some of the code:
* normalised the Windows condition by ensuring WIN32 is defined;
@ -306,27 +306,27 @@ some of the code:
* fixes a typo.
In addition, colour settings containing anything other than digits and
semicolon are ignored, and the colour controls are no longer output for empty
semicolon are ignored, and the colour controls are no longer output for empty
strings.
47. Detecting patterns that are too large inside the length-measuring loop
saves processing ridiculously long patterns to their end.
48. Ignore PCRE2_CASELESS when processing \h, \H, \v, and \V in classes as it
48. Ignore PCRE2_CASELESS when processing \h, \H, \v, and \V in classes as it
just wastes time. In the UTF case it can also produce redundant entries in
XCLASS lists caused by characters with multiple other cases and pairs of
characters in the same "not-x" sublists.
49. A pattern such as /(?=(a\K))/ can report the end of the match being before
its start; pcre2test was not handling this correctly when using the POSIX
49. A pattern such as /(?=(a\K))/ can report the end of the match being before
its start; pcre2test was not handling this correctly when using the POSIX
interface (it was OK with the native interface).
50. In pcre2grep, ignore all JIT compile errors. This means that pcre2grep will
continue to work, falling back to interpretation if anything goes wrong with
50. In pcre2grep, ignore all JIT compile errors. This means that pcre2grep will
continue to work, falling back to interpretation if anything goes wrong with
JIT.
51. Applied patches from Christian Persch to configure.ac to make use of the
AC_USE_SYSTEM_EXTENSIONS macro and to test for functions used by the JIT
51. Applied patches from Christian Persch to configure.ac to make use of the
AC_USE_SYSTEM_EXTENSIONS macro and to test for functions used by the JIT
modules.
52. Minor fixed to pcre2grep from Jason Hood:
@ -335,7 +335,7 @@ modules.
to use appropriate quotes [in an example];
* LC_ALL was displayed as "LCC_ALL";
* numbers 11, 12 & 13 should end in "th".
* use double quotes in usage message
* use double quotes in usage message
Version 10.22 29-July-2016

View File

@ -25,7 +25,7 @@ Email domain: cam.ac.uk
University of Cambridge Computing Service,
Cambridge, England.
Copyright (c) 1997-2016 University of Cambridge
Copyright (c) 1997-2017 University of Cambridge
All rights reserved.
@ -36,7 +36,7 @@ Written by: Zoltan Herczeg
Email local part: hzmester
Emain domain: freemail.hu
Copyright(c) 2010-2016 Zoltan Herczeg
Copyright(c) 2010-2017 Zoltan Herczeg
All rights reserved.
@ -47,7 +47,7 @@ Written by: Zoltan Herczeg
Email local part: hzmester
Emain domain: freemail.hu
Copyright(c) 2009-2016 Zoltan Herczeg
Copyright(c) 2009-2017 Zoltan Herczeg
All rights reserved.

View File

@ -3,6 +3,10 @@
AUTOMAKE_OPTIONS = subdir-objects
ACLOCAL_AMFLAGS = -I m4
## This seems to have become necessary for building in non-source directory.
AM_CPPFLAGS="-I$(srcdir)/src"
## Specify the documentation files that are distributed.
dist_doc_DATA = \

36
NEWS
View File

@ -1,6 +1,42 @@
News about PCRE2 releases
-------------------------
Version 10.23 16-January-2017
-----------------------------
1. ChangeLog has the details of a lot of bug fixes and tidies.
2. There has been a major re-factoring of the pcre2_compile.c file. Most syntax
checking is now done in the pre-pass that identifies capturing groups. This has
reduced the amount of duplication and made the code tidier. While doing this,
some minor bugs and Perl incompatibilities were fixed (see ChangeLog for
details.)
3. Back references are now permitted in lookbehind assertions when there are
no duplicated group numbers (that is, (?| has not been used), and, if the
reference is by name, there is only one group of that name. The referenced
group must, of course be of fixed length.
4. \g{+<number>} (e.g. \g{+2} ) is now supported. It is a "forward back
reference" and can be useful in repetitions (compare \g{-<number>} ). Perl does
not recognize this syntax.
5. pcre2grep now automatically expands its buffer up to a maximum set by
--max-buffer-size.
6. The -t option (grand total) has been added to pcre2grep.
7. A new function called pcre2_code_copy_with_tables() exists to copy a
compiled pattern along with a private copy of the character tables that is
uses.
8. A user supplied a number of patches to upgrade pcre2grep under Windows and
tidy the code.
9. Several updates have been made to pcre2test and test scripts (see
ChangeLog).
Version 10.22 29-July-2016
--------------------------

40
README
View File

@ -337,18 +337,18 @@ library. They are also documented in the pcre2build man page.
--with-pcre2grep-bufsize=51200
The value must be a plain integer. The default is 20480. The amount of memory
used by pcre2grep is actually three times this number, to allow for "before"
and "after" lines. If very long lines are encountered, the buffer is
The value must be a plain integer. The default is 20480. The amount of memory
used by pcre2grep is actually three times this number, to allow for "before"
and "after" lines. If very long lines are encountered, the buffer is
automatically enlarged, up to a fixed maximum size.
. The default maximum size of pcre2grep's internal buffer can be set by, for
. The default maximum size of pcre2grep's internal buffer can be set by, for
example:
--with-pcre2grep-max-bufsize=2097152
The default is either 1048576 or the value of --with-pcre2grep-bufsize,
whichever is the larger.
The default is either 1048576 or the value of --with-pcre2grep-bufsize,
whichever is the larger.
. It is possible to compile pcre2test so that it links with the libreadline
or libedit libraries, by specifying, respectively,
@ -372,21 +372,21 @@ library. They are also documented in the pcre2build man page.
If you get error messages about missing functions tgetstr, tgetent, tputs,
tgetflag, or tgoto, this is the problem, and linking with the ncurses library
should fix it.
. There is a special option called --enable-fuzz-support for use by people who
want to run fuzzing tests on PCRE2. At present this applies only to the 8-bit
library. If set, it causes an extra library called libpcre2-fuzzsupport.a to
. There is a special option called --enable-fuzz-support for use by people who
want to run fuzzing tests on PCRE2. At present this applies only to the 8-bit
library. If set, it causes an extra library called libpcre2-fuzzsupport.a to
be built, but not installed. This contains a single function called
LLVMFuzzerTestOneInput() whose arguments are a pointer to a string and the
length of the string. When called, this function tries to compile the string
as a pattern, and if that succeeds, to match it. This is done both with no
LLVMFuzzerTestOneInput() whose arguments are a pointer to a string and the
length of the string. When called, this function tries to compile the string
as a pattern, and if that succeeds, to match it. This is done both with no
options and with some random options bits that are generated from the string.
Setting --enable-fuzz-support also causes a binary called pcre2fuzzcheck to
be created. This is normally run under valgrind or used when PCRE2 is
compiled with address sanitizing enabled. It calls the fuzzing function and
outputs information about it is doing. The input strings are specified by
arguments: if an argument starts with "=" the rest of it is a literal input
string. Otherwise, it is assumed to be a file name, and the contents of the
outputs information about it is doing. The input strings are specified by
arguments: if an argument starts with "=" the rest of it is a literal input
string. Otherwise, it is assumed to be a file name, and the contents of the
file are the test string.
The "configure" script builds the following files for the basic C library:
@ -777,7 +777,7 @@ The distribution should contain the files listed below.
src/pcre2_xclass.c )
src/pcre2_printint.c debugging function that is used by pcre2test,
src/pcre2_fuzzsupport.c function for (optional) fuzzing support
src/pcre2_fuzzsupport.c function for (optional) fuzzing support
src/config.h.in template for config.h, when built by "configure"
src/pcre2.h.in template for pcre2.h when built by "configure"

View File

@ -520,7 +520,7 @@ for bmode in "$test8" "$test16" "$test32"; do
echo $title2 "(excluding UTF-$bits) (64M stack)"
$sim $valgrind ${opt:+$vjs} ./pcre2test -q -S 64 $bmode $opt $testdata/testinput2 testtry
if [ $? = 0 ] ; then
$sim $valgrind ${opt:+$vjs} ./pcre2test -q $bmode $opt -error -63,-62,-2,-1,0,100,188,189 >>testtry
$sim $valgrind ${opt:+$vjs} ./pcre2test -q $bmode $opt -error -63,-62,-2,-1,0,100,188,189,190,191 >>testtry
checkresult $? 2 "$opt"
else
echo " "

View File

@ -11,16 +11,16 @@ dnl be defined as -RC2, for example. For real releases, it should be empty.
m4_define(pcre2_major, [10])
m4_define(pcre2_minor, [23])
m4_define(pcre2_prerelease, [-RC1])
m4_define(pcre2_date, [2016-08-01])
m4_define(pcre2_date, [2017-01-16])
# NOTE: The CMakeLists.txt file searches for the above variables in the first
# 50 lines of this file. Please update that if the variables above are moved.
# Libtool shared library interface versions (current:revision:age)
m4_define(libpcre2_8_version, [4:0:4])
m4_define(libpcre2_16_version, [4:0:4])
m4_define(libpcre2_32_version, [4:0:4])
m4_define(libpcre2_posix_version, [1:0:0])
m4_define(libpcre2_8_version, [5:0:5])
m4_define(libpcre2_16_version, [5:0:5])
m4_define(libpcre2_32_version, [5:0:5])
m4_define(libpcre2_posix_version, [1:1:0])
AC_PREREQ(2.57)
AC_INIT(PCRE2, pcre2_major.pcre2_minor[]pcre2_prerelease, , pcre2)
@ -630,7 +630,7 @@ fi
AC_DEFINE_UNQUOTED([PCRE2GREP_BUFSIZE], [$with_pcre2grep_bufsize], [
The value of PCRE2GREP_BUFSIZE is the starting size of the buffer used by
pcre2grep to hold parts of the file it is searching. The buffer will be
expanded up to PCRE2GREP_MAX_BUFSIZE if necessary, for files containing very
expanded up to PCRE2GREP_MAX_BUFSIZE if necessary, for files containing very
long lines. The actual amount of memory used by pcre2grep is three times this
number, because it allows for the buffering of "before" and "after" lines.])
@ -936,7 +936,7 @@ $PACKAGE-$VERSION configuration summary:
C compiler flags ................... : ${CFLAGS} ${VISIBILITY_CFLAGS}
Linker flags ....................... : ${LDFLAGS}
Extra libraries .................... : ${LIBS}
Build 8-bit pcre2 library .......... : ${enable_pcre2_8}
Build 16-bit pcre2 library ......... : ${enable_pcre2_16}
Build 32-bit pcre2 library ......... : ${enable_pcre2_32}
@ -966,7 +966,7 @@ $PACKAGE-$VERSION configuration summary:
Link pcre2test with libreadline .... : ${enable_pcre2test_libreadline}
Valgrind support ................... : ${enable_valgrind}
Code coverage ...................... : ${enable_coverage}
Fuzzer support ..................... : ${enable_fuzz_support}
Fuzzer support ..................... : ${enable_fuzz_support}
EOF

View File

@ -337,18 +337,18 @@ library. They are also documented in the pcre2build man page.
--with-pcre2grep-bufsize=51200
The value must be a plain integer. The default is 20480. The amount of memory
used by pcre2grep is actually three times this number, to allow for "before"
and "after" lines. If very long lines are encountered, the buffer is
The value must be a plain integer. The default is 20480. The amount of memory
used by pcre2grep is actually three times this number, to allow for "before"
and "after" lines. If very long lines are encountered, the buffer is
automatically enlarged, up to a fixed maximum size.
. The default maximum size of pcre2grep's internal buffer can be set by, for
. The default maximum size of pcre2grep's internal buffer can be set by, for
example:
--with-pcre2grep-max-bufsize=2097152
The default is either 1048576 or the value of --with-pcre2grep-bufsize,
whichever is the larger.
The default is either 1048576 or the value of --with-pcre2grep-bufsize,
whichever is the larger.
. It is possible to compile pcre2test so that it links with the libreadline
or libedit libraries, by specifying, respectively,
@ -372,21 +372,21 @@ library. They are also documented in the pcre2build man page.
If you get error messages about missing functions tgetstr, tgetent, tputs,
tgetflag, or tgoto, this is the problem, and linking with the ncurses library
should fix it.
. There is a special option called --enable-fuzz-support for use by people who
want to run fuzzing tests on PCRE2. At present this applies only to the 8-bit
library. If set, it causes an extra library called libpcre2-fuzzsupport.a to
. There is a special option called --enable-fuzz-support for use by people who
want to run fuzzing tests on PCRE2. At present this applies only to the 8-bit
library. If set, it causes an extra library called libpcre2-fuzzsupport.a to
be built, but not installed. This contains a single function called
LLVMFuzzerTestOneInput() whose arguments are a pointer to a string and the
length of the string. When called, this function tries to compile the string
as a pattern, and if that succeeds, to match it. This is done both with no
LLVMFuzzerTestOneInput() whose arguments are a pointer to a string and the
length of the string. When called, this function tries to compile the string
as a pattern, and if that succeeds, to match it. This is done both with no
options and with some random options bits that are generated from the string.
Setting --enable-fuzz-support also causes a binary called pcre2fuzzcheck to
be created. This is normally run under valgrind or used when PCRE2 is
compiled with address sanitizing enabled. It calls the fuzzing function and
outputs information about it is doing. The input strings are specified by
arguments: if an argument starts with "=" the rest of it is a literal input
string. Otherwise, it is assumed to be a file name, and the contents of the
outputs information about it is doing. The input strings are specified by
arguments: if an argument starts with "=" the rest of it is a literal input
string. Otherwise, it is assumed to be a file name, and the contents of the
file are the test string.
The "configure" script builds the following files for the basic C library:
@ -777,7 +777,7 @@ The distribution should contain the files listed below.
src/pcre2_xclass.c )
src/pcre2_printint.c debugging function that is used by pcre2test,
src/pcre2_fuzzsupport.c function for (optional) fuzzing support
src/pcre2_fuzzsupport.c function for (optional) fuzzing support
src/config.h.in template for config.h, when built by "configure"
src/pcre2.h.in template for pcre2.h when built by "configure"

View File

@ -27,7 +27,7 @@ DESCRIPTION
<P>
This function makes a copy of the memory used for a compiled pattern, excluding
any memory used by the JIT compiler. Without a subsequent call to
<b>pcre2_jit_compile()</b>, the copy can be used only for non-JIT matching.
<b>pcre2_jit_compile()</b>, the copy can be used only for non-JIT matching.
Unlike <b>pcre2_code_copy()</b>, a separate copy of the character tables is also
made, with the new code pointing to it. This memory will be automatically freed
when <b>pcre2_code_free()</b> is called. The yield of the function is NULL if

View File

@ -45,8 +45,8 @@ is <b>pcre2_match()</b>.) The arguments for this function are:
<i>wscount</i> Number of elements in the vector
</pre>
For <b>pcre2_dfa_match()</b>, a match context is needed only if you want to set
up a callout function. The <i>length</i> and <i>startoffset</i> values are code
units, not characters. The options are:
up a callout function or specify the recursion limit. The <i>length</i> and
<i>startoffset</i> values are code units, not characters. The options are:
<pre>
PCRE2_ANCHORED Match only at the first position
PCRE2_NOTBOL Subject is not the beginning of a line

View File

@ -570,7 +570,7 @@ If JIT is being used, but the JIT compilation is not being done immediately,
(perhaps waiting to see if the pattern is used often enough) similar logic is
required. JIT compilation updates a pointer within the compiled code block, so
a thread must gain unique write access to the pointer before calling
<b>pcre2_jit_compile()</b>. Alternatively, <b>pcre2_code_copy()</b> or
<b>pcre2_jit_compile()</b>. Alternatively, <b>pcre2_code_copy()</b> or
<b>pcre2_code_copy_with_tables()</b> can be used to obtain a private copy of the
compiled code.
</P>
@ -740,7 +740,7 @@ functions, <i>pcre2_match()</i> and <i>pcre2_dfa_match()</i>.
<br>
This parameter ajusts the limit, set when PCRE2 is built (default 250), on the
depth of parenthesis nesting in a pattern. This limit stops rogue patterns
using up too much system stack when being compiled. The limit applies to
using up too much system stack when being compiled. The limit applies to
parentheses of all kinds, not just capturing parentheses.
<b>int pcre2_set_compile_recursion_guard(pcre2_compile_context *<i>ccontext</i>,</b>
<b> int (*<i>guard_function</i>)(uint32_t, void *), void *<i>user_data</i>);</b>
@ -879,21 +879,23 @@ This limit is of use only if it is set smaller than <i>match_limit</i>.
Limiting the recursion depth limits the amount of system stack that can be
used, or, when PCRE2 has been compiled to use memory on the heap instead of the
stack, the amount of heap memory that can be used. This limit is not relevant,
and is ignored, when matching is done using JIT compiled code or by the
<b>pcre2_dfa_match()</b> function.
and is ignored, when matching is done using JIT compiled code. However, it is
supported by <b>pcre2_dfa_match()</b>, which uses recursive function calls less
frequently than <b>pcre2_match()</b>, but which can be caused to use a lot of
stack by a recursive pattern such as /(.)(?1)/ matched to a very long string.
</P>
<P>
The default value for <i>recursion_limit</i> can be set when PCRE2 is built; the
default default is the same value as the default for <i>match_limit</i>. If the
limit is exceeded, <b>pcre2_match()</b> returns PCRE2_ERROR_RECURSIONLIMIT. A
value for the recursion limit may also be supplied by an item at the start of a
pattern of the form
limit is exceeded, <b>pcre2_match()</b> and <b>pcre2_dfa_match()</b> return
PCRE2_ERROR_RECURSIONLIMIT. A value for the recursion limit may also be
supplied by an item at the start of a pattern of the form
<pre>
(*LIMIT_RECURSION=ddd)
</pre>
where ddd is a decimal number. However, such a setting is ignored unless ddd is
less than the limit set by the caller of <b>pcre2_match()</b> or, if no such
limit is set, less than the default.
less than the limit set by the caller of <b>pcre2_match()</b> or
<b>pcre2_dfa_match()</b> or, if no such limit is set, less than the default.
<b>int pcre2_set_recursion_memory_management(</b>
<b> pcre2_match_context *<i>mcontext</i>,</b>
<b> void *(*<i>private_malloc</i>)(PCRE2_SIZE, void *),</b>
@ -1087,20 +1089,20 @@ if the code has been processed by the JIT compiler (see
<a href="#jitcompiling">below),</a>
the JIT information cannot be copied (because it is position-dependent).
The new copy can initially be used only for non-JIT matching, though it can be
passed to <b>pcre2_jit_compile()</b> if required.
passed to <b>pcre2_jit_compile()</b> if required.
</P>
<P>
The <b>pcre2_code_copy()</b> function provides a way for individual threads in a
multithreaded application to acquire a private copy of shared compiled code.
However, it does not make a copy of the character tables used by the compiled
multithreaded application to acquire a private copy of shared compiled code.
However, it does not make a copy of the character tables used by the compiled
pattern; the new pattern code points to the same tables as the original code.
(See
<a href="#jitcompiling">"Locale Support"</a>
below for details of these character tables.) In many applications the same
tables are used throughout, so this behaviour is appropriate. Nevertheless,
there are occasions when a copy of a compiled pattern and the relevant tables
are needed. The <b>pcre2_code_copy_with_tables()</b> provides this facility.
Copies of both the code and the tables are made, with the new code pointing to
tables are used throughout, so this behaviour is appropriate. Nevertheless,
there are occasions when a copy of a compiled pattern and the relevant tables
are needed. The <b>pcre2_code_copy_with_tables()</b> provides this facility.
Copies of both the code and the tables are made, with the new code pointing to
the new tables. The memory for the new tables is automatically freed when
<b>pcre2_code_free()</b> is called for the new copy of the compiled code.
</P>
@ -1140,7 +1142,7 @@ NULL immediately. Otherwise, the variables to which these point are set to an
error code and an offset (number of code units) within the pattern,
respectively, when <b>pcre2_compile()</b> returns NULL because a compilation
error has occurred. The values are not defined when compilation is successful
and <b>pcre2_compile()</b> returns a non-NULL value.
and <b>pcre2_compile()</b> returns a non-NULL value.
</P>
<P>
The value returned in <i>erroroffset</i> is an indication of where in the
@ -1976,7 +1978,7 @@ Information about a successful or unsuccessful match is placed in a match
data block, which is an opaque structure that is accessed by function calls. In
particular, the match data block contains a vector of offsets into the subject
string that define the matched part of the subject and any substrings that were
captured. This is know as the <i>ovector</i>.
captured. This is known as the <i>ovector</i>.
</P>
<P>
Before calling <b>pcre2_match()</b>, <b>pcre2_dfa_match()</b>, or
@ -3263,7 +3265,7 @@ Cambridge, England.
</P>
<br><a name="SEC41" href="#TOC1">REVISION</a><br>
<P>
Last updated: 22 November 2016
Last updated: 23 December 2016
<br>
Copyright &copy; 1997-2016 University of Cambridge.
<br>

View File

@ -380,15 +380,15 @@ scanning, in order to be able to output "before" and "after" lines when it
finds a match. The starting size of the buffer is controlled by a parameter
whose default value is 20K. The buffer itself is three times this size, but
because of the way it is used for holding "before" lines, the longest line that
is guaranteed to be processable is the parameter size. If a longer line is
is guaranteed to be processable is the parameter size. If a longer line is
encountered, <b>pcre2grep</b> automatically expands the buffer, up to a
specified maximum size, whose default is 1M or the starting size, whichever is
the larger. You can change the default parameter values by adding, for example,
<pre>
--with-pcre2grep-bufsize=51200
--with-pcre2grep-max-bufsize=2097152
--with-pcre2grep-max-bufsize=2097152
</pre>
to the <b>configure</b> command. The caller of \fPpcre2grep\fP can override
to the <b>configure</b> command. The caller of \fPpcre2grep\fP can override
these values by using --buffer-size and --max-buffer-size on the command line.
</P>
<br><a name="SEC18" href="#TOC1">PCRE2TEST OPTION FOR LIBREADLINE SUPPORT</a><br>

View File

@ -64,7 +64,7 @@ For example, if PCRE2_AUTO_CALLOUT is used with the pattern
</pre>
it is processed as if it were
<pre>
(?C255)A(?C3)B(?C255)
(?C255)A(?C3)B(?C255)
</pre>
Here is a more complicated example:
<pre>

View File

@ -133,8 +133,8 @@ an error is given at compile time.
<P>
14. Perl used to recognize comments in some places that PCRE2 does not, for
example, between the ( and ? at the start of a subpattern. If the /x modifier
is set, Perl allowed white space between ( and ? though the latest Perls give
an error (for a while it was just deprecated). There may still be some cases
is set, Perl allowed white space between ( and ? though the latest Perls give
an error (for a while it was just deprecated). There may still be some cases
where Perl behaves differently.
</P>
<P>
@ -159,7 +159,7 @@ list is with respect to Perl 5.10:
<br>
(a) Although lookbehind assertions in PCRE2 must match fixed length strings,
each alternative branch of a lookbehind assertion can match a different length
of string. Perl requires them all to have the same length.
of string. Perl requires them all to have the same length.
<br>
<br>
(b) From PCRE2 10.23, back references to groups of fixed length are supported

View File

@ -91,7 +91,7 @@ if a line is too long and the buffer can no longer be expanded.
</P>
<P>
The block of memory that is actually used is three times the "buffer size", to
allow for buffering "before" and "after" lines. If the buffer size is too
allow for buffering "before" and "after" lines. If the buffer size is too
small, fewer than requested "before" and "after" lines may be output.
</P>
<P>
@ -178,8 +178,8 @@ Treat binary files as text. This is equivalent to
</P>
<P>
<b>-B</b> <i>number</i>, <b>--before-context=</b><i>number</i>
Output up to <i>number</i> lines of context before each matching line. Fewer
lines are output if the previous match or the start of the file is within
Output up to <i>number</i> lines of context before each matching line. Fewer
lines are output if the previous match or the start of the file is within
<i>number</i> lines, or if the processing buffer size has been set too small. If
file names and/or line numbers are being output, a hyphen separator is used
instead of a colon for the context lines. A line containing "--" is output
@ -202,8 +202,8 @@ return code.
</P>
<P>
<b>--buffer-size=</b><i>number</i>
Set the parameter that controls how much memory is obtained at the start of
processing for buffering files that are being scanned. See also
Set the parameter that controls how much memory is obtained at the start of
processing for buffering files that are being scanned. See also
<b>--max-buffer-size</b> below.
</P>
<P>
@ -222,7 +222,7 @@ suppressed lines than the count (that is, the number of matches).
<br>
<br>
If no lines are selected, the number zero is output. If several files are are
being scanned, a count is output for each of them and the <b>-t</b> option can
being scanned, a count is output for each of them and the <b>-t</b> option can
be used to cause a total to be output at the end. However, if the
<b>--files-with-matches</b> option is also used, only those files whose counts
are greater than zero are listed. When <b>-c</b> is used, the <b>-A</b>,
@ -245,13 +245,23 @@ because <b>pcre2grep</b> has to search for all possible matches in a line, not
just one, in order to colour them all.
<br>
<br>
The colour that is used can be specified by setting the environment variable
PCRE2GREP_COLOUR or PCRE2GREP_COLOR. If neither of these are set,
<b>pcre2grep</b> looks for GREP_COLOUR or GREP_COLOR. The value of the variable
should be a string of two numbers, separated by a semicolon. They are copied
directly into the control string for setting colour on a terminal, so it is
your responsibility to ensure that they make sense. If neither of the
environment variables is set, the default is "1;31", which gives red.
The colour that is used can be specified by setting one of the environment
variables PCRE2GREP_COLOUR, PCRE2GREP_COLOR, PCREGREP_COLOUR, or
PCREGREP_COLOR, which are checked in that order. If none of these are set,
<b>pcre2grep</b> looks for GREP_COLORS or GREP_COLOR (in that order). The value
of the variable should be a string of two numbers, separated by a semicolon,
except in the case of GREP_COLORS, which must start with "ms=" or "mt="
followed by two semicolon-separated colours, terminated by the end of the
string or by a colon. If GREP_COLORS does not start with "ms=" or "mt=" it is
ignored, and GREP_COLOR is checked.
<br>
<br>
If the string obtained from one of the above variables contains any characters
other than semicolon or digits, the setting is ignored and the default colour
is used. The string is copied directly into the control string for setting
colour on a terminal, so it is your responsibility to ensure that the values
make sense. If no relevant environment variable is set, the default is "1;31",
which gives red.
</P>
<P>
<b>-D</b> <i>action</i>, <b>--devices=</b><i>action</i>
@ -353,7 +363,7 @@ tested before the file's patterns. However, no other pattern is taken from the
command line; all arguments are treated as the names of paths to be searched.
</P>
<P>
<b>--file-list</b>=<i>filename</i>
<b>--file-list</b>=<i>filename</i>
Read a list of files and/or directories that are to be scanned from the given
file, one per line. Trailing white space is removed from each line, and blank
lines are ignored. These paths are processed before any that are listed on the
@ -518,8 +528,8 @@ when the PCRE2 library is compiled, with the default default being 10 million.
</P>
<P>
\fB--max-buffer-size=<i>number</i>
This limits the expansion of the processing buffer, whose initial size can be
set by <b>--buffer-size</b>. The maximum buffer size is silently forced to be no
This limits the expansion of the processing buffer, whose initial size can be
set by <b>--buffer-size</b>. The maximum buffer size is silently forced to be no
smaller than the starting buffer size.
</P>
<P>
@ -619,7 +629,7 @@ capturing parentheses do not exist in the pattern, or were not set in the
match, nothing is output unless the file name or line number are being output.
<br>
<br>
If this option is given multiple times, multiple substrings are output for each
If this option is given multiple times, multiple substrings are output for each
match, in the order the options are given, and all on one line. For example,
-o3 -o1 -o3 causes the substrings matched by capturing parentheses 3 and 1 and
then 3 again to be output. By default, there is no separator (but see the next
@ -693,7 +703,7 @@ specified by any of the <b>--include</b> or <b>--exclude</b> options.
<P>
<b>-x</b>, <b>--line-regex</b>, <b>--line-regexp</b>
Force the patterns to be anchored (each must start matching at the beginning of
a line) and in addition, require them to match entire lines. In multiline mode
a line) and in addition, require them to match entire lines. In multiline mode
the match may be more than one line. This is equivalent to having \A and \Z
characters at the start and end of each alternative top-level branch in every
pattern. This option applies only to the patterns that are matched against the
@ -773,12 +783,12 @@ character. Otherwise <b>pcre2grep</b> will assume that it has no data.
</P>
<br><a name="SEC10" href="#TOC1">CALLING EXTERNAL SCRIPTS</a><br>
<P>
On non-Windows systems, <b>pcre2grep</b> has, by default, support for calling
external programs or scripts during matching by making use of PCRE2's callout
facility. However, this support can be disabled when <b>pcre2grep</b> is built.
You can find out whether your binary has support for callouts by running it
with the <b>--help</b> option. If the support is not enabled, all callouts in
patterns are ignored by <b>pcre2grep</b>.
<b>pcre2grep</b> has, by default, support for calling external programs or
scripts during matching by making use of PCRE2's callout facility. However,
this support can be disabled when <b>pcre2grep</b> is built. You can find out
whether your binary has support for callouts by running it with the <b>--help</b>
option. If the support is not enabled, all callouts in patterns are ignored by
<b>pcre2grep</b>.
</P>
<P>
A callout in a PCRE2 pattern is of the form (?C&#60;arg&#62;) where the argument is
@ -860,7 +870,7 @@ Cambridge, England.
</P>
<br><a name="SEC15" href="#TOC1">REVISION</a><br>
<P>
Last updated: 31 October 2016
Last updated: 31 December 2016
<br>
Copyright &copy; 1997-2016 University of Cambridge.
<br>

View File

@ -62,8 +62,8 @@ There is no limit to the number of parenthesized subpatterns, but there can be
no more than 65535 capturing subpatterns. There is, however, a limit to the
depth of nesting of parenthesized subpatterns of all kinds. This is imposed in
order to limit the amount of system stack used at compile time. The default
limit can be specified when PCRE2 is built; the default default is 250. An
application can change this limit by calling pcre2_set_parens_nest_limit() to
limit can be specified when PCRE2 is built; the default default is 250. An
application can change this limit by calling pcre2_set_parens_nest_limit() to
set the limit in a compile context.
</P>
<P>
@ -76,7 +76,7 @@ is 255 code units for the 8-bit library and 65535 code units for the 16-bit and
32-bit libraries.
</P>
<P>
The maximum length of a string argument to a callout is the largest number a
The maximum length of a string argument to a callout is the largest number a
32-bit unsigned integer can hold.
</P>
<br><b>

View File

@ -190,6 +190,12 @@ be less than the value set (or defaulted) by the caller of <b>pcre2_match()</b>
for it to have any effect. In other words, the pattern writer can lower the
limits set by the programmer, but not raise them. If there is more than one
setting of one of these limits, the lower value is used.
</P>
<P>
The match limit is used (but in a different way) when JIT is being used, but it
is not relevant, and is ignored, when matching with <b>pcre2_dfa_match()</b>.
However, the recursion limit is relevant for DFA matching, which does use some
function recursion, in particular, for recursions within the pattern.
<a name="newlines"></a></P>
<br><b>
Newline conventions
@ -1347,10 +1353,10 @@ or immediately after a range. For example, [b-d-z] matches letters in the range
b to d, a hyphen character, or z.
</P>
<P>
Perl treats a hyphen as a literal if it appears before a POSIX class (see
below) or a character type escape such as as \d, but gives a warning in its
warning mode, as this is most likely a user error. As PCRE2 has no facility for
warning, an error is given in these cases.
Perl treats a hyphen as a literal if it appears before or after a POSIX class
(see below) or a character type escape such as as \d, but gives a warning in
its warning mode, as this is most likely a user error. As PCRE2 has no facility
for warning, an error is given in these cases.
</P>
<P>
It is not possible to have the literal character "]" as the end character of a
@ -2087,7 +2093,7 @@ subpattern is possible using named parentheses (see below).
</P>
<P>
Another way of avoiding the ambiguity inherent in the use of digits following a
backslash is to use the \g escape sequence. This escape must be followed by a
backslash is to use the \g escape sequence. This escape must be followed by a
signed or unsigned number, optionally enclosed in braces. These examples are
all identical:
<pre>
@ -2108,8 +2114,8 @@ can be helpful in long patterns, and also in patterns that are created by
joining together fragments that contain references within themselves.
</P>
<P>
The sequence \g{+1} is a reference to the next capturing subpattern. This kind
of forward reference can be useful it patterns that repeat. Perl does not
The sequence \g{+1} is a reference to the next capturing subpattern. This kind
of forward reference can be useful it patterns that repeat. Perl does not
support the use of + in this way.
</P>
<P>
@ -2212,11 +2218,11 @@ capturing is carried out only for positive assertions. (Perl sometimes, but not
always, does do capturing in negative assertions.)
</P>
<P>
WARNING: If a positive assertion containing one or more capturing subpatterns
succeeds, but failure to match later in the pattern causes backtracking over
this assertion, the captures within the assertion are reset only if no higher
numbered captures are already set. This is, unfortunately, a fundamental
limitation of the current implementation; it may get removed in a future
WARNING: If a positive assertion containing one or more capturing subpatterns
succeeds, but failure to match later in the pattern causes backtracking over
this assertion, the captures within the assertion are reset only if no higher
numbered captures are already set. This is, unfortunately, a fundamental
limitation of the current implementation; it may get removed in a future
reworking.
</P>
<P>
@ -2478,7 +2484,7 @@ Checking for a used subpattern by name
<P>
Perl uses the syntax (?(&#60;name&#62;)...) or (?('name')...) to test for a used
subpattern by name. For compatibility with earlier versions of PCRE1, which had
this facility before Perl, the syntax (?(name)...) is also recognized. Note,
this facility before Perl, the syntax (?(name)...) is also recognized. Note,
however, that undelimited names consisting of the letter R followed by digits
are ambiguous (see the following section).
</P>
@ -2508,7 +2514,7 @@ If a condition is the string (R), and there is no subpattern with the name R,
the condition is true if matching is currently in a recursion or subroutine
call to the whole pattern or any subpattern. If digits follow the letter R, and
there is no subpattern with that name, the condition is true if the most recent
call is into a subpattern with the given number, which must exist somewhere in
call is into a subpattern with the given number, which must exist somewhere in
the overall pattern. This is a contrived example that is equivalent to a+b:
<pre>
((?(R1)a+|(?1)b))
@ -2523,11 +2529,11 @@ If a name preceded by ampersand follows the letter R, for example:
<pre>
(?(R&name)...)
</pre>
the condition is true if the most recent recursion is into a subpattern of that
the condition is true if the most recent recursion is into a subpattern of that
name (which must exist within the pattern).
</P>
<P>
This condition does not check the entire recursion stack. It tests only the
This condition does not check the entire recursion stack. It tests only the
current level. If the name used in a condition of this kind is a duplicate, the
test is applied to all subpatterns of the same name, and is true if any one of
them is the most recent recursion.
@ -3008,21 +3014,21 @@ depending on whether or not a name is present.
By default, for compatibility with Perl, a name is any sequence of characters
that does not include a closing parenthesis. The name is not processed in
any way, and it is not possible to include a closing parenthesis in the name.
This can be changed by setting the PCRE2_ALT_VERBNAMES option, but the result
is no longer Perl-compatible.
This can be changed by setting the PCRE2_ALT_VERBNAMES option, but the result
is no longer Perl-compatible.
</P>
<P>
When PCRE2_ALT_VERBNAMES is set, backslash processing is applied to verb names
and only an unescaped closing parenthesis terminates the name. However, the
only backslash items that are permitted are \Q, \E, and sequences such as
\x{100} that define character code points. Character type escapes such as \d
and only an unescaped closing parenthesis terminates the name. However, the
only backslash items that are permitted are \Q, \E, and sequences such as
\x{100} that define character code points. Character type escapes such as \d
are faulted.
</P>
<P>
A closing parenthesis can be included in a name either as \) or between \Q
and \E. In addition to backslash processing, if the PCRE2_EXTENDED option is
also set, unescaped whitespace in verb names is skipped, and #-comments are
recognized, exactly as in the rest of the pattern. PCRE2_EXTENDED does not
recognized, exactly as in the rest of the pattern. PCRE2_EXTENDED does not
affect verb names unless PCRE2_ALT_VERBNAMES is also set.
</P>
<P>
@ -3446,7 +3452,7 @@ Cambridge, England.
</P>
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
<P>
Last updated: 23 October 2016
Last updated: 27 December 2016
<br>
Copyright &copy; 1997-2016 University of Cambridge.
<br>

View File

@ -57,12 +57,13 @@ assertion and "once-only" subpatterns, which are handled like subroutine calls.
Normally, these are never very deep, and the limit on the complexity of
<b>pcre2_dfa_match()</b> is controlled by the amount of workspace it is given.
However, it is possible to write patterns with runaway infinite recursions;
such patterns will cause <b>pcre2_dfa_match()</b> to run out of stack. At
present, there is no protection against this.
such patterns will cause <b>pcre2_dfa_match()</b> to run out of stack unless a
limit is applied (see below).
</P>
<P>
The comments that follow do NOT apply to <b>pcre2_dfa_match()</b>; they are
relevant only for <b>pcre2_match()</b> without the JIT optimization.
The comments in the next three sections do not apply to
<b>pcre2_dfa_match()</b>; they are relevant only for <b>pcre2_match()</b> without
the JIT optimization.
</P>
<br><b>
Reducing <b>pcre2_match()</b>'s stack usage
@ -115,7 +116,7 @@ entitled
in the
<a href="pcre2api.html"><b>pcre2api</b></a>
documentation. Since the block sizes are always the same, it may be possible to
implement customized a memory handler that is more efficient than the standard
implement a customized memory handler that is more efficient than the standard
function. The memory blocks obtained for this purpose are retained and re-used
if possible while <b>pcre2_match()</b> is running. They are all freed just
before it exits.
@ -151,6 +152,15 @@ pattern to match. This is done by calling <b>pcre2_match()</b> repeatedly with
different limits.
</P>
<br><b>
Limiting <b>pcre2_dfa_match()</b>'s stack usage
</b><br>
<P>
The recursion limit, as described above for <b>pcre2_match()</b>, also applies
to <b>pcre2_dfa_match()</b>, whose use of recursive function calls for
recursions in the pattern can lead to runaway stack usage. The non-recursive
match limit is not relevant for DFA matching, and is ignored.
</P>
<br><b>
Changing stack size in Unix-like systems
</b><br>
<P>
@ -198,9 +208,9 @@ Cambridge, England.
REVISION
</b><br>
<P>
Last updated: 21 November 2014
Last updated: 23 December 2016
<br>
Copyright &copy; 1997-2014 University of Cambridge.
Copyright &copy; 1997-2016 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.

View File

@ -451,9 +451,10 @@ appear.
(*UCP) set PCRE2_UCP (use Unicode properties for \d etc)
</pre>
Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of the
limits set by the caller of pcre2_match(), not increase them. The application
can lock out the use of (*UTF) and (*UCP) by setting the PCRE2_NEVER_UTF or
PCRE2_NEVER_UCP options, respectively, at compile time.
limits set by the caller of <b>pcre2_match()</b> or <b>pcre2_dfa_match()</b>, not
increase them. The application can lock out the use of (*UTF) and (*UCP) by
setting the PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options, respectively, at
compile time.
</P>
<br><a name="SEC17" href="#TOC1">NEWLINE CONVENTION</a><br>
<P>
@ -494,7 +495,7 @@ Each top-level branch of a look behind must be of a fixed length.
\g{n} reference by number
\g+n relative reference by number (PCRE2 extension)
\g-n relative reference by number
\g{+n} relative reference by number (PCRE2 extension)
\g{+n} relative reference by number (PCRE2 extension)
\g{-n} relative reference by number
\k&#60;name&#62; reference by name (Perl)
\k'name' reference by name (Perl)
@ -541,7 +542,7 @@ Each top-level branch of a look behind must be of a fixed length.
(?(VERSION[&#62;]=n.m) test PCRE2 version
(?(assert) assertion condition
</pre>
Note the ambiguity of (?(R) and (?(Rn) which might be named reference
Note the ambiguity of (?(R) and (?(Rn) which might be named reference
conditions or recursion tests. Such a condition is interpreted as a reference
condition if the relevant named group exists.
</P>
@ -595,7 +596,7 @@ Cambridge, England.
</P>
<br><a name="SEC27" href="#TOC1">REVISION</a><br>
<P>
Last updated: 28 September 2016
Last updated: 23 December 2016
<br>
Copyright &copy; 1997-2016 University of Cambridge.
<br>

View File

@ -114,7 +114,7 @@ to the library. For subject lines, backslash escapes can be used. In addition,
when the <b>utf</b> modifier (see
<a href="#optionmodifiers">"Setting compilation options"</a>
below) is set, the pattern and any following subject lines are interpreted as
UTF-8 strings and translated to UTF-16 or UTF-32 as appropriate.
UTF-8 strings and translated to UTF-16 or UTF-32 as appropriate.
</P>
<P>
For non-UTF testing of wide characters, the <b>utf8_input</b> modifier can be
@ -153,8 +153,13 @@ the 32-bit library has been built, this is the default. If the 32-bit library
has not been built, this option causes an error.
</P>
<P>
<b>-ac</b>
Behave as if each pattern has the <b>auto_callout</b> modifier, that is, insert
automatic callouts into every pattern that is compiled.
</P>
<P>
<b>-b</b>
Behave as if each pattern has the <b>/fullbincode</b> modifier; the full
Behave as if each pattern has the <b>fullbincode</b> modifier; the full
internal binary form of the pattern is output after compilation.
</P>
<P>
@ -220,7 +225,7 @@ Output a brief summary these options and then exit.
</P>
<P>
<b>-i</b>
Behave as if each pattern has the <b>/info</b> modifier; information about the
Behave as if each pattern has the <b>info</b> modifier; information about the
compiled pattern is given after compilation.
</P>
<P>
@ -582,7 +587,7 @@ for a description of their effects.
As well as turning on the PCRE2_UTF option, the <b>utf</b> modifier causes all
non-printing characters in output strings to be printed using the \x{hh...}
notation. Otherwise, those less than 0x100 are output in hex without the curly
brackets. Setting <b>utf</b> in 16-bit or 32-bit mode also causes pattern and
brackets. Setting <b>utf</b> in 16-bit or 32-bit mode also causes pattern and
subject strings to be translated to UTF-16 or UTF-32, respectively, before
being passed to library functions.
<a name="controlmodifiers"></a></P>
@ -615,8 +620,8 @@ about the pattern:
pushcopy push a copy onto the stack
stackguard=&#60;number&#62; test the stackguard feature
tables=[0|1|2] select internal tables
use_length do not zero-terminate the pattern
utf8_input treat input as UTF-8
use_length do not zero-terminate the pattern
utf8_input treat input as UTF-8
</pre>
The effects of these modifiers are described in the following sections.
</P>
@ -705,7 +710,7 @@ Specifying the pattern's length
By default, patterns are passed to the compiling functions as zero-terminated
strings. When using the POSIX wrapper API, there is no other option. However,
when using PCRE2's native API, patterns can be passed by length instead of
being zero-terminated. The <b>use_length</b> modifier causes this to happen.
being zero-terminated. The <b>use_length</b> modifier causes this to happen.
Using a length happens automatically (whether or not <b>use_length</b> is set)
when <b>hex</b> is set, because patterns specified in hexadecimal may contain
binary zeros.
@ -733,17 +738,17 @@ the delimiter within a substring. The <b>hex</b> and <b>expand</b> modifiers are
mutually exclusive.
</P>
<P>
The POSIX API cannot be used with patterns specified in hexadecimal because
they may contain binary zeros, which conflicts with <b>regcomp()</b>'s
requirement for a zero-terminated string. Such patterns are always passed to
The POSIX API cannot be used with patterns specified in hexadecimal because
they may contain binary zeros, which conflicts with <b>regcomp()</b>'s
requirement for a zero-terminated string. Such patterns are always passed to
<b>pcre2_compile()</b> as a string with a length, not as zero-terminated.
</P>
<br><b>
Specifying wide characters in 16-bit and 32-bit modes
</b><br>
<P>
In 16-bit and 32-bit modes, all input is automatically treated as UTF-8 and
translated to UTF-16 or UTF-32 when the <b>utf</b> modifier is set. For testing
In 16-bit and 32-bit modes, all input is automatically treated as UTF-8 and
translated to UTF-16 or UTF-32 when the <b>utf</b> modifier is set. For testing
the 16-bit and 32-bit libraries in non-UTF mode, the <b>utf8_input</b> modifier
can be used. It is mutually exclusive with <b>utf</b>. Input lines are
interpreted as UTF-8 as a means of specifying wide characters. More details are
@ -806,7 +811,7 @@ modes are to be compiled:
2 compile JIT code for soft partial matching
4 compile JIT code for hard partial matching
</pre>
The possible values for the <b>/jit</b> modifier are therefore:
The possible values for the <b>jit</b> modifier are therefore:
<pre>
0 disable JIT
1 normal matching only
@ -852,14 +857,14 @@ code was actually used in the match.
Setting a locale
</b><br>
<P>
The <b>/locale</b> modifier must specify the name of a locale, for example:
The <b>locale</b> modifier must specify the name of a locale, for example:
<pre>
/pattern/locale=fr_FR
</pre>
The given locale is set, <b>pcre2_maketables()</b> is called to build a set of
character tables for the locale, and this is then passed to
<b>pcre2_compile()</b> when compiling the regular expression. The same tables
are used when matching the following subject lines. The <b>/locale</b> modifier
are used when matching the following subject lines. The <b>locale</b> modifier
applies only to the pattern on which it appears, but can be given in a
<b>#pattern</b> command if a default is needed. Setting a locale and alternate
character tables are mutually exclusive.
@ -868,7 +873,7 @@ character tables are mutually exclusive.
Showing pattern memory
</b><br>
<P>
The <b>/memory</b> modifier causes the size in bytes of the memory used to hold
The <b>memory</b> modifier causes the size in bytes of the memory used to hold
the compiled pattern to be output. This does not include the size of the
<b>pcre2_code</b> block; it is just the actual compiled data. If the pattern is
subsequently passed to the JIT compiler, the size of the JIT compiled code is
@ -937,7 +942,7 @@ an error.
Testing the stack guard feature
</b><br>
<P>
The <b>/stackguard</b> modifier is used to test the use of
The <b>stackguard</b> modifier is used to test the use of
<b>pcre2_set_compile_recursion_guard()</b>, a function that is provided to
enable stack availability to be checked during compilation (see the
<a href="pcre2api.html"><b>pcre2api</b></a>
@ -952,7 +957,7 @@ be aborted.
Using alternative character tables
</b><br>
<P>
The value specified for the <b>/tables</b> modifier must be one of the digits 0,
The value specified for the <b>tables</b> modifier must be one of the digits 0,
1, or 2. It causes a specific set of built-in character tables to be passed to
<b>pcre2_compile()</b>. This is used in the PCRE2 tests to check behaviour with
different character tables. The digit specifies the tables as follows:
@ -1042,7 +1047,7 @@ The partial matching modifiers are provided with abbreviations because they
appear frequently in tests.
</P>
<P>
If the <b>/posix</b> modifier was present on the pattern, causing the POSIX
If the <b>posix</b> modifier was present on the pattern, causing the POSIX
wrapper API to be used, the only option-setting modifiers that have any effect
are <b>notbol</b>, <b>notempty</b>, and <b>noteol</b>, causing REG_NOTBOL,
REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to <b>regexec()</b>.
@ -1064,6 +1069,7 @@ pattern.
altglobal alternative global matching
callout_capture show captures at callout time
callout_data=&#60;n&#62; set a value to pass via callouts
callout_error=&#60;n&#62;[:&#60;m&#62;] control callout error
callout_fail=&#60;n&#62;[:&#60;m&#62;] control callout failure
callout_none do not supply a callout function
copy=&#60;number or name&#62; copy captured substring
@ -1159,15 +1165,22 @@ Testing callouts
<P>
A callout function is supplied when <b>pcre2test</b> calls the library matching
functions, unless <b>callout_none</b> is specified. If <b>callout_capture</b> is
set, the current captured groups are output when a callout occurs.
set, the current captured groups are output when a callout occurs. The default
return from the callout function is zero, which allows matching to continue.
</P>
<P>
The <b>callout_fail</b> modifier can be given one or two numbers. If there is
only one number, 1 is returned instead of 0 when a callout of that number is
reached. If two numbers are given, 1 is returned when callout &#60;n&#62; is reached
for the &#60;m&#62;th time. Note that callouts with string arguments are always given
the number zero. See "Callouts" below for a description of the output when a
callout it taken.
only one number, 1 is returned instead of 0 (causing matching to backtrack)
when a callout of that number is reached. If two numbers (&#60;n&#62;:&#60;m&#62;) are given, 1
is returned when callout &#60;n&#62; is reached and there have been at least &#60;m&#62;
callouts. The <b>callout_error</b> modifier is similar, except that
PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
aborted. If both these modifiers are set for the same callout number,
<b>callout_error</b> takes precedence.
</P>
<P>
Note that callouts with string arguments are always given the number zero. See
"Callouts" below for a description of the output when a callout it taken.
</P>
<P>
The <b>callout_data</b> modifier can be given an unsigned or a negative number.
@ -1180,7 +1193,7 @@ Finding all matches in a string
</b><br>
<P>
Searching for all possible matches within a subject can be requested by the
<b>global</b> or <b>/altglobal</b> modifier. After finding a match, the matching
<b>global</b> or <b>altglobal</b> modifier. After finding a match, the matching
function is called again to search the remainder of the subject. The difference
between <b>global</b> and <b>altglobal</b> is that the former uses the
<i>start_offset</i> argument to <b>pcre2_match()</b> or <b>pcre2_dfa_match()</b>
@ -1480,7 +1493,7 @@ unset substring is shown as "&#60;unset&#62;", as for the second data line.
If the strings contain any non-printing characters, they are output as \xhh
escapes if the value is less than 256 and UTF mode is not set. Otherwise they
are output as \x{hh...} escapes. See below for the definition of non-printing
characters. If the <b>/aftertext</b> modifier is set, the output for substring
characters. If the <b>aftertext</b> modifier is set, the output for substring
0 is followed by the the rest of the subject string, identified by "0+" like
this:
<pre>
@ -1673,7 +1686,7 @@ therefore shown as hex escapes.
<P>
When <b>pcre2test</b> is outputting text that is a matched part of a subject
string, it behaves in the same way, unless a different locale has been set for
the pattern (using the <b>/locale</b> modifier). In this case, the
the pattern (using the <b>locale</b> modifier). In this case, the
<b>isprint()</b> function is used to distinguish printing and non-printing
characters.
<a name="saverestore"></a></P>
@ -1766,7 +1779,7 @@ Cambridge, England.
</P>
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
<P>
Last updated: 04 November 2016
Last updated: 28 December 2016
<br>
Copyright &copy; 1997-2016 University of Cambridge.
<br>

File diff suppressed because it is too large Load Diff

View File

@ -15,7 +15,7 @@ PCRE2 - Perl-compatible regular expressions (revised API)
.sp
This function makes a copy of the memory used for a compiled pattern, excluding
any memory used by the JIT compiler. Without a subsequent call to
\fBpcre2_jit_compile()\fP, the copy can be used only for non-JIT matching.
\fBpcre2_jit_compile()\fP, the copy can be used only for non-JIT matching.
Unlike \fBpcre2_code_copy()\fP, a separate copy of the character tables is also
made, with the new code pointing to it. This memory will be automatically freed
when \fBpcre2_code_free()\fP is called. The yield of the function is NULL if

View File

@ -511,7 +511,7 @@ If JIT is being used, but the JIT compilation is not being done immediately,
(perhaps waiting to see if the pattern is used often enough) similar logic is
required. JIT compilation updates a pointer within the compiled code block, so
a thread must gain unique write access to the pointer before calling
\fBpcre2_jit_compile()\fP. Alternatively, \fBpcre2_code_copy()\fP or
\fBpcre2_jit_compile()\fP. Alternatively, \fBpcre2_code_copy()\fP or
\fBpcre2_code_copy_with_tables()\fP can be used to obtain a private copy of the
compiled code.
.
@ -696,7 +696,7 @@ functions, \fIpcre2_match()\fP and \fIpcre2_dfa_match()\fP.
.sp
This parameter ajusts the limit, set when PCRE2 is built (default 250), on the
depth of parenthesis nesting in a pattern. This limit stops rogue patterns
using up too much system stack when being compiled. The limit applies to
using up too much system stack when being compiled. The limit applies to
parentheses of all kinds, not just capturing parentheses.
.sp
.nf
@ -840,9 +840,9 @@ This limit is of use only if it is set smaller than \fImatch_limit\fP.
Limiting the recursion depth limits the amount of system stack that can be
used, or, when PCRE2 has been compiled to use memory on the heap instead of the
stack, the amount of heap memory that can be used. This limit is not relevant,
and is ignored, when matching is done using JIT compiled code. However, it is
and is ignored, when matching is done using JIT compiled code. However, it is
supported by \fBpcre2_dfa_match()\fP, which uses recursive function calls less
frequently than \fBpcre2_match()\fP, but which can be caused to use a lot of
frequently than \fBpcre2_match()\fP, but which can be caused to use a lot of
stack by a recursive pattern such as /(.)(?1)/ matched to a very long string.
.P
The default value for \fIrecursion_limit\fP can be set when PCRE2 is built; the
@ -1056,11 +1056,11 @@ below),
.\"
the JIT information cannot be copied (because it is position-dependent).
The new copy can initially be used only for non-JIT matching, though it can be
passed to \fBpcre2_jit_compile()\fP if required.
passed to \fBpcre2_jit_compile()\fP if required.
.P
The \fBpcre2_code_copy()\fP function provides a way for individual threads in a
multithreaded application to acquire a private copy of shared compiled code.
However, it does not make a copy of the character tables used by the compiled
multithreaded application to acquire a private copy of shared compiled code.
However, it does not make a copy of the character tables used by the compiled
pattern; the new pattern code points to the same tables as the original code.
(See
.\" HTML <a href="#jitcompiling">
@ -1068,10 +1068,10 @@ pattern; the new pattern code points to the same tables as the original code.
"Locale Support"
.\"
below for details of these character tables.) In many applications the same
tables are used throughout, so this behaviour is appropriate. Nevertheless,
there are occasions when a copy of a compiled pattern and the relevant tables
are needed. The \fBpcre2_code_copy_with_tables()\fP provides this facility.
Copies of both the code and the tables are made, with the new code pointing to
tables are used throughout, so this behaviour is appropriate. Nevertheless,
there are occasions when a copy of a compiled pattern and the relevant tables
are needed. The \fBpcre2_code_copy_with_tables()\fP provides this facility.
Copies of both the code and the tables are made, with the new code pointing to
the new tables. The memory for the new tables is automatically freed when
\fBpcre2_code_free()\fP is called for the new copy of the compiled code.
.P
@ -1114,7 +1114,7 @@ NULL immediately. Otherwise, the variables to which these point are set to an
error code and an offset (number of code units) within the pattern,
respectively, when \fBpcre2_compile()\fP returns NULL because a compilation
error has occurred. The values are not defined when compilation is successful
and \fBpcre2_compile()\fP returns a non-NULL value.
and \fBpcre2_compile()\fP returns a non-NULL value.
.P
The value returned in \fIerroroffset\fP is an indication of where in the
pattern the error occurred. It is not necessarily the furthest point in the

View File

@ -388,15 +388,15 @@ scanning, in order to be able to output "before" and "after" lines when it
finds a match. The starting size of the buffer is controlled by a parameter
whose default value is 20K. The buffer itself is three times this size, but
because of the way it is used for holding "before" lines, the longest line that
is guaranteed to be processable is the parameter size. If a longer line is
is guaranteed to be processable is the parameter size. If a longer line is
encountered, \fBpcre2grep\fP automatically expands the buffer, up to a
specified maximum size, whose default is 1M or the starting size, whichever is
the larger. You can change the default parameter values by adding, for example,
.sp
--with-pcre2grep-bufsize=51200
--with-pcre2grep-max-bufsize=2097152
--with-pcre2grep-max-bufsize=2097152
.sp
to the \fBconfigure\fP command. The caller of \fPpcre2grep\fP can override
to the \fBconfigure\fP command. The caller of \fPpcre2grep\fP can override
these values by using --buffer-size and --max-buffer-size on the command line.
.
.
@ -522,7 +522,7 @@ There is a special option for use by people who want to run fuzzing tests on
PCRE2:
.sp
--enable-fuzz-support
.sp
.sp
At present this applies only to the 8-bit library. If set, it causes an extra
library called libpcre2-fuzzsupport.a to be built, but not installed. This
contains a single function called LLVMFuzzerTestOneInput() whose arguments are

View File

@ -47,7 +47,7 @@ For example, if PCRE2_AUTO_CALLOUT is used with the pattern
.sp
it is processed as if it were
.sp
(?C255)A(?C3)B(?C255)
(?C255)A(?C3)B(?C255)
.sp
Here is a more complicated example:
.sp

View File

@ -118,8 +118,8 @@ an error is given at compile time.
.P
14. Perl used to recognize comments in some places that PCRE2 does not, for
example, between the ( and ? at the start of a subpattern. If the /x modifier
is set, Perl allowed white space between ( and ? though the latest Perls give
an error (for a while it was just deprecated). There may still be some cases
is set, Perl allowed white space between ( and ? though the latest Perls give
an error (for a while it was just deprecated). There may still be some cases
where Perl behaves differently.
.P
15. Perl, when in warning mode, gives warnings for character classes such as
@ -140,7 +140,7 @@ list is with respect to Perl 5.10:
.sp
(a) Although lookbehind assertions in PCRE2 must match fixed length strings,
each alternative branch of a lookbehind assertion can match a different length
of string. Perl requires them all to have the same length.
of string. Perl requires them all to have the same length.
.sp
(b) From PCRE2 10.23, back references to groups of fixed length are supported
in lookbehinds, provided that there is no possibility of referencing a

View File

@ -62,7 +62,7 @@ built, with the default defaults being 20K and 1M respectively. An error occurs
if a line is too long and the buffer can no longer be expanded.
.P
The block of memory that is actually used is three times the "buffer size", to
allow for buffering "before" and "after" lines. If the buffer size is too
allow for buffering "before" and "after" lines. If the buffer size is too
small, fewer than requested "before" and "after" lines may be output.
.P
Patterns can be no longer than 8K or BUFSIZ bytes, whichever is the greater.
@ -146,8 +146,8 @@ Treat binary files as text. This is equivalent to
\fB--binary-files\fP=\fItext\fP.
.TP
\fB-B\fP \fInumber\fP, \fB--before-context=\fP\fInumber\fP
Output up to \fInumber\fP lines of context before each matching line. Fewer
lines are output if the previous match or the start of the file is within
Output up to \fInumber\fP lines of context before each matching line. Fewer
lines are output if the previous match or the start of the file is within
\fInumber\fP lines, or if the processing buffer size has been set too small. If
file names and/or line numbers are being output, a hyphen separator is used
instead of a colon for the context lines. A line containing "--" is output
@ -168,8 +168,8 @@ be of interest and are skipped without causing any output or affecting the
return code.
.TP
\fB--buffer-size=\fP\fInumber\fP
Set the parameter that controls how much memory is obtained at the start of
processing for buffering files that are being scanned. See also
Set the parameter that controls how much memory is obtained at the start of
processing for buffering files that are being scanned. See also
\fB--max-buffer-size\fP below.
.TP
\fB-C\fP \fInumber\fP, \fB--context=\fP\fInumber\fP
@ -185,7 +185,7 @@ exactly the same as the number of lines that would have been output, but if the
suppressed lines than the count (that is, the number of matches).
.sp
If no lines are selected, the number zero is output. If several files are are
being scanned, a count is output for each of them and the \fB-t\fP option can
being scanned, a count is output for each of them and the \fB-t\fP option can
be used to cause a total to be output at the end. However, if the
\fB--files-with-matches\fP option is also used, only those files whose counts
are greater than zero are listed. When \fB-c\fP is used, the \fB-A\fP,
@ -209,14 +209,14 @@ The colour that is used can be specified by setting one of the environment
variables PCRE2GREP_COLOUR, PCRE2GREP_COLOR, PCREGREP_COLOUR, or
PCREGREP_COLOR, which are checked in that order. If none of these are set,
\fBpcre2grep\fP looks for GREP_COLORS or GREP_COLOR (in that order). The value
of the variable should be a string of two numbers, separated by a semicolon,
of the variable should be a string of two numbers, separated by a semicolon,
except in the case of GREP_COLORS, which must start with "ms=" or "mt="
followed by two semicolon-separated colours, terminated by the end of the
string or by a colon. If GREP_COLORS does not start with "ms=" or "mt=" it is
string or by a colon. If GREP_COLORS does not start with "ms=" or "mt=" it is
ignored, and GREP_COLOR is checked.
.sp
If the string obtained from one of the above variables contains any characters
other than semicolon or digits, the setting is ignored and the default colour
other than semicolon or digits, the setting is ignored and the default colour
is used. The string is copied directly into the control string for setting
colour on a terminal, so it is your responsibility to ensure that the values
make sense. If no relevant environment variable is set, the default is "1;31",
@ -311,7 +311,7 @@ specified on the command line using \fB-e\fP may also be present; they are
tested before the file's patterns. However, no other pattern is taken from the
command line; all arguments are treated as the names of paths to be searched.
.TP
\fB--file-list\fP=\fIfilename\fP
\fB--file-list\fP=\fIfilename\fP
Read a list of files and/or directories that are to be scanned from the given
file, one per line. Trailing white space is removed from each line, and blank
lines are ignored. These paths are processed before any that are listed on the
@ -456,8 +456,8 @@ There are no short forms for these options. The default settings are specified
when the PCRE2 library is compiled, with the default default being 10 million.
.TP
\fB--max-buffer-size=\fInumber\fP
This limits the expansion of the processing buffer, whose initial size can be
set by \fB--buffer-size\fP. The maximum buffer size is silently forced to be no
This limits the expansion of the processing buffer, whose initial size can be
set by \fB--buffer-size\fP. The maximum buffer size is silently forced to be no
smaller than the starting buffer size.
.TP
\fB-M\fP, \fB--multiline\fP
@ -547,7 +547,7 @@ for the non-argument case above also apply to this case. If the specified
capturing parentheses do not exist in the pattern, or were not set in the
match, nothing is output unless the file name or line number are being output.
.sp
If this option is given multiple times, multiple substrings are output for each
If this option is given multiple times, multiple substrings are output for each
match, in the order the options are given, and all on one line. For example,
-o3 -o1 -o3 causes the substrings matched by capturing parentheses 3 and 1 and
then 3 again to be output. By default, there is no separator (but see the next
@ -610,7 +610,7 @@ specified by any of the \fB--include\fP or \fB--exclude\fP options.
.TP
\fB-x\fP, \fB--line-regex\fP, \fB--line-regexp\fP
Force the patterns to be anchored (each must start matching at the beginning of
a line) and in addition, require them to match entire lines. In multiline mode
a line) and in addition, require them to match entire lines. In multiline mode
the match may be more than one line. This is equivalent to having \eA and \eZ
characters at the start and end of each alternative top-level branch in every
pattern. This option applies only to the patterns that are matched against the

View File

@ -219,15 +219,25 @@ OPTIONS
possible matches in a line, not just one, in order to colour
them all.
The colour that is used can be specified by setting the envi-
ronment variable PCRE2GREP_COLOUR or PCRE2GREP_COLOR. If nei-
ther of these are set, pcre2grep looks for GREP_COLOUR or
GREP_COLOR. The value of the variable should be a string of
two numbers, separated by a semicolon. They are copied
directly into the control string for setting colour on a ter-
minal, so it is your responsibility to ensure that they make
sense. If neither of the environment variables is set, the
default is "1;31", which gives red.
The colour that is used can be specified by setting one of
the environment variables PCRE2GREP_COLOUR, PCRE2GREP_COLOR,
PCREGREP_COLOUR, or PCREGREP_COLOR, which are checked in that
order. If none of these are set, pcre2grep looks for
GREP_COLORS or GREP_COLOR (in that order). The value of the
variable should be a string of two numbers, separated by a
semicolon, except in the case of GREP_COLORS, which must
start with "ms=" or "mt=" followed by two semicolon-separated
colours, terminated by the end of the string or by a colon.
If GREP_COLORS does not start with "ms=" or "mt=" it is
ignored, and GREP_COLOR is checked.
If the string obtained from one of the above variables con-
tains any characters other than semicolon or digits, the set-
ting is ignored and the default colour is used. The string is
copied directly into the control string for setting colour on
a terminal, so it is your responsibility to ensure that the
values make sense. If no relevant environment variable is
set, the default is "1;31", which gives red.
-D action, --devices=action
If an input path is not a regular file or a directory,
@ -763,12 +773,12 @@ OPTIONS WITH DATA
CALLING EXTERNAL SCRIPTS
On non-Windows systems, pcre2grep has, by default, support for calling
external programs or scripts during matching by making use of PCRE2's
callout facility. However, this support can be disabled when pcre2grep
is built. You can find out whether your binary has support for call-
outs by running it with the --help option. If the support is not
enabled, all callouts in patterns are ignored by pcre2grep.
pcre2grep has, by default, support for calling external programs or
scripts during matching by making use of PCRE2's callout facility. How-
ever, this support can be disabled when pcre2grep is built. You can
find out whether your binary has support for callouts by running it
with the --help option. If the support is not enabled, all callouts in
patterns are ignored by pcre2grep.
A callout in a PCRE2 pattern is of the form (?C<arg>) where the argu-
ment is either a number or a quoted string (see the pcre2callout docu-
@ -852,5 +862,5 @@ AUTHOR
REVISION
Last updated: 31 October 2016
Last updated: 31 December 2016
Copyright (c) 1997-2016 University of Cambridge.

View File

@ -47,8 +47,8 @@ There is no limit to the number of parenthesized subpatterns, but there can be
no more than 65535 capturing subpatterns. There is, however, a limit to the
depth of nesting of parenthesized subpatterns of all kinds. This is imposed in
order to limit the amount of system stack used at compile time. The default
limit can be specified when PCRE2 is built; the default default is 250. An
application can change this limit by calling pcre2_set_parens_nest_limit() to
limit can be specified when PCRE2 is built; the default default is 250. An
application can change this limit by calling pcre2_set_parens_nest_limit() to
set the limit in a compile context.
.P
The maximum length of name for a named subpattern is 32 code units, and the
@ -58,7 +58,7 @@ The maximum length of a name in a (*MARK), (*PRUNE), (*SKIP), or (*THEN) verb
is 255 code units for the 8-bit library and 65535 code units for the 16-bit and
32-bit libraries.
.P
The maximum length of a string argument to a callout is the largest number a
The maximum length of a string argument to a callout is the largest number a
32-bit unsigned integer can hold.
.
.

View File

@ -159,9 +159,9 @@ for it to have any effect. In other words, the pattern writer can lower the
limits set by the programmer, but not raise them. If there is more than one
setting of one of these limits, the lower value is used.
.P
The match limit is used (but in a different way) when JIT is being used, but it
is not relevant, and is ignored, when matching with \fBpcre2_dfa_match()\fP.
However, the recursion limit is relevant for DFA matching, which does use some
The match limit is used (but in a different way) when JIT is being used, but it
is not relevant, and is ignored, when matching with \fBpcre2_dfa_match()\fP.
However, the recursion limit is relevant for DFA matching, which does use some
function recursion, in particular, for recursions within the pattern.
.
.
@ -2096,7 +2096,7 @@ no such problem when named parentheses are used. A back reference to any
subpattern is possible using named parentheses (see below).
.P
Another way of avoiding the ambiguity inherent in the use of digits following a
backslash is to use the \eg escape sequence. This escape must be followed by a
backslash is to use the \eg escape sequence. This escape must be followed by a
signed or unsigned number, optionally enclosed in braces. These examples are
all identical:
.sp
@ -2116,8 +2116,8 @@ Similarly, \eg{-2} would be equivalent to \e1. The use of relative references
can be helpful in long patterns, and also in patterns that are created by
joining together fragments that contain references within themselves.
.P
The sequence \eg{+1} is a reference to the next capturing subpattern. This kind
of forward reference can be useful it patterns that repeat. Perl does not
The sequence \eg{+1} is a reference to the next capturing subpattern. This kind
of forward reference can be useful it patterns that repeat. Perl does not
support the use of + in this way.
.P
A back reference matches whatever actually matched the capturing subpattern in
@ -2227,11 +2227,11 @@ numbering the capturing subpatterns in the whole pattern. However, substring
capturing is carried out only for positive assertions. (Perl sometimes, but not
always, does do capturing in negative assertions.)
.P
WARNING: If a positive assertion containing one or more capturing subpatterns
succeeds, but failure to match later in the pattern causes backtracking over
this assertion, the captures within the assertion are reset only if no higher
numbered captures are already set. This is, unfortunately, a fundamental
limitation of the current implementation; it may get removed in a future
WARNING: If a positive assertion containing one or more capturing subpatterns
succeeds, but failure to match later in the pattern causes backtracking over
this assertion, the captures within the assertion are reset only if no higher
numbered captures are already set. This is, unfortunately, a fundamental
limitation of the current implementation; it may get removed in a future
reworking.
.P
For compatibility with Perl, most assertion subpatterns may be repeated; though
@ -2497,7 +2497,7 @@ This makes the fragment independent of the parentheses in the larger pattern.
.sp
Perl uses the syntax (?(<name>)...) or (?('name')...) to test for a used
subpattern by name. For compatibility with earlier versions of PCRE1, which had
this facility before Perl, the syntax (?(name)...) is also recognized. Note,
this facility before Perl, the syntax (?(name)...) is also recognized. Note,
however, that undelimited names consisting of the letter R followed by digits
are ambiguous (see the following section).
.P
@ -2531,11 +2531,11 @@ If a condition is the string (R), and there is no subpattern with the name R,
the condition is true if matching is currently in a recursion or subroutine
call to the whole pattern or any subpattern. If digits follow the letter R, and
there is no subpattern with that name, the condition is true if the most recent
call is into a subpattern with the given number, which must exist somewhere in
call is into a subpattern with the given number, which must exist somewhere in
the overall pattern. This is a contrived example that is equivalent to a+b:
.sp
((?(R1)a+|(?1)b))
.sp
.sp
However, in both cases, if there is a subpattern with a matching name, the
condition tests for its being set, as described in the section above, instead
of testing for recursion. For example, creating a group with the name R1 by
@ -2545,10 +2545,10 @@ If a name preceded by ampersand follows the letter R, for example:
.sp
(?(R&name)...)
.sp
the condition is true if the most recent recursion is into a subpattern of that
the condition is true if the most recent recursion is into a subpattern of that
name (which must exist within the pattern).
.P
This condition does not check the entire recursion stack. It tests only the
This condition does not check the entire recursion stack. It tests only the
current level. If the name used in a condition of this kind is a duplicate, the
test is applied to all subpatterns of the same name, and is true if any one of
them is the most recent recursion.
@ -3039,19 +3039,19 @@ depending on whether or not a name is present.
By default, for compatibility with Perl, a name is any sequence of characters
that does not include a closing parenthesis. The name is not processed in
any way, and it is not possible to include a closing parenthesis in the name.
This can be changed by setting the PCRE2_ALT_VERBNAMES option, but the result
is no longer Perl-compatible.
This can be changed by setting the PCRE2_ALT_VERBNAMES option, but the result
is no longer Perl-compatible.
.P
When PCRE2_ALT_VERBNAMES is set, backslash processing is applied to verb names
and only an unescaped closing parenthesis terminates the name. However, the
only backslash items that are permitted are \eQ, \eE, and sequences such as
\ex{100} that define character code points. Character type escapes such as \ed
and only an unescaped closing parenthesis terminates the name. However, the
only backslash items that are permitted are \eQ, \eE, and sequences such as
\ex{100} that define character code points. Character type escapes such as \ed
are faulted.
.P
A closing parenthesis can be included in a name either as \e) or between \eQ
and \eE. In addition to backslash processing, if the PCRE2_EXTENDED option is
also set, unescaped whitespace in verb names is skipped, and #-comments are
recognized, exactly as in the rest of the pattern. PCRE2_EXTENDED does not
recognized, exactly as in the rest of the pattern. PCRE2_EXTENDED does not
affect verb names unless PCRE2_ALT_VERBNAMES is also set.
.P
The maximum length of a name is 255 in the 8-bit library and 65535 in the

View File

@ -43,8 +43,8 @@ assertion and "once-only" subpatterns, which are handled like subroutine calls.
Normally, these are never very deep, and the limit on the complexity of
\fBpcre2_dfa_match()\fP is controlled by the amount of workspace it is given.
However, it is possible to write patterns with runaway infinite recursions;
such patterns will cause \fBpcre2_dfa_match()\fP to run out of stack unless a
limit is applied (see below).
such patterns will cause \fBpcre2_dfa_match()\fP to run out of stack unless a
limit is applied (see below).
.P
The comments in the next three sections do not apply to
\fBpcre2_dfa_match()\fP; they are relevant only for \fBpcre2_match()\fP without
@ -151,9 +151,9 @@ different limits.
.SS "Limiting \fBpcre2_dfa_match()\fP's stack usage"
.rs
.sp
The recursion limit, as described above for \fBpcre2_match()\fP, also applies
The recursion limit, as described above for \fBpcre2_match()\fP, also applies
to \fBpcre2_dfa_match()\fP, whose use of recursive function calls for
recursions in the pattern can lead to runaway stack usage. The non-recursive
recursions in the pattern can lead to runaway stack usage. The non-recursive
match limit is not relevant for DFA matching, and is ignored.
.
.

View File

@ -476,7 +476,7 @@ Each top-level branch of a look behind must be of a fixed length.
\eg{n} reference by number
\eg+n relative reference by number (PCRE2 extension)
\eg-n relative reference by number
\eg{+n} relative reference by number (PCRE2 extension)
\eg{+n} relative reference by number (PCRE2 extension)
\eg{-n} relative reference by number
\ek<name> reference by name (Perl)
\ek'name' reference by name (Perl)
@ -523,7 +523,7 @@ Each top-level branch of a look behind must be of a fixed length.
(?(VERSION[>]=n.m) test PCRE2 version
(?(assert) assertion condition
.sp
Note the ambiguity of (?(R) and (?(Rn) which might be named reference
Note the ambiguity of (?(R) and (?(Rn) which might be named reference
conditions or recursion tests. Such a condition is interpreted as a reference
condition if the relevant named group exists.
.

View File

@ -88,7 +88,7 @@ when the \fButf\fP modifier (see
"Setting compilation options"
.\"
below) is set, the pattern and any following subject lines are interpreted as
UTF-8 strings and translated to UTF-16 or UTF-32 as appropriate.
UTF-8 strings and translated to UTF-16 or UTF-32 as appropriate.
.P
For non-UTF testing of wide characters, the \fButf8_input\fP modifier can be
used. This is mutually exclusive with \fButf\fP, and is allowed only in 16-bit
@ -125,7 +125,7 @@ the 32-bit library has been built, this is the default. If the 32-bit library
has not been built, this option causes an error.
.TP 10
\fB-ac\fP
Behave as if each pattern has the \fBauto_callout\fP modifier, that is, insert
Behave as if each pattern has the \fBauto_callout\fP modifier, that is, insert
automatic callouts into every pattern that is compiled.
.TP 10
\fB-b\fP
@ -550,7 +550,7 @@ for a description of their effects.
As well as turning on the PCRE2_UTF option, the \fButf\fP modifier causes all
non-printing characters in output strings to be printed using the \ex{hh...}
notation. Otherwise, those less than 0x100 are output in hex without the curly
brackets. Setting \fButf\fP in 16-bit or 32-bit mode also causes pattern and
brackets. Setting \fButf\fP in 16-bit or 32-bit mode also causes pattern and
subject strings to be translated to UTF-16 or UTF-32, respectively, before
being passed to library functions.
.
@ -584,8 +584,8 @@ about the pattern:
pushcopy push a copy onto the stack
stackguard=<number> test the stackguard feature
tables=[0|1|2] select internal tables
use_length do not zero-terminate the pattern
utf8_input treat input as UTF-8
use_length do not zero-terminate the pattern
utf8_input treat input as UTF-8
.sp
The effects of these modifiers are described in the following sections.
.
@ -669,7 +669,7 @@ default values).
By default, patterns are passed to the compiling functions as zero-terminated
strings. When using the POSIX wrapper API, there is no other option. However,
when using PCRE2's native API, patterns can be passed by length instead of
being zero-terminated. The \fBuse_length\fP modifier causes this to happen.
being zero-terminated. The \fBuse_length\fP modifier causes this to happen.
Using a length happens automatically (whether or not \fBuse_length\fP is set)
when \fBhex\fP is set, because patterns specified in hexadecimal may contain
binary zeros.
@ -696,17 +696,17 @@ Either single or double quotes may be used. There is no way of including
the delimiter within a substring. The \fBhex\fP and \fBexpand\fP modifiers are
mutually exclusive.
.P
The POSIX API cannot be used with patterns specified in hexadecimal because
they may contain binary zeros, which conflicts with \fBregcomp()\fP's
requirement for a zero-terminated string. Such patterns are always passed to
The POSIX API cannot be used with patterns specified in hexadecimal because
they may contain binary zeros, which conflicts with \fBregcomp()\fP's
requirement for a zero-terminated string. Such patterns are always passed to
\fBpcre2_compile()\fP as a string with a length, not as zero-terminated.
.
.
.SS "Specifying wide characters in 16-bit and 32-bit modes"
.rs
.sp
In 16-bit and 32-bit modes, all input is automatically treated as UTF-8 and
translated to UTF-16 or UTF-32 when the \fButf\fP modifier is set. For testing
In 16-bit and 32-bit modes, all input is automatically treated as UTF-8 and
translated to UTF-16 or UTF-32 when the \fButf\fP modifier is set. For testing
the 16-bit and 32-bit libraries in non-UTF mode, the \fButf8_input\fP modifier
can be used. It is mutually exclusive with \fButf\fP. Input lines are
interpreted as UTF-8 as a means of specifying wide characters. More details are
@ -1134,15 +1134,15 @@ does no capturing); it is ignored, with a warning message, if present.
.sp
A callout function is supplied when \fBpcre2test\fP calls the library matching
functions, unless \fBcallout_none\fP is specified. If \fBcallout_capture\fP is
set, the current captured groups are output when a callout occurs. The default
set, the current captured groups are output when a callout occurs. The default
return from the callout function is zero, which allows matching to continue.
.P
The \fBcallout_fail\fP modifier can be given one or two numbers. If there is
only one number, 1 is returned instead of 0 (causing matching to backtrack)
when a callout of that number is reached. If two numbers (<n>:<m>) are given, 1
is returned when callout <n> is reached and there have been at least <m>
callouts. The \fBcallout_error\fP modifier is similar, except that
PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
callouts. The \fBcallout_error\fP modifier is similar, except that
PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
aborted. If both these modifiers are set for the same callout number,
\fBcallout_error\fP takes precedence.
.P

View File

@ -116,7 +116,11 @@ COMMAND LINE OPTIONS
is the default. If the 32-bit library has not been built,
this option causes an error.
-b Behave as if each pattern has the /fullbincode modifier; the
-ac Behave as if each pattern has the auto_callout modifier, that
is, insert automatic callouts into every pattern that is com-
piled.
-b Behave as if each pattern has the fullbincode modifier; the
full internal binary form of the pattern is output after com-
pilation.
@ -174,7 +178,7 @@ COMMAND LINE OPTIONS
-help Output a brief summary these options and then exit.
-i Behave as if each pattern has the /info modifier; information
-i Behave as if each pattern has the info modifier; information
about the compiled pattern is given after compilation.
-jit Behave as if each pattern line has the jit modifier; after
@ -724,7 +728,7 @@ PATTERN MODIFIERS
2 compile JIT code for soft partial matching
4 compile JIT code for hard partial matching
The possible values for the /jit modifier are therefore:
The possible values for the jit modifier are therefore:
0 disable JIT
1 normal matching only
@ -764,24 +768,24 @@ PATTERN MODIFIERS
Setting a locale
The /locale modifier must specify the name of a locale, for example:
The locale modifier must specify the name of a locale, for example:
/pattern/locale=fr_FR
The given locale is set, pcre2_maketables() is called to build a set of
character tables for the locale, and this is then passed to pcre2_com-
pile() when compiling the regular expression. The same tables are used
when matching the following subject lines. The /locale modifier applies
when matching the following subject lines. The locale modifier applies
only to the pattern on which it appears, but can be given in a #pattern
command if a default is needed. Setting a locale and alternate charac-
ter tables are mutually exclusive.
Showing pattern memory
The /memory modifier causes the size in bytes of the memory used to
hold the compiled pattern to be output. This does not include the size
of the pcre2_code block; it is just the actual compiled data. If the
pattern is subsequently passed to the JIT compiler, the size of the JIT
The memory modifier causes the size in bytes of the memory used to hold
the compiled pattern to be output. This does not include the size of
the pcre2_code block; it is just the actual compiled data. If the pat-
tern is subsequently passed to the JIT compiler, the size of the JIT
compiled code is also output. Here is an example:
re> /a(b)c/jit,memory
@ -837,7 +841,7 @@ PATTERN MODIFIERS
Testing the stack guard feature
The /stackguard modifier is used to test the use of pcre2_set_com-
The stackguard modifier is used to test the use of pcre2_set_com-
pile_recursion_guard(), a function that is provided to enable stack
availability to be checked during compilation (see the pcre2api docu-
mentation for details). If the number specified by the modifier is
@ -849,7 +853,7 @@ PATTERN MODIFIERS
Using alternative character tables
The value specified for the /tables modifier must be one of the digits
The value specified for the tables modifier must be one of the digits
0, 1, or 2. It causes a specific set of built-in character tables to be
passed to pcre2_compile(). This is used in the PCRE2 tests to check be-
haviour with different character tables. The digit specifies the tables
@ -931,7 +935,7 @@ SUBJECT MODIFIERS
The partial matching modifiers are provided with abbreviations because
they appear frequently in tests.
If the /posix modifier was present on the pattern, causing the POSIX
If the posix modifier was present on the pattern, causing the POSIX
wrapper API to be used, the only option-setting modifiers that have any
effect are notbol, notempty, and noteol, causing REG_NOTBOL,
REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to regexec().
@ -951,6 +955,7 @@ SUBJECT MODIFIERS
altglobal alternative global matching
callout_capture show captures at callout time
callout_data=<n> set a value to pass via callouts
callout_error=<n>[:<m>] control callout error
callout_fail=<n>[:<m>] control callout failure
callout_none do not supply a callout function
copy=<number or name> copy captured substring
@ -1042,14 +1047,22 @@ SUBJECT MODIFIERS
A callout function is supplied when pcre2test calls the library match-
ing functions, unless callout_none is specified. If callout_capture is
set, the current captured groups are output when a callout occurs.
set, the current captured groups are output when a callout occurs. The
default return from the callout function is zero, which allows matching
to continue.
The callout_fail modifier can be given one or two numbers. If there is
only one number, 1 is returned instead of 0 when a callout of that num-
ber is reached. If two numbers are given, 1 is returned when callout
<n> is reached for the <m>th time. Note that callouts with string argu-
ments are always given the number zero. See "Callouts" below for a
description of the output when a callout it taken.
only one number, 1 is returned instead of 0 (causing matching to back-
track) when a callout of that number is reached. If two numbers
(<n>:<m>) are given, 1 is returned when callout <n> is reached and
there have been at least <m> callouts. The callout_error modifier is
similar, except that PCRE2_ERROR_CALLOUT is returned, causing the
entire matching process to be aborted. If both these modifiers are set
for the same callout number, callout_error takes precedence.
Note that callouts with string arguments are always given the number
zero. See "Callouts" below for a description of the output when a call-
out it taken.
The callout_data modifier can be given an unsigned or a negative num-
ber. This is set as the "user data" that is passed to the matching
@ -1060,7 +1073,7 @@ SUBJECT MODIFIERS
Finding all matches in a string
Searching for all possible matches within a subject can be requested by
the global or /altglobal modifier. After finding a match, the matching
the global or altglobal modifier. After finding a match, the matching
function is called again to search the remainder of the subject. The
difference between global and altglobal is that the former uses the
start_offset argument to pcre2_match() or pcre2_dfa_match() to start
@ -1329,7 +1342,7 @@ DEFAULT OUTPUT FROM pcre2test
If the strings contain any non-printing characters, they are output as
\xhh escapes if the value is less than 256 and UTF mode is not set.
Otherwise they are output as \x{hh...} escapes. See below for the defi-
nition of non-printing characters. If the /aftertext modifier is set,
nition of non-printing characters. If the aftertext modifier is set,
the output for substring 0 is followed by the the rest of the subject
string, identified by "0+" like this:
@ -1512,7 +1525,7 @@ NON-PRINTING CHARACTERS
When pcre2test is outputting text that is a matched part of a subject
string, it behaves in the same way, unless a different locale has been
set for the pattern (using the /locale modifier). In this case, the
set for the pattern (using the locale modifier). In this case, the
isprint() function is used to distinguish printing and non-printing
characters.
@ -1601,5 +1614,5 @@ AUTHOR
REVISION
Last updated: 04 November 2016
Last updated: 28 December 2016
Copyright (c) 1997-2016 University of Cambridge.

View File

@ -78,6 +78,9 @@ sure both macros are undefined; an emulation function will then be used. */
/* Define to 1 if you have the <memory.h> header file. */
/* #undef HAVE_MEMORY_H */
/* Define to 1 if you have the `mkostemp' function. */
/* #undef HAVE_MKOSTEMP */
/* Define if you have POSIX threads libraries and header files. */
/* #undef HAVE_PTHREAD */
@ -90,6 +93,9 @@ sure both macros are undefined; an emulation function will then be used. */
/* Define to 1 if you have the <readline/readline.h> header file. */
/* #undef HAVE_READLINE_READLINE_H */
/* Define to 1 if you have the `secure_getenv' function. */
/* #undef HAVE_SECURE_GETENV */
/* Define to 1 if you have the <stdint.h> header file. */
/* #undef HAVE_STDINT_H */
@ -308,9 +314,40 @@ sure both macros are undefined; an emulation function will then be used. */
/* Define to any value for valgrind support to find invalid memory reads. */
/* #undef SUPPORT_VALGRIND */
/* Enable extensions on AIX 3, Interix. */
#ifndef _ALL_SOURCE
# define _ALL_SOURCE 1
#endif
/* Enable GNU extensions on systems that have them. */
#ifndef _GNU_SOURCE
# define _GNU_SOURCE 1
#endif
/* Enable threading extensions on Solaris. */
#ifndef _POSIX_PTHREAD_SEMANTICS
# define _POSIX_PTHREAD_SEMANTICS 1
#endif
/* Enable extensions on HP NonStop. */
#ifndef _TANDEM_SOURCE
# define _TANDEM_SOURCE 1
#endif
/* Enable general extensions on Solaris. */
#ifndef __EXTENSIONS__
# define __EXTENSIONS__ 1
#endif
/* Version number of package */
#define VERSION "10.23-RC1"
/* Define to 1 if on MINIX. */
/* #undef _MINIX */
/* Define to 2 if the system does not provide POSIX.1 features except with
this defined. */
/* #undef _POSIX_1_SOURCE */
/* Define to 1 if you need to in order for `stat' and other things to work. */
/* #undef _POSIX_SOURCE */
/* Define to empty if `const' does not conform to ANSI C. */
/* #undef const */

View File

@ -44,7 +44,7 @@ POSSIBILITY OF SUCH DAMAGE.
#define PCRE2_MAJOR 10
#define PCRE2_MINOR 23
#define PCRE2_PRERELEASE -RC1
#define PCRE2_DATE 2016-08-01
#define PCRE2_DATE 2017-01-16
/* When an application links to a PCRE DLL in Windows, the symbols that are
imported have to be identified as such. When building PCRE2, the appropriate

View File

@ -44,7 +44,7 @@ POSSIBILITY OF SUCH DAMAGE.
#define PCRE2_MAJOR 10
#define PCRE2_MINOR 23
#define PCRE2_PRERELEASE -RC1
#define PCRE2_DATE 2016-08-01
#define PCRE2_DATE 2017-01-16
/* When an application links to a PCRE DLL in Windows, the symbols that are
imported have to be identified as such. When building PCRE2, the appropriate
@ -465,7 +465,9 @@ PCRE2_EXP_DECL pcre2_code PCRE2_CALL_CONVENTION \
PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \
pcre2_code_free(pcre2_code *); \
PCRE2_EXP_DECL pcre2_code PCRE2_CALL_CONVENTION \
*pcre2_code_copy(const pcre2_code *);
*pcre2_code_copy(const pcre2_code *); \
PCRE2_EXP_DECL pcre2_code PCRE2_CALL_CONVENTION \
*pcre2_code_copy_with_tables(const pcre2_code *);
/* Functions that give information about a compiled pattern. */
@ -629,6 +631,7 @@ pcre2_compile are called by application code. */
#define pcre2_callout_enumerate PCRE2_SUFFIX(pcre2_callout_enumerate_)
#define pcre2_code_copy PCRE2_SUFFIX(pcre2_code_copy_)
#define pcre2_code_copy_with_tables PCRE2_SUFFIX(pcre2_code_copy_with_tables_)
#define pcre2_code_free PCRE2_SUFFIX(pcre2_code_free_)
#define pcre2_compile PCRE2_SUFFIX(pcre2_compile_)
#define pcre2_compile_context_copy PCRE2_SUFFIX(pcre2_compile_context_copy_)

View File

@ -1047,8 +1047,8 @@ but some compilers complain about an unreachable statement. */
/* Replaces single character iterations with their possessive alternatives
if appropriate. This function modifies the compiled opcode! Hitting a
non-existent opcode may indicate a bug in PCRE2, but it can also be caused if a
bad UTF string was compiled with PCRE2_NO_UTF_CHECK. The rec_limit catches
overly complicated or large patterns. In these cases, the check just stops,
bad UTF string was compiled with PCRE2_NO_UTF_CHECK. The rec_limit catches
overly complicated or large patterns. In these cases, the check just stops,
leaving the remainder of the pattern unpossessified.
Arguments:

View File

@ -116,9 +116,11 @@ them will be able to (i.e. assume a 64-bit world). */
/* Function definitions to allow mutual recursion */
#ifdef SUPPORT_UNICODE
static unsigned int
add_list_to_class_internal(uint8_t *, PCRE2_UCHAR **, uint32_t, compile_block *,
const uint32_t *, unsigned int);
add_list_to_class_internal(uint8_t *, PCRE2_UCHAR **, uint32_t,
compile_block *, const uint32_t *, unsigned int);
#endif
static int
compile_regex(uint32_t, PCRE2_UCHAR **, uint32_t **, int *, uint32_t,
@ -4381,16 +4383,14 @@ return n8; /* Number of 8-bit characters */
#ifdef SUPPORT_UNICODE
/*************************************************
* Add a list of characters to a class (internal) *
*************************************************/
/* This function is used for adding a list of case-equivalent characters to a
class, and also for adding a list of horizontal or vertical whitespace. If the
list is in order (which it should be), ranges of characters are detected and
handled appropriately. This function is called (sometimes recursively) only
from within the "add to class" set of functions. The external entry point is
add_list_to_class().
class when in UTF mode. This function is called only from within
add_to_class_internal(), with which it is mutually recursive.
Arguments:
classbits the bit map for characters < 256
@ -4423,6 +4423,7 @@ while (p[0] < NOTACHAR)
}
return n8;
}
#endif
@ -4459,8 +4460,11 @@ return add_to_class_internal(classbits, uchardptr, options, cb, start, end);
* External entry point for add list to class *
*************************************************/
/* This function sets the overall range so that the internal functions can try
to avoid duplication when handling case-independence.
/* This function is used for adding a list of horizontal or vertical whitespace
characters to a class. The list must be in order so that ranges of characters
can be detected and handled appropriately. This function sets the overall range
so that the internal functions can try to avoid duplication when handling
case-independence.
Arguments:
classbits the bit map for characters < 256

View File

@ -3216,7 +3216,7 @@ if (mcontext == NULL)
{
mb->callout = NULL;
mb->memctl = re->memctl;
mb->match_limit_recursion = PRIV(default_match_context).recursion_limit;
mb->match_limit_recursion = PRIV(default_match_context).recursion_limit;
}
else
{
@ -3229,7 +3229,7 @@ else
mb->callout = mcontext->callout;
mb->callout_data = mcontext->callout_data;
mb->memctl = mcontext->memctl;
mb->match_limit_recursion = mcontext->recursion_limit;
mb->match_limit_recursion = mcontext->recursion_limit;
}
if (mb->match_limit_recursion > re->limit_recursion)
mb->match_limit_recursion = re->limit_recursion;

View File

@ -173,9 +173,9 @@ static const unsigned char compile_error_texts[] =
"regular expression is too complicated\0"
"lookbehind assertion is too long\0"
"pattern string is longer than the limit set by the application\0"
"internal error: unknown code in parsed pattern\0"
"internal error: unknown code in parsed pattern\0"
/* 90 */
"internal error: bad code value in parsed_skip()\0"
"internal error: bad code value in parsed_skip()\0"
;
/* Match-time and UTF error texts are in the same format. */

View File

@ -648,7 +648,7 @@ typedef struct pcre2_real_match_data {
#ifndef PCRE2_PCRE2TEST
/* Structures for checking for mutual recursion when scanning compiled or
/* Structures for checking for mutual recursion when scanning compiled or
parsed code. */
typedef struct recurse_check {
@ -702,7 +702,7 @@ typedef struct compile_block {
PCRE2_UCHAR *name_table; /* The name/number table */
PCRE2_SIZE workspace_size; /* Size of workspace */
PCRE2_SIZE small_ref_offset[10]; /* Offsets for \1 to \9 */
PCRE2_SIZE erroroffset; /* Offset of error in pattern */
PCRE2_SIZE erroroffset; /* Offset of error in pattern */
uint16_t names_found; /* Number of entries so far */
uint16_t name_entry_size; /* Size of each entry */
open_capitem *open_caps; /* Chain of open capture items */
@ -711,16 +711,16 @@ typedef struct compile_block {
uint32_t external_options; /* External (initial) options */
uint32_t external_flags; /* External flag bits to be set */
uint32_t bracount; /* Count of capturing parentheses */
uint32_t lastcapture; /* Last capture encountered */
uint32_t *parsed_pattern; /* Parsed pattern buffer */
uint32_t *parsed_pattern_end; /* Parsed pattern should not get here */
uint32_t lastcapture; /* Last capture encountered */
uint32_t *parsed_pattern; /* Parsed pattern buffer */
uint32_t *parsed_pattern_end; /* Parsed pattern should not get here */
uint32_t *groupinfo; /* Group info vector */
uint32_t top_backref; /* Maximum back reference */
uint32_t backref_map; /* Bitmap of low back refs */
uint32_t nltype; /* Newline type */
uint32_t nllen; /* Newline string length */
uint32_t class_range_start; /* Overall class range start */
uint32_t class_range_end; /* Overall class range end */
uint32_t class_range_end; /* Overall class range end */
PCRE2_UCHAR nl[4]; /* Newline string when fixed length */
int max_lookbehind; /* Maximum lookbehind (characters) */
int parens_depth; /* Depth of nested parentheses */

View File

@ -1937,6 +1937,8 @@ static int
fill_buffer(void *handle, int frtype, char *buffer, int length,
BOOL input_line_buffered)
{
(void)frtype; /* Avoid warning when not used */
#ifdef SUPPORT_LIBZ
if (frtype == FR_LIBZ)
return gzread((gzFile)handle, buffer, length);
@ -3015,8 +3017,8 @@ sprintf((char *)buffer, "%s%.*s%s", prefix[popts], patlen, ps, suffix[popts]);
p->compiled = pcre2_compile(buffer, PCRE2_ZERO_TERMINATED, options, &errcode,
&erroffset, compile_context);
/* Handle successful compile. Try JIT-compiling if supported and enabled. We
ignore any JIT compiler errors, relying falling back to interpreting if
/* Handle successful compile. Try JIT-compiling if supported and enabled. We
ignore any JIT compiler errors, relying falling back to interpreting if
anything goes wrong with JIT. */
if (p->compiled != NULL)
@ -3577,10 +3579,10 @@ if (colour_option != NULL && strcmp(colour_option, "never") != 0)
if (cs == NULL) cs = getenv("PCREGREP_COLOR");
if (cs == NULL) cs = parse_grep_colors(getenv("GREP_COLORS"));
if (cs == NULL) cs = getenv("GREP_COLOR");
if (cs != NULL)
if (cs != NULL)
{
if (strspn(cs, ";0123456789") == strlen(cs)) colour_string = cs;
}
}
#ifdef WIN32
init_colour_output();
#endif