More documentation
parent 56805a1246
commit 69530d5b36
|
@ -15,9 +15,11 @@ PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.sp
|
.sp
|
||||||
After a successful call of \fBpcre2_match()\fP that was passed the match block
|
After a successful call of \fBpcre2_match()\fP that was passed the match block
|
||||||
that is this function's argument, this function returns the code unit offset of
|
that is this function's argument, this function returns the code unit offset of
|
||||||
the character at which the successful match started. This can be different to
|
the character at which the successful match started. For a non-partial match,
|
||||||
the value of \fIovector[0]\fP if the pattern contains the \eK escape sequence.
|
this can be different to the value of \fIovector[0]\fP if the pattern contains
|
||||||
Note, however, that \eK has no effect for a partial match.
|
the \eK escape sequence. After a partial match, however, this value is always
|
||||||
|
the same as \fIovector[0]\fP because \eK does not affect the result of a
|
||||||
|
partial match.
|
||||||
.P
|
.P
|
||||||
There is a complete description of the PCRE2 native API in the
|
There is a complete description of the PCRE2 native API in the
|
||||||
.\" HREF
|
.\" HREF
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2API 3 "16 October 2014" "PCRE2 10.00"
|
.TH PCRE2API 3 "25 October 2014" "PCRE2 10.00"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.sp
|
.sp
|
||||||
|
@ -2069,10 +2069,11 @@ pointer to the zero-terminated name, which is within the compiled pattern.
|
||||||
Otherwise NULL is returned. A (*MARK) name may be available after a failed
|
Otherwise NULL is returned. A (*MARK) name may be available after a failed
|
||||||
match or a partial match, as well as after a successful one.
|
match or a partial match, as well as after a successful one.
|
||||||
.P
|
.P
|
||||||
The offset of the character at which the successful match started is
|
The code unit offset of the character at which a successful match started is
|
||||||
returned by \fBpcre2_get_startchar()\fP. This can be different to the value of
|
returned by \fBpcre2_get_startchar()\fP. For a non-partial match, this can be
|
||||||
\fIovector[0]\fP if the pattern contains the \eK escape sequence. Note,
|
different to the value of \fIovector[0]\fP if the pattern contains the \eK
|
||||||
however, that \eK has no effect for a partial match.
|
escape sequence. After a partial match, however, this value is always the same
|
||||||
|
as \fIovector[0]\fP because \eK does not affect the result of a partial match.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.\" HTML <a name="errorlist"></a>
|
.\" HTML <a name="errorlist"></a>
|
||||||
|
@ -2629,6 +2630,6 @@ Cambridge CB2 3QH, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 16 October 2014
|
Last updated: 25 October 2014
|
||||||
Copyright (c) 1997-2014 University of Cambridge.
|
Copyright (c) 1997-2014 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
62
maint/README
|
@ -112,10 +112,10 @@ distribution for a new release.
|
||||||
different configurations, and it also runs some of them with valgrind, all of
|
different configurations, and it also runs some of them with valgrind, all of
|
||||||
which can take quite some time.
|
which can take quite some time.
|
||||||
|
|
||||||
. Run perltest.pl on the test data for tests 1, 4, and 6. The output
|
. Run perltest.sh on the test data for tests 1 and 4. The output should match
|
||||||
should match the PCRE2 test output, apart from the version identification at
|
the PCRE2 test output, apart from the version identification at the start of
|
||||||
the start of each test. The other tests are not Perl-compatible (they use
|
each test. The other tests are not Perl-compatible (they use various
|
||||||
various PCRE2-specific features or options).
|
PCRE2-specific features or options).
|
||||||
|
|
||||||
. It is possible to test with the emulated memmove() function by undefining
|
. It is possible to test with the emulated memmove() function by undefining
|
||||||
HAVE_MEMMOVE and HAVE_BCOPY in config.h, though I do not do this often. You
|
HAVE_MEMMOVE and HAVE_BCOPY in config.h, though I do not do this often. You
|
||||||
|
@ -134,6 +134,9 @@ distribution for a new release.
|
||||||
longer do this, but instead I rely on putting out release candidates for
|
longer do this, but instead I rely on putting out release candidates for
|
||||||
folks on the pcre-dev list to test.
|
folks on the pcre-dev list to test.
|
||||||
|
|
||||||
|
. The buildbots at http://buildfarm.opencsw.org/ do some automated testing
|
||||||
|
of PCRE2 and should be checked before putting out a release.
|
||||||
|
|
||||||
|
|
||||||
Updating version info for libtool
|
Updating version info for libtool
|
||||||
=================================
|
=================================
|
||||||
|
@ -179,8 +182,8 @@ changes in a shared library:
|
||||||
new version. Increment current, set revision and age to 0.
|
new version. Increment current, set revision and age to 0.
|
||||||
|
|
||||||
|
|
||||||
Making a PCRE release
|
Making a PCRE2 release
|
||||||
=====================
|
======================
|
||||||
|
|
||||||
Run PrepareRelease and commit the files that it changes (by removing trailing
|
Run PrepareRelease and commit the files that it changes (by removing trailing
|
||||||
spaces). The first thing this script does is to run CheckMan on the man pages;
|
spaces). The first thing this script does is to run CheckMan on the man pages;
|
||||||
|
@ -193,9 +196,9 @@ copy:
|
||||||
svn copy svn://vcs.exim.org/pcre2/code/trunk \
|
svn copy svn://vcs.exim.org/pcre2/code/trunk \
|
||||||
svn://vcs.exim.org/pcre2/code/tags/pcre-8.xx
|
svn://vcs.exim.org/pcre2/code/tags/pcre-8.xx
|
||||||
|
|
||||||
Don't forget to update Freecode (fka Freshmeat) when the new release is out,
|
When the new release is out, don't forget to tell webmaster@pcre.org and the
|
||||||
and to tell webmaster@pcre.org and the mailing list. Also, update the list of
|
mailing list. Also, update the list of version numbers in Bugzilla (edit
|
||||||
version numbers in Bugzilla (edit products).
|
products).
|
||||||
|
|
||||||
|
|
||||||
Future ideas (wish list)
|
Future ideas (wish list)
|
||||||
|
@ -220,8 +223,8 @@ others are relatively new.
|
||||||
to have little effect, and maybe makes things worse.
|
to have little effect, and maybe makes things worse.
|
||||||
|
|
||||||
* "Ends with literal string" - note that a single character doesn't gain much
|
* "Ends with literal string" - note that a single character doesn't gain much
|
||||||
over the existing "required byte" (reqbyte) feature that just remembers one
|
over the existing "required code unit" feature that just remembers one code
|
||||||
data unit.
|
unit.
|
||||||
|
|
||||||
* Remember an initial string rather than just 1 code unit?
|
* Remember an initial string rather than just 1 code unit?
|
||||||
|
|
||||||
|
@ -245,13 +248,11 @@ others are relatively new.
|
||||||
|
|
||||||
. Perl 6 will be a revolution. Is it a revolution too far for PCRE?
|
. Perl 6 will be a revolution. Is it a revolution too far for PCRE?
|
||||||
|
|
||||||
. Allow errorptr and erroroffset to be NULL. I don't like this idea.
|
|
||||||
|
|
||||||
. Line endings:
|
. Line endings:
|
||||||
|
|
||||||
* Option to use NUL as a line terminator in subject strings. This could now
|
* Option to use NUL as a line terminator in subject strings. This could now
|
||||||
be done relatively easily since the extension to support LF, CR, and CRLF.
|
be done relatively easily since the extension to support LF, CR, and CRLF.
|
||||||
If it is done, a suitable option for pcregrep is also required.
|
If it is done, a suitable option for pcre2grep is also required.
|
||||||
|
|
||||||
. Catch SIGSEGV for stack overflows?
|
. Catch SIGSEGV for stack overflows?
|
||||||
|
|
||||||
|
@ -259,32 +260,26 @@ others are relatively new.
|
||||||
|
|
||||||
. Option to convert results into character offsets and character lengths.
|
. Option to convert results into character offsets and character lengths.
|
||||||
|
|
||||||
. Option for pcregrep to scan only the start of a file. I am not keen - this is
|
. Option for pcre2grep to scan only the start of a file. I am not keen - this
|
||||||
the job of "head".
|
is the job of "head".
|
||||||
|
|
||||||
. A (non-Unix) user wanted pcregrep options to (a) list a file name just once,
|
. A (non-Unix) user wanted pcregrep options to (a) list a file name just once,
|
||||||
preceded by a blank line, instead of adding it to every matched line, and (b)
|
preceded by a blank line, instead of adding it to every matched line, and (b)
|
||||||
support --outputfile=name.
|
support --outputfile=name.
|
||||||
|
|
||||||
. Consider making UTF and UCP the default for PCRE n.0 for some n > 8.
|
|
||||||
|
|
||||||
. Define a union for the results from pcre2_pattern_info().
|
. Define a union for the results from pcre2_pattern_info().
|
||||||
|
|
||||||
. Provide a "random access to the subject" facility so that the way in which it
|
. Provide a "random access to the subject" facility so that the way in which it
|
||||||
is stored is independent of PCRE. For efficiency, it probably isn't possible
|
is stored is independent of PCRE2. For efficiency, it probably isn't possible
|
||||||
to switch this dynamically. It would have to be specified when PCRE was
|
to switch this dynamically. It would have to be specified when PCRE2 was
|
||||||
compiled. PCRE would then call a function every time it wanted a character.
|
compiled. PCRE2 would then call a function every time it wanted a character.
|
||||||
|
|
||||||
. Wild thought: the ability to compile from PCRE's internal byte code to a real
|
. Wild thought: the ability to compile from PCRE2's internal code to a real
|
||||||
FSM and a very fast (third) matcher to process the result. There would be
|
FSM and a very fast (third) matcher to process the result. There would be
|
||||||
even more restrictions than for pcre_dfa_exec(), however. This is not easy.
|
even more restrictions than for pcre2_dfa_exec(), however. This is not easy.
|
||||||
This is probably obsolete now that we have the JIT support.
|
This is probably obsolete now that we have the JIT support.
|
||||||
|
|
||||||
. Should pcretest have some private locale data, to avoid relying on the
|
. pcre2grep: add -rs for a sorted recurse? Having to store file names and sort
|
||||||
available locales for the test data, since different OS have different ideas?
|
|
||||||
This won't be as thorough a test, but perhaps that doesn't really matter.
|
|
||||||
|
|
||||||
. pcregrep: add -rs for a sorted recurse? Having to store file names and sort
|
|
||||||
them will of course slow it down.
|
them will of course slow it down.
|
||||||
|
|
||||||
. Someone suggested --disable-callout to save code space when callouts are
|
. Someone suggested --disable-callout to save code space when callouts are
|
||||||
|
@ -293,13 +288,14 @@ others are relatively new.
|
||||||
. A user suggested a parameter to limit the length of string matched, for
|
. A user suggested a parameter to limit the length of string matched, for
|
||||||
example if the parameter is N, the current match should fail if the matched
|
example if the parameter is N, the current match should fail if the matched
|
||||||
substring exceeds N. This could apply to both match functions. The value
|
substring exceeds N. This could apply to both match functions. The value
|
||||||
could be a new field in the extra block.
|
could be a new field in the match context.
|
||||||
|
|
||||||
. Callouts with arguments: (?Cn:ARG) for instance.
|
. Callouts with arguments: (?Cn:ARG) for instance.
|
||||||
|
|
||||||
. Write a function that generates random matching strings for a compiled regex.
|
. Write a function that generates random matching strings for a compiled
|
||||||
|
pattern.
|
||||||
|
|
||||||
. Pcregrep: an option to specify the output line separator, either as a string
|
. Pcre2grep: an option to specify the output line separator, either as a string
|
||||||
or select from a fixed list. This is not dead easy, because at the moment it
|
or select from a fixed list. This is not dead easy, because at the moment it
|
||||||
outputs whatever is in the input file.
|
outputs whatever is in the input file.
|
||||||
|
|
||||||
|
@ -309,7 +305,7 @@ others are relatively new.
|
||||||
implementation that I tried made things worse in many simple cases, so this
|
implementation that I tried made things worse in many simple cases, so this
|
||||||
is not an obviously good thing.
|
is not an obviously good thing.
|
||||||
|
|
||||||
. PCRE cannot at present distinguish between subpatterns with different names,
|
. PCRE2 cannot at present distinguish between subpatterns with different names,
|
||||||
but the same number (created by the use of ?|). In order to do so, a way of
|
but the same number (created by the use of ?|). In order to do so, a way of
|
||||||
remembering *which* subpattern numbered n matched is needed. Bugzilla #760.
|
remembering *which* subpattern numbered n matched is needed. Bugzilla #760.
|
||||||
Now that (*MARK) has been implemented, it can perhaps be used as a way round
|
Now that (*MARK) has been implemented, it can perhaps be used as a way round
|
||||||
|
@ -321,4 +317,4 @@ others are relatively new.
|
||||||
Philip Hazel
|
Philip Hazel
|
||||||
Email local part: ph10
|
Email local part: ph10
|
||||||
Email domain: cam.ac.uk
|
Email domain: cam.ac.uk
|
||||||
Last updated: 13 May 2014
|
Last updated: 25 October 2014
|
||||||
|
|