More documentation

Philip.Hazel 2014-10-25 15:51:01 +00:00
parent 56805a1246
commit 69530d5b36
3 changed files with 41 additions and 42 deletions


@ -15,9 +15,11 @@ PCRE2 - Perl-compatible regular expressions (revised API)
.sp .sp
After a successful call of \fBpcre2_match()\fP that was passed the match block After a successful call of \fBpcre2_match()\fP that was passed the match block
that is this function's argument, this function returns the code unit offset of that is this function's argument, this function returns the code unit offset of
the character at which the successful match started. This can be different to the character at which the successful match started. For a non-partial match,
the value of \fIovector[0]\fP if the pattern contains the \eK escape sequence. this can be different to the value of \fIovector[0]\fP if the pattern contains
Note, however, that \eK has no effect for a partial match. the \eK escape sequence. After a partial match, however, this value is always
the same as \fIovector[0]\fP because \eK does not affect the result of a
partial match.
.P .P
There is a complete description of the PCRE2 native API in the There is a complete description of the PCRE2 native API in the
.\" HREF .\" HREF


@ -1,4 +1,4 @@
.TH PCRE2API 3 "16 October 2014" "PCRE2 10.00" .TH PCRE2API 3 "25 October 2014" "PCRE2 10.00"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.sp .sp
@ -2069,10 +2069,11 @@ pointer to the zero-terminated name, which is within the compiled pattern.
Otherwise NULL is returned. A (*MARK) name may be available after a failed Otherwise NULL is returned. A (*MARK) name may be available after a failed
match or a partial match, as well as after a successful one. match or a partial match, as well as after a successful one.
.P .P
The offset of the character at which the successful match started is The code unit offset of the character at which a successful match started is
returned by \fBpcre2_get_startchar()\fP. This can be different to the value of returned by \fBpcre2_get_startchar()\fP. For a non-partial match, this can be
\fIovector[0]\fP if the pattern contains the \eK escape sequence. Note, different to the value of \fIovector[0]\fP if the pattern contains the \eK
however, that \eK has no effect for a partial match. escape sequence. After a partial match, however, this value is always the same
as \fIovector[0]\fP because \eK does not affect the result of a partial match.
. .
. .
.\" HTML <a name="errorlist"></a> .\" HTML <a name="errorlist"></a>
@ -2629,6 +2630,6 @@ Cambridge CB2 3QH, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 16 October 2014 Last updated: 25 October 2014
Copyright (c) 1997-2014 University of Cambridge. Copyright (c) 1997-2014 University of Cambridge.
.fi .fi


@ -112,10 +112,10 @@ distribution for a new release.
different configurations, and it also runs some of them with valgrind, all of different configurations, and it also runs some of them with valgrind, all of
which can take quite some time. which can take quite some time.
. Run perltest.pl on the test data for tests 1, 4, and 6. The output . Run perltest.sh on the test data for tests 1 and 4. The output should match
should match the PCRE2 test output, apart from the version identification at the PCRE2 test output, apart from the version identification at the start of
the start of each test. The other tests are not Perl-compatible (they use each test. The other tests are not Perl-compatible (they use various
various PCRE2-specific features or options). PCRE2-specific features or options).
. It is possible to test with the emulated memmove() function by undefining . It is possible to test with the emulated memmove() function by undefining
HAVE_MEMMOVE and HAVE_BCOPY in config.h, though I do not do this often. You HAVE_MEMMOVE and HAVE_BCOPY in config.h, though I do not do this often. You
@ -134,6 +134,9 @@ distribution for a new release.
longer do this, but instead I rely on putting out release candidates for longer do this, but instead I rely on putting out release candidates for
folks on the pcre-dev list to test. folks on the pcre-dev list to test.
. The buildbots at http://buildfarm.opencsw.org/ do some automated testing
of PCRE2 and should be checked before putting out a release.
Updating version info for libtool Updating version info for libtool
================================= =================================
@ -179,8 +182,8 @@ changes in a shared library:
new version. Increment current, set revision and age to 0. new version. Increment current, set revision and age to 0.
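
As a rough illustration of the rule quoted above, here is a minimal C sketch of a current/revision/age update. Only the incompatible-change case (increment current, set revision and age to 0) is visible in this diff; the other two cases follow the general guidance in the libtool manual and are assumptions here. The names lt_version, lt_change, and lt_bump are hypothetical.

```c
/* Hypothetical types for illustration; not part of libtool or PCRE2. */
typedef struct {
    unsigned current;
    unsigned revision;
    unsigned age;
} lt_version;

typedef enum {
    LT_SOURCE_ONLY,        /* code changed, interfaces untouched (assumed rule) */
    LT_INTERFACES_ADDED,   /* backwards-compatible additions (assumed rule) */
    LT_INTERFACES_BROKEN   /* incompatible change: the rule quoted in the text */
} lt_change;

/* Return the version info for the next release, given the kind of change. */
static lt_version lt_bump(lt_version v, lt_change change)
{
    switch (change) {
    case LT_SOURCE_ONLY:
        v.revision++;
        break;
    case LT_INTERFACES_ADDED:
        v.current++;
        v.revision = 0;
        v.age++;
        break;
    case LT_INTERFACES_BROKEN:
        /* Increment current, set revision and age to 0. */
        v.current++;
        v.revision = 0;
        v.age = 0;
        break;
    }
    return v;
}
```

For example, starting from 3:2:1, an incompatible change would give 4:0:0 under this sketch.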
Making a PCRE release Making a PCRE2 release
===================== ======================
Run PrepareRelease and commit the files that it changes (by removing trailing Run PrepareRelease and commit the files that it changes (by removing trailing
spaces). The first thing this script does is to run CheckMan on the man pages; spaces). The first thing this script does is to run CheckMan on the man pages;
@ -193,9 +196,9 @@ copy:
svn copy svn://vcs.exim.org/pcre2/code/trunk \ svn copy svn://vcs.exim.org/pcre2/code/trunk \
svn://vcs.exim.org/pcre2/code/tags/pcre-8.xx svn://vcs.exim.org/pcre2/code/tags/pcre-8.xx
Don't forget to update Freecode (fka Freshmeat) when the new release is out, When the new release is out, don't forget to tell webmaster@pcre.org and the
and to tell webmaster@pcre.org and the mailing list. Also, update the list of mailing list. Also, update the list of version numbers in Bugzilla (edit
version numbers in Bugzilla (edit products). products).
Future ideas (wish list) Future ideas (wish list)
@ -220,8 +223,8 @@ others are relatively new.
to have little effect, and maybe makes things worse. to have little effect, and maybe makes things worse.
* "Ends with literal string" - note that a single character doesn't gain much * "Ends with literal string" - note that a single character doesn't gain much
over the existing "required byte" (reqbyte) feature that just remembers one over the existing "required code unit" feature that just remembers one code
data unit. unit.
* Remember an initial string rather than just 1 code unit? * Remember an initial string rather than just 1 code unit?
@ -245,13 +248,11 @@ others are relatively new.
. Perl 6 will be a revolution. Is it a revolution too far for PCRE? . Perl 6 will be a revolution. Is it a revolution too far for PCRE?
. Allow errorptr and erroroffset to be NULL. I don't like this idea.
. Line endings: . Line endings:
* Option to use NUL as a line terminator in subject strings. This could now * Option to use NUL as a line terminator in subject strings. This could now
be done relatively easily since the extension to support LF, CR, and CRLF. be done relatively easily since the extension to support LF, CR, and CRLF.
If it is done, a suitable option for pcregrep is also required. If it is done, a suitable option for pcre2grep is also required.
. Catch SIGSEGV for stack overflows? . Catch SIGSEGV for stack overflows?
@ -259,32 +260,26 @@ others are relatively new.
. Option to convert results into character offsets and character lengths. . Option to convert results into character offsets and character lengths.
. Option for pcregrep to scan only the start of a file. I am not keen - this is . Option for pcre2grep to scan only the start of a file. I am not keen - this
the job of "head". is the job of "head".
. A (non-Unix) user wanted pcregrep options to (a) list a file name just once, . A (non-Unix) user wanted pcregrep options to (a) list a file name just once,
preceded by a blank line, instead of adding it to every matched line, and (b) preceded by a blank line, instead of adding it to every matched line, and (b)
support --outputfile=name. support --outputfile=name.
. Consider making UTF and UCP the default for PCRE n.0 for some n > 8.
. Define a union for the results from pcre2_pattern_info(). . Define a union for the results from pcre2_pattern_info().
. Provide a "random access to the subject" facility so that the way in which it . Provide a "random access to the subject" facility so that the way in which it
is stored is independent of PCRE. For efficiency, it probably isn't possible is stored is independent of PCRE2. For efficiency, it probably isn't possible
to switch this dynamically. It would have to be specified when PCRE was to switch this dynamically. It would have to be specified when PCRE2 was
compiled. PCRE would then call a function every time it wanted a character. compiled. PCRE2 would then call a function every time it wanted a character.
. Wild thought: the ability to compile from PCRE's internal byte code to a real . Wild thought: the ability to compile from PCRE2's internal code to a real
FSM and a very fast (third) matcher to process the result. There would be FSM and a very fast (third) matcher to process the result. There would be
even more restrictions than for pcre_dfa_exec(), however. This is not easy. even more restrictions than for pcre2_dfa_exec(), however. This is not easy.
This is probably obsolete now that we have the JIT support. This is probably obsolete now that we have the JIT support.
. Should pcretest have some private locale data, to avoid relying on the . pcre2grep: add -rs for a sorted recurse? Having to store file names and sort
available locales for the test data, since different OS have different ideas?
This won't be as thorough a test, but perhaps that doesn't really matter.
. pcregrep: add -rs for a sorted recurse? Having to store file names and sort
them will of course slow it down. them will of course slow it down.
. Someone suggested --disable-callout to save code space when callouts are . Someone suggested --disable-callout to save code space when callouts are
@ -293,13 +288,14 @@ others are relatively new.
. A user suggested a parameter to limit the length of string matched, for . A user suggested a parameter to limit the length of string matched, for
example if the parameter is N, the current match should fail if the matched example if the parameter is N, the current match should fail if the matched
substring exceeds N. This could apply to both match functions. The value substring exceeds N. This could apply to both match functions. The value
could be a new field in the extra block. could be a new field in the match context.
. Callouts with arguments: (?Cn:ARG) for instance. . Callouts with arguments: (?Cn:ARG) for instance.
. Write a function that generates random matching strings for a compiled regex. . Write a function that generates random matching strings for a compiled
pattern.
. Pcregrep: an option to specify the output line separator, either as a string . Pcre2grep: an option to specify the output line separator, either as a string
or select from a fixed list. This is not dead easy, because at the moment it or select from a fixed list. This is not dead easy, because at the moment it
outputs whatever is in the input file. outputs whatever is in the input file.
@ -309,7 +305,7 @@ others are relatively new.
implementation that I tried made things worse in many simple cases, so this implementation that I tried made things worse in many simple cases, so this
is not an obviously good thing. is not an obviously good thing.
. PCRE cannot at present distinguish between subpatterns with different names, . PCRE2 cannot at present distinguish between subpatterns with different names,
but the same number (created by the use of ?|). In order to do so, a way of but the same number (created by the use of ?|). In order to do so, a way of
remembering *which* subpattern numbered n matched is needed. Bugzilla #760. remembering *which* subpattern numbered n matched is needed. Bugzilla #760.
Now that (*MARK) has been implemented, it can perhaps be used as a way round Now that (*MARK) has been implemented, it can perhaps be used as a way round
@ -321,4 +317,4 @@ others are relatively new.
Philip Hazel Philip Hazel
Email local part: ph10 Email local part: ph10
Email domain: cam.ac.uk Email domain: cam.ac.uk
Last updated: 13 May 2014 Last updated: 25 October 2014