Fix typos and add clarification to documentation.
parent 9bedb66492
commit 55f982ac0a
@@ -1,4 +1,4 @@
-.TH PCRE2UNICODE 3 "16 October 2015" "PCRE2 10.21"
+.TH PCRE2UNICODE 3 "02 July 2016" "PCRE2 10.22"
 .SH NAME
 PCRE - Perl-compatible regular expressions (revised API)
 .SH "UNICODE AND UTF SUPPORT"
@@ -57,18 +57,21 @@ individual code units.
 In UTF modes, the dot metacharacter matches one UTF character instead of a
 single code unit.
 .P
-The escape sequence \eC can be used to match a single code unit, in a UTF mode,
+The escape sequence \eC can be used to match a single code unit in a UTF mode,
 but its use can lead to some strange effects because it breaks up multi-unit
 characters (see the description of \eC in the
 .\" HREF
 \fBpcre2pattern\fP
 .\"
-documentation). The use of \eC is not supported by the alternative matching
-function \fBpcre2_dfa_match()\fP when in UTF mode. Its use provokes a
-match-time error. The JIT optimization also does not support \eC in UTF mode.
-If JIT optimization is requested for a UTF pattern that contains \eC, it will
-not succeed, and so the matching will be carried out by the normal interpretive
-function.
+documentation).
+.P
+The use of \eC is not supported by the alternative matching function
+\fBpcre2_dfa_match()\fP when in UTF-8 or UTF-16 mode, that is, when a character
+may consist of more than one code unit. The use of \eC in these modes provokes
+a match-time error. Also, the JIT optimization does not support \eC in these
+modes. If JIT optimization is requested for a UTF-8 or UTF-16 pattern that
+contains \eC, it will not succeed, and so when \fBpcre2_match()\fP is called,
+the matching will be carried out by the normal interpretive function.
 .P
 The character escapes \eb, \eB, \ed, \eD, \es, \eS, \ew, and \eW correctly test
 characters of any code value, but, by default, the characters that PCRE2
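The hunk above turns on the fact that a UTF-8 or UTF-16 character may span several code units, so matching one code unit with \eC can stop in the middle of a character. A minimal sketch of that idea for UTF-8 follows. Both function names are hypothetical helpers written for illustration; they are not part of the PCRE2 API and say nothing about how PCRE2 is implemented.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of PCRE2: number of UTF-8 code units
   (bytes) announced by lead byte b, or 0 if b is a continuation byte or
   has no valid lead-byte bit pattern. Classification is by bit pattern
   only, so overlong leads such as 0xC0 are not rejected. */
size_t utf8_char_units(uint8_t b)
{
  if (b < 0x80) return 1;            /* 0xxxxxxx: single code unit */
  if ((b & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
  if ((b & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
  if ((b & 0xF8) == 0xF0) return 4;  /* 11110xxx */
  return 0;                          /* continuation or invalid lead byte */
}

/* Hypothetical helper: returns 1 if offset lies on a character boundary
   of the well-formed UTF-8 string s (length code units), 0 otherwise. */
int utf8_on_boundary(const uint8_t *s, size_t length, size_t offset)
{
  if (offset > length) return 0;
  if (offset == length) return 1;     /* end of string is a boundary */
  return (s[offset] & 0xC0) != 0x80;  /* continuation bytes are 10xxxxxx */
}
```

For the three-byte string 0xC3 0xA9 'x' (U+00E9 followed by "x"), advancing one code unit from offset 0 lands at offset 1, which is not a character boundary. That is the kind of split the documentation text warns about.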
@@ -232,9 +235,9 @@ never occur in a valid UTF-8 string.
 .sp
 The following negative error codes are given for invalid UTF-16 strings:
 .sp
-PCRE_UTF16_ERR1 Missing low surrogate at end of string
-PCRE_UTF16_ERR2 Invalid low surrogate follows high surrogate
-PCRE_UTF16_ERR3 Isolated low surrogate
+PCRE2_UTF16_ERR1 Missing low surrogate at end of string
+PCRE2_UTF16_ERR2 Invalid low surrogate follows high surrogate
+PCRE2_UTF16_ERR3 Isolated low surrogate
 .sp
 .
 .
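The three UTF-16 conditions listed in the hunk above can be sketched as a small stand-alone checker. This is an illustration only: the function name, the enum names, and the numeric values are assumptions made for this sketch, not the constants or the validity-checking code that PCRE2 itself uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes; the values are arbitrary and are NOT the
   values of any PCRE2 constant. */
enum {
  SKETCH_UTF16_OK   = 0,
  SKETCH_UTF16_ERR1 = -1,  /* missing low surrogate at end of string */
  SKETCH_UTF16_ERR2 = -2,  /* invalid low surrogate follows high surrogate */
  SKETCH_UTF16_ERR3 = -3   /* isolated low surrogate */
};

/* Scans length 16-bit code units and reports the first problem found. */
int sketch_check_utf16(const uint16_t *s, size_t length)
{
  for (size_t i = 0; i < length; i++)
    {
    uint16_t c = s[i];
    if ((c & 0xF800) != 0xD800) continue;        /* not a surrogate */
    if (c >= 0xDC00) return SKETCH_UTF16_ERR3;   /* low with no high before it */
    if (i + 1 >= length) return SKETCH_UTF16_ERR1;
    if ((s[i + 1] & 0xFC00) != 0xDC00) return SKETCH_UTF16_ERR2;
    i++;                                         /* skip the low surrogate */
    }
  return SKETCH_UTF16_OK;
}
```

Under these assumptions, the pair 0xD83D 0xDE00 is accepted, a trailing 0xD83D yields the first error, 0xD83D followed by 0x0041 yields the second, and a leading 0xDE00 yields the third.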
@@ -244,8 +247,8 @@ The following negative error codes are given for invalid UTF-16 strings:
 .sp
 The following negative error codes are given for invalid UTF-32 strings:
 .sp
-PCRE_UTF32_ERR1 Surrogate character (range from 0xd800 to 0xdfff)
-PCRE_UTF32_ERR2 Code point is greater than 0x10ffff
+PCRE2_UTF32_ERR1 Surrogate character (range from 0xd800 to 0xdfff)
+PCRE2_UTF32_ERR2 Code point is greater than 0x10ffff
 .sp
 .
 .
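The two UTF-32 conditions in the hunk above reduce to range tests on each code unit, since every UTF-32 character is exactly one unit. A minimal sketch follows, with the same caveat as before: the names and values are hypothetical and do not correspond to PCRE2 constants or internals.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes; values are arbitrary, not PCRE2's. */
enum {
  SKETCH_UTF32_OK   = 0,
  SKETCH_UTF32_ERR1 = -1,  /* surrogate character (0xd800 to 0xdfff) */
  SKETCH_UTF32_ERR2 = -2   /* code point is greater than 0x10ffff */
};

/* Scans length 32-bit code units and reports the first problem found. */
int sketch_check_utf32(const uint32_t *s, size_t length)
{
  for (size_t i = 0; i < length; i++)
    {
    uint32_t c = s[i];
    if (c >= 0xD800u && c <= 0xDFFFu) return SKETCH_UTF32_ERR1;
    if (c > 0x10FFFFu) return SKETCH_UTF32_ERR2;
    }
  return SKETCH_UTF32_OK;
}
```

With these definitions, 0x1F600 and 0x10FFFF pass, 0xD800 and 0xDFFF are reported as surrogates, and 0x110000 is reported as out of range.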
@@ -263,6 +266,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 16 October 2015
-Copyright (c) 1997-2015 University of Cambridge.
+Last updated: 02 July 2016
+Copyright (c) 1997-2016 University of Cambridge.
 .fi