Documentation update.

Philip.Hazel 2020-10-05 16:52:39 +00:00
parent 81da2b97e3
commit deffc391ce
2 changed files with 39 additions and 20 deletions

@@ -1,4 +1,4 @@
-.TH PCRE2API 3 "19 March 2020" "PCRE2 10.35"
+.TH PCRE2API 3 "05 October 2020" "PCRE2 10.36"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@@ -1434,10 +1434,13 @@ letters in the subject. It is equivalent to Perl's /i option, and it can be
 changed within a pattern by a (?i) option setting. If either PCRE2_UTF or
 PCRE2_UCP is set, Unicode properties are used for all characters with more than
 one other case, and for all characters whose code points are greater than
-U+007F. For lower valued characters with only one other case, a lookup table is
-used for speed. When neither PCRE2_UTF nor PCRE2_UCP is set, a lookup table is
-used for all code points less than 256, and higher code points (available only
-in 16-bit or 32-bit mode) are treated as not having another case.
+U+007F. Note that there are two ASCII characters, K and S, that, in addition to
+their lower case ASCII equivalents, are case-equivalent with U+212A (Kelvin
+sign) and U+017F (long S) respectively. For lower valued characters with only
+one other case, a lookup table is used for speed. When neither PCRE2_UTF nor
+PCRE2_UCP is set, a lookup table is used for all code points less than 256, and
+higher code points (available only in 16-bit or 32-bit mode) are treated as not
+having another case.
 .sp
 PCRE2_DOLLAR_ENDONLY
 .sp
@@ -3968,6 +3971,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 19 March 2020
+Last updated: 05 October 2020
 Copyright (c) 1997-2020 University of Cambridge.
 .fi
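
The case-equivalences documented in this change come from Unicode's case mappings, so they can be checked outside PCRE2. The following is a minimal sketch in Python, not PCRE2 code. It uses Python's standard `re` module as a stand-in, on the assumption that `re.IGNORECASE` without `re.ASCII` treats these two characters comparably to PCRE2_CASELESS with PCRE2_UTF or PCRE2_UCP set, and that adding `re.ASCII` is roughly analogous to setting neither. The helper name `caseless_match` is hypothetical.

```python
import re

KELVIN_SIGN = "\u212a"  # U+212A KELVIN SIGN
LONG_S = "\u017f"       # U+017F LATIN SMALL LETTER LONG S


def caseless_match(pattern: str, subject: str, unicode_aware: bool) -> bool:
    """Return True if `pattern` matches the whole of `subject`, ignoring case.

    Hypothetical helper: `unicode_aware=False` adds re.ASCII, which limits
    case-insensitive matching to ASCII letters (an assumed analogue of
    leaving both PCRE2_UTF and PCRE2_UCP unset).
    """
    flags = re.IGNORECASE
    if not unicode_aware:
        flags |= re.ASCII
    return re.fullmatch(pattern, subject, flags) is not None


if __name__ == "__main__":
    # Unicode case mappings: Kelvin sign lowercases to "k",
    # long S uppercases to "S".
    print(KELVIN_SIGN.lower() == "k")
    print(LONG_S.upper() == "S")

    # With Unicode-aware caseless matching, the ASCII letters also
    # match their non-ASCII case-equivalents.
    print(caseless_match("k", KELVIN_SIGN, unicode_aware=True))
    print(caseless_match("s", LONG_S, unicode_aware=True))

    # With ASCII-only caseless matching they do not.
    print(caseless_match("k", KELVIN_SIGN, unicode_aware=False))
    print(caseless_match("s", LONG_S, unicode_aware=False))
```

The ordinary ASCII pairs (k/K and s/S) still match in both modes; only the two non-ASCII equivalents depend on Unicode-aware matching.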

@@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "24 February 2020" "PCRE2 10.35"
+.TH PCRE2PATTERN 3 "05 October 2020" "PCRE2 10.35"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@@ -263,8 +263,11 @@ corresponding characters in the subject. As a trivial example, the pattern
 The quick brown fox
 .sp
 matches a portion of a subject string that is identical to itself. When
-caseless matching is specified (the PCRE2_CASELESS option), letters are matched
-independently of case.
+caseless matching is specified (the PCRE2_CASELESS option or (?i) within the
+pattern), letters are matched independently of case. Note that there are two
+ASCII characters, K and S, that, in addition to their lower case ASCII
+equivalents, are case-equivalent with Unicode U+212A (Kelvin sign) and U+017F
+(long S) respectively when either PCRE2_UTF or PCRE2_UCP is set.
 .P
 The power of regular expressions comes from the ability to include wild cards,
 character classes, alternatives, and repetitions in the pattern. These are
@@ -298,6 +301,22 @@ a character class the only metacharacters are:
 [ POSIX character class (if followed by POSIX syntax)
 ] terminates the character class
 .sp
+If a pattern is compiled with the PCRE2_EXTENDED option, most white space in
+the pattern, other than in a character class, and characters between a #
+outside a character class and the next newline, inclusive, are ignored. An
+escaping backslash can be used to include a white space or a # character as
+part of the pattern. If the PCRE2_EXTENDED_MORE option is set, the same
+applies, but in addition unescaped space and horizontal tab characters are
+ignored inside a character class. Note: only these two characters are ignored,
+not the full set of pattern white space characters that are ignored outside a
+character class. Option settings can be changed within a pattern; see the
+section entitled
+.\" HTML <a href="#internaloptions">
+.\" </a>
+"Internal Option Setting"
+.\"
+below.
+.P
 The following sections describe the use of each of the metacharacters.
 .
 .
@@ -315,15 +334,9 @@ would otherwise be interpreted as a metacharacter, so it is always safe to
 precede a non-alphanumeric with backslash to specify that it stands for itself.
 In particular, if you want to match a backslash, you write \e\e.
 .P
-In a UTF mode, only ASCII digits and letters have any special meaning after a
-backslash. All other characters (in particular, those whose code points are
-greater than 127) are treated as literals.
-.P
-If a pattern is compiled with the PCRE2_EXTENDED option, most white space in
-the pattern (other than in a character class), and characters between a #
-outside a character class and the next newline, inclusive, are ignored. An
-escaping backslash can be used to include a white space or # character as part
-of the pattern.
+Only ASCII digits and letters have any special meaning after a backslash. All
+other characters (in particular, those whose code points are greater than 127)
+are treated as literals.
 .P
 If you want to treat all characters in a sequence as literals, you can do so by
 putting them between \eQ and \eE. This is different from Perl in that $ and @
@@ -1436,7 +1449,10 @@ Characters in a class may be specified by their code points using \eo, \ex, or
 \eN{U+hh..} in the usual way. When caseless matching is set, any letters in a
 class represent both their upper case and lower case versions, so for example,
 a caseless [aeiou] matches "A" as well as "a", and a caseless [^aeiou] does not
-match "A", whereas a caseful version would.
+match "A", whereas a caseful version would. Note that there are two ASCII
+characters, K and S, that, in addition to their lower case ASCII equivalents,
+are case-equivalent with Unicode U+212A (Kelvin sign) and U+017F (long S)
+respectively when either PCRE2_UTF or PCRE2_UCP is set.
 .P
 Characters that might indicate line breaks are never treated in any special way
 when matching character classes, whatever line-ending sequence is in use, and
@@ -3881,6 +3897,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 24 February 2020
+Last updated: 05 October 2020
 Copyright (c) 1997-2020 University of Cambridge.
 .fi
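
The PCRE2_EXTENDED rules that this change moves into the metacharacter overview (white space and # comments ignored outside a character class, with a backslash escape to keep them) can be illustrated with a rough analogue. This is a sketch using Python's `re.VERBOSE` flag, which is assumed here to follow the same basic rules. It is not PCRE2 itself, and Python's `re` has no counterpart to PCRE2_EXTENDED_MORE, so that option is not shown.

```python
import re

# In verbose mode, unescaped white space and "#..." comments outside a
# character class are ignored by the pattern compiler.
pattern = re.compile(
    r"""
    \d+     # one or more digits
    [ #]    # inside a class, a space and a # are literal characters
    \#      # an escaped # is a literal #
    \ end   # an escaped space is a literal space, followed by "end"
    """,
    re.VERBOSE,
)

if __name__ == "__main__":
    print(pattern.fullmatch("12 # end") is not None)
    print(pattern.fullmatch("12## end") is not None)
    # The literal space before "end" is required, so this fails to match.
    print(pattern.fullmatch("12 #end") is not None)
```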