Documentation update.
parent 81da2b97e3
commit deffc391ce
@@ -1,4 +1,4 @@
-.TH PCRE2API 3 "19 March 2020" "PCRE2 10.35"
+.TH PCRE2API 3 "05 October 2020" "PCRE2 10.36"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@@ -1434,10 +1434,13 @@ letters in the subject. It is equivalent to Perl's /i option, and it can be
 changed within a pattern by a (?i) option setting. If either PCRE2_UTF or
 PCRE2_UCP is set, Unicode properties are used for all characters with more than
 one other case, and for all characters whose code points are greater than
-U+007F. For lower valued characters with only one other case, a lookup table is
-used for speed. When neither PCRE2_UTF nor PCRE2_UCP is set, a lookup table is
-used for all code points less than 256, and higher code points (available only
-in 16-bit or 32-bit mode) are treated as not having another case.
+U+007F. Note that there are two ASCII characters, K and S, that, in addition to
+their lower case ASCII equivalents, are case-equivalent with U+212A (Kelvin
+sign) and U+017F (long S) respectively. For lower valued characters with only
+one other case, a lookup table is used for speed. When neither PCRE2_UTF nor
+PCRE2_UCP is set, a lookup table is used for all code points less than 256, and
+higher code points (available only in 16-bit or 32-bit mode) are treated as not
+having another case.
 .sp
 PCRE2_DOLLAR_ENDONLY
 .sp
@@ -3968,6 +3971,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 19 March 2020
+Last updated: 05 October 2020
 Copyright (c) 1997-2020 University of Cambridge.
 .fi
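Reviewer note: the K/S case-equivalence sentence added in the hunk above can be tried out without linking against PCRE2. The sketch below is an analogy only. It uses Python's standard `re` module as a stand-in, on the assumption (documented for Python, not stated anywhere in this commit) that its Unicode-aware `IGNORECASE` matching treats the same two characters as case-equivalent. `re.ASCII` plays roughly the role of compiling with neither PCRE2_UTF nor PCRE2_UCP. The helper name `caseless_match` is hypothetical.

```python
import re

KELVIN = "\u212a"  # KELVIN SIGN, case-equivalent with K and k
LONG_S = "\u017f"  # LATIN SMALL LETTER LONG S, case-equivalent with S and s


def caseless_match(pattern: str, subject: str, unicode_aware: bool = True) -> bool:
    """Return True if the whole subject matches the pattern caselessly.

    unicode_aware=False adds re.ASCII, which restricts case folding to
    ASCII letters (a rough analogue of leaving PCRE2_UTF and PCRE2_UCP unset).
    """
    flags = re.IGNORECASE
    if not unicode_aware:
        flags |= re.ASCII
    return re.fullmatch(pattern, subject, flags) is not None


if __name__ == "__main__":
    # Unicode-aware caseless matching: ASCII letters also match the
    # Kelvin sign and the long S.
    print(caseless_match("k", KELVIN))
    print(caseless_match("S", LONG_S))
    # ASCII-only caseless matching: the extra equivalences disappear.
    print(caseless_match("k", KELVIN, unicode_aware=False))
```

The practical consequence is the same in both libraries: a caseless ASCII pattern can match non-ASCII subject characters once Unicode case rules are in effect, which matters when a pattern is used to validate ASCII-only input.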
@@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "24 February 2020" "PCRE2 10.35"
+.TH PCRE2PATTERN 3 "05 October 2020" "PCRE2 10.35"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@@ -263,8 +263,11 @@ corresponding characters in the subject. As a trivial example, the pattern
 The quick brown fox
 .sp
 matches a portion of a subject string that is identical to itself. When
-caseless matching is specified (the PCRE2_CASELESS option), letters are matched
-independently of case.
+caseless matching is specified (the PCRE2_CASELESS option or (?i) within the
+pattern), letters are matched independently of case. Note that there are two
+ASCII characters, K and S, that, in addition to their lower case ASCII
+equivalents, are case-equivalent with Unicode U+212A (Kelvin sign) and U+017F
+(long S) respectively when either PCRE2_UTF or PCRE2_UCP is set.
 .P
 The power of regular expressions comes from the ability to include wild cards,
 character classes, alternatives, and repetitions in the pattern. These are
@@ -298,6 +301,22 @@ a character class the only metacharacters are:
 [ POSIX character class (if followed by POSIX syntax)
 ] terminates the character class
 .sp
+If a pattern is compiled with the PCRE2_EXTENDED option, most white space in
+the pattern, other than in a character class, and characters between a #
+outside a character class and the next newline, inclusive, are ignored. An
+escaping backslash can be used to include a white space or a # character as
+part of the pattern. If the PCRE2_EXTENDED_MORE option is set, the same
+applies, but in addition unescaped space and horizontal tab characters are
+ignored inside a character class. Note: only these two characters are ignored,
+not the full set of pattern white space characters that are ignored outside a
+character class. Option settings can be changed within a pattern; see the
+section entitled
+.\" HTML <a href="#internaloptions">
+.\" </a>
+"Internal Option Setting"
+.\"
+below.
+.P
 The following sections describe the use of each of the metacharacters.
 .
 .
@@ -315,15 +334,9 @@ would otherwise be interpreted as a metacharacter, so it is always safe to
 precede a non-alphanumeric with backslash to specify that it stands for itself.
 In particular, if you want to match a backslash, you write \e\e.
 .P
-In a UTF mode, only ASCII digits and letters have any special meaning after a
-backslash. All other characters (in particular, those whose code points are
-greater than 127) are treated as literals.
-.P
-If a pattern is compiled with the PCRE2_EXTENDED option, most white space in
-the pattern (other than in a character class), and characters between a #
-outside a character class and the next newline, inclusive, are ignored. An
-escaping backslash can be used to include a white space or # character as part
-of the pattern.
+Only ASCII digits and letters have any special meaning after a backslash. All
+other characters (in particular, those whose code points are greater than 127)
+are treated as literals.
 .P
 If you want to treat all characters in a sequence as literals, you can do so by
 putting them between \eQ and \eE. This is different from Perl in that $ and @
@@ -1436,7 +1449,10 @@ Characters in a class may be specified by their code points using \eo, \ex, or
 \eN{U+hh..} in the usual way. When caseless matching is set, any letters in a
 class represent both their upper case and lower case versions, so for example,
 a caseless [aeiou] matches "A" as well as "a", and a caseless [^aeiou] does not
-match "A", whereas a caseful version would.
+match "A", whereas a caseful version would. Note that there are two ASCII
+characters, K and S, that, in addition to their lower case ASCII equivalents,
+are case-equivalent with Unicode U+212A (Kelvin sign) and U+017F (long S)
+respectively when either PCRE2_UTF or PCRE2_UCP is set.
 .P
 Characters that might indicate line breaks are never treated in any special way
 when matching character classes, whatever line-ending sequence is in use, and
@@ -3881,6 +3897,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 24 February 2020
+Last updated: 05 October 2020
 Copyright (c) 1997-2020 University of Cambridge.
 .fi
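Reviewer note: the relocated PCRE2_EXTENDED paragraph describes three rules: white space outside a character class is ignored, a # outside a class starts a comment that runs to the next newline, and a backslash makes a space or # literal. As a rough illustration, the sketch below uses Python's `re.VERBOSE` flag, which documents the same three rules. This is an assumption-laden stand-in, not PCRE2 itself, and Python has no counterpart to PCRE2_EXTENDED_MORE, so that part of the paragraph is not covered. The names `EXTENDED_PATTERN` and `matches_extended` are hypothetical.

```python
import re

# Each line below ends in a comment that the regex engine discards.
EXTENDED_PATTERN = re.compile(
    r"""
    \d+     # one or more digits; surrounding white space is ignored
    \ \#    # backslash-escaped space and '#': both are literal
    [ #]    # inside a class, space and '#' are kept as class members
    x       # a literal x
    """,
    re.VERBOSE,
)


def matches_extended(subject: str) -> bool:
    """Return True if the whole subject matches EXTENDED_PATTERN."""
    return EXTENDED_PATTERN.fullmatch(subject) is not None


if __name__ == "__main__":
    print(matches_extended("42 # x"))  # digits, space, '#', class member ' ', x
    print(matches_extended("7 ##x"))   # class member is '#' this time
    print(matches_extended("42#x"))    # lacks the escaped literal space
```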