B.3 Supported Unicode General Categories
The following table describes the supported Unicode general categories. These categories can be used with the \p and \P character classes. See "B.2 Character Classes" for details on the character classes.
Table B.3 List of Supported Unicode General Categories
| Category | Description |
| Lu | Letter, Uppercase |
| Ll | Letter, Lowercase |
| Lt | Letter, Titlecase |
| Lm | Letter, Modifier |
| Lo | Letter, Other |
| Mn | Mark, Nonspacing |
| Mc | Mark, Spacing Combining |
| Me | Mark, Enclosing |
| Nd | Number, Decimal Digit |
| Nl | Number, Letter |
| No | Number, Other |
| Pc | Punctuation, Connector |
| Pd | Punctuation, Dash |
| Ps | Punctuation, Open |
| Pe | Punctuation, Close |
| Pi | Punctuation, Initial quote |
| Pf | Punctuation, Final quote |
| Po | Punctuation, Other |
| Sm | Symbol, Math |
| Sc | Symbol, Currency |
| Sk | Symbol, Modifier |
| So | Symbol, Other |
| Zs | Separator, Space |
| Zl | Separator, Line |
| Zp | Separator, Paragraph |
| Cc | Other, Control |
| Cf | Other, Format |
| Cs | Other, Surrogate |
| Co | Other, Private Use |
| Cn | Other, Not Assigned |
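To see which general category a given character falls into, the two-letter codes can be looked up programmatically. The following is a minimal sketch in Python, not the regular expression engine described in this appendix. It assumes Python's standard `unicodedata` module, whose `category()` function returns the same two-letter codes; the helper name `general_category` and the sample characters are illustrative only.

```python
import unicodedata


def general_category(ch: str) -> str:
    """Return the two-letter Unicode general category of one character."""
    return unicodedata.category(ch)


# Illustrative sample characters, chosen to cover several categories.
SAMPLES = ["A", "a", "\u01c5", "\u0301", "5", "_", "-", "(", ")", "!", "+", "$", " ", "\n"]

if __name__ == "__main__":
    for ch in SAMPLES:
        # Print the code point and its category, for example "U+0041 Lu".
        print(f"U+{ord(ch):04X} {general_category(ch)}")
```

For example, `general_category("A")` returns `Lu` and `general_category("$")` returns `Sc`.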
Additional special categories are also supported. Each one represents a set of the Unicode general categories listed above, as shown in the following table:
Table B.4 List of Sets of Unicode Character Categories
| Special Category | Categories Included |
| C | (All other characters, including control characters) Cc, Cf, Cs, Co, and Cn. |
| L | (All letters) Lu, Ll, Lt, Lm, and Lo. |
| M | (All diacritic marks) Mn, Mc, and Me. |
| N | (All numbers) Nd, Nl, and No. |
| P | (All punctuation) Pc, Pd, Ps, Pe, Pi, Pf, and Po. |
| S | (All symbols) Sm, Sc, Sk, and So. |
| Z | (All separators) Zs, Zl, and Zp. |
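Each one-letter set is the first letter of its member categories, so membership in either a set or a single category can be tested with a prefix check. The sketch below illustrates the intended meaning of \p (character is in the category) and \P (character is not in the category). It is an emulation written for illustration, not the engine's implementation: Python's built-in `re` module does not support \p and \P, so the hypothetical helpers `matches_p` and `matches_not_p` use the standard `unicodedata` module instead.

```python
import unicodedata


def matches_p(ch: str, name: str) -> bool:
    # Emulates the \p semantics: true when the character's general
    # category is `name` (for example "Lu") or belongs to the
    # one-letter set `name` (for example "L").
    return unicodedata.category(ch).startswith(name)


def matches_not_p(ch: str, name: str) -> bool:
    # Emulates the \P semantics: the negation of matches_p.
    return not matches_p(ch, name)


if __name__ == "__main__":
    text = "Ab1 $"
    print([c for c in text if matches_p(c, "L")])       # letters: ['A', 'b']
    print([c for c in text if matches_p(c, "Lu")])      # uppercase only: ['A']
    print([c for c in text if matches_not_p(c, "L")])   # non-letters: ['1', ' ', '$']
```

Using a prefix check keeps one helper for both tables: a two-letter name selects a single category, and a one-letter name selects the whole set.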