B.3 Supported Unicode General Categories

The following table lists the supported Unicode general categories. These categories can be used with the \p and \P character classes. For details on the character classes, see "B.2 Character Classes".

Table B.3

List of Supported Unicode General Categories

Unicode General Category    Description

Lu                          Letter, Uppercase
Ll                          Letter, Lowercase
Lt                          Letter, Titlecase
Lm                          Letter, Modifier
Lo                          Letter, Other
Mn                          Mark, Nonspacing
Mc                          Mark, Spacing Combining
Me                          Mark, Enclosing
Nd                          Number, Decimal Digit
Nl                          Number, Letter
No                          Number, Other
Pc                          Punctuation, Connector
Pd                          Punctuation, Dash
Ps                          Punctuation, Open
Pe                          Punctuation, Close
Pi                          Punctuation, Initial quote
Pf                          Punctuation, Final quote
Po                          Punctuation, Other
Sm                          Symbol, Math
Sc                          Symbol, Currency
Sk                          Symbol, Modifier
So                          Symbol, Other
Zs                          Separator, Space
Zl                          Separator, Line
Zp                          Separator, Paragraph
Cc                          Other, Control
Cf                          Other, Format
Cs                          Other, Surrogate
Co                          Other, Private Use
Cn                          Other, Not Assigned
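
As an illustration of how individual characters map to the two-letter codes in this table, the following sketch uses Python's standard unicodedata module, which reports the same Unicode general category codes. It is not part of the product described here, and the sample characters and the names samples and categories are chosen only for this example.

```python
import unicodedata

# Sample characters picked for illustration; they do not come from the table above.
samples = ["A", "a", "5", "_", "-", "(", ")", "+", "$", " ", "\n", "\u0301"]

# unicodedata.category() returns the two-letter general category of one character.
categories = {ch: unicodedata.category(ch) for ch in samples}

for ch, code in categories.items():
    # Print the code point rather than the character, since some are invisible.
    print(f"U+{ord(ch):04X} -> {code}")
```

Printing code points instead of the characters themselves keeps the output readable for control characters and combining marks.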

In addition, the following special categories are supported. Each one represents a set of Unicode general categories, as shown in the following table:

Table B.4

List of Sets of Unicode General Categories

Category    Description

C           (All other characters: control, format, surrogate, private use, and unassigned) Cc, Cf, Cs, Co, and Cn.
L           (All letters) Lu, Ll, Lt, Lm, and Lo.
M           (All diacritic marks) Mn, Mc, and Me.
N           (All numbers) Nd, Nl, and No.
P           (All punctuation) Pc, Pd, Ps, Pe, Pi, Pf, and Po.
S           (All symbols) Sm, Sc, Sk, and So.
Z           (All separators) Zs, Zl, and Zp.
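
Each set category is the first letter of the two-letter categories it contains, so membership can be checked with a simple prefix test. The following Python sketch emulates that behavior, assuming the usual convention that \p matches a character inside the named category and \P matches a character outside it. The helper names in_category and not_in_category are hypothetical, and the actual pattern syntax is the one defined in "B.2 Character Classes".

```python
import unicodedata

def in_category(ch: str, name: str) -> bool:
    # Emulates \p for one character: "L" matches Lu, Ll, Lt, Lm, and Lo,
    # while a two-letter name such as "Lu" matches only that category.
    return unicodedata.category(ch).startswith(name)

def not_in_category(ch: str, name: str) -> bool:
    # Emulates \P, the negation of \p.
    return not in_category(ch, name)

text = "Hello, World 42!"

letters = [c for c in text if in_category(c, "L")]          # set category L
uppercase = [c for c in text if in_category(c, "Lu")]       # single category Lu
non_letters = [c for c in text if not_in_category(c, "L")]  # everything outside L

print(uppercase)  # ['H', 'W']
```

The prefix test works only because the one-letter names in the table above are defined as unions of the two-letter categories that share that letter.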