disarm 0.10.0

Unicode canonicalization and TR39 confusable analysis: building blocks for text-security pipelines (homoglyph/bidi/zalgo handling) plus standards-based transliteration
# Persian (Farsi) transliteration overrides
# Based on BGN/PCGN 1958 romanization system (ASCII output)
#
# Persian shares the Arabic script but has 4 extra letters (پ چ ژ گ),
# different vowel pronunciation, and distinct romanization conventions.
# The default table already provides base Arabic mappings; this table
# overrides characters where Persian pronunciation differs from Arabic.
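#
# A minimal sketch of the override layering described above, in Python.
# Assumptions: DEFAULT_ARABIC holds illustrative Arabic values, not the
# crate's actual default table, and lookup() is a hypothetical helper,
# not part of the disarm API. The PERSIAN rows are copied from this table.

```python
# Illustrative Arabic defaults (assumed values, not disarm's real table).
DEFAULT_ARABIC = {0x0628: "b", 0x062B: "th", 0x0636: "d", 0x0648: "w"}

# Persian overrides, copied from rows in this table.
PERSIAN = {0x062B: "s", 0x0636: "z", 0x0648: "v", 0x067E: "p"}


def lookup(cp, overrides=PERSIAN, default=DEFAULT_ARABIC):
    """Return the override for code point cp, else the default, else None."""
    # Test membership instead of truthiness so that an override mapping to
    # the empty string (as the sukun row does) is not mistaken for "missing".
    if cp in overrides:
        return overrides[cp]
    return default.get(cp)


print(lookup(0x0648))  # resolved by the Persian override
print(lookup(0x0628))  # falls through to the default table
```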

# === Persian consonants (BGN/PCGN): 31 letters; alef is listed with the vowels ===
0628	b
067E	p
062A	t
062B	s
062C	j
0686	ch
062D	h
062E	kh
062F	d
0630	z
0631	r
0632	z
0698	zh
0633	s
0634	sh
0635	s
0636	z
0637	t
0638	z
0639	'
063A	gh
0641	f
0642	q
06A9	k
06AF	g
0644	l
0645	m
0646	n
0648	v
0647	h
06CC	y
# Arabic kaf fallback (Persian normally uses U+06A9)
0643	k

# === Vowels & special characters ===
# Alef
0627	a
# Alef-madda — long "a", not doubled
0622	a
# Hamza
0621	'
# Alef with hamza above
0623	a
# Alef with hamza below
0625	e
# Waw with hamza — glottal stop, not consonantal waw
0624	'
# Yeh with hamza — glottal stop, not consonantal yeh
0626	'
# Taa marbuta
0629	e
# Alef maqsura
0649	a
# Arabic yeh fallback (Persian normally uses U+06CC)
064A	y
# Heh with yeh above — izafe
06C0	-e
# Heh goal
06C1	h

# === Diacritics (harakat) ===
# Fathah
064E	a
# Kasra
0650	e
# Damma
064F	o
# Sukun (marks the absence of a vowel): empty mapping, nothing is emitted
0652
# Shadda (gemination mark): empty mapping, so this table emits nothing for it
0651
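# The sukun and shadda rows above carry a code point but no replacement.
# A minimal Python sketch of a reader for this row format follows. It
# assumes that a missing replacement means "map to the empty string";
# parse_table() is a hypothetical helper, not the crate's actual loader.

```python
def parse_table(text):
    """Parse 'HEX<TAB>replacement' rows; skip blank lines and '#' comments."""
    table = {}
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        # partition() yields an empty value when the tab or the text after
        # it is absent, which covers both "0652" and "0651<TAB>".
        key, _, value = line.partition("\t")
        table[int(key.strip(), 16)] = value
    return table


sample = "# Kasra\n0650\te\n# Sukun\n0652\n# Shadda\n0651\t\n"
table = parse_table(sample)
print(table)
```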

# === Persian (Eastern Arabic-Indic) digits ===
06F0	0
06F1	1
06F2	2
06F3	3
06F4	4
06F5	5
06F6	6
06F7	7
06F8	8
06F9	9
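# The ten digit rows above agree with the Unicode decimal-digit property,
# so they can be cross-checked or regenerated. A short Python sketch using
# only the standard library (the name "rows" is illustrative):

```python
import unicodedata

# Rebuild the digit rows from the Unicode decimal-digit property.
rows = {cp: str(unicodedata.decimal(chr(cp))) for cp in range(0x06F0, 0x06FA)}
for cp, digit in rows.items():
    print(f"{cp:04X}\t{digit}")

# Python's int() also accepts these digits directly.
value = int("\u06F1\u06F2\u06F3")
print(value)
```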

# === Common Persian punctuation ===
060C	,
061B	;
061F	?
06D4	.
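
# A minimal Python sketch of applying rows from this table to text.
# Assumptions: TABLE is a small subset copied from the rows above, and
# transliterate() is a hypothetical helper that passes unmapped characters
# through unchanged; a real pipeline would consult the default table instead.

```python
# Subset of rows copied from this table.
TABLE = {
    0x06A9: "k", 0x062A: "t", 0x0627: "a", 0x0628: "b",
    0x0650: "e", 0x0652: "",
    0x06F1: "1", 0x06F2: "2", 0x06F3: "3", 0x061F: "?",
}


def transliterate(text, table=TABLE):
    """Replace each mapped character; leave unmapped characters as they are."""
    return "".join(table.get(ord(ch), ch) for ch in text)


unvocalized = "\u06A9\u062A\u0627\u0628"      # kaf, teh, alef, beh
vocalized = "\u06A9\u0650\u062A\u0627\u0628"  # same letters, kasra after kaf

print(transliterate(unvocalized))             # ktab
print(transliterate(vocalized))               # ketab
print(transliterate("\u06F1\u06F2\u06F3\u061F"))  # 123?
```

# Short vowels appear in the output only when the input carries harakat,
# which is why the unvocalized spelling yields "ktab" rather than "ketab".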