//! Adobe-Arabic-1 / Adobe-Persian-1 CID-to-Unicode mapping.
//!
//! Identity mapping over the Unicode Arabic block (U+0600–U+06FF)
//! for the common case where Persian / Pashto / Urdu fonts
//! (Nazanin, Yagut, Mitra, Lotus) declare
//! `CIDSystemInfo /Registry (Adobe) /Ordering (Persian|Arabic)` but
//! the font's actual CIDs run sequentially through the Arabic
//! Unicode range, so each CID already equals its codepoint.
//!
//! Without this mapping, the engine falls back to Identity-H, which
//! emits CIDs as codepoints in the U+01xx–U+07xx range (Latin
//! Extended-B and neighbouring blocks), i.e. garbage. This mapping
//! at least lands the characters in the correct Arabic block
//! (U+0600–U+06FF), where bidi-aware viewers can shape them.
//!
//! ## PDF spec basis
//!
//! Per `docs/spec/pdf.md` §9.7 "Composite Fonts" and §9.7.5 "CMaps":
//! CID-keyed fonts use a CMap to map character codes to CIDs. To map
//! CIDs to Unicode, a reader takes the registered character
//! collection (`CIDSystemInfo` → `Registry`/`Ordering`/`Supplement`)
//! and looks up the matching UCS2-suffixed CMap, named
//! `Registry-Ordering-UCS2` (e.g. `Adobe-Japan1-UCS2`). The full
//! registered Adobe-Persian-1 / Adobe-Arabic-1 UCS2 tables are not
//! shipped: Adobe deprecated these collections and no longer
//! publishes them (their adobe-type-tools repo ships CJK + Manga
//! only). The identity mapping here is the §9.10.2 "Mapping
//! Character Codes to Unicode Values" fallback step 3: when the
//! CMap chain runs out, a conforming reader emits "the actual
//! character code as the Unicode value." For Persian fonts with
//! sequential Arabic-block CIDs this produces correct output; for
//! fonts with non-sequential CID encodings it produces best-effort
//! output in the correct Unicode block.
//!
//! **Limitations**: this is NOT the official Adobe-Arabic-1-UCS2
//! CMap. It is a heuristic identity mapping that works for the
//! common case where Persian fonts use sequential Arabic-block
//! CIDs. The full official CMap data is no longer publicly
//! distributed by Adobe.
/// Look up Unicode for an Adobe-Arabic-1 / Adobe-Persian-1 CID.
///
/// Stub mapping: returns the Arabic-block Unicode codepoint for CID
/// values in `0x0600..=0x06FF`. Returns `None` otherwise, so the
/// caller falls back to the rest of the existing lookup chain.
///
/// **Why identity mapping**: the official Adobe-Arabic-1 CMap maps
/// CIDs in a specific ordering (e.g. CID 1 = isolated alef,
/// CID 2 = isolated be), but many Persian fonts ship with simpler
/// CIDs that already align with Unicode codepoints. The identity
/// mapping handles those fonts; support for the official CMap is
/// follow-up work.
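///
/// **Sketch**: a minimal illustration of the behaviour described
/// above, not the shipped implementation. The name
/// `arabic_identity_lookup` and the `u16` CID type are assumptions
/// made only for this example.
///
/// ```rust
/// // Hypothetical stand-in for the lookup documented here.
/// fn arabic_identity_lookup(cid: u16) -> Option<char> {
///     match cid {
///         // CIDs inside the Arabic block are used as codepoints as-is.
///         0x0600..=0x06FF => char::from_u32(u32::from(cid)),
///         // Everything else is left to the rest of the lookup chain.
///         _ => None,
///     }
/// }
/// ```
///
/// Under these assumptions, `arabic_identity_lookup(0x0627)` yields
/// `Some('\u{0627}')` (ARABIC LETTER ALEF), while `0x0041` and
/// `0x0700` both yield `None`.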