1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
//! Layer 2 (part of [`crate::api`]) — the precompiled pipeline presets and the
//! named-policy-profile registry.
use crateError;
// ── Precompiled pipeline presets ──────────────────────────────────────────────
/// Security-focused text canonicalization (homoglyph / bidi / zero-width / control
/// neutralization with a path-safety guarantee).
///
/// Pipeline: NFKC → confusables → strip bidi/format → collapse whitespace →
/// path-separator neutralization. Fallible only through the confusables stage,
/// whose target script is fixed internally, so in practice this never errors;
/// the [`Result`] keeps the surface uniform with the other key/clean presets.
/// ML/NLP text normalization: NFKC → emoji→text → transliterate → strip accents →
/// case fold → collapse whitespace.
///
/// `lang` selects the transliteration table (`None` skips transliteration).
/// `emoji_style` is `"cldr"` (expand emoji to CLDR short names) or `"none"`
/// (leave emoji as-is). Fails ([`ErrorKind::InvalidArgument`](crate::ErrorKind))
/// on an unknown `lang` or an unsupported `emoji_style`.
/// Library catalog deduplication key: NFKC → strip bidi → transliterate →
/// confusables → strip accents → case fold → collapse whitespace.
///
/// `strict_iso9` selects the ISO 9:1995 Cyrillic scheme. Fails
/// ([`ErrorKind::InvalidArgument`](crate::ErrorKind)) on an unknown `lang`.
/// Case/accent/script-insensitive search lookup key (like [`catalog_key`] without
/// confusable folding). Fails ([`ErrorKind::InvalidArgument`](crate::ErrorKind))
/// on an unknown `lang`.
/// Collation sort key (like [`search_key`] but preserves base accented characters
/// for correct ordering). Fails ([`ErrorKind::InvalidArgument`](crate::ErrorKind))
/// on an unknown `lang`.
/// Display-safe cleanup for rendered user content: strip bidi/format → collapse
/// whitespace (also stripping control + zero-width). Infallible.
/// Strip bidirectional override and formatting characters (UAX #9 §3.3.2 plus the
/// soft hyphen and deprecated/interlinear format controls). A composable primitive
/// shared by the security/key presets. Infallible.
/// Normalize user-submitted input — Unicode hygiene that **preserves the original
/// script** (no transliteration): NFKC → strip bidi/zero-width/control →
/// strip zalgo → confusables → collapse whitespace → path-separator
/// neutralization.
///
/// Not an output sanitizer (no HTML/JS/SQL escaping). Fallible only through the
/// fixed-target confusables stage; the [`Result`] keeps the surface uniform.
/// Maximum-strength deobfuscation: NFKC → strip all combining marks → strip bidi →
/// strip zero-width → demojize → confusables → strip accents → collapse
/// whitespace. Preserves case; does not transliterate.
///
/// Fallible only through the fixed-target confusables stage; the [`Result`] keeps
/// the surface uniform.
// ── Named policy profiles ─────────────────────────────────────────────────────
/// Sorted names of the available named policy profiles (the registry that the
/// `get_pipeline` Python entrypoint builds from).
///
/// The stateful pipeline builder itself (`_TextPipeline`) stays binding-only for
/// now — exposing it as a pure crates.io type is deferred (see the module-level
/// `src/pipeline.rs` `Pipeline` core), so this read-only registry view is the
/// pipeline surface Layer 2 exposes. Infallible.