//! Canonical id derivation and filename slugging.
//!
//! Canonicalisation is exact-match only: it normalises surface forms
//! (lowercase emails, strip a leading `@` on handles, strip a leading `#` on
//! topics/hashtags) and assigns a deterministic, stable canonical id of the
//! form `<kind>:<normalised-surface>`. The same surface form always maps to
//! the same id regardless of casing or decoration, so cross-source mentions
//! of the same thing collapse onto one registry file.
//!
//! Fuzzy matching (e.g. `alice-slack` ≡ `Alice-Discord` by soft match) is out
//! of scope here; the mechanical cases are handled cleanly without producing
//! false merges.
use EntityKind;
/// Canonical id form per kind. Deterministic so the same surface always maps
/// to the same id.
///
/// - Email: `email:<lowercased>`
/// - Handle: `handle:<lowercased>` with a leading `@` stripped
/// - Hashtag: `hashtag:<lowercased>` with a leading `#` stripped
/// - Topic: `topic:<lowercased>` with leading `@`/`#` stripped
/// - URL: `url:<trimmed>` with case preserved for path/query exact matching
/// - Other kinds: `<kind>:<lowercased-surface>`
///
/// URLs keep their original case because path and query components are
/// case-significant; every other kind is folded to lowercase so casing never
/// fragments an identity.
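// A minimal sketch of the rules listed above, not the module's actual
// implementation. `SketchKind`, its variants, `prefix`, and
// `canonical_id_sketch` are hypothetical names introduced for illustration;
// the real `EntityKind` is imported from elsewhere and its variants are not
// shown in this file. Trimming whitespace for non-URL kinds and stripping
// repeated leading `@`/`#` on topics are assumptions.

```rust
/// Hypothetical stand-in for the imported `EntityKind`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SketchKind {
    Email,
    Handle,
    Hashtag,
    Topic,
    Url,
    Person,
}

impl SketchKind {
    /// The `<kind>` prefix used in the canonical id (assumed spelling).
    fn prefix(self) -> &'static str {
        match self {
            SketchKind::Email => "email",
            SketchKind::Handle => "handle",
            SketchKind::Hashtag => "hashtag",
            SketchKind::Topic => "topic",
            SketchKind::Url => "url",
            SketchKind::Person => "person",
        }
    }
}

/// Derive `<kind>:<normalised-surface>` from a raw surface form.
pub fn canonical_id_sketch(kind: SketchKind, surface: &str) -> String {
    // Assumption: surrounding whitespace is never significant.
    let trimmed = surface.trim();
    let normalised = match kind {
        // URLs are only trimmed; path and query case is preserved.
        SketchKind::Url => trimmed.to_string(),
        // Handles drop one leading `@`, then fold to lowercase.
        SketchKind::Handle => trimmed.strip_prefix('@').unwrap_or(trimmed).to_lowercase(),
        // Hashtags drop one leading `#`, then fold to lowercase.
        SketchKind::Hashtag => trimmed.strip_prefix('#').unwrap_or(trimmed).to_lowercase(),
        // Topics drop any run of leading `@` or `#` characters.
        SketchKind::Topic => trimmed
            .trim_start_matches(|c: char| c == '@' || c == '#')
            .to_lowercase(),
        // Every other kind is simply lowercased.
        _ => trimmed.to_lowercase(),
    };
    format!("{}:{}", kind.prefix(), normalised)
}
```

// Under these assumptions, `@Alice` and `alice` both map to `handle:alice`,
// while a URL such as `https://Example.com/Path` keeps its casing.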
/// Map a canonical id to a filesystem-safe filename stem.
///
/// `:` is replaced (along with the other Windows-reserved characters and
/// control bytes) so the same on-disk layout works on every platform, even
/// though `:` is legal on Unix. The authoritative id always lives in the
/// file's YAML `id:` field, so the slug is only a content-addressed handle —
/// the parser never reconstructs the id from the filename.
pub