Per-URL-component normalization: percent-decode only unreserved characters
(RFC 3986 §2.3: A-Z, a-z, 0-9, -, ., _, ~). Reserved
characters stay percent-encoded so that downstream rules see the same
shape regardless of encoding variation.
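
A minimal sketch of this rule, not the module's actual code. The names
`is_unreserved`, `hex_val`, and `normalize_component` are hypothetical.
Uppercasing the hex digits of escapes that stay encoded is an added
assumption (RFC 3986 §6.2.2.1 recommends it); the text above does not
state it.

```rust
/// True for RFC 3986 §2.3 unreserved bytes: ALPHA / DIGIT / "-" / "." / "_" / "~".
fn is_unreserved(b: u8) -> bool {
    b.is_ascii_alphanumeric() || matches!(b, b'-' | b'.' | b'_' | b'~')
}

/// Value of a single ASCII hex digit, or None if the byte is not one.
fn hex_val(b: u8) -> Option<u8> {
    (b as char).to_digit(16).map(|d| d as u8)
}

/// Hypothetical helper: decode `%XX` only when it encodes an unreserved byte.
/// Every other escape stays percent-encoded (hex uppercased, an assumption).
fn normalize_component(input: &str) -> String {
    let bytes = input.as_bytes();
    let mut out: Vec<u8> = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            if let (Some(hi), Some(lo)) = (hex_val(bytes[i + 1]), hex_val(bytes[i + 2])) {
                let decoded = hi * 16 + lo;
                if is_unreserved(decoded) {
                    // Safe to decode: the byte has no special meaning in a URL.
                    out.push(decoded);
                } else {
                    // Reserved or other byte: keep the escape as an escape.
                    out.push(b'%');
                    out.push(bytes[i + 1].to_ascii_uppercase());
                    out.push(bytes[i + 2].to_ascii_uppercase());
                }
                i += 3;
                continue;
            }
        }
        // Not a well-formed escape: copy the byte through unchanged.
        out.push(bytes[i]);
        i += 1;
    }
    // Only ASCII bytes were rewritten, so the buffer is still valid UTF-8.
    String::from_utf8(out).expect("ASCII-only rewrites preserve UTF-8 validity")
}
```

Under these assumptions, `normalize_component("%41%2Fb%7e")` yields
`A%2Fb~`: `%41` and `%7e` decode because `A` and `~` are unreserved, while
`%2F` stays encoded because `/` is reserved. The sketch makes a single pass,
so `%25` is never decoded and an input such as `%2541` cannot be
double-decoded.
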
Text-level confusable-character lookup using data embedded by build.rs.
Separate from confusables.rs, which is used for hostname skeleton matching.
This table covers the Mathematical Alphanumeric Symbols block
(U+1D400–U+1D7FF), which is used in steganographic text attacks.
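
A rough sketch of what such a lookup could look like. The real table is
generated by build.rs and is not shown here; the hand-written `RANGES`
subset and the names `math_confusable` and `fold_math_alnum` are
hypothetical. Only the bold letters and bold digits are included, because
they map to ASCII by a simple offset.

```rust
/// Hypothetical subset: (first code point, last code point, ASCII base).
const RANGES: &[(u32, u32, u8)] = &[
    (0x1D400, 0x1D419, b'A'), // MATHEMATICAL BOLD CAPITAL A..Z
    (0x1D41A, 0x1D433, b'a'), // MATHEMATICAL BOLD SMALL A..Z
    (0x1D7CE, 0x1D7D7, b'0'), // MATHEMATICAL BOLD DIGIT ZERO..NINE
];

/// Maps one Mathematical Alphanumeric Symbol to its ASCII look-alike,
/// or returns None when the character is not in the (partial) table.
fn math_confusable(c: char) -> Option<char> {
    let cp = c as u32;
    // Cheap block check before scanning the ranges.
    if !(0x1D400..=0x1D7FF).contains(&cp) {
        return None;
    }
    RANGES
        .iter()
        .find(|&&(lo, hi, _)| (lo..=hi).contains(&cp))
        .map(|&(lo, _, base)| (base + (cp - lo) as u8) as char)
}

/// Replaces every mapped character in `text`; all others pass through.
fn fold_math_alnum(text: &str) -> String {
    text.chars()
        .map(|c| math_confusable(c).unwrap_or(c))
        .collect()
}
```

With this subset, `fold_math_alnum` turns the bold string
U+1D407 U+1D41E U+1D425 U+1D425 U+1D428 into `Hello`. Characters elsewhere
in the block, such as the italic capitals starting at U+1D434, are returned
unchanged here only because the sketch's table is incomplete.
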