Expand description
#349: Unicode normalization for symbol-name matching.
Hangul (and any combining-mark) identifiers written in NFD — typically pasted from macOS filenames, where APFS preserves decomposed jamo — silently miss NFC queries when names are compared byte-exact. The fix is one canonical form at both boundaries: symbol names are normalized to NFC once at extraction (so the index, the overview payloads, and the BM25F corpus all carry NFC), and query strings are normalized the same way before hitting the store.
Signatures and bodies stay byte-faithful to the source file — only
identifier-matching fields normalize. Pre-existing index rows keep
their on-disk form until the next refresh_symbol_index.
Functions§
- nfc_
identifier - NFC-normalize an identifier. ASCII (the overwhelming majority of symbol names) and already-NFC strings take the zero-alloc path.