Module unicode


#349: Unicode normalization for symbol-name matching.

Identifiers that contain Hangul or any other combining-mark sequence and are written in NFD (typically pasted from macOS filenames, where APFS preserves decomposed jamo) silently miss NFC queries when names are compared byte-exact. The fix is to use one canonical form at both boundaries: symbol names are normalized to NFC once at extraction (so the index, the overview payloads, and the BM25F corpus all carry NFC), and query strings are normalized the same way before they hit the store.
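The mismatch described above can be reproduced with the standard library alone. The sketch below is illustrative only: `build_index` and `lookup` are hypothetical helpers standing in for a byte-exact store, not functions from this crate.

```rust
use std::collections::HashMap;

/// "한" as one precomposed syllable (NFC): U+D55C, 3 bytes in UTF-8.
const HAN_NFC: &str = "\u{D55C}";
/// The same syllable as decomposed conjoining jamo (NFD):
/// U+1112 U+1161 U+11AB, 9 bytes in UTF-8.
const HAN_NFD: &str = "\u{1112}\u{1161}\u{11AB}";

/// Hypothetical byte-exact symbol index: name -> line number.
fn build_index(names: &[(&str, u32)]) -> HashMap<String, u32> {
    names
        .iter()
        .map(|&(name, line)| (name.to_owned(), line))
        .collect()
}

/// Byte-exact lookup with no normalization at either boundary.
fn lookup(index: &HashMap<String, u32>, query: &str) -> Option<u32> {
    index.get(query).copied()
}
```

If the index is built from the NFD spelling, `lookup(&index, HAN_NFC)` returns `None` even though both strings render identically. Normalizing the name at extraction and the query before lookup makes both sides carry the same bytes.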

Signatures and bodies stay byte-faithful to the source file; only identifier-matching fields are normalized. Pre-existing index rows keep their on-disk form until the next refresh_symbol_index.

Functions§

nfc_identifier
NFC-normalize an identifier. ASCII (the overwhelming majority of symbol names) and already-NFC strings take the zero-alloc path.
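The zero-alloc fast path can be pictured with `std::borrow::Cow`: borrow the input when nothing needs to change, allocate only when it does. The following is a minimal sketch under loud assumptions. The name `nfc_identifier_sketch`, its signature, and the helper `offset_in` are hypothetical, and the body performs only algorithmic Hangul composition (Unicode Standard, section 3.12). It is not full NFC: other combining marks such as `e` + U+0301 pass through unchanged, so a real implementation would need a complete normalizer.

```rust
use std::borrow::Cow;

// Hangul composition constants from the Unicode Standard, section 3.12.
const S_BASE: u32 = 0xAC00;
const L_BASE: u32 = 0x1100;
const V_BASE: u32 = 0x1161;
const T_BASE: u32 = 0x11A7;
const L_COUNT: u32 = 19;
const V_COUNT: u32 = 21;
const T_COUNT: u32 = 28;
const S_COUNT: u32 = L_COUNT * V_COUNT * T_COUNT; // 11172 syllables

/// Offset of `c` from `base`, if it falls inside `base..base + count`.
fn offset_in(c: Option<&char>, base: u32, count: u32) -> Option<u32> {
    let off = (*c? as u32).checked_sub(base)?;
    (off < count).then_some(off)
}

/// Hypothetical stand-in for `nfc_identifier`; composes Hangul jamo only.
fn nfc_identifier_sketch(name: &str) -> Cow<'_, str> {
    // Fast path: ASCII, or no character from the Hangul Jamo block, means
    // there is nothing this sketch could compose, so borrow without allocating.
    if name.is_ascii() || !name.chars().any(|c| ('\u{1100}'..='\u{11FF}').contains(&c)) {
        return Cow::Borrowed(name);
    }

    let mut out = String::with_capacity(name.len());
    let mut chars = name.chars().peekable();
    while let Some(c) = chars.next() {
        // Case 1: leading consonant + vowel, optionally + trailing consonant.
        if let Some(l) = offset_in(Some(&c), L_BASE, L_COUNT) {
            if let Some(v) = offset_in(chars.peek(), V_BASE, V_COUNT) {
                chars.next();
                let mut s = S_BASE + (l * V_COUNT + v) * T_COUNT;
                if let Some(t) = offset_in(chars.peek(), T_BASE + 1, T_COUNT - 1) {
                    chars.next();
                    s += t + 1;
                }
                out.push(char::from_u32(s).unwrap_or(c));
                continue;
            }
        }
        // Case 2: precomposed LV syllable followed by a trailing consonant.
        if let Some(s_index) = offset_in(Some(&c), S_BASE, S_COUNT) {
            if s_index % T_COUNT == 0 {
                if let Some(t) = offset_in(chars.peek(), T_BASE + 1, T_COUNT - 1) {
                    chars.next();
                    out.push(char::from_u32(c as u32 + t + 1).unwrap_or(c));
                    continue;
                }
            }
        }
        out.push(c);
    }
    Cow::Owned(out)
}
```

Returning `Cow` lets callers treat both paths uniformly through `&str` deref while paying for a `String` only when the input actually contained jamo. The fast-path check here is conservative: a string with a lone, non-composing jamo still takes the allocating path.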