Expand description
BM25 scoring module for code-aware text ranking.
Implements Okapi BM25 with a code-aware tokenizer that handles camelCase, snake_case, and other programming conventions. Replaces the naive split+intersect token overlap in hybrid scoring.
Structs§
- Bm25
Index - BM25 index for scoring query-document relevance.
Functions§
- tokenize
- Tokenize text with code-awareness: splits camelCase, snake_case, punctuation boundaries, lowercases, and filters short tokens.