Module bm25

BM25 scoring module for code-aware text ranking.

Implements Okapi BM25 with a code-aware tokenizer that handles camelCase, snake_case, and other programming conventions. It replaces the naive split-and-intersect token overlap previously used in hybrid scoring.
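
For reference, the standard Okapi BM25 score of a document D for a query Q is usually written as below. Here f(q, D) is the frequency of term q in D, |D| is the document length in tokens, avgdl is the average document length in the corpus, N is the number of documents, and n(q) is the number of documents containing q. The IDF variant shown (the non-negative "plus one" form) and the commonly cited parameter values k_1 ≈ 1.2 and b ≈ 0.75 are assumptions; this page does not state which ones the module uses.

```latex
\mathrm{score}(D, Q) = \sum_{q \in Q} \mathrm{IDF}(q) \cdot
  \frac{f(q, D)\,(k_1 + 1)}
       {f(q, D) + k_1 \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}

\mathrm{IDF}(q) = \ln\!\left(1 + \frac{N - n(q) + 0.5}{n(q) + 0.5}\right)
```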

Structs§

Bm25Index
BM25 index for scoring query-document relevance.
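
The fields and methods of `Bm25Index` are not shown on this page, so the following is only a minimal sketch of how such an index might be organized. The type name `Bm25Sketch`, its `new` and `score` methods, and the parameter values `k1 = 1.2` and `b = 0.75` are all hypothetical and are not taken from the crate.

```rust
use std::collections::HashMap;

/// Hypothetical BM25 index sketch; the real `Bm25Index` may differ.
pub struct Bm25Sketch {
    /// Term frequencies for each document.
    doc_term_freqs: Vec<HashMap<String, usize>>,
    /// Length of each document in tokens.
    doc_lens: Vec<usize>,
    /// Number of documents that contain each term.
    doc_freqs: HashMap<String, usize>,
    /// Average document length across the corpus.
    avg_doc_len: f64,
    /// Term-frequency saturation parameter (assumed default).
    k1: f64,
    /// Length-normalization parameter (assumed default).
    b: f64,
}

impl Bm25Sketch {
    /// Builds the index from pre-tokenized documents.
    pub fn new(docs: &[Vec<String>]) -> Self {
        let mut doc_term_freqs = Vec::with_capacity(docs.len());
        let mut doc_lens = Vec::with_capacity(docs.len());
        let mut doc_freqs: HashMap<String, usize> = HashMap::new();

        for doc in docs {
            let mut tf: HashMap<String, usize> = HashMap::new();
            for tok in doc {
                *tf.entry(tok.clone()).or_insert(0) += 1;
            }
            // Each distinct term counts once per document.
            for term in tf.keys() {
                *doc_freqs.entry(term.clone()).or_insert(0) += 1;
            }
            doc_lens.push(doc.len());
            doc_term_freqs.push(tf);
        }

        let total: usize = doc_lens.iter().sum();
        let avg_doc_len = if docs.is_empty() {
            0.0
        } else {
            total as f64 / docs.len() as f64
        };

        Self { doc_term_freqs, doc_lens, doc_freqs, avg_doc_len, k1: 1.2, b: 0.75 }
    }

    /// Scores one document against pre-tokenized query terms.
    /// Returns 0.0 for an unknown document id.
    pub fn score(&self, query: &[String], doc_id: usize) -> f64 {
        let tf_map = match self.doc_term_freqs.get(doc_id) {
            Some(map) => map,
            None => return 0.0,
        };
        let n_docs = self.doc_lens.len() as f64;
        let len_norm = if self.avg_doc_len > 0.0 {
            self.doc_lens[doc_id] as f64 / self.avg_doc_len
        } else {
            0.0
        };

        query
            .iter()
            .map(|term| {
                let tf = *tf_map.get(term).unwrap_or(&0) as f64;
                if tf == 0.0 {
                    return 0.0;
                }
                let df = *self.doc_freqs.get(term).unwrap_or(&0) as f64;
                // Non-negative IDF variant (an assumption, see lead-in).
                let idf = (1.0 + (n_docs - df + 0.5) / (df + 0.5)).ln();
                idf * tf * (self.k1 + 1.0)
                    / (tf + self.k1 * (1.0 - self.b + self.b * len_norm))
            })
            .sum()
    }
}
```

In this sketch, documents and queries are expected to pass through the same tokenizer before indexing and scoring, so that a query such as `http` can match an identifier like `parseHttpRequest`.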

Functions§

tokenize
Tokenizes text with code awareness: splits on camelCase, snake_case, and punctuation boundaries, lowercases each token, and filters out short tokens.
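
The signature and exact rules of `tokenize` are not shown on this page. The sketch below illustrates one way the described steps could fit together. The function name `tokenize_sketch`, the helper `flush`, and the minimum token length of 2 are assumptions made for illustration only.

```rust
/// Hypothetical code-aware tokenizer sketch; not the crate's `tokenize`.
pub fn tokenize_sketch(text: &str) -> Vec<String> {
    // Assumed threshold: tokens shorter than this are dropped.
    const MIN_LEN: usize = 2;

    let mut tokens = Vec::new();
    let mut current = String::new();
    let mut prev_lower_or_digit = false;

    for ch in text.chars() {
        // Underscores, punctuation, and whitespace all act as separators.
        if !ch.is_alphanumeric() {
            flush(&mut current, &mut tokens, MIN_LEN);
            prev_lower_or_digit = false;
            continue;
        }
        // camelCase boundary: an uppercase letter after a lowercase letter or digit.
        // Runs of capitals (e.g. "HTTPServer") are not split in this sketch.
        if ch.is_uppercase() && prev_lower_or_digit {
            flush(&mut current, &mut tokens, MIN_LEN);
        }
        current.extend(ch.to_lowercase());
        prev_lower_or_digit = ch.is_lowercase() || ch.is_numeric();
    }
    flush(&mut current, &mut tokens, MIN_LEN);
    tokens
}

/// Moves the pending token into `tokens` if it is long enough, then resets it.
fn flush(current: &mut String, tokens: &mut Vec<String>, min_len: usize) {
    if current.chars().count() >= min_len {
        tokens.push(std::mem::take(current));
    } else {
        current.clear();
    }
}
```

Under these assumptions, `parseHttpRequest` yields `parse`, `http`, `request`, and `get_user_id(x)` yields `get`, `user`, `id`, with the single-character `x` filtered out.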