Module bm25

Expand description

BM25 keyword search index for code chunks.

Provides camelCase/snake_case-aware tokenization via CodeSplitFilter and an in-RAM tantivy index (Bm25Index) that supports per-field boosted queries so identifier sub-tokens (e.g. json from parseJsonConfig) are matched correctly.

Structs§

BM25Fields: Handles to the tantivy schema fields used by Bm25Index.
Bm25Index: In-RAM BM25 index over a slice of CodeChunks.
CodeSplitFilter: Tantivy TokenFilter that emits sub-tokens for camelCase/snake_case identifiers in addition to the original token.
CodeSplitFilterWrapper: Wrapper tokenizer produced by CodeSplitFilter::transform.
CodeSplitTokenStream: Token stream produced by CodeSplitFilterWrapper.

Functions§

build_schema: Construct the tantivy Schema and return field handles.
code_analyzer: Build a tantivy TextAnalyzer that tokenizes, expands camelCase/snake_case identifiers into sub-tokens, then lowercases everything.
split_code_identifier: Split a code identifier into its constituent sub-words.

Module bm25

Module bm25 Copy item path

Structs§

Functions§

Module bm25