Expand description
BM25 keyword search index for code chunks.
Provides camelCase/snake_case-aware tokenization via CodeSplitFilter
and an in-RAM tantivy index (Bm25Index) that supports per-field
boosted queries so identifier sub-tokens (e.g. json from
parseJsonConfig) are matched correctly.
Structs§
- BM25
Fields - Handles to the tantivy schema fields used by
Bm25Index. - Bm25
Index - In-RAM BM25 index over a slice of
CodeChunks. - Code
Split Filter - Tantivy
TokenFilterthat emits sub-tokens for camelCase/snake_case identifiers in addition to the original token. - Code
Split Filter Wrapper - Wrapper tokenizer produced by
CodeSplitFilter::transform. - Code
Split Token Stream - Token stream produced by
CodeSplitFilterWrapper.
Functions§
- build_
schema - Construct the tantivy
Schemaand return field handles. - code_
analyzer - Build a tantivy
TextAnalyzerthat tokenizes, expands camelCase/snake_case identifiers into sub-tokens, then lowercases everything. - split_
code_ identifier - Split a code identifier into its constituent sub-words.