
Module sparse


Sparse (learned) embedding primitives for SPLADE / BGE-M3-sparse integration.

§Why

Learned-sparse retrievers (SPLADE v3, opensearch-doc-v3-distill, BGE-M3-sparse, granite-embedding-30m-sparse) produce a sparse vector over the model's subword vocabulary (WordPiece for the BERT-based SPLADE models) whose term weights are learned end-to-end, so the vector can be scored through an inverted index just like a lexical one. On BEIR zero-shot domains, sparse neural retrievers land around 3 to 5 points of nDCG@10 above classical lexical keyword scoring; this lane replaces that legacy lexical lane entirely.
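
Learned-sparse retrievers such as SPLADE score a query against a document with the dot product over the vocabulary terms both vectors activate. Because the layout used here keeps indices ascending, that product can be computed in a single merge pass without materializing dense vectors. This is a minimal sketch assuming plain `u32` indices and `f32` values; `sparse_dot` is a hypothetical helper, not part of this module.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: dot product of two sparse vectors, each stored as
/// ascending `indices` with aligned `values`. Since both index lists are
/// sorted, one merge-join pass finds the shared terms in O(n + m).
fn sparse_dot(a_idx: &[u32], a_val: &[f32], b_idx: &[u32], b_val: &[f32]) -> f32 {
    let (mut i, mut j) = (0, 0);
    let mut score = 0.0;
    while i < a_idx.len() && j < b_idx.len() {
        match a_idx[i].cmp(&b_idx[j]) {
            // Term only in `a`: advance `a`.
            Ordering::Less => i += 1,
            // Term only in `b`: advance `b`.
            Ordering::Greater => j += 1,
            // Shared term: it contributes weight_a * weight_b.
            Ordering::Equal => {
                score += a_val[i] * b_val[j];
                i += 1;
                j += 1;
            }
        }
    }
    score
}
```

Terms present in only one vector contribute nothing, which is why an inverted index only needs to visit the posting lists of the query's non-zero terms.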

§What this module provides

  • SparseEmbed - canonical sparse-vector shape (ascending indices + aligned values) with a vocab_id tag so two models with different vocabularies never get mixed in one posting list.
  • SparseEncoder trait - adapter-side hook for ONNX / candle backends to implement. Mirrors the crate::rerank::Reranker trait shape.
  • MockSparseEncoder - deterministic test-only encoder.
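
A minimal sketch of the `SparseEmbed` invariants listed above, assuming `u32` indices, `f32` values, and a `String` vocabulary tag. The field types, `EmbedError`, and `nnz` are illustrative assumptions rather than this crate's definitions; the sketch also rejects duplicate indices, which the description does not state explicitly.

```rust
/// Hypothetical mirror of the SparseEmbed shape: ascending indices,
/// aligned values, and a vocabulary tag.
#[derive(Debug, Clone, PartialEq)]
pub struct SparseEmbed {
    pub vocab_id: String,
    pub indices: Vec<u32>,
    pub values: Vec<f32>,
}

/// Hypothetical construction errors for the sketch.
#[derive(Debug, PartialEq)]
pub enum EmbedError {
    LengthMismatch { indices: usize, values: usize },
    NotStrictlyAscending { position: usize },
}

impl SparseEmbed {
    pub fn new(
        vocab_id: impl Into<String>,
        indices: Vec<u32>,
        values: Vec<f32>,
    ) -> Result<Self, EmbedError> {
        // Every index needs exactly one weight.
        if indices.len() != values.len() {
            return Err(EmbedError::LengthMismatch {
                indices: indices.len(),
                values: values.len(),
            });
        }
        // Compare each adjacent pair; equal neighbours are rejected too
        // (assumption), since a duplicate index would make the layout ambiguous.
        if let Some(pos) = indices.windows(2).position(|w| w[0] >= w[1]) {
            return Err(EmbedError::NotStrictlyAscending { position: pos + 1 });
        }
        Ok(Self { vocab_id: vocab_id.into(), indices, values })
    }

    /// Number of non-zero entries.
    pub fn nnz(&self) -> usize {
        self.indices.len()
    }
}
```

Validating once in the constructor lets every downstream consumer (scoring, posting-list insertion, serialization) rely on sorted, aligned data without re-checking it.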

The inverted index over SparseEmbed values lives in crate::index::sparse, next to its sibling, the brute-force vector index.
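
For orientation, a sketch of what an inverted index over such embeddings might look like. `SparseIndex`, its `String` errors, and the term-at-a-time scoring strategy are assumptions made for illustration, not the API of `crate::index::sparse`. It shows how the `vocab_id` tag keeps posting lists from mixing vocabularies.

```rust
use std::collections::HashMap;

/// Minimal stand-in for SparseEmbed (assumed field types).
pub struct SparseEmbed {
    pub vocab_id: String,
    pub indices: Vec<u32>,
    pub values: Vec<f32>,
}

/// Hypothetical inverted index: one posting list of (doc_id, weight) per term.
pub struct SparseIndex {
    vocab_id: String,
    postings: HashMap<u32, Vec<(u64, f32)>>,
}

impl SparseIndex {
    pub fn new(vocab_id: &str) -> Self {
        Self { vocab_id: vocab_id.to_string(), postings: HashMap::new() }
    }

    /// Refuses embeddings from another vocabulary so postings never mix.
    pub fn insert(&mut self, doc_id: u64, e: &SparseEmbed) -> Result<(), String> {
        if e.vocab_id != self.vocab_id {
            return Err(format!("vocab mismatch: index={} embed={}", self.vocab_id, e.vocab_id));
        }
        for (&term, &w) in e.indices.iter().zip(&e.values) {
            self.postings.entry(term).or_default().push((doc_id, w));
        }
        Ok(())
    }

    /// Term-at-a-time scoring: walk each query term's posting list and
    /// accumulate query_weight * doc_weight per document, then keep the top k.
    pub fn search(&self, q: &SparseEmbed, k: usize) -> Result<Vec<(u64, f32)>, String> {
        if q.vocab_id != self.vocab_id {
            return Err(format!("vocab mismatch: index={} query={}", self.vocab_id, q.vocab_id));
        }
        let mut acc: HashMap<u64, f32> = HashMap::new();
        for (term, &qw) in q.indices.iter().zip(&q.values) {
            if let Some(list) = self.postings.get(term) {
                for &(doc, dw) in list {
                    *acc.entry(doc).or_insert(0.0) += qw * dw;
                }
            }
        }
        let mut hits: Vec<(u64, f32)> = acc.into_iter().collect();
        // Highest score first; ties broken by doc id so output is deterministic.
        hits.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
        hits.truncate(k);
        Ok(hits)
    }
}
```

Term-at-a-time accumulation is the simplest strategy; larger indexes typically add dynamic pruning (for example WAND or block-max variants) to skip postings that cannot reach the top k.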

Storage in crate::objects::Node: a future Node.sparse_embed: Option<SparseEmbed> field. The field is additive: existing CIDs stay byte-identical because the serializer omits None via skip_serializing_if. CBOR canonicality is preserved because indices must already be sorted ascending at construction, which SparseEmbed::new checks.
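
Canonical CBOR only yields stable bytes if a given logical vector has exactly one pair order. An adapter whose backend emits `(index, weight)` pairs in arbitrary order would therefore need to normalize them before calling `SparseEmbed::new`. `canonicalize` below is a hypothetical helper, and merging duplicate indices by keeping the maximum weight is an assumption.

```rust
/// Hypothetical adapter-side helper: turn raw (index, weight) pairs in any
/// order into the ascending, duplicate-free layout SparseEmbed::new expects.
/// Duplicate indices keep the larger weight (an assumption for this sketch).
fn canonicalize(mut pairs: Vec<(u32, f32)>) -> (Vec<u32>, Vec<f32>) {
    // Sort by vocabulary index only; weights play no part in ordering.
    pairs.sort_by_key(|&(i, _)| i);
    let mut indices: Vec<u32> = Vec::with_capacity(pairs.len());
    let mut values: Vec<f32> = Vec::with_capacity(pairs.len());
    for (i, w) in pairs {
        if indices.last() == Some(&i) {
            // Same index as the previous pair: merge instead of duplicating.
            let last = values.last_mut().unwrap();
            if w > *last {
                *last = w;
            }
        } else {
            indices.push(i);
            values.push(w);
        }
    }
    (indices, values)
}
```

Two permutations of the same pairs produce identical vectors, so they would also serialize to identical bytes and the same CID.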

Structs§

MockSparseEncoder
Deterministic test-only encoder. Produces a SparseEmbed by hashing each whitespace-separated token into the first 1024 vocabulary slots with a length-inverse weight (1.0 / (1.0 + token_len)).
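
A sketch of the mock encoding rule as described. The actual hash function is not specified, so this version assumes 64-bit FNV-1a, chosen because it is fixed and dependency-free (unlike `std`'s `DefaultHasher`, whose algorithm is documented as unspecified across Rust releases). It also assumes token length is counted in chars and that tokens colliding in one slot keep the larger weight; `mock_encode` and `fnv1a` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Size of the mock vocabulary: tokens land in slots 0..1024.
const MOCK_VOCAB: u64 = 1024;

/// 64-bit FNV-1a: the same token maps to the same slot on every run and platform.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    h
}

/// Hypothetical re-creation of the mock encoder's rule.
fn mock_encode(text: &str) -> (Vec<u32>, Vec<f32>) {
    // BTreeMap iterates in ascending key order, matching SparseEmbed's layout.
    let mut slots: BTreeMap<u32, f32> = BTreeMap::new();
    for token in text.split_whitespace() {
        let slot = (fnv1a(token.as_bytes()) % MOCK_VOCAB) as u32;
        // Length-inverse weight: 1.0 / (1.0 + token_len).
        let weight = 1.0 / (1.0 + token.chars().count() as f32);
        let entry = slots.entry(slot).or_insert(0.0);
        if weight > *entry {
            *entry = weight;
        }
    }
    slots.into_iter().unzip()
}
```

Because the output depends only on the input text, tests can compare embeddings across runs without loading a model.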
SparseEmbed
A sparse embedding over a fixed vocabulary.

Enums§

SparseError
Error surface for sparse-encoder adapters. Same shape as crate::llm::LlmError and crate::rerank::RerankError.

Traits§

SparseEncoder
Learned-sparse encoder: given text, produce a SparseEmbed over a fixed vocabulary. Adapter crates implement this over SPLADE-ONNX, BGE-M3-sparse-ONNX, or a remote sidecar.
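
A sketch of the adapter contract, assuming `encode` takes `&str` and returns `Result<SparseEmbed, SparseError>`. The method names, the `encode_batch` default, the two `SparseError` variants, and the toy `ByteCountEncoder` (one slot per byte value, weighted by count) are all hypothetical; they stand in for a real SPLADE-ONNX, BGE-M3-sparse-ONNX, or sidecar adapter.

```rust
use std::fmt;

/// Hypothetical error variants for the sketch.
#[derive(Debug, PartialEq)]
pub enum SparseError {
    EmptyInput,
    Backend(String),
}

impl fmt::Display for SparseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SparseError::EmptyInput => write!(f, "empty input text"),
            SparseError::Backend(msg) => write!(f, "sparse backend error: {msg}"),
        }
    }
}

impl std::error::Error for SparseError {}

/// Minimal stand-in for SparseEmbed (assumed field types).
#[derive(Debug, PartialEq)]
pub struct SparseEmbed {
    pub vocab_id: String,
    pub indices: Vec<u32>,
    pub values: Vec<f32>,
}

/// Hypothetical trait shape.
pub trait SparseEncoder {
    /// Identifier of the vocabulary this encoder emits indices into.
    fn vocab_id(&self) -> &str;
    fn encode(&self, text: &str) -> Result<SparseEmbed, SparseError>;
    /// Default batch method: encode one text at a time, stop at the first error.
    fn encode_batch(&self, texts: &[&str]) -> Result<Vec<SparseEmbed>, SparseError> {
        texts.iter().map(|t| self.encode(t)).collect()
    }
}

/// Toy adapter: one slot per byte value, weighted by occurrence count.
pub struct ByteCountEncoder;

impl SparseEncoder for ByteCountEncoder {
    fn vocab_id(&self) -> &str {
        "toy-bytes-256"
    }

    fn encode(&self, text: &str) -> Result<SparseEmbed, SparseError> {
        if text.trim().is_empty() {
            return Err(SparseError::EmptyInput);
        }
        let mut counts = [0u32; 256];
        for &b in text.as_bytes() {
            counts[b as usize] += 1;
        }
        // Enumerating the array yields slots in ascending order.
        let (indices, values): (Vec<u32>, Vec<f32>) = counts
            .iter()
            .enumerate()
            .filter(|&(_, &c)| c > 0)
            .map(|(i, &c)| (i as u32, c as f32))
            .unzip();
        Ok(SparseEmbed { vocab_id: self.vocab_id().to_string(), indices, values })
    }
}
```

Putting `vocab_id` on the trait lets callers check vocabulary compatibility before encoding anything, which complements the per-embedding tag.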