Struct SparseVectorQuery

pub struct SparseVectorQuery {
    pub field: Field,
    pub vector: Vec<(u32, f32)>,
    pub combiner: MultiValueCombiner,
    pub heap_factor: f32,
    pub weight_threshold: f32,
    pub max_query_dims: Option<usize>,
    pub pruning: Option<f32>,
    pub over_fetch_factor: f32,
    /* private fields */
}

Sparse vector query for similarity search

Fields§

§field: Field

Field containing the sparse vectors

§vector: Vec<(u32, f32)>

Query vector as (dimension_id, weight) pairs

§combiner: MultiValueCombiner

How to combine scores for multi-valued documents

§heap_factor: f32

Approximate search factor (1.0 = exact; lower values = faster but approximate). Controls MaxScore pruning aggressiveness in block-max scoring.

§weight_threshold: f32

Minimum abs(weight) for query dimensions (0.0 = no filtering). Dimensions below this threshold are dropped before search.
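As a rough illustration of the threshold semantics, the standalone sketch below drops query dimensions whose absolute weight falls below a cutoff. The helper name `apply_weight_threshold` is hypothetical and not part of this crate. This page does not say whether a weight exactly equal to the threshold is kept, so the sketch assumes it is.

```rust
/// Hypothetical helper (not part of the crate): keep only the
/// (dimension_id, weight) pairs whose |weight| is at least `threshold`.
/// Assumption: a weight exactly equal to the threshold is kept.
fn apply_weight_threshold(vector: &[(u32, f32)], threshold: f32) -> Vec<(u32, f32)> {
    vector
        .iter()
        .copied()
        // Compare on the absolute value so strongly negative weights survive.
        .filter(|&(_, weight)| weight.abs() >= threshold)
        .collect()
}
```

With a threshold of 0.0 every dimension passes, which matches the "no filtering" note above.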

§max_query_dims: Option<usize>

Maximum number of query dimensions to process (None = all). Keeps only the top-k dimensions by abs(weight).
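The top-k selection can be sketched in isolation as follows. `keep_top_dims` is a hypothetical name, and the output order (sorted by descending absolute weight) is an assumption made for the example; the crate may preserve the original order instead.

```rust
/// Hypothetical helper (not part of the crate): keep the `max_dims`
/// entries with the largest |weight|. `None` leaves the vector untouched.
fn keep_top_dims(mut vector: Vec<(u32, f32)>, max_dims: Option<usize>) -> Vec<(u32, f32)> {
    if let Some(k) = max_dims {
        // Sort by |weight| descending; total_cmp gives a total order on f32.
        vector.sort_by(|a, b| b.1.abs().total_cmp(&a.1.abs()));
        // Drop everything after the first k entries (no-op if k >= len).
        vector.truncate(k);
    }
    vector
}
```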

§pruning: Option<f32>

Fraction of query dimensions to keep (0.0-1.0), same semantics as indexing-time pruning: sort by abs(weight) descending, keep top fraction. None or 1.0 = no pruning.
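A minimal sketch of fraction-based pruning, under stated assumptions: the helper name `prune_fraction` is hypothetical, the number of kept dimensions is rounded up, and negative fractions are clamped to 0.0. None of these details are confirmed by this page.

```rust
/// Hypothetical helper (not part of the crate): keep the top `fraction`
/// of dimensions by |weight|.
/// Assumptions: the kept count is rounded up, and negative fractions
/// are treated as 0.0.
fn prune_fraction(mut vector: Vec<(u32, f32)>, pruning: Option<f32>) -> Vec<(u32, f32)> {
    let fraction = match pruning {
        Some(f) if f < 1.0 => f.max(0.0),
        // None, or a fraction of 1.0 or more: no pruning.
        _ => return vector,
    };
    let keep = (vector.len() as f32 * fraction).ceil() as usize;
    // Sort by |weight| descending, then keep the leading fraction.
    vector.sort_by(|a, b| b.1.abs().total_cmp(&a.1.abs()));
    vector.truncate(keep);
    vector
}
```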

§over_fetch_factor: f32

Multiplier on executor limit for ordinal deduplication (1.0 = no over-fetch)

Implementations§

impl SparseVectorQuery

pub fn new(field: Field, vector: Vec<(u32, f32)>) -> Self

Create a new sparse vector query

The default combiner is LogSumExp { temperature: 0.7 }, which provides saturation for documents with many sparse vectors (e.g., 100+ ordinals). This prevents over-weighting from multiple matches while still allowing additional matches to contribute to the score.
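To make the saturation behaviour concrete, here is a standalone sketch of a log-sum-exp combiner. The exact formula used by `MultiValueCombiner::LogSumExp` is not given on this page; the form `(1/t) * ln(sum_i exp(t * s_i))` and the function name `log_sum_exp` are assumptions for illustration only.

```rust
/// Hypothetical sketch of a log-sum-exp combiner over per-ordinal scores.
/// Assumed form: (1/t) * ln(sum_i exp(t * s_i)). The crate's actual
/// formula may differ.
fn log_sum_exp(scores: &[f32], temperature: f32) -> f32 {
    if scores.is_empty() {
        return 0.0;
    }
    // Subtract the maximum before exponentiating for numerical stability.
    let max = scores.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let sum: f32 = scores
        .iter()
        .map(|&s| (temperature * (s - max)).exp())
        .sum();
    // Algebraically equal to (1/t) * ln(sum_i exp(t * s_i)).
    max + sum.ln() / temperature
}
```

Under this assumed form a single score is returned unchanged, and n equal scores s combine to s + ln(n)/t, so each further match adds less than the previous one.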

pub fn with_combiner(self, combiner: MultiValueCombiner) -> Self

Set the multi-value score combiner

pub fn with_over_fetch_factor(self, factor: f32) -> Self

Set the executor over-fetch factor for multi-valued fields. After MaxScore execution, ordinal combining may reduce the result count; this multiplier compensates by fetching more from the executor (1.0 = no over-fetch, 2.0 = fetch 2x, then combine down).
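The reason over-fetching helps can be sketched as follows. Both helper names are hypothetical, the rounding rule is an assumption, and a plain max is used as the per-document combiner only to keep the example short.

```rust
use std::collections::HashMap;

/// Hypothetical: number of candidates to request from the executor.
/// Assumption: the scaled limit is rounded up and never drops below `limit`.
fn executor_limit(limit: usize, over_fetch_factor: f32) -> usize {
    let scaled = (limit as f32 * over_fetch_factor.max(1.0)).ceil() as usize;
    scaled.max(limit)
}

/// Hypothetical: collapse (doc_id, ordinal, score) hits into one score per
/// document (max, for simplicity) and keep the best `limit` documents.
fn combine_ordinals(hits: &[(u32, u32, f32)], limit: usize) -> Vec<(u32, f32)> {
    let mut best: HashMap<u32, f32> = HashMap::new();
    for &(doc, _ordinal, score) in hits {
        let entry = best.entry(doc).or_insert(f32::NEG_INFINITY);
        *entry = (*entry).max(score);
    }
    let mut out: Vec<(u32, f32)> = best.into_iter().collect();
    // Score descending; ties broken by doc id so the order is deterministic.
    out.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    out.truncate(limit);
    out
}
```

If four raw hits belong to only two documents, combining leaves two results, so a caller asking for three would come up short unless more hits were fetched up front.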

pub fn with_heap_factor(self, heap_factor: f32) -> Self

Set the heap factor for approximate search

Controls the trade-off between speed and recall:

1.0 = exact search (default)
0.8-0.9 = ~20-40% faster with minimal recall loss
Lower values = more aggressive pruning, faster but lower recall
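The trade-off listed above can be illustrated with a toy top-k loop. How the crate applies `heap_factor` internally is not shown on this page; the sketch assumes a candidate is skipped when its score upper bound, scaled by the factor, cannot beat the current k-th best score. `should_skip` and `top_k_scores` are hypothetical names.

```rust
/// Hypothetical pruning test. Assumption: scale the upper bound by
/// `heap_factor` and skip the candidate if it cannot beat the k-th best.
fn should_skip(upper_bound: f32, kth_best: f32, heap_factor: f32) -> bool {
    upper_bound * heap_factor <= kth_best
}

/// Toy top-k over (upper_bound, actual_score) candidates.
/// Returns the kept scores (descending) and how many candidates were scored.
fn top_k_scores(candidates: &[(f32, f32)], k: usize, heap_factor: f32) -> (Vec<f32>, usize) {
    if k == 0 {
        return (Vec::new(), 0);
    }
    let mut top: Vec<f32> = Vec::new(); // sorted descending, at most k entries
    let mut scored = 0;
    for &(upper_bound, actual) in candidates {
        // Once k results are held, the k-th best score is the entry threshold.
        if top.len() == k && should_skip(upper_bound, top[k - 1], heap_factor) {
            continue;
        }
        scored += 1;
        top.push(actual);
        top.sort_by(|a, b| b.total_cmp(a));
        top.truncate(k);
    }
    (top, scored)
}
```

In this toy setup a factor of 1.0 only skips candidates that provably cannot enter the top k, while a lower factor scores fewer candidates and may miss one that would have qualified.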

pub fn with_weight_threshold(self, threshold: f32) -> Self

Set the minimum weight threshold for query dimensions. Dimensions with abs(weight) below this are dropped before search.

pub fn with_max_query_dims(self, max_dims: usize) -> Self

Set the maximum number of query dimensions (top-k by abs(weight))

pub fn with_pruning(self, fraction: f32) -> Self

Set the pruning fraction (0.0-1.0): keep the top fraction of query dimensions by abs(weight). Same semantics as indexing-time pruning.

pub fn from_indices_weights(
    field: Field,
    indices: Vec<u32>,
    weights: Vec<f32>,
) -> Self

Create from separate indices and weights vectors
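Conceptually, this pairs two parallel vectors into the (dimension_id, weight) form used by the `vector` field. The sketch below shows that pairing only. `zip_indices_weights` is a hypothetical name, and the handling of mismatched lengths (extra entries ignored) is an assumption, since this page does not describe it.

```rust
/// Hypothetical helper (not part of the crate): pair parallel index and
/// weight vectors into (dimension_id, weight) tuples.
/// Assumption: if the lengths differ, the extra entries are ignored,
/// because `zip` stops at the shorter input.
fn zip_indices_weights(indices: Vec<u32>, weights: Vec<f32>) -> Vec<(u32, f32)> {
    indices.into_iter().zip(weights).collect()
}
```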

pub fn from_text(
    field: Field,
    text: &str,
    tokenizer_name: &str,
    weighting: QueryWeighting,
    sparse_index: Option<&SparseIndex>,
) -> Result<Self>

Create from raw text using a HuggingFace tokenizer (single segment)

This method tokenizes the text and creates a sparse vector query. For multi-segment indexes, use from_text_with_stats instead.

§Arguments

field - The sparse vector field to search
text - Raw text to tokenize
tokenizer_name - HuggingFace tokenizer path (e.g., “bert-base-uncased”)
weighting - Weighting strategy for tokens
sparse_index - Optional sparse index for IDF lookup (required for IDF weighting)

pub fn from_text_with_stats(
    field: Field,
    text: &str,
    tokenizer: &HfTokenizer,
    weighting: QueryWeighting,
    global_stats: Option<&GlobalStats>,
) -> Result<Self>

Create from raw text using global statistics (multi-segment)

This is the recommended method for multi-segment indexes as it uses aggregated IDF values across all segments for consistent ranking.

§Arguments

field - The sparse vector field to search
text - Raw text to tokenize
tokenizer - Pre-loaded HuggingFace tokenizer
weighting - Weighting strategy for tokens
global_stats - Global statistics for IDF computation

pub fn from_text_with_tokenizer_bytes(
    field: Field,
    text: &str,
    tokenizer_bytes: &[u8],
    weighting: QueryWeighting,
    global_stats: Option<&GlobalStats>,
) -> Result<Self>

Create from raw text, loading tokenizer from index directory

This method supports the index:// prefix for tokenizer paths, loading tokenizer.json from the index directory.

§Arguments

field - The sparse vector field to search
text - Raw text to tokenize
tokenizer_bytes - Tokenizer JSON bytes (pre-loaded from directory)
weighting - Weighting strategy for tokens
global_stats - Global statistics for IDF computation

Trait Implementations§

impl Clone for SparseVectorQuery

fn clone(&self) -> SparseVectorQuery

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for SparseVectorQuery

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Display for SparseVectorQuery

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Query for SparseVectorQuery

fn scorer<'a>(
    &self,
    reader: &'a SegmentReader,
    limit: usize,
) -> ScorerFuture<'a>

Create a scorer for this query against a single segment (async)

fn scorer_sync<'a>(
    &self,
    reader: &'a SegmentReader,
    limit: usize,
) -> Result<Box<dyn Scorer + 'a>>

Create a scorer synchronously (mmap/RAM only).

fn count_estimate<'a>(&self, _reader: &'a SegmentReader) -> CountFuture<'a>

Estimated number of matching documents in a segment (async)

fn as_sparse_term_queries(&self) -> Option<Vec<SparseTermQueryInfo>>

Decompose into sparse term query infos for MaxScore optimization.

fn as_term_query_info(&self) -> Option<TermQueryInfo>

Return term info if this is a simple term query eligible for MaxScore optimization

fn as_sparse_term_query_info(&self) -> Option<SparseTermQueryInfo>

Return sparse term info if this is a single-dimension sparse query eligible for MaxScore optimization

fn is_filter(&self) -> bool

True if this query is a pure filter (always scores 1.0, no positions). Used by the planner to convert non-selective MUST filters into predicates.

fn as_doc_predicate<'a>(
    &self,
    _reader: &'a SegmentReader,
) -> Option<DocPredicate<'a>>

For filter queries: return a cheap per-doc predicate against a segment. The predicate does O(1) work per doc (e.g., fast-field lookup).

Auto Trait Implementations§

impl UnwindSafe for SparseVectorQuery

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)

Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
