pub struct SparseVectorQuery {
    pub field: Field,
    pub vector: Vec<(u32, f32)>,
    pub combiner: MultiValueCombiner,
    pub heap_factor: f32,
    pub weight_threshold: f32,
    pub max_query_dims: Option<usize>,
    pub pruning: Option<f32>,
    pub over_fetch_factor: f32,
    /* private fields */
}
Sparse vector query for similarity search
Fields§
field: Field
Field containing the sparse vectors.
vector: Vec<(u32, f32)>
Query vector as (dimension_id, weight) pairs.
combiner: MultiValueCombiner
How to combine scores for multi-valued documents.
heap_factor: f32
Approximate search factor (1.0 = exact, lower values = faster but approximate). Controls MaxScore pruning aggressiveness in block-max scoring.
weight_threshold: f32
Minimum abs(weight) for query dimensions (0.0 = no filtering). Dimensions below this threshold are dropped before search.
max_query_dims: Option<usize>
Maximum number of query dimensions to process (None = all). Keeps only the top-k dimensions by abs(weight).
pruning: Option<f32>
Fraction of query dimensions to keep (0.0-1.0), same semantics as indexing-time pruning: sort by abs(weight) descending, keep top fraction. None or 1.0 = no pruning.
over_fetch_factor: f32
Multiplier on executor limit for ordinal deduplication (1.0 = no over-fetch).
Implementations§
impl SparseVectorQuery
pub fn new(field: Field, vector: Vec<(u32, f32)>) -> Self
Create a new sparse vector query
Default combiner is LogSumExp { temperature: 0.7 } which provides
saturation for documents with many sparse vectors (e.g., 100+ ordinals).
This prevents over-weighting from multiple matches while still allowing
additional matches to contribute to the score.
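The saturation behaviour described above can be illustrated with a small standalone sketch. The function name `log_sum_exp` and the formula `t * ln(sum(exp(s / t)))` are assumptions made for illustration; the documentation does not state the exact formula the library uses for `LogSumExp`.

```rust
/// Hypothetical stand-in for a temperature-scaled log-sum-exp combiner.
/// Assumed formula (not confirmed by the docs): t * ln(sum(exp(s / t))).
fn log_sum_exp(scores: &[f32], temperature: f32) -> f32 {
    if scores.is_empty() {
        return 0.0;
    }
    // Subtract the maximum before exponentiating for numerical stability.
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let sum: f32 = scores
        .iter()
        .map(|s| ((s - max) / temperature).exp())
        .sum();
    max + temperature * sum.ln()
}
```

Under this assumed formula, a single score is returned unchanged, and 100 ordinals that each score 2.0 combine to roughly 2.0 + 0.7 * ln(100), which is far below their plain sum of 200.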
pub fn with_combiner(self, combiner: MultiValueCombiner) -> Self
Set the multi-value score combiner
pub fn with_over_fetch_factor(self, factor: f32) -> Self
Set executor over-fetch factor for multi-valued fields. After MaxScore execution, ordinal combining may reduce result count; this multiplier compensates by fetching more from the executor. (1.0 = no over-fetch, 2.0 = fetch 2x then combine down)
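To see why over-fetching helps, here is a minimal sketch of the idea. Both helper names are hypothetical, the rounding and clamping rules are assumptions, and a simple max-per-document combine stands in for the real combiner.

```rust
use std::collections::HashMap;

/// Hypothetical helper: number of hits to request from the executor.
/// Rounding up and clamping the factor to at least 1.0 are assumptions.
fn executor_limit(limit: usize, over_fetch_factor: f32) -> usize {
    let factor = over_fetch_factor.max(1.0);
    ((limit as f32) * factor).ceil() as usize
}

/// Hypothetical helper: collapse (doc_id, ordinal, score) hits to one
/// entry per document (max score), then keep the best `limit` documents.
fn combine_by_doc(hits: &[(u32, u16, f32)], limit: usize) -> Vec<(u32, f32)> {
    let mut best: HashMap<u32, f32> = HashMap::new();
    for &(doc, _ordinal, score) in hits {
        let entry = best.entry(doc).or_insert(f32::NEG_INFINITY);
        if score > *entry {
            *entry = score;
        }
    }
    let mut out: Vec<(u32, f32)> = best.into_iter().collect();
    // Highest score first; ties broken by doc id for a stable order.
    out.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    out.truncate(limit);
    out
}
```

If the two best hits belong to the same document, fetching exactly `limit = 2` hits yields only one document after combining, while fetching `executor_limit(2, 2.0) = 4` hits leaves enough candidates to fill both slots.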
pub fn with_heap_factor(self, heap_factor: f32) -> Self
Set the heap factor for approximate search
Controls the trade-off between speed and recall:
- 1.0 = exact search (default)
- 0.8-0.9 = ~20-40% faster with minimal recall loss
- Lower values = more aggressive pruning, faster but lower recall
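One common way such a factor enters a top-k pruning test is sketched below. The function `can_skip` and the exact comparison are assumptions for illustration only; the documentation does not spell out how the library applies `heap_factor` internally.

```rust
/// Hypothetical pruning test: a candidate whose score upper bound, scaled by
/// `heap_factor`, cannot exceed the current k-th best score is skipped.
/// With heap_factor = 1.0 only provably non-competitive candidates are skipped.
fn can_skip(upper_bound: f32, heap_threshold: f32, heap_factor: f32) -> bool {
    upper_bound * heap_factor <= heap_threshold
}
```

With a heap threshold of 1.0, a candidate bounded by 1.1 is still scored at `heap_factor = 1.0` but skipped at `heap_factor = 0.8`, which is where both the speed-up and the possible recall loss come from.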
pub fn with_weight_threshold(self, threshold: f32) -> Self
Set minimum weight threshold for query dimensions.
Dimensions with abs(weight) below this are dropped before search.
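The filtering rule can be sketched on a plain `(dimension_id, weight)` vector. The helper name `apply_weight_threshold` is hypothetical, and treating a weight exactly equal to the threshold as kept is an assumption.

```rust
/// Hypothetical helper: drop query dimensions whose abs(weight) is below
/// `threshold`. A threshold of 0.0 keeps every dimension.
fn apply_weight_threshold(vector: Vec<(u32, f32)>, threshold: f32) -> Vec<(u32, f32)> {
    vector
        .into_iter()
        .filter(|&(_, weight)| weight.abs() >= threshold)
        .collect()
}
```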
pub fn with_max_query_dims(self, max_dims: usize) -> Self
Set maximum number of query dimensions (top-k by weight)
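A minimal sketch of top-k selection by absolute weight follows. The helper name `keep_top_dims` is hypothetical, and returning the survivors in descending order of abs(weight) is an assumption about ordering.

```rust
/// Hypothetical helper: keep at most `max_dims` dimensions, preferring the
/// largest abs(weight). Negative weights compete by magnitude.
fn keep_top_dims(mut vector: Vec<(u32, f32)>, max_dims: usize) -> Vec<(u32, f32)> {
    vector.sort_by(|a, b| b.1.abs().total_cmp(&a.1.abs()));
    vector.truncate(max_dims);
    vector
}
```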
pub fn with_pruning(self, fraction: f32) -> Self
Set pruning fraction (0.0-1.0): keep top fraction of query dims by weight.
Same semantics as indexing-time pruning.
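Fraction-based pruning differs from a fixed dimension cap in that the number kept scales with the query length. In the sketch below the helper name `prune_fraction` is hypothetical, and rounding the kept count up is an assumption; the documentation only specifies sorting by abs(weight) descending and keeping the top fraction.

```rust
/// Hypothetical helper: keep the top `fraction` of dimensions by abs(weight).
/// A fraction of 1.0 or more leaves the vector untouched.
fn prune_fraction(mut vector: Vec<(u32, f32)>, fraction: f32) -> Vec<(u32, f32)> {
    if fraction >= 1.0 {
        return vector;
    }
    // Assumed rounding: ceil, so a small positive fraction keeps at least one dim.
    let keep = ((vector.len() as f32) * fraction.max(0.0)).ceil() as usize;
    vector.sort_by(|a, b| b.1.abs().total_cmp(&a.1.abs()));
    vector.truncate(keep);
    vector
}
```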
pub fn from_indices_weights(
    field: Field,
    indices: Vec<u32>,
    weights: Vec<f32>,
) -> Self
Create from separate indices and weights vectors
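The pairing of the two input vectors into the `Vec<(u32, f32)>` layout can be sketched as a plain zip. The helper name `zip_indices_weights` is hypothetical, and how the library treats inputs of different lengths is not documented here; this sketch silently stops at the shorter one.

```rust
/// Hypothetical helper: pair each dimension id with its weight by position.
/// `zip` stops at the shorter input, so extra entries are ignored.
fn zip_indices_weights(indices: Vec<u32>, weights: Vec<f32>) -> Vec<(u32, f32)> {
    indices.into_iter().zip(weights).collect()
}
```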
pub fn from_text(
    field: Field,
    text: &str,
    tokenizer_name: &str,
    weighting: QueryWeighting,
    sparse_index: Option<&SparseIndex>,
) -> Result<Self>
Create from raw text using a HuggingFace tokenizer (single segment)
This method tokenizes the text and creates a sparse vector query.
For multi-segment indexes, use from_text_with_stats instead.
§Arguments
- field - The sparse vector field to search
- text - Raw text to tokenize
- tokenizer_name - HuggingFace tokenizer path (e.g., "bert-base-uncased")
- weighting - Weighting strategy for tokens
- sparse_index - Optional sparse index for IDF lookup (required for IDF weighting)
pub fn from_text_with_stats(
    field: Field,
    text: &str,
    tokenizer: &HfTokenizer,
    weighting: QueryWeighting,
    global_stats: Option<&GlobalStats>,
) -> Result<Self>
Create from raw text using global statistics (multi-segment)
This is the recommended method for multi-segment indexes as it uses aggregated IDF values across all segments for consistent ranking.
§Arguments
- field - The sparse vector field to search
- text - Raw text to tokenize
- tokenizer - Pre-loaded HuggingFace tokenizer
- weighting - Weighting strategy for tokens
- global_stats - Global statistics for IDF computation
pub fn from_text_with_tokenizer_bytes(
    field: Field,
    text: &str,
    tokenizer_bytes: &[u8],
    weighting: QueryWeighting,
    global_stats: Option<&GlobalStats>,
) -> Result<Self>
Create from raw text, loading tokenizer from index directory
This method supports the index:// prefix for tokenizer paths,
loading tokenizer.json from the index directory.
§Arguments
- field - The sparse vector field to search
- text - Raw text to tokenize
- tokenizer_bytes - Tokenizer JSON bytes (pre-loaded from directory)
- weighting - Weighting strategy for tokens
- global_stats - Global statistics for IDF computation
Trait Implementations§
impl Clone for SparseVectorQuery
fn clone(&self) -> SparseVectorQuery
fn clone_from(&mut self, source: &Self)
impl Debug for SparseVectorQuery
impl Display for SparseVectorQuery
impl Query for SparseVectorQuery
fn scorer<'a>(
    &self,
    reader: &'a SegmentReader,
    limit: usize,
) -> ScorerFuture<'a>
fn scorer_sync<'a>(
    &self,
    reader: &'a SegmentReader,
    limit: usize,
) -> Result<Box<dyn Scorer + 'a>>
fn count_estimate<'a>(&self, _reader: &'a SegmentReader) -> CountFuture<'a>
fn as_sparse_term_queries(&self) -> Option<Vec<SparseTermQueryInfo>>
fn as_term_query_info(&self) -> Option<TermQueryInfo>
fn as_sparse_term_query_info(&self) -> Option<SparseTermQueryInfo>
fn is_filter(&self) -> bool
fn as_doc_predicate<'a>(
    &self,
    _reader: &'a SegmentReader,
) -> Option<DocPredicate<'a>>
Auto Trait Implementations§
impl Freeze for SparseVectorQuery
impl RefUnwindSafe for SparseVectorQuery
impl Send for SparseVectorQuery
impl Sync for SparseVectorQuery
impl Unpin for SparseVectorQuery
impl UnsafeUnpin for SparseVectorQuery
impl UnwindSafe for SparseVectorQuery
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
impl<T> Pointable for T
impl<SS, SP> SupersetOf<SS> for SP
where
    SS: SubsetOf<SP>,
fn to_subset(&self) -> Option<SS>
The inverse inclusion map: attempts to construct self from the equivalent element of its superset.
fn is_in_subset(&self) -> bool
Checks if self is actually part of its subset T (and can be converted to it).
fn to_subset_unchecked(&self) -> SS
Use with care! Same as self.to_subset but without any property checks. Always succeeds.
fn from_subset(element: &SS) -> SP
The inclusion map: converts self to the equivalent element of its superset.
impl<T> ToCompactString for T
where
    T: Display,
fn try_to_compact_string(&self) -> Result<CompactString, ToCompactStringError>
Fallible version of ToCompactString::to_compact_string()
fn to_compact_string(&self) -> CompactString
Converts the given value to a CompactString.