Struct AsyncSegmentReader 

pub struct AsyncSegmentReader { /* private fields */ }

Asynchronous segment reader that loads data lazily:

  • Term dictionary: only the index is loaded up front; blocks are loaded on demand
  • Postings: loaded on demand per term via HTTP range requests
  • Document store: only the index is loaded up front; blocks are loaded on demand via HTTP range requests
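
The lazy-loading pattern described above can be sketched in a few lines. This is an illustration only, not this crate's implementation: `RemoteFile`, `LazyBlocks`, and the in-memory byte vector standing in for a remote file are all hypothetical. Only the block boundaries are held up front; a block's bytes are fetched the first time they are requested and served from a cache afterwards.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a remote file. `read_range` plays the role of
/// an HTTP range request and counts how often it is called.
struct RemoteFile {
    bytes: Vec<u8>,
    range_reads: usize,
}

impl RemoteFile {
    fn read_range(&mut self, start: usize, end: usize) -> Vec<u8> {
        self.range_reads += 1;
        self.bytes[start..end].to_vec()
    }
}

/// Keeps only the block index in memory; block contents are loaded lazily.
struct LazyBlocks {
    file: RemoteFile,
    /// (start, end) byte offsets of each block: the "index".
    index: Vec<(usize, usize)>,
    /// Blocks fetched so far, keyed by block number (unbounded in this sketch).
    cache: HashMap<usize, Vec<u8>>,
}

impl LazyBlocks {
    /// `block_size` must be greater than zero.
    fn new(bytes: Vec<u8>, block_size: usize) -> Self {
        let len = bytes.len();
        let index = (0..len)
            .step_by(block_size)
            .map(|start| (start, (start + block_size).min(len)))
            .collect();
        LazyBlocks {
            file: RemoteFile { bytes, range_reads: 0 },
            index,
            cache: HashMap::new(),
        }
    }

    /// Returns the block, issuing a range read on first access only.
    fn block(&mut self, block_no: usize) -> Option<&[u8]> {
        let (start, end) = *self.index.get(block_no)?;
        if !self.cache.contains_key(&block_no) {
            let data = self.file.read_range(start, end);
            self.cache.insert(block_no, data);
        }
        self.cache.get(&block_no).map(|v| v.as_slice())
    }
}
```

The `cache_blocks` parameter of `open` suggests that the real reader bounds its cache; the sketch leaves the cache unbounded for brevity.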

Implementations

impl AsyncSegmentReader

pub async fn open<D: Directory>(dir: &D, segment_id: SegmentId, schema: Arc<Schema>, doc_id_offset: DocId, cache_blocks: usize) -> Result<Self>

Open a segment with lazy loading

pub fn meta(&self) -> &SegmentMeta

pub fn num_docs(&self) -> u32

pub fn avg_field_len(&self, field: Field) -> f32

Get average field length for BM25F scoring
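
As a rough illustration of why the average field length matters, the sketch below applies the textbook BM25F length normalisation. It is not taken from this crate: the function names and the parameters `b`, `k1`, `idf`, and the field weights are assumptions made for the example.

```rust
/// Length-normalised term frequency for one field (BM25F-style).
/// `b` in [0, 1] controls how strongly long fields are penalised.
/// `avg_field_len` must be non-zero.
fn normalized_tf(tf: f32, field_len: f32, avg_field_len: f32, b: f32) -> f32 {
    tf / (1.0 - b + b * field_len / avg_field_len)
}

/// Combines per-field normalised frequencies with field weights, applies
/// BM25 saturation with `k1`, and scales by the term's IDF.
/// Each entry of `fields` is (field_weight, normalized_tf).
fn bm25f_term_score(fields: &[(f32, f32)], idf: f32, k1: f32) -> f32 {
    let pseudo_tf: f32 = fields.iter().map(|(w, ntf)| w * ntf).sum();
    idf * pseudo_tf / (k1 + pseudo_tf)
}
```

A field of exactly average length leaves the term frequency unchanged, while a field twice as long as average lowers it.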

pub fn doc_id_offset(&self) -> DocId

pub fn schema(&self) -> &Schema

pub fn sparse_indexes(&self) -> &FxHashMap<u32, SparseIndex>

Get sparse indexes for all fields

pub fn vector_indexes(&self) -> &FxHashMap<u32, VectorIndex>

Get vector indexes for all fields

pub fn term_dict_stats(&self) -> SSTableStats

Get term dictionary stats for debugging

pub async fn get_postings(&self, field: Field, term: &[u8]) -> Result<Option<BlockPostingList>>

Get posting list for a term (async - loads on demand)

For small posting lists (1-3 docs), the data is inlined in the term dictionary and no additional I/O is needed. Larger lists are read from the .post file.
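
The inline-versus-external split can be pictured with the following sketch. The `TermEntry` enum, the `load_postings` function, and the little-endian `u32` encoding are hypothetical and chosen only for illustration; the actual on-disk layout is not described on this page.

```rust
/// Hypothetical term-dictionary entry: tiny posting lists live inline,
/// larger ones are referenced by a byte range in a separate postings file.
enum TermEntry {
    Inline(Vec<u32>),
    External { offset: u64, len: u32 },
}

/// Returns the doc ids together with the number of extra reads needed.
fn load_postings(entry: &TermEntry, postings_file: &[u8]) -> (Vec<u32>, usize) {
    match entry {
        // Inline data is already in memory: no additional I/O.
        TermEntry::Inline(docs) => (docs.clone(), 0),
        TermEntry::External { offset, len } => {
            let start = *offset as usize;
            let end = start + *len as usize;
            // Assumed encoding for this sketch: little-endian u32 doc ids.
            let docs = postings_file[start..end]
                .chunks_exact(4)
                .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
                .collect();
            (docs, 1)
        }
    }
}
```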

pub async fn doc(&self, local_doc_id: DocId) -> Result<Option<Document>>

Get document by local doc_id (async - loads on demand)

pub async fn prefetch_terms(&self, field: Field, start_term: &[u8], end_term: &[u8]) -> Result<()>

Prefetch term dictionary blocks for a key range

pub fn store_has_dict(&self) -> bool

Check if store uses dictionary compression (incompatible with raw merging)

pub fn store_raw_blocks(&self) -> Vec<RawStoreBlock>

Get raw store blocks for optimized merging

pub fn store_data_slice(&self) -> &LazyFileSlice

Get store data slice for raw block access

pub async fn all_terms(&self) -> Result<Vec<(Vec<u8>, TermInfo)>>

Get all terms from this segment (for merge)

pub async fn all_terms_with_stats(&self) -> Result<Vec<(Field, String, u32)>>

Get all terms with parsed field and term string (for statistics aggregation)

Returns (field, term_string, doc_freq) for each term in the dictionary. Skips terms that aren’t valid UTF-8.
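
The UTF-8 filtering step can be sketched as follows. The helper `utf8_terms` is hypothetical and ignores the field component, since the dictionary key encoding is not documented here; it only shows how non-UTF-8 terms drop out of the result.

```rust
/// Keeps only the terms whose bytes are valid UTF-8, pairing each with its
/// document frequency. Invalid terms are skipped silently.
fn utf8_terms(raw: Vec<(Vec<u8>, u32)>) -> Vec<(String, u32)> {
    raw.into_iter()
        .filter_map(|(bytes, doc_freq)| {
            String::from_utf8(bytes).ok().map(|term| (term, doc_freq))
        })
        .collect()
}
```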

pub fn term_dict_iter(&self) -> AsyncSSTableIterator<'_, TermInfo>

Get streaming iterator over term dictionary (for memory-efficient merge)

pub async fn read_postings(&self, offset: u64, len: u32) -> Result<Vec<u8>>

Read raw posting bytes at offset

pub fn search_dense_vector(&self, field: Field, query: &[f32], k: usize, rerank_factor: usize, combiner: MultiValueCombiner) -> Result<Vec<(DocId, f32)>>

Search dense vectors using RaBitQ

Returns (doc_id, score) pairs sorted by score (descending). The doc_ids are adjusted by doc_id_offset for this segment. If mrl_dim is configured, the query vector is automatically trimmed. For multi-valued documents, scores are combined using the specified combiner.
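
The post-processing described above (query trimming, combining scores for multi-valued documents, applying the segment offset, and sorting) can be sketched like this. The `Combiner` enum and both functions are hypothetical; the variants of the real `MultiValueCombiner` are not listed on this page, and trimming to the leading `mrl_dim` components is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical combiner for documents that hold several vectors.
#[derive(Clone, Copy)]
enum Combiner {
    Max,
    Sum,
}

/// Keeps only the leading `mrl_dim` components of the query, if configured.
fn trim_query(query: &[f32], mrl_dim: Option<usize>) -> &[f32] {
    match mrl_dim {
        Some(d) if d < query.len() => &query[..d],
        _ => query,
    }
}

/// Combines per-vector hits into per-document scores, shifts local doc ids
/// by the segment offset, and returns the best `k` in descending score order.
fn finalize_hits(
    hits: &[(u32, f32)],
    doc_id_offset: u32,
    k: usize,
    combiner: Combiner,
) -> Vec<(u32, f32)> {
    let mut per_doc: HashMap<u32, f32> = HashMap::new();
    for &(local_id, score) in hits {
        per_doc
            .entry(local_id)
            .and_modify(|s| {
                *s = match combiner {
                    Combiner::Max => (*s).max(score),
                    Combiner::Sum => *s + score,
                }
            })
            .or_insert(score);
    }
    let mut out: Vec<(u32, f32)> = per_doc
        .into_iter()
        .map(|(id, s)| (id + doc_id_offset, s))
        .collect();
    // Descending by score; ties broken by doc id for a stable result.
    out.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    out.truncate(k);
    out
}
```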

pub fn has_dense_vector_index(&self, field: Field) -> bool

Check if this segment has a dense vector index for the given field

pub fn get_dense_vector_index(&self, field: Field) -> Option<Arc<RaBitQIndex>>

Get the dense vector index for a field (if available)

pub fn get_ivf_vector_index(&self, field: Field) -> Option<Arc<IVFRaBitQIndex>>

Get the IVF vector index for a field (if available)

pub fn get_scann_vector_index(&self, field: Field) -> Option<(Arc<IVFPQIndex>, Arc<PQCodebook>)>

Get the ScaNN vector index for a field (if available)

pub fn get_vector_index(&self, field: Field) -> Option<&VectorIndex>

Get the vector index type for a field

pub async fn search_sparse_vector(&self, field: Field, vector: &[(u32, f32)], limit: usize, combiner: MultiValueCombiner) -> Result<Vec<(u32, f32)>>

Search for similar sparse vectors using dedicated sparse posting lists

Uses shared WandExecutor with SparseTermScorer for efficient top-k retrieval. Optimizations (via WandExecutor):

  1. MaxScore pruning: Dimensions sorted by max contribution
  2. Block-Max WAND: Skips blocks where max contribution < threshold
  3. Top-K heap: Efficient score collection

Returns (doc_id, score) pairs sorted by score descending.
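
To make the interaction between the top-k heap and block skipping concrete, here is a deliberately simplified sketch over a single posting list. It is not the `WandExecutor` itself: `Scored`, `TopK`, `Block`, and `search_blocks` are hypothetical names, and real WAND or MaxScore evaluation works across several query dimensions at once.

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::BinaryHeap;

/// A scored hit, ordered by score (ties broken by doc id) so it fits in a heap.
struct Scored {
    score: f32,
    doc_id: u32,
}

impl Ord for Scored {
    fn cmp(&self, other: &Self) -> Ordering {
        self.score
            .total_cmp(&other.score)
            .then(self.doc_id.cmp(&other.doc_id))
    }
}
impl PartialOrd for Scored {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl PartialEq for Scored {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for Scored {}

/// Min-heap holding the best `k` hits seen so far.
struct TopK {
    k: usize,
    heap: BinaryHeap<Reverse<Scored>>,
}

impl TopK {
    fn new(k: usize) -> Self {
        TopK { k, heap: BinaryHeap::new() }
    }

    /// Score a candidate must beat to enter the heap; -inf until it is full.
    fn threshold(&self) -> f32 {
        if self.heap.len() < self.k {
            return f32::NEG_INFINITY;
        }
        self.heap.peek().map_or(f32::INFINITY, |r| r.0.score)
    }

    fn offer(&mut self, doc_id: u32, score: f32) {
        if score <= self.threshold() {
            return;
        }
        self.heap.push(Reverse(Scored { score, doc_id }));
        if self.heap.len() > self.k {
            self.heap.pop(); // evict the current minimum
        }
    }

    fn into_sorted(self) -> Vec<(u32, f32)> {
        let mut hits: Vec<(u32, f32)> = self
            .heap
            .into_iter()
            .map(|Reverse(s)| (s.doc_id, s.score))
            .collect();
        hits.sort_by(|a, b| b.1.total_cmp(&a.1));
        hits
    }
}

/// One posting block with a precomputed upper bound on its scores.
struct Block {
    max_score: f32,
    entries: Vec<(u32, f32)>,
}

/// Scans blocks in order, skipping any block whose upper bound is below the
/// current threshold. Returns the top-k hits and the number of skipped blocks.
fn search_blocks(blocks: &[Block], k: usize) -> (Vec<(u32, f32)>, usize) {
    let mut top_k = TopK::new(k);
    let mut skipped = 0;
    for block in blocks {
        if block.max_score < top_k.threshold() {
            skipped += 1;
            continue;
        }
        for &(doc_id, score) in &block.entries {
            top_k.offer(doc_id, score);
        }
    }
    (top_k.into_sorted(), skipped)
}
```

Once the heap is full, its smallest score becomes the threshold, so a block whose upper bound falls below it can be skipped without decoding any of its entries.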

pub async fn get_positions(&self, field: Field, term: &[u8]) -> Result<Option<PositionPostingList>>

Get positions for a term (for phrase queries)

Position offsets are now embedded in TermInfo, so we first look up the term to get its TermInfo, then use position_info() to get the offset.
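
As a hint at how positions serve phrase queries, the sketch below checks whether two terms occur next to each other in one document. The function `has_adjacent` and the representation of positions as sorted `u32` slices are assumptions for illustration; the layout of `PositionPostingList` is not documented on this page.

```rust
/// Returns true if some position in `first` is immediately followed by a
/// position in `second`. Both slices must be sorted in ascending order.
fn has_adjacent(first: &[u32], second: &[u32]) -> bool {
    let (mut i, mut j) = (0, 0);
    while i < first.len() && j < second.len() {
        // Widen to u64 so that `+ 1` cannot overflow.
        let want = first[i] as u64 + 1;
        let have = second[j] as u64;
        if have == want {
            return true;
        }
        if have < want {
            j += 1;
        } else {
            i += 1;
        }
    }
    false
}
```

For a two-word phrase, a document matches when `has_adjacent` holds for the position lists of the first and second term in that document.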

pub fn has_positions(&self, field: Field) -> bool

Check if positions are available for a field

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> Pointable for T

const ALIGN: usize

The alignment of pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T> Same for T

type Output = T

Should always be Self

impl<SS, SP> SupersetOf<SS> for SP
where SS: SubsetOf<SP>,

fn to_subset(&self) -> Option<SS>

The inverse inclusion map: attempts to construct self from the equivalent element of its superset.

fn is_in_subset(&self) -> bool

Checks if self is actually part of its subset T (and can be converted to it).

fn to_subset_unchecked(&self) -> SS

Use with care! Same as self.to_subset but without any property checks. Always succeeds.

fn from_subset(element: &SS) -> SP

The inclusion map: converts self to the equivalent element of its superset.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V