Trait VectorIndex 

pub trait VectorIndex:
    Send
    + Sync
    + Debug
    + Index {
    // Required methods
    fn search<'life0, 'life1, 'life2, 'async_trait>(
        &'life0 self,
        query: &'life1 Query,
        pre_filter: Arc<dyn PreFilter>,
        metrics: &'life2 dyn MetricsCollector,
    ) -> Pin<Box<dyn Future<Output = Result<RecordBatch>> + Send + 'async_trait>>
       where Self: 'async_trait, 'life0: 'async_trait, 'life1: 'async_trait, 'life2: 'async_trait;
    fn find_partitions(
        &self,
        query: &Query,
    ) -> Result<(UInt32Array, Float32Array)>;
    fn total_partitions(&self) -> usize;
    fn search_in_partition<'life0, 'life1, 'life2, 'async_trait>(
        &'life0 self,
        partition_id: usize,
        query: &'life1 Query,
        pre_filter: Arc<dyn PreFilter>,
        metrics: &'life2 dyn MetricsCollector,
    ) -> Pin<Box<dyn Future<Output = Result<RecordBatch>> + Send + 'async_trait>>
       where Self: 'async_trait, 'life0: 'async_trait, 'life1: 'async_trait, 'life2: 'async_trait;
    fn is_loadable(&self) -> bool;
    fn use_residual(&self) -> bool;
    fn load<'life0, 'async_trait>(
        &'life0 self,
        reader: Arc<dyn Reader>,
        offset: usize,
        length: usize,
    ) -> Pin<Box<dyn Future<Output = Result<Box<dyn VectorIndex>>> + Send + 'async_trait>>
       where Self: 'async_trait, 'life0: 'async_trait;
    fn to_batch_stream<'life0, 'async_trait>(
        &'life0 self,
        with_vector: bool,
    ) -> Pin<Box<dyn Future<Output = Result<SendableRecordBatchStream>> + Send + 'async_trait>>
       where Self: 'async_trait, 'life0: 'async_trait;
    fn num_rows(&self) -> u64;
    fn row_ids(&self) -> Box<dyn Iterator<Item = &u64> + '_>;
    fn remap<'life0, 'life1, 'async_trait>(
        &'life0 mut self,
        mapping: &'life1 HashMap<u64, Option<u64>>,
    ) -> Pin<Box<dyn Future<Output = Result<()>> + Send + 'async_trait>>
       where Self: 'async_trait, 'life0: 'async_trait, 'life1: 'async_trait;
    fn metric_type(&self) -> DistanceType;
    fn ivf_model(&self) -> &IvfModel;
    fn quantizer(&self) -> Quantizer;
    fn partition_size(&self, part_id: usize) -> usize;
    fn sub_index_type(&self) -> (SubIndexType, QuantizationType);

    // Provided methods
    fn load_partition<'life0, 'async_trait>(
        &'life0 self,
        reader: Arc<dyn Reader>,
        offset: usize,
        length: usize,
        _partition_id: usize,
    ) -> Pin<Box<dyn Future<Output = Result<Box<dyn VectorIndex>>> + Send + 'async_trait>>
       where Self: 'async_trait, 'life0: 'async_trait { ... }
    fn partition_reader<'life0, 'life1, 'async_trait>(
        &'life0 self,
        _partition_id: usize,
        _with_vector: bool,
        _metrics: &'life1 dyn MetricsCollector,
    ) -> Pin<Box<dyn Future<Output = Result<SendableRecordBatchStream>> + Send + 'async_trait>>
       where Self: 'async_trait, 'life0: 'async_trait, 'life1: 'async_trait { ... }
}

Vector Index for (Approximate) Nearest Neighbor (ANN) Search.

Vector indices are often built as a chain of indices. For example, IVF -> PQ or IVF -> HNSW -> SQ.

We use one trait for both the top-level and the sub-indices. Typically the top-level search is a partition-aware search and all sub-indices are whole-index searches.

Required Methods

fn search<'life0, 'life1, 'life2, 'async_trait>( &'life0 self, query: &'life1 Query, pre_filter: Arc<dyn PreFilter>, metrics: &'life2 dyn MetricsCollector, ) -> Pin<Box<dyn Future<Output = Result<RecordBatch>> + Send + 'async_trait>>
where Self: 'async_trait, 'life0: 'async_trait, 'life1: 'async_trait, 'life2: 'async_trait,

Search the entire index for the k nearest neighbors.

Returns a RecordBatch with the following schema:

use arrow_schema::{Schema, Field, DataType};

Schema::new(vec![
  Field::new("_rowid", DataType::UInt64, true),
  Field::new("_distance", DataType::Float32, false),
]);

The pre_filter argument is used to filter out row ids that we know are not relevant to the query. For example, it removes deleted rows or rows that do not match a user-provided filter.
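The effect of a pre-filter can be sketched with plain standard-library types. This is only an illustration: `BlockList` and `filter_candidates` are hypothetical names and are not part of the `PreFilter` trait, whose actual interface is not shown on this page.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a pre-filter: a set of row ids that must not
/// appear in search results (for example, deleted rows).
pub struct BlockList {
    blocked: HashSet<u64>,
}

impl BlockList {
    pub fn new(blocked: impl IntoIterator<Item = u64>) -> Self {
        Self { blocked: blocked.into_iter().collect() }
    }

    /// Keep only the (row id, distance) candidates whose row id is not blocked.
    pub fn filter_candidates(&self, candidates: &[(u64, f32)]) -> Vec<(u64, f32)> {
        candidates
            .iter()
            .copied()
            .filter(|(row_id, _)| !self.blocked.contains(row_id))
            .collect()
    }
}
```

In this sketch the filter is applied to candidates before they are ranked, so blocked rows never take up one of the k result slots.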

fn find_partitions(&self, query: &Query) -> Result<(UInt32Array, Float32Array)>

Find partitions that may contain nearest neighbors.

If maximum_nprobes is set then this method will return the partitions that are most likely to contain the nearest neighbors (e.g. the closest partitions to the query vector).

Returns the partition ids and the distances between the query and the corresponding centroids. The results should be sorted from closest to farthest.
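A minimal sketch of this contract, assuming squared L2 distance and centroids held as plain `Vec<f32>` values. The function name `closest_partitions` and the `nprobes` parameter are illustrative only; the real method takes a `Query` and returns Arrow arrays.

```rust
/// Squared L2 distance between two vectors of equal length.
fn l2_squared(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Return (partition ids, distances) for the `nprobes` closest centroids,
/// ordered from closest to farthest. Hypothetical helper, not the Lance API.
pub fn closest_partitions(
    query: &[f32],
    centroids: &[Vec<f32>],
    nprobes: usize,
) -> (Vec<u32>, Vec<f32>) {
    let mut scored: Vec<(u32, f32)> = centroids
        .iter()
        .enumerate()
        .map(|(id, c)| (id as u32, l2_squared(query, c)))
        .collect();
    // total_cmp gives a total order on f32, so a NaN distance cannot panic the sort.
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(nprobes);
    scored.into_iter().unzip()
}
```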

fn total_partitions(&self) -> usize

Get the total number of partitions in the index.

fn search_in_partition<'life0, 'life1, 'life2, 'async_trait>( &'life0 self, partition_id: usize, query: &'life1 Query, pre_filter: Arc<dyn PreFilter>, metrics: &'life2 dyn MetricsCollector, ) -> Pin<Box<dyn Future<Output = Result<RecordBatch>> + Send + 'async_trait>>
where Self: 'async_trait, 'life0: 'async_trait, 'life1: 'async_trait, 'life2: 'async_trait,

Search a single partition for nearest neighbors.

This method should return the same results as the VectorIndex::search method, except that it searches only a single partition.
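One way a partition-aware caller might combine per-partition results is to merge them and keep the k closest. The sketch below assumes each partition yields `(row id, distance)` pairs; `merge_top_k` is a hypothetical helper, not something this trait provides.

```rust
/// Merge per-partition candidate lists of (row id, distance) into the
/// global top-k, ordered from closest to farthest.
pub fn merge_top_k(per_partition: Vec<Vec<(u64, f32)>>, k: usize) -> Vec<(u64, f32)> {
    let mut all: Vec<(u64, f32)> = per_partition.into_iter().flatten().collect();
    // Sort by distance; total_cmp defines a total order even if a NaN appears.
    all.sort_by(|a, b| a.1.total_cmp(&b.1));
    all.truncate(k);
    all
}
```

Sorting the concatenated list keeps the sketch short. A bounded heap would avoid sorting every candidate when the number of probed partitions is large.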

fn is_loadable(&self) -> bool

Whether the index is loadable by IVF, that is, whether it can be a sub-index that IVF loads on demand.

fn use_residual(&self) -> bool

Whether to use the residual vector for search.
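This page does not define the residual. In IVF-style indices it usually means the vector minus the centroid of its partition, and the sketch below assumes that definition. The `residual` function is a hypothetical helper for illustration.

```rust
/// Residual of `vector` relative to its partition `centroid`,
/// computed element-wise as vector - centroid.
pub fn residual(vector: &[f32], centroid: &[f32]) -> Vec<f32> {
    assert_eq!(vector.len(), centroid.len(), "dimension mismatch");
    vector.iter().zip(centroid).map(|(v, c)| v - c).collect()
}
```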

fn load<'life0, 'async_trait>( &'life0 self, reader: Arc<dyn Reader>, offset: usize, length: usize, ) -> Pin<Box<dyn Future<Output = Result<Box<dyn VectorIndex>>> + Send + 'async_trait>>
where Self: 'async_trait, 'life0: 'async_trait,

Load the index from the reader on-demand.

fn to_batch_stream<'life0, 'async_trait>( &'life0 self, with_vector: bool, ) -> Pin<Box<dyn Future<Output = Result<SendableRecordBatchStream>> + Send + 'async_trait>>
where Self: 'async_trait, 'life0: 'async_trait,

fn num_rows(&self) -> u64

fn row_ids(&self) -> Box<dyn Iterator<Item = &u64> + '_>

Return the IDs of rows in the index.

fn remap<'life0, 'life1, 'async_trait>( &'life0 mut self, mapping: &'life1 HashMap<u64, Option<u64>>, ) -> Pin<Box<dyn Future<Output = Result<()>> + Send + 'async_trait>>
where Self: 'async_trait, 'life0: 'async_trait, 'life1: 'async_trait,

Remap the index according to mapping.

Each item in mapping describes an old row id -> new row id pair. If an old row id maps to None, that row has been deleted and can be removed from the index.

If an old row id is not in the mapping, it should be left alone.
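The three cases can be sketched over a plain list of row ids. `remap_row_ids` is a hypothetical helper and says nothing about how a concrete index stores its ids.

```rust
use std::collections::HashMap;

/// Apply `mapping` to `row_ids` following the rules described for `remap`.
pub fn remap_row_ids(row_ids: &[u64], mapping: &HashMap<u64, Option<u64>>) -> Vec<u64> {
    row_ids
        .iter()
        .filter_map(|old| match mapping.get(old) {
            Some(Some(new)) => Some(*new), // old id -> new id: replace it
            Some(None) => None,            // old id -> None: row deleted, drop it
            None => Some(*old),            // not in the mapping: leave it alone
        })
        .collect()
}
```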

fn metric_type(&self) -> DistanceType

The metric type of this vector index.

fn ivf_model(&self) -> &IvfModel

fn quantizer(&self) -> Quantizer

fn partition_size(&self, part_id: usize) -> usize

fn sub_index_type(&self) -> (SubIndexType, QuantizationType)

The sub-index type and quantization type of this vector index.

Provided Methods

fn load_partition<'life0, 'async_trait>( &'life0 self, reader: Arc<dyn Reader>, offset: usize, length: usize, _partition_id: usize, ) -> Pin<Box<dyn Future<Output = Result<Box<dyn VectorIndex>>> + Send + 'async_trait>>
where Self: 'async_trait, 'life0: 'async_trait,

Load the partition from the reader on-demand.

fn partition_reader<'life0, 'life1, 'async_trait>( &'life0 self, _partition_id: usize, _with_vector: bool, _metrics: &'life1 dyn MetricsCollector, ) -> Pin<Box<dyn Future<Output = Result<SendableRecordBatchStream>> + Send + 'async_trait>>
where Self: 'async_trait, 'life0: 'async_trait, 'life1: 'async_trait,

Implementors

impl<Q: Quantization + Send + Sync + 'static> VectorIndex for HNSWIndex<Q>