Module vector_search

Reusable helpers for building brute-force vector similarity search expressions over Vector extension arrays.

This module exposes three building blocks for setting up a brute-force scan that computes cosine similarity against a query and applies a threshold, over a prepared data array:

  • compress_turboquant applies the canonical TurboQuant encoding pipeline (L2Denorm(SorfTransform(FSL(Dict(codes, centroids))), norms)) to a raw Vector<dim, f32> array without requiring the caller to plumb the unstable_encodings feature flag on the vortex facade.
  • build_constant_query_vector wraps a single query vector into a Vector extension array whose storage is a ConstantArray broadcast across num_rows rows. This is the shape expected by CosineSimilarity::try_new_array for the RHS of a database-vs-query scan.
  • build_similarity_search_tree wires everything together into a lazy Binary(Gt, [CosineSimilarity(data, query), threshold]) expression.
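
The broadcast shape described for build_constant_query_vector can be pictured with a small standalone sketch. It does not use the Vortex API: `ConstantRows` is a hypothetical type invented here for illustration, and it only shows the general idea of storing one query vector while presenting it as `num_rows` logical rows.

```rust
/// Hypothetical stand-in for a constant-backed column (not a Vortex type):
/// a single stored vector that is logically repeated `num_rows` times.
struct ConstantRows {
    value: Vec<f32>,
    num_rows: usize,
}

impl ConstantRows {
    /// Store the query once; nothing is copied per row.
    fn new(value: &[f32], num_rows: usize) -> Self {
        Self {
            value: value.to_vec(),
            num_rows,
        }
    }

    /// Logical number of rows, independent of how much data is stored.
    fn len(&self) -> usize {
        self.num_rows
    }

    /// Every in-bounds row resolves to the same underlying vector.
    fn row(&self, index: usize) -> Option<&[f32]> {
        (index < self.num_rows).then(|| self.value.as_slice())
    }
}
```

The design point is that the right-hand side of a database-vs-query comparison has the same logical length as the data array while its storage cost stays proportional to a single vector.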

Executing the tree from build_similarity_search_tree into a BoolArray yields one boolean per row indicating whether that row’s cosine similarity to the query is strictly greater than threshold.
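
As a reference for what the expression computes per row, here is a minimal plain-Rust sketch of the same brute-force scan. It is independent of Vortex: `cosine_similarity` and `threshold_scan` are names invented for this illustration, and treating a zero-norm vector as a non-match is an assumption of the sketch, not documented behavior of the library.

```rust
/// Cosine similarity between two equal-length vectors.
/// Returns None when the lengths differ or either vector has zero norm
/// (an assumption of this sketch, not documented library behavior).
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Brute-force scan: one boolean per row, true when the row's cosine
/// similarity to `query` is strictly greater than `threshold`.
fn threshold_scan(rows: &[Vec<f32>], query: &[f32], threshold: f32) -> Vec<bool> {
    rows.iter()
        .map(|row| cosine_similarity(row, query).map_or(false, |s| s > threshold))
        .collect()
}
```

With a query of `[1.0, 0.0]` and a threshold of 0.8, a row `[1.0, 0.0]` matches (similarity 1.0), while `[1.0, 1.0]` does not (similarity about 0.707).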

Example

use vortex_array::{ArrayRef, VortexSessionExecute};
use vortex_array::arrays::BoolArray;
use vortex_session::VortexSession;
use vortex_tensor::vector_search::{build_similarity_search_tree, compress_turboquant};

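fn run(session: &VortexSession, data: ArrayRef, query: &[f32]) -> anyhow::Result<()> {
    // Execution context used for both compression and evaluation.
    let mut ctx = session.create_execution_ctx();
    // Encode the raw Vector<dim, f32> array with the TurboQuant pipeline.
    let data = compress_turboquant(data, &mut ctx)?;
    // Build the lazy expression: CosineSimilarity(data, query) > 0.8.
    let tree = build_similarity_search_tree(data, query, 0.8)?;
    // Evaluate the tree; the result holds one boolean per row.
    let _matches: BoolArray = tree.execute(&mut ctx)?;
    Ok(())
}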

Functions

build_constant_query_vector
Build a Vector extension array whose storage is a ConstantArray broadcasting a single query vector across num_rows rows.
build_similarity_search_tree
Build the lazy similarity-search expression tree for a prepared database array and a single query vector.
compress_turboquant
Apply the canonical TurboQuant encoding pipeline to a Vector<dim, f32> array.