Skip to main content

Crate iqdb_flat

Crate iqdb_flat 

Source
Expand description

§iqdb-flat

Brute-force exact nearest-neighbour search for the iQDB vector database. FlatIndex scans every stored vector on every query — that is the point. It is the ground-truth implementation that every approximate index (HNSW, IVF) is measured against via recall@k, and it is fast enough to be the right choice in its own right for small corpora where graph or partition overhead is not justified.

It is also the first and simplest consumer of the iqdb_index::Index trait, so it doubles as that trait’s design validation.

§Design

  • Storage is four parallel Vecs of length len() — the row payloads Vec<Arc<[f32]>>, the ids Vec<VectorId>, the optional per-row metadata Vec<Option<Metadata>>, and a monotonic insertion-sequence number Vec<u64> — plus a HashMap<VectorId, usize> mapping each live id to its current position. The map keeps the duplicate check on insert and the lookup on delete O(1) regardless of corpus size.
  • Zero-copy insert. The row payload is an Arc<[f32]> taken by value: FlatIndex stores the caller’s Arc verbatim and never allocates a fresh [f32] buffer. A consumer that also keeps the vector in its own record store shares one underlying allocation with the index instead of paying for a copy.
  • One ordering invariant. All distance math is delegated to iqdb_distance::compute_batch — flat never reimplements a metric. For DistanceMetric::DotProduct the raw inner product (larger is more similar) is negated at the boundary so Hit::distance is always smaller-is-nearer, the same contract across all five metrics.
  • Bounded top-k. Selection uses a max-heap of size k keyed by (distance, seq) via f32::total_cmp, so it is O(n log k), NaN-safe, and deterministic: ties break on insertion order (lower sequence number wins).
  • Filter-first. A metadata filter is evaluated before distance computation, so a selective filter skips distance work in proportion to how much it rejects.

§Optional parallel feature

The parallel feature adds a rayon-backed chunked distance scan for large corpora. The sequential path is the correctness baseline; the parallel path produces byte-identical results (enforced by tests/parallel_equivalence.rs).

§Example

use std::sync::Arc;

use iqdb_flat::{FlatConfig, FlatIndex};
use iqdb_index::{Index, IndexCore};
use iqdb_types::{DistanceMetric, SearchParams, VectorId};

let mut idx = FlatIndex::new(2, DistanceMetric::Euclidean, FlatConfig)?;
idx.insert(VectorId::from(1u64), Arc::<[f32]>::from(&[0.0, 0.0][..]), None)?;
idx.insert(VectorId::from(2u64), Arc::<[f32]>::from(&[3.0, 4.0][..]), None)?;
idx.insert(VectorId::from(3u64), Arc::<[f32]>::from(&[1.0, 0.0][..]), None)?;

let hits = idx.search(&[0.0, 0.0], &SearchParams::new(2, DistanceMetric::Euclidean))?;
assert_eq!(hits.len(), 2);
assert_eq!(hits[0].id, VectorId::U64(1));
assert_eq!(hits[1].id, VectorId::U64(3));

Structs§

FlatConfig
Configuration for FlatIndex::new.
FlatIndex
Brute-force exact nearest-neighbour index.
Hit
One result of a similarity search: a matched record and its distance.

Constants§

VERSION
The version of this crate, taken from Cargo.toml at compile time.