Skip to main content

iqdb_flat/
lib.rs

1//! # iqdb-flat
2//!
3//! Brute-force **exact** nearest-neighbour search for the iQDB vector
4//! database. [`FlatIndex`] scans every stored vector on every query — that
5//! is the point. It is the ground-truth implementation that every
6//! approximate index (HNSW, IVF) is measured against via recall@k, and it
7//! is fast enough to be the right choice in its own right for small corpora
8//! where graph or partition overhead is not justified.
9//!
10//! It is also the first and simplest consumer of the [`iqdb_index::Index`]
11//! trait, so it doubles as that trait's design validation.
12//!
13//! ## Design
14//!
15//! - **Storage** is four parallel `Vec`s of length `len()` — the row
16//!   payloads `Vec<Arc<[f32]>>`, the ids `Vec<VectorId>`, the optional
17//!   per-row metadata `Vec<Option<Metadata>>`, and a monotonic
18//!   insertion-sequence number `Vec<u64>` — plus a `HashMap<VectorId,
19//!   usize>` mapping each live id to its current position. The map keeps
20//!   the duplicate check on insert and the lookup on delete `O(1)`
21//!   regardless of corpus size.
22//! - **Zero-copy insert.** The row payload is an [`Arc<[f32]>`](std::sync::Arc)
23//!   taken by value: [`FlatIndex`] stores the caller's `Arc` verbatim and
24//!   never allocates a fresh `[f32]` buffer. A consumer that also keeps the
25//!   vector in its own record store shares one underlying allocation with
26//!   the index instead of paying for a copy.
27//! - **One ordering invariant.** All distance math is delegated to
28//!   [`iqdb_distance::compute_batch`] — flat never reimplements a metric.
29//!   For [`DistanceMetric::DotProduct`](iqdb_types::DistanceMetric::DotProduct) the raw inner product (larger is
30//!   more similar) is negated at the boundary so [`Hit::distance`] is always
31//!   *smaller-is-nearer*, the same contract across all five metrics.
32//! - **Bounded top-`k`.** Selection uses a max-heap of size `k` keyed by
33//!   `(distance, seq)` via [`f32::total_cmp`], so it is `O(n log k)`,
34//!   NaN-safe, and deterministic: ties break on insertion order (lower
35//!   sequence number wins).
36//! - **Filter-first.** A metadata filter is evaluated *before* distance
37//!   computation, so a selective filter skips distance work in proportion
38//!   to how much it rejects.
39//!
40//! ## Optional `parallel` feature
41//!
42//! The `parallel` feature adds a rayon-backed chunked distance scan for
43//! large corpora. The sequential path is the correctness baseline; the
44//! parallel path produces byte-identical results (enforced by
45//! `tests/parallel_equivalence.rs`).
46//!
47//! ## Example
48//!
49//! ```
50//! use std::sync::Arc;
51//!
52//! use iqdb_flat::{FlatConfig, FlatIndex};
53//! use iqdb_index::{Index, IndexCore};
54//! use iqdb_types::{DistanceMetric, SearchParams, VectorId};
55//!
56//! # fn main() -> iqdb_types::Result<()> {
57//! let mut idx = FlatIndex::new(2, DistanceMetric::Euclidean, FlatConfig)?;
58//! idx.insert(VectorId::from(1u64), Arc::<[f32]>::from(&[0.0, 0.0][..]), None)?;
59//! idx.insert(VectorId::from(2u64), Arc::<[f32]>::from(&[3.0, 4.0][..]), None)?;
60//! idx.insert(VectorId::from(3u64), Arc::<[f32]>::from(&[1.0, 0.0][..]), None)?;
61//!
62//! let hits = idx.search(&[0.0, 0.0], &SearchParams::new(2, DistanceMetric::Euclidean))?;
63//! assert_eq!(hits.len(), 2);
64//! assert_eq!(hits[0].id, VectorId::U64(1));
65//! assert_eq!(hits[1].id, VectorId::U64(3));
66//! # Ok(())
67//! # }
68//! ```
69
70#![cfg_attr(docsrs, feature(doc_cfg))]
71#![deny(warnings)]
72#![deny(missing_docs)]
73#![deny(unsafe_op_in_unsafe_fn)]
74#![deny(unused_must_use)]
75#![deny(unused_results)]
76#![deny(clippy::unwrap_used)]
77#![deny(clippy::expect_used)]
78#![deny(clippy::todo)]
79#![deny(clippy::unimplemented)]
80#![deny(clippy::print_stdout)]
81#![deny(clippy::print_stderr)]
82#![deny(clippy::dbg_macro)]
83#![deny(clippy::unreachable)]
84#![deny(clippy::undocumented_unsafe_blocks)]
85#![forbid(unsafe_code)]
86
87mod flat;
88mod topk;
89
90#[cfg(feature = "parallel")]
91mod parallel;
92
93pub use crate::flat::{FlatConfig, FlatIndex};
94
95// Re-export the `Hit` type that searches return so callers can drive
96// `FlatIndex` without a second `use` line for the result type.
97pub use iqdb_types::Hit;
98
99/// The version of this crate, taken from `Cargo.toml` at compile time.
100///
101/// # Examples
102///
103/// ```
104/// let version = iqdb_flat::VERSION;
105/// assert_eq!(version.split('.').count(), 3);
106/// ```
107pub const VERSION: &str = env!("CARGO_PKG_VERSION");