iqdb-build 1.0.0

Parallel index construction, incremental updates, and merging - part of the iQDB family.
  • One-call build — turn a batch of vectors into a finished index with build
  • Parallel — split the input into shards and build them concurrently across CPU cores with build_parallel (rayon)
  • Merge — fold sharded builds back into a single index with merge; build_merged is the whole split → build → merge pipeline in one call
  • Incremental — append more vectors to an index you already hold with build_into, including a &mut dyn IndexCore trait object
  • Progress — an optional on_progress callback reports shard completion for long-running builds
  • Generic over the backend — the same IndexBuilder<I> constructs flat, HNSW, IVF, or your own index — it never names a concrete type
  • Zero ceremony — no error type and no data type of its own; build items are the tuple the index already consumes, and errors propagate unchanged
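
The "generic over the backend" and "zero ceremony" points can be illustrated with a small, std-only sketch. Everything in it is hypothetical: `ToyIndex`, `FlatToy`, and this `build` are stand-ins written for illustration, not the real `iqdb_index::Index` trait or the crate's `build` function, and the distance metric is left out to keep it short. It only shows the shape of the idea: the build function is generic over a trait, takes the tuple the index already consumes, and returns the backend's error untouched.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// The build item: the same tuple the index consumes (id, vector, optional metadata).
type Item = (u64, Arc<[f32]>, Option<String>);

/// Hypothetical stand-in for the backend trait; not the real `iqdb_index::Index`.
trait ToyIndex: Sized {
    fn new(dim: usize) -> Self;
    fn insert(&mut self, item: Item) -> Result<(), String>;
    fn len(&self) -> usize;
}

/// Generic build: constructs any backend without naming a concrete type.
fn build<I: ToyIndex>(dim: usize, items: Vec<Item>) -> Result<I, String> {
    let mut index = I::new(dim);
    for item in items {
        // The backend's own error is returned unchanged via `?`.
        index.insert(item)?;
    }
    Ok(index)
}

/// A toy flat backend that rejects wrong dimensions and duplicate ids.
struct FlatToy {
    dim: usize,
    vectors: HashMap<u64, Arc<[f32]>>,
}

impl ToyIndex for FlatToy {
    fn new(dim: usize) -> Self {
        FlatToy { dim, vectors: HashMap::new() }
    }

    fn insert(&mut self, (id, vector, _meta): Item) -> Result<(), String> {
        if vector.len() != self.dim {
            return Err(format!("dimension mismatch for id {id}"));
        }
        if self.vectors.contains_key(&id) {
            return Err(format!("duplicate id {id}"));
        }
        self.vectors.insert(id, vector);
        Ok(())
    }

    fn len(&self) -> usize {
        self.vectors.len()
    }
}
```

In this sketch the caller picks the backend purely through the type annotation, for example `let index: FlatToy = build(3, items)?;`, which mirrors how the quick start below selects `MyIndex`.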

The feature set and the public API have been frozen since 0.5, and with 1.0 they are committed under SemVer for the 1.x series. See the ROADMAP.

Installation

[dependencies]
iqdb-build = "1.0"

Quick start

Shape your vectors into (id, Arc<[f32]>, metadata) tuples and build in one call. The same call constructs any backend that implements iqdb_index::Index:

use std::sync::Arc;
use iqdb_build::{build, build_into};
use iqdb_types::{DistanceMetric, VectorId};

// `MyIndex: iqdb_index::Index` — a flat, HNSW, or IVF index, or your own.
let items = vec![
    (VectorId::from(1u64), Arc::from([0.0_f32, 0.0, 0.0].as_slice()), None),
    (VectorId::from(2u64), Arc::from([1.0_f32, 0.0, 0.0].as_slice()), None),
    (VectorId::from(3u64), Arc::from([0.0_f32, 1.0, 0.0].as_slice()), None),
];
let mut index: MyIndex = build(3, DistanceMetric::Euclidean, items)?;

// Later, append more vectors without rebuilding.
// `more_items` is another Vec of the same (id, vector, metadata) tuples.
let added = build_into(&mut index, more_items)?;
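
As a rough illustration of why incremental appends can work through a trait object, here is a std-only sketch. The names `ToyCore` and `ListIndex` are hypothetical, and this `build_into` is a simplified stand-in rather than the crate's function; in particular, returning the number of appended items is an assumption suggested by the `added` binding above.

```rust
/// Hypothetical object-safe core trait; a stand-in for the real `IndexCore`.
trait ToyCore {
    fn insert(&mut self, id: u64, vector: Vec<f32>) -> Result<(), String>;
    fn len(&self) -> usize;
}

/// Appends items to an existing index and returns how many were added.
/// Taking `&mut dyn ToyCore` means the caller need not know the concrete type.
fn build_into(
    index: &mut dyn ToyCore,
    items: Vec<(u64, Vec<f32>)>,
) -> Result<usize, String> {
    let mut added = 0;
    for (id, vector) in items {
        // Stop at the first error and hand it back unchanged.
        index.insert(id, vector)?;
        added += 1;
    }
    Ok(added)
}

/// A toy backend that stores rows in a list and rejects duplicate ids.
struct ListIndex {
    rows: Vec<(u64, Vec<f32>)>,
}

impl ToyCore for ListIndex {
    fn insert(&mut self, id: u64, vector: Vec<f32>) -> Result<(), String> {
        if self.rows.iter().any(|(existing, _)| *existing == id) {
            return Err(format!("duplicate id {id}"));
        }
        self.rows.push((id, vector));
        Ok(())
    }

    fn len(&self) -> usize {
        self.rows.len()
    }
}
```

Because the function only needs `insert`, the same code path serves a concrete `&mut ListIndex` (which coerces to the trait object) and an index that is already held as `&mut dyn ToyCore`.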

For tuning the backend, use the builder directly:

use iqdb_build::IndexBuilder;
use iqdb_types::DistanceMetric;

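// `my_config` stands for the backend's own configuration value; `items` is the same
// Vec of (id, vector, metadata) tuples as in the quick start.
let builder = IndexBuilder::<MyIndex>::with_config(768, DistanceMetric::Cosine, my_config);
let index = builder.build(items)?;   // reuse `builder` for as many builds as you like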

For large inputs, build across CPU cores and merge into one index — the whole pipeline in one call (the backend must implement Mergeable):

use iqdb_build::IndexBuilder;
use iqdb_types::DistanceMetric;

let index: MyIndex = IndexBuilder::new(768, DistanceMetric::Cosine)
    .with_shards(8)                       // or leave it to auto (one per CPU)
    .on_progress(|p| eprintln!("{}/{}", p.shards_completed, p.shards_total))
    .build_merged(items)?;

To keep the shards separate instead (for example, when the engine's storage is itself sharded), use build_parallel, which returns a Vec<MyIndex> with one index per shard.
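
The split → build → merge pipeline described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the crate's implementation: it uses `std::thread::scope` in place of rayon, `ShardIndex` is a hypothetical toy that only stores ids, its `merge` method stands in for the role of the `Mergeable` trait, and the progress callback takes plain `(completed, total)` counts rather than a `BuildProgress` value.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Toy "index" that just records the ids it holds; a stand-in for a real backend.
#[derive(Debug, Default, PartialEq)]
struct ShardIndex {
    ids: Vec<u64>,
}

impl ShardIndex {
    fn build(items: &[u64]) -> Self {
        ShardIndex { ids: items.to_vec() }
    }

    /// Folds another shard into this one (the role `Mergeable` plays in the text).
    fn merge(mut self, other: ShardIndex) -> Self {
        self.ids.extend(other.ids);
        self
    }
}

/// Splits `items` into at most `shards` chunks, builds one index per chunk on a
/// scoped thread, and reports (completed, total) after each chunk finishes.
fn build_parallel(
    items: &[u64],
    shards: usize,
    on_progress: &(dyn Fn(usize, usize) + Sync),
) -> Vec<ShardIndex> {
    // Ceiling division so every item lands in some chunk; never a zero chunk size.
    let chunk = items.len().div_ceil(shards.max(1)).max(1);
    let total = items.chunks(chunk).len();
    let completed = AtomicUsize::new(0);
    thread::scope(|scope| {
        // Spawn every shard first, then join in shard order.
        let handles: Vec<_> = items
            .chunks(chunk)
            .map(|shard| {
                let completed = &completed;
                scope.spawn(move || {
                    let index = ShardIndex::build(shard);
                    let done = completed.fetch_add(1, Ordering::SeqCst) + 1;
                    on_progress(done, total);
                    index
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().expect("shard build panicked"))
            .collect()
    })
}

/// The whole pipeline in one call: split, build in parallel, then merge.
fn build_merged(
    items: &[u64],
    shards: usize,
    on_progress: &(dyn Fn(usize, usize) + Sync),
) -> ShardIndex {
    build_parallel(items, shards, on_progress)
        .into_iter()
        .fold(ShardIndex::default(), ShardIndex::merge)
}
```

The callback runs on the worker threads, which is why the sketch requires it to be `Sync`; a shared atomic counter keeps the completed count consistent no matter which shard finishes first. Callers that want the shards kept apart stop at `build_parallel`, and callers that want a single index use `build_merged`.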

Runnable end-to-end examples (with a toy backend) live in examples/: quickstart, incremental, configured, parallel, and merge.

Status

v1.0.0, stable. This release includes the generic IndexBuilder<I>; the Tier-1 build / build_into free functions; build_parallel for rayon-backed sharded construction; the Mergeable trait with merge and the one-call build_merged pipeline; and on_progress / BuildProgress reporting. It is backed by property-tested invariants (completeness, equivalence, additivity, duplicate rejection, parallel completeness, and merge equivalence), a loom model check over the concurrent progress path, an end-to-end consumer-simulation soak test, five runnable examples, and a criterion harness comparing sequential and parallel builds. The crate contains zero unsafe code, and every public item is documented with a runnable example. The public API is committed under SemVer for the 1.x series (no breaking changes until 2.0; the frozen surface is recorded in the ROADMAP) and is verified on Windows and Linux across stable and the 1.87 MSRV. Cross-crate validation against the real iqdb-flat / iqdb-hnsw backends runs in iqdb-eval / the engine workspace. The full reference is in docs/API.md.

Where It Fits

iqdb-build is a Phase-4 consumer of the index layer. It builds on:

  • iqdb-index — the Index / IndexCore traits it is generic over
  • iqdb-types — the VectorId, Metadata, DistanceMetric, and Result vocabulary

and is consumed by:

  • iqdb — for bulk ingestion

Standards

Built to the iQDB Rust standard. See REPS.md (Rust Efficiency & Performance Standards) and dev/DIRECTIVES.md for the engineering law and the definition of done. Before opening a PR, make sure cargo fmt --all, cargo clippy --all-targets --all-features -- -D warnings, and cargo test --all-features all run clean.