pub struct IvfConfig {
pub n_clusters: usize,
pub n_probes: usize,
pub training_sample_size: usize,
pub use_pq: bool,
pub pq_subvectors: Option<usize>,
pub pq_refine_factor: u32,
pub seed: u64,
}
Configuration for crate::IvfIndex construction (see
iqdb_index::Index::new).
All fields have documented defaults; see the field-level docs and
the crate README.md for the tradeoffs each one controls.
§Examples
use iqdb_ivf::IvfConfig;
let cfg = IvfConfig::default();
assert_eq!(cfg.n_clusters, 256);
assert_eq!(cfg.n_probes, 8);
let tuned = IvfConfig::default()
.with_n_clusters(64)
.with_n_probes(4)
.with_seed(42);
assert_eq!(tuned.n_clusters, 64);
assert_eq!(tuned.n_probes, 4);
assert_eq!(tuned.seed, 42);
Fields§
§n_clusters: usize
Number of k-means partitions (inverted lists) the trainer produces.
Spec heuristic: sqrt(N) for moderate corpora, 4 * sqrt(N)
for very large ones. Must be at least 1. Default 256.
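The heuristic above can be sketched as a small helper. The name `suggested_n_clusters` and the `very_large` flag are hypothetical and not part of the crate; the docs do not say where "moderate" ends and "very large" begins, so this sketch leaves that decision to the caller.

```rust
/// Hypothetical helper, not part of `iqdb_ivf`: applies the sqrt(N) heuristic.
/// `very_large` selects the 4 * sqrt(N) variant; the cutoff between the two
/// regimes is left to the caller because the docs do not define one.
fn suggested_n_clusters(corpus_size: usize, very_large: bool) -> usize {
    let root = (corpus_size as f64).sqrt();
    let scaled = if very_large { 4.0 * root } else { root };
    // `n_clusters` must be at least 1, so clamp the rounded result.
    (scaled.round() as usize).max(1)
}
```

Under the sqrt(N) rule, a corpus of 65,536 vectors maps to 256 clusters, which matches the default.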
n_probes: usize
Number of clusters searched at query time.
Larger values raise recall at higher per-query cost. Must be
at least 1 and no greater than n_clusters.
Default 8.
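To illustrate what "clusters searched at query time" means, here is a minimal sketch of probe selection. It assumes squared Euclidean distance and a plain `Vec<Vec<f32>>` of centroids; the function name and data layout are hypothetical and may not match how the crate selects probes internally.

```rust
/// Hypothetical sketch: return the indices of the `n_probes` centroids
/// closest to `query` under squared Euclidean distance.
fn nearest_centroids(query: &[f32], centroids: &[Vec<f32>], n_probes: usize) -> Vec<usize> {
    let mut scored: Vec<(f32, usize)> = centroids
        .iter()
        .enumerate()
        .map(|(i, c)| {
            let d: f32 = query.iter().zip(c).map(|(a, b)| (a - b) * (a - b)).sum();
            (d, i)
        })
        .collect();
    // `total_cmp` gives a total order on f32, so the sort is well defined
    // even if a distance turns out to be NaN.
    scored.sort_by(|a, b| a.0.total_cmp(&b.0));
    // Only the inverted lists of these clusters would be scanned.
    scored.into_iter().take(n_probes).map(|(_, i)| i).collect()
}
```

Raising `n_probes` widens the set of inverted lists scanned per query, which is where both the recall gain and the extra cost come from.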
training_sample_size: usize
Cap on the training sample passed to k-means.
When the caller supplies more vectors than this, the trainer
subsamples down to this many via the seeded PRNG. Must be at
least 1. Default 65_536.
use_pq: bool
Enable Product Quantization within each inverted list.
When true, Self::pq_subvectors must be Some(m) with
m >= 1 and m | dim at index-construction time. The IVF-PQ
branch trains an iqdb_quantize::ProductQuantizer over the
same working set used for the coarse k-means (plain-PQ), stores
a per-entry iqdb_quantize::PqCode alongside the retained
Arc<[f32]> vector, and scores intra-cluster candidates via
asymmetric distance computation (ADC). Supported metrics: Euclidean,
DotProduct, Manhattan. Cosine and Hamming are rejected at
construction with IqdbError::InvalidMetric. Defaults to false (IVF-Flat).
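As a rough illustration of ADC scoring, the sketch below builds a per-query lookup table and scores a code with one lookup per subvector. The `Codebooks` alias and both function names are hypothetical; the real `ProductQuantizer` and `PqCode` types may be laid out differently. Only squared Euclidean distance is shown, and the sketch assumes m >= 1 and m | dim.

```rust
/// Hypothetical layout for illustration: `codebooks[j][c]` is centroid `c`
/// of subvector `j`. Not the actual `iqdb_quantize` representation.
type Codebooks = Vec<Vec<Vec<f32>>>;

/// Build the ADC lookup table: squared Euclidean distance from each query
/// subvector to every centroid of the matching codebook.
/// Assumes `codebooks.len() >= 1` and that it divides `query.len()`.
fn adc_table(query: &[f32], codebooks: &Codebooks) -> Vec<Vec<f32>> {
    let sub_dim = query.len() / codebooks.len();
    codebooks
        .iter()
        .zip(query.chunks(sub_dim))
        .map(|(book, q_sub)| {
            book.iter()
                .map(|c| {
                    q_sub
                        .iter()
                        .zip(c)
                        .map(|(a, b)| (a - b) * (a - b))
                        .sum::<f32>()
                })
                .collect()
        })
        .collect()
}

/// Score one encoded vector: one table lookup per subvector, then a sum.
/// The query itself is never quantized, hence "asymmetric".
fn adc_distance(table: &[Vec<f32>], code: &[u8]) -> f32 {
    table.iter().zip(code).map(|(row, &c)| row[c as usize]).sum()
}
```

The table is built once per query and reused for every candidate, so scoring a candidate costs one lookup per subvector instead of a full distance over all `dim` components.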
pq_subvectors: Option<usize>
Subvector count M for IVF-PQ.
Required to be Some(m) with m >= 1 and m | dim whenever
use_pq is true. Ignored when use_pq is
false. Each subvector compresses to one byte (K = 256),
so smaller m compresses harder at the cost of more
reconstruction error per code.
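The size tradeoff can be made concrete with a little arithmetic. The helper below is a hypothetical sketch, not a crate API: it returns the code size in bytes and the ratio against a raw `f32` vector, or `None` when `m` violates the documented constraint.

```rust
/// Hypothetical helper: bytes per PQ code and the compression ratio relative
/// to a raw `f32` vector, or `None` when `m` is not a valid subvector count.
fn pq_footprint(dim: usize, m: usize) -> Option<(usize, f64)> {
    // Mirrors the documented constraint: m >= 1 and m divides dim.
    if m == 0 || dim % m != 0 {
        return None;
    }
    let code_bytes = m; // one byte per subvector because K = 256
    let raw_bytes = dim * std::mem::size_of::<f32>();
    Some((code_bytes, raw_bytes as f64 / code_bytes as f64))
}
```

For dim = 128, m = 16 gives a 16-byte code (32x smaller than the 512-byte raw vector) and m = 8 gives an 8-byte code (64x). Because the index also retains the `Arc<[f32]>` vector, the ratio describes the code alone, not total memory use.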
pq_refine_factor: u32
IVF-PQ refine factor.
0 disables refine: the search returns the pure ADC top-k.
N >= 1 enables refine: the search shortlists N × k
candidates by ADC, then exact-reranks the shortlist using the
retained Arc<[f32]> vectors (same distance path as IVF-Flat,
same DotProduct sign convention) before returning top-k.
Default 4. Ignored when use_pq is false.
Tunable at runtime via crate::IvfIndex::set_pq_refine_factor.
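The two-stage behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the crate's search path: `approx` stands in for ADC scores, `exact` for exact distances, smaller is better in both, and the function name is hypothetical. The DotProduct sign convention is not modelled.

```rust
/// Hypothetical sketch of the two-stage search. `approx[i]` stands in for the
/// ADC score of candidate `i` and `exact[i]` for its exact distance; smaller
/// is better in both. Returns candidate indices, best first.
fn top_k_with_refine(approx: &[f32], exact: &[f32], k: usize, refine_factor: u32) -> Vec<usize> {
    let mut order: Vec<usize> = (0..approx.len()).collect();
    order.sort_by(|&a, &b| approx[a].total_cmp(&approx[b]));
    if refine_factor == 0 {
        // Refine disabled: pure ADC top-k.
        order.truncate(k);
        return order;
    }
    // Shortlist refine_factor * k candidates by ADC, then rerank them exactly.
    order.truncate(k.saturating_mul(refine_factor as usize));
    order.sort_by(|&a, &b| exact[a].total_cmp(&exact[b]));
    order.truncate(k);
    order
}
```

A larger factor lets candidates that ADC ranked poorly back into the final top-k, at the cost of more exact distance computations per query.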
seed: u64
Seed for the internal SplitMix64 PRNG used by k-means++ initialization and by deterministic subsampling of the training set.
Identical seed + identical training sample → byte-identical
centroids on every platform. When use_pq is
true, the same seed flows into the PQ codebook trainer so
the per-subvector codebooks are also reproducible.
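To show why a fixed seed yields reproducible training input, here is a minimal sketch of seeded subsampling. The SplitMix64 step uses the commonly published constants; whether the crate uses exactly this variant is an assumption. `subsample_indices` is a hypothetical name, and its partial Fisher-Yates shuffle is one plausible way to subsample, not necessarily the crate's.

```rust
/// SplitMix64 step as commonly published; whether `iqdb_ivf` uses exactly
/// these constants internally is an assumption.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical subsampler: pick `cap` distinct indices out of `0..n` with a
/// partial Fisher-Yates shuffle driven by the seeded generator.
fn subsample_indices(n: usize, cap: usize, seed: u64) -> Vec<usize> {
    // Subsampling only applies when more vectors are supplied than the cap.
    if n <= cap {
        return (0..n).collect();
    }
    let mut rng = SplitMix64(seed);
    let mut idx: Vec<usize> = (0..n).collect();
    for i in 0..cap {
        // Modulo reduction has a slight bias; acceptable for a sketch.
        let j = i + (rng.next_u64() % (n - i) as u64) as usize;
        idx.swap(i, j);
    }
    idx.truncate(cap);
    idx
}
```

The generator uses only wrapping integer arithmetic, so the same seed selects the same indices on any platform.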
Implementations§
impl IvfConfig
pub fn with_n_clusters(self, n_clusters: usize) -> Self
Override n_clusters.
pub fn with_n_probes(self, n_probes: usize) -> Self
Override n_probes.
pub fn with_training_sample_size(self, training_sample_size: usize) -> Self
Override training_sample_size.
pub fn with_use_pq(self, use_pq: bool) -> Self
Override use_pq.
When true, Self::pq_subvectors must also be set. The metric check
and the m | dim divisibility check happen at
IvfIndex::new_unconfigured time,
when both dim and metric are known.
pub fn with_pq_subvectors(self, pq_subvectors: Option<usize>) -> Self
Override pq_subvectors.
Required to be Some(m) with m >= 1 and m | dim whenever
use_pq is true; otherwise ignored.
pub fn with_pq_refine_factor(self, pq_refine_factor: u32) -> Self
Override pq_refine_factor.
0 disables refine; N >= 1 shortlists N × k candidates by
ADC and exact-reranks. Ignored when use_pq is
false.
pub fn validate(&self) -> Result<()>
Validate the configuration.
Called by IvfIndex::new before the
index is built.
The error variant is always IqdbError::InvalidConfig with a
short &'static str reason naming exactly which check failed,
so a caller can branch on the message or thread it into a log.
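The checks the field docs describe can be sketched as below. `ConfigSketch` is a hypothetical mirror of the documented fields so the example compiles without the crate, the reason strings are invented for illustration, and a bare `&'static str` error stands in for `IqdbError::InvalidConfig`. Which checks the real `validate` performs, and in what order, is an assumption.

```rust
/// Hypothetical mirror of the documented fields, so the sketch compiles
/// without the crate. The reason strings are invented for illustration.
struct ConfigSketch {
    n_clusters: usize,
    n_probes: usize,
    training_sample_size: usize,
    use_pq: bool,
    pq_subvectors: Option<usize>,
}

fn validate_sketch(cfg: &ConfigSketch) -> Result<(), &'static str> {
    if cfg.n_clusters == 0 {
        return Err("n_clusters must be at least 1");
    }
    if cfg.n_probes == 0 {
        return Err("n_probes must be at least 1");
    }
    if cfg.n_probes > cfg.n_clusters {
        return Err("n_probes must not exceed n_clusters");
    }
    if cfg.training_sample_size == 0 {
        return Err("training_sample_size must be at least 1");
    }
    // The m | dim check needs the vector dimension, which the config alone
    // does not carry, so only the dim-independent part is checked here.
    if cfg.use_pq && !matches!(cfg.pq_subvectors, Some(m) if m >= 1) {
        return Err("pq_subvectors must be Some(m) with m >= 1 when use_pq is true");
    }
    Ok(())
}
```

Returning a fixed reason string per check lets a caller compare against it directly or pass it straight to a logger.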