Struct IvfConfig 

pub struct IvfConfig {
    pub n_clusters: usize,
    pub n_probes: usize,
    pub training_sample_size: usize,
    pub use_pq: bool,
    pub pq_subvectors: Option<usize>,
    pub pq_refine_factor: u32,
    pub seed: u64,
}

Configuration for crate::IvfIndex construction (see iqdb_index::Index::new).

All fields have documented defaults; see the field-level docs and the crate README.md for the tradeoffs each one controls.

§Examples

use iqdb_ivf::IvfConfig;

let cfg = IvfConfig::default();
assert_eq!(cfg.n_clusters, 256);
assert_eq!(cfg.n_probes, 8);

let tuned = IvfConfig::default()
    .with_n_clusters(64)
    .with_n_probes(4)
    .with_seed(42);
assert_eq!(tuned.n_clusters, 64);
assert_eq!(tuned.n_probes, 4);
assert_eq!(tuned.seed, 42);

Fields§

§n_clusters: usize

Number of k-means partitions (inverted lists) the trainer produces.

Spec heuristic: sqrt(N) for moderate corpora, 4 * sqrt(N) for very large ones. Must be at least 1. Default 256.
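The heuristic above can be sketched as a small helper. This is only an illustration: the function name `suggest_n_clusters` and the `very_large_threshold` parameter are hypothetical, since the documentation does not say where "moderate" ends and "very large" begins.

```rust
/// Hypothetical helper (not part of the crate): suggest `n_clusters`
/// from the corpus size `n` using the sqrt(N) / 4 * sqrt(N) heuristic.
///
/// `very_large_threshold` is an assumed cut-over point chosen by the caller.
fn suggest_n_clusters(n: usize, very_large_threshold: usize) -> usize {
    let root = (n as f64).sqrt();
    let scaled = if n >= very_large_threshold {
        4.0 * root
    } else {
        root
    };
    // The field must be at least 1, so clamp tiny corpora upward.
    (scaled.round() as usize).max(1)
}
```

With a threshold of one million vectors, a corpus of 65_536 vectors maps to 256 clusters, which matches the documented default.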

§n_probes: usize

Number of clusters searched at query time.

Larger values raise recall at higher per-query cost. Must be at least 1 and no greater than n_clusters. Default 8.
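To make the recall/cost tradeoff concrete, here is a minimal sketch of probe selection under the assumption of squared Euclidean distance to each centroid. The function `nearest_centroids` is hypothetical and is not the crate's search path; it only shows that a query scans the `n_probes` closest inverted lists.

```rust
/// Hypothetical sketch: return the indices of the `n_probes` centroids
/// closest to `query` (squared Euclidean distance), nearest first.
fn nearest_centroids(query: &[f32], centroids: &[Vec<f32>], n_probes: usize) -> Vec<usize> {
    let mut ranked: Vec<(usize, f32)> = centroids
        .iter()
        .enumerate()
        .map(|(i, c)| {
            let d = query
                .iter()
                .zip(c)
                .map(|(a, b)| (a - b) * (a - b))
                .sum::<f32>();
            (i, d)
        })
        .collect();
    // total_cmp gives a total order on f32, so the sort cannot panic on NaN.
    ranked.sort_by(|a, b| a.1.total_cmp(&b.1));
    ranked.into_iter().take(n_probes).map(|(i, _)| i).collect()
}
```

Every extra probe adds one more inverted list to scan, which is why per-query cost grows with n_probes.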

§training_sample_size: usize

Cap on the training sample passed to k-means.

When the caller supplies more vectors than this, the trainer subsamples down to this many via the seeded PRNG. Must be at least 1. Default 65_536.
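The seeded subsampling can be illustrated with a small SplitMix64 generator and a partial Fisher-Yates shuffle. The SplitMix64 constants are the standard published ones; the shuffle strategy and the names `SplitMix64` and `subsample_indices` are assumptions for illustration, as the documentation does not describe how the trainer picks rows.

```rust
/// Minimal SplitMix64 generator using the standard constants.
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical sketch: choose at most `cap` distinct row indices from `0..n`.
/// Returns every index unchanged when the input already fits under the cap.
fn subsample_indices(n: usize, cap: usize, seed: u64) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..n).collect();
    if n <= cap {
        return idx;
    }
    let mut rng = SplitMix64::new(seed);
    for i in 0..cap {
        // Modulo reduction has a slight bias; acceptable for a sketch.
        let j = i + (rng.next_u64() % (n - i) as u64) as usize;
        idx.swap(i, j);
    }
    idx.truncate(cap);
    idx
}
```

Because the generator is driven only by the seed, calling `subsample_indices` twice with the same arguments yields the same selection.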

§use_pq: bool

Enable Product Quantization within each inverted list.

When true, Self::pq_subvectors must be Some(m) with m >= 1 and m | dim (m divides dim) at index-construction time. The IVF-PQ branch trains an iqdb_quantize::ProductQuantizer over the same working set used for the coarse k-means (plain-PQ), stores a per-entry iqdb_quantize::PqCode alongside the retained Arc<[f32]> vector, and scores intra-cluster candidates via ADC (asymmetric distance computation). Supported metrics: Euclidean, DotProduct, Manhattan. Cosine and Hamming are rejected at construction with IqdbError::InvalidMetric. Defaults to false (IVF-Flat).
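ADC scoring can be sketched as follows. This is a generic illustration of the technique for squared Euclidean distance, not the iqdb_quantize implementation: the functions `adc_table` and `adc_distance` and the nested `Vec` codebook layout are assumptions made for readability.

```rust
/// Hypothetical sketch: build the ADC lookup table for one query.
/// `codebooks[j][c]` is codeword `c` of subvector `j`; the result
/// `table[j][c]` is the squared distance from the query's j-th
/// subvector to that codeword.
fn adc_table(query: &[f32], codebooks: &[Vec<Vec<f32>>]) -> Vec<Vec<f32>> {
    let m = codebooks.len();
    let sub_dim = query.len() / m; // assumes m divides the dimension
    codebooks
        .iter()
        .enumerate()
        .map(|(j, book)| {
            let q = &query[j * sub_dim..(j + 1) * sub_dim];
            book.iter()
                .map(|c| {
                    q.iter()
                        .zip(c)
                        .map(|(a, b)| (a - b) * (a - b))
                        .sum::<f32>()
                })
                .collect::<Vec<f32>>()
        })
        .collect()
}

/// Approximate distance to one stored code: one table lookup per subvector.
fn adc_distance(table: &[Vec<f32>], code: &[u8]) -> f32 {
    table
        .iter()
        .zip(code)
        .map(|(row, &c)| row[c as usize])
        .sum()
}
```

The table is built once per query, so scoring each candidate costs m lookups and additions instead of a full dim-wide distance computation.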

§pq_subvectors: Option<usize>

Subvector count M for IVF-PQ.

Required to be Some(m) with m >= 1 and m | dim whenever use_pq is true. Ignored when use_pq is false. Each subvector compresses to one byte (K = 256), so smaller m compresses harder at the cost of more reconstruction error per code.
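The divisibility rule and the compression arithmetic can be checked with a short sketch. The helper names `pq_code_bytes` and `compression_ratio` are hypothetical; the ratio compares one PQ code against the raw f32 vector and says nothing about the index's total memory, which also retains the original vectors.

```rust
/// Hypothetical helper: size in bytes of one PQ code, or `None` when
/// `m` is zero or does not divide `dim`. One byte per subvector (K = 256).
fn pq_code_bytes(dim: usize, m: usize) -> Option<usize> {
    if m == 0 || dim % m != 0 {
        return None;
    }
    Some(m)
}

/// Ratio of the raw f32 vector size to the PQ code size.
fn compression_ratio(dim: usize, m: usize) -> Option<f64> {
    pq_code_bytes(dim, m)
        .map(|bytes| (dim * std::mem::size_of::<f32>()) as f64 / bytes as f64)
}
```

For dim = 128, m = 16 gives a 16-byte code (32x smaller than the 512-byte vector), while m = 8 gives an 8-byte code (64x smaller) with coarser subvectors.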

§pq_refine_factor: u32

IVF-PQ refine factor.

0 disables refine: the search returns the pure ADC top-k. N >= 1 enables refine: the search shortlists N × k candidates by ADC, then exact-reranks the shortlist using the retained Arc<[f32]> vectors (same distance path as IVF-Flat, same DotProduct sign convention) before returning top-k. Default 4. Ignored when use_pq is false. Tunable at runtime via crate::IvfIndex::set_pq_refine_factor.
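The shortlist-then-rerank flow can be sketched like this, assuming squared Euclidean distance and candidates that already carry an ADC score. The function `top_k_with_refine` and its argument layout are hypothetical and only mirror the behaviour described above.

```rust
fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Hypothetical sketch of refine. `adc_scored` holds `(id, adc_distance)`
/// pairs and `vectors[id]` is the retained full-precision vector.
fn top_k_with_refine(
    query: &[f32],
    adc_scored: &[(usize, f32)],
    vectors: &[Vec<f32>],
    k: usize,
    refine_factor: u32,
) -> Vec<usize> {
    let mut by_adc = adc_scored.to_vec();
    by_adc.sort_by(|a, b| a.1.total_cmp(&b.1));

    // refine_factor == 0: return the pure ADC top-k.
    if refine_factor == 0 {
        return by_adc.into_iter().take(k).map(|(id, _)| id).collect();
    }

    // Otherwise shortlist refine_factor * k candidates, then rerank exactly.
    let shortlist_len = (refine_factor as usize).saturating_mul(k);
    let mut exact: Vec<(usize, f32)> = by_adc
        .into_iter()
        .take(shortlist_len)
        .map(|(id, _)| (id, squared_l2(query, &vectors[id])))
        .collect();
    exact.sort_by(|a, b| a.1.total_cmp(&b.1));
    exact.into_iter().take(k).map(|(id, _)| id).collect()
}
```

A larger factor gives the exact rerank more chances to recover a true neighbour that ADC ranked too low, at the cost of more full-precision distance computations.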

§seed: u64

Seed for the internal SplitMix64 PRNG used by k-means++ initialization and by deterministic subsampling of the training set.

Identical seed + identical training sample → byte-identical centroids on every platform. When use_pq is true, the same seed flows into the PQ codebook trainer so the per-subvector codebooks are also reproducible.

Implementations§

impl IvfConfig

pub fn with_n_clusters(self, n_clusters: usize) -> Self

Override n_clusters.

pub fn with_n_probes(self, n_probes: usize) -> Self

Override n_probes.

pub fn with_training_sample_size(self, training_sample_size: usize) -> Self

Override training_sample_size.

pub fn with_use_pq(self, use_pq: bool) -> Self

Override use_pq.

When true, Self::pq_subvectors must also be set; the metric/dim divisibility checks happen at IvfIndex::new_unconfigured time when both dim and metric are known.

pub fn with_pq_subvectors(self, pq_subvectors: Option<usize>) -> Self

Override pq_subvectors.

Required to be Some(m) with m >= 1 and m | dim whenever use_pq is true; otherwise ignored.

pub fn with_pq_refine_factor(self, pq_refine_factor: u32) -> Self

Override pq_refine_factor.

0 disables refine; N >= 1 shortlists N × k candidates by ADC and exact-reranks. Ignored when use_pq is false.

pub fn with_seed(self, seed: u64) -> Self

Override the PRNG seed.

pub fn validate(&self) -> Result<()>

Validate the configuration.

Called by IvfIndex::new before the index is built. The error variant is always IqdbError::InvalidConfig with a short &'static str reason naming exactly which check failed, so a caller can branch on the message or thread it into a log.
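The documented constraints can be gathered into a validation sketch. Everything here is an assumption for illustration: `ConfigSketch` is a local stand-in for the real struct, the error is a bare `&'static str` rather than IqdbError::InvalidConfig, and the reason strings are invented. The dim-divisibility and metric checks are omitted because the documentation places them at index construction.

```rust
/// Local stand-in mirroring the fields that the documented checks mention.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ConfigSketch {
    n_clusters: usize,
    n_probes: usize,
    training_sample_size: usize,
    use_pq: bool,
    pq_subvectors: Option<usize>,
}

/// Hypothetical validation: each failure returns a short static reason.
fn validate(cfg: &ConfigSketch) -> Result<(), &'static str> {
    if cfg.n_clusters == 0 {
        return Err("n_clusters must be >= 1");
    }
    if cfg.n_probes == 0 {
        return Err("n_probes must be >= 1");
    }
    if cfg.n_probes > cfg.n_clusters {
        return Err("n_probes must be <= n_clusters");
    }
    if cfg.training_sample_size == 0 {
        return Err("training_sample_size must be >= 1");
    }
    if cfg.use_pq {
        match cfg.pq_subvectors {
            Some(m) if m >= 1 => {}
            _ => return Err("pq_subvectors must be Some(m) with m >= 1 when use_pq is true"),
        }
    }
    Ok(())
}
```

Returning a static reason per check lets a caller match on the message or log it directly, which is the usage the documentation describes.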

Trait Implementations§

impl Clone for IvfConfig

fn clone(&self) -> IvfConfig

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Copy for IvfConfig

impl Debug for IvfConfig

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for IvfConfig

fn default() -> Self

Returns the “default value” for a type.

impl Eq for IvfConfig

impl PartialEq for IvfConfig

fn eq(&self, other: &IvfConfig) -> bool

Tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl StructuralPartialEq for IvfConfig
