Struct VectorCollection 

pub struct VectorCollection {
    pub surrogate_map: HashMap<u32, Surrogate>,
    pub surrogate_to_local: HashMap<Surrogate, u32>,
    pub multi_doc_map: HashMap<Surrogate, Vec<u32>>,
    pub codec_dispatch: Option<CollectionCodec>,
    pub payload: PayloadIndexSet,
    pub arena_index: Option<u32>,
    /* private fields */
}

Manages all vector segments for a single collection (one index key).

This type is !Send — owned by a single Data Plane core.

Fields

surrogate_map: HashMap<u32, Surrogate>

Mapping from internal global vector ID → surrogate.

surrogate_to_local: HashMap<Surrogate, u32>

Reverse map: surrogate → global vector ID. Used by point delete.

multi_doc_map: HashMap<Surrogate, Vec<u32>>

Reverse mapping for multi-vector documents: document_surrogate → list of global vector IDs.
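The following is a minimal model of how these three maps relate, built on std HashMaps. Surrogate is a hypothetical Copy newtype that stands in for the crate's type. SurrogateMaps, bind_single, bind_multi and unbind_multi are illustrative names and are not part of this API. In the model, a single-vector binding updates both surrogate_map and surrogate_to_local. A multi-vector document is tracked through surrogate_map and multi_doc_map.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the crate's Surrogate type.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Surrogate(pub u64);

/// Minimal model of the surrogate bookkeeping fields (not the real struct).
#[derive(Default)]
pub struct SurrogateMaps {
    pub surrogate_map: HashMap<u32, Surrogate>,      // global ID -> surrogate
    pub surrogate_to_local: HashMap<Surrogate, u32>, // surrogate -> global ID
    pub multi_doc_map: HashMap<Surrogate, Vec<u32>>, // document -> global IDs
    next_id: u32,
}

impl SurrogateMaps {
    /// Bind one new vector to `surrogate`, updating both directions.
    pub fn bind_single(&mut self, surrogate: Surrogate) -> u32 {
        let id = self.next_id;
        self.next_id += 1;
        self.surrogate_map.insert(id, surrogate);
        self.surrogate_to_local.insert(surrogate, id);
        id
    }

    /// Bind `n` new vectors to one document surrogate (ColBERT-style).
    pub fn bind_multi(&mut self, doc: Surrogate, n: usize) -> Vec<u32> {
        let ids: Vec<u32> = (0..n)
            .map(|_| {
                let id = self.next_id;
                self.next_id += 1;
                self.surrogate_map.insert(id, doc);
                id
            })
            .collect();
        self.multi_doc_map.entry(doc).or_default().extend(&ids);
        ids
    }

    /// Drop every vector bound to `doc`; returns how many were removed.
    pub fn unbind_multi(&mut self, doc: Surrogate) -> usize {
        let ids = self.multi_doc_map.remove(&doc).unwrap_or_default();
        for id in &ids {
            self.surrogate_map.remove(id);
        }
        ids.len()
    }
}
```

All writes to the maps go through one set of methods. This keeps point deletes (which go through surrogate_to_local) and document deletes (which go through multi_doc_map) consistent with lookups through surrogate_map.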

codec_dispatch: Option<CollectionCodec>

Optional collection-level codec-dispatch index (RaBitQ or BBQ). It is present only when the collection's quantization is RaBitQ or BBQ. Other modes, including Sq8 and PQ, keep using the per-segment paths. The index coexists with sealed segments. For codec-dispatched collections, the per-segment Sq8 builder is skipped and this index is used instead.

payload: PayloadIndexSet

In-memory payload bitmap indexes for vector-primary collections.

Empty (no indexes) by default; populated at construction time from VectorPrimaryConfig::payload_indexes.

arena_index: Option<u32>

Optional dedicated memory arena index for this collection.

Set by the Data Plane after requesting a per-collection arena from nodedb_mem::CollectionArenaRegistry. Used only for stats reporting; the actual arena pinning is handled externally.

Implementations

impl VectorCollection

pub fn set_data_dir(&mut self, dir: PathBuf)

Set the data directory for mmap segment files.

pub fn set_ram_budget(&mut self, bytes: usize)

Set the RAM budget for vector data (FP32 in sealed segments).

pub fn ram_usage_bytes(&self) -> usize

Estimate current RAM usage for vector data.

pub fn is_budget_exceeded(&self) -> bool

Whether the RAM budget is exceeded.

pub fn mmap_fallback_count(&self) -> u32

Number of segments that fell back to mmap.

pub fn mmap_segment_count(&self) -> u32

Number of currently active mmap segments.
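This page does not describe the exact budget policy, so the following is only a sketch built on two assumptions. First, each sealed segment's FP32 data costs vectors × dim × 4 bytes. Second, a newly sealed segment that would push RAM usage over the budget is placed on mmap instead. RamBudget, Residency and add_sealed are hypothetical names.

```rust
/// Where a sealed segment's FP32 vectors live in this model.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Residency {
    Ram,
    Mmap,
}

/// Hypothetical budget tracker; not the crate's implementation.
pub struct RamBudget {
    pub dim: usize,
    pub budget_bytes: usize,
    pub segments: Vec<(usize, Residency)>, // (vector count, residency)
    pub mmap_fallbacks: u32,               // cumulative fallback count
}

impl RamBudget {
    pub fn new(dim: usize, budget_bytes: usize) -> Self {
        Self { dim, budget_bytes, segments: Vec::new(), mmap_fallbacks: 0 }
    }

    /// FP32 bytes held in RAM: vectors * dim * size_of::<f32>().
    pub fn ram_usage_bytes(&self) -> usize {
        self.segments
            .iter()
            .filter(|(_, r)| *r == Residency::Ram)
            .map(|(n, _)| n * self.dim * std::mem::size_of::<f32>())
            .sum()
    }

    pub fn is_budget_exceeded(&self) -> bool {
        self.ram_usage_bytes() > self.budget_bytes
    }

    /// Seal a segment of `n` vectors, falling back to mmap if it would not fit.
    pub fn add_sealed(&mut self, n: usize) -> Residency {
        let needed = n * self.dim * std::mem::size_of::<f32>();
        let residency = if self.ram_usage_bytes() + needed <= self.budget_bytes {
            Residency::Ram
        } else {
            self.mmap_fallbacks += 1;
            Residency::Mmap
        };
        self.segments.push((n, residency));
        residency
    }

    /// Segments currently served from mmap.
    pub fn mmap_segment_count(&self) -> u32 {
        self.segments.iter().filter(|(_, r)| *r == Residency::Mmap).count() as u32
    }
}
```

The two counters are kept separate here because the documented accessors also distinguish between them. One counts segments that fell back to mmap; the other counts mmap segments that are currently active.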

impl VectorCollection

pub fn checkpoint_to_bytes(&self, kek: Option<&WalEncryptionKey>) -> Vec<u8>

Serialize all segments for checkpointing.

When kek is Some, the MessagePack payload is wrapped in an AES-256-GCM encrypted envelope with a SEGV preamble. When None, raw MessagePack bytes are returned (existing plaintext format).

Returns an empty Vec on serialization failure (callers treat this as a skip signal, consistent with the pre-existing error handling).

pub fn from_checkpoint(
    bytes: &[u8],
    kek: Option<&WalEncryptionKey>,
) -> Result<Option<Self>, VectorError>

Restore a collection from checkpoint bytes.

kek controls the expected framing:

  • None → the file must be plaintext MessagePack (starting with bytes that are NOT SEGV). If the file starts with SEGV and no key is provided, returns Err(CheckpointEncryptedNoKey).
  • Some(key) → encryption is required. If the file starts with SEGV, it is decrypted with key. If the file is plaintext, returns Err(CheckpointPlaintextKeyRequired) — refuse to silently load unencrypted data when the operator has enabled at-rest encryption.
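The framing rules reduce to a four-way decision on two facts: whether the bytes start with the SEGV preamble, and whether a key was supplied. The sketch below encodes that table. It assumes the preamble is the four ASCII bytes b"SEGV". Framing and classify are hypothetical names. The sketch performs no decryption and no MessagePack decoding.

```rust
/// Assumed encoding of the SEGV preamble that marks an encrypted checkpoint.
const SEGV_MAGIC: &[u8] = b"SEGV";

/// Hypothetical outcome of the framing check; the real API returns VectorError variants.
#[derive(Debug, PartialEq)]
pub enum Framing {
    Plaintext,               // kek = None, no SEGV preamble
    DecryptWithKey,          // kek = Some, SEGV preamble present
    ErrEncryptedNoKey,       // maps to CheckpointEncryptedNoKey
    ErrPlaintextKeyRequired, // maps to CheckpointPlaintextKeyRequired
}

/// Decide how to treat checkpoint bytes given whether a key is configured.
pub fn classify(bytes: &[u8], has_key: bool) -> Framing {
    let encrypted = bytes.starts_with(SEGV_MAGIC);
    match (encrypted, has_key) {
        (false, false) => Framing::Plaintext,
        (true, true) => Framing::DecryptWithKey,
        (true, false) => Framing::ErrEncryptedNoKey,
        (false, true) => Framing::ErrPlaintextKeyRequired,
    }
}
```

Refusing plaintext when a key is configured is what keeps an operator's at-rest encryption setting from being silently bypassed by an old or tampered file.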

impl VectorCollection

pub fn build_codec_dispatch(
    &mut self,
    quantization: &str,
) -> Option<&CollectionCodec>

Build a codec-dispatched index over all current vectors using the requested quantization. Replaces any existing dispatch index for this collection. Idempotent.

Returns a reference to the new index, or None if the quantization tag is not supported (falls back to per-segment Sq8/PQ paths) or there are no vectors to train on.
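The contract above can be read as a small planning step that runs before any training. In this sketch, the tag spellings "rabitq" and "bbq" and the case-insensitive match are assumptions. DispatchCodec and plan_codec_dispatch are hypothetical. The real method also trains and stores the index.

```rust
/// Hypothetical collection-level codecs; the crate's CollectionCodec differs.
#[derive(Debug, PartialEq)]
pub enum DispatchCodec {
    RaBitQ,
    Bbq,
}

/// Mirror of the documented contract: an unsupported tag or no training data yields None.
pub fn plan_codec_dispatch(quantization: &str, vector_count: usize) -> Option<DispatchCodec> {
    if vector_count == 0 {
        return None; // nothing to train on
    }
    match quantization.to_ascii_lowercase().as_str() {
        "rabitq" => Some(DispatchCodec::RaBitQ),
        "bbq" => Some(DispatchCodec::Bbq),
        _ => None, // e.g. "sq8" or "pq": stay on the per-segment paths
    }
}
```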

impl VectorCollection

pub fn new(dim: usize, params: HnswParams) -> Self

Create an empty collection with the default seal threshold.

pub fn with_seal_threshold(
    dim: usize,
    params: HnswParams,
    seal_threshold: usize,
) -> Self

Create an empty collection with an explicit seal threshold.

pub fn with_index_config(dim: usize, config: IndexConfig) -> Self

Create an empty collection with a full index configuration.

pub fn with_seal_threshold_and_config(
    dim: usize,
    config: IndexConfig,
    seal_threshold: usize,
) -> Self

Create an empty collection with a full index config and custom seal threshold.

pub fn with_seed(dim: usize, params: HnswParams, _seed: u64) -> Self

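Create an empty collection with a specific seed, intended for deterministic testing. The parameter is named _seed, and in Rust a leading underscore conventionally marks an unused binding, so the seed may not currently affect construction.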

pub fn needs_seal(&self) -> bool

Check if the growing segment should be sealed.

pub fn seal(&mut self, key: &str) -> Option<BuildRequest>

Seal the growing segment and return a build request.

pub fn complete_build(&mut self, segment_id: u32, index: HnswIndex)

Accept a completed HNSW build from the background thread.

After promoting the segment to sealed, rebuilds the collection-level codec-dispatch index when self.quantization is RaBitQ or Bbq. The rebuild trains over all vectors so the codec index always covers every sealed segment.
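The seal lifecycle (growing, then building, then sealed) can be modeled with plain vectors. This is an illustrative sketch with several assumptions:

- Lifecycle is hypothetical.
- BuildRequest here only mirrors the name of the real return type.
- The >= comparison in needs_seal is an assumption.
- Returning None when the growing segment is empty is an assumption.
- The codec-dispatch rebuild that complete_build performs for RaBitQ or Bbq is not modeled.

```rust
/// Hypothetical build request handed to a background builder.
#[derive(Debug)]
pub struct BuildRequest {
    pub key: String,
    pub segment_id: u32,
    pub vectors: Vec<Vec<f32>>,
}

#[derive(Default)]
pub struct Lifecycle {
    pub seal_threshold: usize,
    pub growing: Vec<Vec<f32>>,
    pub building: Vec<u32>, // segment IDs awaiting a finished index
    pub sealed: Vec<u32>,   // segment IDs promoted to sealed
    next_segment: u32,
}

impl Lifecycle {
    pub fn needs_seal(&self) -> bool {
        self.growing.len() >= self.seal_threshold
    }

    /// Move the growing vectors into a build request; None if nothing to seal.
    pub fn seal(&mut self, key: &str) -> Option<BuildRequest> {
        if self.growing.is_empty() {
            return None;
        }
        let segment_id = self.next_segment;
        self.next_segment += 1;
        self.building.push(segment_id);
        Some(BuildRequest {
            key: key.to_string(),
            segment_id,
            vectors: std::mem::take(&mut self.growing),
        })
    }

    /// Promote a finished build from "building" to "sealed".
    pub fn complete_build(&mut self, segment_id: u32) {
        if let Some(pos) = self.building.iter().position(|&id| id == segment_id) {
            self.building.remove(pos);
            self.sealed.push(segment_id);
        }
    }
}
```

Taking the growing buffer with std::mem::take hands ownership of the vectors to the build request. This lets new inserts start filling a fresh growing segment right away.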

pub fn sealed_segments(&self) -> &[SealedSegment]

Access sealed segments (read-only).

pub fn sealed_segments_mut(&mut self) -> &mut Vec<SealedSegment>

Access sealed segments mutably.

pub fn growing_is_empty(&self) -> bool

Whether the growing segment has no vectors.

pub fn len(&self) -> usize

pub fn live_count(&self) -> usize

pub fn is_empty(&self) -> bool

pub fn dim(&self) -> usize

pub fn params(&self) -> &HnswParams

pub fn set_params(&mut self, params: HnswParams)

Update HNSW parameters for future builds.

pub fn set_quantization(&mut self, q: VectorQuantization)

Set the collection-level quantization.

pub fn quantization(&self) -> VectorQuantization

Return the configured quantization mode.

pub fn configure_payload_indexes(&mut self, fields: &[String])

Configure payload bitmap indexes from a list of field names.

impl VectorCollection

pub fn compact(&mut self) -> usize

Compact sealed segments by removing tombstoned nodes.

Rewrites surrogate_map and multi_doc_map for every sealed segment so that global ids continue to resolve to the correct surrogate after local-id renumbering.
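The renumbering that compact() has to account for can be sketched for a single segment. The sketch assumes a global ID is the segment's base ID plus its local ID. Under that assumption, compaction changes global IDs and the surrogate map has to be rewritten. Segment and compact_segment are hypothetical, and surrogates are plain u64 values here.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical sealed segment: global ID = base_id + local ID (an assumption).
pub struct Segment {
    pub base_id: u32,
    pub vectors: Vec<Vec<f32>>,
    pub tombstones: HashSet<u32>, // local IDs
}

/// Drop tombstoned nodes, renumber survivors densely, and rewrite the
/// global-ID -> surrogate map so each survivor keeps its surrogate.
/// Returns the number of removed nodes.
pub fn compact_segment(seg: &mut Segment, surrogate_map: &mut HashMap<u32, u64>) -> usize {
    let old = std::mem::take(&mut seg.vectors);
    let removed = seg.tombstones.len();
    let mut new_local = 0u32;
    for (old_local, v) in old.into_iter().enumerate() {
        let old_local = old_local as u32;
        // Detach the mapping under the old global ID first.
        let surrogate = surrogate_map.remove(&(seg.base_id + old_local));
        if seg.tombstones.contains(&old_local) {
            continue; // dropped node: its mapping stays removed
        }
        if let Some(s) = surrogate {
            surrogate_map.insert(seg.base_id + new_local, s);
        }
        seg.vectors.push(v);
        new_local += 1;
    }
    seg.tombstones.clear();
    removed
}
```

The loop processes survivors in ascending order, which makes the in-place map rewrite safe. A survivor's new global ID is never larger than its old one, so it can only land on an entry that has already been processed.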

pub fn export_snapshot(&self) -> Vec<(u32, Vec<f32>, Option<Surrogate>)>

Export all live vectors for snapshot.

impl VectorCollection

pub fn insert(&mut self, vector: Vec<f32>) -> u32

Insert a vector. Returns the global vector ID.

pub fn insert_with_surrogate(
    &mut self,
    vector: Vec<f32>,
    surrogate: Surrogate,
) -> u32

Insert a vector with an associated surrogate. The surrogate is allocated by the Control Plane before the call; the engine only stores the binding.

pub fn insert_multi_vector(
    &mut self,
    vectors: &[&[f32]],
    document_surrogate: Surrogate,
) -> Vec<u32>

Insert multiple vectors for a single document (ColBERT-style). Every vector in vectors is bound to the same document_surrogate, and the returned Vec holds the global vector IDs assigned to them.

pub fn delete_multi_vector(&mut self, document_surrogate: Surrogate) -> usize

Delete all vectors belonging to a multi-vector document.

pub fn get_surrogate(&self, vector_id: u32) -> Option<Surrogate>

Look up the surrogate for a global vector ID.

pub fn local_for_surrogate(&self, surrogate: Surrogate) -> Option<u32>

Resolve a surrogate back to its global vector ID, if bound.

pub fn delete(&mut self, id: u32) -> bool

Soft-delete a vector by global ID.

pub fn delete_by_surrogate(&mut self, surrogate: Surrogate) -> bool

Soft-delete a vector by surrogate.

pub fn undelete(&mut self, id: u32) -> bool

Un-delete a previously soft-deleted vector (for transaction rollback).
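Soft deletion is commonly implemented as a tombstone set, which also makes undelete a cheap rollback. The sketch below assumes the documented bool returns mean "this call changed state". Tombstones and its methods are hypothetical. Its delete_by_surrogate first resolves the surrogate through a map shaped like surrogate_to_local, then tombstones the resulting global ID.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical tombstone set keyed by global vector ID.
#[derive(Default)]
pub struct Tombstones {
    pub len: u32, // IDs 0..len have been assigned
    deleted: HashSet<u32>,
}

impl Tombstones {
    /// Soft-delete: true only if `id` exists and was live.
    pub fn delete(&mut self, id: u32) -> bool {
        id < self.len && self.deleted.insert(id)
    }

    /// Resolve surrogate -> global ID first, then soft-delete.
    pub fn delete_by_surrogate(&mut self, surrogate_to_local: &HashMap<u64, u32>, surrogate: u64) -> bool {
        match surrogate_to_local.get(&surrogate) {
            Some(&id) => self.delete(id),
            None => false,
        }
    }

    /// Roll back a soft delete: true only if `id` was tombstoned.
    pub fn undelete(&mut self, id: u32) -> bool {
        self.deleted.remove(&id)
    }

    pub fn is_live(&self, id: u32) -> bool {
        id < self.len && !self.deleted.contains(&id)
    }

    pub fn live_count(&self) -> usize {
        self.len as usize - self.deleted.len()
    }
}
```

Because the vector data itself is never touched, undelete only has to clear the tombstone. Reclaiming the space is left to a later compaction.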

impl VectorCollection

pub fn hnsw_params(&self) -> HnswParams

Return the HNSW construction parameters for this collection.

pub fn growing_flat(&self) -> &FlatIndex

Immutable access to the growing flat index.

The growing index holds vectors that have not yet been sealed into an HNSW segment. Its contents should be included in any full-collection rebuild that wants to produce a complete result.

pub fn replace_sealed(&mut self, new_segments: Vec<SealedSegment>)

Replace all sealed segments with new_segments.

Used by the concurrent REINDEX cutover. After the background thread finishes rebuilding the HNSW graph, the Data Plane swaps in the single rebuilt segment. The growing segment is preserved unchanged.

Caller is responsible for ensuring that new_segments covers all vectors that were in the old sealed set; any vector not present in the new segments will return no result on subsequent searches until the growing segment is sealed.
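Because coverage is the caller's responsibility, a cutover path could check it before swapping. The guard below is hypothetical and not part of the documented API. It reduces each segment to its list of global IDs and returns the live IDs from the old sealed set that the rebuilt segments would drop.

```rust
use std::collections::HashSet;

/// Hypothetical pre-cutover check. Returns the live global IDs from the old
/// sealed set that are absent from `new_segments` (empty means safe to swap).
pub fn missing_after_cutover(old_live: &[u32], new_segments: &[Vec<u32>]) -> Vec<u32> {
    let covered: HashSet<u32> = new_segments.iter().flatten().copied().collect();
    let mut missing: Vec<u32> = old_live
        .iter()
        .copied()
        .filter(|id| !covered.contains(id))
        .collect();
    missing.sort_unstable();
    missing
}
```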

pub fn compact_tombstones(&mut self) -> usize

Compact tombstoned nodes from all sealed segments.

This is the same operation as compact(). The alias exists so that call sites that mean to "remove tombstones" read clearly.

impl VectorCollection

pub fn with_pq_config(dim: usize, hnsw: HnswParams, pq_m: usize) -> Self

Convenience constructor for PQ-configured collections.

Equivalent to building a full IndexConfig with index_type = HnswPq and the given pq_m.

pub fn with_seal_threshold_and_pq_config(
    dim: usize,
    hnsw: HnswParams,
    pq_m: usize,
    seal_threshold: usize,
) -> Self

Convenience constructor for PQ-configured collections with a custom seal threshold.

pub fn build_sq8_for_index(index: &HnswIndex) -> Option<(Sq8Codec, Vec<u8>)>

Build SQ8 quantized data for an HNSW index.

Returns None when there are too few live vectors for stable min/max calibration.
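A common SQ8 scheme maps each dimension linearly from its observed [min, max] range onto 0..=255, which is why calibration needs enough live vectors. The sketch makes several assumptions:

- It uses per-dimension min/max.
- All vectors are assumed to share the first vector's dimension.
- MIN_CALIBRATION_VECTORS is a made-up value of 2.

The crate's actual threshold, byte layout and Sq8Codec type are not shown on this page.

```rust
/// Hypothetical minimum sample count; the real threshold is not documented here.
pub const MIN_CALIBRATION_VECTORS: usize = 2;

/// Per-dimension min/max scalar quantizer mapping each f32 to a u8.
pub struct Sq8 {
    pub min: Vec<f32>,
    pub max: Vec<f32>,
}

impl Sq8 {
    /// Calibrate from live vectors; None when there are too few to trust min/max.
    pub fn calibrate(live: &[Vec<f32>]) -> Option<Self> {
        if live.len() < MIN_CALIBRATION_VECTORS {
            return None;
        }
        let dim = live[0].len();
        let mut min = vec![f32::INFINITY; dim];
        let mut max = vec![f32::NEG_INFINITY; dim];
        for v in live {
            for d in 0..dim {
                min[d] = min[d].min(v[d]);
                max[d] = max[d].max(v[d]);
            }
        }
        Some(Self { min, max })
    }

    /// Quantize one vector: min maps to 0, max to 255, linear in between.
    pub fn encode(&self, v: &[f32]) -> Vec<u8> {
        v.iter()
            .enumerate()
            .map(|(d, &x)| {
                let range = self.max[d] - self.min[d];
                if range <= 0.0 {
                    return 0; // constant dimension carries no information
                }
                let t = ((x - self.min[d]) / range).clamp(0.0, 1.0);
                (t * 255.0).round() as u8
            })
            .collect()
    }
}
```

With very few samples, a per-dimension range can collapse to a single value or be dominated by one outlier. This is the instability the documented None return guards against.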

pub fn build_pq_for_index(
    index: &HnswIndex,
    pq_m: usize,
) -> Option<(PqCodec, Vec<u8>)>

Train a PQ codec from a built HNSW index’s live vectors.

impl VectorCollection

pub fn search(
    &self,
    query: &[f32],
    top_k: usize,
    ef: usize,
) -> Vec<SearchResult>

Search across all segments, merging results by distance.
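Merging by distance amounts to combining every segment's candidates and keeping the k closest. This sketch merges by concatenating and sorting. It uses f32::total_cmp so that a NaN distance cannot cause a panic. Hit and merge_top_k are hypothetical, and a smaller distance is assumed to mean closer.

```rust
/// Hypothetical search hit: global ID plus distance (smaller is closer).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Hit {
    pub id: u32,
    pub distance: f32,
}

/// Merge per-segment result lists into one global top-k, closest first.
pub fn merge_top_k(per_segment: Vec<Vec<Hit>>, top_k: usize) -> Vec<Hit> {
    let mut all: Vec<Hit> = per_segment.into_iter().flatten().collect();
    // total_cmp gives a total order on f32; ties are broken by ID for determinism.
    all.sort_by(|a, b| a.distance.total_cmp(&b.distance).then(a.id.cmp(&b.id)));
    all.truncate(top_k);
    all
}
```

When each per-segment list is already sorted, a k-way merge over a binary heap avoids sorting the full candidate set. The simpler version is shown here for clarity.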

pub fn search_with_metric(
    &self,
    query: &[f32],
    top_k: usize,
    ef: usize,
    metric: DistanceMetric,
) -> Vec<SearchResult>

Search across all segments using an explicit metric override.

For sealed segments with quantized codecs, the metric override is applied during candidate reranking. Growing and building segments apply it exactly through brute-force search. The HNSW graph structure was built with the collection metric, so using a different metric affects the scoring but not the graph traversal.
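The effect of a metric override on sealed segments can be illustrated as a rerank step. Candidates are found by traversing a graph built with one metric, then re-scored with another. Metric, distance and rerank are hypothetical. Here L2 is squared Euclidean, and cosine is expressed as 1 - cosine similarity, so that a smaller value always means closer.

```rust
/// Hypothetical subset of distance metrics.
#[derive(Clone, Copy)]
pub enum Metric {
    L2,
    Cosine,
}

pub fn distance(metric: Metric, a: &[f32], b: &[f32]) -> f32 {
    match metric {
        // Squared Euclidean distance.
        Metric::L2 => a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>(),
        // 1 - cosine similarity; zero vectors are treated as maximally distant.
        Metric::Cosine => {
            let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
            let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
            let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
            if na == 0.0 || nb == 0.0 { 1.0 } else { 1.0 - dot / (na * nb) }
        }
    }
}

/// Re-score candidates (found with the build metric) under `metric`, keep top_k.
pub fn rerank(query: &[f32], candidates: &[(u32, Vec<f32>)], metric: Metric, top_k: usize) -> Vec<(u32, f32)> {
    let mut scored: Vec<(u32, f32)> = candidates
        .iter()
        .map(|(id, v)| (*id, distance(metric, query, v)))
        .collect();
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(top_k);
    scored
}
```

The two metrics can disagree on which candidate is nearest. Take the query [1, 0]. The candidate [3, 0] has the same direction, so cosine ranks it first, while L2 prefers [0.5, 0.5]. The candidate set itself still comes from the build-metric graph, so results under an override may differ from an exact search under that metric.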

pub fn search_with_bitmap_bytes_and_metric(
    &self,
    query: &[f32],
    top_k: usize,
    ef: usize,
    bitmap: &[u8],
    metric: DistanceMetric,
) -> Vec<SearchResult>

Search with a pre-filter bitmap (byte-array format) and explicit metric override.

pub fn search_with_bitmap_bytes(
    &self,
    query: &[f32],
    top_k: usize,
    ef: usize,
    bitmap: &[u8],
) -> Vec<SearchResult>

Search with a pre-filter bitmap (byte-array format).
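This page does not specify how the byte-array bitmap is laid out. The sketch assumes one bit per global vector ID, least-significant bit first within each byte. bitmap_allows and bitmap_from_ids are hypothetical helpers. IDs past the end of the array are treated as excluded.

```rust
/// Whether global ID `id` passes the pre-filter.
/// Assumed layout: bit (id % 8) of byte (id / 8), LSB first.
pub fn bitmap_allows(bitmap: &[u8], id: u32) -> bool {
    let byte = (id / 8) as usize;
    let bit = id % 8;
    bitmap.get(byte).map_or(false, |&b| ((b >> bit) & 1) == 1)
}

/// Build a bitmap in the same assumed layout from a list of allowed IDs.
pub fn bitmap_from_ids(ids: &[u32]) -> Vec<u8> {
    let len = ids.iter().copied().max().map_or(0, |m| m as usize / 8 + 1);
    let mut bytes = vec![0u8; len];
    for &id in ids {
        bytes[(id / 8) as usize] |= 1 << (id % 8);
    }
    bytes
}
```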

pub fn search_with_payload_filter(
    &self,
    query: &[f32],
    top_k: usize,
    ef: usize,
    predicate: &FilterPredicate,
) -> (Vec<SearchResult>, bool)

Search with a structured payload predicate.

If predicate is fully covered by indexed fields (all leaf fields have a bitmap index), the bitmap is built and HNSW traversal uses it as a pre-filter.

If any field in predicate is un-indexed, the method returns (results, false) where false signals that the predicate was NOT applied and the caller must apply it as a post-filter. This guarantees the un-indexed predicate is never silently dropped.

Returns (results, filter_was_applied).
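The (results, filter_was_applied) contract puts the post-filter on the caller. The sketch below uses a simplified, hypothetical predicate type, Pred, in place of FilterPredicate. Its fully_indexed method mirrors the documented coverage rule: every leaf field must have a bitmap index. The finish function shows a caller that applies the predicate itself whenever the engine reports it did not.

```rust
use std::collections::HashSet;

/// Hypothetical predicate tree; the crate's FilterPredicate is richer.
pub enum Pred {
    Eq(String, String),
    And(Vec<Pred>),
    Or(Vec<Pred>),
}

impl Pred {
    /// True only if every leaf field has a bitmap index.
    pub fn fully_indexed(&self, indexed: &HashSet<String>) -> bool {
        match self {
            Pred::Eq(field, _) => indexed.contains(field),
            Pred::And(ps) | Pred::Or(ps) => ps.iter().all(|p| p.fully_indexed(indexed)),
        }
    }

    /// Evaluate against a payload of (field, value) pairs.
    pub fn matches(&self, payload: &[(&str, &str)]) -> bool {
        match self {
            Pred::Eq(f, v) => payload.iter().any(|&(k, x)| k == f.as_str() && x == v.as_str()),
            Pred::And(ps) => ps.iter().all(|p| p.matches(payload)),
            Pred::Or(ps) => ps.iter().any(|p| p.matches(payload)),
        }
    }
}

/// Caller side: post-filter whenever the engine did NOT apply the predicate.
pub fn finish(results: Vec<(u32, Vec<(&str, &str)>)>, filter_was_applied: bool, pred: &Pred) -> Vec<u32> {
    results
        .into_iter()
        .filter(|(_, payload)| filter_was_applied || pred.matches(payload))
        .map(|(id, _)| id)
        .collect()
}
```

Post-filtering can leave fewer than top_k results, so callers that need a full page typically over-fetch before filtering.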

impl VectorCollection

pub fn stats(&self) -> VectorIndexStats

Collect live statistics from all segments.

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> ArchivePointee for T

type ArchivedMetadata = ()

The archived version of the pointer metadata for this type.

fn pointer_metadata(
    _: &<T as ArchivePointee>::ArchivedMetadata,
) -> <T as Pointee>::Metadata

Converts some archived metadata to the pointer metadata for itself.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> LayoutRaw for T

fn layout_raw(_: <T as Pointee>::Metadata) -> Result<Layout, LayoutError>

Returns the layout of the type.

impl<T, N1, N2> Niching<NichedOption<T, N1>> for N2
where T: SharedNiching<N1, N2>, N1: Niching<T>, N2: Niching<T>,

unsafe fn is_niched(niched: *const NichedOption<T, N1>) -> bool

Returns whether the given value has been niched.

fn resolve_niched(out: Place<NichedOption<T, N1>>)

Writes data to out indicating that a T is niched.

impl<T> Pointee for T

type Metadata = ()

The metadata type for pointers and references to this type.

impl<T> Same for T

type Output = T

Should always be Self

impl<SS, SP> SupersetOf<SS> for SP
where SS: SubsetOf<SP>,

fn to_subset(&self) -> Option<SS>

The inverse inclusion map: attempts to construct self from the equivalent element of its superset.

fn is_in_subset(&self) -> bool

Checks if self is actually part of its subset T (and can be converted to it).

fn to_subset_unchecked(&self) -> SS

Use with care! Same as self.to_subset but without any property checks. Always succeeds.

fn from_subset(element: &SS) -> SP

The inclusion map: converts self to the equivalent element of its superset.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.