Module id_dictionary

IdDictionary value encoding/decoding.

Maps user-provided external IDs to system-assigned internal vector IDs.

Why Two ID Spaces?

Users identify their vectors with external IDs, which are arbitrary strings of up to 64 bytes. The system maps each external ID to an internal ID (u64) because:

  1. Efficient bitmaps: Fixed-width u64 keys enable RoaringTreemap operations
  2. Better compression: Monotonically increasing IDs cluster well in bitmaps
  3. Lifecycle management: System controls ID allocation and reuse
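A minimal sketch of the external-to-internal mapping follows. It assumes an in-memory HashMap in place of the persisted IdDictionary and a plain counter in place of SeqBlock; IdMap, get_or_assign, and MAX_EXTERNAL_ID_LEN are hypothetical names. Each new external ID receives the next consecutive u64, which keeps the internal ID space dense and is what lets these IDs cluster well in a bitmap.

```rust
use std::collections::HashMap;

/// Maximum external ID length in bytes, per the 64-byte limit described above.
const MAX_EXTERNAL_ID_LEN: usize = 64;

/// Hypothetical in-memory stand-in for the IdDictionary: external ID -> internal ID.
#[derive(Default)]
struct IdMap {
    by_external: HashMap<String, u64>,
    /// Next internal ID to hand out; a plain counter standing in for SeqBlock.
    next_internal: u64,
}

impl IdMap {
    /// Returns the internal ID for `external`, allocating a new one if it is unseen.
    fn get_or_assign(&mut self, external: &str) -> Result<u64, String> {
        // str::len() counts bytes, which matches a byte-based limit.
        if external.len() > MAX_EXTERNAL_ID_LEN {
            return Err(format!(
                "external ID is {} bytes; limit is {}",
                external.len(),
                MAX_EXTERNAL_ID_LEN
            ));
        }
        if let Some(&id) = self.by_external.get(external) {
            return Ok(id);
        }
        // Monotonic allocation: IDs are handed out as 0, 1, 2, ...
        let id = self.next_internal;
        self.next_internal += 1;
        self.by_external.insert(external.to_string(), id);
        Ok(id)
    }
}
```

In this sketch, calling get_or_assign twice with the same string returns the same internal ID, and a 65-byte string is rejected.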

Upsert Behavior

When inserting a vector with an existing external ID:

  1. Look up existing internal ID from IdDictionary
  2. Delete old vector: add to deleted bitmap, tombstone data/metadata
  3. Allocate new internal ID (from SeqBlock)
  4. Write new vector with new internal ID
  5. Update IdDictionary to point to new internal ID

This “delete old + insert new” approach avoids expensive read-modify-write cycles to update every posting list and metadata index entry.
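The upsert steps can be sketched in memory as follows. This is an illustration under stated assumptions, not the module's implementation: a BTreeSet<u64> stands in for the RoaringTreemap deleted bitmap, removing a map entry stands in for a tombstone, a counter stands in for SeqBlock, metadata is omitted, and nothing here is atomic. Store and upsert are hypothetical names.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical in-memory model of the upsert path.
#[derive(Default)]
struct Store {
    /// external ID -> current internal ID (stands in for IdDictionary)
    dict: HashMap<String, u64>,
    /// internal IDs that are logically deleted (stands in for the deleted bitmap)
    deleted: BTreeSet<u64>,
    /// internal ID -> vector (stands in for VectorData records)
    vectors: HashMap<u64, Vec<f32>>,
    /// next internal ID to allocate (stands in for SeqBlock)
    next_id: u64,
}

impl Store {
    /// Inserts or replaces the vector for `external`, returning its new internal ID.
    fn upsert(&mut self, external: &str, vector: Vec<f32>) -> u64 {
        // Steps 1-2: look up the old internal ID and logically delete it.
        if let Some(old) = self.dict.get(external).copied() {
            self.deleted.insert(old);
            self.vectors.remove(&old); // "tombstone" the old data
        }
        // Step 3: allocate a fresh internal ID; the old one is not reused here.
        let new_id = self.next_id;
        self.next_id += 1;
        // Step 4: write the vector under the new ID.
        self.vectors.insert(new_id, vector);
        // Step 5: point the external ID at the new internal ID.
        self.dict.insert(external.to_string(), new_id);
        new_id
    }
}
```

In this sketch, stale references to the old ID are filtered out through the deleted set rather than rewritten in place, which is the trade-off the paragraph above describes.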

Delete Operation

Deleting a vector applies the following operations atomically in a single WriteBatch:

  1. Add the internal vector ID to the deleted bitmap (the centroid_id = 0 posting list)
  2. Tombstone VectorData record
  3. Tombstone VectorMeta record
  4. Tombstone IdDictionary entry
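A sketch of the delete path, assuming the four operations are first collected into a batch and only then applied, so an unknown external ID leaves the store untouched. Op, Store, plan_delete, apply, and delete are hypothetical names, and a BTreeSet stands in for the RoaringTreemap; real atomicity would come from the storage engine's WriteBatch, which this in-memory model does not reproduce. Secondary metadata index entries are not touched here.

```rust
use std::collections::{BTreeSet, HashMap};

/// One operation in a hypothetical delete batch.
#[derive(Debug, PartialEq)]
enum Op {
    /// Add the internal ID to the deleted bitmap (centroid_id = 0 posting list).
    MarkDeleted(u64),
    /// Tombstone the VectorData record.
    TombstoneData(u64),
    /// Tombstone the VectorMeta record.
    TombstoneMeta(u64),
    /// Tombstone the IdDictionary entry.
    TombstoneDict(String),
}

/// Hypothetical in-memory store used only to illustrate the batch.
#[derive(Default)]
struct Store {
    dict: HashMap<String, u64>,
    deleted: BTreeSet<u64>,
    data: HashMap<u64, Vec<f32>>,
    meta: HashMap<u64, String>,
}

impl Store {
    /// Collects all four delete operations; returns None for an unknown external ID.
    fn plan_delete(&self, external: &str) -> Option<Vec<Op>> {
        let id = *self.dict.get(external)?;
        Some(vec![
            Op::MarkDeleted(id),
            Op::TombstoneData(id),
            Op::TombstoneMeta(id),
            Op::TombstoneDict(external.to_string()),
        ])
    }

    /// Applies a fully built batch in one pass (not crash-safe; illustration only).
    fn apply(&mut self, batch: Vec<Op>) {
        for op in batch {
            match op {
                Op::MarkDeleted(id) => {
                    self.deleted.insert(id);
                }
                Op::TombstoneData(id) => {
                    self.data.remove(&id);
                }
                Op::TombstoneMeta(id) => {
                    self.meta.remove(&id);
                }
                Op::TombstoneDict(ext) => {
                    self.dict.remove(&ext);
                }
            }
        }
    }

    /// Deletes `external` if present; returns whether anything was deleted.
    fn delete(&mut self, external: &str) -> bool {
        match self.plan_delete(external) {
            Some(batch) => {
                self.apply(batch);
                true
            }
            None => false,
        }
    }
}
```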

Metadata index cleanup happens during LIRE maintenance.

Structs

IdDictionaryValue
IdDictionary value storing the internal vector ID for an external ID.
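A sketch of encoding and decoding for this value, assuming the internal ID is stored as 8 little-endian bytes with no header or version byte. The field name internal_id and the methods encode and decode are hypothetical; the actual byte layout of IdDictionaryValue is not documented in this section.

```rust
/// Sketch of an IdDictionary value holding only the internal vector ID.
/// ASSUMPTION: encoded as 8 little-endian bytes; the real format may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct IdDictionaryValue {
    internal_id: u64,
}

impl IdDictionaryValue {
    /// Encodes the internal ID as 8 little-endian bytes.
    fn encode(&self) -> [u8; 8] {
        self.internal_id.to_le_bytes()
    }

    /// Decodes a value, rejecting any input that is not exactly 8 bytes.
    fn decode(bytes: &[u8]) -> Result<Self, String> {
        if bytes.len() != 8 {
            return Err(format!("expected 8 bytes, got {}", bytes.len()));
        }
        let mut buf = [0u8; 8];
        buf.copy_from_slice(bytes);
        Ok(IdDictionaryValue {
            internal_id: u64::from_le_bytes(buf),
        })
    }
}
```

A fixed-width encoding like this makes decoding a length check plus a copy; a versioned format would trade a byte of overhead for room to evolve.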