IdDictionary value encoding/decoding.
Maps user-provided external IDs to system-assigned internal vector IDs.
§Why Two ID Spaces?
Users provide external IDs—arbitrary strings up to 64 bytes—to identify their vectors. The system maps these to internal IDs (u64) because:
- Efficient bitmaps: Fixed-width u64 keys enable RoaringTreemap operations
- Better compression: Monotonically increasing IDs cluster well in bitmaps
- Lifecycle management: System controls ID allocation and reuse
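As an illustration of the two ID spaces, the sketch below validates an external ID against the 64-byte limit and round-trips an internal `u64` through a fixed-width byte encoding. The helper names (`validate_external_id`, `encode_internal_id`, `decode_internal_id`) and the big-endian byte order are assumptions made for this example, not the crate's actual API or on-disk format.

```rust
use std::convert::TryFrom;

/// Maximum external ID length in bytes, as stated above.
const MAX_EXTERNAL_ID_LEN: usize = 64;

/// Hypothetical helper: rejects external IDs longer than 64 bytes.
fn validate_external_id(id: &str) -> Result<(), String> {
    if id.len() > MAX_EXTERNAL_ID_LEN {
        return Err(format!(
            "external ID is {} bytes; limit is {}",
            id.len(),
            MAX_EXTERNAL_ID_LEN
        ));
    }
    Ok(())
}

/// Hypothetical encoding: the internal ID as 8 bytes.
/// Big-endian is an assumption; the real byte order is not given here.
fn encode_internal_id(id: u64) -> [u8; 8] {
    id.to_be_bytes()
}

/// Decodes an 8-byte value back into an internal ID.
/// Returns `None` if the slice is not exactly 8 bytes long.
fn decode_internal_id(bytes: &[u8]) -> Option<u64> {
    let arr = <[u8; 8]>::try_from(bytes).ok()?;
    Some(u64::from_be_bytes(arr))
}
```

A fixed-width value keeps decoding trivial: any slice that is not exactly eight bytes can be rejected immediately.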
§Upsert Behavior
When inserting a vector with an existing external ID:
- Look up existing internal ID from IdDictionary
- Delete old vector: add to deleted bitmap, tombstone data/metadata
- Allocate new internal ID (from SeqBlock)
- Write new vector with new internal ID
- Update IdDictionary to point to new internal ID
This “delete old + insert new” approach avoids expensive read-modify-write cycles to update every posting list and metadata index entry.
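The upsert steps can be sketched with in-memory stand-ins. Everything here is hypothetical: `Store`, its fields, and `upsert` are illustrative names, a `BTreeSet` stands in for the deleted bitmap, and a simple counter stands in for SeqBlock allocation.

```rust
use std::collections::{BTreeSet, HashMap};

/// In-memory stand-in for the store; all names are hypothetical.
#[derive(Default)]
struct Store {
    id_dictionary: HashMap<String, u64>, // external ID -> internal ID
    deleted: BTreeSet<u64>,              // stand-in for the deleted bitmap
    vectors: HashMap<u64, Vec<f32>>,     // stand-in for vector data records
    next_internal_id: u64,               // stand-in for SeqBlock allocation
}

impl Store {
    /// Inserts or replaces the vector for `external_id` and returns the
    /// newly allocated internal ID.
    fn upsert(&mut self, external_id: &str, vector: Vec<f32>) -> u64 {
        // Look up the existing internal ID, if any.
        if let Some(&old_id) = self.id_dictionary.get(external_id) {
            // Delete the old vector: mark it deleted and drop its data.
            self.deleted.insert(old_id);
            self.vectors.remove(&old_id);
        }
        // Allocate a fresh internal ID; old IDs are never rewritten in place.
        let new_id = self.next_internal_id;
        self.next_internal_id += 1;
        // Write the new vector under the new internal ID.
        self.vectors.insert(new_id, vector);
        // Point the dictionary entry at the new internal ID.
        self.id_dictionary.insert(external_id.to_string(), new_id);
        new_id
    }
}
```

Because the old internal ID is only marked deleted, nothing that still references it has to be rewritten during the upsert itself.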
§Delete Operation
Deleting a vector applies the following changes atomically via a WriteBatch:
- Add vector ID to deleted bitmap (centroid_id = 0 posting list)
- Tombstone VectorData record
- Tombstone VectorMeta record
- Tombstone IdDictionary entry
Metadata index cleanup happens during LIRE maintenance.
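A minimal sketch of how the four delete mutations might be grouped before being applied as one unit. The `Op` enum and `build_delete_batch` are hypothetical names for this example and do not reflect the real WriteBatch API; the dictionary is modeled as a plain `HashMap`.

```rust
use std::collections::HashMap;

/// One mutation queued for an atomic batch. Hypothetical type.
#[derive(Debug, PartialEq)]
enum Op {
    MarkDeleted(u64),              // add the ID to the deleted bitmap
    TombstoneVectorData(u64),      // tombstone the VectorData record
    TombstoneVectorMeta(u64),      // tombstone the VectorMeta record
    TombstoneIdDictionary(String), // tombstone the IdDictionary entry
}

/// Builds the mutations for deleting `external_id`.
/// Returns `None` if the external ID has no dictionary entry.
fn build_delete_batch(
    id_dictionary: &HashMap<String, u64>,
    external_id: &str,
) -> Option<Vec<Op>> {
    let internal_id = *id_dictionary.get(external_id)?;
    Some(vec![
        Op::MarkDeleted(internal_id),
        Op::TombstoneVectorData(internal_id),
        Op::TombstoneVectorMeta(internal_id),
        Op::TombstoneIdDictionary(external_id.to_string()),
    ])
}
```

Collecting all four mutations first means they can be committed together or not at all, so a reader never sees a dictionary entry that points at tombstoned data.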
Structs§
- IdDictionaryValue - IdDictionary value storing the internal vector ID for an external ID.