Skip to main content

KeyIndexer

Struct KeyIndexer 

Source
pub struct KeyIndexer { /* private fields */ }
Expand description

KeyIndexer maps 64-bit key hashes to 48-bit file offsets, augmented with a 16-bit tag for lightweight collision detection.

§Purpose

While XXH3 is a high-quality 64-bit hash, collisions are still possible in large-scale stores. This index:

  • Detects such collisions via a 16-bit fingerprint (tag)
  • Avoids storing full keys in memory
  • Maintains constant memory footprint (u64 per key)

§Packed Value Format

Stored in a single u64:

[63         48][47                      0]
[  tag (16)  ][     file offset (48)     ]

§Collision Handling

During lookups, the stored tag is compared to a rederived one from the key or key hash. If the tags do not match, the entry is rejected as a potential collision.

§Limitations

  • Max file size: 2^48 bytes = 256 TiB
    • Any file larger than this will overflow the offset field and corrupt the tag.
  • Tag uniqueness: 2^16 = 65,536 distinct tags
    • Probability of tag collision is 1 in 65,536 per conflicting key hash
    • This is sufficient to distinguish over 4 billion keys with ~50% collision probability (birthday bound)

§Tradeoffs

This scheme offers a middle ground:

  • Significantly improves safety over pure hash indexing
  • No dynamic memory cost
  • Small and predictable performance cost (~2 bit ops + 1 compare)

Implementations§

Source§

impl KeyIndexer

Source

pub fn tag_from_hash(key_hash: u64) -> u16

Returns a 16-bit tag from the upper bits of a key hash.

Source

pub fn tag_from_key(key: &[u8]) -> u16

Computes a tag directly from the raw key.

Source

pub fn pack(tag: u16, offset: u64) -> u64

Combines a tag and offset into a packed 64-bit value.

§Panics

If offset exceeds 48 bits (i.e. > 256 TiB), this will panic.

Source

pub fn unpack(packed: u64) -> (u16, u64)

Extracts (tag, offset) from a packed value.

Source

pub fn build(mmap: &Mmap, tail_offset: u64) -> Self

Scans the file backwards and builds a tag-aware hash index.

The most recent version of each key hash is kept.

Source

pub fn insert( &mut self, key_hash: u64, new_offset: u64, ) -> Result<Option<u64>, &'static str>

Inserts a new key hash and offset into the index, with collision detection.

This method checks the tag of an existing entry before overwriting. If the tags do not match, it returns an Err, indicating a hash collision.

§Returns
  • Ok(Some(u64)): On a successful update, returning the previous offset.
  • Ok(None): On a successful insert of a new key.
  • Err(&'static str): On a tag mismatch, indicating a hash collision.
Source

pub fn get_packed(&self, key_hash: &u64) -> Option<&u64>

Gets the raw packed value.

Source

pub fn get_offset(&self, key_hash: &u64) -> Option<u64>

Returns only the unpacked offset (ignores tag).

Source

pub fn remove(&mut self, key_hash: &u64) -> Option<u64>

Removes a key and returns its offset.

Source

pub fn len(&self) -> usize

Source

pub fn is_empty(&self) -> bool

Source

pub fn values(&self) -> Values<'_, u64, u64>

Returns a memory-efficient iterator over the packed (tag|offset) values.

This method is preferable to collecting all values into a Vec when the index is large, as it avoids a large upfront memory allocation. The iterator borrows the underlying index, so it must be used within the lifetime of the KeyIndexer’s read lock.

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T> Instrument for T

Source§

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper. Read more
Source§

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper. Read more
Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<T> WithSubscriber for T

Source§

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper. Read more
Source§

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper. Read more