Struct PrimaryKeyIndex 

pub struct PrimaryKeyIndex { /* private fields */ }

Thread-safe primary key deduplication index.

The hot-path dedup check is fully synchronous: BloomFilter::may_contain(), FxHashSet::contains(), and TextDictReader::ordinal() are all sync calls.

Interior mutability for the mutable state (bloom + uncommitted set) is behind parking_lot::Mutex. The committed data is only mutated via &mut self methods (commit/abort path), so no lock is needed for it.
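The locking split described above can be illustrated with a minimal sketch. Everything in it is hypothetical: `SketchIndex` and `Pending` are illustrative types, a sorted `Vec<String>` stands in for the committed data, and `std::sync::Mutex` stands in for `parking_lot::Mutex` so the example needs no external crates. It is not the actual layout of `PrimaryKeyIndex`.

```rust
use std::collections::HashSet;
use std::sync::Mutex; // stand-in for parking_lot::Mutex (assumption)

/// Hypothetical mutable state guarded by a single lock.
struct Pending {
    uncommitted: HashSet<String>,
}

/// Hypothetical layout: pending keys behind a lock, committed keys unlocked.
pub struct SketchIndex {
    pending: Mutex<Pending>,
    committed: Vec<String>, // kept sorted; only changed through `&mut self`
}

impl SketchIndex {
    pub fn new(mut committed: Vec<String>) -> Self {
        committed.sort();
        committed.dedup();
        Self {
            pending: Mutex::new(Pending { uncommitted: HashSet::new() }),
            committed,
        }
    }

    /// Shared access (`&self`): committed data is read without a lock,
    /// the lock is taken only to touch the pending set.
    pub fn try_register(&self, key: &str) -> bool {
        if self.committed.binary_search_by(|k| k.as_str().cmp(key)).is_ok() {
            return false;
        }
        let mut pending = self.pending.lock().unwrap();
        pending.uncommitted.insert(key.to_string())
    }

    /// Exclusive access (`&mut self`): the borrow checker rules out
    /// concurrent readers, so `committed` is mutated without locking.
    pub fn commit(&mut self) {
        let pending = self.pending.get_mut().unwrap();
        self.committed.extend(pending.uncommitted.drain());
        self.committed.sort();
    }
}
```

Because `commit` takes `&mut self`, callers sharing the index across threads must obtain exclusive access first, which is what makes the unlocked reads in `try_register` sound.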

Implementations

impl PrimaryKeyIndex

pub fn new(field: Field, pk_data: Vec<PkSegmentData>, snapshot: SegmentSnapshot) -> Self

Create a new PrimaryKeyIndex by scanning committed segments.

Iterates each segment’s fast-field text dictionary to populate the bloom filter with all existing primary key values. The snapshot keeps ref counts alive so segments aren’t deleted while we hold data.

CPU-intensive: call this from spawn_blocking, not directly on the async runtime.

pub fn from_persisted(field: Field, bloom: BloomFilter, pk_data: Vec<PkSegmentData>, new_data: &[PkSegmentData], snapshot: SegmentSnapshot) -> Self

Create from a pre-loaded bloom filter (loaded from pk_bloom.bin).

Skips dictionary iteration entirely when the persisted bloom covers all current segments. pk_data must contain data for all current segments. new_data is a borrowed slice holding the subset of segments not covered by the persisted bloom; if it is non-empty, the keys of those segments are inserted into the bloom before returning (incremental update).
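The persist-and-restore flow can be sketched with a toy bloom filter. This is an illustration under stated assumptions, not the crate's `BloomFilter` or the real `pk_bloom.bin` format: `ToyBloom`, its byte layout, and `restore_with_new_segments` are all hypothetical, and each segment is reduced to a plain list of key strings.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy bloom filter (hypothetical; not the crate's `BloomFilter`).
pub struct ToyBloom {
    bits: Vec<u8>,
    hashes: u64,
}

impl ToyBloom {
    pub fn new(num_bytes: usize, hashes: u64) -> Self {
        assert!(num_bytes > 0 && hashes > 0);
        Self { bits: vec![0; num_bytes], hashes }
    }

    /// Restore from bytes produced earlier by `to_bytes`.
    pub fn from_bytes(bits: Vec<u8>, hashes: u64) -> Self {
        assert!(!bits.is_empty() && hashes > 0);
        Self { bits, hashes }
    }

    pub fn to_bytes(&self) -> Vec<u8> {
        self.bits.clone()
    }

    // Bit position for the i-th hash of `key`. DefaultHasher is not guaranteed
    // to be stable across Rust releases; a real on-disk format needs a fixed hash.
    fn bit(&self, key: &str, i: u64) -> usize {
        let mut h = DefaultHasher::new();
        (i, key).hash(&mut h);
        (h.finish() % (self.bits.len() as u64 * 8)) as usize
    }

    pub fn insert(&mut self, key: &str) {
        for i in 0..self.hashes {
            let b = self.bit(key, i);
            self.bits[b / 8] |= 1u8 << (b % 8);
        }
    }

    pub fn may_contain(&self, key: &str) -> bool {
        (0..self.hashes).all(|i| {
            let b = self.bit(key, i);
            (self.bits[b / 8] & (1u8 << (b % 8))) != 0
        })
    }
}

/// Restore the bloom, then insert only the keys of segments it does not cover yet.
/// With an empty `new_segments` slice nothing is scanned at all.
pub fn restore_with_new_segments(
    persisted: Vec<u8>,
    hashes: u64,
    new_segments: &[Vec<String>],
) -> ToyBloom {
    let mut bloom = ToyBloom::from_bytes(persisted, hashes);
    for keys in new_segments {
        for key in keys {
            bloom.insert(key);
        }
    }
    bloom
}
```

The point of the sketch is the cost model: restoring is proportional to the number of keys in the uncovered segments, not to the total number of committed keys.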

pub fn bloom_to_bytes(&self) -> Vec<u8>

Serialize the bloom filter for persistence to pk_bloom.bin.

pub fn memory_bytes(&self) -> usize

Memory used by the bloom filter and uncommitted set.

pub fn check_and_insert(&self, doc: &Document) -> Result<()>

Check whether a document’s primary key is unique, and if so, register it.

Returns Ok(()) if the key is new (inserted into bloom + uncommitted set). Returns Err(DuplicatePrimaryKey) if the key already exists. Returns Err(Document) if the primary key field is missing or empty.
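A minimal sketch of how such a check might be ordered, assuming the bloom filter is consulted first and the exact lookups run only on a possible hit. All names here are hypothetical: `Dedup`, `ToyBloom`, and `CheckError::MissingKey` are illustrative (the source reports a missing key as `Err(Document)`), a sorted `Vec<String>` stands in for the committed segment dictionaries, and the method takes `&mut self` instead of locking for brevity.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

#[derive(Debug, PartialEq)]
pub enum CheckError {
    DuplicatePrimaryKey,
    MissingKey, // hypothetical name for the missing-or-empty key case
}

/// Toy two-hash bloom filter (hypothetical).
pub struct ToyBloom {
    bits: Vec<bool>,
}

impl ToyBloom {
    fn slots(&self, key: &str) -> [usize; 2] {
        let mut out = [0; 2];
        for (i, slot) in out.iter_mut().enumerate() {
            let mut h = DefaultHasher::new();
            (i, key).hash(&mut h);
            *slot = (h.finish() % self.bits.len() as u64) as usize;
        }
        out
    }

    fn insert(&mut self, key: &str) {
        for s in self.slots(key) {
            self.bits[s] = true;
        }
    }

    fn may_contain(&self, key: &str) -> bool {
        self.slots(key).iter().all(|&s| self.bits[s])
    }
}

pub struct Dedup {
    bloom: ToyBloom,
    uncommitted: HashSet<String>,
    committed: Vec<String>, // sorted; stands in for the segment dictionaries
}

impl Dedup {
    pub fn new(mut committed: Vec<String>) -> Self {
        committed.sort();
        let mut bloom = ToyBloom { bits: vec![false; 1024] };
        for k in &committed {
            bloom.insert(k);
        }
        Self { bloom, uncommitted: HashSet::new(), committed }
    }

    pub fn check_and_insert(&mut self, key: Option<&str>) -> Result<(), CheckError> {
        let key = match key {
            Some(k) if !k.is_empty() => k,
            _ => return Err(CheckError::MissingKey),
        };
        // A negative bloom answer is definitive, so the exact lookups are skipped.
        if self.bloom.may_contain(key) {
            let pending = self.uncommitted.contains(key);
            let committed = self
                .committed
                .binary_search_by(|k| k.as_str().cmp(key))
                .is_ok();
            if pending || committed {
                return Err(CheckError::DuplicatePrimaryKey);
            }
            // Otherwise it was a bloom false positive: the key is new.
        }
        self.bloom.insert(key);
        self.uncommitted.insert(key.to_string());
        Ok(())
    }
}
```

A bloom false positive only costs the extra exact lookups; it never changes the result, which is why stale bloom entries are harmless.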

pub fn refresh_incremental(&mut self, new_data: Vec<PkSegmentData>, snapshot: SegmentSnapshot)

Refresh after commit: merge new segment data, prune removed segments, insert the new keys into the bloom, and clear the uncommitted set.

The caller only needs to load new_data (segments not already held); existing data for segments still in snapshot is retained. The snapshot keeps ref counts alive so segments aren’t deleted.
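The four refresh steps can be sketched as follows. This is an assumption-laden illustration: `SketchState` is hypothetical, segment data is reduced to a list of key strings per segment ID, the set of live segment IDs stands in for `SegmentSnapshot`, and an exact `HashSet` stands in for the bloom filter.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified state of a PK index.
#[derive(Default)]
pub struct SketchState {
    pub segments: HashMap<String, Vec<String>>, // segment ID -> keys
    pub bloom_stand_in: HashSet<String>,        // exact set instead of a bloom
    pub uncommitted: HashSet<String>,
}

impl SketchState {
    /// `live` lists the segment IDs present after the commit.
    pub fn refresh_incremental(
        &mut self,
        new_data: Vec<(String, Vec<String>)>,
        live: &HashSet<String>,
    ) {
        // Prune segments that no longer exist; retain the rest untouched.
        self.segments.retain(|id, _| live.contains(id));
        // Merge newly loaded segments and register their keys.
        for (id, keys) in new_data {
            for key in &keys {
                self.bloom_stand_in.insert(key.clone());
            }
            self.segments.insert(id, keys);
        }
        // Everything pending is now covered by committed segments.
        self.uncommitted.clear();
    }

    /// Segment IDs already held, so the caller can load only the missing ones.
    pub fn committed_segment_ids(&self) -> impl Iterator<Item = &str> {
        self.segments.keys().map(|s| s.as_str())
    }
}
```

A caller would typically compare the live segment IDs against `committed_segment_ids()` and load data only for the IDs that are not yet held.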

pub fn committed_segment_ids(&self) -> impl Iterator<Item = &str>

Iterator over segment IDs already held in this PK index.

pub fn rollback_uncommitted_key(&self, doc: &Document)

Roll back an uncommitted key registration (e.g. when a channel send fails after check_and_insert succeeded). The bloom may retain the key, but that only causes harmless false positives, never missed duplicates.
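A sketch of the failure path this method is meant for, assuming documents are handed to an indexing worker over a channel. The names are hypothetical: `SketchIndex` and `submit` are illustrative, keys are plain strings rather than documents, `std::sync::mpsc` stands in for whatever channel the caller uses, and the bloom filter is omitted.

```rust
use std::collections::HashSet;
use std::sync::mpsc::Sender;
use std::sync::Mutex;

/// Hypothetical index holding only the uncommitted set.
pub struct SketchIndex {
    uncommitted: Mutex<HashSet<String>>,
}

impl SketchIndex {
    pub fn new() -> Self {
        Self { uncommitted: Mutex::new(HashSet::new()) }
    }

    pub fn check_and_insert(&self, key: &str) -> Result<(), String> {
        if self.uncommitted.lock().unwrap().insert(key.to_string()) {
            Ok(())
        } else {
            Err(format!("duplicate primary key: {}", key))
        }
    }

    pub fn rollback_uncommitted_key(&self, key: &str) {
        self.uncommitted.lock().unwrap().remove(key);
    }
}

/// Register the key, then hand it off; undo the registration if the hand-off fails.
pub fn submit(index: &SketchIndex, tx: &Sender<String>, key: &str) -> Result<(), String> {
    index.check_and_insert(key)?;
    if tx.send(key.to_string()).is_err() {
        // The document never reached the worker, so the key must not stay reserved.
        index.rollback_uncommitted_key(key);
        return Err("indexing channel closed".to_string());
    }
    Ok(())
}
```

Without the rollback, a later retry of the same document would presumably be rejected as a duplicate even though it was never indexed.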

pub fn clear_uncommitted(&mut self)

Clear uncommitted keys (e.g. on abort). The bloom may retain stale entries, but that only causes harmless false positives (extra committed-segment lookups), never missed duplicates.

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> Pointable for T

const ALIGN: usize

The alignment of pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T> Same for T

type Output = T

Should always be Self

impl<SS, SP> SupersetOf<SS> for SP
where SS: SubsetOf<SP>,

fn to_subset(&self) -> Option<SS>

The inverse inclusion map: attempts to construct self from the equivalent element of its superset.

fn is_in_subset(&self) -> bool

Checks if self is actually part of its subset T (and can be converted to it).

fn to_subset_unchecked(&self) -> SS

Use with care! Same as self.to_subset but without any property checks. Always succeeds.

fn from_subset(element: &SS) -> SP

The inclusion map: converts self to the equivalent element of its superset.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V