pub struct PositionalIndexer { /* private fields */ }Expand description
Positional indexer for cache-aware routing.
Uses a single DashMap<(usize, ContentHash), SeqEntry> — keyed by
(position, content_hash). Grows unboundedly (no capacity limit).
Jump search gives amortized O(D/J + W) matching complexity.
Write-path methods take a caller-owned &mut WorkerBlockMap (one per worker).
This gives direct HashMap access (~5ns) instead of DashMap hash+shard locking
(~25ns), achieving zero-contention for single-writer-per-worker.
Implementations§
Source§impl PositionalIndexer
impl PositionalIndexer
Sourcepub fn new(jump_size: usize) -> Self
pub fn new(jump_size: usize) -> Self
Create a new PositionalIndexer with the given jump size.
jump_size controls how many positions the search algorithm skips at a time.
Larger values reduce lookups on long matching prefixes but increase scan range
when workers drain. Default: 64.
Sourcepub fn worker_id(&self, worker: &str) -> Option<u32>
pub fn worker_id(&self, worker: &str) -> Option<u32>
Get the internal u32 ID for a worker URL, if it has been interned.
Used by consumers to look up scores in OverlapScores by worker URL.
Returns None if the worker has never been seen by this indexer.
Sourcepub fn apply_stored(
&self,
worker_id: u32,
blocks: &[StoredBlock],
parent_seq_hash: Option<SequenceHash>,
worker_blocks: &mut WorkerBlockMap,
) -> Result<(), ApplyError>
pub fn apply_stored( &self, worker_id: u32, blocks: &[StoredBlock], parent_seq_hash: Option<SequenceHash>, worker_blocks: &mut WorkerBlockMap, ) -> Result<(), ApplyError>
Apply a “blocks stored” event for a worker.
worker_id: internal ID from [intern_worker].
blocks: ordered sequence of stored blocks (each with seq_hash + content_hash).
parent_seq_hash: if Some, the sequence extends from the parent’s position + 1.
If None, the sequence starts from position 0.
worker_blocks: caller-owned reverse lookup for this worker.
Sourcepub fn apply_removed(
&self,
worker_id: u32,
seq_hashes: &[SequenceHash],
worker_blocks: &mut WorkerBlockMap,
)
pub fn apply_removed( &self, worker_id: u32, seq_hashes: &[SequenceHash], worker_blocks: &mut WorkerBlockMap, )
Apply a “blocks removed” event for a worker.
worker_id: internal ID from [intern_worker].
seq_hashes: position-aware block hashes to remove (from proto).
worker_blocks: caller-owned reverse lookup for this worker.
Note on orphaned entries: Removing a block at position N does not cascade to blocks at positions > N. Those entries become orphaned — they remain in the index but won’t match queries because the rolling prefix hash chain is broken at the gap. This is harmless: orphaned entries waste a small amount of memory and are cleaned up when the worker is cleared or removed. Backends typically evict from the tail (LRU), so mid-sequence gaps are rare in practice.
Sourcepub fn apply_cleared(&self, worker_id: u32, worker_blocks: &mut WorkerBlockMap)
pub fn apply_cleared(&self, worker_id: u32, worker_blocks: &mut WorkerBlockMap)
Apply a “cache cleared” event — drain blocks, clean index, caller keeps the empty map.
worker_id: internal ID from [intern_worker].
worker_blocks: caller-owned reverse lookup — drained and left empty.
Sourcepub fn remove_worker(&self, worker_id: u32, worker_blocks: WorkerBlockMap)
pub fn remove_worker(&self, worker_id: u32, worker_blocks: WorkerBlockMap)
Remove a worker entirely — takes ownership of blocks, cleans index, worker is gone.
worker_id: internal ID from [intern_worker].
worker_blocks: caller-owned reverse lookup — consumed.
Sourcepub fn current_size(&self) -> usize
pub fn current_size(&self) -> usize
Get total number of blocks across all workers.
Sourcepub fn find_matches(
&self,
content_hashes: &[ContentHash],
early_exit: bool,
) -> OverlapScores
pub fn find_matches( &self, content_hashes: &[ContentHash], early_exit: bool, ) -> OverlapScores
Find overlap scores for a request’s content hash sequence.
Uses jump search: strides by jump_size positions, only scanning
intermediate positions when workers drain (stop matching).
Complexity: amortized O(D/J + W) where D=depth, J=jump_size, W=workers.
When early_exit is true, returns immediately after finding any match
at position 0 (score = 1 for all matching workers). Useful when the caller
only needs to know whether any worker has cached data for this sequence.
Assumption: Block sequences are prefix-closed — if a worker has a block at
position N, it has blocks at all positions 0..N. This holds when backends evict
from the tail (LRU). If apply_removed creates a mid-sequence gap, the rolling
prefix hash detects it (the chain breaks at the gap), but the jump heuristic may
over-count if it lands past the gap. In practice, backends only evict tail blocks.
Sourcepub fn intern_worker(&self, worker: &str) -> u32
pub fn intern_worker(&self, worker: &str) -> u32
Intern a worker URL to an internal u32 ID. Fast path: DashMap shard read (no lock). Slow path: DashMap entry API (once per worker).