pub struct IndexWriter<D: DirectoryWriter + 'static> { /* private fields */ }
Async IndexWriter for adding documents and committing segments.
Backpressure: add_document() is sync, O(1). Returns Error::QueueFull
when the shared queue is at capacity — caller must back off.
Two-phase commit:
- prepare_commit() → PreparedCommit::commit() or PreparedCommit::abort()
- commit() is a convenience that does both phases.
- Between prepare and commit, the caller can do external work (WAL, sync, etc.) knowing that abort is possible if something fails.
- Dropping PreparedCommit without calling commit/abort auto-aborts.
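The two-phase flow is an RAII guard pattern. The sketch below uses a hypothetical in-memory `Store` and `Prepared` guard, not this crate's types. Phase 1 stages pending data and returns a guard that mutably borrows the store. `commit` or `abort` consumes the guard, and `Drop` discards the staged data if neither was called:

```rust
/// Hypothetical in-memory store; only illustrates the two-phase pattern.
#[derive(Default)]
pub struct Store {
    pub committed: Vec<String>,
    pending: Vec<String>,
}

/// Guard playing the role of `PreparedCommit`.
pub struct Prepared<'a> {
    store: &'a mut Store,
    staged: Vec<String>,
    finished: bool,
}

impl Store {
    pub fn add(&mut self, doc: &str) {
        self.pending.push(doc.to_string());
    }

    /// Phase 1: stage pending documents; nothing becomes visible yet.
    pub fn prepare(&mut self) -> Prepared<'_> {
        let staged = std::mem::take(&mut self.pending);
        Prepared { store: self, staged, finished: false }
    }
}

impl Prepared<'_> {
    /// Phase 2a: make the staged documents visible.
    pub fn commit(mut self) {
        let staged = std::mem::take(&mut self.staged);
        self.store.committed.extend(staged);
        self.finished = true;
    }

    /// Phase 2b: discard the staged documents explicitly.
    pub fn abort(mut self) {
        self.staged.clear();
        self.finished = true;
    }
}

impl Drop for Prepared<'_> {
    fn drop(&mut self) {
        if !self.finished {
            // Dropped without commit/abort: same effect as abort.
            self.staged.clear();
        }
    }
}
```

The guard holds `&mut Store`, so nothing else can modify the store between the two phases. An early `?` return while the guard is alive drops it, which aborts the commit.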
Implementations
impl<D: DirectoryWriter + 'static> IndexWriter<D>
pub async fn build_vector_index(&self) -> Result<()>
Train the vector index from accumulated Flat vectors. Training is manual; it is not triggered automatically.
- Acquires a snapshot (segments safe to read)
- Collects vectors for training
- Trains centroids/codebooks
- Updates metadata (marks fields as Built)
- Publishes to ArcSwap; merges pick up the trained structures automatically
Existing Flat segments gain ANN structures during normal merges, so no rebuild is needed.
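The publish-then-consume step can be sketched as follows. This sketch assumes the trained structures are plain k-means centroids and uses `RwLock<Option<Arc<_>>>` as a std-only stand-in for ArcSwap. `Centroids`, `train`, `VectorIndexState`, and `merge_segment` are hypothetical names. A merge takes one snapshot of whatever has been published. It then assigns each vector to its nearest centroid, or keeps the output flat if nothing has been published yet:

```rust
use std::sync::{Arc, RwLock};

type Vector = Vec<f32>;

/// Hypothetical trained structure: k-means centroids only.
pub struct Centroids(pub Vec<Vector>);

fn dist2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

fn nearest(centroids: &[Vector], v: &[f32]) -> usize {
    (0..centroids.len())
        .min_by(|&i, &j| dist2(&centroids[i], v).total_cmp(&dist2(&centroids[j], v)))
        .expect("at least one centroid")
}

/// Naive k-means (Lloyd's algorithm) seeded with the first k vectors.
pub fn train(vectors: &[Vector], k: usize, iters: usize) -> Centroids {
    assert!(k > 0 && !vectors.is_empty(), "need data and k > 0");
    let mut c: Vec<Vector> = vectors.iter().take(k).cloned().collect();
    for _ in 0..iters {
        let dim = c[0].len();
        let mut sums = vec![vec![0.0f32; dim]; c.len()];
        let mut counts = vec![0usize; c.len()];
        for v in vectors {
            let i = nearest(&c, v);
            counts[i] += 1;
            for (s, x) in sums[i].iter_mut().zip(v) {
                *s += x;
            }
        }
        for i in 0..c.len() {
            if counts[i] > 0 {
                c[i] = sums[i].iter().map(|s| s / counts[i] as f32).collect();
            }
        }
    }
    Centroids(c)
}

/// Shared slot read by merges; `None` means the field is still Flat.
#[derive(Default)]
pub struct VectorIndexState {
    trained: RwLock<Option<Arc<Centroids>>>,
}

impl VectorIndexState {
    /// Publish new trained structures; later merges see them.
    pub fn publish(&self, c: Centroids) {
        *self.trained.write().unwrap() = Some(Arc::new(c));
    }

    /// Snapshot the published structures once (clones the Arc, not the data),
    /// then return a cluster id per vector, or `None` if output stays flat.
    pub fn merge_segment(&self, vectors: &[Vector]) -> Option<Vec<usize>> {
        let snapshot = self.trained.read().unwrap().clone()?;
        Some(vectors.iter().map(|v| nearest(&snapshot.0, v)).collect())
    }
}
```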
pub async fn rebuild_vector_index(&self) -> Result<()>
Rebuild the vector index by retraining centroids/codebooks.
Resets fields from Built to Flat, clears the trained structures, then trains them from scratch.
impl<D: DirectoryWriter + 'static> IndexWriter<D>
pub async fn create(
    directory: D,
    schema: Schema,
    config: IndexConfig,
) -> Result<Self>
Create a new index in the given directory.
pub async fn create_with_config(
    directory: D,
    schema: Schema,
    config: IndexConfig,
    builder_config: SegmentBuilderConfig,
) -> Result<Self>
Create a new index with a custom SegmentBuilderConfig.
pub async fn open(directory: D, config: IndexConfig) -> Result<Self>
Open an existing index for writing.
pub async fn open_with_config(
    directory: D,
    config: IndexConfig,
    builder_config: SegmentBuilderConfig,
) -> Result<Self>
Open an existing index with a custom SegmentBuilderConfig.
pub fn from_index(index: &Index<D>) -> Self
Create an IndexWriter from an existing Index. Shares the SegmentManager for consistent segment lifecycle management.
pub fn set_tokenizer<T: Tokenizer>(&mut self, field: Field, tokenizer: T)
Set the tokenizer for a field. The change is propagated to worker threads and takes effect for the next SegmentBuilder each worker creates.
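One way to get "takes effect for the next SegmentBuilder" semantics is to have each builder snapshot the shared tokenizer map when it is created. The sketch below assumes a `Tokenizer` trait with a single `tokenize` method and a `u32` field id. Both are hypothetical stand-ins, as are `SharedTokenizers` and `Builder`, and are not the crate's definitions:

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical tokenizer trait (the real `Tokenizer` trait may differ).
pub trait Tokenizer: Send + Sync {
    fn tokenize(&self, text: &str) -> Vec<String>;
}

pub struct Whitespace;
impl Tokenizer for Whitespace {
    fn tokenize(&self, text: &str) -> Vec<String> {
        text.split_whitespace().map(str::to_string).collect()
    }
}

pub struct Lowercase;
impl Tokenizer for Lowercase {
    fn tokenize(&self, text: &str) -> Vec<String> {
        text.split_whitespace().map(|t| t.to_lowercase()).collect()
    }
}

type Field = u32;
type TokenizerMap = HashMap<Field, Arc<dyn Tokenizer>>;

/// Map shared between the writer and its worker threads.
#[derive(Default, Clone)]
pub struct SharedTokenizers(Arc<RwLock<TokenizerMap>>);

impl SharedTokenizers {
    pub fn set(&self, field: Field, t: impl Tokenizer + 'static) {
        self.0.write().unwrap().insert(field, Arc::new(t));
    }

    /// Copy of the current map (clones Arcs, not tokenizers).
    pub fn snapshot(&self) -> TokenizerMap {
        self.0.read().unwrap().clone()
    }
}

/// Hypothetical per-worker builder that keeps the snapshot it started with.
pub struct Builder {
    tokenizers: TokenizerMap,
}

impl Builder {
    pub fn new(shared: &SharedTokenizers) -> Self {
        Builder { tokenizers: shared.snapshot() }
    }

    /// Falls back to whitespace splitting when no tokenizer is set.
    pub fn tokens(&self, field: Field, text: &str) -> Vec<String> {
        match self.tokenizers.get(&field) {
            Some(t) => t.tokenize(text),
            None => Whitespace.tokenize(text),
        }
    }
}
```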
pub async fn init_primary_key_dedup(&mut self) -> Result<()>
Initialize primary key deduplication from committed segments.
Tries to load a cached bloom filter from pk_bloom.bin first. If the
cache covers all current segments, the bloom is reused directly (fast
path). If new segments appeared since the cache was written, only their
keys are iterated (incremental). Falls back to a full rebuild when no
cache exists.
The CPU-intensive bloom build is offloaded via spawn_blocking so it
does not block the tokio runtime.
No-op if schema has no primary field.
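The three paths above (reuse, incremental, full rebuild) can be sketched with a tiny std-only Bloom filter. Several details here are assumptions rather than the crate's format: the cache stores the covered segment ids next to the bit array, the hashing scheme is made up for illustration, and all names are hypothetical. The spawn_blocking offload is omitted:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Minimal Bloom filter: `k` bit positions per key from a seeded std hasher.
pub struct Bloom {
    bits: Vec<bool>,
    k: u64,
}

impl Bloom {
    pub fn new(m: usize, k: u64) -> Self {
        Bloom { bits: vec![false; m], k }
    }

    fn index(&self, key: &str, seed: u64) -> usize {
        let mut h = DefaultHasher::new();
        seed.hash(&mut h);
        key.hash(&mut h);
        (h.finish() % self.bits.len() as u64) as usize
    }

    pub fn insert(&mut self, key: &str) {
        for seed in 0..self.k {
            let i = self.index(key, seed);
            self.bits[i] = true;
        }
    }

    /// `false` means definitely absent; `true` means possibly present.
    pub fn may_contain(&self, key: &str) -> bool {
        (0..self.k).all(|seed| self.bits[self.index(key, seed)])
    }
}

/// Hypothetical cache contents: the filter plus the segment ids it covers.
pub struct PkCache {
    pub bloom: Bloom,
    pub covered: HashSet<u64>,
}

#[derive(Debug, PartialEq)]
pub enum InitPath {
    Reused,
    Incremental(usize),
    Full,
}

/// `segments` lists each committed segment id with its primary keys.
pub fn init_dedup(cache: Option<PkCache>, segments: &[(u64, Vec<String>)]) -> (PkCache, InitPath) {
    let (mut c, full) = match cache {
        Some(c) => (c, false),
        None => (PkCache { bloom: Bloom::new(1 << 16, 4), covered: HashSet::new() }, true),
    };
    // Only segments the cache has not seen need their keys iterated.
    let missing: Vec<&(u64, Vec<String>)> =
        segments.iter().filter(|(id, _)| !c.covered.contains(id)).collect();
    if missing.is_empty() && !full {
        return (c, InitPath::Reused);
    }
    for (id, keys) in &missing {
        for key in keys {
            c.bloom.insert(key);
        }
        c.covered.insert(*id);
    }
    let path = if full { InitPath::Full } else { InitPath::Incremental(missing.len()) };
    (c, path)
}
```

A `may_contain` result of `false` proves a key is new. A result of `true` only means the key might already exist, so an exact lookup against committed segments is still needed.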
pub fn add_document(&self, doc: Document) -> Result<()>
Add a document to the indexing queue (synchronous, O(1), lock-free).
The document is moved into the channel (zero-copy); worker threads compete to pull it.
Returns Error::QueueFull when the queue is at capacity; the caller must back off.
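The QueueFull contract behaves like a non-blocking send on a bounded channel. Here is a minimal std-only sketch, in which a `SyncSender<String>` stands in for the writer's queue and `AddError` is a hypothetical error type. The retry loop shows one way a caller might back off; the exponential schedule capped at 64 ms is an arbitrary choice:

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq)]
pub enum AddError {
    QueueFull,
    Closed,
}

/// Non-blocking enqueue: the document is moved into the bounded channel.
pub fn add_document(tx: &SyncSender<String>, doc: String) -> Result<(), AddError> {
    tx.try_send(doc).map_err(|e| match e {
        TrySendError::Full(_) => AddError::QueueFull,
        TrySendError::Disconnected(_) => AddError::Closed,
    })
}

/// Caller-side backoff: sleep 1, 2, 4, ... ms (capped at 64 ms) while full.
/// The caller keeps `doc` borrowed so it can send a fresh copy on each attempt.
pub fn add_with_backoff(tx: &SyncSender<String>, doc: &str, max_attempts: u32) -> Result<(), AddError> {
    for attempt in 0..max_attempts {
        match add_document(tx, doc.to_string()) {
            Err(AddError::QueueFull) => thread::sleep(Duration::from_millis(1u64 << attempt.min(6))),
            other => return other,
        }
    }
    Err(AddError::QueueFull)
}
```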
pub fn add_documents(&self, documents: Vec<Document>) -> Result<usize>
Add multiple documents to the indexing queue.
Returns the number of documents successfully queued. Stops at the first
QueueFull and returns the count queued so far.
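Here is a minimal sketch of the partial-batch contract, again using a bounded std channel as a stand-in for the queue. Returning `Err` when the channel is closed is an assumption for this sketch; the text above specifies only the QueueFull behavior:

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};

/// Hypothetical error for a closed queue.
#[derive(Debug, PartialEq)]
pub struct Closed;

/// Enqueue documents in order; stop at the first full-queue rejection and
/// report how many were accepted.
pub fn add_documents(tx: &SyncSender<String>, docs: Vec<String>) -> Result<usize, Closed> {
    let mut queued = 0;
    for doc in docs {
        match tx.try_send(doc) {
            Ok(()) => queued += 1,
            // Queue full: report partial progress instead of failing.
            Err(TrySendError::Full(_)) => break,
            Err(TrySendError::Disconnected(_)) => return Err(Closed),
        }
    }
    Ok(queued)
}
```

The real method takes `Vec<Document>` by value. A caller that wants to resubmit the rest of a batch would therefore keep its own copy and resend from index `queued`.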
pub async fn maybe_merge(&self)
Check merge policy and spawn a background merge if needed.
pub async fn abort_merges(&self)
Abort all in-flight merge tasks without waiting for completion.
pub async fn wait_for_merging_thread(&self)
Wait for the in-flight background merge to complete (if any).
pub async fn wait_for_all_merges(&self)
Wait for all eligible merges to complete, including cascading merges.
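The merge policy is not described here. To illustrate what "cascading" means, the sketch below assumes a hypothetical tiered policy: merge `factor` segments whose document counts fall in the same power-of-`factor` tier. It keeps merging until no merge is eligible, which is the condition wait_for_all_merges waits for. Merges run inline here rather than as background tasks:

```rust
/// Tier of a segment: floor(log_factor(doc_count)).
fn tier(docs: u64, factor: u64) -> u32 {
    let (mut n, mut t) = (docs, 0);
    while n >= factor {
        n /= factor;
        t += 1;
    }
    t
}

/// Hypothetical policy: pick `factor` segments from the lowest tier that has enough.
fn find_merge(segments: &[u64], factor: u64) -> Option<Vec<usize>> {
    let max_tier = segments.iter().map(|&s| tier(s, factor)).max()?;
    for t in 0..=max_tier {
        let picks: Vec<usize> = (0..segments.len())
            .filter(|&i| tier(segments[i], factor) == t)
            .take(factor as usize)
            .collect();
        if picks.len() == factor as usize {
            return Some(picks);
        }
    }
    None
}

/// Merge until the policy finds nothing. Each merge can push a higher tier
/// over the threshold, which is what makes merges cascade.
/// Returns the number of merges performed.
pub fn merge_until_stable(segments: &mut Vec<u64>, factor: u64) -> usize {
    assert!(factor >= 2, "factor must be at least 2");
    let mut merges = 0;
    while let Some(picks) = find_merge(&segments[..], factor) {
        let merged: u64 = picks.iter().map(|&i| segments[i]).sum();
        // `picks` is ascending; remove from the back so indices stay valid.
        for &i in picks.iter().rev() {
            segments.remove(i);
        }
        segments.push(merged);
        merges += 1;
    }
    merges
}
```

Starting from 100 single-document segments with `factor = 10`, ten merges produce ten 10-document segments. Those ten then cascade into one final 100-document segment.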
pub fn tracker(&self) -> Arc<SegmentTracker>
Get the segment tracker for sharing with readers.
pub async fn acquire_snapshot(&self) -> SegmentSnapshot
Acquire a snapshot of current segments for reading.
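A common way to make snapshot segments "safe to read" is reference counting. A snapshot pins the segments it saw, and a segment retired by a merge is only deleted once no snapshot pins it. The sketch below assumes that design. `Tracker` and `Snapshot` are hypothetical stand-ins for SegmentTracker and SegmentSnapshot, string ids stand in for segments, and a `deleted` list stands in for removing files:

```rust
use std::collections::{HashMap, HashSet};
use std::sync::{Arc, Mutex};

#[derive(Default)]
struct State {
    live: Vec<String>,            // segments visible to new snapshots
    refs: HashMap<String, usize>, // pin counts held by open snapshots
    retired: HashSet<String>,     // replaced segments awaiting deletion
    deleted: Vec<String>,         // stands in for removing files
}

#[derive(Clone, Default)]
pub struct Tracker(Arc<Mutex<State>>);

pub struct Snapshot {
    pub segments: Vec<String>,
    tracker: Tracker,
}

impl Tracker {
    pub fn add_segment(&self, id: &str) {
        self.0.lock().unwrap().live.push(id.to_string());
    }

    /// Pin every currently live segment.
    pub fn acquire_snapshot(&self) -> Snapshot {
        let mut st = self.0.lock().unwrap();
        let segments = st.live.clone();
        for s in &segments {
            *st.refs.entry(s.clone()).or_insert(0) += 1;
        }
        Snapshot { segments, tracker: self.clone() }
    }

    /// Called when a merge replaces `id`: delete now if unpinned, else defer.
    pub fn retire(&self, id: &str) {
        let mut st = self.0.lock().unwrap();
        st.live.retain(|s| s != id);
        if st.refs.get(id).copied().unwrap_or(0) == 0 {
            st.deleted.push(id.to_string());
        } else {
            st.retired.insert(id.to_string());
        }
    }

    pub fn deleted(&self) -> Vec<String> {
        self.0.lock().unwrap().deleted.clone()
    }
}

impl Drop for Snapshot {
    /// Unpin; the last reader of a retired segment triggers its deletion.
    fn drop(&mut self) {
        let mut st = self.tracker.0.lock().unwrap();
        for s in &self.segments {
            let n = st.refs.get_mut(s).expect("pinned segment");
            *n -= 1;
            if *n == 0 {
                st.refs.remove(s);
                if st.retired.remove(s) {
                    st.deleted.push(s.clone());
                }
            }
        }
    }
}
```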
pub async fn cleanup_orphan_segments(&self) -> Result<usize>
Clean up orphan segment files not registered in metadata.
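Orphan cleanup reduces to a set difference between files on disk and segments registered in metadata. The sketch below assumes a hypothetical naming scheme in which segment files are named `seg_<id>.<ext>`; the crate's real file layout is not documented here. It only computes which files would be deleted:

```rust
use std::collections::HashSet;

/// Returns segment files whose id is not registered in metadata.
/// Files outside the hypothetical `seg_<id>.<ext>` scheme (metadata,
/// pk_bloom.bin, ...) are never returned.
pub fn find_orphans<'a>(files: &[&'a str], registered: &HashSet<&str>) -> Vec<&'a str> {
    files
        .iter()
        .copied()
        .filter(|name| match name.strip_prefix("seg_").and_then(|rest| rest.split_once('.')) {
            Some((id, _ext)) => !registered.contains(id),
            None => false,
        })
        .collect()
}
```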
pub async fn prepare_commit(&mut self) -> Result<PreparedCommit<'_, D>>
Prepare a commit: signal workers to flush, wait for them to finish, and collect the resulting segments.
All documents successfully queued via add_document before this call are guaranteed
to be written to segment files on disk. Segments are NOT yet registered
in metadata; call PreparedCommit::commit() for that.
Workers are NOT destroyed: they flush their builders and wait for
resume_workers() to give them a new channel.
add_document returns a Closed error until commit/abort resumes the workers.
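The flush/resume cycle can be modeled with std channels and a single worker thread. The names `Writer`, `worker_loop`, and `AddError` are hypothetical; strings stand in for documents and a `Vec<String>` for a segment. Dropping the document sender is the flush signal. A std `mpsc` receiver still yields every buffered message before reporting disconnection, so everything queued before the call reaches the builder. A control channel then hands the surviving worker its next document channel:

```rust
use std::sync::mpsc::{channel, sync_channel, Receiver, Sender, SyncSender, TrySendError};
use std::thread::{self, JoinHandle};

#[derive(Debug, PartialEq)]
pub enum AddError {
    QueueFull,
    Closed,
}

pub struct Writer {
    docs_tx: Option<SyncSender<String>>,          // None between prepare and resume
    control_tx: Option<Sender<Receiver<String>>>, // hands the worker a fresh channel
    segments_rx: Receiver<Vec<String>>,           // flushed "segments"
    worker: Option<JoinHandle<()>>,
    capacity: usize,
}

fn worker_loop(control: Receiver<Receiver<String>>, segments: Sender<Vec<String>>) {
    // The worker survives across commits, waiting here for each new channel.
    for docs in control {
        let mut builder = Vec::new();
        for doc in docs {
            builder.push(doc); // drains buffered docs even after the sender is gone
        }
        if segments.send(builder).is_err() {
            return;
        }
    }
}

impl Writer {
    pub fn new(capacity: usize) -> Self {
        let (control_tx, control_rx) = channel();
        let (seg_tx, segments_rx) = channel();
        let worker = thread::spawn(move || worker_loop(control_rx, seg_tx));
        let mut w = Writer { docs_tx: None, control_tx: Some(control_tx), segments_rx, worker: Some(worker), capacity };
        w.resume_workers();
        w
    }

    pub fn add_document(&self, doc: String) -> Result<(), AddError> {
        let tx = self.docs_tx.as_ref().ok_or(AddError::Closed)?;
        tx.try_send(doc).map_err(|e| match e {
            TrySendError::Full(_) => AddError::QueueFull,
            TrySendError::Disconnected(_) => AddError::Closed,
        })
    }

    /// Close the document channel, then wait for the flushed segment.
    /// Returns `None` if already prepared and not yet resumed.
    pub fn prepare_commit(&mut self) -> Option<Vec<String>> {
        let sender = self.docs_tx.take()?;
        drop(sender); // the flush signal
        self.segments_rx.recv().ok()
    }

    /// Give the worker a new channel (the role commit/abort plays).
    pub fn resume_workers(&mut self) {
        let (tx, rx) = sync_channel(self.capacity);
        self.control_tx.as_ref().expect("not shut down").send(rx).expect("worker alive");
        self.docs_tx = Some(tx);
    }
}

impl Drop for Writer {
    fn drop(&mut self) {
        self.docs_tx = None;
        self.control_tx = None; // ends the worker's outer loop
        if let Some(h) = self.worker.take() {
            let _ = h.join();
        }
    }
}
```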
pub async fn commit(&mut self) -> Result<bool>
Commit (convenience): runs prepare_commit() and PreparedCommit::commit() in one call.
Guarantees that every document accepted by add_document before this call is committed.
Vector training is decoupled from commit; call build_vector_index() manually.
pub async fn force_merge(&mut self) -> Result<()>
Force merge all segments into one.
pub async fn reorder(&mut self) -> Result<()>
Reorder all segments via Recursive Graph Bisection (BP) for better BMP pruning.
Each segment is individually rebuilt with record-level BP reordering: ordinals are shuffled across blocks so that similar content clusters tightly.
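The core of Recursive Graph Bisection can be sketched over documents represented as lists of term ids. This is a simplified, hypothetical version of the general technique, not this crate's record-level implementation, and block boundaries and BMP specifics are not modeled. Each level works as follows:
- It splits the current range in half.
- It estimates each document's gain from moving to the other half, using the cost `deg * log2(n / (deg + 1))` for a term with `deg` postings in a half of `n` documents.
- It swaps the best-scoring pairs while their combined gain is positive.
- It recurses into each half.

```rust
use std::collections::HashMap;

/// Approximate bits for a term with `deg` postings in a part of `n` documents.
fn cost(deg: f64, n: f64) -> f64 {
    deg * (n / (deg + 1.0)).log2()
}

/// Cost reduction from moving one document containing a term from one part to the other.
fn move_gain(deg_from: f64, n_from: f64, deg_to: f64, n_to: f64) -> f64 {
    let before = cost(deg_from, n_from) + cost(deg_to, n_to);
    let after = cost(deg_from - 1.0, n_from) + cost(deg_to + 1.0, n_to);
    before - after
}

fn degrees(docs: &[Vec<u32>], ids: &[usize]) -> HashMap<u32, f64> {
    let mut deg = HashMap::new();
    for &d in ids {
        for &t in &docs[d] {
            *deg.entry(t).or_insert(0.0) += 1.0;
        }
    }
    deg
}

fn bisect(docs: &[Vec<u32>], ids: &mut [usize], depth: usize, iters: usize) {
    if depth == 0 || ids.len() < 2 {
        return;
    }
    let mid = ids.len() / 2;
    for _ in 0..iters {
        let (left, right) = ids.split_at_mut(mid);
        let (n1, n2) = (left.len() as f64, right.len() as f64);
        let (deg1, deg2) = (degrees(docs, left), degrees(docs, right));
        let gain = |d: usize, from: &HashMap<u32, f64>, nf: f64, to: &HashMap<u32, f64>, nt: f64| -> f64 {
            docs[d].iter().map(|t| move_gain(from[t], nf, to.get(t).copied().unwrap_or(0.0), nt)).sum()
        };
        let mut lg: Vec<(f64, usize)> = left.iter().map(|&d| (gain(d, &deg1, n1, &deg2, n2), d)).collect();
        let mut rg: Vec<(f64, usize)> = right.iter().map(|&d| (gain(d, &deg2, n2, &deg1, n1), d)).collect();
        lg.sort_by(|a, b| b.0.total_cmp(&a.0)); // highest gain first
        rg.sort_by(|a, b| b.0.total_cmp(&a.0));
        let mut swapped = false;
        for (l, r) in lg.iter_mut().zip(rg.iter_mut()) {
            if l.0 + r.0 <= 0.0 {
                break;
            }
            std::mem::swap(&mut l.1, &mut r.1);
            swapped = true;
        }
        for (slot, &(_, d)) in left.iter_mut().zip(&lg) {
            *slot = d;
        }
        for (slot, &(_, d)) in right.iter_mut().zip(&rg) {
            *slot = d;
        }
        if !swapped {
            break;
        }
    }
    let (left, right) = ids.split_at_mut(mid);
    bisect(docs, left, depth - 1, iters);
    bisect(docs, right, depth - 1, iters);
}

/// New order: position i holds the old id of the document placed there.
pub fn bp_order(docs: &[Vec<u32>], depth: usize, iters: usize) -> Vec<usize> {
    let mut ids: Vec<usize> = (0..docs.len()).collect();
    bisect(docs, &mut ids, depth, iters);
    ids
}
```

With `depth = 1`, documents containing term 1 end up in one half and documents containing term 2 in the other. That clustering is what makes the resulting ordinals compress and prune better.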
pub fn segment_manager(&self) -> &Arc<SegmentManager<D>>
Get the segment manager (for background optimizer access).
Trait Implementations
impl<D: DirectoryWriter + 'static> Drop for IndexWriter<D>
Auto Trait Implementations
impl<D> !Freeze for IndexWriter<D>
impl<D> !RefUnwindSafe for IndexWriter<D>
impl<D> Send for IndexWriter<D>
impl<D> Sync for IndexWriter<D>
impl<D> Unpin for IndexWriter<D>
impl<D> UnsafeUnpin for IndexWriter<D>
impl<D> !UnwindSafe for IndexWriter<D>
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
impl<T> Pointable for T
impl<SS, SP> SupersetOf<SS> for SP
where
    SS: SubsetOf<SP>,
fn to_subset(&self) -> Option<SS>
The inverse inclusion map: attempts to construct self from the equivalent element of its superset.
fn is_in_subset(&self) -> bool
Checks if self is actually part of its subset T (and can be converted to it).
fn to_subset_unchecked(&self) -> SS
Use with care! Same as self.to_subset but without any property checks. Always succeeds.
fn from_subset(element: &SS) -> SP
The inclusion map: converts self to the equivalent element of its superset.