pub struct IndexWriter<D: DirectoryWriter + 'static> { /* private fields */ }
Async IndexWriter for adding documents and committing segments
Features:
- Queue-based parallel indexing with worker tasks
- Streams documents to disk immediately (no in-memory document storage)
- Uses string interning for terms (reduced allocations)
- Uses hashbrown HashMap (faster than BTreeMap)
Architecture:
- add_document() sends to per-worker unbounded channels (non-blocking)
- Round-robin distribution across workers - no mutex contention
- Each worker owns a SegmentBuilder and flushes when memory threshold is reached
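The dispatch pattern described above can be sketched with the standard library alone. This is a simplified model, not this crate's implementation: `Dispatcher`, `distribute`, and the use of `String` as a document are hypothetical stand-ins, and threads with `std::sync::mpsc` channels replace the async worker tasks. Each worker here only counts what it receives, where a real worker would feed its own SegmentBuilder.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for the writer's dispatch side.
pub struct Dispatcher {
    senders: Vec<Sender<String>>,
    next: AtomicUsize,
}

impl Dispatcher {
    /// Spawns `workers` threads, each owning the receiving end of its own
    /// unbounded channel. A worker returns the number of documents it
    /// received once its channel is closed.
    pub fn spawn(workers: usize) -> (Self, Vec<JoinHandle<usize>>) {
        let mut senders = Vec::with_capacity(workers);
        let mut handles = Vec::with_capacity(workers);
        for _ in 0..workers {
            let (tx, rx) = channel::<String>();
            senders.push(tx);
            handles.push(thread::spawn(move || rx.into_iter().count()));
        }
        (Self { senders, next: AtomicUsize::new(0) }, handles)
    }

    /// Picks the next worker with an atomic counter, so no mutex is taken.
    /// Returns the index of the worker that received the document.
    pub fn add_document(&self, doc: String) -> usize {
        let i = self.next.fetch_add(1, Ordering::Relaxed) % self.senders.len();
        self.senders[i].send(doc).expect("worker channel closed");
        i
    }
}

/// Sends `docs` documents to `workers` workers and returns per-worker counts.
pub fn distribute(workers: usize, docs: usize) -> Vec<usize> {
    let (dispatcher, handles) = Dispatcher::spawn(workers);
    for n in 0..docs {
        dispatcher.add_document(format!("doc-{n}"));
    }
    drop(dispatcher); // dropping the senders closes the channels, ending each worker
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Because the only shared state on the send path is one atomic counter, callers on different threads never wait on each other to enqueue a document.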
State management:
- Building segments: Managed here (pending_builds)
- Committed segments + metadata: Managed by SegmentManager (sole owner of metadata.json)
Implementations§
impl<D: DirectoryWriter + 'static> IndexWriter<D>
pub async fn build_vector_index(&self) -> Result<()>
Train vector index from accumulated Flat vectors (manual, not auto-triggered).
- Acquires a snapshot (segments safe to read)
- Collects vectors for training
- Trains centroids/codebooks
- Updates metadata (marks fields as Built)
- Publishes to ArcSwap — merges will use these automatically
Existing flat segments get ANN during normal merges. No rebuild needed.
pub async fn rebuild_vector_index(&self) -> Result<()>
Rebuild vector index by retraining centroids/codebooks.
Resets Built state to Flat, clears trained structures, then trains fresh.
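The Flat/Built lifecycle of build_vector_index() and rebuild_vector_index() can be illustrated with a small model. Everything below is an assumption made for illustration: `VectorIndexModel`, `FieldState`, and `Trained` are hypothetical names, "training" is reduced to averaging the vectors, and an `RwLock<Option<Arc<_>>>` stands in for the ArcSwap slot mentioned above.

```rust
use std::sync::{Arc, RwLock};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FieldState {
    Flat,
    Built,
}

/// Hypothetical trained structure: a single centroid.
#[derive(Debug, PartialEq)]
pub struct Trained {
    pub centroid: Vec<f32>,
}

pub struct VectorIndexModel {
    state: FieldState,
    // Stand-in for the ArcSwap slot: readers clone the Arc, writers replace it.
    published: RwLock<Option<Arc<Trained>>>,
}

impl VectorIndexModel {
    pub fn new() -> Self {
        Self { state: FieldState::Flat, published: RwLock::new(None) }
    }

    /// Toy training step: the per-dimension mean of the vectors.
    /// Assumes `vectors` is non-empty and all vectors share one dimension.
    fn train(vectors: &[Vec<f32>]) -> Trained {
        let mut centroid = vec![0.0f32; vectors[0].len()];
        for v in vectors {
            for (c, x) in centroid.iter_mut().zip(v) {
                *c += *x;
            }
        }
        let n = vectors.len() as f32;
        for c in &mut centroid {
            *c /= n;
        }
        Trained { centroid }
    }

    /// Train, mark the field as Built, then publish the result.
    pub fn build(&mut self, vectors: &[Vec<f32>]) {
        let trained = Arc::new(Self::train(vectors));
        self.state = FieldState::Built;
        *self.published.write().unwrap() = Some(trained);
    }

    /// Reset to Flat, clear the trained structures, then train fresh.
    pub fn rebuild(&mut self, vectors: &[Vec<f32>]) {
        self.state = FieldState::Flat;
        *self.published.write().unwrap() = None;
        self.build(vectors);
    }

    pub fn state(&self) -> FieldState {
        self.state
    }

    /// What a merge would pick up: the currently published structures, if any.
    pub fn current(&self) -> Option<Arc<Trained>> {
        self.published.read().unwrap().clone()
    }
}
```

A reader that cloned the `Arc` before a rebuild keeps using the old structures until it drops them, which is the property the publish step relies on.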
impl<D: DirectoryWriter + 'static> IndexWriter<D>
pub async fn create(directory: D, schema: Schema, config: IndexConfig) -> Result<Self>
Create a new index in the directory
pub async fn create_with_config(directory: D, schema: Schema, config: IndexConfig, builder_config: SegmentBuilderConfig) -> Result<Self>
Create a new index with custom builder config
pub async fn open(directory: D, config: IndexConfig) -> Result<Self>
Open an existing index for writing
pub async fn open_with_config(directory: D, config: IndexConfig, builder_config: SegmentBuilderConfig) -> Result<Self>
Open an existing index with custom builder config
pub fn from_index(index: &Index<D>) -> Self
Create an IndexWriter from an existing Index
This shares the SegmentManager with the Index, ensuring consistent segment lifecycle management.
pub fn set_tokenizer<T: Tokenizer>(&mut self, field: Field, tokenizer: T)
Set tokenizer for a field
pub fn add_document(&self, doc: Document) -> Result<DocId>
Add a document to the indexing queue
Documents are sent to per-worker unbounded channels. This is O(1) and never blocks - returns immediately. Workers handle the actual indexing in parallel.
pub fn add_documents(&self, documents: Vec<Document>) -> Result<usize>
Add multiple documents to the indexing queue
Documents are distributed round-robin to workers. Returns immediately - never blocks.
pub fn pending_build_count(&self) -> usize
Get the number of pending background builds
pub async fn maybe_merge(&self)
Check merge policy and spawn background merges if needed
This is called automatically after commit via SegmentManager. Can also be called manually to trigger merge checking.
pub async fn wait_for_merges(&self)
Wait for all pending merges to complete
pub fn tracker(&self) -> Arc<SegmentTracker>
Get the segment tracker for sharing with readers
This allows readers to acquire snapshots that prevent segment deletion.
pub async fn acquire_snapshot(&self) -> SegmentSnapshot
Acquire a snapshot of current segments for reading
The snapshot holds references; segments won't be deleted while the snapshot exists.
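One way to get the "no deletion while a snapshot exists" guarantee is per-segment reference counting with deferred deletion. The sketch below models that idea with the standard library only. `Tracker`, `Snapshot`, and all of their methods are hypothetical and are not the SegmentTracker or SegmentSnapshot API; the real types may work differently.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::{Arc, Mutex};

#[derive(Default)]
struct TrackerState {
    refs: HashMap<String, usize>,    // segment id -> live snapshot references
    pending_delete: HashSet<String>, // segments waiting for readers to finish
    deleted: Vec<String>,            // segments actually removed, kept for inspection
}

/// Hypothetical tracker shared between a writer and its readers.
#[derive(Clone, Default)]
pub struct Tracker {
    state: Arc<Mutex<TrackerState>>,
}

/// Holds one reference on every segment it lists until it is dropped.
pub struct Snapshot {
    tracker: Tracker,
    segments: Vec<String>,
}

impl Tracker {
    pub fn register(&self, id: &str) {
        self.state.lock().unwrap().refs.entry(id.to_string()).or_insert(0);
    }

    pub fn acquire_snapshot(&self) -> Snapshot {
        let mut guard = self.state.lock().unwrap();
        let st = &mut *guard;
        // Segments already scheduled for deletion are not handed to new readers.
        let mut segments: Vec<String> = st
            .refs
            .keys()
            .filter(|id| !st.pending_delete.contains(*id))
            .cloned()
            .collect();
        segments.sort();
        for id in &segments {
            *st.refs.get_mut(id).unwrap() += 1;
        }
        Snapshot { tracker: self.clone(), segments }
    }

    /// Deletes immediately if unreferenced, otherwise defers the deletion.
    pub fn mark_for_deletion(&self, id: &str) {
        let mut guard = self.state.lock().unwrap();
        let st = &mut *guard;
        match st.refs.get(id).copied() {
            Some(0) => {
                st.refs.remove(id);
                st.deleted.push(id.to_string());
            }
            Some(_) => {
                st.pending_delete.insert(id.to_string());
            }
            None => {}
        }
    }

    pub fn deleted(&self) -> Vec<String> {
        self.state.lock().unwrap().deleted.clone()
    }
}

impl Snapshot {
    pub fn segments(&self) -> &[String] {
        &self.segments
    }
}

impl Drop for Snapshot {
    fn drop(&mut self) {
        let mut guard = self.tracker.state.lock().unwrap();
        let st = &mut *guard;
        for id in &self.segments {
            let now_unused = match st.refs.get_mut(id) {
                Some(count) => {
                    *count -= 1;
                    *count == 0
                }
                None => false,
            };
            // The last reader to leave performs the deferred deletion.
            if now_unused && st.pending_delete.remove(id) {
                st.refs.remove(id);
                st.deleted.push(id.clone());
            }
        }
    }
}
```

Tying the release to `Drop` means a reader cannot forget to give its references back, even on an early return.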
pub async fn cleanup_orphan_segments(&self) -> Result<usize>
Clean up orphan segment files that are not registered
This can happen if the process halts after segment files are written but before they are registered in segments.json. Call this after opening an index to reclaim disk space from incomplete operations.
Returns the number of orphan segments deleted.
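The core of orphan cleanup is a set difference between what is on disk and what is registered. A minimal sketch, assuming segments can be identified by string ids and modelling the directory as an in-memory set; `cleanup_orphans` is a hypothetical helper, not the method documented here:

```rust
use std::collections::HashSet;

/// Removes every segment id from `on_disk` that is not in `registered`
/// and returns how many were removed.
pub fn cleanup_orphans(on_disk: &mut HashSet<String>, registered: &HashSet<String>) -> usize {
    let before = on_disk.len();
    on_disk.retain(|id| registered.contains(id));
    before - on_disk.len()
}
```

Running the cleanup a second time with the same inputs removes nothing, so it is safe to call on every open.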
pub async fn flush(&self) -> Result<()>
Flush all workers - serializes in-memory data to segment files on disk
Sends flush signals to all workers, waits for them to acknowledge, then awaits ALL in-flight build JoinHandles.
Completed segments are accumulated in flushed_segments but NOT registered in metadata - only commit() does that.
Workers continue running and can accept new documents after flush.
pub async fn commit(&self) -> Result<()>
Commit all pending segments to metadata.
Tantivy-style: flush → atomic commit → merge evaluation. Nothing else.
Vector training is decoupled — call build_vector_index() manually.
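The split between flush() and commit() can be pictured as two lists: segments written to disk, and segments registered in metadata. The model below is a synchronous, single-threaded sketch of that split. `WriterModel`, its fields, and the `seg_N` naming are hypothetical, and merge evaluation is left as a comment.

```rust
/// Hypothetical, simplified model of the flush/commit split.
#[derive(Default)]
pub struct WriterModel {
    in_memory: Vec<String>,        // documents still buffered by workers
    flushed_segments: Vec<String>, // segment files on disk, not yet registered
    committed: Vec<String>,        // stand-in for the registered metadata
    next_segment: usize,
}

impl WriterModel {
    pub fn add_document(&mut self, doc: &str) {
        self.in_memory.push(doc.to_string());
    }

    /// Writes buffered documents out as one segment. Nothing is registered.
    pub fn flush(&mut self) {
        if self.in_memory.is_empty() {
            return;
        }
        let id = format!("seg_{}", self.next_segment);
        self.next_segment += 1;
        self.in_memory.clear();
        self.flushed_segments.push(id);
    }

    /// flush, then register every flushed segment in one step.
    pub fn commit(&mut self) {
        self.flush();
        self.committed.append(&mut self.flushed_segments);
        // Merge evaluation would run here in the real writer.
    }

    /// Segments a reader could see.
    pub fn visible_segments(&self) -> &[String] {
        &self.committed
    }

    pub fn flushed_count(&self) -> usize {
        self.flushed_segments.len()
    }
}
```

In this model a flush alone never changes what readers see; only commit() moves segments into the visible list.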
pub async fn force_merge(&self) -> Result<()>
Force merge all segments into one. Flushes + commits pending docs, then merges. Does NOT trigger background merges — force_merge handles everything itself.
Auto Trait Implementations§
impl<D> !Freeze for IndexWriter<D>
impl<D> !RefUnwindSafe for IndexWriter<D>
impl<D> Send for IndexWriter<D>
impl<D> Sync for IndexWriter<D>
impl<D> Unpin for IndexWriter<D>
impl<D> !UnwindSafe for IndexWriter<D>
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
impl<T> Pointable for T
impl<SS, SP> SupersetOf<SS> for SP
where
    SS: SubsetOf<SP>,
fn to_subset(&self) -> Option<SS>
The inverse inclusion map: attempts to construct self from the equivalent element of its superset.
fn is_in_subset(&self) -> bool
Checks if self is actually part of its subset T (and can be converted to it).
fn to_subset_unchecked(&self) -> SS
Use with care! Same as self.to_subset but without any property checks. Always succeeds.
fn from_subset(element: &SS) -> SP
The inclusion map: converts self to the equivalent element of its superset.