pub struct SegmentBuilder { /* private fields */ }
Segment builder with optimized memory usage
Features:
- Streams documents to disk immediately (no in-memory document storage)
- Uses string interning for terms (reduced allocations)
- Uses hashbrown's HashMap instead of BTreeMap (average O(1) lookups rather than O(log n))
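A minimal sketch of how term interning on a hash map can reduce allocations. `TermInterner` and its methods are hypothetical names, not this crate's API. The sketch uses std's `HashMap`, which is built on hashbrown internally, so that it needs no external crate. Each distinct term is allocated once as an `Arc<str>` that is shared by the lookup map and the id table, and `Arc` keeps the type `Send + Sync`.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical term interner (illustrative, not this crate's type).
/// Each distinct term is allocated once and shared by both tables.
#[derive(Default)]
pub struct TermInterner {
    ids: HashMap<Arc<str>, u32>, // std's HashMap is built on hashbrown
    terms: Vec<Arc<str>>,        // indexed by term id
}

impl TermInterner {
    /// Returns the id for `term`, allocating only the first time it is seen.
    pub fn intern(&mut self, term: &str) -> u32 {
        // Lookup by &str works because Arc<str>: Borrow<str>.
        if let Some(&id) = self.ids.get(term) {
            return id;
        }
        let id = self.terms.len() as u32;
        let shared: Arc<str> = Arc::from(term); // the single allocation
        self.terms.push(Arc::clone(&shared));
        self.ids.insert(shared, id);
        id
    }

    /// Looks up the text of a previously interned term.
    pub fn resolve(&self, id: u32) -> Option<&str> {
        self.terms.get(id as usize).map(|t| &**t)
    }

    /// Number of distinct terms interned so far.
    pub fn len(&self) -> usize {
        self.terms.len()
    }
}
```

Because postings can refer to a compact `u32` id instead of an owned `String`, a term that occurs thousands of times costs one allocation.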
Implementations
impl SegmentBuilder
pub fn new(schema: Schema, config: SegmentBuilderConfig) -> Result<Self>
Create a new segment builder
pub fn set_tokenizer(&mut self, field: Field, tokenizer: BoxedTokenizer)
pub fn num_docs(&self) -> u32
pub fn estimated_memory_bytes(&self) -> usize
Fast O(1) memory estimate, updated incrementally during indexing.
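A minimal sketch of an incrementally maintained estimate, assuming a hypothetical `PostingsBuffer` that stands in for the builder's real internals. Each append adds the element's size to a running counter, so reading the estimate never walks the data. Spare `Vec` capacity is not counted, which is the source of the drift that periodic recalibration corrects.

```rust
use std::mem::size_of;

/// Hypothetical postings buffer (illustrative, not the crate's real layout).
#[derive(Default)]
pub struct PostingsBuffer {
    postings: Vec<(u32, u32)>, // (doc_id, term_freq)
    estimated_bytes: usize,
}

impl PostingsBuffer {
    pub fn push(&mut self, doc_id: u32, term_freq: u32) {
        self.postings.push((doc_id, term_freq));
        // Add only the element's own size. Spare Vec capacity is ignored,
        // so this estimate can drift below the true footprint over time.
        self.estimated_bytes += size_of::<(u32, u32)>();
    }

    /// O(1): returns the running total without iterating over any data.
    pub fn estimated_memory_bytes(&self) -> usize {
        self.estimated_bytes
    }
}
```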
pub fn recalibrate_memory(&mut self)
Recalibrate the incremental memory estimate using a capacity-based calculation. This is more expensive than estimated_memory_bytes() (O(terms + dims) rather than O(1)), but it accounts for Vec capacity growth (doubling) and HashMap table overhead. Call it periodically (e.g. every 1000 docs) to prevent drift.
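A sketch of capacity-based recalibration combined with the periodic schedule suggested above. `MemoryTrackedBuilder`, `add_posting`, `finish_doc`, and `RECALIBRATE_EVERY` are hypothetical. The per-slot table cost (key + value + one control byte) is an assumed approximation of hashbrown's layout: hashbrown allocates somewhat more buckets than `capacity()` reports, so this still undercounts slightly.

```rust
use std::collections::HashMap;
use std::mem::size_of;

/// Hypothetical interval, matching the documented "every 1000 docs" example.
const RECALIBRATE_EVERY: u32 = 1000;

/// Hypothetical builder fragment: term id -> doc ids.
#[derive(Default)]
pub struct MemoryTrackedBuilder {
    postings: HashMap<u32, Vec<u32>>,
    estimated_bytes: usize,
    docs_since_recalibration: u32,
}

impl MemoryTrackedBuilder {
    /// Cheap incremental path: counts logical bytes only.
    pub fn add_posting(&mut self, term: u32, doc: u32) {
        self.postings.entry(term).or_default().push(doc);
        self.estimated_bytes += size_of::<u32>();
    }

    /// Call once per finished document; recalibrates every RECALIBRATE_EVERY docs.
    pub fn finish_doc(&mut self) {
        self.docs_since_recalibration += 1;
        if self.docs_since_recalibration >= RECALIBRATE_EVERY {
            self.recalibrate_memory();
            self.docs_since_recalibration = 0;
        }
    }

    /// O(terms): walks every postings list and uses capacity, not len.
    pub fn recalibrate_memory(&mut self) {
        // Approximate table cost: key + value + 1 control byte per slot.
        let slot = size_of::<u32>() + size_of::<Vec<u32>>() + 1;
        let table = self.postings.capacity() * slot;
        // Vec buffers grow by doubling, so capacity may be well above len.
        let lists: usize = self
            .postings
            .values()
            .map(|v| v.capacity() * size_of::<u32>())
            .sum();
        self.estimated_bytes = table + lists;
    }

    pub fn estimated_memory_bytes(&self) -> usize {
        self.estimated_bytes
    }
}
```

Replacing the running total outright, rather than adding a correction, keeps drift from accumulating between recalibrations.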
pub fn sparse_dim_count(&self) -> usize
Count total unique sparse dimensions across all fields
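A sketch of the count under two stated assumptions: sparse data is kept as one map per field from dimension index to postings (the hypothetical `SparseFields` type below), and "total" means the per-field unique counts are summed, so a dimension used by two fields counts twice. The cost is one `len()` call per field, not a scan of the postings.

```rust
use std::collections::HashMap;

/// Hypothetical layout (an assumption, not the crate's documented internals):
/// field id -> (dimension index -> postings of (doc_id, weight)).
#[derive(Default)]
pub struct SparseFields {
    fields: HashMap<u32, HashMap<u32, Vec<(u32, f32)>>>,
}

impl SparseFields {
    pub fn add(&mut self, field: u32, dim: u32, doc_id: u32, weight: f32) {
        self.fields
            .entry(field)
            .or_default()
            .entry(dim)
            .or_default()
            .push((doc_id, weight));
    }

    /// Sums the number of distinct dimensions in each field.
    pub fn sparse_dim_count(&self) -> usize {
        self.fields.values().map(|dims| dims.len()).sum()
    }
}
```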
pub fn stats(&self) -> SegmentBuilderStats
Get current statistics for performance debugging. This is expensive: it iterates over all buffered data.
pub fn add_document(&mut self, doc: Document) -> Result<DocId>
Add a document and return its DocId. The document is streamed to disk immediately rather than kept in memory.
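A minimal sketch of the streaming idea, assuming a hypothetical `DocStreamWriter` and an assumed on-disk format (a little-endian `u32` length prefix followed by the serialized document bytes). The builder keeps only a doc-id counter; once written, the bytes are owned by the writer. In practice `W` would be something like a `BufWriter<File>`; any `std::io::Write` works.

```rust
use std::io::{self, Write};

/// Hypothetical document store writer (illustrative format, not the crate's).
pub struct DocStreamWriter<W: Write> {
    out: W,
    next_doc_id: u32,
}

impl<W: Write> DocStreamWriter<W> {
    pub fn new(out: W) -> Self {
        Self { out, next_doc_id: 0 }
    }

    /// Writes one serialized document and returns its sequential doc id.
    pub fn add_document(&mut self, serialized: &[u8]) -> io::Result<u32> {
        let len = u32::try_from(serialized.len())
            .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "document too large"))?;
        self.out.write_all(&len.to_le_bytes())?; // length prefix
        self.out.write_all(serialized)?; // payload; no in-memory copy is kept
        let doc_id = self.next_doc_id;
        self.next_doc_id += 1;
        Ok(doc_id)
    }

    /// Returns the underlying writer (e.g. to flush or inspect it).
    pub fn into_inner(self) -> W {
        self.out
    }
}
```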
pub async fn build<D: Directory + DirectoryWriter>(
    self,
    dir: &D,
    segment_id: SegmentId,
) -> Result<SegmentMeta>
Build the final segment
Memory optimization: each phase consumes and drops its source data before the next phase begins, preventing accumulation of multiple large buffers.
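A simplified sketch of the phase-by-phase pattern. The real `build` is async and writes through a `Directory`; this one is synchronous, writes into a `Vec<u8>`, and uses hypothetical names (`PhasedBuilder`, `write_postings`, `write_vectors`). The builder is taken by value and destructured, and each phase takes ownership of its buffer, so the postings are freed when phase 1 returns, before phase 2 runs.

```rust
/// Hypothetical builder with two buffered sources (illustrative only).
pub struct PhasedBuilder {
    pub postings: Vec<u32>,
    pub vectors: Vec<f32>,
}

/// Bytes written by each phase.
#[derive(Debug, PartialEq)]
pub struct BuildSummary {
    pub postings_bytes: usize,
    pub vectors_bytes: usize,
}

/// Phase 1 owns the postings; they are dropped when this function returns.
fn write_postings(postings: Vec<u32>, out: &mut Vec<u8>) -> usize {
    let start = out.len();
    for p in &postings {
        out.extend_from_slice(&p.to_le_bytes());
    }
    out.len() - start
}

/// Phase 2 starts only after phase 1 has released its buffer.
fn write_vectors(vectors: Vec<f32>, out: &mut Vec<u8>) -> usize {
    let start = out.len();
    for v in &vectors {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out.len() - start
}

impl PhasedBuilder {
    /// Consumes the builder; destructuring moves each field into its phase.
    pub fn build(self, out: &mut Vec<u8>) -> BuildSummary {
        let PhasedBuilder { postings, vectors } = self;
        let postings_bytes = write_postings(postings, out);
        let vectors_bytes = write_vectors(vectors, out);
        BuildSummary { postings_bytes, vectors_bytes }
    }
}
```

Taking `self` by value is what makes this enforceable by the compiler: no phase can accidentally keep a reference to a buffer that an earlier phase has already consumed.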
Trait Implementations
Auto Trait Implementations
impl Freeze for SegmentBuilder
impl !RefUnwindSafe for SegmentBuilder
impl Send for SegmentBuilder
impl Sync for SegmentBuilder
impl Unpin for SegmentBuilder
impl !UnwindSafe for SegmentBuilder
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
impl<T> Pointable for T
impl<SS, SP> SupersetOf<SS> for SP
where
    SS: SubsetOf<SP>,
fn to_subset(&self) -> Option<SS>
The inverse inclusion map: attempts to construct self from the equivalent element of its superset.
fn is_in_subset(&self) -> bool
Checks if self is actually part of its subset T (and can be converted to it).
fn to_subset_unchecked(&self) -> SS
Use with care! Same as self.to_subset but without any property checks. Always succeeds.
fn from_subset(element: &SS) -> SP
The inclusion map: converts self to the equivalent element of its superset.