Struct SegmentBuilder 

pub struct SegmentBuilder { /* private fields */ }

Segment builder with optimized memory usage

Features:

  • Streams documents to disk immediately (no in-memory document storage)
  • Uses string interning for terms (reduced allocations)
  • Uses hashbrown HashMap (faster than BTreeMap)
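The string interning listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not SegmentBuilder's actual code. The TermInterner type and its methods are hypothetical, and the sketch uses std's HashMap (which is itself built on hashbrown) so it needs no external crates. A repeated term costs only a hash lookup. Each distinct term is allocated once and shared between the lookup table and the id-to-term list through Rc<str>.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical term interner: each distinct term is allocated once and
/// afterwards referred to by a compact u32 id.
#[derive(Default)]
pub struct TermInterner {
    ids: HashMap<Rc<str>, u32>, // term -> id (std HashMap is backed by hashbrown)
    terms: Vec<Rc<str>>,        // id -> term, sharing the same string allocation
}

impl TermInterner {
    /// Returns the id for `term`, allocating a string only the first time it is seen.
    pub fn intern(&mut self, term: &str) -> u32 {
        if let Some(&id) = self.ids.get(term) {
            return id; // already interned: lookup only, no string allocation
        }
        let id = self.terms.len() as u32;
        let shared: Rc<str> = Rc::from(term); // the one string allocation for this term
        self.terms.push(Rc::clone(&shared)); // cloning an Rc only bumps a refcount
        self.ids.insert(shared, id);
        id
    }

    /// Maps an id back to its term text.
    pub fn resolve(&self, id: u32) -> Option<&str> {
        self.terms.get(id as usize).map(|t| &**t)
    }

    /// Number of distinct terms seen so far.
    pub fn num_terms(&self) -> usize {
        self.terms.len()
    }
}
```

Postings can then store the u32 id instead of an owned String per occurrence.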

Implementations

impl SegmentBuilder

pub fn new(schema: Schema, config: SegmentBuilderConfig) -> Result<Self>

Create a new segment builder

pub fn set_tokenizer(&mut self, field: Field, tokenizer: BoxedTokenizer)

Set the tokenizer used for the given field

pub fn num_docs(&self) -> u32

Number of documents added so far

pub fn estimated_memory_bytes(&self) -> usize

Fast O(1) estimate of memory usage in bytes, updated incrementally during indexing
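Below is a minimal sketch of how an O(1) running estimate can work, assuming a hypothetical PostingBuffer type that is not part of this crate. Each push adds the size of the appended data to a counter, and reading the estimate just returns that counter. The counter follows len() rather than capacity(), so it can undercount once Vecs grow. That is the drift that recalibrate_memory is described as correcting.

```rust
use std::mem::size_of;

/// Hypothetical buffer that keeps a running byte estimate alongside its data.
#[derive(Default)]
pub struct PostingBuffer {
    doc_ids: Vec<u32>,
    term_bytes: Vec<u8>,
    estimated_bytes: usize,
}

impl PostingBuffer {
    pub fn push_posting(&mut self, doc_id: u32) {
        self.doc_ids.push(doc_id);
        // Count only the element actually appended; spare Vec capacity is ignored.
        self.estimated_bytes += size_of::<u32>();
    }

    pub fn push_term(&mut self, term: &str) {
        self.term_bytes.extend_from_slice(term.as_bytes());
        self.estimated_bytes += term.len();
    }

    /// O(1): returns the counter without walking any data.
    pub fn estimated_memory_bytes(&self) -> usize {
        self.estimated_bytes
    }
}
```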

pub fn recalibrate_memory(&mut self)

Recalibrate the incremental memory estimate using a capacity-based calculation. This is more expensive than estimated_memory_bytes() (O(terms + dims) versus O(1)), but it accounts for Vec capacity growth (doubling) and HashMap table overhead. Call it periodically (e.g. every 1000 docs) to keep the incremental estimate from drifting.
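The capacity-based recalibration described above might look roughly like the sketch below, under several assumptions. The Buffers type, its fields, and RECALIBRATE_EVERY are hypothetical. HashMap overhead is approximated as capacity() times the entry size, which ignores the table's control bytes.

```rust
use std::collections::HashMap;
use std::mem::size_of;

/// Hypothetical: recalibrate after this many documents (mirrors "every 1000 docs").
const RECALIBRATE_EVERY: usize = 1000;

/// Hypothetical builder state: per-term posting lists plus a cheap running estimate.
#[derive(Default)]
pub struct Buffers {
    postings: HashMap<u32, Vec<u32>>, // term id -> doc ids
    estimated_bytes: usize,           // len-based, O(1) to update
    docs_since_recalibration: usize,
}

impl Buffers {
    pub fn add(&mut self, term: u32, doc: u32) {
        self.postings.entry(term).or_default().push(doc);
        self.estimated_bytes += size_of::<u32>(); // cheap incremental update
    }

    /// Call once per finished document; recalibrates every RECALIBRATE_EVERY docs.
    pub fn finish_doc(&mut self) {
        self.docs_since_recalibration += 1;
        if self.docs_since_recalibration >= RECALIBRATE_EVERY {
            self.recalibrate_memory();
        }
    }

    /// O(terms): walks every posting list and uses capacity(), not len(),
    /// so the spare room left by Vec doubling is included.
    pub fn recalibrate_memory(&mut self) {
        // Approximate table overhead: slots times entry size (control bytes ignored).
        let table = self.postings.capacity() * size_of::<(u32, Vec<u32>)>();
        let lists: usize = self
            .postings
            .values()
            .map(|v| v.capacity() * size_of::<u32>())
            .sum();
        self.estimated_bytes = table + lists;
        self.docs_since_recalibration = 0;
    }

    pub fn estimated_memory_bytes(&self) -> usize {
        self.estimated_bytes
    }
}
```

The O(terms) walk is why this runs every N documents rather than on every insert.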

pub fn sparse_dim_count(&self) -> usize

Count the total number of unique sparse dimensions across all fields
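The sketch below shows one plausible reading of this count, using a hypothetical SparseDims type. Each field keeps a set of the dimension ids it has seen, and the total is the sum of the per-field set sizes. The text does not say whether the real method deduplicates a dimension shared by two fields. This sketch assumes it does not.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical: field id -> set of sparse dimension ids seen in that field.
#[derive(Default)]
pub struct SparseDims {
    by_field: HashMap<u32, HashSet<u32>>,
}

impl SparseDims {
    /// Records the dimensions of one sparse vector (dim, weight) for `field`.
    pub fn add_vector(&mut self, field: u32, entries: &[(u32, f32)]) {
        let dims = self.by_field.entry(field).or_default();
        for &(dim, _weight) in entries {
            dims.insert(dim); // a HashSet keeps each dimension once per field
        }
    }

    /// Sum of unique dimensions per field (a dim used by two fields counts twice).
    pub fn sparse_dim_count(&self) -> usize {
        self.by_field.values().map(|dims| dims.len()).sum()
    }
}
```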

pub fn stats(&self) -> SegmentBuilderStats

Get current statistics for debugging performance. This is expensive because it iterates over all data.

pub fn add_document(&mut self, doc: Document) -> Result<DocId>

Add a document and return its DocId. The document is streamed to disk immediately rather than stored in memory.
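The "stream to disk immediately" pattern is sketched below. It assumes documents are already serialized to bytes and are written with a simple little-endian length prefix. The DocStreamer type and the record layout are hypothetical, not this crate's on-disk format. Only a document counter stays in memory. In practice the sink would be something like a BufWriter<File>, but any std::io::Write (including a Vec<u8>) works.

```rust
use std::io::{self, Write};

pub type DocId = u32;

/// Hypothetical writer that pushes each document to `sink` as soon as it is added.
pub struct DocStreamer<W: Write> {
    sink: W,
    next_doc: DocId,
}

impl<W: Write> DocStreamer<W> {
    pub fn new(sink: W) -> Self {
        Self { sink, next_doc: 0 }
    }

    /// Writes a u32 length prefix plus the document bytes, then returns the new id.
    pub fn add_document(&mut self, doc: &[u8]) -> io::Result<DocId> {
        let len = u32::try_from(doc.len())
            .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "document too large"))?;
        self.sink.write_all(&len.to_le_bytes())?;
        self.sink.write_all(doc)?;
        let id = self.next_doc;
        self.next_doc += 1; // the only per-document state kept in memory
        Ok(id)
    }

    /// Returns the underlying sink (e.g. to flush or inspect it).
    pub fn into_inner(self) -> W {
        self.sink
    }
}
```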

pub async fn build<D: Directory + DirectoryWriter>(self, dir: &D, segment_id: SegmentId) -> Result<SegmentMeta>

Build the final segment

Streams all data directly to disk via StreamingWriter to avoid buffering entire serialized outputs in memory. Each phase consumes and drops its source data before the next phase begins.
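The consume-and-drop phasing can be illustrated with plain ownership. This is a synchronous sketch under assumptions. BuildInput, write_postings, write_norms, and the byte layout are hypothetical, and std::io::Write stands in for the crate's StreamingWriter and Directory types. Each phase takes its data by value, so that data is freed when the phase returns, before the next phase starts.

```rust
use std::collections::BTreeMap;
use std::io::{self, Write};

/// Hypothetical in-memory state left over after indexing.
pub struct BuildInput {
    pub postings: BTreeMap<String, Vec<u32>>, // term -> doc ids
    pub field_norms: Vec<u8>,
}

/// Phase 1: takes the postings by value and streams them out entry by entry.
fn write_postings<W: Write>(postings: BTreeMap<String, Vec<u32>>, out: &mut W) -> io::Result<()> {
    // Consuming iteration: each entry's String and Vec are dropped once written.
    for (term, docs) in postings {
        out.write_all(&(term.len() as u32).to_le_bytes())?;
        out.write_all(term.as_bytes())?;
        out.write_all(&(docs.len() as u32).to_le_bytes())?;
        for doc in docs {
            out.write_all(&doc.to_le_bytes())?;
        }
    }
    Ok(())
}

/// Phase 2: the norms buffer is moved in and dropped when this returns.
fn write_norms<W: Write>(norms: Vec<u8>, out: &mut W) -> io::Result<()> {
    out.write_all(&norms)
}

/// Runs the phases in order; destructuring splits ownership so each phase
/// consumes only its own part of the input.
pub fn build<W: Write>(input: BuildInput, postings_out: &mut W, norms_out: &mut W) -> io::Result<()> {
    let BuildInput { postings, field_norms } = input;
    write_postings(postings, postings_out)?; // postings are freed before phase 2 starts
    write_norms(field_norms, norms_out)?;
    Ok(())
}
```

Passing data by value (rather than &) is what lets the compiler guarantee a phase's input is gone before the next phase allocates its own buffers.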

Trait Implementations

impl Drop for SegmentBuilder

fn drop(&mut self)

Executes the destructor for this type.

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> Pointable for T

const ALIGN: usize

The alignment of the pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a value with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T> Same for T

type Output = T

Should always be Self

impl<SS, SP> SupersetOf<SS> for SP
where SS: SubsetOf<SP>,

fn to_subset(&self) -> Option<SS>

The inverse inclusion map: attempts to construct self from the equivalent element of its superset.

fn is_in_subset(&self) -> bool

Checks if self is actually part of its subset T (and can be converted to it).

fn to_subset_unchecked(&self) -> SS

Use with care! Same as self.to_subset but without any property checks. Always succeeds.

fn from_subset(element: &SS) -> SP

The inclusion map: converts self to the equivalent element of its superset.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V