pub struct MemTableWriter { /* private fields */ }
In-memory write buffer that batches small inserts before persisting.
Problem: streaming pipelines (Flink, Spark Streaming) emit small
RecordBatches every few seconds. Calling write_batch_deferred on each
micro-batch creates many tiny Parquet files and triggers repeated HNSW
builds, both of which are expensive.
Solution: buffer rows in RAM and flush to a single Parquet shard only when the buffer reaches the configured size, row, or time threshold. The deferred HNSW build then runs once per flush, not once per micro-batch.
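The buffering idea can be illustrated with a toy model. This is a minimal sketch, not the crate's implementation: `ToyThresholds`, `ToyBuffer`, and `count_flushes` are hypothetical names, the time threshold is left out, and treating "reaches the threshold" as `>=` is an assumption.

```rust
// Toy model of threshold-based buffering. Every name here is hypothetical;
// none of it is taken from the crate.
#[derive(Debug, Clone, Copy)]
struct ToyThresholds {
    max_rows: usize,
    max_bytes: usize,
}

#[derive(Debug)]
struct ToyBuffer {
    limits: ToyThresholds,
    rows: Vec<Vec<f32>>, // buffered embeddings stand in for full rows
    bytes: usize,        // running size of the buffered f32 payload
    flushes: usize,      // how many "shards" have been written so far
}

impl ToyBuffer {
    fn new(limits: ToyThresholds) -> Self {
        Self { limits, rows: Vec::new(), bytes: 0, flushes: 0 }
    }

    /// Buffers one micro-batch and returns true if it triggered a flush.
    fn insert(&mut self, batch: &[Vec<f32>]) -> bool {
        for row in batch {
            self.bytes += row.len() * std::mem::size_of::<f32>();
            self.rows.push(row.clone());
        }
        // Assumption: a threshold counts as reached at `>=`.
        if self.rows.len() >= self.limits.max_rows || self.bytes >= self.limits.max_bytes {
            self.flush();
            return true;
        }
        false
    }

    /// Stands in for "write one shard and build the index once".
    fn flush(&mut self) {
        if self.rows.is_empty() {
            return; // nothing buffered, nothing written
        }
        self.rows.clear();
        self.bytes = 0;
        self.flushes += 1;
    }
}

/// Feeds `batches` micro-batches of `rows_per_batch` rows and reports the flush count.
fn count_flushes(batches: usize, rows_per_batch: usize, limits: ToyThresholds) -> usize {
    let mut buf = ToyBuffer::new(limits);
    let batch = vec![vec![0.0_f32; 4]; rows_per_batch];
    for _ in 0..batches {
        buf.insert(&batch);
    }
    buf.flush(); // final flush for any remainder
    buf.flushes
}
```

With a 250-row threshold, 100 micro-batches of 10 rows each produce 4 flushes in this model, where writing every micro-batch separately would produce 100.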
§Usage
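let mut mt = MemTableWriter::new(catalog, store, policy, table, MemTableConfig::default());
loop {
    // `batch` and `embeddings` come from the streaming source; break out when it ends.
    mt.insert(&batch, &embeddings).await.unwrap(); // may auto-flush on the size or row threshold
    mt.flush_if_due().await.unwrap();              // flushes if flush_interval has elapsed
}
// Final flush persists whatever is still buffered.
mt.flush().await.unwrap();
Implementations§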
impl MemTableWriter
pub fn new(
    catalog: Arc<dyn CatalogProvider>,
    store: Arc<dyn Store>,
    policy: VectorStoragePolicy,
    table: TableIdent,
    config: MemTableConfig,
) -> Self
pub async fn insert(
    &mut self,
    batch: &RecordBatch,
    embeddings: &[Vec<f32>],
) -> AilakeResult<Option<SnapshotId>>
Buffer a micro-batch. Flushes automatically if the size or row threshold is exceeded.
Returns Some(snapshot_id) when an automatic flush occurred, None otherwise.
pub async fn flush_if_due(&mut self) -> AilakeResult<Option<SnapshotId>>
Flush if flush_interval has elapsed since the last flush.
Returns Some(snapshot_id) when a flush occurred, None otherwise.
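The time-based check can be pictured as a comparison against the instant of the last flush. The sketch below is an assumption about the general shape, not the crate's code: `IntervalGate` is a hypothetical helper, a plain `u64` stands in for `SnapshotId`, and whether the real method also skips empty buffers is not modeled.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper that remembers when the last flush happened.
struct IntervalGate {
    flush_interval: Duration,
    last_flush: Instant,
}

impl IntervalGate {
    fn new(flush_interval: Duration, now: Instant) -> Self {
        Self { flush_interval, last_flush: now }
    }

    /// True when at least `flush_interval` has passed since the last flush.
    fn is_due(&self, now: Instant) -> bool {
        now.saturating_duration_since(self.last_flush) >= self.flush_interval
    }

    /// Mirrors the Option-returning shape: Some(id) if a flush ran, None otherwise.
    /// `next_id` is a stand-in for the snapshot id a real commit would produce.
    fn flush_if_due(&mut self, now: Instant, next_id: u64) -> Option<u64> {
        if self.is_due(now) {
            self.last_flush = now; // restart the interval from this flush
            Some(next_id)
        } else {
            None
        }
    }
}
```

Passing `now` in explicitly keeps the check deterministic under test. A caller in production would pass `Instant::now()`.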
pub async fn flush(&mut self) -> AilakeResult<SnapshotId>
Flush all buffered data immediately, even if thresholds are not met.
Returns the committed SnapshotId. Calling on an empty buffer is a no-op
and returns SnapshotId::default().
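Because an empty flush returns a default id rather than an error or `None`, a caller that needs to know whether a commit actually happened may want to check `buffered_rows()` first. The sketch below shows that guard against a hypothetical, synchronous stand-in: `ToyWriter`, `ToySnapshotId`, and `flush_if_nonempty` are illustrative names, not part of the crate.

```rust
/// Hypothetical stand-in for a snapshot identifier.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
struct ToySnapshotId(u64);

/// Hypothetical, synchronous stand-in for the writer.
struct ToyWriter {
    buffered_rows: usize,
    next_id: u64,
}

impl ToyWriter {
    fn buffered_rows(&self) -> usize {
        self.buffered_rows
    }

    /// Follows the documented contract: an empty buffer is a no-op
    /// that returns the default id.
    fn flush(&mut self) -> ToySnapshotId {
        if self.buffered_rows == 0 {
            return ToySnapshotId::default();
        }
        self.buffered_rows = 0;
        self.next_id += 1;
        ToySnapshotId(self.next_id)
    }
}

/// Returns Some(id) only when a commit actually happened.
fn flush_if_nonempty(w: &mut ToyWriter) -> Option<ToySnapshotId> {
    if w.buffered_rows() == 0 {
        None
    } else {
        Some(w.flush())
    }
}
```

The wrapper turns the "default id means nothing was written" convention into an explicit `Option`, which is harder to misread at the call site.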
pub fn buffered_rows(&self) -> usize
Number of rows currently in the buffer.
pub fn buffered_bytes(&self) -> usize
Estimated byte size of the embedding data currently in the buffer.
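For a sense of scale, the embedding payload can be approximated as rows × dimension × 4 bytes, since an `f32` occupies 4 bytes. The function below is a hypothetical illustration of that arithmetic. How the crate actually computes its estimate is not stated here.

```rust
/// Rough size in bytes of the f32 payload in a batch of embeddings.
/// Ignores Vec headers and allocator overhead.
/// Hypothetical helper, not part of the crate.
fn estimate_embedding_bytes(embeddings: &[Vec<f32>]) -> usize {
    embeddings
        .iter()
        .map(|v| v.len() * std::mem::size_of::<f32>())
        .sum()
}
```

Under this approximation, 1,000 embeddings of dimension 768 come to 1,000 × 768 × 4 = 3,072,000 bytes.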
Auto Trait Implementations§
impl !RefUnwindSafe for MemTableWriter
impl !UnwindSafe for MemTableWriter
impl Freeze for MemTableWriter
impl Send for MemTableWriter
impl Sync for MemTableWriter
impl Unpin for MemTableWriter
impl UnsafeUnpin for MemTableWriter
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT
impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.