pub struct TableWriter { /* private fields */ }
Implementations
impl TableWriter
pub fn new( catalog: Arc<dyn CatalogProvider>, store: Arc<dyn Store>, policy: VectorStoragePolicy, table: TableIdent, ) -> Self
pub fn with_bm25(self, text_column: impl Into<String>) -> Self
Enable BM25 hybrid search by accumulating IDF stats from text_column on each write.
After calling this, every write_batch* call will tokenize the specified column,
update the corpus IDF stats, and persist them to metadata/ailake_bm25_stats.bin.
This file is then loaded automatically by SearchConfig::hybrid at query time.
Typical usage: TableWriter::new(...).with_bm25("chunk_text").
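To make the "accumulate IDF stats on each write" step concrete, here is a minimal, self-contained sketch of document-frequency bookkeeping and the usual Okapi BM25 IDF formula. The `Bm25Stats` type, the whitespace tokenizer, and the exact IDF variant are assumptions for illustration; the crate's real tokenizer and on-disk format are not shown in this page.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory corpus statistics (not the crate's actual type).
#[derive(Default)]
pub struct Bm25Stats {
    /// Number of documents folded into the stats so far.
    pub doc_count: u64,
    /// For each term, the number of documents that contain it.
    pub doc_freq: HashMap<String, u64>,
}

impl Bm25Stats {
    /// Fold one document into the stats.
    /// Tokenization is a lowercase whitespace split (an assumption).
    pub fn add_document(&mut self, text: &str) {
        self.doc_count += 1;
        // Each term counts once per document, however often it repeats.
        let unique: HashSet<String> = text
            .split_whitespace()
            .map(|t| t.to_lowercase())
            .collect();
        for term in unique {
            *self.doc_freq.entry(term).or_insert(0) += 1;
        }
    }

    /// Okapi BM25 IDF: ln(1 + (N - df + 0.5) / (df + 0.5)).
    pub fn idf(&self, term: &str) -> f64 {
        let n = self.doc_count as f64;
        let df = self.doc_freq.get(term).copied().unwrap_or(0) as f64;
        (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
    }
}
```

Calling `add_document` once per row of the text column keeps the counts incremental, which is why the stats can be updated on every write rather than recomputed over the whole corpus.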
pub fn with_parent_snapshot(self, id: SnapshotId) -> Self
pub async fn write_batch_deferred( &mut self, batch: &RecordBatch, embeddings: &[Vec<f32>], ) -> AilakeResult<()>
Write batch as Parquet-only immediately; build HNSW in the background.
Returns as soon as the Parquet file is persisted (roughly LanceDB write speed). A tokio task then runs concurrently to build the HNSW index, rewrite the file with the AILK section, and update the catalog entry.
During the build window, SearchSession serves this shard via flat scan
(brute-force, exact) instead of HNSW. The transition is automatic once
the background task commits the updated manifest entry.
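The flat scan used during the build window can be sketched as an exact nearest-neighbour search that scores every vector in the shard. This is an illustrative sketch only: the function names and the squared-L2 metric are assumptions, since this page does not state which distance SearchSession uses.

```rust
/// Squared Euclidean distance between two equal-length vectors.
fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Exact k-nearest-neighbour search: score every vector, sort, keep the top k.
/// Returns (row index, squared distance) pairs, nearest first.
pub fn flat_scan(vectors: &[Vec<f32>], query: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = vectors
        .iter()
        .enumerate()
        .map(|(i, v)| (i, l2_sq(v, query)))
        .collect();
    // total_cmp gives a total order on f32, so NaN cannot panic the sort.
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(k);
    scored
}
```

The scan costs O(rows × dim) per query, which is acceptable for a single fresh shard but is the reason the shard switches to HNSW once the index is ready.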
pub async fn write_batch_ivf_pq_deferred( &mut self, batch: &RecordBatch, embeddings: &[Vec<f32>], ivf_config: IvfPqConfig, ) -> AilakeResult<()>
Write batch as Parquet-only immediately; train IVF-PQ index in background.
The first shard trains the shared codebook (k-means). All subsequent shards
reuse it via OnceCell — build is O(n) assign+encode, not O(n×k) k-means.
Returns after Parquet is persisted. Index transitions Indexing → Ready async.
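The "train once, reuse afterwards" behaviour can be sketched with a once-initialized cell. The sketch below uses `std::sync::OnceLock` from the standard library as a stand-in for the OnceCell mentioned above; `Codebook`, `SharedCodebook`, and the placeholder "training" (taking the first `nlist` vectors as centroids instead of running k-means) are all hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical codebook: one centroid per IVF list.
pub struct Codebook {
    pub centroids: Vec<Vec<f32>>,
}

/// Codebook shared by all shards; trained at most once.
pub struct SharedCodebook {
    cell: OnceLock<Codebook>,
    trainings: AtomicUsize,
}

impl SharedCodebook {
    pub fn new() -> Self {
        Self { cell: OnceLock::new(), trainings: AtomicUsize::new(0) }
    }

    /// Returns the shared codebook, training from `shard` only on the first call.
    pub fn get_or_train(&self, shard: &[Vec<f32>], nlist: usize) -> &Codebook {
        self.cell.get_or_init(|| {
            self.trainings.fetch_add(1, Ordering::SeqCst);
            // Placeholder for k-means: use the first `nlist` vectors as centroids.
            Codebook { centroids: shard.iter().take(nlist).cloned().collect() }
        })
    }

    /// How many times training actually ran.
    pub fn training_runs(&self) -> usize {
        self.trainings.load(Ordering::SeqCst)
    }
}

/// Assign a vector to its nearest centroid (the cheap per-shard step).
pub fn assign(codebook: &Codebook, v: &[f32]) -> usize {
    let mut best = 0;
    let mut best_dist = f32::INFINITY;
    for (i, c) in codebook.centroids.iter().enumerate() {
        let d: f32 = c.iter().zip(v).map(|(x, y)| (x - y) * (x - y)).sum();
        if d < best_dist {
            best = i;
            best_dist = d;
        }
    }
    best
}
```

Later shards only pay for `assign` (plus encoding, omitted here), which is linear in the number of vectors for a fixed codebook size.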
pub async fn write_batch_idempotent( &mut self, batch: &RecordBatch, embeddings: &[Vec<f32>], batch_id: &str, ) -> AilakeResult<()>
Idempotent variant of write_batch.
Before any I/O, checks if batch_id already appears in the current
snapshot. If it does, this is a no-op — safe for Airflow/Kestra retries.
If not found, writes the batch and tags the DataFileEntry with batch_id
so future retries can detect it.
commit() is likewise a no-op when pending_files is empty.
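The retry-safety contract can be illustrated with a small in-memory model. `IdempotentWriter` is a hypothetical stand-in, not the crate's type: it tracks only batch IDs, and its `commit` takes `&mut self` (the real `commit` consumes the writer) so that a retry can be simulated on one value.

```rust
use std::collections::HashSet;

/// Hypothetical model of the idempotency check; stores batch IDs only.
#[derive(Default)]
pub struct IdempotentWriter {
    /// batch_ids recorded in the current snapshot.
    committed_ids: HashSet<String>,
    /// batch_ids of files staged since the last commit.
    pending_files: Vec<String>,
    /// Number of snapshots actually produced.
    pub commits: usize,
}

impl IdempotentWriter {
    /// Returns true if the batch was staged, false if it was skipped
    /// because its batch_id is already in the current snapshot.
    pub fn write_batch_idempotent(&mut self, batch_id: &str) -> bool {
        if self.committed_ids.contains(batch_id) {
            return false; // no-op: a previous attempt already landed
        }
        self.pending_files.push(batch_id.to_string());
        true
    }

    /// Publishes pending files; a no-op when nothing is pending.
    /// Returns true if a new snapshot was produced.
    pub fn commit(&mut self) -> bool {
        if self.pending_files.is_empty() {
            return false;
        }
        self.committed_ids.extend(self.pending_files.drain(..));
        self.commits += 1;
        true
    }
}
```

With this shape, an orchestrator can rerun the same task with the same batch_id after a failure: the second attempt stages nothing, and the trailing commit produces no new snapshot.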
pub async fn write_batch( &mut self, batch: &RecordBatch, embeddings: &[Vec<f32>], ) -> AilakeResult<()>
pub async fn write_batch_auto( &mut self, batch: &RecordBatch, embeddings: &[Vec<f32>], ) -> AilakeResult<()>
Write batch, auto-selecting the index based on detected hardware.
Picks IVF-PQ when a CUDA GPU or ≥8 CPU cores are present AND the batch
has ≥5 000 vectors. Falls back to HNSW for weaker / local hardware.
Uses IvfPqConfig::for_dataset to scale nlist with dataset size.
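The selection rule described above reduces to a small predicate. The sketch below encodes only the thresholds stated here (CUDA GPU or at least 8 cores, and at least 5 000 vectors); `IndexKind` and `choose_index` are hypothetical names, and how the crate actually detects hardware is not covered.

```rust
/// Hypothetical label for the index that would be built.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IndexKind {
    Hnsw,
    IvfPq,
}

/// IVF-PQ needs strong hardware AND a large enough batch; otherwise HNSW.
pub fn choose_index(has_cuda_gpu: bool, cpu_cores: usize, n_vectors: usize) -> IndexKind {
    let strong_hardware = has_cuda_gpu || cpu_cores >= 8;
    if strong_hardware && n_vectors >= 5_000 {
        IndexKind::IvfPq
    } else {
        IndexKind::Hnsw
    }
}
```

Note that both conditions must hold: a small batch on a GPU machine still gets HNSW, and so does a large batch on a 4-core laptop without a GPU.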
pub async fn write_batch_auto_deferred( &mut self, batch: &RecordBatch, embeddings: &[Vec<f32>], ) -> AilakeResult<()>
Write batch, auto-selecting the index based on detected hardware — deferred variant.
Same hardware detection as write_batch_auto: picks IVF-PQ when a CUDA GPU or
≥8 CPU cores are present AND the batch has ≥5 000 vectors; falls back to HNSW.
Unlike write_batch_auto, the index is built in a background tokio task:
- Parquet is persisted immediately (~200k vec/s, same as write_parquet_only).
- HNSW or IVF-PQ index built asynchronously; shard served via flat scan meanwhile.
Use this when ingest throughput matters more than immediate searchability.
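The deferred pattern (return immediately, build in the background, flip a readiness flag) can be sketched with standard-library threads. This is a rough model under stated assumptions: it uses `std::thread` rather than a tokio task, and `Shard`, `write_deferred`, and the string labels are hypothetical; the index build itself is replaced by a placeholder.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Hypothetical shard handle whose index may still be building.
pub struct Shard {
    ready: Arc<AtomicBool>,
}

impl Shard {
    /// Which search path a reader would take right now.
    pub fn search_path(&self) -> &'static str {
        if self.ready.load(Ordering::Acquire) {
            "indexed"
        } else {
            "flat_scan"
        }
    }
}

/// Returns at once; the "index build" runs on a background thread.
/// The JoinHandle yields the number of vectors indexed.
pub fn write_deferred(vectors: Vec<Vec<f32>>) -> (Shard, JoinHandle<usize>) {
    let ready = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&ready);
    let handle = thread::spawn(move || {
        // Placeholder for building HNSW or IVF-PQ over `vectors`.
        let indexed = vectors.len();
        // Publish readiness only after the build has finished.
        flag.store(true, Ordering::Release);
        indexed
    });
    (Shard { ready }, handle)
}
```

Readers never block on the build: they check the flag and fall back to the flat scan until it flips, which is the trade-off between ingest throughput and immediate indexed search described above.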
pub async fn write_batch_ivf_pq( &mut self, batch: &RecordBatch, embeddings: &[Vec<f32>], ivf_config: IvfPqConfig, ) -> AilakeResult<()>
Write batch with IVF-PQ index built synchronously (no background task).
Smaller index than HNSW; better for S3 sequential-scan workloads.
pub async fn write_batch_multi( &mut self, batch: &RecordBatch, columns: &[MultiVectorBatch<'_>], ) -> AilakeResult<()>
Write a batch with multiple vector columns into a single AI-Lake file.
The first entry in columns is treated as the primary column (used for
geometric pruning). Additional columns each get their own HNSW section.
pub async fn write_batch_multi_deferred( &mut self, batch: &RecordBatch, columns: &[MultiVectorBatch<'_>], ) -> AilakeResult<()>
Write a multi-column batch as Parquet-only immediately; build all N column HNSW indexes in a single background task.
Same semantics as write_batch_deferred but for N vector columns:
- Parquet (primary column bytes) is persisted immediately (~200k vec/s).
- A background tokio task rebuilds the full AILK file via write_multi and patches the catalog entry with primary + extra column offsets once ready.
- During the build window, SearchSession serves this shard via GPU/CPU flat scan. Transition to HNSW-indexed search is automatic on IndexStatus::Ready.
All N column embeddings are cloned into the background task; choose batch size so that N×rows×dim×4 bytes fits comfortably in RAM while the task runs.
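As a worked example of the sizing rule, the helper below computes N×rows×dim×4 bytes for the cloned embeddings. The function name is hypothetical, and the estimate covers only the raw f32 payload, not Vec overhead or the index being built.

```rust
/// Bytes of f32 embedding data cloned into the background task:
/// n_columns × rows × dim × size_of::<f32>() (4 bytes).
pub fn background_clone_bytes(n_columns: u64, rows: u64, dim: u64) -> u64 {
    n_columns * rows * dim * std::mem::size_of::<f32>() as u64
}
```

For instance, 2 vector columns of 768 dimensions over 100 000 rows come to 2 × 100 000 × 768 × 4 = 614 400 000 bytes, roughly 0.6 GB held until the background task finishes.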
pub async fn commit(self) -> AilakeResult<SnapshotId>
pub async fn create_or_open( catalog: Arc<dyn CatalogProvider>, store: Arc<dyn Store>, policy: VectorStoragePolicy, table: TableIdent, format_version: u8, ) -> AilakeResult<Self>
Create a table if it doesn’t exist, then return a writer for it.
Auto Trait Implementations
impl !RefUnwindSafe for TableWriter
impl !UnwindSafe for TableWriter
impl Freeze for TableWriter
impl Send for TableWriter
impl Sync for TableWriter
impl Unpin for TableWriter
impl UnsafeUnpin for TableWriter
Blanket Implementations
impl<T> BorrowMut<T> for T where T: ?Sized
fn borrow_mut(&mut self) -> &mut T
impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT
impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.