Lance Columnar Data Format
Lance is a columnar data format and an alternative to Parquet. It provides 100x faster random access and automatic versioning, and is optimized for computer vision, bioinformatics, spatial, and ML data. It is compatible with Apache Arrow and DuckDB.
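To make the terms "columnar" and "random access" concrete, here is a minimal, std-only sketch. It is a conceptual illustration only: `ColumnarTable` and `take_ids` are hypothetical names invented for this example, and the code says nothing about Lance's actual on-disk layout or API. It shows that when each column is stored contiguously, fetching arbitrary rows of one column never touches the others.

```rust
// Conceptual sketch only: an in-memory columnar table.
// `ColumnarTable` and `take_ids` are hypothetical and are not part of Lance.

/// A tiny table stored column by column rather than row by row.
struct ColumnarTable {
    ids: Vec<i64>,
    labels: Vec<String>,
}

impl ColumnarTable {
    /// Random access: fetch the `ids` values at the given row indices,
    /// in the order requested. Only the `ids` column is read; `labels`
    /// is never touched. Out-of-range indices are skipped.
    fn take_ids(&self, rows: &[usize]) -> Vec<i64> {
        rows.iter()
            .filter_map(|&r| self.ids.get(r).copied())
            .collect()
    }
}

fn main() {
    let table = ColumnarTable {
        ids: vec![10, 20, 30, 40],
        labels: vec!["a".into(), "b".into(), "c".into(), "d".into()],
    };
    // Every column must hold the same number of rows.
    assert_eq!(table.ids.len(), table.labels.len());

    // Fetch rows 3, 0, and 2 of the `ids` column only.
    let picked = table.take_ids(&[3, 0, 2]);
    println!("{:?}", picked); // prints [40, 10, 30]
}
```

A row-oriented layout would have to load whole records, including `labels`, to answer the same query; keeping columns separate is what lets a reader skip data it does not need.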
Create a Dataset
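use std::sync::Arc;
use arrow_array::{RecordBatch, RecordBatchIterator};
use arrow_schema::{DataType, Field, Schema};
use lance::{dataset::WriteParams, Dataset};

// A single non-nullable Int64 column named "test".
let schema = Arc::new(Schema::new(vec![Field::new("test", DataType::Int64, false)]));
let batches = vec![RecordBatch::new_empty(schema.clone())];
let reader = RecordBatchIterator::new(
    batches.into_iter().map(Ok), schema
);
let write_params = WriteParams::default();
// `uri` is the destination of the dataset; it is defined outside this snippet.
Dataset::write(reader, &uri, Some(write_params)).await.unwrap();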
Scan a Dataset
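use arrow_array::RecordBatch;
use futures::StreamExt;
use lance::Dataset;

// `path` is the location of an existing dataset; it is defined outside this snippet.
let dataset = Dataset::open(&path).await.unwrap();
let mut scanner = dataset.scan();
// Stream the record batches and collect them, panicking on the first error.
let batches: Vec<RecordBatch> = scanner
    .try_into_stream()
    .await
    .unwrap()
    .map(|b| b.unwrap())
    .collect::<Vec<RecordBatch>>()
    .await;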
Modules
- Extends Arrow functionality
- Extends DataFusion
- Lance Dataset
- Data encodings
- On-disk format
- Secondary Index
- I/O utilities
- Various utilities