Lance Columnar Data Format

The Lance columnar data format is an alternative to Parquet. It provides 100x faster random access and automatic versioning, and is optimized for computer vision, bioinformatics, spatial, and ML data. It is compatible with Apache Arrow and DuckDB.

Create a Dataset

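use std::sync::Arc;

use arrow_array::{RecordBatch, RecordBatchIterator};
use arrow_schema::{DataType, Field, Schema};
use lance::{dataset::WriteParams, Dataset};

// A single non-nullable Int64 column named "test".
let schema = Arc::new(Schema::new(vec![Field::new("test", DataType::Int64, false)]));
let batches = vec![RecordBatch::new_empty(schema.clone())];
// Wrap the batches in a RecordBatchReader, which Dataset::write consumes.
let reader = RecordBatchIterator::new(
    batches.into_iter().map(Ok), schema
);

// `uri` is the destination of the dataset.
let write_params = WriteParams::default();
Dataset::write(reader, &uri, Some(write_params)).await.unwrap();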

Scan a Dataset

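use arrow_array::RecordBatch;
use futures::StreamExt;
use lance::Dataset;

let dataset = Dataset::open(&path).await.unwrap();
let mut scanner = dataset.scan();
// Turn the scanner into an async stream of record batches and collect them.
let batches: Vec<RecordBatch> = scanner
    .try_into_stream()
    .await
    .unwrap()
    .map(|b| b.unwrap())
    .collect::<Vec<RecordBatch>>()
    .await;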