Crate parquet_lite


§parquet-lite

A lightweight, pure-Rust alternative to the official Apache Parquet crate.

parquet-lite is designed for projects where the full parquet crate is overkill. It provides the read-path essentials in a fraction of the size, with zero unsafe code and full WASM compatibility.

§Key Differences from Official Crate

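| Feature           | parquet (official) | parquet-lite     |
|-------------------|--------------------|------------------|
| Binary size       | Large              | Small            |
| Dependencies      | ~80                | ~15              |
| Thrift dependency | Yes                | No (hand-rolled) |
| Read support      | Full               | Flat schemas     |
| Write support     | Yes                | Not yet          |
| Arrow integration | Yes                | Yes              |
| WASM compatible   | Partial            | Full             |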

§Quick Start

use parquet_lite::*;
use std::fs;

let data = fs::read("data.parquet").unwrap();

// Read metadata
let metadata = read_metadata(&data).unwrap();
println!("Rows: {}, Columns: {}", metadata.num_rows, metadata.num_columns);

// Read as Arrow batches
let batches = read_to_arrow_batches(&data, 1024).unwrap();
for batch in batches {
    let batch = batch.unwrap();
    println!("Batch: {} rows", batch.num_rows());
}

§Feature Flags

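| Feature | Default | Description                        |
|---------|---------|------------------------------------|
| snappy  |         | Snappy compression/decompression   |
| serde   |         | Serde serialization for metadata   |
| wasm    |         | WASM bindings via wasm-bindgen     |
| full    |         | All features enabled               |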

Re-exports§

pub use types::Compression;
pub use types::Encoding;
pub use types::ParquetError;
pub use types::ParquetMetadata;
pub use types::ParquetType;
pub use types::Result;
pub use types::ColumnMetadata;
pub use types::RowGroupMetadata;
pub use schema::ColumnSchema;
pub use schema::LogicalType;
pub use schema::SchemaBuilder;
pub use schema::TimestampUnit;
pub use reader::ColumnData;
pub use reader::ParquetReader;
pub use arrow_convert::ArrowConverter;
pub use batch_iter_advanced::SelectiveBatchIterator;
pub use streaming_reader::StreamingParquetReader;
pub use statistics::ColumnStatistics;
pub use statistics::StatisticsCollector;

Modules§

arrow_convert
batch_iter_advanced
codecs
metadata
reader
schema
statistics
streaming_reader
types

Functions§

print_stats
Print a formatted statistics summary to stdout.
read_columns_to_arrow_batches
Create a batch iterator reading only the specified columns.
read_metadata
Parse metadata from raw Parquet file bytes.
read_to_arrow_batches
Create a batch iterator over Arrow RecordBatches.
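
As background for read_metadata: the Parquet format places file metadata in a Thrift-encoded footer at the end of the file, followed by a 4-byte little-endian footer length and the magic bytes PAR1 (which also open the file). The std-only sketch below shows how that footer could be located in a byte slice. It is an illustration of the file layout, not parquet-lite's actual implementation; the helper name `footer_range` and its `String` error type are hypothetical and not part of this crate.

```rust
use std::ops::Range;

/// Magic bytes that open and close every Parquet file.
const MAGIC: &[u8] = b"PAR1";

/// Hypothetical helper (NOT part of parquet-lite): return the byte range of
/// the Thrift-encoded footer metadata inside a complete Parquet file.
fn footer_range(data: &[u8]) -> Result<Range<usize>, String> {
    // Smallest possible file: leading magic + footer length + trailing magic.
    if data.len() < 12 {
        return Err("file too short to be Parquet".to_string());
    }
    if !data.starts_with(MAGIC) || !data.ends_with(MAGIC) {
        return Err("missing PAR1 magic bytes".to_string());
    }

    // The 4 bytes just before the trailing magic hold the footer length,
    // stored as a little-endian u32.
    let len_pos = data.len() - 8;
    let len_bytes = [
        data[len_pos],
        data[len_pos + 1],
        data[len_pos + 2],
        data[len_pos + 3],
    ];
    let footer_len = u32::from_le_bytes(len_bytes) as usize;

    // The footer must fit between the leading magic and the length field.
    if footer_len > len_pos - 4 {
        return Err("footer length exceeds file size".to_string());
    }
    Ok(len_pos - footer_len..len_pos)
}
```

Slicing the input with the returned range yields the raw footer bytes, which a reader would then decode with a Thrift compact-protocol parser to obtain row counts, the schema, and row-group metadata. Because everything needed sits at the end of the file, metadata can be read without touching any column data.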