parquet-lite 0.2.0

A lightweight, pure-Rust alternative to the official Apache Parquet crate — minimal dependencies, small binary size, WASM-compatible
Documentation

parquet-lite

Crates.io Documentation License: Apache-2.0

A lightweight, pure-Rust alternative to the official Apache Parquet crate.

Designed for projects where the full parquet crate is overkill. parquet-lite provides read-path essentials in a fraction of the size, with zero unsafe code and full WASM compatibility out of the box.

Why use parquet-lite?

If you've ever tried to compile the official parquet crate for WebAssembly or embedded environments, you likely noticed it pulls in dozens of dependencies, takes a while to compile, and adds significant bloat to your binary size.

parquet-lite takes a different approach:

  • No Thrift dependency: It uses a hand-rolled, zero-allocation decoder for just the parts of the Thrift Compact Protocol needed to parse Parquet footers.
  • Granular Arrow Integration: Uses lightweight arrow-array and arrow-schema sub-crates rather than the monolithic arrow crate to avoid pulling in CSV/JSON/IPC modules you don't need.
  • Minimal Footprint: By omitting write support and complex nested schemas, it strips away thousands of lines of code.

When to use parquet-lite

  • WebAssembly (WASM): You are building a frontend application that needs to read Parquet files in the browser and want to keep your .wasm payload small.
  • Edge / Serverless: You have strict deployment size constraints (e.g., Cloudflare Workers, AWS Lambda).
  • Embedded Systems: Memory and storage are tightly constrained.
  • Simple Workloads: You just need to read flat tabular data into Arrow arrays and don't need the kitchen sink.

What is NOT supported

To keep the library small and focused, the following features from the official crate are deliberately omitted:

  • Writing Parquet files: This is a read-only library.
  • Nested Types: Structs, Lists, and Maps are not supported. Only flat tabular schemas are supported.
  • Complex Encodings: Only Plain encoding is currently supported (Dictionary, RLE, and Delta encodings are planned but not yet implemented).
  • All Compressions: Snappy is supported (and default). GZIP, Brotli, ZSTD, etc. are currently unsupported.
  • Full Thrift Parsing: Only the subset of Parquet Thrift metadata required for reading data is parsed.

Quick Start

Add parquet-lite to your Cargo.toml:

[dependencies]
parquet-lite = "0.2"

Read metadata and data:

use parquet_lite::*;
use std::fs;

// 1. Read raw bytes
let data = fs::read("data.parquet").unwrap();

// 2. Read metadata quickly
let metadata = read_metadata(&data).unwrap();
println!("Rows: {}, Columns: {}", metadata.num_rows, metadata.num_columns);

// 3. Read into Arrow RecordBatches
let batches = read_to_arrow_batches(&data, 1024).unwrap();
for batch in batches {
    let batch = batch.unwrap();
    println!("Batch: {} rows", batch.num_rows());
}

Performance

Benchmarks run against a Snappy-compressed Parquet file with 1,000,000 rows (4 columns: int32, float64, boolean, string) on an M-series Mac.

  • Read Metadata (Footer parsing): ~1.1 µs
  • Full File Read (to Arrow RecordBatches, 4096 rows/batch): ~300 µs
  • Full File Read (to Arrow RecordBatches, 1024 rows/batch): ~427 µs

Note: The library relies on arrow sub-crates and zero-copy slicing to achieve extremely fast columnar reads.

Feature Flags

Feature Default Description
snappy Snappy compression/decompression via snap
serde Serde serialization for metadata types
wasm First-class WASM bindings via wasm-bindgen
full All optional features enabled

WASM Example

Enable the wasm feature:

parquet-lite = { version = "0.2", features = ["wasm"] }

In your JavaScript/TypeScript:

import init, { WasmParquetReader } from './parquet_lite.js';

async function readParquet() {
    await init();
    
    const response = await fetch('data.parquet');
    const data = new Uint8Array(await response.arrayBuffer());
    
    const reader = new WasmParquetReader(data);
    
    console.log(`Rows: ${reader.num_rows()}`);
    
    // Read column data as JSON
    const colData = JSON.parse(reader.read_column_json(0));
    console.log('Values:', colData);
}

License

Licensed under the Apache License, Version 2.0 (LICENSE or http://www.apache.org/licenses/LICENSE-2.0).