hdf5-reader 0.1.1

Pure-Rust, read-only HDF5 file decoder with no C dependencies
Documentation

netcdf-rust

Pure-Rust, read-only decoders for HDF5 and NetCDF. No C libraries, no build scripts, and no unsafe beyond memmap2.

Crates

Crate Description
hdf5-reader Low-level HDF5 decoder (superblock, object headers, B-trees, chunked I/O, filters)
netcdf-reader NetCDF reader supporting CDF-1/2/5 classic and NetCDF-4 (HDF5-backed) formats

Usage

use netcdf_reader::{NcFile, NcSliceInfo, NcSliceInfoElem};

let file = NcFile::open("era5.nc")?;
println!("format: {:?}", file.format());

for var in file.variables() {
    println!("  var: {} {:?}", var.name(), var.shape());
}

// Read typed data (works for both classic and NetCDF-4)
let temp: ndarray::ArrayD<f32> = file.read_variable("temperature")?;

// Type-promoting read (any numeric type → f64)
let data = file.read_variable_as_f64("temperature")?;

// CF conventions: unpack packed integer data (scale_factor + add_offset)
let unpacked = file.read_variable_unpacked("temperature")?;

// CF conventions: mask fill values + unpack in one call
let clean = file.read_variable_unpacked_masked("temperature")?;

// Hyperslab: read a single time step from a 4D variable
let sel = NcSliceInfo {
    selections: vec![
        NcSliceInfoElem::Index(0),                                        // time=0
        NcSliceInfoElem::Slice { start: 0, end: u64::MAX, step: 1 },     // all levels
        NcSliceInfoElem::Slice { start: 0, end: u64::MAX, step: 1 },     // all lat
        NcSliceInfoElem::Slice { start: 0, end: u64::MAX, step: 1 },     // all lon
    ],
};
let step: ndarray::ArrayD<f32> = file.read_variable_slice("temperature", &sel)?;

// Lazy iteration over time steps
for slice in file.iter_slices::<f32>("temperature", 0)? {
    let data = slice?;
    println!("  step shape: {:?}", data.shape());
}

Using hdf5-reader directly:

use hdf5_reader::Hdf5File;

let file = Hdf5File::open("data.h5")?;
let ds = file.dataset("/group1/temperature")?;
let data: ndarray::ArrayD<f64> = ds.read_array()?;

// Hyperslab selection
use hdf5_reader::{SliceInfo, SliceInfoElem};
let sel = SliceInfo {
    selections: vec![
        SliceInfoElem::Slice { start: 0, end: 10, step: 1 },
        SliceInfoElem::Index(5),
    ],
};
let slice: ndarray::ArrayD<f64> = ds.read_slice(&sel)?;

Features

HDF5

  • Superblock v0-v3 and object header v1/v2 with checksum verification
  • Compact, contiguous, and chunked layouts
  • All chunk index types: v1/v2 B-tree, single-chunk, implicit, Fixed Array, Extensible Array
  • Deflate, shuffle, Fletcher-32, and optional LZ4 filters
  • Custom filters via FilterRegistry
  • Dense-link resolution, soft-link resolution, committed datatypes, global heap strings, and object references
  • Parallel chunk decoding, chunk caching, and object-header caching

NetCDF

  • CDF-1, CDF-2, CDF-5, and NetCDF-4
  • Automatic format detection
  • Unified typed reads across formats
  • Type promotion to f64, unpacking, masking, and combined CF helpers
  • Slice reads, lazy slice iteration, and parallel NC4 slice reads
  • Cache and filter configuration through NcOpenOptions

Feature flags

[dependencies]
netcdf-reader = "0.1"           # CDF-1/2/5 + NetCDF-4 (default)
netcdf-reader = { version = "0.1", default-features = false }  # CDF-1/2/5 only
Flag Default Description
netcdf4 yes NetCDF-4 support via hdf5-reader
rayon yes Parallel chunk reading
lz4 yes LZ4 filter support (hdf5-reader)
cf no CF Conventions helpers (axis identification, time decoding, CRS extraction, bounds)

Custom filters

Register filters before opening files:

use hdf5_reader::{Hdf5File, OpenOptions};
use hdf5_reader::filters::FilterRegistry;

let mut registry = FilterRegistry::new();
registry.register(32001, Box::new(|data, _elem_size| {
    // Custom decompression logic
    Ok(data.to_vec())
}));

let file = Hdf5File::open_with_options("data.h5", OpenOptions {
    filter_registry: Some(registry),
    ..Default::default()
})?;

Testing

# Unit tests (no external dependencies)
cargo test

# Integration tests with generated fixtures
pip install h5py netCDF4 numpy
python testdata/generate_fixtures.py
cargo test

Releasing

See RELEASING.md for the release checklist and the required publish order for hdf5-reader and netcdf-reader.

Benchmarks

netcdf-reader includes a Criterion benchmark that compares this implementation against the C-backed netcdf crate on the checked-in fixtures plus larger generated benchmark fixtures.

# Full benchmark matrix
cargo bench -p netcdf-reader --bench compare_georust

# Summarize the latest Criterion results
python3 scripts/criterion_summary.py

# Include x1-relative speedup for threaded workloads
python3 scripts/criterion_summary.py --speedup \
  --group parallel_metadata_batch \
  --group parallel_slice_batch \
  --group read_full_internal_parallel \
  --group parallel_open_and_read

Notes:

  • The benchmark uses netcdf with its static feature, so it builds a bundled netcdf-c stack.
  • The suite separates open cost, metadata walks, warm full reads, end-to-end reads, slices, and threaded workloads.
  • Larger benchmark fixtures are generated at runtime.
  • In the netcdf 0.12.0 baseline used for these runs, libnetcdf entry points go through a shared process-global lock. The contention benchmarks are intended to make that visible.

Known limitations

  • External HDF5 links are skipped (soft links are resolved)
  • SZIP, N-Bit, and ScaleOffset filters are not built in (register via FilterRegistry)
  • SOHM (shared object header message table) resolution returns a descriptive error
  • Fractal heap huge/tiny objects are not yet supported (managed objects work)
  • Classic format slice reads fall back to full-read-then-slice when the selection has a stride >1 or restricts inner dimensions
  • CF time decoding uses Gregorian approximation for non-standard calendars (noleap, 360_day)

License

MIT OR Apache-2.0