# netcdf-rust
Pure-Rust, read-only decoders for HDF5 and NetCDF. No C libraries, no build scripts, and no unsafe beyond `memmap2`.
## Crates
| `hdf5-reader` | Low-level HDF5 decoder (superblock, object headers, B-trees, chunked I/O, filters) |
| `netcdf-reader` | NetCDF reader supporting CDF-1/2/5 classic and NetCDF-4 (HDF5-backed) formats |
## Usage
```rust
use netcdf_reader::{NcFile, NcSliceInfo, NcSliceInfoElem};
let file = NcFile::open("era5.nc")?;
println!("format: {:?}", file.format());
for var in file.variables() {
println!(" var: {} {:?}", var.name(), var.shape());
}
// Read typed data (works for both classic and NetCDF-4)
let temp: ndarray::ArrayD<f32> = file.read_variable("temperature")?;
// Type-promoting read (any numeric type → f64)
let data = file.read_variable_as_f64("temperature")?;
// CF conventions: unpack packed integer data (scale_factor + add_offset)
let unpacked = file.read_variable_unpacked("temperature")?;
// CF conventions: mask fill values + unpack in one call
let clean = file.read_variable_unpacked_masked("temperature")?;
// Hyperslab: read a single time step from a 4D variable
let sel = NcSliceInfo {
selections: vec![
NcSliceInfoElem::Index(0), // time=0
NcSliceInfoElem::Slice { start: 0, end: u64::MAX, step: 1 }, // all levels
NcSliceInfoElem::Slice { start: 0, end: u64::MAX, step: 1 }, // all lat
NcSliceInfoElem::Slice { start: 0, end: u64::MAX, step: 1 }, // all lon
],
};
let step: ndarray::ArrayD<f32> = file.read_variable_slice("temperature", &sel)?;
// Lazy iteration over time steps
for slice in file.iter_slices::<f32>("temperature", 0)? {
let data = slice?;
println!(" step shape: {:?}", data.shape());
}
```
Using `hdf5-reader` directly:
```rust
use hdf5_reader::Hdf5File;
let file = Hdf5File::open("data.h5")?;
let ds = file.dataset("/group1/temperature")?;
let data: ndarray::ArrayD<f64> = ds.read_array()?;
// Hyperslab selection
use hdf5_reader::{SliceInfo, SliceInfoElem};
let sel = SliceInfo {
selections: vec![
SliceInfoElem::Slice { start: 0, end: 10, step: 1 },
SliceInfoElem::Index(5),
],
};
let slice: ndarray::ArrayD<f64> = ds.read_slice(&sel)?;
```
## Features
**HDF5**
- Superblock v0-v3 and object header v1/v2 with checksum verification
- Compact, contiguous, and chunked layouts
- All chunk index types: v1/v2 B-tree, single-chunk, implicit, Fixed Array, Extensible Array
- Deflate, shuffle, Fletcher-32, and optional LZ4 filters
- Custom filters via `FilterRegistry`
- Dense-link resolution, soft-link resolution, committed datatypes, global heap strings, and object references
- Parallel chunk decoding, chunk caching, and object-header caching
**NetCDF**
- CDF-1, CDF-2, CDF-5, and NetCDF-4
- Automatic format detection
- Unified typed reads across formats
- Type promotion to `f64`, unpacking, masking, and combined CF helpers
- Slice reads, lazy slice iteration, and parallel NC4 slice reads
- Cache and filter configuration through `NcOpenOptions`
## Feature flags
```toml
[dependencies]
netcdf-reader = "0.1" # CDF-1/2/5 + NetCDF-4 (default)
netcdf-reader = { version = "0.1", default-features = false } # CDF-1/2/5 only
```
| `netcdf4` | yes | NetCDF-4 support via `hdf5-reader` |
| `rayon` | yes | Parallel chunk reading |
| `lz4` | yes | LZ4 filter support (hdf5-reader) |
| `cf` | no | CF Conventions helpers (axis identification, time decoding, CRS extraction, bounds) |
## Custom filters
Register filters before opening files:
```rust
use hdf5_reader::{Hdf5File, OpenOptions};
use hdf5_reader::filters::FilterRegistry;
let mut registry = FilterRegistry::new();
filter_registry: Some(registry),
..Default::default()
})?;
```
## Testing
```sh
# Unit tests (no external dependencies)
cargo test
# Integration tests with generated fixtures
pip install h5py netCDF4 numpy
python testdata/generate_fixtures.py
cargo test
```
## Releasing
See [RELEASING.md](RELEASING.md) for the release checklist and the required
publish order for `hdf5-reader` and `netcdf-reader`.
## Benchmarks
`netcdf-reader` includes a Criterion benchmark that compares this implementation
against the C-backed [`netcdf`](https://github.com/georust/netcdf) crate on the
checked-in fixtures plus larger generated benchmark fixtures.
```sh
# Full benchmark matrix
cargo bench -p netcdf-reader --bench compare_georust
# Summarize the latest Criterion results
python3 scripts/criterion_summary.py
# Include x1-relative speedup for threaded workloads
python3 scripts/criterion_summary.py --speedup \
--group parallel_metadata_batch \
--group parallel_slice_batch \
--group read_full_internal_parallel \
--group parallel_open_and_read
```
Notes:
- The benchmark uses `netcdf` with its `static` feature, so it builds a bundled `netcdf-c` stack.
- The suite separates open cost, metadata walks, warm full reads, end-to-end reads, slices, and threaded workloads.
- Larger benchmark fixtures are generated at runtime.
- In the `netcdf 0.12.0` baseline used for these runs, libnetcdf entry points go through a shared process-global lock. The contention benchmarks are intended to make that visible.
## Known limitations
- External HDF5 links are skipped (soft links are resolved)
- SZIP, N-Bit, and ScaleOffset filters are not built in (register via `FilterRegistry`)
- SOHM (shared object header message table) resolution returns a descriptive error
- Fractal heap huge/tiny objects are not yet supported (managed objects work)
- Classic format slice reads fall back to full-read-then-slice when the selection has a stride >1 or restricts inner dimensions
- CF time decoding uses Gregorian approximation for non-standard calendars (noleap, 360_day)
## License
MIT OR Apache-2.0