pod5lib
Pure Rust library for reading and writing POD5 nanopore sequencing files — Oxford Nanopore Technologies' successor to FAST5.
Features
- Read POD5 files: iterate over reads, random access by UUID, VBZ signal decompression
- Write POD5 files compatible with the Python `pod5` library
- Streaming writer (`StreamingWriter`) for memory-efficient writing of large files
- No `unsafe` code in this crate; VBZ (delta + zigzag + SVB16 + zstd) is implemented in Rust, with the zstd stage provided by the `zstd` crate
Quick start
Reading
```rust
use pod5lib::Reader;

let reader = Reader::open(/* … */)?;
println!(/* … */);
for read in reader.reads_iter() {
    /* … */
}
// Random access by UUID string: O(1)
if let Some(read) = reader.get_read_by_id(/* … */) {
    /* … */
}
```
Writing (buffered)
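```rust
use pod5lib::{Reader, /* … */};

let src = Reader::open(/* … */)?;
let mut writer = /* … */::new(/* … */);
writer.file_identifier = src.file_identifier.to_string();
for read in src.reads_iter().take(/* … */) {
    /* … */
}
writer.write_file(/* … */)?;
```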
Writing (streaming — lower memory use)
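```rust
use pod5lib::{Reader, StreamingWriter};

let src = Reader::open(/* … */)?;
let mut writer = StreamingWriter::create(/* … */)?;
writer.file_identifier = src.file_identifier.to_string();
for read in src.reads_iter() {
    /* … */ // each read is passed to `writer.write_read`
}
writer.finish()?;
```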
`StreamingWriter` VBZ-compresses each read's signal in `write_read` and releases it immediately, rather than accumulating raw samples for the whole file. Prefer it for large files.
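To show why streaming keeps memory flat, here is a minimal, dependency-free sketch of the pattern; it is not `pod5lib`'s implementation. `ToyStreamingWriter` and `compress` are hypothetical names: `compress` only delta-encodes samples as a stand-in for VBZ, and each read is written as a length-prefixed blob to any `std::io::Write` sink.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for VBZ compression: delta-encodes the samples as
/// little-endian i16 bytes. It is NOT the real VBZ codec.
fn compress(samples: &[i16]) -> Vec<u8> {
    let mut out = Vec::with_capacity(samples.len() * 2);
    let mut prev: i16 = 0;
    for &s in samples {
        out.extend_from_slice(&s.wrapping_sub(prev).to_le_bytes());
        prev = s;
    }
    out
}

/// Hypothetical streaming writer: every read is encoded and written to `sink`
/// as soon as it arrives, so raw samples never pile up in memory.
struct ToyStreamingWriter<W: Write> {
    sink: W,
    reads_written: usize,
}

impl<W: Write> ToyStreamingWriter<W> {
    fn new(sink: W) -> Self {
        Self { sink, reads_written: 0 }
    }

    /// Encodes one read, writes it as a length-prefixed blob, then frees the
    /// raw samples before the caller supplies the next read.
    fn write_read(&mut self, samples: Vec<i16>) -> io::Result<()> {
        let blob = compress(&samples);
        drop(samples);
        self.sink.write_all(&(blob.len() as u32).to_le_bytes())?;
        self.sink.write_all(&blob)?;
        self.reads_written += 1;
        Ok(())
    }

    /// Flushes and returns the sink, analogous to a final `finish` call.
    fn finish(mut self) -> io::Result<W> {
        self.sink.flush()?;
        Ok(self.sink)
    }
}
```

Because `write_read` takes the samples by value and drops them once they are encoded, a caller looping over a large source holds at most one read's raw signal at a time, whereas a writer that accumulates raw samples until the end grows with the file.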
Performance
Benchmarked on a 24.7 GB POD5 file containing 500,000 reads, on a 64-core server, against the Python `pod5` library.
| Operation | Rust (time, RSS) | Python (time, RSS) | Rust vs Python |
|---|---|---|---|
| Read — sequential | 54.6 s, 300 MB RSS | 58.0 s, 23,769 MB RSS | speed parity, 79× less RAM |
| Read — parallel (16 threads) | 31.1 s, 359 MB RSS | 58.0 s, 23,769 MB RSS | 1.9× faster, 66× less RAM |
| Write (50k reads) | 12.7 s, 525 MB RSS | 21.7 s, 2,531 MB RSS | 1.7× faster, 5× less RAM |
Python's RSS in the sequential read case (~24 GB) approaches the size of the 24.7 GB input file, as the library accumulates decompressed signal arrays in memory. Rust decompresses each read's signal lazily and discards it once the read has been processed, keeping RSS at roughly 300 MB regardless of file size.
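Lazy decompression can be sketched with a plain iterator adapter: each blob is decoded only when the consumer pulls the next item, and the decoded `Vec` is freed as soon as the consumer finishes with it. This is an illustrative sketch under simplifying assumptions, not `pod5lib`'s code: `decompress`, `signals` and `summarise` are hypothetical, `decompress` only reverses delta encoding, and blobs are modelled as `Vec<i16>` rather than VBZ bytes.

```rust
/// Hypothetical stand-in for VBZ decompression: reverses a plain delta
/// encoding of i16 samples (no zigzag, SVB16 or zstd stages).
fn decompress(deltas: &[i16]) -> Vec<i16> {
    let mut prev: i16 = 0;
    deltas
        .iter()
        .map(|&d| {
            prev = prev.wrapping_add(d);
            prev
        })
        .collect()
}

/// Decodes each blob only when the iterator is advanced; nothing is decoded
/// up front and no decoded signal is retained by the iterator itself.
fn signals<'a>(blobs: &'a [Vec<i16>]) -> impl Iterator<Item = Vec<i16>> + 'a {
    blobs.iter().map(|blob| decompress(blob))
}

/// Example consumer: per-read sample count and mean. Each decoded signal is
/// dropped at the end of the closure, before the next one is produced.
fn summarise(blobs: &[Vec<i16>]) -> Vec<(usize, f64)> {
    signals(blobs)
        .map(|signal| {
            let sum: i64 = signal.iter().map(|&s| i64::from(s)).sum();
            (signal.len(), sum as f64 / signal.len().max(1) as f64)
        })
        .collect()
}
```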
`Reader::par_reads_iter(threads)` decompresses signal in parallel: blobs are gathered under a single I/O lock acquisition per batch, then decompressed concurrently on a Rayon thread pool of the specified size. `StreamingWriter` similarly compresses signal in parallel Rayon batches before streaming it to a background writer thread.
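The gather-then-decode pattern can be sketched without Rayon using `std::thread::scope` (and `usize::div_ceil`, so Rust 1.73 or later); this swaps scoped threads in for Rayon's pool and is not the crate's actual code. `SharedFile`, `decode` and `decode_batched` are hypothetical: the mutex plays the role of the I/O lock, and `decode` is plain delta decoding rather than VBZ.

```rust
use std::sync::Mutex;
use std::thread;

/// Hypothetical stand-in for a shared file handle: compressed blobs behind a
/// mutex that plays the role of the I/O lock.
struct SharedFile {
    blobs: Mutex<Vec<Vec<i16>>>,
}

/// Hypothetical decoder: reverses plain delta encoding (not real VBZ).
fn decode(deltas: &[i16]) -> Vec<i16> {
    let mut acc: i16 = 0;
    deltas
        .iter()
        .map(|&d| {
            acc = acc.wrapping_add(d);
            acc
        })
        .collect()
}

/// Decodes all blobs `batch` at a time. The lock is taken once per batch to
/// copy the blobs out; decoding then runs on up to `threads` scoped threads.
fn decode_batched(file: &SharedFile, batch: usize, threads: usize) -> Vec<Vec<i16>> {
    let total = file.blobs.lock().unwrap().len();
    let batch = batch.max(1);
    let mut out = Vec::with_capacity(total);
    let mut start = 0;
    while start < total {
        let end = (start + batch).min(total);
        // Single lock acquisition for the whole batch; the guard is dropped
        // at the end of this statement, before any decoding starts.
        let gathered: Vec<Vec<i16>> = file.blobs.lock().unwrap()[start..end].to_vec();
        // Split the batch into at most `threads` contiguous chunks.
        let chunk = gathered.len().div_ceil(threads.max(1));
        let decoded: Vec<Vec<Vec<i16>>> = thread::scope(|s| {
            let handles: Vec<_> = gathered
                .chunks(chunk)
                .map(|part| s.spawn(move || part.iter().map(|b| decode(b)).collect::<Vec<_>>()))
                .collect();
            // Joining in spawn order preserves the original read order.
            handles.into_iter().map(|h| h.join().unwrap()).collect()
        });
        out.extend(decoded.into_iter().flatten());
        start = end;
    }
    out
}
```

Holding the lock only while copying a batch keeps I/O serialized but brief, and the CPU-heavy decoding runs with no lock held. A work-stealing pool such as Rayon's balances uneven chunks better than the fixed split used here.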
POD5 format
POD5 files contain three Arrow IPC tables embedded in a binary container with a FlatBuffers footer:
| Table | Key columns |
|---|---|
| Reads | read_id, signal (row indices into Signal table), calibration, pore, end reason, run_info |
| Signal | read_id, signal (VBZ-compressed blob), samples |
| RunInfo | acquisition_id, sample_rate, timestamps, context tags, tracking ID |
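Because the Reads table's `signal` column holds row indices into the Signal table, a read's full signal is rebuilt by decoding those rows and concatenating them in order. A minimal sketch, assuming the indices are `u64` and each row has already been decompressed; `SignalRow` and `read_signal` are hypothetical names, not part of `pod5lib`.

```rust
/// Hypothetical, already-decompressed view of one Signal-table row.
struct SignalRow {
    /// Sample count recorded for this row (the `samples` column).
    samples: u32,
    /// Decoded samples for this chunk.
    decoded: Vec<i16>,
}

/// Rebuilds one read's signal from the Signal rows listed in its `signal`
/// column, concatenating the chunks in the order the indices are given.
fn read_signal(rows: &[SignalRow], row_indices: &[u64]) -> Result<Vec<i16>, String> {
    let mut out = Vec::new();
    for &idx in row_indices {
        let row = usize::try_from(idx)
            .ok()
            .and_then(|i| rows.get(i))
            .ok_or_else(|| format!("signal row {idx} out of range"))?;
        // Cross-check the chunk against its recorded sample count.
        if row.decoded.len() != row.samples as usize {
            return Err(format!(
                "signal row {idx}: expected {} samples, found {}",
                row.samples,
                row.decoded.len()
            ));
        }
        out.extend_from_slice(&row.decoded);
    }
    Ok(out)
}
```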
Signal is compressed with VBZ: delta-encode → zigzag-encode → SVB16 → zstd.
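The first three VBZ stages can be sketched in plain Rust as follows (Rust 1.73+ for `div_ceil`); the zstd stage is omitted to keep the example dependency-free, whereas the real pipeline zstd-compresses the SVB16 output. The SVB16 layout here (one key bit per sample, least-significant bit first, selecting a 1- or 2-byte little-endian value, with all key bytes ahead of the data bytes) is an assumption for illustration and is not guaranteed to be byte-compatible with ONT's codec or with `pod5lib`. `vbz_like_encode` and `vbz_like_decode` are hypothetical names.

```rust
/// Zigzag maps signed deltas to unsigned so small magnitudes stay small:
/// 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
fn zigzag(v: i16) -> u16 {
    ((v << 1) ^ (v >> 15)) as u16
}

/// Inverse of `zigzag`.
fn unzigzag(u: u16) -> i16 {
    ((u >> 1) as i16) ^ -((u & 1) as i16)
}

/// Delta -> zigzag -> SVB16 (zstd omitted). Assumed layout: ceil(n / 8) key
/// bytes, where bit i (LSB first) set means sample i uses 2 little-endian
/// bytes instead of 1, followed by the data bytes.
fn vbz_like_encode(samples: &[i16]) -> Vec<u8> {
    let n = samples.len();
    let mut keys = vec![0u8; n.div_ceil(8)];
    let mut data = Vec::with_capacity(n);
    let mut prev: i16 = 0;
    for (i, &s) in samples.iter().enumerate() {
        let z = zigzag(s.wrapping_sub(prev));
        prev = s;
        if z < 256 {
            data.push(z as u8);
        } else {
            keys[i / 8] |= 1 << (i % 8);
            data.extend_from_slice(&z.to_le_bytes());
        }
    }
    keys.extend_from_slice(&data);
    keys
}

/// Inverse of `vbz_like_encode`. The sample count `n` is stored outside the
/// blob (POD5 keeps it in the Signal table's `samples` column).
fn vbz_like_decode(buf: &[u8], n: usize) -> Option<Vec<i16>> {
    let key_len = n.div_ceil(8);
    let keys = buf.get(..key_len)?;
    let mut data = buf.get(key_len..)?;
    let mut out = Vec::with_capacity(n);
    let mut prev: i16 = 0;
    for i in 0..n {
        let wide = (keys[i / 8] >> (i % 8)) & 1 == 1;
        let z = if wide {
            let bytes = data.get(..2)?;
            data = &data[2..];
            u16::from_le_bytes([bytes[0], bytes[1]])
        } else {
            let byte = *data.first()?;
            data = &data[1..];
            u16::from(byte)
        };
        prev = prev.wrapping_add(unzigzag(z));
        out.push(prev);
    }
    // Reject trailing bytes that no key accounts for.
    data.is_empty().then_some(out)
}
```

Delta encoding turns a slowly varying signal into small differences, zigzag keeps small negative differences small as unsigned values, and SVB16 then stores most of them in a single byte, leaving zstd a compact, repetitive stream.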
Dependencies
| Crate | Purpose |
|---|---|
| `arrow` (`ipc` feature) | Arrow IPC reading and writing |
| `zstd` | zstd compression/decompression for VBZ signal |
| `thiserror` | Error type derive macros |
| `rayon` | Parallel signal compression/decompression |
Acknowledgements
The POD5 format is designed and developed by Oxford Nanopore Technologies. The reference implementation and format specification are available at github.com/nanoporetech/pod5-file-format under the Mozilla Public License v2.0.
pod5lib-rs is an independent clean-room Rust implementation of the format and is not affiliated with or endorsed by Oxford Nanopore Technologies.
License
This project is licensed under the Mozilla Public License v2.0, consistent with the ONT reference implementation.
Development
Integration tests against a real POD5 file:
Tests that require `tests/data/sample.pod5` are silently skipped when the file is absent.
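One way to implement that skip, as a sketch rather than the repository's actual test code: a hypothetical helper `sample_pod5` returns `None` when the fixture is missing, and each test returns early. Note that `cargo test` still reports a test skipped this way as passed.

```rust
use std::path::Path;

/// Hypothetical helper: returns the fixture path when it exists, or `None`
/// (without failing) so the calling test can return early.
fn sample_pod5() -> Option<&'static Path> {
    let path = Path::new("tests/data/sample.pod5");
    path.exists().then_some(path)
}

#[test]
fn reads_sample_file() {
    // Silently skip when the fixture is absent.
    let Some(path) = sample_pod5() else { return };
    // Open `path` with the crate's reader here and assert on its contents.
    assert!(path.is_file());
}
```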