pod5lib 0.1.0

Pure Rust library for reading and writing POD5 nanopore sequencing files

pod5lib

License: MPL-2.0 | Rust 1.87+

Pure Rust library for reading and writing POD5 nanopore sequencing files — Oxford Nanopore Technologies' successor to FAST5.

Features

  • Read POD5 files: iterate reads, random-access by UUID, decompress VBZ signal
  • Write POD5 files compatible with the Python pod5 library
  • Streaming writer (StreamingWriter) for memory-efficient writing of large files
  • No unsafe code; pure Rust VBZ decompression (delta + zigzag + SVB16 + zstd)

Quick start

Reading

use pod5lib::Reader;

let reader = Reader::open("sample.pod5")?;
println!("{} reads", reader.len());

for read in reader.reads_iter() {
    let adc = read.signal()?;          // raw i16 ADC samples (VBZ decompressed)
    let pa  = read.signal_pa()?;       // calibrated picoamps (f32)
    println!("{}: {} samples", read.read_id_str(), adc.len());
}

// Random-access by UUID string — O(1)
if let Some(read) = reader.get_read_by_id("a1b2c3d4-0506-0708-090a-0b0c0d0e0f10") {
    println!("found: {} samples", read.num_samples);
}
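
For orientation, the relationship between the raw ADC samples and the calibrated picoamp values can be sketched as follows. This assumes the usual POD5 calibration convention, pA = (adc + offset) * scale. The `Calibration` struct and `to_picoamps` function are hypothetical names used for illustration and are not part of the pod5lib API.

```rust
/// Hypothetical per-read calibration pair (illustrative, not pod5lib's type).
#[derive(Debug, Clone, Copy)]
pub struct Calibration {
    pub offset: f32,
    pub scale: f32,
}

/// Convert raw ADC samples to picoamps, assuming pA = (adc + offset) * scale.
pub fn to_picoamps(adc: &[i16], cal: Calibration) -> Vec<f32> {
    adc.iter()
        .map(|&sample| (f32::from(sample) + cal.offset) * cal.scale)
        .collect()
}
```

With an offset of 10.0 and a scale of 0.5, the ADC values 0, 10 and -10 map to 5.0, 10.0 and 0.0 pA.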

Writing (buffered)

use pod5lib::{Reader, Writer};

let src = Reader::open("input.pod5")?;
let mut writer = Writer::new();
writer.file_identifier = src.file_identifier().to_string();

for read in src.reads_iter().take(10) {
    let signal = read.signal()?;
    writer.add_read(read.clone(), signal);
}
writer.write_file("output.pod5")?;

Writing (streaming — lower memory use)

use pod5lib::{Reader, StreamingWriter};

let src = Reader::open("input.pod5")?;
let mut writer = StreamingWriter::create("output.pod5")?;
writer.file_identifier = src.file_identifier().to_string();

for read in src.reads_iter() {
    let signal = read.signal()?;
    writer.write_read(read, &signal)?;   // signal compressed immediately, not held
}
writer.finish()?;

StreamingWriter VBZ-compresses each read's signal on write_read and releases it immediately, rather than accumulating raw samples for the whole file. Prefer it for large files.
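
The memory behaviour described above can be illustrated with a minimal sketch that uses only the standard library. `StreamingSink` is a hypothetical type, not pod5lib's `StreamingWriter`, and the little-endian byte encoding is a stand-in for VBZ compression. The point is only that each read's encoded signal is written and dropped immediately, while a small index entry per read is all that is retained.

```rust
use std::io::{self, Write};

/// Hypothetical streaming sink: writes each encoded signal straight to `out`
/// and keeps only an (offset, length) index entry per read.
pub struct StreamingSink<W: Write> {
    out: W,
    offset: u64,
    index: Vec<(u64, u64)>,
}

impl<W: Write> StreamingSink<W> {
    pub fn new(out: W) -> Self {
        Self { out, offset: 0, index: Vec::new() }
    }

    /// Encode one read's signal and write it at once; the encoded buffer
    /// is dropped when this function returns.
    pub fn write_read(&mut self, signal: &[i16]) -> io::Result<()> {
        // Stand-in for VBZ compression: plain little-endian bytes.
        let encoded: Vec<u8> = signal.iter().flat_map(|s| s.to_le_bytes()).collect();
        self.out.write_all(&encoded)?;
        self.index.push((self.offset, encoded.len() as u64));
        self.offset += encoded.len() as u64;
        Ok(())
    }

    /// Flush and return the writer together with the per-read index.
    pub fn finish(mut self) -> io::Result<(W, Vec<(u64, u64)>)> {
        self.out.flush()?;
        Ok((self.out, self.index))
    }
}
```

Peak memory in this pattern is bounded by the largest single read plus the index, rather than by the total signal in the file.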

Performance

Benchmarked on a 500,000-read, 24.7 GB POD5 file on a 64-core server, compared against the Python pod5 library.

| Operation | Rust | Python | Ratio |
|---|---|---|---|
| Read (sequential) | 54.6 s, 300 MB RSS | 58.0 s, 23,769 MB RSS | speed parity, 79× less RAM |
| Read (parallel, 16 threads) | 31.1 s, 359 MB RSS | 58.0 s, 23,769 MB RSS | 1.9× faster, 66× less RAM |
| Write (50k reads) | 12.7 s, 525 MB RSS | 21.7 s, 2,531 MB RSS | 1.7× faster, 5× less RAM |

Python's RSS for the sequential read case (~24 GB) matches the uncompressed file size, because the library accumulates decompressed signal arrays in memory. Rust decompresses lazily and discards each read's signal immediately, so RSS stays at roughly 300 MB regardless of file size.

Reader::par_reads_iter(threads) decompresses signal in parallel: blobs are gathered under a single I/O lock acquisition per batch, then decompressed concurrently with a rayon thread pool of the specified size. StreamingWriter similarly compresses signal in parallel rayon batches before streaming to a background writer thread.
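
The gather-then-decompress pattern described above can be sketched with the standard library alone. This is not pod5lib's implementation: `decode_batch` and `decode` are hypothetical names, `decode` is a stand-in for VBZ decompression, and `std::thread::scope` is used here in place of a rayon thread pool so the example has no dependencies.

```rust
use std::sync::Mutex;
use std::thread;

/// Stand-in for per-read decompression; a real reader would run VBZ decoding here.
fn decode(blob: &[u8]) -> Vec<i16> {
    blob.iter().map(|&b| i16::from(b) * 2).collect()
}

/// Copy one batch of blobs out of the shared source under a single lock,
/// then decode the batch on up to `threads` scoped worker threads.
pub fn decode_batch(
    source: &Mutex<Vec<Vec<u8>>>,
    start: usize,
    batch_size: usize,
    threads: usize,
) -> Vec<Vec<i16>> {
    // One lock acquisition per batch: gather the blobs, then release the lock.
    let blobs: Vec<Vec<u8>> = {
        let guard = source.lock().unwrap();
        let end = start.saturating_add(batch_size).min(guard.len());
        guard[start.min(end)..end].to_vec()
    };
    if blobs.is_empty() {
        return Vec::new();
    }

    // Split into contiguous chunks, one per worker, so output order is preserved.
    let workers = threads.max(1);
    let chunk_len = (blobs.len() + workers - 1) / workers;
    thread::scope(|scope| {
        let handles: Vec<_> = blobs
            .chunks(chunk_len)
            .map(|chunk| {
                scope.spawn(move || chunk.iter().map(|b| decode(b)).collect::<Vec<_>>())
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker panicked"))
            .collect()
    })
}
```

Holding the lock only while copying blobs keeps the I/O critical section short, so the CPU-bound decode step runs without contention.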

POD5 format

POD5 files contain three Arrow IPC tables embedded in a binary container with a FlatBuffers footer:

| Table | Key columns |
|---|---|
| Reads | read_id, signal (row indices into Signal table), calibration, pore, end reason, run_info |
| Signal | read_id, signal (VBZ-compressed blob), samples |
| RunInfo | acquisition_id, sample_rate, timestamps, context tags, tracking ID |

Signal is compressed with VBZ: delta-encode → zigzag-encode → SVB16 → zstd.
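
The first two stages of that pipeline can be sketched as below. This is a minimal illustration, not pod5lib's code: the function names are hypothetical, the use of wrapping 16-bit arithmetic is an assumption (the reference implementation's integer widths may differ), and the SVB16 and zstd stages are omitted.

```rust
/// Delta-encode then zigzag-encode a signal, so that small sample-to-sample
/// changes become small unsigned values.
pub fn delta_zigzag_encode(samples: &[i16]) -> Vec<u16> {
    let mut prev: i16 = 0;
    samples
        .iter()
        .map(|&sample| {
            // Delta against the previous sample (the first sample is taken against 0).
            let delta = sample.wrapping_sub(prev);
            prev = sample;
            // Zigzag: 0, -1, 1, -2, ... map to 0, 1, 2, 3, ...
            ((delta << 1) ^ (delta >> 15)) as u16
        })
        .collect()
}

/// Inverse of `delta_zigzag_encode`: undo zigzag, then accumulate the deltas.
pub fn delta_zigzag_decode(encoded: &[u16]) -> Vec<i16> {
    let mut prev: i16 = 0;
    encoded
        .iter()
        .map(|&z| {
            let delta = ((z >> 1) as i16) ^ -((z & 1) as i16);
            prev = prev.wrapping_add(delta);
            prev
        })
        .collect()
}
```

Because neighbouring nanopore samples tend to be close in value, most encoded values are small, which is what lets the later byte-packing and zstd stages compress well.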

Dependencies

| Crate | Purpose |
|---|---|
| arrow (ipc feature) | Arrow IPC reading and writing |
| zstd | zstd compression/decompression for VBZ signal |
| thiserror | Error derive |

Acknowledgements

The POD5 format is designed and developed by Oxford Nanopore Technologies. The reference implementation and format specification are available at github.com/nanoporetech/pod5-file-format under the Mozilla Public License v2.0.

pod5lib-rs is an independent clean-room Rust implementation of the format and is not affiliated with or endorsed by Oxford Nanopore Technologies.

License

This project is licensed under the Mozilla Public License v2.0, consistent with the ONT reference implementation.

Development

cargo fmt                  # format
cargo ci-fmt               # format check (CI)
cargo ci-clippy            # clippy -D warnings (CI)
cargo ci-test              # all tests including doc-tests (CI)

Integration tests against a real POD5 file:

mkdir -p tests/data
cp your_file.pod5 tests/data/sample.pod5
cargo ci-test

Tests that require tests/data/sample.pod5 are silently skipped when the file is absent.
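
One way such a skip can be written is sketched below. The helper name `sample_file` and the test body are hypothetical, and the project's actual tests may be structured differently. The test returns early, and therefore passes, when the sample file is missing.

```rust
use std::path::{Path, PathBuf};

/// Return the sample file path if it exists, or None so the caller can skip.
pub fn sample_file(root: &Path) -> Option<PathBuf> {
    let path = root.join("tests/data/sample.pod5");
    path.is_file().then_some(path)
}

#[test]
fn reads_sample_file() {
    let Some(path) = sample_file(Path::new(".")) else {
        return; // no sample file: the test passes without running
    };
    let bytes = std::fs::read(&path).expect("sample file should be readable");
    assert!(!bytes.is_empty());
}
```

A test skipped this way is still reported as passed, so a green test run does not by itself show that the integration tests exercised a real file.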