biolic
A modular bioinformatics toolkit in Rust for processing long-read sequencing data.
What is biolic?
biolic unifies the most common operations on sequencing data — statistics, filtering, conversion, sampling, search, and quality control — into a single fast, memory-efficient binary with first-class support for PacBio HiFi and Oxford Nanopore long reads.
Today, a bioinformatician working with long-read data must combine samtools, nanoq,
chopper, seqtk, rasusa, and seqkit, each with its own CLI, install method,
and quirks. biolic aims to replace them with one binary, one consistent CLI, and
streaming-first performance.
Status
Phase 1 in development. Currently implemented:
- biolic stats: full statistics with N50, N90, quality distribution
- biolic count: fast read and base counting
Coming next (see biolic_plan.md):
- biolic filter, biolic convert, biolic sample, biolic grep, biolic head, biolic tail
- biolic qc: adaptive QC with mixture models and anomaly detection (novel contribution)
- biolic logs: execution history with SQLite backend (novel contribution)
Installation
From source
From crates.io (once published)
From Bioconda (once published)
Usage
# Compute statistics
# JSON output for pipelines
# Fast counting
# Pipe through tools
|
Example output:
File: reads.fastq.gz
Reads: 1,234,567
Total bases: 9,876,543,210
Min length: 52
Max length: 87,432
Mean length: 8,001.2
Median length: 6,543.0
N50: 12,345
N90: 2,109
Mean quality: 28.54
Bases above Q10: 99.82%
Bases above Q20: 94.31%
Bases above Q30: 71.15%
GC content: 42.18%
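For readers unfamiliar with the N50 and N90 lines above, the following is a minimal sketch of the standard Nx definition: the largest length L such that reads of length at least L together contain at least x% of all bases. The function name `nx` and its signature are hypothetical and chosen for illustration; this is not taken from biolic's source.

```rust
/// Nx for a set of read lengths: the largest length L such that reads of
/// length >= L together hold at least `percent`% of all bases.
/// Returns None for empty input, zero total bases, or percent > 100.
/// (Hypothetical helper for illustration, not biolic's actual code.)
fn nx(lengths: &[u64], percent: u64) -> Option<u64> {
    let total: u64 = lengths.iter().sum();
    if total == 0 {
        return None;
    }

    // Sort longest first so the cumulative sum starts with the longest reads.
    let mut sorted = lengths.to_vec();
    sorted.sort_unstable_by(|a, b| b.cmp(a));

    // Compare cumulative / total >= percent / 100 using integers only,
    // widened to u128 so the multiplication cannot overflow.
    let mut cumulative: u128 = 0;
    for len in sorted {
        cumulative += len as u128;
        if cumulative * 100 >= total as u128 * percent as u128 {
            return Some(len);
        }
    }
    None
}
```

For lengths 2 through 10 (54 bases in total), `nx(&lengths, 50)` yields 8 and `nx(&lengths, 90)` yields 4. An exact Nx needs every read length, so this sketch keeps one integer per read in memory even when the sequences themselves are streamed.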
Design principles
- Streaming first: constant memory regardless of file size.
- Single binary: zero runtime dependencies, no Python, no Docker.
- Long-read native: N50, native BAM input (planned), per-position analysis.
- Modern UX: JSON output, automatic format detection, predictable CLI.
- Modular: each operation is an independent module, easy to add new ones.
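As an illustration of the streaming-first principle, here is a minimal sketch of a single-pass counter that keeps only running totals and one reusable line buffer, so memory use does not grow with file size. It assumes four-line FASTQ records (no wrapped sequences), Phred+33 quality encoding, and that "above Q20" means a Phred score of at least 20. The `Counts` struct and `count_fastq` function are hypothetical names, not biolic's API.

```rust
use std::io::{self, BufRead};

/// Running totals kept while streaming; nothing per-read is stored.
#[derive(Debug, Default, PartialEq)]
struct Counts {
    reads: u64,
    bases: u64,
    gc_bases: u64,
    /// Bases with Phred quality >= 20 (assumes Phred+33 encoding).
    q20_bases: u64,
}

/// Counts reads, bases, G/C bases, and Q20+ bases in one pass.
/// Assumes strict four-line FASTQ records (an assumption of this sketch).
fn count_fastq<R: BufRead>(mut reader: R) -> io::Result<Counts> {
    let mut counts = Counts::default();
    let mut line = String::new(); // one buffer, reused for every line
    let mut line_no: u64 = 0;

    loop {
        line.clear();
        if reader.read_line(&mut line)? == 0 {
            break; // end of input
        }
        let text = line.trim_end_matches(|c: char| c == '\n' || c == '\r');

        match line_no % 4 {
            0 => {
                if text.is_empty() {
                    continue; // tolerate blank lines between records
                }
                if !text.starts_with('@') {
                    return Err(io::Error::new(
                        io::ErrorKind::InvalidData,
                        "expected '@' at start of FASTQ record",
                    ));
                }
                counts.reads += 1;
            }
            1 => {
                counts.bases += text.len() as u64;
                counts.gc_bases += text
                    .bytes()
                    .filter(|&b| matches!(b, b'G' | b'C' | b'g' | b'c'))
                    .count() as u64;
            }
            2 => {} // '+' separator line, ignored
            _ => {
                // Phred+33: quality score = byte value - 33.
                counts.q20_bases +=
                    text.bytes().filter(|&b| b >= 33 + 20).count() as u64;
            }
        }
        line_no += 1;
    }

    if line_no % 4 != 0 {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "truncated FASTQ record",
        ));
    }
    Ok(counts)
}
```

Because the function accepts any `BufRead`, the same loop could sit behind a decompressing reader or standard input. Percentages such as GC content follow from the totals, for example `gc_bases as f64 / bases as f64 * 100.0`.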
Supported formats
| Format | Read | Write |
|---|---|---|
| FASTQ | ✓ | ✓ |
| FASTQ.gz (gzip) | ✓ | ✓ |
| FASTQ.bz2 (bzip2) | ✓ | — |
| FASTQ.xz / .zst | ✓ | — |
| FASTA | planned | planned |
| BAM (unaligned) | planned | planned |
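Automatic format detection for the compressed inputs in this table can be done by inspecting the first few bytes of a file rather than trusting its extension. The sketch below checks the published magic numbers for gzip, bzip2, xz, and Zstandard. The `Compression` enum and `sniff_compression` function are hypothetical names for illustration; how biolic actually detects formats is not stated here.

```rust
/// Compression container guessed from a file's leading bytes.
/// (Hypothetical type for illustration, not biolic's actual API.)
#[derive(Debug, PartialEq, Clone, Copy)]
enum Compression {
    Gzip,
    Bzip2,
    Xz,
    Zstd,
    Plain,
}

/// Guesses the compression format from the first bytes of a stream.
/// Anything that matches no known magic number is treated as plain text.
fn sniff_compression(head: &[u8]) -> Compression {
    // Magic numbers from the respective format specifications.
    const GZIP: &[u8] = &[0x1f, 0x8b];
    const BZIP2: &[u8] = b"BZh";
    const XZ: &[u8] = &[0xfd, b'7', b'z', b'X', b'Z', 0x00];
    const ZSTD: &[u8] = &[0x28, 0xb5, 0x2f, 0xfd];

    if head.starts_with(GZIP) {
        Compression::Gzip
    } else if head.starts_with(BZIP2) {
        Compression::Bzip2
    } else if head.starts_with(XZ) {
        Compression::Xz
    } else if head.starts_with(ZSTD) {
        Compression::Zstd
    } else {
        Compression::Plain
    }
}
```

Six bytes are enough to tell these four formats apart, so a reader can peek at the start of its buffer and then wrap the stream in the matching decoder. Note that BAM files are BGZF, which begins with the gzip magic number, so distinguishing BAM from FASTQ.gz would need a further check after decompression.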
Performance
biolic is designed to be:
- Comparable in speed to nanoq (current Rust reference for stats/filter)
- Faster than seqkit (avoids Go GC overhead)
- Dramatically faster than Python tools (NanoFilt, NanoStat)
- Lighter on memory than all of the above, thanks to streaming
Benchmarks against nanoq, chopper, seqkit, and seqtk will be published with the v0.1
release. See docs/benchmarks.md.
Project plan
The complete project plan is in biolic_plan.md. It contains:
- Vision, design philosophy, architecture
- Detailed specifications for every module
- All reference tools studied, with GitHub links
- Distribution strategy, testing strategy, performance targets
- Roadmap beyond Phase 1
Contributing
Contributions are welcome. Please open an issue before submitting major changes.
The project uses standard Rust tooling:
- cargo fmt before commits
- cargo clippy must pass
- cargo test must pass
License
Dual-licensed under MIT or Apache-2.0, at your option. See LICENSE-MIT and LICENSE-APACHE.
Citation
If you use biolic in research, please cite (forthcoming):
jpvich (2026). biolic: A modular bioinformatics toolkit in Rust.
[Software paper venue forthcoming]