biolic

A modular bioinformatics toolkit in Rust for processing long-read sequencing data.

What is biolic?

biolic unifies the most common operations on sequencing data — statistics, filtering, conversion, sampling, search, and quality control — into a single fast, memory-efficient binary with first-class support for PacBio HiFi and Oxford Nanopore long reads.

Today, a bioinformatician working with long-read data must combine samtools, nanoq, chopper, seqtk, rasusa, and seqkit, each with its own CLI, install method, and quirks. biolic aims to replace all of them with one binary, one consistent CLI, and streaming-first performance.

Status

Phase 1 in development. Currently implemented:

biolic stats — full statistics with N50, N90, quality distribution
biolic count — fast read and base counting
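
For readers unfamiliar with the metric, N50 is the length L such that reads of length L or longer contain at least half of all bases; N90 uses 90% instead. The sketch below shows the standard definition only. The function name nx is hypothetical and this is not necessarily how biolic stats computes it internally.

```rust
/// Returns the Nx length: walking reads from longest to shortest, the length
/// of the read at which the running base total first reaches `percent`% of
/// all bases. Returns None for empty input or a percent outside 1..=100.
/// NOTE: illustrative helper, not part of biolic's actual API.
fn nx(lengths: &[u64], percent: u64) -> Option<u64> {
    if lengths.is_empty() || percent == 0 || percent > 100 {
        return None;
    }

    // Sort a copy, longest read first.
    let mut sorted = lengths.to_vec();
    sorted.sort_unstable_by(|a, b| b.cmp(a));

    // Use u128 so the multiplication below cannot overflow on large datasets.
    let total: u128 = sorted.iter().map(|&l| u128::from(l)).sum();
    let mut running: u128 = 0;

    for &len in &sorted {
        running += u128::from(len);
        // Integer form of running / total >= percent / 100,
        // which avoids floating-point rounding at the boundary.
        if running * 100 >= total * u128::from(percent) {
            return Some(len);
        }
    }
    None
}
```

With read lengths 2 through 10 (54 bases in total), nx(&lengths, 50) yields 8 and nx(&lengths, 90) yields 4. Note that this approach needs every read length in memory (8 bytes per read), which is small compared with the reads themselves.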

Coming next (see biolic_plan.md):

biolic filter, biolic convert, biolic sample, biolic grep, biolic head, biolic tail
biolic qc — adaptive QC with mixture models and anomaly detection (novel contribution)
biolic logs — execution history with SQLite backend (novel contribution)

Installation

From source

git clone https://github.com/jpvich/biolic
cd biolic
cargo build --release
./target/release/biolic --help

From crates.io (once published)

cargo install biolic

From Bioconda (once published)

conda install -c bioconda biolic

Usage

# Compute statistics
biolic stats reads.fastq.gz

# JSON output for pipelines
biolic stats reads.fastq.gz --json

# Fast counting
biolic count reads.fastq.gz

# Read from stdin in a pipeline
cat reads.fastq | biolic stats

Example output:

File:              reads.fastq.gz
Reads:             1,234,567
Total bases:       9,876,543,210
Min length:        52
Max length:        87,432
Mean length:       8,001.2
Median length:     6,543.0
N50:               12,345
N90:               2,109
Mean quality:      28.54
Bases above Q10:   99.82%
Bases above Q20:   94.31%
Bases above Q30:   71.15%
GC content:        42.18%
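
The quality and GC lines in this report can be derived from the raw FASTQ bytes. The following is a minimal sketch under stated assumptions: Phred+33 encoding, "above Qn" meaning strictly greater than n, and a mean quality computed by averaging error probabilities rather than raw scores. The README does not state which conventions biolic uses, and all function names here are hypothetical.

```rust
/// Percentage of bases whose Phred score is strictly above `threshold`.
/// ASSUMPTION: Phred+33 encoding and a strict comparison.
fn percent_above_q(qual: &[u8], threshold: u8) -> f64 {
    if qual.is_empty() {
        return 0.0;
    }
    let n = qual
        .iter()
        .filter(|&&c| c.saturating_sub(33) > threshold)
        .count();
    100.0 * n as f64 / qual.len() as f64
}

/// Mean quality computed in error-probability space: convert each score to
/// p = 10^(-Q/10), average the probabilities, and convert back to Phred.
/// ASSUMPTION: this is one common convention; averaging raw scores is another.
fn mean_quality(qual: &[u8]) -> Option<f64> {
    if qual.is_empty() {
        return None;
    }
    let sum_err: f64 = qual
        .iter()
        .map(|&c| 10f64.powf(-f64::from(c.saturating_sub(33)) / 10.0))
        .sum();
    Some(-10.0 * (sum_err / qual.len() as f64).log10())
}

/// Percentage of G and C bases (case-insensitive) among all bases.
fn gc_percent(seq: &[u8]) -> f64 {
    if seq.is_empty() {
        return 0.0;
    }
    let gc = seq
        .iter()
        .filter(|&&b| matches!(b, b'G' | b'C' | b'g' | b'c'))
        .count();
    100.0 * gc as f64 / seq.len() as f64
}
```

In a streaming tool these would be kept as running counters updated per record, so no quality string needs to stay in memory after it is processed.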

Design principles

Streaming first: constant memory regardless of file size.
Single binary: zero runtime dependencies, no Python, no Docker.
Long-read native: N50/N90 statistics, native BAM input (planned), per-position analysis.
Modern UX: JSON output, automatic format detection, predictable CLI.
Modular: each operation is an independent module, easy to add new ones.
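
To illustrate the streaming-first principle, here is a minimal sketch of counting reads and bases while holding only one line in memory at a time. It assumes strict four-line FASTQ records (no wrapped sequences) and uses hypothetical names (Counts, count_fastq); it is not biolic's actual implementation.

```rust
use std::io::{self, BufRead};

/// Running totals for a FASTQ stream. Illustrative type, not biolic's API.
#[derive(Debug, Default, PartialEq)]
struct Counts {
    reads: u64,
    bases: u64,
}

fn invalid(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg.to_string())
}

/// Reads the next line into `buf`, failing if the stream ends mid-record.
fn read_required<R: BufRead>(reader: &mut R, buf: &mut String, what: &str) -> io::Result<()> {
    buf.clear();
    if reader.read_line(buf)? == 0 {
        return Err(invalid(&format!("truncated record: missing {}", what)));
    }
    Ok(())
}

/// Counts reads and bases from any buffered reader (file, stdin, decoder).
/// ASSUMPTION: four-line FASTQ records. Memory use is bounded by the longest
/// line, not by the file size, because a single buffer is reused throughout.
fn count_fastq<R: BufRead>(mut reader: R) -> io::Result<Counts> {
    let mut counts = Counts::default();
    let mut line = String::new();

    loop {
        // Header line; a clean end of input is only allowed here.
        line.clear();
        if reader.read_line(&mut line)? == 0 {
            break;
        }
        if line.trim_end().is_empty() {
            continue; // tolerate blank lines between records
        }
        if !line.starts_with('@') {
            return Err(invalid("record header must start with '@'"));
        }

        read_required(&mut reader, &mut line, "sequence")?;
        let seq_len = line.trim_end().len() as u64;

        read_required(&mut reader, &mut line, "separator")?;
        if !line.starts_with('+') {
            return Err(invalid("separator line must start with '+'"));
        }

        read_required(&mut reader, &mut line, "quality")?;
        if line.trim_end().len() as u64 != seq_len {
            return Err(invalid("sequence and quality lengths differ"));
        }

        counts.reads += 1;
        counts.bases += seq_len;
    }
    Ok(counts)
}
```

Because the function accepts any BufRead, the same code path serves a file on disk and piped input, for example count_fastq(std::io::stdin().lock()).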

Supported formats

Format	Read	Write
FASTQ	✓	✓
FASTQ.gz (gzip)	✓	✓
FASTQ.bz2 (bzip2)	✓	—
FASTQ.xz / FASTQ.zst (xz, zstd)	✓	—
FASTA	planned	planned
BAM (unaligned)	planned	planned
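
Compressed inputs like those in the table can be told apart by their leading magic bytes rather than by file extension, which also works for piped input. The sketch below shows that general technique using the documented signatures of gzip, bzip2, xz, and zstd. The Compression enum and detect_compression function are hypothetical names, and biolic's own detection logic may differ.

```rust
/// Compression container inferred from the first bytes of a stream.
/// Illustrative type, not part of biolic's actual API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Compression {
    Gzip,
    Bzip2,
    Xz,
    Zstd,
    Plain,
}

/// Inspects the leading bytes of a stream and returns the matching container.
/// Anything that matches no known signature is treated as uncompressed.
fn detect_compression(head: &[u8]) -> Compression {
    const GZIP: &[u8] = &[0x1f, 0x8b];
    const BZIP2: &[u8] = b"BZh";
    const XZ: &[u8] = &[0xfd, b'7', b'z', b'X', b'Z', 0x00];
    const ZSTD: &[u8] = &[0x28, 0xb5, 0x2f, 0xfd];

    if head.starts_with(GZIP) {
        Compression::Gzip
    } else if head.starts_with(BZIP2) {
        Compression::Bzip2
    } else if head.starts_with(XZ) {
        Compression::Xz
    } else if head.starts_with(ZSTD) {
        Compression::Zstd
    } else {
        Compression::Plain
    }
}
```

A caller would typically peek at the first six bytes of the input (for example via BufRead::fill_buf, which does not consume them) and then wrap the reader in the matching decoder. BGZF-compressed BAM files also begin with the gzip signature, so distinguishing BAM from FASTQ.gz requires inspecting the decompressed content as well.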

Performance

biolic is designed to be:

Comparable in speed to nanoq (current Rust reference for stats/filter)
Faster than seqkit (avoids Go GC overhead)
Dramatically faster than Python tools (NanoFilt, NanoStat)
Lower memory than all of the above through streaming

Benchmarks against nanoq, chopper, seqkit, and seqtk will be published with the v0.1 release. See docs/benchmarks.md.

Project plan

The complete project plan is in biolic_plan.md. It contains:

Vision, design philosophy, architecture
Detailed specifications for every module
All reference tools studied, with GitHub links
Distribution strategy, testing strategy, performance targets
Roadmap beyond Phase 1

Contributing

Contributions are welcome. Please open an issue before submitting major changes.

The project uses standard Rust tooling:

cargo fmt before commits
cargo clippy must pass
cargo test must pass

License

Dual-licensed under MIT or Apache-2.0, at your option. See LICENSE-MIT and LICENSE-APACHE.

Citation

If you use biolic in research, please cite (forthcoming):

jpvich (2026). biolic: A modular bioinformatics toolkit in Rust.
[Software paper venue forthcoming]

biolic 0.1.0