biolic 0.1.0

biolic

A modular bioinformatics toolkit in Rust for processing long-read sequencing data.

License: MIT/Apache-2.0

What is biolic?

biolic aims to unify the most common operations on sequencing data (statistics, filtering, conversion, sampling, search, and quality control) in a single fast, memory-efficient binary with first-class support for PacBio HiFi and Oxford Nanopore long reads.

Today, a bioinformatician working with long-read data must combine samtools, nanoq, chopper, seqtk, rasusa, and seqkit, each with its own CLI, install method, and quirks. biolic aims to replace that patchwork with one binary, one consistent CLI, and streaming-first performance. Only the commands listed under Status are available so far.

Status

Phase 1 in development. Currently implemented:

  • biolic stats — full statistics with N50, N90, quality distribution
  • biolic count — fast read and base counting

Coming next (see biolic_plan.md):

  • biolic filter, biolic convert, biolic sample, biolic grep, biolic head, biolic tail
  • biolic qc — adaptive QC with mixture models and anomaly detection (novel contribution)
  • biolic logs — execution history with SQLite backend (novel contribution)

Installation

From source

git clone https://github.com/jpvich/biolic
cd biolic
cargo build --release
./target/release/biolic --help

From crates.io (once published)

cargo install biolic

From Bioconda (once published)

conda install -c bioconda biolic

Usage

# Compute statistics
biolic stats reads.fastq.gz

# JSON output for pipelines
biolic stats reads.fastq.gz --json

# Fast counting
biolic count reads.fastq.gz

# Pipe through tools
cat reads.fastq | biolic stats

Example output:

File:              reads.fastq.gz
Reads:             1,234,567
Total bases:       9,876,543,210
Min length:        52
Max length:        87,432
Mean length:       8,001.2
Median length:     6,543.0
N50:               12,345
N90:               2,109
Mean quality:      28.54
Bases above Q10:   99.82%
Bases above Q20:   94.31%
Bases above Q30:   71.15%
GC content:        42.18%
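
The N50 and N90 values above follow the usual definition: the Nx length is the largest length L such that reads of length L or longer together contain at least x% of all bases. The sketch below illustrates that definition only. The function name `nx` is hypothetical and this is not necessarily how biolic computes the values internally. Note that it sorts a copy of all read lengths, so its memory grows with the number of reads; a length histogram would be one way to bound that.

```rust
/// Returns the Nx length for a set of read lengths, or None if there are no reads.
/// `x` is a percentage, e.g. 50 for N50 or 90 for N90.
/// Hypothetical helper for illustration; not taken from the biolic source.
fn nx(lengths: &[u64], x: u64) -> Option<u64> {
    if lengths.is_empty() {
        return None;
    }

    // Sort a copy of the lengths, longest read first.
    let mut sorted = lengths.to_vec();
    sorted.sort_unstable_by(|a, b| b.cmp(a));

    // Use u128 so that `running * 100` cannot overflow on large datasets.
    let total: u128 = sorted.iter().map(|&len| len as u128).sum();
    let mut running: u128 = 0;

    for &len in &sorted {
        running += len as u128;
        // Integer form of: running / total >= x / 100.
        if running * 100 >= total * x as u128 {
            return Some(len);
        }
    }

    // Only reachable if x > 100; fall back to the shortest read.
    sorted.last().copied()
}
```

For read lengths 2, 3, 4, 5, and 6 (20 bases in total), `nx(&lengths, 50)` returns `Some(5)` because the two longest reads hold 11 of the 20 bases, and `nx(&lengths, 90)` returns `Some(3)`.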

Design principles

  1. Streaming first: constant memory regardless of file size.
  2. Single binary: zero runtime dependencies, no Python, no Docker.
  3. Long-read native: N50, native BAM input, per-position analysis.
  4. Modern UX: JSON output, automatic format detection, predictable CLI.
  5. Modular: each operation is an independent module, easy to add new ones.
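
To make the streaming-first principle concrete, here is a minimal sketch of counting reads, bases, and GC bases from any buffered reader while reusing a single line buffer. It assumes strict four-line FASTQ records (no wrapped sequences). The names `Counts` and `count_fastq` are hypothetical and are not biolic's actual API.

```rust
use std::io::{self, BufRead};

/// Running totals; the struct stays the same size no matter how large the input is.
#[derive(Debug, Default, PartialEq)]
struct Counts {
    reads: u64,
    bases: u64,
    gc: u64,
}

/// Counts reads, bases, and G/C bases in four-line FASTQ input.
/// Hypothetical sketch: assumes each record is exactly header, sequence, '+', qualities.
fn count_fastq<R: BufRead>(mut reader: R) -> io::Result<Counts> {
    let mut counts = Counts::default();
    let mut buf: Vec<u8> = Vec::new(); // reused for every line
    let mut line_no: u64 = 0;

    loop {
        buf.clear();
        if reader.read_until(b'\n', &mut buf)? == 0 {
            break; // end of input
        }

        // Line 1 of each group of four (0-based) is the sequence line.
        if line_no % 4 == 1 {
            // Strip the trailing "\n" or "\r\n".
            while buf.ends_with(b"\n") || buf.ends_with(b"\r") {
                buf.pop();
            }
            counts.reads += 1;
            counts.bases += buf.len() as u64;
            counts.gc += buf
                .iter()
                .filter(|&&b| matches!(b, b'G' | b'C' | b'g' | b'c'))
                .count() as u64;
        }
        line_no += 1;
    }

    Ok(counts)
}
```

Because the function is generic over `BufRead`, the same code could be fed from a locked `std::io::stdin()` for piped input or from a decompressing reader wrapped in `std::io::BufReader`. Memory use is bounded by the longest single line rather than by file size.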

Supported formats

Format               Read      Write
FASTQ
FASTQ.gz (gzip)
FASTQ.bz2 (bzip2)
FASTQ.xz / .zst
FASTA                planned   planned
BAM (unaligned)      planned   planned
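
One common way to implement automatic format detection for the compressed variants listed above is to inspect the first few bytes of the input for each format's magic number, instead of trusting the file extension. The sketch below assumes that approach; this README does not state how biolic actually detects formats, and the names `Compression` and `sniff_compression` are hypothetical.

```rust
/// Compression container guessed from the leading bytes of the input.
/// Hypothetical type for illustration; not taken from the biolic source.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Compression {
    Gzip,
    Bzip2,
    Xz,
    Zstd,
    Plain,
}

/// Guesses the compression format from the first bytes of a file or stream.
/// Anything that matches no known magic number is treated as uncompressed.
fn sniff_compression(head: &[u8]) -> Compression {
    if head.starts_with(&[0x1f, 0x8b]) {
        // gzip member header (BGZF-compressed BAM also starts this way)
        Compression::Gzip
    } else if head.starts_with(b"BZh") {
        // bzip2 stream header
        Compression::Bzip2
    } else if head.starts_with(&[0xfd, b'7', b'z', b'X', b'Z', 0x00]) {
        // xz stream header
        Compression::Xz
    } else if head.starts_with(&[0x28, 0xb5, 0x2f, 0xfd]) {
        // Zstandard frame magic number, stored little-endian
        Compression::Zstd
    } else {
        Compression::Plain
    }
}
```

Since BAM files are BGZF-compressed and BGZF is a gzip variant, a gzip match alone cannot distinguish FASTQ.gz from BAM. A real implementation would need a second check on the decompressed bytes, for example a leading `@` for FASTQ or `>` for FASTA.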

Performance

biolic is designed to be:

  • Comparable in speed to nanoq (current Rust reference for stats/filter)
  • Faster than seqkit (avoids Go GC overhead)
  • Dramatically faster than Python tools (NanoFilt, NanoStat)
  • Lower memory than all of the above through streaming

Benchmarks against nanoq, chopper, seqkit, and seqtk will be published with the v0.1 release. See docs/benchmarks.md.

Project plan

The complete project plan is in biolic_plan.md. It contains:

  • Vision, design philosophy, architecture
  • Detailed specifications for every module
  • All reference tools studied, with GitHub links
  • Distribution strategy, testing strategy, performance targets
  • Roadmap beyond Phase 1

Contributing

Contributions are welcome. Please open an issue before submitting major changes.

The project uses standard Rust tooling:

  • cargo fmt before commits
  • cargo clippy must pass
  • cargo test must pass

License

Dual-licensed under MIT or Apache-2.0, at your option. See LICENSE-MIT and LICENSE-APACHE.

Citation

If you use biolic in research, please cite (forthcoming):

jpvich (2026). biolic: A modular bioinformatics toolkit in Rust.
[Software paper venue forthcoming]