# biolic

A modular bioinformatics toolkit in Rust for processing long-read sequencing data.

[![CI](https://github.com/jpvich/biolic/actions/workflows/ci.yml/badge.svg)](https://github.com/jpvich/biolic/actions/workflows/ci.yml)
[![License: MIT/Apache-2.0](https://img.shields.io/badge/license-MIT%2FApache--2.0-blue.svg)](LICENSE-MIT)

## What is biolic?

biolic unifies the most common operations on sequencing data — statistics, filtering,
conversion, sampling, search, and quality control — into a single fast, memory-efficient
binary with first-class support for PacBio HiFi and Oxford Nanopore long reads.

Today, a bioinformatician working with long-read data typically combines `samtools`, `nanoq`,
`chopper`, `seqtk`, `rasusa`, and `seqkit`, each with its own CLI, install method,
and quirks. biolic aims to replace that toolchain with one binary, one consistent CLI, and
streaming-first performance.

## Status

**Phase 1 in development.** Currently implemented:
- `biolic stats` — full statistics with N50, N90, quality distribution
- `biolic count` — fast read and base counting

Coming next (see `biolic_plan.md`):
- `biolic filter`, `biolic convert`, `biolic sample`, `biolic grep`, `biolic head`, `biolic tail`
- `biolic qc` — adaptive QC with mixture models and anomaly detection (novel contribution)
- `biolic logs` — execution history with SQLite backend (novel contribution)

## Installation

### From source

```bash
git clone https://github.com/jpvich/biolic
cd biolic
cargo build --release
./target/release/biolic --help
```

### From crates.io (once published)

```bash
cargo install biolic
```

### From Bioconda (once published)

```bash
conda install -c bioconda biolic
```

## Usage

```bash
# Compute statistics
biolic stats reads.fastq.gz

# JSON output for pipelines
biolic stats reads.fastq.gz --json

# Fast counting
biolic count reads.fastq.gz

# Read from stdin in a pipeline
cat reads.fastq | biolic stats
```

Example output:

```
File:              reads.fastq.gz
Reads:             1,234,567
Total bases:       9,876,543,210
Min length:        52
Max length:        87,432
Mean length:       8,001.2
Median length:     6,543.0
N50:               12,345
N90:               2,109
Mean quality:      28.54
Bases above Q10:   99.82%
Bases above Q20:   94.31%
Bases above Q30:   71.15%
GC content:        42.18%
```
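
The N50 and N90 lines follow the usual definition: Nx is the largest length L such that reads of length L or longer together hold at least x% of all bases. The sketch below illustrates that definition only; the function name `nx` is hypothetical and this is not taken from biolic's source. It keeps every read length in memory (8 bytes per read) so that it can sort them.

```rust
/// Returns the Nx value for a set of read lengths: the largest length L such
/// that reads of length >= L together contain at least `fraction` of all bases.
/// Use `fraction = 0.5` for N50 and `0.9` for N90. Returns `None` for empty input.
/// Hypothetical helper for illustration, not part of biolic's API.
fn nx(lengths: &[u64], fraction: f64) -> Option<u64> {
    if lengths.is_empty() {
        return None;
    }
    // Sort a copy from longest to shortest.
    let mut sorted = lengths.to_vec();
    sorted.sort_unstable_by(|a, b| b.cmp(a));

    let total: u64 = sorted.iter().sum();
    let threshold = total as f64 * fraction;

    // Walk down the sorted lengths until the running sum reaches the threshold.
    let mut cumulative = 0u64;
    for len in sorted {
        cumulative += len;
        if cumulative as f64 >= threshold {
            return Some(len);
        }
    }
    None
}
```

For read lengths 80, 70, 50, 40, 30, and 20 (290 bases in total), `nx(&lengths, 0.5)` returns `Some(70)` because the two longest reads already cover 150 of the 290 bases, and `nx(&lengths, 0.9)` returns `Some(30)`.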

## Design principles

1. **Streaming first**: constant memory regardless of file size.
2. **Single binary**: zero runtime dependencies, no Python, no Docker.
3. **Long-read native**: N50, native BAM input (planned), per-position analysis.
4. **Modern UX**: JSON output, automatic format detection, predictable CLI.
5. **Modular**: each operation is an independent module, so new ones are easy to add.
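
As a rough illustration of the streaming-first principle, the sketch below counts reads and bases from any buffered reader while reusing a single line buffer, so memory use does not grow with input size. It assumes strict four-line FASTQ records (no wrapped sequence lines), and the function name `count_fastq` is hypothetical rather than biolic's actual API.

```rust
use std::io::{self, BufRead};

/// Counts reads and bases in four-line FASTQ input.
/// Only one line is held in memory at a time, regardless of input size.
/// Hypothetical helper for illustration, not part of biolic's API.
fn count_fastq<R: BufRead>(mut reader: R) -> io::Result<(u64, u64)> {
    let mut line = String::new(); // reused for every line
    let mut reads = 0u64;
    let mut bases = 0u64;
    let mut line_no = 0u64;

    loop {
        line.clear();
        if reader.read_line(&mut line)? == 0 {
            break; // end of input
        }
        // In a four-line record, the second line (index 1 mod 4) is the sequence.
        if line_no % 4 == 1 {
            reads += 1;
            // trim_end drops the trailing "\n" or "\r\n" before counting bases.
            bases += line.trim_end().len() as u64;
        }
        line_no += 1;
    }
    Ok((reads, bases))
}
```

The same function works on piped input by passing `io::stdin().lock()`, or on a file by wrapping it in `std::io::BufReader`.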

## Supported formats

| Format | Read | Write |
|---|---|---|
| FASTQ | ✓ | ✓ |
| FASTQ.gz (gzip) | ✓ | ✓ |
| FASTQ.bz2 (bzip2) | ✓ | — |
| FASTQ.xz / .zst | ✓ | — |
| FASTA | planned | planned |
| BAM (unaligned) | planned | planned |

## Performance

biolic is designed to be:
- Comparable in speed to nanoq (current Rust reference for stats/filter)
- Faster than seqkit (avoids Go GC overhead)
- Dramatically faster than Python tools (NanoFilt, NanoStat)
- Lower in memory use than all of the above, thanks to streaming

Benchmarks against `nanoq`, `chopper`, `seqkit`, and `seqtk` will be published with the v0.1
release. See `docs/benchmarks.md`.

## Project plan

The complete project plan is in [`biolic_plan.md`](./biolic_plan.md). It contains:
- Vision, design philosophy, architecture
- Detailed specifications for every module
- All reference tools studied, with GitHub links
- Distribution strategy, testing strategy, performance targets
- Roadmap beyond Phase 1

## Contributing

Contributions are welcome. Please open an issue before submitting major changes.

The project uses standard Rust tooling:
- `cargo fmt` before commits
- `cargo clippy` must pass
- `cargo test` must pass

## License

Dual-licensed under MIT or Apache-2.0, at your option. See [LICENSE-MIT](LICENSE-MIT)
and [LICENSE-APACHE](LICENSE-APACHE).

## Citation

If you use biolic in research, please cite (forthcoming):

```
jpvich (2026). biolic: A modular bioinformatics toolkit in Rust.
[Software paper venue forthcoming]
```