bio-rs
Rust workspace for practical biological AI input tooling.
Status: v0.8.1 (workspace/package version in Cargo.toml)
This repository focuses on functionality that is already implemented and testable today:
- FASTA parsing (parse_fasta_records)
- protein-20 tokenization (tokenize_fasta_records)
- package manifest inspect/validate/bridge planning
- fixture verification (package verify)
What exists in v0.8.1
Core (biors-core)
biors-core is the engine crate. It contains data contracts and pure Rust logic:
- FASTA record parsing and normalization
- protein-20 tokenization and residue issue reporting
- package manifest structs + validation/inspection
- runtime bridge planning report generation
- fixture verification report generation
Use this crate when embedding bio-rs in Rust services, libraries, or tooling.
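To make the parsing and normalization step concrete, here is a minimal std-only sketch of FASTA record parsing. It is not the biors-core implementation: the FastaRecord type, the parse_fasta_sketch function, and the normalization rules (whitespace stripped, residues upper-cased) are assumptions made for illustration, and the real contracts may differ.

```rust
/// One FASTA record: the header text (without '>') and the normalized sequence.
/// Hypothetical type for illustration; not the biors-core data contract.
#[derive(Debug, PartialEq)]
pub struct FastaRecord {
    pub id: String,
    pub sequence: String,
}

/// Parse FASTA text into records.
/// Assumed normalization: blank lines skipped, whitespace removed,
/// residues converted to ASCII upper case.
pub fn parse_fasta_sketch(input: &str) -> Result<Vec<FastaRecord>, String> {
    let mut records: Vec<FastaRecord> = Vec::new();
    for (idx, raw) in input.lines().enumerate() {
        let line = raw.trim();
        if line.is_empty() {
            continue; // blank lines carry no data
        }
        if let Some(header) = line.strip_prefix('>') {
            // A header line starts a new record.
            records.push(FastaRecord {
                id: header.trim().to_string(),
                sequence: String::new(),
            });
        } else if let Some(current) = records.last_mut() {
            // Sequence lines are appended to the most recent record.
            current.sequence.extend(
                line.chars()
                    .filter(|c| !c.is_whitespace())
                    .map(|c| c.to_ascii_uppercase()),
            );
        } else {
            // Sequence data with no preceding header is rejected.
            return Err(format!(
                "line {}: sequence data before any '>' header",
                idx + 1
            ));
        }
    }
    Ok(records)
}
```

In this sketch, multi-line sequences are concatenated into a single string per record, and input that begins with sequence data instead of a header returns an error rather than a partial result.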
CLI (biors)
biors is the command-line surface built on top of biors-core.
- Reads FASTA/JSON files (or stdin for FASTA)
- Executes core workflows
- Emits machine-readable JSON output
- Uses non-zero exit codes on invalid operations
Use this crate when you need shell-first workflows, scripting, or CI checks.
Release history and roadmap
Delivered
- 0.6.0: package manifest inspect/validate
- 0.7.0: runtime bridge planning (package bridge)
- 0.8.0: fixture verification (package verify)
- 0.8.1: documentation, contribution guide, and benchmark baseline hardening
Next (post-0.8)
- 0.9.x target: expand fixtures and verification ergonomics (larger fixture sets, clearer failure diagnostics)
- 1.0.0 target: stable contracts and runtime-facing APIs after enough real-world package validation
0.7.0 capability notes are kept only as release history above; all "current" descriptions in this README are aligned to 0.8.1.
Quickstart
Inspect FASTA records:
Tokenize FASTA records:
Tokenize FASTA records from stdin:
Tokenize a multi-record FASTA file:
Inspect a portable model package manifest:
Validate a portable model package manifest:
Plan the portable runtime bridge for a package:
Verify package fixture observations:
Evidence and benchmarks
Performance claims should be backed by reproducible data in-repo.
- Benchmark guide and latest recorded result: benchmarks/fasta_vs_biopython.md
- Reproducible benchmark harness: scripts/benchmark_fasta_vs_biopython.py
The benchmark compares FASTA parse+tokenization throughput against a Biopython baseline using the UniProt human reference proteome (UP000005640 / taxonomy 9606).
On the latest recorded run, biors tokenize completed the FASTA parse + protein-20 tokenization + full JSON output path in 0.291s, while a Biopython parse + protein-20 token/count baseline took 0.494s.
This is a workload-specific baseline, not a broad claim that bio-rs is faster than Biopython across all FASTA parsing workloads.
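For readers who want the two recorded timings as a single ratio, the arithmetic can be sketched as below. The speedup and time_reduction helpers are hypothetical names introduced here, not part of the benchmark harness, and the only inputs are the two figures quoted above.

```rust
/// Ratio of baseline time to candidate time; values above 1.0 mean the
/// candidate finished faster. Hypothetical helper, not part of the harness.
pub fn speedup(baseline_secs: f64, candidate_secs: f64) -> f64 {
    baseline_secs / candidate_secs
}

/// Fraction of wall-clock time saved relative to the baseline.
pub fn time_reduction(baseline_secs: f64, candidate_secs: f64) -> f64 {
    1.0 - candidate_secs / baseline_secs
}
```

With the recorded values (baseline 0.494s, biors 0.291s) this gives a ratio of roughly 1.70x, or about 41% less wall-clock time, for this one workload and run.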
JSON contracts
tokenize always emits an array of records:
inspect always emits a summary object:
package validate always emits a validation report:
package bridge always emits a runtime bridge report:
package verify always emits a fixture verification report:
Development checks
The check suite runs cargo fmt, native Rust checks, a biors-core wasm32-unknown-unknown build check, tests, and cargo clippy with warnings denied.
Run the Rust library example:
Workspace
packages/
  rust/
    biors/          CLI
    biors-core/     Core engine + contracts
examples/
  multi.fasta
  protein-package/
    fixtures/
      observations.json
  protein.fasta
Protein-20 alphabet
A C D E F G H I K L M N P Q R S T V W Y
Token ids follow that order, starting at 0.
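The ordering rule above can be sketched in a few lines of Rust. This is an illustration, not the biors-core tokenizer: token_id and tokenize_sketch are hypothetical names, and collecting unknown residues as (position, residue) pairs is an assumption about what residue issue reporting might look like.

```rust
/// The protein-20 alphabet in token-id order (index == token id).
const PROTEIN_20: &[u8; 20] = b"ACDEFGHIKLMNPQRSTVWY";

/// Token id of a single residue, or None if it is outside the alphabet.
/// Matching is case-sensitive; input is assumed to be upper-cased already.
pub fn token_id(residue: char) -> Option<u8> {
    PROTEIN_20
        .iter()
        .position(|&b| b as char == residue)
        .map(|i| i as u8)
}

/// Tokenize a sequence into ids and collect residues that have no id.
/// The (position, residue) issue format is an assumption for illustration.
pub fn tokenize_sketch(sequence: &str) -> (Vec<u8>, Vec<(usize, char)>) {
    let mut tokens = Vec::new();
    let mut issues = Vec::new();
    for (pos, residue) in sequence.chars().enumerate() {
        match token_id(residue) {
            Some(id) => tokens.push(id),
            None => issues.push((pos, residue)),
        }
    }
    (tokens, issues)
}
```

Under this mapping A is 0, C is 1, and Y is 19. A residue such as X has no id and is reported as an issue in the sketch instead of being assigned a token.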
Contributing
See CONTRIBUTING.md for local setup, checks, and PR expectations.
License
Dual licensed under MIT OR Apache-2.0.