# bio-rs
[](https://github.com/bio-rs/bio-rs/actions)
[](https://github.com/bio-rs/bio-rs/actions/workflows/release.yml)
[](LICENSE-MIT)
Rust workspace for practical biological AI input tooling.
> Status: **v0.9.0** (workspace/package version in `Cargo.toml`)
This repository focuses on functionality that is already implemented and testable today:
- FASTA parsing (`parse_fasta_records`)
- protein-20 tokenization (`tokenize_fasta_records`)
- FASTA validation (`biors fasta validate`)
- model-ready input shaping (`ModelInput`)
- package manifest inspect/validate/bridge planning
- fixture verification (`package verify`)
- frozen CLI JSON success/error envelope candidates
## What exists in v0.9.0
### Core (`biors-core`)
`biors-core` is the engine crate. It contains data contracts and pure Rust logic:
- FASTA record parsing and normalization
- FASTA validation with line and record-index diagnostics
- protein-20 tokenization and residue issue reporting
- model-ready input records with attention masks and padding/truncation policy
- package manifest structs + validation/inspection
- runtime bridge planning report generation
- fixture verification report generation
Use this crate when embedding bio-rs in Rust services, libraries, or tooling.
### CLI (`biors`)
`biors` is the command-line surface built on top of `biors-core`.
- Reads FASTA/JSON files (or stdin for FASTA)
- Executes core workflows
- Emits machine-readable JSON success envelopes
- Supports JSON error mode with structured error codes
- Uses non-zero exit codes on invalid operations
Use this crate when you need shell-first workflows, scripting, or CI checks.
## Release history and roadmap
### Delivered
- `0.6.0`: package manifest inspect/validate
- `0.7.0`: runtime bridge planning (`package bridge`)
- `0.8.0`: fixture verification (`package verify`)
- `0.8.1`: documentation, contribution guide, and benchmark baseline hardening
- `0.9.0`: CLI and JSON contract freeze candidates
### Next (post-0.9)
- `1.0.0` target: stable contracts and runtime-facing APIs after enough real-world package validation
`0.7.0` capability notes are kept only as release history above; all "current" descriptions in this README are aligned to **0.9.0**.
### Not yet
These are roadmap directions, not current capabilities:
- hosted web workflows
- Python bindings
- model inference backends
- package registry or plugin ecosystem
- general-purpose chemistry or structure tooling
## Quickstart
Inspect FASTA records:
```bash
cargo run -p biors -- inspect examples/protein.fasta
```
Tokenize FASTA records:
```bash
cargo run -p biors -- tokenize examples/protein.fasta
```
Tokenize FASTA records from stdin:
```bash
Tokenize a multi-record FASTA file:
```bash
cargo run -p biors -- tokenize examples/multi.fasta
```
Validate FASTA records:
```bash
cargo run -p biors -- fasta validate examples/protein.fasta
```
Emit structured JSON errors:
```bash
printf 'ACDE\n' | cargo run -p biors -- --json tokenize -
```
Inspect a portable model package manifest:
```bash
cargo run -p biors -- package inspect examples/protein-package/manifest.json
```
Validate a portable model package manifest:
```bash
cargo run -p biors -- package validate examples/protein-package/manifest.json
```
Plan the portable runtime bridge for a package:
```bash
cargo run -p biors -- package bridge examples/protein-package/manifest.json
```
Verify package fixture observations:
```bash
cargo run -p biors -- package verify \
examples/protein-package/manifest.json \
examples/protein-package/observations.json
```
## Proof asset
This is the smallest reproducible package verification example in the repository.
Command:
```bash
cargo run -p biors -- package verify \
examples/protein-package/manifest.json \
examples/protein-package/observations.json
```
Input:
- package manifest: `examples/protein-package/manifest.json`
- observed fixture map: `examples/protein-package/observations.json`
- expected output fixture: `examples/protein-package/fixtures/tiny.output.json`
Output shape:
```json
{
"ok": true,
"biors_version": "0.9.0",
"data": {
"package": "protein-seed",
"fixtures": 1,
"passed": 1,
"failed": 0,
"results": [
{
"name": "tiny-protein",
"input": "fixtures/tiny.fasta",
"expected_output": "fixtures/tiny.output.json",
"observed_output": "fixtures/tiny.output.json",
"status": "passed",
"issue": null
}
]
}
}
```
This proves that a portable package manifest can point to fixture inputs and
expected JSON outputs, and that `biors` can check observed outputs against that
contract. It is a small contract test, not a model inference benchmark.
## Evidence and benchmarks
Performance claims should be backed by reproducible data in-repo.
- Benchmark guide and latest recorded result: `benchmarks/fasta_vs_biopython.md`
- Reproducible benchmark harness: `scripts/benchmark_fasta_vs_biopython.py`
The benchmark compares FASTA parse+tokenization throughput against a
Biopython baseline using the UniProt human reference proteome
(UP000005640 / taxonomy 9606).
On the latest recorded run, `biors tokenize` completed the FASTA parse +
protein-20 tokenization + full JSON output path in 0.291s, while a Biopython
parse + protein-20 token/count baseline took 0.494s.
This is a workload-specific baseline, not a broad claim that bio-rs is faster
than Biopython across all FASTA parsing workloads.
## JSON contracts
CLI success output always uses the success envelope:
```json
{
"ok": true,
"biors_version": "0.9.0",
"input_hash": "fnv1a64:846a502e5067bc21",
"data": {}
}
```
`--json` error mode always emits:
```json
{
"ok": false,
"error": {
"code": "fasta.missing_header",
"message": "FASTA input must start with a header line beginning with '>' at line 1",
"location": {
"line": 1,
"record_index": null
}
}
}
```
`tokenize` data is an array of records:
```json
[
{
"id": "seq1",
"length": 4,
"alphabet": "protein-20",
"valid": true,
"tokens": [0, 1, 2, 3],
"warnings": [],
"errors": []
}
]
```
`inspect` data is a summary object:
```json
{
"records": 1,
"total_length": 4,
"valid_records": 1,
"warning_count": 0,
"error_count": 0
}
```
`package validate` data is a validation report:
```json
{
"valid": true,
"issues": []
}
```
`package bridge` data is a runtime bridge report:
```json
{
"ready": true,
"backend": "onnx-webgpu",
"target": "browser-wasm-webgpu",
"execution_provider": "webgpu",
"blocking_issues": []
}
```
`package verify` data is a fixture verification report:
```json
{
"package": "protein-seed",
"fixtures": 1,
"passed": 1,
"failed": 0,
"results": [
{
"name": "tiny-protein",
"input": "fixtures/tiny.fasta",
"expected_output": "fixtures/tiny.output.json",
"observed_output": "fixtures/tiny.output.json",
"status": "passed",
"issue": null
}
]
}
```
## Development checks
```bash
scripts/check.sh
```
The check suite runs `cargo fmt`, native Rust checks, a `biors-core`
`wasm32-unknown-unknown` build check, tests, and `cargo clippy` with warnings
denied.
Run the Rust library example:
```bash
cargo run -p biors-core --example tokenize
```
## Workspace
```txt
packages/
rust/
biors/ CLI
biors-core/ Core engine + contracts
schemas/
cli-error.v0.json
cli-success.v0.json
inspect-output.v0.json
package-manifest.v0.json
package-validation-report.v0.json
tokenize-output.v0.json
examples/
multi.fasta
protein-package/
fixtures/
observations.json
protein.fasta
```
## Protein-20 alphabet
```txt
A C D E F G H I K L M N P Q R S T V W Y
```
Token ids follow that order, starting at `0`.
## Contributing
See [`CONTRIBUTING.md`](CONTRIBUTING.md) for local setup, checks, and PR expectations.
## Public contracts
- CLI contract: [`docs/cli-contract.md`](docs/cli-contract.md)
- Error code registry: [`docs/error-codes.md`](docs/error-codes.md)
- 1.0 candidates: [`docs/public-contract-1.0-candidates.md`](docs/public-contract-1.0-candidates.md)
- Security policy: [`SECURITY.md`](SECURITY.md)
## License
Dual licensed under MIT OR Apache-2.0.