biors 0.1.0

Command-line tools for bio-rs biological AI model input workflows.

bio-rs

Rust/WASM tools for biological AI models.

bio-rs turns Python-born bio-AI models into portable, inspectable tools for CLIs, browsers, servers, and agents.

Python is where many biological AI models are born. bio-rs is where the model-facing tools around them become reproducible, agent-callable, and easier to ship outside a research notebook.

bio-rs is open source and dual-licensed under MIT OR Apache-2.0.

Why this exists

Bio-AI does not need Rust to replace Python research workflows. It needs a reliable tooling layer around model inputs, tokenizers, runners, browser demos, and agent interfaces.

Rust is useful here because it is good at:

  • predictable CLI and server tools
  • portable WASM/browser execution
  • safe input contracts for biological data
  • reproducible single-binary distribution
  • long-running services and agent-callable tools

Current proof

The first target is intentionally small:

FASTA -> validated protein sequence -> token ids -> model-ready input

Currently implemented:

  • FASTA parsing for one protein sequence
  • protein-20 residue validation
  • normalization of lowercase sequence input
  • ambiguous residue reporting for X, B, Z, J, U, and O
  • invalid residue reporting
  • token ids using a stable protein-20 order
  • JSON output for CLI/tool use
  • biors inspect and biors tokenize
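
To make the list above concrete, here is a minimal sketch of single-record FASTA parsing with residue reporting. It is not the biors-core API: the `Report` type, the `inspect_fasta` function, and the error strings are hypothetical names chosen for illustration. It also assumes that normalization means uppercasing lowercase residues, and that ambiguous and invalid residues are reported with their zero-based positions.

```rust
/// Hypothetical report type for illustration; not the biors-core API.
#[derive(Debug, PartialEq)]
pub struct Report {
    pub id: String,
    pub sequence: String,
    /// (zero-based position, residue) for X, B, Z, J, U, and O.
    pub ambiguous: Vec<(usize, char)>,
    /// (zero-based position, residue) for anything else outside protein-20.
    pub invalid: Vec<(usize, char)>,
}

const STANDARD_RESIDUES: &str = "ACDEFGHIKLMNPQRSTVWY";
const AMBIGUOUS_RESIDUES: &str = "XBZJUO";

/// Parse exactly one FASTA record and classify its residues.
pub fn inspect_fasta(input: &str) -> Result<Report, String> {
    let mut lines = input.lines();

    // The first non-empty line must be the '>' header.
    let header = lines
        .find(|l| !l.trim().is_empty())
        .ok_or_else(|| "empty input".to_string())?;
    let id = header
        .trim()
        .strip_prefix('>')
        .ok_or_else(|| "missing '>' header line".to_string())?
        .trim()
        .to_string();

    let mut sequence = String::new();
    for line in lines {
        let line = line.trim();
        if line.starts_with('>') {
            // Multi-FASTA input is out of scope for this sketch.
            return Err("only one record is supported".to_string());
        }
        // Assumption: normalization uppercases lowercase residues.
        sequence.extend(
            line.chars()
                .filter(|c| !c.is_whitespace())
                .map(|c| c.to_ascii_uppercase()),
        );
    }
    if sequence.is_empty() {
        return Err("record has no sequence".to_string());
    }

    let mut ambiguous = Vec::new();
    let mut invalid = Vec::new();
    for (i, c) in sequence.chars().enumerate() {
        if STANDARD_RESIDUES.contains(c) {
            continue;
        }
        if AMBIGUOUS_RESIDUES.contains(c) {
            ambiguous.push((i, c));
        } else {
            invalid.push((i, c));
        }
    }

    Ok(Report { id, sequence, ambiguous, invalid })
}
```

Under these assumptions, the input `>sp|demo`, `acdX`, `EF*` would produce the sequence `ACDXEF*`, with `X` reported as ambiguous at position 3 and `*` reported as invalid at position 6. Keeping the two lists separate lets a caller decide whether ambiguous residues are acceptable for a given model.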

Not implemented yet:

  • WASM bindings
  • MCP/agent tools
  • model inference runners
  • external model tokenizer parity
  • multi-FASTA batch processing

Quickstart

CLI (Rust)

Install the CLI:

cargo install biors

Inspect a protein sequence:

biors inspect examples/protein.fasta

Tokenize for AI model input:

biors tokenize examples/protein.fasta --format json

Library (Rust)

Add to your project:

[dependencies]
biors-core = "0.1"

Distribution

The project is distributed across multiple ecosystems:

  • crates.io: biors (CLI), biors-core (Library)
  • npm: biors (WASM bindings - coming soon)
  • PyPI: biors (Python bindings - coming soon)

Checks

Both the local pre-commit path and CI run strict checks. Before committing, run:

scripts/check.sh

The check suite runs:

  • cargo fmt --check
  • cargo check --workspace --all-targets --all-features
  • cargo test --workspace --all-targets --all-features
  • cargo clippy --workspace --all-targets --all-features -- -D warnings

Local git hooks are stored in .githooks/. Enable them with:

git config core.hooksPath .githooks

Workspace Structure

The project is a monorepo. Packages live under the packages/ directory, and example inputs live under examples/:

packages/
  rust/
    biors/       Main CLI tool and unified entrypoint
    biors-core/  Core protein parsing and tokenization logic
  npm/           WebAssembly bindings for JavaScript/TypeScript (coming soon)
  python/        Python bindings via PyO3 (coming soon)
examples/
  protein.fasta

Protein-20

The first alphabet is protein-20:

A C D E F G H I K L M N P Q R S T V W Y

Token ids follow that order, starting at 0.
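
The mapping can be sketched in a few lines of Rust. This illustrates the ordering rule only and is not the biors-core API: `PROTEIN_20_ORDER`, `token_id`, and `tokenize` are hypothetical names. Returning an error for the first residue outside the alphabet is also an assumption, since the behavior for such residues is not specified here.

```rust
/// The protein-20 alphabet in token-id order (index 0 is 'A', index 19 is 'Y').
const PROTEIN_20_ORDER: [char; 20] = [
    'A', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K', 'L',
    'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'V', 'W', 'Y',
];

/// Return the token id for one uppercase residue, or None if it is
/// not part of protein-20.
pub fn token_id(residue: char) -> Option<u8> {
    PROTEIN_20_ORDER
        .iter()
        .position(|&r| r == residue)
        .map(|i| i as u8)
}

/// Tokenize a whole sequence. On failure, return the zero-based position
/// and the residue that has no token id (an assumed error shape).
pub fn tokenize(sequence: &str) -> Result<Vec<u8>, (usize, char)> {
    sequence
        .chars()
        .enumerate()
        .map(|(i, c)| token_id(c).ok_or((i, c)))
        .collect()
}
```

With this ordering, `tokenize("ACDY")` would return `Ok(vec![0, 1, 2, 19])`, and `tokenize("AXC")` would return `Err((1, 'X'))` because `X` is not in protein-20.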

Final goal

The long-term goal is to make useful biological AI models easier to package as portable tools:

  • CLI tools for local workflows
  • WASM tools for browsers and demos
  • server components for production systems
  • agent-callable interfaces for automated research workflows

The first milestone is not folding or training. It is the stable input layer that everything after it needs.