biors 0.0.1

Command-line tools for bio-rs biological AI model input workflows.
The biors crate is a command-line binary, not a library; the library lives in biors-core.

bio-rs

Rust tools for validating protein FASTA input and tokenizing it into stable protein-20 token ids.

Features

  • FASTA parsing for one protein sequence
  • protein-20 residue validation
  • normalization of lowercase sequence input
  • ambiguous residue reporting for X, B, Z, J, U, and O
  • invalid residue reporting
  • JSON output from the CLI
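
The parsing and validation features above can be pictured with a minimal Rust sketch. This is not the biors-core API: `ResidueKind`, `classify`, and `parse_single_fasta` are hypothetical names introduced only for illustration, and the sketch assumes that lowercase residues are uppercased before they are checked.

```rust
/// Outcome of checking one residue (hypothetical type, not the biors-core API).
#[derive(Debug, PartialEq)]
enum ResidueKind {
    Standard,
    Ambiguous,
    Invalid,
}

/// Classifies one residue against protein-20 and the ambiguous codes listed above.
fn classify(residue: char) -> ResidueKind {
    // Assumption: lowercase input is normalized to uppercase before checking.
    match residue.to_ascii_uppercase() {
        'A' | 'C' | 'D' | 'E' | 'F' | 'G' | 'H' | 'I' | 'K' | 'L' | 'M' | 'N' | 'P' | 'Q'
        | 'R' | 'S' | 'T' | 'V' | 'W' | 'Y' => ResidueKind::Standard,
        'X' | 'B' | 'Z' | 'J' | 'U' | 'O' => ResidueKind::Ambiguous,
        _ => ResidueKind::Invalid,
    }
}

/// Reads the first FASTA record: a '>' header line followed by sequence lines.
/// Returns None when the input does not start with a header line.
fn parse_single_fasta(input: &str) -> Option<(String, String)> {
    let mut lines = input.lines().map(str::trim).filter(|l| !l.is_empty());
    let header = lines.next()?.strip_prefix('>')?.to_string();
    // Concatenate sequence lines until the next record header, if any.
    let sequence: String = lines.take_while(|l| !l.starts_with('>')).collect();
    Some((header, sequence))
}

fn main() {
    // Made-up input that mixes standard, ambiguous, and invalid characters.
    let input = ">example\nmkxa\n1\n";
    let (header, sequence) = parse_single_fasta(input).expect("expected a FASTA record");
    println!("header: {header}");
    for residue in sequence.chars() {
        println!("{residue}: {:?}", classify(residue));
    }
}
```

The real tools report these categories as JSON from the CLI; the sketch only prints them.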

Quickstart

Inspect a protein sequence:

cargo run -p biors -- inspect examples/protein.fasta

Tokenize a protein sequence:

cargo run -p biors -- tokenize examples/protein.fasta

Use the Rust library:

[dependencies]
biors-core = "0.0.1"

Checks

scripts/check.sh

The check suite runs cargo fmt, cargo check, cargo test, and cargo clippy with warnings denied.

Workspace

packages/
  rust/
    biors/       CLI
    biors-core/  FASTA parsing and tokenization library
examples/
  protein.fasta

Protein-20

A C D E F G H I K L M N P Q R S T V W Y

Token ids follow that order, starting at 0, so A is 0 and Y is 19.
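
As a rough illustration of that mapping, the sketch below looks up a residue's position in the protein-20 alphabet. `PROTEIN_20` and `token_id` are hypothetical names for this example, not identifiers from biors-core, and the sketch assumes the input is already uppercase.

```rust
/// The protein-20 alphabet in token-id order, as listed above.
const PROTEIN_20: &[u8; 20] = b"ACDEFGHIKLMNPQRSTVWY";

/// Hypothetical helper: returns the token id of a residue,
/// or None when the residue is not part of protein-20.
fn token_id(residue: char) -> Option<u8> {
    PROTEIN_20
        .iter()
        .position(|&b| b as char == residue)
        .map(|i| i as u8)
}

fn main() {
    // The id is the residue's index in the alphabet.
    let ids: Vec<Option<u8>> = "MKT".chars().map(token_id).collect();
    println!("{ids:?}"); // prints [Some(10), Some(8), Some(16)]
}
```

Returning `Option` keeps residues outside protein-20 (such as X) distinguishable from valid ids, so a caller can report them instead of silently dropping them.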

License

Dual licensed under MIT OR Apache-2.0.