biors 0.2.0

Command-line tools for bio-rs biological AI model input workflows.
The biors 0.2.0 crate is a command-line binary, not a library; the library crate is biors-core.

bio-rs

Rust tools for validating protein FASTA input and tokenizing single-record and multi-record FASTA files into stable protein-20 token ids.

Features

  • FASTA parsing for one or more protein sequences
  • protein-20 residue validation
  • lowercase sequence normalization
  • ambiguous residue reporting for X, B, Z, J, U, and O
  • invalid residue reporting
  • JSON output from the CLI, including array output for multi-FASTA tokenization
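
The parsing, normalization, and residue-reporting features above can be illustrated with a minimal sketch. It is not the biors-core API: the names Record, Residue, parse_fasta, and classify are hypothetical, and it assumes that "normalization" means uppercasing lowercase residues and that any character outside the protein-20 and ambiguous sets counts as invalid.

```rust
// Hypothetical sketch; none of these names come from biors-core.

/// One FASTA record: the header text after '>' and the joined sequence lines.
#[derive(Debug, PartialEq)]
struct Record {
    id: String,
    sequence: String,
}

/// Classification of a single (already uppercased) residue character.
#[derive(Debug, PartialEq)]
enum Residue {
    Standard,
    Ambiguous,
    Invalid,
}

const PROTEIN_20: &str = "ACDEFGHIKLMNPQRSTVWY";
const AMBIGUOUS: &str = "XBZJUO";

/// Parse one or more FASTA records, uppercasing every sequence line.
fn parse_fasta(text: &str) -> Vec<Record> {
    let mut records: Vec<Record> = Vec::new();
    for line in text.lines().map(str::trim).filter(|l| !l.is_empty()) {
        if let Some(header) = line.strip_prefix('>') {
            // A header line starts a new record.
            records.push(Record {
                id: header.trim().to_string(),
                sequence: String::new(),
            });
        } else if let Some(current) = records.last_mut() {
            // Assumption: normalization means ASCII uppercasing.
            current.sequence.push_str(&line.to_ascii_uppercase());
        }
        // Sequence lines before the first header are skipped in this sketch.
    }
    records
}

/// Classify a residue as standard protein-20, ambiguous, or invalid.
fn classify(residue: char) -> Residue {
    if PROTEIN_20.contains(residue) {
        Residue::Standard
    } else if AMBIGUOUS.contains(residue) {
        Residue::Ambiguous
    } else {
        Residue::Invalid
    }
}

fn main() {
    let records = parse_fasta(">seq1\nacde\nXG*\n>seq2\nMKV\n");
    for record in &records {
        let ambiguous: Vec<char> = record
            .sequence
            .chars()
            .filter(|&r| classify(r) == Residue::Ambiguous)
            .collect();
        let invalid: Vec<char> = record
            .sequence
            .chars()
            .filter(|&r| classify(r) == Residue::Invalid)
            .collect();
        println!(
            "{}: {} ambiguous={:?} invalid={:?}",
            record.id, record.sequence, ambiguous, invalid
        );
    }
}
```

Keeping classification separate from parsing lets a caller report ambiguous and invalid residues per record without rejecting the whole file; whether the real tools behave this way is not stated here.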

Quickstart

Inspect a protein sequence:

cargo run -p biors -- inspect examples/protein.fasta

Tokenize a protein sequence:

cargo run -p biors -- tokenize examples/protein.fasta

Tokenize a multi-FASTA file:

cargo run -p biors -- tokenize examples/multi.fasta

Use the Rust library by adding it to your Cargo.toml:

[dependencies]
biors-core = "0.0.1"

Checks

scripts/check.sh

The check suite runs cargo fmt, cargo check, cargo test, and cargo clippy with warnings denied.

Workspace

packages/
  rust/
    biors/       CLI
    biors-core/  FASTA parsing and tokenization library
examples/
  multi.fasta
  protein.fasta

Protein-20

A C D E F G H I K L M N P Q R S T V W Y

Token ids follow that order, starting at 0, so A is 0 and Y is 19.
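
The order-based mapping can be sketched in a few lines of Rust. This is an illustration under stated assumptions rather than the biors-core implementation: token_id and tokenize are hypothetical names, and returning an error for any residue outside the alphabet is a choice made for this sketch only.

```rust
// Hypothetical sketch; token_id and tokenize are not biors-core functions.

/// The protein-20 alphabet in token-id order.
const PROTEIN_20: [char; 20] = [
    'A', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K', 'L',
    'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'V', 'W', 'Y',
];

/// Token id of a residue: its position in PROTEIN_20, or None if absent.
fn token_id(residue: char) -> Option<u8> {
    PROTEIN_20
        .iter()
        .position(|&c| c == residue)
        .map(|index| index as u8)
}

/// Tokenize an uppercase sequence; the first residue outside protein-20
/// is returned as the error value (an assumption of this sketch).
fn tokenize(sequence: &str) -> Result<Vec<u8>, char> {
    sequence
        .chars()
        .map(|residue| token_id(residue).ok_or(residue))
        .collect()
}

fn main() {
    println!("{:?}", tokenize("ACDY"));
    println!("{:?}", tokenize("MKV"));
    println!("{:?}", tokenize("MKX"));
}
```

Because ids are derived from position in a fixed array, they stay stable as long as the alphabet order does not change.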

License

Dual licensed under MIT OR Apache-2.0.