immunum
High-performance antibody and TCR sequence numbering in Rust, Python, and WebAssembly.
Overview
immunum is a library for numbering antibody and T-cell receptor (TCR) variable domain sequences. It uses Needleman-Wunsch semi-global alignment against position-specific scoring matrices (PSSM) built from consensus sequences, with BLOSUM62-based substitution scores.
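To make the scoring idea concrete, here is a minimal Python sketch of how a PSSM could be derived from per-position consensus residues. It is not immunum's implementation: `toy_substitution` and its similarity groups are hypothetical stand-ins for BLOSUM62 (the values are not real BLOSUM62 scores), and `build_pssm` is an illustrative name.

```python
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Toy similarity groups standing in for BLOSUM62 (assumption, not real values).
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG"]


def toy_substitution(a: str, b: str) -> int:
    """Score a residue pair: identical > similar > unrelated."""
    if a == b:
        return 4
    if any(a in group and b in group for group in SIMILAR_GROUPS):
        return 1
    return -2


def build_pssm(consensus: List[str]) -> List[Dict[str, int]]:
    """Build one score column per position.

    consensus[i] lists the residues accepted at position i; a query residue
    is scored against its best-matching consensus residue.
    """
    pssm = []
    for allowed in consensus:
        column = {
            aa: max(toy_substitution(aa, c) for c in allowed)
            for aa in AMINO_ACIDS
        }
        pssm.append(column)
    return pssm


# Hypothetical four-position consensus; position 0 accepts E or Q.
pssm = build_pssm(["EQ", "V", "Q", "L"])
print(pssm[0]["E"], pssm[0]["D"], pssm[0]["W"])  # 4 1 -2
```

Scoring against the best consensus residue is one simple choice; a frequency-weighted average over the consensus residues would be another.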
>99% position accuracy across 6,000+ validation sequences. Processes a full dataset in ~0.6s.
Available as:
- Rust crate: core library and CLI
- Python package: via PyPI (pip install immunum), with a Polars plugin for vectorized batch processing
- npm package: for Node.js and browsers
Supported chains
| Antibody | TCR |
|---|---|
| IGH (heavy) | TRA (alpha) |
| IGK (kappa) | TRB (beta) |
| IGL (lambda) | TRD (delta) |
| | TRG (gamma) |
Numbering schemes
- IMGT — all 7 chain types
- Kabat — antibody chains (IGH, IGK, IGL)
Chain type is automatically detected by aligning against all loaded chains and selecting the best match.
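The selection step can be sketched in a few lines of Python. This is only an illustration of "score against every chain, keep the best": the reference fragments below are hypothetical, shortened FR1-like strings rather than the library's consensus data, and `difflib.SequenceMatcher` stands in for the real PSSM alignment score.

```python
from difflib import SequenceMatcher
from typing import Dict, Tuple

# Hypothetical, shortened reference fragments (not immunum's consensus data).
REFERENCES: Dict[str, str] = {
    "H": "EVQLVESGGGLVQPGGSLRLSCAAS",
    "K": "DIQMTQSPSSLSASVGDRVTITC",
    "L": "QSVLTQPPSVSGAPGQRVTISC",
}


def similarity(query: str, reference: str) -> float:
    """Stand-in score in [0, 1]; the real library uses alignment scores."""
    return SequenceMatcher(None, query, reference, autojunk=False).ratio()


def detect_chain(query: str, references: Dict[str, str] = REFERENCES) -> Tuple[str, float]:
    """Score the query against every loaded chain and return the best match."""
    scores = {chain: similarity(query, ref) for chain, ref in references.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


chain, score = detect_chain("EVQLVESGGGLVKPGGSLKLSCAAS")
print(chain)  # H
```

Restricting the set of loaded chains shrinks the candidate list, which is presumably why a chain filter speeds up and disambiguates detection.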
Python
Installation
Numbering
=
=
=
# H
# 0.97
# {"1": "E", "2": "V", "3": "Q", ...}
Segmentation
segment splits the sequence into FR/CDR regions:
=
# EVQLVESGGGLVKPGGSLKLSCAAS
# GFTFSSYAMS
# WVRQAPGKGLEWVS
# AISGSGGS
# TYYADSVKGRFTISRDNAKN
# ...
# ...
Chains: "H" (heavy), "K" (kappa), "L" (lambda), "A" (TRA), "B" (TRB), "G" (TRG), "D" (TRD).
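For intuition, the relationship between a numbering and its segments can be sketched in plain Python. This is not how immunum's segment is implemented; it assumes the numbering is a dict in sequence order whose keys start with the IMGT position number (insertion suffixes, if any, are ignored), and it uses the standard IMGT region boundaries. `segment_from_numbering` is a hypothetical helper.

```python
import re
from typing import Dict

# Standard IMGT region boundaries (inclusive position ranges).
IMGT_REGIONS = [
    ("fr1", 1, 26),
    ("cdr1", 27, 38),
    ("fr2", 39, 55),
    ("cdr2", 56, 65),
    ("fr3", 66, 104),
    ("cdr3", 105, 117),
    ("fr4", 118, 128),
]


def segment_from_numbering(numbering: Dict[str, str]) -> Dict[str, str]:
    """Group residues by region, relying on dict order being sequence order."""
    segments = {name: "" for name, _, _ in IMGT_REGIONS}
    for position, residue in numbering.items():
        match = re.match(r"\d+", position)
        if match is None:
            continue  # skip keys that do not start with a position number
        number = int(match.group())
        for name, start, end in IMGT_REGIONS:
            if start <= number <= end:
                segments[name] += residue
                break
    return segments


# Hypothetical sparse numbering, including one insertion-style key.
example = {"1": "E", "2": "V", "26": "S", "27": "G", "28": "F",
           "105": "A", "106": "R", "111A": "G", "118": "W"}
segments = segment_from_numbering(example)
print(segments["fr1"], segments["cdr1"], segments["cdr3"], segments["fr4"])
# EVS GF ARG W
```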
Polars plugin
For batch processing, immunum.polars registers elementwise Polars expressions:
=
# Add a struct column with chain, scheme, confidence, numbering
=
# Add a struct column with FR/CDR segments
=
The number expression returns a struct with fields chain, scheme, confidence, and numbering (a struct of position→residue). The segment expression returns a struct with fields fr1, cdr1, fr2, cdr2, fr3, cdr3, fr4, prefix, postfix.
JavaScript / npm
Installation
Usage
import init from "immunum";
await ; // load the wasm module
const annotator = ;
const sequence = "QVQLVQSGAEVKRPGSSVTVSCKASGGSFSTYALSWVRQAPGRGLEWMGGVIPLLTITNYAPRFQGRITITADRSTSTAYLELNSLRPEDTAVYYCAREGTTGKPIGAFAHWGQGTLVTVSS";
const result = annotator.;
console.log; // "H"
console.log; // 0.97
console.log; // { "1": "E", "2": "V", ... }
const segments = annotator.;
console.log;
annotator.; // or use `using annotator = new Annotator(...)` with explicit resource management
Rust
Usage
use ;
let annotator = new.unwrap;
let sequence = "QVQLVQSGAEVKRPGSSVTVSCKASGGSFSTYALSWVRQAPGRGLEWMGGVIPLLTITNYAPRFQGRITITADRSTSTAYLELNSLRPEDTAVYYCAREGTTGKPIGAFAHWGQGTLVTVSS";
let result = annotator.number.unwrap;
println!; // IGH
println!;
for in sequence.chars.zip
Add to Cargo.toml:
[]
= "0.9"
CLI
Options
| Flag | Description | Default |
|---|---|---|
| -s, --scheme | Numbering scheme: imgt (i), kabat (k) | imgt |
| -c, --chain | Chain filter: h,k,l,a,b,g,d or groups: ig, tcr, all. Accepts any form (h, heavy, igh), case-insensitive. | ig |
| -f, --format | Output format: tsv, json, jsonl | tsv |
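As an illustration of what "accepts any form, case-insensitive" can mean for the chain filter, here is a small Python sketch of such a normalizer. It is not the CLI's actual parser: only the h / heavy / igh forms and the ig, tcr, all groups come from the table above, while the remaining aliases and the comma-separated syntax are assumptions made by analogy. `parse_chains` is a hypothetical name.

```python
from typing import List

# Aliases beyond h/heavy/igh are assumed by analogy with the documented forms.
ALIASES = {
    "h": "H", "heavy": "H", "igh": "H",
    "k": "K", "kappa": "K", "igk": "K",
    "l": "L", "lambda": "L", "igl": "L",
    "a": "A", "alpha": "A", "tra": "A",
    "b": "B", "beta": "B", "trb": "B",
    "g": "G", "gamma": "G", "trg": "G",
    "d": "D", "delta": "D", "trd": "D",
}
GROUPS = {
    "ig": ["H", "K", "L"],
    "tcr": ["A", "B", "G", "D"],
    "all": ["H", "K", "L", "A", "B", "G", "D"],
}


def parse_chains(spec: str) -> List[str]:
    """Expand a comma-separated filter into ordered, de-duplicated chain codes."""
    chains: List[str] = []
    for token in spec.split(","):
        key = token.strip().lower()
        if key in GROUPS:
            expanded = GROUPS[key]
        elif key in ALIASES:
            expanded = [ALIASES[key]]
        else:
            raise ValueError(f"unknown chain filter: {token!r}")
        for chain in expanded:
            if chain not in chains:
                chains.append(chain)
    return chains


print(parse_chains("Heavy,igk"))  # ['H', 'K']
print(parse_chains("TCR"))        # ['A', 'B', 'G', 'D']
```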
Input
Accepts a raw sequence, a FASTA file, or stdin (auto-detected):
|
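The auto-detection described above could work roughly as in the following Python sketch. The CLI's real heuristics are not documented here, so the rules are assumptions: no argument or "-" means stdin, an existing path is read as a file, anything else is treated as a raw sequence, and text without a ">" header is taken as one raw sequence. `parse_records` and `read_input` are hypothetical names.

```python
import os
import sys
from typing import List, Optional, Tuple

Record = Tuple[str, str]  # (name, sequence)


def parse_records(text: str) -> List[Record]:
    """Parse FASTA text, or treat header-less text as a single raw sequence."""
    records: List[Record] = []
    name: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None or chunks:
                records.append((name or "seq", "".join(chunks)))
            header = line[1:].strip()
            name = header.split()[0] if header else "seq"
            chunks = []
        else:
            chunks.append(line.upper())
    if name is not None or chunks:
        records.append((name or "seq", "".join(chunks)))
    return records


def read_input(arg: Optional[str]) -> List[Record]:
    """Pick the input source: stdin, an existing file, or a raw sequence."""
    if arg is None or arg == "-":
        return parse_records(sys.stdin.read())
    if os.path.isfile(arg):
        with open(arg) as handle:
            return parse_records(handle.read())
    return parse_records(arg)  # the argument itself is the sequence


print(parse_records(">a heavy chain\nEVQ\nLVE\n>b\nDIQ"))
# [('a', 'EVQLVE'), ('b', 'DIQ')]
```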
Output
Writes to stdout by default, or to a file if a second positional argument is given:
Examples
# Kabat scheme, JSON output
# All chains (antibody + TCR), JSONL output
# TCR sequences only, save to file
# Extract sequences from a TSV column and pipe in (see fixtures/ig.tsv)
| |
|
# Filter TSV output to CDR3 positions (111-128 in IMGT)
|
# Filter to heavy chain results only
|
# Extract CDR3 sequences with jq
|
Development
We use task to orchestrate builds across Cargo and Python.
You can install it with:
Then run task or task --list-all to see the full list of available tasks.
By default, all tasks except benchmark-* use the dev profile; pass PROFILE=release to a task to override this.
Results are cached by default; run task my-task -f to bypass the cache.
Building local environment
# build a dev environment
# build a dev environment with --release flag
Testing
Linting
Benchmarking
There are multiple benchmarks in the repository. For the full list, run task | grep benchmark.
Project structure
src/
├── main.rs # CLI binary (immunum number ...)
├── lib.rs # Public API
├── annotator.rs # Sequence annotation and chain detection
├── alignment.rs # Needleman-Wunsch semi-global alignment
├── io.rs # Input parsing (FASTA, raw) and output formatting (TSV, JSON, JSONL)
├── numbering.rs # Numbering module entry point
├── numbering/
│ ├── imgt.rs # IMGT numbering rules
│ └── kabat.rs # Kabat numbering rules
├── scoring.rs # PSSM and scoring matrices
├── types.rs # Core domain types (Chain, Scheme, Position)
├── validation.rs # Validation utilities
├── error.rs # Error types
└── bin/
├── benchmark.rs # Validation metrics report
├── debug_validation.rs # Alignment mismatch visualization
└── speed_benchmark.rs # Performance benchmarks
resources/
└── consensus/ # Consensus sequence CSVs (compiled into scoring matrices)
fixtures/
├── validation/ # ANARCI-numbered reference datasets
├── ig.fasta # Example antibody sequences
└── ig.tsv # Example TSV input
scripts/ # Python tooling for generating consensus data
immunum/
├── _internal.pyi # python stub file for pyo3
├── polars.py # polars extension module
└── python.py # python module
Design decisions
- Semi-global alignment forces full query consumption, preventing long CDR3 regions from being treated as trailing gaps.
- Anchor positions at highly conserved FR residues receive 3× gap penalties to stabilize alignment.
- FR regions use alignment-based numbering; CDR regions use scheme-specific insertion rules.
- Scoring matrices are generated at compile time from consensus data via build.rs.
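The first two decisions can be illustrated with a compact dynamic-programming sketch in Python. It is a simplified model, not the code in alignment.rs: it assumes "semi-global" means the query is consumed in full while unmatched profile positions at either end are free, it uses linear gap costs, and the GAP value, the default mismatch score, and the toy profile are all hypothetical. Only the 3× factor at anchors comes from the list above.

```python
from typing import Dict, List, Set

GAP = -4           # hypothetical base gap penalty
ANCHOR_FACTOR = 3  # skipping an anchor position costs 3x the base penalty
MISMATCH = -2      # hypothetical score for residues absent from a column


def semi_global_score(query: str, pssm: List[Dict[str, int]], anchors: Set[int]) -> int:
    """Best score aligning the whole query against a profile.

    dp[i][j] is the best score after handling the first i profile positions
    and the first j query residues.
    """
    n, m = len(pssm), len(query)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    # Column 0 stays 0: leading profile positions may be skipped for free.
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + GAP  # query residues before the profile
    for i in range(1, n + 1):
        skip = GAP * (ANCHOR_FACTOR if (i - 1) in anchors else 1)
        for j in range(1, m + 1):
            match = dp[i - 1][j - 1] + pssm[i - 1].get(query[j - 1], MISMATCH)
            delete = dp[i - 1][j] + skip  # profile position left unmatched
            insert = dp[i][j - 1] + GAP   # extra residue in the query
            dp[i][j] = max(match, delete, insert)
    # The query must be fully consumed; trailing profile positions are free.
    return max(dp[i][m] for i in range(n + 1))


# Toy profile "ACDEF" with one preferred residue per position.
profile = [{residue: 4} for residue in "ACDEF"]
print(semi_global_score("ACEF", profile, anchors=set()))  # 12
print(semi_global_score("ACEF", profile, anchors={2}))    # 6
```

In the second call, marking the D position as an anchor makes skipping it so expensive that the best alignment changes shape, which is the stabilizing effect the higher penalty is meant to have.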