RIGHOR
This package, based on IGoR, is meant to learn models of V(D)J recombination.
It can:
- generate sequences
- evaluate sequences (infer the most likely recombination scenarios)
- compute "pgen"
It's probably easier to use the companion Python package (pip install righor), but working in Rust directly should also be viable.
How to use the Python package:
Load a model:
=
# alternatively, you can load a model from igor files
# igor_model = righor.load_model_from_files("params.txt", "marginals.txt", "anchor_v.csv", "anchor_j.csv")
Generate sequences fast:
# Create a generator object
= # or igor_model.generator() to run it without a seed
# Generate 10'000 functional sequences (not out-of-frame, no stop codons, right boundaries)
# generate_without_errors ignores the IGoR error model; use "generate" if the error model is needed
=
# Generate one sequence with a particular V/J gene family
= # return all the V genes that match TRBV5
= # all the J genes
=
=
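The "functional" criterion mentioned in the comments above (not out-of-frame, no stop codons) can be illustrated with a minimal sketch in plain Python. The helper `is_functional` is hypothetical and is not part of righor; it only checks the reading frame and the absence of stop codons on a nucleotide sequence, and it does not check the boundary condition.

```python
# Hypothetical helper for illustration only; not part of the righor API.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_functional(cdr3_nt: str) -> bool:
    """Return True if the sequence is in frame and contains no stop codon.

    Simplified check: the boundary condition mentioned in the text
    is deliberately not tested here.
    """
    seq = cdr3_nt.upper()
    # Out-of-frame (or empty) sequences are rejected.
    if len(seq) == 0 or len(seq) % 3 != 0:
        return False
    # Split into codons and look for a stop codon.
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    return not any(codon in STOP_CODONS for codon in codons)
```

Such a filter could be applied to any list of nucleotide sequences, for example `[s for s in sequences if is_functional(s)]`, where `sequences` is an assumed list of strings.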
Evaluate a given sequence:
## Evaluate a given sequence
=
# evaluate the sequence
=
# Most likely scenario
=
Infer a model:
# Inference of a model
# use a very small number of sequences to keep it short (takes ~30s)
# here we just generate the sequences needed
=
=
=
# define parameters for the alignment and the inference (also possible for the evaluation)
=
= 70
=
# generate a uniform model as a starting point
# (it's generally *much* faster to start from an already inferred model)
=
=
= 0
# align multiple sequences at once
=
# multiple rounds of expectation-maximization to infer the model
=
=
= 0
=
=
Visualize and save the model:
# visualisation of the results
=
# save the model in the IGoR format
# will return an error if the directory already exists
# load the model
=
# save the model in json format (one file)
# load the model in json
=
Extra stuff:
Main differences with IGoR:
- "dynamic programming" method: instead of summing over all events one by one, we first pre-compute sums over events. This means that we can run it with undefined nucleotides like N (at least in theory, I need to add full support for these).
- The D gene alignment is less constrained
- can measure pgen for amino-acid sequences (like OLGA)
- more error models (and more flexible, better for IGH)
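To give a feel for the "dynamic programming" point in the list above, here is a generic sketch that is not righor's actual algorithm: a toy two-state hidden Markov chain whose likelihood is computed once by enumerating every hidden path and once with the forward recursion, which reuses pre-computed partial sums. All probabilities and names (`START`, `TRANS`, `EMIT`, both functions) are made up for illustration.

```python
from itertools import product

# Toy two-state hidden Markov chain; all numbers are invented for illustration.
START = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.2, 0.8]]
EMIT = [{"A": 0.9, "C": 0.1}, {"A": 0.3, "C": 0.7}]


def likelihood_enumerate(obs):
    """Sum the probability of every hidden path explicitly (exponential cost)."""
    total = 0.0
    for path in product(range(2), repeat=len(obs)):
        p = START[path[0]] * EMIT[path[0]][obs[0]]
        for prev, cur, symbol in zip(path, path[1:], obs[1:]):
            p *= TRANS[prev][cur] * EMIT[cur][symbol]
        total += p
    return total


def likelihood_dynamic(obs):
    """Forward recursion: reuse partial sums, cost linear in len(obs)."""
    forward = [START[s] * EMIT[s][obs[0]] for s in range(2)]
    for symbol in obs[1:]:
        forward = [
            sum(forward[prev] * TRANS[prev][cur] for prev in range(2))
            * EMIT[cur][symbol]
            for cur in range(2)
        ]
    return sum(forward)
```

Both functions expect a non-empty observation string and return the same value up to floating-point rounding; only the second scales to long sequences, since it never lists the individual paths.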
Limitations:
- Need to get rid of any primers/ends on the V gene side before running it
- The reads need to be long enough to fully cover the CDR3 (even when it's particularly long)
Programming stuff:
- There's a wasm version for web use.
- The Python version is in a different crate now.
- To add a model permanently, add it to "models.json". The first model in a category is the default model. Each field is one independent model. The elements in chain and species should always be lower-case.
- With ambiguous nucleotides combined with "errors", the pgen probably won't work very well.