Only available with feature align.
Code to make alignments of two peptides based on mass mistakes and genetic information.
A mass based alignment handles the case in which multiple amino acids are wrong, but the total mass
of this set of amino acids is equal to the mass of a set of different amino acids on the other peptide.
Mistakes based on such mass coincidences are very common in mass spectrometry.
For example N has the same mass as GG, so if we want to make a mass spectrometry faithful alignment
of ANA with AGGA the result should reflect this fact:
Identity: 0.500 (2/4), Similarity: 0.750 (3/4), Gaps: 0.000 (0/4), Score: 0.706 (12/17),
Equal mass, Tolerance: 10 ppm, Alignment: global
Start: A 0 B 0, Path: 1=1:2i1=
AN·A A
AGGA B
╶╴
Generated using this algorithm bound to a cli tool: https://github.com/snijderlab/align-cli
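The mass coincidence in this example can be checked by hand. The sketch below is not part of rustyms: it uses only the standard library, hard-codes monoisotopic element masses, and defines its own hypothetical helpers residue_mass and ppm (the ppm definition here, relative to the first mass, is an assumption and may differ from the one the library uses). It shows that N and GG share the composition C4H6N2O2 and so fall well within a 10 ppm tolerance, while K and Q, which are close in mass, do not.

```rust
// Monoisotopic element masses in dalton (hard-coded for this sketch).
const C: f64 = 12.0;
const H: f64 = 1.007_825_03;
const N: f64 = 14.003_074_0;
const O: f64 = 15.994_914_6;

/// Mass of an amino acid residue from its elemental composition.
/// Hypothetical helper, not a rustyms function.
fn residue_mass(c: u32, h: u32, n: u32, o: u32) -> f64 {
    f64::from(c) * C + f64::from(h) * H + f64::from(n) * N + f64::from(o) * O
}

/// Absolute mass difference in parts per million, relative to `a`.
/// This definition is an assumption made for the sketch.
fn ppm(a: f64, b: f64) -> f64 {
    (a - b).abs() / a * 1e6
}

fn main() {
    let asn = residue_mass(4, 6, 2, 2); // N: C4H6N2O2
    let gly = residue_mass(2, 3, 1, 1); // G: C2H3NO
    let lys = residue_mass(6, 12, 2, 1); // K: C6H12N2O
    let gln = residue_mass(5, 8, 2, 2); // Q: C5H8N2O2

    println!("N vs GG: {:.3} ppm", ppm(asn, 2.0 * gly));
    println!("K vs Q:  {:.3} ppm", ppm(lys, gln));

    // N and GG have the same composition, so they match at 10 ppm
    assert!(ppm(asn, 2.0 * gly) < 10.0);
    // K and Q differ by about 0.036 Da, far outside 10 ppm
    assert!(ppm(lys, gln) > 10.0);
}
```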
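use rustyms::{*, align::*};
// Parse both peptides from their ProForma notation
let a = LinearPeptide::pro_forma("ANA").unwrap();
let b = LinearPeptide::pro_forma("AGGA").unwrap();
// Align globally, scoring with BLOSUM62 and a 10 ppm mass tolerance
let alignment = align::<4>(&a, &b, &matrix::BLOSUM62,
    Tolerance::new_ppm(10.0), AlignType::GLOBAL);
// N is aligned to GG as a single step of 1 against 2 residues
assert_eq!(alignment.short(), "1=1:2i1=");
// ANA and AGGA have the same total mass
assert_eq!(alignment.ppm(), 0.0);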
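To make the compact path 1=1:2i1= easier to read, here is a small standalone parser. The notation is inferred from this single example only, so treat it as an assumption: a number followed by a symbol is read as a run of that step kind, and a:b followed by a symbol is read as a residues of the first peptide set against b residues of the second. Step, parse_path and consumed are hypothetical names for this sketch and are not part of rustyms.

```rust
/// One step of the compact path, as inferred from "1=1:2i1=".
#[derive(Debug, PartialEq)]
enum Step {
    /// `<n><op>`: a run of `n` steps of kind `op`, for example `1=`.
    Run { count: usize, op: char },
    /// `<a>:<b><op>`: `a` residues of the first peptide against `b` of the second.
    Sets { a: usize, b: usize, op: char },
}

/// Split a path into steps; returns `None` if the text is malformed.
fn parse_path(path: &str) -> Option<Vec<Step>> {
    let mut steps = Vec::new();
    let mut first: Option<usize> = None; // the number before ':' if present
    let mut digits = String::new();
    for ch in path.chars() {
        if ch.is_ascii_digit() {
            digits.push(ch);
        } else if ch == ':' {
            if first.is_some() {
                return None; // two ':' in one step
            }
            first = Some(digits.parse().ok()?);
            digits.clear();
        } else {
            // Any other character closes the current step
            let n: usize = digits.parse().ok()?;
            digits.clear();
            steps.push(match first.take() {
                Some(a) => Step::Sets { a, b: n, op: ch },
                None => Step::Run { count: n, op: ch },
            });
        }
    }
    // Leftover digits or a dangling ':' mean the last step was incomplete
    if digits.is_empty() && first.is_none() {
        Some(steps)
    } else {
        None
    }
}

/// Residues consumed on each side. Only `=` runs and a:b steps are handled,
/// because the example does not show how other step kinds behave.
fn consumed(steps: &[Step]) -> Option<(usize, usize)> {
    let mut total = (0, 0);
    for step in steps {
        match *step {
            Step::Run { count, op: '=' } => {
                total.0 += count;
                total.1 += count;
            }
            Step::Sets { a, b, .. } => {
                total.0 += a;
                total.1 += b;
            }
            Step::Run { .. } => return None,
        }
    }
    Some(total)
}

fn main() {
    let steps = parse_path("1=1:2i1=").expect("well-formed path");
    println!("{steps:?}");
    // The path covers all of ANA (3 residues) and AGGA (4 residues)
    assert_eq!(consumed(&steps), Some((3, 4)));
}
```

Under this reading the consumed lengths equal the lengths of both peptides, which fits a global alignment starting at A 0 B 0.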
Modules§
- Different scoring matrices that can be used. Matrices from: https://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/lxr/source/src/util/tables/ and https://www.ncbi.nlm.nih.gov/IEB/ToolBox/C_DOC/lxr/source/data/
Structs§
- The type of alignment to perform
- An alignment of two reads, which owns the sequences.
- A piece in an alignment, determining what step was taken in the alignment and how this impacted the score
- An alignment of two reads, which has a reference to the sequences.
- The score of an alignment
- Statistics for an alignment with some helper functions to easily retrieve the number of interest.
Enums§
- The type of a single match step
- The alignment specification for a single side
Traits§
- A generalised alignment with all behaviour.
Functions§
- Create an alignment of two peptides based on mass and homology. The substitution matrix is in the exact same order as the definition of AminoAcid. The Tolerance sets the tolerance for two sets of amino acids to be regarded as the same mass. The AlignType controls the alignment behaviour, global/local or anything in between.
- Only available if features align and imgt are turned on. Align one sequence to multiple consecutive genes. Each gene can be controlled to be global to the left or free to allow unmatched residues between it and the previous gene. If the sequence is too short to cover all genes, only the genes that could be matched are returned.
- Only available if features align, rayon, and imgt are turned on. Align one sequence to multiple consecutive genes. Each gene can be controlled to be global to the left or free to allow unmatched residues between it and the previous gene. If the sequence is too short to cover all genes, only the genes that could be matched are returned.