§Match those fragments!
Handle mass spectrometry data in Rust. This crate is set up to handle very complex peptides with loads of ambiguity and complexity. It pivots around the ComplexPeptide and LinearPeptide, which encode the ProForma specification. Additionally, this crate enables reading mgf files, doing spectrum annotation (BU/MD/TD), finding isobaric sequences, doing alignments of peptides, accessing the IMGT germline database, and reading identified peptide files.
§Library features
- Read ProForma sequences (‘level 2-ProForma + mass spectrum compliant + glycans compliant’, with the intention to fully support the whole spec)
- Generate theoretical fragments with control over the fragmentation model from any supported ProForma peptide
- Generate fragments from satellite ions (w, d, and v)
- Generate glycan fragments
- Generate theoretical fragments for modifications of unknown position
- Generate theoretical fragments for chimeric spectra
- Read mgf files
- Match spectra to the generated fragments
- Extensive use of uom for compile time unit checking
- Align peptides based on mass (algorithm will be tweaked extensively over time) (see Stitch for more information, but the algorithm has been improved)
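To illustrate what compile time unit checking buys you, here is a minimal standard-library sketch of the idea. It does not use uom and it is not this crate's API: the `Mass`, `Charge`, and `MassOverCharge` newtypes and the `mz` function are hypothetical names introduced only for this example.

```rust
use std::ops::{Add, Div};

/// Hypothetical newtype for a mass in dalton (not the crate's or uom's type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Mass(pub f64);

/// Hypothetical newtype for a charge in elementary charges.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Charge(pub f64);

/// Hypothetical newtype for a mass over charge ratio (m/z).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct MassOverCharge(pub f64);

// Masses can only be added to other masses.
impl Add for Mass {
    type Output = Mass;
    fn add(self, rhs: Mass) -> Mass {
        Mass(self.0 + rhs.0)
    }
}

// Dividing a mass by a charge yields an m/z value, which is a distinct type.
impl Div<Charge> for Mass {
    type Output = MassOverCharge;
    fn div(self, rhs: Charge) -> MassOverCharge {
        MassOverCharge(self.0 / rhs.0)
    }
}

/// m/z of a protonated ion: (neutral mass + z * proton mass) / z.
pub fn mz(neutral: Mass, z: Charge) -> MassOverCharge {
    let proton = Mass(1.007_276_466_6);
    (neutral + Mass(proton.0 * z.0)) / z
}

// `Mass(1.0) + Charge(1.0)` does not compile, because `Add<Charge>` is not
// implemented for `Mass`. That is the kind of mistake unit checking catches.
```

Because the result of `mz` is a `MassOverCharge`, it cannot be passed by accident to a function that expects a `Mass`. A units library generalises this pattern so that the conversions do not have to be written by hand.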
§Example usage
// Open some data and see if the given peptide is a valid match
use rustyms::{*, system::{Charge, e}};
let peptide = ComplexPeptide::pro_forma("Q[Gln->pyro-Glu]VQEVSERTHGGNFD")?;
let spectrum = rawfile::mgf::open(raw_file_path)?;
let model = Model::ethcd();
let fragments = peptide.generate_theoretical_fragments(Charge::new::<e>(2.0), &model);
let annotated = spectrum[0].annotate(peptide, &fragments, &model, MassMode::Monoisotopic);
let fdr = annotated.fdr(&fragments, &model);
// This is the incorrect sequence for this spectrum so the FDR will indicate this
assert!(fdr.sigma() < 2.0);
// Check how this peptide compares to a similar peptide (using `align`)
// (same sequence, repeated for easy reference)
use rustyms::{*, align::*};
let first_peptide = LinearPeptide::pro_forma("Q[Gln->pyro-Glu]VQEVS")?;
let second_peptide = LinearPeptide::pro_forma("E[Glu->pyro-Glu]VQVES")?;
let alignment = align::<4>(&first_peptide, &second_peptide,
matrix::BLOSUM62, Tolerance::new_ppm(10.0), AlignType::GLOBAL);
let stats = alignment.stats();
assert_eq!(stats.mass_similar, 6); // All positions are mass similar
§Compilation features
Rustyms ties together multiple smaller modules into one cohesive structure. It has multiple features which allow you to slim it down if needed (all are enabled by default).
- identification - gives access to methods reading many different identified peptide formats.
- align - gives access to mass based alignment of peptides.
- imgt - enables access to the IMGT database of antibodies germline sequences, with annotations.
- rayon - enables parallel iterators using rayon, mostly for imgt but also in consecutive align.
Re-exports§
pub use crate::model::Model;
pub use crate::modification::Modification;
pub use crate::spectrum::AnnotatedSpectrum;
pub use crate::spectrum::MassMode;
pub use crate::spectrum::RawSpectrum;
pub use fragment::Fragment;
Modules§
- Only available with feature align. Code to make alignments of two peptides based on mass mistakes, and genetic information.
- All amino acid property classes according to IMGT.
- Contain the definition for errors with all additional data that is needed to generate nice error messages
- Handle fragment related issues, access provided if you want to dive deeply into fragments in your own code.
- Handle glycan related issues, access provided if you want to work with glycans on your own.
- Only available with feature identification. Read in the annotations from peptide identification sources
- Only available with feature imgt. This crate handles parsing the IMGT LIGM-DB database into structures compatible with rustyms. It additionally stores all regions and annotations. There are two main ways of selecting germline(s): specified by name with get_germline, or by building a query over the data with Selection.
- Handle model instantiation.
- Handle modification related issues, access provided if you want to dive deeply into modifications in your own code.
- The available ontologies
- Rules regarding the placement of modifications
- Handling raw files
- Spectrum related code
- The measurement system used in this crate. A redefinition of the important SI units for them to be stored in a more sensible base unit for MS purposes.
Macros§
- Macro to implement quantity type aliases for a specific system of units and value storage type.
- Easily define molecular formulas using the following syntax: <element> <num> or (<isotope>)<element> <num>
Structs§
- A peptide with all data as provided by ProForma. Preferably generated by using the crate::ComplexPeptide::pro_forma function.
- A selection of ions that together define the charge of a peptide
- A molecular formula, a selection of elements of specified isotopes together forming a structure
- A collection of potentially multiple of the generic type. It is used to be able to easily combine multiple of these multi structs into all possible combinations.
- A protease defined by its ability to cut at any site identified by the right amino acids at the N and C terminal. Each position is identified by an option, where a none means that there is no specificity at this position. If there is a specificity at a certain position, any amino acid that is contained in the set is allowed (see AminoAcid::canonical_identical).
- One block in a sequence, meaning an amino acid and its accompanying modifications
Enums§
- An amino acid, alongside the standard ones some ambiguous (B/J/Z/X) and non-standard (U/O) are included.
- A single ProForma entry, which can contain multiple peptides. More options will be added in the future to support the full ProForma spec
- The elements (and electrons)
- All possible neutral losses
- A tolerance around a given mass for searching purposes
Constants§
- All elements sorted so that single characters come after two character element symbols (needed for greedy parsing)
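The ordering matters because a greedy parser takes the first symbol that matches the start of the remaining text. The sketch below shows this with a tiny, hypothetical element table and a hypothetical `parse_formula` function; neither is the crate's own constant or parser.

```rust
/// A tiny, hypothetical element table. Two-character symbols are listed before
/// single-character ones so that a greedy prefix match picks "Cl" over "C".
const SYMBOLS: &[&str] = &["Cl", "Na", "Se", "C", "H", "N", "O", "S"];

/// Split a formula such as "C2H5ClO" into (symbol, count) pairs.
/// Returns `None` when an unknown symbol or an unparsable count is found.
pub fn parse_formula(text: &str) -> Option<Vec<(&'static str, u32)>> {
    let mut rest = text;
    let mut out = Vec::new();
    while !rest.is_empty() {
        // The first match wins, which is only correct because of the table ordering.
        let symbol = SYMBOLS.iter().copied().find(|s| rest.starts_with(*s))?;
        rest = &rest[symbol.len()..];
        // Read the optional ASCII digits that follow; a missing count means 1.
        let digits = rest.chars().take_while(|c| c.is_ascii_digit()).count();
        let count = if digits == 0 {
            1
        } else {
            rest[..digits].parse::<u32>().ok()?
        };
        rest = &rest[digits..];
        out.push((symbol, count));
    }
    Some(out)
}
```

If "C" were listed before "Cl", the input "ClO" would first match "C" and then fail on the leftover "l", which is why the two-character symbols have to come first.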
Traits§
- Any item that has a clearly defined single molecular formula
- Check if two values are within the specified tolerance from each other.
- Any item that has a number of potential chemical formulas
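A tolerance check of the kind described above could look roughly like the following sketch. `MassTolerance` and `within` are hypothetical stand-ins, not the crate's `Tolerance` type or trait, and taking the ppm window relative to the larger of the two values is an assumption made here to keep the check symmetric.

```rust
/// Hypothetical stand-in for a mass tolerance; not the crate's `Tolerance` type.
#[derive(Debug, Clone, Copy)]
pub enum MassTolerance {
    /// Relative tolerance in parts per million.
    Ppm(f64),
    /// Absolute tolerance in dalton.
    Absolute(f64),
}

impl MassTolerance {
    /// Check if two masses (in dalton) are within this tolerance of each other.
    pub fn within(self, a: f64, b: f64) -> bool {
        match self {
            // Assumption: the window is relative to the larger magnitude,
            // so swapping `a` and `b` gives the same answer.
            Self::Ppm(ppm) => (a - b).abs() <= a.abs().max(b.abs()) * ppm * 1e-6,
            Self::Absolute(da) => (a - b).abs() <= da,
        }
    }
}
```

At 1000 Da a 10 ppm window is about 0.01 Da wide on either side, so a 0.005 Da difference passes while a 0.02 Da difference does not.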
Functions§
- Get the possible building blocks for sequences based on the given modifications. Useful for any automated sequence generation, like isobaric set generation or de novo sequencing. The result is for each location (N term, center, C term) the list of all possible building blocks with its mass, sorted on mass.
- Get the elemental data
- Find the isobaric sets for the given mass with the given modifications and ppm error. The modifications are placed on any location they are allowed based on the given placement rules, so using any modifications which provide those is advised. If the provided LinearPeptide has multiple formulas, it uses the formula with the lowest monoisotopic mass.
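As a rough illustration of what an isobaric set search does, the sketch below enumerates residue compositions whose summed monoisotopic mass falls within a ppm window of a target mass. It is a simplified stand-in under stated assumptions, not the crate's algorithm: it uses a four-residue table, ignores modifications, termini, and residue order, and the names `RESIDUES`, `isobaric_sets`, and `extend` are hypothetical.

```rust
/// Monoisotopic residue masses in dalton for a small illustrative subset.
const RESIDUES: &[(char, f64)] = &[
    ('G', 57.021_464),
    ('A', 71.037_114),
    ('N', 114.042_927),
    ('Q', 128.058_578),
];

/// Find all residue compositions whose summed mass is within `ppm` of `target`.
/// Compositions are built in table order, so each multiset is reported once.
pub fn isobaric_sets(target: f64, ppm: f64) -> Vec<String> {
    let min = target * (1.0 - ppm * 1e-6);
    let max = target * (1.0 + ppm * 1e-6);
    let mut found = Vec::new();
    extend(0, 0.0, &mut String::new(), min, max, &mut found);
    found
}

/// Depth-first extension of the current composition with residues from `start` on.
fn extend(
    start: usize,
    mass: f64,
    current: &mut String,
    min: f64,
    max: f64,
    found: &mut Vec<String>,
) {
    if !current.is_empty() && mass >= min && mass <= max {
        found.push(current.clone());
    }
    for (index, (letter, residue)) in RESIDUES.iter().enumerate().skip(start) {
        // All masses are positive, so anything above `max` can be pruned.
        if mass + residue <= max {
            current.push(*letter);
            extend(index, mass + residue, current, min, max, found);
            current.pop();
        }
    }
}
```

With this table, a search around the mass of N also reports the composition GG, and a search around the mass of Q also reports GA. Those are the well-known cases where two residues together weigh the same as a single one.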