Crate rustyms


§Match those fragments!

Handle mass spectrometry data in Rust. This crate is built to handle very complex peptides with a lot of ambiguity. It pivots around CompoundPeptidoformIon, PeptidoformIon, and Peptidoform, which encode the ProForma specification. Additionally, this crate supports reading MGF files, annotating spectra (bottom-up, middle-down, and top-down), finding isobaric sequences, aligning peptides, accessing the IMGT germline database, and reading identified peptide files.

§Library features

  • Read ProForma sequences (complete specification supported: ‘level 2-ProForma + top-down compliant + cross-linking compliant + glycans compliant + mass spectrum compliant’)
  • Generate theoretical fragments with control over the fragmentation model from any ProForma peptidoform/proteoform
    • Generate theoretical fragments for chimeric spectra
    • Generate theoretical fragments for cross-links (also disulfides)
    • Generate theoretical fragments for modifications of unknown position
    • Generate peptide backbone (a, b, c, x, y, and z) and satellite ion fragments (w, d, and v)
    • Generate glycan fragments (B, Y, and internal fragments)
  • Integrated with mzdata for reading raw data files
  • Match spectra to the generated fragments
  • Align peptides based on mass
  • Fast access to the IMGT database of antibody germlines
  • Reading of multiple identified peptide file formats (Fasta, MaxQuant, MSFragger, Novor, OPair, Peaks, Sage, and many more)
  • Exhaustively fuzz tested for reliability (using cargo-afl)
  • Extensive use of uom for compile time unit checking
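
To make the backbone fragment series in the list above concrete, here is a minimal std-only sketch of how b and y ion m/z values follow from residue masses. It does not use rustyms and is not how the crate implements fragmentation. The names `toy_residue_mass`, `b_ions`, and `y_ions` and the reduced residue table are assumptions made for illustration, and the masses are standard monoisotopic values rounded to five decimals.

```rust
// Monoisotopic masses in dalton (rounded reference values, assumed for this sketch).
const WATER: f64 = 18.010_565;
const PROTON: f64 = 1.007_276;

/// Hypothetical lookup: monoisotopic residue mass for a handful of amino acids.
fn toy_residue_mass(amino_acid: char) -> Option<f64> {
    match amino_acid {
        'I' | 'L' => Some(113.084_06),
        'V' => Some(99.068_41),
        'Q' => Some(128.058_58),
        'E' => Some(129.042_59),
        'T' => Some(101.047_68),
        _ => None,
    }
}

/// m/z of b1..b(n-1): N-terminal prefixes plus `charge` protons.
/// `charge` must be at least 1. Returns None for residues outside the toy table.
fn b_ions(sequence: &str, charge: u32) -> Option<Vec<f64>> {
    let z = f64::from(charge);
    let length = sequence.chars().count();
    let mut running = 0.0;
    let mut ions = Vec::new();
    // The full-length prefix is the intact peptide, not a fragment, so stop one short.
    for amino_acid in sequence.chars().take(length.saturating_sub(1)) {
        running += toy_residue_mass(amino_acid)?;
        ions.push((running + z * PROTON) / z);
    }
    Some(ions)
}

/// m/z of y1..y(n-1): C-terminal suffixes plus water plus `charge` protons.
fn y_ions(sequence: &str, charge: u32) -> Option<Vec<f64>> {
    let z = f64::from(charge);
    let length = sequence.chars().count();
    let mut running = WATER;
    let mut ions = Vec::new();
    for amino_acid in sequence.chars().rev().take(length.saturating_sub(1)) {
        running += toy_residue_mass(amino_acid)?;
        ions.push((running + z * PROTON) / z);
    }
    Some(ions)
}

fn main() {
    let b = b_ions("IVQEVT", 1).expect("only residues from the toy table");
    let y = y_ions("IVQEVT", 1).expect("only residues from the toy table");
    for (index, (b_mz, y_mz)) in b.iter().zip(&y).enumerate() {
        println!("b{} {:.4}  y{} {:.4}", index + 1, b_mz, index + 1, y_mz);
    }
}
```

A real fragmentation model also has to deal with modifications, neutral losses, the other ion series, and charge ranges, which is what the crate's fragmentation model controls.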

§Example usage

use rustyms::{prelude::*, system::{isize::Charge, e}};
// Open example raw data (this is the built in mgf reader, look into mzdata for more advanced raw file readers)
let spectrum = rustyms::spectrum::mgf::open(raw_file_path)?;
// Parse the given ProForma definition
let peptide = CompoundPeptidoformIon::pro_forma("[Gln->pyro-Glu]-QVQEVSERTHGGNFD", None)?;
// Generate theoretical fragments for this peptide given EThcD fragmentation
let model = FragmentationModel::ethcd();
let fragments = peptide.generate_theoretical_fragments(Charge::new::<e>(2), model);
let parameters = MatchingParameters::default();
// Annotate the raw data with the theoretical fragments
let annotated = spectrum[0].annotate(peptide, &fragments, &parameters, MassMode::Monoisotopic);
// Calculate a peak false discovery rate for this annotation
let (fdr, _) = annotated.fdr(&fragments, &parameters, MassMode::Monoisotopic);
// This is the incorrect sequence for this spectrum so the peak FDR will indicate this
assert!(fdr.peaks_sigma() > 2.0);

use rustyms::{prelude::*, sequence::SimpleLinear, align::*};
// Check how this peptide compares to a similar peptide (using the feature `align`)
let first_peptide = Peptidoform::pro_forma("IVQEVT", None)?.into_simple_linear().unwrap();
let second_peptide = Peptidoform::pro_forma("LVQVET", None)?.into_simple_linear().unwrap();
// Align the two peptides using mass based alignment
// IVQEVT A
// LVQVET B
// ─  ╶╴
let alignment = align::<4, &Peptidoform<SimpleLinear>, &Peptidoform<SimpleLinear>>(
                  &first_peptide,
                  &second_peptide,
                  AlignScoring::default(),
                  AlignType::GLOBAL);
// Calculate some more statistics on this alignment
let stats = alignment.stats();
assert_eq!(stats.mass_similar, 6); // 6 out of the 6 positions are mass similar
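
Why all six positions count as mass similar can be checked by hand. The sketch below is std-only and does not use the rustyms alignment code. The helper names (`toy_residue_mass_da`, `segment_mass`, `mass_similar`), the reduced residue table, and the 0.001 Da tolerance are assumptions for illustration. It compares the two peptides step by step: I against L (isobaric), identical residues, and the swapped pair EV against VE.

```rust
/// Hypothetical lookup: monoisotopic residue masses in dalton, rounded to five decimals.
fn toy_residue_mass_da(amino_acid: char) -> Option<f64> {
    match amino_acid {
        'I' | 'L' => Some(113.084_06),
        'V' => Some(99.068_41),
        'Q' => Some(128.058_58),
        'E' => Some(129.042_59),
        'T' => Some(101.047_68),
        _ => None,
    }
}

/// Summed residue mass of a stretch of sequence, or None if a residue is unknown.
fn segment_mass(segment: &str) -> Option<f64> {
    segment.chars().map(toy_residue_mass_da).sum()
}

/// Two segments are mass similar here if their masses agree within `tolerance_da`.
fn mass_similar(first: &str, second: &str, tolerance_da: f64) -> bool {
    match (segment_mass(first), segment_mass(second)) {
        (Some(a), Some(b)) => (a - b).abs() <= tolerance_da,
        _ => false,
    }
}

fn main() {
    // IVQEVT against LVQVET, split into the steps a mass based alignment could take.
    let steps = [("I", "L"), ("V", "V"), ("Q", "Q"), ("EV", "VE"), ("T", "T")];
    let mut similar_positions = 0;
    for &(first, second) in steps.iter() {
        if mass_similar(first, second, 0.001) {
            // Every residue inside a matching step counts as one mass similar position.
            similar_positions += first.chars().count();
        }
    }
    println!("mass similar positions: {}", similar_positions);
}
```

A plain identity based alignment would count only three matching positions here (V, Q, and T), which is the motivation for aligning on mass instead.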

§Compilation features

Rustyms ties together multiple smaller modules into one cohesive structure. It has multiple features which allow you to slim it down if needed (all are enabled by default).

  • align - gives access to mass based alignment of peptides.
  • identification - gives access to methods reading many different identified peptide formats.
  • imgt - enables access to the IMGT database of antibody germline sequences, with annotations.
  • isotopes - gives access to generation of an averagine model for isotopes, also enables two additional dependencies.
  • rand - allows the generation of random peptides.
  • rayon - enables parallel iterators using rayon, mostly for imgt but also in consecutive align.
  • mzdata - enables integration with mzdata which has more advanced raw file support.
  • glycan-render - enables rendering glycans and glycan fragments to SVG.
  • glycan-render-bitmap - enables rendering glycans to bitmaps, by enabling the optional dependencies zeno and swash.

Modules§

align
Only available with feature align. Code to make alignments of two peptides based on mass mistakes and genetic information.
annotation
Contains all things related to annotations (MS2 spectrum annotations that is).
chemistry
Contains all things related to the underlying chemistry.
fragment
Contains all things related to fragments and fragmentation.
glycan
Handle glycan related issues, access provided if you want to work with glycans on your own.
identification
Only available with feature identification. Read in the annotations from peptide identification sources.
imgt
Only available with feature imgt. This module handles parsing the IMGT LIGM-DB database into structures compatible with rustyms. It additionally stores all regions and annotations. There are two main ways of selecting germlines: by name, using get_germline, or by building a query over the data, using Selection.
ontology
The available ontologies.
prelude
A subset of the types and traits that are expected to be used the most. Importing this is a good starting point for working with the crate.
quantities
Contains all things related to tolerances and structures to handle multiple mass/formula options.
sequence
Contains all things related to sequences, amongst others amino acids and peptidoforms.
spectrum
Spectrum-related code.
system
The measurement system used in this crate. It redefines the important SI units so that they are stored in base units that are more sensible for MS purposes.

Macros§

Q
Macro to implement Quantity type aliases for a specific system of units and value storage type.
molecular_formula
Easily define molecular formulas using the following syntax: <element> <num> or [<isotope> <element> <num>]. The spaces are required by the Rust compiler.
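
The space requirement comes from how Rust tokenizes macro input: `C6` is a single identifier, while `C 6` is an identifier followed by a literal. The sketch below shows this with a toy `macro_rules!` macro. It is not the rustyms `molecular_formula` macro: `toy_formula_mass`, `toy_element_mass`, and the small element table are hypothetical, the macro returns a plain `f64` mass rather than a formula type, and the bracketed isotope form is left out.

```rust
/// Hypothetical element table: monoisotopic masses in dalton.
fn toy_element_mass(element: &str) -> f64 {
    match element {
        "H" => 1.007_825_032,
        "C" => 12.0,
        "N" => 14.003_074_004,
        "O" => 15.994_914_62,
        other => panic!("element {} is not in this toy table", other),
    }
}

/// Toy macro accepting `<element> <num>` pairs and summing their masses.
/// Each pair is matched as an identifier followed by a literal, so the
/// separating space is what lets the tokenizer see two tokens.
macro_rules! toy_formula_mass {
    ($($element:ident $count:literal)*) => {
        0.0_f64 $(+ toy_element_mass(stringify!($element)) * ($count as f64))*
    };
}

fn main() {
    let water = toy_formula_mass!(H 2 O 1);
    let glucose = toy_formula_mass!(C 6 H 12 O 6);
    println!("water {:.5} Da, glucose {:.5} Da", water, glucose);
    // `toy_formula_mass!(H2 O1)` would not compile: `H2` and `O1` are single identifiers.
}
```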