rustyms 0.9.0-alpha.1

A library to handle proteomic mass spectrometry data and match peptides to spectra.

Match those fragments!

Handle mass spectrometry data in Rust. This crate is built to handle very complex peptides with a lot of ambiguity. It pivots around [CompoundPeptidoform], [Peptidoform], and [LinearPeptide], which encode the ProForma specification. Additionally, this crate supports reading mgf files, annotating spectra (bottom-up, middle-down, and top-down), finding isobaric sequences, aligning peptides, accessing the IMGT germline database, and reading identified peptide files.

Library features

  • Read ProForma sequences (complete specification supported: 'level 2-ProForma + top-down compliant + cross-linking compliant + glycans compliant + mass spectrum compliant')
  • Generate theoretical fragments with control over the fragmentation model from any ProForma peptidoform/proteoform
    • Generate fragments from satellite ions (w, d, and v)
    • Generate glycan fragments
    • Generate theoretical fragments for modifications of unknown position
    • Generate theoretical fragments for chimeric spectra
    • Generate theoretical fragments for cross-links (also disulfides)
  • Integrated with mzdata for reading raw data files
  • Match spectra to the generated fragments
  • Align peptides based on mass
  • Fast access to the IMGT database of antibody germlines
  • Reading of multiple identified peptide file formats (Fasta, MaxQuant, MSFragger, Novor, OPair, Peaks, and Sage)
  • Exhaustively fuzz tested for reliability (using cargo-afl)
  • Extensive use of uom for compile time unit checking
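
To make "match spectra to the generated fragments" concrete, here is a minimal, standard-library-only sketch of matching theoretical m/z values to observed peaks within a ppm tolerance. It is not how rustyms implements matching; the names `ppm_error` and `match_peaks` are hypothetical and exist only for this illustration.

```rust
/// Signed relative difference between an observed and a theoretical m/z,
/// expressed in parts per million (ppm).
fn ppm_error(theoretical: f64, observed: f64) -> f64 {
    (observed - theoretical) / theoretical * 1e6
}

/// For every theoretical m/z, pick the closest observed peak that lies within
/// `tolerance_ppm`. Returns (theoretical index, observed index) pairs.
/// Hypothetical helper, not part of the rustyms API.
fn match_peaks(theoretical: &[f64], observed: &[f64], tolerance_ppm: f64) -> Vec<(usize, usize)> {
    let mut matches = Vec::new();
    for (t_index, &t) in theoretical.iter().enumerate() {
        let best = observed
            .iter()
            .enumerate()
            // Absolute ppm error of each observed peak against this fragment
            .map(|(o_index, &o)| (o_index, ppm_error(t, o).abs()))
            // Keep only the peaks inside the tolerance window
            .filter(|&(_, error)| error <= tolerance_ppm)
            // Of those, take the one with the smallest error
            .min_by(|a, b| a.1.total_cmp(&b.1));
        if let Some((o_index, _)) = best {
            matches.push((t_index, o_index));
        }
    }
    matches
}
```

With a tolerance of 10 ppm, a theoretical value of 100.0 matches an observed peak at 100.0005 (about 5 ppm away), while 300.0 does not match 300.01 (about 33 ppm away). A real implementation would typically sort the peaks and use a binary search instead of this quadratic scan.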

Example usage

# fn main() -> Result<(), rustyms::error::CustomError> {
# let raw_file_path = "data/annotated_example.mgf";
// Open some data and see if the given peptide is a valid match
use rustyms::{*, system::{usize::Charge, e}};
let peptide = CompoundPeptidoform::pro_forma("[Gln->pyro-Glu]-QVQEVSERTHGGNFD", None)?;
let spectrum = rawfile::mgf::open(raw_file_path)?;
let model = Model::ethcd();
let fragments = peptide.generate_theoretical_fragments(Charge::new::<e>(2), &model);
let annotated = spectrum[0].annotate(peptide, &fragments, &model, MassMode::Monoisotopic);
let (fdr, _) = annotated.fdr(&fragments, &model, MassMode::Monoisotopic);
// This is the incorrect sequence for this spectrum so the FDR will indicate this
# dbg!(&fdr, fdr.peaks_sigma(), fdr.peaks_fdr(), fdr.peaks_score());
assert!(fdr.peaks_sigma() > 2.0);
# Ok(()) }

# fn main() -> Result<(), rustyms::error::CustomError> {
// Check how this peptide compares to a similar peptide (using `align`)
// (same sequence, repeated for easy reference)
use rustyms::{*, align::*};
let first_peptide = LinearPeptide::pro_forma("IVQEVS", None)?.simple().unwrap();
let second_peptide = LinearPeptide::pro_forma("LEVQVES", None)?.simple().unwrap();
let alignment = align::<4, Simple, Simple>(&first_peptide, &second_peptide,
                 matrix::BLOSUM62, Tolerance::new_ppm(10.0), AlignType::GLOBAL);
# dbg!(&alignment);
let stats = alignment.stats();
# //assert_eq!(stats.identical, 3); // Only three positions are identical
assert_eq!(stats.mass_similar, 6); // All positions are mass similar
# Ok(()) }
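
The idea behind positions being "mass similar" rather than identical can be sketched without the crate. The block below is an assumption-laden illustration, not the rustyms alignment algorithm: it uses a small hand-written table of monoisotopic residue masses and the hypothetical helpers `residue_mass`, `peptide_mass`, and `mass_similar` to show that I and L are indistinguishable by mass, while Q and K are not at a 10 ppm tolerance.

```rust
/// Monoisotopic mass of water in Da, added once per linear peptide.
const WATER: f64 = 18.01056;

/// Monoisotopic residue masses in Da for a handful of amino acids.
/// Only the residues needed for this illustration are listed.
fn residue_mass(residue: char) -> Option<f64> {
    match residue {
        'S' => Some(87.03203),
        'V' => Some(99.06841),
        'L' | 'I' => Some(113.08406), // isobaric: identical elemental composition
        'Q' => Some(128.05858),
        'K' => Some(128.09496),
        'E' => Some(129.04259),
        _ => None,
    }
}

/// Sum of the residue masses plus one water; `None` if a residue is unknown.
fn peptide_mass(sequence: &str) -> Option<f64> {
    let mut total = WATER;
    for residue in sequence.chars() {
        total += residue_mass(residue)?;
    }
    Some(total)
}

/// True if two masses differ by at most `tolerance_ppm` parts per million.
fn mass_similar(a: f64, b: f64, tolerance_ppm: f64) -> bool {
    ((a - b) / a).abs() * 1e6 <= tolerance_ppm
}
```

Under these assumptions `peptide_mass("IVQEVS")` equals `peptide_mass("LVQEVS")`, and swapping neighbouring residues ("VQ" versus "QV") also leaves the summed mass unchanged. This is why an alignment that scores on mass can pair stretches that an identity-based alignment would count as mismatches.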

Compilation features

Rustyms ties together multiple smaller modules into one cohesive structure. It has multiple features which allow you to slim it down if needed (all are enabled by default).

  • align - gives access to mass based alignment of peptides.
  • identification - gives access to methods reading many different identified peptide formats.
  • imgt - enables access to the IMGT database of antibody germline sequences, with annotations.
  • isotopes - gives access to generation of an averagine model for isotopes, also enables two additional dependencies.
  • rand - allows the generation of random peptides.
  • rayon - enables parallel iterators using rayon, mostly for imgt but also in consecutive align.
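
Since all features are enabled by default, slimming the crate down means switching the defaults off and naming only the features you need. A possible Cargo.toml entry, assuming the crate is pulled from crates.io at the version shown at the top of this page and that only alignment and identified peptide reading are wanted:

```toml
[dependencies]
# Disable the default feature set and opt back in to just two features.
# The feature names come from the list above; the selection is only an example.
rustyms = { version = "0.9.0-alpha.1", default-features = false, features = ["align", "identification"] }
```

Features may depend on one another, so check the crate's own Cargo.toml if a build with a reduced set fails.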