Crate rustyms

§Match those fragments!

Handle mass spectrometry data in Rust. This crate is set up to handle very complex peptides with loads of ambiguity and complexity. It pivots around the ComplexPeptide and LinearPeptide types, which encode the ProForma specification. Additionally, this crate enables reading mgf files, annotating spectra (bottom-up, middle-down, and top-down; BU/MD/TD), finding isobaric sequences, aligning peptides, accessing the IMGT germline database, and reading identified peptide files.

§Library features

  • Read ProForma sequences (‘level 2-ProForma + mass spectrum compliant + glycans compliant’, with the intention to fully support the whole spec)
  • Generate theoretical fragments, with control over the fragmentation model, from any supported ProForma peptide
    • Generate fragments from satellite ions (w, d, and v)
    • Generate glycan fragments
    • Generate theoretical fragments for modifications of unknown position
    • Generate theoretical fragments for chimeric spectra
  • Read mgf files
  • Match spectra to the generated fragments
  • Extensive use of uom for compile-time unit checking
  • Align peptides based on mass (see Stitch for more information on the algorithm, which has been improved since and will be tweaked extensively over time)
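To make the fragment generation and matching features concrete, here is a minimal standalone sketch that does not use rustyms at all. It computes singly charged b and y ions for an unmodified sequence and pairs them with observed peaks inside a ppm window. The names `residue_mass`, `Fragment`, `theoretical_fragments`, `within_ppm`, and `match_peaks` are hypothetical, and the masses are standard monoisotopic values for only a handful of residues.

```rust
// Monoisotopic masses in dalton (standard values, not taken from rustyms).
const PROTON: f64 = 1.007_276;
const WATER: f64 = 18.010_565;

/// Monoisotopic residue mass for a handful of amino acids, `None` for anything else.
fn residue_mass(aa: char) -> Option<f64> {
    match aa {
        'G' => Some(57.021_46),
        'A' => Some(71.037_11),
        'S' => Some(87.032_03),
        'V' => Some(99.068_41),
        'Q' => Some(128.058_58),
        'E' => Some(129.042_59),
        _ => None,
    }
}

#[derive(Debug, Clone, PartialEq)]
struct Fragment {
    series: char, // 'b' (N-terminal) or 'y' (C-terminal)
    index: usize, // number of residues in the fragment
    mz: f64,      // m/z at charge 1+
}

/// Singly charged b and y ions for every backbone bond of an unmodified sequence.
fn theoretical_fragments(sequence: &str) -> Option<Vec<Fragment>> {
    let masses = sequence
        .chars()
        .map(residue_mass)
        .collect::<Option<Vec<f64>>>()?;
    let total: f64 = masses.iter().sum();
    let mut fragments = Vec::new();
    let mut prefix = 0.0;
    for i in 1..masses.len() {
        prefix += masses[i - 1];
        fragments.push(Fragment { series: 'b', index: i, mz: prefix + PROTON });
        fragments.push(Fragment {
            series: 'y',
            index: masses.len() - i,
            mz: total - prefix + WATER + PROTON,
        });
    }
    Some(fragments)
}

/// True when `observed` lies within `ppm` parts per million of `theoretical`.
fn within_ppm(theoretical: f64, observed: f64, ppm: f64) -> bool {
    ((observed - theoretical) / theoretical).abs() * 1e6 <= ppm
}

/// Pair each theoretical fragment with the first observed peak inside the tolerance.
fn match_peaks(fragments: &[Fragment], peaks: &[f64], ppm: f64) -> Vec<(Fragment, f64)> {
    fragments
        .iter()
        .filter_map(|fragment| {
            peaks
                .iter()
                .copied()
                .find(|&peak| within_ppm(fragment.mz, peak, ppm))
                .map(|peak| (fragment.clone(), peak))
        })
        .collect()
}
```

Calling `theoretical_fragments("GA")` yields one b ion and one y ion, and `match_peaks` keeps only those with a peak inside the window. The crate's own models cover far more (other ion series, charge states, modifications, glycans), so treat this purely as an illustration of the idea.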

§Example usage

// Open some data and see if the given peptide is a valid match
// (`raw_file_path` is assumed to hold the path to an mgf file)
use rustyms::{*, system::{Charge, e}};
let peptide = ComplexPeptide::pro_forma("Q[Gln->pyro-Glu]VQEVSERTHGGNFD")?;
let spectrum = rawfile::mgf::open(raw_file_path)?;
let model = Model::ethcd();
let fragments = peptide.generate_theoretical_fragments(Charge::new::<e>(2.0), &model);
let annotated = spectrum[0].annotate(peptide, &fragments, &model, MassMode::Monoisotopic);
let fdr = annotated.fdr(&fragments, &model);
// This is the incorrect sequence for this spectrum so the FDR will indicate this
assert!(fdr.sigma() < 2.0);

// Check how this peptide compares to a similar peptide (using `align`)
// (same sequence, repeated for easy reference)
use rustyms::{*, align::*};
let first_peptide = LinearPeptide::pro_forma("Q[Gln->pyro-Glu]VQEVS")?;
let second_peptide = LinearPeptide::pro_forma("E[Glu->pyro-Glu]VQVES")?;
let alignment = align::<4>(&first_peptide, &second_peptide,
                 matrix::BLOSUM62, Tolerance::new_ppm(10.0), AlignType::GLOBAL);
let stats = alignment.stats();
assert_eq!(stats.mass_similar, 6); // All positions are mass similar
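The alignment example treats positions as similar when their masses agree, even if the residues differ. A plausible reading of why all six positions count: Gln minus NH3 and Glu minus H2O both give pyro-Glu, and the swapped pair EV/VE has the same summed mass. The standalone sketch below checks exactly that kind of equality within a ppm tolerance. The function `isobaric` and the constants are hypothetical illustrations and are not how rustyms implements its alignment.

```rust
// Monoisotopic masses in dalton (standard values, not taken from rustyms).
const VAL: f64 = 99.068_41;
const GLN: f64 = 128.058_58;
const GLU: f64 = 129.042_59;
const AMMONIA: f64 = 17.026_549;
const WATER: f64 = 18.010_565;

/// True when two stretches of residue masses sum to the same mass within `ppm`.
/// The tolerance is taken relative to the larger of the two sums.
fn isobaric(a: &[f64], b: &[f64], ppm: f64) -> bool {
    let mass_a: f64 = a.iter().sum();
    let mass_b: f64 = b.iter().sum();
    (mass_a - mass_b).abs() <= mass_a.max(mass_b) * ppm * 1e-6
}
```

With these values, `isobaric(&[GLU, VAL], &[VAL, GLU], 10.0)` and `isobaric(&[GLN - AMMONIA], &[GLU - WATER], 10.0)` both hold, while plain Gln against plain Glu does not.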

§Compilation features

Rustyms ties together multiple smaller modules into one cohesive structure. It has multiple features which allow you to slim it down if needed (all are enabled by default).

  • identification - gives access to methods reading many different identified peptide formats.
  • align - gives access to mass based alignment of peptides.
  • imgt - enables access to the IMGT database of antibody germline sequences, with annotations.
  • rayon - enables parallel iterators using rayon, mostly for imgt but also in consecutive align.
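Because all features are enabled by default, slimming the crate down means turning the defaults off and naming the features you want. A sketch of such a dependency entry follows, using standard Cargo syntax. The version requirement is a placeholder and the selection of `align` alone is only an example.

```toml
[dependencies]
# The version requirement is a placeholder; pin it to the release you actually use.
rustyms = { version = "*", default-features = false, features = ["align"] }
```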

Modules§

  • Only available with feature align. Code to make alignments of two peptides based on mass mistakes and genetic information.
  • All amino acid property classes according to IMGT.
  • Contains the definitions for errors, with all additional data needed to generate nice error messages
  • Handle fragment related issues, access provided if you want to dive deeply into fragments in your own code.
  • Handle glycan related issues, access provided if you want to work with glycans on your own.
  • Only available with feature identification. Read in the annotations from peptide identification sources
  • Only available with feature imgt. Handles parsing the IMGT LIGM-DB database into structures compatible with rustyms. It additionally stores all regions and annotations. There are two main ways of selecting germlines: by name with get_germline, or by building a query over the data with Selection.
  • Handle model instantiation.
  • Handle modification related issues, access provided if you want to dive deeply into modifications in your own code.
  • The available ontologies
  • Rules regarding the placement of modifications
  • Handling raw files
  • Spectrum related code
  • The measurement system used in this crate. A redefinition of the important SI units for them to be stored in a more sensible base unit for MS purposes.
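To illustrate what compile-time unit checking buys you, here is a minimal standalone sketch with hand-written newtypes. The crate itself relies on uom for this, so the types `Mass`, `Charge`, and `MassOverCharge` and the function `is_in_scan_range` below are hypothetical stand-ins and not the definitions in the system module.

```rust
use std::ops::Div;

/// A mass in dalton.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Mass(f64);

/// A charge in elementary charges.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Charge(f64);

/// A mass over charge ratio in dalton per elementary charge.
#[derive(Debug, Clone, Copy, PartialEq)]
struct MassOverCharge(f64);

/// Dividing a mass by a charge is the only way to obtain an m/z value here.
impl Div<Charge> for Mass {
    type Output = MassOverCharge;

    fn div(self, rhs: Charge) -> MassOverCharge {
        MassOverCharge(self.0 / rhs.0)
    }
}

/// Only accepts m/z values; passing a bare `Mass` or `f64` is a compile error.
fn is_in_scan_range(mz: MassOverCharge, low: MassOverCharge, high: MassOverCharge) -> bool {
    low.0 <= mz.0 && mz.0 <= high.0
}
```

For example, `Mass(1000.0) / Charge(2.0)` has type `MassOverCharge`, whereas `is_in_scan_range(Mass(1000.0), ..)` is rejected by the type checker before the program ever runs.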

Macros§

  • Macro to implement quantity type aliases for a specific system of units and value storage type.
  • Easily define molecular formulas using the following syntax: <element> <num> or (<isotope>)<element> <num>
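The formula syntax described above maps naturally onto a declarative macro. The sketch below is a standalone toy, assuming nothing about the crate's real macro: a hypothetical `formula!` that accepts `<element> <num>` and `(<isotope>)<element> <num>` terms and collects them into a vector of a hypothetical `Term` struct.

```rust
/// One term of a formula: an optional isotope number, an element symbol, and a count.
#[derive(Debug, Clone, PartialEq)]
struct Term {
    isotope: Option<u16>,
    element: &'static str,
    count: i32,
}

/// Token muncher: consumes one term at a time and accumulates `Term` values.
macro_rules! formula {
    // No tokens left: emit the accumulated terms.
    (@acc [$($done:tt)*]) => { vec![$($done)*] };
    // `(<isotope>)<element> <num>`
    (@acc [$($done:tt)*] ($iso:literal) $el:ident $n:literal $($rest:tt)*) => {
        formula!(@acc [$($done)* Term { isotope: Some($iso), element: stringify!($el), count: $n },] $($rest)*)
    };
    // `<element> <num>`
    (@acc [$($done:tt)*] $el:ident $n:literal $($rest:tt)*) => {
        formula!(@acc [$($done)* Term { isotope: None, element: stringify!($el), count: $n },] $($rest)*)
    };
    // Entry point: start with an empty accumulator.
    ($($tokens:tt)*) => { formula!(@acc [] $($tokens)*) };
}
```

With this in scope, `formula!(H 2 O 1)` produces two terms without isotopes, and `formula!((13)C 2 H 6)` marks the carbon term with `Some(13)`. Element symbols are not validated here, which a real implementation would want to do.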

Structs§

  • A peptide with all data as provided by ProForma. Preferably generated by using the crate::ComplexPeptide::pro_forma function.
  • A selection of ions that together define the charge of a peptide
  • A molecular formula, a selection of elements of specified isotopes together forming a structure
  • A collection of potentially multiple items of the generic type. It is used to easily combine several of these multi structs into all possible combinations.
  • A protease defined by its ability to cut at any site identified by the right amino acids at the N and C terminal side. Each position is identified by an option; a none means that there is no specificity at this position. If there is a specificity at a certain position, any amino acid that is contained in the set is allowed (see AminoAcid::canonical_identical).
  • One block in a sequence, meaning an amino acid and its accompanying modifications
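The protease description above (per-position options, where none means no specificity) can be sketched as follows. This is a standalone illustration over plain characters; the struct layout and the names `Protease`, `fits`, `cut_sites`, and `digest` are assumptions made for the example and do not mirror the crate's actual definition.

```rust
/// Hypothetical stand-in for a protease rule. Each position is `None` (anything
/// allowed) or `Some(set)` (only residues in the set are allowed).
struct Protease {
    before: Vec<Option<Vec<char>>>, // N-terminal side of the cut, closest residue last
    after: Vec<Option<Vec<char>>>,  // C-terminal side of the cut, closest residue first
}

/// Check a window of residues against a list of per-position rules.
fn fits(pattern: &[Option<Vec<char>>], window: &[char]) -> bool {
    pattern
        .iter()
        .zip(window)
        .all(|(allowed, aa)| allowed.as_ref().map_or(true, |set| set.contains(aa)))
}

impl Protease {
    /// Indices `i` such that the protease cuts between `seq[i - 1]` and `seq[i]`.
    fn cut_sites(&self, seq: &[char]) -> Vec<usize> {
        (1..seq.len())
            .filter(|&i| {
                i >= self.before.len()
                    && i + self.after.len() <= seq.len()
                    && fits(&self.before, &seq[i - self.before.len()..i])
                    && fits(&self.after, &seq[i..i + self.after.len()])
            })
            .collect()
    }

    /// Split a sequence at every cut site (no missed cleavages).
    fn digest(&self, sequence: &str) -> Vec<String> {
        let seq: Vec<char> = sequence.chars().collect();
        let mut peptides: Vec<String> = Vec::new();
        let mut start = 0;
        for site in self.cut_sites(&seq) {
            peptides.push(seq[start..site].iter().collect::<String>());
            start = site;
        }
        if start < seq.len() {
            peptides.push(seq[start..].iter().collect::<String>());
        }
        peptides
    }
}
```

A trypsin-like rule without the proline exception would be `Protease { before: vec![Some(vec!['K', 'R'])], after: vec![None] }`, which splits `PEPTKIDER` into `PEPTK` and `IDER`.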

Constants§

  • All elements, sorted so that single-character symbols come after two-character element symbols (needed for greedy parsing)

Traits§

  • Any item that has a clearly defined single molecular formula
  • Check if two values are within the specified tolerance from each other.
  • Any item that has a number of potential chemical formulas

Functions§

  • Get the possible building blocks for sequences based on the given modifications. Useful for any automated sequence generation, like isobaric set generation or de novo sequencing. The result is, for each location (N term, center, C term), the list of all possible building blocks with their masses, sorted by mass.
  • Get the elemental data
  • Find the isobaric sets for the given mass with the given modifications and ppm error. The modifications are placed on any location they are allowed based on the given placement rules, so using any modifications which provide those is advised. If the provided LinearPeptide has multiple formulas, it uses the formula with the lowest monoisotopic mass.
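To show the idea behind building blocks sorted by mass and an isobaric search, here is a minimal standalone sketch. It enumerates compositions of building blocks (order ignored) whose summed mass falls within a ppm window of a target. The function names `isobaric_compositions` and `search` are hypothetical, and the sketch ignores terminal-specific blocks, modifications, and placement rules, all of which the crate's own function takes into account.

```rust
/// All compositions of `blocks` (symbol, mass in dalton) whose summed mass lies
/// within `ppm` of `target`. Each composition is reported once, lightest block first.
fn isobaric_compositions(blocks: &[(char, f64)], target: f64, ppm: f64) -> Vec<String> {
    let mut sorted = blocks.to_vec();
    // Only finite, positive masses are kept so that the search always terminates.
    sorted.retain(|block| block.1.is_finite() && block.1 > 0.0);
    sorted.sort_by(|a, b| a.1.total_cmp(&b.1));
    let tolerance = target * ppm * 1e-6;
    let mut found = Vec::new();
    let mut current = Vec::new();
    search(&sorted, 0, target, tolerance, &mut current, &mut found);
    found
}

/// Depth-first search that only moves forward through the sorted blocks, so that
/// permutations of the same composition are never generated twice.
fn search(
    blocks: &[(char, f64)],
    first: usize,
    remaining: f64,
    tolerance: f64,
    current: &mut Vec<char>,
    found: &mut Vec<String>,
) {
    if !current.is_empty() && remaining.abs() <= tolerance {
        found.push(current.iter().collect());
        return;
    }
    for (i, &(symbol, mass)) in blocks.iter().enumerate().skip(first) {
        if mass > remaining + tolerance {
            break; // blocks are sorted, so every later block overshoots as well
        }
        current.push(symbol);
        search(blocks, i, remaining - mass, tolerance, current, found);
        current.pop();
    }
}
```

Given blocks for G, A, N, Q, and K with their standard monoisotopic residue masses, a target equal to the mass of N returns both `GG` and `N`, the classic isobaric pair. Sorting by mass is what makes the early `break` valid.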