Chemical structure generation for protein sequences as SMILES strings.
Use the AminoAcid enum to encode the sequence residues, and build a SMILES
string with proteinogenic::smiles. For example, with divergicin 750:
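    extern crate proteinogenic;

    // Convert each one-letter code into an AminoAcid (panics on an invalid letter).
    let residues = "KGILGKLGVVQAGVDFVSGVWAGIKQSAKDHPNA"
        .chars()
        .map(proteinogenic::AminoAcid::from_char)
        .map(Result::unwrap);

    // Render the whole sequence as a SMILES string.
    let s = proteinogenic::smiles(residues)
        .expect("failed to generate SMILES string");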
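The example above unwraps every residue, so a single invalid letter aborts the program. A minimal sketch of the alternative is to collect the iterator into a `Result`, which stops at the first error. To stay self-contained, this sketch does not call the crate: `parse_residue` is a hypothetical stand-in for `proteinogenic::AminoAcid::from_char`, and both its accepted alphabet (the 20 standard one-letter codes) and its `String` error type are assumptions, not the crate's actual behavior.

```rust
/// Hypothetical stand-in for `proteinogenic::AminoAcid::from_char`.
/// Assumption: only the 20 standard one-letter codes are accepted.
fn parse_residue(c: char) -> Result<char, String> {
    const STANDARD: &str = "ACDEFGHIKLMNPQRSTVWY";
    if STANDARD.contains(c) {
        Ok(c)
    } else {
        Err(format!("unknown residue {:?}", c))
    }
}

/// Parses a whole sequence, returning the first error instead of panicking.
fn parse_sequence(sequence: &str) -> Result<Vec<char>, String> {
    // Collecting into `Result<Vec<_>, _>` stops at the first `Err`.
    sequence.chars().map(parse_residue).collect()
}

fn main() {
    for sequence in ["KGILGKLGVV", "KGIL*GK"].iter() {
        match parse_sequence(sequence) {
            Ok(residues) => println!("{}: {} valid residues", sequence, residues.len()),
            Err(err) => println!("{}: {}", sequence, err),
        }
    }
}
```

The same `collect` pattern should apply to any iterator of `Result` values, so it may be usable with the real `from_char` if its error type suits the caller.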
Additional modifications can be carried out by using a
Protein struct to
configure the rendering of the peptide. So far, disulfide bonds and
lanthionine bridges are supported, as well as head-to-tail cyclization.
For instance, we can generate the SMILES string of a
cyclotide with three disulfide bonds and a head-to-tail cyclized backbone:
    extern crate proteinogenic;

    let residues = "GLPVCGETCVGGTCNTPGCTCSWPVCTRN"
        .chars()
        .map(proteinogenic::AminoAcid::from_char)
        .map(Result::unwrap);

    // Configure the backbone cyclization and the three disulfide bonds.
    let mut p = proteinogenic::Protein::new(residues);
    p.cyclization(proteinogenic::Cyclization::HeadToTail);
    p.cross_link(proteinogenic::CrossLink::Cystine(5, 19)).unwrap();
    p.cross_link(proteinogenic::CrossLink::Cystine(9, 21)).unwrap();
    p.cross_link(proteinogenic::CrossLink::Cystine(14, 26)).unwrap();

    let s = p.smiles()
        .expect("failed to generate SMILES string");
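The indices passed to `CrossLink::Cystine` in the example above coincide with the positions of the cysteine residues when counting from 1, which suggests (although the text does not state it) that cross-link positions are 1-based. The following sketch checks this using only the standard library; `cysteine_positions` is a hypothetical helper, not part of the crate.

```rust
/// Returns the 1-based positions of every cysteine (`C`) in a one-letter sequence.
fn cysteine_positions(sequence: &str) -> Vec<usize> {
    sequence
        .chars()
        .enumerate()
        .filter(|(_, c)| *c == 'C')
        .map(|(i, _)| i + 1) // convert the 0-based index to a 1-based position
        .collect()
}

fn main() {
    let sequence = "GLPVCGETCVGGTCNTPGCTCSWPVCTRN";
    let positions = cysteine_positions(sequence);
    println!("{:?}", positions); // prints [5, 9, 14, 19, 21, 26]

    // Every cross-link from the example should connect two cysteines.
    let links = [(5, 19), (9, 21), (14, 26)];
    for &(a, b) in links.iter() {
        assert!(positions.contains(&a) && positions.contains(&b));
    }
}
```

A check like this can catch an off-by-one in the cross-link indices before any SMILES string is generated.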
This SMILES string can be used in conjunction with other cheminformatics toolkits, for instance OpenBabel, which can generate a PNG figure from it.
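As a rough sketch of that workflow, the generated string could be handed to the Open Babel command-line tool from Rust. This assumes an `obabel` executable is on the `PATH` and was built with PNG support; `-:` (read a SMILES string from the command line) and `-O` (output file) are Open Babel's documented options as far as I know, but check them against your installed version. The helper `obabel_png_command` is hypothetical, and the glycine SMILES merely stands in for the string produced by proteinogenic.

```rust
use std::process::Command;

/// Builds (without running) an `obabel` invocation that renders `smiles` to `output`.
/// Assumption: `obabel -:"<SMILES>" -O <file>.png` is supported by the local install.
fn obabel_png_command(smiles: &str, output: &str) -> Command {
    let mut cmd = Command::new("obabel");
    cmd.arg(format!("-:{}", smiles)).arg("-O").arg(output);
    cmd
}

fn main() {
    // Placeholder input: glycine. Replace with the SMILES string of the peptide.
    let mut cmd = obabel_png_command("NCC(=O)O", "glycine.png");
    match cmd.status() {
        Ok(status) => println!("obabel exited with {}", status),
        Err(err) => eprintln!("could not run obabel: {}", err),
    }
}
```

Passing the SMILES string as a single argument avoids shell quoting problems, since `Command` does not go through a shell.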
proteinogenic is not limited to building a SMILES string; it can
use any implementor of the corresponding purr trait, such as
purr::graph::Builder, to generate an in-memory representation of a protein
formula. If your code is already compatible with
purr, then you'll be able to use
protein sequences quite easily.
    extern crate proteinogenic;
    extern crate purr;

    let sequence = "KGILGKLGVVQAGVDFVSGVWAGIKQSAKDHPNA";
    let residues = sequence.chars()
        .map(proteinogenic::AminoAcid::from_char)
        .map(Result::unwrap);

    // Feed the residues to a purr graph builder instead of a SMILES writer.
    let mut builder = purr::graph::Builder::new();
    proteinogenic::visit(residues, &mut builder);
    builder.build()
        .expect("failed to create a graph representation");
The API is not yet stable, and may change to follow changes introduced by
purr or to improve the interface ergonomics.
Found a bug? Have an enhancement request? Head over to the GitHub issue tracker to report it or ask a question. If you are filing a bug, please include as much information as you can about the issue, and try to recreate the bug in a simple, easily reproducible situation.
If you’re a bioinformatician and a Rustacean, you may be interested in these other libraries:
uniprot.rs: Rust data structures for the UniProtKB databases.
obofoundry.rs: Rust data structures for the OBO Foundry.
fastobo: Rust parser and abstract syntax tree for Open Biomedical Ontologies.
pubchem.rs: Rust data structures and API client for the PubChem API.
This library is provided under the open-source MIT license.
Protein: A protein abstracted as a modified peptide.
An error marker for sequences containing invalid amino acids.
AminoAcid: A single L-α amino-acid.
CrossLink: A covalent bond between several amino-acid residues.
Cyclization: A peptide cyclization mechanism.
A generic error type for this crate.