§pdbtbx (PDB Toolbox)
A library to work with crystallographic Protein Data Bank files. It can parse the main part of the PDB and mmCIF formats (it is actively in development, so more will follow). The resulting structure can be used to edit and interrogate the 3D structure of the protein. The changed structures can be saved in a PDB or mmCIF file for use in other software.
§Goals
This library is designed to be a dependable, safe, stable and fast way of handling PDB files in idiomatic Rust. The goal is to be very community driven, making this a project that is as useful as possible to everyone while staying true to its core principles.
§Why
As Rust is a recent language, there is not a lot of support for scientific work in Rust
compared to languages that have been in use much longer (like the ubiquitous Python). I think
that using Rust would have huge benefits over other languages in bigger scientific projects.
It is not just me, more scientists are turning to Rust [Perkel, J. M.]. I want to make it
easier for scientists to start using Rust by writing this library.
§How to use it
The following example opens a PDB file (1ubq.pdb), removes all H atoms, calculates the
average B factor (or temperature factor) and prints it. It also saves the resulting structure
to a file.
use pdbtbx::*;
let (mut pdb, _errors) = pdbtbx::open("example-pdbs/1ubq.pdb").unwrap();
pdb.remove_atoms_by(|atom| atom.element() == Some(&Element::H)); // Remove all H atoms
let mut avg_b_factor = 0.0;
for atom in pdb.atoms() { // Iterate over all atoms in the structure
    avg_b_factor += atom.b_factor();
}
avg_b_factor /= pdb.atom_count() as f64;
println!("The average B factor of the protein is: {}", avg_b_factor);
pdbtbx::save(&pdb, "dump/1ubq_no_hydrogens.pdb", pdbtbx::StrictnessLevel::Loose);
§High level documentation
§Parallelization
Rayon is used to create parallel iterators for all logical candidates. Use
the parallel version of an iterator by prefixing the name with par_. Among others, the looping iterators
such as atoms(), residues() and atoms_with_hierarchy() are implemented as parallel iterators. The Rayon
implementations are gated behind the rayon feature, which is enabled by default.
§Serialization
Enable the serde feature for Serde support.
§Spatial lookup of atoms
Enable the rstar feature for rstar support. This enables you to generate
R*-trees, which make spatial queries on atoms very fast. Finding the atoms close to a given
atom, for example, is very fast. See the documentation of the rstar crate for more information on how to make use
of all of its features.
use pdbtbx::*;
let (mut pdb, _errors) = pdbtbx::open("example-pdbs/1ubq.pdb").unwrap();
// You can loop over all atoms within 3.5 Å of a specific atom
// Note: The `locate_within_distance` method takes a squared distance
let tree = pdb.create_atom_rtree();
for atom in tree.locate_within_distance(pdb.atom(42).unwrap().pos(), 3.5 * 3.5) {
    println!("{}", atom);
}
// You can even get information about the hierarchy of these atoms
// (the chain, residue and conformer that contain this atom)
let tree = pdb.create_hierarchy_rtree();
let mut total = 0;
for hierarchy in tree.locate_within_distance(pdb.atom(42).unwrap().pos(), 3.5 * 3.5) {
    if hierarchy.is_backbone() {
        total += 1;
    }
}
println!("There are {} backbone atoms within 3.5 Å of the atom at index 42", total);
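The note about squared distances in the example above is easy to miss, so here is a minimal standalone sketch of the idea. It uses only the standard library and does not call pdbtbx or rstar. The helper names `distance_squared` and `within_cutoff` and the tuple representation of a position are assumptions made for illustration. Comparing squared values avoids a square root per candidate, which is presumably why a spatial query would accept `3.5 * 3.5` rather than `3.5`.

```rust
/// Squared Euclidean distance between two 3D points (in Å² if inputs are in Å).
/// Hypothetical helper, not part of pdbtbx or rstar.
fn distance_squared(a: (f64, f64, f64), b: (f64, f64, f64)) -> f64 {
    let (dx, dy, dz) = (a.0 - b.0, a.1 - b.1, a.2 - b.2);
    dx * dx + dy * dy + dz * dz
}

/// Returns true if `b` lies within `cutoff` of `a`.
/// Both sides of the comparison are squared, so no square root is taken.
fn within_cutoff(a: (f64, f64, f64), b: (f64, f64, f64), cutoff: f64) -> bool {
    distance_squared(a, b) <= cutoff * cutoff
}

fn main() {
    let origin = (0.0, 0.0, 0.0);
    let near = (1.0, 2.0, 2.0); // 3.0 Å from the origin
    let far = (0.0, 0.0, 4.0); // 4.0 Å from the origin

    println!("near within 3.5 Å: {}", within_cutoff(origin, near, 3.5));
    println!("far within 3.5 Å: {}", within_cutoff(origin, far, 3.5));
}
```

Passing the plain cutoff where a squared one is expected would silently shrink the search radius (3.5 would act like roughly 1.87 Å), so it is worth squaring at the call site as the example does.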
§References
- [Perkel, J. M.] Perkel, J. M. (2020). Why scientists are turning to Rust. Nature, 588(7836), 185–186. https://doi.org/10.1038/d41586-020-03382-2
Modules§
- Here you can find high level documentation for this crate.
Structs§
- A struct to represent a single Atom in a protein.
- A struct to hold references to an Atom and its containing Conformer.
- A struct to hold mutable references to an Atom and its containing Conformer.
- A struct to hold references to an Atom and its containing Conformer and Residue.
- A struct to hold references to an Atom and its containing Conformer, Residue, and Chain.
- A struct to hold references to an Atom and its containing Conformer, Residue, Chain, and Model.
- A struct to hold mutable references to an Atom and its containing Conformer, Residue, Chain, and Model.
- A struct to hold mutable references to an Atom and its containing Conformer, Residue, and Chain.
- A struct to hold mutable references to an Atom and its containing Conformer and Residue.
- Hold all atomic radii for a single element, so that in the code it is obvious which radius you use. All values are in Å (10⁻¹⁰ m or 0.1 nm).
- A Chain containing multiple Residues
- A Conformer containing multiple atoms, analogous to atom_group in cctbx
- The information about the database, see DBREF documentation wwPDB v3.30 https://www.wwpdb.org/documentation/file-format-content/format33/sect3.html#DBREF
- A DatabaseReference containing the cross-reference to a corresponding database sequence for a Chain.
- A Model containing multiple Chains.
- A transformation expressing non-crystallographic symmetry, used when transformations are required to generate the whole asymmetric subunit
- A PDB struct is generated by opening a PDB or mmCIF file. It contains all information present in this file, like its atoms, bonds, hierarchy, and metadata. The struct can be used to access, interact with, and edit this data.
- An error surfacing while handling a PDB
- A position in a file for use in parsing/lexing
- Options and flags which can be used to configure how a structure file is opened.
- A Residue containing multiple Conformers
- A difference between the sequence of the database and the pdb file
- The position of the sequence for a cross-reference of sequences.
- A Space group of a crystal
- A 3D affine transformation matrix
- A unit cell of a crystal, containing its dimensions and angles
Enums§
- Bond types between two atoms
- A struct to define the context of an error message
- All elements from the periodic system.
- This indicates the level of the error, to handle it differently based on the level of the raised error.
- Used to set which format to read the file in.
- All operators that can be used in a search
- A collection of multiple search Terms in the search for (an) atom(s) in a PDB. You can use bitwise and (&), or (|), and xor (^) to chain a search. In the same way you can use not (!) to negate a search term.
- The strictness to operate in, this defines at which ErrorLevel the program should stop execution upon finding an error.
- Any parameter to use in a Search for atom(s) in a PDB. For position related searches look into the rstar crate which can be combined with this crate using the rstar feature, see PDB::create_atom_rtree and PDB::create_hierarchy_rtree. The rstar crate makes spatial lookup and queries way faster and feasible to use in high performance environments.
Traits§
- A trait which defines all functions on a hierarchy which contains Atoms and Conformers.
- A trait which defines all functions on a mutable hierarchy which contains Atoms and Conformers.
- A trait which defines all functions on a hierarchy which contains Atoms, Conformers, and Residues.
- A trait which defines all functions on a hierarchy which contains Atoms, Conformers, Residues, and Chains.
- A trait which defines all functions on a hierarchy which contains Atoms, Conformers, Residues, Chains, and Models.
- A trait which defines all functions on a mutable hierarchy which contains Atoms, Conformers, Residues, Chains, and Models.
- A trait which defines all functions on a mutable hierarchy which contains Atoms, Conformers, Residues, and Chains.
- A trait which defines all functions on a mutable hierarchy which contains Atoms, Conformers, and Residues.
Functions§
- Checks if a char is allowed in a PDB file. The char has to be ASCII graphic or a space. Returns true if the char is valid.
- Converts a number into a base26 with only the alphabet as possible chars
- Open an atomic data file, either PDB or mmCIF/PDBx.
- open_gz (deprecated): Open a compressed atomic data file, either PDB or mmCIF/PDBx. The correct type will be determined based on the file extension (.pdb.gz or .cif.gz).
- open_mmcif (deprecated): Parse the given mmCIF file into a PDB struct. Returns a PDBError if a BreakingError is found. Otherwise it returns the PDB with all errors/warnings found while parsing it.
- open_mmcif_bufread (deprecated): Opens an mmCIF file from a BufRead. This allows opening mmCIF files directly from memory.
- open_mmcif_raw (deprecated): Parse the given mmCIF &str into a PDB struct. This allows opening mmCIF files directly from memory. Returns a PDBError if a BreakingError is found. Otherwise it returns the PDB with all errors/warnings found while parsing it.
- open_pdb (deprecated): Parse the given file into a PDB struct. Returns a PDBError if a BreakingError is found. Otherwise it returns the PDB with all errors/warnings found while parsing it.
- open_pdb_raw (deprecated): Parse the input stream into a PDB struct. This allows direct streaming from sources such as RCSB.org. Returns a PDBError if a BreakingError is found. Otherwise it returns the PDB with all errors/warnings found while parsing it.
- Creates a valid identifier from the given string slice. Does not change the case.
- Creates a valid identifier from the given string slice. Also turns the identifier to uppercase.
- Save the given PDB struct to the given file, validating it beforehand. If validation gives rise to problems, use the save_raw function. The correct file type (pdb or mmCIF/PDBx) will be determined based on the given file extension.
- Save the given PDB struct to the given file, compressing it to gz and validating it beforehand. If validation gives rise to problems, use the save_raw function. The correct file type (pdb or mmCIF/PDBx) will be determined based on the given file extension.
- Save the given PDB struct to the given file as mmCIF or PDBx.
- Save the given PDB struct to the given file as mmCIF or PDBx and compresses to .gz
- Save the given PDB struct to the given BufWriter. It does not validate or renumber the PDB, so if that is needed that needs to be done in preparation. It does change the output format based on the StrictnessLevel given.
- Save the given PDB struct to the given file, validating it beforehand.
- Save the given PDB struct to the given file, validating it beforehand, and use gzip compression.
- Save the given PDB struct to the given BufWriter. It does not validate or renumber the PDB, so if that is needed, that needs to be done in preparation. It does change the output format based on the StrictnessLevel given.
- Checks a string using check_char. Returns true if the text is valid.
- Checks a string using check_char. Returns true if the text is valid.
- Validate a given PDB file in terms of invariants that should be held up. It returns PDBErrors with the warning messages.
- Validates this model specifically for the PDB format. It returns PDBErrors with the warning messages. It extends the validation specified in the validate function with PDB specific validations.
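The character rule stated in the list above (a char is allowed if it is ASCII graphic or a space, and a text is valid if every char passes) can be sketched with the standard library alone. This is an illustration of the rule as described, not the crate's source: the names `is_allowed_char` and `is_allowed_text` are hypothetical, and the real functions may differ in details.

```rust
/// Returns true if `c` is an ASCII graphic character or a space.
/// Hypothetical stand-in for the rule described for the crate's char check.
fn is_allowed_char(c: char) -> bool {
    c.is_ascii_graphic() || c == ' '
}

/// Returns true if every character of `text` passes `is_allowed_char`.
/// An empty string is trivially valid under this sketch.
fn is_allowed_text(text: &str) -> bool {
    text.chars().all(is_allowed_char)
}

fn main() {
    // Plain ASCII letters, digits and spaces pass the check.
    println!("{}", is_allowed_text("ATOM      1  N   MET A   1"));
    // 'Å' is not ASCII, so this text is rejected.
    println!("{}", is_allowed_text("3.5 Å"));
    // Control characters such as a tab are not ASCII graphic either.
    println!("{}", is_allowed_text("A\tB"));
}
```

Under this rule, non-ASCII symbols such as Å and control characters such as tabs are rejected, which matches the fixed-column, ASCII-only nature of the PDB format.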