Crate pdbtbx

pdbtbx (PDB Toolbox)

A library to work with crystallographic Protein DataBank files. It can parse the main part of the PDB and mmCIF format (it is actively in development so more will follow). The resulting structure can be used to edit and interrogate the 3D structure of the protein. The changed structures can be saved in a PDB or mmCIF file for use in other software.

Goals

This library is designed to be a dependable, safe, stable and fast way of handling PDB files in idiomatic Rust. The goal is to be very community driven, and to make it a project that is as useful as possible to everyone while staying true to its core principles.

Why

As Rust is a recent language, there is not a lot of support for scientific work in Rust compared to languages that have been in use much longer (like the ubiquitous Python). I think that using Rust would have huge benefits over other languages in bigger scientific projects. It is not just me: more scientists are turning to Rust [Perkel, J. M.]. I want to make it easier for scientists to start using Rust by writing this library.

How to use it

The following example opens a PDB file (1ubq.pdb), removes all H atoms, calculates the average B factor (or temperature factor), and prints it. It also saves the resulting PDB to a file.

use pdbtbx;
let (mut pdb, _errors) = pdbtbx::open(
        "example-pdbs/1ubq.pdb",
        pdbtbx::StrictnessLevel::Medium
    ).unwrap();

pdb.remove_atoms_by(|atom| atom.element() == "H"); // Remove all H atoms

let mut avg_b_factor = 0.0;
for atom in pdb.atoms() { // Iterate over all atoms in the structure
    avg_b_factor += atom.b_factor();
}
avg_b_factor /= pdb.atom_count() as f64;

println!("The average B factor of the protein is: {}", avg_b_factor);
pdbtbx::save(pdb, "dump/1ubq_no_hydrogens.pdb", pdbtbx::StrictnessLevel::Loose);
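The averaging loop above divides by the atom count, which would produce NaN if no atoms were left after filtering. The following is a minimal sketch of a guarded mean, written against plain f64 values so that it does not depend on the crate. `mean_b_factor` is a hypothetical helper and not part of pdbtbx.

```rust
/// Hypothetical helper (not part of pdbtbx): mean of a sequence of B factors.
/// Returns `None` for an empty input instead of dividing by zero.
fn mean_b_factor<I>(b_factors: I) -> Option<f64>
where
    I: IntoIterator<Item = f64>,
{
    // Accumulate the sum and the number of values in a single pass.
    let (sum, count) = b_factors
        .into_iter()
        .fold((0.0_f64, 0_usize), |(sum, count), b| (sum + b, count + 1));

    if count == 0 {
        None
    } else {
        Some(sum / count as f64)
    }
}
```

With the methods used in the example above, the input could be something like `pdb.atoms().map(|atom| atom.b_factor())`.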

PDB Hierarchy

As explained in depth in the CCTBX documentation, it can be quite hard to define a hierarchy that works properly for all PDB files. This library follows the hierarchy presented by CCTBX [Grosse-Kunstleve, R. W. et al], but renames the residue_group and atom_group constructs. This gives the following hierarchy, with the main identifying characteristics annotated per level.

  • PDB
    • Model
      Serial number
      • Chain
        Id
        • Residue (analogous to residue_group in CCTBX)
          Serial number
          Insertion code
          • Conformer (analogous to atom_group in CCTBX)
            Name
            Alternative location
            • Atom
              Serial number
              Name

Iterating over the PDB Hierarchy

use pdbtbx::*;
let (mut pdb, _errors) = pdbtbx::open(
    "example-pdbs/1ubq.pdb",
    pdbtbx::StrictnessLevel::Medium
).unwrap();

// Iterating over all levels
for model in pdb.models() {
    for chain in model.chains() {
        for residue in chain.residues() {
            for conformer in residue.conformers() {
                for atom in conformer.atoms() {
                    // Do the calculations
                }
            }
        }
    }
}
// Or only over a couple of levels (just like in the example above)
for residue in pdb.residues() {
    for atom in residue.atoms() {
        // Do the calculations
    }
}
// Or with access to the information with a single line
for hierarchy in pdb.atoms_with_hierarchy() {
    println!("Atom {} in Conformer {} in Residue {} in Chain {} in Model {}",
        hierarchy.atom().serial_number(),
        hierarchy.conformer().name(),
        hierarchy.residue().serial_number(),
        hierarchy.chain().id(),
        hierarchy.model().serial_number()
    );
}
// Or with mutable access to the members of the hierarchy
for mut hierarchy in pdb.atoms_with_hierarchy_mut() {
    let new_x = hierarchy.atom().x() * 1.5;
    hierarchy.atom_mut().set_x(new_x);
}

Parallelization

Rayon is used to create parallel iterators for all logical candidates. Use the parallel version of an iterator by prefixing its name with par_. Among others, the looping iterators such as atoms(), residues() and atoms_with_hierarchy() are implemented as parallel iterators. The Rayon implementations are gated behind the rayon feature, which is enabled by default.

Serialization

Enable the serde feature for Serde support.

Spatial lookup of atoms

Enable the rstar feature for rstar support. This lets you generate R*-trees, which make spatial queries on atoms very fast; finding the atoms close to a given position is one example. See the documentation of the rstar crate for more information on how to make use of all of its features.

use pdbtbx::*;
let (mut pdb, _errors) = pdbtbx::open("example-pdbs/1ubq.pdb", pdbtbx::StrictnessLevel::Medium).unwrap();
// You can loop over all atoms within 3.5 Å of a specific atom
// Note: The `locate_within_distance` method takes a squared distance
let tree = pdb.create_atom_rtree();
for atom in tree.locate_within_distance(pdb.atom(42).unwrap().pos(), 3.5 * 3.5) {
    println!("{}", atom);
}

// You can even get information about the hierarchy of these atoms 
// (the chain, residue and conformer that contain this atom)
let tree = pdb.create_hierarchy_rtree();
let mut total = 0;
for hierarchy in tree.locate_within_distance(pdb.atom(42).unwrap().pos(), 3.5 * 3.5) {
    if hierarchy.is_backbone() {
        total += 1;
    }
}
println!("There are {} backbone atoms within 3.5 Å of the atom at index 42", total);
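The note about the squared distance in the example above can be illustrated without a tree at all. The sketch below is a brute-force stand-in for such a query, written with the standard library only: it compares squared distances against a squared radius, which avoids taking a square root per candidate. Both functions are hypothetical helpers; they are not part of pdbtbx or rstar, and a linear scan like this has none of the speed of an R*-tree.

```rust
/// Squared Euclidean distance between two points; no square root needed.
fn distance_squared(a: (f64, f64, f64), b: (f64, f64, f64)) -> f64 {
    let (dx, dy, dz) = (a.0 - b.0, a.1 - b.1, a.2 - b.2);
    dx * dx + dy * dy + dz * dz
}

/// Hypothetical brute-force query: indices of all points within `radius`
/// of `center`. This scans every point, so it is O(n) per query.
fn within_radius(
    points: &[(f64, f64, f64)],
    center: (f64, f64, f64),
    radius: f64,
) -> Vec<usize> {
    // Square the radius once, then compare squared values only.
    let radius_squared = radius * radius;
    points
        .iter()
        .enumerate()
        .filter_map(|(index, point)| {
            if distance_squared(*point, center) <= radius_squared {
                Some(index)
            } else {
                None
            }
        })
        .collect()
}
```

Comparing squared values gives the same answer as comparing distances because both sides are non-negative, which is why passing 3.5 * 3.5 selects atoms within 3.5 Å.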

References

  1. [Grosse-Kunstleve, R. W. et al] Grosse-Kunstleve, R. W., Sauter, N. K., Moriarty, N. W., & Adams, P. D. (2002). The Computational Crystallography Toolbox: crystallographic algorithms in a reusable software framework. Journal of Applied Crystallography, 35(1), 126–136. https://doi.org/10.1107/s0021889801017824
  2. [Perkel, J. M.] Perkel, J. M. (2020). Why scientists are turning to Rust. Nature, 588(7836), 185–186. https://doi.org/10.1038/d41586-020-03382-2

Structs

A struct to represent a single Atom in a protein

A struct to hold references to an Atom and its containing Conformer

A struct to hold mutable references to an Atom and its containing Conformer

A struct to hold references to an Atom and its containing Conformer and Residue

A struct to hold references to an Atom and its containing Conformer, Residue, and Chain

A struct to hold references to an Atom and its containing Conformer, Residue, Chain, and Model

A struct to hold mutable references to an Atom and its containing Conformer, Residue, Chain, and Model

A struct to hold mutable references to an Atom and its containing Conformer, Residue, and Chain

A struct to hold mutable references to an Atom and its containing Conformer and Residue

A Chain containing multiple Residues

A Conformer of a Residue containing multiple Atoms, analogous to ‘atom_group’ in cctbx

A DatabaseReference containing the cross-reference to a corresponding database sequence for a Chain.

A Model containing multiple Chains

A transformation expressing non-crystallographic symmetry, used when transformations are required to generate the whole asymmetric subunit

A PDB struct is generated by opening a PDB or mmCIF file. It contains all information present in this file, like its atoms, bonds, hierarchy, and metadata. The struct can be used to access, interact with, and edit this data.

An error surfacing while handling a PDB

A position in a file for use in parsing/lexing

A Residue containing multiple Conformers

A difference between the sequence of the database and the pdb file

The position of the sequence for a cross-reference of sequences.

A Space group of a crystal

A 3D affine transformation matrix

A unit cell of a crystal, containing its dimensions and angles

Enums

Bond types between two atoms

A struct to define the context of an error message

This indicates the level of the error, to handle it differently based on the level of the raised error.

The strictness to operate in. This defines at which ErrorLevel the program should stop execution upon finding an error.
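The relationship between an error level and a strictness level can be pictured as a threshold check. The sketch below uses hypothetical stand-in enums (`Severity` and `Strictness`) rather than the crate's own ErrorLevel and StrictnessLevel, and the mapping from strictness to threshold is an assumption made for illustration only.

```rust
/// Hypothetical stand-in for an error level, ordered from least to most severe.
/// The derived `Ord` follows the declaration order of the variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Severity {
    Note,
    Warning,
    Breaking,
}

/// Hypothetical stand-in for a strictness level.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Strictness {
    Loose,
    Medium,
    Strict,
}

impl Strictness {
    /// Lowest severity that stops execution at this strictness.
    /// This mapping is assumed for the sketch, not taken from pdbtbx.
    fn threshold(self) -> Severity {
        match self {
            Strictness::Loose => Severity::Breaking,
            Strictness::Medium => Severity::Warning,
            Strictness::Strict => Severity::Note,
        }
    }

    /// Whether an error of the given severity should stop execution.
    fn stops_on(self, severity: Severity) -> bool {
        severity >= self.threshold()
    }
}
```

Under these assumptions, a stricter setting lowers the threshold, so more kinds of problems stop execution.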

Traits

A trait which defines all functions on a hierarchy which contains Atoms and Conformers

A trait which defines all functions on a mutable hierarchy which contains Atoms and Conformers

A trait which defines all functions on a hierarchy which contains Atoms, Conformers, and Residues

A trait which defines all functions on a hierarchy which contains Atoms, Conformers, Residues, and Chains

A trait which defines all functions on a hierarchy which contains Atoms, Conformers, Residues, Chains, and Models

A trait which defines all functions on a mutable hierarchy which contains Atoms, Conformers, Residues, Chains, and Models

A trait which defines all functions on a mutable hierarchy which contains Atoms, Conformers, Residues, and Chains

A trait which defines all functions on a mutable hierarchy which contains Atoms, Conformers, and Residues

Functions

Open an atomic data file, either PDB or mmCIF/PDBx. The correct type will be determined based on the extension of the file. Returns a PDBError when it finds a BreakingError. Otherwise it returns the PDB with all errors/warnings found while parsing it.

Parse the given mmCIF file into a PDB struct. Returns a PDBError when it finds a BreakingError. Otherwise it returns the PDB with all errors/warnings found while parsing it.

Parse the given file into a PDB struct. Returns a PDBError when it finds a BreakingError. Otherwise it returns the PDB with all errors/warnings found while parsing it.

Parse the input stream into a PDB struct. This allows for direct streaming from sources such as RCSB.org. Returns a PDBError when it finds a BreakingError. Otherwise it returns the PDB with all errors/warnings found while parsing it.

Save the given PDB struct to the given file. It validates the PDB. It fails if the validation fails with the given level. If validation gives rise to problems use the save_raw function. The correct file type (pdb or mmCIF/PDBx) will be determined based on the extension of the file.

Save the given PDB struct to the given file as mmCIF or PDBx. It validates the PDB. It fails if the validation fails with the given level, or if the file could not be opened. If validation gives rise to problems use the save_raw function.

Save the given PDB struct to the given BufWriter. It does not validate or renumber the PDB, so do that beforehand if it is needed. It does change the output format based on the StrictnessLevel given.

Save the given PDB struct to the given file. It validates the PDB. It fails if the validation fails with the given level. If validation gives rise to problems use the save_raw function.

Save the given PDB struct to the given BufWriter. It does not validate or renumber the PDB, so do that beforehand if it is needed. It does change the output format based on the StrictnessLevel given.

Validate a given PDB file in terms of invariants that should be upheld. It returns PDBErrors with the warning messages.

Validate this model specifically for the PDB format