Crate pdbrust

§PDBRust

A high-performance Rust library for parsing and analyzing Protein Data Bank (PDB) and mmCIF structure files.

PDBRust provides comprehensive tools for working with molecular structure data, from simple file parsing to advanced structural analysis. It’s designed for bioinformatics pipelines, structural biology research, and machine learning applications that work with protein structures.

For detailed documentation, examples, and best practices, see the guide module.

§Quick Start

use pdbrust::parse_pdb_file;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Parse a PDB file
    let structure = parse_pdb_file("protein.pdb")?;

    println!("Atoms: {}", structure.atoms.len());
    println!("Chains: {:?}", structure.get_chain_ids());

    Ok(())
}

§Feature Flags

PDBRust uses feature flags to keep the core library lightweight while offering extensive optional functionality:

| Feature     | Description                                                | Default |
|-------------|------------------------------------------------------------|---------|
| filter      | Structure filtering, extraction, and cleaning              | No      |
| descriptors | Structural descriptors (Rg, composition, B-factor, pLDDT)  | No      |
| quality     | Quality assessment and reports                             | No      |
| summary     | Unified summaries (requires descriptors + quality)         | No      |
| rcsb        | RCSB PDB search and download                               | No      |
| rcsb-async  | Async/concurrent bulk downloads with rate limiting         | No      |
| parallel    | Parallel processing with Rayon                             | No      |
| geometry    | RMSD calculation and structure alignment (Kabsch)          | No      |
| dssp        | DSSP-like secondary structure assignment                   | No      |
| dockq       | DockQ v2 interface quality for protein complexes           | No      |
| gzip        | Parse gzip-compressed files (.ent.gz, .pdb.gz)             | No      |
| analysis    | All analysis features combined                             | No      |
| full        | Everything                                                 | No      |

Enable features in your Cargo.toml:

[dependencies]
pdbrust = { version = "0.7", features = ["filter", "descriptors"] }

§Core Features

§Parsing

  • Parse both PDB and mmCIF files with comprehensive error handling
  • Automatic format detection based on file content
  • Support for multiple models (NMR ensembles)
  • Handle alternate conformations (altlocs)
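Automatic format detection can work from file content alone. The sketch below is a hypothetical helper (`detect_format` and `Format` are invented names, not PDBRust's implementation). It assumes mmCIF files open with a `data_` block header or `_atom_site.` tags, and that PDB files start lines with fixed record names such as `HEADER` or `ATOM` in columns 1-6.

```rust
/// Hypothetical format tag for this sketch (not a PDBRust type).
#[derive(Debug, PartialEq)]
enum Format {
    Pdb,
    Mmcif,
    Unknown,
}

/// Assumed set of PDB record names that may open a file.
const PDB_RECORDS: [&str; 6] = ["HEADER", "TITLE", "REMARK", "ATOM", "HETATM", "MODEL"];

/// Guess the format from content, ignoring the file extension.
/// Assumption: the first non-blank, non-comment line is enough to decide.
fn detect_format(content: &str) -> Format {
    for line in content.lines() {
        let trimmed = line.trim_start();
        // Skip blank lines and mmCIF comment lines.
        if trimmed.is_empty() || trimmed.starts_with('#') {
            continue;
        }
        // mmCIF data blocks begin with "data_"; atom tags begin with "_atom_site.".
        if trimmed.starts_with("data_") || trimmed.starts_with("_atom_site.") {
            return Format::Mmcif;
        }
        // PDB record names must start in column 1, so test the untrimmed line.
        if PDB_RECORDS.iter().any(|r| line.starts_with(*r)) {
            return Format::Pdb;
        }
        return Format::Unknown;
    }
    Format::Unknown
}
```

A real detector would likely look at more than one line; this version only illustrates the idea.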

§Structure Data

  • ATOM/HETATM records with full coordinate data
  • SEQRES sequence information
  • CONECT connectivity records
  • SSBOND disulfide bond definitions
  • Header, title, and remark metadata

§Optional Features

§Filtering (filter feature)

// Remove ligands, keep only chain A, then keep only backbone atoms
let cleaned = structure
    .remove_ligands()
    .keep_only_chain("A")
    .keep_only_backbone();

// Extract CA coordinates
let ca_coords = structure.get_ca_coords(None);
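Conceptually, chain filtering and CA extraction are single passes over the atom list. This minimal sketch uses a hypothetical `SimpleAtom` struct rather than PDBRust's `Atom`. In `ca_coords`, `None` means "all chains", which is this sketch's own convention and is not a documented meaning of the `None` passed to `get_ca_coords` above.

```rust
/// Hypothetical, simplified atom record for this sketch (not PDBRust's `Atom`).
#[derive(Debug, Clone)]
struct SimpleAtom {
    name: String, // atom name, e.g. "CA"
    chain_id: String,
    is_hetatm: bool,
    coord: (f64, f64, f64),
}

/// Keep only atoms that belong to `chain`.
fn keep_chain(atoms: &[SimpleAtom], chain: &str) -> Vec<SimpleAtom> {
    atoms.iter().filter(|a| a.chain_id == chain).cloned().collect()
}

/// Collect alpha-carbon coordinates, optionally restricted to one chain
/// (in this sketch, `None` means all chains).
fn ca_coords(atoms: &[SimpleAtom], chain: Option<&str>) -> Vec<(f64, f64, f64)> {
    atoms
        .iter()
        // Skip HETATM records so calcium ions (also named "CA") are not
        // mistaken for alpha carbons.
        .filter(|a| a.name == "CA" && !a.is_hetatm)
        .filter(|a| chain.map_or(true, |c| a.chain_id == c))
        .map(|a| a.coord)
        .collect()
}
```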

§Structural Descriptors (descriptors feature)

// Compute structural properties
let rg = structure.radius_of_gyration();
let composition = structure.aa_composition();
let hydrophobic = structure.hydrophobic_ratio();
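For reference, the radius of gyration is the root-mean-square distance of atoms from their centroid. The sketch below implements the unweighted form over plain coordinates. The text above does not say whether `radius_of_gyration()` weights by atomic mass or restricts the calculation to particular atoms, so this is only an illustration of the formula. The function name is hypothetical.

```rust
/// Unweighted radius of gyration: sqrt(mean(|r_i - r_centroid|^2)).
/// Returns None for an empty coordinate list.
fn radius_of_gyration(coords: &[[f64; 3]]) -> Option<f64> {
    if coords.is_empty() {
        return None;
    }
    let n = coords.len() as f64;
    // Centroid (geometric center) of all coordinates.
    let mut c = [0.0; 3];
    for p in coords {
        for k in 0..3 {
            c[k] += p[k] / n;
        }
    }
    // Mean squared distance from the centroid.
    let msd: f64 = coords
        .iter()
        .map(|p| (0..3).map(|k| (p[k] - c[k]).powi(2)).sum::<f64>())
        .sum::<f64>()
        / n;
    Some(msd.sqrt())
}
```

A mass-weighted variant would multiply each squared distance by the atom's mass and divide by the total mass.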

§Quality Assessment (quality feature)

// Get comprehensive quality report
let report = structure.quality_report();

if report.is_analysis_ready() {
    println!("Structure is ready for analysis");
}

§RCSB Integration (rcsb feature)

use pdbrust::rcsb::{download_structure, rcsb_search, SearchQuery, FileFormat};

// Download from RCSB PDB
let structure = download_structure("1UBQ", FileFormat::Pdb)?;

// Search RCSB
let query = SearchQuery::new()
    .with_text("kinase")
    .with_resolution_max(2.0);
let results = rcsb_search(&query, 10)?;
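A single-entry download from RCSB comes down to an HTTPS GET against the RCSB file server, which serves entries at `https://files.rcsb.org/download/{ID}.pdb` and `.cif`. The sketch below only builds those URLs and does not claim to reflect how `download_structure` is implemented. The helper name, the `DownloadFormat` enum, and the validation rule (classic four-character alphanumeric IDs only) are assumptions.

```rust
/// Hypothetical download format for this sketch (not PDBRust's `FileFormat`).
#[derive(Clone, Copy)]
enum DownloadFormat {
    Pdb,
    Cif,
}

/// Build an RCSB file-server URL for a classic four-character PDB ID.
/// Returns None if the ID is not four ASCII alphanumeric characters.
fn rcsb_download_url(pdb_id: &str, format: DownloadFormat) -> Option<String> {
    if pdb_id.len() != 4 || !pdb_id.chars().all(|c| c.is_ascii_alphanumeric()) {
        return None;
    }
    let ext = match format {
        DownloadFormat::Pdb => "pdb",
        DownloadFormat::Cif => "cif",
    };
    // PDB IDs are case-insensitive; upper-case them for a canonical URL.
    Some(format!(
        "https://files.rcsb.org/download/{}.{}",
        pdb_id.to_ascii_uppercase(),
        ext
    ))
}
```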

§Async RCSB Downloads (rcsb-async feature)

use pdbrust::rcsb::{download_multiple_async, AsyncDownloadOptions, FileFormat};

// Download multiple structures concurrently
let pdb_ids = vec!["1UBQ", "8HM2", "4INS"];
let results = download_multiple_async(&pdb_ids, FileFormat::Pdb, None).await;

// With rate limiting options
let options = AsyncDownloadOptions::default()
    .with_max_concurrent(10)
    .with_rate_limit_ms(50);
let results = download_multiple_async(&pdb_ids, FileFormat::Cif, Some(options)).await;
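The two options above appear to bound how many requests run at once and how far apart they start. The start-spacing part can be sketched with the standard library alone. `MinIntervalLimiter` is hypothetical, and the sketch assumes `with_rate_limit_ms` means a minimum delay between request starts. It uses a blocking sleep rather than an async runtime, so it does not show how `AsyncDownloadOptions` is actually implemented.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical limiter: ensures at least `interval` elapses between
/// successive `acquire` calls returning (blocking, single-threaded sketch).
struct MinIntervalLimiter {
    interval: Duration,
    last: Option<Instant>,
}

impl MinIntervalLimiter {
    fn new(interval_ms: u64) -> Self {
        Self {
            interval: Duration::from_millis(interval_ms),
            last: None,
        }
    }

    /// Block until the next request is allowed to start.
    fn acquire(&mut self) {
        if let Some(last) = self.last {
            let elapsed = last.elapsed();
            if elapsed < self.interval {
                thread::sleep(self.interval - elapsed);
            }
        }
        self.last = Some(Instant::now());
    }
}
```

Bounding concurrency (the `max_concurrent` side) would additionally need a semaphore or a fixed-size worker pool.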

§Geometry (geometry feature)

use pdbrust::geometry::AtomSelection;

// Calculate RMSD between structures
let rmsd = structure1.rmsd_to(&structure2)?;  // CA atoms by default

// Align structures using Kabsch algorithm
let (aligned, result) = mobile.align_to(&target)?;
println!("RMSD: {:.4} Å ({} atoms)", result.rmsd, result.num_atoms);

// Per-residue RMSD for flexibility analysis
let per_res = mobile.per_residue_rmsd_to(&target)?;
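RMSD is the root-mean-square of the distances between paired atoms. The sketch below computes it for two already-paired coordinate lists. It also shows the centering step that begins a Kabsch superposition. The rotation step, which takes an SVD of the covariance matrix, is omitted, so this is not a full alignment. Function names are hypothetical.

```rust
/// RMSD between two equally sized, already-paired coordinate sets:
/// sqrt((1/N) * sum_i |a_i - b_i|^2). No superposition is performed.
fn rmsd(a: &[[f64; 3]], b: &[[f64; 3]]) -> Option<f64> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let sum_sq: f64 = a
        .iter()
        .zip(b)
        .map(|(p, q)| (0..3).map(|k| (p[k] - q[k]).powi(2)).sum::<f64>())
        .sum();
    Some((sum_sq / a.len() as f64).sqrt())
}

/// Translate coordinates so their centroid sits at the origin
/// (the first step of a Kabsch superposition; rotation not shown).
fn center(coords: &[[f64; 3]]) -> Vec<[f64; 3]> {
    if coords.is_empty() {
        return Vec::new();
    }
    let n = coords.len() as f64;
    let mut c = [0.0; 3];
    for p in coords {
        for k in 0..3 {
            c[k] += p[k] / n;
        }
    }
    coords
        .iter()
        .map(|p| [p[0] - c[0], p[1] - c[1], p[2] - c[2]])
        .collect()
}
```

For two copies of a structure that differ only by a translation, the raw RMSD equals the shift distance, and the RMSD after centering drops to zero.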

§Secondary Structure (dssp feature)

// DSSP-like secondary structure assignment
let ss = structure.assign_secondary_structure();
println!("Helix: {:.1}%", ss.helix_fraction * 100.0);
println!("Sheet: {:.1}%", ss.sheet_fraction * 100.0);

// Compact string representation (e.g., "HHHHEEEECCCC")
let ss_string = structure.secondary_structure_string();
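Given a per-residue string like the one above, helix and sheet fractions come down to counting characters. The sketch assumes DSSP's eight-state codes, counting H, G, and I as helix and E and B as strand. The text above does not state which codes `helix_fraction` and `sheet_fraction` include, so this grouping is an assumption, and `ss_fractions` is a hypothetical name.

```rust
/// Fractions of helix and strand residues in a DSSP-style string.
/// Assumed grouping: helix = H (alpha), G (3-10), I (pi); strand = E, B.
fn ss_fractions(ss: &str) -> (f64, f64) {
    let total = ss.chars().count();
    if total == 0 {
        return (0.0, 0.0);
    }
    let helix = ss.chars().filter(|c| matches!(c, 'H' | 'G' | 'I')).count();
    let strand = ss.chars().filter(|c| matches!(c, 'E' | 'B')).count();
    (helix as f64 / total as f64, strand as f64 / total as f64)
}
```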

§Gzip Support (gzip feature)

use pdbrust::{parse_gzip_structure_file, parse_gzip_pdb_file};

// Parse gzip-compressed files directly
let structure = parse_gzip_structure_file("pdb1ubq.ent.gz")?;
let structure = parse_gzip_pdb_file("protein.pdb.gz")?;
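Gzip streams are recognizable by their first two bytes, 0x1F 0x8B (RFC 1952). A caller could use a content check like the hypothetical one below to route a file to the gzip parsers instead of relying on the `.gz` extension. Decompression itself needs a crate such as `flate2`, which this std-only sketch does not use.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// gzip magic number (RFC 1952: ID1 = 0x1f, ID2 = 0x8b).
const GZIP_MAGIC: [u8; 2] = [0x1f, 0x8b];

/// True if the bytes start with the gzip magic number.
fn is_gzip_bytes(bytes: &[u8]) -> bool {
    bytes.starts_with(&GZIP_MAGIC)
}

/// Read the first two bytes of a file and check for the gzip magic number.
fn is_gzip_file(path: &Path) -> io::Result<bool> {
    let mut header = [0u8; 2];
    let mut file = File::open(path)?;
    match file.read_exact(&mut header) {
        Ok(()) => Ok(is_gzip_bytes(&header)),
        // A file shorter than two bytes cannot be gzip.
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => Ok(false),
        Err(e) => Err(e),
    }
}
```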

§Format Support

§PDB Format

Traditional fixed-width text format with support for:

  • ATOM/HETATM records
  • HEADER, TITLE, REMARK records
  • SEQRES, CONECT, SSBOND records
  • MODEL/ENDMDL for multi-model structures
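Because the PDB format is fixed-width, an ATOM/HETATM line is parsed by column ranges rather than by splitting on whitespace. The sketch below uses the column positions from the wwPDB format specification: atom name 13-16, residue name 18-20, chain 22, residue number 23-26, and x/y/z 31-38, 39-46, and 47-54. `ParsedAtom` and `parse_atom_line` are hypothetical names, and only a subset of fields is extracted.

```rust
/// Hypothetical subset of an ATOM/HETATM record (not PDBRust's `Atom`).
#[derive(Debug, PartialEq)]
struct ParsedAtom {
    is_hetatm: bool,
    name: String,
    res_name: String,
    chain_id: char,
    res_seq: i32,
    x: f64,
    y: f64,
    z: f64,
}

/// Parse one ATOM/HETATM line; returns None for other records or short lines.
fn parse_atom_line(line: &str) -> Option<ParsedAtom> {
    let is_hetatm = match line.get(0..6)? {
        "ATOM  " => false,
        "HETATM" => true,
        _ => return None,
    };
    // The spec uses 1-based inclusive columns; convert to a 0-based range
    // and trim the padding. Returns None if the line is too short.
    let col = move |start: usize, end: usize| line.get(start - 1..end).map(str::trim);
    Some(ParsedAtom {
        is_hetatm,
        name: col(13, 16)?.to_string(),
        res_name: col(18, 20)?.to_string(),
        chain_id: col(22, 22)?.chars().next().unwrap_or(' '),
        res_seq: col(23, 26)?.parse().ok()?,
        x: col(31, 38)?.parse().ok()?,
        y: col(39, 46)?.parse().ok()?,
        z: col(47, 54)?.parse().ok()?,
    })
}
```

Splitting on whitespace would break when adjacent fields touch, for example when large negative coordinates fill their columns. Column slicing avoids that.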

§mmCIF Format

Modern dictionary-based format with support for:

  • _atom_site category (converted to ATOM records)
  • _entity_poly_seq category (converted to SEQRES records)
  • _struct_disulfid category (converted to SSBOND records)
  • Header and metadata information
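In mmCIF, atoms are stored in a `loop_` table: a list of `_atom_site.*` column tags followed by whitespace-separated rows. The hypothetical sketch below maps each row to its column names. It assumes no quoted values containing spaces and no multi-line text fields, both of which real mmCIF allows, so it is an illustration rather than a conforming parser.

```rust
use std::collections::HashMap;

/// Hypothetical helper: read the `_atom_site` loop into one map per row,
/// keyed by column name (e.g. "Cartn_x"). Simplification: values are split
/// on whitespace, so quoted strings and multi-line fields are NOT handled.
fn parse_atom_site_loop(text: &str) -> Vec<HashMap<String, String>> {
    let mut columns: Vec<String> = Vec::new();
    let mut rows: Vec<HashMap<String, String>> = Vec::new();
    let mut in_loop = false;
    for line in text.lines().map(str::trim) {
        if line == "loop_" {
            if !columns.is_empty() {
                break; // a new loop after _atom_site: done
            }
            in_loop = true;
            continue;
        }
        if !in_loop || line.is_empty() {
            continue;
        }
        if let Some(tag) = line.strip_prefix("_atom_site.") {
            columns.push(tag.to_string());
        } else if line.starts_with('_') || line.starts_with('#') {
            if !columns.is_empty() {
                break; // end of the _atom_site loop
            }
            in_loop = false; // a loop for another category; skip it
        } else if !columns.is_empty() {
            let values: Vec<&str> = line.split_whitespace().collect();
            // Ignore rows whose field count does not match the header.
            if values.len() == columns.len() {
                let row = columns
                    .iter()
                    .cloned()
                    .zip(values.into_iter().map(String::from))
                    .collect();
                rows.push(row);
            }
        }
    }
    rows
}
```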

§Examples

§Auto-detect Format

use pdbrust::parse_structure_file;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Works with both .pdb and .cif files
    let structure = parse_structure_file("example.cif")?;

    // Get all chain IDs
    let chains = structure.get_chain_ids();

    // Get sequence for a specific chain
    if let Some(chain_id) = chains.first() {
        let sequence = structure.get_sequence(chain_id);
        println!("Sequence for chain {}: {:?}", chain_id, sequence);
    }

    Ok(())
}

§Format-Specific Parsing

use pdbrust::{parse_pdb_file, parse_mmcif_file};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Parse PDB format explicitly
    let pdb_structure = parse_pdb_file("structure.pdb")?;

    // Parse mmCIF format explicitly
    let mmcif_structure = parse_mmcif_file("structure.cif")?;

    Ok(())
}

§Performance

PDBRust is designed for high performance. Benchmarks against a Python PDB library show speedups of 40x to 267x for common operations:

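| Operation          | Python | Rust    | Speedup |
|--------------------|--------|---------|---------|
| Parse PDB file     | 15 ms  | 0.36 ms | 42x     |
| Remove ligands     | 8 ms   | 0.03 ms | 267x    |
| Radius of gyration | 2 ms   | 0.05 ms | 40x     |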

§Error Handling

All parsing functions return Result<T, PdbError> with detailed error context:

use pdbrust::{parse_pdb_file, PdbError};

match parse_pdb_file("structure.pdb") {
    Ok(structure) => println!("Loaded {} atoms", structure.atoms.len()),
    Err(PdbError::IoError(e)) => eprintln!("File error: {}", e),
    Err(e) => eprintln!("Parse error: {}", e),
}
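A `Result<T, PdbError>` API like this one usually relies on an error enum with a `From<std::io::Error>` impl, so that `?` converts I/O failures automatically. The hypothetical `SketchError` below illustrates that pattern. Its variants and the helper functions `load` and `parse_coord` are invented for the example and are not PDBRust's `PdbError` definition.

```rust
use std::fmt;
use std::io;

/// Hypothetical error type illustrating the pattern (not PDBRust's `PdbError`).
#[derive(Debug)]
enum SketchError {
    Io(io::Error),
    InvalidRecord { line: usize, message: String },
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::Io(e) => write!(f, "I/O error: {}", e),
            SketchError::InvalidRecord { line, message } => {
                write!(f, "invalid record on line {}: {}", line, message)
            }
        }
    }
}

impl std::error::Error for SketchError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            SketchError::Io(e) => Some(e),
            SketchError::InvalidRecord { .. } => None,
        }
    }
}

// Lets `?` turn an io::Error into SketchError::Io automatically.
impl From<io::Error> for SketchError {
    fn from(e: io::Error) -> Self {
        SketchError::Io(e)
    }
}

/// Read a whole file; I/O failures become SketchError::Io via `?`.
fn load(path: &str) -> Result<String, SketchError> {
    let text = std::fs::read_to_string(path)?;
    Ok(text)
}

/// Parse one coordinate field, attaching the line number on failure.
fn parse_coord(field: &str, line: usize) -> Result<f64, SketchError> {
    field
        .trim()
        .parse::<f64>()
        .map_err(|e| SketchError::InvalidRecord { line, message: e.to_string() })
}
```

Callers can then match on specific variants, as in the `PdbError::IoError` arm above, and use a catch-all arm for everything else.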

§Re-exports

pub use filter::selection::SelectionError;
pub use descriptors::ResidueBFactor;
pub use descriptors::StructureDescriptors;
pub use descriptors::ConfidenceCategory;
pub use descriptors::ResiduePlddt;
pub use descriptors::BindingSite;
pub use descriptors::ContactResidue;
pub use descriptors::HydrophobicContact;
pub use descriptors::LigandInteractionProfile;
pub use descriptors::ProteinLigandHBond;
pub use descriptors::SaltBridge;
pub use descriptors::RamachandranRegion;
pub use descriptors::RamachandranStats;
pub use descriptors::ResidueDihedrals;
pub use descriptors::ResidueRef;
pub use descriptors::HBondStats;
pub use descriptors::HBondType;
pub use descriptors::MainchainHBond;
pub use descriptors::ResidueHBonds;
pub use ligand_quality::AtomClash;
pub use ligand_quality::LigandPoseReport;
pub use core::PdbStructure;
pub use error::PdbError;
pub use parser::parse_mmcif_file;
pub use parser::parse_mmcif_string;
pub use parser::parse_pdb_file;
pub use parser::parse_pdb_reader;
pub use parser::parse_pdb_string;
pub use parser::parse_structure_file;
pub use records::Atom;
pub use records::Conect;
pub use records::Model;
pub use records::Remark;
pub use records::Residue;
pub use records::SSBond;
pub use records::SeqRes;
pub use writer::write_mmcif;
pub use writer::write_mmcif_file;
pub use writer::write_mmcif_string;
pub use writer::write_pdb;
pub use writer::write_pdb_file;
pub use parser::parse_gzip_mmcif_file;
pub use parser::parse_gzip_mmcif_reader;
pub use parser::parse_gzip_pdb_file;
pub use parser::parse_gzip_pdb_reader;
pub use parser::parse_gzip_structure_file;
pub use writer::write_gzip_mmcif_file;

§Modules

core
Core module for handling molecular structure file parsing and processing
descriptors
Structural descriptors and analysis functions for PDB structures.
dockq
DockQ v2 interface quality assessment for protein-protein complexes.
dssp
DSSP 4-like secondary structure assignment.
error
Error types for the PDBRust library
filter
Filtering and cleaning operations for PDB structures.
geometry
Geometric analysis and structure superposition.
guide
PDBRust User Guide
ligand_quality
Ligand pose quality assessment module (PoseBusters-style geometry checks).
parser
Parser module for different molecular structure file formats
quality
Structure quality assessment functions.
rcsb
RCSB PDB Search and Download functionality.
records
Data structures for different PDB record types
summary
Unified structure summary functionality.
writer