Crate pdbrust

§PDBRust

A high-performance Rust library for parsing and analyzing Protein Data Bank (PDB) and mmCIF structure files.

PDBRust provides comprehensive tools for working with molecular structure data, from simple file parsing to advanced structural analysis. It’s designed for bioinformatics pipelines, structural biology research, and machine learning applications that work with protein structures.

For detailed documentation, examples, and best practices, see the guide module.

§Quick Start

use pdbrust::parse_pdb_file;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Parse a PDB file
    let structure = parse_pdb_file("protein.pdb")?;

    println!("Atoms: {}", structure.atoms.len());
    println!("Chains: {:?}", structure.get_chain_ids());

    Ok(())
}

§Feature Flags

PDBRust uses feature flags to keep the core library lightweight while offering extensive optional functionality:

Feature      Description                                         Default
filter       Structure filtering, extraction, and cleaning       No
descriptors  Structural descriptors (Rg, composition, geometry)  No
quality      Quality assessment and reports                      No
summary      Unified summaries (requires descriptors + quality)  No
rcsb         RCSB PDB search and download                        No
parallel     Parallel processing with Rayon                      No
geometry     Geometric analysis with nalgebra                    No
analysis     All analysis features combined                      No
full         Everything                                          No

Enable features in your Cargo.toml:

[dependencies]
pdbrust = { version = "0.3", features = ["filter", "descriptors"] }

§Core Features

§Parsing

  • Parse both PDB and mmCIF files with comprehensive error handling
  • Automatic format detection based on file content
  • Support for multiple models (NMR ensembles)
  • Handle alternate conformations (altlocs)
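
As an illustration of content-based format detection, the following is a minimal standard-library sketch. It is not PDBRust's actual detection logic: the `Format` enum, the `detect_format` function, and the list of PDB record names are assumptions made for this example.

```rust
/// Formats distinguished by this sketch (hypothetical type, not part of PDBRust).
#[derive(Debug, PartialEq)]
enum Format {
    Pdb,
    Mmcif,
    Unknown,
}

/// Guess the format from the first non-blank, non-comment line of the content.
fn detect_format(content: &str) -> Format {
    // Record names that commonly open a PDB file (an assumption of this sketch).
    const PDB_RECORDS: [&str; 7] = [
        "HEADER", "TITLE", "REMARK", "CRYST1", "MODEL", "ATOM", "HETATM",
    ];

    for line in content.lines() {
        let trimmed = line.trim_start();
        // Skip blank lines and mmCIF-style comments.
        if trimmed.is_empty() || trimmed.starts_with('#') {
            continue;
        }
        // mmCIF files open with a data block, followed by data names or loops.
        if trimmed.starts_with("data_") || trimmed.starts_with("loop_") || trimmed.starts_with('_') {
            return Format::Mmcif;
        }
        if PDB_RECORDS.iter().any(|record| trimmed.starts_with(record)) {
            return Format::Pdb;
        }
        // The first meaningful line matched neither format.
        return Format::Unknown;
    }
    Format::Unknown
}
```

Deciding on the first meaningful line keeps the check cheap for large files; a production parser would likely inspect more of the content before committing.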

§Structure Data

  • ATOM/HETATM records with full coordinate data
  • SEQRES sequence information
  • CONECT connectivity records
  • SSBOND disulfide bond definitions
  • Header, title, and remark metadata
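
To show what "full coordinate data" in an ATOM/HETATM record involves, here is a minimal sketch that reads a few of the fixed-width columns defined by the PDB format. The `AtomSketch` struct and `parse_atom_line` function are hypothetical names for this example and do not reflect the fields or internals of PDBRust's `Atom` type.

```rust
/// A reduced atom record (hypothetical type for illustration only).
#[derive(Debug, PartialEq)]
struct AtomSketch {
    serial: u32,
    name: String,
    res_name: String,
    chain_id: char,
    res_seq: i32,
    x: f64,
    y: f64,
    z: f64,
}

/// Return the trimmed text in the byte range `start..end`, if the line is long enough.
fn field(line: &str, start: usize, end: usize) -> Option<&str> {
    line.get(start..end).map(str::trim)
}

/// Parse one ATOM or HETATM line using the fixed PDB columns
/// (1-based columns 7-11 serial, 13-16 name, 18-20 residue, 22 chain,
/// 23-26 residue number, 31-54 x/y/z). Returns None for other records
/// or for lines that are too short or contain malformed numbers.
fn parse_atom_line(line: &str) -> Option<AtomSketch> {
    let record = line.get(0..6)?.trim_end();
    if record != "ATOM" && record != "HETATM" {
        return None;
    }
    Some(AtomSketch {
        serial: field(line, 6, 11)?.parse().ok()?,
        name: field(line, 12, 16)?.to_string(),
        res_name: field(line, 17, 20)?.to_string(),
        chain_id: line.get(21..22)?.chars().next()?,
        res_seq: field(line, 22, 26)?.parse().ok()?,
        x: field(line, 30, 38)?.parse().ok()?,
        y: field(line, 38, 46)?.parse().ok()?,
        z: field(line, 46, 54)?.parse().ok()?,
    })
}
```

The sketch ignores alternate location, insertion code, occupancy, B-factor, and element columns, all of which a complete parser would also read.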

§Optional Features

§Filtering (filter feature)

// Remove ligands, keep only chain A, then keep only backbone atoms
let cleaned = structure
    .remove_ligands()
    .keep_only_chain("A")
    .keep_only_backbone();

// Extract CA coordinates
let ca_coords = structure.get_ca_coords(None);
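
The filter chain above can be pictured as successive predicates over the atom list. The sketch below illustrates that idea with standard-library code only. `AtomLite` and `keep_chain_backbone` are hypothetical, and treating every HETATM record as a ligand is a simplification; PDBRust's own rules may differ.

```rust
/// A reduced atom record (hypothetical type for illustration only).
#[derive(Debug, Clone, PartialEq)]
struct AtomLite {
    name: String,
    chain_id: String,
    is_hetatm: bool,
}

/// Keep protein backbone atoms (N, CA, C, O) of one chain, dropping HETATM records.
fn keep_chain_backbone(atoms: &[AtomLite], chain: &str) -> Vec<AtomLite> {
    const BACKBONE: [&str; 4] = ["N", "CA", "C", "O"];
    atoms
        .iter()
        // Assumption: HETATM records stand in for ligands here.
        .filter(|a| !a.is_hetatm)
        .filter(|a| a.chain_id == chain)
        .filter(|a| BACKBONE.contains(&a.name.as_str()))
        .cloned()
        .collect()
}
```

Each `filter` call corresponds to one step of the chain, so the order of steps does not change the result, only how many atoms later steps have to examine.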

§Structural Descriptors (descriptors feature)

// Compute structural properties
let rg = structure.radius_of_gyration();
let composition = structure.aa_composition();
let hydrophobic = structure.hydrophobic_ratio();
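
For readers unfamiliar with the radius of gyration (Rg), the sketch below computes the unweighted form: the root-mean-square distance of the atoms from their centroid. It is an independent illustration; whether PDBRust's `radius_of_gyration` weights atoms by mass or restricts the calculation to particular atoms is not stated here, and the function signature below is an assumption.

```rust
/// Unweighted radius of gyration of a set of 3D points.
/// Returns None for an empty input.
fn radius_of_gyration(coords: &[[f64; 3]]) -> Option<f64> {
    if coords.is_empty() {
        return None;
    }
    let n = coords.len() as f64;

    // Centroid: the mean position over all points.
    let mut centroid = [0.0_f64; 3];
    for c in coords {
        for k in 0..3 {
            centroid[k] += c[k] / n;
        }
    }

    // Mean squared distance from the centroid.
    let mean_sq = coords
        .iter()
        .map(|c| (0..3).map(|k| (c[k] - centroid[k]).powi(2)).sum::<f64>())
        .sum::<f64>()
        / n;

    Some(mean_sq.sqrt())
}
```

Two points at (1, 0, 0) and (-1, 0, 0) have their centroid at the origin and each lie one unit from it, so the function returns 1 for that input.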

§Quality Assessment (quality feature)

// Get comprehensive quality report
let report = structure.quality_report();

if report.is_analysis_ready() {
    println!("Structure is ready for analysis");
}

§RCSB Integration (rcsb feature)

use pdbrust::rcsb::{download_structure, rcsb_search, SearchQuery, FileFormat};

// Download from RCSB PDB
let structure = download_structure("1UBQ", FileFormat::Pdb)?;

// Search RCSB
let query = SearchQuery::new()
    .with_text("kinase")
    .with_resolution_max(2.0);
let results = rcsb_search(&query, 10)?;

§Format Support

§PDB Format

Traditional fixed-width text format with support for:

  • ATOM/HETATM records
  • HEADER, TITLE, REMARK records
  • SEQRES, CONECT, SSBOND records
  • MODEL/ENDMDL for multi-model structures
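
As a rough picture of how MODEL/ENDMDL records delimit a multi-model file, the following sketch counts coordinate records per model. The function name `atoms_per_model` and its fallback behavior for files without MODEL records are assumptions of this example, not a description of PDBRust's parser.

```rust
/// Count ATOM/HETATM records in each MODEL ... ENDMDL block.
/// A file without MODEL records is treated as a single model.
fn atoms_per_model(content: &str) -> Vec<usize> {
    let mut counts = Vec::new();
    let mut current: Option<usize> = None; // atom count of the open MODEL block
    let mut loose = 0usize; // atoms seen outside any MODEL block

    for line in content.lines() {
        // The record name occupies the first six columns.
        let record = line.get(0..6).unwrap_or(line).trim_end();
        match record {
            "MODEL" => current = Some(0),
            "ENDMDL" => {
                if let Some(n) = current.take() {
                    counts.push(n);
                }
            }
            "ATOM" | "HETATM" => match current.as_mut() {
                Some(n) => *n += 1,
                None => loose += 1,
            },
            _ => {}
        }
    }

    // Tolerate a final MODEL block that lacks its ENDMDL record.
    if let Some(n) = current {
        counts.push(n);
    }
    // Single-model files usually have no MODEL record at all.
    if counts.is_empty() && loose > 0 {
        counts.push(loose);
    }
    counts
}
```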

§mmCIF Format

Modern dictionary-based format with support for:

  • _atom_site category (converted to ATOM records)
  • _entity_poly_seq category (converted to SEQRES records)
  • _struct_disulfid category (converted to SSBOND records)
  • Header and metadata information
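
To illustrate how a category such as `_atom_site` is laid out, the sketch below reads the first mmCIF `loop_` whose column names start with a given prefix. It is deliberately simplified: values are split on whitespace, so quoted strings and multi-line text fields are not handled, and `read_loop` is a hypothetical helper rather than part of PDBRust.

```rust
/// Column names and rows of one mmCIF loop.
type Loop = (Vec<String>, Vec<Vec<String>>);

/// Find the first `loop_` whose first column starts with `prefix`
/// (for example "_atom_site.") and return its columns and rows.
fn read_loop(content: &str, prefix: &str) -> Option<Loop> {
    let mut lines = content.lines().map(str::trim).peekable();

    while let Some(line) = lines.next() {
        if line != "loop_" {
            continue;
        }

        // Column names: consecutive lines starting with '_'.
        let mut columns = Vec::new();
        while let Some(name) = lines.peek().copied().filter(|l| l.starts_with('_')) {
            columns.push(name.to_string());
            lines.next();
        }
        if columns.is_empty() || !columns[0].starts_with(prefix) {
            continue;
        }

        // Data rows: everything up to a blank line, comment, data name, or next loop.
        let mut rows = Vec::new();
        while let Some(row) = lines.peek().copied().filter(|l| {
            !l.is_empty() && !l.starts_with('#') && !l.starts_with('_') && *l != "loop_"
        }) {
            rows.push(row.split_whitespace().map(String::from).collect());
            lines.next();
        }
        return Some((columns, rows));
    }
    None
}
```

Because each row carries values in the same order as the column names, a converter can look up the index of a column such as `_atom_site.label_atom_id` once and reuse it for every row.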

§Examples

§Auto-detect Format

use pdbrust::parse_structure_file;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Works with both .pdb and .cif files
    let structure = parse_structure_file("example.cif")?;

    // Get all chain IDs
    let chains = structure.get_chain_ids();

    // Get sequence for a specific chain
    if let Some(chain_id) = chains.first() {
        let sequence = structure.get_sequence(chain_id);
        println!("Sequence for chain {}: {:?}", chain_id, sequence);
    }

    Ok(())
}

§Format-Specific Parsing

use pdbrust::{parse_pdb_file, parse_mmcif_file};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Parse PDB format explicitly
    let pdb_structure = parse_pdb_file("structure.pdb")?;

    // Parse mmCIF format explicitly
    let mmcif_structure = parse_mmcif_file("structure.cif")?;

    Ok(())
}

§Performance

PDBRust is designed for high performance. Benchmarks against the Python libraryPDB library show speedups of roughly 40x to 267x for common operations:

Operation           Python  Rust     Speedup
Parse PDB file      15 ms   0.36 ms  42x
Remove ligands      8 ms    0.03 ms  267x
Radius of gyration  2 ms    0.05 ms  40x
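
The speedup column is the Python time divided by the Rust time, rounded to the nearest whole number. The short sketch below reproduces that arithmetic from the timings in the table; the `speedup` helper is introduced here only for illustration.

```rust
/// Speedup factor: how many times faster the Rust timing is than the Python timing.
fn speedup(python_ms: f64, rust_ms: f64) -> f64 {
    python_ms / rust_ms
}

/// Speedups for the three benchmark rows, rounded as in the table.
fn table_speedups() -> [f64; 3] {
    [
        speedup(15.0, 0.36).round(), // Parse PDB file
        speedup(8.0, 0.03).round(),  // Remove ligands
        speedup(2.0, 0.05).round(),  // Radius of gyration
    ]
}
```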

§Error Handling

All parsing functions return Result<T, PdbError> with detailed error context:

use pdbrust::{parse_pdb_file, PdbError};

match parse_pdb_file("structure.pdb") {
    Ok(structure) => println!("Loaded {} atoms", structure.atoms.len()),
    Err(PdbError::IoError(e)) => eprintln!("File error: {}", e),
    Err(e) => eprintln!("Parse error: {}", e),
}
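
The pattern behind this kind of error handling can be sketched with the standard library alone. In the example below, `SketchError` is a hypothetical stand-in: only the existence of an `IoError` variant on `PdbError` is taken from the snippet above, while the `ParseError` variant, its fields, and both helper functions are assumptions.

```rust
use std::fmt;
use std::fs;
use std::io;

/// Hypothetical error type mirroring the I/O versus parse split.
#[derive(Debug)]
enum SketchError {
    IoError(io::Error),
    ParseError { line: usize, message: String },
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::IoError(e) => write!(f, "I/O error: {}", e),
            SketchError::ParseError { line, message } => write!(f, "line {}: {}", line, message),
        }
    }
}

impl std::error::Error for SketchError {}

// Lets `?` convert io::Error into SketchError automatically.
impl From<io::Error> for SketchError {
    fn from(e: io::Error) -> Self {
        SketchError::IoError(e)
    }
}

/// Count coordinate records, reporting the 1-based line number of a truncated one.
fn count_atoms(content: &str) -> Result<usize, SketchError> {
    let mut count = 0;
    for (index, line) in content.lines().enumerate() {
        if line.starts_with("ATOM") || line.starts_with("HETATM") {
            // Coordinates end at column 54 in the PDB format.
            if line.len() < 54 {
                return Err(SketchError::ParseError {
                    line: index + 1,
                    message: "coordinate columns missing".to_string(),
                });
            }
            count += 1;
        }
    }
    Ok(count)
}

/// Read a file and count its coordinate records.
fn count_atoms_in_file(path: &str) -> Result<usize, SketchError> {
    let content = fs::read_to_string(path)?; // io::Error becomes SketchError::IoError
    count_atoms(&content)
}
```

Keeping I/O failures in their own variant lets callers distinguish a missing file from malformed content, which is what the `match` above relies on.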

§Re-exports

pub use core::PdbStructure;
pub use error::PdbError;
pub use parser::parse_mmcif_file;
pub use parser::parse_mmcif_string;
pub use parser::parse_pdb_file;
pub use parser::parse_pdb_reader;
pub use parser::parse_pdb_string;
pub use parser::parse_structure_file;
pub use records::Atom;
pub use records::Conect;
pub use records::Model;
pub use records::Remark;
pub use records::Residue;
pub use records::SSBond;
pub use records::SeqRes;
pub use writer::write_pdb_file;
pub use parser::parse_gzip_mmcif_file;
pub use parser::parse_gzip_mmcif_reader;
pub use parser::parse_gzip_pdb_file;
pub use parser::parse_gzip_pdb_reader;
pub use parser::parse_gzip_structure_file;

§Modules

core         Core module for handling molecular structure file parsing and processing
descriptors  Structural descriptors and analysis functions for PDB structures.
error        Error types for PDBRust library
filter       Filtering and cleaning operations for PDB structures.
guide        PDBRust User Guide
parser       Parser module for different molecular structure file formats
quality      Structure quality assessment functions.
rcsb         RCSB PDB Search and Download functionality.
records      Data structures for different PDB record types
summary      Unified structure summary functionality.
writer