PDBRust
A fast Rust library for parsing and analyzing PDB and mmCIF protein structure files. Also available as a Python package with 40-260x speedups over pure Python implementations.
Installation
Python
pip install pdbrust
Rust
[dependencies]
pdbrust = "0.7"
With optional features:
[dependencies]
pdbrust = { version = "0.7", features = ["filter", "descriptors", "rcsb", "gzip"] }
Quick Start
Python
# Parse a PDB file
=
# Filter and analyze
=
=
# Get coordinates as numpy arrays (fast!)
= # Shape: (N, 3)
= # Shape: (CA, 3)
# Download from RCSB PDB
=
Rust
use ;
Features
| Feature | Description |
|---|---|
| `filter` | Filter atoms, extract chains, remove ligands, clean structures |
| `descriptors` | Radius of gyration, amino acid composition, geometric metrics |
| `quality` | Structure quality assessment (altlocs, missing residues, etc.) |
| `summary` | Combined quality + descriptors in one call |
| `geometry` | RMSD, LDDT (superposition-free), structure alignment (Kabsch) |
| `dssp` | DSSP 4-like secondary structure assignment (H, G, I, P, E, B, T, S, C) |
| `dockq` | DockQ v2 interface quality assessment for protein-protein complexes |
| `rcsb` | Search and download structures from RCSB PDB |
| `rcsb-async` | Async/concurrent bulk downloads with rate limiting |
| `gzip` | Parse gzip-compressed files (.ent.gz, .pdb.gz, .cif.gz) |
| `parallel` | Parallel processing with Rayon |
Examples
Filter and Clean Structures
use parse_pdb_file;
let structure = parse_pdb_file?;
// Extract CA coordinates
let ca_coords = structure.get_ca_coords;
// Chain operations with fluent API
let chain_a = structure
.remove_ligands
.keep_only_chain
.keep_only_ca;
Compute Structural Descriptors
let structure = parse_pdb_file?;
let rg = structure.radius_of_gyration;
let max_dist = structure.max_ca_distance;
let composition = structure.aa_composition;
// Or get everything at once
let descriptors = structure.structure_descriptors;
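For readers who want to see what two of these descriptors measure, here is a minimal pure-Python sketch. It is an independent illustration, not PDBRust's implementation: the function names are hypothetical, the coordinates are made up, and the radius of gyration is assumed to be unweighted (no atomic masses).

```python
import math
from itertools import combinations


def radius_of_gyration(coords):
    """Unweighted radius of gyration: RMS distance of the points from their centroid."""
    n = len(coords)
    centroid = tuple(sum(p[k] for p in coords) / n for k in range(3))
    mean_sq = sum(math.dist(p, centroid) ** 2 for p in coords) / n
    return math.sqrt(mean_sq)


def max_pairwise_distance(coords):
    """Largest distance between any two points (O(N^2) brute force)."""
    return max(math.dist(a, b) for a, b in combinations(coords, 2))


# Four made-up CA positions at the corners of a 2 Å square (illustrative only).
ca_coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (2.0, 2.0, 0.0)]
rg = radius_of_gyration(ca_coords)
max_dist = max_pairwise_distance(ca_coords)
print(f"Rg = {rg:.3f} Å, max CA distance = {max_dist:.3f} Å")
```

For this square, every point lies sqrt(2) Å from the centroid, so the radius of gyration is about 1.414 Å and the maximum distance is the diagonal, about 2.828 Å.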
Parse Gzip-Compressed Files
use parse_gzip_pdb_file;
// Parse gzip-compressed PDB files from the PDB archive
let structure = parse_gzip_pdb_file?;
println!;
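To show what reading a gzip-compressed PDB file involves, the sketch below uses only the Python standard library. It is not PDBRust's parser: `atom_line` and `read_ca_coords` are hypothetical helpers, the two ATOM records are made up, and only the fixed-width coordinate columns (31-54) of the PDB format are decoded.

```python
import gzip
import io


def atom_line(serial, name, res_name, chain, res_seq, x, y, z):
    """Format one fixed-width PDB ATOM record (coordinates land in columns 31-54)."""
    return (
        f"ATOM  {serial:5d} {name:<4s} {res_name:>3s} {chain}{res_seq:4d}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}\n"
    )


def read_ca_coords(stream):
    """Collect (x, y, z) for every ATOM record whose atom name is CA."""
    coords = []
    for line in stream:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


# Build a tiny made-up structure and compress it in memory.
pdb_text = (
    atom_line(1, " N  ", "ALA", "A", 1, 11.104, 6.134, -6.504)
    + atom_line(2, " CA ", "ALA", "A", 1, 11.639, 6.071, -5.147)
)
compressed = gzip.compress(pdb_text.encode("ascii"))

# gzip.open accepts a file object, so the same code works for a path such as "x.pdb.gz".
with gzip.open(io.BytesIO(compressed), "rt", encoding="ascii") as handle:
    ca_coords = read_ca_coords(handle)

print(ca_coords)
```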
Geometry: RMSD, LDDT, and Alignment
use ;
let model = parse_pdb_file?;
let reference = parse_pdb_file?;
// Calculate RMSD (without alignment)
let rmsd = model.rmsd_to?;
println!;
// Calculate LDDT (superposition-free, used in AlphaFold/CASP)
let lddt = model.lddt_to?;
println!; // 0.0 (poor) to 1.0 (perfect)
// Align structures using Kabsch algorithm
let = model.align_to?;
println!;
// Per-residue LDDT for quality analysis
let per_res = model.per_residue_lddt_to?;
for r in per_res.iter.filter
// Different atom selections
let rmsd_bb = model.rmsd_to_with_selection?;
let lddt_bb = model.lddt_to_with_options?;
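The difference between RMSD without alignment and superposition-free LDDT is easiest to see on a toy example. The following pure-Python sketch is a simplified illustration, not PDBRust's implementation: it treats each point as one CA atom, uses the standard 15 Å inclusion radius and 0.5/1/2/4 Å thresholds, counts a distance as preserved when the deviation is strictly below the threshold, and all function names and coordinates are hypothetical.

```python
import math
from itertools import combinations


def rmsd(model, reference):
    """RMSD over matched atoms, with no superposition applied."""
    sq = sum(math.dist(m, r) ** 2 for m, r in zip(model, reference))
    return math.sqrt(sq / len(reference))


def lddt(model, reference, cutoff=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Fraction of reference distances (< cutoff) preserved in the model,
    averaged over the tolerance thresholds."""
    preserved = 0
    total = 0
    for i, j in combinations(range(len(reference)), 2):
        d_ref = math.dist(reference[i], reference[j])
        if d_ref >= cutoff:
            continue  # pair is outside the inclusion radius
        d_mod = math.dist(model[i], model[j])
        total += len(thresholds)
        preserved += sum(abs(d_mod - d_ref) < t for t in thresholds)
    return preserved / total if total else 0.0


# Made-up reference: three CA atoms spaced 3.8 Å apart along x.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]

# Rigid translation: RMSD changes, LDDT does not.
shifted = [(x + 1.0, y + 1.0, z + 1.0) for x, y, z in reference]

# Local distortion: the last atom is pulled 1.5 Å further out.
distorted = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (9.1, 0.0, 0.0)]

print(f"shifted:   RMSD = {rmsd(shifted, reference):.3f}, LDDT = {lddt(shifted, reference):.3f}")
print(f"distorted: RMSD = {rmsd(distorted, reference):.3f}, LDDT = {lddt(distorted, reference):.3f}")
```

The translated copy has a nonzero RMSD but a perfect LDDT of 1.0, because every internal distance is unchanged. The distorted copy keeps only 8 of 12 threshold checks, giving an LDDT of about 0.667.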
Download from RCSB PDB
use ;
// Download a structure
let structure = download_structure?;
// Search RCSB
let query = new
.with_text
.with_organism
.with_resolution_max;
let results = rcsb_search?;
Bulk Downloads with Async
use ;
async
Python:
# Download multiple structures concurrently
=
# With custom options
=
=
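As a rough picture of what bounded concurrent downloading looks like, here is a standard-library `asyncio` sketch. It does not use PDBRust or contact RCSB: `fetch_structure` is a hypothetical stand-in that sleeps instead of downloading, and a semaphore caps the number of requests in flight, which is a simpler mechanism than true rate limiting.

```python
import asyncio

stats = {"active": 0, "peak": 0}  # bookkeeping to observe concurrency


async def fetch_structure(pdb_id):
    """Hypothetical stand-in for a network download; sleeps instead of fetching."""
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    await asyncio.sleep(0.01)
    stats["active"] -= 1
    return pdb_id.lower()


async def download_all(pdb_ids, max_concurrent=2):
    """Run all fetches, allowing at most max_concurrent at the same time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(pdb_id):
        async with semaphore:
            return await fetch_structure(pdb_id)

    # gather returns results in the same order as the input IDs.
    return await asyncio.gather(*(bounded(p) for p in pdb_ids))


results = asyncio.run(download_all(["1UBQ", "4INS", "1CRN"]))
print(results, "peak concurrency:", stats["peak"])
```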
Secondary Structure Assignment (DSSP)
use parse_pdb_file;
let structure = parse_pdb_file?;
// Compute DSSP-like secondary structure
let ss = structure.assign_secondary_structure;
println!;
println!;
println!;
// Get as compact string (e.g., "HHHHEEEECCCC")
let ss_string = structure.secondary_structure_string;
// Get composition tuple
let = structure.secondary_structure_composition;
Python:
=
# Get secondary structure assignment
=
# Compact string representation
=
# e.g., "CCCCHHHHHHHCCEEEEEECCC"
# Iterate over residue assignments
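Given a compact secondary structure string like the ones above, a three-state composition can be computed with a few lines of plain Python. This sketch assumes the common 8-to-3 reduction (H, G, I as helix; E, B as strand; everything else as coil); PDBRust's own grouping may differ, and `ss_composition` is a hypothetical name.

```python
from collections import Counter

HELIX_CODES = set("HGI")   # alpha, 3-10, and pi helices
STRAND_CODES = set("EB")   # extended strand and isolated bridge


def ss_composition(ss_string):
    """Return (helix, strand, coil) fractions for a DSSP-style string."""
    counts = Counter(ss_string)
    n = len(ss_string)
    helix = sum(counts[c] for c in HELIX_CODES)
    strand = sum(counts[c] for c in STRAND_CODES)
    coil = n - helix - strand  # T, S, P, C and anything else
    return helix / n, strand / n, coil / n


helix_frac, strand_frac, coil_frac = ss_composition("CCCCHHHHHHHCCEEEEEECCC")
print(f"helix={helix_frac:.2f} strand={strand_frac:.2f} coil={coil_frac:.2f}")
```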
B-factor Analysis
use parse_pdb_file;
let structure = parse_pdb_file?;
// B-factor statistics
let mean_b = structure.b_factor_mean;
let mean_b_ca = structure.b_factor_mean_ca;
let std_b = structure.b_factor_std;
println!;
println!;
// Per-residue B-factor profile
let profile = structure.b_factor_profile;
for res in &profile
// Identify flexible/rigid regions
let flexible = structure.flexible_residues; // B > 50 Ų
let rigid = structure.rigid_residues; // B < 15 Ų
// Normalize for cross-structure comparison
let normalized = structure.normalize_b_factors;
Python:
=
# B-factor statistics
# Per-residue profile
=
# Flexible regions
=
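The B-factor operations above can be pictured with a small standard-library sketch. It is an illustration under stated assumptions rather than PDBRust's implementation: normalization is assumed to be a z-score using the population standard deviation, the 50 Ų and 15 Ų cutoffs are the ones quoted in the comments above, and the per-residue values and the `zscore` helper are made up.

```python
import statistics


def zscore(values):
    """Z-score normalization: subtract the mean, divide by the population std."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]


# Made-up per-residue mean B-factors in Ų (illustrative only).
b_factors = [10.0, 20.0, 30.0, 60.0]

normalized = zscore(b_factors)
flexible = [i for i, b in enumerate(b_factors) if b > 50.0]  # B > 50 Ų
rigid = [i for i, b in enumerate(b_factors) if b < 15.0]     # B < 15 Ų

print("normalized:", [round(z, 3) for z in normalized])
print("flexible residue indices:", flexible, "rigid residue indices:", rigid)
```

Z-scores make profiles comparable across structures whose absolute B-factors differ, for example because of resolution or refinement protocol.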
Selection Language (PyMOL/VMD-style)
use parse_pdb_file;
let structure = parse_pdb_file?;
// Basic selections
let chain_a = structure.select?;
let ca_atoms = structure.select?;
let backbone = structure.select?;
// Residue selections
let res_range = structure.select?;
let alanines = structure.select?;
// Boolean operators
let chain_a_ca = structure.select?;
let heavy_atoms = structure.select?;
let complex = structure.select?;
// Numeric comparisons
let low_bfactor = structure.select?;
let high_occ = structure.select?;
// Validate without executing
validate_selection?;
Python:
=
# Select atoms using familiar syntax
=
=
=
# Complex selections
=
=
Common Workflows
See the examples/ directory for complete working code:
| Workflow | Example | Features Used |
|---|---|---|
| Load, clean, analyze, export | analysis_workflow.rs | filter, descriptors, quality, summary |
| Filter and clean structures | filtering_demo.rs | filter |
| Selection language | selection_demo.rs | filter |
| B-factor analysis | b_factor_demo.rs | descriptors |
| Secondary structure (DSSP) | secondary_structure_demo.rs | dssp |
| RMSD and structure alignment | geometry_demo.rs | geometry |
| LDDT (superposition-free) | lddt_demo.rs | geometry |
| DockQ interface quality | dockq_demo.rs | dockq |
| Search and download from RCSB | rcsb_workflow.rs | rcsb, descriptors |
| Async bulk downloads | async_download_demo.rs | rcsb-async, descriptors |
| Process multiple files | batch_processing.rs | descriptors, summary |
Python examples are available in pdbrust-python/examples/:
- basic_usage.py - Parsing and structure access
- writing_files.py - Write PDB/mmCIF files
- geometry_rmsd.py - RMSD and alignment
- lddt_demo.py - LDDT calculation (superposition-free)
- numpy_integration.py - NumPy arrays, distance matrices, contact maps
- rcsb_search.py - RCSB search and download
- selection_language.py - PyMOL/VMD-style selection language
- secondary_structure.py - DSSP secondary structure assignment
- b_factor_analysis.py - B-factor statistics and flexibility analysis
- alphafold_analysis.py - AlphaFold pLDDT confidence scores and disordered regions
- ramachandran_analysis.py - Phi/Psi dihedrals and Ramachandran validation
- ligand_interactions.py - Protein-ligand binding sites and interactions
- quality_and_summary.py - Quality reports and structure summaries
- batch_processing.py - Process multiple files with CSV export
- advanced_filtering.py - Filtering, normalization, and manipulation
- dockq_demo.py - DockQ v2 interface quality assessment
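Run Rust examples with `cargo run --example`, enabling the features listed for that example in the workflow table, for instance:
cargo run --example analysis_workflow --features "filter,descriptors,quality,summary"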
For a complete getting started guide, see docs/GETTING_STARTED.md.
Performance
Benchmarks against equivalent Python code show 40-260x speedups for in-memory operations, while file parsing gains a more modest 2-3x:
| Operation | Speedup |
|---|---|
| Parsing | 2-3x |
| get_ca_coords | 240x |
| max_ca_distance | 260x |
| radius_of_gyration | 100x |
Full PDB Archive Validation
PDBRust has been validated against the entire Protein Data Bank:
| Metric | Value |
|---|---|
| Total Structures Tested | 230,655 |
| Success Rate | 100% |
| Failed Parses | 0 |
| Total Atoms Parsed | 2,057,302,767 |
| Processing Rate | ~92 files/sec |
| Largest Structure | 2ku2 (1,290,100 atoms) |
Run the full benchmark yourself:
Python Package
Pre-built wheels are available for Linux, macOS, and Windows (Python 3.9-3.13):
pip install pdbrust
Platform Notes
The Python package includes full functionality on macOS and Windows. On Linux, the RCSB download/search features are not available in the pre-built wheels due to cross-compilation constraints. All other features (parsing, filtering, analysis, geometry, numpy arrays, etc.) work on all platforms.
| Platform | Parsing | Filtering | Descriptors | Geometry | DSSP | RCSB |
|---|---|---|---|---|---|---|
| macOS | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Windows | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Linux | ✓ | ✓ | ✓ | ✓ | ✓ | - |
See pdbrust-python/README.md for full Python API documentation.
Documentation
Citation
If you use PDBRust in your research, please cite it using the metadata in our CITATION.cff file:
Or in text format:
Fooladi, H. (2025). PDBRust: A High-Performance Rust Library for PDB/mmCIF Parsing and Analysis. Zenodo. https://doi.org/10.5281/zenodo.18232203
License
MIT