PDBRust
A fast Rust library for parsing and analyzing PDB and mmCIF protein structure files.
Installation
[dependencies]
pdbrust = "0.3"
With optional features:
[dependencies]
pdbrust = { version = "0.3", features = ["filter", "descriptors", "rcsb", "gzip"] }
Quick Start
use ;
Features
| Feature | Description |
|---|---|
| filter | Filter atoms, extract chains, remove ligands, clean structures |
| descriptors | Radius of gyration, amino acid composition, geometric metrics |
| quality | Structure quality assessment (altlocs, missing residues, etc.) |
| summary | Combined quality + descriptors in one call |
| rcsb | Search and download structures from RCSB PDB |
| gzip | Parse gzip-compressed files (.ent.gz, .pdb.gz, .cif.gz) |
| parallel | Parallel processing with Rayon |
Examples
Filter and Clean Structures
use pdbrust::parse_pdb_file;
let structure = parse_pdb_file?;
// Extract CA coordinates
let ca_coords = structure.get_ca_coords;
// Chain operations with fluent API
let chain_a = structure
.remove_ligands
.keep_only_chain
.keep_only_ca;
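To make the filtering steps above concrete, here is a minimal, standard-library-only sketch of what "drop ligands, keep one chain, keep only CA atoms" means at the level of PDB records. It is not PDBRust's implementation: the `Atom` struct, `parse_atom_line`, and `ca_atoms_of_chain` are hypothetical names introduced for illustration, and the sketch assumes the fixed-column ATOM/HETATM layout of the PDB format and treats every HETATM record as a ligand.

```rust
/// Hypothetical minimal atom record (not a PDBRust type).
#[derive(Debug, Clone, PartialEq)]
struct Atom {
    hetero: bool,      // true for HETATM records
    name: String,      // atom name, e.g. "CA"
    res_name: String,  // residue name, e.g. "ALA"
    chain_id: char,
    coord: [f64; 3],
}

/// Parse one fixed-width ATOM/HETATM line.
/// Returns None for other record types and for short or malformed lines.
fn parse_atom_line(line: &str) -> Option<Atom> {
    let hetero = line.starts_with("HETATM");
    if !hetero && !line.starts_with("ATOM  ") {
        return None;
    }
    // `str::get` returns None instead of panicking on short lines.
    let field = |range: std::ops::Range<usize>| line.get(range).map(str::trim);
    let num = |range: std::ops::Range<usize>| field(range)?.parse::<f64>().ok();
    Some(Atom {
        hetero,
        name: field(12..16)?.to_string(),            // columns 13-16
        res_name: field(17..20)?.to_string(),        // columns 18-20
        chain_id: line.get(21..22)?.chars().next()?, // column 22
        coord: [num(30..38)?, num(38..46)?, num(46..54)?], // columns 31-54
    })
}

/// Keep only alpha-carbon atoms of one chain, discarding HETATM records.
fn ca_atoms_of_chain(pdb_text: &str, chain: char) -> Vec<Atom> {
    pdb_text
        .lines()
        .filter_map(parse_atom_line)
        .filter(|a| !a.hetero)         // drop HETATM records (ligands, ions, waters)
        .filter(|a| a.chain_id == chain) // keep a single chain
        .filter(|a| a.name == "CA")      // keep alpha carbons only
        .collect()
}

// Two chains plus a calcium ion whose atom name is also "CA".
const SAMPLE: &str = "\
ATOM      1  N   ALA A   1       0.000   0.000   0.000\n\
ATOM      2  CA  ALA A   1       1.000   0.000   0.000\n\
ATOM      3  CA  GLY A   2       4.000   0.000   0.000\n\
ATOM      4  CA  SER B   1       0.000   5.000   0.000\n\
HETATM    5 CA    CA A 101       9.000   9.000   9.000\n";

fn main() {
    for atom in ca_atoms_of_chain(SAMPLE, 'A') {
        println!("{} {} {:?}", atom.res_name, atom.name, atom.coord);
    }
}
```

The order of the filters matters in this sketch: because atom names are trimmed, a calcium ion and an alpha carbon both read as "CA", so HETATM records are discarded before matching on the name.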
Compute Structural Descriptors
let structure = parse_pdb_file?;
let rg = structure.radius_of_gyration;
let max_dist = structure.max_ca_distance;
let composition = structure.aa_composition;
// Or get everything at once
let descriptors = structure.structure_descriptors;
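For readers unfamiliar with these descriptors, the sketch below shows one common way to define them on a plain list of coordinates, using only the standard library. It is an illustration under stated assumptions, not PDBRust's code: the radius of gyration here is unweighted (every atom counts equally, no mass weighting), the maximum distance is a brute-force pairwise search, and the function names and return types are hypothetical and may differ from the library's.

```rust
use std::collections::HashMap;

type Point = [f64; 3];

/// Euclidean distance between two points.
fn dist(a: &Point, b: &Point) -> f64 {
    a.iter().zip(b).map(|(p, q)| (p - q).powi(2)).sum::<f64>().sqrt()
}

/// Unweighted radius of gyration: root-mean-square distance from the centroid.
fn radius_of_gyration(points: &[Point]) -> Option<f64> {
    if points.is_empty() {
        return None;
    }
    let n = points.len() as f64;
    let mut centroid = [0.0; 3];
    for p in points {
        for k in 0..3 {
            centroid[k] += p[k] / n;
        }
    }
    let mean_sq = points.iter().map(|p| dist(p, &centroid).powi(2)).sum::<f64>() / n;
    Some(mean_sq.sqrt())
}

/// Largest distance between any two points (O(n^2) pairwise scan).
fn max_pairwise_distance(points: &[Point]) -> Option<f64> {
    if points.len() < 2 {
        return None;
    }
    let mut best = 0.0_f64;
    for (i, a) in points.iter().enumerate() {
        for b in &points[i + 1..] {
            best = best.max(dist(a, b));
        }
    }
    Some(best)
}

/// Fraction of each residue name in a sequence of three-letter codes.
fn composition(residues: &[&str]) -> HashMap<String, f64> {
    let mut fractions: HashMap<String, f64> = HashMap::new();
    for r in residues {
        *fractions.entry(r.to_string()).or_insert(0.0) += 1.0;
    }
    let total = residues.len() as f64;
    fractions.values_mut().for_each(|v| *v /= total);
    fractions
}

fn main() {
    // Four CA-like points on the corners of a 2 x 2 square.
    let ca: [Point; 4] = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [2.0, 2.0, 0.0]];
    println!("Rg       = {:?}", radius_of_gyration(&ca));
    println!("max dist = {:?}", max_pairwise_distance(&ca));
    println!("comp     = {:?}", composition(&["ALA", "GLY", "ALA", "SER"]));
}
```

For the square in `main`, every corner lies at distance √2 from the centroid, so the radius of gyration is √2 and the maximum distance is the diagonal, √8.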
Parse Gzip-Compressed Files
use pdbrust::parse_gzip_pdb_file;
// Parse gzip-compressed PDB files from the PDB archive
let structure = parse_gzip_pdb_file?;
println!;
Download from RCSB PDB
use ;
// Download a structure
let structure = download_structure?;
// Search RCSB
let query = new
.with_text
.with_organism
.with_resolution_max;
let results = rcsb_search?;
Common Workflows
See the examples/ directory for complete working code:
| Workflow | Example | Features Used |
|---|---|---|
| Load, clean, analyze, export | analysis_workflow.rs | filter, descriptors, quality, summary |
| Filter and clean structures | filtering_demo.rs | filter |
| Search and download from RCSB | rcsb_workflow.rs | rcsb, descriptors |
| Process multiple files | batch_processing.rs | descriptors, summary |
Run examples with:
For a complete getting started guide, see docs/GETTING_STARTED.md.
Performance
Benchmarks against equivalent Python code show 40-260x speedups for in-memory operations:
| Operation | Speedup |
|---|---|
| Parsing | 2-3x |
| get_ca_coords | 240x |
| max_ca_distance | 260x |
| radius_of_gyration | 100x |
Full PDB Archive Validation
PDBRust has been validated against the entire Protein Data Bank:
| Metric | Value |
|---|---|
| Total Structures Tested | 230,655 |
| Success Rate | 100% |
| Failed Parses | 0 |
| Total Atoms Parsed | 2,057,302,767 |
| Processing Rate | ~92 files/sec |
| Largest Structure | 2ku2 (1,290,100 atoms) |
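Two figures follow directly from the table above: the mean structure size and the implied wall-clock time of the full run. The short sketch below works them out; it assumes the ~92 files/sec figure is an average sustained over the whole archive, which the table does not state explicitly, and the function names are illustrative only.

```rust
/// Mean number of atoms per parsed structure.
fn mean_atoms_per_structure(total_atoms: f64, structures: f64) -> f64 {
    total_atoms / structures
}

/// Wall-clock minutes needed to process `structures` files at `files_per_sec`.
fn wall_time_minutes(structures: f64, files_per_sec: f64) -> f64 {
    structures / files_per_sec / 60.0
}

fn main() {
    let structures = 230_655.0;
    let total_atoms = 2_057_302_767.0;
    let files_per_sec = 92.0; // approximate rate from the table

    println!(
        "mean atoms per structure: {:.0}",
        mean_atoms_per_structure(total_atoms, structures)
    );
    println!(
        "implied wall time: {:.1} min",
        wall_time_minutes(structures, files_per_sec)
    );
}
```

Under that assumption the archive averages roughly 8,900 atoms per structure, and a full pass takes on the order of 42 minutes.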
Run the full benchmark yourself:
Documentation
Citation
If you use PDBRust in your research, please cite:
Or in text format:
Fooladi, H. (2025). PDBRust: A High-Performance Rust Library for PDB/mmCIF Parsing and Analysis. https://github.com/HFooladi/pdbrust
License
MIT