pdbrust 0.7.0

A comprehensive Rust library for parsing and analyzing Protein Data Bank (PDB) files
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
# Getting Started with PDBRust

This guide will help you get up and running with PDBRust quickly.

## Installation

Add PDBRust to your `Cargo.toml`:

```toml
[dependencies]
pdbrust = "0.7"
```

### Choosing Features

PDBRust uses feature flags to keep the core library lightweight. Enable only what you need:

```toml
# Minimal: just parsing
pdbrust = "0.7"

# Common setup: parsing + filtering + analysis
pdbrust = { version = "0.7", features = ["filter", "descriptors", "quality"] }

# Full analysis suite
pdbrust = { version = "0.7", features = ["analysis"] }

# Everything including RCSB download
pdbrust = { version = "0.7", features = ["full"] }
```

### Which Features Do I Need?

| If you want to... | Enable these features |
|-------------------|----------------------|
| Just parse PDB/mmCIF files | (none - included by default) |
| Filter atoms, extract chains, clean structures | `filter` |
| Use selection language (chain A and name CA) | `filter` |
| Compute Rg, composition, B-factor analysis | `descriptors` |
| Validate ligand pose geometry (clashes, overlap) | `ligand-quality` |
| Assess protein-protein docking quality (DockQ) | `dockq` |
| Assess structure quality | `quality` |
| Get all metrics in one call | `summary` |
| Calculate RMSD and align structures | `geometry` |
| Compute secondary structure (DSSP) | `dssp` |
| Download from RCSB PDB | `rcsb` |
| Async bulk downloads with rate limiting | `rcsb-async` |
| Process files in parallel | `parallel` |
| All analysis features | `analysis` |
| Everything | `full` |

## Quick Start (5 minutes)

### 1. Parse a Structure

```rust
use pdbrust::{parse_pdb_file, parse_structure_file};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Parse PDB format explicitly
    let structure = parse_pdb_file("protein.pdb")?;

    // Or auto-detect format (works with .pdb, .cif, .ent)
    let structure = parse_structure_file("protein.cif")?;

    // Basic info
    println!("Atoms: {}", structure.atoms.len());
    println!("Chains: {:?}", structure.get_chain_ids());
    println!("Residues: {}", structure.get_residues().len());

    Ok(())
}
```

### 2. Filter and Clean (requires `filter` feature)

```rust
use pdbrust::parse_pdb_file;

let structure = parse_pdb_file("protein.pdb")?;

// Method chaining for clean code
let cleaned = structure
    .remove_ligands()      // Remove waters, ions, ligands
    .remove_hydrogens()    // Remove H atoms
    .keep_only_chain("A"); // Keep only chain A

// Extract CA coordinates for distance calculations
let ca_coords = structure.get_ca_coords(None);

// Get CA atoms as full Atom structs
let ca_atoms = structure.get_ca_atoms(None);
```

### 2b. Selection Language (requires `filter` feature)

```rust
let structure = parse_pdb_file("protein.pdb")?;

// PyMOL/VMD-style selections
let chain_a = structure.select("chain A")?;
let ca_atoms = structure.select("name CA")?;
let backbone = structure.select("backbone")?;

// Combine with boolean operators
let chain_a_ca = structure.select("chain A and name CA")?;
let heavy_atoms = structure.select("protein and not hydrogen")?;
let complex = structure.select("(chain A or chain B) and bfactor < 30.0")?;

// Residue ranges and numeric comparisons
let active_site = structure.select("resid 50:60")?;
let flexible = structure.select("bfactor > 40.0")?;
```

### 3. Compute Descriptors (requires `descriptors` feature)

```rust
let structure = parse_pdb_file("protein.pdb")?;

// Individual metrics
let rg = structure.radius_of_gyration();
let max_dist = structure.max_ca_distance();
let composition = structure.aa_composition();

// Or get everything at once
let descriptors = structure.structure_descriptors();
println!("Rg: {:.2} A", descriptors.radius_of_gyration);
println!("Hydrophobic: {:.1}%", descriptors.hydrophobic_ratio * 100.0);
```

### 3b. B-factor Analysis (requires `descriptors` feature)

```rust
let structure = parse_pdb_file("protein.pdb")?;

// B-factor statistics
let mean_b = structure.b_factor_mean();
let mean_ca = structure.b_factor_mean_ca();
let std_b = structure.b_factor_std();
println!("Mean B-factor: {:.2} Ų", mean_b);

// Per-residue B-factor profile
let profile = structure.b_factor_profile();
for res in &profile {
    println!("{}{}: mean={:.2}", res.chain_id, res.residue_seq, res.mean);
}

// Identify flexible/rigid regions
let flexible = structure.flexible_residues(50.0);  // B > 50 Ų
let rigid = structure.rigid_residues(15.0);        // B < 15 Ų

// Normalize for cross-structure comparison
let normalized = structure.normalize_b_factors();
```

### 4. Quality Assessment (requires `quality` feature)

```rust
let structure = parse_pdb_file("protein.pdb")?;

let report = structure.quality_report();
println!("Models: {}", report.num_models);
println!("Has altlocs: {}", report.has_altlocs);
println!("Has HETATM: {}", report.has_hetatm);

// Check if suitable for typical analysis
if report.is_analysis_ready() {
    println!("Ready for analysis!");
}
```

### 5. Download from RCSB (requires `rcsb` feature)

```rust
use pdbrust::rcsb::{download_structure, rcsb_search, SearchQuery, FileFormat};

// Download by PDB ID
let structure = download_structure("1UBQ", FileFormat::Pdb)?;

// Search RCSB
let query = SearchQuery::new()
    .with_text("kinase")
    .with_organism("Homo sapiens")
    .with_resolution_max(2.0);

let results = rcsb_search(&query, 10)?;
println!("Found {} structures", results.total_count);
```

### 5b. Async Bulk Downloads (requires `rcsb-async` feature)

```rust
use pdbrust::rcsb::{download_multiple_async, AsyncDownloadOptions, FileFormat};

#[tokio::main]
async fn main() {
    let pdb_ids = vec!["1UBQ", "8HM2", "4INS", "1HHB"];

    // Download with default options (5 concurrent, 100ms rate limit)
    let results = download_multiple_async(&pdb_ids, FileFormat::Pdb, None).await;

    // Or with custom options
    let options = AsyncDownloadOptions::default()
        .with_max_concurrent(10)
        .with_rate_limit_ms(50);
    let results = download_multiple_async(&pdb_ids, FileFormat::Cif, Some(options)).await;

    for (pdb_id, result) in results {
        match result {
            Ok(structure) => println!("{}: {} atoms", pdb_id, structure.atoms.len()),
            Err(e) => eprintln!("{}: {}", pdb_id, e),
        }
    }
}
```

### 6. Geometry: RMSD and Alignment (requires `geometry` feature)

```rust
use pdbrust::{parse_pdb_file, geometry::AtomSelection};

let structure1 = parse_pdb_file("model1.pdb")?;
let structure2 = parse_pdb_file("model2.pdb")?;

// Calculate RMSD (without alignment)
let rmsd = structure1.rmsd_to(&structure2)?;
println!("RMSD: {:.3} Å", rmsd);

// Align structures using Kabsch algorithm
let (aligned, result) = structure1.align_to(&structure2)?;
println!("Alignment RMSD: {:.3} Å ({} atoms)", result.rmsd, result.num_atoms);

// Per-residue RMSD for flexibility analysis
let per_res = structure1.per_residue_rmsd_to(&structure2)?;
for r in per_res.iter().filter(|r| r.rmsd > 2.0) {
    println!("Flexible: {} {:.2} Å", r.residue_name, r.rmsd);
}

// Different atom selections
let rmsd_bb = structure1.rmsd_to_with_selection(&structure2, AtomSelection::Backbone)?;
```

### 7. Secondary Structure (requires `dssp` feature)

```rust
let structure = parse_pdb_file("protein.pdb")?;

// Compute DSSP-like secondary structure
let ss = structure.assign_secondary_structure();
println!("Helix: {:.1}%", ss.helix_fraction * 100.0);
println!("Sheet: {:.1}%", ss.sheet_fraction * 100.0);
println!("Coil:  {:.1}%", ss.coil_fraction * 100.0);

// Get compact string (e.g., "HHHHEEEECCCC")
let ss_string = structure.secondary_structure_string();

// Get composition tuple
let (helix, sheet, coil) = structure.secondary_structure_composition();

// Iterate over per-residue assignments
for res in &ss.residue_assignments {
    println!("{}{}: {} ({})",
        res.chain_id, res.residue_seq, res.residue_name, res.ss.code());
}
```

### 8. Ligand Pose Quality (requires `ligand-quality` feature)

```rust
let structure = parse_pdb_file("protein_ligand.pdb")?;

// List all ligands in the structure
let ligands = structure.get_ligand_names();
println!("Found ligands: {:?}", ligands);

// Validate a specific ligand
if let Some(report) = structure.ligand_pose_quality("LIG") {
    println!("Ligand: {} ({}{})", report.ligand_name,
             report.ligand_chain_id, report.ligand_residue_seq);
    println!("Atoms: {}", report.ligand_atom_count);
    println!("Min distance to protein: {:.2} Å", report.min_protein_ligand_distance);
    println!("Protein clashes: {}", report.num_clashes);
    println!("Volume overlap: {:.1}%", report.protein_volume_overlap_pct);

    if report.is_geometry_valid {
        println!("✓ Pose passes geometry checks");
    } else {
        println!("✗ Pose fails geometry checks");
        // Show clash details
        for clash in report.clashes.iter().take(3) {
            println!("  Clash: {} {} - {} {}: {:.2}Å (expected >{:.2}Å)",
                clash.protein_residue_name, clash.protein_atom_name,
                clash.ligand_atom_name, clash.ligand_element,
                clash.distance, clash.expected_min_distance);
        }
    }
}

// Validate all ligands at once
let reports = structure.all_ligand_pose_quality();
for report in &reports {
    let status = if report.is_geometry_valid { "PASS" } else { "FAIL" };
    println!("{}: {}", report.ligand_name, status);
}
```

## Common Workflows

### Workflow 1: Clean Structure for MD

```rust
let structure = parse_pdb_file("raw_structure.pdb")?;

let cleaned = structure
    .remove_ligands()      // Remove non-protein
    .remove_hydrogens()    // Will be added by MD software
    .keep_only_chain("A"); // Single chain

// Center at origin
let mut centered = cleaned.clone();
centered.center_structure();
centered.renumber_atoms();

write_pdb_file(&centered, "cleaned_for_md.pdb")?;
```

### Workflow 2: Dataset Characterization

```rust
use pdbrust::summary::batch_summarize;

let structures = vec![
    parse_pdb_file("protein1.pdb")?,
    parse_pdb_file("protein2.pdb")?,
    parse_pdb_file("protein3.pdb")?,
];

let summaries = batch_summarize(&structures);

for (i, summary) in summaries.iter().enumerate() {
    println!("Structure {}: {} residues, Rg={:.1}A",
        i+1, summary.num_residues, summary.radius_of_gyration);
}
```

### Workflow 3: RCSB Search Pipeline

```rust
use pdbrust::rcsb::{rcsb_search, download_structure, SearchQuery, FileFormat};

// Find human kinases with high resolution
let query = SearchQuery::new()
    .with_text("kinase")
    .with_organism("Homo sapiens")
    .with_resolution_max(2.0);

let results = rcsb_search(&query, 5)?;

for pdb_id in &results.pdb_ids {
    let structure = download_structure(pdb_id, FileFormat::Pdb)?;
    let rg = structure.radius_of_gyration();
    println!("{}: Rg = {:.1} A", pdb_id, rg);
}
```

## Running Examples

The repository includes several example files:

```bash
# Complete analysis workflow
cargo run --example analysis_workflow --features "filter,descriptors,quality,summary"

# Filtering operations
cargo run --example filtering_demo --features "filter"

# Selection language
cargo run --example selection_demo --features "filter"

# B-factor analysis
cargo run --example b_factor_demo --features "descriptors"

# Secondary structure (DSSP)
cargo run --example secondary_structure_demo --features "dssp"

# Ligand pose quality (PoseBusters-style checks)
cargo run --example ligand_quality_demo --features "ligand-quality"

# RMSD and structure alignment
cargo run --example geometry_demo --features "geometry"

# RCSB search and download
cargo run --example rcsb_workflow --features "rcsb,descriptors"

# Async bulk downloads
cargo run --example async_download_demo --features "rcsb-async,descriptors"

# Batch processing
cargo run --example batch_processing --features "descriptors,summary"

# Basic file reading
cargo run --example read_pdb -- examples/pdb_files/1UBQ.pdb
```

## Error Handling

All parsing functions return `Result` with detailed error types:

```rust
use pdbrust::{parse_pdb_file, PdbError};

match parse_pdb_file("structure.pdb") {
    Ok(structure) => {
        println!("Loaded {} atoms", structure.atoms.len());
    }
    Err(PdbError::Io(e)) => {
        eprintln!("File error: {}", e);
    }
    Err(PdbError::InvalidRecord(msg)) => {
        eprintln!("Parse error: {}", msg);
    }
    Err(e) => {
        eprintln!("Other error: {}", e);
    }
}
```

## Performance Tips

1. **Parse once, reuse**: Parse the structure once and perform multiple analyses
2. **Filter early**: Apply filters before expensive computations
3. **Use iterators**: Prefer `.iter()` over collecting to vectors when possible
4. **Batch operations**: Use `batch_summarize` for multiple structures

```rust
// Good: filter first, then compute
let chain_a = structure.keep_only_chain("A");
let rg = chain_a.radius_of_gyration();

// Good: use iterators
let ca_count = structure.atoms.iter()
    .filter(|a| a.name.trim() == "CA")
    .count();
```

## Next Steps

- See the [API Documentation]https://docs.rs/pdbrust for full details
- Check out the [examples/]../examples/ directory for more code
- Read the [guide module]https://docs.rs/pdbrust/latest/pdbrust/guide/ for comprehensive documentation

## Troubleshooting

### "feature X not found"

Make sure you've enabled the required feature in `Cargo.toml`:

```toml
pdbrust = { version = "0.7", features = ["filter", "descriptors"] }
```

### Network errors with RCSB

The `rcsb` feature requires internet access. Check your connection and firewall settings.

### Parse errors

- Ensure the file is a valid PDB or mmCIF format
- Check for file encoding issues (should be UTF-8 or ASCII)
- Try using `parse_structure_file()` for auto-detection