Skip to main content

dreid_pack/
lib.rs

1//! # DREID-Pack
2//!
3//! **Full-atom protein side-chain packing powered by the DREIDING force field.**
4//!
5//! # Overview
6//!
7//! DREID-Pack determines the Global Minimum Energy Conformation (GMEC) of protein side chains.
8//! The full pipeline:
9//!
10//! 1. **Structure Preparation** — Parse PDB/mmCIF, rebuild missing heavy atoms (Kabsch SVD
11//!    alignment), detect disulfide bonds, assign protonation states by pH and pKa, resolve
12//!    histidine tautomers via H-bond network analysis, add hydrogens, build covalent topology.
13//! 2. **Force-Field Parameterization** — Perceive molecular graph (rings, aromaticity,
14//!    hybridization), assign DREIDING atom types via rule engine, compute partial charges
15//!    (AMBER/CHARMM lookup for biopolymers, QEq with exact STO integrals for ligands),
16//!    generate force-field parameters.
17//! 3. **Rotamer Sampling** — Dunbrack 2010 backbone-dependent library with cis-Pro detection
18//!    and optional polar-hydrogen expansion; NERF coordinate generation.
19//! 4. **Self-Energy Pruning** — Sidechain vs fixed-scaffold energy + rotamer preference bias;
20//!    dead conformers discarded by threshold.
21//! 5. **Pair-Energy Computation** — Sidechain vs sidechain for every edge in the spatial
22//!    contact graph.
23//! 6. **Dead-End Elimination** — Goldstein + Split DEE, iterated to convergence with
24//!    single-survivor absorption.
25//! 7. **Tree-Decomposition DP** — MCS/min-fill elimination ordering; exact GMEC for treewidth
26//!    ≤ 5, rank-1 edge decomposition fallback.
27//!
28//! The force field is DREIDING (VdW + hydrogen bond), with optional distance-dependent Coulomb
29//! electrostatics. VdW supports both Buckingham (exp-6) and Lennard-Jones (12-6) forms.
30//!
31//! ## Why DREID-Pack
32//!
33//! **Physics.** DREIDING is a transferable, all-atom force field — atom types are automatically
34//! assigned from chemical-environment graph topology, not a hand-coded residue-specific lookup
35//! table. Buckingham exp-6 (default) gives a softer, more physical repulsive wall than LJ 12-6;
36//! both forms are available. Explicit D–H···A hydrogen bonding evaluates all-atom
37//! polar-hydrogen geometry. Optional distance-dependent Coulomb electrostatics add charge-based
38//! discrimination.
39//!
40//! **Chemistry.** Missing heavy atoms are rebuilt via SVD-based template alignment before any
41//! packing begins. 29 residue types with full protonation state coverage
42//! (Hid/Hie/Hip, Ash/Glh, Cys/Cym/Cyx, Lyn/Arn). All titratable residues
43//! (Asp, Glu, Lys, Arg, Cys, Tyr) are assigned states by pH and pKa; histidine tautomers are
44//! resolved via hydrogen-bond network scoring with salt-bridge priority override
45//! (Nδ/Nε near COO⁻ → Hip). Disulfide bonds are detected by Sγ–Sγ distance and relabeled to
46//! CYX. Polar-hydrogen torsions (Ser, Thr, Cys, Tyr, Ash, Glh, Lys, Lyn) are explicitly
47//! sampled as discrete candidates. cis-Proline is detected by ω angle and dispatches to the
48//! dedicated Dunbrack cis-Pro library.
49//!
50//! **Generality.** Any molecule — ligands, cofactors, nucleic acids, solvent, ions — is
51//! parameterized automatically and participates as a fixed-scaffold atom. Biopolymer charges
52//! come from AMBER/CHARMM lookup tables (29 protein residues × 5 terminal positions × 3
53//! schemes, plus nucleic acids and water models); ligand charges are computed dynamically via
54//! charge equilibration (QEq) with exact Slater-type orbital integrals, optionally embedded in
55//! the electrostatic field of the surrounding protein. Four packing scopes: full protein, ligand
56//! pocket, protein–protein interface, or explicit residue list.
57//!
58//! **Algorithm.** Self-energy threshold pruning eliminates dead conformers *before* the O(n²)
59//! pair-energy phase. Spatial grid acceleration replaces brute-force all-pairs distance
60//! computation. If the pruned interaction graph has treewidth > 5, rank-1 edge decomposition
61//! progressively factors weak pair couplings into self-energy until the graph becomes tractable
62//! — no bag-size explosion. Connected components are solved in parallel.
63//!
64//! **Engineering.** Design philosophy: maximize compile-time computation, then setup-time
65//! precomputation, then minimize runtime cost.
66//!
67//! - **Rotamer library** (`dunbrack`): the build script precomputes sin/cos of all 740K χ mean
68//!   angles across the full φ/ψ grid and bakes the entire Dunbrack 2010 database (~28 MB) into
69//!   `.rodata`. At query time, bilinear interpolation uses circular weighted means on the
70//!   precomputed sin/cos pairs — the only runtime trig is a single branchless `atan2f` per χ
71//!   angle.
72//! - **Coordinate builder** (`rotamer`): the build script code-generates a straight-line NERF
73//!   `build()` function per residue type, with all fixed torsion and bond-angle sin/cos baked as
74//!   `f32` immediates. Only the runtime-variable χ and polar-H angles call `sincosf` — for Arg
75//!   (18 atoms, 4χ), 4 of 36 trig evaluations remain at runtime; for simpler residues, zero.
76//! - **Energy kernels** (`dreid-kernel`): stateless, `#[inline(always)]` potential functions.
77//!   `precompute()` converts physical constants (D₀, R₀, V, φ₀) into optimized parameter
78//!   tuples at system-setup time, avoiding repeated sqrt/trig/exp in the energy hot loop.
79//! - **QEq integrals** (`cheq`/`sto-ns`): the QEq J-matrix uses exact two-center Coulomb
80//!   integrals over Slater-type ns orbitals via ellipsoidal coordinate expansion
81//!   (Rappé & Goddard, 1991) — no Gaussian approximation.
82//! - **Dispatch**: Buckingham vs LJ and Coulomb on/off are resolved at compile time via `const`
83//!   generics — zero runtime branching in the inner loop.
84//! - **Parallelism**: every compute-intensive phase — sampling, self-energy, pair-energy, DEE
85//!   convergence, subgraph DP — runs on `rayon`'s work-stealing thread pool.
86//! - **I/O**: PDB and mmCIF, both read and write.
87//!
88//! # Installation
89//!
90//! ```toml
91//! [dependencies]
92//! dreid-pack = "0.2.0"
93//! ```
94//!
95//! Set `default-features = false` to exclude the CLI dependencies (`clap`, `anyhow`,
96//! `indicatif`, `console`).
97//!
98//! # Usage
99//!
100//! ## End-to-End Example
101//!
102//! ```no_run
103//! use dreid_pack::io::{self, Format, ReadConfig};
104//! use dreid_pack::{PackConfig, pack};
105//! use std::io::{BufReader, BufWriter};
106//!
107//! // Read and parameterize
108//! let reader = BufReader::new(std::fs::File::open("input.pdb")?);
109//! let mut session = io::read(reader, Format::Pdb, &ReadConfig::default())?;
110//!
111//! // Pack with default settings
112//! pack::<()>(&mut session.system, &PackConfig::default());
113//!
114//! // Write result
115//! let writer = BufWriter::new(std::fs::File::create("output.pdb")?);
116//! io::write(writer, &session, Format::Pdb)?;
117//! # Ok::<(), Box<dyn std::error::Error>>(())
118//! ```
119//!
120//! [`pack::<()>`][pack()] uses zero-cost no-op progress tracking. Implement the [`Progress`]
121//! trait for custom phase-level callbacks.
122//!
123//! ## Key Items
124//!
125//! | Item | Role |
126//! |:-----|:-----|
127//! | [`pack()`] | Pack all mobile side chains to GMEC |
128//! | [`PackConfig`] | Packing algorithm settings (electrostatics, thresholds, polar-H) |
129//! | [`Progress`] | Phase-level progress callback trait |
130//! | [`System`] | Molecular system: mobile residues + fixed scaffold + FF params |
131//! | [`Residue`] | One packable residue slot with backbone geometry and sidechain atoms |
132//!
133//! For io items, see the [`io`] module documentation.
134
135mod model;
136mod pack;
137
138pub mod io;
139
140pub use model::residue::ResidueType;
141pub use model::system::{
142    BuckMatrix, BuckPair, FixedAtomPool, ForceFieldParams, HBondParams, LjMatrix, LjPair, Residue,
143    SidechainAtoms, System, VdwMatrix,
144};
145pub use model::types::{TypeIdx, Vec3};
146
147pub use pack::PackConfig;
148pub use pack::Progress;
149pub use pack::pack;