dreid-pack 0.2.0

A high-performance, pure Rust library and CLI for full-atom protein side-chain packing using the DREIDING force field, Goldstein+Split DEE, and tree-decomposition DP—with native protein-ligand and protein-nucleic acid interface support.
Documentation

DREID-Pack

Full-atom protein side-chain packing powered by the DREIDING force field.

OverviewInstallationCLILibraryBenchmarkPipeline & AlgorithmLicense


Overview

DREID-Pack determines the Global Minimum Energy Conformation (GMEC) of protein side chains. The full pipeline:

  1. Structure Preparation — Parse PDB/mmCIF, rebuild missing heavy atoms (Kabsch SVD alignment), detect disulfide bonds, assign protonation states by pH and pKa, resolve histidine tautomers via H-bond network analysis, add hydrogens, build covalent topology.
  2. Force-Field Parameterization — Perceive molecular graph (rings, aromaticity, hybridization), assign DREIDING atom types via rule engine, compute partial charges (AMBER/CHARMM lookup for biopolymers, QEq with exact STO integrals for ligands), generate force-field parameters.
  3. Rotamer Sampling — Dunbrack 2010 backbone-dependent library with cis-Pro detection and optional polar-hydrogen expansion; NERF coordinate generation.
  4. Self-Energy Pruning — Sidechain vs fixed-scaffold energy + rotamer preference bias; dead conformers discarded by threshold.
  5. Pair-Energy Computation — Sidechain vs sidechain for every edge in the spatial contact graph.
  6. Dead-End Elimination — Goldstein + Split DEE, iterated to convergence with single-survivor absorption.
  7. Tree-Decomposition DP — MCS/min-fill elimination ordering; exact GMEC for treewidth ≤ 5, rank-1 edge decomposition fallback.

The force field is DREIDING (VdW + hydrogen bond), with optional distance-dependent Coulomb electrostatics. VdW supports both Buckingham (exp-6) and Lennard-Jones (12-6) forms.

Why DREID-Pack

Physics. DREIDING is a transferable, all-atom force field — atom types are automatically assigned from chemical-environment graph topology, not a hand-coded residue-specific lookup table. Buckingham exp-6 (default) gives a softer, more physical repulsive wall than LJ 12-6; both forms are available. Explicit D–H···A hydrogen bonding evaluates all-atom polar-hydrogen geometry. Optional distance-dependent Coulomb electrostatics add charge-based discrimination.

Chemistry. Missing heavy atoms are rebuilt via SVD-based template alignment before any packing begins. 29 residue types with full protonation state coverage (Hid/Hie/Hip, Ash/Glh, Cys/Cym/Cyx, Lyn/Arn). All titratable residues (Asp, Glu, Lys, Arg, Cys, Tyr) are assigned states by pH and pKa; histidine tautomers are resolved via hydrogen-bond network scoring with salt-bridge priority override (Nδ/Nε near COO⁻ → Hip). Disulfide bonds are detected by Sγ–Sγ distance and relabeled to CYX. Polar-hydrogen torsions (Ser, Thr, Cys, Tyr, Ash, Glh, Lys, Lyn) are explicitly sampled as discrete candidates. cis-Proline is detected by ω angle and dispatches to the dedicated Dunbrack cis-Pro library.

Generality. Any molecule — ligands, cofactors, nucleic acids, solvent, ions — is parameterized automatically and participates as a fixed-scaffold atom. Biopolymer charges come from AMBER/CHARMM lookup tables (29 protein residues × 5 terminal positions × 3 schemes, plus nucleic acids and water models); ligand charges are computed dynamically via charge equilibration (QEq) with exact Slater-type orbital integrals, optionally embedded in the electrostatic field of the surrounding protein. Four packing scopes: full protein, ligand pocket, protein–protein interface, or explicit residue list.

Algorithm. Self-energy threshold pruning eliminates dead conformers before the O(n²) pair-energy phase. Spatial grid acceleration replaces brute-force all-pairs distance computation. If the pruned interaction graph has treewidth > 5, rank-1 edge decomposition progressively factors weak pair couplings into self-energy until the graph becomes tractable — no bag-size explosion. Connected components are solved in parallel.

Engineering. Design philosophy: maximize compile-time computation, then setup-time precomputation, then minimize runtime cost.

  • Rotamer library (dunbrack): the build script precomputes sin/cos of all 740K χ mean angles across the full φ/ψ grid and bakes the entire Dunbrack 2010 database (~28 MB) into .rodata. At query time, bilinear interpolation uses circular weighted means on the precomputed sin/cos pairs — the only runtime trig is a single branchless atan2f per χ angle.
  • Coordinate builder (rotamer): the build script code-generates a straight-line NERF build() function per residue type, with all fixed torsion and bond-angle sin/cos baked as f32 immediates. Only the runtime-variable χ and polar-H angles call sincosf — for Arg (18 atoms, 4χ), 4 of 36 trig evaluations remain at runtime; for simpler residues, zero.
  • Energy kernels (dreid-kernel): stateless, #[inline(always)] potential functions. precompute() converts physical constants (D₀, R₀, V, φ₀) into optimized parameter tuples at system-setup time, avoiding repeated sqrt/trig/exp in the energy hot loop.
  • QEq integrals (cheq/sto-ns): the QEq J-matrix uses exact two-center Coulomb integrals over Slater-type ns orbitals via ellipsoidal coordinate expansion (Rappé & Goddard, 1991) — no Gaussian approximation.
  • Dispatch: Buckingham vs LJ and Coulomb on/off are resolved at compile time via const generics — zero runtime branching in the inner loop.
  • Parallelism: every compute-intensive phase — sampling, self-energy, pair-energy, DEE convergence, subgraph DP — runs on rayon's work-stealing thread pool.
  • I/O: PDB and mmCIF, both read and write.

Installation

CLI

cargo install dreid-pack

Library

[dependencies]
dreid-pack = "0.2.0"

Set default-features = false to exclude the CLI dependencies (clap, anyhow, indicatif, console).

CLI

The binary is called dpack. Four subcommands control the packing scope:

# Pack all residues
dpack full input.pdb output.pdb

# Pack near a ligand pocket (8 Å default)
dpack pocket input.cif -L A:401 -r 10.0

# Pack a protein–protein interface
dpack interface input.pdb -A A,B -B C -c 6.0

# Pack an explicit residue list
dpack list input.pdb --residues A:42,A:55,B:108

Output defaults to <stem>-packed.<ext> when omitted.

Common Parameters

Structure

Flag Description Default
--ph <PH> Target pH for protonation state assignment input as-is
--his <STRATEGY> His tautomer: network, hid, hie, random network
--no-water Remove crystallographic water keep
--no-ions Remove monoatomic ions keep
--no-hetero Remove non-ion HETATM residues keep
--vdw <FORM> VdW potential: exp (Buckingham) or lj exp
--ff-rules <TOML> Custom DREIDING typing rules built-in
--ff-params <TOML> Custom force-field parameters built-in

Charges

Flag Description Default
--protein-charges <SCHEME> amber-ff14sb, amber-ff03, charmm amber-ff14sb
--nucleic-charges <SCHEME> amber, charmm amber
--water-charges <MODEL> tip3p, tip3p-fb, spc, spc-e, opc3 tip3p
--hetero-qeq <METHOD> Ligand charge method: embedded, vacuum embedded
--qeq-shell <Å> Embedded QEq environment shell radius 10.0
--qeq-charge <e> QEq target net charge (elementary charge units) 0.0
--qeq-basis <TYPE> QEq basis: sto (exact) or gto (approx) sto
--qeq-lambda <λ> QEq orbital screening scale factor λ 0.5
--qeq-tol <TOL> SCF convergence tolerance (RMS Δq) 1e-6
--qeq-iter <N> Maximum SCF iterations 2000
--qeq-damp <STRATEGY> Charge update damping: none, fixed:D, auto:D auto:0.4
--no-h-scf Disable hydrogen nonlinear SCF off

Packing

Flag Description Default
-e, --electrostatics <D> Enable Coulomb with ε(r) = D·r off
--no-polar-h Skip polar-hydrogen torsion sampling sample
--include-input Add input conformation as candidate off
-T, --template <MOL2> Hetero residue template (repeatable)
-E, --self-energy <E> Self-energy pruning window (kcal/mol) 30.0
-p, --prob-cutoff <P> Min Dunbrack rotamer probability 0.0
-q, --quiet Suppress progress output off

Run dpack <subcommand> --help for the full parameter set.

Library

The public API exposes three layers: I/O, system model, and packing.

End-to-End Example

use dreid_pack::io::{self, Format, ReadConfig};
use dreid_pack::{PackConfig, pack};
use std::io::{BufReader, BufWriter};

// Read and parameterize
let reader = BufReader::new(std::fs::File::open("input.pdb")?);
let mut session = io::read(reader, Format::Pdb, &ReadConfig::default())?;

// Pack with default settings
pack::<()>(&mut session.system, &PackConfig::default());

// Write result
let writer = BufWriter::new(std::fs::File::create("output.pdb")?);
io::write(writer, &session, Format::Pdb)?;

pack::<()> uses zero-cost no-op progress tracking. Implement the Progress trait for custom phase-level callbacks.

Please see the API documentation for details.

Benchmark

Evaluated on DB379 — 379 non-redundant X-ray crystal structures. B-factor filter: residues above the per-structure 75th-percentile heavy-atom B-factor are excluded. 68,550 residues across 18 residue types.

Metric 20° 40°
χ1 88.8% 91.1%
χ1+2 76.1% 84.1%
χ1–4 71.5% 79.1%

SC-RMSD: 0.728 Å

AA N χ1–4 20° χ1–4 40° RMSD/Å
ARG 3,332 34.4% 41.4% 2.005
ASN 3,235 65.3% 84.8% 0.749
ASP 4,178 61.8% 81.2% 0.742
CYS 792 88.6% 89.8% 0.372
GLN 2,490 40.0% 60.7% 1.084
GLU 3,893 29.2% 50.8% 1.131
HIS 1,734 62.2% 84.2% 0.950
ILE 5,616 84.3% 85.9% 0.393
LEU 8,937 85.8% 88.6% 0.493
LYS 3,330 37.1% 42.6% 1.101
MET 1,672 53.1% 60.8% 0.902
PHE 3,839 82.1% 94.1% 1.102
PRO 3,927 78.7% 80.7% 0.241
SER 4,941 69.7% 70.8% 0.581
THR 5,040 93.5% 94.2% 0.276
TRP 1,373 75.4% 85.8% 1.103
TYR 3,161 79.5% 91.5% 1.309
VAL 7,060 94.7% 95.0% 0.250

DB379 structures ship with the repository. Run:

cargo run -p dreid-pack-bench --release -- crates/dreid-pack-bench/data/db379

Pipeline & Algorithm

flowchart TD
    IN[/"PDB · mmCIF"/]

    subgraph prep ["Structure Preparation — bio-forge"]
        IN --> parse["Parse Structure"]
        parse --> clean["Clean: Strip Water / Ions / Hetero"]
        clean --> repair["Repair: Rebuild Missing Heavy Atoms"]
        repair --> ss["Disulfide Detect"]
        ss --> titrate["pH Titration\n(Asp/Glu/Lys/Arg/Cys/Tyr pKa)"]
        titrate --> salt{"Salt Bridge\nNδ/Nε near COO⁻?"}
        salt -->|yes| hip["→ Hip"]
        salt -->|no| hisn{"His Strategy"}
        hisn -->|HbNetwork| hnet["Score H-bond Network\n(Hid / Hie)"]
        hisn -->|Direct| hdir["Force Hid / Hie"]
        hisn -->|Random| hrand["Random Hid / Hie"]
        hip --> addh["Add Hydrogens"]
        hnet --> addh
        hdir --> addh
        hrand --> addh
        addh --> topo["Topology:\nIntra-residue Bonds\n+ Peptide/Nucleic Backbone\n+ Hetero Templates\n+ Disulfide Bonds"]
    end

    subgraph forge ["Force Field Parameterization — dreid-forge"]
        topo --> perc["Perception:\nRings\n→ Kekulé\n→ Aromaticity\n→ Resonance\n→ Hybridization"]
        perc --> atyp["DREIDING Typing"]
        atyp --> cdisp{"Charge Method"}
        cdisp -->|Protein / Nucleic| ffc["ffcharge Lookup\n(AMBER / CHARMM)"]
        cdisp -->|Ligand / Hetero| qeq["QEq: STO Integrals\n→ J Matrix → Linear Solve"]
        ffc --> pgen["Parameter Generation & Precomputation"]
        qeq --> pgen
    end

    subgraph build ["System Build — dreid-pack io"]
        pgen --> scope["Scope Filter\n(Full / Pocket / Interface / List)"]
        scope --> part["Partition Mobile / Fixed"]
        part --> dihed["Compute Backbone φ, ψ, ω"]
    end

    dihed --> dunq & reach & fatoms

    subgraph par ["Parallel Initialization"]
        direction LR

        subgraph s1 ["Rotamer Sampling"]
            direction TB
            dunq["Dunbrack 2010 Query\n(bilinear φ,ψ interpolation)"]
            dunq --> cis{"cis-Pro?\n(|ω| < π/2)"}
            cis -->|yes| cislib["cis-Pro Library"]
            cis -->|no| translib["trans Library"]
            cislib --> probf["Probability Filter\n(keep best + p ≥ cutoff)"]
            translib --> probf
            probf --> ph["Polar-H Expand\n(Ser/Thr/Cys/Tyr/Ash/Glh/Lys/Lyn)"]
            ph --> nerf["NERF Coordinate Build\n(code-generated per residue type)"]
            nerf --> bias["Rotamer Bias\nw · min(−ln p/p_max, 8)"]
        end

        subgraph s2 ["Contact Graph"]
            direction TB
            reach["Cα + Sidechain Reach"]
            reach --> grid1["Spatial Grid"]
            grid1 --> edet["Edge Detect\n(Cα dist ≤ reach_i + reach_j + cutoff)"]
            edet --> csr["CSR Adjacency"]
        end

        subgraph s3 ["Fixed Scaffold"]
            direction TB
            fatoms["Fixed Atom Pool"]
            fatoms --> grid2["Spatial Grid\n(cell size = interaction cutoff)"]
        end
    end

    bias --> selfe
    grid2 --> selfe

    selfe["Self-Energy: SC vs Scaffold"]
    selfe --> vdw1["VdW + H-bond + Coulomb\n(per-candidate, parallel)"]
    vdw1 --> addb["+ Weighted Rotamer Bias"]
    addb --> thresh["Threshold Prune\n(e_min + 30 kcal/mol)"]
    thresh --> compact["Compact Conformations"]

    compact --> paire
    csr --> paire

    paire["Pair Energy: SC vs SC"]
    paire --> vdw2["VdW + H-bond + Coulomb\n(per-edge × per-pair, parallel)"]

    vdw2 --> gold["Goldstein DEE\n(parallel per-slot, converge)"]
    gold --> abs1["Absorb Singletons\n(fold pair-E into self-E)"]
    abs1 --> spl["Split DEE\n(per-split-neighbor, converge)"]
    spl --> abs2["Absorb Singletons"]
    abs2 -.->|×2| gold

    abs2 --> filt["Filter Weak Edges\n(|pair_e| ≤ 2.0 kcal/mol)"]
    filt --> cc["Connected Components"]
    cc -->|parallel| mcs["MCS Elimination Ordering"]
    mcs --> chk{"Chordal\n(PEO)?"}
    chk -->|yes| peo["PEO Bags"]
    chk -->|no| minf["Min-fill Heuristic Bags"]
    peo --> tw{"tw ≤ 5?"}
    minf --> tw
    tw -->|yes| msg["Bottom-up Message Passing\n→ Traceback"]
    tw -->|no| edec["Rank-1 Edge Decomposition\n(fold weak pairs into self-E)"]
    edec -.->|retry| mcs
    msg --> gmec(["GMEC"])

    gmec --> wb["Write-back Coordinates"]
    wb --> recon["Reconstruct Structure"]
    recon --> OUT[/"PDB · mmCIF"/]

Dependencies

Direct

Crate Role
dreid-forge Orchestrates structure preparation, DREIDING atom typing, and charge assignment
dreid-kernel Stateless no_std energy kernels with precompute() optimization
dunbrack Compile-time-embedded Dunbrack 2010 rotamer library (740K entries)
rotamer Code-generated straight-line NERF coordinate builder
rayon Work-stealing thread pool for data parallelism

Transitive (via dreid-forge)

Crate Role
bio-forge PDB/mmCIF parsing, cleaning, heavy-atom repair (Kabsch SVD), pH-aware protonation, His tautomer network analysis, topology building
dreid-typer Molecular graph perception (SSSR rings, Kekulé, aromaticity, resonance, hybridization) and DREIDING atom typing rule engine
cheq Charge Equilibration (QEq) solver with exact STO integrals, embedded-field support for ligands in protein environments
ffcharge Compile-time no_std AMBER/CHARMM partial charge lookup (protein, nucleic acid, water, ion)
sto-ns Exact two-center Coulomb integrals over ns Slater-Type Orbitals via ellipsoidal expansion

All non-rayon crates are authored and maintained within this project ecosystem.

License

MIT — see LICENSE.


Caltech Materials and Process Simulation Center (MSC)

Made by Tony Kan