dreid-pack 0.2.0

A high-performance, pure Rust library and CLI for full-atom protein side-chain packing using the DREIDING force field, Goldstein+Split DEE, and tree-decomposition DP—with native protein-ligand and protein-nucleic acid interface support.
Documentation
# DREID-Pack

**Full-atom protein side-chain packing powered by the DREIDING force field.**

[Overview](#overview) • [Installation](#installation) • [CLI](#cli) • [Library](#library) • [Benchmark](#benchmark) • [Pipeline & Algorithm](#pipeline--algorithm) • [License](#license)

---

## Overview

DREID-Pack determines the Global Minimum Energy Conformation (GMEC) of protein side chains. The full pipeline:

1. **Structure Preparation** — Parse PDB/mmCIF, rebuild missing heavy atoms (Kabsch SVD alignment), detect disulfide bonds, assign protonation states by pH and pKa, resolve histidine tautomers via H-bond network analysis, add hydrogens, build covalent topology.
2. **Force-Field Parameterization** — Perceive molecular graph (rings, aromaticity, hybridization), assign DREIDING atom types via rule engine, compute partial charges (AMBER/CHARMM lookup for biopolymers, QEq with exact STO integrals for ligands), generate force-field parameters.
3. **Rotamer Sampling** — Dunbrack 2010 backbone-dependent library with cis-Pro detection and optional polar-hydrogen expansion; NERF coordinate generation.
4. **Self-Energy Pruning** — Sidechain vs fixed-scaffold energy + rotamer preference bias; dead conformers discarded by threshold.
5. **Pair-Energy Computation** — Sidechain vs sidechain for every edge in the spatial contact graph.
6. **Dead-End Elimination** — Goldstein + Split DEE, iterated to convergence with single-survivor absorption.
7. **Tree-Decomposition DP** — MCS/min-fill elimination ordering; exact GMEC for treewidth ≤ 5, rank-1 edge decomposition fallback.

The force field is DREIDING (VdW + hydrogen bond), with optional distance-dependent Coulomb electrostatics. VdW supports both Buckingham (exp-6) and Lennard-Jones (12-6) forms.

### Why DREID-Pack

**Physics.** DREIDING is a transferable, all-atom force field — atom types are automatically assigned from chemical-environment graph topology, not a hand-coded residue-specific lookup table. Buckingham exp-6 (default) gives a softer, more physical repulsive wall than LJ 12-6; both forms are available. Explicit D–H···A hydrogen bonding evaluates all-atom polar-hydrogen geometry. Optional distance-dependent Coulomb electrostatics add charge-based discrimination.

**Chemistry.** Missing heavy atoms are rebuilt via SVD-based template alignment before any packing begins. 29 residue types with full protonation state coverage (Hid/Hie/Hip, Ash/Glh, Cys/Cym/Cyx, Lyn/Arn). All titratable residues (Asp, Glu, Lys, Arg, Cys, Tyr) are assigned states by pH and pKa; histidine tautomers are resolved via hydrogen-bond network scoring with salt-bridge priority override (Nδ/Nε near COO⁻ → Hip). Disulfide bonds are detected by Sγ–Sγ distance and relabeled to CYX. Polar-hydrogen torsions (Ser, Thr, Cys, Tyr, Ash, Glh, Lys, Lyn) are explicitly sampled as discrete candidates. cis-Proline is detected by ω angle and dispatches to the dedicated Dunbrack cis-Pro library.

**Generality.** Any molecule — ligands, cofactors, nucleic acids, solvent, ions — is parameterized automatically and participates as a fixed-scaffold atom. Biopolymer charges come from AMBER/CHARMM lookup tables (29 protein residues × 5 terminal positions × 3 schemes, plus nucleic acids and water models); ligand charges are computed dynamically via charge equilibration (QEq) with exact Slater-type orbital integrals, optionally embedded in the electrostatic field of the surrounding protein. Four packing scopes: full protein, ligand pocket, protein–protein interface, or explicit residue list.

**Algorithm.** Self-energy threshold pruning eliminates dead conformers _before_ the O(n²) pair-energy phase. Spatial grid acceleration replaces brute-force all-pairs distance computation. If the pruned interaction graph has treewidth > 5, rank-1 edge decomposition progressively factors weak pair couplings into self-energy until the graph becomes tractable — no bag-size explosion. Connected components are solved in parallel.

**Engineering.** Design philosophy: maximize compile-time computation, then setup-time precomputation, then minimize runtime cost.

- **Rotamer library** (`dunbrack`): the build script precomputes sin/cos of all 740K χ mean angles across the full φ/ψ grid and bakes the entire Dunbrack 2010 database (~28 MB) into `.rodata`. At query time, bilinear interpolation uses circular weighted means on the precomputed sin/cos pairs — the only runtime trig is a single branchless `atan2f` per χ angle.
- **Coordinate builder** (`rotamer`): the build script code-generates a straight-line NERF `build()` function per residue type, with all fixed torsion and bond-angle sin/cos baked as `f32` immediates. Only the runtime-variable χ and polar-H angles call `sincosf` — for Arg (18 atoms, 4χ), 4 of 36 trig evaluations remain at runtime; for simpler residues, zero.
- **Energy kernels** (`dreid-kernel`): stateless, `#[inline(always)]` potential functions. `precompute()` converts physical constants (D₀, R₀, V, φ₀) into optimized parameter tuples at system-setup time, avoiding repeated sqrt/trig/exp in the energy hot loop.
- **QEq integrals** (`cheq`/`sto-ns`): the QEq J-matrix uses exact two-center Coulomb integrals over Slater-type ns orbitals via ellipsoidal coordinate expansion (Rappé & Goddard, 1991) — no Gaussian approximation.
- **Dispatch**: Buckingham vs LJ and Coulomb on/off are resolved at compile time via `const` generics — zero runtime branching in the inner loop.
- **Parallelism**: every compute-intensive phase — sampling, self-energy, pair-energy, DEE convergence, subgraph DP — runs on `rayon`'s work-stealing thread pool.
- **I/O**: PDB and mmCIF, both read and write.

## Installation

**CLI**

```bash
cargo install dreid-pack
```

**Library**

```toml
[dependencies]
dreid-pack = "0.2.0"
```

Set `default-features = false` to exclude the CLI dependencies (`clap`, `anyhow`, `indicatif`, `console`).

## CLI

The binary is called `dpack`. Four subcommands control the packing scope:

```bash
# Pack all residues
dpack full input.pdb output.pdb

# Pack near a ligand pocket (8 Å default)
dpack pocket input.cif -L A:401 -r 10.0

# Pack a protein–protein interface
dpack interface input.pdb -A A,B -B C -c 6.0

# Pack an explicit residue list
dpack list input.pdb --residues A:42,A:55,B:108
```

Output defaults to `<stem>-packed.<ext>` when omitted.

### Common Parameters

**Structure**

| Flag                 | Description                                     | Default     |
| :------------------- | :---------------------------------------------- | :---------- |
| `--ph <PH>`          | Target pH for protonation state assignment      | input as-is |
| `--his <STRATEGY>`   | His tautomer: `network`, `hid`, `hie`, `random` | `network`   |
| `--no-water`         | Remove crystallographic water                   | keep        |
| `--no-ions`          | Remove monoatomic ions                          | keep        |
| `--no-hetero`        | Remove non-ion HETATM residues                  | keep        |
| `--vdw <FORM>`       | VdW potential: `exp` (Buckingham) or `lj`       | `exp`       |
| `--ff-rules <TOML>`  | Custom DREIDING typing rules                    | built-in    |
| `--ff-params <TOML>` | Custom force-field parameters                   | built-in    |

**Charges**

| Flag                         | Description                                        | Default        |
| :--------------------------- | :------------------------------------------------- | :------------- |
| `--protein-charges <SCHEME>` | `amber-ff14sb`, `amber-ff03`, `charmm`             | `amber-ff14sb` |
| `--nucleic-charges <SCHEME>` | `amber`, `charmm`                                  | `amber`        |
| `--water-charges <MODEL>`    | `tip3p`, `tip3p-fb`, `spc`, `spc-e`, `opc3`        | `tip3p`        |
| `--hetero-qeq <METHOD>`      | Ligand charge method: `embedded`, `vacuum`         | `embedded`     |
| `--qeq-shell <Å>`            | Embedded QEq environment shell radius              | `10.0`         |
| `--qeq-charge <e>`           | QEq target net charge (elementary charge units)    | `0.0`          |
| `--qeq-basis <TYPE>`         | QEq basis: `sto` (exact) or `gto` (approx)         | `sto`          |
| `--qeq-lambda <λ>`           | QEq orbital screening scale factor λ               | `0.5`          |
| `--qeq-tol <TOL>`            | SCF convergence tolerance (RMS Δq)                 | `1e-6`         |
| `--qeq-iter <N>`             | Maximum SCF iterations                             | `2000`         |
| `--qeq-damp <STRATEGY>`      | Charge update damping: `none`, `fixed:D`, `auto:D` | `auto:0.4`     |
| `--no-h-scf`                 | Disable hydrogen nonlinear SCF                     | off            |

**Packing**

| Flag                       | Description                           | Default |
| :------------------------- | :------------------------------------ | :------ |
| `-e, --electrostatics <D>` | Enable Coulomb with ε(r) = D·r        | off     |
| `--no-polar-h`             | Skip polar-hydrogen torsion sampling  | sample  |
| `--include-input`          | Add input conformation as candidate   | off     |
| `-T, --template <MOL2>`    | Hetero residue template (repeatable)  | —       |
| `-E, --self-energy <E>`    | Self-energy pruning window (kcal/mol) | `30.0`  |
| `-p, --prob-cutoff <P>`    | Min Dunbrack rotamer probability      | `0.0`   |
| `-q, --quiet`              | Suppress progress output              | off     |

_Run `dpack <subcommand> --help` for the full parameter set._

## Library

The public API exposes three layers: I/O, system model, and packing.

### End-to-End Example

```rust
use dreid_pack::io::{self, Format, ReadConfig};
use dreid_pack::{PackConfig, pack};
use std::io::{BufReader, BufWriter};

// Read and parameterize
let reader = BufReader::new(std::fs::File::open("input.pdb")?);
let mut session = io::read(reader, Format::Pdb, &ReadConfig::default())?;

// Pack with default settings
pack::<()>(&mut session.system, &PackConfig::default());

// Write result
let writer = BufWriter::new(std::fs::File::create("output.pdb")?);
io::write(writer, &session, Format::Pdb)?;
```

`pack::<()>` uses zero-cost no-op progress tracking. Implement the `Progress` trait for custom phase-level callbacks.

Please see the [API documentation](https://docs.rs/dreid-pack) for details.

## Benchmark

Evaluated on DB379 — 379 non-redundant X-ray crystal structures. B-factor filter: residues above the per-structure 75th-percentile heavy-atom B-factor are excluded. 68,550 residues across 18 residue types.

| Metric |       20° |   40° |
| :----- | --------: | ----: |
| χ1     |     88.8% | 91.1% |
| χ1+2   |     76.1% | 84.1% |
| χ1–4   | **71.5%** | 79.1% |

**SC-RMSD**: _0.728 Å_

| AA  |     N | χ1–4 20° | χ1–4 40° | RMSD/Å |
| :-- | ----: | -------: | -------: | -----: |
| ARG | 3,332 |    34.4% |    41.4% |  2.005 |
| ASN | 3,235 |    65.3% |    84.8% |  0.749 |
| ASP | 4,178 |    61.8% |    81.2% |  0.742 |
| CYS |   792 |    88.6% |    89.8% |  0.372 |
| GLN | 2,490 |    40.0% |    60.7% |  1.084 |
| GLU | 3,893 |    29.2% |    50.8% |  1.131 |
| HIS | 1,734 |    62.2% |    84.2% |  0.950 |
| ILE | 5,616 |    84.3% |    85.9% |  0.393 |
| LEU | 8,937 |    85.8% |    88.6% |  0.493 |
| LYS | 3,330 |    37.1% |    42.6% |  1.101 |
| MET | 1,672 |    53.1% |    60.8% |  0.902 |
| PHE | 3,839 |    82.1% |    94.1% |  1.102 |
| PRO | 3,927 |    78.7% |    80.7% |  0.241 |
| SER | 4,941 |    69.7% |    70.8% |  0.581 |
| THR | 5,040 |    93.5% |    94.2% |  0.276 |
| TRP | 1,373 |    75.4% |    85.8% |  1.103 |
| TYR | 3,161 |    79.5% |    91.5% |  1.309 |
| VAL | 7,060 |    94.7% |    95.0% |  0.250 |

DB379 structures ship with the repository. Run:

```bash
cargo run -p dreid-pack-bench --release -- crates/dreid-pack-bench/data/db379
```

## Pipeline & Algorithm

```mermaid
flowchart TD
    IN[/"PDB · mmCIF"/]

    subgraph prep ["Structure Preparation — bio-forge"]
        IN --> parse["Parse Structure"]
        parse --> clean["Clean: Strip Water / Ions / Hetero"]
        clean --> repair["Repair: Rebuild Missing Heavy Atoms"]
        repair --> ss["Disulfide Detect"]
        ss --> titrate["pH Titration\n(Asp/Glu/Lys/Arg/Cys/Tyr pKa)"]
        titrate --> salt{"Salt Bridge\nNδ/Nε near COO⁻?"}
        salt -->|yes| hip["→ Hip"]
        salt -->|no| hisn{"His Strategy"}
        hisn -->|HbNetwork| hnet["Score H-bond Network\n(Hid / Hie)"]
        hisn -->|Direct| hdir["Force Hid / Hie"]
        hisn -->|Random| hrand["Random Hid / Hie"]
        hip --> addh["Add Hydrogens"]
        hnet --> addh
        hdir --> addh
        hrand --> addh
        addh --> topo["Topology:\nIntra-residue Bonds\n+ Peptide/Nucleic Backbone\n+ Hetero Templates\n+ Disulfide Bonds"]
    end

    subgraph forge ["Force Field Parameterization — dreid-forge"]
        topo --> perc["Perception:\nRings\n→ Kekulé\n→ Aromaticity\n→ Resonance\n→ Hybridization"]
        perc --> atyp["DREIDING Typing"]
        atyp --> cdisp{"Charge Method"}
        cdisp -->|Protein / Nucleic| ffc["ffcharge Lookup\n(AMBER / CHARMM)"]
        cdisp -->|Ligand / Hetero| qeq["QEq: STO Integrals\n→ J Matrix → Linear Solve"]
        ffc --> pgen["Parameter Generation & Precomputation"]
        qeq --> pgen
    end

    subgraph build ["System Build — dreid-pack io"]
        pgen --> scope["Scope Filter\n(Full / Pocket / Interface / List)"]
        scope --> part["Partition Mobile / Fixed"]
        part --> dihed["Compute Backbone φ, ψ, ω"]
    end

    dihed --> dunq & reach & fatoms

    subgraph par ["Parallel Initialization"]
        direction LR

        subgraph s1 ["Rotamer Sampling"]
            direction TB
            dunq["Dunbrack 2010 Query\n(bilinear φ,ψ interpolation)"]
            dunq --> cis{"cis-Pro?\n(|ω| < π/2)"}
            cis -->|yes| cislib["cis-Pro Library"]
            cis -->|no| translib["trans Library"]
            cislib --> probf["Probability Filter\n(keep best + p ≥ cutoff)"]
            translib --> probf
            probf --> ph["Polar-H Expand\n(Ser/Thr/Cys/Tyr/Ash/Glh/Lys/Lyn)"]
            ph --> nerf["NERF Coordinate Build\n(code-generated per residue type)"]
            nerf --> bias["Rotamer Bias\nw · min(−ln p/p_max, 8)"]
        end

        subgraph s2 ["Contact Graph"]
            direction TB
            reach["Cα + Sidechain Reach"]
            reach --> grid1["Spatial Grid"]
            grid1 --> edet["Edge Detect\n(Cα dist ≤ reach_i + reach_j + cutoff)"]
            edet --> csr["CSR Adjacency"]
        end

        subgraph s3 ["Fixed Scaffold"]
            direction TB
            fatoms["Fixed Atom Pool"]
            fatoms --> grid2["Spatial Grid\n(cell size = interaction cutoff)"]
        end
    end

    bias --> selfe
    grid2 --> selfe

    selfe["Self-Energy: SC vs Scaffold"]
    selfe --> vdw1["VdW + H-bond + Coulomb\n(per-candidate, parallel)"]
    vdw1 --> addb["+ Weighted Rotamer Bias"]
    addb --> thresh["Threshold Prune\n(e_min + 30 kcal/mol)"]
    thresh --> compact["Compact Conformations"]

    compact --> paire
    csr --> paire

    paire["Pair Energy: SC vs SC"]
    paire --> vdw2["VdW + H-bond + Coulomb\n(per-edge × per-pair, parallel)"]

    vdw2 --> gold["Goldstein DEE\n(parallel per-slot, converge)"]
    gold --> abs1["Absorb Singletons\n(fold pair-E into self-E)"]
    abs1 --> spl["Split DEE\n(per-split-neighbor, converge)"]
    spl --> abs2["Absorb Singletons"]
    abs2 -.->|×2| gold

    abs2 --> filt["Filter Weak Edges\n(|pair_e| ≤ 2.0 kcal/mol)"]
    filt --> cc["Connected Components"]
    cc -->|parallel| mcs["MCS Elimination Ordering"]
    mcs --> chk{"Chordal\n(PEO)?"}
    chk -->|yes| peo["PEO Bags"]
    chk -->|no| minf["Min-fill Heuristic Bags"]
    peo --> tw{"tw ≤ 5?"}
    minf --> tw
    tw -->|yes| msg["Bottom-up Message Passing\n→ Traceback"]
    tw -->|no| edec["Rank-1 Edge Decomposition\n(fold weak pairs into self-E)"]
    edec -.->|retry| mcs
    msg --> gmec(["GMEC"])

    gmec --> wb["Write-back Coordinates"]
    wb --> recon["Reconstruct Structure"]
    recon --> OUT[/"PDB · mmCIF"/]
```

### Dependencies

**Direct**

| Crate                                                   | Role                                                                            |
| :------------------------------------------------------ | :------------------------------------------------------------------------------ |
| [`dreid-forge`](https://crates.io/crates/dreid-forge)   | Orchestrates structure preparation, DREIDING atom typing, and charge assignment |
| [`dreid-kernel`](https://crates.io/crates/dreid-kernel) | Stateless `no_std` energy kernels with `precompute()` optimization              |
| [`dunbrack`](https://crates.io/crates/dunbrack)         | Compile-time-embedded Dunbrack 2010 rotamer library (740K entries)              |
| [`rotamer`](https://crates.io/crates/rotamer)           | Code-generated straight-line NERF coordinate builder                            |
| [`rayon`](https://crates.io/crates/rayon)               | Work-stealing thread pool for data parallelism                                  |

**Transitive (via `dreid-forge`)**

| Crate                                                 | Role                                                                                                                                |
| :---------------------------------------------------- | :---------------------------------------------------------------------------------------------------------------------------------- |
| [`bio-forge`](https://crates.io/crates/bio-forge)     | PDB/mmCIF parsing, cleaning, heavy-atom repair (Kabsch SVD), pH-aware protonation, His tautomer network analysis, topology building |
| [`dreid-typer`](https://crates.io/crates/dreid-typer) | Molecular graph perception (SSSR rings, Kekulé, aromaticity, resonance, hybridization) and DREIDING atom typing rule engine         |
| [`cheq`](https://crates.io/crates/cheq)               | Charge Equilibration (QEq) solver with exact STO integrals, embedded-field support for ligands in protein environments              |
| [`ffcharge`](https://crates.io/crates/ffcharge)       | Compile-time `no_std` AMBER/CHARMM partial charge lookup (protein, nucleic acid, water, ion)                                        |
| [`sto-ns`](https://crates.io/crates/sto-ns)           | Exact two-center Coulomb integrals over ns Slater-Type Orbitals via ellipsoidal expansion                                           |

All non-`rayon` crates are authored and maintained within this project ecosystem.

## License

MIT — see [LICENSE](LICENSE).

---

_Caltech Materials and Process Simulation Center (MSC)_

Made by [Tony Kan](https://github.com/TKanX)