Crate dunbrack


§Dunbrack

A zero-cost Rust interface to the Dunbrack 2010 backbone-dependent rotamer library.

Provides bilinearly interpolated side-chain rotamer probabilities, mean χ angles, and standard deviations for 22 amino acid types at any (φ, ψ) backbone conformation. All 740,629 source rows are baked into .rodata at compile time; queries touch zero heap memory and link zero runtime dependencies.



§Features

  • Zero startup latency. The entire ~28 MB rotamer database is embedded in .rodata at compile time via build.rs. No file I/O, no deserialization, no lazy initialization.
  • Zero heap allocation. Every query returns a RotamerIter<N, R> — a stack-allocated array of exactly R Rotamer<N> values. No Vec, no Box, no allocator required.
  • #![no_std] compatible. No standard library, no libm linkage. Usable in embedded firmware, OS kernels, and WASM environments.
  • Type-safe χ dimensionality. The number of χ angles per residue is a compile-time constant N encoded in Rotamer<N> and RotamerIter<N, R>. There are no padding zeros, no runtime bounds checks, no wrong-length arrays.
  • Bilinear interpolation with circular χ means. Residue::rotamers(phi, psi) bilinearly interpolates across the four surrounding grid cells. χ means are computed via circular weighted mean (sin/cos decomposition), correctly handling the ±180° wraparound. Probabilities are re-normalized to Σ = 1.0 after interpolation.
  • Precomputed (sin χ, cos χ) in the static table. build.rs stores sin/cos pairs rather than raw angles, eliminating 8N trigonometric calls per interpolated rotamer (one sin and one cos for each of the N χ angles at each of the four corner cells).
  • Custom branchless atan2f. A two-stage argument-reduction + degree-7 Taylor polynomial implementation with zero conditional branches and ±0.002° maximum error — 25× more accurate than the 0.05° precision requirement, with no libm dependency.
  • Compile-time data integrity. build.rs asserts seven invariants before emitting any code: rotamer count per cell, non-negative per-row probabilities, per-cell probability sums, positive per-χ standard deviations, φ periodicity at ±180°, ψ periodicity at ±180°, and bin index consistency across all 1,369 grid cells. Compilation fails loudly on data corruption.
  • for_all_residues! macro. A generated declarative macro for writing generic code over all 22 residue types without runtime dispatch.
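
As a rough illustration of the interpolation step described in the list above, the sketch below computes bilinear weights from fractional cell coordinates and re-normalizes the interpolated probabilities. The names `bilinear_weights` and `interpolate_probs`, the corner ordering, and the plain-array layout are assumptions made for this example; the crate's internal code may differ.

```rust
/// Bilinear weights for the four corners of one grid cell.
/// `tx` and `ty` are fractional positions in [0, 1] along phi and psi.
/// Assumed corner order: (phi0, psi0), (phi1, psi0), (phi0, psi1), (phi1, psi1).
pub fn bilinear_weights(tx: f32, ty: f32) -> [f32; 4] {
    [
        (1.0 - tx) * (1.0 - ty),
        tx * (1.0 - ty),
        (1.0 - tx) * ty,
        tx * ty,
    ]
}

/// Interpolates R rotamer probabilities from four corner cells,
/// then re-normalizes so the result sums to 1.
pub fn interpolate_probs<const R: usize>(corners: [[f32; R]; 4], w: [f32; 4]) -> [f32; R] {
    let mut out = [0.0f32; R];
    for (corner, weight) in corners.iter().zip(w) {
        for (o, p) in out.iter_mut().zip(corner) {
            *o += weight * p;
        }
    }
    let total: f32 = out.iter().sum();
    if total > 0.0 {
        for o in out.iter_mut() {
            *o /= total;
        }
    }
    out
}
```

Because the four weights already sum to 1, the re-normalization only corrects for source cells whose probabilities do not sum exactly to 1 and for floating-point rounding.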

§Installation

[dependencies]
dunbrack = "0.1.0"

Note: build.rs reads data/dunbrack-2010.lib.csv (740,629 rows) and generates ~28 MB of static Rust source. Initial compilation takes 15–30 seconds depending on hardware.


§Usage

§Basic Query

use dunbrack::{Residue, Val};

// Bilinearly interpolated rotamers for Val at α-helical backbone.
for rot in Val::rotamers(-60.0, -40.0) {
    // rot.r:         [u8; 1]  — rotamer bin index (1-based)
    // rot.prob:      f32      — probability (Σ = 1.0 across all rotamers)
    // rot.chi_mean:  [f32; 1] — mean χ angle in degrees, ±180° range
    // rot.chi_sigma: [f32; 1] — standard deviation in degrees
    println!("r={:?}  p={:.4}  χ₁={:.1}°±{:.1}°",
        rot.r, rot.prob, rot.chi_mean[0], rot.chi_sigma[0]);
}

Output (Val at φ=−60°, ψ=−40°):

r=[1]  p=0.0414  χ₁=68.0°±7.0°
r=[2]  p=0.9391  χ₁=171.5°±5.0°
r=[3]  p=0.0194  χ₁=-61.0°±9.6°

§Generic Usage

The Residue trait exposes compile-time constants usable in fully generic code:

use dunbrack::Residue;

fn rotamer_count<R: Residue>() -> usize {
    R::N_ROTAMERS
}

fn residue_name<R: Residue>() -> &'static str {
    R::NAME
}

Accessing rotamer fields (.prob, .chi_mean, .chi_sigma, .r) requires a concrete type or a monomorphized context, since Residue::Rot carries no field bounds:

use dunbrack::{Residue, Val};

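// Find the most probable rotamer for Val (no intermediate collection is needed).
let best = Val::rotamers(-60.0, -40.0)
    // partial_cmp returns None only if a probability is NaN.
    .max_by(|a, b| a.prob.partial_cmp(&b.prob).unwrap())
    // Val always yields 3 rotamers, so the iterator is never empty.
    .unwrap();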
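
To show how an array-backed iterator of this kind can work with adaptors such as max_by, here is a self-contained sketch. `DemoRotamer`, `DemoIter`, and `most_probable` are hypothetical stand-ins written for this example, not the crate's `Rotamer` and `RotamerIter` definitions.

```rust
/// Hypothetical stand-in for a rotamer with N chi angles.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct DemoRotamer<const N: usize> {
    pub prob: f32,
    pub chi_mean: [f32; N],
}

/// Array-backed iterator that stores exactly R rotamers inline (no heap).
pub struct DemoIter<const N: usize, const R: usize> {
    items: [DemoRotamer<N>; R],
    next: usize,
}

impl<const N: usize, const R: usize> DemoIter<N, R> {
    pub fn new(items: [DemoRotamer<N>; R]) -> Self {
        Self { items, next: 0 }
    }
}

impl<const N: usize, const R: usize> Iterator for DemoIter<N, R> {
    type Item = DemoRotamer<N>;

    fn next(&mut self) -> Option<Self::Item> {
        // One array read, then advance the cursor only if an item was found.
        let item = self.items.get(self.next).copied();
        self.next += item.is_some() as usize;
        item
    }
}

/// Picks the most probable rotamer. `f32::total_cmp` gives a total order,
/// so no `unwrap` on `partial_cmp` is needed.
pub fn most_probable<const N: usize, const R: usize>(
    it: DemoIter<N, R>,
) -> Option<DemoRotamer<N>> {
    it.max_by(|a, b| a.prob.total_cmp(&b.prob))
}
```

Both N and R are const generic parameters here, so a wrong-length `chi_mean` array is rejected at compile time rather than checked at run time.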

§for_all_residues! Macro

This macro invokes $callback!(Type, N_CHI, N_ROTAMERS) for all 22 residue types. It drives generic infrastructure like benchmarks, coverage tests, and per-type dispatch with zero boilerplate.

use dunbrack::*;

macro_rules! print_info {
    ($Res:ident, $n_chi:literal, $n_rot:literal) => {
        println!("{}: {} χ angles, {} rotamers",
            <$Res as Residue>::NAME, $n_chi, $n_rot);
    };
}

for_all_residues!(print_info);
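
For readers unfamiliar with callback-style declarative macros, the following sketch shows the general pattern with a hypothetical three-entry macro, `for_all_demo!`. It is not the generated `for_all_residues!` macro; the residue names appear only as bare identifiers, and the numbers are copied from the residue table for illustration.

```rust
/// Hypothetical three-entry analogue of a "for all types" macro.
/// It forwards (Type, N_CHI, N_ROTAMERS) to whatever macro name it is given.
macro_rules! for_all_demo {
    ($callback:ident) => {
        $callback!(Val, 1, 3);
        $callback!(Leu, 2, 9);
        $callback!(Arg, 4, 75);
    };
}

/// Collects one description line per entry, with no runtime dispatch:
/// the callback is expanded once for each entry at compile time.
pub fn describe_all() -> Vec<String> {
    let mut lines = Vec::new();

    // The callback is defined after `lines`, so it may refer to it.
    macro_rules! describe {
        ($Res:ident, $n_chi:literal, $n_rot:literal) => {
            lines.push(format!(
                "{}: {} chi, {} rotamers",
                stringify!($Res),
                $n_chi,
                $n_rot
            ));
        };
    }

    for_all_demo!(describe);
    lines
}
```

Because the arguments arrive as literals, a callback can also use them in const contexts, for example as array lengths.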

§Residue Types

All 22 residue types from the Dunbrack 2010 library, including separated cysteine and proline variants:

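| Type | N_CHI | N_ROTAMERS | Notes |
|------|-------|------------|-------|
| Arg | 4 | 75 | |
| Asn | 2 | 36 | |
| Asp | 2 | 18 | |
| Gln | 3 | 108 | Largest table |
| Glu | 3 | 54 | |
| His | 2 | 36 | |
| Ile | 2 | 9 | |
| Leu | 2 | 9 | |
| Lys | 4 | 73 | |
| Met | 3 | 27 | |
| Phe | 2 | 18 | |
| Ser | 1 | 3 | |
| Thr | 1 | 3 | |
| Trp | 2 | 36 | |
| Tyr | 2 | 18 | |
| Val | 1 | 3 | |
| Cyh | 1 | 3 | Free (non-disulfide) cysteine |
| Cyd | 1 | 3 | Disulfide-bonded cysteine |
| Cys | 1 | 3 | Combined cysteine pool (CYH + CYD) |
| Tpr | 3 | 2 | Trans-proline |
| Cpr | 3 | 2 | Cis-proline |
| Pro | 3 | 2 | Combined proline pool (TPR + CPR) |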

Each type implements Residue + Copy + PartialEq + Eq + Hash + Debug.


§Performance

Benchmarked with Criterion.rs on an Intel® Core™ i7-13620H (Raptor Lake, 4.90 GHz turbo, AVX2), Linux, opt-level=3, lto=true, codegen-units=1.

Single-point query — time to call Residue::rotamers(phi, psi) and consume the full iterator:

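| Residue | N_CHI | N_ROTAMERS | Time | Throughput |
|---------|-------|------------|------|------------|
| Val | 1 | 3 | 31.1 ns | 32.1 MOps/s |
| Ser | 1 | 3 | 32.0 ns | 31.2 MOps/s |
| Pro | 3 | 2 | 43.7 ns | 22.9 MOps/s |
| Leu | 2 | 9 | 143.5 ns | 6.97 MOps/s |
| Phe | 2 | 18 | 268.0 ns | 3.73 MOps/s |
| Met | 3 | 27 | 358.0 ns | 2.79 MOps/s |
| Asn | 2 | 36 | 551.7 ns | 1.81 MOps/s |
| Glu | 3 | 54 | 673.4 ns | 1.49 MOps/s |
| Arg | 4 | 75 | 1,142.6 ns | 0.88 MOps/s |
| Lys | 4 | 73 | 1,158.9 ns | 0.86 MOps/s |
| Gln | 3 | 108 | 1,316.7 ns | 0.76 MOps/s |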

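Query time scales roughly linearly with N_ROTAMERS, at ~12–16 ns per rotamer for residues with nine or more rotamers, and is dominated by atan2f calls (one per χ angle per rotamer). The smallest tables fall outside that range: Val and Ser cost about 10 ns per rotamer, Pro about 22 ns.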

Full grid sweep (all 37×37 = 1,369 cells, sustained throughput):

| Residue | Time | Per-query | Table size |
|---------|------|-----------|------------|
| Val | 38.2 µs | 27.9 ns | 64 KiB |
| Gln | 1,834.8 µs | 1,340 ns | 5,776 KiB |

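For Val, per-query time drops ~10% in sweep mode (from 31.1 ns to 27.9 ns) due to cache warmth across adjacent cells. Gln shows no such gain (1,316.7 ns single-point versus 1,340 ns in the sweep).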

For full data including all 22 residues and methodology, see BENCHMARKS.md.

§Why it’s fast

| Optimization | Savings |
|--------------|---------|
| Precomputed (sin χ, cos χ) in table | Eliminates 8N sin/cos calls per rotamer (e.g. for Arg, N=4: 32 sin/cos calls replaced by 4 atan2f calls) |
| Custom branchless atan2f | Eliminates libm overhead; zero branch-prediction penalties |
| Compile-time static tables (build.rs) | Zero startup cost; OS can share read-only pages across processes |
| KEYS deduplication | Bin indices stored once per residue (in KEYS), not per cell; saves ~401 KiB for Arg alone |
| Stack-only RotamerIter<N, R> | No allocator, no pointer indirection; next() is a single array read + increment |
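
The first row of the table can be made concrete with a small sketch: when each corner cell already stores (sin χ, cos χ), the weighted circular mean needs only multiply-adds plus a single atan2 per χ angle. The function names and the tuple layout are assumptions for this example, and it calls the standard library's atan2 rather than the crate's custom atan2f.

```rust
/// Weighted circular mean of one chi angle, given precomputed (sin, cos)
/// pairs for the four corner cells. Returns degrees in [-180, 180].
pub fn circular_mean_deg(sin_cos: [(f32, f32); 4], w: [f32; 4]) -> f32 {
    let mut s = 0.0f32;
    let mut c = 0.0f32;
    for (&(sin_chi, cos_chi), weight) in sin_cos.iter().zip(w) {
        s += weight * sin_chi;
        c += weight * cos_chi;
    }
    // The only trigonometric call needed at query time.
    s.atan2(c).to_degrees()
}

/// What a build step would store instead of the raw angle (illustrative).
pub fn to_sin_cos(deg: f32) -> (f32, f32) {
    deg.to_radians().sin_cos()
}
```

Averaging 170° and −170° this way yields ±180°, whereas an arithmetic mean of the raw angles would give 0°, which is the wrong side of the circle.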

§Verification

The library is verified at three levels:

Compile time (build.rs assertions) — compilation aborts if any of the following fail:

  • Rotamer count per cell matches the registered N_ROTAMERS
  • Every rotamer probability ≥ 0
  • Probability sum per cell ∈ [0.99, 1.01]
  • Per-χ standard deviation > 0 for every rotamer in every cell
  • φ = −180° and φ = +180° cells are bitwise identical (periodic boundary)
  • ψ = −180° and ψ = +180° cells are bitwise identical
  • Bin index key sets are identical across all 1,369 cells for each residue
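
A minimal sketch of how the per-cell checks in this list might be expressed is shown below. The `Row` struct and `check_cell` function are hypothetical and cover only the first four invariants; the actual build.rs is not reproduced here.

```rust
/// Hypothetical parsed row: one rotamer in one (phi, psi) cell.
pub struct Row {
    pub prob: f32,
    pub chi_sigma: Vec<f32>,
}

/// Checks the per-cell invariants and describes the first failure found.
/// A build script could turn an `Err` into a panic to abort compilation.
pub fn check_cell(rows: &[Row], expected_rotamers: usize) -> Result<(), String> {
    if rows.len() != expected_rotamers {
        return Err(format!(
            "expected {} rotamers, found {}",
            expected_rotamers,
            rows.len()
        ));
    }
    let mut sum = 0.0f32;
    for (i, row) in rows.iter().enumerate() {
        // Written as a negated comparison so that NaN is rejected too.
        if !(row.prob >= 0.0) {
            return Err(format!("row {}: negative or NaN probability", i));
        }
        if row.chi_sigma.iter().any(|&s| !(s > 0.0)) {
            return Err(format!("row {}: non-positive standard deviation", i));
        }
        sum += row.prob;
    }
    if !(0.99..=1.01).contains(&sum) {
        return Err(format!("probability sum {} outside [0.99, 1.01]", sum));
    }
    Ok(())
}
```

The periodicity and key-set checks compare whole cells against each other, so they would need access to the full grid rather than a single cell.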

Unit tests (21 tests in src/):

  • atan2f accuracy: maximum error 3.5×10⁻⁵ rad (±0.002°) over a dense grid
  • Circular mean with ±180° wraparound
  • angle_to_grid at boundaries, midpoints, and out-of-range inputs
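
The signature of angle_to_grid is not shown on this page, so the following is only a guess at the kind of mapping such a function performs. It assumes a 10° grid from −180° to +180° (37 points per axis, inferred from the 37×37 cell count) and wraps out-of-range angles by periodicity. The name `angle_to_grid_sketch` and the return type are hypothetical.

```rust
/// Assumed grid spacing and point count per axis (-180, -170, ..., 180).
pub const GRID_STEP_DEG: f32 = 10.0;
pub const GRID_POINTS: usize = 37;

/// Maps an angle in degrees to (lower grid index, fractional offset).
/// Out-of-range inputs are wrapped, so 190 degrees is treated as -170.
pub fn angle_to_grid_sketch(angle_deg: f32) -> (usize, f32) {
    // Distance from -180 degrees, wrapped into one full turn.
    let shifted = (angle_deg + 180.0).rem_euclid(360.0);
    let pos = shifted / GRID_STEP_DEG;
    // rem_euclid can round up to exactly 360.0 for tiny negative inputs,
    // so clamp the index to the last cell that has an upper neighbor.
    let idx = (pos as usize).min(GRID_POINTS - 2);
    (idx, pos - idx as f32)
}
```

With this convention +180° maps to index 0 with offset 0, the same cell as −180°, which is consistent with the periodic-boundary invariant listed above.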

Integration tests (140 tests in tests/):

  • accuracy.rs (8 tests): full 740,629-row CSV round-trip; |prob_err| < 1e-5 and |chi_mean_err| < 0.05° at every grid point
  • coverage.rs (44 tests): every (residue, φ, ψ) combination on the 37×37 grid has the correct count, Σprob ≈ 1.0, and valid ranges; 10,000 random (φ, ψ) fuzz inputs per residue type (220,000 total)
  • interpolation.rs (88 tests): determinism, continuity (Δprob < 0.05 per 0.1° step), circular χ wrap correctness, and normalization at 32×32 off-grid angles

The atan2f error of ±0.002° is 25× below the 0.05° accuracy threshold, meaning the precision ceiling is the source data (CSV values are stored to one decimal place), not the implementation.
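
The crate's atan2f source is not shown here, so the sketch below is one possible reading of "two-stage argument reduction + degree-7 Taylor polynomial", not the actual implementation. It folds the input into the first octant, then shifts arguments above tan(π/8) using atan(t) = π/4 + atan((t − 1)/(t + 1)), so the polynomial is only evaluated for |u| ≤ tan(π/8). At that bound the degree-7 truncation error is about 3.5×10⁻⁵ rad, in line with the figure quoted above. The name `atan2_sketch` is hypothetical, and the code relies on std float methods for brevity.

```rust
use std::f32::consts::{FRAC_PI_2, FRAC_PI_4, PI};

/// atan2 approximation written without `if`; selections use 0.0/1.0 masks.
/// Whether the compiled code is branch-free depends on the compiler.
pub fn atan2_sketch(y: f32, x: f32) -> f32 {
    const TAN_PI_8: f32 = 0.414_213_57;

    let (ax, ay) = (x.abs(), y.abs());
    // Stage 1: fold into the first octant so that t lies in [0, 1].
    let hi = ax.max(ay);
    let lo = ax.min(ay);
    let t = lo / hi.max(f32::MIN_POSITIVE); // avoids 0/0 at the origin

    // Stage 2: for t > tan(pi/8), use atan(t) = pi/4 + atan((t-1)/(t+1)).
    let shift = (t > TAN_PI_8) as u8 as f32;
    let u = t + shift * ((t - 1.0) / (t + 1.0) - t);

    // Degree-7 Taylor polynomial: u - u^3/3 + u^5/5 - u^7/7 (Horner form).
    let u2 = u * u;
    let p = u * (1.0 + u2 * (-1.0 / 3.0 + u2 * (1.0 / 5.0 + u2 * (-1.0 / 7.0))));
    let a = p + shift * FRAC_PI_4;

    // Undo the octant fold: if |y| > |x| the angle is pi/2 - a.
    let swap = (ay > ax) as u8 as f32;
    let a = a + swap * (FRAC_PI_2 - 2.0 * a);
    // Undo the x reflection: if x is negative the angle is pi - a.
    let neg_x = x.is_sign_negative() as u8 as f32;
    let a = a + neg_x * (PI - 2.0 * a);
    // Restore the sign of y.
    a.copysign(y)
}
```

Comparing this function against `f32::atan2` over a dense grid is a simple way to check the error bound. Subnormal inputs and NaN are not handled carefully in this sketch.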

Run the full suite:

cargo test

§Macros

for_all_residues
Invokes $callback!(Type, N_CHI, N_ROTAMERS) for all 22 residue types.

§Structs

Arg
Arginine (4 χ angles, 75 rotamers).
Asn
Asparagine (2 χ angles, 36 rotamers).
Asp
Aspartate (2 χ angles, 18 rotamers).
Cpr
Cis-proline (3 χ angles, 2 rotamers).
Cyd
Disulfide-bonded cysteine (1 χ angle, 3 rotamers).
Cyh
Free (non-disulfide) cysteine (1 χ angle, 3 rotamers).
Cys
Combined cysteine pool (1 χ angle, 3 rotamers).
Gln
Glutamine (3 χ angles, 108 rotamers).
Glu
Glutamate (3 χ angles, 54 rotamers).
His
Histidine (2 χ angles, 36 rotamers).
Ile
Isoleucine (2 χ angles, 9 rotamers).
Leu
Leucine (2 χ angles, 9 rotamers).
Lys
Lysine (4 χ angles, 73 rotamers).
Met
Methionine (3 χ angles, 27 rotamers).
Phe
Phenylalanine (2 χ angles, 18 rotamers).
Pro
Combined proline pool (3 χ angles, 2 rotamers).
Rotamer
A backbone-dependent rotamer entry.
RotamerIter
Eagerly constructed iterator over bilinearly interpolated rotamers.
Ser
Serine (1 χ angle, 3 rotamers).
Thr
Threonine (1 χ angle, 3 rotamers).
Tpr
Trans-proline (3 χ angles, 2 rotamers).
Trp
Tryptophan (2 χ angles, 36 rotamers).
Tyr
Tyrosine (2 χ angles, 18 rotamers).
Val
Valine (1 χ angle, 3 rotamers).

§Traits

Residue
Backbone-dependent rotamer library interface.