Two-electron repulsion integrals (ERIs) by the Obara–Saika / Head-Gordon–Pople
(OS/HGP) recurrence — the second ERI engine (see ARCHITECTURE.md, L1).
A vertical recurrence (VRR) builds the
intermediate [e0|f0]^(m) classes from [00|00]^(m) per primitive quartet,
the primitives are contracted into AO space, and a horizontal recurrence
(HRR) then shifts angular momentum A→B and C→D in the contracted space.
Doing the HRR after contraction is the HGP “early contraction” trick: the
geometry-only HRR runs once per shell quartet instead of once per
primitive quartet, which is the win at high contraction degree.
It is the engineering counterpart of the crate::rys engine: same Coulomb
kernel, same row-major (a,b,c,d) block layout, validated element-for-element
against it (and against an independent McMurchie–Davidson path).
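Early contraction is valid because the HRR is linear and its coefficients, (A−B)_i and (C−D)_i, depend only on the shell centres, so summing over primitives commutes with an HRR step. A minimal one-dimensional sketch of that argument follows; the function names and the per-primitive values are hypothetical and are not part of this crate.

```rust
/// One bra HRR step in one dimension:
/// (a, b+1 | f0) = (a+1, b | f0) + ab * (a, b | f0), with ab = (A - B)_i.
fn hrr_step(upper: f64, lower: f64, ab: f64) -> f64 {
    upper + ab * lower
}

/// Late contraction: apply the HRR to every primitive quartet, then sum.
fn late_contraction(prims: &[(f64, f64)], ab: f64) -> f64 {
    prims
        .iter()
        .map(|&(upper, lower)| hrr_step(upper, lower, ab))
        .sum()
}

/// Early contraction: sum the primitive values first, then apply the HRR once.
fn early_contraction(prims: &[(f64, f64)], ab: f64) -> f64 {
    let (upper, lower) = prims
        .iter()
        .fold((0.0, 0.0), |(u, l), &(pu, pl)| (u + pu, l + pl));
    hrr_step(upper, lower, ab)
}

fn main() {
    // Made-up values of (a+1, 0 | f0) and (a, 0 | f0) for three primitive quartets.
    let prims = [(0.25, 1.5), (-0.5, 0.75), (1.25, -2.0)];
    let ab = 0.7; // geometric factor, identical for every primitive quartet

    let late = late_contraction(&prims, ab);
    let early = early_contraction(&prims, ab);

    // Both orders agree up to floating-point rounding.
    assert!((late - early).abs() < 1e-12);
    println!("late = {}, early = {}", late, early);
}
```

The late path calls `hrr_step` once per primitive quartet and the early path calls it once in total, which is where the saving at high contraction degree comes from.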
§Method
For a primitive quartet with exponents α,β on the bra centres A,B and
γ,δ on the ket centres C,D, with p=α+β, q=γ+δ, P,Q the Gaussian
product centres, W=(pP+qQ)/(p+q), ρ=pq/(p+q), and T=ρ|P−Q|²:
[00|00]^(m) = 2π^{5/2}/(p q √(p+q)) · K_ab · K_cd · F_m(T)
where K_ab and K_cd are the bra and ket Gaussian-product prefactors and F_m is the Boys function.
The VRR raises angular momentum on A (the bra, index e) and C (the ket,
index f) using the standard OS ERI relations (Obara–Saika 1986; HGP 1988):
[e+1_i,0|f0]^(m) = (P−A)_i [e0|f0]^(m) + (W−P)_i [e0|f0]^(m+1)
    + e_i/(2p) ( [e−1_i,0|f0]^(m) − (q/(p+q)) [e−1_i,0|f0]^(m+1) )
    + f_i/(2(p+q)) [e0|f−1_i,0]^(m+1)
[e0|f+1_i,0]^(m) = (Q−C)_i [e0|f0]^(m) + (W−Q)_i [e0|f0]^(m+1)
    + f_i/(2q) ( [e0|f−1_i,0]^(m) − (p/(p+q)) [e0|f−1_i,0]^(m+1) )
    + e_i/(2(p+q)) [e−1_i,0|f0]^(m+1)
After contracting [e0|f0]^(0) over the primitive quartet, the HRR builds the
Cartesian shell block:
(a,b+1_i | f0) = (a+1_i,b | f0) + (A−B)_i (a,b | f0) [bra, A→B]
(ab | c,d+1_i) = (ab | c+1_i,d) + (C−D)_i (ab | c,d) [ket, C→D]
§Buffers
The VRR table [e0|f0]^(m) is 3D and W-coupled (the recurrence mixes
Cartesian axes), so unlike the axis-separable 1D tables of the one-electron and
Rys engines it cannot be a small fixed stack array — a MAX_L stack buffer
would be tens of MB, and even a heap copy of the full table is ~41 MB at
(ii|ii). The current engine avoids materialising that table:
- m-marching VRR (vrr_primitive): the VRR never materialises the full e×f×m table. It marches over the ket f-degree keeping only a rolling window of 3 consecutive f-degree levels, each triangle-packed in the Boys index m. Resident VRR footprint is 3·max_k[n_cart(k)·slab_k] (≈ 4.3 MB at (ii|ii), vs the old 41 MB single table), plus the n_e·n_f contracted table.
- flat-array HRR (hrr_and_scatter): contiguous arrays indexed by addr/cart_index with two rolling degree layers, replacing the former HashMap memoisation.
- reusable arena (EriScratch): all buffers are allocated once and reused across quartets (thread-local by default), not re-allocated per quartet.
All of this stays in safe Rust (#![forbid(unsafe_code)]) and reproduces the
former full-table engine’s values, cross-checked against the Rys engine and an
independent McMurchie–Davidson path (tests/eri_cross_algorithm.rs).
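The ~41 MB figure for the former full table can be reproduced with a short count. The sketch below assumes the dense table holds every Cartesian component of degree 0..=la+lb for e, every component of degree 0..=lc+ld for f, and m = 0..=la+lb+lc+ld, stored as f64; that layout is an assumption made for the arithmetic, and the helper names are hypothetical rather than taken from this crate.

```rust
/// Number of Cartesian monomials x^i y^j z^k with i + j + k == l.
fn n_cart(l: usize) -> usize {
    (l + 1) * (l + 2) / 2
}

/// Number of Cartesian monomials of every degree from 0 up to `lmax`.
fn n_cart_up_to(lmax: usize) -> usize {
    (0..=lmax).map(n_cart).sum()
}

/// Bytes of a dense f64 table over (e, f, m) for the quartet (la lb | lc ld),
/// assuming e spans degrees 0..=la+lb, f spans 0..=lc+ld and
/// m spans 0..=la+lb+lc+ld.
fn full_table_bytes(la: usize, lb: usize, lc: usize, ld: usize) -> usize {
    let n_e = n_cart_up_to(la + lb);
    let n_f = n_cart_up_to(lc + ld);
    let n_m = la + lb + lc + ld + 1;
    n_e * n_f * n_m * std::mem::size_of::<f64>()
}

fn main() {
    // i shells have l = 6, so (ii|ii) needs e and f up to degree 12.
    let bytes = full_table_bytes(6, 6, 6, 6);
    println!("{} components per side, {} bytes", n_cart_up_to(12), bytes);
    // prints "455 components per side, 41405000 bytes"
}
```

Under these assumptions the table has 455 · 455 · 25 entries of 8 bytes each, about 41.4 MB, which matches the size quoted above for (ii|ii).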
Structs§
- EriBatch4Scratch
  Reusable buffers for the 4-quartet batch entry coulomb_shell_batch4_into_scratch: per-lane pair lists and c_ef accumulators, plus a scalar EriScratch for the per-lane HRR (and the defensive scalar fallback).
- EriScratch
  Reusable scratch arena for the OS/HGP ERI engine.
- ShellPairData
  Precomputed, screened [PrimPair] list of one ordered shell pair: libcint’s pair-data (“optimizer”) precompute. A dense driver builds one per canonical shell pair once per build (O(n_shells²), trivial memory) and passes borrowed lists to coulomb_shell_pairs_into_scratch / coulomb_shell_batch4_pairs_into_scratch, instead of the engine rebuilding the same pairs on every quartet that shares the pair.
- ShellRef
  A contracted Cartesian shell as seen by the HGP engine: its centre, angular momentum, and primitive (exponent, effective-coefficient) data. The coefficients are the driver’s effective coefficients d_i · N(α_i, l); the engine itself works on un-normalised monomials and only multiplies the four coefficients into the contracted accumulator.
Functions§
- coulomb_shell_batch4_into_scratch
  Evaluate four shell quartets in lockstep (quartets[lane] = [a, b, c, d]), accumulating each lane’s contracted Cartesian block into outs[lane] (same layout/contract as coulomb_shell_into).
- coulomb_shell_batch4_pairs_into_scratch
  Like coulomb_shell_batch4_into_scratch but with each lane’s bra/ket primitive-pair data supplied by the caller (precomputed once per shell pair across a dense build, see ShellPairData). Bit-identical to the self-building entry: same pair values and order, only computed elsewhere.
- coulomb_shell_into
  Accumulate the contracted Coulomb block (ab|cd) for four shells into the row-major out block (shape [n_cart(la)·n_cart(lb)·n_cart(lc)·n_cart(ld)], the same (a,b,c,d) layout as crate::rys::coulomb_into).
- coulomb_shell_into_scratch
  Like coulomb_shell_into but evaluates into the caller-provided EriScratch, reused across quartets to avoid per-quartet heap allocation. Use one instance per thread (sharing a &mut EriScratch across threads is a compile error); the result is bit-identical regardless of which arena is used or what it last held.
- coulomb_shell_pairs_into
  Like coulomb_shell_into (thread-local arena) but with the bra/ket primitive-pair data supplied by the caller: the borrowed-pairs analogue, see ShellPairData. Bit-identical to coulomb_shell_into.
- coulomb_shell_pairs_into_scratch
  Like coulomb_shell_into_scratch but with the bra/ket primitive-pair data supplied by the caller (precomputed once per shell pair across a dense build, see ShellPairData), instead of rebuilt per quartet. Bit-identical: the pair values and their order are exactly what the self-building entry computes; only where they are computed moves.
- shell_pair_data
  Build the screened pair list for the ordered shell pair (s1, s2): element-for-element what the self-building engine entries compute per quartet (see ShellPairData for the orientation contract).
- surviving_pair_count
  Number of primitive pairs of two shells that survive the [PAIR_NEGLIGIBLE] screen, i.e. the pair count [build_pairs] would produce. Drivers bucketing shell quartets for coulomb_shell_batch4_into_scratch use this (once per shell pair) to group quartets whose lanes run the primitive loop in true lockstep.
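The row-major (a,b,c,d) block shape used by the out buffers above can be illustrated with a small indexing helper. This is a sketch only: it assumes d is the fastest-varying index, as "row-major" implies, and says nothing about the order of Cartesian components within a shell. The names block_len and block_index are hypothetical, not part of this module's API.

```rust
/// Number of Cartesian components of a shell with angular momentum l.
fn n_cart(l: usize) -> usize {
    (l + 1) * (l + 2) / 2
}

/// Length of the flat block for the quartet with angular momenta [la, lb, lc, ld].
fn block_len(l: [usize; 4]) -> usize {
    l.iter().map(|&x| n_cart(x)).product()
}

/// Flat offset of component (a, b, c, d) in a row-major block, where each
/// index counts Cartesian components within its own shell.
fn block_index(l: [usize; 4], idx: [usize; 4]) -> usize {
    let [_, nb, nc, nd] = l.map(n_cart);
    let [a, b, c, d] = idx;
    ((a * nb + b) * nc + c) * nd + d
}

fn main() {
    // A (pp|sd) quartet: 3 * 3 * 1 * 6 = 54 elements.
    let l = [1, 1, 0, 2];
    println!("block length = {}", block_len(l));
    // The last component of every shell maps to the last element of the block.
    println!("last offset  = {}", block_index(l, [2, 2, 0, 5]));
}
```

For the (pp|sd) example the block holds 54 values and the final component lands at offset 53, so the helper covers the block exactly once.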