Module os_eri


Two-electron repulsion integrals (ERIs) by the Obara–Saika / Head-Gordon–Pople (OS/HGP) recurrence — the second ERI engine (see ARCHITECTURE.md, L1).

A vertical recurrence (VRR) builds the intermediate [e0|f0]^(m) classes from [00|00]^(m) per primitive quartet, the primitives are contracted into AO space, and a horizontal recurrence (HRR) then shifts angular momentum A→B and C→D in the contracted space. Doing the HRR after contraction is the HGP “early contraction” trick: the geometry-only HRR runs once per shell quartet instead of once per primitive quartet, which is the win at high contraction degree.

It is the engineering counterpart of the crate::rys engine: same Coulomb kernel, same row-major (a,b,c,d) block layout, validated element-for-element against it (and against an independent McMurchie–Davidson path).

§Method

For a primitive quartet with exponents α,β on the bra centres A,B and γ,δ on the ket centres C,D, with p=α+β, q=γ+δ, P,Q the Gaussian product centres, W=(pP+qQ)/(p+q), ρ=pq/(p+q), and T=ρ|P−Q|²:

  [00|00]^(m) = 2π^{5/2}/(p q √(p+q)) · K_ab · K_cd · F_m(T)
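The quantities above can be sketched for a single primitive quartet as follows. This is a minimal illustration, not the crate's code: `boys`, `product_centre`, and `base_integrals` are hypothetical names, K_ab is assumed to be the usual Gaussian-product prefactor exp(−(αβ/p)|A−B|²) (the text does not spell it out), and the Boys function uses a plain power series that is only suitable for small to moderate T.

```rust
use std::f64::consts::PI;

type V3 = [f64; 3];

/// Boys function F_m(T) from the series
/// F_m(T) = exp(-T) * sum_k (2T)^k / [(2m+1)(2m+3)...(2m+2k+1)].
/// Assumption: T is small to moderate; large T needs an asymptotic form.
fn boys(m: u32, t: f64) -> f64 {
    let mut term = 1.0 / (2.0 * m as f64 + 1.0);
    let mut sum = term;
    for k in 1..200u32 {
        term *= 2.0 * t / (2.0 * (m + k) as f64 + 1.0);
        sum += term;
        if term < 1e-17 * sum {
            break;
        }
    }
    (-t).exp() * sum
}

fn dist2(a: V3, b: V3) -> f64 {
    (0..3).map(|i| (a[i] - b[i]).powi(2)).sum()
}

/// Gaussian product centre (a*A + b*B) / (a + b).
fn product_centre(a: f64, ca: V3, b: f64, cb: V3) -> V3 {
    let s = a + b;
    [
        (a * ca[0] + b * cb[0]) / s,
        (a * ca[1] + b * cb[1]) / s,
        (a * ca[2] + b * cb[2]) / s,
    ]
}

/// [00|00]^(m) for m = 0..=m_max; each argument is (exponent, centre).
fn base_integrals(a: (f64, V3), b: (f64, V3), c: (f64, V3), d: (f64, V3), m_max: u32) -> Vec<f64> {
    let (p, q) = (a.0 + b.0, c.0 + d.0);
    let pc = product_centre(a.0, a.1, b.0, b.1);
    let qc = product_centre(c.0, c.1, d.0, d.1);
    // Assumed definition of K_ab, K_cd (Gaussian-product prefactors).
    let k_ab = (-(a.0 * b.0 / p) * dist2(a.1, b.1)).exp();
    let k_cd = (-(c.0 * d.0 / q) * dist2(c.1, d.1)).exp();
    let rho = p * q / (p + q);
    let t = rho * dist2(pc, qc);
    let pref = 2.0 * PI.powf(2.5) / (p * q * (p + q).sqrt()) * k_ab * k_cd;
    (0..=m_max).map(|m| pref * boys(m, t)).collect()
}
```

With all four centres coincident, T = 0 and F_m(0) = 1/(2m+1), so the values reduce to the prefactor divided by 2m+1, which makes a convenient sanity check.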

The VRR raises angular momentum on A (the bra, index e) and C (the ket, index f) using the standard OS ERI relations (Obara–Saika 1986; HGP 1988):

  [e+1_i,0|f0]^(m) = (P−A)_i[e0|f0]^(m) + (W−P)_i[e0|f0]^(m+1)
      + e_i/2p ( [e−1_i,0|f0]^(m) − (q/(p+q))[e−1_i,0|f0]^(m+1) )
      + f_i/2(p+q) [e0|f−1_i,0]^(m+1)
  [e0|f+1_i,0]^(m) = (Q−C)_i[e0|f0]^(m) + (W−Q)_i[e0|f0]^(m+1)
      + f_i/2q ( [e0|f−1_i,0]^(m) − (p/(p+q))[e0|f−1_i,0]^(m+1) )
      + e_i/2(p+q) [e−1_i,0|f0]^(m+1)
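To make the index pattern of the bra relation concrete, here is a reduced sketch that raises angular momentum along a single Cartesian axis with f = 0, so the ket-coupling term f_i/2(p+q) vanishes. The function name `vrr_bra_axis` and its table layout are illustrative assumptions; the crate's `vrr_primitive` handles all three axes and the ket index and is organised differently.

```rust
/// One-axis bra VRR with f = 0.
/// `base[m]` holds [00|00]^(m) for m = 0..=m_max.
/// Returns `levels[e][m]` = [e,0|00]^(m) along the chosen axis,
/// valid for m = 0..=(m_max - e): each step up in e consumes one m.
/// `pa` = (P-A)_i and `wp` = (W-P)_i for that axis.
fn vrr_bra_axis(base: &[f64], pa: f64, wp: f64, p: f64, q: f64) -> Vec<Vec<f64>> {
    assert!(!base.is_empty(), "need at least [00|00]^(0)");
    let m_max = base.len() - 1;
    let mut levels: Vec<Vec<f64>> = vec![base.to_vec()];
    for e in 0..m_max {
        let n = m_max - e; // number of m values available at degree e + 1
        let mut next = Vec::with_capacity(n);
        for m in 0..n {
            // (P-A)_i [e]^(m) + (W-P)_i [e]^(m+1)
            let mut v = pa * levels[e][m] + wp * levels[e][m + 1];
            if e > 0 {
                // e_i/2p ( [e-1]^(m) - (q/(p+q)) [e-1]^(m+1) )
                let lower = &levels[e - 1];
                v += e as f64 / (2.0 * p) * (lower[m] - q / (p + q) * lower[m + 1]);
            }
            next.push(v);
        }
        levels.push(next);
    }
    levels
}
```

Each level is one entry shorter in m than the one below it, which is the triangular structure in the Boys index that the recurrence implies.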

After contracting [e0|f0]^(0) over the primitive quartet, the HRR builds the Cartesian shell block:

  (a,b+1_i | f0) = (a+1_i,b | f0) + (A−B)_i (a,b | f0)   [bra, A→B]
  (ab | c,d+1_i) = (ab | c+1_i,d) + (C−D)_i (ab | c,d)   [ket, C→D]
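The bra relation can be illustrated on a single axis. The sketch below is an assumption-labelled toy (the name `hrr_axis` and the nested `Vec` layout are not the crate's; `hrr_and_scatter` uses flat arrays): it takes contracted values (a,0| for a = 0..=la+lb and shifts degree onto B one unit at a time.

```rust
/// One-axis HRR, bra side.
/// `src[a]` holds (a,0| for a = 0..=la+lb; `ab` = (A-B)_i.
/// Returns `table[b][a]` = (a,b| for b = 0..=lb and a = 0..=(la+lb-b).
fn hrr_axis(src: &[f64], lb: usize, ab: f64) -> Vec<Vec<f64>> {
    assert!(src.len() > lb, "need degrees 0..=la+lb on the input side");
    let mut table = vec![src.to_vec()];
    for b in 0..lb {
        let prev = &table[b];
        // (a, b+1| = (a+1, b| + (A-B)_i (a, b|
        let next: Vec<f64> = (0..prev.len() - 1)
            .map(|a| prev[a + 1] + ab * prev[a])
            .collect();
        table.push(next);
    }
    table
}
```

The relation uses only the geometric constant (A−B)_i and no exponents, which is why it can be applied after contraction. It follows from x_B = x_A + (A−B)_i, so any set of values that are linear in monomials x_A^a x_B^b satisfies it, a property that is handy for testing.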

§Buffers

The VRR table [e0|f0]^(m) is 3D and W-coupled (the recurrence mixes Cartesian axes), so unlike the axis-separable 1D tables of the one-electron and Rys engines it cannot be a small fixed stack array: a MAX_L stack buffer would be tens of MB, and even a heap copy of the full table is ~41 MB at (ii|ii). The current engine avoids that cost in three ways:

m-marching VRR (vrr_primitive): the VRR never materialises the full e×f×m table. It marches over the ket f-degree keeping only a rolling window of 3 consecutive f-degree levels, each triangle-packed in the Boys index m. Resident VRR footprint is 3·max_k[n_cart(k)·slab_k] (≈ 4.3 MB at (ii|ii), vs the old 41 MB single table), plus the n_e·n_f contracted table.
flat-array HRR (hrr_and_scatter): contiguous arrays indexed by addr / cart_index with two rolling degree layers, replacing the former HashMap memoisation.
reusable arena (EriScratch): all buffers are allocated once and reused across quartets (thread-local by default), not re-allocated per quartet.
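The ~41 MB figure for a full table can be reproduced with a short calculation. The sketch below assumes a dense f64 table holding every bra degree 0..=la+lb, every ket degree 0..=lc+ld, and every Boys index m = 0..=la+lb+lc+ld; that layout is an assumption made for the arithmetic, and the function names are illustrative rather than part of the crate.

```rust
/// Number of Cartesian components of degree l: (l+1)(l+2)/2.
fn n_cart(l: usize) -> usize {
    (l + 1) * (l + 2) / 2
}

/// Components of all degrees 0..=l_max.
fn n_cart_cumulative(l_max: usize) -> usize {
    (0..=l_max).map(n_cart).sum()
}

/// Bytes for a dense [e0|f0]^(m) table under the stated assumptions.
fn full_table_bytes(la: usize, lb: usize, lc: usize, ld: usize) -> usize {
    let e_max = la + lb;
    let f_max = lc + ld;
    let n_m = e_max + f_max + 1; // m = 0..=L
    n_cart_cumulative(e_max) * n_cart_cumulative(f_max) * n_m * std::mem::size_of::<f64>()
}
```

For (ii|ii), l = 6 on every shell, so e and f each span 455 components and m takes 25 values: 455 · 455 · 25 · 8 bytes ≈ 41.4 MB.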

All of this stays in safe Rust (#![forbid(unsafe_code)]) and reproduces the former full-table engine’s values, cross-checked against the Rys engine and an independent McMurchie–Davidson path (tests/eri_cross_algorithm.rs).

Structs§

EriBatch4Scratch: Reusable buffers for the 4-quartet batch entry coulomb_shell_batch4_into_scratch: per-lane pair lists and c_ef accumulators, plus a scalar EriScratch for the per-lane HRR (and the defensive scalar fallback).
EriScratch: Reusable scratch arena for the OS/HGP ERI engine.
ShellPairData: Precomputed, screened PrimPair list of one ordered shell pair, i.e. libcint’s pair-data (“optimizer”) precompute. A dense driver builds one per canonical shell pair once per build (O(n_shells²), trivial memory) and passes borrowed lists to coulomb_shell_pairs_into_scratch / coulomb_shell_batch4_pairs_into_scratch, instead of the engine rebuilding the same pairs on every quartet that shares the pair.
ShellRef: A contracted Cartesian shell as seen by the HGP engine: its centre, angular momentum, and primitive (exponent, effective-coefficient) data. The coefficients are the driver’s effective coefficients d_i · N(α_i, l) — the engine itself works on un-normalised monomials and only multiplies the four coefficients into the contracted accumulator.

Functions§

coulomb_shell_batch4_into_scratch: Evaluate four shell quartets in lockstep — quartets[lane] = [a, b, c, d] — accumulating each lane’s contracted Cartesian block into outs[lane] (same layout/contract as coulomb_shell_into).
coulomb_shell_batch4_pairs_into_scratch: Like coulomb_shell_batch4_into_scratch but with each lane’s bra/ket primitive-pair data supplied by the caller (precomputed once per shell pair across a dense build, see ShellPairData). Bit-identical to the self-building entry: same pair values and order, only computed elsewhere.
coulomb_shell_into: Accumulate the contracted Coulomb block (ab|cd) for four shells into the row-major out block (shape [n_cart(la)·n_cart(lb)·n_cart(lc)·n_cart(ld)], the same (a,b,c,d) layout as crate::rys::coulomb_into).
coulomb_shell_into_scratch: Like coulomb_shell_into but evaluates into the caller-provided EriScratch, reused across quartets to avoid per-quartet heap allocation. Use one instance per thread (sharing a &mut EriScratch across threads is a compile error); the result is bit-identical regardless of which arena is used or what it last held.
coulomb_shell_pairs_into: Like coulomb_shell_into (thread-local arena) but with the bra/ket primitive-pair data supplied by the caller — the borrowed-pairs analogue, see ShellPairData. Bit-identical to coulomb_shell_into.
coulomb_shell_pairs_into_scratch: Like coulomb_shell_into_scratch but with the bra/ket primitive-pair data supplied by the caller (precomputed once per shell pair across a dense build, see ShellPairData), instead of rebuilt per quartet. Bit-identical: the pair values and their order are exactly what the self-building entry computes — only where they are computed moves.
shell_pair_data: Build the screened pair list for the ordered shell pair (s1, s2) — element-for-element what the self-building engine entries compute per quartet (see ShellPairData for the orientation contract).
surviving_pair_count: Number of primitive pairs of two shells that survive the PAIR_NEGLIGIBLE screen, i.e. the pair count build_pairs would produce. Drivers bucketing shell quartets for coulomb_shell_batch4_into_scratch use this (once per shell pair) to group quartets whose lanes run the primitive loop in true lockstep.
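The bucketing mentioned for surviving_pair_count might look roughly like the driver-side sketch below. Everything here is hypothetical: `QuartetJob`, `bucket_for_batch4`, and the choice of key are not part of this module, and the sketch assumes that "true lockstep" means all four lanes share the same bra and ket surviving-pair counts. The counts would be obtained from surviving_pair_count once per shell pair; the sketch takes them as plain numbers and calls no engine function.

```rust
use std::collections::BTreeMap;

/// Hypothetical driver record: a quartet id plus its surviving
/// bra and ket primitive-pair counts.
#[derive(Clone, Copy, Debug, PartialEq)]
struct QuartetJob {
    id: usize,
    bra_pairs: usize,
    ket_pairs: usize,
}

/// Group jobs with identical (bra_pairs, ket_pairs) into batches of four.
/// Returns the full batches and the leftovers, which a driver could
/// send through a scalar entry instead.
fn bucket_for_batch4(jobs: &[QuartetJob]) -> (Vec<[QuartetJob; 4]>, Vec<QuartetJob>) {
    let mut buckets: BTreeMap<(usize, usize), Vec<QuartetJob>> = BTreeMap::new();
    for &job in jobs {
        buckets
            .entry((job.bra_pairs, job.ket_pairs))
            .or_default()
            .push(job);
    }
    let mut batches = Vec::new();
    let mut leftovers = Vec::new();
    for group in buckets.values() {
        let mut chunks = group.chunks_exact(4);
        for c in &mut chunks {
            batches.push([c[0], c[1], c[2], c[3]]);
        }
        // Fewer than four jobs left with this key: no full batch possible.
        leftovers.extend_from_slice(chunks.remainder());
    }
    (batches, leftovers)
}
```

A `BTreeMap` keeps the bucket order deterministic, which helps when results are compared run to run; a `HashMap` would work equally well if ordering does not matter.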
