Source-conditional Dirichlet account-pair sampler (SOTA-8).
Per source string, fits a Dirichlet-multinomial over a per-source account pool.
Round 0 (FINDINGS §14) showed the synthetic engine’s source-conditional structure
is too uniform (entropy 0.97 vs corpus 0.68) and too narrow (5 vs 23.5 accounts per
source). This sampler closes both gaps simultaneously: a configurable larger pool,
drawn through a concentrated (low-α) Dirichlet.
Math: a symmetric Dirichlet(α, …, α) is realised by pᵢ = Gᵢ / Σⱼ Gⱼ with each
Gᵢ ~ Gamma(α, 1) drawn independently. Lower α ⇒ a more concentrated PMF. With α = 0.5
and a pool size of N_s = 25 the expected normalised entropy is ≈ 0.65, close to the
corpus median of 0.68.
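The Gamma-normalisation construction can be sketched in dependency-free Rust as below. This is only an illustration under stated assumptions, not this module's implementation: `SplitMix64`, `sample_gamma`, `sample_symmetric_dirichlet` and `normalised_entropy` are hypothetical names, the Marsaglia-Tsang Gamma sampler is one common choice rather than necessarily the one used here, and the real sampler presumably draws from the generator's own RNG.

```rust
use std::f64::consts::PI;

/// Tiny SplitMix64 generator (hypothetical helper, keeps the sketch dependency-free).
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform draw strictly inside (0, 1), built from the top 52 bits.
    fn uniform(&mut self) -> f64 {
        ((self.next_u64() >> 12) as f64 + 0.5) / (1u64 << 52) as f64
    }

    /// Standard normal draw via the Box-Muller transform.
    fn normal(&mut self) -> f64 {
        let (u1, u2) = (self.uniform(), self.uniform());
        (-2.0 * u1.ln()).sqrt() * (2.0 * PI * u2).cos()
    }
}

/// Gamma(shape, 1) draw: Marsaglia-Tsang for shape >= 1,
/// plus the U^(1/shape) boost for shape < 1.
fn sample_gamma(rng: &mut SplitMix64, shape: f64) -> f64 {
    if shape < 1.0 {
        let u = rng.uniform();
        return sample_gamma(rng, shape + 1.0) * u.powf(1.0 / shape);
    }
    let d = shape - 1.0 / 3.0;
    let c = 1.0 / (9.0 * d).sqrt();
    loop {
        let x = rng.normal();
        let v = (1.0 + c * x).powi(3);
        if v <= 0.0 {
            continue; // proposal outside the support, try again
        }
        let u = rng.uniform();
        if u.ln() < 0.5 * x * x + d - d * v + d * v.ln() {
            return d * v;
        }
    }
}

/// p_i = G_i / sum_j G_j with each G_i ~ Gamma(alpha, 1) drawn independently.
fn sample_symmetric_dirichlet(rng: &mut SplitMix64, alpha: f64, n: usize) -> Vec<f64> {
    let gammas: Vec<f64> = (0..n).map(|_| sample_gamma(rng, alpha)).collect();
    let total: f64 = gammas.iter().sum();
    gammas.iter().map(|g| g / total).collect()
}

/// Shannon entropy of the PMF divided by ln(n), so the result lies in [0, 1].
fn normalised_entropy(pmf: &[f64]) -> f64 {
    let h: f64 = pmf.iter().filter(|&&p| p > 0.0).map(|&p| -p * p.ln()).sum();
    h / (pmf.len() as f64).ln()
}
```

Calling `sample_symmetric_dirichlet(&mut SplitMix64(42), 0.5, 25)` yields one 25-entry PMF; averaging `normalised_entropy` over many such draws gives an empirical estimate of the expected value for a given α and pool size.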
This module is wired in by je_generator only when the
transactions.source_conditional_account_pair.enabled config flag is set (default off; opt-in
so existing users’ synthetic streams stay byte-identical).
Structs
- SourceConditionalPairSampler - Top-level sampler, one SourcePool per source string.
- SourcePool - One source’s account pool with a fitted Dirichlet PMF, ready to sample from.