pub const SAMPLING_TEMPERATURE: Entropy = 2.0;Expand description
Temperature (T) - controls sampling entropy via policy scaling. Higher T → more uniform (exploratory); lower T → more peaked (greedy). Formula: σ’(a) = max(ε, (σ(a)/T + β) / (Σσ + β)).