Function compute_bound_threshold 

pub fn compute_bound_threshold(hardware: &HardwareConfig) -> u32
Calculates the compute-bound threshold: the number of tokens at which inference becomes compute-bound.

Formula: threshold = (bytes_per_param * compute_flops) / memory_bandwidth
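
To make the formula concrete, here is a minimal sketch of the arithmetic. It is not the crate's implementation: the struct `HardwareConfigSketch`, its three fields, their units, and the function name `compute_bound_threshold_sketch` are all hypothetical stand-ins, since the real `HardwareConfig` layout is not shown on this page. Truncating the result to `u32` is likewise an assumption.

```rust
/// Hypothetical stand-in for `HardwareConfig`; the real struct's fields
/// are not documented here, so these names and units are assumptions.
pub struct HardwareConfigSketch {
    /// Bytes used to store one model parameter (e.g. 2.0 for 16-bit weights).
    pub bytes_per_param: f64,
    /// Peak compute throughput, assumed to be in FLOP/s.
    pub compute_flops: f64,
    /// Peak memory bandwidth, assumed to be in bytes/s.
    pub memory_bandwidth: f64,
}

/// Applies the documented formula:
/// threshold = (bytes_per_param * compute_flops) / memory_bandwidth
pub fn compute_bound_threshold_sketch(hardware: &HardwareConfigSketch) -> u32 {
    let threshold =
        (hardware.bytes_per_param * hardware.compute_flops) / hardware.memory_bandwidth;
    // Assumption: truncate toward zero. A float-to-int `as` cast saturates,
    // so an infinite result (zero bandwidth) becomes u32::MAX and NaN becomes 0.
    threshold as u32
}
```

With the assumed units, a configuration of 2 bytes per parameter, 100e12 FLOP/s, and 1e12 bytes/s gives a threshold of 200 tokens.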