Module arithmetic

Functions

compute_bound_threshold
Calculate the compute-bound threshold (the number of tokens at which inference becomes compute-bound). Formula: threshold = (bytes_per_param * compute_flops) / memory_bandwidth
flops_for_tokens
Calculate the FLOPs for a given number of tokens. Formula: FLOPs = 2 * num_tokens * active_parameters + attention_flops. For MoE models, this uses active_parameters (not the total parameter count), since only some experts are activated per token. The result includes both matmul and attention FLOPs.
is_compute_bound
Check whether a workload is compute-bound. A workload is compute-bound if its number of tokens is >= the compute-bound threshold.
kv_cache_bytes
Calculate the memory transfer, in bytes, for the KV cache at a given sequence length. Formula: kv_bytes = kv_cache_bytes_per_token * seq_len
model_weight_bytes
Calculate the memory transfer, in bytes, for the model weights. Formula: weight_bytes = num_parameters * bytes_per_param
total_memory_transfer
Calculate the total memory transfer, in bytes, for one iteration. Formula: total_bytes = model_weights + sum(kv_cache for each request)
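The three memory-transfer formulas compose directly: per-request KV-cache bytes are summed and added to the weight bytes. The sketch below is a minimal Rust illustration under stated assumptions. The module page gives only function names and formulas, so the `ModelSpec` struct, its field types, and every signature here are hypothetical. Byte counts are kept as `f64` so that fractional `bytes_per_param` values (for example, 0.5 for 4-bit weights) are representable.

```rust
/// Hypothetical model description; not the module's actual types.
pub struct ModelSpec {
    pub num_parameters: f64,
    pub bytes_per_param: f64,
    pub kv_cache_bytes_per_token: f64,
}

/// weight_bytes = num_parameters * bytes_per_param
pub fn model_weight_bytes(model: &ModelSpec) -> f64 {
    model.num_parameters * model.bytes_per_param
}

/// kv_bytes = kv_cache_bytes_per_token * seq_len
pub fn kv_cache_bytes(model: &ModelSpec, seq_len: u64) -> f64 {
    model.kv_cache_bytes_per_token * seq_len as f64
}

/// total_bytes = model_weights + sum(kv_cache for each request)
/// `seq_lens` holds the current sequence length of each request in the batch.
pub fn total_memory_transfer(model: &ModelSpec, seq_lens: &[u64]) -> f64 {
    // Each request reads its own KV cache once per iteration.
    let kv_total: f64 = seq_lens
        .iter()
        .map(|&len| kv_cache_bytes(model, len))
        .sum();
    // The weights are read once per iteration, regardless of batch size.
    model_weight_bytes(model) + kv_total
}
```

For example, with 1e9 parameters at 2 bytes each and 1,000 KV-cache bytes per token, two requests of 512 and 1,024 tokens give 2e9 + 1,536,000 = 2,001,536,000 bytes per iteration.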
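The FLOP formula counts 2 FLOPs (one multiply, one add) per active parameter per token, plus an attention term. The following is a minimal sketch with a hypothetical signature. Because the module description does not say how `attention_flops` is computed, the caller supplies it directly here rather than the sketch guessing a formula.

```rust
/// Hypothetical signature: FLOPs = 2 * num_tokens * active_parameters + attention_flops.
/// For a dense model, pass the total parameter count as `active_parameters`;
/// for an MoE model, pass only the parameters of the experts activated per token.
pub fn flops_for_tokens(num_tokens: u64, active_parameters: f64, attention_flops: f64) -> f64 {
    // Matmul FLOPs: one multiply and one add per active parameter per token.
    let matmul_flops = 2.0 * num_tokens as f64 * active_parameters;
    // Attention FLOPs are taken as an input (assumption, see lead-in).
    matmul_flops + attention_flops
}
```

Using active rather than total parameters is what keeps an MoE model's per-token compute below that of a dense model with the same total parameter count.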
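The threshold and the check work as a pair: compute the threshold once from precision and hardware figures, then compare each batch's token count against it. This is a minimal sketch with hypothetical signatures. It assumes `compute_flops` is peak throughput in FLOP/s and `memory_bandwidth` is in bytes/s, units the module page does not state explicitly.

```rust
/// Hypothetical signature: threshold = (bytes_per_param * compute_flops) / memory_bandwidth.
/// Assumes `compute_flops` in FLOP/s and `memory_bandwidth` in bytes/s.
pub fn compute_bound_threshold(bytes_per_param: f64, compute_flops: f64, memory_bandwidth: f64) -> f64 {
    (bytes_per_param * compute_flops) / memory_bandwidth
}

/// A workload is compute-bound when its token count reaches the threshold.
pub fn is_compute_bound(num_tokens: u64, threshold: f64) -> bool {
    num_tokens as f64 >= threshold
}
```

With illustrative figures (not any specific accelerator) of 2 bytes per parameter, 1e15 FLOP/s, and 2e12 bytes/s, the threshold is 1,000 tokens: a 1,000-token batch counts as compute-bound and a 999-token batch does not.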