Module arithmetic

Functions

flops_for_tokens: Calculate the FLOPs for a given number of tokens. Formula: FLOPs = 2 * num_tokens * active_parameters + attention_flops. For MoE models, active_parameters (not the total parameter count) is used, since only some experts are activated per token. Includes both matmul and attention FLOPs.
kv_cache_bytes: Calculate the memory-transfer bytes for the KV cache at a given sequence length. Formula: kv_bytes = kv_cache_bytes_per_token * seq_len.
model_weight_bytes: Calculate the memory-transfer bytes for the model weights. Formula: weight_bytes = num_parameters * bytes_per_param.
total_memory_transfer: Calculate the total memory-transfer bytes for one iteration. Formula: total_bytes = model_weights + sum(kv_cache for each request).
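A minimal Rust sketch of the `flops_for_tokens` formula. The `ModelConfig` struct, its fields, the `context_len` parameter, and the `attention_flops` helper are hypothetical, not this module's actual signatures. In particular, the attention term is an assumption: roughly 2 * context_len * attention_dim FLOPs each for QK^T and for the weighted sum over V, per layer and per token, ignoring causal masking.

```rust
/// Hypothetical model description; field names are illustrative assumptions.
struct ModelConfig {
    /// Parameters actually used per token. For an MoE model this counts only
    /// the activated experts (plus shared layers), not the total parameter count.
    active_parameters: u64,
    num_layers: u64,
    /// Assumed width of the attention projections (num_heads * head_dim).
    attention_dim: u64,
}

/// Assumed attention cost: per layer and per token, QK^T and the weighted sum
/// over V each take about 2 * context_len * attention_dim FLOPs (multiply + add).
fn attention_flops(cfg: &ModelConfig, num_tokens: u64, context_len: u64) -> u64 {
    4 * cfg.num_layers * cfg.attention_dim * context_len * num_tokens
}

/// FLOPs = 2 * num_tokens * active_parameters + attention_flops
fn flops_for_tokens(cfg: &ModelConfig, num_tokens: u64, context_len: u64) -> u64 {
    // One multiply and one add per active parameter per token for the matmuls.
    let matmul = 2 * num_tokens * cfg.active_parameters;
    matmul + attention_flops(cfg, num_tokens, context_len)
}

fn main() {
    // Illustrative MoE-style numbers: about 13B parameters active per token.
    let cfg = ModelConfig {
        active_parameters: 13_000_000_000,
        num_layers: 32,
        attention_dim: 4096,
    };
    // One new token attending over a 4096-token context.
    let flops = flops_for_tokens(&cfg, 1, 4096);
    println!("{flops}"); // prints 28147483648
}
```

With these illustrative numbers the matmul term (26.0 GFLOPs) dominates a single decode step, while the attention term (about 2.1 GFLOPs) grows linearly with context length and so matters more for long sequences.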
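The three memory-transfer functions compose as follows. This is a sketch under assumptions: the `ModelConfig` struct and its field names are hypothetical, `total_memory_transfer` is assumed to take the batch's per-request sequence lengths as a slice, and the example numbers (7B parameters in BF16, 524,288 KV-cache bytes per token) are illustrative only.

```rust
/// Hypothetical per-model constants; names are illustrative assumptions.
struct ModelConfig {
    num_parameters: u64,
    /// e.g. 2 for FP16/BF16 weights.
    bytes_per_param: u64,
    /// Assumed derivation: 2 (K and V) * layers * kv_heads * head_dim * bytes_per_elem.
    kv_cache_bytes_per_token: u64,
}

/// weight_bytes = num_parameters * bytes_per_param
fn model_weight_bytes(cfg: &ModelConfig) -> u64 {
    cfg.num_parameters * cfg.bytes_per_param
}

/// kv_bytes = kv_cache_bytes_per_token * seq_len
fn kv_cache_bytes(cfg: &ModelConfig, seq_len: u64) -> u64 {
    cfg.kv_cache_bytes_per_token * seq_len
}

/// total_bytes = model_weights + sum(kv_cache for each request).
/// The weights are counted once for the iteration; each request adds the
/// KV-cache bytes for its own sequence length.
fn total_memory_transfer(cfg: &ModelConfig, seq_lens: &[u64]) -> u64 {
    let kv: u64 = seq_lens.iter().map(|&s| kv_cache_bytes(cfg, s)).sum();
    model_weight_bytes(cfg) + kv
}

fn main() {
    let cfg = ModelConfig {
        num_parameters: 7_000_000_000,
        bytes_per_param: 2,
        // 2 * 32 layers * 32 heads * 128 head_dim * 2 bytes = 524_288
        kv_cache_bytes_per_token: 2 * 32 * 32 * 128 * 2,
    };
    // A batch of three requests at different sequence lengths.
    let total = total_memory_transfer(&cfg, &[1000, 2000, 500]);
    println!("{total}"); // prints 15835008000
}
```

Because the weight term appears once per iteration while the KV term is summed per request, adding requests to a batch amortizes the weight traffic, whereas KV traffic grows with every request's sequence length.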