Calculate the compute-bound threshold (number of tokens per forward pass above which inference is compute-bound rather than memory-bandwidth-bound)
Formula: threshold = (bytes_per_param * compute_flops) / memory_bandwidth
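As a minimal sketch, the threshold formula can be written as a small Python function. The function name, the unit conventions (compute_flops in FLOP/s, memory_bandwidth in bytes/s), and the hardware numbers are assumptions for illustration, not taken from the original code. The sketch keeps the formula exactly as stated. Under a simple roofline view in which each weight is read once per forward pass and costs 2 FLOPs per token, the crossover would fall at half this value.

```python
def compute_bound_threshold(
    bytes_per_param: float,
    compute_flops: float,
    memory_bandwidth: float,
) -> float:
    """Return the token count at which inference becomes compute-bound.

    Assumes compute_flops is peak FLOP/s and memory_bandwidth is bytes/s.
    """
    if memory_bandwidth <= 0:
        raise ValueError("memory_bandwidth must be positive")
    return (bytes_per_param * compute_flops) / memory_bandwidth


# Hypothetical accelerator: 312 TFLOP/s peak, 1.5 TB/s bandwidth, 16-bit weights.
threshold = compute_bound_threshold(
    bytes_per_param=2,
    compute_flops=312e12,
    memory_bandwidth=1.5e12,
)
print(f"compute-bound above roughly {threshold:.0f} tokens")
```

Halving bytes_per_param (for example, 8-bit weights) halves the threshold, since less memory traffic is needed per parameter.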
Calculate the FLOPs (floating-point operations) required to process a given number of tokens
Formula: FLOPs = 2 * num_tokens * active_parameters + attention_flops
For MoE models, uses active_parameters (not total parameters), since only a subset of experts is activated for each token
Includes both matmul and attention FLOPs
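The calculation above can be sketched in Python as follows. The function name and all model sizes are hypothetical, and attention_flops is passed in as a precomputed value because the text does not say how it is derived.

```python
def calculate_flops(
    num_tokens: int,
    active_parameters: float,
    attention_flops: float = 0.0,
) -> float:
    """Return total FLOPs: matmul FLOPs plus attention FLOPs."""
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    # Each active parameter contributes one multiply and one add per token.
    matmul_flops = 2 * num_tokens * active_parameters
    return matmul_flops + attention_flops


# Hypothetical MoE model: 47B total parameters, 13B active per token.
total_parameters = 47e9
active_parameters = 13e9
attention_flops = 1e12  # placeholder value, not derived from a real model

moe_flops = calculate_flops(1024, active_parameters, attention_flops)
# For comparison only: the count if every expert were active.
dense_equivalent = calculate_flops(1024, total_parameters, attention_flops)
print(f"active: {moe_flops:.3e} FLOPs, all experts: {dense_equivalent:.3e} FLOPs")
```

Using total parameters here would overstate the compute cost by roughly the ratio of total to active parameters, which is why the formula is based on active_parameters.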