Bench command - Throughput and latency benchmarking
Modes:

    ferrum bench qwen3:4b                                  # default: sequential, 5 rounds
    ferrum bench qwen3:4b --concurrency 4                  # concurrent requests (tests batch decode)
    ferrum bench qwen3:4b --max-tokens 1024                # long decode (tests flash decode)
    ferrum bench qwen3:4b --long-context                   # 2k prompt + 256 decode
    ferrum bench qwen3:4b --concurrency 8 --max-tokens 64  # throughput stress test
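For readers who want to interpret the two headline numbers, here is a minimal sketch of how throughput and latency could be summarized over the rounds of a sequential run. It is not taken from ferrum's source: the `Round` struct and both function names are hypothetical, and the sketch assumes each round records the number of generated tokens and its wall-clock time. For concurrent runs, total wall-clock time would have to be measured across the whole batch instead of summed per request.

```rust
use std::time::Duration;

/// One benchmark round (hypothetical type, not ferrum's API):
/// tokens generated and wall-clock time taken.
struct Round {
    tokens: u64,
    elapsed: Duration,
}

/// Aggregate throughput in tokens per second, assuming rounds ran
/// one after another so their elapsed times can be summed.
fn throughput_tok_per_sec(rounds: &[Round]) -> f64 {
    let tokens: u64 = rounds.iter().map(|r| r.tokens).sum();
    let secs: f64 = rounds.iter().map(|r| r.elapsed.as_secs_f64()).sum();
    if secs == 0.0 {
        0.0
    } else {
        tokens as f64 / secs
    }
}

/// Nearest-rank percentile (p in 0..=100) of per-round latency.
/// Returns None when there are no rounds.
fn latency_percentile(rounds: &[Round], p: f64) -> Option<Duration> {
    if rounds.is_empty() {
        return None;
    }
    let mut sorted: Vec<Duration> = rounds.iter().map(|r| r.elapsed).collect();
    sorted.sort();
    // Nearest rank: ceil(p/100 * n), clamped to 1..=n, then made zero-based.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    let idx = rank.clamp(1, sorted.len()) - 1;
    Some(sorted[idx])
}
```

With five rounds, as in the default mode, the nearest-rank median is the third-fastest round, so a single slow outlier shifts the aggregate throughput but not the p50 latency.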