Module bench

Bench command - Throughput and latency benchmarking

Modes:

    ferrum bench qwen3:4b                                   # default: sequential, 5 rounds
    ferrum bench qwen3:4b --concurrency 4                   # concurrent requests (tests batch decode)
    ferrum bench qwen3:4b --max-tokens 1024                 # long decode (tests flash decode)
    ferrum bench qwen3:4b --long-context                    # 2k prompt + 256 decode
    ferrum bench qwen3:4b --concurrency 8 --max-tokens 64   # throughput stress test
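The default mode is described as sequential with 5 rounds, reporting throughput and latency. As a rough illustration of what such a measurement can look like, here is a minimal sketch in Rust. It is not taken from this module's source: `BenchSummary`, `run_sequential`, `summarize`, and the `generate` closure are hypothetical names introduced for the example, and the real `BenchCommand` and `execute` may be organised quite differently.

```rust
use std::time::{Duration, Instant};

/// Aggregate numbers for one benchmark run.
/// Hypothetical type for illustration; not part of this module's API.
#[derive(Debug, Clone, PartialEq)]
pub struct BenchSummary {
    pub rounds: usize,
    pub total_tokens: usize,
    pub tokens_per_sec: f64,
    pub mean_latency: Duration,
    pub max_latency: Duration,
}

/// Call `generate` once per round, one after another, and record how long
/// each call took together with the token count it returned.
/// `generate` stands in for a single inference request (an assumption).
pub fn run_sequential<F>(rounds: usize, mut generate: F) -> Vec<(Duration, usize)>
where
    F: FnMut() -> usize,
{
    (0..rounds)
        .map(|_| {
            let start = Instant::now();
            let tokens = generate();
            (start.elapsed(), tokens)
        })
        .collect()
}

/// Reduce per-round samples to throughput and latency figures.
/// Returns `None` when there are no samples.
pub fn summarize(samples: &[(Duration, usize)]) -> Option<BenchSummary> {
    if samples.is_empty() {
        return None;
    }
    let total_time: Duration = samples.iter().map(|(d, _)| *d).sum();
    let total_tokens: usize = samples.iter().map(|(_, t)| *t).sum();
    let max_latency = samples.iter().map(|(d, _)| *d).max()?;
    let secs = total_time.as_secs_f64();
    // Guard against a zero total duration to avoid dividing by zero.
    let tokens_per_sec = if secs > 0.0 {
        total_tokens as f64 / secs
    } else {
        0.0
    };
    Some(BenchSummary {
        rounds: samples.len(),
        total_tokens,
        tokens_per_sec,
        mean_latency: total_time / samples.len() as u32,
        max_latency,
    })
}
```

A caller could pass the result of `run_sequential(5, ...)` straight to `summarize`. Note that dividing total tokens by the sum of per-request latencies only makes sense for sequential runs; a concurrent mode would presumably divide by wall-clock time instead, since requests overlap.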

Structs

BenchCommand

Functions

execute