crabllm-bench-0.0.12 is not a library.

crabllm-bench

Mock OpenAI-compatible backend and competitive benchmark suite for crabllm. Compares gateway overhead against LiteLLM and Bifrost using canned responses with zero LLM latency.

Quick start

Requires Linux (host networking) and pre-built prod binaries.

# Build prod binaries
cargo build --profile prod -p crabllm -p crabllm-bench

# Run the full benchmark
make bench

# Run a specific group with custom duration
make bench ARGS="--group overhead --duration 10 --rps '100 500 1000'"

# View charts from existing results
make bench-chart

# Tear down containers
cd crates/bench && docker compose down

Benchmark groups

Group	Scenarios	What it measures
`overhead`	chat-minimal, chat-stream, embeddings	Pure gateway latency at 100–5000 RPS
`payload`	large-context (50 msgs), long-stream, tools (10 defs)	Serialization overhead on big payloads
`concurrent`	100/500/1000 concurrent streams	Connection handling under load

Architecture

All services run on host networking for zero Docker NAT overhead.

Service	Port	Image
mock	9999	crabllm-bench (canned OpenAI responses)
crabllm	6666	crabllm gateway
bifrost	6668	maximhq/bifrost (Go)
litellm	4000	ghcr.io/berriai/litellm (Python)
runner	—	oha load generator + bench.py orchestrator

Each gateway gets identical resource limits (2 CPUs, 512MB).

Results

Results are saved to crates/bench/results/ as JSON files from oha. Memory usage is tracked in .mem.json sidecar files.

# Terminal charts
python3 bench.py --chart-only --output results

# PNG export (requires matplotlib)
python3 bench.py --chart-only --png --output results

Mock backend

The mock serves canned responses with configurable streaming behavior:

crabllm-bench --port 9999 --chunks 20 --delay 0 --error-rate 0