vastar 0.2.1

HTTP load generator. Fast, zero-copy, raw TCP. Alternative to hey, oha, wrk.

We built vastar out of necessity. Our team develops high-throughput systems — AI gateway services, real-time simulation engines, streaming data pipelines — and we run long-duration load tests regularly as part of our development cycle. The existing tools fell short in this workflow: hey provides no live progress indicator, so a multi-minute benchmark gives no feedback until completion; oha has a TUI, but it renders inconsistently across terminals (font issues, aggressive screen clearing, emoji fallback problems). For a 30-second or multi-minute benchmark, staring at a frozen terminal wondering whether the tool is still running is a poor experience. We needed a load generator that shows live progress cleanly (ASCII-only, terminal-aware), provides SLO-aware analysis out of the box (not just raw numbers), and performs well enough at high concurrency that the tool itself never becomes the bottleneck in our measurements.

$ vastar -n 3000 -c 300 -m POST -T "application/json" \
    -d '{"prompt":"bench"}' http://localhost:3081/api/gw/trigger

Summary:
  Total:        0.5027 secs
  Slowest:      0.0966 secs
  Fastest:      0.0012 secs
  Average:      0.0469 secs
  Requests/sec: 5968.13
  Total data:   6205800 bytes
  Size/request: 2068 bytes

Response time distribution:
  10.00% in 0.0190 secs
  25.00% in 0.0434 secs
  50.00% in 0.0488 secs  (48.81ms)
  75.00% in 0.0567 secs
  90.00% in 0.0649 secs
  95.00% in 0.0706 secs  (70.58ms)
  99.00% in 0.0804 secs  (80.42ms)
  99.90% in 0.0955 secs  (95.52ms)
  99.99% in 0.0966 secs

Response time histogram:        (11-level SLO color gradient)
  0.0012 [107]  ■■
  0.0099 [180]  ■■■
  0.0185 [170]  ■■■
  0.0272 [85]   ■
  0.0359 [350]  ■■■■■■
  0.0445 [1088] ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.0532 [637]  ■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.0619 [233]  ■■■■■■■■■■
  0.0706 [117]  ■■■■■
  0.0792 [25]   ■
  0.0879 [8]

  SLO:
  ██ elite      <=21.7ms    ██ excellent  <=43.4ms    ██ good       <=48.8ms
  ██ normal     <=56.7ms    ██ acceptable <=64.9ms    ██ degraded   <=70.6ms
  ██ slow       <=80.4ms    ██ very slow  <=121ms     ██ critical   <=161ms
  ██ severe     <=241ms     ██ violation  >241ms

Status code distribution:
  [200] 3000 responses

Details (average, fastest, slowest):
  req write:    0.0000 secs, 0.0000 secs, 0.0002 secs
  resp wait:    0.0469 secs, 0.0012 secs, 0.0966 secs
  resp read:    0.0000 secs, 0.0000 secs, 0.0000 secs

Insight:
  Latency spread p99/p50 = 1.6x -- good consistency
  Tail ratio p99/p95 = 1.1x -- clean tail
  Outlier ratio p99.9/p99 = 1.2x -- no significant outliers

(Screenshot of the output above with the SLO color gradient, dark green → red.)

Measurement Comparison

            vastar           hey    oha
Language    Rust (raw TCP)   Go     Rust (hyper)
Binary      1.2 MB           9 MB   20 MB

Throughput (0B payload, requests/sec)

Concurrency   vastar    hey       oha       Factor
c=1            93,192    40,942    71,750   vastar 2.3x vs hey
c=10          226,650   145,607   320,712   oha 1.4x vs vastar
c=200         220,311   132,650   240,421   oha 1.1x vs vastar
c=500         408,117    71,695   234,676   vastar 1.7x vs oha
c=1,000       536,758   106,861    18,329   vastar 5.0x vs hey
c=5,000       372,191    64,443    14,196   vastar 5.8x vs hey
c=10,000      336,700    61,427    38,141   vastar 5.5x vs hey

Throughput (100KB payload, requests/sec)

Concurrency   vastar   hey      oha       Factor
c=1           40,406   20,927    30,982   vastar 1.9x vs hey
c=10          89,683   74,310   133,387   oha 1.5x vs vastar
c=100         74,229   70,411    81,466   oha 1.1x vs vastar
c=200         69,080   74,962    68,332   hey 1.1x vs vastar
c=500         96,809   65,999    56,798   vastar 1.5x vs hey
c=1,000       89,545   57,265    18,017   vastar 1.6x vs hey
c=10,000      75,224   38,281    23,113   vastar 2.0x vs hey

Memory (0B payload, Peak RSS)

Concurrency   vastar   hey      oha
c=1             4 MB    13 MB    15 MB
c=1,000        32 MB    80 MB    41 MB
c=10,000      284 MB   492 MB   212 MB

Note: Each tool has different strengths. oha (hyper) excels at c=10-200 with large payloads. hey (Go) is stable across all scenarios. vastar (raw TCP) excels at c=1 and c=500+ where framework overhead matters most. No single tool wins every scenario — choose based on your concurrency range and payload size.

Throughput alone does not determine accuracy. A faster tool may simply have less per-request overhead, while server-side latency (resp wait) remains the same across all tools. See BENCHMARK.md for full methodology, all 4 payload sizes, and analysis.

Features

  • vastar sweep subcommand — adaptive concurrency sweet-spot finder. Runs multi-point sweep, detects the knee of the throughput-vs-latency curve, disqualifies points with unhealthy tails or excessive errors, and emits both a pretty table and machine-readable JSON for CI/driver consumption. Paired mode (--vs REFERENCE_URL) measures platform overhead vs an upstream and picks the c where the gateway/proxy/mesh is still transparent. Domain-agnostic: no workload classifications, no CPU-core heuristics — the curve is measured, not guessed.
  • 11-level SLO color histogram — ANSI 256-color gradient (dark green to dark red) mapped to percentile thresholds. SLO levels are relative to the current run's own distribution — not absolute latency targets. Organizations like Google, AWS, and others define custom SLO policies per service (e.g., p99 < 200ms for API, p99 < 50ms for cache). Use vastar's SLO as a visual distribution guide, then define your own thresholds in your monitoring platform.
  • Automated Insight — latency spread, tail ratio, outlier detection from p50/p95/p99/p99.9
  • Key percentile highlights — p50, p95, p99, p99.9 annotated with colored (ms) values
  • Phase timing Details — req write, resp wait, resp read breakdown (like hey)
  • Live progress bar — ASCII-only, terminal-aware, no emoji, no aggressive clear screen
  • Chunked transfer — supports Content-Length and Transfer-Encoding: chunked (SSE/streaming)
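The chunked-transfer feature above refers to standard HTTP/1.1 chunked framing: repeated "<hex size>\r\n<bytes>\r\n" frames terminated by a zero-size chunk. A minimal std-only decoder for that framing might look like this (a sketch, not vastar's actual code; chunk extensions and trailers are ignored):

```rust
/// Decode a Transfer-Encoding: chunked body into the raw payload bytes.
fn decode_chunked(mut buf: &[u8]) -> Option<Vec<u8>> {
    let mut body = Vec::new();
    loop {
        // Size line: hex chunk length terminated by CRLF.
        let line_end = buf.windows(2).position(|w| w == b"\r\n")?;
        let size_str = std::str::from_utf8(&buf[..line_end]).ok()?;
        let size = usize::from_str_radix(size_str.trim(), 16).ok()?;
        buf = &buf[line_end + 2..];
        if size == 0 {
            return Some(body); // terminal zero-size chunk
        }
        body.extend_from_slice(buf.get(..size)?);
        buf = buf.get(size + 2..)?; // skip chunk data plus its trailing CRLF
    }
}

fn main() {
    let wire = b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n";
    assert_eq!(decode_chunked(wire).unwrap(), b"hello world");
}
```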

Install

# One-line install (Linux/macOS, auto-detects platform)
curl -sSf https://raw.githubusercontent.com/Vastar-AI/vastar/main/install.sh | sh

# Or via Cargo
cargo install vastar

# Or via Homebrew
brew tap Vastar-AI/tap && brew install vastar

Or build from source:

git clone https://github.com/Vastar-AI/vastar.git
cd vastar
cargo build --release
# Binary at ./target/release/vastar (1.2 MB)

Usage

Usage: vastar [OPTIONS] <URL>            # flat bench mode (default)
       vastar <COMMAND> [OPTIONS]        # subcommand mode

Commands:
  sweep  Adaptive concurrency sweep — finds the sweet-spot `c` empirically
  help   Print this message or the help of the given subcommand(s)

Options:
  -n <REQUESTS>              Number of requests [default: 200]
  -c <CONCURRENCY>           Concurrent workers [default: 50]
  -z <DURATION>              Duration (e.g. 10s, 1m). Overrides -n
  -q <QPS>                   Rate limit per worker [default: 0]
  -m <METHOD>                HTTP method [default: GET]
  -d <BODY>                  Request body
  -D <BODY_FILE>             Request body from file
  -T <CONTENT_TYPE>          Content-Type [default: text/html]
  -H <HEADER>                Custom header (repeatable)
  -t <TIMEOUT>               Timeout in seconds [default: 20]
  -A <ACCEPT>                Accept header
  -a <AUTH>                  Basic auth (user:pass)
      --disable-keepalive    Disable keep-alive
      --disable-compression  Disable compression
      --disable-redirects    Disable redirects
  -h, --help                 Print help
  -V, --version              Print version

Examples

# Simple GET
vastar http://localhost:8080/

# POST with JSON body
vastar -m POST -T "application/json" -d '{"key":"value"}' http://localhost:8080/api

# 10000 requests, 500 concurrent
vastar -n 10000 -c 500 http://localhost:8080/

# Duration mode: run for 30 seconds
vastar -z 30s -c 200 http://localhost:8080/

# Custom headers
vastar -H "Authorization: Bearer token" -H "X-Custom: value" http://localhost:8080/

# Basic auth
vastar -a user:pass http://localhost:8080/

# Body from file
vastar -m POST -T "application/json" -D payload.json http://localhost:8080/api

vastar sweep — Adaptive Concurrency Sweet-Spot Finder

Finding the right -c for a given endpoint is usually an operator guess. Too low and throughput is understated; too high and queueing explodes the tail without warning. vastar sweep runs multiple concurrency levels against the same URL, detects the knee of the throughput-vs-latency curve, and picks the smallest c that still delivers near-peak throughput with a healthy tail.

# Auto-sweep (log-spaced 10..1000), pick knee at 95% of peak
vastar sweep -n 2000 --repeats 3 http://localhost:8080/api

# Explicit concurrency list
vastar sweep --conc "10,50,100,500,1000" -n 2000 http://localhost:8080/api

# Log-spaced range
vastar sweep --conc "10..1000:log=6" -n 2000 http://localhost:8080/api

# Refine: coarse sweep, then bracket ±50% around winner for finer resolution
vastar sweep --conc auto --refine -n 2000 http://localhost:8080/api

# Emit JSON for script consumption (downstream bench drivers, CI gates)
vastar sweep -o json --conc auto -n 2000 http://localhost:8080/api \
  > /tmp/sweep.json

BENCH_C=$(jq -r .sweet_spot.concurrency /tmp/sweep.json)
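The knee rule described above reduces to picking the smallest c whose throughput reaches the --knee-ratio fraction of the observed peak. A minimal sketch of that selection (tail/error disqualification omitted; the function name is illustrative, not vastar's internal API):

```rust
/// Sweet-spot pick: smallest concurrency whose rps reaches knee_ratio of
/// the peak. `points` are (concurrency, requests_per_sec) pairs.
fn pick_knee(points: &[(u32, f64)], knee_ratio: f64) -> Option<u32> {
    let peak = points.iter().map(|&(_, rps)| rps).fold(f64::MIN, f64::max);
    points
        .iter()
        .filter(|&&(_, rps)| rps >= knee_ratio * peak)
        .map(|&(c, _)| c)
        .min()
}

fn main() {
    // Hypothetical sweep where throughput plateaus after c=100:
    let sweep = [(10, 40_000.0), (50, 90_000.0), (100, 118_000.0), (500, 121_000.0)];
    assert_eq!(pick_knee(&sweep, 0.95), Some(100));
}
```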

Paired mode — platform overhead vs upstream

For gateways, proxies, meshes, or any service that fronts an upstream, a single-endpoint sweep can be misleading — the target's curve can look healthy even when your platform is the bottleneck, simply because the upstream is doing the heavy lifting. Paired mode runs both endpoints at each concurrency and picks the sweet spot where the platform still stays close to the upstream's own performance:

# --vs gives the reference (upstream); the final URL is the target (gateway).
# --max-overhead-pct 25 disqualifies a c-level when the target p99 is >25%
# over the reference. (Comments must not follow a trailing backslash, or the
# shell truncates the command there.)
vastar sweep \
  --vs http://localhost:4545/v1/chat/completions \
  --max-overhead-pct 25 \
  -m POST -T application/json \
  -d '{"prompt":"bench"}' \
  http://localhost:3080/trigger
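The --max-overhead-pct gate can be read as a simple relative-latency check. A sketch under that assumption (a simplified reading; the real check also covers throughput via --max-rps-deficit-pct):

```rust
/// Percentage by which the target's p99 exceeds the reference's p99.
/// A concurrency level would be disqualified when this exceeds the
/// --max-overhead-pct threshold.
fn overhead_pct(target_p99: f64, reference_p99: f64) -> f64 {
    (target_p99 - reference_p99) / reference_p99 * 100.0
}

fn main() {
    // Gateway adds 3ms over a 20ms upstream p99: 15% overhead, within the
    // default 25% gate. Adding 8ms instead gives 40%: disqualified.
    assert!(overhead_pct(0.023, 0.020) < 25.0);
    assert!(overhead_pct(0.028, 0.020) > 25.0);
}
```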

For sweeping many gateway endpoints against the same upstream, cache the reference once and reuse:

# Once: cache upstream curve
vastar sweep -o json --conc auto -n 2000 --repeats 3 \
  -m POST -T application/json -d '{"prompt":"bench"}' \
  http://localhost:4545/v1/chat/completions > /tmp/upstream.json

# Many times: reuse for each gateway test, no re-sweep
vastar sweep --ref-from-json /tmp/upstream.json --max-overhead-pct 20 \
  ... http://localhost:3080/trigger

Sweep flags

vastar sweep [OPTIONS] <URL>

  --conc <SPEC>              "10,50,100,500" | "10..1000:log=6" | "10..200:step=20" | "auto"
  --refine                   Bracket ±50% around coarse winner for finer resolution
  --repeats <N>              Run each c-level N times, take median [default: 1]

  --pick <knee|score>        Selection algorithm [default: knee]
  --knee-ratio <0.95>        Smallest c reaching this fraction of peak rps
  --baseline-c <1>           Concurrency used as reference for tail-degradation check

  --max-spread <4.0>         DQ if p99/p50 > this
  --max-p999-ratio <8.0>     DQ if p99.9/p50 > this
  --max-tail-mult <3.0>      DQ if p99 > baseline_p99 × this
  --max-errors <0.01>        DQ if error_rate > this (fraction)

  --vs <URL>                 Reference endpoint (paired mode)
  --vs-method / --vs-body / --vs-content-type   Override per-reference
  --ref-from-json <FILE>     Load reference curve from prior sweep JSON
  --max-overhead-pct <25>    DQ if target overhead vs reference > this%
  --max-rps-deficit-pct <50> DQ if target rps deficit vs reference > this%

  -o <text|json|ndjson>      Output format
  --json-path <FILE>         Also write JSON to file

  # Pass-through to each sub-benchmark (same as flat vastar):
  -n / -z / -m / -d / -D / -T / -H / -A / -a / -t / --disable-keepalive / --disable-compression

Architecture

How it works

  1. Pre-connect phase. All C connections are established in parallel before the benchmark starts. A semaphore limits to 256 concurrent connects to avoid TCP backlog overflow.

  2. Adaptive worker topology. Workers scale as clamp(ceil(C/128), 1, cpus*2). At c=50 that's 1 worker. At c=500 that's 4 workers. At c=5000 that's 40 workers, capped at cpus*2 (32 on a 16-CPU machine). Each worker runs a FuturesUnordered event loop managing ~128 connections. The tokio scheduler sees N workers (not C tasks) — drastically less scheduling overhead at high concurrency.

  3. Raw TCP request. HTTP/1.1 request bytes are pre-built once at startup (method + path + headers + body). Each request is a single write_all() of pre-built Bytes (Arc-backed, zero-copy clone). No per-request allocation, no header map construction, no URI parsing.

  4. Synchronous response parsing. One fill_buf() call gets data into BufReader's 32KB buffer. Headers are parsed synchronously from buffered data (find \r\n\r\n, scan for Content-Length/Transfer-Encoding). Body is drained via fill_buf() + consume() — no per-response allocation. Chunked transfer encoding is handled inline.

  5. Phase timing. Each request measures write time, wait-for-first-byte time, and read time separately. These are accumulated per-worker (sum/min/max) and merged once at the end — no per-request timing allocation.

  6. SLO color histogram. 11 histogram buckets mapped to 11 ANSI 256-color levels via the color cube path: (0,1,0)→(0,4,0)→(1,5,0)→(5,5,0)→(5,1,0)→(4,0,0)→(2,0,0) (dark green through yellow to dark red). Color only emitted when stdout is a terminal.
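The worker-count formula from step 2 can be sketched as follows (ceiling division is assumed, which matches the worked examples; 16 logical CPUs assumed in the checks):

```rust
/// Worker tasks for a concurrency level, per step 2:
/// clamp(ceil(C / 128), 1, cpus * 2).
fn worker_count(concurrency: usize, cpus: usize) -> usize {
    // (c + 127) / 128 is ceil(c / 128) in integer arithmetic.
    ((concurrency + 127) / 128).clamp(1, cpus * 2)
}

fn main() {
    assert_eq!(worker_count(50, 16), 1);      // one worker handles c=50
    assert_eq!(worker_count(500, 16), 4);     // ceil(500/128) = 4
    assert_eq!(worker_count(5_000, 16), 32);  // 40 workers, capped at cpus*2
}
```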
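Step 3's pre-built request can be illustrated with std types. Here Arc<[u8]> stands in for the bytes crate's Bytes (cloning is a pointer copy, so every worker shares one allocation); the function name and header set are illustrative, not vastar's actual builder:

```rust
use std::sync::Arc;

/// Build the full HTTP/1.1 request bytes once at startup.
fn prebuild_request(method: &str, host: &str, path: &str, body: &[u8]) -> Arc<[u8]> {
    let mut buf = Vec::new();
    buf.extend_from_slice(
        format!(
            "{method} {path} HTTP/1.1\r\nHost: {host}\r\nContent-Length: {}\r\n\r\n",
            body.len()
        )
        .as_bytes(),
    );
    buf.extend_from_slice(body);
    buf.into()
}

fn main() {
    let req = prebuild_request("POST", "localhost:8080", "/api", br#"{"prompt":"bench"}"#);
    let per_worker = Arc::clone(&req); // zero-copy clone, no reallocation
    assert!(per_worker.starts_with(b"POST /api HTTP/1.1\r\n"));
}
```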
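The header scan from step 4 can be sketched like this (a simplified, allocation-light sketch: chunked bodies, keep-alive, and partial reads are not handled here):

```rust
/// Parse the status code and Content-Length out of a buffered response head:
/// locate the \r\n\r\n terminator, then scan header lines.
fn parse_head(buf: &[u8]) -> Option<(u16, usize)> {
    let end = buf.windows(4).position(|w| w == b"\r\n\r\n")?;
    let head = std::str::from_utf8(&buf[..end]).ok()?;
    let mut lines = head.split("\r\n");
    // Status line looks like "HTTP/1.1 200 OK".
    let status: u16 = lines.next()?.split_whitespace().nth(1)?.parse().ok()?;
    let mut content_length = 0usize;
    for line in lines {
        if let Some((name, value)) = line.split_once(':') {
            if name.eq_ignore_ascii_case("content-length") {
                content_length = value.trim().parse().ok()?;
            }
        }
    }
    Some((status, content_length))
}

fn main() {
    let raw = b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello";
    assert_eq!(parse_head(raw), Some((200, 5)));
}
```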
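Step 5's per-worker accumulation can be sketched as a small sum/min/max struct that is merged once at the end (type and field names are illustrative):

```rust
/// Per-worker accumulator for one request phase (write / wait / read).
#[derive(Clone, Copy)]
struct PhaseStats {
    sum: f64,
    min: f64,
    max: f64,
    count: u64,
}

impl PhaseStats {
    fn new() -> Self {
        PhaseStats { sum: 0.0, min: f64::INFINITY, max: 0.0, count: 0 }
    }
    /// Record one sample in place: no per-request allocation.
    fn record(&mut self, secs: f64) {
        self.sum += secs;
        self.min = self.min.min(secs);
        self.max = self.max.max(secs);
        self.count += 1;
    }
    /// Fold another worker's accumulator into this one at the end of the run.
    fn merge(&mut self, other: &PhaseStats) {
        self.sum += other.sum;
        self.min = self.min.min(other.min);
        self.max = self.max.max(other.max);
        self.count += other.count;
    }
    fn average(&self) -> f64 {
        if self.count == 0 { 0.0 } else { self.sum / self.count as f64 }
    }
}

fn main() {
    let mut worker_a = PhaseStats::new();
    let mut worker_b = PhaseStats::new();
    worker_a.record(0.010);
    worker_a.record(0.030);
    worker_b.record(0.020);
    worker_a.merge(&worker_b);
    assert_eq!(worker_a.count, 3);
    assert!((worker_a.average() - 0.020).abs() < 1e-9);
}
```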
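The color-cube mapping in step 6 uses the standard xterm 256-color layout, where cube coordinates r, g, b in 0..=5 map to palette index 16 + 36*r + 6*g + b:

```rust
/// xterm 256-color cube index for cube coordinates r, g, b in 0..=5.
/// Step 6's gradient walks this cube from dark green (0,1,0) through
/// yellow (5,5,0) to dark red (2,0,0).
fn cube_index(r: u8, g: u8, b: u8) -> u8 {
    16 + 36 * r + 6 * g + b
}

fn main() {
    assert_eq!(cube_index(0, 1, 0), 22);  // dark green, start of the path
    assert_eq!(cube_index(5, 5, 0), 226); // yellow midpoint
    assert_eq!(cube_index(2, 0, 0), 88);  // dark red, end of the path
    // A bar is then colored with e.g. "\x1b[38;5;22m", emitted only when
    // stdout is a terminal.
}
```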

Dependencies

tokio         — async runtime
bytes         — zero-copy buffers
clap          — CLI parsing
futures-util  — FuturesUnordered for connection multiplexing

4 crates. No HTTP framework. No TUI framework.

Roadmap

vastar is currently an HTTP/1.1 load generator. The roadmap expands it into a multi-protocol load generator for high-throughput systems.

  0. Concurrency sweep [shipped v0.2.0]: vastar sweep — adaptive sweet-spot finder, knee detection, paired-mode overhead comparison, JSON/NDJSON output
  1. HTTP parity [planned]: HTTPS/TLS, HTTP/2, proxy, redirects, JSON/CSV output, coordinated omission correction
  2. HTTPS + HTTP/2 [planned]: rustls, h2 crate, ALPN negotiation, mTLS
  3. Multi-protocol [planned]: gRPC, WebSocket, QUIC/HTTP/3, MQTT, NATS, Kafka, AMQP, RSocket, GraphQL, raw TCP/UDP. SSE already supported.
  4. Advanced analysis [planned]: coordinated omission correction, comparative mode, distributed load gen, custom SLO, CI/CD gates
  5. Ecosystem [planned]: vastar-cloud, HTML report generator, GitHub Action, IDE plugin
  6. AI Engineering [planned]: vastar ai-bench — TTFT, TPS, inter-token latency, cost estimation, multi-model, prompt sweep, guardrail overhead
  7. Data layer [planned]: vastar sql (Postgres/MySQL), vastar redis, vastar vector (Qdrant/Milvus), vastar tsdb, vastar search, vastar graph
  8. Storage & cache [planned]: vastar s3 (MinIO/S3), vastar cache, distributed FS
  9. Infrastructure [planned]: vastar dns, vastar gateway, vastar mesh, vastar serverless, vastar edge, vastar lb
  10. Emerging [planned]: vastar blockchain, vastar realtime, vastar wasm, vastar ml, vastar media, vastar audio

30+ subcommands planned, all sharing the same core engine. See ROADMAP.md for full details, including a feature-by-feature comparison with hey and oha.

Known Issues

See ROADMAP.md for known bugs and workarounds.

License

MIT OR Apache-2.0