# vastar

HTTP load generator. Fast, zero-copy, Rust. Alternative to hey, oha, wrk.
```
$ vastar -n 3000 -c 500 -m POST -T "application/json" -d '{"prompt":"bench"}' http://localhost:4545/v1/chat/completions

Summary:
  Total:        0.3251 secs
  Slowest:      0.0978 secs
  Fastest:      0.0016 secs
  Average:      0.0474 secs
  Requests/sec: 9229.24

  Total data:   174078167 bytes
  Size/request: 58026 bytes

Response time distribution:
  10.00% in 0.0243 secs
  25.00% in 0.0327 secs
  50.00% in 0.0461 secs (46.06ms)
  75.00% in 0.0627 secs
  90.00% in 0.0741 secs
  95.00% in 0.0811 secs (81.09ms)
  99.00% in 0.0892 secs (89.22ms)
  99.90% in 0.0951 secs (95.07ms)
  99.99% in 0.0978 secs

Insight:
  Latency spread p99/p50   = 1.9x -- good consistency
  Tail ratio     p99/p95   = 1.1x -- clean tail
  Outlier ratio  p99.9/p99 = 1.1x -- no significant outliers

Response time histogram: (11-level SLO color gradient)
  0.0016 [42]  ███
  0.0103 [82]  ██████
  0.0191 [361] ███████████████████████████
  0.0278 [630] ████████████████████████████████████████████████
  0.0366 [364] ███████████████████████████
  0.0453 [362] ███████████████████████████
  0.0541 [415] ██████████████████████████████
  0.0628 [364] ███████████████████████████
  0.0716 [213] ████████████████
  0.0803 [137] ██████████
  0.0891 [30]  ██

Status code distribution:
  [200] 3000 responses

Details (average, fastest, slowest):
  req write: 0.0000 secs, 0.0000 secs, 0.0014 secs
  resp wait: 0.0226 secs, 0.0007 secs, 0.0482 secs
  resp read: 0.0248 secs, 0.0008 secs, 0.0583 secs
```
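The Insight lines are plain ratios of the percentiles printed above. A minimal sketch of that derivation (the function name is illustrative, not part of vastar):

```rust
// Compute the three Insight ratios from measured percentiles (in seconds).
// Illustrative sketch only; vastar derives these internally.
fn insight(p50: f64, p95: f64, p99: f64, p999: f64) -> (f64, f64, f64) {
    let spread = p99 / p50;    // latency spread: tail vs. median
    let tail = p99 / p95;      // tail ratio: sharpness of the p95..p99 tail
    let outliers = p999 / p99; // outlier ratio: extreme tail vs. p99
    (spread, tail, outliers)
}

fn main() {
    // Percentiles from the sample run above.
    let (spread, tail, outliers) = insight(0.0461, 0.0811, 0.0892, 0.0951);
    println!("p99/p50 = {spread:.1}x, p99/p95 = {tail:.1}x, p99.9/p99 = {outliers:.1}x");
}
```

Run against the sample percentiles, this reproduces the same 1.9x / 1.1x / 1.1x figures shown in the Insight block.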
## Why vastar

| | vastar | hey | oha |
|---|---|---|---|
| Language | Rust (raw TCP) | Go | Rust (hyper) |
| Binary | 1.2 MB | 9 MB | 20 MB |
| RPS c=1 | 91K | 41K | 72K |
| RPS c=500 | 415K | 114K | 219K |
| RPS c=1000 | 414K | 65K | 18K |
| RPS c=5000 | 354K | 53K | 23K |
| Memory c=1000 | 32 MB | 78 MB | 42 MB |
At high concurrency (c=500+), vastar is 3-8x faster than hey while using 2-4x less memory.
See BENCHMARK.md for full comparison across 10 concurrency levels and 4 payload sizes.
## Features
- 11-level SLO color histogram — ANSI 256-color gradient (dark green to dark red) mapped to percentile thresholds
- Automated Insight — latency spread, tail ratio, outlier detection from p50/p95/p99/p99.9
- Key percentile highlights — p50, p95, p99, p99.9 annotated with colored (ms) values
- Phase timing Details — req write, resp wait, resp read breakdown (like hey)
- Live progress bar — ASCII-only, terminal-aware, no emoji, no aggressive clear screen
- Chunked transfer — supports Content-Length and Transfer-Encoding: chunked (SSE/streaming)
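The chunked-transfer support can be sketched as a tiny decoder over a complete buffered body. This is illustrative only (vastar itself drains incrementally via `fill_buf()`/`consume()` rather than over a full slice):

```rust
// Minimal chunked-transfer body drain (illustrative sketch, not vastar's
// actual parser): read the hex "size\r\n" line, skip that many data bytes
// plus the trailing CRLF, and repeat until the zero-size terminator chunk.
// Assumes the whole body is already buffered and well-formed.
fn drain_chunked(mut body: &[u8]) -> usize {
    let mut total = 0;
    loop {
        // Locate the CRLF that ends the chunk-size line.
        let eol = body.windows(2).position(|w| w == b"\r\n").unwrap();
        let size =
            usize::from_str_radix(std::str::from_utf8(&body[..eol]).unwrap(), 16).unwrap();
        body = &body[eol + 2..];
        if size == 0 {
            break; // terminator chunk: body fully drained
        }
        total += size;
        body = &body[size + 2..]; // skip chunk data and its trailing CRLF
    }
    total
}

fn main() {
    // Two chunks ("hello" = 5 bytes, "world!" = 6 bytes), then the terminator.
    let wire = b"5\r\nhello\r\n6\r\nworld!\r\n0\r\n\r\n";
    assert_eq!(drain_chunked(wire), 11);
}
```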
## Install

Build from source:

```sh
cargo build --release
# Binary at ./target/release/vastar (1.2 MB)
```
## Usage

```
Usage: vastar [OPTIONS] <URL>

Options:
  -n <REQUESTS>            Number of requests [default: 200]
  -c <CONCURRENCY>         Concurrent workers [default: 50]
  -z <DURATION>            Duration (e.g. 10s, 1m). Overrides -n
  -q <QPS>                 Rate limit per worker [default: 0]
  -m <METHOD>              HTTP method [default: GET]
  -d <BODY>                Request body
  -D <BODY_FILE>           Request body from file
  -T <CONTENT_TYPE>        Content-Type [default: text/html]
  -H <HEADER>              Custom header (repeatable)
  -t <TIMEOUT>             Timeout in seconds [default: 20]
  -A <ACCEPT>              Accept header
  -a <AUTH>                Basic auth (user:pass)
      --disable-keepalive    Disable keep-alive
      --disable-compression  Disable compression
      --disable-redirects    Disable redirects
  -h, --help               Print help
  -V, --version            Print version
```
Examples
# Simple GET
# POST with JSON body
# 10000 requests, 500 concurrent
# Duration mode: run for 30 seconds
# Custom headers
# Basic auth
# Body from file
## Architecture

```
          CLI (clap)
              |
       +------+------+
       | Coordinator |
       +------+------+
              |
   Phase 0: Pre-connect (semaphore, max 256 concurrent)
              |
   Phase 1: Distribute connections to workers
              |
   +----------+---------+---------+
   |          |         |         |
Worker 0   Worker 1    ...    Worker N   <- clamp(C/128, 1, cpus*2)
   |
+--+--+
|  |  |
Conn Conn Conn       <- FuturesUnordered event loop
      |                 per worker (~128 conns each)
Raw TCP streams      <- hand-crafted HTTP/1.1
      |
BufReader 32KB       <- synchronous header parse
      |                 fill_buf() + consume() body drain
      |
AtomicU64 progress   <- lock-free, 10 FPS render
```
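The lock-free progress counter at the bottom of the diagram can be modeled in miniature, with plain OS threads standing in for tokio worker tasks (a simplified sketch, not vastar's code):

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::time::Duration;

// Workers bump a shared AtomicU64; a render loop samples it at ~10 FPS.
// Sketch only: threads stand in for tokio workers, and the function name
// is illustrative.
fn run_and_count(workers: u64, per_worker: u64) -> u64 {
    let done = Arc::new(AtomicU64::new(0));
    let total = workers * per_worker;

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let done = Arc::clone(&done);
            std::thread::spawn(move || {
                for _ in 0..per_worker {
                    // Relaxed is enough: the counter is monotonic and only
                    // read for display, never for synchronization.
                    done.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();

    // Render loop: sample and redraw every 100 ms (10 FPS), no locks held.
    while done.load(Ordering::Relaxed) < total {
        eprint!("\r{}/{}", done.load(Ordering::Relaxed), total);
        std::thread::sleep(Duration::from_millis(100));
    }
    for h in handles {
        h.join().unwrap();
    }
    done.load(Ordering::Relaxed)
}

fn main() {
    assert_eq!(run_and_count(4, 750), 3000);
}
```

Because the workers never block on the reader and the reader only samples, progress rendering adds no contention to the hot path.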
## How it works

- **Pre-connect phase.** All C connections are established in parallel before the benchmark starts. A semaphore limits connects to 256 concurrent to avoid TCP backlog overflow.
- **Adaptive worker topology.** Workers scale as `clamp(C/128, 1, cpus*2)`. At c=50 that's 1 worker; at c=500, 4 workers; at c=5000, 32 workers (capped at cpus*2). Each worker runs a `FuturesUnordered` event loop managing ~128 connections. The tokio scheduler sees N workers (not C tasks) — drastically less scheduling overhead at high concurrency.
- **Raw TCP request.** HTTP/1.1 request bytes are pre-built once at startup (method + path + headers + body). Each request is a single `write_all()` of pre-built `Bytes` (Arc-backed, zero-copy clone). No per-request allocation, no header map construction, no URI parsing.
- **Synchronous response parsing.** One `fill_buf()` call gets data into BufReader's 32KB buffer. Headers are parsed synchronously from buffered data (find `\r\n\r\n`, scan for `Content-Length`/`Transfer-Encoding`). The body is drained via `fill_buf()` + `consume()` — no per-response allocation. Chunked transfer encoding is handled inline.
- **Phase timing.** Each request measures write time, wait-for-first-byte time, and read time separately. These are accumulated per-worker (sum/min/max) and merged once at the end — no per-request timing allocation.
- **SLO color histogram.** 11 histogram buckets mapped to 11 ANSI 256-color levels via the color cube path `(0,1,0) → (0,4,0) → (1,5,0) → (5,5,0) → (5,1,0) → (4,0,0) → (2,0,0)` (dark green through yellow to dark red). Color is only emitted when stdout is a terminal.
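The worker-count formula can be sketched as below. This assumes the division rounds up so every connection is assigned a worker (which reproduces the worker counts quoted above), and takes `cpus` as a parameter instead of detecting it:

```rust
// clamp(C/128, 1, cpus*2) with ceiling division — an assumption chosen
// to match the worker counts quoted above, not vastar's literal source.
fn worker_count(concurrency: usize, cpus: usize) -> usize {
    concurrency.div_ceil(128).clamp(1, cpus * 2)
}

fn main() {
    // On an assumed 16-core machine:
    assert_eq!(worker_count(50, 16), 1);    // c=50   -> 1 worker
    assert_eq!(worker_count(500, 16), 4);   // c=500  -> 4 workers
    assert_eq!(worker_count(5000, 16), 32); // c=5000 -> capped at cpus*2 = 32
    println!("ok");
}
```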
## Dependencies

- `tokio` — async runtime
- `bytes` — zero-copy buffers
- `clap` — CLI parsing
- `futures-util` — `FuturesUnordered` for connection multiplexing

4 crates. No HTTP framework. No TUI framework.
## License
MIT OR Apache-2.0