vastar 0.1.1

HTTP load generator. Fast, zero-copy, raw TCP. Alternative to hey, oha, wrk.

$ vastar -n 3000 -c 500 -m POST -T "application/json" -d '{"prompt":"bench"}' http://localhost:4545/v1/chat/completions

Summary:
  Total:        0.3251 secs
  Slowest:      0.0978 secs
  Fastest:      0.0016 secs
  Average:      0.0474 secs
  Requests/sec: 9229.24
  Total data:   174078167 bytes
  Size/request: 58026 bytes

Response time distribution:
  10.00% in 0.0243 secs
  25.00% in 0.0327 secs
  50.00% in 0.0461 secs  (46.06ms)
  75.00% in 0.0627 secs
  90.00% in 0.0741 secs
  95.00% in 0.0811 secs  (81.09ms)
  99.00% in 0.0892 secs  (89.22ms)
  99.90% in 0.0951 secs  (95.07ms)
  99.99% in 0.0978 secs

  Insight:
    Latency spread p99/p50 = 1.9x -- good consistency
    Tail ratio p99/p95 = 1.1x -- clean tail
    Outlier ratio p99.9/p99 = 1.1x -- no significant outliers

Response time histogram:        (11-level SLO color gradient)
  0.0016 [42]   ███
  0.0103 [82]   ██████
  0.0191 [361]  ███████████████████████████
  0.0278 [630]  ████████████████████████████████████████████████
  0.0366 [364]  ███████████████████████████
  0.0453 [362]  ███████████████████████████
  0.0541 [415]  ██████████████████████████████
  0.0628 [364]  ███████████████████████████
  0.0716 [213]  ████████████████
  0.0803 [137]  ██████████
  0.0891 [30]   ██

Status code distribution:
  [200] 3000 responses

Details (average, fastest, slowest):
  req write:    0.0000 secs, 0.0000 secs, 0.0014 secs
  resp wait:    0.0226 secs, 0.0007 secs, 0.0482 secs
  resp read:    0.0248 secs, 0.0008 secs, 0.0583 secs
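The Insight lines are plain ratios over the measured percentiles. A minimal sketch of that arithmetic, using the numbers from the run above (the `ratio` helper name is illustrative, not vastar's internals):

```rust
// Sketch: derive the Insight ratios from measured percentile values.
// Percentiles are taken from the sample run above; `ratio` is an
// illustrative helper name, not vastar's actual internals.
fn ratio(hi: f64, lo: f64) -> f64 {
    (hi / lo * 10.0).round() / 10.0 // one decimal place, as printed
}

fn main() {
    let (p50, p95, p99, p999) = (0.0461, 0.0811, 0.0892, 0.0951);
    let spread = ratio(p99, p50);   // latency spread
    let tail = ratio(p99, p95);     // tail ratio
    let outlier = ratio(p999, p99); // outlier ratio
    println!("p99/p50 = {spread}x, p99/p95 = {tail}x, p99.9/p99 = {outlier}x");
    assert_eq!(spread, 1.9);
    assert_eq!(tail, 1.1);
    assert_eq!(outlier, 1.1);
}
```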

Why vastar

                vastar          hey     oha
Language        Rust (raw TCP)  Go      Rust (hyper)
Binary          1.2 MB          9 MB    20 MB
RPS c=1         91K             41K     72K
RPS c=500       415K            114K    219K
RPS c=1000      414K            65K     18K
RPS c=5000      354K            53K     23K
Memory c=1000   32 MB           78 MB   42 MB

At high concurrency (c=500+) vastar is 3-8x faster than hey while using 2-4x less memory.

See BENCHMARK.md for full comparison across 10 concurrency levels and 4 payload sizes.

Features

  • 11-level SLO color histogram — ANSI 256-color gradient (dark green to dark red) mapped to percentile thresholds
  • Automated Insight — latency spread, tail ratio, outlier detection from p50/p95/p99/p99.9
  • Key percentile highlights — p50, p95, p99, p99.9 annotated with colored (ms) values
  • Phase timing Details — req write, resp wait, resp read breakdown (like hey)
  • Live progress bar — ASCII-only, terminal-aware, no emoji, no aggressive clear screen
  • Chunked transfer — supports Content-Length and Transfer-Encoding: chunked (SSE/streaming)
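Chunked bodies arrive as hex-length-prefixed frames. A simplified std-only decoder shows the framing (illustrative: it ignores chunk extensions and trailers, and vastar drains chunks in place from its read buffer rather than collecting them into a Vec):

```rust
// Simplified chunked-transfer decoder: hex size line, CRLF, data, CRLF,
// repeated until a zero-length chunk. Illustrative only; vastar drains
// chunks in place instead of building a Vec.
fn decode_chunked(input: &[u8]) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut pos = 0;
    loop {
        let rest = input.get(pos..)?;
        // Chunk-size line: hex digits terminated by CRLF.
        let line_end = rest.windows(2).position(|w| w == b"\r\n")?;
        let size_str = std::str::from_utf8(&rest[..line_end]).ok()?;
        let size = usize::from_str_radix(size_str.trim(), 16).ok()?;
        pos += line_end + 2;
        if size == 0 {
            return Some(out); // terminal zero-length chunk
        }
        out.extend_from_slice(input.get(pos..pos + size)?);
        pos += size + 2; // skip chunk data plus its trailing CRLF
    }
}

fn main() {
    let body = decode_chunked(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n").unwrap();
    assert_eq!(body, b"Wikipedia".to_vec());
}
```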

Install

cargo install vastar

Or build from source:

git clone https://github.com/Vastar-AI/vastar.git
cd vastar
cargo build --release
# Binary at ./target/release/vastar (1.2 MB)

Usage

Usage: vastar [OPTIONS] <URL>

Options:
  -n <REQUESTS>              Number of requests [default: 200]
  -c <CONCURRENCY>           Concurrent workers [default: 50]
  -z <DURATION>              Duration (e.g. 10s, 1m). Overrides -n
  -q <QPS>                   Rate limit per worker [default: 0]
  -m <METHOD>                HTTP method [default: GET]
  -d <BODY>                  Request body
  -D <BODY_FILE>             Request body from file
  -T <CONTENT_TYPE>          Content-Type [default: text/html]
  -H <HEADER>                Custom header (repeatable)
  -t <TIMEOUT>               Timeout in seconds [default: 20]
  -A <ACCEPT>                Accept header
  -a <AUTH>                  Basic auth (user:pass)
      --disable-keepalive    Disable keep-alive
      --disable-compression  Disable compression
      --disable-redirects    Disable redirects
  -h, --help                 Print help
  -V, --version              Print version

Examples

# Simple GET
vastar http://localhost:8080/

# POST with JSON body
vastar -m POST -T "application/json" -d '{"key":"value"}' http://localhost:8080/api

# 10000 requests, 500 concurrent
vastar -n 10000 -c 500 http://localhost:8080/

# Duration mode: run for 30 seconds
vastar -z 30s -c 200 http://localhost:8080/

# Custom headers
vastar -H "Authorization: Bearer token" -H "X-Custom: value" http://localhost:8080/

# Basic auth
vastar -a user:pass http://localhost:8080/

# Body from file
vastar -m POST -T "application/json" -D payload.json http://localhost:8080/api

Architecture

              CLI (clap)
                  |
            +------+------+
            | Coordinator |
            +------+------+
                  |
    Phase 0: Pre-connect (semaphore, max 256 concurrent)
                  |
    Phase 1: Distribute connections to workers
                  |
        +---------+---------+---------+
        |         |         |         |
    Worker 0  Worker 1  ...  Worker N      <- clamp(C/128, 1, cpus*2)
        |         |         |         |
   +----+----+   ...       ...       ...
   |    |    |
  Conn Conn Conn   <- FuturesUnordered event loop
   |    |    |         per worker (~128 conns each)
   |    |    |
  Raw TCP streams  <- hand-crafted HTTP/1.1
   |    |    |
   +----+----+
        |
   BufReader 32KB <- synchronous header parse
        |              fill_buf() + consume() body drain
        |
   AtomicU64 progress <- lock-free, 10 FPS render

How it works

  1. Pre-connect phase. All C connections are established in parallel before the benchmark starts. A semaphore caps concurrent connects at 256 to avoid overflowing the server's TCP accept backlog.
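The same gating can be sketched with a hand-rolled std semaphore (vastar's actual connects are async behind tokio::sync::Semaphore; the limit of 4, the fake "connect" sleep, and the helper name here are all illustrative):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

// Minimal counting semaphore: a std-only stand-in for tokio::sync::Semaphore.
struct Semaphore {
    permits: Mutex<usize>,
    cv: Condvar,
}

impl Semaphore {
    fn new(n: usize) -> Self {
        Semaphore { permits: Mutex::new(n), cv: Condvar::new() }
    }
    fn acquire(&self) {
        let mut p = self.permits.lock().unwrap();
        while *p == 0 {
            p = self.cv.wait(p).unwrap();
        }
        *p -= 1;
    }
    fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.cv.notify_one();
    }
}

// Run `jobs` fake connects gated by `limit` permits; return the peak
// number in flight at once (never exceeds `limit`).
fn max_in_flight(limit: usize, jobs: usize) -> usize {
    let sem = Arc::new(Semaphore::new(limit));
    let in_flight = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..jobs)
        .map(|_| {
            let (sem, in_flight, peak) = (sem.clone(), in_flight.clone(), peak.clone());
            thread::spawn(move || {
                sem.acquire();
                let now = in_flight.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                thread::sleep(std::time::Duration::from_millis(1)); // "connect"
                in_flight.fetch_sub(1, Ordering::SeqCst);
                sem.release();
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    peak.load(Ordering::SeqCst)
}

fn main() {
    assert!(max_in_flight(4, 20) <= 4);
}
```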

  2. Adaptive worker topology. Workers scale as clamp(C/128, 1, cpus*2). At c=50 that's 1 worker. At c=500 that's 4 workers. At c=5000 that's 32 workers (capped at cpus*2). Each worker runs a FuturesUnordered event loop managing ~128 connections. The tokio scheduler sees N workers (not C tasks) — drastically less scheduling overhead at high concurrency.
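The formula above can be written down directly. The c=500 -> 4 workers example implies ceiling division, which this sketch assumes (the function name is illustrative):

```rust
// Worker topology from the text: clamp(C/128, 1, cpus*2).
// c=500 -> 4 workers implies ceiling division, so that's assumed here.
fn worker_count(concurrency: usize, cpus: usize) -> usize {
    let wanted = (concurrency + 127) / 128; // ceil(C / 128)
    wanted.clamp(1, cpus * 2)
}

fn main() {
    // The examples from the text, assuming a 16-core machine:
    assert_eq!(worker_count(50, 16), 1);
    assert_eq!(worker_count(500, 16), 4);
    assert_eq!(worker_count(5000, 16), 32); // capped at cpus*2
}
```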

  3. Raw TCP request. HTTP/1.1 request bytes are pre-built once at startup (method + path + headers + body). Each request is a single write_all() of pre-built Bytes (Arc-backed, zero-copy clone). No per-request allocation, no header map construction, no URI parsing.
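Pre-building might look roughly like this. The sketch is std-only: Arc<[u8]> stands in for bytes::Bytes, which has the same refcount-bump clone (function name and header set are illustrative):

```rust
use std::sync::Arc;

// Build the full HTTP/1.1 request bytes once at startup. vastar stores
// this as bytes::Bytes; Arc<[u8]> gives the same cheap-clone property
// in plain std. Header set here is a minimal illustration.
fn build_request(method: &str, host: &str, path: &str, body: &[u8]) -> Arc<[u8]> {
    let mut buf = Vec::new();
    buf.extend_from_slice(
        format!(
            "{method} {path} HTTP/1.1\r\nHost: {host}\r\nContent-Length: {}\r\n\r\n",
            body.len()
        )
        .as_bytes(),
    );
    buf.extend_from_slice(body);
    buf.into() // Vec<u8> -> Arc<[u8]>
}

fn main() {
    let req = build_request(
        "POST",
        "localhost:4545",
        "/v1/chat/completions",
        br#"{"prompt":"bench"}"#,
    );
    // Each connection clones the Arc (a refcount bump, no byte copy)
    // and issues a single write of the same pre-built bytes.
    let per_conn = Arc::clone(&req);
    assert!(per_conn.starts_with(b"POST /v1/chat/completions HTTP/1.1\r\n"));
    assert!(per_conn.ends_with(br#"{"prompt":"bench"}"#));
}
```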

  4. Synchronous response parsing. One fill_buf() call gets data into BufReader's 32KB buffer. Headers are parsed synchronously from buffered data (find \r\n\r\n, scan for Content-Length/Transfer-Encoding). Body is drained via fill_buf() + consume() — no per-response allocation. Chunked transfer encoding is handled inline.
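A simplified version of that parse over already-buffered bytes (illustrative: it only handles Content-Length and a crude case check, while vastar also detects chunked encoding and works directly against BufReader's buffer via fill_buf()):

```rust
// Find the end of headers and extract Content-Length from buffered bytes.
// Simplified sketch: real parsing is case-insensitive and also scans for
// Transfer-Encoding: chunked.
fn parse_head(buf: &[u8]) -> Option<(usize, usize)> {
    // Locate the \r\n\r\n that terminates the header block.
    let head_end = buf.windows(4).position(|w| w == b"\r\n\r\n")? + 4;
    let head = std::str::from_utf8(&buf[..head_end]).ok()?;
    let mut content_length = 0;
    for line in head.split("\r\n") {
        if let Some(v) = line
            .strip_prefix("Content-Length:")
            .or_else(|| line.strip_prefix("content-length:"))
        {
            content_length = v.trim().parse().ok()?;
        }
    }
    // (header bytes to consume, body bytes still to drain)
    Some((head_end, content_length))
}

fn main() {
    let resp = b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello";
    assert_eq!(parse_head(resp), Some((38, 5)));
}
```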

  5. Phase timing. Each request measures write time, wait-for-first-byte time, and read time separately. These are accumulated per-worker (sum/min/max) and merged once at the end — no per-request timing allocation.
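That accumulation pattern is a sum/min/max fold per worker, merged once at the end, so it costs O(1) memory per worker regardless of request count. A sketch with illustrative field names:

```rust
// Per-worker phase-timing accumulator, merged once by the coordinator.
// Field names are illustrative, not vastar's actual internals.
struct PhaseStats {
    sum: f64,
    min: f64,
    max: f64,
    count: u64,
}

impl PhaseStats {
    fn new() -> Self {
        PhaseStats { sum: 0.0, min: f64::INFINITY, max: 0.0, count: 0 }
    }
    fn record(&mut self, secs: f64) {
        self.sum += secs;
        self.min = self.min.min(secs);
        self.max = self.max.max(secs);
        self.count += 1;
    }
    fn merge(&mut self, other: &PhaseStats) {
        self.sum += other.sum;
        self.min = self.min.min(other.min);
        self.max = self.max.max(other.max);
        self.count += other.count;
    }
    fn avg(&self) -> f64 {
        self.sum / self.count as f64
    }
}

fn main() {
    // Two workers each record their own resp-wait times...
    let mut w0 = PhaseStats::new();
    let mut w1 = PhaseStats::new();
    w0.record(0.010);
    w0.record(0.030);
    w1.record(0.020);
    // ...and the coordinator merges once when the run finishes.
    let mut total = PhaseStats::new();
    total.merge(&w0);
    total.merge(&w1);
    assert!((total.avg() - 0.020).abs() < 1e-12);
    assert_eq!(total.min, 0.010);
    assert_eq!(total.max, 0.030);
}
```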

  6. SLO color histogram. 11 histogram buckets mapped to 11 ANSI 256-color levels via the color cube path: (0,1,0)→(0,4,0)→(1,5,0)→(5,5,0)→(5,1,0)→(4,0,0)→(2,0,0) (dark green through yellow to dark red). Color only emitted when stdout is a terminal.
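Those (r,g,b) waypoints index into the standard ANSI 256-color cube, where codes 16..=231 are 16 + 36r + 6g + b for r, g, b in 0..=5. A sketch (the waypoint list is from the text; how vastar interpolates them to 11 levels is omitted):

```rust
// ANSI 256-color cube: codes 16..=231 are 16 + 36r + 6g + b, r,g,b in 0..=5.
fn cube(r: u8, g: u8, b: u8) -> u8 {
    16 + 36 * r + 6 * g + b
}

fn main() {
    // The gradient waypoints from the text, dark green -> yellow -> dark red:
    let path = [(0, 1, 0), (0, 4, 0), (1, 5, 0), (5, 5, 0), (5, 1, 0), (4, 0, 0), (2, 0, 0)];
    for (r, g, b) in path {
        // Foreground escape for a 256-color code: ESC[38;5;<code>m
        println!("\x1b[38;5;{}m({},{},{})\x1b[0m", cube(r, g, b), r, g, b);
    }
    assert_eq!(cube(0, 1, 0), 22);  // dark green
    assert_eq!(cube(5, 5, 0), 226); // yellow
    assert_eq!(cube(2, 0, 0), 88);  // dark red
}
```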

Dependencies

tokio         — async runtime
bytes         — zero-copy buffers
clap          — CLI parsing
futures-util  — FuturesUnordered for connection multiplexing

4 crates. No HTTP framework. No TUI framework.

License

MIT OR Apache-2.0