Trixter – Chaos Monkey TCP Proxy
A high‑performance, runtime‑tunable TCP chaos proxy — a minimal, blazing‑fast alternative to Toxiproxy, written in Rust on Tokio. It lets you inject latency, throttle bandwidth, slice writes (to simulate small MTUs or Nagle‑like behavior), corrupt data in flight by injecting random bytes, randomly terminate connections, and hard‑timeout sessions, all controllable per connection via a simple REST API.
Why Trixter?
- Zero-friction: one static binary, no external deps.
- Runtime knobs: flip chaos on/off without restarting.
- Per-conn control: target just the flows you want.
- Minimal overhead: adapters are lightweight and composable.
Features
- Fast path: `tokio::io::copy_bidirectional` on a multi‑thread runtime.
- Runtime control (per active connection):
  - Latency: add/remove delay in ms.
  - Throttle: cap bytes/sec.
  - Slice: split writes into fixed‑size chunks.
  - Corrupt: inject random bytes with a tunable probability.
  - Chaos termination: probability `[0.0..=1.0]` to abort on each read/write.
  - Hard timeout: stop a session after N milliseconds.
- REST API to list connections and change settings on the fly.
- Targeted kill: shut down a single connection with a reason.
- Deterministic chaos: seed the RNG for reproducible scenarios.
- RST on chaos: best-effort TCP resets when a timeout/termination triggers.
Quick start
1. Run an upstream echo server (demo)
Use any TCP server. Examples:
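Two common options, assuming `ncat` or `socat` is installed:

```shell
# ncat: keep listening on 8181 and echo each connection's bytes back
ncat -l -k 8181 --exec "/bin/cat"

# socat: equivalent echo server
socat TCP-LISTEN:8181,fork,reuseaddr EXEC:"/bin/cat"
```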
2. Run trixter chaos proxy
with docker:
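For example (the image name `trixter` is an assumption; substitute your registry/tag):

```shell
docker run --rm \
  -p 8080:8080 -p 8888:8888 \
  trixter \
  --listen 0.0.0.0:8080 \
  --upstream host.docker.internal:8181 \
  --api 0.0.0.0:8888
# host.docker.internal reaches the host's echo server on Docker Desktop;
# use your host IP on Linux
```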
or build from source:
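A sketch, assuming a checkout of the repository (URL omitted here) and a stable Rust toolchain:

```shell
git clone <repository-url> trixter
cd trixter
cargo build --release
# the binary lands in target/release/
```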
or install with cargo:
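Assuming the crate is published under the name `trixter`:

```shell
cargo install trixter
```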
and run:
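A minimal invocation matching the addresses used in this Quick start (the binary name `trixter` is an assumption):

```shell
RUST_LOG=info trixter \
  --listen 0.0.0.0:8080 \
  --upstream 127.0.0.1:8181 \
  --api 127.0.0.1:8888
```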
3. Test
Now connect your app/CLI to localhost:8080. The proxy forwards to 127.0.0.1:8181.
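For example, with netcat:

```shell
# send one line through the proxy; the upstream echo server returns it
printf 'hello\n' | nc -w 2 127.0.0.1 8080
```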
REST API
Base URL is the `--api` address, e.g. `http://127.0.0.1:8888`.
Data model
Notes:
- `delay` serializes as a `std::time::Duration` object with `secs`/`nanos` fields (zeroed when the delay is disabled).
- `id` is unique per connection; use it to target a single connection.
- `corrupt_probability_rate` reports the current per-operation flip probability (`0.0` when corruption is off).
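A connection object from `GET /connections` might look like this; only `id`, `delay`, and `corrupt_probability_rate` are documented above, and the remaining field names are assumptions mirroring the CLI flags:

```json
{
  "id": 1,
  "downstream": "127.0.0.1:51234",
  "upstream": "127.0.0.1:8181",
  "delay": { "secs": 0, "nanos": 250000000 },
  "throttle_rate_bytes": 65536,
  "slice_size_bytes": 0,
  "terminate_probability_rate": 0.0,
  "corrupt_probability_rate": 0.01
}
```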
Health check
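The exact route isn't shown in this README; assuming a conventional `GET /health`:

```shell
curl -s http://127.0.0.1:8888/health
```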
List connections
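List all active connections (piped through `jq` for readability):

```shell
curl -s http://127.0.0.1:8888/connections | jq
```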
Kill a connection
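Grab an ID from the list, then shut the connection down (the `jq` path assumes the list response is an array of connection objects):

```shell
ID=$(curl -s http://127.0.0.1:8888/connections | jq -r '.[0].id')
curl -s -X POST "http://127.0.0.1:8888/connections/$ID/shutdown"
```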
Kill all connections
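Whether a dedicated bulk route exists isn't shown here; iterating over the list works with the documented endpoints:

```shell
for id in $(curl -s http://127.0.0.1:8888/connections | jq -r '.[].id'); do
  curl -s -X POST "http://127.0.0.1:8888/connections/$id/shutdown"
done
```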
Set latency (ms)
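A sketch using `$ID` from above; the `PATCH /connections/{id}` route and the `delay_ms` field are assumptions modeled on the CLI flag names:

```shell
# Add 250 ms of latency
curl -s -X PATCH "http://127.0.0.1:8888/connections/$ID" \
  -H 'Content-Type: application/json' -d '{"delay_ms": 250}'

# Remove latency
curl -s -X PATCH "http://127.0.0.1:8888/connections/$ID" \
  -H 'Content-Type: application/json' -d '{"delay_ms": 0}'
```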
Throttle bytes/sec
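Same assumed PATCH shape, with a field mirroring `--throttle-rate-bytes`:

```shell
# Cap throughput at 64 KiB/s; 0 removes the cap
curl -s -X PATCH "http://127.0.0.1:8888/connections/$ID" \
  -H 'Content-Type: application/json' -d '{"throttle_rate_bytes": 65536}'
```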
Slice writes (bytes)
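Same assumed PATCH shape, mirroring `--slice-size-bytes`:

```shell
# Split every write into 16-byte chunks; 0 disables slicing
curl -s -X PATCH "http://127.0.0.1:8888/connections/$ID" \
  -H 'Content-Type: application/json' -d '{"slice_size_bytes": 16}'
```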
Randomly terminate reads/writes
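Same assumed PATCH shape, mirroring `--terminate-probability-rate`:

```shell
# Set 5% probability per read/write operation; 0.0 turns it off
curl -s -X PATCH "http://127.0.0.1:8888/connections/$ID" \
  -H 'Content-Type: application/json' -d '{"terminate_probability_rate": 0.05}'
```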
Inject random bytes
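Same assumed PATCH shape, mirroring `--corrupt-probability-rate`:

```shell
# Corrupt ~1% of operations
curl -s -X PATCH "http://127.0.0.1:8888/connections/$ID" \
  -H 'Content-Type: application/json' -d '{"corrupt_probability_rate": 0.01}'

# Remove corruption
curl -s -X PATCH "http://127.0.0.1:8888/connections/$ID" \
  -H 'Content-Type: application/json' -d '{"corrupt_probability_rate": 0.0}'
```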
Error responses
- `404 Not Found` — bad connection ID
- `400 Bad Request` — invalid probability (outside `0.0..=1.0`) for termination/corruption
- `500 Internal Server Error` — internal channel/handler error
CLI flags
```text
--listen <ip:port>                    # e.g. 0.0.0.0:8080
--upstream <ip:port>                  # e.g. 127.0.0.1:8181
--api <ip:port>                       # e.g. 127.0.0.1:8888
--delay-ms <ms>                       # 0 = off (default)
--throttle-rate-bytes <bytes/s>       # 0 = unlimited (default)
--slice-size-bytes <bytes>            # 0 = off (default)
--terminate-probability-rate <0..1>   # 0.0 = off (default)
--corrupt-probability-rate <0..1>     # 0.0 = off (default)
--connection-duration-ms <ms>         # 0 = unlimited (default)
--random-seed <u64>                   # seed RNG for deterministic chaos (optional)
```
All of the above can be changed per connection at runtime via the REST API, except `--connection-duration-ms`, which is a process-wide default applied to new connections. Omit `--random-seed` to draw fresh entropy on every run; set it when you want bit-for-bit reproducibility.
How it works (architecture)
Each accepted downstream connection spawns a task that:
1. Connects to the upstream target.
2. Wraps both sides with tunable adapters from `tokio-netem`:
   - `DelayedWriter` → optional latency
   - `ThrottledWriter` → bandwidth cap
   - `SlicedWriter` → fixed‑size write chunks
   - `Terminator` → probabilistic aborts
   - `Corrupter` → probabilistic random-byte injector
   - `Shutdowner` (downstream only) → out-of-band shutdown via a oneshot channel
3. Runs `tokio::io::copy_bidirectional` until EOF/error/timeout.
4. Tracks the live connection in a `DashMap` so the API can query/mutate it.
Use cases
- Flaky networks: simulate 3G/EDGE/satellite latency and low bandwidth.
- MTU/segmentation bugs: force small write slices to uncover packetization assumptions.
- Resilience drills: randomly kill connections during critical paths.
- Data validation: corrupt bytes to exercise checksums and retry logic.
- Timeout tuning: enforce hard upper‑bounds to validate client retry/backoff logic.
- Canary/E2E tests: target only specific connections and tweak dynamically.
- Load/soak: run for hours with varying chaos settings from CI/scripts.
Recipes
Simulate a shaky mobile link
# Add ~250ms latency and 64 KiB/s cap to the first active connection
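A sketch combining the latency and throttle knobs (PATCH route and field names assumed, as in the REST API section):

```shell
ID=$(curl -s http://127.0.0.1:8888/connections | jq -r '.[0].id')
curl -s -X PATCH "http://127.0.0.1:8888/connections/$ID" \
  -H 'Content-Type: application/json' \
  -d '{"delay_ms": 250, "throttle_rate_bytes": 65536}'
```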
Force tiny packets (find buffering bugs)
Introduce flakiness (5% ops abort)
Add data corruption
Timebox a connection to 5s at startup
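This one is a startup flag rather than a per-connection PATCH (binary name assumed):

```shell
trixter \
  --listen 0.0.0.0:8080 --upstream 127.0.0.1:8181 --api 127.0.0.1:8888 \
  --connection-duration-ms 5000
```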
Kill the slowpoke
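Shut down the offending connection via the documented route; the `reason` body shape is an assumption based on the "targeted kill with a reason" feature above:

```shell
curl -s -X POST "http://127.0.0.1:8888/connections/$ID/shutdown" \
  -H 'Content-Type: application/json' -d '{"reason": "too slow"}'
```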
Integration: CI & E2E tests
- Spin up the proxy as a sidecar/container.
- Discover the right connection (by `downstream`/`upstream` pair) via `GET /connections`.
- Apply chaos during specific test phases with `PATCH` calls.
- Always clean up with `POST /connections/{id}/shutdown` to free ports quickly.
Reproduce CI failures
Omit `--random-seed` in CI so each run draws fresh entropy. When a failure hits, check the proxy logs for the `random seed: <value>` line and replay the scenario locally with that seed:
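For example (binary name assumed; the seed value comes from the failing run's log):

```shell
RUST_LOG=info trixter \
  --listen 0.0.0.0:8080 --upstream 127.0.0.1:8181 --api 127.0.0.1:8888 \
  --random-seed 1234567890
```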
Performance notes
- Built on the Tokio multi‑thread runtime; avoid heavy CPU work on the I/O threads.
- Throttling and slicing reduce throughput by design; set them to `0` to disable.
- Use loopback or a fast NIC for local tests; network stack/OS settings still apply.
- Logging: `RUST_LOG=info` (or `debug`) for visibility; turn it off for max throughput.
Security
- The API performs no auth; bind it to a trusted interface (e.g., `127.0.0.1`).
- The proxy is transparent TCP; apply your own TLS/ACLs at the edges if needed.
Error handling
- An invalid probability returns `400` with `{ "error": "invalid probability; must be between 0.0 and 1.0" }`.
- Unknown connection IDs return `404`.
- Internal channel/handler errors return `500`.
License
MIT