# Armchair
Armchair is a load-testing binary for benchmarking Rime's TTS service
with concurrent requests.
Primary use cases:
- To find the time-to-first-byte (TTFB) and real-time factor (RTF) for a given concurrency level.
- To find the maximum concurrency that satisfies the given performance targets.
- To find an optimal client-side buffer size to avoid underrun issues.
For audio streaming with concurrent sessions, TTFB and RTF are the key performance indicators. Real-time
streaming requires an RTF below 1 (audio is synthesized faster than it plays back), so a maximum concurrency
is typically imposed to keep RTF under 1.
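The per-request formulas from the Methodology section below can be expressed as a short Rust sketch. This is not Armchair's source: the function names `rtf` and `remaining_playback` are hypothetical, and timings are assumed to be available as `std::time::Duration` values.

```rust
use std::time::Duration;

/// RTF = (elapsed - TTFB) / audio_duration. Returns None when the audio
/// duration is unknown or zero, since such requests are excluded from RTF.
fn rtf(elapsed: Duration, ttfb: Duration, audio: Option<Duration>) -> Option<f64> {
    let audio = audio.filter(|d| !d.is_zero())?;
    // Time spent receiving the stream after the first byte arrived.
    let streaming = elapsed.saturating_sub(ttfb);
    Some(streaming.as_secs_f64() / audio.as_secs_f64())
}

/// Playback time still left once the response has finished streaming:
/// max(0, audio_duration - (elapsed - TTFB)).
fn remaining_playback(elapsed: Duration, ttfb: Duration, audio: Duration) -> Duration {
    audio.saturating_sub(elapsed.saturating_sub(ttfb))
}
```

For example, a 4 s clip whose stream finishes 2.1 s after the request was sent, with a 100 ms TTFB, has an RTF of 0.5, and the client still has 2 s of playback to wait out.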
Supported features:
- Bisection to find the maximum concurrency that meets a configurable performance target (success, TTFB, RTF, underrun)
- TTFB and RTF metrics for a given concurrency
- Session start staggering via exponentially distributed inter-arrival times
- Intra-session delays via truncated normal distribution with playback-aware waiting
- Client-side buffer simulation and underrun detection
## Methodology
This tool simulates many concurrent streaming sessions and evaluates performance against a configurable target.
- Session model:
  - At a given concurrency C, C sessions are launched.
  - Session starts are staggered by an exponential inter-arrival process with rate λ (`--session-rate`, starts/second).
  - Each session performs `-n`/`--requests-per-session` sequential requests.
- Per-request timing and metrics:
  - TTFB is measured from sending the request to receiving the first byte.
  - Elapsed is the total time from sending the request to receiving the last byte of the streamed response.
  - Audio duration is parsed from the WAV headers; if parsing fails, the request is treated as non-audio for RTF purposes.
  - RTF is computed as (elapsed − TTFB) / audio_duration. Requests with a missing or invalid audio duration are excluded from the RTF percentiles.
- Intra-session delay model (traffic shaping):
  - After each request completes, the tool first waits out any remaining playback time if the audio was synthesized faster than real time, i.e. max(0, audio_duration − (elapsed − TTFB)).
  - It then sleeps for an additional delay sampled from a normal distribution with parameters `--intra-session-delay-mu` and `--intra-session-delay-sigma`, truncated to [`--intra-session-delay-min`, `--intra-session-delay-max`].
  - The first request in a session has no intra-session delay; session start staggering is controlled by the exponential process above.
- Buffer underrun detection:
  - Simulates a client-side buffer of size `--client-buffer` (default 0ms).
  - Playback starts once the buffer is full.
  - An underrun occurs if the buffer empties before playback completes.
  - Requires valid WAV headers to determine the byte rate.
- Aggregation and statistics:
  - Success is counted when the HTTP status is 2xx and the body is non-empty.
  - Percentiles (p50/p90/p95/p99) are computed via linear interpolation over sorted samples; NaN/invalid values are excluded from the relevant metric’s distribution.
  - A startup config dump prints all key parameters for reproducibility.
- Performance target evaluation (`--target`):
  - The target is a conjunction: all configured clauses must pass.
  - Supported clauses: `success:<fraction>`, `ttfb:pXX@<duration>`, `rtf:pXX@<value>`, `underrun:<fraction>`.
  - If a metric cannot be computed (e.g., no valid audio for RTF), that clause fails.
  - Results show OK/FAIL per clause, with color when the terminal supports it.
- Maximum concurrency search (when `--concurrency` is 0):
  - Exponential growth: repeatedly doubles the concurrency (1, 2, 4, …) until the performance target fails, waiting 10s between trials.
  - Binary search: bisects between the last passing and the first failing concurrency to find the largest concurrency that still satisfies the target.
  - After discovery, a final run at the chosen concurrency prints a full summary.
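The two-phase search can be sketched as follows. This is an illustration under assumptions, not Armchair's implementation: `passes` is a hypothetical stand-in for running one full trial and evaluating `--target`, `cap` is an added safety bound that this README does not mention, and the 10s pause between trials and the final summary run are omitted.

```rust
/// Largest concurrency for which `passes` returns true, found by
/// exponential growth followed by bisection (hypothetical helper).
fn max_concurrency(mut passes: impl FnMut(usize) -> bool, cap: usize) -> usize {
    let mut good = 0; // largest concurrency known to pass (0 = none yet)
    let mut bad = cap.saturating_add(1); // smallest known to fail; cap + 1 is a sentinel

    // Phase 1: exponential growth 1, 2, 4, ... until a trial fails.
    let mut c = 1;
    while c <= cap {
        if !passes(c) {
            bad = c;
            break;
        }
        good = c;
        if c == cap {
            break;
        }
        // Double, but never step past the safety cap.
        c = c.saturating_mul(2).min(cap);
    }

    // Phase 2: bisection on the open interval (good, bad).
    while bad - good > 1 {
        let mid = good + (bad - good) / 2;
        if passes(mid) {
            good = mid;
        } else {
            bad = mid;
        }
    }
    good
}
```

Bisection assumes pass/fail is monotonic in concurrency. Because trials are stochastic, a noisy trial near the boundary can shift the result by a step or two.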
Note: The traffic and delay processes are stochastic; repeated runs will vary. Randomness is seeded from system entropy.
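Both random processes can be sampled with only the Rust standard library, as in the minimal sketch below. The tiny `SplitMix64` generator, the fixed seed used when calling it, Box-Muller sampling, and the assumption that the first session starts after one inter-arrival gap (rather than at t = 0) are illustrative choices, not Armchair's code; the tool itself seeds from system entropy. Out-of-range delays are clamped here, following the "clamp" wording of `--intra-session-delay-min`/`--intra-session-delay-max`; a strictly truncated normal would redraw them instead.

```rust
use std::f64::consts::PI;

/// Minimal SplitMix64 PRNG so the sketch needs no external crates.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform sample in [0, 1) built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Start times (seconds) for `sessions` sessions arriving as a Poisson
/// process with `rate` starts/second: each gap is Exp(rate), sampled by
/// inverse transform as -ln(1 - u) / rate.
fn session_start_offsets(rng: &mut SplitMix64, rate: f64, sessions: usize) -> Vec<f64> {
    let mut t = 0.0;
    (0..sessions)
        .map(|_| {
            t += -(1.0 - rng.next_f64()).ln() / rate;
            t
        })
        .collect()
}

/// Intra-session delay (seconds): Normal(mu, sigma) via Box-Muller,
/// then clamped to [min, max].
fn intra_session_delay(rng: &mut SplitMix64, mu: f64, sigma: f64, min: f64, max: f64) -> f64 {
    let u1 = 1.0 - rng.next_f64(); // in (0, 1], so ln(u1) is finite
    let u2 = rng.next_f64();
    let z = (-2.0 * u1.ln()).sqrt() * (2.0 * PI * u2).cos();
    (mu + sigma * z).clamp(min, max)
}
```

With `--session-rate 5`, the mean gap between session starts is 1/5 = 0.2 s.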
## Usage
### Installation
```shell
cargo install armchair
```
### Maximum concurrency
To find the maximum concurrency when each session sends 5 requests, with:
- session starts following an exponential inter-arrival process (λ=5 starts/sec)
- intra-session delays sampled from a truncated normal N(μ=10s, σ=5s), clamped to [0s, 20s]
- the default performance target `success:1.00,ttfb:p99@1s,rtf:p99@1.00,underrun:0.00`
```shell
armchair --url '<RIME_SERVICE>' --token '<RIME_API_KEY>'
```
The tool then reports metrics such as:
```
=== MAXIMUM CONCURRENCY FOUND: 16 ===
...
----- Summary -----
total: 80 success: 80 (100.0%)
Buffer underrun: 0 (0.0%)
TTFB ms: mean=104.4 p50=100.8 p90=117.6 p95=126.2 p99=141.0
Elapsed ms: mean=13924.0 p50=13772.5 p90=16412.8 p95=17065.3 p99=18527.3
RTF: mean=1.067 p50=1.061 p90=1.170 p95=1.208 p99=1.254
```
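Each percentile in this summary comes from linear interpolation over sorted samples, with NaN/invalid values excluded. A minimal Rust sketch, assuming the common convention of interpolating at rank p/100 × (n − 1) (NumPy's default `linear` method); this README does not pin down the exact convention, and `percentile` is a hypothetical name.

```rust
/// Percentile `p` (0-100) by linear interpolation between closest ranks.
/// Assumes rank = p/100 * (n - 1). Non-finite samples are dropped first.
fn percentile(samples: &[f64], p: f64) -> Option<f64> {
    let mut sorted: Vec<f64> = samples.iter().copied().filter(|x| x.is_finite()).collect();
    if sorted.is_empty() {
        return None; // metric cannot be computed
    }
    sorted.sort_by(f64::total_cmp);
    let rank = p.clamp(0.0, 100.0) / 100.0 * (sorted.len() - 1) as f64;
    let lo = rank.floor() as usize;
    let hi = rank.ceil() as usize;
    let frac = rank - lo as f64;
    Some(sorted[lo] + (sorted[hi] - sorted[lo]) * frac)
}
```

For TTFB, `samples` would be the per-request TTFB values; for RTF, only the requests with a valid audio duration.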
### Fixed concurrency
Passing a nonzero `--concurrency` value skips the maximum-concurrency search and
reports the latency metrics at that fixed concurrency.
### Request customization
- `-n`: Number of requests in each session, e.g. `5`
- `--session-rate`: Session starts per second, staggered as a Poisson process (exponentially distributed inter-arrival times), e.g. `5`
- `--intra-session-delay-mu`: Intra-session delay mean, e.g. `10s`
- `--intra-session-delay-sigma`: Intra-session delay standard deviation, e.g. `5s`
- `--intra-session-delay-min`: Intra-session delay minimum clamp, e.g. `0s`
- `--intra-session-delay-max`: Intra-session delay maximum clamp, e.g. `20s`
- `--client-buffer`: Client-side playback buffer that must fill before playback starts (used for underrun detection), e.g. `100ms`
- `--target`: Performance target specification, e.g. `success:1.00,ttfb:p90@500ms,rtf:p90@1.00,underrun:0.00`
- `--percentiles`: List of percentiles to report, e.g. `1,25,50,90,99`
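`--client-buffer` feeds the underrun check described under Methodology. One way to model it is sketched below: convert the buffer duration to bytes using the WAV byte rate, start playback once that many audio bytes have arrived, and flag an underrun if playback would consume everything received before the next chunk arrives. `Chunk` and `underruns` are hypothetical, and Armchair's exact accounting may differ.

```rust
/// A chunk of audio payload as it arrives: arrival time in seconds since
/// the request was sent, and size in bytes (WAV header excluded).
struct Chunk {
    at: f64,
    bytes: u64,
}

/// Returns true if playback would underrun (hypothetical model).
/// `byte_rate` is audio bytes per second from the WAV header;
/// `buffer_secs` is the `--client-buffer` duration in seconds.
fn underruns(chunks: &[Chunk], byte_rate: f64, buffer_secs: f64) -> bool {
    let threshold = byte_rate * buffer_secs; // buffer size in bytes
    let mut received = 0.0_f64; // audio bytes received so far
    let mut start: Option<f64> = None; // playback start time, once the buffer is full

    for chunk in chunks {
        if let Some(t0) = start {
            // Playback consumes `byte_rate` bytes per second, so everything
            // received so far is used up at this instant.
            let drained_at = t0 + received / byte_rate;
            if chunk.at > drained_at {
                return true; // buffer ran dry before this chunk arrived
            }
        }
        received += chunk.bytes as f64;
        if start.is_none() && received >= threshold {
            start = Some(chunk.at);
        }
    }
    // If the buffer never filled, playback starts after the stream ends
    // and cannot underrun.
    false
}
```

With 0.1 s chunks at 48 000 bytes/s arriving at 0.10 s and 0.25 s, an unbuffered client runs dry at 0.20 s, while a 150 ms buffer delays playback until both chunks are in.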
#### Duration value syntax
Flags that accept durations (e.g., `--intra-session-delay-mu`) take values with units:
```
500ms, 1.5s, 10s
```
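A minimal sketch of parsing these values into `std::time::Duration`. `parse_duration` is hypothetical and handles only the `ms` and `s` units shown above; Armchair may accept more forms.

```rust
use std::time::Duration;

/// Parses "500ms", "1.5s", "10s" into a Duration (hypothetical helper).
fn parse_duration(input: &str) -> Option<Duration> {
    let s = input.trim();
    // Check "ms" first, since "500ms" also ends in 's'.
    let (number, per_second) = if let Some(n) = s.strip_suffix("ms") {
        (n, 1000.0)
    } else if let Some(n) = s.strip_suffix('s') {
        (n, 1.0)
    } else {
        return None; // unit missing or unsupported
    };
    let value: f64 = number.trim().parse().ok()?;
    // Rejects negative, NaN, infinite, and overflowing values.
    Duration::try_from_secs_f64(value / per_second).ok()
}
```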
#### Performance target flag
`--target` accepts a comma-separated list:
```
success:<fraction>,ttfb:pXX@<duration>,rtf:pXX@<value>,underrun:<fraction>
```
Examples:
```
--target success:0.99,ttfb:p95@800ms,rtf:p90@1.20,underrun:0.01
--target success:1.00,ttfb:p90@1s,rtf:p90@1.00,underrun:0.00
```
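A spec like these can be parsed and evaluated as a conjunction of clauses, as in the sketch below. The `Clause` type and helper functions are hypothetical, and the comparison directions (success at least its fraction; TTFB, RTF, and underrun at most their thresholds) are assumed from the clause names. A clause whose metric cannot be computed fails, as described under Methodology.

```rust
/// One clause of a `--target` spec (hypothetical representation).
#[derive(Debug, PartialEq)]
enum Clause {
    Success(f64),                 // success:<fraction>
    Ttfb { p: f64, max_ms: f64 }, // ttfb:pXX@<duration>
    Rtf { p: f64, max: f64 },     // rtf:pXX@<value>
    Underrun(f64),                // underrun:<fraction>
}

/// "800ms" -> 800.0, "1s" -> 1000.0 (only `ms` and `s` handled).
fn parse_ms(s: &str) -> Option<f64> {
    match s.strip_suffix("ms") {
        Some(n) => n.parse().ok(),
        None => s.strip_suffix('s')?.parse::<f64>().ok().map(|v| v * 1000.0),
    }
}

/// "p95@800ms" -> (95.0, "800ms").
fn parse_pct(s: &str) -> Option<(f64, &str)> {
    let (p, rest) = s.strip_prefix('p')?.split_once('@')?;
    Some((p.parse().ok()?, rest))
}

/// Parses the whole spec; returns None if any clause is malformed.
fn parse_target(spec: &str) -> Option<Vec<Clause>> {
    spec.split(',')
        .map(|item| {
            let (key, val) = item.trim().split_once(':')?;
            match key {
                "success" => val.parse::<f64>().ok().map(Clause::Success),
                "underrun" => val.parse::<f64>().ok().map(Clause::Underrun),
                "ttfb" => {
                    let (p, d) = parse_pct(val)?;
                    Some(Clause::Ttfb { p, max_ms: parse_ms(d)? })
                }
                "rtf" => {
                    let (p, v) = parse_pct(val)?;
                    Some(Clause::Rtf { p, max: v.parse().ok()? })
                }
                _ => None,
            }
        })
        .collect()
}

/// Percentile lookup; None means the metric could not be computed.
type Pct<'a> = &'a dyn Fn(f64) -> Option<f64>;

/// Evaluates one clause; a missing metric fails the clause.
fn clause_passes(c: &Clause, success_rate: f64, underrun_rate: f64, ttfb_ms: Pct<'_>, rtf: Pct<'_>) -> bool {
    match *c {
        Clause::Success(min) => success_rate >= min,
        Clause::Underrun(max) => underrun_rate <= max,
        Clause::Ttfb { p, max_ms } => ttfb_ms(p).map_or(false, |v| v <= max_ms),
        Clause::Rtf { p, max } => rtf(p).map_or(false, |v| v <= max),
    }
}

/// The target is a conjunction: every clause must pass.
fn target_passes(clauses: &[Clause], success_rate: f64, underrun_rate: f64, ttfb_ms: Pct<'_>, rtf: Pct<'_>) -> bool {
    clauses.iter().all(|c| clause_passes(c, success_rate, underrun_rate, ttfb_ms, rtf))
}
```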