armchair 0.2.2

Concurrency benchmarking tool for Rime TTS services
# Armchair

Armchair is a load test binary that can be used to benchmark Rime's TTS service
with concurrent requests.

Primary use cases:

- To find the time-to-first-byte (TTFB) and real-time factor (RTF) for a given concurrency level.
- To find the maximum concurrency that satisfies the given performance targets.
- To find an optimal client-side buffer size to avoid underrun issues.

For audio streaming with concurrent sessions, TTFB and RTF are the key performance indicators. Real-time
streaming requires an RTF below 1, so a maximum concurrency is typically imposed to keep RTF under
that limit.

Supported features:

- Bisection to find the maximum concurrency for a configurable performance target (success, TTFB, RTF, underrun)
- Time-to-first-byte and RTF metrics for a given concurrency
- Session start staggering via exponential distribution
- Intra-session delays via truncated normal distribution with playback-aware waiting
- Client-side buffer simulation and underrun detection

## Methodology

This tool simulates many concurrent streaming sessions and evaluates performance against a configurable target.

- Session model:
  - At a given concurrency C, C sessions are launched.
  - Session starts are staggered by an exponential inter-arrival process with rate λ (`--session-rate`, starts/second).
  - Each session performs `-n/--requests-per-session` sequential requests.
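
As a rough illustration of the staggering step, the Python sketch below builds cumulative start offsets by summing exponentially distributed gaps. It is not the tool's own code: the function name `session_start_offsets` is hypothetical, and delaying the first session by one gap (rather than starting it at time zero) is an assumption.

```python
import random


def session_start_offsets(concurrency, session_rate, rng=None):
    """Return start offsets in seconds for `concurrency` sessions.

    Gaps between consecutive starts are exponential with rate
    `session_rate` (starts/second), so the mean gap is 1 / session_rate.
    Assumption: the first session also waits one gap before starting.
    """
    rng = rng or random.Random()
    offsets = []
    t = 0.0
    for _ in range(concurrency):
        t += rng.expovariate(session_rate)  # one exponential inter-arrival gap
        offsets.append(t)
    return offsets


if __name__ == "__main__":
    # With a rate of 5 starts/second, 16 sessions take about 3.2 s on average to launch.
    print(session_start_offsets(16, 5.0))
```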

- Per-request timing and metrics:
  - TTFB is measured from request send to the first received byte.
  - Elapsed is the total time to stream the entire response.
  - Audio duration is parsed from the WAV headers; if parsing fails the request is treated as non-audio for RTF purposes.
  - RTF is computed as (elapsed − TTFB) / audio_duration. Requests with a missing or invalid audio duration are excluded from RTF percentiles.
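
The RTF formula above can be written out as a small Python function. This is a sketch, not the tool's implementation; returning NaN for an unusable duration is an assumption, chosen so that such requests can be filtered out of the RTF distribution later.

```python
import math


def real_time_factor(elapsed_s, ttfb_s, audio_duration_s):
    """RTF = (elapsed - TTFB) / audio_duration.

    Returns NaN when the audio duration is missing, zero, negative, or NaN
    (assumed convention for "exclude this request from RTF statistics").
    """
    if audio_duration_s is None or not audio_duration_s > 0:
        return math.nan
    return (elapsed_s - ttfb_s) / audio_duration_s


if __name__ == "__main__":
    # 2.0 s of streaming after the first byte for 4.0 s of audio: RTF 0.5,
    # i.e. synthesized twice as fast as real time.
    print(real_time_factor(2.5, 0.5, 4.0))
```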

- Intra-session delay model (traffic shaping):
  - After each request completes, the tool waits out any remaining playback time if the audio was synthesized faster than real time, i.e. max(0, audio_duration − (elapsed − TTFB)).
  - Then it sleeps an additional delay sampled from a Normal distribution with parameters `--intra-session-delay-mu` and `--intra-session-delay-sigma`, truncated to [`--intra-session-delay-min`, `--intra-session-delay-max`].
  - The first request in a session has no intra-session delay; session start staggering is controlled by the exponential process above.
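
The two waiting steps can be sketched in Python as follows. The function names are hypothetical, and the truncation is implemented here by rejection sampling (redrawing until the value lands in range); the tool may instead clamp out-of-range draws to the bounds, which would give a different distribution near the edges.

```python
import random


def playback_wait(elapsed_s, ttfb_s, audio_duration_s):
    """Remaining playback time after the stream has finished downloading."""
    return max(0.0, audio_duration_s - (elapsed_s - ttfb_s))


def truncated_normal(mu, sigma, lo, hi, rng, max_tries=1000):
    """Draw from Normal(mu, sigma) restricted to [lo, hi].

    Assumption: rejection sampling. If no draw lands in range after
    `max_tries` attempts, fall back to mu clamped into [lo, hi].
    """
    for _ in range(max_tries):
        x = rng.gauss(mu, sigma)
        if lo <= x <= hi:
            return x
    return min(max(mu, lo), hi)


if __name__ == "__main__":
    rng = random.Random()
    # 8 s of audio streamed in 5 s after the first byte: 3 s of playback remain.
    wait = playback_wait(elapsed_s=5.5, ttfb_s=0.5, audio_duration_s=8.0)
    think = truncated_normal(10.0, 5.0, 0.0, 20.0, rng)
    print(f"sleep {wait + think:.2f}s before the next request")
```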

- Buffer underrun detection:
  - Simulates a client-side buffer of size `--client-buffer` (default 0ms).
  - Playback starts once the buffer is full.
  - An underrun occurs if the buffer empties before playback completes.
  - Requires valid WAV headers to determine the byte rate.
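
One way to picture the underrun check is the Python sketch below. It is an assumption-laden model, not the tool's code: chunk arrivals are given as `(arrival_time_s, num_bytes)` pairs, header bytes are ignored, playback drains at exactly the WAV byte rate, and playback starts at the moment the received bytes first reach the buffer size.

```python
def has_underrun(chunks, byte_rate, buffer_ms):
    """Return True if simulated playback runs out of data before the stream ends.

    chunks:    list of (arrival_time_s, num_bytes), ordered by arrival time
    byte_rate: audio bytes consumed per second of playback (from the WAV header)
    buffer_ms: client-side buffer that must fill before playback starts
    """
    threshold = byte_rate * buffer_ms / 1000.0
    received = 0
    play_start = None
    for arrival, size in chunks:
        if play_start is not None:
            # Bytes the player needed by the time this chunk arrived.
            needed = (arrival - play_start) * byte_rate
            if needed > received:
                return True
        received += size
        if play_start is None and received >= threshold:
            play_start = arrival
    # After the last chunk everything is buffered, so no later underrun is possible.
    return False


if __name__ == "__main__":
    rate = 48000  # e.g. 24 kHz, 16-bit mono
    slow = [(0.0, 24000), (0.8, 24000)]  # 0.5 s of audio, then a 0.8 s gap
    print(has_underrun(slow, rate, buffer_ms=0))     # gap exceeds buffered audio
    print(has_underrun(slow, rate, buffer_ms=1000))  # playback waits for 1 s of audio
```

In this model a larger `--client-buffer` trades startup latency for resilience to gaps between chunks, which is the trade-off the buffer-sizing use case explores.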

- Aggregation and statistics:
  - Success is counted when HTTP status is 2xx and the body is non-empty.
  - Percentiles (p50/p90/p95/p99) are computed via linear interpolation over sorted samples; NaN/invalid values are excluded from the relevant metric’s distribution.
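
A percentile with linear interpolation can be sketched in Python as below. The rank convention used here, `p / 100 * (n - 1)`, is an assumption (it matches NumPy's default method); the tool may use a different convention, which would shift results slightly for small sample counts.

```python
import math


def percentile(samples, p):
    """Linear-interpolation percentile over the finite values in `samples`.

    NaN and infinite values are dropped first. Returns NaN if nothing remains.
    """
    xs = sorted(x for x in samples if math.isfinite(x))
    if not xs:
        return math.nan
    rank = (p / 100.0) * (len(xs) - 1)  # fractional index into the sorted list
    lo = math.floor(rank)
    hi = math.ceil(rank)
    return xs[lo] + (xs[hi] - xs[lo]) * (rank - lo)


if __name__ == "__main__":
    ttfb_ms = [100.8, 97.3, math.nan, 117.6, 141.0]
    for p in (50, 90, 99):
        print(f"p{p} = {percentile(ttfb_ms, p):.1f} ms")
```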
  - A startup config dump prints all key parameters for reproducibility.

- Performance target evaluation (`--target`):
  - The target is a conjunction: all configured clauses must pass.
  - Supported clauses: `success:<fraction>`, `ttfb:pXX@<duration>`, `rtf:pXX@<value>`, `underrun:<fraction>`.
  - If a metric cannot be computed (e.g., no valid audio for RTF), that clause fails.
  - Results show OK/FAIL per clause, with color when the terminal supports it.
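
To make the clause semantics concrete, here is a hedged Python sketch of parsing and evaluating a target string. The function names and the tuple layout are hypothetical, and the comparison directions (success rate at least the threshold; TTFB, RTF, and underrun rate at most the threshold, all inclusive) are assumptions rather than documented behavior.

```python
import math


def parse_duration(text):
    """Convert '500ms' or '1.5s' to seconds."""
    if text.endswith("ms"):
        return float(text[:-2]) / 1000.0
    if text.endswith("s"):
        return float(text[:-1])
    raise ValueError(f"unsupported duration: {text!r}")


def parse_target(spec):
    """Parse a spec into a list of (metric, percentile_or_None, threshold)."""
    clauses = []
    for part in spec.split(","):
        metric, _, rest = part.strip().partition(":")
        if metric in ("success", "underrun"):
            clauses.append((metric, None, float(rest)))
        elif metric in ("ttfb", "rtf"):
            pct, _, limit = rest.partition("@")
            if not pct.startswith("p") or not limit:
                raise ValueError(f"malformed clause: {part!r}")
            threshold = parse_duration(limit) if metric == "ttfb" else float(limit)
            clauses.append((metric, float(pct[1:]), threshold))
        else:
            raise ValueError(f"unknown metric: {metric!r}")
    return clauses


def evaluate(clauses, success_rate, underrun_rate, percentile_of):
    """Return (all_ok, [(metric, ok), ...]).

    percentile_of(metric, p) returns the measured value (seconds for ttfb),
    or NaN when the metric could not be computed; NaN fails the clause.
    """
    results = []
    for metric, pct, threshold in clauses:
        if metric == "success":
            ok = success_rate >= threshold
        elif metric == "underrun":
            ok = underrun_rate <= threshold
        else:
            value = percentile_of(metric, pct)
            ok = not math.isnan(value) and value <= threshold
        results.append((metric, ok))
    return all(ok for _, ok in results), results


if __name__ == "__main__":
    clauses = parse_target("success:0.99,ttfb:p95@800ms,rtf:p90@1.20,underrun:0.01")
    measured = {("ttfb", 95.0): 0.126, ("rtf", 90.0): 1.17}
    print(evaluate(clauses, 1.0, 0.0, lambda m, p: measured[(m, p)]))
```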

- Maximum concurrency search (when `--concurrency 0`):
  - Exponential growth: repeatedly doubles concurrency (1, 2, 4, …) until the performance target fails; waits 10s between trials.
  - Binary search: bisection between the last known-good and the first failing concurrency to find the largest concurrency that still satisfies the target.
  - After discovery, a final run at the chosen concurrency prints a full summary.
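
The two search phases can be sketched in Python as below. This is an illustration under stated assumptions, not the tool's code: `passes(c)` stands in for running a full trial at concurrency `c` and checking the target, `limit` is a hypothetical safety cap on doubling, and the pause is applied between every trial (the text above only states it for the growth phase).

```python
import time


def find_max_concurrency(passes, wait_s=0.0, limit=4096):
    """Largest concurrency for which `passes` holds, or 0 if even 1 fails.

    passes(c): runs a trial at concurrency c, returns True if the target is met.
    wait_s:    pause between trials (the tool waits 10 s; 0 keeps the sketch fast).
    limit:     hypothetical cap; doubling stops once a passing c reaches it.
    """
    good, c = 0, 1
    # Phase 1: exponential growth until the first failure.
    while passes(c):
        good = c
        if c >= limit:
            return good
        c *= 2
        time.sleep(wait_s)
    bad = c
    # Phase 2: bisection between the last passing and first failing concurrency.
    while bad - good > 1:
        mid = (good + bad) // 2
        time.sleep(wait_s)
        if passes(mid):
            good = mid
        else:
            bad = mid
    return good


if __name__ == "__main__":
    # Stand-in for a real trial: pretend the service copes with up to 21 sessions.
    print(find_max_concurrency(lambda c: c <= 21))
```

Bisection assumes the pass/fail outcome is monotonic in concurrency. Because the traffic model is stochastic, a borderline concurrency can pass in one trial and fail in the next, so the result is best read as an estimate.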

Note: The traffic and delay processes are stochastic; repeated runs will vary. Randomness is seeded from system entropy.

## Usage

### Installation

```shell
cargo install armchair
```

### Maximum concurrency

To find the maximum concurrency where each session sends 5 requests with:

- session starts following an exponential process (lambda=5 starts/sec)
- intra-session delays sampled from a truncated normal N(mu=10s, sigma=5s), clamped to [0s, 20s]
- performance targets `success:1.00,ttfb:p99@1s,rtf:p99@1.00,underrun:0.00` (default)

```shell
armchair --url '<RIME_SERVICE>' --token '<RIME_API_KEY>'
```

The tool should then report metrics like:

```
=== MAXIMUM CONCURRENCY FOUND: 16 ===

...

----- Summary -----
total: 80 success: 80 (100.0%)
Buffer underrun: 0 (0.0%)
TTFB ms: mean=104.4 p50=100.8 p90=117.6 p95=126.2 p99=141.0
Elapsed ms: mean=13924.0 p50=13772.5 p90=16412.8 p95=17065.3 p99=18527.3
RTF: mean=1.067 p50=1.061 p90=1.170 p95=1.208 p99=1.254
```

### Fixed concurrency

When `--concurrency` is set to a nonzero value, the tool skips the maximum-concurrency
search and simply reports the metrics for that concurrency level.

### Request customization

- `-n`, `--requests-per-session`: Number of sequential requests in each session, e.g. `5`
- `--session-rate`: Average session starts per second, e.g. `5`; starts are staggered with exponentially distributed gaps (a Poisson arrival process)
- `--intra-session-delay-mu`: Intra-session delay mean, e.g. `10s`
- `--intra-session-delay-sigma`: Intra-session delay standard deviation, e.g. `5s`
- `--intra-session-delay-min`: Intra-session delay minimum clamp, e.g. `0s`
- `--intra-session-delay-max`: Intra-session delay maximum clamp, e.g. `20s`
- `--client-buffer`: Client-side initial playback buffer, e.g. `100ms`
- `--target`: Performance target specification, e.g. `success:1.00,ttfb:p90@500ms,rtf:p90@1.00,underrun:0.00`
- `--percentiles`: List of percentiles to report, e.g. `1,25,50,90,99`

#### Duration value syntax

Flags that accept durations (e.g., `--intra-session-delay-mu`) take values with units:

```
500ms, 1.5s, 10s
```

#### Performance target flag

`--target` accepts a comma-separated list:

```
success:<fraction>,ttfb:pXX@<duration>,rtf:pXX@<value>,underrun:<fraction>
```

Examples:

```
--target success:0.99,ttfb:p95@800ms,rtf:p90@1.20,underrun:0.01
--target success:1.00,ttfb:p90@1s,rtf:p90@1.00,underrun:0.00
```