tailtriage-controller 0.1.2

Configurable control layer for repeated bounded capture windows in long-lived services
# tailtriage-controller


`tailtriage-controller` manages repeated, bounded capture windows for long-lived services.

Use it when you want to turn capture on, collect one generation, turn capture off, and later start a fresh generation without restarting the process.

Capture and analysis are separate: this crate produces artifacts, and `tailtriage-cli` analyzes them.

## When to use this crate


Use `tailtriage-controller` when you need repeated arm/disarm windows in one process.

Use `tailtriage-core` for a single explicit `build -> capture -> shutdown` run.

Use `tailtriage` when you want the default entry point. Controller support is enabled there by default and can be disabled via Cargo features.

## Installation


```bash
cargo add tailtriage-controller
```

## Quick start


`output("tailtriage-run.json")` configures the base artifact path template. Each activation writes a per-generation artifact with `-generation-N` in the file name (for example, generation 1 writes `tailtriage-run-generation-1.json`).

```rust,no_run
use tailtriage_controller::TailtriageController;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let controller = TailtriageController::builder("checkout-service")
        .initially_enabled(false)
        .output("tailtriage-run.json")
        .build()?;

    let _generation = controller.enable()?;

    let started = controller.begin_request("/checkout");
    started.completion.finish_ok();

    let _ = controller.disable()?;
    Ok(())
}
```
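
The naming rule described above can be sketched as a small path transformation. This is only an illustration of the documented `-generation-N` pattern; `generation_artifact_path` is a hypothetical helper, not part of the crate's API, and the crate's actual path handling may differ in edge cases.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not part of `tailtriage-controller`): derives a
/// per-generation artifact path by inserting `-generation-N` between the
/// file stem and the extension of the base path template.
fn generation_artifact_path(base: &Path, generation: u64) -> PathBuf {
    // File stem of the template, e.g. "tailtriage-run" for "tailtriage-run.json".
    let stem = base
        .file_stem()
        .map(|s| s.to_string_lossy().into_owned())
        .unwrap_or_default();

    let mut file_name = format!("{stem}-generation-{generation}");

    // Re-attach the original extension, if the template had one.
    if let Some(ext) = base.extension() {
        file_name.push('.');
        file_name.push_str(&ext.to_string_lossy());
    }

    // Keep the parent directory of the template unchanged.
    base.with_file_name(file_name)
}
```

With this sketch, the template `tailtriage-run.json` and generation `1` yield `tailtriage-run-generation-1.json`, matching the example in the text.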

## Mental model


A controller owns a **template** plus at most one **active generation**.

- `enable()` creates a fresh generation from the current template.
- `disable()` stops new admissions for that generation.
- If no captured requests are still in flight, the generation finalizes immediately.
- Otherwise the generation enters **closing** and finalizes after its already-admitted captured requests drain.
- The next `enable()` creates a new generation with a new artifact path.

Requests started while the controller is disabled or closing are **inert**:

- they preserve request metadata
- they record no capture events
- they never join a later generation

Each activation writes a per-generation artifact whose file name includes `-generation-N`.
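
The lifecycle above can be pictured as a three-state machine per generation. The following is a toy model written for this README, assuming only the rules listed in this section; the type and method names (`Generation`, `try_admit`, `finish_request`) are invented for illustration and do not mirror the crate's internals.

```rust
/// Toy model of one generation's lifecycle; names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GenerationState {
    /// Accepting new captured requests.
    Active,
    /// No new admissions; waiting for admitted requests to drain.
    Closing,
    /// Nothing left in flight; the artifact can be written.
    Finalized,
}

#[derive(Debug)]
struct Generation {
    state: GenerationState,
    in_flight: usize,
}

impl Generation {
    fn new() -> Self {
        Self { state: GenerationState::Active, in_flight: 0 }
    }

    /// Admits a request only while active. A `false` result corresponds to
    /// an inert request: it records nothing and never joins a later generation.
    fn try_admit(&mut self) -> bool {
        if self.state == GenerationState::Active {
            self.in_flight += 1;
            true
        } else {
            false
        }
    }

    /// Stops new admissions; finalizes immediately when nothing is in flight.
    fn disable(&mut self) {
        if self.state == GenerationState::Active {
            self.state = if self.in_flight == 0 {
                GenerationState::Finalized
            } else {
                GenerationState::Closing
            };
        }
    }

    /// Marks one admitted request as finished; a closing generation
    /// finalizes once the last admitted request drains.
    fn finish_request(&mut self) {
        self.in_flight = self.in_flight.saturating_sub(1);
        if self.state == GenerationState::Closing && self.in_flight == 0 {
            self.state = GenerationState::Finalized;
        }
    }
}
```

In this model, a request that arrives after `disable()` is rejected by `try_admit`, which is the toy equivalent of an inert request.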

## Minimal TOML example


Use TOML when you want repeatable operational settings, including mode selection.

```toml
[controller]
service_name = "checkout-service"

[controller.activation]
mode = "light"

[controller.activation.sink]
type = "local_json"
output_path = "tailtriage-run.json"
```

## Expanded TOML example


```toml
[controller]
service_name = "checkout-service"
initially_enabled = false

[controller.activation]
mode = "investigation"
strict_lifecycle = true

[controller.activation.capture_limits_override]
max_requests = 150000
max_stages = 300000
max_queues = 300000
max_inflight_snapshots = 300000
max_runtime_snapshots = 150000

[controller.activation.sink]
type = "local_json"
output_path = "tailtriage-run.json"

[controller.activation.runtime_sampler]
enabled_for_armed_runs = true
mode_override = "investigation"
interval_ms = 250
max_runtime_snapshots = 20000

[controller.activation.run_end_policy]
kind = "auto_seal_on_limits_hit"
```

## Config precedence and reload rules


When TOML is loaded with `config_path(...)`:

- `service_name` from TOML overrides the builder value when present; the builder value is used only when TOML omits it.
- `initially_enabled` falls back to the builder value when omitted.
- activation template settings come from TOML.
- omitted optional activation subfields use TOML contract defaults.
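
The `[controller]` precedence rules above amount to "TOML wins when present, builder is the fallback". A minimal sketch, assuming the TOML has already been parsed into optional fields; the struct names here are hypothetical and not types exported by the crate:

```rust
/// Hypothetical view of the `[controller]` table after parsing:
/// every field may be absent from the TOML file.
#[derive(Debug, Default)]
struct TomlController {
    service_name: Option<String>,
    initially_enabled: Option<bool>,
}

/// Hypothetical view of the values passed to the builder in code.
#[derive(Debug)]
struct BuilderValues {
    service_name: String,
    initially_enabled: bool,
}

/// The settings that would apply after merging both sources.
#[derive(Debug, PartialEq, Eq)]
struct Resolved {
    service_name: String,
    initially_enabled: bool,
}

/// TOML values take precedence when present; builder values are fallbacks.
fn resolve(builder: &BuilderValues, toml: &TomlController) -> Resolved {
    Resolved {
        service_name: toml
            .service_name
            .clone()
            .unwrap_or_else(|| builder.service_name.clone()),
        initially_enabled: toml
            .initially_enabled
            .unwrap_or(builder.initially_enabled),
    }
}
```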

`reload_config()` updates the template for **future** generations only.

It does not mutate a generation that is already active.

## Run-end policies


Supported policies:

- `continue_after_limits_hit` _(default)_
- `auto_seal_on_limits_hit`

Behavior:

- `continue_after_limits_hit`: the generation stays active after the first `limits_hit` truncation
- `auto_seal_on_limits_hit`: on the first `limits_hit`, new admissions stop and the generation moves to closing; finalization happens immediately if no captured requests are still in flight, otherwise after they drain
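
The difference between the two policies can be written as a single transition function. This is a sketch based only on the behavior described above; the enum and function names are invented for illustration and are not the crate's types.

```rust
/// Illustrative mirror of the two documented policy kinds.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum RunEndPolicy {
    /// Documented default: keep the generation active.
    #[default]
    ContinueAfterLimitsHit,
    /// Seal the generation on the first `limits_hit`.
    AutoSealOnLimitsHit,
}

/// Illustrative generation states, as used in the mental model.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Active,
    Closing,
    Finalized,
}

/// State an active generation moves to on the first `limits_hit`,
/// given how many captured requests are still in flight.
fn state_after_first_limits_hit(policy: RunEndPolicy, in_flight: usize) -> State {
    match (policy, in_flight) {
        // Capture continues; the generation is not sealed.
        (RunEndPolicy::ContinueAfterLimitsHit, _) => State::Active,
        // Sealed with nothing in flight: finalize right away.
        (RunEndPolicy::AutoSealOnLimitsHit, 0) => State::Finalized,
        // Sealed with requests in flight: wait for them to drain.
        (RunEndPolicy::AutoSealOnLimitsHit, _) => State::Closing,
    }
}
```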

TOML contract:

- `[controller.activation.run_end_policy]` is optional
- if that table is present, `kind` is required

## Runtime sampler template


The controller can start a Tokio runtime sampler automatically for armed generations.

Important constraints:

- sampler startup still requires an active Tokio runtime
- sampler settings are fixed at activation time
- runtime snapshot retention is still bounded by the resolved core capture limits
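
One way to read the last constraint is that the sampler's own `max_runtime_snapshots` can never raise retention above the resolved core limit. The sketch below assumes that "bounded by" means taking the smaller of the two values; that interpretation and the function name are assumptions made for illustration, not confirmed crate behavior.

```rust
/// Assumed interpretation: the effective runtime snapshot cap is the
/// sampler-level cap (if any), clamped to the resolved core capture limit.
fn effective_runtime_snapshot_cap(core_limit: usize, sampler_limit: Option<usize>) -> usize {
    match sampler_limit {
        // The sampler cap cannot exceed the core limit.
        Some(limit) => limit.min(core_limit),
        // No sampler-level cap: only the core limit applies.
        None => core_limit,
    }
}
```

Under this assumption, the expanded TOML example (core override `150000`, sampler `20000`) would retain at most `20000` runtime snapshots.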

## TOML field reference


### `[controller]`


- `service_name` _(optional string)_: overrides the builder service name when present; must not be empty
- `initially_enabled` _(optional bool)_: when `true`, `build()` starts generation `1`

### `[controller.activation]`


- `mode` _(required string)_: `light` or `investigation`
- `strict_lifecycle` _(optional bool, default `false`)_

### `[controller.activation.sink]`


- `type` _(required string)_: `local_json`
- `output_path` _(required string for `local_json`)_: base path template for per-generation files

### `[controller.activation.capture_limits_override]`


All fields are optional:

- `max_requests`
- `max_stages`
- `max_queues`
- `max_inflight_snapshots`
- `max_runtime_snapshots`

### `[controller.activation.runtime_sampler]`


Optional table. Default is disabled.

- `enabled_for_armed_runs` _(bool)_
- `mode_override` _(string)_
- `interval_ms` _(integer, milliseconds)_
- `max_runtime_snapshots` _(integer)_

### `[controller.activation.run_end_policy]`


Optional table. If present, `kind` is required.

- `kind = "continue_after_limits_hit"`
- `kind = "auto_seal_on_limits_hit"`
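
The required/optional rules in this reference can be summarized as a validation pass over already-parsed values. This is a rough sketch, not the crate's validator: `RawConfig` and `validate` are hypothetical, only a few of the documented fields are modeled, and the error messages are made up.

```rust
/// Hypothetical, already-parsed subset of the TOML config.
#[derive(Debug, Default)]
struct RawConfig {
    /// `[controller].service_name`
    service_name: Option<String>,
    /// `[controller.activation].mode`
    mode: Option<String>,
    /// Outer `Option`: is the `run_end_policy` table present?
    /// Inner `Option`: is `kind` set inside that table?
    run_end_policy: Option<Option<String>>,
}

/// Checks the documented contract for the modeled fields.
fn validate(raw: &RawConfig) -> Result<(), String> {
    // `service_name` is optional, but must not be empty when present.
    if let Some(name) = &raw.service_name {
        if name.is_empty() {
            return Err("service_name must not be empty".to_string());
        }
    }

    // `mode` is required and must be one of the two documented values.
    match raw.mode.as_deref() {
        Some("light") | Some("investigation") => {}
        Some(other) => return Err(format!("unknown mode: {other}")),
        None => return Err("mode is required".to_string()),
    }

    // The policy table is optional; if present, `kind` is required.
    match &raw.run_end_policy {
        None => {} // table omitted: the default policy applies
        Some(None) => return Err("run_end_policy.kind is required".to_string()),
        Some(Some(kind)) => match kind.as_str() {
            "continue_after_limits_hit" | "auto_seal_on_limits_hit" => {}
            other => return Err(format!("unknown run_end_policy kind: {other}")),
        },
    }

    Ok(())
}
```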

## Important constraints


- at most one generation is active at a time
- active generation settings do not change after activation
- requests remain bound to the generation that admitted them
- controller capture and artifact analysis are separate; analysis happens in `tailtriage-cli`

## Related crates


- `tailtriage`: default entry point
- `tailtriage-core`: direct instrumentation lifecycle
- `tailtriage-tokio`: runtime-pressure sampling
- `tailtriage-cli`: artifact analysis