Crate tailtriage_cli

Expand description

§tailtriage-cli

tailtriage-cli loads tailtriage run artifacts and turns them into a triage report.

Install it after capture instrumentation is in place.

The binary name is:

tailtriage

§What this tool does

tailtriage-cli owns the analysis-side contract:

load a captured artifact
validate schema compatibility
produce JSON or human-readable triage output
rank likely bottleneck families
emit evidence and next checks

The output is intended to guide the next investigation step. It does not prove root cause on its own.

§Installation

cargo install tailtriage-cli

§Minimal usage

Default text output:

tailtriage analyze tailtriage-run.json

Machine-readable JSON output:

tailtriage analyze tailtriage-run.json --format json

The CLI artifact loader requires at least one request event in requests.

§How to read the result

Read output in this order:

primary_suspect.kind
primary_suspect.evidence[]
primary_suspect.next_checks[]

Then run one targeted check, change one thing, and re-run under comparable load.

§Representative output shape

{
  "request_count": 250,
  "p50_latency_us": 782227,
  "p95_latency_us": 1468239,
  "p99_latency_us": 1518551,
  "p95_queue_share_permille": 982,
  "p95_service_share_permille": 267,
  "inflight_trend": null,
  "warnings": [],
  "primary_suspect": {
    "kind": "application_queue_saturation",
    "score": 90,
    "confidence": "high",
    "evidence": ["Queue wait at p95 consumes 98.2% of request time."],
    "next_checks": ["Inspect queue admission limits and producer burst patterns."]
  },
  "secondary_suspects": []
}

inflight_trend may be null when no in-flight gauges were captured.

§What the report contains

A report can include:

request count
request latency percentiles (p50, p95, p99)
p95 queue/service share summaries
optional in-flight trend summary
report warnings from analysis/report generation (for example truncation-related)
primary and secondary suspects

tailtriage analyze also prints loader/lifecycle warnings to stderr before the report. Those warnings are surfaced separately; they are not merged into the report warnings field.

Each suspect includes:

kind
score
confidence
evidence[]
next_checks[]

§Artifact compatibility contract

The tailtriage analyze workflow expects a supported tailtriage run artifact with minimum required content.

Current contract:

top-level schema_version is required
missing schema_version is rejected
non-integer schema_version is rejected
unsupported schema_version is rejected
current supported schema version is 1
requests must contain at least one request event
artifacts with an empty requests array are rejected by the CLI loader

Library note:

this crate’s library analyzer API, analyze::analyze_run(&Run), can analyze an in-memory Run with zero requests
the stricter non-empty requests rule applies to CLI artifact loading from disk

§Important interpretation notes

suspects are investigation leads, not proof of root cause
truncation warnings mean the diagnosis is based on partial retained data
unfinished lifecycle warnings printed by the CLI indicate some requests were not completed cleanly
p95_queue_share_permille and p95_service_share_permille are independent percentile summaries and do not need to sum to 1000

§Suspect kinds

The current report surface includes these suspect kinds:

application_queue_saturation
blocking_pool_pressure
executor_pressure_suspected
downstream_stage_dominates
insufficient_evidence

§When the result is `insufficient_evidence`

Usually the next step is to add more structure to capture:

add queue wrappers around suspected waits
add stage wrappers around suspected downstream work
optionally add runtime sampling if runtime pressure is unclear
re-run under comparable load

§What this tool does not do

tailtriage-cli does not capture instrumentation data.

Use capture-side crates for that:

tailtriage: recommended capture-side entry point
tailtriage-core: direct instrumentation primitives
tailtriage-controller: repeated bounded windows
tailtriage-tokio: runtime-pressure sampling
tailtriage-axum: Axum request-boundary integration Contract note: CLI artifact loading requires non-empty requests, while analyze::analyze_run can analyze an in-memory tailtriage_core::Run that has zero requests.

Modules§

analyze: Heuristic triage analyzer and text-report rendering.
artifact: Artifact loading and validation helpers for CLI workflows.