ccstat 0.6.0 - Docs.rs

# SPEC.md — ccstat Rust Rewrite

**Status:** Draft
**Date:** 2026-02-20

---

## Terminology

| Term | Definition |
|------|-----------|
| Provider | An AI coding tool whose usage data is tracked (Claude, Codex, OpenCode, Amp, Pi) |
| Report | A time-based or logical grouping of usage data (daily, monthly, weekly, session, blocks, statusline) |
| Entry | A single usage event parsed from a provider's log file |
| Session | A continuous interaction period with a provider, identified by a session ID or file |
| Block | A fixed-duration billing window (default 5 hours) used by Claude |
| Token category | One of: input, output, cache creation, cache read |
| Burn rate | Tokens consumed per minute or cost per hour within an active block |
| Projection | Estimated total usage if current burn rate continues to block end |
| Cost mode | Strategy for computing cost: `auto`, `calculate`, or `display` |
| LiteLLM | External pricing database providing per-model token costs |
| MCP | Model Context Protocol — a standard for exposing tools to AI assistants |

---

## 1. Intent

### 1.1 Motivation

The current ccstat project is a TypeScript monorepo containing 6 separate CLI packages, 1 MCP server, and 2 shared libraries. Each CLI is distributed as an independent npm package requiring Node.js. This creates friction:

- **Startup latency**: Node.js cold start is measurable, especially for the statusline hook that runs on every prompt.
- **Distribution complexity**: Six separate packages to install, version, and publish.
- **Supply chain surface**: Hundreds of transitive npm dependencies.
- **Cross-platform packaging**: No single static binary — requires Node.js runtime on every target.

A Rust rewrite produces a single statically-linked binary that replaces all six CLIs and the MCP server, with sub-millisecond startup and zero runtime dependencies.

### 1.2 Goals

- Ship a single `ccstat` binary that replaces all existing CLI packages and the MCP server.
- Maintain feature parity with every existing app (Claude, Codex, OpenCode, Amp, Pi, MCP).
- Improve CLI ergonomics with a unified provider/report subcommand structure.
- Produce identical JSON output schemas for downstream consumers.
- Target sub-50ms startup for the statusline command and sub-200ms for typical reports.
- Provide cross-platform static binaries (Linux x86_64/aarch64, macOS x86_64/aarch64, Windows x86_64).

### 1.3 Users

- **Individual developers** using AI coding tools who want to understand their token usage and costs.
- **Team leads** monitoring aggregate usage patterns across projects.
- **Automation pipelines** consuming JSON output for dashboards or alerts.
- **Claude Code statusline hooks** requiring minimal-latency cost display.
- **MCP clients** (Claude Desktop, other MCP-compatible tools) querying usage data programmatically.

### 1.4 Non-Goals

- Web UI or hosted dashboard.
- Cloud service or database backend.
- Real-time streaming of usage events.
- Support for AI tools beyond the five providers listed.
- Dictating internal data structures, algorithms, or crate-internal module organization.

---

## 2. Scope

### 2.1 Unified CLI Design

The binary is named `ccstat`. The CLI uses a two-level subcommand structure:

```
ccstat [provider] <report> [flags]
ccstat mcp [flags]
ccstat --version
ccstat --help
```

When the provider is omitted, it defaults to `claude`. This means:
- `ccstat daily` is equivalent to `ccstat claude daily`
- `ccstat codex daily` explicitly selects the Codex provider

**Providers:** `claude`, `codex`, `opencode`, `amp`, `pi`
**Reports:** `daily`, `monthly`, `weekly`, `session`, `blocks`, `statusline`
**Special subcommand:** `mcp` (starts the MCP server)

### 2.2 Provider-Report Matrix

Not every provider supports every report type. The system rejects unsupported combinations with a clear error message.

| Report | claude | codex | opencode | amp | pi |
|-----------|--------|-------|----------|-----|-----|
| daily | Y | Y | Y | Y | Y |
| monthly | Y | Y | Y | Y | Y |
| weekly | Y | N | Y | N | N |
| session | Y | Y | Y | Y | Y |
| blocks | Y | N | N | N | N |
| statusline | Y | N | N | N | N |

**Weekly report support:** Weekly is limited to Claude and OpenCode because only these providers existed in the TypeScript version when the weekly report was implemented. Codex, Amp, and Pi were added later without weekly support. The data format is not a blocker — future versions may extend weekly to other providers.

### 2.3 Feature List

**Data loading:**
- Parse JSONL files (Claude, Codex, Pi)
- Parse individual JSON files (OpenCode, Amp)
- Multi-directory data discovery with environment variable overrides
- Entry deduplication per provider
- Date range filtering (`--since`, `--until`)

**Aggregation:**
- Group by calendar date (daily)
- Group by calendar month (monthly)
- Group by ISO week with configurable start day (weekly)
- Group by session/project directory (session)
- Group by fixed-duration billing blocks with gap detection (blocks)
- Single-line compact output from stdin JSON (statusline)

**Cost calculation:**
- Four token categories: input, output, cache creation, cache read
- Three cost modes: auto, calculate, display
- LiteLLM pricing database integration (online fetch + embedded offline snapshot)
- Tiered pricing (200k token threshold for Anthropic models)
- Pre-calculated cost passthrough when available
- Credits-based billing for Amp alongside USD estimates

**Output:**
- Pretty-printed terminal tables with responsive/compact modes
- Structured JSON with consistent schemas
- jq integration (pipe JSON through external `jq` binary)
- Per-model cost/token breakdown
- Color-coded output with color control flags

**MCP server:**
- stdio and HTTP transports
- Tools for daily, monthly, session, blocks (Claude) and daily, monthly (Codex)

**Configuration:**
- JSON config file with global defaults and per-command overrides
- Config file auto-discovery with explicit path override
- CLI flags override config values; config overrides compiled defaults

**Debugging:**
- Cost mismatch detection comparing pre-calculated vs computed costs
- Per-model and per-version statistics
- Sample discrepancy reporting

### 2.4 User Journeys

**View daily Claude usage:**
Context: Developer wants to see today's token usage and costs.
Action: `ccstat daily`
Outcome: Table showing per-day token counts and costs for all Claude sessions, sorted ascending by date.

**View Codex usage for a date range in JSON:**
Context: Developer needs machine-readable Codex usage for the last week.
Action: `ccstat codex daily --since 20260213 --until 20260220 --json`
Outcome: JSON object with `daily` array and `totals` printed to stdout.

**Monitor active billing block:**
Context: Developer wants to check remaining budget in current 5-hour window.
Action: `ccstat blocks --active`
Outcome: Single-row table showing active block with burn rate, projection, and remaining time.

**Statusline hook integration:**
Context: Claude Code pipes session JSON to statusline on every prompt.
Action: `echo '{"session_id":"...","transcript_path":"...","model":{"id":"claude-sonnet-4-20250514",...},...}' | ccstat statusline`
Outcome: Single-line compact status printed to stdout with model, costs, burn rate, and context window usage.

**Start MCP server for Claude Desktop:**
Context: User configures Claude Desktop to query usage data.
Action: `ccstat mcp` (stdio) or `ccstat mcp --transport http --port 8080`
Outcome: MCP server starts, registering usage report tools.

**Use offline pricing:**
Context: Developer is on an airplane with no internet.
Action: `ccstat daily --offline`
Outcome: Report uses embedded pricing snapshot instead of fetching from LiteLLM.

**Filter JSON output with jq:**
Context: Developer wants only total cost from daily report.
Action: `ccstat daily --jq '.totals.totalCost'`
Outcome: Numeric cost value printed to stdout.

**Project-level breakdown:**
Context: Developer tracks multiple projects and wants per-project daily costs.
Action: `ccstat daily --instances`
Outcome: Table grouped by project, each with its own date rows and subtotals.

### 2.5 Data Sources

| Provider | Default path | Env override | Format | File pattern |
|----------|-------------|-------------|--------|-------------|
| Claude | `~/.config/claude/projects/` and `~/.claude/projects/` | `CLAUDE_CONFIG_DIR` (comma-separated) | JSONL | `{project}/{sessionId}.jsonl` |
| Codex | `~/.codex/sessions/` | `CODEX_HOME` | JSONL | `{sessionId}.jsonl` |
| OpenCode | `~/.local/share/opencode/storage/message/` | `OPENCODE_DATA_DIR` | JSON | `{messageId}.json` |
| Amp | `~/.local/share/amp/threads/` | `AMP_DATA_DIR` | JSON | `T-{uuid}.json` |
| Pi | `~/.pi/agent/sessions/` | `PI_AGENT_DIR` | JSONL | `{project}/{sessionId}.jsonl` |

Claude uses XDG base directories: when `CLAUDE_CONFIG_DIR` is not set, it searches `$XDG_CONFIG_HOME/claude/projects/` (defaulting to `~/.config/claude/projects/`) and `~/.claude/projects/`. Data from all valid directories is combined.

---

## 3. Behavior

### 3.1 CLI Structure and Defaults

The binary name is `ccstat`. Global flags apply to all commands. Provider-specific and report-specific flags are documented per command.

**Default provider:** `claude`
**Default report:** Running `ccstat` with no arguments displays help. Running `ccstat daily` runs the daily report for Claude.

**Version:** `ccstat --version` prints `ccstat <version>` and exits.
**Help:** `ccstat --help` lists providers and reports. `ccstat claude --help` lists reports for Claude. `ccstat claude daily --help` lists flags for the daily report.

**Shell completions:** The binary generates completions for bash, zsh, fish, and PowerShell via a `completions` subcommand.

### 3.2 Shared Flags

These flags are available on all report commands (daily, monthly, weekly, session, blocks). Statusline has its own flag set documented in 3.3.6.

| Flag | Short | Type | Default | Description |
|------|-------|------|---------|-------------|
| `--since` | `-s` | YYYYMMDD | none | Include entries on or after this date |
| `--until` | `-u` | YYYYMMDD | none | Include entries on or before this date |
| `--json` | `-j` | bool | false | Output as JSON instead of table |
| `--jq` | `-q` | string | none | Pipe JSON through jq filter (implies `--json`) |
| `--mode` | `-m` | enum | auto | Cost calculation mode: `auto`, `calculate`, `display` |
| `--offline` | `-O` | bool | false | Use embedded pricing snapshot instead of fetching |
| `--order` | `-o` | enum | asc | Sort order: `asc` (oldest first), `desc` (newest first) |
| `--breakdown` | `-b` | bool | false | Show per-model token/cost breakdown rows |
| `--timezone` | `-z` | string | system | IANA timezone for date grouping (e.g., `UTC`, `America/New_York`) |
| `--locale` | `-l` | string | en-CA | Locale for number/date formatting (e.g., `en-US`, `ja-JP`) |
| `--compact` | | bool | false | Force compact table layout |
| `--config` | | path | auto | Path to config file (overrides auto-discovery) |
| `--color` | | bool | auto | Force colored output |
| `--no-color` | | bool | auto | Force disable colored output |
| `--debug` | `-d` | bool | false | Show pricing mismatch debug info |
| `--debug-samples` | | int | 5 | Number of sample discrepancies in debug output |

**Color behavior:** When neither `--color` nor `--no-color` is specified, color is auto-detected from terminal capability. Environment variables `FORCE_COLOR=1` and `NO_COLOR=1` are respected.

**Date filtering:** `--since` and `--until` use YYYYMMDD format (e.g., `20260220`). Entries are filtered by their timestamp converted to the specified timezone.

### 3.3 Report Behaviors

#### 3.3.1 Daily Report

**Command:** `ccstat [provider] daily [flags]`

**Additional flags (Claude, Pi only):**

| Flag | Short | Type | Default | Description |
|------|-------|------|---------|-------------|
| `--instances` | `-i` | bool | false | Group by project, showing per-project daily rows |
| `--project` | `-p` | string | none | Filter to a specific project name |
| `--project-aliases` | | string | none | Comma-separated `key=value` pairs for project name aliases |

**Aggregation:** Group entries by calendar date in the specified timezone.

**Table columns:** Date, Input, Output, Cache Create, Cache Read, Total, Cost, Models

When `--breakdown` is set, indented sub-rows appear below each date showing per-model token counts and cost.

When `--instances` is set, rows are grouped by project. Each project section has its own date rows.

**JSON output:**
```json
{
  "daily": [
    {
      "date": "YYYY-MM-DD",
      "inputTokens": 0,
      "outputTokens": 0,
      "cacheCreationTokens": 0,
      "cacheReadTokens": 0,
      "totalTokens": 0,
      "totalCost": 0.00,
      "modelsUsed": ["model-name"],
      "modelBreakdowns": [
        {
          "modelName": "model-name",
          "inputTokens": 0,
          "outputTokens": 0,
          "cacheCreationTokens": 0,
          "cacheReadTokens": 0,
          "cost": 0.00
        }
      ]
    }
  ],
  "totals": {
    "inputTokens": 0,
    "outputTokens": 0,
    "cacheCreationTokens": 0,
    "cacheReadTokens": 0,
    "totalTokens": 0,
    "totalCost": 0.00
  }
}
```

When `--instances` is set, the JSON structure groups by project:
```json
{
  "projects": {
    "project-name": [
      { "date": "...", "inputTokens": 0, ... }
    ]
  },
  "totals": { ... }
}
```

#### 3.3.2 Monthly Report

**Command:** `ccstat [provider] monthly [flags]`

**Aggregation:** Group entries by calendar month (YYYY-MM).

**Table columns:** Month, Input, Output, Cache Create, Cache Read, Total, Cost, Models

**JSON output:**
```json
{
  "monthly": [
    {
      "month": "YYYY-MM",
      "inputTokens": 0,
      "outputTokens": 0,
      "cacheCreationTokens": 0,
      "cacheReadTokens": 0,
      "totalTokens": 0,
      "totalCost": 0.00,
      "modelsUsed": ["model-name"],
      "modelBreakdowns": [...]
    }
  ],
  "totals": { ... }
}
```

#### 3.3.3 Weekly Report

**Command:** `ccstat [provider] weekly [flags]`

**Supported providers:** Claude, OpenCode only. Other providers return an error.

**Additional flags:**

| Flag | Short | Type | Default | Description |
|------|-------|------|---------|-------------|
| `--start-of-week` | `-w` | enum | sunday | Day to start the week: `sunday` through `saturday` |

**Aggregation:** Group entries by ISO week. The week label is the date of the start day.

**Table columns:** Week, Input, Output, Cache Create, Cache Read, Total, Cost, Models

**JSON output:**
```json
{
  "weekly": [
    {
      "week": "YYYY-MM-DD",
      "inputTokens": 0,
      ...
    }
  ],
  "totals": { ... }
}
```

#### 3.3.4 Session Report

**Command:** `ccstat [provider] session [flags]`

**Additional flags:**

| Flag | Short | Type | Default | Description |
|------|-------|------|---------|-------------|
| `--id` | `-i` | string | none | Look up a specific session by ID (shows per-entry detail) |

**Aggregation (list mode):** Group entries by session. For Claude and Pi, sessions are identified by project directory. For Codex, by session file. For OpenCode, by sessionID field. For Amp, by thread file.

**Table columns (list mode):** Session, Input, Output, Cache Create, Cache Read, Total, Cost, Models, Last Activity

**JSON output (list mode):**
```json
{
  "sessions": [
    {
      "sessionId": "session-identifier",
      "projectPath": "project/path",
      "inputTokens": 0,
      "outputTokens": 0,
      "cacheCreationTokens": 0,
      "cacheReadTokens": 0,
      "totalTokens": 0,
      "totalCost": 0.00,
      "lastActivity": "YYYY-MM-DD",
      "modelsUsed": ["model-name"],
      "modelBreakdowns": [...]
    }
  ],
  "totals": { ... }
}
```

**Session ID lookup mode (`--id`):** When `--id` is provided, the system loads only that session and displays per-entry details.

**JSON output (ID lookup):**
```json
{
  "sessionId": "session-id",
  "totalCost": 0.00,
  "totalTokens": 0,
  "entries": [
    {
      "timestamp": "ISO-8601",
      "inputTokens": 0,
      "outputTokens": 0,
      "cacheCreationTokens": 0,
      "cacheReadTokens": 0,
      "model": "model-name",
      "costUSD": 0.00
    }
  ]
}
```

#### 3.3.5 Blocks Report (Claude only)

**Command:** `ccstat blocks [flags]`

**Additional flags:**

| Flag | Short | Type | Default | Description |
|------|-------|------|---------|-------------|
| `--active` | `-a` | bool | false | Show only the currently active block with projections |
| `--recent` | `-r` | bool | false | Show blocks from last 3 days including active |
| `--token-limit` | `-t` | string | none | Token limit for quota warnings (number or `"max"`) |
| `--session-length` | `-n` | number | 5 | Block duration in hours |

**Aggregation:** Group entries into consecutive fixed-duration blocks. A new block starts when the gap between consecutive entries exceeds the session length.

**Block identification rules:**
- A block starts at the timestamp of its first entry, floored to the hour.
- A block's logical end is start time + session length.
- An active block is one whose logical end is in the future.
- A gap block is inserted between consecutive usage blocks when the gap exceeds the session length. Gap blocks contain no entries and display the gap duration.

**Burn rate:** For active blocks, calculate tokens per minute and cost per hour based on elapsed usage within the block.

**Projection:** For active blocks, extrapolate current burn rate to the block's logical end to estimate total tokens and cost.

**Token limit warnings:** When `--token-limit` is set, display a warning indicator when a block's total tokens exceed 80% of the limit.

**Table columns:** Block Time, Input, Output, Cache Create, Cache Read, Total, Cost, Models

Active blocks display remaining time. Gap blocks display gap duration. Compact mode activates when terminal width is below 120 columns.

**JSON output:**
```json
{
  "blocks": [
    {
      "id": "ISO-8601",
      "startTime": "ISO-8601",
      "endTime": "ISO-8601",
      "isActive": false,
      "isGap": false,
      "inputTokens": 0,
      "outputTokens": 0,
      "cacheCreationTokens": 0,
      "cacheReadTokens": 0,
      "totalTokens": 0,
      "costUSD": 0.00,
      "models": ["model-name"],
      "burnRate": {
        "tokensPerMinute": 0.0,
        "costPerHour": 0.00
      },
      "projection": {
        "totalTokens": 0,
        "totalCost": 0.00,
        "remainingMinutes": 0
      },
      "tokenLimitStatus": {
        "limit": 0,
        "percentage": 0.0,
        "exceeded": false
      }
    }
  ],
  "totals": { ... }
}
```

`burnRate`, `projection`, and `tokenLimitStatus` are present only when applicable (active blocks and when `--token-limit` is set, respectively).

#### 3.3.6 Statusline (Claude only)

**Command:** `ccstat statusline [flags]`

**Input:** Reads a single JSON object from stdin. This JSON is provided by the Claude Code statusline hook.

**Statusline-specific flags:**

| Flag | Short | Type | Default | Description |
|------|-------|------|---------|-------------|
| `--offline` | `-O` | bool | true | Use embedded pricing (default true for speed) |
| `--visual-burn-rate` | `-B` | enum | off | Burn rate display: `off`, `emoji`, `text`, `emoji-text` |
| `--cost-source` | | enum | auto | Session cost source: `auto`, `ccstat`, `cc`, `both` |
| `--cache` | | bool | true | Enable hybrid caching |
| `--no-cache` | | bool | | Disable caching |
| `--refresh-interval` | | int | 1 | Cache refresh interval in seconds |
| `--context-low-threshold` | | int (0-100) | 50 | Context usage green/yellow boundary (%) |
| `--context-medium-threshold` | | int (0-100) | 80 | Context usage yellow/red boundary (%) |
| `--config` | | path | auto | Config file path |
| `--debug` | `-d` | bool | false | Debug mode |

**Input schema:**
```json
{
  "session_id": "string",
  "transcript_path": "string",
  "cwd": "string",
  "model": {
    "id": "string",
    "display_name": "string"
  },
  "workspace": {
    "current_dir": "string",
    "project_dir": "string"
  },
  "version": "string (optional)",
  "cost": {
    "total_cost_usd": 0.00,
    "total_duration_ms": 0,
    "total_api_duration_ms": 0,
    "total_lines_added": 0,
    "total_lines_removed": 0
  },
  "context_window": {
    "total_input_tokens": 0,
    "total_output_tokens": 0,
    "context_window_size": 0
  }
}
```

The `cost` and `context_window` fields are optional.

**Output:** A single line to stdout with the format:
```
model | session-cost / today-cost / block-cost (time-remaining) | burn-rate | context-usage
```

**Caching behavior:**
- Hybrid strategy: time-based expiry (refresh interval) combined with transcript file modification time detection.
- When cache is valid, the cached output is returned immediately without recomputation.

**Semaphore behavior:**
- A semaphore file per session is created in the system temp directory (e.g., `/tmp/ccstat-statusline-{session_id}.lock`) to prevent concurrent computation.
- The semaphore file contains the PID of the owning process.
- **Stale detection:** Before blocking, check if the PID in the semaphore file is still alive. If the process no longer exists, treat the semaphore as stale and delete it.
- **Age-based fallback:** If the semaphore file is older than 30 seconds (regardless of PID status), treat it as stale.
- **Lock failure behavior:** If the lock cannot be acquired after stale detection, return the most recent cached output. If no cached output exists, print an empty line.
- **Cleanup:** The semaphore file is deleted when computation completes (both success and error paths). Use OS-level atomic file creation (e.g., `O_CREAT | O_EXCL`) for the lock.

**Context usage color coding:**
- Below low threshold: green
- Between low and medium threshold: yellow
- Above medium threshold: red

**Cost source behavior:**
- `auto`: Prefer the cost from Claude Code's hook data (`cc`); fall back to ccstat calculation if unavailable.
- `ccstat`: Always calculate from JSONL data using pricing database.
- `cc`: Always use Claude Code's reported `total_cost_usd`.
- `both`: Display both values separated by a slash.

### 3.4 Provider-Specific Data Parsing

#### 3.4.1 Claude Code

**Source directories:** `~/.config/claude/projects/` and `~/.claude/projects/`
**Env override:** `CLAUDE_CONFIG_DIR` — comma-separated list of directories. When set, only those directories are searched. When unset, both default paths are searched.

**File structure:** `projects/{project_dir}/{session_id}.jsonl`
Each line is a JSON object representing one usage entry.

**Entry schema:**
```json
{
  "timestamp": "ISO-8601",
  "sessionId": "string (optional)",
  "version": "semver (optional)",
  "message": {
    "usage": {
      "input_tokens": 0,
      "output_tokens": 0,
      "cache_creation_input_tokens": 0,
      "cache_read_input_tokens": 0
    },
    "model": "string (optional)",
    "id": "string (optional)"
  },
  "costUSD": 0.00,
  "requestId": "string (optional)",
  "isApiErrorMessage": false
}
```

**Deduplication:** By the pair `(message.id, requestId)`. If both are present and the pair has been seen before, the entry is skipped. Entries missing either field are never considered duplicates.

**Token mapping:**
- `inputTokens` ← `message.usage.input_tokens`
- `outputTokens` ← `message.usage.output_tokens`
- `cacheCreationTokens` ← `message.usage.cache_creation_input_tokens` (default 0)
- `cacheReadTokens` ← `message.usage.cache_read_input_tokens` (default 0)

**Pre-calculated cost:** The `costUSD` field, when present, contains Claude Code's own cost calculation.

**Error entries:** Entries with `isApiErrorMessage: true` are included (they still consume tokens).

**Malformed lines:** Silently skipped. Parsing continues with the next line.

#### 3.4.2 Codex

**Source directory:** `${CODEX_HOME:-~/.codex}/sessions/`
**Env override:** `CODEX_HOME`

**File structure:** `sessions/{session_id}.jsonl`
Each file contains interleaved event types.

**Relevant event types:**
- `turn_context`: Contains model metadata (`model_id`). Used to associate a model with subsequent token events.
- `event_msg` with `payload.type === "token_count"`: Contains token counts.

**Token event schema:**
```json
{
  "type": "event_msg",
  "timestamp": "ISO-8601",
  "payload": {
    "type": "token_count",
    "info": {
      "total_token_usage": {
        "input_tokens": 0,
        "cached_input_tokens": 0,
        "output_tokens": 0,
        "reasoning_output_tokens": 0,
        "total_tokens": 0
      },
      "last_token_usage": {
        "input_tokens": 0,
        "cached_input_tokens": 0,
        "output_tokens": 0,
        "reasoning_output_tokens": 0,
        "total_tokens": 0
      }
    }
  }
}
```

**Cumulative-to-delta conversion:** `total_token_usage` contains cumulative values. When `last_token_usage` is not present, subtract the previous event's cumulative totals to derive per-event deltas.

**Token mapping:**
- `inputTokens` ← `input_tokens`
- `outputTokens` ← `output_tokens` (includes reasoning cost)
- `cacheReadTokens` ← `cached_input_tokens` or `cache_read_input_tokens`
- `cacheCreationTokens` ← 0 (not tracked by Codex)
- `reasoningOutputTokens` ← `reasoning_output_tokens` (informational only, already included in `output_tokens`)
- `totalTokens` ← `total_tokens` when present; otherwise `input_tokens + output_tokens`

**Model fallback:** When no `turn_context` provides model metadata for a session, use `gpt-5` as the fallback model. Tag such entries with `isFallbackModel: true`.

**Model aliases:** `gpt-5-codex` maps to `gpt-5` for pricing lookups.

**Deduplication:** Not applicable — events are inherently sequential within a session.

#### 3.4.3 OpenCode

**Source directory:** `${OPENCODE_DATA_DIR:-~/.local/share/opencode}/storage/message/`
**Session metadata:** `${OPENCODE_DATA_DIR:-~/.local/share/opencode}/storage/session/`
**Env override:** `OPENCODE_DATA_DIR`

**File structure:** `storage/message/{messageId}.json` — one JSON file per message.
**Session files:** `storage/session/{sessionId}.json` — session metadata (title, project).

**Message schema:**
```json
{
  "id": "string",
  "sessionID": "string",
  "modelID": "string",
  "providerID": "string",
  "time": {
    "created": 0
  },
  "tokens": {
    "input": 0,
    "output": 0,
    "cache": {
      "read": 0,
      "write": 0
    }
  },
  "cost": 0.00
}
```

**Token mapping:**
- `inputTokens` ← `tokens.input`
- `outputTokens` ← `tokens.output`
- `cacheReadTokens` ← `tokens.cache.read` (default 0)
- `cacheCreationTokens` ← `tokens.cache.write` (default 0)

**Pre-calculated cost:** The `cost` field when present.

**Deduplication:** By message `id` field.

**Model aliases:** `gemini-3-pro-high` maps to `gemini-3-pro-preview` for pricing.

#### 3.4.4 Amp

**Source directory:** `${AMP_DATA_DIR:-~/.local/share/amp}/threads/`
**Env override:** `AMP_DATA_DIR`

**File structure:** `threads/T-{uuid}.json` — one JSON file per thread (conversation).

**Thread schema (relevant fields):**
```json
{
  "id": "string",
  "title": "string",
  "created": "ISO-8601",
  "messages": [
    {
      "id": "string",
      "type": "string",
      "created_at": "ISO-8601",
      "model": "string",
      "usage": {
        "input_tokens": 0,
        "output_tokens": 0,
        "cache_creation_input_tokens": 0,
        "cache_read_input_tokens": 0
      }
    }
  ],
  "usageLedger": {
    "events": [
      {
        "messageId": "string",
        "model": "string",
        "inputTokens": 0,
        "outputTokens": 0,
        "totalTokens": 0,
        "credits": 0.00,
        "createdAt": "ISO-8601"
      }
    ]
  }
}
```

**Token extraction strategy:**
- Primary token source: `usageLedger.events[]` for billing-relevant token counts.
- Cache breakdown: Derived from `messages[].usage` at the matching `messageId`.
- Credits: Stored alongside USD cost estimates. The `credits` field from usage ledger events is included in output.

**Token mapping:**
- `inputTokens` ← `usageLedger.events[].inputTokens`
- `outputTokens` ← `usageLedger.events[].outputTokens`
- `cacheCreationTokens` ← `messages[].usage.cache_creation_input_tokens` (matched by messageId)
- `cacheReadTokens` ← `messages[].usage.cache_read_input_tokens` (matched by messageId)

**Deduplication:** By usage ledger event (each event is unique).

**Session mapping:** Each thread file represents one session. Thread `id` is the session identifier.

#### 3.4.5 Pi-Agent

**Source directory:** `${PI_AGENT_DIR:-~/.pi/agent}/sessions/`
**Env override:** `PI_AGENT_DIR`

**File structure:** `sessions/{project}/{session_id}.jsonl` — JSONL files organized by project subdirectory.

**Entry schema:**
```json
{
  "type": "string",
  "timestamp": "ISO-8601",
  "message": {
    "role": "string",
    "model": "string",
    "usage": {
      "input": 0,
      "output": 0,
      "cacheRead": 0,
      "cacheWrite": 0,
      "totalTokens": 0,
      "cost": {
        "total": 0.00
      }
    }
  }
}
```

**Filtering:** Only entries with `message.role === "assistant"` and non-null `message.usage` are processed.

**Token mapping:**
- `inputTokens` ← `message.usage.input`
- `outputTokens` ← `message.usage.output`
- `cacheReadTokens` ← `message.usage.cacheRead` (default 0)
- `cacheCreationTokens` ← `message.usage.cacheWrite` (default 0)

**Pre-calculated cost:** `message.usage.cost.total` when present.

**Model naming:** Models are prefixed with `[pi]` in output display (e.g., `[pi] claude-opus-4`).

**Session ID:** Extracted from filename (portion after underscore in the filename).
**Project:** Directory name under `sessions/`.

**Deduplication:** Not applicable — entries are sequential within a session file.

### 3.5 Cost Calculation

#### 3.5.1 Cost Modes

| Mode | Behavior |
|------|----------|
| `auto` | Use the entry's pre-calculated cost (e.g., `costUSD`) when present and non-zero. Otherwise, calculate from tokens using model pricing. |
| `calculate` | Always calculate from token counts using model pricing. Ignore pre-calculated costs. |
| `display` | Always use the entry's pre-calculated cost. Show $0.00 for entries without a pre-calculated cost. |

#### 3.5.2 Token Cost Formula

For a single entry with known model pricing:

```
non_cached_input_cost = (input_tokens - cache_read_tokens) / 1_000_000 * input_cost_per_mtoken
cached_input_cost     = cache_read_tokens / 1_000_000 * cached_input_cost_per_mtoken
cache_creation_cost   = cache_creation_tokens / 1_000_000 * cache_creation_cost_per_mtoken
output_cost           = output_tokens / 1_000_000 * output_cost_per_mtoken
total_cost            = non_cached_input_cost + cached_input_cost + cache_creation_cost + output_cost
```

When `cached_input_cost_per_mtoken` is not available in the pricing data, fall back to `input_cost_per_mtoken`.
When `cache_creation_cost_per_mtoken` is not available, fall back to `input_cost_per_mtoken`.

#### 3.5.3 Tiered Pricing

Anthropic models have tiered pricing with a 200,000 token threshold. The threshold applies **per-entry** — each usage entry's tokens are evaluated independently.

**Allocation rule:** For a single entry, if `input_tokens` exceeds 200,000:

```
standard_input   = min(input_tokens, 200_000)
excess_input     = max(input_tokens - 200_000, 0)

non_cached_input_cost = (standard_input - cache_read_tokens) / 1M * input_cost_per_mtoken
                      + excess_input / 1M * input_cost_per_token_above_200k_tokens * 1M
cached_input_cost     = cache_read_tokens / 1M * cached_input_cost_per_mtoken_below_200k
output_cost           = min(output_tokens, 200_000) / 1M * output_cost_per_mtoken
                      + max(output_tokens - 200_000, 0) / 1M * output_cost_per_token_above_200k_tokens * 1M
```

When the tiered pricing fields are absent from a model's pricing data, use the standard (non-tiered) formula from §3.5.2.

**Tiered pricing fields in LiteLLM:**

- `input_cost_per_token_above_200k_tokens`
- `output_cost_per_token_above_200k_tokens`
- `cache_creation_input_token_cost_above_200k_tokens`
- `cache_read_input_token_cost_above_200k_tokens`

**Note:** Cache read and cache creation tokens follow the same per-entry threshold logic. When cache-specific tiered fields are present, apply the higher rate to the portion of cache tokens that falls within the excess range.

#### 3.5.4 Pricing Database

**Source:** LiteLLM's `model_prices_and_context_window.json` hosted on GitHub.

**Online mode (default):** Fetch the JSON from the LiteLLM GitHub repository. Cache in memory for the process lifetime.

**Offline mode (`--offline`):** Use a pricing snapshot embedded in the binary at compile time.

**Model name matching:** Look up the model name using the following fallback chain (first match wins):

1. Exact match against the LiteLLM key (e.g., `claude-sonnet-4-20250514`)
2. Prefixed match with `anthropic/` (e.g., `anthropic/claude-sonnet-4-20250514`)
3. Prefixed match with `openai/` (for Codex/OpenAI models)
4. Prefixed match with `openrouter/` (common alias namespace)

Provider-specific aliases (§3.4.2 `gpt-5-codex → gpt-5`, §3.4.3 `gemini-3-pro-high → gemini-3-pro-preview`) are applied before the fallback chain. If no match is found after all attempts, the model's cost is zero and a debug-level warning is logged.

#### 3.5.5 Amp Credits

Amp entries include a `credits` field representing Amp's internal billing unit. This value is carried through aggregation and included in both table and JSON output alongside the USD cost estimate.

### 3.6 Output Formatting

#### 3.6.1 Table Output

- **Responsive layout:** Detect terminal width. Switch to compact mode when width is below 120 columns, or when `--compact` is set.
- **Compact mode:** Omit cache creation and cache read columns. Abbreviate column headers.
- **Alignment:** Date/label columns left-aligned. Numeric columns right-aligned.
- **Number formatting:** Locale-aware with thousand separators (e.g., `1,234,567`).
- **Cost formatting:** USD with 2 decimal places (e.g., `$12.34`).
- **Model display:** Comma-separated sorted list. In compact mode, abbreviated (e.g., `claude-sonnet-4-20250514` → `sonnet-4`).
- **Color:** Header row in cyan. Active block status in green. Gap blocks in gray. Warnings in red.
- **Totals row:** Separated from data rows by an empty row. Shows aggregate sums.
- **Breakdown rows:** When `--breakdown` is set, indented sub-rows below each period showing per-model token counts and cost.

#### 3.6.2 JSON Output

- Pretty-printed with 2-space indentation.
- Totals object always present.
- Model breakdowns included when `--breakdown` is set.
- Codex entries with fallback models include `"isFallback": true`.

**Cost field naming convention:** JSON schemas use two cost field names by design, matching the TypeScript version for backward compatibility:
- `totalCost` — aggregated cost for a time period or session (used in daily, monthly, weekly, session reports and totals objects).
- `costUSD` — cost for a single block or individual entry (used in blocks report and session ID lookup entries).
- `cost` — cost within a model breakdown object.

This distinction is intentional: `totalCost` represents a sum across entries, while `costUSD` represents a discrete billing unit. Downstream consumers rely on these exact field names.

#### 3.6.3 jq Integration

When `--jq <filter>` is specified:
1. Produce the JSON output in memory.
2. Spawn the external `jq` binary with the filter as argument.
3. Pipe the JSON to jq's stdin.
4. Print jq's stdout to the process stdout.

If `jq` is not installed, print an error message and exit with a non-zero code.
If the jq filter is invalid, print jq's stderr and exit with a non-zero code.

### 3.7 MCP Server

**Command:** `ccstat mcp [flags]`

**MCP-specific flags:**

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| `--transport` | enum | stdio | Transport type: `stdio` or `http` |
| `--port` | int | 8080 | Port for HTTP transport |

**Transports:**
- `stdio`: Communicate over stdin/stdout using the MCP protocol.
- `http`: Start an HTTP server with Server-Sent Events (SSE) streaming for MCP messages.

**Registered tools:**

| Tool name | Provider | Report | Parameters |
|-----------|----------|--------|-----------|
| `daily` | Claude | daily | `since`, `until`, `mode`, `timezone`, `locale` |
| `monthly` | Claude | monthly | `since`, `until`, `mode`, `timezone`, `locale` |
| `session` | Claude | session | `since`, `until`, `mode`, `timezone`, `locale` |
| `blocks` | Claude | blocks | `since`, `until`, `mode`, `timezone`, `locale` |
| `codex-daily` | Codex | daily | `since`, `until`, `timezone`, `locale`, `offline` |
| `codex-monthly` | Codex | monthly | `since`, `until`, `timezone`, `locale`, `offline` |

**Response format:** Each tool returns a JSON text content block containing the same JSON structure as the corresponding `--json` CLI output.

### 3.8 Configuration System

**Config file format:** JSON
**Config file name:** `ccstat.json`

**Discovery order (first found wins):**
1. Path specified by `--config` flag
2. `.ccstat/ccstat.json` in the current working directory
3. `{claude_config_dir}/ccstat.json` for each Claude config directory

**Schema:**
```json
{
  "$schema": "https://...",
  "defaults": {
    "json": false,
    "mode": "auto",
    "offline": false,
    "order": "asc",
    "breakdown": false,
    "timezone": "UTC",
    "locale": "en-CA",
    "compact": false
  },
  "commands": {
    "daily": {
      "instances": false,
      "breakdown": true
    },
    "blocks": {
      "sessionLength": 5,
      "tokenLimit": "500000"
    },
    "statusline": {
      "offline": true,
      "costSource": "auto",
      "cache": true,
      "refreshInterval": 1,
      "contextLowThreshold": 50,
      "contextMediumThreshold": 80
    }
  }
}
```

**Merge priority (highest to lowest):**
1. CLI flags explicitly provided by the user
2. `commands.<name>` section for the active command
3. `defaults` section
4. Compiled-in defaults

Only CLI flags that the user explicitly provides override config values. Default values from the argument parser do not override config values.

### 3.9 Debug Mode

When `--debug` is set, after the normal report output, the system performs cost mismatch analysis:

1. For each entry that has both a pre-calculated cost and a calculable model pricing, compute the cost from tokens.
2. Compare the pre-calculated cost with the computed cost.
3. A match is within 0.1% tolerance.
4. Report:
   - Total entries analyzed
   - Number of matches and mismatches
   - Per-model statistics (match rate, average discrepancy)
   - Per-version statistics (match rate)
   - Sample discrepancies (up to `--debug-samples` entries) showing timestamp, model, pre-calculated vs computed cost

### 3.10 Logging

**Environment variable:** `LOG_LEVEL`

| Value | Level |
|-------|-------|
| 0 | Silent |
| 1 | Warn |
| 2 | Log |
| 3 | Info |
| 4 | Debug |
| 5 | Trace |

When `LOG_LEVEL` is not set, default to level 2 (log). Log output goes to stderr, never stdout (stdout is reserved for report output).

### 3.11 Error Handling

- **Missing data directory:** Print an error message naming the expected path and the relevant environment variable. Exit with non-zero code.
- **No data files found:** Print a message indicating no usage data was found. Exit with code 0 and empty output.
- **Malformed JSONL lines:** Silently skip. Continue parsing the rest of the file.
- **Malformed JSON files:** Skip the file. Log a warning at debug level.
- **Unsupported provider-report combination:** Print an error listing supported reports for the provider. Exit with non-zero code.
- **Network failure during pricing fetch:** Fall back to the embedded offline snapshot. Log a warning.
- **Unknown model in pricing database:** Use zero cost for that entry. Log a warning at debug level.
- **Invalid date range (`--since` after `--until`):** Print an error message stating that `--since` must be on or before `--until`. Exit with non-zero code.
- **Malformed config file:** Print an error message naming the config file path and the parse error. Exit with non-zero code.
- **Statusline empty or malformed stdin:** Print an empty line to stdout and exit with code 0. Log a warning at debug level. The statusline must never block the shell prompt.
- **`jq` binary not found:** Print an error message suggesting installation. Exit with non-zero code.
- **Invalid `jq` filter:** Print jq's stderr output. Exit with non-zero code.
- **Statusline semaphore stale:** See §3.3.6 for semaphore handling details.

---

## 4. Refinement

### 4.1 Rust Workspace Structure

The project uses a Cargo workspace. Crate-internal module organization is left to the implementer.

```
Cargo.toml              (workspace root)
crates/
  ccstat-cli/          Binary crate: argument parsing, command dispatch, output formatting
  ccstat-core/         Library crate: shared types, token aggregation, cost calculation, date utilities
  ccstat-pricing/      Library crate: LiteLLM pricing fetcher, offline snapshot, tiered pricing logic
  ccstat-terminal/     Library crate: table formatting, responsive layout, color output, model name formatting
  ccstat-provider-claude/    Library crate: Claude data loading and parsing
  ccstat-provider-codex/     Library crate: Codex data loading and parsing
  ccstat-provider-opencode/  Library crate: OpenCode data loading and parsing
  ccstat-provider-amp/       Library crate: Amp data loading and parsing
  ccstat-provider-pi/        Library crate: Pi data loading and parsing
  ccstat-mcp/          Library crate: MCP server implementation (stdio + HTTP)
```

### 4.2 Recommended Crate Ecosystem

These are recommendations, not requirements. The implementer may substitute equivalent crates.

| Concern | Crate | Purpose |
|---------|-------|---------|
| CLI parsing | `clap` (derive) | Argument parsing with subcommands |
| Async runtime | `tokio` | HTTP fetch, file I/O, MCP server |
| HTTP client | `reqwest` | Pricing data fetch |
| JSON | `serde` + `serde_json` | Parsing and serialization |
| MCP | `rmcp` or equivalent | MCP protocol implementation |
| Tables | `comfy-table` or `tabled` | Terminal table rendering |
| Color | `owo-colors` or `colored` | Terminal color output |
| Date/Time | `chrono` or `jiff` | Timezone-aware date operations |
| Error handling | `thiserror` + `anyhow` | Error types and propagation |
| File glob | `glob` or `globwalk` | File pattern matching |
| Home directory | `dirs` | Platform-native home/config paths |
| Shell completions | `clap_complete` | Generate shell completions |
| Snapshot testing | `insta` | Table output regression tests |

### 4.3 Embedded Pricing Snapshot

The LiteLLM pricing JSON is embedded in the binary at compile time (e.g., via `include_str!` or a build script that downloads and embeds the file). The `--offline` flag selects this snapshot. In online mode, if the fetch fails, the snapshot is used as fallback.

### 4.4 Cross-Platform Considerations

- **Home directory:** Use the `dirs` crate or equivalent for platform-native paths (XDG on Linux, `~/Library` on macOS, `%APPDATA%` on Windows).
- **Path separators:** Use `std::path` for all path operations.
- **Environment variables:** Same names as the TypeScript version for migration ease.
- **Binary name:** `ccstat` on all platforms (`ccstat.exe` on Windows).

### 4.5 Performance Targets

| Scenario | Target |
|----------|--------|
| Statusline cold start | < 50ms |
| Statusline cached | < 5ms |
| Daily report (~1000 entries) | < 200ms |
| Pricing fetch (cached in memory) | < 1ms |
| Binary startup (no arguments → help) | < 10ms |

### 4.6 Migration and Compatibility

**Environment variables:** Identical names to the TypeScript version: `CLAUDE_CONFIG_DIR`, `CODEX_HOME`, `OPENCODE_DATA_DIR`, `AMP_DATA_DIR`, `PI_AGENT_DIR`, `LOG_LEVEL`.

**Config file:** Same `ccstat.json` format and search paths.

**JSON output schemas:** Identical field names and structure to the TypeScript version. Downstream consumers (scripts, dashboards) should work without modification.

**Breaking CLI changes:** The only structural break is the provider subcommand layer. Users must change `ccstat-codex daily` to `ccstat codex daily`. The `ccstat daily` command (Claude) continues to work unchanged.

### 4.7 Build and Distribution

**Cargo workspace:** Unified build with `cargo build --release`.

**Cross-compilation targets:**
- `x86_64-unknown-linux-gnu`
- `aarch64-unknown-linux-gnu`
- `x86_64-apple-darwin`
- `aarch64-apple-darwin`
- `x86_64-pc-windows-msvc`

**Distribution:** Static binaries published as GitHub release assets.

**CI:** GitHub Actions with matrix builds for all targets. Tests run on Linux (x86_64). Release triggered by git tags.