tga 1.2.0 - Docs.rs

# trusty-git-analytics

Analyze git repositories to measure developer productivity — classify commit work types, track weekly velocity, and export CSV/JSON/Markdown reports.

## What It Does

`tga` walks one or more local git repositories, collects every commit into a SQLite database, classifies each commit into a work category (feature, bugfix, refactor, etc.) using a seven-tier classification cascade, then aggregates the results into per-author, per-week, DORA, velocity, and quality reports. It is a feature-complete Rust port of [gitflow-analytics](https://github.com/bobmatnyc/gitflow-analytics) with the same YAML config schema and the same SQLite schema — existing config files work without modification.

## Installation

### From crates.io (recommended)

```bash
cargo install tga
```

This installs the `tga` binary to `~/.cargo/bin/`. Ensure `~/.cargo/bin` is in your `PATH`.

### From source

```bash
git clone https://github.com/bobmatnyc/trusty-tools
cd trusty-tools/crates/trusty-git-analytics
cargo install --path .
```

### Verify installation

```bash
tga --version
tga --help
```

## Quick Start

### Run Your First Analysis

**Step 1** — Create a `config.yaml`:

```yaml
repositories:
  - path: ~/code/my-project
    name: my-project

output:
  directory: ./reports
  formats: [csv, json, markdown]
```

**Step 2** — Run the full pipeline:

```bash
tga analyze --config config.yaml
```

**Step 3** — Find reports in `./reports/`:

A full run writes 14 files: 9 CSV, 4 JSON, and 1 Markdown report. The most
commonly used ones are:

```
reports/
├── authors.csv         # Per-author commit summary
├── weekly_activity.csv # Week-by-week breakdown
├── ... (7 more CSV files: DORA, velocity, quality, etc.)
├── report.json         # Full structured payload
├── ... (3 more JSON files)
└── report.md           # Narrative Markdown report
```

## Configuration

### Minimal config.yaml

```yaml
repositories:
  - path: ~/code/my-repo
    name: my-repo
```

All other sections are optional. When `output.formats` is omitted, all three formats (CSV, JSON, Markdown) are written.

### Full config reference

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `repositories` | list | required | Repos to analyze |
| `developer_aliases` | map | `{}` | Canonical name → list of emails/aliases |
| `team` | object | — | Alternative to `developer_aliases`; roster with email |
| `output.directory` | path | `./reports` | Where reports are written |
| `output.formats` | list | `[csv, json, markdown]` | `csv`, `json`, and/or `markdown` |
| `output.include_unclassified` | bool | `false` | Include commits with no category |
| `output.include_merges` | bool | `false` | Include merge commits |
| `output.include_files` | bool | `false` | Include file-level change detail |
| `classification.rules_file` | path | — | Path to custom rules YAML/JSON |
| `classification.use_llm` | bool | `false` | Enable LLM fallback tier |
| `classification.llm_model` | string | `gpt-4o-mini` | LLM model identifier |
| `classification.confidence_threshold` | float | `0.7` | Minimum acceptance confidence |
| `classification.llm_fallback_threshold` | float | `0.0` | Commits with confidence above this value skip the LLM tier |
| `classification.llm_fallback_concurrency` | uint | `8` | Max concurrent LLM requests during fallback |
| `github.token` | string | `$GITHUB_TOKEN` | GitHub PAT for PR fetch. Required scopes: `public_repo` for public repos, `repo` for private repos. Without a token, GitHub rate-limits anonymous traffic to 60 requests/hour and most PRs will be missed. |
| `github.org` | string | — | Org slug for org-wide PR queries |
| `github.repo` | string | — | Single repo slug (`owner/name`) |
| `github.fetch_prs` | bool | `false` | Fetch pull request metadata. **Must be `true` for `tga pr-metrics` to return data** — when left at the default, `tga collect` writes zero rows to `pull_requests` (issue #211). |
| `github.ticket_regex` | string | — | Override regex for detecting GitHub ticket refs in commit messages |
| `jira.url` | string | — | JIRA base URL |
| `jira.username` | string | — | JIRA API username (email for Cloud) |
| `jira.token` | string | — | JIRA API token |
| `jira.project_key` | string | — | Project key filter (e.g. `API`) |
| `jira.ticket_regex` | string | — | Override regex for detecting JIRA ticket refs in commit messages |
| `linear.ticket_regex` | string | — | Override regex for detecting Linear ticket refs in commit messages |
| `pm.azure_devops.organization_url` | string | — | ADO org URL (e.g. `https://dev.azure.com/myorg`) |
| `pm.azure_devops.pat` | string | — | Azure DevOps Personal Access Token |
| `pm.azure_devops.project` | string | — | Default ADO project name |
| `pm.azure_devops.fetch_on_reference` | bool | `false` | Fetch work items when `AB#N` refs appear in commits |
| `pm.azure_devops.fetch_prs` | bool | `false` | Fetch ADO pull requests and reviewer data |
| `pm.azure_devops.ticket_regex` | string | `AB#(\d+)` | Override regex for detecting ADO work item refs in commit messages |
| `pm.bitbucket.workspace` | string | — | Bitbucket Cloud workspace slug |
| `pm.bitbucket.repo_slug` | string | — | Repository slug within the workspace |
| `pm.bitbucket.fetch_prs` | bool | `false` | Fetch Bitbucket Cloud pull request metadata |
| `pm.bitbucket.token` | string | `$BITBUCKET_TOKEN` | Bearer token (App password or OAuth) |
| `pm.bitbucket.username` | string | — | Atlassian account username for Basic auth |
| `pm.bitbucket.app_password` | string | — | Atlassian App password for Basic auth (alternative to `token`) |
| `cache.directory` | path | — | Cache directory (supports `~`) |
| `version` | string | — | Schema version; stored for compatibility |
| `profile` | string | — | Named profile; stored for compatibility |
| `dora.deployment_source` | string | `git_tags` | Source for `tga deployments collect`. One of `git_tags`, `github_releases`, `github_actions`, `manual`. |
| `dora.deployment_tag_pattern` | regex | `^v?[0-9]+\.[0-9]+\.[0-9]+(-...)?$` | Tags matching this regex are ingested as deployments. |
| `dora.production_branch` | string | `main` | Default branch that production deployments come from. |
| `dora.failure_signals` | list | `[]` | One signal per entry; `work_type` (classification category) and/or `commit_message_pattern` (regex) + `within_hours` window. |
| `dora.datadog_dir` | path | — | Directory of Datadog incident exports (.json). Currently a stub — JIRA SRE path is the default MTTR source. |
| `jira.jira_project_mappings` | map | `{}` | Project key → work type (issue #206). Fires as Tier 1.6 — outranks the generic ticket regex. |
| `jira.jira_project_mapping_confidence` | float | `0.88` | Per-verdict confidence for the JIRA mapping tier. |

Paths support `~` expansion. Config files from the Python `gitflow-analytics` tool load without changes — unknown keys are silently ignored.

### developer_aliases vs team.members

**`developer_aliases`** (Python-compatible flat map):

```yaml
developer_aliases:
  "Alice Smith":
    - "alice@company.com"
    - "asmith@company.com"
    - "alice@personal.dev"
  "Bob Jones":
    - "bob@company.com"
    - "129991831+bobgithub@users.noreply.github.com"
```

**`team.members`** (structured roster with canonical email):

```yaml
team:
  members:
    - name: Alice Smith
      email: alice@company.com
      aliases:
        - asmith@company.com
        - alice@personal.dev
```

When `developer_aliases` is non-empty it takes precedence over `team.members`. Use `developer_aliases` when migrating an existing Python config file; use `team.members` for new setups where canonical email matters for downstream tooling.

### Example: multi-repo config with GitHub

See [`configs/example-config.yaml`](configs/example-config.yaml) for a working example that covers multiple repositories, developer aliases, and CSV+Markdown output.

## CLI Reference

All subcommands accept these global flags:

| Flag | Default | Description |
|------|---------|-------------|
| `--config <PATH>` | `config.yaml` | Path to config YAML |
| `--database <PATH>` | `tga.db` | Path to SQLite database |
| `-v / -vv / -vvv` | warnings only | Increase log verbosity |

### tga analyze

Run the full pipeline: collect → classify → report.

```bash
tga analyze [--config <PATH>] [--database <PATH>] [--output <DIR>]
            [--skip-collect] [--skip-classify] [--weeks <N>]
            [--from <DATE>] [--to <DATE>] [--dry-run]
            [--validate-only] [--no-validate]
```

| Flag | Description |
|------|-------------|
| `--skip-collect` | Skip Stage 1; use commits already in the database |
| `--skip-classify` | Skip Stage 2; use existing classifications |
| `--output <DIR>` | Override `output.directory` from config |
| `--weeks <N>` | Limit collection to the last N weeks (overrides config `start_date`) |
| `--from <DATE>` | Start date for collection (ISO 8601 `YYYY-MM-DD`); mutually exclusive with `--weeks` |
| `--to <DATE>` | End date for collection (ISO 8601 `YYYY-MM-DD`); defaults to today |
| `--dry-run` | Perform all steps against an in-memory database; the on-disk database is left untouched |
| `--validate-only` | Run configuration validation and exit (0 on success, 1 on errors) |
| `--no-validate` | Skip pre-flight configuration validation |

```bash
# Full pipeline
tga analyze --config config.yaml

# Re-run reports only (commits already collected and classified)
tga analyze --skip-collect --skip-classify --output ./reports-v2
```

### tga collect

Stage 1: extract commits from git repositories into the database.

```bash
tga collect [--config <PATH>] [--database <PATH>]
            [--repos <NAME,...>] [--from <DATE>] [--to <DATE>] [--weeks <N>]
            [--since <DATE>] [--until <DATE>] [--dry-run]
            [--force-refresh-prs] [--validate-only] [--no-validate]
```

| Flag | Description |
|------|-------------|
| `--repos <NAME,...>` | Comma-separated list of repository names to collect; others are skipped |
| `--from <DATE>` | Collect commits on or after this ISO 8601 date; mutually exclusive with `--weeks` |
| `--to <DATE>` | Collect commits on or before this ISO 8601 date; defaults to today |
| `--since <DATE>` | Legacy alias for `--from` (Python-predecessor compatibility); `--from` takes precedence |
| `--until <DATE>` | Legacy alias for `--to` (Python-predecessor compatibility); `--to` takes precedence |
| `--weeks <N>` | Limit collection to the last N weeks; `--weeks` takes precedence over `--from`/`--to` |
| `--dry-run` | Run collection against an in-memory database; the on-disk database is left untouched |
| `--force-refresh-prs` | Re-fetch ADO pull requests even when already cached (backfills pre-v1.0.9 rows) |
| `--validate-only` | Run configuration validation and exit (0 on success, 1 on errors) |
| `--no-validate` | Skip pre-flight configuration validation |

```bash
tga collect --repos my-project --from 2024-01-01 --to 2024-03-31
tga collect --weeks 4   # collect last 4 weeks across all repos
```

### tga classify

Stage 2: run the classification cascade over collected commits.

```bash
tga classify [--config <PATH>] [--database <PATH>]
             [--rules <PATH>] [--use-llm] [--backfill-complexity]
             [--no-external]
```

| Flag | Description |
|------|-------------|
| `--rules <PATH>` | Override `classification.rules_file` from config. Custom rules default to priority 110 (above built-in 100) and are standalone by default (`extend_defaults: false`). |
| `--use-llm` | Enable LLM fallback regardless of config setting |
| `--backfill-complexity` | Fill missing complexity scores (1–5) for already-classified commits via the LLM, without re-running the full cascade; category, confidence, and method are left untouched |
| `--no-external` | Disable all external classification sources (JIRA, GitHub Issues) for this run, even if configured in the rules file or config |

```bash
tga classify --rules ./custom-rules.yaml --use-llm
tga classify --no-external   # offline / CI mode
```

### tga report

Stage 3: generate reports from classified commits.

```bash
tga report [--config <PATH>] [--database <PATH>]
           [--output <DIR>] [--formats <FMT,...>]
```

| Flag | Description |
|------|-------------|
| `--output <DIR>` | Override `output.directory` from config |
| `--formats <FMT,...>` | Comma-separated: `csv`, `json`, `markdown` |

```bash
tga report --output ./q1-reports --formats csv,json
```

### tga pr-metrics

Aggregate pull-request metrics per engineer from the `pull_requests` cache.

```bash
tga pr-metrics [--config <PATH>] [--database <PATH>]
               [--weeks <N>] [--csv] [--output <PATH>]
```

| Flag | Description |
|------|-------------|
| `--weeks <N>` | Limit metrics to PRs created within the last N weeks |
| `--csv` | Emit CSV instead of an aligned text table |
| `--output <PATH>` | Write output to a file (CSV with `--csv`, otherwise the text table) |

```bash
tga pr-metrics --weeks 12 --csv --output pr-metrics.csv
```

The `pr_comments_given` and `avg_revisions` columns are reserved for future
use and currently always output `0.0`; the underlying review-comment and
revision-count data is not yet tracked.

### tga backfill

Retroactive maintenance operations that update existing commit rows in place
(outside the normal `collect → classify → report` pipeline).

```bash
tga backfill <SUBCOMMAND> [--dry-run]
```

| Subcommand | Description |
|------------|-------------|
| `ai-detection` | Re-run LLM classification on low-confidence prior LLM verdicts |
| `revert-flags` | Scan commit messages for revert patterns and set `is_revert` |
| `ticket-ids` | Scan commit messages for ticket refs and update `ticket_id`/`ticketed` |

| Flag | Description |
|------|-------------|
| `--dry-run` | Report how many rows would change without writing |

```bash
tga backfill revert-flags --dry-run
tga backfill ticket-ids
```

### tga override

Manage manual classification overrides (Tier 0). Rows here pin a commit's
verdict regardless of what the rule-based or LLM tiers would produce.

```bash
tga override <SUBCOMMAND>
```

| Subcommand | Description |
|------------|-------------|
| `add <SHA> <WORK_TYPE> <CHANGE_TYPE> [--notes <TEXT>] [--repo <PATH>]` | Insert (or replace) an override row for a commit SHA |
| `list [--repo <PATH>]` | List every override row, optionally filtered by repository |
| `remove <SHA> [--yes]` | Delete the override row(s) for a SHA (`--yes` skips confirmation) |

```bash
tga override add abc1234 feature feature --notes "manual review"
tga override list
tga override remove abc1234 --yes
```

### tga rules

Introspect the classification rule set (issue #209). Useful for tuning
rules and answering "why was this commit classified as X?".

```bash
tga rules list                       # every loaded rule, sorted by priority
tga rules show <commit-sha>          # verdict + method recorded for a commit
tga rules test "feat: add login"     # dry-run the cascade against a message
```

### tga deployments / tga incidents / tga dora

DORA metrics infrastructure (issues #207, #208, #212, #213).

```bash
# Step 1: ingest deployment events (default: git tags matching dora.deployment_tag_pattern)
tga deployments collect

# Step 2 (optional): ingest production incidents from JIRA SRE issues
tga incidents collect

# Step 3: compute Deployment Frequency, Lead Time, Change Failure Rate, MTTR.
# Also rebuilds the `deployment_failures` derived join from the current
# `dora.failure_signals` config — safe to re-run after a config edit.
tga dora
```

The four DORA metrics land in pre-computed SQL views
(`v_deployment_frequency`, `v_lead_time`, `v_change_failure_rate`,
`v_mttr`) so dashboards can read them directly without re-aggregating.

### Tag & Release-Branch Reachability (issue #279)

`tga collect` automatically populates `fact_commit_reachability` with four new columns that distinguish "deployed via cherry-pick to a release branch and tagged" from "abandoned WIP":

| Column | Type | Meaning |
|---|---|---|
| `on_any_tag` | boolean | `true` if any git tag reaches this commit |
| `reachable_from_tags` | JSON array | Tag names that contain this commit |
| `on_release_branch` | boolean | `true` if commit is on any configured release branch |
| `release_branches` | JSON array | Matching release-branch names |

This resolves a systematic blind spot: bug fixes and security patches cherry-picked to `release/*` branches, tagged for production, and never merged to `main` previously showed `on_default_branch = false` — making them indistinguishable from abandoned WIP. With this feature, the 32% "merged rate" for bug fixes turns out to be much higher when deployed-via-tag commits are counted.

**Config:**

```yaml
reachability:
  track_tags: true               # default: true
  track_release_branches: true   # default: true
  release_branch_patterns:       # default: ["release/*", "hotfix/*", "chore/release-*", "v*"]
    - "release/*"
    - "hotfix/*"
    - "chore/release-*"
    - "v*"
```

Set `track_tags: false` to skip the tag scan (useful for trunk-based repos with thousands of tags). The default is `true` because the scan is O(repo + refs), not O(repo × refs × commits).

Optionally disable the scan for a single run:

```bash
tga collect --skip-tag-reachability
```

**Useful derived queries:**

```sql
-- What % of bug fixes actually shipped (via any path)?
SELECT
  COUNT(*) FILTER (WHERE on_default_branch OR on_any_tag OR on_release_branch)
  * 100.0 / COUNT(*) AS shipped_pct
FROM classifications cl
JOIN commits c ON c.classification_id = cl.id
JOIN fact_commit_reachability fcr ON fcr.commit_sha = c.sha
WHERE cl.category = 'bug_fix';

-- Cherry-pick rate: commits reachable from tags but not on main
SELECT repo, COUNT(*)
FROM fact_commit_reachability
WHERE on_any_tag AND NOT on_default_branch
GROUP BY repo;
```

## Pipeline Architecture

```
git repos ──┐
             │   collect      SQLite (tga.db)   classify       SQLite    report
GitHub API ──┼──────────────► [commits]        ──────────────► [classif]─────────► CSV (×9)
JIRA API ────┤   (libgit2,    [authors]         (7-tier                 ► JSON
Linear API ──┤   reqwest)     [pull_requests]   cascade,                ► Markdown
ADO API  ────┘                [work_items]      Rayon-parallel)
```

**Stage 1 — collect** (`tga::collect`): opens each repository with libgit2, walks the configured branch, extracts commit metadata and diff stats, resolves author identities, fetches GitHub PR / JIRA issue / Linear / Azure DevOps work item metadata via REST/GraphQL, and writes everything to SQLite.

**Stage 2 — classify** (`tga::classify`): reads unclassified commits from the database, runs each message through the seven-tier cascade (see below), and writes a classification verdict back. Rule-based tiers execute in parallel via Rayon.

**Stage 3 — report** (`tga::report`): reads the classified database, aggregates per-author, per-week, DORA, velocity, and quality statistics, and writes the configured output formats to the output directory.

## Classification

### Seven-Tier Cascade

Each commit message is tested against tiers in order. The first tier to produce a confident result wins.

**Tier 0 — Manual Override** (confidence 1.0): looks up the `(commit_hash, repo_path)` pair in the `classification_overrides` table. Managed via `tga override add|list|remove`.

**Tier 1.5 — Issue Type** (confidence 0.90): when the commit has ticket references resolving to rows in `issue_cache`, maps the upstream issue type (`bug`, `story`, `task`, `spike`, etc.) directly to a `change_type`.

**Tier 1.6 — JIRA Project Mapping** (default confidence 0.88, override via `jira.jira_project_mapping_confidence`): when `jira.jira_project_mappings` is configured, maps the JIRA project key prefix of any `[A-Z]+-\d+` reference to a `change_type`. Fires *before* the regex tier so project mappings outrank the generic `jira-ticket` regex rule (issue #206). Example config:

```yaml
jira:
  jira_project_mappings:
    TQL: bug_fix
    APEX: integration
    INFRA: platform_infrastructure
    SEC: security
  jira_project_mapping_confidence: 0.88   # optional, default 0.88
```

Tier-0 manual overrides and exact-keyword conventional-commit prefixes (e.g. `fix:`) still beat this tier.

**Tier 4 — Exact (Aho-Corasick)**: builds a single finite-state machine from every keyword list across every rule and scans the message in O(n) time. Matches `feat:`, `fix:`, `chore:`, etc. Confidence 0.85–0.95.

**Tier 5 — Regex**: applies pre-compiled regex patterns from the rule set. Handles anchored conventional-commit patterns (`^feat(\([^)]*\))?!?:`) and JIRA ticket IDs (`\b[A-Z][A-Z0-9]+-\d+\b`).

**Tier 6 — Fuzzy heuristics**: detects merge commits (via `is_merge` flag or `Merge pull request` prefix) and reverts (via `Revert` prefix). No external dependencies.

**Tier 7 — LLM fallback** (optional, async): calls an OpenAI-compatible API (**OpenRouter** by default, **AWS Bedrock** behind the `bedrock` cargo feature) when tiers 0–6 leave a commit in a fallthrough category. Disabled by default; enable with `analysis.llm_classification.enabled: true` or `--use-llm`. Results are only accepted when `confidence >= confidence_threshold` (default 0.7).

### Default Rules

| ID | Category | Keywords / Patterns |
|----|----------|---------------------|
| `cc-feat` | `feature` | `feat:`, `feature:`, `^feat(...)?!?:` |
| `cc-fix` | `bugfix` | `fix:`, `bugfix:`, `hotfix`, `^fix(...)?!?:` |
| `cc-chore` | `chore` | `chore:`, `^chore(...)?!?:` |
| `cc-docs` | `documentation` | `docs:`, `doc:`, `^docs?(...)?!?:` |
| `cc-refactor` | `refactor` | `refactor:`, `^refactor(...)?!?:` |
| `cc-test` | `test` | `test:`, `tests:`, `^tests?(...)?!?:` |
| `cc-ci` | `ci` | `ci:`, `^ci(...)?!?:` |
| `cc-perf` | `performance` | `perf:`, `^perf(...)?!?:` |
| `cc-style` | `style` | `style:`, `^style(...)?!?:` |
| `cc-build` | `build` | `build:`, `^build(...)?!?:` |
| `cc-revert` | `revert` | `revert:`, `^revert(...)?!?:` |
| `breaking-change` | `breaking` | `breaking change`, `breaking-change` |
| `jira-ticket` | `feature` (ticketed) | `\b[A-Z][A-Z0-9]+-\d+\b` |
| `kw-bug` | `bugfix` | `bug`, `defect` |
| `kw-security` | `bugfix` (security) | `security`, `cve-`, `vulnerability` |

Commits that match no rule are assigned category `uncategorized` with confidence 0.0.

### Custom Rules File

Supply your own rules alongside the defaults:

```yaml
# my-rules.yaml
version: "1.0"
rules:
  - id: my-deploy
    category: deployment
    keywords:
      - "deploy:"
      - "release:"
    patterns:
      - "(?i)^deploy(ment)?:"
    priority: 80
    confidence: 0.9
```

```bash
tga classify --rules ./my-rules.yaml
# or in config.yaml:
# classification:
#   rules_file: ./my-rules.yaml
```

### Rule Priority and extend_defaults (issue #259)

Custom rules default to **priority 110** — one step above the highest built-in rule priority (100). This means user rules win over the default ruleset without needing an explicit `priority:` entry in every rule.

Custom rule files also default to **standalone** mode (`extend_defaults: false`): only the rules in the file are applied. Opt in to merging with the built-in defaults by adding `extend_defaults: true`.

```yaml
# my-rules.yaml — standalone by default (no built-in rules loaded)
extend_defaults: false   # optional: explicitly document the default
rules:
  - id: my-deploy
    category: deployment
    keywords: ["deploy:", "release:"]
    # priority defaults to 110 — beats the built-in cc-feat (100)
```

```yaml
# my-addons.yaml — augments the defaults
extend_defaults: true
rules:
  - id: my-payments
    category: payments
    keywords: ["payment:", "billing:"]
    confidence: 0.92
```

### Multi-source classification (issue #260)

`tga classify` can consult external ticket systems — JIRA Cloud/Server and
GitHub Issues — as high-confidence classification signals **before** the
commit-message rule tiers run.

**Priority model** (highest to lowest):

1. Manual overrides (`tga override add`)
2. External sources — JIRA issue type / GitHub Issues labels (confidence 0.92)
3. Custom regex rules (default priority 110)
4. Built-in TGA rules (priority 100)
5. LLM fallback (when `use_llm: true`)

**Configuration** — add a `sources:` list to `config.yaml` under
`classification:`:

```yaml
classification:
  sources:
    # JIRA source
    - type: jira
      base_url: "https://yourco.atlassian.net"
      token_env: JIRA_API_TOKEN      # env var carrying the API token
      username: "you@yourco.com"     # omit for Bearer-only tokens
      project_keys: ["PROJ", "ENG"]  # empty = all projects
      field_mappings:
        issue_type:
          Bug: bug_fix
          Story: new_feature
          Task: tech_debt_refactoring
          Epic: new_feature
        labels:
          ktlo: tech_debt_refactoring
          security: security
        components:
          Platform: platform_infrastructure

    # GitHub Issues source
    - type: github_issues
      repo: "acme/widgets"
      token_env: GITHUB_TOKEN
      label_mappings:
        bug: bug_fix
        enhancement: new_feature
        dependencies: tech_debt_refactoring
        documentation: documentation
```

Credentials are read from the **named environment variables** at runtime —
never store tokens directly in config files:

```bash
export JIRA_API_TOKEN="your-jira-api-token"
export GITHUB_TOKEN="your-github-pat"
tga classify --config config.yaml
```

**Caching**: each unique ticket is fetched at most once per `tga classify`
run (in-memory cache, never persisted to disk). On HTTP failure or missing
token, the source is skipped with a warning and the pipeline falls through
to commit-message rules.

**Disabling external sources**: use `--no-external` to skip all sources for
a run (useful in CI or offline environments):

```bash
tga classify --no-external
```

See [`examples/multi-source-config.yaml`](examples/multi-source-config.yaml)
for a fully annotated example.

**Deferred sources** (planned for a future release): Linear, Shortcut,
Confluence, Datadog.

## Output Formats

### CSV

Nine CSV files are written when `csv` is in the format list (per-author,
weekly activity, DORA metrics, velocity, quality, and related breakdowns).
The two most commonly used are documented below.

**`authors.csv`** — one row per author:

| Column | Description |
|--------|-------------|
| `name` | Canonical author name |
| `email` | Canonical author email |
| `commit_count` | Total commits |
| `insertions` | Total lines added |
| `deletions` | Total lines deleted |
| `files_changed` | Total files changed |
| `first_commit` | ISO 8601 timestamp of earliest commit |
| `last_commit` | ISO 8601 timestamp of most recent commit |

**`weekly_activity.csv`** — one row per week/author/repository bucket:

| Column | Description |
|--------|-------------|
| `week` | ISO week label, e.g. `2024-W03` |
| `author` | Author name |
| `repository` | Repository name |
| `commit_count` | Commits in this bucket |
| `insertions` | Lines added in this bucket |
| `deletions` | Lines deleted in this bucket |

### JSON

Four JSON files are written when `json` is in the format list:
`report.json`, `velocity_summary.json`, `quality_summary.json`, and
`dora_summary.json`. The primary one is documented below.

**`report.json`** — full structured payload:

```json
{
  "generated_at": "2024-03-15T10:00:00Z",
  "period_start": "2024-01-01T00:00:00Z",
  "period_end":   "2024-03-14T23:59:59Z",
  "total_commits": 347,
  "total_authors": 8,
  "category_breakdown": { "feature": 120, "bugfix": 45, ... },
  "authors": [
    {
      "name": "Alice Smith",
      "email": "alice@company.com",
      "commit_count": 87,
      "insertions": 4200,
      "deletions": 1100,
      "files_changed": 310,
      "categories": { "feature": 50, "bugfix": 20, ... },
      "first_commit": "...",
      "last_commit": "..."
    }
  ],
  "repositories": [
    {
      "name": "my-project",
      "commit_count": 347,
      "author_count": 8,
      "insertions": 18000,
      "deletions": 6000,
      "top_categories": [["feature", 120], ["bugfix", 45]]
    }
  ],
  "weekly_activity": [
    {
      "week": "2024-W03",
      "author": "Alice Smith",
      "repository": "my-project",
      "commit_count": 12,
      "insertions": 500,
      "deletions": 120,
      "categories": { "feature": 8, "bugfix": 4 }
    }
  ]
}
```

### Markdown

**`report.md`** — a narrative report containing a summary header, per-author commit table, category breakdown, and weekly activity section. Suitable for pasting into Confluence or a PR description.

## Development

### Build and Test

```bash
# Build everything
cargo build

# Build release binary
cargo build --release

# Run all tests
cargo test

# Lint (zero warnings required)
cargo clippy -- -D warnings

# Format check (CI gate)
cargo fmt --check

# Auto-format
cargo fmt

# Generate rustdoc
cargo doc --open
```

### Running Against Real Repos

`configs/example-config.yaml` is a working example that analyzes repositories using `developer_aliases`. Copy it, adjust paths and names to match your setup, then run:

```bash
tga analyze --config configs/example-config.yaml --database tga.db
```

### CI Gates

The GitHub Actions workflow (`ci.yml`) requires:
- `cargo fmt -- --check`
- `cargo clippy --all-targets -- -D warnings`
- `cargo test`
- `cargo doc --no-deps` with `RUSTDOCFLAGS="-D warnings"`

## Crate Structure

Single `tga` crate (consolidated from the original 5-crate workspace):

| Module | Path | Purpose |
|--------|------|---------|
| `tga::core` | `src/core/` | Shared types, config, DB schema, migrations, error types |
| `tga::collect` | `src/collect/` | Stage 1: git extraction (libgit2), GitHub/JIRA/Linear/ADO clients, `PmAdapter` trait |
| `tga::classify` | `src/classify/` | Stage 2: seven-tier classification cascade |
| `tga::report` | `src/report/` | Stage 3: CSV/JSON/Markdown output |
| `commands` (binary-private) | `src/commands/` | Subcommand handlers wired into `src/main.rs` |

## License

**Non-commercial use only.** See [LICENSE](LICENSE) for terms.

Elastic License 2.0