# trusty-git-analytics
Analyze git repositories to measure developer productivity — classify commit work types, track weekly velocity, and export CSV/JSON/Markdown reports.
## What It Does
`tga` walks one or more local git repositories, collects every commit into a SQLite database, classifies each commit into a work category (feature, bugfix, refactor, etc.) using a seven-tier classification cascade, then aggregates the results into per-author, per-week, DORA, velocity, and quality reports. It is a feature-complete Rust port of [gitflow-analytics](https://github.com/bobmatnyc/gitflow-analytics) with the same YAML config schema and the same SQLite schema — existing config files work without modification.
## Installation
### From crates.io (recommended)
```bash
cargo install tga
```
This installs the `tga` binary to `~/.cargo/bin/`. Ensure `~/.cargo/bin` is in your `PATH`.
### From source
```bash
git clone https://github.com/bobmatnyc/trusty-tools
cd trusty-tools/crates/trusty-git-analytics
cargo install --path .
```
### Verify installation
```bash
tga --version
tga --help
```
## Quick Start
### Run Your First Analysis
**Step 1** — Create a `config.yaml`:
```yaml
repositories:
- path: ~/code/my-project
name: my-project
output:
directory: ./reports
formats: [csv, json, markdown]
```
**Step 2** — Run the full pipeline:
```bash
tga analyze --config config.yaml
```
**Step 3** — Find reports in `./reports/`:
A full run writes 14 files: 9 CSV, 4 JSON, and 1 Markdown report. The most
commonly used ones are:
```
reports/
├── authors.csv # Per-author commit summary
├── weekly_activity.csv # Week-by-week breakdown
├── ... (7 more CSV files: DORA, velocity, quality, etc.)
├── report.json # Full structured payload
├── ... (3 more JSON files)
└── report.md # Narrative Markdown report
```
## Configuration
### Minimal config.yaml
```yaml
repositories:
- path: ~/code/my-repo
name: my-repo
```
All other sections are optional. When `output.formats` is omitted, all three formats (CSV, JSON, Markdown) are written.
### Full config reference
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `repositories` | list | required | Repos to analyze |
| `developer_aliases` | map | `{}` | Canonical name → list of emails/aliases |
| `team` | object | — | Alternative to `developer_aliases`; roster with email |
| `output.directory` | path | `./reports` | Where reports are written |
| `output.formats` | list | `[csv, json, markdown]` | `csv`, `json`, and/or `markdown` |
| `output.include_unclassified` | bool | `false` | Include commits with no category |
| `output.include_merges` | bool | `false` | Include merge commits |
| `output.include_files` | bool | `false` | Include file-level change detail |
| `classification.rules_file` | path | — | Path to custom rules YAML/JSON |
| `classification.use_llm` | bool | `false` | Enable LLM fallback tier |
| `classification.llm_model` | string | `gpt-4o-mini` | LLM model identifier |
| `classification.confidence_threshold` | float | `0.7` | Minimum acceptance confidence |
| `classification.llm_fallback_threshold` | float | `0.0` | Commits with confidence above this value skip the LLM tier |
| `classification.llm_fallback_concurrency` | uint | `8` | Max concurrent LLM requests during fallback |
| `github.token` | string | `$GITHUB_TOKEN` | GitHub PAT for PR fetch. Required scopes: `public_repo` for public repos, `repo` for private repos. Without a token, GitHub rate-limits anonymous traffic to 60 requests/hour and most PRs will be missed. |
| `github.org` | string | — | Org slug for org-wide PR queries |
| `github.repo` | string | — | Single repo slug (`owner/name`) |
| `github.fetch_prs` | bool | `false` | Fetch pull request metadata. **Must be `true` for `tga pr-metrics` to return data** — when left at the default, `tga collect` writes zero rows to `pull_requests` (issue #211). |
| `github.ticket_regex` | string | — | Override regex for detecting GitHub ticket refs in commit messages |
| `jira.url` | string | — | JIRA base URL |
| `jira.username` | string | — | JIRA API username (email for Cloud) |
| `jira.token` | string | — | JIRA API token |
| `jira.project_key` | string | — | Project key filter (e.g. `API`) |
| `jira.ticket_regex` | string | — | Override regex for detecting JIRA ticket refs in commit messages |
| `linear.ticket_regex` | string | — | Override regex for detecting Linear ticket refs in commit messages |
| `pm.azure_devops.organization_url` | string | — | ADO org URL (e.g. `https://dev.azure.com/myorg`) |
| `pm.azure_devops.pat` | string | — | Azure DevOps Personal Access Token |
| `pm.azure_devops.project` | string | — | Default ADO project name |
| `pm.azure_devops.fetch_on_reference` | bool | `false` | Fetch work items when `AB#N` refs appear in commits |
| `pm.azure_devops.fetch_prs` | bool | `false` | Fetch ADO pull requests and reviewer data |
| `pm.azure_devops.ticket_regex` | string | `AB#(\d+)` | Override regex for detecting ADO work item refs in commit messages |
| `pm.bitbucket.workspace` | string | — | Bitbucket Cloud workspace slug |
| `pm.bitbucket.repo_slug` | string | — | Repository slug within the workspace |
| `pm.bitbucket.fetch_prs` | bool | `false` | Fetch Bitbucket Cloud pull request metadata |
| `pm.bitbucket.token` | string | `$BITBUCKET_TOKEN` | Bearer token (App password or OAuth) |
| `pm.bitbucket.username` | string | — | Atlassian account username for Basic auth |
| `pm.bitbucket.app_password` | string | — | Atlassian App password for Basic auth (alternative to `token`) |
| `cache.directory` | path | — | Cache directory (supports `~`) |
| `version` | string | — | Schema version; stored for compatibility |
| `profile` | string | — | Named profile; stored for compatibility |
| `dora.deployment_source` | string | `git_tags` | Source for `tga deployments collect`. One of `git_tags`, `github_releases`, `github_actions`, `manual`. |
| `dora.deployment_tag_pattern` | regex | `^v?[0-9]+\.[0-9]+\.[0-9]+(-...)?$` | Tags matching this regex are ingested as deployments. |
| `dora.production_branch` | string | `main` | Default branch that production deployments come from. |
| `dora.failure_signals` | list | `[]` | One signal per entry; `work_type` (classification category) and/or `commit_message_pattern` (regex) + `within_hours` window. |
| `dora.datadog_dir` | path | — | Directory of Datadog incident exports (.json). Currently a stub — JIRA SRE path is the default MTTR source. |
| `jira.jira_project_mappings` | map | `{}` | Project key → work type (issue #206). Fires as Tier 1.6 — outranks the generic ticket regex. |
| `jira.jira_project_mapping_confidence` | float | `0.88` | Per-verdict confidence for the JIRA mapping tier. |
Paths support `~` expansion. Config files from the Python `gitflow-analytics` tool load without changes — unknown keys are silently ignored.
### developer_aliases vs team.members
**`developer_aliases`** (Python-compatible flat map):
```yaml
developer_aliases:
"Alice Smith":
- "alice@company.com"
- "asmith@company.com"
- "alice@personal.dev"
"Bob Jones":
- "bob@company.com"
- "129991831+bobgithub@users.noreply.github.com"
```
**`team.members`** (structured roster with canonical email):
```yaml
team:
members:
- name: Alice Smith
email: alice@company.com
aliases:
- asmith@company.com
- alice@personal.dev
```
When `developer_aliases` is non-empty it takes precedence over `team.members`. Use `developer_aliases` when migrating an existing Python config file; use `team.members` for new setups where canonical email matters for downstream tooling.
### Example: multi-repo config with GitHub
See [`configs/example-config.yaml`](configs/example-config.yaml) for a working example that covers multiple repositories, developer aliases, and CSV+Markdown output.
## CLI Reference
All subcommands accept these global flags:
| Flag | Default | Description |
|------|---------|-------------|
| `--config <PATH>` | `config.yaml` | Path to config YAML |
| `--database <PATH>` | `tga.db` | Path to SQLite database |
| `-v / -vv / -vvv` | warnings only | Increase log verbosity |
### tga analyze
Run the full pipeline: collect → classify → report.
```bash
tga analyze [--config <PATH>] [--database <PATH>] [--output <DIR>]
[--skip-collect] [--skip-classify] [--weeks <N>]
[--from <DATE>] [--to <DATE>] [--dry-run]
[--validate-only] [--no-validate]
```
| Flag | Description |
|------|-------------|
| `--skip-collect` | Skip Stage 1; use commits already in the database |
| `--skip-classify` | Skip Stage 2; use existing classifications |
| `--output <DIR>` | Override `output.directory` from config |
| `--weeks <N>` | Limit collection to the last N weeks (overrides config `start_date`) |
| `--from <DATE>` | Start date for collection (ISO 8601 `YYYY-MM-DD`); mutually exclusive with `--weeks` |
| `--to <DATE>` | End date for collection (ISO 8601 `YYYY-MM-DD`); defaults to today |
| `--dry-run` | Perform all steps against an in-memory database; the on-disk database is left untouched |
| `--validate-only` | Run configuration validation and exit (0 on success, 1 on errors) |
| `--no-validate` | Skip pre-flight configuration validation |
```bash
# Full pipeline
tga analyze --config config.yaml
# Re-run reports only (commits already collected and classified)
tga analyze --skip-collect --skip-classify --output ./reports-v2
```
### tga collect
Stage 1: extract commits from git repositories into the database.
```bash
tga collect [--config <PATH>] [--database <PATH>]
[--repos <NAME,...>] [--from <DATE>] [--to <DATE>] [--weeks <N>]
[--since <DATE>] [--until <DATE>] [--dry-run]
[--force-refresh-prs] [--validate-only] [--no-validate]
```
| Flag | Description |
|------|-------------|
| `--repos <NAME,...>` | Comma-separated list of repository names to collect; others are skipped |
| `--from <DATE>` | Collect commits on or after this ISO 8601 date; mutually exclusive with `--weeks` |
| `--to <DATE>` | Collect commits on or before this ISO 8601 date; defaults to today |
| `--since <DATE>` | Legacy alias for `--from` (Python-predecessor compatibility); `--from` takes precedence |
| `--until <DATE>` | Legacy alias for `--to` (Python-predecessor compatibility); `--to` takes precedence |
| `--weeks <N>` | Limit collection to the last N weeks; `--weeks` takes precedence over `--from`/`--to` |
| `--dry-run` | Run collection against an in-memory database; the on-disk database is left untouched |
| `--force-refresh-prs` | Re-fetch ADO pull requests even when already cached (backfills pre-v1.0.9 rows) |
| `--validate-only` | Run configuration validation and exit (0 on success, 1 on errors) |
| `--no-validate` | Skip pre-flight configuration validation |
```bash
tga collect --repos my-project --from 2024-01-01 --to 2024-03-31
tga collect --weeks 4 # collect last 4 weeks across all repos
```
### tga classify
Stage 2: run the classification cascade over collected commits.
```bash
tga classify [--config <PATH>] [--database <PATH>]
[--rules <PATH>] [--use-llm] [--backfill-complexity]
[--no-external]
```
| Flag | Description |
|------|-------------|
| `--rules <PATH>` | Override `classification.rules_file` from config. Custom rules default to priority 110 (above built-in 100) and are standalone by default (`extend_defaults: false`). |
| `--use-llm` | Enable LLM fallback regardless of config setting |
| `--backfill-complexity` | Fill missing complexity scores (1–5) for already-classified commits via the LLM, without re-running the full cascade; category, confidence, and method are left untouched |
| `--no-external` | Disable all external classification sources (JIRA, GitHub Issues) for this run, even if configured in the rules file or config |
```bash
tga classify --rules ./custom-rules.yaml --use-llm
tga classify --no-external # offline / CI mode
```
### tga report
Stage 3: generate reports from classified commits.
```bash
tga report [--config <PATH>] [--database <PATH>]
[--output <DIR>] [--formats <FMT,...>]
```
| Flag | Description |
|------|-------------|
| `--output <DIR>` | Override `output.directory` from config |
| `--formats <FMT,...>` | Comma-separated: `csv`, `json`, `markdown` |
```bash
tga report --output ./q1-reports --formats csv,json
```
### tga pr-metrics
Aggregate pull-request metrics per engineer from the `pull_requests` cache.
```bash
tga pr-metrics [--config <PATH>] [--database <PATH>]
[--weeks <N>] [--csv] [--output <PATH>]
```
| Flag | Description |
|------|-------------|
| `--weeks <N>` | Limit metrics to PRs created within the last N weeks |
| `--csv` | Emit CSV instead of an aligned text table |
| `--output <PATH>` | Write output to a file (CSV with `--csv`, otherwise the text table) |
```bash
tga pr-metrics --weeks 12 --csv --output pr-metrics.csv
```
The `pr_comments_given` and `avg_revisions` columns are reserved for future
use and currently always output `0.0`; the underlying review-comment and
revision-count data is not yet tracked.
### tga backfill
Retroactive maintenance operations that update existing commit rows in place
(outside the normal `collect → classify → report` pipeline).
```bash
tga backfill <SUBCOMMAND> [--dry-run]
```
| Subcommand | Description |
|------------|-------------|
| `ai-detection` | Re-run LLM classification on low-confidence prior LLM verdicts |
| `revert-flags` | Scan commit messages for revert patterns and set `is_revert` |
| `ticket-ids` | Scan commit messages for ticket refs and update `ticket_id`/`ticketed` |
| Flag | Description |
|------|-------------|
| `--dry-run` | Report how many rows would change without writing |
```bash
tga backfill revert-flags --dry-run
tga backfill ticket-ids
```
### tga override
Manage manual classification overrides (Tier 0). Rows here pin a commit's
verdict regardless of what the rule-based or LLM tiers would produce.
```bash
tga override <SUBCOMMAND>
```
| Subcommand | Description |
|------------|-------------|
| `add <SHA> <WORK_TYPE> <CHANGE_TYPE> [--notes <TEXT>] [--repo <PATH>]` | Insert (or replace) an override row for a commit SHA |
| `list [--repo <PATH>]` | List every override row, optionally filtered by repository |
| `remove <SHA> [--yes]` | Delete the override row(s) for a SHA (`--yes` skips confirmation) |
```bash
tga override add abc1234 feature feature --notes "manual review"
tga override list
tga override remove abc1234 --yes
```
### tga rules
Introspect the classification rule set (issue #209). Useful for tuning
rules and answering "why was this commit classified as X?".
```bash
tga rules list # every loaded rule, sorted by priority
tga rules show <commit-sha> # verdict + method recorded for a commit
tga rules test "feat: add login" # dry-run the cascade against a message
```
### tga deployments / tga incidents / tga dora
DORA metrics infrastructure (issues #207, #208, #212, #213).
```bash
# Step 1: ingest deployment events (default: git tags matching dora.deployment_tag_pattern)
tga deployments collect
# Step 2 (optional): ingest production incidents from JIRA SRE issues
tga incidents collect
# Step 3: compute Deployment Frequency, Lead Time, Change Failure Rate, MTTR.
# Also rebuilds the `deployment_failures` derived join from the current
# `dora.failure_signals` config — safe to re-run after a config edit.
tga dora
```
The four DORA metrics land in pre-computed SQL views
(`v_deployment_frequency`, `v_lead_time`, `v_change_failure_rate`,
`v_mttr`) so dashboards can read them directly without re-aggregating.
### Tag & Release-Branch Reachability (issue #279)
`tga collect` automatically populates `fact_commit_reachability` with four new columns that distinguish "deployed via cherry-pick to a release branch and tagged" from "abandoned WIP":
| Column | Type | Meaning |
|---|---|---|
| `on_any_tag` | boolean | `true` if any git tag reaches this commit |
| `reachable_from_tags` | JSON array | Tag names that contain this commit |
| `on_release_branch` | boolean | `true` if commit is on any configured release branch |
| `release_branches` | JSON array | Matching release-branch names |
This resolves a systematic blind spot: bug fixes and security patches cherry-picked to `release/*` branches, tagged for production, and never merged to `main` previously showed `on_default_branch = false` — making them indistinguishable from abandoned WIP. With this feature, the 32% "merged rate" for bug fixes turns out to be much higher when deployed-via-tag commits are counted.
**Config:**
```yaml
reachability:
track_tags: true # default: true
track_release_branches: true # default: true
release_branch_patterns: # default: ["release/*", "hotfix/*", "chore/release-*", "v*"]
- "release/*"
- "hotfix/*"
- "chore/release-*"
- "v*"
```
Set `track_tags: false` to skip the tag scan (useful for trunk-based repos with thousands of tags). The default is `true` because the scan is O(repo + refs), not O(repo × refs × commits).
Optionally disable the scan for a single run:
```bash
tga collect --skip-tag-reachability
```
**Useful derived queries:**
```sql
-- What % of bug fixes actually shipped (via any path)?
SELECT
COUNT(*) FILTER (WHERE on_default_branch OR on_any_tag OR on_release_branch)
* 100.0 / COUNT(*) AS shipped_pct
FROM classifications cl
JOIN commits c ON c.classification_id = cl.id
JOIN fact_commit_reachability fcr ON fcr.commit_sha = c.sha
WHERE cl.category = 'bug_fix';
-- Cherry-pick rate: commits reachable from tags but not on main
SELECT repo, COUNT(*)
FROM fact_commit_reachability
WHERE on_any_tag AND NOT on_default_branch
GROUP BY repo;
```
## Pipeline Architecture
```
git repos ──┐
│ collect SQLite (tga.db) classify SQLite report
GitHub API ──┼──────────────► [commits] ──────────────► [classif]─────────► CSV (×9)
JIRA API ────┤ (libgit2, [authors] (7-tier ► JSON
Linear API ──┤ reqwest) [pull_requests] cascade, ► Markdown
ADO API ────┘ [work_items] Rayon-parallel)
```
**Stage 1 — collect** (`tga::collect`): opens each repository with libgit2, walks the configured branch, extracts commit metadata and diff stats, resolves author identities, fetches GitHub PR / JIRA issue / Linear / Azure DevOps work item metadata via REST/GraphQL, and writes everything to SQLite.
**Stage 2 — classify** (`tga::classify`): reads unclassified commits from the database, runs each message through the seven-tier cascade (see below), and writes a classification verdict back. Rule-based tiers execute in parallel via Rayon.
**Stage 3 — report** (`tga::report`): reads the classified database, aggregates per-author, per-week, DORA, velocity, and quality statistics, and writes the configured output formats to the output directory.
## Classification
### Seven-Tier Cascade
Each commit message is tested against tiers in order. The first tier to produce a confident result wins.
**Tier 0 — Manual Override** (confidence 1.0): looks up the `(commit_hash, repo_path)` pair in the `classification_overrides` table. Managed via `tga override add|list|remove`.
**Tier 1.5 — Issue Type** (confidence 0.90): when the commit has ticket references resolving to rows in `issue_cache`, maps the upstream issue type (`bug`, `story`, `task`, `spike`, etc.) directly to a `change_type`.
**Tier 1.6 — JIRA Project Mapping** (default confidence 0.88, override via `jira.jira_project_mapping_confidence`): when `jira.jira_project_mappings` is configured, maps the JIRA project key prefix of any `[A-Z]+-\d+` reference to a `change_type`. Fires *before* the regex tier so project mappings outrank the generic `jira-ticket` regex rule (issue #206). Example config:
```yaml
jira:
jira_project_mappings:
TQL: bug_fix
APEX: integration
INFRA: platform_infrastructure
SEC: security
jira_project_mapping_confidence: 0.88 # optional, default 0.88
```
Tier-0 manual overrides and exact-keyword conventional-commit prefixes (e.g. `fix:`) still beat this tier.
**Tier 4 — Exact (Aho-Corasick)**: builds a single finite-state machine from every keyword list across every rule and scans the message in O(n) time. Matches `feat:`, `fix:`, `chore:`, etc. Confidence 0.85–0.95.
**Tier 5 — Regex**: applies pre-compiled regex patterns from the rule set. Handles anchored conventional-commit patterns (`^feat(\([^)]*\))?!?:`) and JIRA ticket IDs (`\b[A-Z][A-Z0-9]+-\d+\b`).
**Tier 6 — Fuzzy heuristics**: detects merge commits (via `is_merge` flag or `Merge pull request` prefix) and reverts (via `Revert` prefix). No external dependencies.
**Tier 7 — LLM fallback** (optional, async): calls an OpenAI-compatible API (**OpenRouter** by default, **AWS Bedrock** behind the `bedrock` cargo feature) when tiers 0–6 leave a commit in a fallthrough category. Disabled by default; enable with `analysis.llm_classification.enabled: true` or `--use-llm`. Results are only accepted when `confidence >= confidence_threshold` (default 0.7).
### Default Rules
| ID | Category | Keywords / Patterns |
|----|----------|---------------------|
| `cc-feat` | `feature` | `feat:`, `feature:`, `^feat(...)?!?:` |
| `cc-fix` | `bugfix` | `fix:`, `bugfix:`, `hotfix`, `^fix(...)?!?:` |
| `cc-chore` | `chore` | `chore:`, `^chore(...)?!?:` |
| `cc-docs` | `documentation` | `docs:`, `doc:`, `^docs?(...)?!?:` |
| `cc-refactor` | `refactor` | `refactor:`, `^refactor(...)?!?:` |
| `cc-test` | `test` | `test:`, `tests:`, `^tests?(...)?!?:` |
| `cc-ci` | `ci` | `ci:`, `^ci(...)?!?:` |
| `cc-perf` | `performance` | `perf:`, `^perf(...)?!?:` |
| `cc-style` | `style` | `style:`, `^style(...)?!?:` |
| `cc-build` | `build` | `build:`, `^build(...)?!?:` |
| `cc-revert` | `revert` | `revert:`, `^revert(...)?!?:` |
| `breaking-change` | `breaking` | `breaking change`, `breaking-change` |
| `jira-ticket` | `feature` (ticketed) | `\b[A-Z][A-Z0-9]+-\d+\b` |
| `kw-bug` | `bugfix` | `bug`, `defect` |
| `kw-security` | `bugfix` (security) | `security`, `cve-`, `vulnerability` |
Commits that match no rule are assigned category `uncategorized` with confidence 0.0.
### Custom Rules File
Supply your own rules alongside the defaults:
```yaml
# my-rules.yaml
version: "1.0"
rules:
- id: my-deploy
category: deployment
keywords:
- "deploy:"
- "release:"
patterns:
- "(?i)^deploy(ment)?:"
priority: 80
confidence: 0.9
```
```bash
tga classify --rules ./my-rules.yaml
# or in config.yaml:
# classification:
# rules_file: ./my-rules.yaml
```
### Rule Priority and extend_defaults (issue #259)
Custom rules default to **priority 110** — one step above the highest built-in rule priority (100). This means user rules win over the default ruleset without needing an explicit `priority:` entry in every rule.
Custom rule files also default to **standalone** mode (`extend_defaults: false`): only the rules in the file are applied. Opt in to merging with the built-in defaults by adding `extend_defaults: true`.
```yaml
# my-rules.yaml — standalone by default (no built-in rules loaded)
extend_defaults: false # optional: explicitly document the default
rules:
- id: my-deploy
category: deployment
keywords: ["deploy:", "release:"]
# priority defaults to 110 — beats the built-in cc-feat (100)
```
```yaml
# my-addons.yaml — augments the defaults
extend_defaults: true
rules:
- id: my-payments
category: payments
keywords: ["payment:", "billing:"]
confidence: 0.92
```
### Multi-source classification (issue #260)
`tga classify` can consult external ticket systems — JIRA Cloud/Server and
GitHub Issues — as high-confidence classification signals **before** the
commit-message rule tiers run.
**Priority model** (highest to lowest):
1. Manual overrides (`tga override add`)
2. External sources — JIRA issue type / GitHub Issues labels (confidence 0.92)
3. Custom regex rules (default priority 110)
4. Built-in TGA rules (priority 100)
5. LLM fallback (when `use_llm: true`)
**Configuration** — add a `sources:` list to `config.yaml` under
`classification:`:
```yaml
classification:
sources:
# JIRA source
- type: jira
base_url: "https://yourco.atlassian.net"
token_env: JIRA_API_TOKEN # env var carrying the API token
username: "you@yourco.com" # omit for Bearer-only tokens
project_keys: ["PROJ", "ENG"] # empty = all projects
field_mappings:
issue_type:
Bug: bug_fix
Story: new_feature
Task: tech_debt_refactoring
Epic: new_feature
labels:
ktlo: tech_debt_refactoring
security: security
components:
Platform: platform_infrastructure
# GitHub Issues source
- type: github_issues
repo: "acme/widgets"
token_env: GITHUB_TOKEN
label_mappings:
bug: bug_fix
enhancement: new_feature
dependencies: tech_debt_refactoring
documentation: documentation
```
Credentials are read from the **named environment variables** at runtime —
never store tokens directly in config files:
```bash
export JIRA_API_TOKEN="your-jira-api-token"
export GITHUB_TOKEN="your-github-pat"
tga classify --config config.yaml
```
**Caching**: each unique ticket is fetched at most once per `tga classify`
run (in-memory cache, never persisted to disk). On HTTP failure or missing
token, the source is skipped with a warning and the pipeline falls through
to commit-message rules.
**Disabling external sources**: use `--no-external` to skip all sources for
a run (useful in CI or offline environments):
```bash
tga classify --no-external
```
See [`examples/multi-source-config.yaml`](examples/multi-source-config.yaml)
for a fully annotated example.
**Deferred sources** (planned for a future release): Linear, Shortcut,
Confluence, Datadog.
## Output Formats
### CSV
Nine CSV files are written when `csv` is in the format list (per-author,
weekly activity, DORA metrics, velocity, quality, and related breakdowns).
The two most commonly used are documented below.
**`authors.csv`** — one row per author:
| Column | Description |
|--------|-------------|
| `name` | Canonical author name |
| `email` | Canonical author email |
| `commit_count` | Total commits |
| `insertions` | Total lines added |
| `deletions` | Total lines deleted |
| `files_changed` | Total files changed |
| `first_commit` | ISO 8601 timestamp of earliest commit |
| `last_commit` | ISO 8601 timestamp of most recent commit |
**`weekly_activity.csv`** — one row per week/author/repository bucket:
| Column | Description |
|--------|-------------|
| `week` | ISO week label, e.g. `2024-W03` |
| `author` | Author name |
| `repository` | Repository name |
| `commit_count` | Commits in this bucket |
| `insertions` | Lines added in this bucket |
| `deletions` | Lines deleted in this bucket |
### JSON
Four JSON files are written when `json` is in the format list:
`report.json`, `velocity_summary.json`, `quality_summary.json`, and
`dora_summary.json`. The primary one is documented below.
**`report.json`** — full structured payload:
```json
{
"generated_at": "2024-03-15T10:00:00Z",
"period_start": "2024-01-01T00:00:00Z",
"period_end": "2024-03-14T23:59:59Z",
"total_commits": 347,
"total_authors": 8,
"category_breakdown": { "feature": 120, "bugfix": 45, ... },
"authors": [
{
"name": "Alice Smith",
"email": "alice@company.com",
"commit_count": 87,
"insertions": 4200,
"deletions": 1100,
"files_changed": 310,
"categories": { "feature": 50, "bugfix": 20, ... },
"first_commit": "...",
"last_commit": "..."
}
],
"repositories": [
{
"name": "my-project",
"commit_count": 347,
"author_count": 8,
"insertions": 18000,
"deletions": 6000,
"top_categories": [["feature", 120], ["bugfix", 45]]
}
],
"weekly_activity": [
{
"week": "2024-W03",
"author": "Alice Smith",
"repository": "my-project",
"commit_count": 12,
"insertions": 500,
"deletions": 120,
"categories": { "feature": 8, "bugfix": 4 }
}
]
}
```
### Markdown
**`report.md`** — a narrative report containing a summary header, per-author commit table, category breakdown, and weekly activity section. Suitable for pasting into Confluence or a PR description.
## Development
### Build and Test
```bash
# Build everything
cargo build
# Build release binary
cargo build --release
# Run all tests
cargo test
# Lint (zero warnings required)
cargo clippy -- -D warnings
# Format check (CI gate)
cargo fmt --check
# Auto-format
cargo fmt
# Generate rustdoc
cargo doc --open
```
### Running Against Real Repos
`configs/example-config.yaml` is a working example that analyzes repositories using `developer_aliases`. Copy it, adjust paths and names to match your setup, then run:
```bash
tga analyze --config configs/example-config.yaml --database tga.db
```
### CI Gates
The GitHub Actions workflow (`ci.yml`) requires:
- `cargo fmt -- --check`
- `cargo clippy --all-targets -- -D warnings`
- `cargo test`
- `cargo doc --no-deps` with `RUSTDOCFLAGS="-D warnings"`
## Crate Structure
Single `tga` crate (consolidated from the original 5-crate workspace):
| Module | Path | Purpose |
|--------|------|---------|
| `tga::core` | `src/core/` | Shared types, config, DB schema, migrations, error types |
| `tga::collect` | `src/collect/` | Stage 1: git extraction (libgit2), GitHub/JIRA/Linear/ADO clients, `PmAdapter` trait |
| `tga::classify` | `src/classify/` | Stage 2: seven-tier classification cascade |
| `tga::report` | `src/report/` | Stage 3: CSV/JSON/Markdown output |
| `commands` (binary-private) | `src/commands/` | Subcommand handlers wired into `src/main.rs` |
## License
**Non-commercial use only.** See [LICENSE](LICENSE) for terms.
Elastic License 2.0