trusty-git-analytics

Analyze git repositories to measure developer productivity — classify commit work types, track weekly velocity, and export CSV/JSON/Markdown reports.

What It Does

tga walks one or more local git repositories, collects every commit into a SQLite database, classifies each commit into a work category (feature, bugfix, refactor, etc.) using a seven-tier classification cascade, then aggregates the results into per-author, per-week, DORA, velocity, and quality reports. It is a feature-complete Rust port of gitflow-analytics with the same YAML config schema and the same SQLite schema — existing config files work without modification.

Installation

From crates.io (recommended)

cargo install tga

This installs the tga binary to ~/.cargo/bin/. Ensure ~/.cargo/bin is in your PATH.

From source

git clone https://github.com/bobmatnyc/trusty-tools
cd trusty-tools/crates/trusty-git-analytics
cargo install --path .

Verify installation

tga --version
tga --help

Quick Start

Run Your First Analysis

Step 1 — Create a config.yaml:

repositories:
  - path: ~/code/my-project
    name: my-project

output:
  directory: ./reports
  formats: [csv, json, markdown]

Step 2 — Run the full pipeline:

tga analyze --config config.yaml

Step 3 — Find reports in ./reports/:

A full run writes 14 files: 9 CSV, 4 JSON, and 1 Markdown report. The most commonly used ones are:

reports/
├── authors.csv         # Per-author commit summary
├── weekly_activity.csv # Week-by-week breakdown
├── ... (7 more CSV files: DORA, velocity, quality, etc.)
├── report.json         # Full structured payload
├── ... (3 more JSON files)
└── report.md           # Narrative Markdown report

Configuration

Minimal config.yaml

repositories:
  - path: ~/code/my-repo
    name: my-repo

All other sections are optional. When output.formats is omitted, all three formats (CSV, JSON, Markdown) are written.

Full config reference

Key	Type	Default	Description
`repositories`	list	required	Repos to analyze
`developer_aliases`	map	`{}`	Canonical name → list of emails/aliases
`team`	object	—	Alternative to `developer_aliases`; roster with email
`output.directory`	path	`./reports`	Where reports are written
`output.formats`	list	`[csv, json, markdown]`	`csv`, `json`, and/or `markdown`
`output.include_unclassified`	bool	`false`	Include commits with no category
`output.include_merges`	bool	`false`	Include merge commits
`output.include_files`	bool	`false`	Include file-level change detail
`classification.rules_file`	path	—	Path to custom rules YAML/JSON
`classification.use_llm`	bool	`false`	Enable LLM fallback tier
`classification.llm_model`	string	`gpt-4o-mini`	LLM model identifier
`classification.confidence_threshold`	float	`0.7`	Minimum confidence for an LLM verdict to be accepted
`classification.llm_fallback_threshold`	float	`0.0`	Commits with confidence above this value skip the LLM tier
`classification.llm_fallback_concurrency`	uint	`8`	Max concurrent LLM requests during fallback
`github.token`	string	`$GITHUB_TOKEN`	GitHub PAT for PR fetch. Required scopes: `public_repo` for public repos, `repo` for private repos. Without a token, GitHub rate-limits anonymous traffic to 60 requests/hour and most PRs will be missed.
`github.org`	string	—	Org slug for org-wide PR queries
`github.repo`	string	—	Single repo slug (`owner/name`)
`github.fetch_prs`	bool	`false`	Fetch pull request metadata. Must be `true` for `tga pr-metrics` to return data — when left at the default, `tga collect` writes zero rows to `pull_requests` (issue #211).
`github.ticket_regex`	string	—	Override regex for detecting GitHub ticket refs in commit messages
`jira.url`	string	—	JIRA base URL
`jira.username`	string	—	JIRA API username (email for Cloud)
`jira.token`	string	—	JIRA API token
`jira.project_key`	string	—	Project key filter (e.g. `API`)
`jira.ticket_regex`	string	—	Override regex for detecting JIRA ticket refs in commit messages
`linear.ticket_regex`	string	—	Override regex for detecting Linear ticket refs in commit messages
`pm.azure_devops.organization_url`	string	—	ADO org URL (e.g. `https://dev.azure.com/myorg`)
`pm.azure_devops.pat`	string	—	Azure DevOps Personal Access Token
`pm.azure_devops.project`	string	—	Default ADO project name
`pm.azure_devops.fetch_on_reference`	bool	`false`	Fetch work items when `AB#N` refs appear in commits
`pm.azure_devops.fetch_prs`	bool	`false`	Fetch ADO pull requests and reviewer data
`pm.azure_devops.ticket_regex`	string	`AB#(\d+)`	Override regex for detecting ADO work item refs in commit messages
`pm.bitbucket.workspace`	string	—	Bitbucket Cloud workspace slug
`pm.bitbucket.repo_slug`	string	—	Repository slug within the workspace
`pm.bitbucket.fetch_prs`	bool	`false`	Fetch Bitbucket Cloud pull request metadata
`pm.bitbucket.token`	string	`$BITBUCKET_TOKEN`	Bearer token (App password or OAuth)
`pm.bitbucket.username`	string	—	Atlassian account username for Basic auth
`pm.bitbucket.app_password`	string	—	Atlassian App password for Basic auth (alternative to `token`)
`cache.directory`	path	—	Cache directory (supports `~`)
`version`	string	—	Schema version; stored for compatibility
`profile`	string	—	Named profile; stored for compatibility
`dora.deployment_source`	string	`git_tags`	Source for `tga deployments collect`. One of `git_tags`, `github_releases`, `github_actions`, `manual`.
`dora.deployment_tag_pattern`	regex	`^v?[0-9]+\.[0-9]+\.[0-9]+(-...)?$`	Tags matching this regex are ingested as deployments.
`dora.production_branch`	string	`main`	Default branch that production deployments come from.
`dora.failure_signals`	list	`[]`	One signal per entry. Each signal sets `work_type` (a classification category) and/or `commit_message_pattern` (a regex), plus a `within_hours` window.
`dora.datadog_dir`	path	—	Directory of Datadog incident exports (.json). Currently a stub — JIRA SRE path is the default MTTR source.
`jira.jira_project_mappings`	map	`{}`	Project key → work type (issue #206). Fires as Tier 1.6 — outranks the generic ticket regex.
`jira.jira_project_mapping_confidence`	float	`0.88`	Per-verdict confidence for the JIRA mapping tier.

Paths support ~ expansion. Config files from the Python gitflow-analytics tool load without changes — unknown keys are silently ignored.
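
The three LLM-related keys in the table above (classification.use_llm, classification.llm_fallback_threshold, classification.confidence_threshold) can be read as two gates: one decides whether a commit is sent to the LLM at all, the other decides whether the LLM's answer is kept. The sketch below is one reading of the table descriptions, not tga's actual code; the LlmSettings struct and both function names are hypothetical.

```rust
/// Hypothetical subset of the `classification.*` keys from the table.
struct LlmSettings {
    use_llm: bool,
    confidence_threshold: f64,   // documented default: 0.7
    llm_fallback_threshold: f64, // documented default: 0.0
}

/// Gate 1: a commit whose rule-based confidence is above the fallback
/// threshold skips the LLM tier; everything else is sent to it.
fn needs_llm(settings: &LlmSettings, rule_confidence: f64) -> bool {
    settings.use_llm && !(rule_confidence > settings.llm_fallback_threshold)
}

/// Gate 2: an LLM verdict is kept only at or above the acceptance threshold.
fn accept_llm_verdict(settings: &LlmSettings, llm_confidence: f64) -> bool {
    llm_confidence >= settings.confidence_threshold
}
```

With the documented defaults, only commits left at confidence 0.0 (the uncategorized fallthrough) would reach the LLM, and a verdict at 0.69 would be rejected.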

developer_aliases vs team.members

developer_aliases (Python-compatible flat map):

developer_aliases:
  "Alice Smith":
    - "alice@company.com"
    - "asmith@company.com"
    - "alice@personal.dev"
  "Bob Jones":
    - "bob@company.com"
    - "129991831+bobgithub@users.noreply.github.com"

team.members (structured roster with canonical email):

team:
  members:
    - name: Alice Smith
      email: alice@company.com
      aliases:
        - asmith@company.com
        - alice@personal.dev

When developer_aliases is non-empty it takes precedence over team.members. Use developer_aliases when migrating an existing Python config file; use team.members for new setups where canonical email matters for downstream tooling.
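
The precedence rule above can be sketched as a lookup table built once per run. This is a minimal illustration, not tga's implementation: the Member struct and both function names are hypothetical, and lowercasing emails before comparison is an assumption the README does not state.

```rust
use std::collections::HashMap;

/// Hypothetical mirror of one `team.members` entry.
struct Member {
    name: String,
    email: String,
    aliases: Vec<String>,
}

/// Build an email -> canonical-name map.
/// A non-empty `developer_aliases` map wins outright; `team.members`
/// is consulted only when the flat map is empty.
fn build_identity_map(
    developer_aliases: &HashMap<String, Vec<String>>,
    members: &[Member],
) -> HashMap<String, String> {
    let mut by_email = HashMap::new();
    if !developer_aliases.is_empty() {
        for (name, emails) in developer_aliases {
            for email in emails {
                // Assumption: emails are compared case-insensitively.
                by_email.insert(email.to_lowercase(), name.clone());
            }
        }
    } else {
        for member in members {
            by_email.insert(member.email.to_lowercase(), member.name.clone());
            for alias in &member.aliases {
                by_email.insert(alias.to_lowercase(), member.name.clone());
            }
        }
    }
    by_email
}

/// Resolve a commit author email, falling back to the raw email when unknown.
fn canonical_name(by_email: &HashMap<String, String>, email: &str) -> String {
    by_email
        .get(&email.to_lowercase())
        .cloned()
        .unwrap_or_else(|| email.to_string())
}
```

Building the map up front keeps per-commit resolution to a single hash lookup.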

Example: multi-repo config with GitHub

See configs/example-config.yaml for a working example that covers multiple repositories, developer aliases, and CSV+Markdown output.

CLI Reference

All subcommands accept these global flags:

Flag	Default	Description
`--config <PATH>`	`config.yaml`	Path to config YAML
`--database <PATH>`	`tga.db`	Path to SQLite database
`-v / -vv / -vvv`	warnings only	Increase log verbosity

tga analyze

Run the full pipeline: collect → classify → report.

tga analyze [--config <PATH>] [--database <PATH>] [--output <DIR>]
            [--skip-collect] [--skip-classify] [--weeks <N>]
            [--from <DATE>] [--to <DATE>] [--dry-run]
            [--validate-only] [--no-validate]

Flag	Description
`--skip-collect`	Skip Stage 1; use commits already in the database
`--skip-classify`	Skip Stage 2; use existing classifications
`--output <DIR>`	Override `output.directory` from config
`--weeks <N>`	Limit collection to the last N weeks (overrides config `start_date`)
`--from <DATE>`	Start date for collection (ISO 8601 `YYYY-MM-DD`); mutually exclusive with `--weeks`
`--to <DATE>`	End date for collection (ISO 8601 `YYYY-MM-DD`); defaults to today
`--dry-run`	Perform all steps against an in-memory database; the on-disk database is left untouched
`--validate-only`	Run configuration validation and exit (0 on success, 1 on errors)
`--no-validate`	Skip pre-flight configuration validation

# Full pipeline
tga analyze --config config.yaml

# Re-run reports only (commits already collected and classified)
tga analyze --skip-collect --skip-classify --output ./reports-v2

tga collect

Stage 1: extract commits from git repositories into the database.

tga collect [--config <PATH>] [--database <PATH>]
            [--repos <NAME,...>] [--from <DATE>] [--to <DATE>] [--weeks <N>]
            [--since <DATE>] [--until <DATE>] [--dry-run]
            [--force-refresh-prs] [--validate-only] [--no-validate]

Flag	Description
`--repos <NAME,...>`	Comma-separated list of repository names to collect; others are skipped
`--from <DATE>`	Collect commits on or after this ISO 8601 date; mutually exclusive with `--weeks`
`--to <DATE>`	Collect commits on or before this ISO 8601 date; defaults to today
`--since <DATE>`	Legacy alias for `--from` (Python-predecessor compatibility); `--from` takes precedence
`--until <DATE>`	Legacy alias for `--to` (Python-predecessor compatibility); `--to` takes precedence
`--weeks <N>`	Limit collection to the last N weeks; `--weeks` takes precedence over `--from`/`--to`
`--dry-run`	Run collection against an in-memory database; the on-disk database is left untouched
`--force-refresh-prs`	Re-fetch ADO pull requests even when already cached (backfills pre-v1.0.9 rows)
`--validate-only`	Run configuration validation and exit (0 on success, 1 on errors)
`--no-validate`	Skip pre-flight configuration validation

tga collect --repos my-project --from 2024-01-01 --to 2024-03-31
tga collect --weeks 4   # collect last 4 weeks across all repos

tga classify

Stage 2: run the classification cascade over collected commits.

tga classify [--config <PATH>] [--database <PATH>]
             [--rules <PATH>] [--use-llm] [--backfill-complexity]
             [--no-external]

Flag	Description
`--rules <PATH>`	Override `classification.rules_file` from config. Custom rules default to priority 110 (above built-in 100) and are standalone by default (`extend_defaults: false`).
`--use-llm`	Enable LLM fallback regardless of config setting
`--backfill-complexity`	Fill missing complexity scores (1–5) for already-classified commits via the LLM, without re-running the full cascade; category, confidence, and method are left untouched
`--no-external`	Disable all external classification sources (JIRA, GitHub Issues) for this run, even if configured in the rules file or config

tga classify --rules ./custom-rules.yaml --use-llm
tga classify --no-external   # offline / CI mode

tga report

Stage 3: generate reports from classified commits.

tga report [--config <PATH>] [--database <PATH>]
           [--output <DIR>] [--formats <FMT,...>]

Flag	Description
`--output <DIR>`	Override `output.directory` from config
`--formats <FMT,...>`	Comma-separated: `csv`, `json`, `markdown`

tga report --output ./q1-reports --formats csv,json

tga pr-metrics

Aggregate pull-request metrics per engineer from the pull_requests cache.

tga pr-metrics [--config <PATH>] [--database <PATH>]
               [--weeks <N>] [--csv] [--output <PATH>]

Flag	Description
`--weeks <N>`	Limit metrics to PRs created within the last N weeks
`--csv`	Emit CSV instead of an aligned text table
`--output <PATH>`	Write output to a file (CSV with `--csv`, otherwise the text table)

tga pr-metrics --weeks 12 --csv --output pr-metrics.csv

The pr_comments_given and avg_revisions columns are reserved for future use and currently always output 0.0; the underlying review-comment and revision-count data is not yet tracked.

tga backfill

Retroactive maintenance operations that update existing commit rows in place (outside the normal collect → classify → report pipeline).

tga backfill <SUBCOMMAND> [--dry-run]

Subcommand	Description
`ai-detection`	Re-run LLM classification on low-confidence prior LLM verdicts
`revert-flags`	Scan commit messages for revert patterns and set `is_revert`
`ticket-ids`	Scan commit messages for ticket refs and update `ticket_id`/`ticketed`

Flag	Description
`--dry-run`	Report how many rows would change without writing

tga backfill revert-flags --dry-run
tga backfill ticket-ids

tga override

Manage manual classification overrides (Tier 0). Rows here pin a commit's verdict regardless of what the rule-based or LLM tiers would produce.

tga override <SUBCOMMAND>

Subcommand	Description
`add <SHA> <WORK_TYPE> <CHANGE_TYPE> [--notes <TEXT>] [--repo <PATH>]`	Insert (or replace) an override row for a commit SHA
`list [--repo <PATH>]`	List every override row, optionally filtered by repository
`remove <SHA> [--yes]`	Delete the override row(s) for a SHA (`--yes` skips confirmation)

tga override add abc1234 feature feature --notes "manual review"
tga override list
tga override remove abc1234 --yes

tga rules

Introspect the classification rule set (issue #209). Useful for tuning rules and answering "why was this commit classified as X?".

tga rules list                       # every loaded rule, sorted by priority
tga rules show <commit-sha>          # verdict + method recorded for a commit
tga rules test "feat: add login"     # dry-run the cascade against a message

tga deployments / tga incidents / tga dora

DORA metrics infrastructure (issues #207, #208, #212, #213).

# Step 1: ingest deployment events (default: git tags matching dora.deployment_tag_pattern)
tga deployments collect

# Step 2 (optional): ingest production incidents from JIRA SRE issues
tga incidents collect

# Step 3: compute Deployment Frequency, Lead Time, Change Failure Rate, MTTR.
# Also rebuilds the `deployment_failures` derived join from the current
# `dora.failure_signals` config — safe to re-run after a config edit.
tga dora

The four DORA metrics land in pre-computed SQL views (v_deployment_frequency, v_lead_time, v_change_failure_rate, v_mttr) so dashboards can read them directly without re-aggregating.

Tag & Release-Branch Reachability (issue #279)

tga collect automatically populates fact_commit_reachability with four new columns that distinguish "deployed via cherry-pick to a release branch and tagged" from "abandoned WIP":

Column	Type	Meaning
`on_any_tag`	boolean	`true` if any git tag reaches this commit
`reachable_from_tags`	JSON array	Tag names that contain this commit
`on_release_branch`	boolean	`true` if commit is on any configured release branch
`release_branches`	JSON array	Matching release-branch names

This resolves a systematic blind spot. Bug fixes and security patches that were cherry-picked to release/* branches, tagged for production, and never merged to main previously showed on_default_branch = false, which made them indistinguishable from abandoned WIP. Once deployed-via-tag commits are counted, the 32% "merged rate" previously reported for bug fixes turns out to be much higher.

Config:

reachability:
  track_tags: true               # default: true
  track_release_branches: true   # default: true
  release_branch_patterns:       # default: ["release/*", "hotfix/*", "chore/release-*", "v*"]
    - "release/*"
    - "hotfix/*"
    - "chore/release-*"
    - "v*"

Set track_tags: false to skip the tag scan (useful for trunk-based repos with thousands of tags). The default is true because the scan is O(repo + refs), not O(repo × refs × commits).
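
To make release_branch_patterns concrete, here is a minimal sketch of matching branch names against the four default patterns. It assumes each pattern is either a literal or a prefix followed by one trailing `*`, which is true of the defaults; how tga actually evaluates globs is not documented here, and the function names are hypothetical.

```rust
/// Match a branch name against a pattern with at most one trailing `*`.
/// Assumption: no other glob syntax (`?`, `**`, mid-string `*`) is handled.
fn matches_pattern(pattern: &str, branch: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => branch.starts_with(prefix),
        None => pattern == branch,
    }
}

/// The documented default for `reachability.release_branch_patterns`.
fn default_patterns() -> Vec<&'static str> {
    vec!["release/*", "hotfix/*", "chore/release-*", "v*"]
}

/// Keep the branches that match any pattern, in their original order.
fn release_branches<'a>(patterns: &[&str], branches: &[&'a str]) -> Vec<&'a str> {
    branches
        .iter()
        .copied()
        .filter(|branch| patterns.iter().any(|pattern| matches_pattern(pattern, branch)))
        .collect()
}
```

Under this prefix reading, `v*` is broad: it would also match a branch such as `vendor-update`, so repositories with such names may want a narrower pattern list.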

Optionally disable the scan for a single run:

tga collect --skip-tag-reachability

Useful derived queries:

-- What % of bug fixes actually shipped (via any path)?
SELECT
  COUNT(*) FILTER (WHERE on_default_branch OR on_any_tag OR on_release_branch)
  * 100.0 / COUNT(*) AS shipped_pct
FROM classifications cl
JOIN commits c ON c.classification_id = cl.id
JOIN fact_commit_reachability fcr ON fcr.commit_sha = c.sha
WHERE cl.category = 'bug_fix';

-- Cherry-pick rate: commits reachable from tags but not on main
SELECT repo, COUNT(*)
FROM fact_commit_reachability
WHERE on_any_tag AND NOT on_default_branch
GROUP BY repo;
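
For consumers that load the rows into memory instead of querying SQLite, the "shipped" predicate from the first query can be sketched in Rust. The Reachability struct is a hypothetical stand-in for one joined row; only the three boolean flags used by the query are modeled.

```rust
/// Hypothetical in-memory view of one commit's reachability flags.
struct Reachability {
    on_default_branch: bool,
    on_any_tag: bool,
    on_release_branch: bool,
}

/// Percentage of commits that shipped via any path, or `None` for an
/// empty input (the SQL version would divide by zero there).
fn shipped_pct(rows: &[Reachability]) -> Option<f64> {
    if rows.is_empty() {
        return None;
    }
    let shipped = rows
        .iter()
        .filter(|r| r.on_default_branch || r.on_any_tag || r.on_release_branch)
        .count();
    Some(shipped as f64 * 100.0 / rows.len() as f64)
}
```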

Pipeline Architecture

git repos ───┐
             │   collect      SQLite (tga.db)   classify       SQLite    report
GitHub API ──┼──────────────► [commits]        ──────────────► [classif]─────────► CSV (×9)
JIRA API ────┤   (libgit2,    [authors]         (7-tier                 ► JSON
Linear API ──┤   reqwest)     [pull_requests]   cascade,                ► Markdown
ADO API ─────┘                [work_items]      Rayon-parallel)

Stage 1 — collect (tga::collect): opens each repository with libgit2, walks the configured branch, extracts commit metadata and diff stats, resolves author identities, fetches GitHub PR / JIRA issue / Linear / Azure DevOps work item metadata via REST/GraphQL, and writes everything to SQLite.

Stage 2 — classify (tga::classify): reads unclassified commits from the database, runs each message through the seven-tier cascade (see below), and writes a classification verdict back. Rule-based tiers execute in parallel via Rayon.

Stage 3 — report (tga::report): reads the classified database, aggregates per-author, per-week, DORA, velocity, and quality statistics, and writes the configured output formats to the output directory.

Classification

Seven-Tier Cascade

Each commit message is tested against tiers in order. The first tier to produce a confident result wins.

Tier 0 — Manual Override (confidence 1.0): looks up the (commit_hash, repo_path) pair in the classification_overrides table. Managed via tga override add|list|remove.

Tier 1.5 — Issue Type (confidence 0.90): when the commit has ticket references resolving to rows in issue_cache, maps the upstream issue type (bug, story, task, spike, etc.) directly to a change_type.

Tier 1.6 — JIRA Project Mapping (default confidence 0.88, override via jira.jira_project_mapping_confidence): when jira.jira_project_mappings is configured, maps the JIRA project key prefix of any [A-Z]+-\d+ reference to a change_type. Fires before the regex tier so project mappings outrank the generic jira-ticket regex rule (issue #206). Example config:

jira:
  jira_project_mappings:
    TQL: bug_fix
    APEX: integration
    INFRA: platform_infrastructure
    SEC: security
  jira_project_mapping_confidence: 0.88   # optional, default 0.88

Tier-0 manual overrides and exact-keyword conventional-commit prefixes (e.g. fix:) still beat this tier.
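
The project-key lookup described for Tier 1.6 can be sketched without a regex dependency. This is an illustration under stated assumptions, not the crate's code: the scanner hand-implements the `[A-Z]+-\d+` shape for ASCII input, the Verdict struct and function names are hypothetical, and "first mapped reference wins" is an assumption about how multiple tickets are resolved.

```rust
use std::collections::HashMap;

/// Hypothetical verdict shape for this tier.
struct Verdict {
    change_type: String,
    confidence: f64,
}

/// Collect the project-key part of every `[A-Z]+-\d+` reference, in order.
fn project_keys(message: &str) -> Vec<&str> {
    let bytes = message.as_bytes();
    let mut keys = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i].is_ascii_uppercase() {
            let start = i;
            while i < bytes.len() && bytes[i].is_ascii_uppercase() {
                i += 1;
            }
            // A key only counts when followed by '-' and at least one digit.
            if i + 1 < bytes.len() && bytes[i] == b'-' && bytes[i + 1].is_ascii_digit() {
                keys.push(&message[start..i]);
                i += 1;
                while i < bytes.len() && bytes[i].is_ascii_digit() {
                    i += 1;
                }
            }
        } else {
            i += 1;
        }
    }
    keys
}

/// Map the first reference whose project key appears in `mappings`.
fn classify_by_project(
    message: &str,
    mappings: &HashMap<String, String>,
    confidence: f64,
) -> Option<Verdict> {
    project_keys(message)
        .into_iter()
        .find_map(|key| mappings.get(key))
        .map(|change_type| Verdict { change_type: change_type.clone(), confidence })
}
```

Returning `None` for unmapped projects lets the commit fall through to the later tiers.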

Tier 4 — Exact (Aho-Corasick): builds a single finite-state machine from every keyword list across every rule and scans the message in O(n) time. Matches feat:, fix:, chore:, etc. Confidence 0.85–0.95.

Tier 5 — Regex: applies pre-compiled regex patterns from the rule set. Handles anchored conventional-commit patterns (^feat(\([^)]*\))?!?:) and JIRA ticket IDs (\b[A-Z][A-Z0-9]+-\d+\b).

Tier 6 — Fuzzy heuristics: detects merge commits (via is_merge flag or Merge pull request prefix) and reverts (via Revert prefix). No external dependencies.

Tier 7 — LLM fallback (optional, async): calls an OpenAI-compatible API (OpenRouter by default, AWS Bedrock behind the bedrock cargo feature) when tiers 0–6 leave a commit in a fallthrough category. Disabled by default; enable with analysis.llm_classification.enabled: true or --use-llm. Results are only accepted when confidence >= confidence_threshold (default 0.7).
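
The "first tier to produce a result wins" control flow can be sketched as an ordered list of functions. Everything below is illustrative: the Verdict fields, the two stand-in tiers, their confidence values, and the "merge" category name are assumptions, and the real cascade has seven tiers with database and network access that this sketch leaves out.

```rust
/// Illustrative verdict shape; the field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct Verdict {
    category: &'static str,
    confidence: f64,
    method: &'static str,
}

/// A tier inspects the commit message and either classifies it or passes.
type Tier = fn(&str) -> Option<Verdict>;

/// Stand-in for the exact-keyword tier: two prefixes, made-up confidence.
fn exact_tier(msg: &str) -> Option<Verdict> {
    let category = if msg.starts_with("feat:") {
        "feature"
    } else if msg.starts_with("fix:") {
        "bugfix"
    } else {
        return None;
    };
    Some(Verdict { category, confidence: 0.9, method: "exact" })
}

/// Stand-in for the heuristic tier: merge and revert prefixes.
fn heuristic_tier(msg: &str) -> Option<Verdict> {
    let category = if msg.starts_with("Merge pull request") {
        "merge"
    } else if msg.starts_with("Revert") {
        "revert"
    } else {
        return None;
    };
    Some(Verdict { category, confidence: 0.8, method: "heuristic" })
}

/// Try tiers in order; the first verdict wins, otherwise fall through
/// to the documented `uncategorized` / 0.0 result.
fn run_cascade(tiers: &[Tier], msg: &str) -> Verdict {
    tiers
        .iter()
        .find_map(|tier| tier(msg))
        .unwrap_or(Verdict { category: "uncategorized", confidence: 0.0, method: "none" })
}
```

A caller would pass the tiers in cascade order, for example `let tiers: [Tier; 2] = [exact_tier, heuristic_tier];`, so reordering the array is all it takes to change precedence.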

Default Rules

ID	Category	Keywords / Patterns
`cc-feat`	`feature`	`feat:`, `feature:`, `^feat(...)?!?:`
`cc-fix`	`bugfix`	`fix:`, `bugfix:`, `hotfix`, `^fix(...)?!?:`
`cc-chore`	`chore`	`chore:`, `^chore(...)?!?:`
`cc-docs`	`documentation`	`docs:`, `doc:`, `^docs?(...)?!?:`
`cc-refactor`	`refactor`	`refactor:`, `^refactor(...)?!?:`
`cc-test`	`test`	`test:`, `tests:`, `^tests?(...)?!?:`
`cc-ci`	`ci`	`ci:`, `^ci(...)?!?:`
`cc-perf`	`performance`	`perf:`, `^perf(...)?!?:`
`cc-style`	`style`	`style:`, `^style(...)?!?:`
`cc-build`	`build`	`build:`, `^build(...)?!?:`
`cc-revert`	`revert`	`revert:`, `^revert(...)?!?:`
`breaking-change`	`breaking`	`breaking change`, `breaking-change`
`jira-ticket`	`feature` (ticketed)	`\b[A-Z][A-Z0-9]+-\d+\b`
`kw-bug`	`bugfix`	`bug`, `defect`
`kw-security`	`bugfix` (security)	`security`, `cve-`, `vulnerability`

Commits that match no rule are assigned category uncategorized with confidence 0.0.
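
As a worked example of the anchored conventional-commit patterns in the table, the sketch below parses a `type(scope)!:` prefix by hand and maps a few types to the categories listed above. It is a simplified stand-in for the regex rules, not the crate's matcher: it assumes lowercase ASCII type names, covers only a subset of the rule IDs, and ignores keyword matches elsewhere in the message.

```rust
/// Parse the `type(scope)!:` prefix of the first line.
/// Returns the type and whether the breaking-change `!` is present.
/// Limitation: a scope that itself contains ':' is not handled.
fn parse_prefix(message: &str) -> Option<(&str, bool)> {
    let head = message.lines().next()?;
    let colon = head.find(':')?;
    let mut prefix = &head[..colon];
    let breaking = prefix.ends_with('!');
    if breaking {
        prefix = &prefix[..prefix.len() - 1];
    }
    // Strip an optional "(scope)" suffix; the scope may not contain ')'.
    if prefix.ends_with(')') {
        let open = prefix.find('(')?;
        if prefix[open + 1..prefix.len() - 1].contains(')') {
            return None;
        }
        prefix = &prefix[..open];
    }
    if prefix.is_empty() || !prefix.chars().all(|c| c.is_ascii_lowercase()) {
        return None;
    }
    Some((prefix, breaking))
}

/// Map a parsed type to a category from the default rules table.
fn category_for(message: &str) -> &'static str {
    match parse_prefix(message).map(|(commit_type, _)| commit_type) {
        Some("feat") | Some("feature") => "feature",
        Some("fix") | Some("bugfix") => "bugfix",
        Some("docs") | Some("doc") => "documentation",
        Some("chore") => "chore",
        Some("refactor") => "refactor",
        Some("test") | Some("tests") => "test",
        _ => "uncategorized",
    }
}
```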

Custom Rules File

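Supply your own rules in a YAML file. A custom rules file is standalone by default, so it replaces the built-in rules unless it sets extend_defaults: true: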

# my-rules.yaml
version: "1.0"
rules:
  - id: my-deploy
    category: deployment
    keywords:
      - "deploy:"
      - "release:"
    patterns:
      - "(?i)^deploy(ment)?:"
    priority: 80
    confidence: 0.9

tga classify --rules ./my-rules.yaml
# or in config.yaml:
# classification:
#   rules_file: ./my-rules.yaml

Rule Priority and extend_defaults (issue #259)

Custom rules default to priority 110 — one step above the highest built-in rule priority (100). This means user rules win over the default ruleset without needing an explicit priority: entry in every rule.

Custom rule files also default to standalone mode (extend_defaults: false): only the rules in the file are applied. Opt in to merging with the built-in defaults by adding extend_defaults: true.

# my-rules.yaml — standalone by default (no built-in rules loaded)
extend_defaults: false   # optional: explicitly document the default
rules:
  - id: my-deploy
    category: deployment
    keywords: ["deploy:", "release:"]
    # priority defaults to 110 (built-in rules such as cc-feat use 100)

# my-addons.yaml — augments the defaults
extend_defaults: true
rules:
  - id: my-payments
    category: payments
    keywords: ["payment:", "billing:"]
    confidence: 0.92
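
How the priority default and extend_defaults interact can be sketched as a small merge-and-sort step. The Rule struct, the function names, and the case-insensitive substring match are assumptions made for illustration; only the two priority values (110 and 100) and the standalone default come from the text above.

```rust
/// Hypothetical in-memory rule; `priority` is `None` when the file omits it.
#[derive(Debug, Clone)]
struct Rule {
    id: String,
    category: String,
    keywords: Vec<String>,
    priority: Option<u32>,
}

const BUILTIN_PRIORITY: u32 = 100;
const CUSTOM_DEFAULT_PRIORITY: u32 = 110;

/// Resolve priorities and, when `extend_defaults` is true, merge in the built-ins.
fn effective_rules(builtin: &[Rule], custom: &[Rule], extend_defaults: bool) -> Vec<(u32, Rule)> {
    let mut rules: Vec<(u32, Rule)> = custom
        .iter()
        .map(|r| (r.priority.unwrap_or(CUSTOM_DEFAULT_PRIORITY), r.clone()))
        .collect();
    if extend_defaults {
        rules.extend(
            builtin
                .iter()
                .map(|r| (r.priority.unwrap_or(BUILTIN_PRIORITY), r.clone())),
        );
    }
    // Highest priority first; the stable sort keeps file order among ties.
    rules.sort_by(|a, b| b.0.cmp(&a.0));
    rules
}

/// Return the category of the first rule with a keyword found in the message.
fn classify<'a>(rules: &'a [(u32, Rule)], message: &str) -> Option<&'a str> {
    let msg = message.to_lowercase();
    rules
        .iter()
        .find(|(_, r)| r.keywords.iter().any(|k| msg.contains(&k.to_lowercase())))
        .map(|(_, r)| r.category.as_str())
}
```

In standalone mode a message such as "feat: add login" matches nothing in a file that only defines my-deploy, which is the practical consequence of leaving extend_defaults at false.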

Multi-source classification (issue #260)

tga classify can consult external ticket systems — JIRA Cloud/Server and GitHub Issues — as high-confidence classification signals before the commit-message rule tiers run.

Priority model (highest to lowest):

1. Manual overrides (tga override add)
2. External sources: JIRA issue type / GitHub Issues labels (confidence 0.92)
3. Custom regex rules (default priority 110)
4. Built-in TGA rules (priority 100)
5. LLM fallback (when use_llm: true)

Configuration: in config.yaml, add a sources list under the classification key:

classification:
  sources:
    # JIRA source
    - type: jira
      base_url: "https://yourco.atlassian.net"
      token_env: JIRA_API_TOKEN      # env var carrying the API token
      username: "you@yourco.com"     # omit for Bearer-only tokens
      project_keys: ["PROJ", "ENG"]  # empty = all projects
      field_mappings:
        issue_type:
          Bug: bug_fix
          Story: new_feature
          Task: tech_debt_refactoring
          Epic: new_feature
        labels:
          ktlo: tech_debt_refactoring
          security: security
        components:
          Platform: platform_infrastructure

    # GitHub Issues source
    - type: github_issues
      repo: "acme/widgets"
      token_env: GITHUB_TOKEN
      label_mappings:
        bug: bug_fix
        enhancement: new_feature
        dependencies: tech_debt_refactoring
        documentation: documentation

Credentials are read from the named environment variables at runtime — never store tokens directly in config files:

export JIRA_API_TOKEN="your-jira-api-token"
export GITHUB_TOKEN="your-github-pat"
tga classify --config config.yaml

Caching: each unique ticket is fetched at most once per tga classify run (in-memory cache, never persisted to disk). On HTTP failure or missing token, the source is skipped with a warning and the pipeline falls through to commit-message rules.
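
The per-run caching behavior can be sketched as a memoizing wrapper around whatever function performs the lookup. TicketCache and its fetch callback are hypothetical names; the real HTTP clients are not shown, and caching failed lookups for the rest of the run is an assumption consistent with "at most once" rather than something the text spells out.

```rust
use std::collections::HashMap;

/// In-memory, per-run cache: each ticket id is looked up at most once.
struct TicketCache<F>
where
    F: FnMut(&str) -> Result<String, String>,
{
    fetch: F,
    seen: HashMap<String, Option<String>>,
}

impl<F> TicketCache<F>
where
    F: FnMut(&str) -> Result<String, String>,
{
    fn new(fetch: F) -> Self {
        Self { fetch, seen: HashMap::new() }
    }

    /// Returns the mapped work type, or `None` so the caller can fall
    /// through to the commit-message rules.
    fn work_type(&mut self, ticket: &str) -> Option<String> {
        if let Some(cached) = self.seen.get(ticket) {
            return cached.clone();
        }
        let result = match (self.fetch)(ticket) {
            Ok(work_type) => Some(work_type),
            Err(err) => {
                // Assumption: failures are logged once and remembered.
                eprintln!("warning: skipping external source for {ticket}: {err}");
                None
            }
        };
        self.seen.insert(ticket.to_string(), result.clone());
        result
    }
}
```

Because the map lives only inside the struct, dropping the cache at the end of a run matches the "never persisted to disk" behavior.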

Disabling external sources: use --no-external to skip all sources for a run (useful in CI or offline environments):

tga classify --no-external

See examples/multi-source-config.yaml for a fully annotated example.

Deferred sources (planned for a future release): Linear, Shortcut, Confluence, Datadog.

Output Formats

CSV

Nine CSV files are written when csv is in the format list (per-author, weekly activity, DORA metrics, velocity, quality, and related breakdowns). The two most commonly used are documented below.

authors.csv — one row per author:

Column	Description
`name`	Canonical author name
`email`	Canonical author email
`commit_count`	Total commits
`insertions`	Total lines added
`deletions`	Total lines deleted
`files_changed`	Total files changed
`first_commit`	ISO 8601 timestamp of earliest commit
`last_commit`	ISO 8601 timestamp of most recent commit

weekly_activity.csv — one row per week/author/repository bucket:

Column	Description
`week`	ISO week label, e.g. `2024-W03`
`author`	Author name
`repository`	Repository name
`commit_count`	Commits in this bucket
`insertions`	Lines added in this bucket
`deletions`	Lines deleted in this bucket
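
The week column uses labels such as 2024-W03. As a worked example, the sketch below derives such a label from a calendar date using standard ISO 8601 week rules and only the standard library. Whether tga buckets by ISO week in UTC or local time is not stated here, so treat that as an assumption; the function names are hypothetical and inputs are not validated.

```rust
fn is_leap(year: i32) -> bool {
    (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
}

/// 1-based day of the year.
fn ordinal(year: i32, month: u32, day: u32) -> u32 {
    const DAYS_BEFORE: [u32; 12] = [0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334];
    let leap_day = if month > 2 && is_leap(year) { 1 } else { 0 };
    DAYS_BEFORE[(month - 1) as usize] + day + leap_day
}

/// ISO weekday (Monday = 1 .. Sunday = 7), via Sakamoto's method.
fn iso_weekday(year: i32, month: u32, day: u32) -> u32 {
    const OFFSETS: [i32; 12] = [0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4];
    let y = if month < 3 { year - 1 } else { year };
    let sunday_based =
        (y + y / 4 - y / 100 + y / 400 + OFFSETS[(month - 1) as usize] + day as i32).rem_euclid(7);
    if sunday_based == 0 { 7 } else { sunday_based as u32 }
}

/// Week number before year-boundary correction; 0 means the date
/// belongs to the last week of the previous ISO year.
fn raw_week(year: i32, month: u32, day: u32) -> u32 {
    (ordinal(year, month, day) + 10 - iso_weekday(year, month, day)) / 7
}

/// Format a date as an ISO week label such as `2024-W03`.
fn iso_week_label(year: i32, month: u32, day: u32) -> String {
    let week = raw_week(year, month, day);
    // 28 December always falls in the last ISO week of its year.
    let (iso_year, iso_week) = if week == 0 {
        (year - 1, raw_week(year - 1, 12, 28))
    } else if week > raw_week(year, 12, 28) {
        (year + 1, 1)
    } else {
        (year, week)
    };
    format!("{iso_year}-W{iso_week:02}")
}
```

Note that dates near a year boundary can carry a different ISO year than their calendar year, so grouping by the label string is safer than grouping by calendar year plus week number.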

JSON

Four JSON files are written when json is in the format list: report.json, velocity_summary.json, quality_summary.json, and dora_summary.json. The primary one is documented below.

report.json — full structured payload:

{
  "generated_at": "2024-03-15T10:00:00Z",
  "period_start": "2024-01-01T00:00:00Z",
  "period_end":   "2024-03-14T23:59:59Z",
  "total_commits": 347,
  "total_authors": 8,
  "category_breakdown": { "feature": 120, "bugfix": 45, ... },
  "authors": [
    {
      "name": "Alice Smith",
      "email": "alice@company.com",
      "commit_count": 87,
      "insertions": 4200,
      "deletions": 1100,
      "files_changed": 310,
      "categories": { "feature": 50, "bugfix": 20, ... },
      "first_commit": "...",
      "last_commit": "..."
    }
  ],
  "repositories": [
    {
      "name": "my-project",
      "commit_count": 347,
      "author_count": 8,
      "insertions": 18000,
      "deletions": 6000,
      "top_categories": [["feature", 120], ["bugfix", 45]]
    }
  ],
  "weekly_activity": [
    {
      "week": "2024-W03",
      "author": "Alice Smith",
      "repository": "my-project",
      "commit_count": 12,
      "insertions": 500,
      "deletions": 120,
      "categories": { "feature": 8, "bugfix": 4 }
    }
  ]
}

Markdown

report.md — a narrative report containing a summary header, per-author commit table, category breakdown, and weekly activity section. Suitable for pasting into Confluence or a PR description.

Development

Build and Test

# Build everything
cargo build

# Build release binary
cargo build --release

# Run all tests
cargo test

# Lint (zero warnings required)
cargo clippy -- -D warnings

# Format check (CI gate)
cargo fmt --check

# Auto-format
cargo fmt

# Generate rustdoc
cargo doc --open

Running Against Real Repos

configs/example-config.yaml is a working example that analyzes repositories using developer_aliases. Copy it, adjust paths and names to match your setup, then run:

tga analyze --config configs/example-config.yaml --database tga.db

CI Gates

The GitHub Actions workflow (ci.yml) requires:

cargo fmt -- --check
cargo clippy --all-targets -- -D warnings
cargo test
cargo doc --no-deps with RUSTDOCFLAGS="-D warnings"

Crate Structure

Single tga crate (consolidated from the original 5-crate workspace):

Module	Path	Purpose
`tga::core`	`src/core/`	Shared types, config, DB schema, migrations, error types
`tga::collect`	`src/collect/`	Stage 1: git extraction (libgit2), GitHub/JIRA/Linear/ADO clients, `PmAdapter` trait
`tga::classify`	`src/classify/`	Stage 2: seven-tier classification cascade
`tga::report`	`src/report/`	Stage 3: CSV/JSON/Markdown output
`commands` (binary-private)	`src/commands/`	Subcommand handlers wired into `src/main.rs`

License

Non-commercial use only. See LICENSE for terms.

Elastic License 2.0

tga 1.2.0