trusty-git-analytics
Analyze git repositories to measure developer productivity — classify commit work types, track weekly velocity, and export CSV/JSON/Markdown reports.
What It Does
tga walks one or more local git repositories, collects every commit into a SQLite database, classifies each commit into a work category (feature, bugfix, refactor, etc.) using a seven-tier classification cascade, then aggregates the results into per-author, per-week, DORA, velocity, and quality reports. It is a feature-complete Rust port of gitflow-analytics with the same YAML config schema and the same SQLite schema — existing config files work without modification.
Installation
From crates.io (recommended)
This installs the tga binary to ~/.cargo/bin/. Ensure ~/.cargo/bin is in your PATH.
From source
Verify installation
Quick Start
Run Your First Analysis
Step 1 — Create a config.yaml:
repositories:
- path: ~/code/my-project
name: my-project
output:
directory: ./reports
formats:
Step 2 — Run the full pipeline:
Step 3 — Find reports in ./reports/:
A full run writes 14 files: 9 CSV, 4 JSON, and 1 Markdown report. The most commonly used ones are:
reports/
├── authors.csv # Per-author commit summary
├── weekly_activity.csv # Week-by-week breakdown
├── ... (7 more CSV files: DORA, velocity, quality, etc.)
├── report.json # Full structured payload
├── ... (3 more JSON files)
└── report.md # Narrative Markdown report
Configuration
Minimal config.yaml
repositories:
- path: ~/code/my-repo
name: my-repo
All other sections are optional. When output.formats is omitted, all three formats (CSV, JSON, Markdown) are written.
Full config reference
| Key | Type | Default | Description |
|---|---|---|---|
repositories |
list | required | Repos to analyze |
developer_aliases |
map | {} |
Canonical name → list of emails/aliases |
team |
object | — | Alternative to developer_aliases; roster with email |
output.directory |
path | ./reports |
Where reports are written |
output.formats |
list | [csv, json, markdown] |
csv, json, and/or markdown |
output.include_unclassified |
bool | false |
Include commits with no category |
output.include_merges |
bool | false |
Include merge commits |
output.include_files |
bool | false |
Include file-level change detail |
classification.rules_file |
path | — | Path to custom rules YAML/JSON |
classification.use_llm |
bool | false |
Enable LLM fallback tier |
classification.llm_model |
string | gpt-4o-mini |
LLM model identifier |
classification.confidence_threshold |
float | 0.7 |
Minimum acceptance confidence |
classification.llm_fallback_threshold |
float | 0.0 |
Commits with confidence above this value skip the LLM tier |
classification.llm_fallback_concurrency |
uint | 8 |
Max concurrent LLM requests during fallback |
github.token |
string | $GITHUB_TOKEN |
GitHub PAT for PR fetch |
github.org |
string | — | Org slug for org-wide PR queries |
github.repo |
string | — | Single repo slug (owner/name) |
github.fetch_prs |
bool | false |
Fetch pull request metadata |
github.ticket_regex |
string | — | Override regex for detecting GitHub ticket refs in commit messages |
jira.url |
string | — | JIRA base URL |
jira.username |
string | — | JIRA API username (email for Cloud) |
jira.token |
string | — | JIRA API token |
jira.project_key |
string | — | Project key filter (e.g. API) |
jira.ticket_regex |
string | — | Override regex for detecting JIRA ticket refs in commit messages |
linear.ticket_regex |
string | — | Override regex for detecting Linear ticket refs in commit messages |
pm.azure_devops.organization_url |
string | — | ADO org URL (e.g. https://dev.azure.com/myorg) |
pm.azure_devops.pat |
string | — | Azure DevOps Personal Access Token |
pm.azure_devops.project |
string | — | Default ADO project name |
pm.azure_devops.fetch_on_reference |
bool | false |
Fetch work items when AB#N refs appear in commits |
pm.azure_devops.fetch_prs |
bool | false |
Fetch ADO pull requests and reviewer data |
pm.azure_devops.ticket_regex |
string | AB#(\d+) |
Override regex for detecting ADO work item refs in commit messages |
pm.bitbucket.workspace |
string | — | Bitbucket Cloud workspace slug |
pm.bitbucket.repo_slug |
string | — | Repository slug within the workspace |
pm.bitbucket.fetch_prs |
bool | false |
Fetch Bitbucket Cloud pull request metadata |
pm.bitbucket.token |
string | $BITBUCKET_TOKEN |
Bearer token (App password or OAuth) |
pm.bitbucket.username |
string | — | Atlassian account username for Basic auth |
pm.bitbucket.app_password |
string | — | Atlassian App password for Basic auth (alternative to token) |
cache.directory |
path | — | Cache directory (supports ~) |
version |
string | — | Schema version; stored for compatibility |
profile |
string | — | Named profile; stored for compatibility |
Paths support ~ expansion. Config files from the Python gitflow-analytics tool load without changes — unknown keys are silently ignored.
developer_aliases vs team.members
developer_aliases (Python-compatible flat map):
developer_aliases:
"Alice Smith":
- "alice@company.com"
- "asmith@company.com"
- "alice@personal.dev"
"Bob Jones":
- "bob@company.com"
- "129991831+bobgithub@users.noreply.github.com"
team.members (structured roster with canonical email):
team:
members:
- name: Alice Smith
email: alice@company.com
aliases:
- asmith@company.com
- alice@personal.dev
When developer_aliases is non-empty it takes precedence over team.members. Use developer_aliases when migrating an existing Python config file; use team.members for new setups where canonical email matters for downstream tooling.
Example: multi-repo config with GitHub
See configs/example-config.yaml for a working example that covers multiple repositories, developer aliases, and CSV+Markdown output.
CLI Reference
All subcommands accept these global flags:
| Flag | Default | Description |
|---|---|---|
--config <PATH> |
config.yaml |
Path to config YAML |
--database <PATH> |
tga.db |
Path to SQLite database |
-v / -vv / -vvv |
warnings only | Increase log verbosity |
tga analyze
Run the full pipeline: collect → classify → report.
| Flag | Description |
|---|---|
--skip-collect |
Skip Stage 1; use commits already in the database |
--skip-classify |
Skip Stage 2; use existing classifications |
--output <DIR> |
Override output.directory from config |
--weeks <N> |
Limit collection to the last N weeks (overrides config start_date) |
--from <DATE> |
Start date for collection (ISO 8601 YYYY-MM-DD); mutually exclusive with --weeks |
--to <DATE> |
End date for collection (ISO 8601 YYYY-MM-DD); defaults to today |
--dry-run |
Perform all steps against an in-memory database; the on-disk database is left untouched |
--validate-only |
Run configuration validation and exit (0 on success, 1 on errors) |
--no-validate |
Skip pre-flight configuration validation |
# Full pipeline
# Re-run reports only (commits already collected and classified)
tga collect
Stage 1: extract commits from git repositories into the database.
| Flag | Description |
|---|---|
--repos <NAME,...> |
Comma-separated list of repository names to collect; others are skipped |
--from <DATE> |
Collect commits on or after this ISO 8601 date; mutually exclusive with --weeks |
--to <DATE> |
Collect commits on or before this ISO 8601 date; defaults to today |
--since <DATE> |
Legacy alias for --from (Python-predecessor compatibility); --from takes precedence |
--until <DATE> |
Legacy alias for --to (Python-predecessor compatibility); --to takes precedence |
--weeks <N> |
Limit collection to the last N weeks; --weeks takes precedence over --from/--to |
--dry-run |
Run collection against an in-memory database; the on-disk database is left untouched |
--force-refresh-prs |
Re-fetch ADO pull requests even when already cached (backfills pre-v1.0.9 rows) |
--validate-only |
Run configuration validation and exit (0 on success, 1 on errors) |
--no-validate |
Skip pre-flight configuration validation |
tga classify
Stage 2: run the classification cascade over collected commits.
| Flag | Description |
|---|---|
--rules <PATH> |
Override classification.rules_file from config |
--use-llm |
Enable LLM fallback regardless of config setting |
--backfill-complexity |
Fill missing complexity scores (1–5) for already-classified commits via the LLM, without re-running the full cascade; category, confidence, and method are left untouched |
tga report
Stage 3: generate reports from classified commits.
| Flag | Description |
|---|---|
--output <DIR> |
Override output.directory from config |
--formats <FMT,...> |
Comma-separated: csv, json, markdown |
tga pr-metrics
Aggregate pull-request metrics per engineer from the pull_requests cache.
| Flag | Description |
|---|---|
--weeks <N> |
Limit metrics to PRs created within the last N weeks |
--csv |
Emit CSV instead of an aligned text table |
--output <PATH> |
Write output to a file (CSV with --csv, otherwise the text table) |
The pr_comments_given and avg_revisions columns are reserved for future
use and currently always output 0.0; the underlying review-comment and
revision-count data is not yet tracked.
tga backfill
Retroactive maintenance operations that update existing commit rows in place
(outside the normal collect → classify → report pipeline).
| Subcommand | Description |
|---|---|
ai-detection |
Re-run LLM classification on low-confidence prior LLM verdicts |
revert-flags |
Scan commit messages for revert patterns and set is_revert |
ticket-ids |
Scan commit messages for ticket refs and update ticket_id/ticketed |
| Flag | Description |
|---|---|
--dry-run |
Report how many rows would change without writing |
tga override
Manage manual classification overrides (Tier 0). Rows here pin a commit's verdict regardless of what the rule-based or LLM tiers would produce.
| Subcommand | Description |
|---|---|
add <SHA> <WORK_TYPE> <CHANGE_TYPE> [--notes <TEXT>] [--repo <PATH>] |
Insert (or replace) an override row for a commit SHA |
list [--repo <PATH>] |
List every override row, optionally filtered by repository |
remove <SHA> [--yes] |
Delete the override row(s) for a SHA (--yes skips confirmation) |
Pipeline Architecture
git repos ──┐
│ collect SQLite (tga.db) classify SQLite report
GitHub API ──┼──────────────► [commits] ──────────────► [classif]─────────► CSV (×9)
JIRA API ────┤ (libgit2, [authors] (7-tier ► JSON
Linear API ──┤ reqwest) [pull_requests] cascade, ► Markdown
ADO API ────┘ [work_items] Rayon-parallel)
Stage 1 — collect (tga::collect): opens each repository with libgit2, walks the configured branch, extracts commit metadata and diff stats, resolves author identities, fetches GitHub PR / JIRA issue / Linear / Azure DevOps work item metadata via REST/GraphQL, and writes everything to SQLite.
Stage 2 — classify (tga::classify): reads unclassified commits from the database, runs each message through the seven-tier cascade (see below), and writes a classification verdict back. Rule-based tiers execute in parallel via Rayon.
Stage 3 — report (tga::report): reads the classified database, aggregates per-author, per-week, DORA, velocity, and quality statistics, and writes the configured output formats to the output directory.
Classification
Seven-Tier Cascade
Each commit message is tested against tiers in order. The first tier to produce a confident result wins.
Tier 0 — Manual Override (confidence 1.0): looks up the (commit_hash, repo_path) pair in the classification_overrides table. Managed via tga override add|list|remove.
Tier 1.5 — Issue Type (confidence 0.90): when the commit has ticket references resolving to rows in issue_cache, maps the upstream issue type (bug, story, task, spike, etc.) directly to a change_type.
Tier 3 — JIRA Project Mapping (confidence 0.95): when jira_project_mappings is configured, maps the JIRA project key prefix of any [A-Z]+-\d+ reference to a change_type.
Tier 4 — Exact (Aho-Corasick): builds a single finite-state machine from every keyword list across every rule and scans the message in O(n) time. Matches feat:, fix:, chore:, etc. Confidence 0.85–0.95.
Tier 5 — Regex: applies pre-compiled regex patterns from the rule set. Handles anchored conventional-commit patterns (^feat(\([^)]*\))?!?:) and JIRA ticket IDs (\b[A-Z][A-Z0-9]+-\d+\b).
Tier 6 — Fuzzy heuristics: detects merge commits (via is_merge flag or Merge pull request prefix) and reverts (via Revert prefix). No external dependencies.
Tier 7 — LLM fallback (optional, async): calls an OpenAI-compatible API (OpenRouter by default, AWS Bedrock behind the bedrock cargo feature) when tiers 0–6 leave a commit in a fallthrough category. Disabled by default; enable with analysis.llm_classification.enabled: true or --use-llm. Results are only accepted when confidence >= confidence_threshold (default 0.7).
Default Rules
| ID | Category | Keywords / Patterns |
|---|---|---|
cc-feat |
feature |
feat:, feature:, ^feat(...)?!?: |
cc-fix |
bugfix |
fix:, bugfix:, hotfix, ^fix(...)?!?: |
cc-chore |
chore |
chore:, ^chore(...)?!?: |
cc-docs |
documentation |
docs:, doc:, ^docs?(...)?!?: |
cc-refactor |
refactor |
refactor:, ^refactor(...)?!?: |
cc-test |
test |
test:, tests:, ^tests?(...)?!?: |
cc-ci |
ci |
ci:, ^ci(...)?!?: |
cc-perf |
performance |
perf:, ^perf(...)?!?: |
cc-style |
style |
style:, ^style(...)?!?: |
cc-build |
build |
build:, ^build(...)?!?: |
cc-revert |
revert |
revert:, ^revert(...)?!?: |
breaking-change |
breaking |
breaking change, breaking-change |
jira-ticket |
feature (ticketed) |
\b[A-Z][A-Z0-9]+-\d+\b |
kw-bug |
bugfix |
bug, defect |
kw-security |
bugfix (security) |
security, cve-, vulnerability |
Commits that match no rule are assigned category uncategorized with confidence 0.0.
Custom Rules File
Supply your own rules alongside the defaults:
# my-rules.yaml
version: "1.0"
rules:
- id: my-deploy
category: deployment
keywords:
- "deploy:"
- "release:"
patterns:
- "(?i)^deploy(ment)?:"
priority: 80
confidence: 0.9
# or in config.yaml:
# classification:
# rules_file: ./my-rules.yaml
Output Formats
CSV
Nine CSV files are written when csv is in the format list (per-author,
weekly activity, DORA metrics, velocity, quality, and related breakdowns).
The two most commonly used are documented below.
authors.csv — one row per author:
| Column | Description |
|---|---|
name |
Canonical author name |
email |
Canonical author email |
commit_count |
Total commits |
insertions |
Total lines added |
deletions |
Total lines deleted |
files_changed |
Total files changed |
first_commit |
ISO 8601 timestamp of earliest commit |
last_commit |
ISO 8601 timestamp of most recent commit |
weekly_activity.csv — one row per week/author/repository bucket:
| Column | Description |
|---|---|
week |
ISO week label, e.g. 2024-W03 |
author |
Author name |
repository |
Repository name |
commit_count |
Commits in this bucket |
insertions |
Lines added in this bucket |
deletions |
Lines deleted in this bucket |
JSON
Four JSON files are written when json is in the format list:
report.json, velocity_summary.json, quality_summary.json, and
dora_summary.json. The primary one is documented below.
report.json — full structured payload:
Markdown
report.md — a narrative report containing a summary header, per-author commit table, category breakdown, and weekly activity section. Suitable for pasting into Confluence or a PR description.
Development
Build and Test
# Build everything
# Build release binary
# Run all tests
# Lint (zero warnings required)
# Format check (CI gate)
# Auto-format
# Generate rustdoc
Running Against Real Repos
configs/example-config.yaml is a working example that analyzes repositories using developer_aliases. Copy it, adjust paths and names to match your setup, then run:
CI Gates
The GitHub Actions workflow (ci.yml) requires:
cargo fmt -- --checkcargo clippy --all-targets -- -D warningscargo testcargo doc --no-depswithRUSTDOCFLAGS="-D warnings"
Crate Structure
Single tga crate (consolidated from the original 5-crate workspace):
| Module | Path | Purpose |
|---|---|---|
tga::core |
src/core/ |
Shared types, config, DB schema, migrations, error types |
tga::collect |
src/collect/ |
Stage 1: git extraction (libgit2), GitHub/JIRA/Linear/ADO clients, PmAdapter trait |
tga::classify |
src/classify/ |
Stage 2: seven-tier classification cascade |
tga::report |
src/report/ |
Stage 3: CSV/JSON/Markdown output |
commands (binary-private) |
src/commands/ |
Subcommand handlers wired into src/main.rs |
License
Non-commercial use only. See LICENSE for terms.
Elastic License 2.0