trusty-git-analytics
Analyze git repositories to measure developer productivity — classify commit work types, track weekly velocity, and export CSV/JSON/Markdown reports.
What It Does
tga walks one or more local git repositories, collects every commit into a SQLite database, classifies each commit into a work category (feature, bugfix, refactor, etc.) using a seven-tier classification cascade, then aggregates the results into per-author, per-week, DORA, velocity, and quality reports. It is a feature-complete Rust port of gitflow-analytics with the same YAML config schema and the same SQLite schema — existing config files work without modification.
Installation
From crates.io (recommended)
This installs the tga binary to ~/.cargo/bin/. Ensure ~/.cargo/bin is in your PATH.
From source
Verify installation
Quick Start
Run Your First Analysis
Step 1 — Create a config.yaml:
repositories:
- path: ~/code/my-project
name: my-project
output:
directory: ./reports
formats:
Step 2 — Run the full pipeline:
Step 3 — Find reports in ./reports/:
reports/
├── authors.csv # Per-author commit summary
├── weekly_activity.csv # Week-by-week breakdown
├── report.json # Full structured payload
└── report.md # Narrative Markdown report
Configuration
Minimal config.yaml
repositories:
- path: ~/code/my-repo
name: my-repo
All other sections are optional. When output.formats is omitted, all three formats (CSV, JSON, Markdown) are written.
Full config reference
| Key | Type | Default | Description |
|---|---|---|---|
repositories |
list | required | Repos to analyze |
developer_aliases |
map | {} |
Canonical name → list of emails/aliases |
team |
object | — | Alternative to developer_aliases; roster with email |
output.directory |
path | ./reports |
Where reports are written |
output.formats |
list | [csv, json, markdown] |
csv, json, and/or markdown |
output.include_unclassified |
bool | false |
Include commits with no category |
output.include_merges |
bool | false |
Include merge commits |
output.include_files |
bool | false |
Include file-level change detail |
classification.rules_file |
path | — | Path to custom rules YAML/JSON |
classification.use_llm |
bool | false |
Enable LLM fallback tier |
classification.llm_model |
string | gpt-4o-mini |
LLM model identifier |
classification.confidence_threshold |
float | 0.7 |
Minimum acceptance confidence |
github.token |
string | $GITHUB_TOKEN |
GitHub PAT for PR fetch |
github.org |
string | — | Org slug for org-wide PR queries |
github.repo |
string | — | Single repo slug (owner/name) |
github.fetch_prs |
bool | false |
Fetch pull request metadata |
jira.url |
string | — | JIRA base URL |
jira.username |
string | — | JIRA API username (email for Cloud) |
jira.token |
string | — | JIRA API token |
jira.project_key |
string | — | Project key filter (e.g. API) |
pm.azure_devops.organization_url |
string | — | ADO org URL (e.g. https://dev.azure.com/myorg) |
pm.azure_devops.pat |
string | — | Azure DevOps Personal Access Token |
pm.azure_devops.project |
string | — | Default ADO project name |
pm.azure_devops.fetch_on_reference |
bool | false |
Fetch work items when AB#N refs appear in commits |
cache.directory |
path | — | Cache directory (supports ~) |
version |
string | — | Schema version; stored for compatibility |
profile |
string | — | Named profile; stored for compatibility |
Paths support ~ expansion. Config files from the Python gitflow-analytics tool load without changes — unknown keys are silently ignored.
developer_aliases vs team.members
developer_aliases (Python-compatible flat map):
developer_aliases:
"Alice Smith":
- "alice@company.com"
- "asmith@company.com"
- "alice@personal.dev"
"Bob Jones":
- "bob@company.com"
- "129991831+bobgithub@users.noreply.github.com"
team.members (structured roster with canonical email):
team:
members:
- name: Alice Smith
email: alice@company.com
aliases:
- asmith@company.com
- alice@personal.dev
When developer_aliases is non-empty it takes precedence over team.members. Use developer_aliases when migrating an existing Python config file; use team.members for new setups where canonical email matters for downstream tooling.
Example: multi-repo config with GitHub
See configs/example-config.yaml for a working example that covers multiple repositories, developer aliases, and CSV+Markdown output.
CLI Reference
All subcommands accept these global flags:
| Flag | Default | Description |
|---|---|---|
--config <PATH> |
config.yaml |
Path to config YAML |
--database <PATH> |
tga.db |
Path to SQLite database |
-v / -vv / -vvv |
warnings only | Increase log verbosity |
tga analyze
Run the full pipeline: collect → classify → report.
| Flag | Description |
|---|---|
--skip-collect |
Skip Stage 1; use commits already in the database |
--skip-classify |
Skip Stage 2; use existing classifications |
--output <DIR> |
Override output.directory from config |
--weeks <N> |
Limit collection to the last N weeks (overrides config start_date) |
# Full pipeline
# Re-run reports only (commits already collected and classified)
tga collect
Stage 1: extract commits from git repositories into the database.
| Flag | Description |
|---|---|
--repos <NAME,...> |
Comma-separated list of repository names to collect; others are skipped |
--since <DATE> |
Collect commits on or after this ISO 8601 date (overrides config and --weeks) |
--until <DATE> |
Collect commits on or before this ISO 8601 date (overrides config) |
--weeks <N> |
Limit collection to the last N weeks; --since takes precedence if both supplied |
tga classify
Stage 2: run the classification cascade over collected commits.
| Flag | Description |
|---|---|
--rules <PATH> |
Override classification.rules_file from config |
--use-llm |
Enable LLM fallback regardless of config setting |
tga report
Stage 3: generate reports from classified commits.
| Flag | Description |
|---|---|
--output <DIR> |
Override output.directory from config |
--formats <FMT,...> |
Comma-separated: csv, json, markdown |
Pipeline Architecture
git repos ──┐
│ collect SQLite (tga.db) classify SQLite report
GitHub API ──┼──────────────► [commits] ──────────────► [classif]─────────► CSV (×9)
JIRA API ────┤ (libgit2, [authors] (7-tier ► JSON
Linear API ──┤ reqwest) [pull_requests] cascade, ► Markdown
ADO API ────┘ [work_items] Rayon-parallel)
Stage 1 — collect (tga::collect): opens each repository with libgit2, walks the configured branch, extracts commit metadata and diff stats, resolves author identities, fetches GitHub PR / JIRA issue / Linear / Azure DevOps work item metadata via REST/GraphQL, and writes everything to SQLite.
Stage 2 — classify (tga::classify): reads unclassified commits from the database, runs each message through the seven-tier cascade (see below), and writes a classification verdict back. Rule-based tiers execute in parallel via Rayon.
Stage 3 — report (tga::report): reads the classified database, aggregates per-author, per-week, DORA, velocity, and quality statistics, and writes the configured output formats to the output directory.
Classification
Seven-Tier Cascade
Each commit message is tested against tiers in order. The first tier to produce a confident result wins.
Tier 0 — Manual Override (confidence 1.0): looks up the (commit_hash, repo_path) pair in the classification_overrides table. Managed via tga override add|list|remove.
Tier 1.5 — Issue Type (confidence 0.90): when the commit has ticket references resolving to rows in issue_cache, maps the upstream issue type (bug, story, task, spike, etc.) directly to a change_type.
Tier 3 — JIRA Project Mapping (confidence 0.95): when jira_project_mappings is configured, maps the JIRA project key prefix of any [A-Z]+-\d+ reference to a change_type.
Tier 4 — Exact (Aho-Corasick): builds a single finite-state machine from every keyword list across every rule and scans the message in O(n) time. Matches feat:, fix:, chore:, etc. Confidence 0.85–0.95.
Tier 5 — Regex: applies pre-compiled regex patterns from the rule set. Handles anchored conventional-commit patterns (^feat(\([^)]*\))?!?:) and JIRA ticket IDs (\b[A-Z][A-Z0-9]+-\d+\b).
Tier 6 — Fuzzy heuristics: detects merge commits (via is_merge flag or Merge pull request prefix) and reverts (via Revert prefix). No external dependencies.
Tier 7 — LLM fallback (optional, async): calls an OpenAI-compatible API (OpenRouter by default, AWS Bedrock behind the bedrock cargo feature) when tiers 0–6 leave a commit in a fallthrough category. Disabled by default; enable with analysis.llm_classification.enabled: true or --use-llm. Results are only accepted when confidence >= confidence_threshold (default 0.7).
Default Rules
| ID | Category | Keywords / Patterns |
|---|---|---|
cc-feat |
feature |
feat:, feature:, ^feat(...)?!?: |
cc-fix |
bugfix |
fix:, bugfix:, hotfix, ^fix(...)?!?: |
cc-chore |
chore |
chore:, ^chore(...)?!?: |
cc-docs |
documentation |
docs:, doc:, ^docs?(...)?!?: |
cc-refactor |
refactor |
refactor:, ^refactor(...)?!?: |
cc-test |
test |
test:, tests:, ^tests?(...)?!?: |
cc-ci |
ci |
ci:, ^ci(...)?!?: |
cc-perf |
performance |
perf:, ^perf(...)?!?: |
cc-style |
style |
style:, ^style(...)?!?: |
cc-build |
build |
build:, ^build(...)?!?: |
cc-revert |
revert |
revert:, ^revert(...)?!?: |
breaking-change |
breaking |
breaking change, breaking-change |
jira-ticket |
feature (ticketed) |
\b[A-Z][A-Z0-9]+-\d+\b |
kw-bug |
bugfix |
bug, defect |
kw-security |
bugfix (security) |
security, cve-, vulnerability |
Commits that match no rule are assigned category uncategorized with confidence 0.0.
Custom Rules File
Supply your own rules alongside the defaults:
# my-rules.yaml
version: "1.0"
rules:
- id: my-deploy
category: deployment
keywords:
- "deploy:"
- "release:"
patterns:
- "(?i)^deploy(ment)?:"
priority: 80
confidence: 0.9
# or in config.yaml:
# classification:
# rules_file: ./my-rules.yaml
Output Formats
CSV
Two files are written when csv is in the format list:
authors.csv — one row per author:
| Column | Description |
|---|---|
name |
Canonical author name |
email |
Canonical author email |
commit_count |
Total commits |
insertions |
Total lines added |
deletions |
Total lines deleted |
files_changed |
Total files changed |
first_commit |
ISO 8601 timestamp of earliest commit |
last_commit |
ISO 8601 timestamp of most recent commit |
weekly_activity.csv — one row per week/author/repository bucket:
| Column | Description |
|---|---|
week |
ISO week label, e.g. 2024-W03 |
author |
Author name |
repository |
Repository name |
commit_count |
Commits in this bucket |
insertions |
Lines added in this bucket |
deletions |
Lines deleted in this bucket |
JSON
report.json — full structured payload:
Markdown
report.md — a narrative report containing a summary header, per-author commit table, category breakdown, and weekly activity section. Suitable for pasting into Confluence or a PR description.
Development
Build and Test
# Build everything
# Build release binary
# Run all tests
# Lint (zero warnings required)
# Format check (CI gate)
# Auto-format
# Generate rustdoc
Running Against Real Repos
configs/example-config.yaml is a working example that analyzes repositories using developer_aliases. Copy it, adjust paths and names to match your setup, then run:
CI Gates
The GitHub Actions workflow (ci.yml) requires:
cargo fmt -- --checkcargo clippy --all-targets -- -D warningscargo testcargo doc --no-depswithRUSTDOCFLAGS="-D warnings"
Crate Structure
Single tga crate (consolidated from the original 5-crate workspace):
| Module | Path | Purpose |
|---|---|---|
tga::core |
src/core/ |
Shared types, config, DB schema, migrations, error types |
tga::collect |
src/collect/ |
Stage 1: git extraction (libgit2), GitHub/JIRA/Linear/ADO clients, PmAdapter trait |
tga::classify |
src/classify/ |
Stage 2: seven-tier classification cascade |
tga::report |
src/report/ |
Stage 3: CSV/JSON/Markdown output |
commands (binary-private) |
src/commands/ |
Subcommand handlers wired into src/main.rs |
License
Non-commercial use only. See LICENSE for terms.
Elastic License 2.0