trusty-git-analytics
Analyze git repositories to measure developer productivity — classify commit work types, track weekly velocity, and export CSV/JSON/Markdown reports.
What It Does
tga walks one or more local git repositories, collects every commit into a SQLite database, classifies each commit into a work category (feature, bugfix, refactor, etc.) using a four-tier rule cascade, then aggregates the results into per-author and per-week reports. It is a Rust port of gitflow-analytics with the same YAML config schema and the same SQLite schema — existing config files work without modification.
Quick Start
Installation
    # From crates.io (once published)
    cargo install tga-cli
    # From source
    cargo build --release
    # Binary: ./target/release/tga
Run Your First Analysis
Step 1 — Create a config.yaml:
    repositories:
      - path: ~/code/my-project
        name: my-project
    output:
      directory: ./reports
      formats: [csv, json, markdown]
Step 2 — Run the full pipeline:
    tga analyze --config config.yaml
Step 3 — Find reports in ./reports/:
reports/
├── authors.csv # Per-author commit summary
├── weekly_activity.csv # Week-by-week breakdown
├── report.json # Full structured payload
└── report.md # Narrative Markdown report
Configuration
Minimal config.yaml
    repositories:
      - path: ~/code/my-repo
        name: my-repo
All other sections are optional. When output.formats is omitted, all three formats (CSV, JSON, Markdown) are written.
Full config reference
| Key | Type | Default | Description |
|---|---|---|---|
| `repositories` | list | required | Repos to analyze |
| `developer_aliases` | map | `{}` | Canonical name → list of emails/aliases |
| `team` | object | none | Alternative to `developer_aliases`; roster with email |
| `output.directory` | path | `./reports` | Where reports are written |
| `output.formats` | list | `[csv, json, markdown]` | `csv`, `json`, and/or `markdown` |
| `output.include_unclassified` | bool | `false` | Include commits with no category |
| `output.include_merges` | bool | `false` | Include merge commits |
| `output.include_files` | bool | `false` | Include file-level change detail |
| `classification.rules_file` | path | none | Path to custom rules YAML/JSON |
| `classification.use_llm` | bool | `false` | Enable LLM fallback tier |
| `classification.llm_model` | string | `gpt-4o-mini` | LLM model identifier |
| `classification.confidence_threshold` | float | `0.7` | Minimum acceptance confidence |
| `github.token` | string | `$GITHUB_TOKEN` | GitHub PAT for PR fetch |
| `github.org` | string | none | Org slug for org-wide PR queries |
| `github.repo` | string | none | Single repo slug (`owner/name`) |
| `github.fetch_prs` | bool | `false` | Fetch pull request metadata |
| `jira.url` | string | none | JIRA base URL |
| `jira.username` | string | none | JIRA API username (email for Cloud) |
| `jira.token` | string | none | JIRA API token |
| `jira.project_key` | string | none | Project key filter (e.g. `API`) |
| `cache.directory` | path | none | Cache directory (supports `~`) |
| `version` | string | none | Schema version; stored for compatibility |
| `profile` | string | none | Named profile; stored for compatibility |
Paths support ~ expansion. Config files from the Python gitflow-analytics tool load without changes — unknown keys are silently ignored.
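
The `~` expansion mentioned above can be sketched as follows. This is a minimal illustration and not tga's actual code: `expand_tilde` is a hypothetical helper, and the home directory is passed in explicitly (rather than read from the environment) so the behaviour is easy to test.

```rust
use std::path::PathBuf;

/// Expand a leading `~` or `~/` using the supplied home directory.
/// Hypothetical helper: the real tga implementation is not shown in this README.
fn expand_tilde(path: &str, home: &str) -> PathBuf {
    if path == "~" {
        PathBuf::from(home)
    } else if let Some(rest) = path.strip_prefix("~/") {
        PathBuf::from(home).join(rest)
    } else {
        // `~user/...` forms and paths without a tilde are returned unchanged.
        PathBuf::from(path)
    }
}
```

In a real program the `home` argument would typically come from the `HOME` environment variable or a platform-specific lookup.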
developer_aliases vs team.members
developer_aliases (Python-compatible flat map):
    developer_aliases:
      "Alice Smith":
        - "alice@company.com"
        - "asmith@company.com"
        - "alice@personal.dev"
      "Bob Jones":
        - "bob@company.com"
        - "129991831+bobgithub@users.noreply.github.com"
team.members (structured roster with canonical email):
    team:
      members:
        - name: Alice Smith
          email: alice@company.com
          aliases:
            - asmith@company.com
            - alice@personal.dev
When developer_aliases is non-empty it takes precedence over team.members. Use developer_aliases when migrating an existing Python config file; use team.members for new setups where canonical email matters for downstream tooling.
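
The precedence rule above can be illustrated with a small identity lookup. This is a sketch under assumptions: the `Member` struct, `build_identity_map`, and `resolve_author` are hypothetical names, and the case-insensitive email matching is an illustrative choice that the README does not confirm.

```rust
use std::collections::HashMap;

/// One entry of a `team.members` roster (hypothetical in-memory shape).
struct Member {
    name: String,
    email: String,
    aliases: Vec<String>,
}

/// Build an identity -> canonical-name lookup.
/// A non-empty `developer_aliases` map wins outright; otherwise the roster is used.
/// Assumption: identities are compared case-insensitively.
fn build_identity_map(
    developer_aliases: &HashMap<String, Vec<String>>,
    team_members: &[Member],
) -> HashMap<String, String> {
    let mut lookup = HashMap::new();
    if !developer_aliases.is_empty() {
        for (canonical, aliases) in developer_aliases {
            for alias in aliases {
                lookup.insert(alias.to_lowercase(), canonical.clone());
            }
        }
    } else {
        for member in team_members {
            lookup.insert(member.email.to_lowercase(), member.name.clone());
            for alias in &member.aliases {
                lookup.insert(alias.to_lowercase(), member.name.clone());
            }
        }
    }
    lookup
}

/// Resolve a commit author email, falling back to the raw email when unknown.
fn resolve_author(lookup: &HashMap<String, String>, email: &str) -> String {
    lookup
        .get(&email.to_lowercase())
        .cloned()
        .unwrap_or_else(|| email.to_string())
}
```

With both sections present, an email that appears only in `team.members` stays unresolved in this sketch, which mirrors the "takes precedence" wording rather than a merge of the two sources.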
Example: multi-repo config with GitHub
See configs/duetto-contractors.yaml for a working real-world example that covers multiple repositories, developer aliases, and CSV+Markdown output.
CLI Reference
All subcommands accept these global flags:
| Flag | Default | Description |
|---|---|---|
| `--config <PATH>` | `config.yaml` | Path to config YAML |
| `--database <PATH>` | `tga.db` | Path to SQLite database |
| `-v` / `-vv` / `-vvv` | warnings only | Increase log verbosity |
tga analyze
Run the full pipeline: collect → classify → report.
| Flag | Description |
|---|---|
| `--skip-collect` | Skip Stage 1; use commits already in the database |
| `--skip-classify` | Skip Stage 2; use existing classifications |
| `--output <DIR>` | Override `output.directory` from config |
    # Full pipeline
    tga analyze --config config.yaml
    # Re-run reports only (commits already collected and classified)
    tga analyze --skip-collect --skip-classify
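
The effect of the two skip flags can be modelled as a simple stage plan. The `Stage` enum and `planned_stages` function below are hypothetical names used only to illustrate the flag semantics described in the table; they are not taken from the tga source.

```rust
/// Pipeline stages in execution order.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Stage {
    Collect,
    Classify,
    Report,
}

/// Decide which stages an `analyze` run would execute for a flag combination.
/// The report stage always runs; the first two can be skipped individually.
fn planned_stages(skip_collect: bool, skip_classify: bool) -> Vec<Stage> {
    let mut stages = Vec::new();
    if !skip_collect {
        stages.push(Stage::Collect);
    }
    if !skip_classify {
        stages.push(Stage::Classify);
    }
    stages.push(Stage::Report);
    stages
}
```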
tga collect
Stage 1: extract commits from git repositories into the database.
| Flag | Description |
|---|---|
| `--repos <NAME,...>` | Comma-separated list of repository names to collect; others are skipped |
| `--since <DATE>` | Collect commits on or after this ISO 8601 date (overrides config) |
| `--until <DATE>` | Collect commits on or before this ISO 8601 date (overrides config) |
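
Both date bounds are inclusive ("on or after", "on or before"). A minimal sketch of that window check, assuming all dates are zero-padded `YYYY-MM-DD` strings in the same time zone (the hypothetical `in_window` helper is not part of tga):

```rust
/// Return true when `commit_date` falls inside the inclusive window.
/// Relies on zero-padded ISO 8601 calendar dates sorting lexicographically,
/// so plain string comparison is enough for this illustration.
fn in_window(commit_date: &str, since: Option<&str>, until: Option<&str>) -> bool {
    let after_start = since.map_or(true, |s| commit_date >= s);
    let before_end = until.map_or(true, |u| commit_date <= u);
    after_start && before_end
}
```

A production implementation would more likely parse the values into real date types, since full timestamps with offsets do not compare correctly as strings.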
tga classify
Stage 2: run the classification cascade over collected commits.
| Flag | Description |
|---|---|
| `--rules <PATH>` | Override `classification.rules_file` from config |
| `--use-llm` | Enable LLM fallback regardless of config setting |
tga report
Stage 3: generate reports from classified commits.
| Flag | Description |
|---|---|
| `--output <DIR>` | Override `output.directory` from config |
| `--formats <FMT,...>` | Comma-separated: `csv`, `json`, `markdown` |
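
Parsing a comma-separated format list might look like the sketch below. The `Format` enum and `parse_formats` function are hypothetical; falling back to all three formats when the value is absent mirrors the documented config default and is an assumption about the CLI, as are the whitespace trimming and case-insensitive matching.

```rust
/// Report formats named in this README.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Format {
    Csv,
    Json,
    Markdown,
}

/// Parse a value such as "csv,markdown" into a de-duplicated, ordered list.
/// `None` yields all three formats (assumed to match the config default).
fn parse_formats(arg: Option<&str>) -> Result<Vec<Format>, String> {
    let raw = match arg {
        Some(raw) => raw,
        None => return Ok(vec![Format::Csv, Format::Json, Format::Markdown]),
    };
    let mut formats = Vec::new();
    for token in raw.split(',') {
        let format = match token.trim().to_ascii_lowercase().as_str() {
            "csv" => Format::Csv,
            "json" => Format::Json,
            "markdown" => Format::Markdown,
            other => return Err(format!("unknown format: {other:?}")),
        };
        // Keep the first occurrence only so repeated names are harmless.
        if !formats.contains(&format) {
            formats.push(format);
        }
    }
    Ok(formats)
}
```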
Pipeline Architecture
    git repos  ──┐
                 │  tga-collect    SQLite (tga.db)    tga-classify     SQLite      tga-report
    GitHub API ──┼───────────────► [commits]        ───────────────► [classif] ───────────────► CSV
    JIRA API   ──┘  (libgit2,      [authors]          (rules +                                 ► JSON
                     reqwest)      [pull_requests]     LLM fallback)                           ► Markdown
Stage 1 — collect (tga-collect): opens each repository with libgit2, walks the configured branch, extracts commit metadata and diff stats, resolves author identities, optionally fetches GitHub PR metadata via the REST API, and writes everything to SQLite.
Stage 2 — classify (tga-classify): reads unclassified commits from the database, runs each message through the four-tier cascade (see below), and writes a classification verdict back. Tiers 1–3 execute in parallel via Rayon.
Stage 3 — report (tga-report): reads the classified database, aggregates per-author and per-week statistics, and writes the configured output formats to the output directory.
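
Stage 2 is described as running in parallel via Rayon. The sketch below shows the same idea with only the standard library: `std::thread::scope` stands in for Rayon, and `classify_one` is a toy rule check rather than the real cascade. It assumes the parallelism is across commits, which the README does not spell out.

```rust
use std::thread;

/// Toy stand-in for the rule tiers: a plain prefix check.
fn classify_one(message: &str) -> &'static str {
    if message.starts_with("feat:") {
        "feature"
    } else if message.starts_with("fix:") {
        "bugfix"
    } else {
        "uncategorized"
    }
}

/// Classify messages on up to `workers` scoped threads, preserving input order.
fn classify_all(messages: &[String], workers: usize) -> Vec<&'static str> {
    let workers = workers.max(1);
    // Ceiling division; at least 1 so `chunks` never receives zero.
    let chunk_size = ((messages.len() + workers - 1) / workers).max(1);
    thread::scope(|scope| {
        // Spawn one thread per chunk, collecting the handles first so that
        // all threads are running before any of them is joined.
        let handles: Vec<_> = messages
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || chunk.iter().map(|m| classify_one(m)).collect::<Vec<_>>())
            })
            .collect();
        // Joining in spawn order keeps results aligned with the input.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Because each message is classified independently, the work splits cleanly into chunks with no shared mutable state.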
Classification
Four-Tier Cascade
Each commit message is tested against tiers in order. The first match wins.
Tier 1 — Exact (Aho-Corasick): builds a single finite-state machine from all keyword lists across all rules, then scans the message in O(n) time. Matches feat:, fix:, chore:, etc. Confidence: 0.85–0.95.
Tier 2 — Regex: applies pre-compiled regex patterns from rules. Handles anchored conventional-commit patterns (^feat(\([^)]*\))?!?:) and JIRA ticket IDs (\b[A-Z][A-Z0-9]+-\d+\b).
Tier 3 — Fuzzy heuristics: detects merge commits (via is_merge flag or "Merge pull request" prefix) and reverts (via "Revert" prefix). No external dependencies.
Tier 4 — LLM fallback (optional, async): calls an OpenAI-compatible API when tiers 1–3 all fail. Reads OPENAI_API_KEY from the environment. Disabled by default; enable with classification.use_llm: true or --use-llm.
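
The first-match-wins behaviour of the cascade can be sketched with two simplified tiers. Everything here is illustrative: substring checks stand in for the Aho-Corasick automaton, the regex and LLM tiers are omitted, and the `merge` category name and all confidence values are assumptions rather than tga's real outputs.

```rust
/// A classification verdict: category label plus confidence.
#[derive(Debug, PartialEq)]
struct Verdict {
    category: &'static str,
    confidence: f32,
}

/// Every tier takes the message and the merge flag and may decline to answer.
type Tier = fn(&str, bool) -> Option<Verdict>;

/// Tier 1 stand-in: substring scan instead of an Aho-Corasick automaton.
fn keyword_tier(message: &str, _is_merge: bool) -> Option<Verdict> {
    const KEYWORDS: [(&str, &str); 3] =
        [("feat:", "feature"), ("fix:", "bugfix"), ("chore:", "chore")];
    let lower = message.to_lowercase();
    for (keyword, category) in KEYWORDS {
        if lower.contains(keyword) {
            return Some(Verdict { category, confidence: 0.9 });
        }
    }
    None
}

/// Tier 3 stand-in: merge and revert heuristics.
fn heuristic_tier(message: &str, is_merge: bool) -> Option<Verdict> {
    if is_merge || message.starts_with("Merge pull request") {
        Some(Verdict { category: "merge", confidence: 0.8 })
    } else if message.starts_with("Revert") {
        Some(Verdict { category: "revert", confidence: 0.8 })
    } else {
        None
    }
}

/// Run the tiers in order; the first one that answers wins.
fn classify(message: &str, is_merge: bool) -> Verdict {
    const TIERS: [Tier; 2] = [keyword_tier, heuristic_tier];
    TIERS
        .iter()
        .find_map(|tier| tier(message, is_merge))
        .unwrap_or(Verdict { category: "uncategorized", confidence: 0.0 })
}
```

Ordering matters in this design: a message that an earlier tier recognises never reaches the later ones, which is why the cheap tiers sit in front of the optional LLM call.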
Default Rules
| ID | Category | Keywords / Patterns |
|---|---|---|
| `cc-feat` | `feature` | `feat:`, `feature:`, `^feat(...)?!?:` |
| `cc-fix` | `bugfix` | `fix:`, `bugfix:`, `hotfix`, `^fix(...)?!?:` |
| `cc-chore` | `chore` | `chore:`, `^chore(...)?!?:` |
| `cc-docs` | `documentation` | `docs:`, `doc:`, `^docs?(...)?!?:` |
| `cc-refactor` | `refactor` | `refactor:`, `^refactor(...)?!?:` |
| `cc-test` | `test` | `test:`, `tests:`, `^tests?(...)?!?:` |
| `cc-ci` | `ci` | `ci:`, `^ci(...)?!?:` |
| `cc-perf` | `performance` | `perf:`, `^perf(...)?!?:` |
| `cc-style` | `style` | `style:`, `^style(...)?!?:` |
| `cc-build` | `build` | `build:`, `^build(...)?!?:` |
| `cc-revert` | `revert` | `revert:`, `^revert(...)?!?:` |
| `breaking-change` | `breaking` | `breaking change`, `breaking-change` |
| `jira-ticket` | `feature` (ticketed) | `\b[A-Z][A-Z0-9]+-\d+\b` |
| `kw-bug` | `bugfix` | `bug`, `defect` |
| `kw-security` | `bugfix` (security) | `security`, `cve-`, `vulnerability` |
Commits that match no rule are assigned category uncategorized with confidence 0.0.
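
The anchored patterns in the table all share the shape `type(scope)!:`. As an illustration, the sketch below parses that prefix by hand instead of with a regex and then maps a few types to categories, falling back to `uncategorized`. `Prefix`, `parse_prefix`, and `category_for` are hypothetical names, and only three of the default rules are mapped.

```rust
/// Parsed conventional-commit header prefix.
#[derive(Debug, PartialEq)]
struct Prefix<'a> {
    kind: &'a str,
    scope: Option<&'a str>,
    breaking: bool,
}

/// Hand-written equivalent of patterns shaped like `^type(\([^)]*\))?!?:`.
fn parse_prefix(message: &str) -> Option<Prefix<'_>> {
    // The type is the leading run of ASCII letters.
    let kind_len = message.find(|c: char| !c.is_ascii_alphabetic())?;
    if kind_len == 0 {
        return None;
    }
    let (kind, mut rest) = message.split_at(kind_len);

    // Optional `(scope)`; the scope may be empty but cannot contain `)`.
    let mut scope = None;
    if let Some(after_paren) = rest.strip_prefix('(') {
        let close = after_paren.find(')')?;
        scope = Some(&after_paren[..close]);
        rest = &after_paren[close + 1..];
    }

    // Optional `!` marks a breaking change; the colon is mandatory.
    let breaking = rest.starts_with('!');
    if breaking {
        rest = &rest[1..];
    }
    rest.starts_with(':').then(|| Prefix { kind, scope, breaking })
}

/// Map a few prefix types to categories; anything else is uncategorized.
fn category_for(message: &str) -> &'static str {
    match parse_prefix(message).map(|p| p.kind) {
        Some("feat") => "feature",
        Some("fix") => "bugfix",
        Some("docs") | Some("doc") => "documentation",
        _ => "uncategorized",
    }
}
```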
Custom Rules File
Supply your own rules alongside the defaults:
    # my-rules.yaml
    version: "1.0"
    rules:
      - id: my-deploy
        category: deployment
        keywords:
          - "deploy:"
          - "release:"
        patterns:
          - "(?i)^deploy(ment)?:"
        priority: 80
        confidence: 0.9

    tga classify --rules ./my-rules.yaml
    # or in config.yaml:
    # classification:
    #   rules_file: ./my-rules.yaml
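
How custom rules combine with the defaults is not specified here, so the following is only one plausible model. It assumes that a custom rule replaces a default rule with the same `id` and that rules with a higher `priority` are tested first; `Rule`, `rule`, `merge_rules`, and `first_match` are hypothetical names, and only the keyword part of a rule is modelled.

```rust
/// In-memory form of one rule entry (keywords only; patterns omitted).
#[derive(Debug, Clone, PartialEq)]
struct Rule {
    id: String,
    category: String,
    keywords: Vec<String>,
    priority: u32,
    confidence: f32,
}

/// Convenience constructor for examples.
fn rule(id: &str, category: &str, keywords: &[&str], priority: u32, confidence: f32) -> Rule {
    Rule {
        id: id.to_string(),
        category: category.to_string(),
        keywords: keywords.iter().map(|k| k.to_string()).collect(),
        priority,
        confidence,
    }
}

/// Combine defaults with custom rules.
/// Assumptions: same-id custom rules replace defaults; higher priority goes first.
fn merge_rules(defaults: Vec<Rule>, custom: Vec<Rule>) -> Vec<Rule> {
    let mut merged: Vec<Rule> = defaults
        .into_iter()
        .filter(|d| !custom.iter().any(|c| c.id == d.id))
        .collect();
    merged.extend(custom);
    // Stable sort: rules with equal priority keep their relative order.
    merged.sort_by(|a, b| b.priority.cmp(&a.priority));
    merged
}

/// Return the first rule whose keyword occurs in the lower-cased message.
fn first_match<'a>(rules: &'a [Rule], message: &str) -> Option<&'a Rule> {
    let lower = message.to_lowercase();
    rules
        .iter()
        .find(|r| r.keywords.iter().any(|k| lower.contains(k.as_str())))
}
```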
Output Formats
CSV
Two files are written when csv is in the format list:
authors.csv — one row per author:
| Column | Description |
|---|---|
| `name` | Canonical author name |
| `email` | Canonical author email |
| `commit_count` | Total commits |
| `insertions` | Total lines added |
| `deletions` | Total lines deleted |
| `files_changed` | Total files changed |
| `first_commit` | ISO 8601 timestamp of earliest commit |
| `last_commit` | ISO 8601 timestamp of most recent commit |
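
The per-author columns above amount to a group-by over commits. The sketch below shows one way to compute them; `Commit`, `AuthorRow`, and `summarize` are hypothetical names. It assumes timestamps share one UTC ISO 8601 format (so string comparison orders them) and that `files_changed` is a sum of per-commit counts rather than a count of distinct files. The `email` column is left out for brevity.

```rust
use std::collections::BTreeMap;

/// Minimal commit record for the aggregation (hypothetical shape).
struct Commit {
    author: String,
    timestamp: String,
    insertions: u64,
    deletions: u64,
    files_changed: u64,
}

/// One row of the per-author summary, keyed by canonical author name.
#[derive(Debug, Default, PartialEq)]
struct AuthorRow {
    commit_count: u64,
    insertions: u64,
    deletions: u64,
    files_changed: u64,
    first_commit: String,
    last_commit: String,
}

/// Fold commits into per-author totals; BTreeMap keeps authors sorted by name.
fn summarize(commits: &[Commit]) -> BTreeMap<String, AuthorRow> {
    let mut rows: BTreeMap<String, AuthorRow> = BTreeMap::new();
    for commit in commits {
        let row = rows.entry(commit.author.clone()).or_default();
        row.commit_count += 1;
        row.insertions += commit.insertions;
        row.deletions += commit.deletions;
        row.files_changed += commit.files_changed;
        // An empty string marks "no commit seen yet" for this author.
        if row.first_commit.is_empty() || commit.timestamp < row.first_commit {
            row.first_commit = commit.timestamp.clone();
        }
        if commit.timestamp > row.last_commit {
            row.last_commit = commit.timestamp.clone();
        }
    }
    rows
}
```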
weekly_activity.csv — one row per week/author/repository bucket:
| Column | Description |
|---|---|
| `week` | ISO week label, e.g. `2024-W03` |
| `author` | Author name |
| `repository` | Repository name |
| `commit_count` | Commits in this bucket |
| `insertions` | Lines added in this bucket |
| `deletions` | Lines deleted in this bucket |
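
The `week` label follows ISO 8601 week numbering, where a week belongs to the year that contains its Thursday. The standard-library-only sketch below derives the label from a calendar date; `days_from_civil` and `iso_week_label` are illustrative helpers, and tga itself may well use a date library instead.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date
/// (Howard Hinnant's days-from-civil algorithm).
fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    let y = if month <= 2 { year - 1 } else { year };
    let era = (if y >= 0 { y } else { y - 399 }) / 400;
    let year_of_era = y - era * 400;
    let shifted_month = if month > 2 { month - 3 } else { month + 9 };
    let day_of_year = (153 * shifted_month + 2) / 5 + day - 1;
    let day_of_era = year_of_era * 365 + year_of_era / 4 - year_of_era / 100 + day_of_year;
    era * 146_097 + day_of_era - 719_468
}

/// Format the ISO 8601 week label (for example "2024-W03") for a date.
fn iso_week_label(year: i64, month: i64, day: i64) -> String {
    let days = days_from_civil(year, month, day);
    // 1970-01-01 was a Thursday; ISO weekdays run Monday = 1 to Sunday = 7.
    let weekday = (days + 3).rem_euclid(7) + 1;
    // The Thursday of the same ISO week decides which year the week belongs to.
    let thursday = days - (weekday - 1) + 3;
    let iso_year = if thursday < days_from_civil(year, 1, 1) {
        year - 1
    } else if thursday >= days_from_civil(year + 1, 1, 1) {
        year + 1
    } else {
        year
    };
    let week = (thursday - days_from_civil(iso_year, 1, 1)) / 7 + 1;
    format!("{iso_year}-W{week:02}")
}
```

Dates near a year boundary can land in a week of the neighbouring year: 3 January 2021 falls in `2020-W53`, and 31 December 2018 falls in `2019-W01`.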
JSON
report.json contains the full structured payload.
Markdown
report.md — a narrative report containing a summary header, per-author commit table, category breakdown, and weekly activity section. Suitable for pasting into Confluence or a PR description.
Development
Build and Test
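    # Build everything
    cargo build --workspace
    # Build release binary
    cargo build --release
    # Run all tests
    cargo test --workspace
    # Lint (zero warnings required)
    cargo clippy --workspace --all-targets -- -D warnings
    # Format check (CI gate)
    cargo fmt --all -- --check
    # Auto-format
    cargo fmt --all
    # Generate rustdoc
    cargo doc --workspace --no-deps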
Running Against Real Repos
configs/duetto-contractors.yaml is a working example that analyzes three repositories using developer_aliases. Adjust paths to match your local checkout:
    tga analyze --config configs/duetto-contractors.yaml
CI Gates
The GitHub Actions workflow (ci.yml) requires:
- `cargo fmt --all -- --check`
- `cargo clippy --workspace --all-targets -- -D warnings`
- `cargo test --workspace`
- `cargo doc --workspace --no-deps` with `RUSTDOCFLAGS="-D warnings"`
Crate Structure
| Crate | Purpose | crates.io |
|---|---|---|
| `tga-core` | Shared types, config, DB schema, migrations, error types | tga-core |
| `tga-collect` | Stage 1: git extraction (libgit2), GitHub/JIRA clients | tga-collect |
| `tga-classify` | Stage 2: four-tier classification cascade | tga-classify |
| `tga-report` | Stage 3: CSV/JSON/Markdown output | tga-report |
| `tga-cli` | Binary entry point (`tga`), clap CLI | tga-cli |
License
MIT