tga 2.8.1

Developer productivity analytics: git commit collection, classification, and reporting
# trusty-git-analytics

Analyze git repositories to measure developer productivity: classify commit work types, track weekly velocity, and export CSV/JSON/Markdown reports.

## What It Does

`tga` walks one or more local git repositories, collects every commit into a SQLite database, classifies each commit into a work category (feature, bugfix, refactor, etc.) using a multi-tier classification cascade (including the new weighted-sum Tier 2.5 added in 1.3.0), then aggregates the results into per-author, per-week, DORA, velocity, and quality reports. It is a feature-complete Rust port of [gitflow-analytics](https://github.com/bobmatnyc/gitflow-analytics) with the same YAML config schema and the same SQLite schema, so existing config files work without modification.

1.4.0 adds four new external classification sources (Linear, Shortcut, Confluence, Datadog) and fixes a SQLite WAL data-loss bug (issue #298) that could cause up to 22 minutes of classification work to be silently discarded on exit.

## 📚 Documentation

Full documentation lives at the workspace top level in
[`docs/trusty-git-analytics/`](../../docs/trusty-git-analytics/): requirements
specifications, developer/user guides, crate-specific
[decision records](../../docs/trusty-git-analytics/decisions/), and
[regression-testing snapshots](../../docs/trusty-git-analytics/regression-testing/).
This README and the rustdoc stay in-crate; everything else lives under `docs/`.

## Installation

### From GitHub Releases (recommended for binary users)

Prebuilt binaries are available for macOS (Apple Silicon) and Linux (x86_64).

1. Download the latest release from [GitHub Releases](https://github.com/bobmatnyc/trusty-tools/releases):
   - Look for assets tagged `tga-v2.7.0`
   - Download the archive for your platform:
     - **macOS arm64 (Apple Silicon)**: `tga-v2.7.0-aarch64-apple-darwin.tar.gz`
     - **Linux x86_64**: `tga-v2.7.0-x86_64-unknown-linux-gnu.tar.gz`

2. Extract and install:
   ```bash
   tar xzf tga-v2.7.0-*.tar.gz
   chmod +x tga
   sudo mv tga /usr/local/bin/    # or ~/.local/bin/ if you prefer user install
   ```

3. Verify the installation:
   ```bash
   tga --version
   ```
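
The platform-to-archive mapping in step 1 can be sketched as a small shell helper. This is only an illustration: `tga_asset_name` is a hypothetical function name, and it covers just the two platforms listed above.

```bash
# Map `uname -s` / `uname -m` output to a release archive name.
# tga_asset_name is a hypothetical helper, not part of tga itself.
tga_asset_name() {
  version="$1"
  os="${2:-$(uname -s)}"     # defaults to the current OS
  arch="${3:-$(uname -m)}"   # defaults to the current CPU architecture
  case "$os/$arch" in
    Darwin/arm64) echo "tga-v${version}-aarch64-apple-darwin.tar.gz" ;;
    Linux/x86_64) echo "tga-v${version}-x86_64-unknown-linux-gnu.tar.gz" ;;
    *) echo "no prebuilt binary for $os/$arch" >&2; return 1 ;;
  esac
}

tga_asset_name 2.7.0 Darwin arm64
```

On any other platform the helper returns non-zero, which is the cue to build from source instead.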

### From Source with Cargo

Requires Rust 1.91 or later ([install Rust](https://rustup.rs/)).

```bash
cargo install --git https://github.com/bobmatnyc/trusty-tools tga --locked
```

This builds from the latest commit on `main` and installs the binary to `~/.cargo/bin/`. Make sure `~/.cargo/bin/` is on your PATH.

To install a specific version:
```bash
cargo install --git https://github.com/bobmatnyc/trusty-tools --tag tga-v2.7.0 tga --locked
```

### With Homebrew (recommended)

```bash
brew tap bobmatnyc/trusty
brew install trusty-git-analytics
```

Or install directly without tapping:

```bash
brew install bobmatnyc/trusty/trusty-git-analytics
```

Homebrew provides:
- Automatic updates via `brew upgrade trusty-git-analytics`
- Standard macOS / Linux PATH integration
- Easy dependency management

### Prerequisites & Special Cases

#### System Requirements

- **Git**: standard; the tool reads git history via git2.
- **OS**: macOS or Linux (Windows support via WSL2; not officially tested).
- **Database**: SQLite (bundled; no external SQLite install required).

#### Configuration

The CLI reads from `tga.yaml` or `~/.config/tga/config.yaml`. See the Configuration section of this README and the configuration specification for details on setting up repository paths, identity resolvers, and report outputs.

```bash
tga analyze --config /path/to/tga.yaml
```

### Verify Installation

All installations can be verified by running:

```bash
tga --version
```

Expected output: the semantic version of the installed binary (e.g., `tga 2.7.0`).
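
If a script depends on behaviour tied to a release (for example the full branch coverage described for tga 2.0.0+), the version string can be compared before running. The sketch below assumes the `tga <version>` output shape shown above; `installed_version` and `version_ge` are hypothetical helper names.

```bash
# Strip the leading program name from output such as "tga 2.7.0".
installed_version() {
  out="$1"
  echo "${out#tga }"
}

# Succeed when version $1 >= version $2 (three numeric components).
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, "."); split(b, y, ".")
    for (i = 1; i <= 3; i++) {
      if (x[i] + 0 > y[i] + 0) exit 0
      if (x[i] + 0 < y[i] + 0) exit 1
    }
    exit 0
  }'
}

v=$(installed_version "tga 2.7.0")
if version_ge "$v" 2.0.0; then
  echo "full branch coverage available"
else
  echo "HEAD-only collection"
fi
```

In a real script the literal string would be replaced by `"$(tga --version)"`.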

## Quick Start

### Run Your First Analysis

**Step 1.** Create a `config.yaml`:

```yaml
repositories:
  - path: ~/code/my-project
    name: my-project

output:
  directory: ./reports
  formats: [csv, json, markdown]
```

**Step 2.** Run the full pipeline:

```bash
tga analyze --config config.yaml
```

**Step 3.** Find reports in `./reports/`:

A full run writes 14 files: 9 CSV, 4 JSON, and 1 Markdown report. The most
commonly used ones are:

```
reports/
├── authors.csv         # Per-author commit summary
├── weekly_activity.csv # Week-by-week breakdown
├── ... (7 more CSV files: DORA, velocity, quality, etc.)
├── report.json         # Full structured payload
├── ... (3 more JSON files)
└── report.md           # Narrative Markdown report
```

## Configuration

### Minimal config.yaml

```yaml
repositories:
  - path: ~/code/my-repo
    name: my-repo
```

All other sections are optional. When `output.formats` is omitted, all three formats (CSV, JSON, Markdown) are written.

### Full config reference

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `repositories` | list | required | Repos to analyze |
| `developer_aliases` | map | `{}` | Canonical name → list of emails/aliases |
| `team` | object | none | Alternative to `developer_aliases`; roster with email |
| `output.directory` | path | `./reports` | Where reports are written |
| `output.formats` | list | `[csv, json, markdown]` | `csv`, `json`, and/or `markdown` |
| `output.include_unclassified` | bool | `false` | Include commits with no category |
| `output.include_merges` | bool | `false` | Include merge commits |
| `output.include_files` | bool | `false` | Include file-level change detail |
| `classification.rules_file` | path | none | Path to custom rules YAML/JSON |
| `classification.use_llm` | bool | `false` | Enable LLM fallback tier |
| `classification.llm_model` | string | `gpt-4o-mini` | LLM model identifier |
| `classification.confidence_threshold` | float | `0.7` | Minimum acceptance confidence |
| `classification.llm_fallback_threshold` | float | `0.65` | Commits with confidence above this value skip the LLM tier. Raised from `0.0` in 1.3.0; see [LLM fallback threshold](#llm-fallback-threshold-migration). |
| `classification.llm_fallback_concurrency` | uint | `8` | Max concurrent LLM requests during fallback |
| `github.token` | string | `$GITHUB_TOKEN` | GitHub PAT for PR fetch. Required scopes: `public_repo` for public repos, `repo` for private repos. Without a token, GitHub rate-limits anonymous traffic to 60 requests/hour and most PRs will be missed. |
| `github.org` | string | none | Org slug for org-wide PR queries |
| `github.repo` | string | none | Single repo slug (`owner/name`) |
| `github.fetch_prs` | bool | `false` | Fetch pull request metadata. **Must be `true` for `tga pr-metrics` to return data**: when left at the default, `tga collect` writes zero rows to `pull_requests` (issue #211). |
| `github.ticket_regex` | string | none | Override regex for detecting GitHub ticket refs in commit messages |
| `jira.url` | string | none | JIRA base URL |
| `jira.username` | string | none | JIRA API username (email for Cloud) |
| `jira.token` | string | none | JIRA API token |
| `jira.project_key` | string | none | Project key filter (e.g. `API`) |
| `jira.ticket_regex` | string | none | Override regex for detecting JIRA ticket refs in commit messages |
| `linear.ticket_regex` | string | none | Override regex for detecting Linear ticket refs in commit messages |
| `pm.azure_devops.organization_url` | string | none | ADO org URL (e.g. `https://dev.azure.com/myorg`) |
| `pm.azure_devops.pat` | string | none | Azure DevOps Personal Access Token |
| `pm.azure_devops.project` | string | none | Default ADO project name |
| `pm.azure_devops.fetch_on_reference` | bool | `false` | Fetch work items when `AB#N` refs appear in commits |
| `pm.azure_devops.fetch_prs` | bool | `false` | Fetch ADO pull requests and reviewer data |
| `pm.azure_devops.ticket_regex` | string | `AB#(\d+)` | Override regex for detecting ADO work item refs in commit messages |
| `pm.bitbucket.workspace` | string | none | Bitbucket Cloud workspace slug |
| `pm.bitbucket.repo_slug` | string | none | Repository slug within the workspace |
| `pm.bitbucket.fetch_prs` | bool | `false` | Fetch Bitbucket Cloud pull request metadata |
| `pm.bitbucket.token` | string | `$BITBUCKET_TOKEN` | Bearer token (App password or OAuth) |
| `pm.bitbucket.username` | string | none | Atlassian account username for Basic auth |
| `pm.bitbucket.app_password` | string | none | Atlassian App password for Basic auth (alternative to `token`) |
| `cache.directory` | path | none | Cache directory (supports `~`) |
| `version` | string | none | Schema version; stored for compatibility |
| `profile` | string | none | Named profile; stored for compatibility |
| `dora.deployment_source` | string | `git_tags` | Source for `tga deployments collect`. One of `git_tags`, `github_releases`, `github_actions`, `manual`. |
| `dora.deployment_tag_pattern` | regex | `^v?[0-9]+\.[0-9]+\.[0-9]+(-...)?$` | Tags matching this regex are ingested as deployments. |
| `dora.production_branch` | string | `main` | Default branch that production deployments come from. |
| `dora.failure_signals` | list | `[]` | One signal per entry; `work_type` (classification category) and/or `commit_message_pattern` (regex) + `within_hours` window. |
| `dora.datadog_dir` | path | none | Directory of Datadog incident exports (.json). Currently a stub; the JIRA SRE path is the default MTTR source. |
| `jira.jira_project_mappings` | map | `{}` | Project key → work type (issue #206). Fires as Tier 1.6, which outranks the generic ticket regex. |
| `jira.jira_project_mapping_confidence` | float | `0.88` | Per-verdict confidence for the JIRA mapping tier. |

Paths support `~` expansion. Config files from the Python `gitflow-analytics` tool load without changes; unknown keys are silently ignored.
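
To get a feel for how a tag regex such as `dora.deployment_tag_pattern` filters tags, the check can be approximated with `grep -E`. This sketch assumes only the `vMAJOR.MINOR.PATCH` core of the default pattern; the optional suffix group shown in the table is left out because its exact form is not spelled out here. `is_release_tag` is a hypothetical helper name.

```bash
# Succeed when a tag looks like 1.2.3 or v1.2.3 (core of the default pattern only).
is_release_tag() {
  printf '%s\n' "$1" | grep -Eq '^v?[0-9]+\.[0-9]+\.[0-9]+$'
}

for tag in v2.7.0 2.7.0 tga-v2.7.0 nightly; do
  if is_release_tag "$tag"; then
    echo "$tag: deployment"
  else
    echo "$tag: ignored"
  fi
done
```

Prefixed tags such as `tga-v2.7.0` do not match this anchored pattern, so a repository that tags releases that way would likely need a custom `dora.deployment_tag_pattern`.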

### developer_aliases vs team.members

**`developer_aliases`** (Python-compatible flat map):

```yaml
developer_aliases:
  "Alice Smith":
    - "alice@company.com"
    - "asmith@company.com"
    - "alice@personal.dev"
  "Bob Jones":
    - "bob@company.com"
    - "129991831+bobgithub@users.noreply.github.com"
```

**`team.members`** (structured roster with canonical email):

```yaml
team:
  members:
    - name: Alice Smith
      email: alice@company.com
      aliases:
        - asmith@company.com
        - alice@personal.dev
```

When `developer_aliases` is non-empty it takes precedence over `team.members`. Use `developer_aliases` when migrating an existing Python config file; use `team.members` for new setups where canonical email matters for downstream tooling.
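
Conceptually, the `developer_aliases` example above is a lookup from any listed email to one canonical name. The shell sketch below mirrors that mapping with a `case` statement. It is only an illustration of the idea, not how `tga` resolves identities internally, and the pass-through for unmapped emails is an assumption. `canonical_name` is a hypothetical helper name.

```bash
# Resolve an email to its canonical developer name using the example aliases.
canonical_name() {
  case "$1" in
    alice@company.com|asmith@company.com|alice@personal.dev)
      echo "Alice Smith" ;;
    bob@company.com|129991831+bobgithub@users.noreply.github.com)
      echo "Bob Jones" ;;
    *)
      echo "$1" ;;   # assumption: unmapped emails keep their own identity
  esac
}

canonical_name asmith@company.com
canonical_name 129991831+bobgithub@users.noreply.github.com
```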

### Example: multi-repo config with GitHub

See [`configs/example-config.yaml`](configs/example-config.yaml) for a working example that covers multiple repositories, developer aliases, and CSV+Markdown output.

## Common Workflows

### First-time setup

```bash
# 1. Generate config.yaml interactively
tga install

# 2. Collect all history across all repos (full branch coverage, tga 2.0.0+)
tga collect

# 3. Classify all collected commits
tga classify

# 4. Generate reports
tga report
```

### Routine weekly run

```bash
# Collect last 4 weeks (incremental; already-collected weeks are skipped)
tga collect --weeks 4

# Classify any unclassified commits
tga classify

# Regenerate reports
tga report
```
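
For a scheduled job, the three stages can be chained so that a failure in one stage stops the run. The wrapper below is a minimal sketch: `tga_weekly` and the `TGA_BIN` override variable are hypothetical names introduced here, not part of `tga`.

```bash
# Run collect -> classify -> report, stopping at the first failing stage.
# TGA_BIN is a hypothetical override, handy for testing the wrapper itself.
tga_weekly() {
  tga_bin="${TGA_BIN:-tga}"
  weeks="${1:-4}"            # default matches the routine run above
  "$tga_bin" collect --weeks "$weeks" &&
    "$tga_bin" classify &&
    "$tga_bin" report
}
```

Calling `tga_weekly` with no argument collects the last 4 weeks; `tga_weekly 8` widens the window. Setting `TGA_BIN=echo` prints the commands instead of running them.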

### Scoped re-run for one service

```bash
# Re-collect only one repo, forcing re-walk of the last 8 weeks
tga collect --repos my-service --weeks 8 --force

# Re-classify only that repo
tga classify --force --repos my-service --weeks 8

# Reports always cover the full DB, so no scoping is needed
tga report
```

### Upgrade from tga ≤ 1.5.4 (branch coverage fix)

```bash
# Re-collect all history to pick up commits from non-default branches
tga collect --force

# Re-classify everything (force overwrites prior verdicts)
tga classify --force

# Regenerate reports
tga report
```

### Restrict collection to specific branches

```bash
# Walk only 'main' and 'release/1.0' branches for all repos
tga collect --branch main,release/1.0

# Walk only 'main' for one specific repo
tga collect --repos my-service --branch main
```

### Monitor fetch health across many repos

`tga` fetches via libgit2 with full TLS support: `https://` remotes (GitHub,
GitLab, Bitbucket, etc.) work out of the box without a system `git` pre-fetch.
On macOS the native Security framework is used; on Linux openssl is vendored
statically; on Windows Schannel is used.

By default `tga collect` always fetches each repo before walking and prints an
end-of-run summary:

```
Fetch summary: 116 / 118 repos updated (2 failure(s), 0 skipped)
  - ml_pricing_engine: could not find remote 'origin'
  - datapipelines: authentication required
```

```bash
# Use --strict-fetch in CI to fail the run if any fetch fails
tga collect --strict-fetch

# See per-repo success lines too (useful for debugging network topology)
tga collect --verbose-fetch

# Skip fetching entirely (stale data warning is printed to stderr)
tga collect --no-fetch
```
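
Where `--strict-fetch` is too strict (for example, a few archived repos are expected to fail), the failure count can be read from a captured summary line instead. The sketch below assumes the summary format shown above stays stable and that the output has been captured; which stream the summary is written to is not stated here. `fetch_failures` is a hypothetical helper name.

```bash
# Extract the failure count from a "Fetch summary: ..." line on stdin.
fetch_failures() {
  sed -n 's/^Fetch summary: .*(\([0-9][0-9]*\) failure(s), .*$/\1/p'
}

printf '%s\n' 'Fetch summary: 116 / 118 repos updated (2 failure(s), 0 skipped)' |
  fetch_failures
```

A CI step could then compare the extracted number against a tolerated maximum and fail only above that threshold.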

Per-repo fetch timeouts can be set in the YAML config to prevent a slow remote
from blocking the entire run:

```yaml
repositories:
  - path: ~/code/slow-repo
    name: slow-repo
    fetch_timeout_secs: 30   # optional; enforcement is pending a future release
```

### Update rules and re-classify

```bash
# Preview which rules will fire (dry-run)
tga rules test "feat: add login page" --rules ./my-rules.yaml

# Re-classify all commits with the new rules (force overwrites existing verdicts)
tga classify --rules ./my-rules.yaml --force

# Scope re-classification to the last 4 weeks only
tga classify --rules ./my-rules.yaml --force --weeks 4
```

### Fix existing database (backfill operations)

```bash
# Fix on_default_branch = 0 for pre-1.4.1 databases
tga backfill reachability

# Re-extract ticket IDs after extending ticket patterns
tga backfill ticket-ids

# Score historical commit effort (persists in fact_commit_effort)
tga backfill effort --repos my-service

# Preview revert-flag changes without writing
tga backfill revert-flags --dry-run
```

### DORA metrics pipeline

```bash
# Ingest deployment events (default: git tags matching dora.deployment_tag_pattern)
tga deployments collect

# Ingest incidents from JIRA SRE issues or Datadog
tga incidents collect

# Compute and display the four DORA metrics
tga dora

# Limit to events since the start of Q2
tga dora --since 2026-04-01
```

## CLI Subcommand Reference

| Subcommand | Purpose | Key flags |
|------------|---------|-----------|
| `tga analyze` | Full pipeline (collect → classify → report) in one command | `--weeks`, `--from`, `--to`, `--force`, `--skip-collect`, `--skip-classify`, `--dry-run` |
| `tga collect` | Stage 1: extract commits into the database | `--repos`, `--branch`, `--weeks`, `--from`, `--to`, `--force`, `--head-only`, `--no-fetch`, `--strict-fetch`, `--verbose-fetch`, `--dry-run` |
| `tga classify` | Stage 2: classify collected commits | `--repos`, `--weeks`, `--since`, `--until`, `--force`, `--rules`, `--use-llm`, `--no-external` |
| `tga report` | Stage 3: generate CSV/JSON/Markdown reports | `--output`, `--formats`, `--author` |
| `tga author` | Per-engineer drill-down (commits, effort, PRs, categories) | `<email>`, `--format`, `--since`, `--until` |
| `tga pr-metrics` | Pull-request metrics per engineer | `--weeks`, `--csv`, `--output` |
| `tga install` | Interactive first-time config wizard | `--output`, `--force` |
| `tga aliases` | Manage developer identity aliases | `list`, `merge <src> <dst>`, `add-login <email> <provider> <login>` |
| `tga backfill` | Retroactive maintenance on existing rows | `--repos`, `--weeks`, `--since`, `--until`, `--dry-run` |
| `tga override` | Pin classification verdicts (Tier 0) | `add`, `list`, `remove` |
| `tga rules` | Introspect the active rule set | `list`, `show <sha>`, `test "<message>"` |
| `tga deployments` | DORA: ingest deployment events | `collect [--source]` |
| `tga incidents` | DORA: ingest production incidents | `collect [--source]` |
| `tga dora` | DORA: compute and display all four metrics | `--since` |

## CLI Reference

All subcommands accept these global flags:

| Flag | Default | Description |
|------|---------|-------------|
| `--config <PATH>` | `config.yaml` | Path to config YAML |
| `--database <PATH>` | `tga.db` | Path to SQLite database |
| `-v / -vv / -vvv` | warnings only | Increase log verbosity |

### tga analyze

Run the full pipeline: collect → classify → report.

```bash
tga analyze [--config <PATH>] [--database <PATH>] [--output <DIR>]
            [--skip-collect] [--skip-classify] [--weeks <N>]
            [--from <DATE>] [--to <DATE>] [--dry-run]
            [--validate-only] [--no-validate]
```

| Flag | Description |
|------|-------------|
| `--skip-collect` | Skip Stage 1; use commits already in the database |
| `--skip-classify` | Skip Stage 2; use existing classifications |
| `--output <DIR>` | Override `output.directory` from config |
| `--weeks <N>` | Limit collection to the last N weeks (overrides config `start_date`) |
| `--from <DATE>` | Start date for collection (ISO 8601 `YYYY-MM-DD`); mutually exclusive with `--weeks` |
| `--to <DATE>` | End date for collection (ISO 8601 `YYYY-MM-DD`); defaults to today |
| `--dry-run` | Perform all steps against an in-memory database; the on-disk database is left untouched |
| `--validate-only` | Run configuration validation and exit (0 on success, 1 on errors) |
| `--no-validate` | Skip pre-flight configuration validation |

```bash
# Full pipeline
tga analyze --config config.yaml

# Re-run reports only (commits already collected and classified)
tga analyze --skip-collect --skip-classify --output ./reports-v2
```
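
Because `--validate-only` exits 0 on success and 1 on errors, it can gate the real run. The wrapper below is a sketch of that pattern using only the flags documented above; `analyze_with_preflight` and the `TGA_BIN` override variable are hypothetical names.

```bash
# Validate the config first; run the pipeline only when validation passes.
# TGA_BIN is a hypothetical override, handy for testing the wrapper itself.
analyze_with_preflight() {
  tga_bin="${TGA_BIN:-tga}"
  config="${1:-config.yaml}"
  if "$tga_bin" analyze --config "$config" --validate-only; then
    # Validation already ran, so skip the built-in pre-flight check.
    "$tga_bin" analyze --config "$config" --no-validate
  else
    echo "config validation failed: $config" >&2
    return 1
  fi
}
```

For example, `analyze_with_preflight ./config.yaml` either runs the full pipeline or stops with a non-zero status before any collection starts.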

### tga collect

Stage 1: extract commits from git repositories into the database.

```bash
tga collect [--config <PATH>] [--database <PATH>]
            [--repos <NAME,...>] [--branch <NAME[,NAME…]>]
            [--from <DATE>] [--to <DATE>] [--weeks <N>]
            [--since <DATE>] [--until <DATE>] [--dry-run]
            [--force-refresh-prs] [--validate-only] [--no-validate]
            [--head-only]
```

**Branch coverage (tga 2.0.0+):** By default `tga collect` walks ALL local
branches (`refs/heads/*`) and remote tracking refs (`refs/remotes/origin/*`),
so commits on PR branches, feature branches, and hotfixes are not silently
excluded. This fixes a data-integrity bug (#331) that was losing ~56% of
commits in multi-branch repos when using the HEAD-only default from tga ≤ 1.5.4.

To restore the legacy HEAD-only behaviour, pass `--head-only` globally or set
`head_only: true` on individual repository entries in your YAML config.

**Migration from tga ≤ 1.5.4:** existing databases are missing commits from
non-default branches. Re-collect with `--force` to recover that history:

```bash
tga collect --force   # re-walk all weeks with full branch coverage
```

| Flag | Description |
|------|-------------|
| `--repos <NAME,...>` | Comma-separated list of repository names to collect; others are skipped |
| `--branch <NAME[,NAME…]>` | Restrict the revwalk to these branch names (seeds from both `refs/heads/<name>` and `refs/remotes/origin/<name>`); mutually exclusive with `--head-only` |
| `--from <DATE>` | Collect commits on or after this ISO 8601 date; mutually exclusive with `--weeks` |
| `--to <DATE>` | Collect commits on or before this ISO 8601 date; defaults to today |
| `--since <DATE>` | Legacy alias for `--from` (Python-predecessor compatibility); `--from` takes precedence |
| `--until <DATE>` | Legacy alias for `--to` (Python-predecessor compatibility); `--to` takes precedence |
| `--weeks <N>` | Limit collection to the last N weeks; `--weeks` takes precedence over `--from`/`--to` |
| `--dry-run` | Run collection against an in-memory database; the on-disk database is left untouched |
| `--force-refresh-prs` | Re-fetch ADO pull requests even when already cached (backfills pre-v1.0.9 rows) |
| `--validate-only` | Run configuration validation and exit (0 on success, 1 on errors) |
| `--no-validate` | Skip pre-flight configuration validation |
| `--no-fetch` | Skip the pre-walk `git fetch`; use only local refs. A warning is printed to stderr so you know the data may be stale. |
| `--strict-fetch` | Exit non-zero if any repository's fetch fails (default: failures are shown in the summary but exit code is 0). Useful for CI. |
| `--verbose-fetch` | Print a success line per fetched repository in the fetch summary (default: only failures are shown). |
| `--head-only` | Restore legacy HEAD-only revwalk (tga ≤ 1.5.4 behaviour); mutually exclusive with `--branch`; also available per-repo via `head_only: true` in YAML |

```bash
tga collect --repos my-project --from 2024-01-01 --to 2024-03-31
tga collect --weeks 4           # collect last 4 weeks across all repos
tga collect --force             # re-collect all history (e.g. after upgrading to 2.0.0)
tga collect --branch main       # walk only the main branch for all repos
tga collect --branch main,release/1.0 --repos my-service  # branch + repo filter
tga collect --head-only         # legacy HEAD-only walk for all repos
```
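
Date-bounded collections like the first example are often driven by calendar quarters. The helper below is a small sketch that turns a year and quarter into the matching `--from`/`--to` pair; `quarter_range` is a hypothetical name, not a `tga` feature.

```bash
# Print the --from/--to flags covering one calendar quarter.
quarter_range() {
  year="$1"
  case "$2" in
    1) echo "--from ${year}-01-01 --to ${year}-03-31" ;;
    2) echo "--from ${year}-04-01 --to ${year}-06-30" ;;
    3) echo "--from ${year}-07-01 --to ${year}-09-30" ;;
    4) echo "--from ${year}-10-01 --to ${year}-12-31" ;;
    *) echo "quarter must be 1-4" >&2; return 1 ;;
  esac
}

quarter_range 2024 1
```

The output is meant to be word-split on purpose, as in `tga collect --repos my-project $(quarter_range 2024 1)`.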

### tga classify

Stage 2: run the classification cascade over collected commits.

```bash
tga classify [--config <PATH>] [--database <PATH>]
             [--repos <NAME,...>] [--weeks <N>] [--since <DATE>] [--until <DATE>]
             [--force] [--rules <PATH>] [--use-llm] [--backfill-complexity]
             [--no-external]
```

| Flag | Description |
|------|-------------|
| `--repos <NAME,...>` | Limit re-classification to these repository names (comma-separated) |
| `--weeks <N>` | Limit to commits in the last N ISO weeks; mutually exclusive with `--since`/`--until` |
| `--since <DATE>` | Lower bound on author timestamp (ISO 8601 `YYYY-MM-DD`); mutually exclusive with `--weeks` |
| `--until <DATE>` | Upper bound on author timestamp (ISO 8601 `YYYY-MM-DD`); mutually exclusive with `--weeks` |
| `--force` | Re-classify commits that already have a verdict (useful after updating rules) |
| `--rules <PATH>` | Override `classification.rules_file` from config. Custom rules default to priority 110 (above built-in 100) and are standalone by default (`extend_defaults: false`). |
| `--use-llm` | Enable LLM fallback regardless of config setting |
| `--backfill-complexity` | Fill missing complexity scores (1–5) for already-classified commits via the LLM, without re-running the full cascade; category, confidence, and method are left untouched |
| `--no-external` | Disable all external classification sources (JIRA, GitHub Issues) for this run, even if configured in the rules file or config |

```bash
tga classify --rules ./custom-rules.yaml --use-llm
tga classify --no-external                         # offline / CI mode
tga classify --force --repos my-service --weeks 4  # re-classify one repo, last 4 weeks
tga classify --force --since 2026-01-01            # re-classify from a date forward
```
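
When the LLM fallback is enabled, `classification.llm_fallback_threshold` (default `0.65`) decides which commits reach it: per the config reference, commits with confidence above the threshold skip the LLM tier. The comparison can be sketched as follows. This is an illustration of that one rule only, and treating a confidence exactly equal to the threshold as "goes to the LLM" is an assumption read from the wording "above". `goes_to_llm` is a hypothetical helper name.

```bash
# Succeed when a commit with confidence $1 would fall through to the LLM tier.
# $2 is the threshold; the default mirrors classification.llm_fallback_threshold.
goes_to_llm() {
  awk -v c="$1" -v t="${2:-0.65}" 'BEGIN { exit !(c + 0 <= t + 0) }'
}

for conf in 0.40 0.65 0.90; do
  if goes_to_llm "$conf"; then
    echo "$conf: LLM fallback"
  else
    echo "$conf: keep earlier verdict"
  fi
done
```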

### tga report

Stage 3: generate reports from classified commits.

```bash
tga report [--config <PATH>] [--database <PATH>]
           [--output <DIR>] [--formats <FMT,...>]
           [--author <EMAIL>]
```

| Flag | Description |
|------|-------------|
| `--output <DIR>` | Override `output.directory` from config |
| `--formats <FMT,...>` | Comma-separated: `csv`, `json`, `markdown` |
| `--author <EMAIL>` | Scope report to one canonical identity (matches `canonical_email` case-insensitively) |

```bash
# Full team report
tga report --output ./q1-reports --formats csv,json

# Single-engineer report
tga report --author alice@example.com --output ./alice-q1

# List available canonical identities
tga aliases list
```

If `--author` is supplied but the email is not in the `authors` table, `tga report`
exits non-zero with a suggestion to run `tga aliases list`.
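
In a script, that non-zero exit can be used to print the known identities automatically. The wrapper below is a sketch built from the two commands above; `report_for_author` and the `TGA_BIN` override variable are hypothetical names.

```bash
# Generate a single-author report; on failure, list the known identities.
# TGA_BIN is a hypothetical override, handy for testing the wrapper itself.
report_for_author() {
  tga_bin="${TGA_BIN:-tga}"
  email="$1"
  out_dir="$2"
  if ! "$tga_bin" report --author "$email" --output "$out_dir"; then
    echo "report failed for $email; known identities:" >&2
    "$tga_bin" aliases list >&2
    return 1
  fi
}
```

For example, `report_for_author alice@example.com ./alice-q1` writes the report or, if the email is unknown, shows the alias list on stderr and returns 1.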

### tga pr-metrics

Aggregate pull-request metrics per engineer from the `pull_requests` cache.

```bash
tga pr-metrics [--config <PATH>] [--database <PATH>]
               [--weeks <N>] [--csv] [--output <PATH>]
```

| Flag | Description |
|------|-------------|
| `--weeks <N>` | Limit metrics to PRs created within the last N weeks |
| `--csv` | Emit CSV instead of an aligned text table |
| `--output <PATH>` | Write output to a file (CSV with `--csv`, otherwise the text table) |

```bash
tga pr-metrics --weeks 12 --csv --output pr-metrics.csv
```

The `pr_comments_given` and `avg_revisions` columns are reserved for future
use and currently always output `0.0`; the underlying review-comment and
revision-count data is not yet tracked.

### tga backfill

Retroactive maintenance operations that update existing commit rows in place
(outside the normal `collect → classify → report` pipeline). All global filter
flags (`--repos`, `--weeks`, `--since`, `--until`) scope the operation uniformly.

**Note:** `--branch` is collect-only. Commits in the database do not carry branch
attribution after collection, so there is no branch filter for backfill operations.

```bash
tga backfill <SUBCOMMAND> [--dry-run]
             [--repos <NAME,...>] [--weeks <N>] [--since <DATE>] [--until <DATE>]
```

| Subcommand | Description |
|------------|-------------|
| `ai-detection` | Re-run LLM classification on low-confidence prior LLM verdicts |
| `revert-flags` | Scan commit messages for revert patterns and set `is_revert` |
| `ticket-ids` | Scan commit messages for ticket refs and update `ticket_id`/`ticketed` |
| `reachability` | Re-run the full reachability scan and upsert `fact_commit_reachability` without re-collecting (fixes `on_default_branch=0` rows in existing databases, issue #290) |
| `effort` | Compute empirical effort scores (XS/S/M/L/XL) for historical commits and persist in `fact_commit_effort` |

| Global flag | Description |
|-------------|-------------|
| `--dry-run` | Report how many rows would change without writing |
| `--repos <NAME,...>` | Limit to these repository names (comma-separated) |
| `--weeks <N>` | Limit to commits in the last N ISO weeks; mutually exclusive with `--since`/`--until` |
| `--since <DATE>` | Lower bound on author timestamp (ISO 8601 `YYYY-MM-DD`); mutually exclusive with `--weeks` |
| `--until <DATE>` | Upper bound on author timestamp (ISO 8601 `YYYY-MM-DD`); mutually exclusive with `--weeks` |

```bash
tga backfill revert-flags --dry-run
tga backfill ticket-ids
tga backfill reachability                      # fix on_default_branch for all repos
tga backfill reachability --repos my-service   # single repo only
tga backfill effort --repos api --weeks 4      # effort for one repo, last 4 weeks
tga backfill ticket-ids --since 2026-01-01     # ticket IDs from a date forward
```
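As a rough illustration of what the `ticket-ids` and `revert-flags` scans look for, the Python sketch below applies the JIRA-style pattern listed for the built-in `jira-ticket` rule and a simple leading-`Revert` check to one commit message. The function name `scan_message` and the shape of its return value are hypothetical, and the patterns tga actually uses may differ.

```python
import re

# JIRA-style ticket reference, as listed for the built-in `jira-ticket` rule.
TICKET_RE = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def scan_message(message: str) -> dict:
    """Return the fields a backfill pass might derive from one commit message."""
    tickets = TICKET_RE.findall(message)
    return {
        "ticket_id": tickets[0] if tickets else None,
        "ticketed": bool(tickets),
        # Assumption: a revert is recognised by a leading "Revert".
        "is_revert": message.startswith("Revert"),
    }
```

Calling `scan_message("PROJ-123 fix login")` yields a `ticket_id` of `PROJ-123` with `ticketed` set, which is the kind of row change `--dry-run` would count without writing.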

### tga override

Manage manual classification overrides (Tier 0). Rows here pin a commit's
verdict regardless of what the rule-based or LLM tiers would produce.

```bash
tga override <SUBCOMMAND>
```

| Subcommand | Description |
|------------|-------------|
| `add <SHA> <WORK_TYPE> <CHANGE_TYPE> [--notes <TEXT>] [--repo <PATH>]` | Insert (or replace) an override row for a commit SHA |
| `list [--repo <PATH>]` | List every override row, optionally filtered by repository |
| `remove <SHA> [--yes]` | Delete the override row(s) for a SHA (`--yes` skips confirmation) |

```bash
tga override add abc1234 feature feature --notes "manual review"
tga override list
tga override remove abc1234 --yes
```

### tga rules

Introspect the classification rule set (issue #209). Useful for tuning
rules and answering "why was this commit classified as X?".

```bash
tga rules list                       # every loaded rule, sorted by priority
tga rules show <commit-sha>          # verdict + method recorded for a commit
tga rules test "feat: add login"     # dry-run the cascade against a message
```

### tga deployments / tga incidents / tga dora

DORA metrics infrastructure (issues #207, #208, #212, #213).

```bash
# Step 1: ingest deployment events (default: git tags matching dora.deployment_tag_pattern)
tga deployments collect

# Step 2 (optional): ingest production incidents from JIRA SRE issues
tga incidents collect

# Step 3: compute Deployment Frequency, Lead Time, Change Failure Rate, MTTR.
# Also rebuilds the `deployment_failures` derived join from the current
# `dora.failure_signals` config, so it is safe to re-run after a config edit.
tga dora
```

The four DORA metrics land in pre-computed SQL views
(`v_deployment_frequency`, `v_lead_time`, `v_change_failure_rate`,
`v_mttr`) so dashboards can read them directly without re-aggregating.

### Tag & Release-Branch Reachability (issues #279, #290)

`tga collect` automatically populates `fact_commit_reachability` with five columns that distinguish "deployed via cherry-pick to a release branch and tagged" from "abandoned WIP":

| Column | Type | Meaning |
|---|---|---|
| `on_default_branch` | boolean | `true` if the commit is reachable from the repo's default branch (`main`/`master`) |
| `on_any_tag` | boolean | `true` if any git tag reaches this commit |
| `reachable_from_tags` | JSON array | Tag names that contain this commit |
| `on_release_branch` | boolean | `true` if commit is on any configured release branch |
| `release_branches` | JSON array | Matching release-branch names |

**`on_default_branch` fix (1.4.1, issue #290):** Prior to 1.4.1, `on_default_branch`
was declared in the schema but never populated, so every row had `on_default_branch = 0`.
`tga collect` now auto-detects each repo's default branch via `refs/remotes/origin/HEAD`
(the symref set by `git clone`), falling back through `refs/heads/main`,
`refs/heads/master`, `refs/remotes/origin/main`, `refs/remotes/origin/master`.
If none of these refs exist for a repo, a warning is logged and `on_default_branch`
stays 0 for that repo's commits. To fix existing databases without re-running the
full collection pipeline, see `tga backfill reachability` below.

This resolves a systematic blind spot: bug fixes and security patches cherry-picked to `release/*` branches, tagged for production, and never merged to `main` previously showed `on_default_branch = false`, making them indistinguishable from abandoned WIP. With this feature, the 32% "merged rate" for bug fixes turns out to be much higher when deployed-via-tag commits are counted.

**Config:**

```yaml
reachability:
  track_tags: true               # default: true
  track_release_branches: true   # default: true
  release_branch_patterns:       # default: ["release/*", "hotfix/*", "chore/release-*", "v*"]
    - "release/*"
    - "hotfix/*"
    - "chore/release-*"
    - "v*"
```

Set `track_tags: false` to skip the tag scan (useful for trunk-based repos with thousands of tags). The default is `true` because the scan is O(repo + refs), not O(repo × refs × commits).

Optionally disable the scan for a single run:

```bash
tga collect --skip-tag-reachability
```

**Useful derived queries:**

```sql
-- What % of bug fixes actually shipped (via any path)?
SELECT
  COUNT(*) FILTER (WHERE on_default_branch OR on_any_tag OR on_release_branch)
  * 100.0 / COUNT(*) AS shipped_pct
FROM classifications cl
JOIN commits c ON c.classification_id = cl.id
JOIN fact_commit_reachability fcr ON fcr.commit_sha = c.sha
WHERE cl.category = 'bug_fix';

-- Cherry-pick rate: commits reachable from tags but not on main
SELECT repo, COUNT(*)
FROM fact_commit_reachability
WHERE on_any_tag AND NOT on_default_branch
GROUP BY repo;
```

## Pipeline Architecture

```
git repos ───┐
             │   collect      SQLite (tga.db)   classify       SQLite    report
GitHub API ──┼──────────────► [commits]        ──────────────► [classif]─────────► CSV (×9)
JIRA API ────┤   (libgit2,    [authors]         (7-tier                 ► JSON
Linear API ──┤   reqwest)     [pull_requests]   cascade,                ► Markdown
ADO API  ────┘                [work_items]      Rayon-parallel)
```

**Stage 1 - collect** (`tga::collect`): opens each repository with libgit2, walks the configured branch, extracts commit metadata and diff stats, resolves author identities, fetches GitHub PR / JIRA issue / Linear / Azure DevOps work item metadata via REST/GraphQL, and writes everything to SQLite.

**Stage 2 - classify** (`tga::classify`): reads unclassified commits from the database, runs each message through the classification cascade (see below), and writes a classification verdict back. Rule-based tiers execute in parallel via Rayon.

**Stage 3 - report** (`tga::report`): reads the classified database, aggregates per-author, per-week, DORA, velocity, and quality statistics, and writes the configured output formats to the output directory.

## Classification

### Classification Cascade

Each commit message is tested against tiers in order. The first tier to produce a confident result wins.

**Tier 0 - Manual Override** (confidence 1.0): looks up the `(commit_hash, repo_path)` pair in the `classification_overrides` table. Managed via `tga override add|list|remove`.

**Tier 1.5 - Issue Type** (confidence 0.90): when the commit has ticket references resolving to rows in `issue_cache`, maps the upstream issue type (`bug`, `story`, `task`, `spike`, etc.) directly to a `change_type`.

**Tier 1.6 - JIRA Project Mapping** (default confidence 0.88, override via `jira.jira_project_mapping_confidence`): when `jira.jira_project_mappings` is configured, maps the JIRA project key prefix of any `[A-Z]+-\d+` reference to a `change_type`. Fires *before* the regex tier so project mappings outrank the generic `jira-ticket` regex rule (issue #206). Example config:

```yaml
jira:
  jira_project_mappings:
    TQL: bug_fix
    APEX: integration
    INFRA: platform_infrastructure
    SEC: security
  jira_project_mapping_confidence: 0.88   # optional, default 0.88
```

Tier-0 manual overrides and exact-keyword conventional-commit prefixes (e.g. `fix:`) still beat this tier.
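The lookup this tier performs can be sketched in a few lines of Python. This is an illustrative approximation, not tga's implementation: `classify_by_project` is a hypothetical name, the mapping table simply mirrors the example config above, and picking the first mapped key in message order is an assumption.

```python
import re

# Reference shape quoted for this tier: project key, hyphen, issue number.
PROJECT_REF_RE = re.compile(r"\b([A-Z]+)-\d+\b")

# Mirrors the example `jira.jira_project_mappings` config above.
MAPPINGS = {
    "TQL": "bug_fix",
    "APEX": "integration",
    "INFRA": "platform_infrastructure",
    "SEC": "security",
}


def classify_by_project(message, mappings=MAPPINGS, confidence=0.88):
    """Return (change_type, confidence) for the first mapped key, else None."""
    for key in PROJECT_REF_RE.findall(message):
        if key in mappings:
            return mappings[key], confidence
    return None
```

A message such as `TQL-42 null pointer in parser` would map to `bug_fix` at 0.88, while a message with no mapped project key returns `None` and falls through to later tiers.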

**Tier 1 - Exact (Aho-Corasick)**: builds a single finite-state machine from every keyword list across every rule and scans the message in O(n) time. Matches `feat:`, `fix:`, `chore:`, etc. Confidence 0.85–0.95.

**Tier 2 - Regex**: applies pre-compiled regex patterns from the rule set. Handles anchored conventional-commit patterns (`^feat(\([^)]*\))?!?:`) and JIRA ticket IDs (`\b[A-Z][A-Z0-9]+-\d+\b`).

**Tier 2.5 - Weighted Sum** (new in 1.3.0): composes five independent signals into per-category scores and emits a verdict when the argmax score meets the minimum confidence threshold (default 0.55). Active for all commits, including when `extend_defaults: false`. See [Weighted-Sum Tier](#weighted-sum-tier-25-new-in-130).

**Tier 3 - Fuzzy heuristics**: detects merge commits (via `is_merge` flag or `Merge pull request` prefix) and reverts (via `Revert` prefix). No external dependencies. Suppressed when `extend_defaults: false`.

**Tier 7 - LLM fallback** (optional, async): calls an OpenAI-compatible API (**OpenRouter** by default, **AWS Bedrock** behind the `bedrock` cargo feature) when tiers 0–3 leave a commit below the fallback threshold. Disabled by default; enable with `analysis.llm_classification.enabled: true` or `--use-llm`. Results are only accepted when `confidence >= confidence_threshold` (default 0.7). See [LLM fallback threshold](#llm-fallback-threshold-migration).
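The "first confident tier wins" control flow can be sketched as follows. The two tier functions are deliberately tiny stand-ins (a prefix check instead of Aho-Corasick, a single merge heuristic), their confidence values are illustrative, and the names `run_cascade`, `exact_tier`, and `fuzzy_tier` are hypothetical; only the ordering logic and the `uncategorized` default reflect the text.

```python
from typing import Callable, List, Optional, Tuple

Verdict = Tuple[str, float, str]  # (category, confidence, method)
Tier = Callable[[str], Optional[Verdict]]


def exact_tier(message: str) -> Optional[Verdict]:
    # Stand-in for Tier 1: a couple of conventional-commit prefixes.
    prefixes = {"feat:": "feature", "fix:": "bugfix"}
    for prefix, category in prefixes.items():
        if message.lower().startswith(prefix):
            return category, 0.90, "exact"
    return None


def fuzzy_tier(message: str) -> Optional[Verdict]:
    # Stand-in for Tier 3: merge detection by message prefix.
    if message.startswith("Merge pull request"):
        return "merge", 0.60, "fuzzy_match"
    return None


def run_cascade(message: str, tiers: List[Tier]) -> Verdict:
    """Try each tier in order; the first one that returns a verdict wins."""
    for tier in tiers:
        verdict = tier(message)
        if verdict is not None:
            return verdict
    # No tier matched: the documented default is `uncategorized` at 0.0.
    return "uncategorized", 0.0, "none"
```

Passing the tiers as an ordered list keeps the precedence explicit: reordering the list is the only way to change which tier gets the first look at a message.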

### Default Rules

| ID | Category | Keywords / Patterns |
|----|----------|---------------------|
| `cc-feat` | `feature` | `feat:`, `feature:`, `^feat(...)?!?:` |
| `cc-fix` | `bugfix` | `fix:`, `bugfix:`, `hotfix`, `^fix(...)?!?:` |
| `cc-chore` | `chore` | `chore:`, `^chore(...)?!?:` |
| `cc-docs` | `documentation` | `docs:`, `doc:`, `^docs?(...)?!?:` |
| `cc-refactor` | `refactor` | `refactor:`, `^refactor(...)?!?:` |
| `cc-test` | `test` | `test:`, `tests:`, `^tests?(...)?!?:` |
| `cc-ci` | `ci` | `ci:`, `^ci(...)?!?:` |
| `cc-perf` | `performance` | `perf:`, `^perf(...)?!?:` |
| `cc-style` | `style` | `style:`, `^style(...)?!?:` |
| `cc-build` | `build` | `build:`, `^build(...)?!?:` |
| `cc-revert` | `revert` | `revert:`, `^revert(...)?!?:` |
| `breaking-change` | `breaking` | `breaking change`, `breaking-change` |
| `jira-ticket` | `feature` (ticketed) | `\b[A-Z][A-Z0-9]+-\d+\b` |
| `kw-bug` | `bugfix` | `bug`, `defect` |
| `kw-security` | `bugfix` (security) | `security`, `cve-`, `vulnerability` |

Commits that match no rule are assigned category `uncategorized` with confidence 0.0.

### Custom Rules File

Supply your own rules as a YAML (or JSON) file:

```yaml
# my-rules.yaml
version: "1.0"
extend_defaults: false   # true = merge with built-ins; false = standalone (default)
rules:
  - id: my-deploy
    category: deployment
    keywords:
      - "deploy:"
      - "release:"
    patterns:
      - "(?i)^deploy(ment)?:"
    priority: 80
    confidence: 0.9
```

```bash
tga classify --rules ./my-rules.yaml
# or in config.yaml:
# classification:
#   rules_file: ./my-rules.yaml
```

### Rules YAML schema

Each entry under `rules:` is a **Rule** with these fields:

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `id` | string | **required** | Unique rule identifier (used in logs and `tga rules test`) |
| `category` | string | **required** | The classification label written to the database |
| `subcategory` | string | `null` | Optional leaf label (e.g. `"sql-injection"` under `category: "security"`) |
| `keywords` | list of strings | `[]` | Exact substrings, matched case-insensitively via Aho-Corasick |
| `pattern` / `patterns` | string or list | `[]` | Regex patterns (see note below) |
| `priority` | integer | `110` | Higher = checked first. The custom-rule default (110) outranks the built-ins at priority 100 (`cc-feat`/`cc-fix`) but not `cc-revert`, which peaks at 115 |
| `confidence` | float 0–1 | `0.85` | Score attached to the verdict; only accepted if ≥ `confidence_threshold` |

**`pattern:` vs `patterns:`** (both key names are accepted):

```yaml
# Singular string (most natural for a single regex)
pattern: "(?i)^feat[:(]"

# Plural list (multiple alternates for one rule)
patterns:
  - "(?i)^feat[:(]"
  - "(?i)^feature[:(]"
```

Using `pattern:` with a single string is exactly equivalent to `patterns:` with a single-element list. Both forms were silently mishandled in versions ≤ 1.2.0 (issue #259); if you had rules that never fired, this was the cause.

The top-level `RuleSet` also has two optional fields:

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `version` | string | `null` | Schema version tag (informational) |
| `extend_defaults` | bool | `false` | When `true`, custom rules are merged with the built-in set; same-`id` entries override the built-in |
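To make the defaults in the tables above concrete, here is a small Python sketch that normalizes rule entries as they might look after YAML parsing. It is an illustration under assumptions, not tga's loader: `normalize_rule` and `load_order` are hypothetical names, and both the handling of a rule that supplies `pattern` and `patterns` together and the tie-breaking by file order are guesses.

```python
def normalize_rule(raw: dict) -> dict:
    """Apply the documented defaults and merge `pattern` into `patterns`."""
    patterns = list(raw.get("patterns", []))
    single = raw.get("pattern")
    if isinstance(single, str):
        patterns.insert(0, single)
    elif isinstance(single, list):
        patterns = list(single) + patterns
    return {
        "id": raw["id"],              # required
        "category": raw["category"],  # required
        "subcategory": raw.get("subcategory"),
        "keywords": list(raw.get("keywords", [])),
        "patterns": patterns,
        "priority": raw.get("priority", 110),
        "confidence": raw.get("confidence", 0.85),
    }


def load_order(raw_rules: list) -> list:
    """Higher priority is checked first; ties keep file order (an assumption)."""
    rules = [normalize_rule(r) for r in raw_rules]
    return sorted(rules, key=lambda r: -r["priority"])
```

A rule written with only `id`, `category`, and `keywords` therefore ends up at priority 110 and confidence 0.85, ahead of any rule that sets an explicit lower priority.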

### Weighted-Sum Tier 2.5 (new in 1.3.0)

The weighted-sum tier is a lightweight ensemble that sits between the regex tier and the fuzzy tier. It was introduced in 1.3.0 (issue #270) after evaluating fuzzy-logic crates; none of them provided sufficient recall improvement over the stepped keyword approach at an acceptable binary-size cost.

#### How it works

Five signals are evaluated independently and their scores are summed per category. The category with the highest total score wins, provided the score meets the `min_confidence` threshold (default 0.55). Confidence is clamped to `[min_confidence, 0.95]`.

| Signal | How scored |
|--------|------------|
| Keyword density | 0 matches → 0.0; 1 match → 0.40; 2 matches → 0.60; 3+ matches → 0.75 |
| Ticket prefix | Uniform +0.05 when a `PROJ-123` style prefix is detected |
| Message length | Short (<12 chars): boosts KTLO/Merge (+0.10), Maintenance (+0.05); suppresses Feature (−0.05). Long (>80 chars): boosts Feature/PlatformWork (+0.10), Maintenance/Bugfix (+0.05). |
| Merge indicator | Dedicated weight vector fires when `is_merge=true` or message starts with "merge…" |
| File paths | Test files → Maintenance/Bugfix signal; docs/markdown → Content signal; manifests → KTLO/Maintenance signal |

If two or more categories share the exact same top score, the tier returns no verdict (tie-breaking falls through to fuzzy or LLM).

The tier is active even when `extend_defaults: false`. Unlike the fuzzy tier, which emits fixed built-in category strings (`merge`, `feature`, `chore`), the weighted-sum tier picks its verdict dynamically by signal composition and does not embed built-in category knowledge directly.
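The scoring rules described above (stepped keyword density, per-category summation, no verdict on an exact tie, a minimum threshold, and the 0.95 upper clamp) can be sketched as below. This is a simplified model with hypothetical function names; the real per-signal weight vectors are not documented here, so the caller supplies each signal's scores as a plain dictionary.

```python
def keyword_density_score(matches: int) -> float:
    """Stepped score from the signal table: 0, 1, 2, 3+ keyword matches."""
    if matches <= 0:
        return 0.0
    if matches == 1:
        return 0.40
    if matches == 2:
        return 0.60
    return 0.75


def weighted_sum_verdict(signal_scores, min_confidence=0.55):
    """signal_scores: list of {category: score} dicts, one per signal.

    Returns (category, confidence), or None when no category qualifies.
    """
    totals = {}
    for scores in signal_scores:
        for category, score in scores.items():
            totals[category] = totals.get(category, 0.0) + score
    if not totals:
        return None
    best = max(totals.values())
    winners = [c for c, s in totals.items() if s == best]
    if len(winners) > 1:        # exact tie: no verdict, fall through
        return None
    if best < min_confidence:   # below threshold: no verdict
        return None
    return winners[0], min(best, 0.95)   # upper clamp at 0.95
```

For example, two keyword matches (0.60) plus the uniform ticket-prefix bonus (0.05) plus a long-message boost for Feature (0.10) total 0.75, which clears the 0.55 threshold; a single keyword match on its own (0.40) does not.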

#### Configuration

```yaml
classification:
  weighted_sum:
    enabled: true         # default: true. Set false to disable this tier entirely.
    min_confidence: 0.55  # default: 0.55. Minimum score to emit a verdict.
```

To disable just this tier without touching anything else:

```yaml
classification:
  weighted_sum:
    enabled: false
```

### LLM Fallback Threshold Migration

**Breaking change in 1.3.0**: `classification.llm_fallback_threshold` default raised from `0.0` to `0.65`.

At `0.0` (the old default), the LLM fallback would only fire when every deterministic tier returned confidence exactly `0.0`, which is effectively never. This made `use_llm: true` a no-op for most real-world commit messages. The new default of `0.65` routes any deterministic verdict below 0.65 to the LLM when `use_llm: true`.

**Affected setups**: configs that set `use_llm: true` (or `--use-llm`) and relied on the LLM *never* re-classifying commits that the deterministic tiers had already classified with confidence below 0.65. In practice this affects configs where the fuzzy tier's outputs (0.40 for short chore messages, 0.60 for bare ticket refs) were intentionally accepted as final.

**Migration options**:

Option 1 (restore the old behavior explicitly):

```yaml
classification:
  llm_fallback_threshold: 0.0   # pin to old default
```

Option 2 (accept the new default and tune downward if the LLM fires too often):

```yaml
classification:
  llm_fallback_threshold: 0.50  # only route verdicts below 0.50 to the LLM
```

**Users with `use_llm: false`** (the default) are not affected by this change.
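The effect of the threshold can be pictured with a one-line predicate. This is a sketch of the behavior described in this section, not tga's code: `routes_to_llm` is a hypothetical name, and treating the comparison as inclusive (so that a `0.0` threshold still catches a verdict of exactly `0.0`) is an assumption about the boundary case.

```python
def routes_to_llm(confidence: float, threshold: float = 0.65, use_llm: bool = False) -> bool:
    """True when a deterministic verdict would be handed to the LLM tier."""
    if not use_llm:
        return False
    # Assumption: the comparison is inclusive, so a 0.0 threshold still
    # catches verdicts with confidence exactly 0.0.
    return confidence <= threshold
```

Under this model, a fuzzy-tier verdict at 0.60 is re-classified with the new default but left alone when the threshold is pinned to `0.0`, and raising the threshold always sends more verdicts to the LLM, never fewer.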

### Rule Priority and extend_defaults (issue #259)

Custom rules default to **priority 110**, one step above the standard built-in rule priority (100). This means user rules win over the default ruleset without needing an explicit `priority:` entry in every rule.

Custom rule files also default to **standalone** mode (`extend_defaults: false`): only the rules in the file are applied. Opt in to merging with the built-in defaults by adding `extend_defaults: true`.

**Fuzzy-tier gating (1.2.2)**: the fuzzy heuristic tier (which emits the built-in category strings `merge`, `feature`, `chore`) is suppressed when `extend_defaults: false`. This aligns with the principle that `extend_defaults: false` means "no built-in hardcoded classification." If you see `method=fuzzy_match` rows for a config with `extend_defaults: false`, upgrade to 1.2.2.

**Weighted-sum tier and extend_defaults (1.3.0)**: unlike the fuzzy tier, the weighted-sum tier (Tier 2.5) remains active even with `extend_defaults: false`. It picks its verdict via signal composition rather than emitting fixed built-in strings, so disabling it requires `classification.weighted_sum.enabled: false`.

```yaml
# my-rules.yaml: standalone by default (no built-in rules loaded)
extend_defaults: false   # optional: explicitly document the default
rules:
  - id: my-deploy
    category: deployment
    keywords: ["deploy:", "release:"]
    # priority defaults to 110, which beats the built-in cc-feat (100)
```

```yaml
# my-addons.yaml: augments the defaults
extend_defaults: true
rules:
  - id: my-payments
    category: payments
    keywords: ["payment:", "billing:"]
    confidence: 0.92
```

**Diagnosing rules that never fire**: if your custom categories never appear in reports, run:

```bash
tga rules list --rules ./my-rules.yaml   # shows all loaded rules with their matchers
tga rules test "feat: add login" --rules ./my-rules.yaml   # dry-run a message
tga classify -vv --rules ./my-rules.yaml   # verbose mode shows per-rule decisions
```

A `WARN rule has no keywords or patterns` in the log output means a rule's YAML key was silently dropped; the most common cause is writing `pattern:` (singular) in a version before 1.2.1. Upgrade to 1.2.1+, where both `pattern:` and `patterns:` are accepted.

See also: [Multi-source classification](#multi-source-classification-issue-260) for external ticket system integration.

### Multi-source classification (issue #260)

`tga classify` can consult external ticket systems (JIRA Cloud/Server and
GitHub Issues) as high-confidence classification signals **before** the
commit-message rule tiers run.

External sources fire as **Tier 0.5** (after manual overrides, before
commit-message rules) with a fixed confidence of `0.92`. They are enabled
by default when configured; use `--no-external` to opt out for a run.

**Priority model** (highest to lowest):

1. Manual overrides (`tga override add`, confidence 1.0)
2. External sources: JIRA issue type / GitHub Issues labels (confidence 0.92)
3. Custom commit-message rules (default priority 110)
4. Built-in TGA commit-message rules (priority 100)
5. LLM fallback (when `use_llm: true`)

**Configuration**: add a `sources:` list under `classification:` in
`config.yaml`:

```yaml
classification:
  no_external: false    # opt-out flag; sources fire by default once configured
  sources:

    # --- JIRA source ---
    - type: jira
      base_url: "https://yourco.atlassian.net"
      token_env: JIRA_API_TOKEN      # name of the env var holding your API token
      email_env: JIRA_USER_EMAIL     # optional; only needed for JIRA Cloud Basic auth
      project_keys: ["PROJ", "ENG"]  # empty list = match all projects
      field_mappings:
        # Maps JIRA issue type → tga category. Case-sensitive on the JIRA side.
        issue_type:
          Bug: bug_fix
          Story: new_feature
          Task: tech_debt_refactoring
          Sub-task: tech_debt_refactoring
          Epic: new_feature
        # Maps JIRA labels → tga category. Labels are case-sensitive.
        labels:
          ktlo: tech_debt_refactoring
          security: security
        # Maps JIRA components → tga category.
        components:
          "CI/CD": devops
          Platform: platform_infrastructure

    # --- GitHub Issues source ---
    - type: github_issues
      repo: "acme/widgets"           # "owner/repo" slug
      token_env: GITHUB_TOKEN        # name of the env var holding your PAT
      label_mappings:
        bug: bug_fix
        enhancement: new_feature
        dependencies: tech_debt_refactoring
        documentation: documentation
        security: security
```

**Field-mapping precedence within a JIRA source** (highest to lowest):
`issue_type` → `labels` → `components`. The first mapping that matches wins.
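The precedence rule can be sketched as an ordered lookup. The issue dictionary shape used here (a string `issue_type` plus lists of `labels` and `components`) and the name `map_issue` are hypothetical, chosen only to show the first-match-wins ordering.

```python
FIELD_ORDER = ("issue_type", "labels", "components")


def map_issue(issue: dict, field_mappings: dict):
    """Return the category from the first field whose mapping matches, else None."""
    for field in FIELD_ORDER:
        mapping = field_mappings.get(field, {})
        values = issue.get(field, [])
        if isinstance(values, str):      # issue_type is a single string
            values = [values]
        for value in values:
            if value in mapping:         # exact, case-sensitive lookup
                return mapping[value]
    return None
```

With the example config, an issue of type `Bug` that also carries the `security` label maps to `bug_fix`, because `issue_type` is consulted before `labels`.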

**Credential handling**: tokens are read from the **named environment
variables** at runtime; never store them directly in config files:

```bash
export JIRA_API_TOKEN="your-jira-api-token"
export JIRA_USER_EMAIL="you@yourco.com"   # JIRA Cloud only
export GITHUB_TOKEN="your-github-pat"
tga classify --config config.yaml
```

**Failure mode**: a missing token, HTTP 401/404, or network error causes the
source to be skipped with a `WARN` log and the pipeline falls through to
commit-message rules for affected commits. Classification never panics on
source failures.

**Caching**: each unique ticket is fetched at most once per `tga classify`
run (in-memory dedup, never persisted to disk). A 15 000-commit run against
a project with 500 unique JIRA tickets makes 500 API calls, not 15 000.
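A minimal sketch of this kind of per-run dedup, assuming a hypothetical `fetch` callable that stands in for the API request (the class name `TicketCache` is also illustrative, not part of tga):

```python
class TicketCache:
    """In-memory dedup: each unique ticket key is fetched at most once."""

    def __init__(self, fetch):
        self._fetch = fetch      # hypothetical callable: key -> ticket data
        self._seen = {}
        self.calls = 0           # number of real fetches performed

    def get(self, key):
        if key not in self._seen:
            self.calls += 1
            self._seen[key] = self._fetch(key)
        return self._seen[key]
```

Because the cache lives only in the object, it disappears when the run ends, which matches the "never persisted to disk" behavior described above.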

**Disabling external sources**: use `--no-external` to skip all sources for
a run (useful in CI or offline environments):

```bash
tga classify --no-external
```

See [`examples/multi-source-config.yaml`](examples/multi-source-config.yaml)
for a fully annotated example.

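Additional source types (new in 1.4.0) go in the same `sources:` list:

```yaml
classification:
  sources:
    # --- Linear source (new in 1.4.0) ---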
    - type: linear
      api_key_env: LINEAR_API_KEY         # Personal API Key (not OAuth)
      team_keys: ["ENG", "INFRA"]         # Empty = match all Linear teams.
                                          # Use this to disambiguate from JIRA keys
                                          # when both follow the PROJ-NNN pattern.
      field_mappings:
        issue_type:
          Bug: bug_fix
          Feature: new_feature
          Improvement: tech_debt_refactoring
        labels:
          security: security
          ktlo: tech_debt_refactoring
        cycle:
          "Sprint 42": new_feature        # active cycle/sprint → category

    # --- Shortcut source (new in 1.4.0) ---
    - type: shortcut
      api_token_env: SHORTCUT_API_TOKEN
      field_mappings:
        story_type:
          bug: bug_fix
          feature: new_feature
          chore: tech_debt_refactoring
        labels:
          security: security
        workflow_state:
          "In Progress": new_feature      # state name → category

    # --- Confluence source (new in 1.4.0) ---
    # Note: Confluence confidence is fixed at 0.80 (lower than the default 0.92).
    # References are extracted from /wiki/spaces/.../pages/<id> URLs and
    # Smart Commit tags ([CONF-<id>] in commit messages).
    - type: confluence
      base_url: "https://yourco.atlassian.net"
      token_env: CONFLUENCE_API_TOKEN
      email_env: CONFLUENCE_EMAIL         # Confluence Cloud Basic auth email
      label_mappings:
        runbook: devops
        architecture: tech_debt_refactoring
        security: security

    # --- Datadog source (new in 1.4.0) ---
    # Matches commit SHAs (7–40 hex chars) in commit messages against the
    # Datadog Events API deployment events. Confidence is fixed at 0.95.
    - type: datadog
      api_key_env: DD_API_KEY
      app_key_env: DD_APP_KEY
      site: datadoghq.com                 # optional, default "datadoghq.com"
      default_category: devops            # category written when a deployment
                                          # event is found (default: "devops")
      lookback_days: 30                   # how far back to search events (default: 30)
```

**Linear key disambiguation**: Linear keys and JIRA keys share the same
`PROJ-NNN` shape. Set `team_keys` on the Linear source to the exact team
prefixes you use; keys that do not match any configured prefix are silently
skipped by the Linear source (and may still be picked up by JIRA if that
source is also configured).

**Shortcut reference formats**: two forms are recognized automatically:
- `[ch1234]`: bracket form (in commit message body or subject)
- `sc-1234`: sc-prefix form (common in branch names carried into messages)

**Confluence reference formats**: two forms are recognized:
- `/wiki/spaces/MYSPACE/pages/12345`: full URL (page ID extracted)
- `[CONF-12345]`: Smart Commit tag (page ID after the hyphen)
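The four reference shapes listed above can be approximated with regular expressions. The patterns below are a best-effort reading of those examples, not the expressions tga ships; case sensitivity and the helper name `extract_ids` are assumptions.

```python
import re

SHORTCUT_RES = (
    re.compile(r"\[ch(\d+)\]"),        # bracket form: [ch1234]
    re.compile(r"\bsc-(\d+)\b"),       # sc-prefix form: sc-1234
)
CONFLUENCE_RES = (
    re.compile(r"/wiki/spaces/[^/\s]+/pages/(\d+)"),  # page URL
    re.compile(r"\[CONF-(\d+)\]"),                    # Smart Commit tag
)


def extract_ids(message: str, patterns) -> list:
    """Collect numeric IDs matched by any pattern, without duplicates."""
    ids = []
    for pattern in patterns:
        for found in pattern.findall(message):
            if found not in ids:
                ids.append(found)
    return ids
```

A message that mentions the same story in both Shortcut forms yields a single ID, so a downstream lookup would be issued once per story rather than once per mention.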

**Credential handling for new sources**:

```bash
export LINEAR_API_KEY="lin_api_..."  # pragma: allowlist secret
export SHORTCUT_API_TOKEN="..."      # pragma: allowlist secret
export CONFLUENCE_API_TOKEN="..."    # pragma: allowlist secret
export CONFLUENCE_EMAIL="you@yourco.com"
export DD_API_KEY="..."              # pragma: allowlist secret
export DD_APP_KEY="..."              # pragma: allowlist secret
tga classify --config config.yaml
```

See [`examples/multi-source-config.yaml`](examples/multi-source-config.yaml)
for a fully annotated example covering all six sources.

See also: [Rules YAML schema](#rules-yaml-schema) for commit-message rule authoring.

## Operational Notes

### Fixing `on_default_branch = 0` for existing databases (1.4.1, issue #290)

If your `tga.db` was built before 1.4.1, every row in `fact_commit_reachability`
has `on_default_branch = 0` because the column was never populated. You can fix
this **without re-running the full collection pipeline** (which can take 20+
minutes on large corpora):

```bash
# Verify the problem (sum should be 0 on a pre-1.4.1 DB)
sqlite3 tga.db "SELECT SUM(on_default_branch), COUNT(*) FROM fact_commit_reachability;"

# Fix in-place: upserts all five reachability columns, no row deletion
tga backfill reachability

# Verify the fix (sum should now be > 0 for repos with main/master)
sqlite3 tga.db "SELECT SUM(on_default_branch), SUM(on_any_tag), COUNT(*) FROM fact_commit_reachability;"
sqlite3 tga.db "SELECT repo, SUM(on_default_branch) FROM fact_commit_reachability GROUP BY repo LIMIT 10;"
```

The backfill is **idempotent**: running it multiple times produces identical
results and never deletes rows. Only the five reachability columns are touched.

**Auto-detection logic:** for each repository, tga tries (in order):
1. `refs/remotes/origin/HEAD` (symref set by `git clone`; points at whatever
   the remote considers its default branch)
2. `refs/heads/main`
3. `refs/heads/master`
4. `refs/remotes/origin/main`
5. `refs/remotes/origin/master`

If none of the above resolve to a commit, a `WARN` is logged and
`on_default_branch` remains 0 for that repo.
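The fallback order above amounts to "first candidate that resolves wins". A minimal Python sketch, assuming the caller has already collected the refs that exist in the repository into a dictionary (the function name and input shape are hypothetical):

```python
CANDIDATE_REFS = (
    "refs/remotes/origin/HEAD",
    "refs/heads/main",
    "refs/heads/master",
    "refs/remotes/origin/main",
    "refs/remotes/origin/master",
)


def detect_default_ref(resolved: dict):
    """resolved maps ref name -> commit SHA for refs that exist in the repo.

    Returns the first candidate that resolves, or None (the caller would log
    a warning and leave on_default_branch at 0).
    """
    for ref in CANDIDATE_REFS:
        if resolved.get(ref):
            return ref
    return None
```

A repository with only `refs/heads/master` and `refs/remotes/origin/main` resolves to `refs/heads/master`, since local branches are tried before remote-tracking ones.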

### SQLite WAL durability (fixed in 1.4.0, issue #298)

`tga` opens its database in WAL (Write-Ahead Log) mode for concurrent-read
performance. Prior to 1.4.0, the WAL was never explicitly checkpointed on
exit, so the main `.db` file could lag behind the `-wal` sidecar. On a large
corpus this meant up to 22 minutes of classification work was held only in the
WAL and could be lost if the file was copied or the process was killed.

**1.4.0 fix**: `tga classify` now calls `PRAGMA wal_checkpoint(TRUNCATE)` on
clean exit, flushing all WAL frames into the main database file and zeroing
the WAL. In addition, a periodic `PRAGMA wal_checkpoint(PASSIVE)` fires every
`classification.checkpoint_every` commits (default 0 = disabled) to cap the
data-loss window during long runs.

```yaml
classification:
  checkpoint_every: 1000  # PASSIVE checkpoint every 1000 classified commits
                          # (0 = off, which is the default)
```

The checkpoint is non-fatal: if it fails (e.g., `SQLITE_CORRUPT`), `tga`
logs a `WARN` and continues. You can manually flush at any time:

```bash
sqlite3 tga.db 'PRAGMA wal_checkpoint(TRUNCATE)'
```
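The same manual flush can be scripted, for instance before copying the database file. The sketch below uses Python's standard `sqlite3` module; `checkpoint_truncate` and `demo` are illustrative helpers, and the demo works on a throwaway database rather than a real `tga.db`. It does not model the periodic `checkpoint_every` behavior.

```python
import os
import sqlite3
import tempfile


def checkpoint_truncate(db_path: str):
    """Run a TRUNCATE checkpoint and return SQLite's (busy, log, checkpointed) row."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
    finally:
        conn.close()


def demo():
    """Write to a temporary WAL-mode database, then checkpoint it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        writer = sqlite3.connect(path)
        try:
            writer.execute("PRAGMA journal_mode=WAL").fetchone()
            writer.execute("CREATE TABLE t (x INTEGER)")
            writer.execute("INSERT INTO t VALUES (1)")
            writer.commit()
            wal_before = os.path.getsize(path + "-wal")   # frames still in the WAL
            busy, _log, _ckpt = checkpoint_truncate(path)
            wal_after = os.path.getsize(path + "-wal")    # truncated on success
        finally:
            writer.close()
        return busy, wal_before, wal_after
```

A `busy` value of 0 in the returned row indicates the checkpoint was not blocked by another connection; a non-zero value means some frames may remain in the WAL and the flush should be retried.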

## Output Formats

### CSV

Nine CSV files are written when `csv` is in the format list (per-author,
weekly activity, DORA metrics, velocity, quality, and related breakdowns).
The two most commonly used are documented below.

**`authors.csv`** (one row per author):

| Column | Description |
|--------|-------------|
| `name` | Canonical author name |
| `email` | Canonical author email |
| `commit_count` | Total commits |
| `insertions` | Total lines added |
| `deletions` | Total lines deleted |
| `files_changed` | Total files changed |
| `first_commit` | ISO 8601 timestamp of earliest commit |
| `last_commit` | ISO 8601 timestamp of most recent commit |

**`weekly_activity.csv`** (one row per week/author/repository bucket):

| Column | Description |
|--------|-------------|
| `week` | ISO week label, e.g. `2024-W03` |
| `author` | Author name |
| `repository` | Repository name |
| `commit_count` | Commits in this bucket |
| `insertions` | Lines added in this bucket |
| `deletions` | Lines deleted in this bucket |
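Because each row is a week/author/repository bucket, per-week totals need a small aggregation step. A minimal Python sketch, assuming the file has a header row whose names match the column table above (the function name is hypothetical):

```python
import csv
import io
from collections import Counter


def commits_per_week(csv_text: str) -> Counter:
    """Sum `commit_count` across author/repository buckets for each ISO week."""
    totals = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["week"]] += int(row["commit_count"])
    return totals
```

To run it against real output, read the file contents first, for example `commits_per_week(open("weekly_activity.csv").read())`.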

### JSON

Four JSON files are written when `json` is in the format list:
`report.json`, `velocity_summary.json`, `quality_summary.json`, and
`dora_summary.json`. The primary one is documented below.

**`report.json`** (full structured payload):

```json
{
  "generated_at": "2024-03-15T10:00:00Z",
  "period_start": "2024-01-01T00:00:00Z",
  "period_end":   "2024-03-14T23:59:59Z",
  "total_commits": 347,
  "total_authors": 8,
  "category_breakdown": { "feature": 120, "bugfix": 45, ... },
  "authors": [
    {
      "name": "Alice Smith",
      "email": "alice@company.com",
      "commit_count": 87,
      "insertions": 4200,
      "deletions": 1100,
      "files_changed": 310,
      "categories": { "feature": 50, "bugfix": 20, ... },
      "first_commit": "...",
      "last_commit": "..."
    }
  ],
  "repositories": [
    {
      "name": "my-project",
      "commit_count": 347,
      "author_count": 8,
      "insertions": 18000,
      "deletions": 6000,
      "top_categories": [["feature", 120], ["bugfix", 45]]
    }
  ],
  "weekly_activity": [
    {
      "week": "2024-W03",
      "author": "Alice Smith",
      "repository": "my-project",
      "commit_count": 12,
      "insertions": 500,
      "deletions": 120,
      "categories": { "feature": 8, "bugfix": 4 }
    }
  ]
}
```
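As a small consumption example, the Python sketch below computes each author's share of total commits from a parsed report. It relies only on the `total_commits`, `authors`, `name`, and `commit_count` fields shown above; the helper names are hypothetical, and the sample payload above contains `...` placeholders, so it is not itself valid JSON.

```python
import json


def author_share(report: dict) -> dict:
    """Map each author's name to their percentage of `total_commits`."""
    total = report["total_commits"]
    if total == 0:
        return {}
    return {
        a["name"]: round(100.0 * a["commit_count"] / total, 1)
        for a in report["authors"]
    }


def load_report(path: str) -> dict:
    """Read a report.json file from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

With the figures from the sample (87 of 347 commits), `author_share` reports 25.1 for Alice Smith.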

### Markdown

**`report.md`**: a narrative report containing a summary header, per-author commit table, category breakdown, and weekly activity section. Suitable for pasting into Confluence or a PR description.

## Development

### Build and Test

```bash
# Build everything
cargo build

# Build release binary
cargo build --release

# Run all tests
cargo test

# Lint (zero warnings required)
cargo clippy -- -D warnings

# Format check (CI gate)
cargo fmt --check

# Auto-format
cargo fmt

# Generate rustdoc
cargo doc --open
```

### Running Against Real Repos

`configs/example-config.yaml` is a working example that analyzes repositories using `developer_aliases`. Copy it, adjust paths and names to match your setup, then run:

```bash
tga analyze --config configs/example-config.yaml --database tga.db
```

### CI Gates

The GitHub Actions workflow (`ci.yml`) requires:
- `cargo fmt -- --check`
- `cargo clippy --all-targets -- -D warnings`
- `cargo test`
- `cargo doc --no-deps` with `RUSTDOCFLAGS="-D warnings"`

## Crate Structure

Single `tga` crate (consolidated from the original 5-crate workspace):

| Module | Path | Purpose |
|--------|------|---------|
| `tga::core` | `src/core/` | Shared types, config, DB schema, migrations, error types |
| `tga::collect` | `src/collect/` | Stage 1: git extraction (libgit2), GitHub/JIRA/Linear/ADO clients, `PmAdapter` trait |
| `tga::classify` | `src/classify/` | Stage 2: seven-tier classification cascade |
| `tga::report` | `src/report/` | Stage 3: CSV/JSON/Markdown output |
| `commands` (binary-private) | `src/commands/` | Subcommand handlers wired into `src/main.rs` |

## Performance

Benchmarked against the Duetto production corpus (72,608 commits, 20 JIRA projects) on
Apple M4 Max (128 GB RAM) with tga 1.3.0:

| Metric | Value |
|--------|-------|
| CPU classify throughput (rule cascade only) | ~113,000 commits/sec |
| Coverage, rule tiers only (`--no-external`) | 64.3% of 72,608 commits |
| Coverage, with JIRA external source | 67.7% of 72,608 commits |
| Peak RSS (full classify pass) | 235 MB |
| weighted_sum tier contribution (new in 1.3.0) | 9.6–11.0% of classified commits |

JIRA external-source classification adds 3.4 percentage points of coverage at the cost of
~23 minutes of network-bound JIRA API time (the rule-cascade itself completes in 0.64s on
this corpus). Use `--no-external` for fast iteration on rule files.
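The figures above are mutually consistent, as a quick back-of-the-envelope check in Python shows. The derived commit count is only an estimate, since it is computed from the rounded percentages in the table.

```python
COMMITS = 72_608
THROUGHPUT = 113_000            # commits/sec, rule cascade only

# Time for the rule cascade alone: about 0.64 s.
cascade_seconds = COMMITS / THROUGHPUT

# Coverage gained from the JIRA external source: about 3.4 points.
coverage_gain_points = 67.7 - 64.3

# Approximate number of additional commits classified thanks to JIRA.
extra_commits = round(COMMITS * coverage_gain_points / 100)
```

In other words, the roughly 23 minutes of JIRA API time buys classification for on the order of 2,500 additional commits on this corpus.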

Full benchmark: [`docs/trusty-git-analytics/regression-testing/v1.3.0-2026-05-27.md`](
../../docs/trusty-git-analytics/regression-testing/v1.3.0-2026-05-27.md)

## License

MIT. See [LICENSE](LICENSE).