# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Build & Test Commands
```bash
# Build
cargo build
cargo build --release
# Run all tests
cargo test --verbose
# Run only unit tests
cargo test --lib
# Run only integration tests
cargo test --test '*'
# Run a single test by name
cargo test test_full_workflow -- --nocapture
# Lint and format
cargo fmt --all -- --check
cargo clippy --all-targets --all-features -- -D warnings
# Security audit
cargo audit
```
## CLI Usage
```bash
# Run locally during development
cargo run -- <command>
# Setup and configuration
cargo run -- setup # One-time global setup (Claude Code integration)
cargo run -- doctor # Verify configuration
cargo run -- init # Initialize repository hooks
# Core attribution commands
cargo run -- blame src/main.rs
cargo run -- blame src/main.rs --ai-only
cargo run -- show HEAD
cargo run -- show HEAD --format json
cargo run -- prompt src/main.rs:42
cargo run -- summary --base main --format markdown
# Status and utility commands
cargo run -- status # Show pending changes
cargo run -- clear # Discard pending changes
# Data export and management
cargo run -- export --format json
cargo run -- export --since 2024-01-01 --until 2024-12-31
cargo run -- retention preview
cargo run -- retention apply --execute
cargo run -- audit --limit 100
# Developer integration (GitHub, git)
cargo run -- annotations --base main --head HEAD
cargo run -- annotations --base main --min-ai-lines 5 --sort-by coverage
cargo run -- annotations --base main --diff-only --group-ai-types
cargo run -- pager # Read diff from stdin
# Privacy testing
cargo run -- redact-test --text "api_key=secret"
cargo run -- redact-test --list-patterns
# Copy attribution (after cherry-pick or recovery)
cargo run -- copy-notes <old-sha> <new-sha>
cargo run -- copy-notes abc123 def456 --dry-run
```
## Architecture Overview
whogitit tracks AI-generated code at line-level granularity using a dual-hook capture system and three-way diff analysis.
### Data Flow
```
Claude Code Session (Edit/Write tools)
│
├─► PreToolUse Hook: saves file state before edit
├─► PostToolUse Hook: captures change + prompt from transcript
│
▼
Pending Buffer (.whogitit-pending.json)
│ - Full content snapshots (original → AI edit 1 → AI edit 2 → ...)
│ - Session metadata, prompts, file histories
│
▼ (git commit triggers post-commit hook)
│
Three-Way Analysis
│ - Compares: original → AI snapshots → final committed
│ - Attributes each line: AI / AIModified / Human / Original
│
▼
Git Notes (refs/notes/whogitit)
- AIAttribution JSON per commit
- Prompts (redacted), line-level attribution, summaries
```
### Key Modules
- **capture/**: Hook handlers and pending buffer
- `hook.rs`: CaptureHook - handles PreToolUse/PostToolUse from Claude Code
- `pending.rs`: PendingBuffer - stores snapshots until commit
- `threeway.rs`: ThreeWayAnalyzer - core attribution algorithm
- `snapshot.rs`: Data structures (ContentSnapshot, AIEdit, FileEditHistory, LineAttribution)
- `diff.rs`: Diff utilities
- **core/**: Attribution data models and blame engine
- `attribution.rs`: AIAttribution, PromptInfo, SessionMetadata, ModelInfo
- `blame.rs`: AIBlamer - combines git blame with AI notes
- **storage/**: Git notes persistence
- `notes.rs`: NotesStore - read/write attribution to `refs/notes/whogitit`
- `trailers.rs`: TrailerGenerator - git trailers from attribution
- `audit.rs`: AuditLog, AuditEvent - compliance event logging
- **cli/**: Command implementations
- `blame.rs`, `show.rs`, `prompt.rs`, `summary.rs` - core attribution commands
- `annotations.rs`: GitHub Checks API annotation generation
- `pager.rs`: Git diff pager with AI attribution markers
- `export.rs`: Bulk attribution export (JSON/CSV)
- `setup.rs`: Global setup, doctor, and init commands
- `retention.rs`: Data retention policy management
- `audit.rs`: Audit log viewing
- `redact.rs`: Redaction pattern testing
- `copy.rs`: Copy attribution between commits
- `output.rs`: Formatting (Pretty, JSON, Markdown)
- **privacy/**: Sensitive data protection
- `redaction.rs`: Redactor - regex patterns for API keys, emails, passwords, etc.
- `config.rs`: WhogititConfig, PrivacyConfig, RetentionConfig - `.whogitit.toml` parsing
### Line Attribution Types
| AI | `●` | Generated by AI, unchanged |
| AIModified | `◐` | Generated by AI, then edited by human |
| Human | `+` | Added by human after AI edits |
| Original | `─` | Existed before AI session |
| Unknown | `?` | Could not determine source |
### Data Formats
**Pending Buffer** (`.whogitit-pending.json`): Version 2 format with full content snapshots per file, edit history chain, and session metadata.
**Git Notes** (`refs/notes/whogitit`): AIAttribution JSON with schema version 2, containing session info, prompts array, and per-file line-level attribution results.
## Code Style
- Rust 2021 edition
- Max line width: 100 characters
- 4-space indentation
- Run `cargo fmt` before committing
## Hook Integration
The shell hook at `hooks/whogitit-capture.sh` (installed to `~/.claude/hooks/`) reads the `transcript_path` JSONL file provided by Claude Code to extract user prompts. It uses `jq -s` (slurp mode) since the transcript is JSON Lines format, not a JSON array.