whogitit 0.1.0

Track AI-generated code at line-level granularity
Documentation
# Privacy & Redaction

whogitit automatically redacts sensitive information from prompts before storing them in git notes. This protects credentials, personal information, and other secrets that might accidentally appear in your prompts.

## How Redaction Works

When a prompt is captured, whogitit scans it against a set of patterns and replaces matches with `[REDACTED]`:

**Before:**
```
Add authentication using api_key = sk-12345abcdef
```

**After (stored):**
```
Add authentication using api_key = [REDACTED]
```

## Built-in Patterns

whogitit includes patterns for common sensitive data:

### Credentials

| Pattern | Examples |
|---------|----------|
| **API_KEY** | `api_key=xxx`, `apikey: xxx`, `secret=xxx` |
| **AWS_ACCESS_KEY** | `AKIA...` (20 chars) |
| **AWS_SECRET_KEY** | 40-character base64 strings |
| **BEARER_TOKEN** | `Bearer eyJ...`, `Authorization: Bearer` |
| **GITHUB_TOKEN** | `ghp_xxx`, `gho_xxx`, `ghs_xxx`, `ghr_xxx` |
| **GOOGLE_API_KEY** | `AIza...` |
| **SLACK_TOKEN** | `xoxb-xxx`, `xoxp-xxx` |
| **JWT** | `eyJ...` JSON Web Tokens |
| **PASSWORD** | `password=xxx`, `passwd: xxx` |
| **PRIVATE_KEY** | `-----BEGIN.*PRIVATE KEY-----` blocks |

### Personal Information

| Pattern | Examples |
|---------|----------|
| **EMAIL** | `user@example.com` |
| **PHONE** | `(555) 123-4567`, `+1-555-123-4567` |
| **SSN** | `123-45-6789` |
| **CREDIT_CARD** | `4111-1111-1111-1111` |

## Testing Redaction

Use `redact-test` to see how your text would be redacted:

```bash
whogitit redact-test "Connect using api_key=sk-secret123 and email user@example.com"
```

Output:

```
Original:
  Connect using api_key=sk-secret123 and email user@example.com

Redacted:
  Connect using api_key=[REDACTED] and email [REDACTED]

Patterns matched:
  - API_KEY (1 match)
  - EMAIL (1 match)
```

### Testing Files

```bash
whogitit redact-test --file config.example.txt
```

## Custom Patterns

Add organization-specific patterns in `.whogitit.toml`:

```toml
[privacy]
# Custom patterns
[[privacy.custom_patterns]]
name = "INTERNAL_ID"
pattern = "INT-[A-Z0-9]{8}"
description = "Internal system IDs"

[[privacy.custom_patterns]]
name = "EMPLOYEE_ID"
pattern = "EMP\\d{6}"
description = "Employee identification numbers"

[[privacy.custom_patterns]]
name = "DATABASE_URL"
pattern = "postgres://[^\\s]+"
description = "PostgreSQL connection strings"
```

### Pattern Syntax

Patterns use Rust regex syntax (similar to PCRE):

| Syntax | Meaning |
|--------|---------|
| `\d` | Digit |
| `\w` | Word character |
| `\s` | Whitespace |
| `[A-Z]` | Character class |
| `+` | One or more |
| `*` | Zero or more |
| `{n}` | Exactly n times |
| `{n,m}` | Between n and m times |
| `(?i)` | Case insensitive |

### Testing Custom Patterns

After adding patterns, verify they work:

```bash
whogitit redact-test "Reference INT-ABC12345 for employee EMP123456"
```

## Disabling Patterns

If a built-in pattern is too aggressive, disable it:

```toml
[privacy]
disabled_patterns = ["EMAIL", "PHONE"]
```

Common reasons to disable:

| Pattern | Why Disable |
|---------|-------------|
| EMAIL | Open source projects where contributor emails are public |
| PHONE | False positives with version numbers or IDs |

## Disabling Redaction

To disable redaction entirely:

```toml
[privacy]
enabled = false
```

> **Warning**: Disabling redaction may expose sensitive data in your git history. Only do this if you're certain no sensitive data will appear in prompts.

## Audit Trail

When audit logging is enabled, redaction events are recorded:

```toml
[privacy]
audit_log = true
```

View redaction events:

```bash
whogitit audit --event-type redaction
```

Output:

```
2026-01-30 14:23:15 redaction pattern:API_KEY redactions:2
2026-01-30 14:20:00 redaction pattern:EMAIL redactions:1
```

## Best Practices

### 1. Test Before Production

Before enabling whogitit on a project, test redaction with representative prompts:

```bash
# Test various scenarios
whogitit redact-test "Your typical prompt with api_key=xxx"
```

### 2. Add Organization Patterns

Identify internal secret formats and add patterns:

```toml
[[privacy.custom_patterns]]
name = "ACME_API_KEY"
pattern = "acme_[a-f0-9]{32}"
description = "ACME Corp API keys"
```

### 3. Review Periodically

Check the audit log for missed patterns:

```bash
whogitit audit --event-type redaction --limit 100
```

If certain patterns are matching frequently, they're working. If sensitive data is getting through, add new patterns.

### 4. Use Environment Variables

Instead of including secrets in prompts, reference environment variables:

**Instead of:**
```
Connect to postgres://user:password@host/db
```

**Use:**
```
Connect using the DATABASE_URL environment variable
```

### 5. Enable Audit Logging

For compliance-sensitive projects:

```toml
[privacy]
audit_log = true
```

## Limitations

- Redaction is pattern-based and cannot catch everything
- Novel secret formats may not be detected
- Context-dependent secrets (e.g., "the password is banana") are not detected
- Redaction is one-way - original text cannot be recovered

## See Also

- [Configuration]./configuration.md - Full configuration reference
- [audit]./commands/audit.md - Viewing redaction audit trail