git-ai 1.1.0

Git AI: Automates commit messages using ChatGPT. Stage your files, and Git AI generates the messages.
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
# Git AI Hook Commit Message Generation Process - Detailed Overview

## Table of Contents

1. [Introduction]#introduction
2. [Architecture Overview]#architecture-overview
3. [Process Flow]#process-flow
4. [Phase-by-Phase Breakdown]#phase-by-phase-breakdown
5. [Multi-Step Analysis System]#multi-step-analysis-system
6. [Implementation Details]#implementation-details
7. [Error Handling & Fallbacks]#error-handling--fallbacks
8. [Performance Optimization]#performance-optimization
9. [Configuration & Customization]#configuration--customization

## Introduction

Git AI is a sophisticated Rust-based CLI tool that automates the generation of high-quality commit messages by analyzing git diffs through a structured, multi-phase process. The system seamlessly integrates with git hooks to intercept the commit process and generate contextually relevant commit messages using AI.

## Architecture Overview

The system consists of several key components:

```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   Git Command   │────▶│  Git AI Hook     │────▶│  Multi-Step     │
│  (git commit)   │     │ (prepare-commit) │     │   Processor     │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                                │                          │
                                ▼                          ▼
                        ┌──────────────────┐     ┌─────────────────┐
                        │   Diff Parser    │     │   AI Analysis   │
                        │   & Processor    │     │   (OpenAI/Local)│
                        └──────────────────┘     └─────────────────┘
```

### Core Components

1. **CLI Interface** (`src/main.rs`)

   - Command-line interface for configuration and hook management
   - Handles installation, uninstallation, and configuration commands

2. **Git Hook Binary** (`src/bin/hook.rs`)

   - Actual git hook executable invoked during commit process
   - Manages the prepare-commit-msg hook lifecycle

3. **Multi-Step Integration** (`src/multi_step_integration.rs`)

   - Orchestrates the multi-phase commit message generation
   - Coordinates between different analysis steps

4. **Multi-Step Analysis** (`src/multi_step_analysis.rs`)

   - Implements file categorization and impact scoring
   - Generates commit message candidates

5. **Function Calling** (`src/function_calling.rs`)
   - Implements OpenAI function calling for structured output
   - Ensures consistent response format

## Process Flow

```mermaid
graph TD
    A[Git Commit Initiated] --> B{Check Arguments}
    B -->|No Message| C[Parse Git Diff]
    B -->|Message Provided| Z[Use Provided Message]

    C --> D[Multi-Step Analysis]
    D --> E[File Analysis]
    E --> F[Impact Scoring]
    F --> G[Message Generation]
    G --> H[Best Candidate Selection]

    H --> I{API Available?}
    I -->|Yes| J[OpenAI Multi-Step]
    I -->|No| K[Local Multi-Step]

    J --> L{Success?}
    K --> M{Success?}

    L -->|Yes| N[Apply Message]
    L -->|No| K
    M -->|Yes| N
    M -->|No| O[Single-Step Fallback]

    O --> P{Success?}
    P -->|Yes| N
    P -->|No| Q[Error]

    N --> R[Complete Commit]
```

## Phase-by-Phase Breakdown

### Phase 1: Initialization & Setup

When `git commit` is executed without a message:

1. **Hook Invocation**

   ```rust
   // src/bin/hook.rs
   struct Args {
       commit_msg_file: PathBuf,  // .git/COMMIT_EDITMSG
       source: Option<Source>,     // message, template, merge, squash, commit
       sha1: Option<String>        // HEAD or specific commit SHA
   }
   ```

2. **Environment Setup**

   - Parse command-line arguments
   - Initialize logging and profiling
   - Load configuration from `~/.config/git-ai/config.toml`
   - Validate OpenAI API key availability

3. **Repository Context**
   - Open git repository from current directory
   - Determine commit type (new, amend, merge, etc.)
   - Check for existing commit messages

### Phase 2: Git Diff Analysis

The system generates and processes the git diff:

```rust
// src/hook.rs
impl PatchRepository for Repository {
    fn to_commit_diff(&self, tree: Option<Tree<'_>>) -> Result<git2::Diff<'_>> {
        // Get diff between tree and index (staged changes only)
        match tree {
            Some(tree) => self.diff_tree_to_index(Some(&tree), None, Some(&mut opts)),
            None => {
                // If no HEAD, compare against empty tree
                let empty_tree = self.find_tree(self.treebuilder(None)?.write()?)?;
                self.diff_tree_to_index(Some(&empty_tree), None, Some(&mut opts))
            }
        }
    }
}
```

**Diff Processing Features:**

- Parallel processing for large diffs
- Token counting and management
- Memory pooling for efficiency
- Chunk-based processing (25 files per chunk)

### Phase 3: AI Processing Strategy

The system employs a sophisticated multi-step approach:

#### Primary Attempt - Multi-Step Approach

```rust
// src/multi_step_integration.rs
pub async fn generate_commit_message_multi_step(
    client: &Client<OpenAIConfig>,
    model: &str,
    diff_content: &str,
    max_length: Option<usize>
) -> Result<String> {
    // Step 1: Parse diff into individual files
    let parsed_files = parse_diff(diff_content)?;

    // Step 2: Analyze each file
    let file_analyses = /* ... */;

    // Step 3: Calculate impact scores
    let scored_files = call_score_function(/* ... */)?;

    // Step 4: Generate candidates
    let candidates = call_generate_function(/* ... */)?;

    // Step 5: Select best candidate
    let final_message = select_best_candidate(/* ... */)?;

    Ok(final_message)
}
```

#### Fallback - Single-Step Approach

If multi-step fails, the system falls back to a simpler approach using function calling.

### Phase 4: Commit Message Generation

The generation process follows these principles:

1. **Analyze Functional Significance**

   - Identify primary changes
   - Determine change impact
   - Group related modifications

2. **Generate Reasoning**

   - Explain why changes were made
   - Justify message selection
   - Document decision process

3. **Create Message**
   - Follow conventional commit format
   - Stay within character limits (default: 72)
   - Focus on most impactful changes

### Phase 5: File-Level Analysis

For each modified file, the system performs:

```rust
// src/multi_step_analysis.rs
pub fn analyze_file(file_path: &str, diff_content: &str, operation_type: &str) -> FileAnalysisResult {
    // Count lines added/removed
    let mut lines_added = 0u32;
    let mut lines_removed = 0u32;

    for line in diff_content.lines() {
        if line.starts_with('+') && !line.starts_with("+++") {
            lines_added += 1;
        } else if line.starts_with('-') && !line.starts_with("---") {
            lines_removed += 1;
        }
    }

    // Categorize file
    let file_category = categorize_file(file_path);

    // Generate summary
    let summary = generate_file_summary(file_path, diff_content, operation_type);

    FileAnalysisResult {
        lines_added,
        lines_removed,
        file_category,
        summary
    }
}
```

**File Categories:**

- `source`: Core application code (weight: 1.0)
- `test`: Test files (weight: 0.6)
- `config`: Configuration files (weight: 0.8)
- `docs`: Documentation (weight: 0.4)
- `binary`: Binary files (weight: 0.1)
- `build`: Build/dependency files (weight: 0.7)

### Phase 6: Statistical Analysis

Impact scoring algorithm:

```rust
fn calculate_single_impact_score(file_data: &FileDataForScoring) -> f32 {
    let mut score = 0.0f32;

    // Base score from operation type
    score += match file_data.operation_type.as_str() {
        "added" => 0.3,
        "modified" => 0.2,
        "deleted" => 0.25,
        "renamed" => 0.1,
        "binary" => 0.05,
        _ => 0.1
    };

    // Score from file category
    score += match file_data.file_category.as_str() {
        "source" => 0.4,
        "test" => 0.2,
        "config" => 0.25,
        "build" => 0.3,
        "docs" => 0.1,
        "binary" => 0.05,
        _ => 0.1
    };

    // Score from lines changed (normalized)
    let total_lines = file_data.lines_added + file_data.lines_removed;
    let line_score = (total_lines as f32 / 100.0).min(0.3);
    score += line_score;

    score.min(1.0) // Cap at 1.0
}
```

### Phase 7: Performance Monitoring

The system includes comprehensive profiling:

```rust
// src/profiling.rs
macro_rules! profile {
    ($name:expr) => {
        let _guard = $crate::profiling::ProfileGuard::new($name);
    };
}
```

**Tracked Metrics:**

- Total execution time
- API request/response duration
- Diff parsing time
- Token counting overhead
- File analysis duration

### Phase 8: Final Commit Execution

Once the message is generated:

1. Write message to commit file
2. Clear progress indicators
3. Log performance metrics
4. Return control to git

## Multi-Step Analysis System

The multi-step system provides several advantages:

### 1. Divide and Conquer Strategy

Instead of processing the entire diff at once, the system:

- Parses individual files
- Analyzes each independently
- Aggregates results intelligently

### 2. Specialized Function Tools

Three specialized OpenAI function tools:

```rust
// Analyze Function
create_analyze_function_tool() // Examines individual files

// Score Function
create_score_function_tool()   // Calculates impact scores

// Generate Function
create_generate_function_tool() // Creates message candidates
```

### 3. Candidate Generation

Multiple message styles are generated:

```rust
// Action-focused: "Update authentication logic"
// Component-focused: "auth: improve validation"
// Impact-focused: "New feature for user authentication"
```

## Error Handling & Fallbacks

The system implements a robust fallback strategy:

```
Multi-Step OpenAI → Local Multi-Step → Single-Step OpenAI → Error
```

### Fallback Triggers

1. **API Failures**

   - Network timeouts
   - Rate limiting
   - Invalid responses

2. **Token Limits**

   - Diff too large
   - Context overflow
   - Model constraints

3. **Parsing Errors**
   - Malformed diffs
   - Binary files
   - Encoding issues

## Performance Optimization

### 1. Parallel Processing

```rust
// Process files in parallel chunks
const PARALLEL_CHUNK_SIZE: usize = 25;

files.par_chunks(PARALLEL_CHUNK_SIZE)
    .map(|chunk| process_chunk(chunk))
    .collect()
```

### 2. Memory Management

```rust
struct StringPool {
    strings: Vec<String>,
    capacity: usize
}

// Reuse string allocations
let mut pool = StringPool::new(DEFAULT_STRING_CAPACITY);
```

### 3. Token Optimization

- Pre-calculate instruction tokens
- Truncate diffs intelligently
- Prioritize high-impact changes

## Configuration & Customization

### Configuration File

Location: `~/.config/git-ai/config.toml`

```toml
model = "gpt-4o-mini"
max_tokens = 4096
max_commit_length = 72
openai_api_key = "sk-..."
timeout = 30
```

### Environment Variables

```bash
OPENAI_API_KEY=sk-...     # API key
RUST_LOG=debug            # Enable debug logging
GIT_AI_MODEL=gpt-4        # Override model
```

### CLI Configuration

```bash
# Set model
git-ai config set model gpt-4o-mini

# Set max commit length
git-ai config set max-commit-length 100

# Set API key
git-ai config set openai-api-key sk-...

# Reset to defaults
git-ai config reset
```

## Key Features Summary

1. **Intelligent Fallback System**

   - Multi-step → Local → Single-step
   - Ensures commits always succeed

2. **Impact-Based Prioritization**

   - Focuses on functionally significant changes
   - Weights files by category and size

3. **Comprehensive Logging**

   - Debug output with timing
   - Structured information display
   - Performance metrics

4. **Token Management**

   - Careful limit handling
   - Intelligent truncation
   - Model-aware processing

5. **Error Resilience**
   - Graceful degradation
   - Multiple retry strategies
   - Clear error messages

## Expected Outputs

### Commit Message Format

```
<type>: <description>

<optional body>
<optional footer>
```

### Example Output

```
feat: Add JWT authentication system

- Implement JWT token generation and validation
- Add middleware for protected routes
- Include comprehensive error handling
- Update package.json with new dependencies

Files changed: 5 (4 source, 1 build)
Impact score: 0.95 (high priority changes)
```

### Debug Output

```
[DEBUG] Starting multi-step commit message generation
[DEBUG] Parsed 5 files from diff
[DEBUG] Analyzing file: src/auth/jwt.rs
[DEBUG] File analysis complete: +89 -0 lines, category: source
[DEBUG] Calculating impact scores for 5 files
[DEBUG] Generated 3 commit message candidates
[DEBUG] Selected best candidate based on impact analysis
[DEBUG] Total execution time: 1.23s
```

## Conclusion

The Git AI hook system represents a sophisticated approach to automated commit message generation. By combining multi-step analysis, intelligent fallbacks, and comprehensive error handling, it ensures developers always get meaningful commit messages that accurately represent their changes while maintaining high performance and reliability.