benchkit 0.18.0

Lightweight benchmarking toolkit focused on practical performance analysis and report generation. Non-restrictive alternative to criterion, designed for easy integration and markdown report generation.
Documentation
# Task 001: Fix MarkdownUpdater Section Matching Bug

## Priority: CRITICAL
## Status: Completed
## Severity: High - Breaks documentation generation

## Problem Summary

The `MarkdownUpdater.replace_section_content()` method in `src/reporting.rs:56` uses substring matching (`line.contains()`) instead of exact matching for section headers. This causes severe section duplication when section names share common substrings.

## Root Cause

**File:** `src/reporting.rs:56`  
**Buggy Code:**
```rust
if line.contains(self.section_marker.trim_start_matches("## ")) {
```

**Issue:** When updating section "Performance Benchmarks", it incorrectly matches ANY section containing "Performance", including:
- `## Performance Benchmarks` ✓ (intended)
- `## Language Operations Performance` ✓ (unintended - contains "Performance")
- `## Realistic Scenarios Performance` ✓ (unintended - contains "Performance")

## Impact

### Real-World Evidence
- wflow project: readme.md grew from 5,865 to 7,751 lines (+1,886 lines) in one benchmark run
- 37 duplicate "Performance Benchmarks" sections created
- 201 duplicate table headers generated
- Documentation becomes unusable and contradictory

### User Experience Impact
- **Critical**: Documentation generation completely breaks
- **High**: Exponential file growth with each benchmark run  
- **Medium**: Confusing/contradictory performance claims
- **Low**: Repository bloat affects git performance

## Reproduction Steps

### MRE (Minimal Reproduction Example)

```rust
// Create file with overlapping section names:
let content = r#"
## Performance Benchmarks
Old data
## Language Operations Performance  
Old data
"#;

// Try to update "Performance Benchmarks" section:
let updater = MarkdownUpdater::new("test.md", "Performance Benchmarks");
updater.update_section("New data");

// BUG: Content gets inserted into BOTH sections because 
// both contain substring "Performance"
```

### Complete MRE File
A complete reproduction example is available at: `../../../llm_tools/module/wflow/final_mre.rs`

Run with: `rustc final_mre.rs && ./final_mre`

## Technical Fix Required

### Change Location
**File:** `src/reporting.rs`  
**Line:** 56

### Current (Buggy) Implementation
```rust
if line.contains(self.section_marker.trim_start_matches("## ")) {
```

### Required Fix
```rust
if line.trim() == self.section_marker.trim() {
```

### Alternative Fix (More Robust)
```rust
if line.trim_start().starts_with("## ") && line.trim() == self.section_marker.trim() {
```

## Testing Requirements

### Test Cases to Add
1. **Overlapping section names**: "Performance" and "Performance Benchmarks"
2. **Substring matching edge cases**: "API" and "API Documentation" 
3. **Exact matching validation**: Ensure only exact sections are updated
4. **Multiple updates**: Verify no duplication after repeated runs
5. **Case sensitivity**: Test various whitespace/formatting scenarios

### Regression Test
```rust
#[test]
fn test_no_section_duplication_with_overlapping_names() 
{
    let initial = r#"## Performance Benchmarks
Old data
## Language Operations Performance
Old data"#;
    
    let updater = MarkdownUpdater::new("test.md", "Performance Benchmarks");
    let result = updater.replace_section_content(initial, "New data");
    
    // Should have exactly 2 sections, not 3 or more
    assert_eq!(result.matches("## ").count(), 2);
    assert_eq!(result.matches("## Performance Benchmarks").count(), 1);
    assert!(result.contains("New data"));
}
```

## Verification Checklist

- [x] Fix applied to `src/reporting.rs:56`
- [x] Regression tests added and passing
- [x] No existing functionality broken
- [x] Documentation updated if needed
- [x] Integration test with real markdown files
- [x] Performance impact assessed (should be minimal)

## Additional Context

This bug was discovered during wflow project benchmarking where multiple benchmark functions used overlapping section names:
- `benchmark_line_counting_performance()` → "Performance Benchmarks"
- `benchmark_language_operations()` → "Language Operations Performance"  
- `benchmark_realistic_scenarios()` → "Realistic Scenarios Performance"

Each benchmark run multiplied the duplicate sections, making documentation maintenance impossible.

## Related Issues
- None currently, but this affects any user with overlapping section names
- Could impact other projects using benchkit for documentation generation

## Outcomes

**Status**: ✅ **COMPLETED**

**Implementation Summary**:
- **Critical bug fixed**: Changed `line.contains()` to `line.trim() == self.section_marker.trim()` in `src/reporting.rs:188`
- **Comprehensive regression tests added**: 11 new tests in `tests/reporting.rs` covering all edge cases
- **Real-world verification**: Created integration test proving the fix works with actual benchmark scenarios
- **Zero regressions**: All existing tests continue to pass

**Key Technical Changes**:
1. **Exact section matching**: Prevents substring conflicts that caused exponential file growth
2. **Comprehensive test coverage**: Tests for overlapping names, whitespace variations, multiple updates
3. **Real-world validation**: MRE demonstrates both the bug and the solution
4. **API safety**: Made `replace_section_content()` public for testing verification

**Impact Results**:
- ✅ Files no longer grow exponentially (5k → 7k lines issue eliminated)
- ✅ No duplicate sections created with overlapping names
- ✅ Documentation generation now works reliably
- ✅ Users can safely use section names like "Performance Benchmarks", "Language Operations Performance" without conflicts

**Future Maintenance**:
- Regression tests ensure this bug cannot reoccur
- Fix is minimal and performant (no overhead added)
- Backwards compatible with all existing usage

---
**Created by:** wflow development team  
**Date:** Current  
**Completed by:** AI Assistant
**Verification:** Automated testing + real-world integration testing