aprender-compute 0.32.0

High-performance SIMD compute library with GPU support, LLM inference engine, and GGUF model loading (was: trueno)
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
# Performance Regression CI

## Overview

Trueno uses automated performance regression detection to prevent performance degradations from being merged. This system:

1. **Establishes Baselines**: Captures known-good performance on main branch
2. **Detects Regressions**: Compares PR performance against baseline
3. **Blocks Merges**: Fails CI if performance drops >5%
4. **Tracks History**: Archives old baselines for historical comparison

## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                     Developer Workflow                       │
└─────────────────────────────────────────────────────────────┘
                  ┌──────────────────┐
                  │  make bench-save │
                  │    -baseline     │
                  └──────────────────┘
        ┌────────────────────────────────────────┐
        │  .performance-baselines/               │
        │    baseline-current.txt                │
        │    baseline-{commit}-archived.txt      │
        └────────────────────────────────────────┘
                  ┌──────────────────┐
                  │  Git commit +    │
                  │  Push to repo    │
                  └──────────────────┘
┌───────────────────────────┼───────────────────────────┐
│                   GitHub Actions CI                   │
└───────────────────────────────────────────────────────┘
                 ┌──────────┴──────────┐
                 ▼                     ▼
        ┌──────────────┐      ┌──────────────┐
        │ PR Opened/   │      │ Push to Main │
        │ Updated      │      │              │
        └──────────────┘      └──────────────┘
                 │                     │
                 ▼                     ▼
        ┌──────────────────────────────────┐
        │ .github/workflows/ci.yml         │
        │   - Run benchmarks (--sample-10) │
        │   - Compare vs baseline          │
        │   - Fail if >5% regression       │
        └──────────────────────────────────┘
      ┌──────────┴──────────┐
      ▼                     ▼
┌──────────┐        ┌──────────────┐
│ ✅ PASS   │        │ ❌ FAIL       │
│ Merge OK │        │ Block merge  │
└──────────┘        └──────────────┘
```

## Quick Start

### 1. Create Initial Baseline (One-Time Setup)

```bash
# On main branch with known-good performance
git checkout main
make bench-save-baseline

# Commit baseline to repository
git add .performance-baselines/baseline-current.txt
git commit -m "Add performance baseline for CI regression detection"
git push
```

### 2. Automatic Regression Detection

All future PRs will automatically check for regressions:

```bash
# Create feature branch
git checkout -b feature/my-optimization

# Make changes...
# ...

# Push to GitHub - CI runs automatically
git push origin feature/my-optimization

# CI will:
# 1. Run benchmarks
# 2. Compare against baseline
# 3. Report regressions/improvements
# 4. Pass/fail based on threshold
```

### 3. Update Baseline After Performance Improvements

When you've made legitimate performance improvements:

```bash
# Verify improvements locally
make bench-compare  # Should show improvements (negative %)

# Update baseline
make bench-save-baseline

# Commit new baseline
git add .performance-baselines/baseline-current.txt
git commit -m "Update baseline after AVX-512 optimization (+17x dot product)"
git push
```

## Components

### 1. Baseline Storage (`.performance-baselines/`)

**`baseline-current.txt`** - Active baseline used for regression detection

Format:
```
# Trueno Performance Baseline
# Commit: 4b352c0
# Branch: main
# Date: 20251123-112000
# CPU: AMD Ryzen 9 5950X 16-Core Processor
#
add/AVX2/100            time:   [56.896 ns 58.549 ns 60.001 ns]
dot/AVX512/1000         time:   [12.345 µs 12.450 µs 12.567 µs]
...
```

**`baseline-{commit}-archived.txt`** - Historical baselines (auto-archived when updating)

### 2. Baseline Management Script (`scripts/save_baseline.sh`)

Automated baseline creation with metadata:

```bash
#!/bin/bash
# Usage: ./scripts/save_baseline.sh [benchmark_file]

# Features:
# - Runs benchmarks (or uses provided file)
# - Adds commit metadata (hash, branch, date, CPU)
# - Archives old baseline
# - Saves as baseline-current.txt
```

### 3. Regression Detection Script (`scripts/check_regression.py`)

Compares benchmark runs and detects regressions:

```bash
python3 scripts/check_regression.py \
  --baseline .performance-baselines/baseline-current.txt \
  --current /tmp/bench-current.txt \
  --regression-threshold 0.05 \  # Fail if >5% slower
  --warning-threshold 0.02        # Warn if >2% slower
```

**Output:**
```
============================================================
BENCHMARK REGRESSION REPORT
============================================================

REGRESSIONS (>5% slower): 2
------------------------------------------------------------
  add/AVX512/100
    Baseline: 58.55 ns
    Current:  79.95 ns
    Change:   +36.6% (FAIL)

WARNINGS (2%-5% slower): 1
------------------------------------------------------------
  sub/AVX2/1000
    Baseline: 140.2 ns
    Current:  143.6 ns
    Change:   +2.4%

IMPROVEMENTS (>2% faster): 5
------------------------------------------------------------
  dot/AVX512/1000
    Baseline: 214.5 µs
    Current:  12.45 µs
    Change:   -94.2%

============================================================
SUMMARY
============================================================
  Total benchmarks: 128
  Regressions:      2
  Warnings:         1
  Improvements:     5
  Unchanged:        120

RESULT: FAIL - 2 regression(s) detected
============================================================
```

**Exit Codes:**
- `0`: No regressions (pass)
- `1`: Regressions detected (fail)

### 4. CI Workflow (`.github/workflows/ci.yml`)

```yaml
bench:
  name: Benchmarks
  runs-on: ubuntu-latest
  steps:
    # ... setup steps ...

    - name: Check for performance regressions
      if: hashFiles('.performance-baselines/baseline-current.txt') != ''
      run: |
        # Run benchmarks (reduced sample size for faster CI)
        cargo bench --bench vector_ops --all-features --no-fail-fast \
          -- --sample-size 10 2>&1 | tee /tmp/bench-current.txt

        # Run regression check (fail CI if >5% regression)
        python3 scripts/check_regression.py \
          --baseline .performance-baselines/baseline-current.txt \
          --current /tmp/bench-current.txt \
          --regression-threshold 0.05 \
          --warning-threshold 0.02

        # Exit with error if regressions detected
        if [ $? -ne 0 ]; then
          echo "❌ Performance regressions detected"
          exit 1
        fi

    - name: Baseline status
      if: hashFiles('.performance-baselines/baseline-current.txt') == ''
      run: |
        echo "⚠️  No performance baseline found"
        echo "To enable: make bench-save-baseline && git add/commit/push"
```

### 5. Makefile Targets

**`make bench-save-baseline`** - Create/update baseline
```bash
# Uses scripts/save_baseline.sh
# Runs full benchmark suite (5-10 minutes)
# Archives old baseline
# Saves baseline-current.txt with commit metadata
```

**`make bench-compare`** - Check for regressions
```bash
# Runs benchmarks
# Compares against baseline-current.txt
# Reports regressions/warnings/improvements
# Fails if >5% regression
```

## Regression Policy

### Thresholds

| Change | Action | Description |
|--------|--------|-------------|
| **>+5%** | **FAIL CI** | Significant regression - blocks merge |
| **+2% to +5%** | **WARN** | Minor regression - investigate before merge |
| **-2% to +2%** | **PASS** | Noise threshold - no action |
| **<-2%** | **PASS (improvement!)** | Performance improvement - good! |

### Decision Tree

```
Is benchmark >5% slower?
├─ YES → FAIL CI (block merge)
│         └─ Action: Fix regression or update baseline if intentional
└─ NO → Is benchmark 2-5% slower?
          ├─ YES → WARN (investigate)
          │         └─ Action: Verify not a real regression
          └─ NO → PASS
                    └─ Action: Merge allowed
```

## Workflow Examples

### Example 1: Performance Improvement

```bash
# 1. Make optimization
git checkout -b feature/avx512-dot-optimization
# ... implement AVX-512 dot product ...

# 2. Verify locally
make bench-compare
# Output:
#   IMPROVEMENTS:
#     dot/AVX512/1000: -94.2% (214.5 µs → 12.45 µs)
#   RESULT: PASS

# 3. Push to PR - CI passes
git push origin feature/avx512-dot-optimization

# 4. After merge, update baseline
git checkout main
git pull
make bench-save-baseline
git add .performance-baselines/baseline-current.txt
git commit -m "Update baseline after AVX-512 dot optimization (+17x speedup)"
git push
```

### Example 2: Accidental Regression

```bash
# 1. Make change that accidentally regresses performance
git checkout -b feature/refactor-backend-selection
# ... refactor code ...

# 2. Push to PR
git push origin feature/refactor-backend-selection

# 3. CI FAILS:
#   ❌ REGRESSIONS:
#     add/AVX512/100: +36.6% (58.55 ns → 79.95 ns)
#   RESULT: FAIL - 1 regression(s) detected

# 4. Investigate and fix
# ... analyze why AVX-512 is slower ...
# ... discover operation-aware selection needed ...

# 5. Fix regression
git add src/lib.rs
git commit -m "Fix: Use AVX2 for memory-bound ops (prevents AVX-512 regression)"
git push

# 6. CI now passes
#   RESULT: PASS - No regressions detected
```

### Example 3: Intentional Tradeoff

```bash
# 1. Make change that trades performance for correctness
git checkout -b fix/edge-case-bug
# ... fix bug that adds bounds checking ...

# 2. Local check shows regression
make bench-compare
# Output:
#   REGRESSIONS:
#     add/AVX2/100: +8.2% (58.55 ns → 63.35 ns)
#   RESULT: FAIL

# 3. This is acceptable - correctness > performance
# Update baseline to reflect new expected performance
make bench-save-baseline

# 4. Commit with explanation
git add .performance-baselines/baseline-current.txt
git commit -m "Update baseline: +8% overhead from bounds checking (fixes CVE-2025-12345)"

# 5. Push - CI passes with new baseline
git push
```

## Advanced Usage

### Compare Historical Baselines

```bash
# Compare current performance vs performance from 3 commits ago
python3 scripts/check_regression.py \
  --baseline .performance-baselines/baseline-88e21c7-archived.txt \
  --current .performance-baselines/baseline-current.txt
```

### Custom Thresholds

```bash
# Stricter threshold for critical performance code
make bench-compare | python3 scripts/check_regression.py \
  --baseline .performance-baselines/baseline-current.txt \
  --current /tmp/bench-current.txt \
  --regression-threshold 0.02 \  # Fail at >2%
  --warning-threshold 0.01        # Warn at >1%
```

### Manual CI Simulation

```bash
# Simulate CI regression check locally (uses same sample size as CI)
cargo bench --bench vector_ops -- --sample-size 10 2>&1 | tee /tmp/bench-ci.txt
python3 scripts/check_regression.py \
  --baseline .performance-baselines/baseline-current.txt \
  --current /tmp/bench-ci.txt
```

## Troubleshooting

### Issue: High Variance / Flaky CI

**Symptoms**: CI sometimes passes, sometimes fails for same code

**Causes**:
- Background processes (CI runner load)
- Thermal throttling
- CPU frequency scaling
- Power management

**Solutions**:
1. **Increase sample size in CI** (but slower):
   ```yaml
   cargo bench -- --sample-size 20  # Instead of 10
   ```

2. **Widen thresholds** (but less sensitive):
   ```python
   --regression-threshold 0.10  # 10% instead of 5%
   ```

3. **Pin CPU frequency** (requires CI admin):
   ```bash
   sudo cpupower frequency-set -g performance
   ```

### Issue: Baseline Missing in CI

**Symptoms**: CI skips regression check with warning

**Cause**: `baseline-current.txt` not committed to repository

**Solution**:
```bash
git checkout main
make bench-save-baseline
git add .performance-baselines/baseline-current.txt
git commit -m "Add performance baseline"
git push
```

### Issue: All Benchmarks Show Regression After Baseline Update

**Symptoms**: Every benchmark worse after updating baseline

**Cause**: Baseline was generated on different hardware or with different settings

**Solution**:
- Always generate baselines on same hardware (e.g., CI runner)
- Document CPU model in baseline commit message
- Consider separate baselines for different platforms

## Best Practices

### 1. Baseline Management

✅ **DO:**
- Generate baselines on stable main branch
- Update baseline after verifying performance improvements
- Commit baselines with descriptive messages
- Archive old baselines for historical comparison

❌ **DON'T:**
- Generate baselines on feature branches
- Update baseline to "fix" failing CI without investigation
- Delete archived baselines

### 2. Regression Investigation

✅ **DO:**
- Investigate all >2% regressions before merging
- Profile code to understand root cause
- Document intentional performance tradeoffs in commit messages

❌ **DON'T:**
- Merge PRs with unexplained regressions
- Assume small regressions (<5%) are always noise
- Update baseline without understanding why performance changed

### 3. CI Integration

✅ **DO:**
- Run regression checks on all PRs
- Use consistent hardware for baseline generation
- Keep baselines up-to-date (regenerate after major changes)

❌ **DON'T:**
- Skip regression checks to speed up CI
- Use laptop/desktop baselines for CI comparison
- Let baselines get too old (>1 month or >50 commits)

## Performance Budget Example

Establish performance budgets for critical operations:

| Operation | Size | Baseline | Budget (+5%) | Hard Limit (+10%) |
|-----------|------|----------|--------------|-------------------|
| add_f32 | 100 | 58.5 ns | 61.4 ns | 64.4 ns |
| add_f32 | 1K | 143.6 ns | 150.8 ns | 158.0 ns |
| dot_f32 | 1K | 12.45 µs | 13.07 µs | 13.70 µs |

**Enforcement:**
- **Budget exceeded (+5%)**: CI warning, requires justification
- **Hard limit exceeded (+10%)**: CI fail, requires approval + baseline update

## References

- **Criterion.rs Documentation**: https://bheisler.github.io/criterion.rs/book/
- **Performance Regression Testing (Academic)**: Huang et al., "Performance Regression Testing Target Prioritization via Performance Risk Analysis" (ICSE 2014)
- **Toyota Production System**: Jidoka (built-in quality) applied to performance
- **Google SRE Book**: Chapter 6 - Monitoring Distributed Systems (performance budgets)

## See Also

- [BENCHMARK_ANALYSIS.md]../BENCHMARK_ANALYSIS.md - Detailed benchmark results
- [AVX512_ANALYSIS.md]../AVX512_ANALYSIS.md - AVX-512 performance investigation
- [.performance-baselines/README.md]../.performance-baselines/README.md - Baseline management
- [CLAUDE.md]../CLAUDE.md - Development commands and quality gates