aprender 0.31.2

Next-generation ML framework in pure Rust — `cargo install aprender` for the `apr` CLI
<!-- PCU: examples-shell-completion | contract: contracts/apr-page-examples-shell-completion-v1.yaml -->
<!-- Example: cargo run -p aprender-core --example none -->
<!-- Status: enforced -->

# Case Study: AI Shell Completion

Train a personalized autocomplete on your shell history in 5 seconds. 100% local, private, fast.

## Quick Start

```bash
# Install
cargo install --path crates/aprender-shell

# Train on your history
aprender-shell train

# Test
aprender-shell suggest "git "
```

## How It Works

```
~/.zsh_history → Parser → N-gram Model → Trie Index → Suggestions
     │                         │              │
  21,729 cmds            40,848 n-grams    <1ms lookup
```

**Algorithm:** Markov chain with trigram context, plus a prefix trie for O(k) lookup (k = prefix length).

## Training

```bash
$ aprender-shell train

🚀 aprender-shell: Training model...

📂 History file: /home/user/.zsh_history
📊 Commands loaded: 21729
🧠 Training 3-gram model... done!

✅ Model saved to: ~/.aprender-shell.model

📈 Model Statistics:
   Unique n-grams: 40848
   Vocabulary size: 16100
   Model size: 2016.4 KB
```

## Suggestions

```bash
$ aprender-shell suggest "git "
git commit    0.505
git clone     0.065
git add       0.059
git push      0.035
git checkout  0.031

$ aprender-shell suggest "cargo "
cargo run      0.413
cargo install  0.069
cargo test     0.059
cargo clippy   0.045
```

Scores are frequency-based probabilities from your actual usage.

## Incremental Updates

Don't retrain from scratch; append new commands:

```bash
$ aprender-shell update
📊 Found 15 new commands
✅ Model updated (21744 total commands)

$ aprender-shell update
✓ Model is up to date (no new commands)
```

**Performance:**
- 0ms when no new commands
- ~10ms per 100 new commands
- Tracks position in history file

## ZSH Integration

Generate the widget:

```bash
aprender-shell zsh-widget >> ~/.zshrc
source ~/.zshrc
```

This adds:
- Ghost text suggestions as you type (gray)
- Tab or Right Arrow to accept
- Updates on every keystroke

## Auto-Retrain

```zsh
# Add to ~/.zshrc

# Option 1: Update after every command (~10ms)
precmd() { aprender-shell update -q & }

# Option 2: Update on shell exit
zshexit() { aprender-shell update -q }
```

## Model Statistics

```bash
$ aprender-shell stats

📊 Model Statistics:
   N-gram size: 3
   Unique n-grams: 40848
   Vocabulary size: 16100
   Model size: 2016.4 KB

🔍 Top commands:
    340x  git status
    245x  cargo build
    198x  cd ..
```

## Memory Paging for Large Histories

For very large shell histories (100K+ commands), use memory paging to limit RAM usage:

```bash
# Train with 10MB memory limit (creates .apbundle file)
$ aprender-shell train --memory-limit 10

🚀 aprender-shell: Training paged model...

📂 History file: /home/user/.zsh_history
📊 Commands loaded: 150000
🧠 Training 3-gram paged model (10MB limit)... done!

✅ Paged model saved to: ~/.aprender-shell.apbundle

📈 Model Statistics:
   Segments:        45
   Vocabulary size: 35000
   Memory limit:    10 MB
```

```bash
# Suggestions with paged loading
$ aprender-shell suggest "git " --memory-limit 10

# View paging statistics
$ aprender-shell stats --memory-limit 10

📊 Paged Model Statistics:
   N-gram size:     3
   Total commands:  150000
   Vocabulary size: 35000
   Total segments:  45
   Loaded segments: 3
   Memory limit:    10.0 MB

📈 Paging Statistics:
   Page hits:       127
   Page misses:     3
   Evictions:       0
   Hit rate:        97.7%
```

**How it works:**
- N-grams are grouped by command prefix (e.g., "git", "cargo")
- Segments are stored in `.apbundle` format
- Only accessed segments are loaded into RAM
- LRU eviction frees memory when limit is reached

See [Model Bundling and Memory Paging](./model-bundling-paging.md) for details.

## Sharing Models

Export your model for teammates:

```bash
# Export
aprender-shell export -m ~/.aprender-shell.model team-model.json

# Import (on another machine)
aprender-shell import team-model.json
```

Use case: Share team-specific command patterns (deployment scripts, project aliases).

## Privacy & Security

**Filtered automatically:**
- Commands containing `password`, `secret`, `token`, `API_KEY`
- AWS credentials, GitHub tokens
- History manipulation commands (`history`, `fc`)
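A filter like this reduces, at training time, to a predicate over each command. A sketch with an illustrative marker list (the crate's actual rules may be broader):

```rust
/// Toy version of the sensitive-command filter: drop any command
/// containing a known secret marker or manipulating history.
fn is_sensitive(cmd: &str) -> bool {
    const MARKERS: [&str; 4] = ["password", "secret", "token", "api_key"];
    let lower = cmd.to_lowercase();
    MARKERS.iter().any(|m| lower.contains(m))
        || cmd.starts_with("history")
        || cmd.starts_with("fc ")
}

fn main() {
    assert!(is_sensitive("export API_KEY=abc123"));
    assert!(is_sensitive("curl -H 'Authorization: token xyz'"));
    assert!(is_sensitive("history -c"));
    assert!(!is_sensitive("git status"));
}
```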

**100% local:**
- No network requests
- No telemetry
- Model stays on your machine

## Architecture

```
crates/aprender-shell/
├── src/
│   ├── main.rs      # CLI (clap)
│   ├── history.rs   # ZSH/Bash/Fish parser
│   ├── model.rs     # Markov n-gram model
│   └── trie.rs      # Prefix index
```

### History Parser

Handles multiple formats:

```rust
// ZSH extended: ": 1699900000:0;git status"
// Bash plain: "git status"
// Fish: "- cmd: git status"
```
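A toy parser for the three formats above might look like this (a sketch; the real `history.rs` likely handles more edge cases):

```rust
/// Extract the command from one history line, if any.
fn parse_history_line(line: &str) -> Option<&str> {
    let line = line.trim_end();
    if let Some(rest) = line.strip_prefix(": ") {
        // ZSH extended: ": <timestamp>:<duration>;<command>"
        return rest.split_once(';').map(|(_, cmd)| cmd);
    }
    if let Some(cmd) = line.strip_prefix("- cmd: ") {
        // Fish: "- cmd: <command>"
        return Some(cmd);
    }
    if line.is_empty() {
        None
    } else {
        // Bash plain: the whole line is the command
        Some(line)
    }
}

fn main() {
    assert_eq!(parse_history_line(": 1699900000:0;git status"), Some("git status"));
    assert_eq!(parse_history_line("- cmd: git status"), Some("git status"));
    assert_eq!(parse_history_line("git status"), Some("git status"));
    assert_eq!(parse_history_line(""), None);
}
```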

### N-gram Model

Trigram Markov chain:

```
Context         → Next Token (count)
""              → "git" (340), "cargo" (245), "cd" (198)
"git"           → "commit" (89), "push" (45), "status" (340)
"git commit"    → "-m" (67), "--amend" (12)
```

### Trie Index

O(k) prefix lookup where k = prefix length:

```
g─i─t─ ─s─t─a─t─u─s (count: 340)
      ├─c─o─m─m─i─t (count: 89)
      └─p─u─s─h     (count: 45)
```

## Performance: Sub-10ms Verification

Shell completion must feel **instantaneous**. Nielsen's research shows:
- < 100ms: Perceived as instant
- < 10ms: No perceptible delay (ideal)
- > 100ms: Noticeable lag, poor UX

**aprender-shell achieves microsecond latency: roughly 700x to 23,000x faster than the 10 ms target.**

### Benchmark Results

Run the benchmarks yourself:

```bash
cargo bench --package aprender-shell --bench recommendation_latency
```

#### Suggestion Latency by Model Size

| Model Size | Commands | Prefix | Latency | vs 10ms Target |
|------------|----------|--------|---------|----------------|
| **Small** | 50 | kubectl | **437 ns** | 22,883x faster |
| **Small** | 50 | npm | **530 ns** | 18,868x faster |
| **Small** | 50 | docker | **659 ns** | 15,174x faster |
| **Small** | 50 | cargo | **725 ns** | 13,793x faster |
| **Small** | 50 | git | **1.54 µs** | 6,493x faster |
| **Medium** | 500 | npm | **1.78 µs** | 5,618x faster |
| **Medium** | 500 | docker | **3.97 µs** | 2,519x faster |
| **Medium** | 500 | cargo | **6.53 µs** | 1,532x faster |
| **Medium** | 500 | git | **10.6 µs** | 943x faster |
| **Large** | 5000 | npm | **671 ns** | 14,903x faster |
| **Large** | 5000 | docker | **7.96 µs** | 1,256x faster |
| **Large** | 5000 | kubectl | **12.3 µs** | 813x faster |
| **Large** | 5000 | git | **14.6 µs** | 685x faster |

**Key insight:** Even with 5,000 commands in history, worst-case latency is **14.6 µs** (0.0146 ms).

### Industry Comparison

| System | Typical Latency | aprender-shell Speedup |
|--------|-----------------|------------------------|
| GitHub Copilot | 100-500ms | 10,000-50,000x faster |
| Fish shell completion | 5-20ms | 500-2,000x faster |
| Zsh compinit | 10-50ms | 1,000-5,000x faster |
| Bash completion | 20-100ms | 2,000-10,000x faster |

### Why So Fast?

1. **O(k) Trie Lookup:** Prefix search costs O(k) where k = prefix length, not O(n) in history size
2. **In-Memory Model:** No disk I/O during suggestions
3. **Simple Data Structures:** HashMap + Trie, no neural network overhead
4. **Zero Allocations:** Hot path avoids heap allocations

### Benchmark Suite

The `recommendation_latency` benchmark includes:

| Group | What It Measures |
|-------|------------------|
| `suggestion_latency` | Core latency by model size (primary metric) |
| `partial_completion` | Mid-word completion ("git co" → "git commit") |
| `training_throughput` | Commands processed per second during training |
| `cold_start` | Model load + first suggestion latency |
| `serialization` | JSON serialize/deserialize performance |
| `scalability` | Latency growth with model size |
| `paged_model` | Memory-constrained model performance |

## Why N-gram Beats Neural

For shell completion:

| Factor | N-gram | Neural (RNN/Transformer) |
|--------|--------|--------------------------|
| Training time | <1s | Minutes |
| Inference | **<15 µs** | 10-50ms |
| Model size | 2MB | 50MB+ |
| Accuracy on shell | 70%+ | 75%+ |
| Cold start | Instant | GPU warmup |

Shell commands are highly repetitive, and an n-gram model captures those patterns well.

## CLI Reference

```
aprender-shell <COMMAND>

Commands:
  train        Full retrain from history
  update       Incremental update (fast)
  suggest      Get completions for prefix (-c/-k for count)
  stats        Show model statistics
  export       Export model for sharing
  import       Import a shared model
  zsh-widget   Generate ZSH integration code
  fish-widget  Generate Fish shell integration code
  uninstall    Remove widget from shell config
  validate     Validate model accuracy (train/test split)
  augment      Generate synthetic training data
  analyze      Analyze command patterns (CodeFeatureExtractor)
  tune         AutoML hyperparameter tuning (TPE)
  inspect      View model card metadata
  publish      Publish model to Hugging Face Hub

Options:
  -h, --help     Print help
  -V, --version  Print version
```

## Fish Shell Integration

Generate the Fish widget:

```bash
aprender-shell fish-widget >> ~/.config/fish/config.fish
source ~/.config/fish/config.fish
```

Disable temporarily:

```fish
set -gx APRENDER_DISABLED 1
```

## Model Cards & Inspection

View model metadata:

```bash
$ aprender-shell inspect -m ~/.aprender-shell.model

📋 Model Card: ~/.aprender-shell.model

═══════════════════════════════════════════
           MODEL INFORMATION
═══════════════════════════════════════════
  ID:           aprender-shell-markov-3gram-20251127
  Name:         Shell Completion Model
  Version:      1.0.0
  Framework:    aprender 0.10.0
  Architecture: MarkovModel
  Parameters:   40848
```

Export formats:

```bash
# JSON (for programmatic access)
aprender-shell inspect -m model.apr --format json

# Hugging Face YAML (for model sharing)
aprender-shell inspect -m model.apr --format huggingface
```

## Publishing to Hugging Face Hub

Share your model with the community:

```bash
# Set token
export HF_TOKEN=hf_xxx

# Publish
aprender-shell publish -m ~/.aprender-shell.model -r username/my-shell-model

# With custom commit message
aprender-shell publish -m model.apr -r org/repo -c "v1.0 release"
```

Without a token, the command generates a README.md and upload instructions instead of uploading.

## Model Validation

Test accuracy with holdout validation:

```bash
$ aprender-shell validate

🔬 aprender-shell: Model Validation

📂 History file: ~/.zsh_history
📊 Total commands: 21729
⚙️  N-gram size: 3
📈 Train/test split: 80% / 20%

════════════════════════════════════════════
           VALIDATION RESULTS
════════════════════════════════════════════
  Hit@1:    45.2%  (exact match)
  Hit@3:    62.8%  (in top 3)
  Hit@5:    71.4%  (in top 5)
```

## Uninstalling

Remove widget from shell config:

```bash
# Dry run (show what would be removed)
aprender-shell uninstall --dry-run

# Remove from ZSH
aprender-shell uninstall --zsh

# Remove from Fish
aprender-shell uninstall --fish

# Keep model file
aprender-shell uninstall --zsh --keep-model
```

## Troubleshooting

| Issue | Solution |
|-------|----------|
| "Could not find history file" | Specify path: `-f ~/.bash_history` |
| Suggestions too generic | Increase n-gram: `-n 4` |
| Model too large | Decrease n-gram: `-n 2` |
| Slow suggestions | Check model size with `stats` |