# Case Study: AI Shell Completion
Train a personalized autocomplete on your shell history in 5 seconds. 100% local, private, fast.
## Quick Start
```bash
# Install
cargo install --path crates/aprender-shell
# Train on your history
aprender-shell train
# Test
aprender-shell suggest "git "
```
## How It Works
```
~/.zsh_history β Parser β N-gram Model β Trie Index β Suggestions
β β β
21,729 cmds 40,848 n-grams <1ms lookup
```
**Algorithm:** Markov chain with trigram context + prefix trie for O(1) lookup.
## Training
```bash
$ aprender-shell train
π aprender-shell: Training model...
π History file: /home/user/.zsh_history
π Commands loaded: 21729
π§ Training 3-gram model... done!
β
Model saved to: ~/.aprender-shell.model
π Model Statistics:
Unique n-grams: 40848
Vocabulary size: 16100
Model size: 2016.4 KB
```
## Suggestions
```bash
$ aprender-shell suggest "git "
git commit 0.505
git clone 0.065
git add 0.059
git push 0.035
git checkout 0.031
$ aprender-shell suggest "cargo "
cargo run 0.413
cargo install 0.069
cargo test 0.059
cargo clippy 0.045
```
Scores are frequency-based probabilities from your actual usage.
## Incremental Updates
Don't retrain from scratchβappend new commands:
```bash
$ aprender-shell update
π Found 15 new commands
β
Model updated (21744 total commands)
$ aprender-shell update
β Model is up to date (no new commands)
```
**Performance:**
- 0ms when no new commands
- ~10ms per 100 new commands
- Tracks position in history file
## ZSH Integration
Generate the widget:
```bash
aprender-shell zsh-widget >> ~/.zshrc
source ~/.zshrc
```
This adds:
- Ghost text suggestions as you type (gray)
- Tab or Right Arrow to accept
- Updates on every keystroke
## Auto-Retrain
```zsh
# Add to ~/.zshrc
# Option 1: Update after every command (~10ms)
precmd() { aprender-shell update -q & }
# Option 2: Update on shell exit
zshexit() { aprender-shell update -q }
```
## Model Statistics
```bash
$ aprender-shell stats
π Model Statistics:
N-gram size: 3
Unique n-grams: 40848
Vocabulary size: 16100
Model size: 2016.4 KB
π Top commands:
340x git status
245x cargo build
198x cd ..
```
## Memory Paging for Large Histories
For very large shell histories (100K+ commands), use memory paging to limit RAM usage:
```bash
# Train with 10MB memory limit (creates .apbundle file)
$ aprender-shell train --memory-limit 10
π aprender-shell: Training paged model...
π History file: /home/user/.zsh_history
π Commands loaded: 150000
π§ Training 3-gram paged model (10MB limit)... done!
β
Paged model saved to: ~/.aprender-shell.apbundle
π Model Statistics:
Segments: 45
Vocabulary size: 35000
Memory limit: 10 MB
```
```bash
# Suggestions with paged loading
$ aprender-shell suggest "git " --memory-limit 10
# View paging statistics
$ aprender-shell stats --memory-limit 10
π Paged Model Statistics:
N-gram size: 3
Total commands: 150000
Vocabulary size: 35000
Total segments: 45
Loaded segments: 3
Memory limit: 10.0 MB
π Paging Statistics:
Page hits: 127
Page misses: 3
Evictions: 0
Hit rate: 97.7%
```
**How it works:**
- N-grams are grouped by command prefix (e.g., "git", "cargo")
- Segments are stored in `.apbundle` format
- Only accessed segments are loaded into RAM
- LRU eviction frees memory when limit is reached
See [Model Bundling and Memory Paging](./model-bundling-paging.md) for details.
## Sharing Models
Export your model for teammates:
```bash
# Export
aprender-shell export -m ~/.aprender-shell.model team-model.json
# Import (on another machine)
aprender-shell import team-model.json
```
Use case: Share team-specific command patterns (deployment scripts, project aliases).
## Privacy & Security
**Filtered automatically:**
- Commands containing `password`, `secret`, `token`, `API_KEY`
- AWS credentials, GitHub tokens
- History manipulation commands (`history`, `fc`)
**100% local:**
- No network requests
- No telemetry
- Model stays on your machine
## Architecture
```
crates/aprender-shell/
βββ src/
β βββ main.rs # CLI (clap)
β βββ history.rs # ZSH/Bash/Fish parser
β βββ model.rs # Markov n-gram model
β βββ trie.rs # Prefix index
```
### History Parser
Handles multiple formats:
```rust
// ZSH extended: ": 1699900000:0;git status"
// Bash plain: "git status"
// Fish: "- cmd: git status"
```
### N-gram Model
Trigram Markov chain:
```
Context β Next Token (count)
"" β "git" (340), "cargo" (245), "cd" (198)
"git" β "commit" (89), "push" (45), "status" (340)
"git commit" β "-m" (67), "--amend" (12)
```
### Trie Index
O(k) prefix lookup where k = prefix length:
```
gβiβtβ βsβtβaβtβuβs (count: 340)
ββcβoβmβmβiβt (count: 89)
ββpβuβsβh (count: 45)
```
## Performance: Sub-10ms Verification
Shell completion must feel **instantaneous**. Nielsen's research shows:
- < 100ms: Perceived as instant
- < 10ms: No perceptible delay (ideal)
- > 100ms: Noticeable lag, poor UX
**aprender-shell achieves microsecond latencyβ600-22,000x faster than required.**
### Benchmark Results
Run the benchmarks yourself:
```bash
cargo bench --package aprender-shell --bench recommendation_latency
```
#### Suggestion Latency by Model Size
| **Small** | 50 | kubectl | **437 ns** | 22,883x faster |
| **Small** | 50 | npm | **530 ns** | 18,868x faster |
| **Small** | 50 | docker | **659 ns** | 15,174x faster |
| **Small** | 50 | cargo | **725 ns** | 13,793x faster |
| **Small** | 50 | git | **1.54 Β΅s** | 6,493x faster |
| **Medium** | 500 | npm | **1.78 Β΅s** | 5,618x faster |
| **Medium** | 500 | docker | **3.97 Β΅s** | 2,519x faster |
| **Medium** | 500 | cargo | **6.53 Β΅s** | 1,532x faster |
| **Medium** | 500 | git | **10.6 Β΅s** | 943x faster |
| **Large** | 5000 | npm | **671 ns** | 14,903x faster |
| **Large** | 5000 | docker | **7.96 Β΅s** | 1,256x faster |
| **Large** | 5000 | kubectl | **12.3 Β΅s** | 813x faster |
| **Large** | 5000 | git | **14.6 Β΅s** | 685x faster |
**Key insight:** Even with 5,000 commands in history, worst-case latency is **14.6 Β΅s** (0.0146 ms).
### Industry Comparison
| GitHub Copilot | 100-500ms | 10,000-50,000x faster |
| Fish shell completion | 5-20ms | 500-2,000x faster |
| Zsh compinit | 10-50ms | 1,000-5,000x faster |
| Bash completion | 20-100ms | 2,000-10,000x faster |
### Why So Fast?
1. **O(1) Trie Lookup:** Prefix search is O(k) where k = prefix length, not O(n)
2. **In-Memory Model:** No disk I/O during suggestions
3. **Simple Data Structures:** HashMap + Trie, no neural network overhead
4. **Zero Allocations:** Hot path avoids heap allocations
### Benchmark Suite
The `recommendation_latency` benchmark includes:
| `suggestion_latency` | Core latency by model size (primary metric) |
| `partial_completion` | Mid-word completion ("git co" β "git commit") |
| `training_throughput` | Commands processed per second during training |
| `cold_start` | Model load + first suggestion latency |
| `serialization` | JSON serialize/deserialize performance |
| `scalability` | Latency growth with model size |
| `paged_model` | Memory-constrained model performance |
## Why N-gram Beats Neural
For shell completion:
| Training time | <1s | Minutes |
| Inference | **<15Β΅s** | 10-50ms |
| Model size | 2MB | 50MB+ |
| Accuracy on shell | 70%+ | 75%+ |
| Cold start | Instant | GPU warmup |
Shell commands are repetitive patterns. N-gram captures this perfectly.
## CLI Reference
```
aprender-shell <COMMAND>
Commands:
train Full retrain from history
update Incremental update (fast)
suggest Get completions for prefix (-c/-k for count)
stats Show model statistics
export Export model for sharing
import Import a shared model
zsh-widget Generate ZSH integration code
fish-widget Generate Fish shell integration code
uninstall Remove widget from shell config
validate Validate model accuracy (train/test split)
augment Generate synthetic training data
analyze Analyze command patterns (CodeFeatureExtractor)
tune AutoML hyperparameter tuning (TPE)
inspect View model card metadata
publish Publish model to Hugging Face Hub
Options:
-h, --help Print help
-V, --version Print version
```
## Fish Shell Integration
Generate the Fish widget:
```bash
aprender-shell fish-widget >> ~/.config/fish/config.fish
source ~/.config/fish/config.fish
```
Disable temporarily:
```fish
set -gx APRENDER_DISABLED 1
```
## Model Cards & Inspection
View model metadata:
```bash
$ aprender-shell inspect -m ~/.aprender-shell.model
π Model Card: ~/.aprender-shell.model
βββββββββββββββββββββββββββββββββββββββββββ
MODEL INFORMATION
βββββββββββββββββββββββββββββββββββββββββββ
ID: aprender-shell-markov-3gram-20251127
Name: Shell Completion Model
Version: 1.0.0
Framework: aprender 0.10.0
Architecture: MarkovModel
Parameters: 40848
```
Export formats:
```bash
# JSON (for programmatic access)
aprender-shell inspect -m model.apr --format json
# Hugging Face YAML (for model sharing)
aprender-shell inspect -m model.apr --format huggingface
```
## Publishing to Hugging Face Hub
Share your model with the community:
```bash
# Set token
export HF_TOKEN=hf_xxx
# Publish
aprender-shell publish -m ~/.aprender-shell.model -r username/my-shell-model
# With custom commit message
aprender-shell publish -m model.apr -r org/repo -c "v1.0 release"
```
Without a token, generates README.md and upload instructions.
## Model Validation
Test accuracy with holdout validation:
```bash
$ aprender-shell validate
π¬ aprender-shell: Model Validation
π History file: ~/.zsh_history
π Total commands: 21729
βοΈ N-gram size: 3
π Train/test split: 80% / 20%
ββββββββββββββββββββββββββββββββββββββββββββ
VALIDATION RESULTS
ββββββββββββββββββββββββββββββββββββββββββββ
Hit@1: 45.2% (exact match)
Hit@3: 62.8% (in top 3)
Hit@5: 71.4% (in top 5)
```
## Uninstalling
Remove widget from shell config:
```bash
# Dry run (show what would be removed)
aprender-shell uninstall --dry-run
# Remove from ZSH
aprender-shell uninstall --zsh
# Remove from Fish
aprender-shell uninstall --fish
# Keep model file
aprender-shell uninstall --zsh --keep-model
```
## Troubleshooting
| "Could not find history file" | Specify path: `-f ~/.bash_history` |
| Suggestions too generic | Increase n-gram: `-n 4` |
| Model too large | Decrease n-gram: `-n 2` |
| Slow suggestions | Check model size with `stats` |