cmdai 0.1.0

Convert natural language to shell commands using local LLMs
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
# Agent Guidelines for cmdai

## Project Overview

**cmdai** is a safety-first Rust CLI tool that converts natural language to POSIX shell commands using local LLMs. Built for blazing-fast performance (<100ms startup, <2s inference on M1 Mac), single-binary distribution (<50MB), and comprehensive safety validation.

**Core Technologies**: Rust 2021 Edition, async with Tokio, clap CLI, MLX for Apple Silicon, Candle for cross-platform CPU inference.

**License**: AGPL-3.0 (network use requires source disclosure).

---

## Essential Commands

### Build & Run
```bash
# Build debug binary
make build
cargo build

# Build optimized release binary
make release
cargo build --release

# Run with debug logging
RUST_LOG=debug cargo run -- "list all files"
make run-debug

# Install locally
make install
cargo install --path .

# Check binary size (must be <50MB)
make size-check
```

### Testing (Critical - TDD Project)
```bash
# Run all tests (default - quiet mode)
make test
RUST_LOG=warn cargo test -q --all-features

# Run specific test suites
make test-contract          # Contract tests only
make test-integration       # Integration tests only
make test-property          # Property-based tests only

# Verbose test output
make test-verbose
RUST_LOG=debug cargo test --verbose --all-features

# Show stdout/stderr from tests
make test-show-output

# Use nextest (if installed)
make test-nextest

# Watch mode for continuous testing
make test-watch
cargo watch -x "test -q --all-features"
```

### Code Quality (Pre-PR Requirements)
```bash
# Format code (MUST run before commit)
make fmt
cargo fmt --all

# Check formatting without changing files
make fmt-check
cargo fmt --all -- --check

# Run Clippy linter (warnings treated as errors)
make lint
cargo clippy --all-targets --all-features -- -D warnings

# Security audit
make audit
cargo audit

# Run all quality checks + tests
make check
```

### Performance & Documentation
```bash
# Run Criterion benchmarks
make bench
cargo bench

# Generate and open documentation
make doc
cargo doc --no-deps --open

# Profile release build
make profile
```

---

## Project Structure

```
cmdai/
├── src/
│   ├── lib.rs              # Library exports (library-first architecture)
│   ├── main.rs             # CLI entry point (orchestration only, no business logic)
│   ├── models/             # Core data types (no dependencies)
│   ├── safety/             # Command validation (depends only on models)
│   │   ├── mod.rs          # SafetyValidator with 52 pre-compiled patterns
│   │   └── patterns.rs     # Dangerous command pattern database
│   ├── backends/           # LLM backend implementations
│   │   ├── mod.rs          # CommandGenerator trait
│   │   ├── embedded/       # MLX (Apple Silicon) and CPU (Candle) backends
│   │   └── remote/         # Ollama and vLLM backends (feature-gated)
│   ├── cache/              # Model caching with integrity validation
│   ├── config/             # Configuration management (TOML)
│   ├── cli/                # CLI interface and argument parsing
│   ├── execution/          # Shell detection and execution context
│   ├── logging/            # Structured logging with tracing
│   ├── platform/           # Platform detection utilities
│   └── model_loader.rs     # Hugging Face model loading
├── tests/                  # Contract-first test structure
│   ├── *_contract.rs       # Contract tests (API boundaries)
│   ├── *_integration.rs    # Integration tests (workflows)
│   ├── property_tests.rs   # Property-based tests
│   ├── e2e_cli_tests.rs    # E2E user scenarios
│   └── quickstart_scenarios.rs  # Scenario walk-throughs
├── benches/                # Criterion performance benchmarks
├── specs/                  # Product & architecture specifications
├── .specify/memory/        # Project constitution and checklists
├── .github/workflows/      # CI/CD (multi-platform, security audit)
└── exports/                # Generated artifacts for reviews
```

---

## Constitution & Core Principles

### I. Simplicity (MANDATORY)
- **Library-first architecture**: All features in `lib.rs`, `main.rs` orchestrates only
- **Direct framework usage**: No wrapper abstractions around clap, tokio, serde
- **Single data flow**: `CommandRequest → GeneratedCommand → ValidationResult`
- **YAGNI**: No organizational-only patterns (repositories, DTOs, UoW)

### II. Test-First (NON-NEGOTIABLE)
- **RED-GREEN-REFACTOR cycle strictly enforced**
- **No implementation without failing test first** - violations block code review
- **Test ordering**: Contract tests → Integration tests → Implementation → Unit tests
- **Real dependencies in tests** - no mocking unless testing error conditions
- **Git commit granularity**: Tests must be committed before implementation

### III. Safety-First Development
- **Dangerous command detection mandatory** before execution
- **52 pre-compiled regex patterns** covering Critical/High/Moderate risks
- **Risk levels**: Safe (green) → Moderate (yellow) → High (orange) → Critical (red)
- **User confirmation required** for High/Critical operations
- **No unsafe Rust** without explicit justification
- **Validation performance**: <50ms at P95

### IV. Implementation Order (STRICT)
1. **Models first** (`src/models/mod.rs`) - Foundation, no dependencies
2. **Safety second** (`src/safety/`) - Depends only on models
3. **Backends third** (`src/backends/`) - Depends on models
4. **CLI last** (`src/cli/`) - Orchestrates all modules

### V. Observability
- **Structured logging**: Use `tracing` crate with appropriate levels
- **Error handling**: `anyhow` for binaries, `thiserror` for libraries
- **User-facing messages**: Clear, actionable, distinct from debug logs
- **Performance metrics**: Log startup time, validation latency, inference duration

---

## Coding Conventions

### Formatting (rustfmt.toml)
```
max_width = 100
tab_spaces = 4 (no hard tabs)
newline_style = Unix
edition = 2021
reorder_imports = true
use_try_shorthand = true
```

**Before committing**: Always run `cargo fmt --all` or `make fmt`.

### Naming Conventions
- **Types**: UpperCamelCase (`CommandRequest`, `SafetyValidator`)
- **Functions/modules**: snake_case (`generate_command`, `safety/patterns.rs`)
- **Constants**: SCREAMING_SNAKE_CASE (`MAX_COMMAND_LENGTH`)
- **Enum variants**: UpperCamelCase with descriptive names (`RiskLevel::Critical`)

### Linting (Clippy)
- **Warnings treated as errors**: `cargo clippy -- -D warnings`
- **Cognitive complexity threshold**: 25
- **Enum variant name threshold**: 3
- **Allow attributes**: Local with explanation comment only

### Error Handling Patterns
```rust
// Libraries: Use thiserror for typed errors
#[derive(Debug, thiserror::Error)]
pub enum GeneratorError {
    #[error("Backend unavailable: {reason}")]
    BackendUnavailable { reason: String },
    // ...
}

// Binaries: Use anyhow for context chains
use anyhow::{Context, Result};
let config = ConfigManager::load()
    .context("Failed to load configuration")?;
```

### Async Patterns
```rust
// Use async-trait for trait methods
#[async_trait]
pub trait CommandGenerator: Send + Sync {
    async fn generate_command(&self, request: &CommandRequest) 
        -> Result<GeneratedCommand, GeneratorError>;
}

// Tokio runtime in main.rs
#[tokio::main]
async fn main() { /* ... */ }
```

---

## Testing Strategy

### Test-Driven Development Workflow
1. **RED**: Write test, verify it fails (`cargo test --test <test_file>`)
2. **GREEN**: Add minimal code to make test pass (no extra features)
3. **REFACTOR**: Improve code quality while keeping tests green
4. **COMMIT**: Granular commits after each test passes
5. **REPEAT**: Move to next failing test

### Test Types & Priority
1. **Contract tests** (`*_contract.rs`) - API boundaries, trait compliance
2. **Integration tests** (`*_integration.rs`) - Module interactions, workflows
3. **E2E tests** (`e2e_*.rs`) - User scenarios, CLI interface
4. **Property tests** (`property_tests.rs`) - Safety validation with random inputs
5. **Unit tests** (in modules) - Edge cases, error conditions

### Test Environment
```bash
# Set log levels to reduce noise
RUST_LOG=warn cargo test -q

# For debugging specific tests
RUST_LOG=debug cargo test test_name -- --nocapture

# Use nextest for cleaner output
cargo nextest run --all-features
```

### Test Organization
```
tests/
├── backend_trait_contract.rs       # CommandGenerator trait
├── safety_validator_contract.rs    # SafetyValidator API
├── cli_interface_contract.rs       # CLI argument parsing
├── config_contract.rs              # Configuration management
├── cache_contract.rs               # Cache integrity
├── execution_contract.rs           # Shell detection
├── integration_tests.rs            # End-to-end workflows
├── embedded_integration.rs         # Embedded backend tests
├── remote_integration.rs           # Remote backend tests (feature-gated)
├── property_tests.rs               # Property-based testing
├── e2e_cli_tests.rs               # CLI user scenarios
└── quickstart_scenarios.rs        # Scenario walk-throughs
```

### Testing Best Practices
- **Real dependencies**: Don't mock unless testing error conditions
- **Quiet by default**: Use `RUST_LOG=warn` and `-q` flag
- **Document scenarios**: Add new scenarios to `specs/` when extending features
- **Integration required for**: New libraries, contract changes, shared schemas
- **Run full suite before PR**: `make check` (fmt + lint + audit + test)

---

## Feature Flags & Cross-Platform

### Feature Flags (Cargo.toml)
```toml
[features]
default = ["embedded-cpu"]           # CPU inference via Candle
mock-backend = []                    # Mock backend for testing
remote-backends = ["reqwest"]        # Ollama + vLLM support
embedded-mlx = ["cxx", "mlx-rs"]    # Apple Silicon MLX (macOS aarch64)
embedded-cpu = ["candle-core"]       # Cross-platform CPU inference
full = ["remote-backends", "embedded-mlx", "embedded-cpu"]
```

### Build Commands by Feature
```bash
# Default build (CPU inference only)
cargo build --release

# With remote backends
cargo build --release --features remote-backends

# Apple Silicon optimized (macOS only)
cargo build --release --features embedded-mlx,embedded-cpu

# All features
cargo build --release --all-features
```

### Platform-Specific Code
```rust
// Conditional compilation for Apple Silicon
#[cfg(all(target_os = "macos", target_arch = "aarch64"))]
pub use backends::embedded::MlxBackend;

// Feature-gated modules
#[cfg(feature = "remote-backends")]
pub mod remote;
```

### CI/CD Matrix
- **Platforms**: Ubuntu, macOS (Intel + Silicon), Windows
- **Targets**: linux-amd64, linux-arm64, macos-intel, macos-silicon, windows-amd64
- **Tests**: All platforms run lib tests, Ubuntu runs full feature matrix
- **Checks**: Formatting, Clippy, security audit, binary size (<50MB)

---

## Important Gotchas & Patterns

### 1. Rust Environment Setup
**CRITICAL**: Before running any `cargo` command, verify Rust is in PATH:
```bash
which cargo
# If not found, load Rust environment:
. "$HOME/.cargo/env"
```

### 2. Safety Pattern Compilation
- **Patterns compiled once at startup** using `once_cell::Lazy` (30x speedup)
- **52 pre-compiled regex patterns** in `src/safety/patterns.rs`
- **Context-aware matching**: Distinguishes dangerous commands from safe string literals
  - `rm -rf /` → dangerous (executable context)
  - `echo 'rm -rf /' > script.sh` → safe (in quotes)

### 3. Test Failure Messages
- **"old_string not found"** in edit: Check exact whitespace/indentation
- **Async test timeout**: Increase timeout in `.config/nextest.toml`
- **Test output noise**: Use `RUST_LOG=warn` or `RUST_LOG=error`

### 4. Performance Requirements
- **Startup time**: <100ms (target)
- **First inference**: <2s on M1 Mac
- **Safety validation**: <50ms at P95
- **Binary size**: <50MB (use `make size-check`)

### 5. Configuration Files
- **User config**: `~/.config/cmdai/config.toml`
- **Backend config**: Supports embedded (default), Ollama, vLLM
- **Safety config**: Customizable patterns, safety levels (strict/moderate/permissive)

### 6. Library-First Architecture
- **All business logic in `src/lib.rs` exports**
- **`main.rs` orchestrates only** - no business logic
- **Public APIs must have rustdoc comments**
- **Each module has single, well-defined purpose**

### 7. Error Handling
- **Libraries**: Use `thiserror` for typed errors with `#[error("...")]`
- **Binaries**: Use `anyhow` with `.context("...")` chains
- **No panics in production** - use `Result` types
- **User-facing messages**: Helpful, actionable, distinct from debug logs

### 8. Dependency Management
- **`cargo audit`**: Run before merging PRs
- **`deny.toml`**: Gating rules for licenses and vulnerabilities
- **Allowed licenses**: MIT, Apache-2.0, BSD-2/3-Clause, ISC, Unicode-DFS-2016, CC0-1.0
- **Vulnerability policy**: `deny`, unmaintained crates `warn`

### 9. Git Commit Conventions
- **Concise, Title Case subjects** (<72 chars)
- **Imperative mood** ("Add feature" not "Added feature")
- **Optional leading emoji** (🎉, 🐛, 📚, ✨, etc.)
- **PR references**: `(#12)` at end of subject
- **Squash WIP commits** before requesting review

### 10. TDD Enforcement
- **Pre-commit hooks** verify test-first discipline
- **PR reviews validate** TDD workflow
- **Failing to follow TDD** results in rejected changes
- **Git commits must show** tests before implementation

---

## Development Workflow

### Starting a New Feature
1. **Read specs**: Check `specs/` for architecture contracts
2. **Write contract test**: Define API in `tests/*_contract.rs`
3. **Verify RED**: Run `cargo test --test <test_file>` - must fail
4. **Minimal implementation**: Add just enough code to pass
5. **Verify GREEN**: Run tests again - must pass
6. **Refactor**: Improve code quality, keep tests green
7. **Commit**: Granular commit after each test passes
8. **Repeat**: Next failing test

### Pre-PR Checklist
```bash
# 1. Format code
make fmt

# 2. Run linter
make lint

# 3. Security audit
make audit

# 4. Run full test suite
make test

# 5. Check binary size
make size-check

# Or run all at once:
make check
```

### PR Guidelines
- **Describe intent**: What problem does this solve?
- **List touched modules**: Which files/modules changed?
- **Link to specs/issues**: Reference relevant documentation
- **Include transcripts**: Screenshots or command output for user-facing changes
- **Squash WIP commits**: Clean commit history
- **Follow commit conventions**: Title Case, imperative mood, <72 chars

---

## Backend System

### CommandGenerator Trait
All backends implement this async trait:
```rust
#[async_trait]
pub trait CommandGenerator: Send + Sync {
    async fn generate_command(&self, request: &CommandRequest) 
        -> Result<GeneratedCommand, GeneratorError>;
    async fn is_available(&self) -> bool;
    fn backend_info(&self) -> BackendInfo;
    async fn shutdown(&self) -> Result<(), GeneratorError>;
}
```

### Available Backends
1. **Embedded CPU (default)**: Candle framework, cross-platform
2. **Embedded MLX**: Apple Silicon optimized (macOS aarch64)
3. **Ollama**: Local API backend (feature: `remote-backends`)
4. **vLLM**: Remote HTTP API (feature: `remote-backends`)

### Backend Configuration (`~/.config/cmdai/config.toml`)
```toml
[backend]
primary = "embedded"  # or "ollama", "vllm"
enable_fallback = true

[backend.ollama]
base_url = "http://localhost:11434"
model_name = "codellama:7b"

[backend.vllm]
base_url = "http://localhost:8000"
model_name = "codellama/CodeLlama-7b-hf"
```

---

## Safety Validation

### Risk Levels
- **Safe (Green)**: Normal operations, no confirmation
- **Moderate (Yellow)**: Confirmation in strict mode
- **High (Orange)**: Confirmation in moderate mode
- **Critical (Red)**: Blocked in strict mode, explicit confirmation required

### Blocked Operations (Critical)
- Filesystem destruction: `rm -rf /`, `rm -rf ~`, `mkfs`
- Fork bombs: `:(){ :|:& };:`
- Device writes: `dd if=/dev/zero`
- System path modifications: `/bin`, `/usr`, `/etc`
- Privilege escalation: `sudo su`, `chmod 777 /`

### Safety Configuration (`~/.config/cmdai/config.toml`)
```toml
[safety]
enabled = true
level = "moderate"  # strict, moderate, or permissive
require_confirmation = true
custom_patterns = ["additional", "dangerous", "patterns"]
```

### Validation Pipeline
1. **Pattern Matching**: Check against 52 pre-compiled regex patterns
2. **POSIX Compliance**: Validate shell syntax and quoting
3. **Path Validation**: Prevent injection, verify quote escaping
4. **Risk Assessment**: Assign Safe/Moderate/High/Critical level
5. **User Confirmation**: Require explicit approval for High/Critical

---

## Common Tasks

### Adding a New Safety Pattern
1. Add pattern to `src/safety/patterns.rs`
2. Add test to `tests/safety_validator_contract.rs`
3. Verify pattern compiles: `cargo test test_pattern_compilation`
4. Test dangerous command detection
5. Document in specs and update CHANGELOG.md

### Implementing a New Backend
1. Add contract test to `tests/backend_trait_contract.rs`
2. Implement `CommandGenerator` trait in `src/backends/<name>.rs`
3. Add backend-specific configuration to `src/config/`
4. Test availability checking and error handling
5. Add integration test in `tests/<name>_integration.rs`
6. Document configuration in README.md

### Adding a CLI Flag
1. Add field to `Cli` struct in `src/main.rs`
2. Implement `IntoCliArgs` method
3. Add contract test to `tests/cli_interface_contract.rs`
4. Add E2E test to `tests/e2e_cli_tests.rs`
5. Update README.md usage section

### Debugging Test Failures
```bash
# Run single test with output
RUST_LOG=debug cargo test test_name -- --nocapture

# Run with backtraces
RUST_BACKTRACE=1 cargo test test_name

# Use nextest with verbose profile
cargo nextest run -P verbose test_name

# Check test without running
cargo test --no-run --verbose
```

---

## Resources

### Key Files
- **Constitution**: `.specify/memory/constitution.md` - Core principles (MUST READ)
- **Contributing**: `CONTRIBUTING.md` - Development guidelines
- **Claude Guide**: `CLAUDE.md` - AI assistant context
- **Specifications**: `specs/` - Architecture contracts and design docs
- **Changelog**: `CHANGELOG.md` - Version history and breaking changes

### Documentation
- **API Docs**: `cargo doc --no-deps --open`
- **README**: Project overview, quick start, usage examples
- **Code of Conduct**: `CODE_OF_CONDUCT.md`

### CI/CD
- **CI Workflow**: `.github/workflows/ci.yml` - Multi-platform tests, security audit
- **Release Workflow**: `.github/workflows/release.yml` - Binary distribution
- **Issue Templates**: `.github/ISSUE_TEMPLATE/` - Bug reports, feature requests

### External Resources
- [MLX Framework]https://github.com/ml-explore/mlx - Apple Silicon ML
- [vLLM]https://github.com/vllm-project/vllm - High-performance LLM serving
- [Ollama]https://ollama.ai - Local LLM runtime
- [Hugging Face]https://huggingface.co - Model hosting

---

## Quick Reference Card

### Most Common Commands
```bash
# Development cycle
make build          # Build debug
make test           # Run all tests quietly
make fmt            # Format code
make lint           # Run Clippy

# Pre-commit
make check          # Format + lint + audit + test

# Release
make release        # Build optimized binary
make size-check     # Verify <50MB
make install        # Install locally

# Debugging
RUST_LOG=debug cargo run -- "prompt"
RUST_LOG=debug cargo test test_name -- --nocapture
```

### Test Quick Reference
```bash
make test           # All tests (quiet)
make test-contract  # Contract tests
make test-integration  # Integration tests
make test-verbose   # Verbose output
make test-nextest   # Use nextest if available
```

### File Locations
- **User config**: `~/.config/cmdai/config.toml`
- **Cargo cache**: `~/.cargo/`
- **Build output**: `target/release/cmdai`
- **Test artifacts**: `target/nextest/`
- **Benchmark reports**: `target/criterion/`

---

**Last Updated**: 2025-01-24  
**Project Stage**: Active early development (core implementation in progress)  
**Constitution Version**: 1.0.0  
**Rust Edition**: 2021  
**MSRV**: 1.75+