selfware 0.2.0

Your personal AI workshop — software you own, software that lasts
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
# Selfware Project - Comprehensive Review & Recommendations

**Review Date:** 2026-03-05  
**Project:** Selfware - Agentic Coding Harness for Local LLMs  
**Codebase Size:** ~193k lines of Rust code (184 source files, 34 test files)  
**Version:** 0.1.0

---

## Executive Summary

Selfware is an ambitious, well-architected agentic coding framework with an impressive feature set: 54 tools, multi-agent swarm support, TUI dashboard, evolution engine for self-improvement, and a comprehensive safety system. The codebase demonstrates strong engineering practices with ~5,200 tests, feature-flag modularity, and good separation of concerns.

**Overall Grade: A-** (Excellent foundation with specific areas for improvement)

---

## 1. Architecture Assessment

### Strengths

1. **Modular Design**: Clean separation into `agent/`, `tools/`, `safety/`, `cognitive/`, `orchestration/` modules
2. **Feature Flag System**: Well-designed feature gates (`tui`, `resilience`, `self-improvement`, etc.)
3. **PDVR Cycle**: Thoughtful cognitive architecture (Plan-Do-Verify-Reflect)
4. **Plugin Architecture**: Tool registry pattern allows easy extension
5. **Safety-First Design**: Multi-layer safety with path validation, command filtering, sandboxing

### Concerns

1. **Module Size Imbalance**: Some modules are very large:
   - `src/tools/knowledge.rs`: 1,537 lines
   - `src/ui/animations.rs`: 1,503 lines
   - `src/session/local_first.rs`: 2,527 lines
   - `src/ui/garden.rs`: 1,206 lines

2. **Deep Module Nesting**: Some paths are excessively deep (`src/orchestration/workflow_dsl/`)

3. **Circular Dependencies Risk**: `cognitive/` and `agent/` have tight coupling

---

## 2. Detailed Recommendations

### 2.1 Code Organization (High Priority)

#### R1.1: Split Oversized Modules

**Current Issue**: Several files exceed 1,500 lines, violating single responsibility principle.

**Recommendation**:
```
src/tools/knowledge.rs → src/tools/knowledge/{mod,graph,query,export}.rs
src/ui/animations.rs   → src/ui/animations/{mod,particles,transitions,effects}.rs
src/session/local_first.rs → src/session/local/{storage,sync,cache,conflict}.rs
```

**Benefit**: Improved maintainability, faster compile times, easier testing.

#### R1.2: Consolidate Configuration

**Current Issue**: Configuration is scattered across multiple structs in `config/mod.rs` (~1,000+ lines).

**Recommendation**: Split into focused submodules:
```
src/config/
├── mod.rs          # Core Config struct only
├── safety.rs       # SafetyConfig, path validation settings
├── agent.rs        # AgentConfig, cognitive settings  
├── resources.rs    # ResourcesConfig (already exists)
├── evolution.rs    # EvolutionTomlConfig
└── validation.rs   # Config validation logic
```

### 2.2 Safety & Security (Critical Priority)

#### R2.1: Implement Defense in Depth Audit

**Current State**: Good foundation but needs hardening.

**Recommendations**:
1. **Add Fuzz Testing**: Use `cargo-fuzz` for path validation and command parsing
2. **Sandbox Escape Testing**: Regular audits of Docker/container escapes
3. **Secret Scanning**: Expand `SecurityScanner` to detect more patterns (see below)

#### R2.2: Expand Secret Detection Patterns

**Current**: Basic secret scanning in `safety/scanner.rs`.

**Add Patterns**:
```rust
// In src/safety/scanner.rs
const ADDITIONAL_PATTERNS: &[(&str, &str)] = &[
    ("AWS_SESSION_TOKEN", r"(?i)aws_session_token[=:\s]+[a-z0-9/+=]{16,}"),
    ("GitHub_PAT", r"ghp_[a-zA-Z0-9]{36}"),
    ("Slack_Token", r"xox[baprs]-[0-9]{10,13}-[0-9]{10,13}[a-zA-Z0-9-]*"),
    ("Private_Key", r"-----BEGIN (RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----"),
];
```

#### R2.3: Implement Command Allowlist Mode

**Current**: Blacklist-based command filtering.

**Recommendation**: Add opt-in allowlist mode for high-security environments:
```toml
[safety]
mode = "allowlist"  # or "blacklist"
allowed_commands = ["cargo", "git", "rustc", "make"]
```

### 2.3 Error Handling (Medium Priority)

#### R3.1: Standardize Error Context

**Current Issue**: Inconsistent error wrapping patterns across modules.

**Recommendation**: Implement structured error contexts:
```rust
// New type for rich error context
#[derive(Debug)]
pub struct ToolExecutionError {
    pub tool_name: String,
    pub args: serde_json::Value,
    pub phase: ExecutionPhase,
    pub source: anyhow::Error,
    pub suggestions: Vec<String>,  // AI-generated recovery suggestions
}
```

#### R3.2: Add Error Recovery Codes

**Current**: Exit codes defined but not consistently used.

**Recommendation**: Document and enforce exit code usage:
```rust
// In src/errors.rs - already exists, needs enforcement
pub const EXIT_TOOL_NOT_FOUND: u8 = 7;
pub const EXIT_RATE_LIMITED: u8 = 8;
pub const EXIT_CONTEXT_EXHAUSTED: u8 = 9;
```

### 2.4 Testing Strategy (High Priority)

#### R4.1: Increase Integration Test Coverage

**Current**: ~5,200 tests, but integration tests rely heavily on external LLM.

**Recommendations**:
1. **Mock LLM Server**: Create deterministic mock for CI
2. **Contract Tests**: Expand contract testing for API boundaries
3. **Property-Based Tests**: Already have `prop_safety.rs`, `prop_tool_parser.rs` - expand to tools

#### R4.2: Add Performance Regression Tests

**New Recommendation**:
```rust
// tests/performance/token_budget.rs
#[test]
fn token_budget_enforcement() {
    // Ensure token budget is never exceeded by >10%
}

#[test]
fn context_compression_effectiveness() {
    // Verify compression reduces tokens by at least 30%
}
```

#### R4.3: Chaos Testing

**Recommendation**: Add chaos engineering tests:
```rust
// tests/chaos/agent_recovery.rs
#[tokio::test]
async fn agent_recovers_from_panic() {
    // Inject panics at various points, verify recovery
}
```

### 2.5 Documentation (Medium Priority)

#### R5.1: Architecture Decision Records (ADRs)

Create `docs/adr/` directory with decisions:
- ADR-001: Why PDVR instead of ReAct
- ADR-002: Feature flag decomposition strategy
- ADR-003: XML vs native function calling tradeoffs
- ADR-004: Why bincode over JSON for storage

#### R5.2: Inline Documentation Gaps

**Modules needing more docs**:
- `cognitive/memory_hierarchy.rs`: Complex indexing strategy needs explanation
- `orchestration/workflow_dsl/`: Parser implementation lacks examples
- `evolution/daemon.rs`: Tournament selection algorithm needs docs

### 2.6 Performance Optimizations (Medium Priority)

#### R6.1: Memory Usage Audit

**Current Issue**: Several `Vec` allocations that could be pooled.

**Recommendations**:
```rust
// In src/agent/mod.rs - use object pooling for messages
use object_pool::Pool;

static MESSAGE_POOL: Pool<Vec<Message>> = Pool::new(32, || Vec::with_capacity(64));
```

#### R6.2: Async Optimization

**Current Issue**: Some blocking operations in async context.

**Audit Points**:
- `file.read_to_string()` should use `tokio::fs`
- `serde_json::from_str` on large inputs could be streamed
- Git operations use `git2` (blocking) - consider `tokio::task::spawn_blocking`

#### R6.3: Compilation Time

**Current**: ~193k lines, compile times likely significant.

**Recommendations**:
1. Enable `lto = "thin"` in release profile
2. Use `cargo build --timings` to identify bottlenecks
3. Consider workspace split for evolution/tui features

### 2.7 API Design (Low Priority)

#### R7.1: Tool Schema Validation

**Current**: JSON schemas defined but not validated at compile time.

**Recommendation**: Use `schemars` for derive-based schema generation:
```rust
#[derive(JsonSchema)]
pub struct FileReadArgs {
    pub path: PathBuf,
    #[serde(default)]
    pub offset: usize,
}
```

#### R7.2: Streaming Improvements

**Current**: Good streaming foundation in `api/mod.rs`.

**Enhancement**: Add backpressure handling:
```rust
pub struct StreamingResponse {
    response: reqwest::Response,
    chunk_timeout: Duration,
    backpressure_threshold: usize,  // NEW
}
```

### 2.8 Observability (Medium Priority)

#### R8.1: Structured Logging

**Current**: Uses `tracing` but with inconsistent field naming.

**Standardize**:
```rust
// Always use these field names
span!(Level::INFO, "tool_execution", 
    tool.name = %name,
    tool.duration_ms = %duration.as_millis(),
    tool.success = success,
);
```

#### R8.2: Metrics Export

**Current**: Prometheus support exists but limited metrics.

**Add**:
- Tool execution latency histograms
- Token usage by model
- Safety check rejection rates
- Checkpoint save/load durations

---

## 3. Code Quality Issues Found

### Issue 1: Dead Code Warnings Suppressed Globally
**Location**: `src/lib.rs:3`
```rust
#![allow(dead_code, unused_imports, unused_variables)]
```
**Recommendation**: Remove this and fix individual warnings. Suppressing globally hides real issues.

### Issue 2: Panic Handling in Agent Loop
**Location**: `src/agent/loop_control.rs`
**Issue**: Potential for panics to kill the entire process.
**Fix**: Use `catch_unwind` in strategic locations with recovery logic.

### Issue 3: Potential Blocking in Async Context
**Location**: Multiple tool implementations
**Pattern**: `std::fs::read_to_string()` in async fn
**Fix**: Use `tokio::fs` or wrap in `spawn_blocking`.

### Issue 4: Hardcoded Timeouts
**Location**: `src/api/mod.rs:459`
```rust
let request_timeout = config.agent.step_timeout_secs.max(60);
```
**Issue**: Minimum 60s may still be too short for slow local models.
**Fix**: Make minimum configurable via env var.

### Issue 5: Clone-heavy Code
**Location**: `src/agent/mod.rs:116-120`
**Pattern**: Multiple clones in hot path
**Fix**: Use `Arc<str>` or `Arc<String>` for shared immutable data.

---

## 4. Security Recommendations

### S1: Implement Content Security Policy
For TUI mode, ensure no arbitrary code execution via terminal escape sequences.

### S2: Audit All `unsafe` Blocks
Search for `unsafe` usage and document each with SAFETY comments:
```bash
grep -r "unsafe" src/ --include="*.rs" | wc -l  # Should be 0 or minimal
```

### S3: Supply Chain Security
Add to CI:
```yaml
- name: Audit dependencies
  run: cargo audit
  
- name: Check for known vulnerabilities  
  run: cargo deny check advisories
```

### S4: Signed Releases
Current releases are unsigned. Add:
```bash
# Create signed release artifacts
codesign --sign "Developer ID" selfware
```

---

## 5. Feature-Specific Recommendations

### 5.1 Evolution Engine (self-improvement)

**Current**: Good foundation with safety invariants.

**Enhancements**:
1. Add mutation diversity tracking to prevent local optima
2. Implement "cooling schedule" for mutation rate
3. Add evolutionary history visualization
4. Implement A/B testing framework for prompt variants

### 5.2 TUI Dashboard (tui feature)

**Current**: Feature-rich with ratatui.

**Improvements**:
1. Add accessibility mode (screen reader support)
2. Implement proper colorblind-friendly palettes
3. Add mouse support for non-keyboard users
4. Performance: Use double-buffering for animations

### 5.3 Multi-Agent Swarm

**Current**: Basic coordination exists.

**Recommendations**:
1. Add conflict resolution protocols
2. Implement agent reputation scoring
3. Add swarm topology visualization
4. Create "swarm mode" presets (e.g., "code review", "refactoring")

---

## 6. CI/CD Improvements

### Current State (from `.github/workflows/`)
- Basic CI with cargo test
- Security audit job
- Release workflow with artifact upload

### Recommended Additions

```yaml
# Additional jobs to add to ci.yml

performance-regression:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Run benchmarks
      run: cargo bench -- --baseline main

fuzz-testing:
  runs-on: ubuntu-latest
  steps:
    - name: Install cargo-fuzz
      run: cargo install cargo-fuzz
    - name: Run fuzz targets for 10 minutes
      run: cargo fuzz run path_validation --max-total-time=600

docs-check:
  runs-on: ubuntu-latest
  steps:
    - name: Check docs build
      run: cargo doc --no-deps --all-features
    - name: Check for broken links
      run: cargo doc --no-deps 2>&1 | grep -i "warning.*link" || true
```

---

## 7. Documentation Checklist

### User Documentation
- [ ] Complete CLI reference with examples
- [ ] Troubleshooting guide
- [ ] Configuration cookbook (common patterns)
- [ ] Model-specific guides (Qwen, Kimi, LFM2)

### Developer Documentation  
- [x] Architecture overview (`docs/architecture.md`)
- [ ] Contributing guidelines (partial - `CONTRIBUTING.md` exists)
- [ ] Module-level documentation (some gaps)
- [ ] API documentation (rustdocs need review)

### Operations Documentation
- [ ] Deployment guide (Docker, k8s)
- [ ] Monitoring and alerting runbook
- [ ] Backup/recovery procedures for checkpoints

---

## 8. Priority Matrix

| Priority | Item | Effort | Impact |
|----------|------|--------|--------|
| P0 (Critical) | Fix dead_code suppression | Low | High |
| P0 | Add fuzz testing for safety | Medium | High |
| P1 (High) | Split oversized modules | Medium | Medium |
| P1 | Implement mock LLM for tests | Medium | High |
| P1 | Add chaos tests | Medium | High |
| P2 (Medium) | Performance audit | High | Medium |
| P2 | Documentation improvements | Medium | Medium |
| P3 (Low) | API schema validation | Low | Low |
| P3 | TUI accessibility | Medium | Medium |

---

## 9. Conclusion

Selfware is an impressive piece of engineering with a solid foundation. The codebase shows:

**Strengths**:
- Strong safety-first design philosophy
- Good modular architecture
- Comprehensive feature set
- Excellent test coverage (~5,200 tests)
- Active maintenance (recent changelog entries)

**Areas for Focus**:
1. **Code organization**: Split oversized modules
2. **Security hardening**: Fuzz testing, command allowlists
3. **Testing**: Mock LLM server, chaos tests
4. **Documentation**: ADRs, inline docs for complex algorithms

The project is production-ready for its intended use case (local LLM coding assistant), with the evolution engine being an experimental but promising feature that needs continued hardening.

**Recommended Next Steps**:
1. Address P0 items immediately
2. Create roadmap for module refactoring
3. Establish security audit cadence (quarterly)
4. Consider forming a security review board for evolution feature

---

*Review completed by: Claude Code CLI*  
*Total files reviewed: 200+ Rust source files*  
*Documentation reviewed: README, architecture.md, CHANGELOG, CONTRIBUTING, inline docs*