selfware 0.2.0 - Docs.rs

# Selfware Project - Comprehensive Review & Recommendations

**Review Date:** 2026-03-05  
**Project:** Selfware - Agentic Coding Harness for Local LLMs  
**Codebase Size:** ~193k lines of Rust code (184 source files, 34 test files)  
**Version:** 0.1.0

---

## Executive Summary

Selfware is an ambitious, well-architected agentic coding framework with an impressive feature set: 54 tools, multi-agent swarm support, TUI dashboard, evolution engine for self-improvement, and a comprehensive safety system. The codebase demonstrates strong engineering practices with ~5,200 tests, feature-flag modularity, and good separation of concerns.

**Overall Grade: A-** (Excellent foundation with specific areas for improvement)

---

## 1. Architecture Assessment

### Strengths

1. **Modular Design**: Clean separation into `agent/`, `tools/`, `safety/`, `cognitive/`, `orchestration/` modules
2. **Feature Flag System**: Well-designed feature gates (`tui`, `resilience`, `self-improvement`, etc.)
3. **PDVR Cycle**: Thoughtful cognitive architecture (Plan-Do-Verify-Reflect)
4. **Plugin Architecture**: Tool registry pattern allows easy extension
5. **Safety-First Design**: Multi-layer safety with path validation, command filtering, sandboxing

### Concerns

1. **Module Size Imbalance**: Some modules are very large:
   - `src/tools/knowledge.rs`: 1,537 lines
   - `src/ui/animations.rs`: 1,503 lines
   - `src/session/local_first.rs`: 2,527 lines
   - `src/ui/garden.rs`: 1,206 lines

2. **Deep Module Nesting**: Some paths are excessively deep (`src/orchestration/workflow_dsl/`)

3. **Circular Dependencies Risk**: `cognitive/` and `agent/` have tight coupling

---

## 2. Detailed Recommendations

### 2.1 Code Organization (High Priority)

#### R1.1: Split Oversized Modules

**Current Issue**: Several files exceed 1,500 lines, violating single responsibility principle.

**Recommendation**:
```
src/tools/knowledge.rs → src/tools/knowledge/{mod,graph,query,export}.rs
src/ui/animations.rs   → src/ui/animations/{mod,particles,transitions,effects}.rs
src/session/local_first.rs → src/session/local/{storage,sync,cache,conflict}.rs
```

**Benefit**: Improved maintainability, faster compile times, easier testing.

#### R1.2: Consolidate Configuration

**Current Issue**: Configuration is scattered across multiple structs in `config/mod.rs` (~1,000+ lines).

**Recommendation**: Split into focused submodules:
```
src/config/
├── mod.rs          # Core Config struct only
├── safety.rs       # SafetyConfig, path validation settings
├── agent.rs        # AgentConfig, cognitive settings  
├── resources.rs    # ResourcesConfig (already exists)
├── evolution.rs    # EvolutionTomlConfig
└── validation.rs   # Config validation logic
```

### 2.2 Safety & Security (Critical Priority)

#### R2.1: Implement Defense in Depth Audit

**Current State**: Good foundation but needs hardening.

**Recommendations**:
1. **Add Fuzz Testing**: Use `cargo-fuzz` for path validation and command parsing
2. **Sandbox Escape Testing**: Regular audits of Docker/container escapes
3. **Secret Scanning**: Expand `SecurityScanner` to detect more patterns (see below)

#### R2.2: Expand Secret Detection Patterns

**Current**: Basic secret scanning in `safety/scanner.rs`.

**Add Patterns**:
```rust
// In src/safety/scanner.rs
const ADDITIONAL_PATTERNS: &[(&str, &str)] = &[
    ("AWS_SESSION_TOKEN", r"(?i)aws_session_token[=:\s]+[a-z0-9/+=]{16,}"),
    ("GitHub_PAT", r"ghp_[a-zA-Z0-9]{36}"),
    ("Slack_Token", r"xox[baprs]-[0-9]{10,13}-[0-9]{10,13}[a-zA-Z0-9-]*"),
    ("Private_Key", r"-----BEGIN (RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----"),
];
```

#### R2.3: Implement Command Allowlist Mode

**Current**: Blacklist-based command filtering.

**Recommendation**: Add opt-in allowlist mode for high-security environments:
```toml
[safety]
mode = "allowlist"  # or "blacklist"
allowed_commands = ["cargo", "git", "rustc", "make"]
```

### 2.3 Error Handling (Medium Priority)

#### R3.1: Standardize Error Context

**Current Issue**: Inconsistent error wrapping patterns across modules.

**Recommendation**: Implement structured error contexts:
```rust
// New type for rich error context
#[derive(Debug)]
pub struct ToolExecutionError {
    pub tool_name: String,
    pub args: serde_json::Value,
    pub phase: ExecutionPhase,
    pub source: anyhow::Error,
    pub suggestions: Vec<String>,  // AI-generated recovery suggestions
}
```

#### R3.2: Add Error Recovery Codes

**Current**: Exit codes defined but not consistently used.

**Recommendation**: Document and enforce exit code usage:
```rust
// In src/errors.rs - already exists, needs enforcement
pub const EXIT_TOOL_NOT_FOUND: u8 = 7;
pub const EXIT_RATE_LIMITED: u8 = 8;
pub const EXIT_CONTEXT_EXHAUSTED: u8 = 9;
```

### 2.4 Testing Strategy (High Priority)

#### R4.1: Increase Integration Test Coverage

**Current**: ~5,200 tests, but integration tests rely heavily on external LLM.

**Recommendations**:
1. **Mock LLM Server**: Create deterministic mock for CI
2. **Contract Tests**: Expand contract testing for API boundaries
3. **Property-Based Tests**: Already have `prop_safety.rs`, `prop_tool_parser.rs` - expand to tools

#### R4.2: Add Performance Regression Tests

**New Recommendation**:
```rust
// tests/performance/token_budget.rs
#[test]
fn token_budget_enforcement() {
    // Ensure token budget is never exceeded by >10%
}

#[test]
fn context_compression_effectiveness() {
    // Verify compression reduces tokens by at least 30%
}
```

#### R4.3: Chaos Testing

**Recommendation**: Add chaos engineering tests:
```rust
// tests/chaos/agent_recovery.rs
#[tokio::test]
async fn agent_recovers_from_panic() {
    // Inject panics at various points, verify recovery
}
```

### 2.5 Documentation (Medium Priority)

#### R5.1: Architecture Decision Records (ADRs)

Create `docs/adr/` directory with decisions:
- ADR-001: Why PDVR instead of ReAct
- ADR-002: Feature flag decomposition strategy
- ADR-003: XML vs native function calling tradeoffs
- ADR-004: Why bincode over JSON for storage

#### R5.2: Inline Documentation Gaps

**Modules needing more docs**:
- `cognitive/memory_hierarchy.rs`: Complex indexing strategy needs explanation
- `orchestration/workflow_dsl/`: Parser implementation lacks examples
- `evolution/daemon.rs`: Tournament selection algorithm needs docs

### 2.6 Performance Optimizations (Medium Priority)

#### R6.1: Memory Usage Audit

**Current Issue**: Several `Vec` allocations that could be pooled.

**Recommendations**:
```rust
// In src/agent/mod.rs - use object pooling for messages
use object_pool::Pool;

static MESSAGE_POOL: Pool<Vec<Message>> = Pool::new(32, || Vec::with_capacity(64));
```

#### R6.2: Async Optimization

**Current Issue**: Some blocking operations in async context.

**Audit Points**:
- `file.read_to_string()` should use `tokio::fs`
- `serde_json::from_str` on large inputs could be streamed
- Git operations use `git2` (blocking) - consider `tokio::task::spawn_blocking`

#### R6.3: Compilation Time

**Current**: ~193k lines, compile times likely significant.

**Recommendations**:
1. Enable `lto = "thin"` in release profile
2. Use `cargo build --timings` to identify bottlenecks
3. Consider workspace split for evolution/tui features

### 2.7 API Design (Low Priority)

#### R7.1: Tool Schema Validation

**Current**: JSON schemas defined but not validated at compile time.

**Recommendation**: Use `schemars` for derive-based schema generation:
```rust
#[derive(JsonSchema)]
pub struct FileReadArgs {
    pub path: PathBuf,
    #[serde(default)]
    pub offset: usize,
}
```

#### R7.2: Streaming Improvements

**Current**: Good streaming foundation in `api/mod.rs`.

**Enhancement**: Add backpressure handling:
```rust
pub struct StreamingResponse {
    response: reqwest::Response,
    chunk_timeout: Duration,
    backpressure_threshold: usize,  // NEW
}
```

### 2.8 Observability (Medium Priority)

#### R8.1: Structured Logging

**Current**: Uses `tracing` but with inconsistent field naming.

**Standardize**:
```rust
// Always use these field names
span!(Level::INFO, "tool_execution", 
    tool.name = %name,
    tool.duration_ms = %duration.as_millis(),
    tool.success = success,
);
```

#### R8.2: Metrics Export

**Current**: Prometheus support exists but limited metrics.

**Add**:
- Tool execution latency histograms
- Token usage by model
- Safety check rejection rates
- Checkpoint save/load durations

---

## 3. Code Quality Issues Found

### Issue 1: Dead Code Warnings Suppressed Globally
**Location**: `src/lib.rs:3`
```rust
#![allow(dead_code, unused_imports, unused_variables)]
```
**Recommendation**: Remove this and fix individual warnings. Suppressing globally hides real issues.

### Issue 2: Panic Handling in Agent Loop
**Location**: `src/agent/loop_control.rs`
**Issue**: Potential for panics to kill the entire process.
**Fix**: Use `catch_unwind` in strategic locations with recovery logic.

### Issue 3: Potential Blocking in Async Context
**Location**: Multiple tool implementations
**Pattern**: `std::fs::read_to_string()` in async fn
**Fix**: Use `tokio::fs` or wrap in `spawn_blocking`.

### Issue 4: Hardcoded Timeouts
**Location**: `src/api/mod.rs:459`
```rust
let request_timeout = config.agent.step_timeout_secs.max(60);
```
**Issue**: Minimum 60s may still be too short for slow local models.
**Fix**: Make minimum configurable via env var.

### Issue 5: Clone-heavy Code
**Location**: `src/agent/mod.rs:116-120`
**Pattern**: Multiple clones in hot path
**Fix**: Use `Arc<str>` or `Arc<String>` for shared immutable data.

---

## 4. Security Recommendations

### S1: Implement Content Security Policy
For TUI mode, ensure no arbitrary code execution via terminal escape sequences.

### S2: Audit All `unsafe` Blocks
Search for `unsafe` usage and document each with SAFETY comments:
```bash
grep -r "unsafe" src/ --include="*.rs" | wc -l  # Should be 0 or minimal
```

### S3: Supply Chain Security
Add to CI:
```yaml
- name: Audit dependencies
  run: cargo audit
  
- name: Check for known vulnerabilities  
  run: cargo deny check advisories
```

### S4: Signed Releases
Current releases are unsigned. Add:
```bash
# Create signed release artifacts
codesign --sign "Developer ID" selfware
```

---

## 5. Feature-Specific Recommendations

### 5.1 Evolution Engine (self-improvement)

**Current**: Good foundation with safety invariants.

**Enhancements**:
1. Add mutation diversity tracking to prevent local optima
2. Implement "cooling schedule" for mutation rate
3. Add evolutionary history visualization
4. Implement A/B testing framework for prompt variants

### 5.2 TUI Dashboard (tui feature)

**Current**: Feature-rich with ratatui.

**Improvements**:
1. Add accessibility mode (screen reader support)
2. Implement proper colorblind-friendly palettes
3. Add mouse support for non-keyboard users
4. Performance: Use double-buffering for animations

### 5.3 Multi-Agent Swarm

**Current**: Basic coordination exists.

**Recommendations**:
1. Add conflict resolution protocols
2. Implement agent reputation scoring
3. Add swarm topology visualization
4. Create "swarm mode" presets (e.g., "code review", "refactoring")

---

## 6. CI/CD Improvements

### Current State (from `.github/workflows/`)
- Basic CI with cargo test
- Security audit job
- Release workflow with artifact upload

### Recommended Additions

```yaml
# Additional jobs to add to ci.yml

performance-regression:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Run benchmarks
      run: cargo bench -- --baseline main

fuzz-testing:
  runs-on: ubuntu-latest
  steps:
    - name: Install cargo-fuzz
      run: cargo install cargo-fuzz
    - name: Run fuzz targets for 10 minutes
      run: cargo fuzz run path_validation --max-total-time=600

docs-check:
  runs-on: ubuntu-latest
  steps:
    - name: Check docs build
      run: cargo doc --no-deps --all-features
    - name: Check for broken links
      run: cargo doc --no-deps 2>&1 | grep -i "warning.*link" || true
```

---

## 7. Documentation Checklist

### User Documentation
- [ ] Complete CLI reference with examples
- [ ] Troubleshooting guide
- [ ] Configuration cookbook (common patterns)
- [ ] Model-specific guides (Qwen, Kimi, LFM2)

### Developer Documentation  
- [x] Architecture overview (`docs/architecture.md`)
- [ ] Contributing guidelines (partial - `CONTRIBUTING.md` exists)
- [ ] Module-level documentation (some gaps)
- [ ] API documentation (rustdocs need review)

### Operations Documentation
- [ ] Deployment guide (Docker, k8s)
- [ ] Monitoring and alerting runbook
- [ ] Backup/recovery procedures for checkpoints

---

## 8. Priority Matrix

| Priority | Item | Effort | Impact |
|----------|------|--------|--------|
| P0 (Critical) | Fix dead_code suppression | Low | High |
| P0 | Add fuzz testing for safety | Medium | High |
| P1 (High) | Split oversized modules | Medium | Medium |
| P1 | Implement mock LLM for tests | Medium | High |
| P1 | Add chaos tests | Medium | High |
| P2 (Medium) | Performance audit | High | Medium |
| P2 | Documentation improvements | Medium | Medium |
| P3 (Low) | API schema validation | Low | Low |
| P3 | TUI accessibility | Medium | Medium |

---

## 9. Conclusion

Selfware is an impressive piece of engineering with a solid foundation. The codebase shows:

**Strengths**:
- Strong safety-first design philosophy
- Good modular architecture
- Comprehensive feature set
- Excellent test coverage (~5,200 tests)
- Active maintenance (recent changelog entries)

**Areas for Focus**:
1. **Code organization**: Split oversized modules
2. **Security hardening**: Fuzz testing, command allowlists
3. **Testing**: Mock LLM server, chaos tests
4. **Documentation**: ADRs, inline docs for complex algorithms

The project is production-ready for its intended use case (local LLM coding assistant), with the evolution engine being an experimental but promising feature that needs continued hardening.

**Recommended Next Steps**:
1. Address P0 items immediately
2. Create roadmap for module refactoring
3. Establish security audit cadence (quarterly)
4. Consider forming a security review board for evolution feature

---

*Review completed by: Claude Code CLI*  
*Total files reviewed: 200+ Rust source files*  
*Documentation reviewed: README, architecture.md, CHANGELOG, CONTRIBUTING, inline docs*