# Selfware Project - Comprehensive Review & Recommendations
**Review Date:** 2026-03-05
**Project:** Selfware - Agentic Coding Harness for Local LLMs
**Codebase Size:** ~193k lines of Rust code (184 source files, 34 test files)
**Version:** 0.1.0
---
## Executive Summary
Selfware is an ambitious, well-architected agentic coding framework with an impressive feature set: 54 tools, multi-agent swarm support, TUI dashboard, evolution engine for self-improvement, and a comprehensive safety system. The codebase demonstrates strong engineering practices with ~5,200 tests, feature-flag modularity, and good separation of concerns.
**Overall Grade: A-** (Excellent foundation with specific areas for improvement)
---
## 1. Architecture Assessment
### Strengths
1. **Modular Design**: Clean separation into `agent/`, `tools/`, `safety/`, `cognitive/`, `orchestration/` modules
2. **Feature Flag System**: Well-designed feature gates (`tui`, `resilience`, `self-improvement`, etc.)
3. **PDVR Cycle**: Thoughtful cognitive architecture (Plan-Do-Verify-Reflect)
4. **Plugin Architecture**: Tool registry pattern allows easy extension
5. **Safety-First Design**: Multi-layer safety with path validation, command filtering, sandboxing
### Concerns
1. **Module Size Imbalance**: Some modules are very large:
- `src/tools/knowledge.rs`: 1,537 lines
- `src/ui/animations.rs`: 1,503 lines
- `src/session/local_first.rs`: 2,527 lines
- `src/ui/garden.rs`: 1,206 lines
2. **Deep Module Nesting**: Some paths are excessively deep (`src/orchestration/workflow_dsl/`)
3. **Circular Dependencies Risk**: `cognitive/` and `agent/` have tight coupling
---
## 2. Detailed Recommendations
### 2.1 Code Organization (High Priority)
#### R1.1: Split Oversized Modules
**Current Issue**: Several files exceed 1,500 lines, violating single responsibility principle.
**Recommendation**:
```
src/tools/knowledge.rs → src/tools/knowledge/{mod,graph,query,export}.rs
src/ui/animations.rs → src/ui/animations/{mod,particles,transitions,effects}.rs
src/session/local_first.rs → src/session/local/{storage,sync,cache,conflict}.rs
```
**Benefit**: Improved maintainability, faster compile times, easier testing.
#### R1.2: Consolidate Configuration
**Current Issue**: Configuration is scattered across multiple structs in `config/mod.rs` (~1,000+ lines).
**Recommendation**: Split into focused submodules:
```
src/config/
├── mod.rs # Core Config struct only
├── safety.rs # SafetyConfig, path validation settings
├── agent.rs # AgentConfig, cognitive settings
├── resources.rs # ResourcesConfig (already exists)
├── evolution.rs # EvolutionTomlConfig
└── validation.rs # Config validation logic
```
### 2.2 Safety & Security (Critical Priority)
#### R2.1: Implement Defense in Depth Audit
**Current State**: Good foundation but needs hardening.
**Recommendations**:
1. **Add Fuzz Testing**: Use `cargo-fuzz` for path validation and command parsing
2. **Sandbox Escape Testing**: Regular audits of Docker/container escapes
3. **Secret Scanning**: Expand `SecurityScanner` to detect more patterns (see below)
#### R2.2: Expand Secret Detection Patterns
**Current**: Basic secret scanning in `safety/scanner.rs`.
**Add Patterns**:
```rust
// In src/safety/scanner.rs
const ADDITIONAL_PATTERNS: &[(&str, &str)] = &[
("AWS_SESSION_TOKEN", r"(?i)aws_session_token[=:\s]+[a-z0-9/+=]{16,}"),
("GitHub_PAT", r"ghp_[a-zA-Z0-9]{36}"),
("Slack_Token", r"xox[baprs]-[0-9]{10,13}-[0-9]{10,13}[a-zA-Z0-9-]*"),
("Private_Key", r"-----BEGIN (RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----"),
];
```
#### R2.3: Implement Command Allowlist Mode
**Current**: Blacklist-based command filtering.
**Recommendation**: Add opt-in allowlist mode for high-security environments:
```toml
[safety]
mode = "allowlist" # or "blacklist"
allowed_commands = ["cargo", "git", "rustc", "make"]
```
### 2.3 Error Handling (Medium Priority)
#### R3.1: Standardize Error Context
**Current Issue**: Inconsistent error wrapping patterns across modules.
**Recommendation**: Implement structured error contexts:
```rust
// New type for rich error context
#[derive(Debug)]
pub struct ToolExecutionError {
pub tool_name: String,
pub args: serde_json::Value,
pub phase: ExecutionPhase,
pub source: anyhow::Error,
pub suggestions: Vec<String>, // AI-generated recovery suggestions
}
```
#### R3.2: Add Error Recovery Codes
**Current**: Exit codes defined but not consistently used.
**Recommendation**: Document and enforce exit code usage:
```rust
// In src/errors.rs - already exists, needs enforcement
pub const EXIT_TOOL_NOT_FOUND: u8 = 7;
pub const EXIT_RATE_LIMITED: u8 = 8;
pub const EXIT_CONTEXT_EXHAUSTED: u8 = 9;
```
### 2.4 Testing Strategy (High Priority)
#### R4.1: Increase Integration Test Coverage
**Current**: ~5,200 tests, but integration tests rely heavily on external LLM.
**Recommendations**:
1. **Mock LLM Server**: Create deterministic mock for CI
2. **Contract Tests**: Expand contract testing for API boundaries
3. **Property-Based Tests**: Already have `prop_safety.rs`, `prop_tool_parser.rs` - expand to tools
#### R4.2: Add Performance Regression Tests
**New Recommendation**:
```rust
// tests/performance/token_budget.rs
#[test]
fn token_budget_enforcement() {
// Ensure token budget is never exceeded by >10%
}
#[test]
fn context_compression_effectiveness() {
// Verify compression reduces tokens by at least 30%
}
```
#### R4.3: Chaos Testing
**Recommendation**: Add chaos engineering tests:
```rust
// tests/chaos/agent_recovery.rs
#[tokio::test]
async fn agent_recovers_from_panic() {
// Inject panics at various points, verify recovery
}
```
### 2.5 Documentation (Medium Priority)
#### R5.1: Architecture Decision Records (ADRs)
Create `docs/adr/` directory with decisions:
- ADR-001: Why PDVR instead of ReAct
- ADR-002: Feature flag decomposition strategy
- ADR-003: XML vs native function calling tradeoffs
- ADR-004: Why bincode over JSON for storage
#### R5.2: Inline Documentation Gaps
**Modules needing more docs**:
- `cognitive/memory_hierarchy.rs`: Complex indexing strategy needs explanation
- `orchestration/workflow_dsl/`: Parser implementation lacks examples
- `evolution/daemon.rs`: Tournament selection algorithm needs docs
### 2.6 Performance Optimizations (Medium Priority)
#### R6.1: Memory Usage Audit
**Current Issue**: Several `Vec` allocations that could be pooled.
**Recommendations**:
```rust
// In src/agent/mod.rs - use object pooling for messages
use object_pool::Pool;
static MESSAGE_POOL: Pool<Vec<Message>> = Pool::new(32, || Vec::with_capacity(64));
```
#### R6.2: Async Optimization
**Current Issue**: Some blocking operations in async context.
**Audit Points**:
- `file.read_to_string()` should use `tokio::fs`
- `serde_json::from_str` on large inputs could be streamed
- Git operations use `git2` (blocking) - consider `tokio::task::spawn_blocking`
#### R6.3: Compilation Time
**Current**: ~193k lines, compile times likely significant.
**Recommendations**:
1. Enable `lto = "thin"` in release profile
2. Use `cargo build --timings` to identify bottlenecks
3. Consider workspace split for evolution/tui features
### 2.7 API Design (Low Priority)
#### R7.1: Tool Schema Validation
**Current**: JSON schemas defined but not validated at compile time.
**Recommendation**: Use `schemars` for derive-based schema generation:
```rust
#[derive(JsonSchema)]
pub struct FileReadArgs {
pub path: PathBuf,
#[serde(default)]
pub offset: usize,
}
```
#### R7.2: Streaming Improvements
**Current**: Good streaming foundation in `api/mod.rs`.
**Enhancement**: Add backpressure handling:
```rust
pub struct StreamingResponse {
response: reqwest::Response,
chunk_timeout: Duration,
backpressure_threshold: usize, // NEW
}
```
### 2.8 Observability (Medium Priority)
#### R8.1: Structured Logging
**Current**: Uses `tracing` but with inconsistent field naming.
**Standardize**:
```rust
// Always use these field names
span!(Level::INFO, "tool_execution",
tool.name = %name,
tool.duration_ms = %duration.as_millis(),
tool.success = success,
);
```
#### R8.2: Metrics Export
**Current**: Prometheus support exists but limited metrics.
**Add**:
- Tool execution latency histograms
- Token usage by model
- Safety check rejection rates
- Checkpoint save/load durations
---
## 3. Code Quality Issues Found
### Issue 1: Dead Code Warnings Suppressed Globally
**Location**: `src/lib.rs:3`
```rust
#![allow(dead_code, unused_imports, unused_variables)]
```
**Recommendation**: Remove this and fix individual warnings. Suppressing globally hides real issues.
### Issue 2: Panic Handling in Agent Loop
**Location**: `src/agent/loop_control.rs`
**Issue**: Potential for panics to kill the entire process.
**Fix**: Use `catch_unwind` in strategic locations with recovery logic.
### Issue 3: Potential Blocking in Async Context
**Location**: Multiple tool implementations
**Pattern**: `std::fs::read_to_string()` in async fn
**Fix**: Use `tokio::fs` or wrap in `spawn_blocking`.
### Issue 4: Hardcoded Timeouts
**Location**: `src/api/mod.rs:459`
```rust
let request_timeout = config.agent.step_timeout_secs.max(60);
```
**Issue**: Minimum 60s may still be too short for slow local models.
**Fix**: Make minimum configurable via env var.
### Issue 5: Clone-heavy Code
**Location**: `src/agent/mod.rs:116-120`
**Pattern**: Multiple clones in hot path
**Fix**: Use `Arc<str>` or `Arc<String>` for shared immutable data.
---
## 4. Security Recommendations
### S1: Implement Content Security Policy
For TUI mode, ensure no arbitrary code execution via terminal escape sequences.
### S2: Audit All `unsafe` Blocks
Search for `unsafe` usage and document each with SAFETY comments:
```bash
grep -r "unsafe" src/ --include="*.rs" | wc -l # Should be 0 or minimal
```
### S3: Supply Chain Security
Add to CI:
```yaml
- name: Audit dependencies
run: cargo audit
- name: Check for known vulnerabilities
run: cargo deny check advisories
```
### S4: Signed Releases
Current releases are unsigned. Add:
```bash
# Create signed release artifacts
codesign --sign "Developer ID" selfware
```
---
## 5. Feature-Specific Recommendations
### 5.1 Evolution Engine (self-improvement)
**Current**: Good foundation with safety invariants.
**Enhancements**:
1. Add mutation diversity tracking to prevent local optima
2. Implement "cooling schedule" for mutation rate
3. Add evolutionary history visualization
4. Implement A/B testing framework for prompt variants
### 5.2 TUI Dashboard (tui feature)
**Current**: Feature-rich with ratatui.
**Improvements**:
1. Add accessibility mode (screen reader support)
2. Implement proper colorblind-friendly palettes
3. Add mouse support for non-keyboard users
4. Performance: Use double-buffering for animations
### 5.3 Multi-Agent Swarm
**Current**: Basic coordination exists.
**Recommendations**:
1. Add conflict resolution protocols
2. Implement agent reputation scoring
3. Add swarm topology visualization
4. Create "swarm mode" presets (e.g., "code review", "refactoring")
---
## 6. CI/CD Improvements
### Current State (from `.github/workflows/`)
- Basic CI with cargo test
- Security audit job
- Release workflow with artifact upload
### Recommended Additions
```yaml
# Additional jobs to add to ci.yml
performance-regression:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Run benchmarks
run: cargo bench -- --baseline main
fuzz-testing:
runs-on: ubuntu-latest
steps:
- name: Install cargo-fuzz
run: cargo install cargo-fuzz
- name: Run fuzz targets for 10 minutes
run: cargo fuzz run path_validation --max-total-time=600
docs-check:
runs-on: ubuntu-latest
steps:
- name: Check docs build
run: cargo doc --no-deps --all-features
- name: Check for broken links
run: cargo doc --no-deps 2>&1 | grep -i "warning.*link" || true
```
---
## 7. Documentation Checklist
### User Documentation
- [ ] Complete CLI reference with examples
- [ ] Troubleshooting guide
- [ ] Configuration cookbook (common patterns)
- [ ] Model-specific guides (Qwen, Kimi, LFM2)
### Developer Documentation
- [x] Architecture overview (`docs/architecture.md`)
- [ ] Contributing guidelines (partial - `CONTRIBUTING.md` exists)
- [ ] Module-level documentation (some gaps)
- [ ] API documentation (rustdocs need review)
### Operations Documentation
- [ ] Deployment guide (Docker, k8s)
- [ ] Monitoring and alerting runbook
- [ ] Backup/recovery procedures for checkpoints
---
## 8. Priority Matrix
| P0 (Critical) | Fix dead_code suppression | Low | High |
| P0 | Add fuzz testing for safety | Medium | High |
| P1 (High) | Split oversized modules | Medium | Medium |
| P1 | Implement mock LLM for tests | Medium | High |
| P1 | Add chaos tests | Medium | High |
| P2 (Medium) | Performance audit | High | Medium |
| P2 | Documentation improvements | Medium | Medium |
| P3 (Low) | API schema validation | Low | Low |
| P3 | TUI accessibility | Medium | Medium |
---
## 9. Conclusion
Selfware is an impressive piece of engineering with a solid foundation. The codebase shows:
**Strengths**:
- Strong safety-first design philosophy
- Good modular architecture
- Comprehensive feature set
- Excellent test coverage (~5,200 tests)
- Active maintenance (recent changelog entries)
**Areas for Focus**:
1. **Code organization**: Split oversized modules
2. **Security hardening**: Fuzz testing, command allowlists
3. **Testing**: Mock LLM server, chaos tests
4. **Documentation**: ADRs, inline docs for complex algorithms
The project is production-ready for its intended use case (local LLM coding assistant), with the evolution engine being an experimental but promising feature that needs continued hardening.
**Recommended Next Steps**:
1. Address P0 items immediately
2. Create roadmap for module refactoring
3. Establish security audit cadence (quarterly)
4. Consider forming a security review board for evolution feature
---
*Review completed by: Claude Code CLI*
*Total files reviewed: 200+ Rust source files*
*Documentation reviewed: README, architecture.md, CHANGELOG, CONTRIBUTING, inline docs*