# Selfware

📦 ─────────────────────────────────────────────────────────────

**Your Personal AI Workshop**

*Software you own. Software that knows you. Software that lasts.*

────────────────────────────────────────────────────────────────
An artisanal agentic harness for local LLMs (Kimi K2.5, Qwen, etc.) that enables autonomous coding with safety guardrails, task persistence, and a warm terminal aesthetic.
## Philosophy

This is selfware: software crafted for your local workshop, not rented from the cloud. Like a well-worn tool that fits your hand perfectly:

- Runs locally on your hardware, your data stays yours
- Remembers your patterns across sessions
- Grows with your garden: your codebase is a living thing
## Installation

### Option 1: Download Prebuilt Binary (Recommended)
Download the latest release for your platform:
| Platform | Architecture | Download |
|---|---|---|
| Linux | x86_64 (Intel/AMD) | selfware-linux-x86_64.tar.gz |
| Linux | aarch64 (ARM64) | selfware-linux-aarch64.tar.gz |
| macOS | Apple Silicon (M1/M2/M3) | selfware-macos-aarch64.tar.gz |
| macOS | Intel | selfware-macos-x86_64.tar.gz |
| Windows | x86_64 | selfware-windows-x86_64.zip |
```sh
# Linux/macOS quick install (release URL is a placeholder; substitute your download source)
# Translates platform names: Darwin->macos, arm64->aarch64
OS=$(uname -s | tr '[:upper:]' '[:lower:]' | sed 's/darwin/macos/')
ARCH=$(uname -m | sed 's/arm64/aarch64/')
curl -fsSL "<release-url>/selfware-${OS}-${ARCH}.tar.gz" | tar xz

# Verify installation (flag illustrative)
./selfware --version
```
### Option 2: Install via Cargo

### Option 3: Build from Source
### Option 4: Docker

```sh
# Build the image
docker build -t selfware .

# Run interactively
docker run -it --rm selfware

# Run a specific task (subcommand illustrative)
docker run --rm selfware task "run the test suite"
```
## Quick Start

### 1. Set Up Your LLM Backend
Selfware works with any OpenAI-compatible API. Popular options:
| Backend | Best For | Setup |
|---|---|---|
| vLLM | Fast inference, production | vllm serve Qwen/Qwen3-Coder-Next-FP8 |
| Ollama | Easy setup, consumer hardware | ollama run qwen2.5-coder |
| llama.cpp | Minimal dependencies | ./server -m model.gguf |
| LM Studio | GUI, Windows/Mac | Download and run |
### 2. Create Configuration

Create `selfware.toml` in your project directory:
```toml
# Your local workshop
# NOTE: section and key names below are reconstructed from the rest of this
# README (endpoint, model, max_tokens, temperature, allowed_paths, timeout,
# and token_budget are referenced elsewhere); treat the remainder as illustrative.
endpoint = "http://localhost:8000/v1"   # Your LLM backend
model = "Qwen/Qwen3-Coder-Next-FP8"     # Model name
max_tokens = 65536
temperature = 0.7

[safety]
allowed_paths = ["./**", "/home/*/projects/**"]
blocked_patterns = ["**/.env", "**/secrets/**"]
protected_branches = ["main"]

[agent]
max_iterations = 100
timeout = 600          # 10 min for fast models
token_budget = 500000

[checkpoint]
enabled = true
interval_steps = 10
max_age_secs = 300
auto_resume = true
max_retained = 3

[retry]
max_attempts = 5
base_delay_ms = 1000
max_delay_ms = 60000
```
### 3. Start Coding
```sh
# Subcommand names below are illustrative
# Interactive chat mode
selfware chat

# Run a specific task
selfware task "add error handling to src/api.rs"

# Multi-agent collaboration (16 concurrent agents)
selfware chat --agents 16

# Analyze your codebase
selfware analyze
```
## The Digital Garden
Your codebase is visualized as a digital garden:
```
╭─ 🌱 Your Digital Garden ─────────────────────────────────────────╮
│                                                                  │
│  src/ ████████████████░░░░ 82% healthy                           │
│    🌳 mod.rs    [THRIVING]  last tended 2h ago                   │
│    🌿 agent.rs  [GROWING]   needs water                          │
│    🌱 tools.rs  [SEEDLING]  freshly planted                      │
│                                                                  │
│  Season: WINTER ❄️   Growth rate: steady                         │
╰──────────────────────────────────────────────────────────────────╯
```
Files are plants, directories are beds, and your tools are craftsman implements.
## Features
- 54 Built-in Tools: File tending, git cultivation, cargo crafting, code foraging
- Multi-Agent Swarm: Up to 16 concurrent agents with role specialization
- Multi-layer Safety: Path guardians, command sentinels, protected groves
- Task Persistence: Checkpoint seeds survive frost (crashes)
- Self-Healing Recovery: Error classification, exponential backoff with jitter, automatic escalation
- Cognitive Architecture: PDVR cycle with working memory
- Selfware UI: Warm amber tones, animated spinners, ASCII art banners
- Multi-Model Support: Works with Qwen3-Coder, Kimi K2.5, DeepSeek, and other local LLMs
- Robust Tool Parser: Handles multiple XML formats from different models
- SAB Benchmark Suite: 12-scenario agentic benchmark with BLOOM/GROW/WILT/FROST scoring
- 4-Hour Patience: Tolerant of slow local models (0.1 tok/s supported)
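The self-healing retry loop named above is, at its core, exponential backoff with jitter. A minimal deterministic sketch (constants and names are illustrative, not selfware's actual internals):

```rust
use std::time::Duration;

// Exponential backoff with "full jitter": the ceiling doubles each attempt up
// to a cap, then a pseudo-random fraction of it is used so that many retrying
// agents don't all wake at once.
fn backoff_delay(attempt: u32, seed: u64) -> Duration {
    let base_ms: u64 = 1_000;  // 1s ceiling on the first retry (illustrative)
    let max_ms: u64 = 60_000;  // never wait more than 60s
    let exp_ms = base_ms.saturating_mul(1u64 << attempt.min(16)).min(max_ms);
    // A tiny deterministic LCG stands in for a real RNG here.
    let rand = seed
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    Duration::from_millis(rand % (exp_ms + 1))
}

fn main() {
    for attempt in 0..5 {
        println!("attempt {attempt}: wait up to {:?}", backoff_delay(attempt, attempt as u64));
    }
}
```

The jittered delay is always bounded by the exponential ceiling, so escalation stays predictable even with randomness in the mix.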
## Environment Variables
Configure Selfware via environment variables (override config file):
| Variable | Description | Default |
|---|---|---|
| `SELFWARE_ENDPOINT` | LLM API endpoint | `http://localhost:8000/v1` |
| `SELFWARE_MODEL` | Model name | `Qwen/Qwen3-Coder-Next-FP8` |
| `SELFWARE_API_KEY` | API key (if required) | None |
| `SELFWARE_MAX_TOKENS` | Max tokens per response | `65536` |
| `SELFWARE_TEMPERATURE` | Sampling temperature | `0.7` |
| `SELFWARE_TIMEOUT` | Request timeout (seconds) | `600` |
| `SELFWARE_DEBUG` | Enable debug logging | Disabled |
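For example, overrides can be set for a single session before launching the agent (the variable names come from the table above; the values are illustrative):

```shell
# Point selfware at a slower backend for one session (values illustrative)
export SELFWARE_ENDPOINT="http://localhost:11434/v1"
export SELFWARE_MODEL="qwen2.5-coder"
export SELFWARE_TIMEOUT=3600   # allow up to an hour per request
```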
## The Selfware Palette
The UI uses warm, organic colors inspired by aged paper, wood grain, and amber resin:
| Color | Hex | Use |
|---|---|---|
| 🟠 Amber | `#D4A373` | Primary actions, warmth |
| 🟢 Garden Green | `#606C38` | Growth, success, health |
| 🟤 Soil Brown | `#BC6C25` | Warnings, needs attention |
| ⬛ Ink | `#283618` | Deep text, emphasis |
| 🟡 Parchment | `#FEFAE0` | Light backgrounds |
### Status Messages

Instead of cold red/green/yellow:

- BLOOM 🌸: Success, fresh growth
- WILT 🥀: Warning, needs attention
- FROST ❄️: Error, needs warmth
## Tools Reference

### Garden Tending (Files)
| Tool | Metaphor | Description |
|---|---|---|
| `file_read` | 🔍 Examine | Read file contents |
| `file_write` | ✏️ Inscribe | Create or overwrite |
| `file_edit` | 🔧 Mend | Search and replace |
| `directory_tree` | 🗺️ Survey | List structure |
### Cultivation (Git)

| Tool | Metaphor | Description |
|---|---|---|
| `git_status` | 📋 Assess | Working tree status |
| `git_diff` | 🔬 Compare | Show changes |
| `git_commit` | 📦 Preserve | Create a commit |
| `git_checkpoint` | 🏷️ Mark | Create checkpoint |
### Workshop (Cargo)

| Tool | Metaphor | Description |
|---|---|---|
| `cargo_test` | 🧪 Verify | Run tests |
| `cargo_check` | ✅ Validate | Type check |
| `cargo_clippy` | 🧹 Polish | Run lints |
| `cargo_fmt` | 📏 Align | Format code |
### Foraging (Search)

| Tool | Metaphor | Description |
|---|---|---|
| `grep_search` | 🔎 Hunt | Regex search |
| `glob_find` | 🧭 Locate | Find by pattern |
| `symbol_search` | 📍 Pinpoint | Find definitions |
## Slow Model Support

Designed for local LLMs running on consumer hardware:

| Model Speed | Timeout Setting |
|---|---|
| > 10 tok/s | 300s (5 min) |
| 1–10 tok/s | 3600s (1 hour) |
| < 1 tok/s | 14400s (4 hours) |
| 0.08 tok/s | Works! Be patient. |
The agent will wait. Good things take time.
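For a sub-1 tok/s model, the matching `selfware.toml` entry might look like this (the `timeout` key mirrors the `SELFWARE_TIMEOUT` environment variable; the exact placement in the file is illustrative):

```toml
timeout = 14400  # 4 hours: patience for < 1 tok/s models
```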
## Task Persistence

Tasks are automatically checkpointed; your work survives crashes:
```sh
# Start a long task (subcommands illustrative)
selfware task "port the parser to a streaming design"

# Power outage? System crash? No problem.
# Resume exactly where you left off
selfware resume
```
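Stripped to its essence, checkpointing is a serialize-to-disk/restore loop. A minimal Rust sketch (illustrative only; selfware's real checkpoint format and types will differ):

```rust
use std::fs;

// A tiny stand-in for agent task state (illustrative).
#[derive(Debug, PartialEq)]
struct Checkpoint {
    task: String,
    step: u32,
}

impl Checkpoint {
    // Persist state so a crash cannot lose progress.
    fn save(&self, path: &std::path::Path) -> std::io::Result<()> {
        fs::write(path, format!("{}\n{}", self.task, self.step))
    }

    // Restore exactly where we left off.
    fn load(path: &std::path::Path) -> std::io::Result<Checkpoint> {
        let text = fs::read_to_string(path)?;
        let mut lines = text.lines();
        let task = lines.next().unwrap_or_default().to_string();
        let step = lines.next().unwrap_or("0").parse().unwrap_or(0);
        Ok(Checkpoint { task, step })
    }
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("selfware_demo.ckpt");
    let before = Checkpoint { task: "refactor parser".into(), step: 7 };
    before.save(&path)?;
    let after = Checkpoint::load(&path)?;
    assert_eq!(before, after); // state survives a restart
    println!("resumed at step {}", after.step);
    Ok(())
}
```

The point is the round trip: anything written before the frost can be read back after it.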
## Cognitive Architecture

The agent thinks in cycles:

```
╭──────────╮        ╭──────────╮
│   PLAN   │───────▶│    DO    │
╰──────────╯        ╰──────────╯
     ▲                   │
     │                   ▼
╭──────────╮        ╭──────────╮
│ REFLECT  │◀───────│  VERIFY  │
╰──────────╯        ╰──────────╯
```
**Working Memory** tracks:
- Current plan and progress
- Active hypothesis
- Open questions
- Discovered facts
**Episodic Memory** learns:
- What approaches worked
- Your preferences
- Project patterns
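The cycle in the diagram can be sketched as a simple state machine (illustrative; the real PDVR loop lives in `src/cognitive/`):

```rust
// The four PDVR phases from the diagram above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Phase {
    Plan,
    Do,
    Verify,
    Reflect,
}

impl Phase {
    // Advance one step around the cycle: PLAN -> DO -> VERIFY -> REFLECT -> PLAN.
    fn next(self) -> Phase {
        match self {
            Phase::Plan => Phase::Do,
            Phase::Do => Phase::Verify,
            Phase::Verify => Phase::Reflect,
            Phase::Reflect => Phase::Plan,
        }
    }
}

fn main() {
    let mut phase = Phase::Plan;
    for _ in 0..4 {
        print!("{:?} -> ", phase);
        phase = phase.next();
    }
    println!("{:?}", phase); // back to Plan after a full cycle
}
```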
## Development

### Run Tests
```sh
# All tests (~3,980 tests, ~2 min)
cargo test

# With resilience features (self-healing, recovery); feature name illustrative
cargo test --features self-healing

# Integration tests with real LLM; target name illustrative
cargo test --test integration

# Specific test modules
cargo test tool_parser
```
### Test Coverage
| Metric | Value |
|---|---|
| Total Tests | ~3,980 |
| Test Targets | lib (3,615) + unit (238) + e2e (21) + integration (5) + property (100) + doc (1) |
## SAB – Selfware Agentic Benchmark

A 12-scenario agentic coding benchmark that measures how well a local LLM can autonomously fix bugs, write tests, refactor code, and optimize performance, all through selfware's agent loop.
```sh
# Run all 12 scenarios (requires OpenAI-compatible endpoint)
# (runner invocation illustrative; reports land in system_tests/projecte2e/reports/)
ENDPOINT="http://localhost:8000/v1" MODEL="your-model" \
  cargo run --release --bin sab
```
### Scenarios
| Difficulty | Scenario | What It Tests |
|---|---|---|
| Easy | `easy_calculator` | Simple arithmetic bug fixes (3-4 bugs) |
| Easy | `easy_string_ops` | String manipulation bugs |
| Medium | `medium_json_merge` | JSON deep merge logic |
| Medium | `medium_bitset` | Bitwise operations and edge cases |
| Medium | `testgen_ringbuf` | Write 15+ tests for an untested ring buffer |
| Medium | `refactor_monolith` | Split a 210-line monolith into 4 modules |
| Hard | `hard_scheduler` | Multi-file scheduler with duration parsing |
| Hard | `hard_event_bus` | Event system with async subscribers |
| Hard | `security_audit` | Replace 5 vulnerable functions with secure alternatives |
| Hard | `perf_optimization` | Fix 5 O(n²)/exponential algorithms |
| Hard | `codegen_task_runner` | Implement 12 `todo!()` method stubs |
| Expert | `expert_async_race` | Fix 4 concurrency bugs in a Tokio task pool |
### Scoring

Each scenario scores 0–100:

- 70 pts: all tests pass after agent edits
- 20 pts: agent also fixes intentionally broken tests
- 10 pts: clean exit (no crash, no timeout)

Round ratings: BLOOM (≥85) · GROW (≥60) · WILT (≥30) · FROST (<30)
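The rubric above translates directly into a scoring function. A sketch of the published rubric (not selfware's actual benchmark code):

```rust
// Score one scenario per the SAB rubric: 70 + 20 + 10 points.
fn scenario_score(tests_pass: bool, broken_tests_fixed: bool, clean_exit: bool) -> u32 {
    (tests_pass as u32) * 70 + (broken_tests_fixed as u32) * 20 + (clean_exit as u32) * 10
}

// Map a round average onto the garden ratings.
fn round_rating(avg: u32) -> &'static str {
    if avg >= 85 {
        "BLOOM"
    } else if avg >= 60 {
        "GROW"
    } else if avg >= 30 {
        "WILT"
    } else {
        "FROST"
    }
}

fn main() {
    println!("{} -> {}", scenario_score(true, true, true), round_rating(100));
    println!("{} -> {}", scenario_score(true, false, true), round_rating(80));
}
```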
### Benchmark Results – Qwen3-Coder-Next-FP8 (1M context)
Tested on NVIDIA H100 via vLLM, 6 parallel scenarios, 27 rounds (323 scenario runs):
| Metric | Value |
|---|---|
| Steady-state average (R2–R27) | 90/100 |
| Peak phase (R9–R27) | 91/100 |
| Best round | 96/100 (achieved 8 times) |
| Perfect rounds (12/12 pass) | 16 out of 27 |
| BLOOM rounds (≥85) | 22 out of 27 |
| S-tier scenarios (100% reliable) | 5 of 12 |
| Round | Score | Rating | Passed |
|---|---|---|---|
| R1 | 60/100 | GROW | 7/11 |
| R2 | 96/100 | BLOOM | 12/12 |
| R3 | 70/100 | GROW | 9/12 |
| R4 | 87/100 | BLOOM | 11/12 |
| R5 | 79/100 | GROW | 10/12 |
| R6 | 81/100 | GROW | 10/12 |
| R7 | 87/100 | BLOOM | 11/12 |
| R8 | 89/100 | BLOOM | 11/12 |
| R9 | 95/100 | BLOOM | 12/12 |
| R10 | 95/100 | BLOOM | 12/12 |
| R11 | 96/100 | BLOOM | 12/12 |
| R12 | 87/100 | BLOOM | 11/12 |
| R13 | 96/100 | BLOOM | 12/12 |
| R14 | 88/100 | BLOOM | 11/12 |
| R15 | 95/100 | BLOOM | 12/12 |
| R16 | 95/100 | BLOOM | 12/12 |
| R17 | 95/100 | BLOOM | 12/12 |
| R18 | 96/100 | BLOOM | 12/12 |
| R19 | 96/100 | BLOOM | 12/12 |
| R20 | 96/100 | BLOOM | 12/12 |
| R21 | 89/100 | BLOOM | 11/12 |
| R22 | 87/100 | BLOOM | 11/12 |
| R23 | 96/100 | BLOOM | 12/12 |
| R24 | 87/100 | BLOOM | 11/12 |
| R25 | 90/100 | BLOOM | 11/12 |
| R26 | 95/100 | BLOOM | 12/12 |
| R27 | 73/100 | GROW | 9/12 |
### Scenario Reliability

| Tier | Scenarios | Pass Rate |
|---|---|---|
| S (100%) | `easy_calculator`, `easy_string_ops`, `medium_json_merge`, `perf_optimization`, `codegen_task_runner` | 100% |
| A (>80%) | `hard_scheduler`, `hard_event_bus`, `expert_async_race`, `medium_bitset` | 89–96% |
| B (50–80%) | `security_audit`, `testgen_ringbuf`, `refactor_monolith` | 70–74% |
### Running Your Own Benchmark
```sh
# Environment variables (values illustrative)
export ENDPOINT="http://localhost:8000/v1"  # Your LLM endpoint
export MODEL="your-model"                   # Model name
export PARALLEL=6                           # Concurrent scenarios (6 recommended)

# Single round (runner invocation illustrative)
cargo run --release --bin sab

# Results appear in system_tests/projecte2e/reports/<timestamp>/
```
## Recommended Models by Hardware
SAB is designed to benchmark any local LLM. Here are tested and recommended configurations:
### GPU Servers (vLLM / llama.cpp)

| Model | Quant | Weights | Min VRAM | Recommended GPU | Context | Notes |
|---|---|---|---|---|---|---|
| Qwen3-Coder-Next-FP8 | FP8 | ~70 GB | 80 GB | H100 / A100 80GB | 1M | Reference model, 90/100 SAB (27 rounds) |
| Qwen3.5-Coder 35B A3B | Q4_K_M | ~22 GB | 24–32 GB | RTX 5090 (32 GB) | 32–128K | MoE, fast inference, best bang/buck |
| LFM2 24B A2B | 4-bit | ~13.4 GB | 16–24 GB | RTX 4090 / 3090 (24 GB) | 32–64K | Efficient MoE for rapid iteration |
| LFM2.5 1.2B Instruct | Q8 | ~1.25 GB | 2 GB | Any GPU | 8–16K | Ultra-light, quick prototyping |
### Apple Silicon (MLX / llama.cpp / Ollama)

Mac models use unified memory; your available RAM determines what you can run:
| RAM | Recommended Model | Quant | Context | Use Case |
|---|---|---|---|---|
| 96–128 GB | Qwen3-Coder 32B | Q8 | 64–128K | Full SAB, production coding |
| 64 GB | Qwen3.5 35B A3B | Q4_K_M (~22 GB) | 32–64K | Most scenarios, good context |
| 32 GB | LFM2 24B A2B | 4-bit (~13.4 GB) | 16–32K | Everyday coding tasks |
| 24 GB | LFM2 24B A2B | 4-bit (~13.4 GB) | 8–16K | Moderate context, tight fit |
| 16 GB | LFM2.5 1.2B Instruct | Q8 (~1.25 GB) | 8–16K | Lightweight, fast feedback |
**Context window matters.** SAB scenarios work best with ≥32K context. Smaller windows may cause FROST on complex scenarios (hard/expert). Adjust `max_tokens` and `token_budget` in `selfware.toml` to match your model's context.
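For example, a 32K-context model might be configured conservatively (values illustrative; `max_tokens` and `token_budget` are the keys named above):

```toml
max_tokens = 8192       # leave headroom inside a 32K window
token_budget = 200000   # total tokens the agent may spend on one task
```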
### Quick Setup Examples
```sh
# Model filenames and ids below are illustrative
# RTX 5090 with Qwen3.5 35B (llama.cpp)
./llama-server -m qwen3.5-coder-35b-a3b-q4_k_m.gguf -c 65536 --port 8000

# RTX 4090 with LFM2 24B (vLLM)
vllm serve LiquidAI/LFM2-24B-A2B --max-model-len 32768

# Mac M2 Max 64GB with MLX
mlx_lm.server --model <mlx-model> --port 8000

# Ultra-light (any machine)
ollama run qwen2.5-coder
```
Project Structure
src/
โโโ agent/ # Core agent logic, checkpointing, execution
โโโ tools/ # 54 tool implementations
โโโ api/ # LLM client with timeout and retry
โโโ ui/ # Selfware aesthetic (style, animations, banners)
โโโ analysis/ # Code analysis, BM25 search, vector store
โโโ cognitive/ # PDVR cycle, working/episodic memory
โโโ config/ # Configuration management
โโโ devops/ # Container support, process manager
โโโ observability/ # Telemetry and tracing
โโโ orchestration/ # Multi-agent swarm, planning, workflows
โโโ safety/ # Path validation, sandboxing, threat modeling
โโโ self_healing/ # Error classification, recovery executor, backoff
โโโ session/ # Checkpoint persistence
โโโ testing/ # Verification, contract testing, workflow DSL
โโโ memory.rs # Memory management
โโโ tool_parser.rs # Robust multi-format XML parser
โโโ token_count.rs # Token estimation
## Multi-Agent System
The agent supports up to 16 concurrent specialists:
```sh
# Launch multi-agent chat (flag illustrative)
selfware chat --agents 16

# Roles: Architect, Coder, Tester, Reviewer, DevOps, Security
```
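Conceptually, the swarm is a fan-out/join over role-specialized workers. A toy sketch with plain threads (illustrative only; the real orchestration lives in `src/orchestration/`):

```rust
use std::thread;

// Spawn one worker per agent slot, cycling through the roles, and collect reports.
fn run_swarm(agents: usize) -> Vec<String> {
    let roles = ["Architect", "Coder", "Tester", "Reviewer", "DevOps", "Security"];
    let handles: Vec<_> = (0..agents)
        .map(|i| {
            let role = roles[i % roles.len()].to_string();
            thread::spawn(move || format!("agent {i} ({role}): done"))
        })
        .collect();
    // Join every worker; a real swarm would stream results as they arrive.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let reports = run_swarm(16); // up to 16 concurrent agents
    assert_eq!(reports.len(), 16);
    println!("{}", reports[0]);
}
```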
## Troubleshooting

**"Connection refused"**

```sh
# Is your LLM backend running?
curl http://localhost:8000/v1/models
```

**"Request timeout"**

```toml
# Increase timeout for slow models
# In selfware.toml:
timeout = 3600
```

**"Safety check failed"**

Check `allowed_paths` in your config; the agent only accesses paths you permit.
## License

MIT License

## Acknowledgments
- Built for Qwen3-Coder, Kimi K2.5, LFM2, and other local LLMs
- Inspired by the AiSocratic movement
- UI philosophy: software should feel like a warm workshop, not a cold datacenter
*"Tend your garden. The code will grow."*
*– selfware proverb*