
Selfware


    🦊 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
       Your Personal AI Workshop
       Software you own. Software that knows you. Software that lasts.
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

An artisanal agentic harness for local LLMs (Kimi K2.5, Qwen, etc.) that enables autonomous coding with safety guardrails, task persistence, and a warm terminal aesthetic.

Philosophy

This is selfware: software crafted for your local workshop, not rented from the cloud. Like a well-worn tool that fits your hand perfectly, it:

  • Runs locally on your hardware; your data stays yours
  • Remembers your patterns across sessions
  • Grows with your garden: your codebase is a living thing

Installation

Option 1: Download Prebuilt Binary (Recommended)

Download the latest release for your platform:

Platform  Architecture              Download
──────────────────────────────────────────────────────────────
Linux     x86_64 (Intel/AMD)        selfware-linux-x86_64.tar.gz
Linux     aarch64 (ARM64)           selfware-linux-aarch64.tar.gz
macOS     Apple Silicon (M1/M2/M3)  selfware-macos-aarch64.tar.gz
macOS     Intel                     selfware-macos-x86_64.tar.gz
Windows   x86_64                    selfware-windows-x86_64.zip

# Linux/macOS quick install
# Translates platform names: Darwin->macos, arm64->aarch64
OS=$(uname -s | tr '[:upper:]' '[:lower:]' | sed 's/darwin/macos/')
ARCH=$(uname -m | sed 's/arm64/aarch64/')
curl -fsSL "https://github.com/architehc/selfware/releases/latest/download/selfware-${OS}-${ARCH}.tar.gz" | tar -xz
sudo mv selfware /usr/local/bin/

# Verify installation
selfware --help

Option 2: Install via Cargo

cargo install selfware

Option 3: Build from Source

git clone https://github.com/architehc/selfware.git
cd selfware
cargo build --release
./target/release/selfware --help

Option 4: Docker

# Build the image
docker build -t selfware .

# Run interactively
docker run --rm -it -v "$(pwd)":/workspace selfware chat

# Run a specific task
docker run --rm -it -v "$(pwd)":/workspace selfware run "Add unit tests"

Quick Start

1. Set Up Your LLM Backend

Selfware works with any OpenAI-compatible API. Popular options:

Backend    Best For                       Setup
────────────────────────────────────────────────────────────────────────
vLLM       Fast inference, production     vllm serve Qwen/Qwen3-Coder-Next-FP8
Ollama     Easy setup, consumer hardware  ollama run qwen2.5-coder
llama.cpp  Minimal dependencies           ./server -m model.gguf
LM Studio  GUI, Windows/Mac               Download and run

2. Create Configuration

Create selfware.toml in your project directory:

# Your local workshop
endpoint = "http://localhost:8000/v1"  # Your LLM backend
model = "Qwen/Qwen3-Coder-Next-FP8"    # Model name
max_tokens = 65536
temperature = 0.7

[safety]
allowed_paths = ["./**", "/home/*/projects/**"]
denied_paths = ["**/.env", "**/secrets/**"]
protected_branches = ["main"]

[agent]
max_iterations = 100
step_timeout_secs = 600     # 10 min for fast models
token_budget = 500000

[continuous_work]
enabled = true
checkpoint_interval_tools = 10
checkpoint_interval_secs = 300
auto_recovery = true
max_recovery_attempts = 3

[retry]
max_retries = 5
base_delay_ms = 1000
max_delay_ms = 60000
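
The [retry] keys above describe a capped exponential backoff. A minimal Rust sketch of that delay schedule, under the assumption that the delay doubles each attempt and is clamped at max_delay_ms (this is illustrative, not selfware's actual implementation, which also adds jitter):

```rust
// Sketch: capped exponential backoff matching the [retry] keys above.
// Delay for attempt n = min(base_delay_ms * 2^n, max_delay_ms).
fn backoff_delay_ms(attempt: u32, base_delay_ms: u64, max_delay_ms: u64) -> u64 {
    // Clamp the shift so the multiplier cannot overflow u64.
    let multiplier = 1u64 << attempt.min(32);
    base_delay_ms.saturating_mul(multiplier).min(max_delay_ms)
}

fn main() {
    // base 1000 ms, cap 60000 ms, max_retries = 5
    for attempt in 0..5 {
        // Prints 1000, 2000, 4000, 8000, 16000
        println!("attempt {attempt}: wait {} ms", backoff_delay_ms(attempt, 1000, 60000));
    }
}
```

With the defaults shown, the schedule hits the 60-second cap on the seventh attempt and stays there.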

3. Start Coding

# Interactive chat mode
selfware chat

# Run a specific task
selfware run "Add unit tests for the authentication module"

# Multi-agent collaboration (up to 16 concurrent agents)
selfware multi-chat

# Analyze your codebase
selfware analyze ./src

The Digital Garden

Your codebase is visualized as a digital garden:

╭─ 🌱 Your Digital Garden ──────────────────────────────────────────╮
│                                                                   │
│  src/          ████████████████░░░░  82% healthy                  │
│    🌳 mod.rs        [THRIVING]  last tended 2h ago                │
│    🌿 agent.rs      [GROWING]   needs water                       │
│    🌱 tools.rs      [SEEDLING]  freshly planted                   │
│                                                                   │
│  Season: WINTER  ❄️   Growth rate: steady                         │
╰───────────────────────────────────────────────────────────────────╯

Files are plants, directories are beds, and your tools are craftsman implements.

Features

  • 54 Built-in Tools: File tending, git cultivation, cargo crafting, code foraging
  • Multi-Agent Swarm: Up to 16 concurrent agents with role specialization
  • Multi-layer Safety: Path guardians, command sentinels, protected groves
  • Task Persistence: Checkpoint seeds survive frost (crashes)
  • Self-Healing Recovery: Error classification, exponential backoff with jitter, automatic escalation
  • Cognitive Architecture: PDVR cycle with working memory
  • Selfware UI: Warm amber tones, animated spinners, ASCII art banners
  • Multi-Model Support: Works with Qwen3-Coder, Kimi K2.5, DeepSeek, and other local LLMs
  • Robust Tool Parser: Handles multiple XML formats from different models
  • SAB Benchmark Suite: 12-scenario agentic benchmark with BLOOM/GROW/WILT/FROST scoring
  • 4-Hour Patience: Tolerant of slow local models (0.1 tok/s supported)

Environment Variables

Configure Selfware via environment variables (override config file):

Variable              Description                Default
──────────────────────────────────────────────────────────────────────
SELFWARE_ENDPOINT     LLM API endpoint           http://localhost:8000/v1
SELFWARE_MODEL        Model name                 Qwen/Qwen3-Coder-Next-FP8
SELFWARE_API_KEY      API key (if required)      None
SELFWARE_MAX_TOKENS   Max tokens per response    65536
SELFWARE_TEMPERATURE  Sampling temperature       0.7
SELFWARE_TIMEOUT      Request timeout (seconds)  600
SELFWARE_DEBUG        Enable debug logging       Disabled
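
The "environment variable, else config default" lookup pattern can be sketched in a few lines of Rust (illustrative only; env_or is a hypothetical helper, not selfware's API):

```rust
use std::env;

// Resolve a setting from the environment, falling back to a default.
fn env_or(key: &str, default: &str) -> String {
    env::var(key).unwrap_or_else(|_| default.to_string())
}

fn main() {
    // Falls back to the documented defaults when the variables are unset.
    let endpoint = env_or("SELFWARE_ENDPOINT", "http://localhost:8000/v1");
    let model = env_or("SELFWARE_MODEL", "Qwen/Qwen3-Coder-Next-FP8");
    println!("endpoint={endpoint} model={model}");
}
```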

The Selfware Palette

The UI uses warm, organic colors inspired by aged paper, wood grain, and amber resin:

Color            Hex      Use
────────────────────────────────────────────────
🟠 Amber         #D4A373  Primary actions, warmth
🟢 Garden Green  #606C38  Growth, success, health
🟤 Soil Brown    #BC6C25  Warnings, needs attention
⬛ Ink           #283618  Deep text, emphasis
🟡 Parchment     #FEFAE0  Light backgrounds

Status Messages

Instead of cold red/green/yellow:

  • BLOOM 🌸 – Success, fresh growth
  • WILT 🥀 – Warning, needs attention
  • FROST ❄️ – Error, needs warmth

Tools Reference

Garden Tending (Files)

Tool            Metaphor     Description
──────────────────────────────────────────────
file_read       🔍 Examine   Read file contents
file_write      ✍️ Inscribe  Create or overwrite
file_edit       🔧 Mend      Search and replace
directory_tree  🗺️ Survey    List structure

Cultivation (Git)

Tool            Metaphor     Description
──────────────────────────────────────────────
git_status      📋 Assess    Working tree status
git_diff        🔬 Compare   Show changes
git_commit      📦 Preserve  Create a commit
git_checkpoint  🏷️ Mark      Create checkpoint

Workshop (Cargo)

Tool          Metaphor    Description
────────────────────────────────────────
cargo_test    🧪 Verify   Run tests
cargo_check   ✓ Validate  Type check
cargo_clippy  🧹 Polish   Run lints
cargo_fmt     📏 Align    Format code

Foraging (Search)

Tool           Metaphor     Description
─────────────────────────────────────────
grep_search    🔎 Hunt      Regex search
glob_find      🧭 Locate    Find by pattern
symbol_search  📍 Pinpoint  Find definitions

Slow Model Support

Designed for local LLMs running on consumer hardware:

Model Speed          Timeout Setting
─────────────────────────────────────
> 10 tok/s           300s (5 min)
1-10 tok/s           3600s (1 hour)
< 1 tok/s            14400s (4 hours)
0.08 tok/s           Works! Be patient.

The agent will wait. Good things take time.
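
A reasonable timeout is just the expected response length divided by the model's speed, plus some headroom. A quick sketch for estimating your own step_timeout_secs (estimate_timeout_secs and the 25% headroom factor are assumptions for illustration, not part of selfware):

```rust
// Estimate a step timeout from model speed and expected response length.
fn estimate_timeout_secs(tokens_per_sec: f64, max_response_tokens: u64) -> u64 {
    // Add 25% headroom for prompt processing and network overhead (assumption).
    ((max_response_tokens as f64 / tokens_per_sec) * 1.25).ceil() as u64
}

fn main() {
    // A 0.5 tok/s model producing up to 4096 tokens needs roughly 10240 s.
    println!("{}", estimate_timeout_secs(0.5, 4096));
}
```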

Task Persistence

Tasks are automatically checkpointed, so your work survives crashes:

# Start a long task
selfware run "Refactor authentication system"

# Power outage? System crash? No problem.
selfware journal

# Resume exactly where you left off
selfware resume <task-id>
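
Checkpointing fires on whichever [continuous_work] threshold trips first, tool count or elapsed time. A minimal sketch of that decision (CheckpointPolicy is a hypothetical type for illustration, not selfware's internal API):

```rust
// Sketch: when to write a checkpoint, given the [continuous_work] settings
// checkpoint_interval_tools = 10 and checkpoint_interval_secs = 300.
struct CheckpointPolicy {
    every_tools: u32,
    every_secs: u64,
}

impl CheckpointPolicy {
    // Checkpoint as soon as either interval is reached.
    fn should_checkpoint(&self, tools_since_last: u32, secs_since_last: u64) -> bool {
        tools_since_last >= self.every_tools || secs_since_last >= self.every_secs
    }
}

fn main() {
    let policy = CheckpointPolicy { every_tools: 10, every_secs: 300 };
    // Tool-count trigger fires even though only 42 seconds have passed.
    println!("{}", policy.should_checkpoint(10, 42));
}
```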

Cognitive Architecture

The agent thinks in cycles:

    ╭─────────╮         ╭─────────╮
    │  PLAN   │────────▶│   DO    │
    ╰─────────╯         ╰─────────╯
         ▲                    │
         │                    ▼
    ╭─────────╮         ╭─────────╮
    │ REFLECT │◀────────│ VERIFY  │
    ╰─────────╯         ╰─────────╯

Working Memory tracks:

  • Current plan and progress
  • Active hypothesis
  • Open questions
  • Discovered facts

Episodic Memory learns:

  • What approaches worked
  • Your preferences
  • Project patterns

Development

Run Tests

# All tests (~3,980 tests, ~2 min)
cargo test --all-features

# With resilience features (self-healing, recovery)
cargo test --features resilience

# Integration tests with real LLM
cargo test --features integration

# Specific test modules
cargo test --test unit            # 238 unit tests
cargo test --test e2e_tools_test  # 21 E2E tool tests

Test Coverage

cargo tarpaulin --all-features --out Html

Metric        Value
──────────────────────────────────────────────────────────────────────────────
Total Tests   ~3,980
Test Targets  lib (3,615) + unit (238) + e2e (21) + integration (5) + property (100) + doc (1)

SAB – Selfware Agentic Benchmark

A 12-scenario agentic coding benchmark that measures how well a local LLM can autonomously fix bugs, write tests, refactor code, and optimize performance, all through selfware's agent loop.

# Run all 12 scenarios (requires OpenAI-compatible endpoint)
ENDPOINT="http://localhost:8000/v1" MODEL="your-model" \
  bash system_tests/projecte2e/run_full_sab.sh

Scenarios

Difficulty  Scenario             What It Tests
────────────────────────────────────────────────────────────────────────────────
Easy        easy_calculator      Simple arithmetic bug fixes (3-4 bugs)
Easy        easy_string_ops      String manipulation bugs
Medium      medium_json_merge    JSON deep merge logic
Medium      medium_bitset        Bitwise operations and edge cases
Medium      testgen_ringbuf      Write 15+ tests for an untested ring buffer
Medium      refactor_monolith    Split a 210-line monolith into 4 modules
Hard        hard_scheduler       Multi-file scheduler with duration parsing
Hard        hard_event_bus       Event system with async subscribers
Hard        security_audit       Replace 5 vulnerable functions with secure alternatives
Hard        perf_optimization    Fix 5 O(n²)/exponential algorithms
Hard        codegen_task_runner  Implement 12 todo!() method stubs
Expert      expert_async_race    Fix 4 concurrency bugs in a Tokio task pool

Scoring

Each scenario scores 0–100:

  • 70 pts – all tests pass after agent edits
  • 20 pts – agent also fixes intentionally broken tests
  • 10 pts – clean exit (no crash, no timeout)

Round ratings: BLOOM (≥85) · GROW (≥60) · WILT (≥30) · FROST (<30)
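
The rubric above is simple enough to state as code. A sketch of the per-scenario score and round rating (function names are illustrative, not part of the SAB harness):

```rust
// Per-scenario score under the SAB rubric: 70 + 20 + 10 points.
fn scenario_score(tests_pass: bool, broken_tests_fixed: bool, clean_exit: bool) -> u32 {
    let mut score = 0;
    if tests_pass { score += 70; }
    if broken_tests_fixed { score += 20; }
    if clean_exit { score += 10; }
    score
}

// Round rating from the average score across scenarios.
fn rating(avg: u32) -> &'static str {
    match avg {
        85..=100 => "BLOOM",
        60..=84 => "GROW",
        30..=59 => "WILT",
        _ => "FROST",
    }
}

fn main() {
    // A fully successful scenario scores 100; a 90-average round rates BLOOM.
    println!("{} {}", scenario_score(true, true, true), rating(90));
}
```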

Benchmark Results โ€” Qwen3-Coder-Next-FP8 (1M context)

Tested on NVIDIA H100 via vLLM, 6 parallel scenarios, 27 rounds (323 scenario runs):

Metric                            Value
─────────────────────────────────────────────────────────
Steady-state average (R2–R27)     90/100
Peak phase (R9–R27)               91/100
Best round                        96/100 (achieved 8 times)
Perfect rounds (12/12 pass)       16 out of 27
BLOOM rounds (≥85)                22 out of 27
S-tier scenarios (100% reliable)  5 of 12

Round Score Rating Passed
─────────────────────────
R1 60/100 GROW 7/11
R2 96/100 BLOOM 12/12
R3 70/100 GROW 9/12
R4 87/100 BLOOM 11/12
R5 79/100 GROW 10/12
R6 81/100 GROW 10/12
R7 87/100 BLOOM 11/12
R8 89/100 BLOOM 11/12
R9 95/100 BLOOM 12/12
R10 95/100 BLOOM 12/12
R11 96/100 BLOOM 12/12
R12 87/100 BLOOM 11/12
R13 96/100 BLOOM 12/12
R14 88/100 BLOOM 11/12
R15 95/100 BLOOM 12/12
R16 95/100 BLOOM 12/12
R17 95/100 BLOOM 12/12
R18 96/100 BLOOM 12/12
R19 96/100 BLOOM 12/12
R20 96/100 BLOOM 12/12
R21 89/100 BLOOM 11/12
R22 87/100 BLOOM 11/12
R23 96/100 BLOOM 12/12
R24 87/100 BLOOM 11/12
R25 90/100 BLOOM 11/12
R26 95/100 BLOOM 12/12
R27 73/100 GROW 9/12

Scenario Reliability

Tier        Scenarios                                                                                    Pass Rate
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────
S (100%)    easy_calculator, easy_string_ops, medium_json_merge, perf_optimization, codegen_task_runner  100%
A (>80%)    hard_scheduler, hard_event_bus, expert_async_race, medium_bitset                             89–96%
B (50–80%)  security_audit, testgen_ringbuf, refactor_monolith                                           70–74%

Running Your Own Benchmark

# Environment variables
export ENDPOINT="http://localhost:8000/v1"   # Your LLM endpoint
export MODEL="Qwen/Qwen3-Coder-Next-FP8"    # Model name
export MAX_PARALLEL=6                         # Concurrent scenarios (6 recommended)

# Single round
bash system_tests/projecte2e/run_full_sab.sh

# Results appear in system_tests/projecte2e/reports/<timestamp>/

Recommended Models by Hardware

SAB is designed to benchmark any local LLM. Here are tested and recommended configurations:

GPU Servers (vLLM / llama.cpp)

Model                  Quant   Weights   Min VRAM  Recommended GPU          Context  Notes
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Qwen3-Coder-Next-FP8   FP8     ~70 GB    80 GB     H100 / A100 80GB         1M       Reference model, 90/100 SAB (27 rounds)
Qwen3.5-Coder 35B A3B  Q4_K_M  ~22 GB    24–32 GB  RTX 5090 (32 GB)         32–128K  MoE, fast inference, best bang/buck
LFM2 24B A2B           4-bit   ~13.4 GB  16–24 GB  RTX 4090 / 3090 (24 GB)  32–64K   Efficient MoE for rapid iteration
LFM2.5 1.2B Instruct   Q8      ~1.25 GB  2 GB      Any GPU                  8–16K    Ultra-light, quick prototyping

Apple Silicon (MLX / llama.cpp / Ollama)

Mac models use unified memory, so your available RAM determines what you can run:

RAM        Recommended Model     Quant             Context  Use Case
────────────────────────────────────────────────────────────────────────────────
96–128 GB  Qwen3-Coder 32B       Q8                64–128K  Full SAB, production coding
64 GB      Qwen3.5 35B A3B       Q4_K_M (~22 GB)   32–64K   Most scenarios, good context
32 GB      LFM2 24B A2B          4-bit (~13.4 GB)  16–32K   Everyday coding tasks
24 GB      LFM2 24B A2B          4-bit (~13.4 GB)  8–16K    Moderate context, tight fit
16 GB      LFM2.5 1.2B Instruct  Q8 (~1.25 GB)     8–16K    Lightweight, fast feedback

Context window matters. SAB scenarios work best with ≥32K context. Smaller windows may cause FROST on complex scenarios (hard/expert). Adjust max_tokens and token_budget in selfware.toml to match your model's context.
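
For instance, a 32K-context model might be tuned conservatively in selfware.toml (the values here are illustrative starting points, not tested recommendations):

```toml
# selfware.toml tuned for a 32K-context model (illustrative values)
max_tokens = 8192        # leave most of the window for prompt and tool output

[agent]
token_budget = 200000    # lower overall budget for smaller models
```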

Quick Setup Examples

# RTX 5090 with Qwen3.5 35B (llama.cpp)
./llama-server -m qwen3.5-coder-35b-a3b-q4_k_m.gguf \
  -c 65536 -ngl 99 --port 8000

# RTX 4090 with LFM2 24B (vLLM)
vllm serve lfm2-24b-a2b --quantization awq --max-model-len 32768

# Mac M2 Max 64GB with MLX
mlx_lm.server --model mlx-community/Qwen3.5-Coder-35B-A3B-4bit \
  --port 8000

# Ultra-light (any machine)
ollama run lfm2.5:1.2b-instruct-q8_0

Project Structure

src/
├── agent/          # Core agent logic, checkpointing, execution
├── tools/          # 54 tool implementations
├── api/            # LLM client with timeout and retry
├── ui/             # Selfware aesthetic (style, animations, banners)
├── analysis/       # Code analysis, BM25 search, vector store
├── cognitive/      # PDVR cycle, working/episodic memory
├── config/         # Configuration management
├── devops/         # Container support, process manager
├── observability/  # Telemetry and tracing
├── orchestration/  # Multi-agent swarm, planning, workflows
├── safety/         # Path validation, sandboxing, threat modeling
├── self_healing/   # Error classification, recovery executor, backoff
├── session/        # Checkpoint persistence
├── testing/        # Verification, contract testing, workflow DSL
├── memory.rs       # Memory management
├── tool_parser.rs  # Robust multi-format XML parser
└── token_count.rs  # Token estimation

Multi-Agent System

The agent supports up to 16 concurrent specialists:

# Launch multi-agent chat
./target/release/selfware multi-chat

# Roles: Architect, Coder, Tester, Reviewer, DevOps, Security

Troubleshooting

"Connection refused"

# Is your LLM backend running?
curl http://localhost:8000/v1/models

"Request timeout"

# Increase timeout for slow models
# In selfware.toml:
[agent]
step_timeout_secs = 14400  # 4 hours

"Safety check failed"

# Check allowed_paths in config
# The agent only accesses paths you permit

License

MIT License

Acknowledgments

  • Built for Qwen3-Coder, Kimi K2.5, LFM2, and other local LLMs
  • Inspired by the AiSocratic movement
  • UI philosophy: software should feel like a warm workshop, not a cold datacenter

    "Tend your garden. The code will grow."
                                    – selfware proverb