do_it
An autonomous coding agent that runs local LLMs via Ollama to read, write, and fix code in your repositories. Works on Windows and Linux with no shell dependency, no Python, no cloud APIs.
Inspired by mini-swe-agent — a minimal, transparent approach to software engineering agents.
do_it extends that foundation with persistent memory, multi-role orchestration, sub-agents, GitHub integration, and a significantly expanded tool set.
Most of the new features were designed and implemented by Claude Sonnet 4.6.
Features
- Local-first: runs entirely on your machine via Ollama, no cloud APIs required
- Cross-platform: Windows (MSVC) and Linux, no shell operators, no Python
- Agent roles: focused tool sets and prompts per task type: `developer`, `navigator`, `qa`, `boss`, `research`, `reviewer`, `memory`
- Sub-agent orchestration: the `boss` role delegates to specialised sub-agents via `spawn_agent`; results flow through shared memory
- Persistent memory: `.ai/` hierarchy with session notes, task plan, knowledge base, architectural decisions, and lessons learned. Global `~/.do_it/` memory for user preferences and cross-project boss insights
- Browser integration: headless browser tools (`screenshot`, `browser_get_text`, `browser_action`, `browser_navigate`) via CDP; connect Chrome or Lightpanda by setting `cdp_url` in config
- Agent self-improvement: `tool_request` and `capability_gap` tools let the Boss record missing capabilities to `~/.do_it/tool_wishlist.md`; review to prioritise new tool development
- Project auto-detection: `.ai/project.toml` scaffolded on first run with commands, GitHub repo, and agent conventions
- GitHub integration: `github_api` tool for issues, PRs, branches, commits, file contents (token from env)
- Test coverage: `test_coverage` auto-detects Rust/Node/Python and runs the right tool
- Telegram notifications: `ask_human` for blocking questions, `notify` for non-blocking progress updates
- Loop detection: automatically detects stuck patterns and sends a Telegram alert
- Model routing: use different models per role (e.g. a large coder model for `developer`, a small fast one for `navigator`)
- Vision support: pass an image as `--task` for visual debugging (requires a vision-capable model)
Quick Start
# 1. Pull a model
# 2. Install
# 3. Run
# With a role (recommended)
# Orchestrate a complex task with sub-agents
Roles
Each role restricts the agent to a focused set of tools and a role-specific system prompt. This matters most for smaller models: exposing 6–8 tools instead of 20+ significantly improves output quality.
| Role | Purpose | Key tools |
|---|---|---|
| `developer` | Write and edit code | read/write file, str_replace, run_command, git, AST, github_api, test_coverage, browser |
| `navigator` | Explore codebase structure | tree, find_files, search, outline, find_references |
| `research` | Find information | web_search, fetch_url, memory |
| `qa` | Run tests, verify changes | test_coverage, diff_repo, git_log, search, github_api, screenshot |
| `reviewer` | Static code review (no execution) | read_file, search, outline, diff_repo, memory, screenshot |
| `boss` | Plan and orchestrate | memory, spawn_agent, web_search, ask_human, browser, tool_request |
| `memory` | Manage `.ai/` state | memory_read, memory_write |
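One way to picture role restriction is as an allow-list lookup performed before a tool call is dispatched. The sketch below is illustrative only and is not taken from the do_it source: the helper names `allowed_tools` and `is_tool_allowed` are hypothetical, only three of the seven roles are covered, and the shorthand entries in the table (`search`, `memory`) are assumed to map to `search_in_files` and `memory_read`/`memory_write` from the Tools section.

```rust
// Hypothetical allow-lists for three roles, based on the Roles table.
const NAVIGATOR_TOOLS: &[&str] = &["tree", "find_files", "search_in_files", "outline", "find_references"];
const RESEARCH_TOOLS: &[&str] = &["web_search", "fetch_url", "memory_read", "memory_write"];
const MEMORY_TOOLS: &[&str] = &["memory_read", "memory_write"];

/// Returns the tool allow-list for a role, or `None` for a role this sketch does not know.
fn allowed_tools(role: &str) -> Option<&'static [&'static str]> {
    match role {
        "navigator" => Some(NAVIGATOR_TOOLS),
        "research" => Some(RESEARCH_TOOLS),
        "memory" => Some(MEMORY_TOOLS),
        _ => None,
    }
}

/// A tool call is permitted only if the role's allow-list contains the tool name.
fn is_tool_allowed(role: &str, tool: &str) -> bool {
    match allowed_tools(role) {
        Some(tools) => tools.iter().any(|t| *t == tool),
        None => false,
    }
}
```

With this shape, `is_tool_allowed("navigator", "write_file")` is rejected before the model's request ever reaches the filesystem, which is the effect the table describes.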
Tools
Filesystem: read_file, write_file, str_replace, list_dir, find_files, search_in_files, tree
Execution: run_command, diff_repo, test_coverage
Git: git_status, git_commit, git_log, git_stash
Internet: web_search (DuckDuckGo, no API key), fetch_url, github_api
Code intelligence (Rust, TypeScript, JavaScript, Python, C++, Kotlin):
get_symbols, outline, get_signature, find_references
Memory (.ai/ hierarchy): memory_read, memory_write
Communication: ask_human (Telegram or console), notify (one-way Telegram), finish
Multi-agent: spawn_agent
Browser (requires [browser] in config.toml): screenshot, browser_get_text, browser_action, browser_navigate
Self-improvement: tool_request, capability_gap
Sub-agent Orchestration
The boss role can spawn specialised sub-agents. Sub-agents run in-process with isolated history and communicate through shared .ai/knowledge/ memory.
boss: reads last_session, plan, decisions, user_profile
│
├─ spawn_agent("research", "find best OAuth crates for Axum", memory_key="knowledge/oauth")
├─ spawn_agent("navigator", "locate existing auth middleware", memory_key="knowledge/structure")
├─ spawn_agent("developer", "implement OAuth per the plan")
├─ screenshot("http://localhost:3080/login") ← boss sees the result directly
├─ spawn_agent("reviewer", "review the OAuth implementation", memory_key="knowledge/review_report")
├─ spawn_agent("qa", "verify all tests pass", memory_key="knowledge/qa_report")
└─ notify("OAuth complete, all tests pass") → finish
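The flow above relies on two properties: each sub-agent starts with its own history, and its result is published under a `memory_key` so that later agents can read it. The following is a minimal sketch of that contract under stated assumptions, not the actual implementation: `SharedMemory` is a hypothetical in-memory stand-in for `.ai/knowledge/`, and the `run` closure stands in for the model loop.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the shared `.ai/knowledge/` store: key -> report text.
#[derive(Default)]
struct SharedMemory {
    entries: HashMap<String, String>,
}

/// Hypothetical sub-agent runner. `run` receives only the sub-agent's own history,
/// never the caller's, and returns the sub-agent's final report.
fn spawn_agent<F>(
    role: &str,
    task: &str,
    memory_key: Option<&str>,
    memory: &mut SharedMemory,
    run: F,
) -> String
where
    F: FnOnce(&mut Vec<String>) -> String,
{
    // Isolated history: created fresh for every sub-agent, seeded only with its own task.
    let mut history = vec![format!("[{role}] {task}")];
    let result = run(&mut history);
    // Publish the report only when the caller supplied a memory key.
    if let Some(key) = memory_key {
        memory.entries.insert(key.to_string(), result.clone());
    }
    result
}
```

A call without `memory_key`, like the `developer` step in the diagram, returns its result to the boss but leaves shared memory untouched.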
Persistent Memory
.ai/
├── project.toml ← auto-scaffolded on first run, edit freely
├── prompts/ ← custom role prompt overrides
├── state/
│ ├── current_plan.md ← boss writes task breakdown here
│ ├── last_session.md ← agent reads this on startup
│ ├── session_counter.txt
│ └── external_messages.md ← external inbox, read and cleared on startup
├── logs/history.md
└── knowledge/
├── lessons_learned.md ← QA appends project-specific patterns
├── decisions.md ← architectural decisions and rationale
└── qa_report.md ← latest test results
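The "read and cleared on startup" behaviour of `external_messages.md` can be sketched as a drain operation. This is an assumption-laden illustration rather than the project's code: the function name `drain_inbox` is hypothetical, and "cleared" is assumed here to mean truncating the file to empty rather than deleting it.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Reads the inbox file and then empties it, returning whatever text was pending.
/// A missing file is treated as an empty inbox (assumption for this sketch).
fn drain_inbox(path: &Path) -> io::Result<String> {
    match fs::read_to_string(path) {
        Ok(text) => {
            // Truncate so the same messages are not processed again next session.
            fs::write(path, "")?;
            Ok(text)
        }
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(String::new()),
        Err(e) => Err(e),
    }
}
```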
Global memory in ~/.do_it/ persists across all projects and is read by the boss role at startup:
| File | Purpose |
|---|---|
| `user_profile.md` | Your preferences: language, stack, workflow style. Boss reads this every session. |
| `boss_notes.md` | Cross-project insights accumulated by Boss: patterns that apply beyond the current repo. |
| `tool_wishlist.md` | Missing capabilities recorded by the agent via `tool_request` and `capability_gap`. Review to prioritise new tool development. |
Edit ~/.do_it/user_profile.md once and the boss will always know your stack and conventions.
Configuration
# config.toml
= "http://localhost:11434"
= "qwen3.5:9b"
= 0.0
= 4096
= 8
= 6000
# Optional: different models per role
[]
= "qwen3-coder-next"
= "qwen3.5:4b"
= "qwen3.5:4b"
# Optional: Telegram for ask_human and notify
# telegram_token = "..."
# telegram_chat_id = "..."
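Per-role model routing amounts to a lookup with a fallback to the default model. The sketch below shows that logic only; the function name `model_for_role` and the shape of the override map are hypothetical, and the pairing of `developer` with `qwen3-coder-next` in the usage note is an assumption made for illustration.

```rust
use std::collections::HashMap;

/// Picks the model for a run: a per-role override if one exists, otherwise the default.
/// `role` is `None` when the agent runs unrestricted.
fn model_for_role<'a>(
    role: Option<&str>,
    default_model: &'a str,
    overrides: &'a HashMap<String, String>,
) -> &'a str {
    role.and_then(|r| overrides.get(r))
        .map(String::as_str)
        .unwrap_or(default_model)
}
```

Given an override map containing only `developer`, a `developer` run would use the override while `qa` and unrestricted runs would fall back to the default model.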
Config priority: --config flag → ./config.toml → ~/.do_it/config.toml → built-in defaults.
On first run, ~/.do_it/ is created with a full template including user_profile.md, boss_notes.md, and tool_wishlist.md.
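The config priority order can be sketched as a first-match search. This is a hedged illustration, not the real resolver: `resolve_config` and `ConfigSource` are hypothetical names, the `exists` check is injected so the logic can be exercised without touching the filesystem, and the sketch assumes an explicit `--config` path wins even if the file is missing.

```rust
use std::path::{Path, PathBuf};

/// Where the effective configuration comes from.
#[derive(Debug, PartialEq)]
enum ConfigSource {
    File(PathBuf),
    BuiltInDefaults,
}

/// Resolves the config source in priority order:
/// --config flag, then ./config.toml, then ~/.do_it/config.toml, then built-in defaults.
/// In real use, `exists` would be `|p| p.exists()`.
fn resolve_config(
    flag: Option<&Path>,
    home: &Path,
    exists: impl Fn(&Path) -> bool,
) -> ConfigSource {
    // Assumption: an explicit --config path is taken as-is, without an existence check.
    if let Some(path) = flag {
        return ConfigSource::File(path.to_path_buf());
    }
    let candidates = [
        PathBuf::from("./config.toml"),
        home.join(".do_it").join("config.toml"),
    ];
    // The first candidate that exists wins.
    for candidate in candidates {
        if exists(candidate.as_path()) {
            return ConfigSource::File(candidate);
        }
    }
    ConfigSource::BuiltInDefaults
}
```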
Browser backend (optional)
[browser]
# Connect to a running CDP server: Chrome, Lightpanda, or any CDP-compatible browser
cdp_url = "ws://127.0.0.1:9222"
# Or launch Chrome locally (coming soon — requires chromiumoxide feature)
# chrome_path = "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe"
# Screenshot output directory (default: .ai/screenshots)
# screenshot_dir = ".ai/screenshots"
Start a CDP server:
# Chrome
# Lightpanda (lightweight, AI-optimised, 9x less RAM than Chrome)
# or via Docker:
CLI
do_it run --task <text|file|image>
--repo <path> (default: .)
--role <role> (default: unrestricted)
--config <path> (default: config.toml)
--system-prompt <text|file>
--max-steps <n> (default: 30)
do_it config [--config <path>]
do_it roles
Roadmap
- `git_push` / `git_pull` structured tools
- `run_background(program, args, id)`: dev servers, keep-alive processes
- Browser CDP implementation (chromiumoxide backend behind `--features browser`)
- Parallel sub-agent execution via `tokio::join!`
- `do_it status` / `do_it init` CLI commands
- Tree-sitter backend for more accurate AST analysis
- Web search providers beyond DuckDuckGo
Authors
Project concept inspired by mini-swe-agent. Built by Claude Sonnet 4.6 with oleksandr.public@gmail.com.
License
MIT