Butterfly Bot
Butterfly Bot is an opinionated personal-ops AI assistant built for people who want results, not setup overhead.
It is built to be Zapier-first: most real-world integrations run through Zapier MCP, so the fastest path is simply adding your Zapier MCP token in the Config screen.
Get your Zapier MCP token at https://zapier.com/mcp and set up your integrations there.
Install
Ubuntu
Download the .deb file for Ubuntu from the latest GitHub Release.
Mac
Download the .app.zip artifact for macOS from the latest GitHub Release, then unzip and move the .app to /Applications.
If macOS reports the app is "damaged" after download, clear quarantine on the extracted app:
xattr -dr com.apple.quarantine /Applications/butterfly-bot.app
Other
cargo install butterfly-bot
Why users pick it:
- Fast value: works out-of-the-box with default settings.
- Unlimited tokens: runs models locally through Ollama, so inference stays private on your computer and usage is not metered.
- UI-first: polished desktop app with chat, AI activity, and settings.
- Automation: full toolset provided for your always-on agent.
- Integrations: Zapier-first MCP integration model for connecting your existing SaaS stack in minutes.
- Security: WASM-only execution for tools plus OS keychain-backed secrets - no plaintext secrets or insecure tools.
- Memory: best-in-class memory that remembers the facts and when they happened.
Zapier-first setup (60 seconds)
If you only configure one thing, configure Zapier:
- Open the app and go to Config.
- Paste your Zapier MCP token and save the config.
That single token unlocks most production workflows because Butterfly Bot can route actions through Zapier's connected apps (email, calendar, tasks, CRM, docs, alerts, and more).
Want a fast start? Use the ready-made templates in examples/ and paste a context.md + heartbeat.md pair into the app.
Top 5 real-world use cases (Butterfly Bot + Zapier)
These are the most commonly adopted autonomous-agent outcomes and how to implement them with Butterfly Bot.
- Autonomous email / inbox management
  - Use Butterfly Bot planning + tasks + reminders to run recurring inbox cleanup cycles.
  - Route actions through Zapier to Gmail/Outlook for labeling, drafting, replying, archiving, and escalation.
  - Keep only high-risk decisions human-in-the-loop (for example: send approval for high-priority drafts).
- Daily morning briefings and proactive digests
  - Schedule a wakeup/task that runs each morning before your workday.
  - Pull weather, calendar, headlines, watchlists, or team metrics through Zapier app integrations.
  - Deliver one consolidated digest in your preferred channel (email, Slack, Telegram, etc.) via Zapier.
- Calendar, scheduling, and task management
  - Let Butterfly Bot parse intent from chat/notes and generate next actions.
  - Use Zapier to sync Google Calendar/Outlook + Todoist/Linear/Jira/Asana from one workflow.
  - Enforce conflict checks and reminder policies with Butterfly Bot reminders and recurring tasks.
- Personal/family/executive assistance
  - Capture requests in natural language, then let Butterfly Bot break them into executable steps.
  - Use Zapier actions for shopping lists, booking flows, subscription tracking, and travel coordination.
  - Keep continuity with Butterfly memory so follow-ups reflect previous preferences and decisions.
- Research, summarization, and monitoring
  - Run scheduled monitoring tasks for competitors, markets, topics, or project signals.
  - Use Zapier integrations to collect source data and trigger downstream reports.
  - Have Butterfly Bot summarize findings, create tasks, and notify only when thresholds are met.
Why this pairing works
- Butterfly Bot provides autonomy, planning, memory, reminders, and always-on execution.
- Zapier MCP provides broad app connectivity without custom per-app engineering.
- Together, you get local-first agent orchestration plus cloud app automation from one simple setup path.
Examples (Context + Heartbeat templates)
Use these templates by opening a pair, copying the markdown, and pasting into the app:
- Paste `context.md` into the Context tab.
- Paste `heartbeat.md` into the Heartbeat tab.
- Save and start Heartbeat.
- Autonomous inbox management
- Morning briefings and digests
- Calendar, scheduling, and tasks
- Personal/family/executive assistance
- Research, summarization, and monitoring
How it compares:
| Criterion | Weight | Butterfly Bot | OpenClaw | ZeroClaw | IronClaw |
|---|---|---|---|---|---|
| Workflow completeness | 20 | 5 | 4 | 3 | 4 |
| Reliability and recovery | 20 | 5 | 3 | 4 | 3 |
| UX and visibility | 15 | 5 | 4 | 3 | 4 |
| Security posture | 15 | 5 | 1 | 5 | 4 |
| Setup/onboarding | 10 | 5 | 4 | 5 | 4 |
| Integration leverage/extensibility | 10 | 4 | 5 | 5 | 5 |
| Docs/contributor DX | 10 | 5 | 4 | 5 | 4 |
| Total Weighted (/500) | 100 | 490 | 345 | 410 | 390 |
Tools
Built-in tools included in Butterfly Bot:
- `mcp`: Connects to external MCP servers over streamable HTTP.
- `github`: GitHub MCP wrapper for GitHub workflows.
- `zapier`: Zapier MCP wrapper for connected app automations.
- `coding`: Dedicated coding tool/model for implementation tasks.
- `search_internet`: Web search tool (OpenAI, Grok, or Perplexity providers).
- `http_call`: Generic HTTP client for external API calls.
- `planning`: Structured plans (goals + steps).
- `todo`: Ordered checklist-style todos.
- `tasks`: Scheduled one-off and recurring tasks.
- `reminders`: Reminder creation and lifecycle operations.
- `wakeup`: Interval-based wakeup/autonomy task loop.
Tool configuration is convention-first and managed through app defaults plus minimal Config tab controls.
Architecture (Daemon + UI + Always-On Agent)
┌──────────────────────────────────────┐
│ Desktop UI (Dioxus) │
│ - chat, activity, simple settings │
│ - streams tool + agent events │
└───────────────┬──────────────────────┘
│ IPC / local client
v
┌──────────────────────────────┐
│ butterfly-botd │
│ (daemon) │
│ - always-on scheduler │
│ - tools + wakeups │
│ - memory + planning │
└──────────────┬───────────────┘
│
┌─────────────────┼─────────────────┐
v v v
┌────────────────┐ ┌───────────────┐ ┌──────────────────┐
│ Memory System │ │ Tooling Layer │ │ Model Provider │
│ (SQLCipher + │ │ (MCP, HTTP, │ │ (Ollama) │
│ sqlite-vec) │ │ reminders, │ │ │
│ │ │ tasks, etc.) │ │ │
│ │ │ + WASM sandbox│ │ │
│ │ │ runtime │ │ │
└────────────────┘ └───────────────┘ └──────────────────┘
Memory System (Diagram + Rationale)
┌───────────────────────────────┐
│ Conversation │
│ (raw turns + metadata) │
└───────────────┬───────────────┘
│
v
┌────────────────────────────────┐
│ Event + Signal Extractor │
│ (facts, prefs, tasks, entities)│
└───────────────┬────────────────┘
│
┌─────────────┴─────────────┐
│ │
v v
┌──────────────────────────┐ ┌──────────────────────────┐
│ Temporal SQLCipher DB │ │ sqlite-vec Vectors │
│ (structured memories) │ │ (embeddings + rerank) │
└─────────────┬────────────┘ └─────────────┬────────────┘
│ │
v v
┌──────────────────────────┐ ┌──────────────────────────┐
│ Memory Summarizer │ │ Semantic Recall + Rank │
│ (compression + pruning) │ │ (query-time retrieval) │
└─────────────┬────────────┘ └─────────────┬────────────┘
└──────────────┬───────────────┘
v
┌────────────────────────┐
│ Context Assembler │
│ (chat + tools + agent) │
└────────────────────────┘
Temporal knowledge graph (what “temporal” means here)
Memory entries are stored as time-ordered events and entities in the SQLCipher database. Each fact, preference, reminder, and decision is recorded with timestamps and relationships, so recall can answer questions like “when did we decide this?” or “what changed since last week?” without relying on lossy summaries. This timeline-first structure is what makes the memory system a temporal knowledge graph rather than a static summary.
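As a rough illustration of the timeline-first idea, the sketch below keeps timestamped events in memory and answers the two example questions. The `MemoryEvent` struct, its fields, and both helper functions are hypothetical names chosen for this example; the actual SQLCipher schema is not shown in this README, so treat this as a model of the concept and not as the project's implementation.

```rust
// Hypothetical in-memory model of time-ordered memory events.
// The real storage is a SQLCipher database; none of these names come from the project.
#[derive(Debug, Clone, PartialEq)]
struct MemoryEvent {
    /// Unix timestamp (seconds) when the event was recorded.
    timestamp: u64,
    /// Entity or topic the event relates to, e.g. "pricing".
    entity: String,
    /// The recorded fact, preference, or decision.
    fact: String,
}

/// "When did we decide this?": earliest event about `entity` whose fact mentions `needle`.
fn first_mention<'a>(
    events: &'a [MemoryEvent],
    entity: &str,
    needle: &str,
) -> Option<&'a MemoryEvent> {
    events
        .iter()
        .filter(|e| e.entity == entity && e.fact.contains(needle))
        .min_by_key(|e| e.timestamp)
}

/// "What changed since last week?": events at or after `since`, oldest first.
fn changed_since(events: &[MemoryEvent], since: u64) -> Vec<&MemoryEvent> {
    let mut recent: Vec<&MemoryEvent> =
        events.iter().filter(|e| e.timestamp >= since).collect();
    recent.sort_by_key(|e| e.timestamp);
    recent
}
```

Because every entry carries its own timestamp, both questions are answered by filtering and ordering the raw events, with no dependence on how a summary happened to be phrased.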
Why this beats “just summarization” or QMD
- Summaries alone lose details. The system stores structured facts in SQLCipher and semantic traces in sqlite-vec so exact preferences, dates, and decisions remain queryable even after summarization.
- QMD-style recall can miss context. Dual storage (structured + vectors) plus reranking yields higher recall and fewer false positives.
- Temporal memory matters. The DB keeps time-ordered events so the assistant can answer “when did we decide X?” without relying on brittle summary phrasing.
- Safer pruning. Summarization is used for compression, not replacement, so older context is condensed while retaining anchors for precise retrieval.
- Faster, cheaper queries. Quick structured lookups handle facts and tasks; semantic search handles fuzzy recall, keeping prompts smaller and more relevant.
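The split between quick structured lookups and fuzzy semantic recall can be sketched as follows. This is a simplified stand-in under stated assumptions: a `HashMap` plays the role of the structured store, a slice of `(text, embedding)` pairs plays the role of sqlite-vec, and plain cosine similarity replaces the dedicated reranker model. The `recall` function and its parameters are hypothetical.

```rust
use std::collections::HashMap;

/// Cosine similarity between two equal-length vectors; 0.0 if either has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Hypothetical recall: exact structured lookup first, then the top-k most similar texts.
fn recall(
    facts: &HashMap<String, String>,
    vectors: &[(String, Vec<f32>)],
    key: &str,
    query_vec: &[f32],
    k: usize,
) -> Vec<String> {
    // Structured path: an exact hit needs no embedding search at all.
    if let Some(exact) = facts.get(key) {
        return vec![exact.clone()];
    }
    // Semantic path: score every stored vector against the query.
    let mut scored: Vec<(f32, &String)> = vectors
        .iter()
        .map(|(text, v)| (cosine(query_vec, v), text))
        .collect();
    // Highest similarity first.
    scored.sort_by(|a, b| b.0.total_cmp(&a.0));
    scored.into_iter().take(k).map(|(_, t)| t.clone()).collect()
}
```

The design point is the ordering: a cheap exact lookup short-circuits the search, and only fuzzy queries pay for similarity scoring, which keeps the assembled prompt small.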
Privacy & Security & Always On
- Run locally with Ollama to keep requests and model inference private on your machine.
- Designed for always-on use: local inference means token use is not metered, and wakeup and task intervals are customizable.
- Conversation data and memory are only stored locally.
- Config JSON is stored in the OS keychain.
- SQLite data is encrypted at rest via SQLCipher when a DB key is set.
Prerequisites
- Rust (via rustup): https://rustup.rs (only if using `cargo install`)
- Ollama is auto-installed on Linux and Mac at first run (via `curl -fsSL https://ollama.com/install.sh | sh`) when local Ollama is configured.
Requirements
System
- Rust 1.93+ (only if using `cargo install`)
- 16GB+ RAM with 8GB+ VRAM (for Linux)
- Certain system libraries for Linux (only if using `cargo install`)
- 16GB+ RAM with M2 Pro (for Mac)
Models Used
- ministral-3:14b (assistant + summaries)
- embeddinggemma:latest (embedding)
- qllama/bge-reranker-v2-m3 (reranking)
Models auto-download and install if not already installed.
Test Systems
- AMD Threadripper 2950X with 128GB DDR4 and an AMD 7900XTX on Ubuntu 24.04.3 (instant responses)
- MSI Raider GE68-HX-14V on Ubuntu 24.04.3 (instant responses)
- M2 Pro Mac Mini with 16GB RAM (~10 second responses)
Build
Test
Coverage (llvm-cov):
If your environment prompts for keychain/keyring access during tests, disable keyring usage for that run:
BUTTERFLY_BOT_DISABLE_KEYRING=1
Run
Debian package via Dioxus (.deb)
If you want to avoid Snap for local testing, build a Debian package directly:
Install the generated package:
If dx is missing:
Run the packaged commands:
Daemon service (optional) is shipped disabled by default and can be managed with:
Notes:
- Snap launchers set a writable app root under `$SNAP_USER_COMMON/butterfly-bot`.
- The default DB path is `$SNAP_USER_COMMON/butterfly-bot/data/butterfly-bot.db`.
- Bundled modules are mounted at `./wasm/<tool>_tool.wasm` inside the app runtime directory.
- `BUTTERFLY_BOT_DISABLE_KEYRING=1` is enabled by default in the snap launcher (override if your snap environment provides a working keyring backend).
macOS app bundle via Dioxus (.app)
Build the macOS app bundle and a zipped release artifact:
Open the generated app:
If dx is missing:
Release artifact output:
- `dist/ButterflyBot.app`
- `dist/ButterflyBot_<version>_<arch>.app.zip`
macOS release signing + notarization (GitHub Actions)
The release workflow .github/workflows/release-macos-app.yml is configured to sign and notarize macOS app artifacts before uploading to GitHub Releases.
All signing/notarization secrets are optional. If they are not configured, CI still publishes a macOS .app.zip artifact (ad-hoc signed, non-notarized), and users may need to use Open Anyway or clear quarantine.
Repository secrets used for signing and notarization:
- `APPLE_CERTIFICATE_BASE64`: Base64-encoded Developer ID Application certificate (`.p12`)
- `APPLE_CERTIFICATE_PASSWORD`: Password used when exporting the `.p12` certificate
- `APPLE_KEYCHAIN_PASSWORD`: Temporary CI keychain password
- `APPLE_SIGN_IDENTITY`: Signing identity name (example: `Developer ID Application: Your Name (TEAMID)`)
- `APPLE_ID`: Apple ID email used for notarization
- `APPLE_TEAM_ID`: Apple Developer Team ID
- `APPLE_APP_SPECIFIC_PASSWORD`: App-specific password for the Apple ID
Create base64 certificate value locally:
How To (Context, Heartbeat, Config)
Use this quick sequence for best results with minimal setup:
- Set your Context first
  - Open the Context tab and paste your operating context (goals, constraints, preferences, project notes).
  - Keep it short and actionable; update only when your priorities change.
- Start Heartbeat for always-on automation
  - Open Heartbeat and start the loop so the daemon can run wakeups, tasks, reminders, and tool actions continuously.
  - If you want fewer background checks, lower activity by increasing the wakeup interval in Config.
- Use Config only when needed
  - Open Config to set required secrets and connectivity:
    - Zapier token (primary; most integrations rely on this)
    - GitHub token
    - Coding OpenAI API key
    - Search provider + search API key
    - MCP servers
    - Network allow list
  - If something fails with a tool call, check provider/key/allowlist first before changing anything else.
  - Config is stored in the OS keychain for top security and safety.
Diagnostics & Security Audit
- Health diagnostics and security audit capabilities remain available at the daemon/API layer.
- Security posture guidance and limits are documented in docs/security-audit.md.
Threat Model (Important)
- Butterfly Bot now has a formal attacker model and trust-boundary definition.
- This covers UI↔daemon, daemon↔tool runtime, daemon↔provider, and daemon↔storage boundaries.
- Primary threats covered: plaintext secret leakage, tool capability escalation, over-permissive network egress, and daemon auth misuse.
- Baseline controls include OS keychain-backed secrets, WASM-only execution for built-in tools, daemon auth checks, and `default_deny` network posture guidance.
- See docs/threat-model.md for full assumptions, residual risks, and hardening priorities.
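A deny-by-default network posture can be pictured with the minimal sketch below: a host is refused unless it matches an allowlist entry. The function name `is_host_allowed` is hypothetical, and treating subdomains of an allowlisted domain as allowed is an assumption made for this example; the project's actual matching rules are described in docs/threat-model.md and may differ.

```rust
/// Hypothetical deny-by-default egress check (not the project's actual code).
/// A host is allowed only if it equals an allowlisted domain or is a subdomain of one.
/// Subdomain matching is an assumption of this sketch.
fn is_host_allowed(host: &str, allowlist: &[&str]) -> bool {
    // Normalize: drop a trailing dot (fully qualified form) and compare case-insensitively.
    let host = host.trim_end_matches('.').to_ascii_lowercase();
    allowlist.iter().any(|allowed| {
        let allowed = allowed.to_ascii_lowercase();
        // Require a leading dot for suffix matches so "evilmcp.zapier.com"
        // does not pass as a subdomain of "mcp.zapier.com".
        host == allowed || host.ends_with(&format!(".{allowed}"))
    })
}
```

With an empty allowlist every host is rejected, which is the property that makes the posture deny-by-default.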
Memory LLM Configuration
- `memory.openai` lets memory operations (embeddings, summarization, reranking) use a different OpenAI-compatible provider than the main agent.
- This is useful for running the agent on a remote provider (e.g., Cerebras) while keeping memory on a local Ollama instance.
Convention Mode (WASM-only tools)
- Tool execution is WASM-only for all built-in tools.
- Startup now validates WASM module integrity (magic header) for registered tools and fails fast on invalid/corrupted binaries.
- Per-tool `runtime` config is ignored; tool execution is WASM-only.
- Per-tool `wasm.module` is optional and defaults to `./wasm/<tool>_tool.wasm`.
- Zero-config path: place modules at `./wasm/<tool>_tool.wasm` for each tool you run.
- Convention defaults include a deny-by-default network posture with allowlisted domains including `mcp.zapier.com`.
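The magic-header check and the path convention described above can be sketched in a few lines. The four-byte WebAssembly magic (`\0asm`) is part of the WebAssembly binary format; the function names below and the error strings are hypothetical and only illustrate the fail-fast idea, not the daemon's real startup code.

```rust
use std::path::PathBuf;

/// First four bytes of every WebAssembly binary module: "\0asm".
const WASM_MAGIC: [u8; 4] = [0x00, 0x61, 0x73, 0x6D];

/// Returns true if `bytes` starts with the WebAssembly magic header.
fn has_wasm_magic(bytes: &[u8]) -> bool {
    bytes.starts_with(&WASM_MAGIC)
}

/// Conventional module path for a tool, following `./wasm/<tool>_tool.wasm`.
fn default_module_path(tool: &str) -> PathBuf {
    PathBuf::from(format!("./wasm/{tool}_tool.wasm"))
}

/// Hypothetical fail-fast check: read the module at its conventional path
/// and reject a missing file or one without the magic header.
fn validate_module(tool: &str) -> Result<PathBuf, String> {
    let path = default_module_path(tool);
    let bytes = std::fs::read(&path).map_err(|e| format!("{}: {e}", path.display()))?;
    if has_wasm_magic(&bytes) {
        Ok(path)
    } else {
        Err(format!("{}: missing WASM magic header", path.display()))
    }
}
```

A magic-header check only catches obviously wrong or truncated files; it does not prove the module is valid or trustworthy, so it complements the sandbox and does not replace it.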
Build all default tool modules:
This generates:
- `./wasm/coding_tool.wasm`
- `./wasm/mcp_tool.wasm`
- `./wasm/http_call_tool.wasm`
- `./wasm/github_tool.wasm`
- `./wasm/zapier_tool.wasm`
- `./wasm/planning_tool.wasm`
- `./wasm/reminders_tool.wasm`
- `./wasm/search_internet_tool.wasm`
- `./wasm/tasks_tool.wasm`
- `./wasm/todo_tool.wasm`
- `./wasm/wakeup_tool.wasm`
tool call
│
v
┌───────────────────────────────┐
│ Sandbox planner │
│ (WASM-only invariant) │
└───────────────┬───────────────┘
│
┌─────────v─────────┐
│ WASM runtime │
│ ./wasm/<tool>_tool│
│ .wasm │
└───────────────────┘
SQLCipher (encrypted storage)
Butterfly Bot uses SQLCipher-backed SQLite when you provide a DB key.
Set the environment variable before running:
If no key is set, storage falls back to plaintext SQLite.
Competitive Feature Matrix (Butterfly Bot vs OpenClaw, ZeroClaw, IronClaw)
Positioning Snapshot
- Butterfly Bot (this repo): Practical personal-agent workflows with daemon + UI + planning/todo/tasks/reminders/wakeup + memory.
- OpenClaw (main competitor): Full personal-assistant platform with broad channels and plugin ecosystem, but currently high operational security risk for typical deployments.
- ZeroClaw: Lean, pluggable Rust agent framework with strong onboarding story and broad provider/channel coverage.
- IronClaw: Platform-style architecture emphasizing sandboxed extensibility (WASM), orchestration, routines, and gateway capabilities.
Feature Matrix
Legend: ✅ strong, 🟨 partial/limited, ❌ not evident
| Area | Butterfly Bot | OpenClaw | ZeroClaw | IronClaw |
|---|---|---|---|---|
| Rust core implementation | ✅ | ❌ (TypeScript-first) | ✅ | ✅ |
| Interactive UI included | ✅ (Dioxus UI) | ✅ (Control UI + WebChat) | 🟨 (CLI-first) | ✅ (TUI/Web gateway) |
| Local daemon/service model | ✅ | ✅ | ✅ | ✅ |
| Config persistence + reload path | ✅ | ✅ | ✅ | ✅ |
| Provider abstraction | ✅ | ✅ | ✅ | ✅ |
| Broad multi-provider catalog | ✅ (relies on Zapier) | ✅ | ✅ | 🟨 (focused provider path + adapters) |
| Agent extension architecture | ✅ (Rust-native modules + MCP integrations; maintainer-curated) | ✅ (plugins/extensions) | ✅ | ✅ |
| Secure tool sandbox model (explicit) | ✅ | 🟨 (sandbox/policy flows exist, but high-risk defaults and misconfiguration exposure remain common) | ✅ | ✅ |
| Memory subsystem | ✅ (SQLite + sqlite-vec hybrid search) | ✅ (core memory + LanceDB plugin path) | ✅ (SQLite/Markdown + hybrid search) | ✅ (workspace memory + hybrid search) |
| Planning + todo/task orchestration | ✅ (native modules) | 🟨 | 🟨 | ✅ |
| Scheduled reminders/heartbeat style automation | ✅ | ✅ | ✅ | ✅ (routines/heartbeat) |
| End-user dynamic plugin building | ❌ (intentional: convention-over-configuration) | 🟨 (plugin/extensibility strong, not builder-centric) | ❌ | ✅ (WASM-oriented builder flow) |
| Zero-step onboarding (no wizard required) | ✅ | ✅ | ✅ | ✅ |
| Documentation breadth for contributors | ✅ | ✅ | ✅ | ✅ |
| Explicit security hardening docs/checklists | ✅ | ✅ | ✅ | ✅ |
| Test breadth/visibility | ✅ | ✅ | ✅ | 🟨 |
Weighted Scorecard (Personal Ops Agent Lens)
Scoring model:
- Score each criterion from 1 to 5 (5 = strongest).
- Weight reflects importance for a personal operations assistant product.
- Weighted score per row = `score × weight`.
- Total possible = 500 (if all criteria scored 5).
Criteria and Weights
| Criterion | Weight (%) | Why it matters |
|---|---|---|
| Workflow completeness (plan→task→reminder→done) | 20 | Core product value for daily execution. |
| Reliability and failure recovery | 20 | Users trust consistency more than raw feature count. |
| UX and operator visibility | 15 | Faster adoption and better day-2 usability. |
| Security posture and secret hygiene | 15 | Critical for real-world deployment and trust. |
| Setup/onboarding speed | 10 | Strong determinant of conversion and retention. |
| Integration leverage and extensibility | 10 | Measures practical capability breadth, including MCP partner surfaces (e.g., Zapier) and native agent extension velocity. |
| Documentation and contributor DX | 10 | Impacts community velocity and maintainability. |
Current Scoring (Post-Ship Estimate)
Scoring reflects current shipped state for Butterfly Bot after landing local golden-path reliability checks, execution-trace sanity coverage, and trace redaction hardening. It should still be revised as competitors evolve.
| Criterion | Weight | Butterfly Bot | OpenClaw | ZeroClaw | IronClaw |
|---|---|---|---|---|---|
| Workflow completeness | 20 | 5 | 4 | 3 | 4 |
| Reliability and recovery | 20 | 5 | 3 | 4 | 3 |
| UX and visibility | 15 | 5 | 4 | 3 | 4 |
| Security posture | 15 | 5 | 1 | 5 | 4 |
| Setup/onboarding | 10 | 5 | 4 | 5 | 4 |
| Integration leverage/extensibility | 10 | 4 | 5 | 5 | 5 |
| Docs/contributor DX | 10 | 5 | 4 | 5 | 4 |
| Total Weighted (/500) | 100 | 490 | 345 | 410 | 390 |
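To make the scoring arithmetic easy to re-check when scores are revised, here is a small sketch of the weighted total. The weights and the Butterfly Bot column are copied from the table above; the function and constant names are hypothetical.

```rust
/// Criterion weights from the table, in row order (they sum to 100).
const WEIGHTS: [u32; 7] = [20, 20, 15, 15, 10, 10, 10];

/// Butterfly Bot's scores from the table, in the same row order.
const BUTTERFLY_BOT: [u32; 7] = [5, 5, 5, 5, 5, 4, 5];

/// Weighted total for one product: the sum of score × weight over all criteria.
fn weighted_total(weights: &[u32], scores: &[u32]) -> u32 {
    weights.iter().zip(scores).map(|(w, s)| w * s).sum()
}
```

`weighted_total(&WEIGHTS, &BUTTERFLY_BOT)` evaluates to 490, and a column of all 5s gives the 500 maximum. Any other column can be checked the same way by passing its seven scores.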
Path to 500 (Integration Leverage 4→5)
To reach 500/500, the remaining criterion is Integration leverage/extensibility (currently 4/5, weight 10).
Definition of 5/5 for Integration leverage/extensibility:
- Ship at least 5 operator-ready integration playbooks (for example MCP/Zapier/provider workflows) with reproducible steps.
- Each playbook includes: prerequisites, exact configuration, expected outputs, failure modes, and recovery/rollback steps.
- Add a tested compatibility table (integration surface + version/provider assumptions + last validation date).
- Add repeatable verification checks (local smoke tests or scripted validation) for every published playbook.
- Publish reliability evidence for those integration paths (success rate + retry behavior over repeated runs).
Exit rule for score update:
- Keep Integration leverage/extensibility at 4/5 until all criteria above are met and documented.
- Move Integration leverage/extensibility to 5/5 only after evidence is published; total then becomes 500/500.
License
MIT