Claudette
A local-first AI secretary that runs on your own laptop. REPL, fullscreen TUI, one-shot CLI, and a Telegram bot — all driving the same Ollama backend. No cloud brain, no subscription, no telemetry. Single Rust binary.
Install in 30 seconds
Linux / macOS:
|
Windows (PowerShell):
iwr -useb https://raw.githubusercontent.com/mrdushidush/claudette/main/install.ps1 | iex
Then pull a brain and talk:
Prefer not to pipe the network into a shell? Grab a signed archive from Releases and unzip claudette (or claudette.exe) onto your PATH. SHA256 sidecar on every artifact.
Rust user? cargo install claudette still works. Don't have a GPU? See CPU-only mode: the 4b brain runs on plain CPU, just slower. First time? Open docs/show-me.md for plain-English examples: calendar, notes, weather, screenshots, voice from your phone.

One turn driving four tool groups (note_list, weather_forecast, tv_get_quote, calendar_list_events): the brain enables groups on demand and dispatches calls. TUI tabs: [1]Chat [2]Tools [3]Notes [4]Todos [5]HW.
Why Claudette
The open-source AI agent space is crowded with coding-focused tools (Aider, Cline, OpenHands, opencode). Claudette is aimed at a different slot: a general-purpose personal assistant you can voice-note from a bus stop, that runs entirely on your own laptop, with no cloud brain in the loop.
- Truly local by default. No cloud-brain code path exists. Ollama on localhost is the only required dependency. Every outbound network call (voice TTS, Telegram, web search, GitHub, Google Calendar/Gmail) is opt-in and gated behind a feature you have to turn on. Full inventory in PRIVACY.md.
- Fits a single 3060-class GPU. The default qwen3.5:4b brain uses ~3.4 GB VRAM; auto-fallback to qwen3.5:9b only fires on stuck signals. No 32 GB-VRAM hidden requirement.
- Messaging-first. None of the comparable tools ship a Telegram bot interface: voice in (Whisper), voice out (edge-tts), and full agent control from your phone.
- Personal, not just code. Tool groups cover Google Calendar, Gmail, scheduler/briefings, notes, todos, markets, weather, web search — code-gen is one capability (via the Codet sidecar), not the whole point.
Honest side-by-side vs. OpenHands, Aider, opencode, Cline, Continue: docs/comparison.md. Claudette isn't the winner in most of them — it's the only one aimed at this specific slot.
Highlights
Four interfaces, one brain
| Mode | Command | What it's for |
|---|---|---|
| REPL | claudette | Conversational shell. Autosaves every turn. |
| One-shot | claudette "your question" | Print a reply and exit. Pipe-friendly. |
| TUI | claudette --tui | Ratatui fullscreen UI with 5 tabs. |
| Telegram bot | claudette --telegram | Voice-capable remote chat. |
80+ tools, ~200 token base schema
Every tool except enable_tools, get_current_time, and load_workspace_rules lives in a group the model opts into via enable_tools(group). 22 groups as of v0.6.0 (notes, todos, files, code, meta, git, ide, search, advanced, facts, registry, github, markets, telegram, calendar, schedule, gmail, recall, quality [run_tests / diagnostics / apply_patch], semantic [semantic_grep], vision [screenshot_capture / image_describe], clipboard) — schema cost stays flat until the model actually needs the surface.
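To make the opt-in idea concrete, here is a minimal sketch of a group registry. It is not Claudette's actual implementation: `ToolRegistry`, `visible_tools`, and `demo_registry` are hypothetical names, and only the tool and group names are taken from the paragraph above.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// The three tools the text above says are always visible.
const BASE_TOOLS: [&str; 3] = ["enable_tools", "get_current_time", "load_workspace_rules"];

/// Hypothetical registry: group name -> tool names in that group.
struct ToolRegistry {
    groups: BTreeMap<&'static str, Vec<&'static str>>,
    enabled: BTreeSet<&'static str>,
}

impl ToolRegistry {
    fn new(groups: BTreeMap<&'static str, Vec<&'static str>>) -> Self {
        Self { groups, enabled: BTreeSet::new() }
    }

    /// Models `enable_tools(group)`; returns false for an unknown group.
    fn enable_tools(&mut self, group: &str) -> bool {
        match self.groups.get_key_value(group) {
            Some((name, _)) => {
                self.enabled.insert(*name);
                true
            }
            None => false,
        }
    }

    /// Tools whose schema would be sent to the model right now:
    /// the base tools plus every tool in an enabled group.
    fn visible_tools(&self) -> Vec<&'static str> {
        let mut tools: Vec<&'static str> = BASE_TOOLS.to_vec();
        for group in &self.enabled {
            tools.extend(self.groups[group].iter().copied());
        }
        tools
    }
}

/// Two of the groups listed above, with the tool names the text gives for them.
fn demo_registry() -> ToolRegistry {
    let mut groups = BTreeMap::new();
    groups.insert("quality", vec!["run_tests", "diagnostics", "apply_patch"]);
    groups.insert("vision", vec!["screenshot_capture", "image_describe"]);
    ToolRegistry::new(groups)
}
```

In this sketch the visible surface starts at the three base tools and grows only when a group is enabled, which is the property that keeps the base schema small.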
Brownfield missions: clone, edit, ship a PR — in one tool chain
mission_start("owner/repo") clones into ~/.claudette/missions/<slug>/ and silently re-routes git_status / glob_search / grep_search / write_file / bash into the mission tree. mission_submit auto-branches, commits, pushes, and opens the PR via gh_create_pr. Resumable across sessions via mission_attach.
Forge-mode: autonomous code-change pipeline
claudette --forge "<prompt>" or /forge <prompt> runs a Planner → Coder → Verifier loop against the active mission, with a configurable fix-loop (default 2 rounds) before the PR opens. Roles are routable via ~/.claudettes-forge/models.toml so you can pin a stronger model to Verifier and keep a cheap model on Coder. Inside an existing git repo with no mission active, forge auto-bootstraps an ephemeral mission rooted at the repo toplevel — no clone required. Full walkthrough: docs/forge.md.
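The Planner → Coder → Verifier loop can be sketched as below. This is an illustration under assumptions, not forge's real code: the closure signatures, `ForgeOutcome`, and `run_forge` are hypothetical, and the source does not say what happens when the fix rounds run out, so the sketch simply reports that state.

```rust
/// Default number of fix rounds, as stated in the text above.
const DEFAULT_FIX_ROUNDS: u32 = 2;

#[derive(Debug, PartialEq, Eq)]
enum ForgeOutcome {
    /// The verifier accepted the patch; a PR could be opened next.
    ReadyForPr { fix_rounds_used: u32 },
    /// Fix rounds exhausted (what forge does here is not specified in the source).
    VerifierStillFailing,
}

/// Hypothetical driver: plan once, code once, then verify and fix up to
/// `max_fix_rounds` times. The coder receives verifier feedback on fix rounds.
fn run_forge<P, C, V>(max_fix_rounds: u32, planner: P, mut coder: C, mut verifier: V) -> ForgeOutcome
where
    P: FnOnce() -> String,
    C: FnMut(&str, Option<&str>) -> String,
    V: FnMut(&str) -> Result<(), String>,
{
    let plan = planner();
    let mut patch = coder(&plan, None);
    for round in 0..=max_fix_rounds {
        match verifier(&patch) {
            Ok(()) => return ForgeOutcome::ReadyForPr { fix_rounds_used: round },
            Err(feedback) if round < max_fix_rounds => {
                patch = coder(&plan, Some(&feedback));
            }
            Err(_) => return ForgeOutcome::VerifierStillFailing,
        }
    }
    ForgeOutcome::VerifierStillFailing
}
```

Passing the roles as closures mirrors the idea that each role can be routed to a different model.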
Tiered-brain auto-fallback
Three presets (Fast / Auto / Smart). Auto runs qwen3.5:4b and escalates to qwen3.5:9b on stuck signals (empty response after retry, max-iterations hit with no text, ≥ 3 consecutive tool errors). Per-turn revert — not session-sticky. For 16 GB+ VRAM, pin qwen3.6-35b-a3b instead — see Recommended models.
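As a rough sketch of the Auto preset's decision, assuming the three stuck signals above are tracked per turn (the `TurnSignals` struct and function names are hypothetical, not Claudette's API):

```rust
/// Signals observed while handling one turn (field names are illustrative).
#[derive(Debug, Default, Clone, Copy)]
struct TurnSignals {
    empty_after_retry: bool,
    max_iterations_without_text: bool,
    consecutive_tool_errors: u32,
}

const PRIMARY_BRAIN: &str = "qwen3.5:4b";
const FALLBACK_BRAIN: &str = "qwen3.5:9b";

/// True when any of the stuck signals named in the text above has fired.
fn is_stuck(signals: &TurnSignals) -> bool {
    signals.empty_after_retry
        || signals.max_iterations_without_text
        || signals.consecutive_tool_errors >= 3
}

/// Auto preset: escalate for this turn only. The function keeps no state,
/// so the next turn starts on the primary brain again (per-turn revert).
fn model_for_turn(signals: &TurnSignals) -> &'static str {
    if is_stuck(signals) {
        FALLBACK_BRAIN
    } else {
        PRIMARY_BRAIN
    }
}
```

Keeping the decision stateless is one simple way to get the per-turn revert behaviour; the real implementation may track it differently.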
Voice in, voice out, and vision in
Whisper transcription for Telegram voice notes, edge-tts for replies (English or Hebrew). Image attachments in the TUI/REPL via Alt+V (clipboard), drag-drop, or @/path/to/img.png when the loaded brain is multimodal.
Codet sidecar for code generation
generate_code routes through a dedicated coder model (default qwen3-coder:30b, fallback qwen2.5-coder:14b; recommended upgrade qwen3.6-35b-a3b — same model as the brain, no swap dance — see Recommended models). Runs a real syntax check (py_compile, rustc --emit=metadata, tsc --noEmit, etc. — 5 languages), then an Aider-style SEARCH/REPLACE fix loop on failure, then optional pytest/cargo-test/jest. Hot-swaps into VRAM on demand on memory-constrained boxes.
Cross-session semantic recall
/recall <query> searches past conversation turns across sessions via an embedding index (works on Ollama or LM Studio's /v1/embeddings). Drops fragments of relevant past turns straight into the current context.
Three sub-agents
spawn_agent delegates to a Researcher (web + file + code search, 10 turn cap), GitOps (rebase/squash/push, 8 turn cap), or Code Reviewer (read-only, 5 turn cap). Only the final text comes back — sub-agent chatter doesn't pollute the main context.
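The turn caps and the "only the final text comes back" rule could look roughly like this. The enum, `run_sub_agent`, and the closure-based step are assumptions made for illustration; only the three roles and their caps come from the text.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SubAgent {
    Researcher,
    GitOps,
    CodeReviewer,
}

impl SubAgent {
    /// Turn caps as listed in the paragraph above.
    fn turn_cap(self) -> u32 {
        match self {
            SubAgent::Researcher => 10,
            SubAgent::GitOps => 8,
            SubAgent::CodeReviewer => 5,
        }
    }
}

/// Hypothetical driver: call `step` once per turn until it produces a final
/// answer or the cap is reached. Intermediate turns are discarded, so the
/// caller only ever sees the final text (or nothing).
fn run_sub_agent<F>(agent: SubAgent, mut step: F) -> Option<String>
where
    F: FnMut(u32) -> Option<String>,
{
    for turn in 1..=agent.turn_cap() {
        if let Some(final_text) = step(turn) {
            return Some(final_text);
        }
    }
    None
}
```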
Per-tool permission gating
ReadOnly tools auto-allow, WorkspaceWrite tools auto-allow, DangerFullAccess prompts [y/N] every time (bash, edit_file, git add/commit/push/checkout, cross-org PRs). Telegram default-denies DangerFullAccess (no TTY).
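The gating rule above reduces to a small decision table. A minimal sketch, assuming the only inputs are the tool's permission level and whether a TTY is available for the prompt (`Decision` and `gate` are hypothetical names):

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Permission {
    ReadOnly,
    WorkspaceWrite,
    DangerFullAccess,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Decision {
    Allow,
    /// Ask the user [y/N] before running.
    Prompt,
    Deny,
}

/// Mirrors the rule in the text: the two lower levels auto-allow,
/// DangerFullAccess prompts every time, and without a TTY (Telegram)
/// it is denied because nobody can answer the prompt.
fn gate(permission: Permission, has_tty: bool) -> Decision {
    match (permission, has_tty) {
        (Permission::ReadOnly | Permission::WorkspaceWrite, _) => Decision::Allow,
        (Permission::DangerFullAccess, true) => Decision::Prompt,
        (Permission::DangerFullAccess, false) => Decision::Deny,
    }
}
```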
Hardware
The numbers below describe the comfortable setup. You don't need a GPU — Ollama runs on plain CPU (slower, but viable for a 1b/3b brain). See docs/hardware.md#no-gpu-cpu-only-mode if you don't have one.
| Component | Comfortable minimum | Recommended | Tested on |
|---|---|---|---|
| GPU | 6 GB VRAM (or CPU-only with a smaller brain) | 8 GB VRAM | RTX 3060 Ti 8 GB |
| RAM | 16 GB | 32 GB | 32 GB DDR4 |
| Disk | ~3 GB (brain only) | ~27 GB (brain + fallback + 30b coder) | NVMe SSD |
| OS | Windows 10+, Linux, macOS | Windows 11 / Ubuntu 24.04 / macOS 14+ | Windows 11 Pro |
Full model footprint table, CPU-only recipes, and the 30b-coder-on-8GB-VRAM env recipe: docs/hardware.md.
For the recommended qwen3.6-35b-a3b setup (best quality), see the Recommended models section below: 16 GB VRAM or 32 GB RAM with CPU-MoE offload is the practical tier.
Recommended models
The defaults (qwen3.5:4b brain / qwen3-coder:30b coder) are tuned for broad hardware compatibility — they install in under a minute and work on any 8 GB GPU or modern CPU. Beyond that, extensive testing (most recently the 100-prompt regression sweep on 2026-05-20 — 80% raw / ~98% adjusted, zero true regressions) has shown what works best at each tier:
Brain
| Hardware tier | Recommended brain | Notes |
|---|---|---|
| 8 GB VRAM / 16 GB RAM | qwen3.5:4b (Q8) | Default. Fast, fits everywhere, tool-calling solid. |
| 16 GB VRAM / 32 GB RAM | qwen3.6-35b-a3b | Best overall by a wide margin. MoE: 35 B total / ~3 B active per token, needs CPU-MoE offload. ~24 t/s baseline / ~43 t/s with MTP on RTX 5060 Ti. |
| 24 GB+ VRAM | qwen3.6-35b-a3b (full GPU) | Top quality, full GPU residency. |
qwen3.6-35b-a3b is currently distributed via LM Studio (Unsloth GGUF) rather than packaged on Ollama. Flip the backend with CLAUDETTE_OPENAI_COMPAT=1 — see docs/power-user.md. When multiple quants are on disk, pin one explicitly (CLAUDETTE_MODEL=qwen3.6-35b-a3b@q4_k_xl) — LM Studio picks the smallest match otherwise.
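For illustration, a pinned value such as the one above can be split into a model name and an optional quant suffix at the `@`. This is only a sketch of the naming pattern; `split_model_pin` is a hypothetical helper, and how Claudette or LM Studio actually resolve the string is not described here.

```rust
/// Splits "model@quant" into (model, Some(quant)); a value without '@'
/// (or with an empty suffix) is treated as an unpinned model name.
fn split_model_pin(spec: &str) -> (&str, Option<&str>) {
    match spec.split_once('@') {
        Some((model, quant)) if !quant.is_empty() => (model, Some(quant)),
        Some((model, _)) => (model, None),
        None => (spec, None),
    }
}
```

Note that Ollama-style tags such as qwen3-coder:30b use a colon, so they pass through this helper untouched.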
Codet sidecar coder
When you use generate_code or --forge:
- qwen3.6-35b-a3b: best if the VRAM/RAM budget is there. Same model as the brain means no swap dance between turns.
- qwen3-coder:30b is the current default. Quality coder, available on Ollama, MoE-friendly on 8 GB VRAM with the env recipe.
- qwen3.6-27b (dense): top quality but very tight on 16 GB VRAM even at Q4; comfortable on 24 GB+.
Pin a non-default brain via ~/.claudette/.env (CLAUDETTE_MODEL=...) or /brain <model> at runtime. Pin the coder via CLAUDETTE_CODER_MODEL=....
Quick start (full setup)
# 1a. Default path — Ollama with the 3.5 family (works on 8 GB VRAM).
# 1b. Recommended path — LM Studio with qwen3.6 (best on 16 GB+ VRAM).
# Pull `qwen3.6-35b-a3b` from inside LM Studio, then in ~/.claudette/.env:
# CLAUDETTE_OPENAI_COMPAT=1
# OLLAMA_HOST=http://localhost:1234
# CLAUDETTE_MODEL=qwen3.6-35b-a3b@q4_k_xl
# CLAUDETTE_CODER_MODEL=qwen3.6-35b-a3b@q4_k_xl
# See `docs/power-user.md` for the full LM Studio recipe.
# 2. Install Claudette — pick one.
# Or download an archive from https://github.com/mrdushidush/claudette/releases/latest
# 3. (Optional) Tokens for tools that need them.
# web_search
# github group
# --telegram mode
# 4. Run.
First launch auto-creates ~/.claudette/ and probes http://localhost:11434. Bypass the probe with CLAUDETTE_SKIP_OLLAMA_PROBE=1 for offline sessions.
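A startup check along these lines can be sketched with the standard library alone. The assumptions are loud ones: this sketch only tests that the port accepts a TCP connection (the real probe presumably speaks HTTP), it treats exactly `1` as the skip value, and both function names are hypothetical.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Decide whether to probe, given the value of CLAUDETTE_SKIP_OLLAMA_PROBE
/// (pass `std::env::var("CLAUDETTE_SKIP_OLLAMA_PROBE").ok().as_deref()`).
/// Assumption: only the literal "1" disables the probe.
fn should_probe(skip_var: Option<&str>) -> bool {
    skip_var != Some("1")
}

/// TCP-level reachability check, e.g. for 127.0.0.1:11434.
/// Returns false on refusal or timeout instead of failing startup.
fn backend_reachable(addr: SocketAddr, timeout: Duration) -> bool {
    TcpStream::connect_timeout(&addr, timeout).is_ok()
}
```

A caller would run `backend_reachable` only when `should_probe` returns true, which is what makes offline sessions start without a delay.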
Out of the box: notes, todos, files, time, weather, Wikipedia, code search. Brave / GitHub / Google Calendar / Gmail tools light up when you set the relevant token — full table in docs/configuration.md. Want to see what to actually type? Open docs/show-me.md.
Docs
- docs/show-me.md: start here. Plain-English example prompts (notes, calendar, vision, voice, code)
- docs/quickstart.md: 30-second start, common flows
- docs/configuration.md: every env var, token file fallbacks, recall settings
- docs/power-user.md: LM Studio recipe, brain pinning, forge knobs, context tuning
- docs/hardware.md: VRAM/RAM/disk by preset, CPU-only mode, 30b-on-8GB env recipe
- docs/usage.md: CLI flags, slash commands, Telegram-only commands
- docs/architecture.md: module layout, tool-group contract, Codet sidecar contract
- docs/forge.md: forge-mode pipeline, Submitter contract, models.toml schema, auto-bootstrap
- docs/comparison.md: honest side-by-side vs. opencode / Aider / OpenHands / Cline / Continue
- docs/google_setup.md: Calendar + Gmail OAuth walkthrough
- docs/deploy.md: Pi / VPS / home-server deploy via docker-compose
- editor/vscode/: VS Code extension (REPL/TUI/forge/"ask about selection" commands)
- PRIVACY.md: every place data can leave your machine, and the conditions for each
Storage layout
~/.claudette/
├── notes/ # Markdown notes (ISO-timestamped, optional tags)
├── files/ # Sandboxed scratch dir for write_file/generate_code
├── sessions/ # Auto-saved + named sessions
├── secrets/ # Token files (github.token, telegram.token, brave.token, …)
├── missions/ # Brownfield mission clones
├── models/ # Whisper model (download separately)
├── recall.sqlite # Cross-session semantic-recall index
├── todos.json # Task list
├── models.toml # Optional model-config overlay
├── fallback.jsonl # Auto-fallback event log
├── .env # Persistent env-var overrides
└── CLAUDETTE.MD # Optional user memory (800-char cap)
Nothing outside ~/.claudette/ is written without explicit permission.
Build from source
Tests: 703 passing, 6 ignored (4 POSIX-only hook tests, 2 live-recall smokes that need an LM Studio embedding server). Before committing: cargo fmt --all && cargo clippy --all-targets --no-deps -- -D warnings && cargo test --lib.
Contributing
See CONTRIBUTING.md. Quick version:
- File bugs at https://github.com/mrdushidush/claudette/issues.
- Conventional Commits: feat:, fix:, docs:, refactor:, test:, chore:, style:, ci:.
- By contributing, you agree your work is licensed under Apache 2.0.
Security issues: please use the private advisory flow in SECURITY.md — don't open a public issue.
Be kind — CODE_OF_CONDUCT.md has the short version.
License
Apache License 2.0 — see LICENSE. Use, modify, redistribute commercially or personally. No trademark grant; don't imply endorsement.
Copyright © 2026 mrdushidush.