voice-echo

Voice interface for Claude Code over the phone. Call in and talk to Claude, or trigger outbound calls from n8n / automation workflows.

Built in Rust. Uses Twilio for telephony, Groq Whisper for speech-to-text, Inworld for text-to-speech, and the Claude Code CLI for reasoning.

Architecture

Voice Pipeline

                         ┌─────────────────────────────────────┐
                         │          voice-echo (axum)          │
                         │                                     │
  Phone ◄──► Twilio ◄──►│  WebSocket ◄──► VAD ──► STT (Groq)  │
                         │                         │           │
                         │                    Claude CLI        │
                         │                         │           │
                         │                    TTS (Inworld)     │
                         │                     mulaw 8kHz       │
                         └─────────────────────────────────────┘
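
Twilio media streams carry 8 kHz mu-law audio, which is why the TTS output in the diagram is labeled mulaw 8kHz. As a rough illustration of that last step, here is a minimal sketch of the standard G.711 mu-law encoder for a single 16-bit PCM sample. The function name is hypothetical and this is not taken from voice-echo's source; resampling to 8 kHz is assumed to have happened already.

```rust
/// Bias added before the logarithmic segment search (G.711 mu-law).
const BIAS: i32 = 0x84;
/// Largest magnitude that still fits after the bias is added.
const CLIP: i32 = 32635;

/// Encode one 16-bit linear PCM sample as an 8-bit mu-law byte.
/// Hypothetical helper for illustration only.
pub fn linear_to_mulaw(sample: i16) -> u8 {
    let mut s = sample as i32;
    // Keep the sign in bit 7 and work with the magnitude.
    let sign = if s < 0 {
        s = -s;
        0x80
    } else {
        0
    };
    if s > CLIP {
        s = CLIP;
    }
    s += BIAS;

    // The exponent is the segment number: exponent e means the highest
    // set bit of the biased magnitude is bit e + 7.
    let mut exponent = 7;
    let mut mask = 0x4000;
    while exponent > 0 && (s & mask) == 0 {
        exponent -= 1;
        mask >>= 1;
    }
    // Four mantissa bits just below the leading bit of the segment.
    let mantissa = (s >> (exponent + 3)) & 0x0F;

    // mu-law bytes are transmitted bit-inverted.
    (!(sign | (exponent << 4) | mantissa) & 0xFF) as u8
}
```

Silence (a sample of 0) encodes to 0xFF, and full-scale positive and negative samples encode to 0x80 and 0x00 respectively.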

AI-Initiated Outbound Calls (n8n Bridge)

  ┌──────────────┐    trigger     ┌──────────────────┐
  │  Any n8n      │──────────────►│   Orchestrator    │
  │  workflow     │               │   reads registry, │
  │  (alerts,     │               │   routes to       │
  │   cron,       │               │   target module   │
  │   events...)  │               └────────┬─────────┘
  └──────────────┘                        │
                                          ▼
                                 ┌──────────────────┐
                                 │   call-human      │
                                 │   builds request, │
                                 │   passes context  │
                                 └────────┬─────────┘
                                          │
                                          │ POST /api/call
                                          │ { to, context }
                                          ▼
                                 ┌──────────────────┐
                                 │  voice-echo       │
                                 │  stores context   │──► Twilio ──► Phone rings
                                 │  per call_sid     │
                                 └────────┬─────────┘
                                          │
                                          │ recipient picks up
                                          ▼
                                 ┌──────────────────┐
                                 │  Claude CLI       │
                                 │  first prompt     │
                                 │  includes context │
                                 │  "I'm calling     │
                                 │   because..."     │
                                 └──────────────────┘

Full System

                     ┌──────────┐
  ┌─────────┐        │   n8n    │        ┌───────────────┐
  │ Triggers │──────►│ (Docker) │──────►│ voice-echo    │──► Claude CLI
  │ (cron,   │       │          │  API   │ (Rust, axum)  │
  │  webhook,│       │  orchest.│        └───────┬───────┘
  │  alerts, │       │  call-   │                │
  │  events) │       │  human   │                ▼
  └─────────┘        └──────────┘        ┌───────────────┐
                                         │    Twilio      │◄──► Phone
                                         └───────────────┘

Prerequisites

Rust (1.80+)
Claude Code CLI installed and authenticated
Twilio account with a phone number
Groq API key (free tier works)
Inworld API key (sign up at platform.inworld.ai)
A server with a public HTTPS URL (for Twilio webhooks)
nginx (recommended, for TLS termination and WebSocket proxying)

Installation

1. Clone and build

git clone https://github.com/dnacenta/voice-echo.git
cd voice-echo
cargo build --release

2. Run the setup wizard

./target/release/voice-echo --setup

The wizard walks you through the entire setup:

Checks that rustc, claude, and openssl are available
Prompts for Twilio, Groq, and Inworld credentials (masked input)
Asks for your server's external URL
Generates an API token for the outbound call endpoint
Writes ~/.voice-echo/config.toml
Optionally copies the binary to /usr/local/bin/, installs a systemd service, and generates an nginx reverse proxy config

If you skip the optional steps during the wizard, you can always set them up manually using the templates in deploy/.

3. Twilio webhook

In the Twilio Console, set your phone number's voice webhook to:

POST https://your-server.example.com/twilio/voice

4. Start

voice-echo

Or if you installed the systemd service:

sudo systemctl enable --now voice-echo

Manual configuration

If you prefer to skip the wizard and configure by hand:

mkdir -p ~/.voice-echo
cp config.example.toml ~/.voice-echo/config.toml
cp .env.example ~/.voice-echo/.env
chmod 600 ~/.voice-echo/.env

Edit .env with your API keys, and config.toml for your Twilio phone number and other settings. Secrets are loaded from .env, so leave them empty in the TOML. See deploy/nginx.conf and deploy/voice-echo.service for server setup templates.

You can override the config file path with ECHO_CONFIG=/path/to/config.toml.

Configuration Reference

config.toml

Section	Field	Default	Description
`server`	`host`	--	Bind address (e.g. `0.0.0.0`)
`server`	`port`	--	Bind port (e.g. `8443`)
`server`	`external_url`	--	Public HTTPS URL (overridden by `SERVER_EXTERNAL_URL` env var)
`twilio`	`account_sid`	--	Twilio Account SID (overridden by env var)
`twilio`	`auth_token`	--	Twilio Auth Token (overridden by env var)
`twilio`	`phone_number`	--	Your Twilio phone number (E.164)
`groq`	`api_key`	--	Groq API key (overridden by env var)
`groq`	`model`	`whisper-large-v3-turbo`	Whisper model to use
`inworld`	`api_key`	--	Inworld API key (overridden by env var)
`inworld`	`voice_id`	`Olivia`	Inworld voice name
`inworld`	`model`	`inworld-tts-1.5-max`	Inworld TTS model
`claude`	`session_timeout_secs`	`300`	Conversation session timeout
`claude`	`greeting`	`Hello, this is Echo`	Initial TTS greeting when a call connects
`claude`	`dangerously_skip_permissions`	`false`	Allow Claude CLI to run tools without prompting (see Customizing Claude)
`api`	`token`	--	Bearer token for `/api/*` (overridden by env var)
`vad`	`silence_threshold_ms`	`1500`	Silence duration before utterance ends
`vad`	`energy_threshold`	`50`	Minimum RMS energy to detect speech
`hold_music`	`file`	--	Optional path to a WAV file for hold music
`hold_music`	`volume`	`0.3`	Playback volume (0.0 to 1.0)
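
To make the two vad settings more concrete, here is a minimal sketch of energy-based end-of-utterance detection. It assumes 16-bit PCM samples and fixed-length frames (20 ms in the usage note below); the SilenceDetector type and its method names are hypothetical and are not voice-echo's actual implementation.

```rust
/// Root-mean-square energy of one frame of 16-bit PCM samples.
pub fn rms_energy(frame: &[i16]) -> f64 {
    if frame.is_empty() {
        return 0.0;
    }
    let sum_sq: f64 = frame.iter().map(|&s| (s as f64) * (s as f64)).sum();
    (sum_sq / frame.len() as f64).sqrt()
}

/// Hypothetical detector driven by `vad.energy_threshold` and
/// `vad.silence_threshold_ms`.
pub struct SilenceDetector {
    energy_threshold: f64,
    silence_threshold_ms: u32,
    frame_ms: u32,
    in_speech: bool,
    silent_ms: u32,
}

impl SilenceDetector {
    pub fn new(energy_threshold: f64, silence_threshold_ms: u32, frame_ms: u32) -> Self {
        Self {
            energy_threshold,
            silence_threshold_ms,
            frame_ms,
            in_speech: false,
            silent_ms: 0,
        }
    }

    /// Feed one frame. Returns true exactly when an utterance has just ended.
    pub fn push_frame(&mut self, frame: &[i16]) -> bool {
        if rms_energy(frame) >= self.energy_threshold {
            // Speech detected: (re)start the utterance and reset the silence timer.
            self.in_speech = true;
            self.silent_ms = 0;
            return false;
        }
        if !self.in_speech {
            // Leading silence: there is no utterance to end yet.
            return false;
        }
        self.silent_ms += self.frame_ms;
        if self.silent_ms >= self.silence_threshold_ms {
            self.in_speech = false;
            self.silent_ms = 0;
            return true;
        }
        false
    }
}
```

With the defaults (energy_threshold = 50, silence_threshold_ms = 1500) and 20 ms frames, this sketch reports the end of an utterance after 75 consecutive quiet frames that follow speech. Lowering energy_threshold makes quieter audio count as speech; lowering silence_threshold_ms makes Claude respond sooner but cuts off speakers who pause.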

Environment variables

Secrets can be set via environment variables (recommended) instead of config.toml. The table below also lists the non-secret variables voice-echo reads:

Variable	Overrides
`TWILIO_ACCOUNT_SID`	`twilio.account_sid`
`TWILIO_AUTH_TOKEN`	`twilio.auth_token`
`GROQ_API_KEY`	`groq.api_key`
`INWORLD_API_KEY`	`inworld.api_key`
`ECHO_API_TOKEN`	`api.token`
`SERVER_EXTERNAL_URL`	`server.external_url`
`ECHO_CONFIG`	Config file path
`RUST_LOG`	Log level filter (e.g. `voice_echo=debug,tower_http=debug`)
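
The override rule in the table can be sketched as follows. This is an illustration only: the function names are hypothetical, and treating an empty value as unset is an assumption made here to match the advice to leave secrets empty in the TOML.

```rust
use std::env;

/// Resolve one setting: a non-empty value from `lookup` (the environment)
/// wins, otherwise fall back to a non-empty value from config.toml.
/// Hypothetical helper; `lookup` is injectable so the rule is easy to test.
pub fn resolve_with<F>(lookup: F, env_name: &str, config_value: Option<&str>) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    lookup(env_name)
        .filter(|v| !v.trim().is_empty())
        .or_else(|| {
            config_value
                .filter(|v| !v.trim().is_empty())
                .map(str::to_owned)
        })
}

/// Same rule, reading from the real process environment.
pub fn resolve_setting(env_name: &str, config_value: Option<&str>) -> Option<String> {
    resolve_with(|name| env::var(name).ok(), env_name, config_value)
}
```

For example, resolve_setting("GROQ_API_KEY", Some("")) returns the environment value when GROQ_API_KEY is set and None otherwise, which a caller could turn into a startup error.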

Customizing Claude

voice-echo spawns the claude CLI for each conversation. Claude Code reads a CLAUDE.md file from the working directory to set its behavior — this is how you turn generic Claude into your personalized voice assistant.

Create a CLAUDE.md in the directory where voice-echo runs (typically the project root or the home directory of the service user). This file should contain instructions tailored for a voice context:

Persona: Define who Claude is on the phone — name, tone, personality.
Voice-first rules: Tell Claude to never use markdown, bullet points, numbered lists, or any text formatting. Everything it outputs will be spoken aloud via TTS.
Brevity: Phone calls are not lectures. Two to four sentences per response is usually enough.
Language: If you want multilingual support, specify which languages and when to switch.
Capabilities: Define what Claude can and can't do — run commands, access APIs, check services, etc.
Boundaries: Set security rules, topics to avoid, or information to never disclose.

Without a CLAUDE.md, Claude will behave as its default self — functional but generic.

Permissions

Claude Code normally prompts for permission before running tools (shell commands, file edits, etc.). On a phone call there's no terminal to approve prompts, so you have two options:

dangerously_skip_permissions = true in config.toml — Claude runs all tools without asking. Powerful but risky. Only use this if you trust the instructions in your CLAUDE.md and have locked down what Claude can access.
Pre-approve tools via Claude Code's settings.json or allowedTools configuration. This gives you granular control over which tools are auto-approved without blanket permission.

See the Claude Code documentation for details on permission configuration.

Server Setup

TLS certificates

Twilio requires HTTPS for webhooks. If you're using nginx (recommended), get a free certificate with certbot:

sudo apt install certbot python3-certbot-nginx
sudo certbot --nginx -d your-server.example.com

Certificates auto-renew via a systemd timer. The nginx template in deploy/nginx.conf is already configured for the Let's Encrypt certificate paths.

systemd

The included service file (deploy/voice-echo.service) runs as root for simplicity. For production, consider creating a dedicated user:

sudo useradd -r -s /usr/sbin/nologin voice-echo

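Then set User=voice-echo in the service file and make sure that user can read the config directory and run the claude CLI. A system user created with useradd -r has no home directory by default, so ~/.voice-echo/ may not exist for it; either create one or point ECHO_CONFIG at an explicit path.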

Usage

Call in

Just call your Twilio number. You'll hear the configured greeting, then talk normally.

Trigger an outbound call

curl -X POST https://your-server.example.com/api/call \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "to": "+34612345678",
    "context": "Server CPU at 95% for the last 10 minutes. Top processes: n8n 45%, claude 30%."
  }'

The recipient picks up and Claude already knows why it called — context is injected into the first prompt.
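
One way to picture the context hand-off: the server keeps the context keyed by the call SID that Twilio assigns to the outbound call, then consumes it when the first utterance arrives. The sketch below is an assumption about the shape of that logic, not voice-echo's code; the type, the function names, and the prompt wording are all hypothetical, and a real server would wrap the map in a Mutex or similar for concurrent calls.

```rust
use std::collections::HashMap;

/// Hypothetical store for outbound-call context, keyed by Twilio call SID.
#[derive(Default)]
pub struct CallContexts {
    by_call_sid: HashMap<String, String>,
}

impl CallContexts {
    /// Remember why a call was placed, before the phone is answered.
    pub fn store(&mut self, call_sid: &str, context: &str) {
        self.by_call_sid
            .insert(call_sid.to_owned(), context.to_owned());
    }

    /// Remove and return the context so it is injected only once.
    pub fn take(&mut self, call_sid: &str) -> Option<String> {
        self.by_call_sid.remove(call_sid)
    }
}

/// Build the first prompt for Claude. The wording is illustrative only.
pub fn first_prompt(context: Option<&str>, transcript: &str) -> String {
    match context {
        Some(ctx) => format!("[You placed this call. Reason: {ctx}]\n{transcript}"),
        None => transcript.to_owned(),
    }
}
```

Inbound calls have no stored context, so take returns None and the first prompt is just the caller's transcript.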

`POST /api/call`

Requires an Authorization: Bearer <token> header.

Field	Type	Required	Description
`to`	string	yes	Phone number in E.164 format (e.g. `+34612345678`)
`context`	string	no	Injected into Claude's first prompt so it knows why it's calling
`message`	string	no	Twilio `<Say>` greeting before the stream starts (usually not needed since Claude handles the greeting via TTS)
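
If you call this endpoint from your own code, it can help to check the request before sending it. The sketch below shows the two checks implied by the table and the header requirement: a bearer token and an E.164 number (a leading +, then up to 15 digits, not starting with 0). The function names and error strings are hypothetical and do not describe how voice-echo validates requests internally.

```rust
/// True if `number` looks like E.164: '+', then 2 to 15 digits,
/// the first of which is not 0.
pub fn is_e164(number: &str) -> bool {
    let Some(digits) = number.strip_prefix('+') else {
        return false;
    };
    let len_ok = (2..=15).contains(&digits.len());
    let all_digits = digits.bytes().all(|b| b.is_ascii_digit());
    len_ok && all_digits && !digits.starts_with('0')
}

/// Extract the token from an `Authorization` header value.
pub fn bearer_token(header_value: &str) -> Option<&str> {
    header_value
        .strip_prefix("Bearer ")
        .map(str::trim)
        .filter(|t| !t.is_empty())
}

/// Hypothetical pre-flight check for a `POST /api/call` request.
/// Note: `!=` is not a constant-time comparison; a server-side check
/// should use one to avoid leaking token contents through timing.
pub fn check_call_request(
    auth_header: Option<&str>,
    expected_token: &str,
    to: &str,
) -> Result<(), &'static str> {
    let token = auth_header
        .and_then(bearer_token)
        .ok_or("missing bearer token")?;
    if token != expected_token {
        return Err("invalid token");
    }
    if !is_e164(to) {
        return Err("`to` is not E.164");
    }
    Ok(())
}
```

The number from the curl example, +34612345678, passes is_e164, while 0034612345678 or a number containing spaces does not.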

n8n Bridge

voice-echo integrates with n8n through a bridge architecture:

Orchestrator -- central webhook that routes triggers to registered modules
Modules -- individual workflows managed via a JSON registry
call-human -- module that triggers outbound calls with context

Trigger a call from any n8n workflow via the orchestrator:

curl -X POST http://localhost:5678/webhook/orchestrator \
  -H "Content-Type: application/json" \
  -H "X-Bridge-Secret: YOUR_BRIDGE_SECRET" \
  -d '{
    "action": "trigger",
    "module": "call-human",
    "data": {
      "reason": "Server CPU critical",
      "context": "CPU at 95% for 10 minutes. Load average 12.5.",
      "urgency": "high"
    }
  }'

The orchestrator reads the module registry and forwards the payload to the call-human webhook, which calls the voice-echo API with the context. When the recipient picks up, Claude already knows what is happening.

Any n8n workflow can trigger calls by routing through the orchestrator. See specs/n8n-bridge-spec.md for the full specification.

Costs

Service	Free tier	Paid
Twilio	Trial credit (~$15)	~$1.15/mo number + per-minute
Groq	Free (rate-limited)	Usage-based
Inworld TTS	Free tier available	~$5/1M chars
Claude Code	Included with Max plan	Or API usage

For personal use with a few calls a day, the running cost is minimal beyond the Twilio number.

Troubleshooting

Twilio returns a 502 or "connection refused"
Twilio can't reach your server. Verify that nginx is running, your DNS points to the server, and the TLS certificate is valid. Test with curl -I https://your-server.example.com/health.

WebSocket closes immediately
Check that nginx has WebSocket proxying enabled (the Upgrade and Connection headers in deploy/nginx.conf). Also check proxy_read_timeout, since Twilio media streams are long-lived.

"Failed to load config" on startup
The config file is missing or malformed. Run voice-echo --setup to generate it, or manually copy config.example.toml to ~/.voice-echo/config.toml.

Claude doesn't respond or times out
Make sure the claude CLI is installed, in PATH, and authenticated. Run claude --version and claude "hello" manually to verify. If running as a systemd service, ensure the service user's PATH includes the Claude binary.

No audio / silence after speaking
The VAD energy threshold may be too high for your microphone or phone quality. Lower vad.energy_threshold (try 30 or 20). Run with RUST_LOG=voice_echo=debug to see VAD activity in the logs.

TTS sounds robotic or uses the wrong voice
Verify that your inworld.voice_id is valid. Preview voices at the Inworld TTS Playground. You can also create custom voices in Inworld Studio.

Contributing

See CONTRIBUTING.md for branch naming, commit conventions, and workflow.

License

AGPL-3.0

Acknowledgments

Inspired by NetworkChuck's claude-phone. Rewritten from scratch in Rust with a different architecture -- no intermediate Node.js server, direct WebSocket pipeline, energy-based VAD, and an outbound call API for automation.

voice-echo 0.8.0