adk-rust-mcp-speech
MCP server for text-to-speech with Chirp3-HD. Part of the ADK Rust MCP toolkit.
Overview
High-quality text-to-speech server using Google Cloud TTS Chirp3-HD voices. Supports 30 voices, 70+ languages, custom pronunciations via IPA/X-SAMPA, and fine-grained control over speaking rate and pitch.
Currently implemented: Google Cloud TTS (Chirp3-HD)
Features
- Chirp3-HD Voices — 30 high-quality voices with natural prosody
- 70+ Languages — Automatic language detection with BCP-47 codes
- Custom Pronunciations — IPA and X-SAMPA phonetic alphabets
- Speech Control — Adjustable speaking rate (0.25-4.0x) and pitch (-20 to +20 semitones)
- WAV Output — High-quality uncompressed audio
- Flexible Output — Return base64 or save to local file
- Dual API — Works with Gemini API key or Vertex AI ADC
Installation
Configuration
# Option 1: Gemini API (recommended for getting started)
# Option 2: Vertex AI (for production/enterprise)
For Vertex AI, enable the Cloud TTS API:
Tools
speech_synthesize
Convert text to speech using Chirp3-HD voices.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `text` | string | Yes | — | Text to synthesize |
| `voice` | string | No | `en-US-Chirp3-HD-Achernar` | Voice name |
| `language_code` | string | No | `en-US` | BCP-47 language code |
| `speaking_rate` | float | No | 1.0 | Speed (0.25-4.0) |
| `pitch` | float | No | 0.0 | Pitch in semitones (-20 to +20) |
| `pronunciations` | array | No | — | Custom pronunciations |
| `output_file` | string | No | — | Save WAV to local path |
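A client may want to reject out-of-range values before calling the tool. The sketch below mirrors the documented defaults and ranges in a hypothetical `SynthesizeArgs` struct; the struct, its method names, and the error strings are illustrative assumptions, not part of this crate's API, and the server may apply additional rules of its own. `pronunciations` is left out to keep the sketch short.

```rust
/// Hypothetical client-side mirror of the `speech_synthesize` arguments.
/// Field names follow the parameter table; the type itself is an assumption.
#[derive(Debug, Clone, PartialEq)]
pub struct SynthesizeArgs {
    pub text: String,
    pub voice: String,
    pub language_code: String,
    pub speaking_rate: f64,
    pub pitch: f64,
    pub output_file: Option<String>,
}

impl SynthesizeArgs {
    /// Fills every optional parameter with its documented default.
    pub fn new(text: &str) -> Self {
        Self {
            text: text.to_string(),
            voice: "en-US-Chirp3-HD-Achernar".to_string(),
            language_code: "en-US".to_string(),
            speaking_rate: 1.0,
            pitch: 0.0,
            output_file: None,
        }
    }

    /// Checks only the documented numeric ranges.
    /// NaN fails both checks because range containment is false for NaN.
    pub fn validate(&self) -> Result<(), String> {
        if !(0.25..=4.0).contains(&self.speaking_rate) {
            return Err(format!(
                "speaking_rate {} is outside 0.25-4.0",
                self.speaking_rate
            ));
        }
        if !(-20.0..=20.0).contains(&self.pitch) {
            return Err(format!("pitch {} is outside -20 to +20", self.pitch));
        }
        Ok(())
    }
}
```

Calling `SynthesizeArgs::new("Hello").validate()` succeeds with the defaults, while setting `speaking_rate` to `4.5` yields an `Err` describing the violated range.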
Pronunciation entry:
| Field | Type | Description |
|---|---|---|
| `word` | string | Word to customize |
| `phonetic` | string | Phonetic representation |
| `alphabet` | string | "ipa" or "x-sampa" |
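X-SAMPA uses characters such as `"` and `\` that must be escaped when an entry is written as JSON. The following is a minimal sketch of serializing one entry by hand with the standard library only; the `Pronunciation` and `Alphabet` types and the `escape_json` helper are hypothetical names introduced for illustration, and a real client would more likely rely on a JSON library.

```rust
/// The two alphabets listed in the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Alphabet {
    Ipa,
    XSampa,
}

impl Alphabet {
    /// Returns the string value documented for the `alphabet` field.
    pub fn as_str(self) -> &'static str {
        match self {
            Alphabet::Ipa => "ipa",
            Alphabet::XSampa => "x-sampa",
        }
    }
}

/// Hypothetical client-side representation of one pronunciation entry.
pub struct Pronunciation {
    pub word: String,
    pub phonetic: String,
    pub alphabet: Alphabet,
}

/// Escapes quotes, backslashes, and control characters for a JSON string.
fn escape_json(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl Pronunciation {
    /// Serializes the entry as a JSON object with the three documented fields.
    pub fn to_json(&self) -> String {
        format!(
            "{{\"word\": \"{}\", \"phonetic\": \"{}\", \"alphabet\": \"{}\"}}",
            escape_json(&self.word),
            escape_json(&self.phonetic),
            self.alphabet.as_str()
        )
    }
}
```

IPA symbols such as `dʒɪf` pass through unchanged because JSON strings may contain UTF-8 text directly; only the X-SAMPA punctuation needs escaping.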
speech_list_voices
List all available Chirp3-HD voices with their supported languages.
Usage Examples
# Stdio (default) — for Claude Desktop, Kiro
# HTTP — for web apps, ADK agents
# SSE — for streaming applications
Basic synthesis
text: "Hello, welcome to the ADK Rust MCP toolkit."
voice: "en-US-Chirp3-HD-Achernar"
output_file: "greeting.wav"
Custom pronunciation
text: "The GIF format was created by CompuServe."
pronunciations: [{"word": "GIF", "phonetic": "dʒɪf", "alphabet": "ipa"}]
output_file: "pronunciation_demo.wav"
Multilingual
text: "Bonjour, comment allez-vous aujourd'hui?"
language_code: "fr-FR"
output_file: "french_greeting.wav"
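The examples above list tool arguments as key/value pairs. Over MCP, a client typically sends them inside a JSON-RPC `tools/call` request. The sketch below builds such a request for the basic synthesis example; the helper name `tools_call_request` and the constant are hypothetical, and how each transport (stdio, HTTP, SSE) frames the message is not covered here.

```rust
/// Wraps pre-serialized tool arguments in an MCP `tools/call` JSON-RPC request.
/// `tool` is inserted verbatim, so it must not contain characters that
/// require JSON escaping; `arguments_json` must already be a valid JSON object.
pub fn tools_call_request(id: u64, tool: &str, arguments_json: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"tools/call","params":{{"name":"{}","arguments":{}}}}}"#,
        id, tool, arguments_json
    )
}

/// Arguments from the "Basic synthesis" example, written as a JSON object.
pub const BASIC_SYNTHESIS_ARGS: &str = r#"{"text":"Hello, welcome to the ADK Rust MCP toolkit.","voice":"en-US-Chirp3-HD-Achernar","output_file":"greeting.wav"}"#;
```

`tools_call_request(1, "speech_synthesize", BASIC_SYNTHESIS_ARGS)` produces the request for the first example. Assuming `speech_list_voices` takes no arguments, passing `"{}"` as `arguments_json` would cover that tool as well.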
Output Specs
| Format | Sample Rate | Channels | Encoding |
|---|---|---|---|
| WAV | 24kHz | Mono | 16-bit PCM |
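These specs fix the data rate: 24,000 samples per second × 1 channel × 2 bytes per sample = 48,000 bytes per second of audio. The sketch below encodes that arithmetic and builds a canonical 44-byte RIFF/WAVE header for the same format. It assumes the plain PCM header layout; the server's actual files may carry extra chunks, so treat the header builder as illustrative rather than a description of the server's output.

```rust
pub const SAMPLE_RATE_HZ: u32 = 24_000;
pub const CHANNELS: u16 = 1;
pub const BITS_PER_SAMPLE: u16 = 16;

/// Bytes of PCM data per second of audio (48,000 for the specs above).
pub const fn bytes_per_second() -> u32 {
    SAMPLE_RATE_HZ * CHANNELS as u32 * (BITS_PER_SAMPLE as u32 / 8)
}

/// Duration in seconds of `pcm_bytes` bytes of sample data (header excluded).
pub fn duration_secs(pcm_bytes: u64) -> f64 {
    pcm_bytes as f64 / bytes_per_second() as f64
}

/// Builds a canonical 44-byte RIFF/WAVE header for `data_len` bytes of PCM.
/// Assumes `data_len` is small enough that `36 + data_len` fits in a u32.
pub fn wav_header(data_len: u32) -> [u8; 44] {
    let block_align = CHANNELS * (BITS_PER_SAMPLE / 8);
    let mut h = [0u8; 44];
    h[0..4].copy_from_slice(b"RIFF");
    h[4..8].copy_from_slice(&(36 + data_len).to_le_bytes()); // size after this field
    h[8..12].copy_from_slice(b"WAVE");
    h[12..16].copy_from_slice(b"fmt ");
    h[16..20].copy_from_slice(&16u32.to_le_bytes()); // size of the PCM fmt chunk
    h[20..22].copy_from_slice(&1u16.to_le_bytes()); // format tag 1 = integer PCM
    h[22..24].copy_from_slice(&CHANNELS.to_le_bytes());
    h[24..28].copy_from_slice(&SAMPLE_RATE_HZ.to_le_bytes());
    h[28..32].copy_from_slice(&bytes_per_second().to_le_bytes());
    h[32..34].copy_from_slice(&block_align.to_le_bytes());
    h[34..36].copy_from_slice(&BITS_PER_SAMPLE.to_le_bytes());
    h[36..40].copy_from_slice(b"data");
    h[40..44].copy_from_slice(&data_len.to_le_bytes());
    h
}
```

For instance, 96,000 bytes of sample data correspond to two seconds of audio, so a rough duration estimate for a saved file is its size minus the header, divided by 48,000.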
License
Apache-2.0