adk-rust-mcp-speech 0.5.0

MCP server for text-to-speech using Cloud TTS Chirp3-HD

adk-rust-mcp-speech

MCP server for text-to-speech with Chirp3-HD. Part of the ADK Rust MCP toolkit.

Overview

High-quality text-to-speech server using Google Cloud TTS Chirp3-HD voices. Supports 30 voices, 70+ languages, custom pronunciations via IPA/X-SAMPA, and fine-grained control over speaking rate and pitch.

Currently implemented: Google Cloud TTS (Chirp3-HD)

Features

  • Chirp3-HD Voices — 30 high-quality voices with natural prosody
  • 70+ Languages — Automatic language detection with BCP-47 codes
  • Custom Pronunciations — IPA and X-SAMPA phonetic alphabets
  • Speech Control — Adjustable speaking rate (0.25-4.0x) and pitch (-20 to +20 semitones)
  • WAV Output — High-quality uncompressed audio
  • Flexible Output — Return base64 or save to local file
  • Dual API — Works with Gemini API key or Vertex AI ADC

Installation

cargo install adk-rust-mcp-speech

Configuration

# Option 1: Gemini API (recommended for getting started)
export GEMINI_API_KEY=your-api-key

# Option 2: Vertex AI (for production/enterprise)
export PROJECT_ID=your-gcp-project
export LOCATION=us-central1

For Vertex AI, enable the Cloud Text-to-Speech API in the same project you set in PROJECT_ID:

gcloud services enable texttospeech.googleapis.com --project=your-gcp-project
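
The two configuration options can be modeled as a small selection step. The sketch below is illustrative only: the `Backend` type, the `select_backend` function, and the rule that the API key wins when both options are set are assumptions, not the crate's documented behavior.

```rust
/// Which credentials a client or wrapper would use (hypothetical type).
#[derive(Debug, PartialEq)]
enum Backend {
    GeminiApiKey { api_key: String },
    VertexAi { project_id: String, location: String },
}

/// Treat unset, empty, and whitespace-only values the same way.
fn non_empty(value: Option<&str>) -> Option<&str> {
    value.map(str::trim).filter(|s| !s.is_empty())
}

/// Pick a backend from already-read environment values.
/// Assumption: GEMINI_API_KEY takes precedence when both options are set.
fn select_backend(
    gemini_api_key: Option<&str>,
    project_id: Option<&str>,
    location: Option<&str>,
) -> Result<Backend, String> {
    if let Some(key) = non_empty(gemini_api_key) {
        return Ok(Backend::GeminiApiKey { api_key: key.to_string() });
    }
    match (non_empty(project_id), non_empty(location)) {
        (Some(project), Some(loc)) => Ok(Backend::VertexAi {
            project_id: project.to_string(),
            location: loc.to_string(),
        }),
        (Some(_), None) => Err("PROJECT_ID is set but LOCATION is missing".to_string()),
        _ => Err("set GEMINI_API_KEY, or PROJECT_ID and LOCATION".to_string()),
    }
}
```

In a real program the three arguments would come from `std::env::var("GEMINI_API_KEY").ok()` and its siblings, passed through `as_deref()`.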

Tools

speech_synthesize

Convert text to speech using Chirp3-HD voices.

Parameter       Type    Required  Default                   Description
text            string  Yes       -                         Text to synthesize
voice           string  No        en-US-Chirp3-HD-Achernar  Voice name
language_code   string  No        en-US                     BCP-47 language code
speaking_rate   float   No        1.0                       Speed (0.25-4.0)
pitch           float   No        0.0                       Pitch in semitones (-20 to +20)
pronunciations  array   No        -                         Custom pronunciations
output_file     string  No        -                         Save WAV to local path

Pronunciation entry:

Field     Type    Description
word      string  Word to customize
phonetic  string  Phonetic representation
alphabet  string  "ipa" or "x-sampa"
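
A caller may want to check arguments against the documented ranges before sending a request. The following is a minimal client-side sketch; `Pronunciation` and `validate_args` are hypothetical names, and the server presumably performs its own validation, which may differ in detail.

```rust
/// One custom pronunciation, mirroring the documented entry fields.
struct Pronunciation<'a> {
    word: &'a str,
    phonetic: &'a str,
    alphabet: &'a str,
}

/// Client-side pre-check of `speech_synthesize` arguments (hypothetical helper).
fn validate_args(
    text: &str,
    speaking_rate: f64,
    pitch: f64,
    pronunciations: &[Pronunciation<'_>],
) -> Result<(), String> {
    if text.trim().is_empty() {
        return Err("text is required".to_string());
    }
    // Documented ranges; a NaN value fails both range checks.
    if !(0.25..=4.0).contains(&speaking_rate) {
        return Err(format!("speaking_rate {speaking_rate} is outside 0.25-4.0"));
    }
    if !(-20.0..=20.0).contains(&pitch) {
        return Err(format!("pitch {pitch} is outside -20 to +20 semitones"));
    }
    for p in pronunciations {
        if p.word.is_empty() || p.phonetic.is_empty() {
            return Err("pronunciation needs both word and phonetic".to_string());
        }
        if p.alphabet != "ipa" && p.alphabet != "x-sampa" {
            return Err(format!("unknown alphabet '{}'", p.alphabet));
        }
    }
    Ok(())
}
```

Rejecting bad values locally gives a clearer error than a failed round trip, at the cost of duplicating limits that could change in a later release.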

speech_list_voices

List all available Chirp3-HD voices with their supported languages.

Usage Examples

# Stdio (default) — for Claude Desktop, Kiro
adk-rust-mcp-speech

# HTTP — for web apps, ADK agents
adk-rust-mcp-speech --transport http --port 8080

# SSE — for streaming applications
adk-rust-mcp-speech --transport sse --port 8080

Basic synthesis

text: "Hello, welcome to the ADK Rust MCP toolkit."
voice: "en-US-Chirp3-HD-Achernar"
output_file: "greeting.wav"
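
On the wire, an MCP client sends these arguments in a JSON-RPC 2.0 `tools/call` request. The sketch below builds that message for the basic example using only the standard library; `json_escape` and `tools_call_request` are hypothetical helpers, and a real client would first complete the MCP `initialize` handshake and would normally use a JSON library.

```rust
/// Escape a string for embedding inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build a `tools/call` request for `speech_synthesize` (hypothetical helper).
fn tools_call_request(id: u64, text: &str, voice: &str, output_file: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"tools/call","params":{{"name":"speech_synthesize","arguments":{{"text":"{}","voice":"{}","output_file":"{}"}}}}}}"#,
        id,
        json_escape(text),
        json_escape(voice),
        json_escape(output_file)
    )
}
```

With the stdio transport, each message is written to the server's standard input as a single line of JSON.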

Custom pronunciation

text: "The GIF format was created by CompuServe."
pronunciations: [{"word": "GIF", "phonetic": "dʒɪf", "alphabet": "ipa"}]
output_file: "pronunciation_demo.wav"

Multilingual

text: "Bonjour, comment allez-vous aujourd'hui?"
language_code: "fr-FR"
output_file: "french_greeting.wav"
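
When both `voice` and `language_code` are set, it may help to check that they agree. The sketch below assumes voice names follow the `<locale>-Chirp3-HD-<Name>` pattern seen in the default voice `en-US-Chirp3-HD-Achernar`; both function names are hypothetical, and how the server resolves a mismatch is not documented here.

```rust
/// Extract the locale prefix from a Chirp3-HD voice name.
/// Assumption: names look like "<locale>-Chirp3-HD-<Name>".
fn language_of_voice(voice: &str) -> Option<&str> {
    const MARKER: &str = "-Chirp3-HD-";
    let idx = voice.find(MARKER)?;
    let (locale, rest) = voice.split_at(idx);
    // Require a non-empty locale and a non-empty voice label after the marker.
    if locale.is_empty() || rest.len() == MARKER.len() {
        None
    } else {
        Some(locale)
    }
}

/// True when the voice's locale prefix equals the requested language code.
fn voice_matches_language(voice: &str, language_code: &str) -> bool {
    // BCP-47 tags are case-insensitive, so ignore ASCII case.
    language_of_voice(voice).map_or(false, |locale| locale.eq_ignore_ascii_case(language_code))
}
```

For the French example, a voice beginning with `fr-FR-Chirp3-HD-` would match, while the default `en-US` voice would not.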

Output Specs

Format  Sample Rate  Channels  Encoding
WAV     24 kHz       Mono      16-bit PCM
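
These specs imply a fixed data rate of 24,000 samples/s × 1 channel × 2 bytes = 48,000 bytes per second of audio. The sketch below uses that to estimate a clip's duration from its file size. It assumes a canonical 44-byte WAV header, which is not guaranteed for every file, and the names are illustrative.

```rust
const SAMPLE_RATE_HZ: u64 = 24_000;
const CHANNELS: u64 = 1; // mono
const BYTES_PER_SAMPLE: u64 = 2; // 16-bit PCM
/// Assumption: canonical RIFF/WAVE header with no extra chunks.
const WAV_HEADER_BYTES: u64 = 44;

/// Bytes of PCM data produced per second of audio.
fn bytes_per_second() -> u64 {
    SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE
}

/// Estimate clip duration in seconds from the WAV file size in bytes.
fn estimated_duration_secs(file_size_bytes: u64) -> f64 {
    // Files smaller than the header are treated as containing no audio.
    let data_bytes = file_size_bytes.saturating_sub(WAV_HEADER_BYTES);
    data_bytes as f64 / bytes_per_second() as f64
}
```

By the same arithmetic, one minute of speech takes roughly 2.9 MB on disk, and about a third more when returned as base64.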

License

Apache-2.0