elevenlabs_stt

A type-safe, async Rust client for the ElevenLabs Speech-to-Text API. Transcribe audio and video to text with a simple, ergonomic API.

Features

Type-safe & Async: Built with Rust's type system and async/await support
Builder Pattern: Intuitive, chainable API for configuring STT requests
Model Support: Full support for ElevenLabs models (models::elevenlabs_models::*)
Customizable: ElevenLabs STT APIs, custom base URLs, and enterprise support
Tokio Ready: Works seamlessly with the Tokio runtime
Audio & Video: Works with audio and video files up to 3.0 GB
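
Since the feature list mentions a 3.0 GB ceiling, it can be useful to check a file's size before reading it into memory. The following is a minimal sketch using only the standard library. `MAX_UPLOAD_BYTES`, `fits_upload_limit`, and `read_if_within_limit` are hypothetical names that are not part of this crate, and reading "3.0 GB" as decimal gigabytes is an assumption.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Upload ceiling from the feature list (3.0 GB), interpreted here as
/// decimal gigabytes. That interpretation is an assumption.
const MAX_UPLOAD_BYTES: u64 = 3_000_000_000;

/// Returns true when a payload of `len` bytes fits under the ceiling.
fn fits_upload_limit(len: u64) -> bool {
    len <= MAX_UPLOAD_BYTES
}

/// Hypothetical helper: checks the size on disk first, then reads the file.
/// Returns an `InvalidInput` error for oversized files.
fn read_if_within_limit(path: &Path) -> io::Result<Vec<u8>> {
    let len = fs::metadata(path)?.len();
    if !fits_upload_limit(len) {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            format!(
                "{} is {} bytes, above the {} byte limit",
                path.display(),
                len,
                MAX_UPLOAD_BYTES
            ),
        ));
    }
    fs::read(path)
}
```

The returned bytes could then be handed to `.speech_to_text(...)` in place of the plain `std::fs::read` call used in the examples.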

See Also:

This project is part of an effort to implement all ElevenLabs APIs in Rust.

ElevenLabs TTS: ElevenLabs Text-to-Speech API. ✅
ElevenLabs STT: ElevenLabs Speech-to-Text API. ✅
ElevenLabs TTD: ElevenLabs Text-to-Dialogue API. ✅
ElevenLabs TTV: ElevenLabs Text-to-Voice API. ✅
ElevenLabs TTM: ElevenLabs Text-to-Music API. ✅
ElevenLabs SFX: ElevenLabs Sound Effects API. ✅
ElevenLabs VC: ElevenLabs Voice Changer API. ✅
ElevenLabs AUI: ElevenLabs Audio Isolation API. ⏳
ElevenLabs DUB: ElevenLabs Dubbing API. ⏳

Installation

Add this to your Cargo.toml:

[dependencies]
elevenlabs_stt = "0.0.5"

Quick Start

use elevenlabs_stt::{ElevenLabsSTTClient, STTResponse};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ElevenLabsSTTClient::new("your-api-key");

    let file_path = "inputs/speech.mp3";
    let file_content = std::fs::read(file_path)?;

    // Send the file bytes and wait for the transcription result.
    let stt_response: STTResponse = client.speech_to_text(file_content).execute().await?;

    println!("Results: {:?}", stt_response);
    Ok(())
}

Examples

Basic Usage

use elevenlabs_stt::{ElevenLabsSTTClient, STTResponse};
use std::env;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let api_key =
        env::var("ELEVENLABS_API_KEY").expect("Please set ELEVENLABS_API_KEY environment variable");

    let client = ElevenLabsSTTClient::new(api_key);

    let file_path = "inputs/speech.mp3";
    let file_content = std::fs::read(file_path)?;

    let stt_response: STTResponse = client.speech_to_text(file_content).execute().await?;
    println!("Results: {:?}", stt_response);
    Ok(())
}

Advanced Configuration

use elevenlabs_stt::{ElevenLabsSTTClient, STTResponse, models};
use std::env;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let api_key =
        env::var("ELEVENLABS_API_KEY").expect("Please set ELEVENLABS_API_KEY environment variable");

    let client = ElevenLabsSTTClient::new(api_key);

    let file_path = "inputs/speech.mp3";
    let file_content = std::fs::read(file_path)?;

    // Chain optional settings on the request builder before executing it.
    let stt_response: STTResponse = client
        .speech_to_text(file_content)
        .model(models::elevanlabs_models::SCRIBE_V1)
        .language_code("en")
        .tag_audio_events(true)
        .timestamps_granularity("word")
        .diarize(true)
        .diarization_threshold(0.22)
        .webhook(false)
        .temperature(0.2)
        .seed(4000)
        .use_multi_channel(false)
        .execute()
        .await?;

    println!("Results: {:?}", stt_response);

    Ok(())
}

Running Examples

# Set your API key
export ELEVENLABS_API_KEY=your_api_key_here

# Run the basic example
cargo run --example basic_stt

# Run the advanced example
cargo run --example advanced_stt

API Overview

Method	Description
`ElevenLabsSTTClient::new(String)`	Create client instance (required)*
`.speech_to_text(Option<Vec<u8>>)`	Build an STT request from file bytes or a `cloud_storage_url` (required)*
`.model(String)`	Select model (optional)
`.language_code(String)`	Language code of the audio (e.g. "en"); a hint for transcription, not a translation target (optional)
`.tag_audio_events(bool)`	Tag audio events like (laughter), (footsteps), etc. (optional)
`.num_speakers(u32)`	Maximum number of speakers in the uploaded file. (optional)
`.timestamps_granularity(String)`	Allowed values: none, word, character. Defaults to word. (optional)
`.diarize(bool)`	Whether to annotate which speaker is talking in the uploaded file. (optional)
`.diarization_threshold(f32)`	Can only be set when `diarize` is true and `num_speakers` is not set. (optional)
`.cloud_storage_url(String)`	URL of the file to transcribe; if omitted, you must provide the file bytes instead. (optional)
`.webhook(bool)`	Send the transcription result to configured speech-to-text webhooks. (optional)
`.webhook_id(String)`	Optional specific webhook ID to send the transcription result to. (optional)
`.temperature(f32)`	Controls the randomness of the transcription output, between 0.0 and 2.0 (optional)
`.seed(u32)`	Seed for best-effort deterministic sampling (optional)
`.use_multi_channel(bool)`	Whether the audio file contains multiple channels (optional)
`.webhook_metadata(String)`	Optional metadata to be included in the webhook response (optional)
`.execute()`	Run request → transcribe file (required)*
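
Several rows in the table carry constraints (allowed granularity values, the temperature range, and when `diarization_threshold` may be set). As an illustration, these rules can be checked locally before a request is built. The sketch below is standalone: `SttOptions` and `validate` are hypothetical and not part of this crate, and whether the crate or the API performs equivalent checks is not confirmed here.

```rust
/// Hypothetical mirror of a few request options from the table above.
/// This struct is NOT part of the crate; it exists only for illustration.
#[derive(Debug, Clone, PartialEq)]
struct SttOptions {
    timestamps_granularity: String,
    diarize: bool,
    num_speakers: Option<u32>,
    diarization_threshold: Option<f32>,
    temperature: Option<f32>,
}

impl Default for SttOptions {
    fn default() -> Self {
        Self {
            // "word" is the default documented in the table.
            timestamps_granularity: "word".to_string(),
            diarize: false,
            num_speakers: None,
            diarization_threshold: None,
            temperature: None,
        }
    }
}

/// Checks the constraints stated in the table and returns a message
/// describing the first violation found.
fn validate(opts: &SttOptions) -> Result<(), String> {
    let allowed = ["none", "word", "character"];
    if !allowed.contains(&opts.timestamps_granularity.as_str()) {
        return Err(format!(
            "timestamps_granularity must be none, word, or character, got {:?}",
            opts.timestamps_granularity
        ));
    }
    if let Some(t) = opts.temperature {
        // NaN fails this range check as well.
        if !(0.0..=2.0).contains(&t) {
            return Err(format!("temperature must be between 0.0 and 2.0, got {}", t));
        }
    }
    if opts.diarization_threshold.is_some() && (!opts.diarize || opts.num_speakers.is_some()) {
        return Err(
            "diarization_threshold requires diarize = true and num_speakers unset".to_string(),
        );
    }
    Ok(())
}
```

Running such a check before `.execute()` surfaces configuration mistakes without spending a network round trip.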

Error Handling

The crate uses standard Rust error handling patterns. All async methods return Result types:

match client.speech_to_text(file).execute().await {
    Ok(result) => println!("Transcribed text from file: {}", result.text),
    Err(e) => eprintln!("STT transcription failed: {}", e),
}
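
The examples above call `expect` on the `ELEVENLABS_API_KEY` lookup, which panics when the variable is missing. Below is a minimal sketch of the same step expressed with `Result`, so it composes with `?` in a `main` that returns `Box<dyn std::error::Error>`. `SetupError`, `api_key_from`, and `api_key_from_env` are hypothetical names introduced for illustration, not part of this crate.

```rust
use std::env;
use std::error::Error;
use std::fmt;

/// Hypothetical error type for problems found before any request is sent.
#[derive(Debug)]
enum SetupError {
    MissingApiKey,
    EmptyApiKey,
}

impl fmt::Display for SetupError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SetupError::MissingApiKey => write!(f, "ELEVENLABS_API_KEY is not set"),
            SetupError::EmptyApiKey => write!(f, "ELEVENLABS_API_KEY is empty"),
        }
    }
}

impl Error for SetupError {}

/// Validates an optional key value; kept separate from the environment
/// lookup so the logic can be exercised without touching process state.
fn api_key_from(value: Option<String>) -> Result<String, SetupError> {
    match value {
        None => Err(SetupError::MissingApiKey),
        Some(v) if v.trim().is_empty() => Err(SetupError::EmptyApiKey),
        Some(v) => Ok(v),
    }
}

/// Reads the key from the environment variable used in the examples.
fn api_key_from_env() -> Result<String, SetupError> {
    api_key_from(env::var("ELEVENLABS_API_KEY").ok())
}
```

With this in place, `let api_key = api_key_from_env()?;` could replace the `expect` call, and the error would be reported through the normal `Result` path.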

Requirements

Rust 1.70+
Tokio runtime
Valid ElevenLabs API key

License

Licensed under either of:

at your option.

Contributing

Contributions are welcome! Please feel free to:

Open issues for bugs or feature requests
Submit pull requests with improvements
Improve documentation or examples
Add tests or benchmarks

Before contributing, please ensure your code follows Rust conventions and includes appropriate tests.

Support

If you like this project, consider supporting me on Patreon 💖

Changelog

See CHANGELOG.md for a detailed history of changes.

Note: This crate is not officially affiliated with ElevenLabs. Please refer to the ElevenLabs API documentation for the most up-to-date API information.
