yo-esp 0.1.3

ESP32-S3 client for the yo voice assistant framework (no_std + alloc)

yo-esp

Sponsors Buy Me a Coffee

yo-esp is a bare‑metal, no_std audio streaming client for the yo voice assistant.
It runs on the ESP32‑S3 and provides everything needed to capture microphone audio, stream it to a backend server, receive and play back TTS/audio, and react to server‑side wake‑word detection, speech‑to‑text, intent recognition and command execution.

What it does

  • Streams 16 kHz mono audio from an I²S microphone (using the ES7210 ADC driver) to a remote wake‑word / STT / intent server over TCP.
  • Receives audio from the server (TTS, server-side files, HTTP/HTTPS, playlists) and plays it through the I²S speaker (ES8311 DAC).
  • Dispatches callbacks for wake‑word detected, thinking, command executed, and command failed events.
  • Includes built‑in “ding”, “done” and “fail” notification sounds.
  • Completely no_std, runs on bare metal with embassy‑net and esp‑hal.

Installation

Add yo-esp as a dependency in Cargo.toml.

[dependencies]
yo-esp = "0.1.3"

You will also need a compatible network stack (embassy-net), an I²S driver (esp-hal), and the codec drivers (es7210, es8311).
Example Cargo.toml:

[dependencies]
yo-esp = "0.1.3"
embassy-net = { version = "0.5", features = ["tcp", "udp", "dhcpv4", "dns"] }
esp-hal = { version = "0.22", features = ["async", "esp32s3"] }
es7210 = "0.1.0"
es8311 = "0.1.0"
embedded-hal = "1.0"
defmt = "0.3"

Example usage

A minimal main.rs that sets up Wi‑Fi, the network stack, I²S, the codecs, and spawns the yo-esp tasks:

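#![no_std]
#![no_main]

// `Box` is not in the prelude under no_std, so bring it in from `alloc`.
extern crate alloc;

use alloc::boxed::Box;
use yo_esp::{audio_capture_task, speaker_task, stream_speaker, CommandHandler, play_ding, play_done, play_fail};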

// Implement your own callbacks
struct VoiceHandler;

impl CommandHandler for VoiceHandler {
    fn on_detected(&mut self) -> core::pin::Pin<alloc::boxed::Box<dyn core::future::Future<Output = ()> + '_>> {
        Box::pin(async {
            // Wake word heard – play a ding and turn on the display
            play_ding().await;
            // crate::components::display::brightness_set("70");
        })
    }

    fn on_thinking(&mut self) -> core::pin::Pin<alloc::boxed::Box<dyn core::future::Future<Output = ()> + '_>> {
        Box::pin(async {
            // Server is processing speech
        })
    }

    fn on_executed(&mut self, _ms: Option<u64>) -> core::pin::Pin<alloc::boxed::Box<dyn core::future::Future<Output = ()> + '_>> {
        Box::pin(async move {
            // Command executed successfully
            play_done().await;
        })
    }

    fn on_failed(&mut self, _ms: Option<u64>) -> core::pin::Pin<alloc::boxed::Box<dyn core::future::Future<Output = ()> + '_>> {
        Box::pin(async move {
            // Command failed
            play_fail().await;
        })
    }
}

#[esp_rtos::main]
async fn main(spawner: embassy_executor::Spawner) -> ! {
    // ... (set up Wi‑Fi, I²C, I²S, codecs; see the full example in the repository) ...

    let handler: alloc::boxed::Box<dyn CommandHandler> = alloc::boxed::Box::new(VoiceHandler);

    // Start the speaker DMA pump
    spawner.spawn(speaker_task(i2s_tx_transfer)).ok();
    // Accept audio on TCP port 12345 and route it to the speaker
    spawner.spawn(stream_speaker(stack, 12345)).ok();
    // Stream microphone audio to the server over TCP 12345 (remote_addr).
    // The connection is bidirectional: status bytes come back on the same socket.
    spawner.spawn(audio_capture_task(i2s_rx, stack, remote_addr, "esp", handler)).ok();

    loop {
        embassy_time::Timer::after(embassy_time::Duration::from_secs(60)).await;
    }
}

[!NOTE] A complete, runnable example can be found in the ESP32‑S3‑BOX‑3-rs repository.

API overview

CommandHandler trait

| Method | Server byte | Meaning |
| --- | --- | --- |
| on_detected() | 0x01 | Wake word detected |
| on_thinking() | 0x02 | Speech‑to‑text / intent processing has begun |
| on_executed(elapsed_ms) | 0x03 | Command was executed successfully |
| on_failed(elapsed_ms) | 0x04 | Command execution failed |
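
To make the byte-to-callback mapping concrete, here is a minimal sketch of a decoder for the status bytes listed above. `ServerEvent` and `decode_status` are hypothetical names invented for this illustration and are not part of the yo-esp API; only the byte values come from the table. How `elapsed_ms` reaches the device is not described here, so the sketch leaves it out.

```rust
/// Status events a server byte can map to (hypothetical type, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ServerEvent {
    Detected,
    Thinking,
    Executed,
    Failed,
}

/// Decode one status byte using the values from the table above.
/// Returns `None` for any byte the table does not list.
pub fn decode_status(byte: u8) -> Option<ServerEvent> {
    match byte {
        0x01 => Some(ServerEvent::Detected),
        0x02 => Some(ServerEvent::Thinking),
        0x03 => Some(ServerEvent::Executed),
        0x04 => Some(ServerEvent::Failed),
        _ => None,
    }
}
```

A dispatcher built on this would call the matching CommandHandler method for each `Some(..)` value and ignore `None`.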

Tasks

| Task | Description |
| --- | --- |
| audio_capture_task(i2s_rx, stack, remote_addr, room, handler) | Streams microphone audio to the server and dispatches CommandHandler callbacks. |
| speaker_task(transfer) | Pumps audio data from an internal ring buffer (PIPE) to the I²S DAC. |
| stream_speaker(stack, listen_port) | Accepts a TCP connection on listen_port and writes incoming audio into the ring buffer. |
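
The producer/consumer split between stream_speaker and speaker_task can be pictured with a small byte ring buffer. The sketch below is a hypothetical stand-in, not the crate's PIPE: `ByteRing` is an invented name, it is single-threaded, and it shows only the FIFO wraparound logic, with none of the synchronization a shared pipe between async tasks needs.

```rust
/// Fixed-capacity byte ring buffer (hypothetical stand-in for the crate's PIPE).
/// `N` must be greater than zero.
pub struct ByteRing<const N: usize> {
    buf: [u8; N],
    head: usize, // index of the next byte to read
    len: usize,  // number of bytes currently stored
}

impl<const N: usize> ByteRing<N> {
    pub const fn new() -> Self {
        Self { buf: [0; N], head: 0, len: 0 }
    }

    /// Producer side: copy in as many bytes as fit and return how many were accepted.
    pub fn write(&mut self, data: &[u8]) -> usize {
        let n = data.len().min(N - self.len);
        for (i, &b) in data[..n].iter().enumerate() {
            self.buf[(self.head + self.len + i) % N] = b;
        }
        self.len += n;
        n
    }

    /// Consumer side: copy out up to `out.len()` bytes and return how many were read.
    pub fn read(&mut self, out: &mut [u8]) -> usize {
        let n = out.len().min(self.len);
        if n == 0 {
            return 0;
        }
        for (i, slot) in out[..n].iter_mut().enumerate() {
            *slot = self.buf[(self.head + i) % N];
        }
        self.head = (self.head + n) % N;
        self.len -= n;
        n
    }
}
```

In this picture the network side plays the role of `write` and the DMA pump plays the role of `read`; when `write` accepts fewer bytes than offered, the producer has to wait for the consumer to drain the buffer.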

Sound helpers

| Function | Plays |
| --- | --- |
| play_ding() | The “ding” notification sound |
| play_done() | The “done” success sound |
| play_fail() | The “fail” error sound |

You can also push arbitrary audio using play(data: &[u8]).
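
As an illustration of preparing a byte buffer for play, the sketch below builds a short square-wave beep. The sample format (16-bit signed little-endian mono PCM) is an assumption, since the expected format is not stated here, and `square_wave_pcm` is a hypothetical helper rather than part of yo-esp. It uses `Vec` from std for host-side testing; on the device the same code would use `alloc::vec::Vec`.

```rust
/// Build a square-wave beep as 16-bit signed little-endian mono PCM bytes.
/// The sample format is an assumption; check what your speaker path expects.
pub fn square_wave_pcm(freq_hz: u32, duration_ms: u32, sample_rate: u32, amplitude: i16) -> Vec<u8> {
    let total_samples = (sample_rate as u64 * duration_ms as u64 / 1000) as usize;
    // Samples per half period; clamped to 1 so very high frequencies still toggle.
    let half_period = (sample_rate / freq_hz.max(1).saturating_mul(2)).max(1) as usize;
    let mut out = Vec::with_capacity(total_samples * 2);
    for n in 0..total_samples {
        // Alternate between +amplitude and -amplitude every half period.
        let high = (n / half_period) % 2 == 0;
        let sample = if high { amplitude } else { amplitude.saturating_neg() };
        out.extend_from_slice(&sample.to_le_bytes());
    }
    out
}
```

The resulting byte slice is the kind of input play(data: &[u8]) takes, provided the format assumption holds.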

[!NOTE]
I included a helper script, examples/esp-play.sh, for streaming various audio types to the ESP32‑S3.
It also supports streaming a desktop microphone to the ESP32‑S3 for intercom mode.

Hardware / platform requirements

  • ESP32‑S3 (the library uses esp‑hal I²S and DMA).

  • ES7210 quad ADC for microphone input (also works with other I²S microphones; adapt the codec driver).

  • ES8311 codec for speaker output.

  • Wi‑Fi connectivity through embassy‑net + esp‑radio.

  • embassy‑executor for async tasks.

Architecture

                  ┌──────────────────────────────┐
                  │     yo-esp (ESP32‑S3)        │
 Microphone ──────┤ I²S RX ──► audio_capture_task│── TCP ──► yo server (STT, TTS, intent)
                  │                              │
       Speaker ◄──┤ I²S TX ◄── speaker_task      │◄─ TCP ─── (Any audio)
                  │          stream_speaker      │
                  └──────────────────────────────┘

  • audio_capture_task reads I²S, converts to mono f32, buffers into chunks of 1280 samples (matching the wake‑word model), and sends them to the server.
  • The server replies with a single byte per chunk to signal wake‑word detection / status events.
  • stream_speaker receives raw PCM data over TCP and pushes it into a lock‑free pipe.
  • speaker_task dequeues from that pipe and writes it to the I²S TX DMA.
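
The capture path described above (convert to mono f32, then buffer into 1280-sample chunks) can be sketched as follows. At the 16 kHz rate this README states, 1280 samples correspond to 80 ms of audio per chunk. The input layout (interleaved 16-bit stereo, downmixed by averaging) is an assumption, and `stereo_i16_to_mono_f32` and `Chunker` are hypothetical names, not yo-esp items. The encoding used on the wire is not specified here, so the sketch stops at producing chunks.

```rust
/// Samples per chunk, as stated above.
pub const CHUNK_SAMPLES: usize = 1280;

/// Downmix interleaved stereo i16 frames to mono f32 in [-1.0, 1.0).
/// Assumption: 16-bit stereo input; a trailing unpaired sample is ignored.
pub fn stereo_i16_to_mono_f32(interleaved: &[i16]) -> Vec<f32> {
    interleaved
        .chunks_exact(2)
        .map(|frame| (frame[0] as f32 + frame[1] as f32) / 2.0 / 32768.0)
        .collect()
}

/// Accumulates samples and hands out full chunks (hypothetical helper).
pub struct Chunker {
    pending: Vec<f32>,
}

impl Chunker {
    pub fn new() -> Self {
        Self { pending: Vec::with_capacity(CHUNK_SAMPLES) }
    }

    /// Append samples and return every chunk completed by this call.
    /// Leftover samples stay buffered until a later call fills the chunk.
    pub fn push(&mut self, samples: &[f32]) -> Vec<[f32; CHUNK_SAMPLES]> {
        let mut done = Vec::new();
        for &s in samples {
            self.pending.push(s);
            if self.pending.len() == CHUNK_SAMPLES {
                let mut chunk = [0.0f32; CHUNK_SAMPLES];
                chunk.copy_from_slice(&self.pending);
                done.push(chunk);
                self.pending.clear();
            }
        }
        done
    }
}
```

Each completed chunk would then be sent to the server, which answers with one status byte per chunk.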


🦆🧑‍🦯 says ⮞ Hi! I'm QuackHack-McBlindy!
Like my work?
Buy me a coffee, or become a sponsor.
Thanks for supporting open source/hungry developers ♥️🦆!

♥️₿ Wallet: pungkula.x

License

This project is licensed under the terms of the MIT license.
See the LICENSE file in the repository for full details.