yo-esp
yo-esp is a bare‑metal, no_std audio streaming client for the yo voice assistant.
It runs on the ESP32‑S3 and provides everything needed to capture microphone audio, stream it to a backend server, receive and play back TTS/audio, and react to server‑side wake‑word detection, speech‑to‑text, intent recognition and command execution.
What it does
- Streams 16 kHz mono audio from an I²S microphone (using the ES7210 ADC driver) to a remote wake‑word / STT / intent server over TCP.
- Receives audio from the server (TTS, server-side files, HTTP/HTTPS, playlists) and plays it through the I²S speaker (ES8311 DAC).
- Dispatches callbacks for wake‑word detected, thinking, command executed, and command failed events.
- Includes built‑in “ding”, “done” and “fail” notification sounds.
- Completely no_std, runs on bare metal with embassy‑net and esp‑hal.
Installation
Add yo-esp as a dependency in Cargo.toml.

    [dependencies]
    yo-esp = "0.1.3"

You will also need a compatible network stack (embassy-net), an I²S driver (esp-hal), and the codec drivers (es7210, es8311).
Example Cargo.toml:

    [dependencies]
    yo-esp = "0.1.3"
    embassy-net = { version = "0.5", features = ["tcp", "udp", "dhcpv4", "dns"] }
    esp-hal = { version = "0.22", features = ["async", "esp32s3"] }
    es7210 = "0.1.0"
    es8311 = "0.1.0"
    = "1.0"
    = "0.3"

Example usage
A minimal main.rs that sets up Wi‑Fi, the network stack, I²S, the codecs, and spawns the yo-esp tasks:
use ;
// Implement your own callbacks
;
async !
[!NOTE] A complete, runnable example can be found in the ESP32‑S3‑BOX‑3-rs repository.
API overview
CommandHandler trait
| Method | Server byte | Meaning |
|---|---|---|
| on_detected() | 0x01 | Wake word detected |
| on_thinking() | 0x02 | Speech‑to‑text / intent processing has begun |
| on_executed(elapsed_ms) | 0x03 | Command was executed successfully |
| on_failed(elapsed_ms) | 0x04 | Command execution failed |
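
The byte-to-callback mapping in the table can be sketched as a small dispatcher. This is a minimal sketch under stated assumptions, not the crate's actual code: the `CommandHandler` trait below is a local stand-in whose method names follow the table, while the `&mut self` receivers, the `u32` type for `elapsed_ms`, the `dispatch` helper, and the choice to ignore unknown bytes are all hypothetical.

```rust
/// Local stand-in for the crate's `CommandHandler` trait.
/// Method names follow the table; receivers and the `u32` type are assumptions.
pub trait CommandHandler {
    fn on_detected(&mut self);
    fn on_thinking(&mut self);
    fn on_executed(&mut self, elapsed_ms: u32);
    fn on_failed(&mut self, elapsed_ms: u32);
}

/// Hypothetical helper: routes one server status byte to the matching callback.
/// Returns `true` for a known event byte and `false` for anything else.
pub fn dispatch<H: CommandHandler>(byte: u8, elapsed_ms: u32, handler: &mut H) -> bool {
    match byte {
        0x01 => handler.on_detected(),
        0x02 => handler.on_thinking(),
        0x03 => handler.on_executed(elapsed_ms),
        0x04 => handler.on_failed(elapsed_ms),
        _ => return false,
    }
    true
}

/// Example handler that only remembers the last event it saw.
#[derive(Default, Debug, PartialEq)]
pub struct LastEvent {
    pub name: &'static str,
    pub elapsed_ms: u32,
}

impl CommandHandler for LastEvent {
    fn on_detected(&mut self) {
        self.name = "detected";
    }
    fn on_thinking(&mut self) {
        self.name = "thinking";
    }
    fn on_executed(&mut self, elapsed_ms: u32) {
        self.name = "executed";
        self.elapsed_ms = elapsed_ms;
    }
    fn on_failed(&mut self, elapsed_ms: u32) {
        self.name = "failed";
        self.elapsed_ms = elapsed_ms;
    }
}
```

A real handler would typically toggle an LED or trigger one of the notification sounds instead of storing a string.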
Tasks
| Task | Description |
|---|---|
| audio_capture_task(i2s_rx, stack, remote_addr, room, handler) | Streams microphone audio to the server and dispatches CommandHandler callbacks. |
| speaker_task(transfer) | Pumps audio data from an internal ring buffer (PIPE) to the I²S DAC. |
| stream_speaker(stack, listen_port) | Accepts a TCP connection on listen_port and writes incoming audio into the ring buffer. |
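
The hand-off between stream_speaker (producer) and speaker_task (consumer) can be pictured with a plain byte ring buffer. The sketch below is illustrative only and is not the crate's PIPE: the `ByteRing` type and its methods are hypothetical, and it is single-threaded, whereas the real buffer is shared between async tasks.

```rust
/// Hypothetical fixed-capacity byte ring buffer (illustration only).
pub struct ByteRing<const N: usize> {
    buf: [u8; N],
    head: usize, // index of the oldest stored byte
    len: usize,  // number of bytes currently stored
}

impl<const N: usize> ByteRing<N> {
    pub const fn new() -> Self {
        Self { buf: [0; N], head: 0, len: 0 }
    }

    /// Number of bytes waiting to be read.
    pub fn len(&self) -> usize {
        self.len
    }

    /// Producer side: copies as much of `data` as fits, returns bytes accepted.
    pub fn write(&mut self, data: &[u8]) -> usize {
        let n = data.len().min(N - self.len);
        for &byte in &data[..n] {
            let tail = (self.head + self.len) % N;
            self.buf[tail] = byte;
            self.len += 1;
        }
        n
    }

    /// Consumer side: fills `out` with the oldest bytes, returns bytes read.
    pub fn read(&mut self, out: &mut [u8]) -> usize {
        let n = out.len().min(self.len);
        for slot in &mut out[..n] {
            *slot = self.buf[self.head];
            self.head = (self.head + 1) % N;
            self.len -= 1;
        }
        n
    }
}
```

When the buffer is full, `write` accepts fewer bytes than offered, which is the point where a network producer would have to wait for the speaker side to drain.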
Sound helpers
| Function | Plays |
|---|---|
| play_ding() | The “ding” notification sound |
| play_done() | The “done” success sound |
| play_fail() | The “fail” error sound |
You can also push arbitrary audio using play(data: &[u8]).
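
As a rough illustration of preparing a byte slice for play, the sketch below fills a caller-provided buffer with a square-wave beep. It rests on assumptions: this README does not state which sample format play expects, so 16-bit signed little-endian mono PCM is assumed here, and `fill_square_wave` is a hypothetical helper that is not part of the crate.

```rust
/// Hypothetical helper: fills `buf` with a square wave encoded as
/// 16-bit signed little-endian mono PCM (an assumed format).
/// Returns the number of bytes written; a trailing odd byte is left untouched.
pub fn fill_square_wave(buf: &mut [u8], freq_hz: u32, sample_rate: u32, amplitude: i16) -> usize {
    // Samples per half period, clamped to at least 1 so the division below is safe.
    let half_period = (sample_rate / freq_hz.max(1).saturating_mul(2)).max(1) as usize;
    let mut written = 0;
    for (i, frame) in buf.chunks_exact_mut(2).enumerate() {
        // High for one half period, low for the next.
        let sample = if (i / half_period) % 2 == 0 {
            amplitude
        } else {
            amplitude.saturating_neg()
        };
        frame.copy_from_slice(&sample.to_le_bytes());
        written += 2;
    }
    written
}
```

The first `written` bytes of the buffer could then be handed to play, provided the assumed format matches what the crate actually expects.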
[!NOTE]
I included a helper script for streaming various audio types to the ESP32‑S3. It also supports streaming a desktop microphone to the ESP32‑S3 for intercom mode.
You will find the helper at examples/esp-play.sh.
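
For readers who prefer a desktop-side program, the idea of pushing audio bytes to the device can be sketched in Rust. This is not a port of esp-play.sh and makes no claim about what the script does: `stream_pcm` is a hypothetical helper, and both the chunked writing and the assumption that the device accepts raw PCM bytes on its listen port are for illustration only.

```rust
use std::io::{self, Write};

/// Hypothetical host-side helper: writes `pcm` to `sink` in pieces of at most
/// `chunk_size` bytes and returns the total number of bytes sent.
pub fn stream_pcm<W: Write>(sink: &mut W, pcm: &[u8], chunk_size: usize) -> io::Result<usize> {
    if chunk_size == 0 {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "chunk_size must be greater than zero",
        ));
    }
    let mut sent = 0;
    for chunk in pcm.chunks(chunk_size) {
        // write_all retries short writes until the whole chunk is out.
        sink.write_all(chunk)?;
        sent += chunk.len();
    }
    sink.flush()?;
    Ok(sent)
}
```

Because the function is generic over `Write`, the sink can be a `std::net::TcpStream` connected to the device's address and the port passed to stream_speaker, or a `Vec<u8>` when testing on the desktop.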
Hardware / platform requirements
- ESP32‑S3 (the library uses esp‑hal I²S and DMA).
- ES7210 quad ADC for microphone input (also works with other I²S microphones; adapt the codec driver).
- ES8311 codec for speaker output.
- Wi‑Fi connectivity through embassy‑net + esp‑radio.
- embassy‑executor for async tasks.
Architecture

                     ┌──────────────────────────────┐
                     │      yo-esp (ESP32‑S3)       │
    Microphone ──────┤ I²S RX ──► audio_capture_task│── TCP ──► yo server (STT, TTS, intent)
                     │                              │
    Speaker    ◄─────┤ I²S TX ◄── speaker_task      │◄─ TCP ─── (Any audio)
                     │            stream_speaker    │
                     └──────────────────────────────┘

- audio_capture_task reads I²S, converts to mono f32, buffers into chunks of 1280 samples (matching the wake‑word model), and sends them to the server.
- The server replies with a single byte per chunk to signal wake‑word detection / status events.
- stream_speaker receives raw PCM data over TCP and pushes it into a lock‑free pipe.
- speaker_task dequeues from that pipe and writes it to the I²S TX DMA.
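
The capture-side buffering described above can be sketched as a small accumulator. This is a minimal sketch rather than the crate's implementation: the `Chunker` type, the `to_f32` helper, the 16-bit signed input format, and the division by 32768 are assumptions. Only the chunk size of 1280 samples comes from the description; at 16 kHz that is 80 ms of audio per chunk.

```rust
/// Samples per chunk, as stated above (1280 samples is 80 ms at 16 kHz).
pub const CHUNK_SAMPLES: usize = 1280;

/// Converts one signed 16-bit sample to f32 in the range [-1.0, 1.0).
/// Both the i16 input format and this scaling are assumptions.
pub fn to_f32(sample: i16) -> f32 {
    sample as f32 / 32768.0
}

/// Hypothetical accumulator that collects samples until a full chunk is ready.
pub struct Chunker {
    buf: [f32; CHUNK_SAMPLES],
    filled: usize,
}

impl Chunker {
    pub const fn new() -> Self {
        Self { buf: [0.0; CHUNK_SAMPLES], filled: 0 }
    }

    /// Number of samples buffered toward the next chunk.
    pub fn pending(&self) -> usize {
        self.filled
    }

    /// Feeds mono samples and calls `on_chunk` once per completed chunk.
    pub fn push(&mut self, samples: &[i16], mut on_chunk: impl FnMut(&[f32; CHUNK_SAMPLES])) {
        for &s in samples {
            self.buf[self.filled] = to_f32(s);
            self.filled += 1;
            if self.filled == CHUNK_SAMPLES {
                on_chunk(&self.buf);
                self.filled = 0;
            }
        }
    }
}
```

In a capture loop, `on_chunk` is where a completed chunk would be serialized and sent over TCP, after which the single status byte from the server would be read.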
☕
🦆🧑🦯 says ⮞ Hi! I'm QuackHack-McBlindy!
Like my work?
Buy me a coffee, or become a sponsor.
Thanks for supporting open source/hungry developers ♥️🦆!
♥️₿ Wallet: pungkula.x
License
This project is licensed under the terms of the MIT license.
See the LICENSE file in the repository for full details.