Expand description
This library is a wrapper of MSEdge Read aloud function API. You can use it to synthesize text to speech with many voices MS provided.
§How to use
-
You need get a SpeechConfig to configure the voice of text to speech.
You can convert Voice to SpeechConfig simply. Use get_voices_list function to get all available voices.
Voice and SpeechConfig implemented serde::Serialize and serde::Deserialize.
For example:use msedge_tts::voice::get_voices_list; use msedge_tts::tts::SpeechConfig; fn main() { let voices = get_voices_list().unwrap(); let speechConfig = SpeechConfig::from(&voices[0]); }
You can also create SpeechConfig by yourself. Make sure you know the right voice name and audio format.
-
Create a TTS Client or Stream. Both of them have sync and async version. Example below step 3.
-
Synthesize text to speech.
§Sync Client
Call client function synthesize to synthesize text to speech. This function return Type SynthesizedAudio, you can get audio_bytes and audio_metadata.
use msedge_tts::{tts::client::connect, tts::SpeechConfig, voice::get_voices_list}; fn main() { let voices = get_voices_list().unwrap(); for voice in &voices { if voice.name.contains("YunyangNeural") { let config = SpeechConfig::from(voice); let mut tts = connect().unwrap(); let audio = tts .synthesize("Hello, World! 你好,世界!", &config) .unwrap(); break; } } }
§Async Client
Call client function synthesize to synthesize text to speech. This function return Type SynthesizedAudio, you can get audio_bytes and audio_metadata.
use msedge_tts::{tts::client::connect_async, tts::SpeechConfig, voice::get_voices_list_async}; fn main() { smol::block_on(async { let voices = get_voices_list_async().await.unwrap(); for voice in &voices { if voice.name.contains("YunyangNeural") { let config = SpeechConfig::from(voice); let mut tts = connect_async().await.unwrap(); let audio = tts .synthesize("Hello, World! 你好,世界!", &config) .await .unwrap(); break; } } }); }
§Sync Stream
Call Sender Stream function send to synthesize text to speech. Call Reader Stream function read to get data.
read return Option<SynthesizedResponse>, the response may be AudioBytes or AudioMetadata or None. This is because the MSEdge Read aloud API returns multiple data segment and metadata and other information sequentially.Caution: One send corresponds to multiple read. Next send call will block until there no data to read. read will block before you call a send.
use msedge_tts::{ tts::stream::{msedge_tts_split, SynthesizedResponse}, tts::SpeechConfig, voice::get_voices_list, }; use std::{ sync::{ atomic::{AtomicBool, Ordering}, Arc, }, thread::spawn, }; fn main() { let voices = get_voices_list().unwrap(); for voice in &voices { if voice.name.contains("YunyangNeural") { let config = SpeechConfig::from(voice); let (mut sender, mut reader) = msedge_tts_split().unwrap(); let signal = Arc::new(AtomicBool::new(false)); let end = signal.clone(); spawn(move || { sender.send("Hello, World! 你好,世界!", &config).unwrap(); println!("synthesizing...1"); sender.send("Hello, World! 你好,世界!", &config).unwrap(); println!("synthesizing...2"); sender.send("Hello, World! 你好,世界!", &config).unwrap(); println!("synthesizing...3"); sender.send("Hello, World! 你好,世界!", &config).unwrap(); println!("synthesizing...4"); end.store(true, Ordering::Relaxed); }); loop { if signal.load(Ordering::Relaxed) && !reader.can_read() { break; } let audio = reader.read().unwrap(); if let Some(audio) = audio { match audio { SynthesizedResponse::AudioBytes(_) => { println!("read bytes") } SynthesizedResponse::AudioMetadata(_) => { println!("read metadata") } } } else { println!("read None"); } } } } }
§Async Stream
Call Sender Async function send to synthesize text to speech. Call Reader Async function read to get data. read return Option<SynthesizedResponse> as above. send and read block as above.
use msedge_tts::{ tts::{ stream::{msedge_tts_split_async, SynthesizedResponse}, SpeechConfig, }, voice::get_voices_list_async, }; use std::{ sync::{ atomic::{AtomicBool, Ordering}, Arc, }, }; fn main() { smol::block_on(async { let voices = get_voices_list_async().await.unwrap(); for voice in &voices { if voice.name.contains("YunyangNeural") { let config = SpeechConfig::from(voice); let (mut sender, mut reader) = msedge_tts_split_async().await.unwrap(); let signal = Arc::new(AtomicBool::new(false)); let end = signal.clone(); smol::spawn(async move { sender .send("Hello, World! 你好,世界!", &config) .await .unwrap(); println!("synthesizing...1"); sender .send("Hello, World! 你好,世界!", &config) .await .unwrap(); println!("synthesizing...2"); sender .send("Hello, World! 你好,世界!", &config) .await .unwrap(); println!("synthesizing...3"); sender .send("Hello, World! 你好,世界!", &config) .await .unwrap(); println!("synthesizing...4"); end.store(true, Ordering::Relaxed); }) .detach(); loop { if signal.load(Ordering::Relaxed) && !reader.can_read().await { break; } let audio = reader.read().await.unwrap(); if let Some(audio) = audio { match audio { SynthesizedResponse::AudioBytes(_) => { println!("read bytes") } SynthesizedResponse::AudioMetadata(_) => { println!("read metadata") } } } else { println!("read None"); } } } } }); }