text2audio 0.1.1

# text2audio

[![Crates.io](https://img.shields.io/crates/v/text2audio)](https://crates.io/crates/text2audio)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
[![Rust Version](https://img.shields.io/badge/rust-1.70%2B-orange.svg)](https://www.rust-lang.org)

A high-performance Rust library for converting text to audio files using Zhipu AI's GLM models, featuring intelligent text segmentation, parallel processing, and advanced audio merging capabilities.

## Features

- 🤖 **AI-Powered Text Segmentation** - Intelligent semantic text splitting using GLM models for natural-sounding audio
- 🎵 **Multiple Voice Options** - Support for 7 distinct voices with customizable speed and volume
- ⚡ **Parallel Processing** - Concurrent audio generation for improved performance on long texts
- 🔄 **Automatic Retry** - Built-in exponential backoff retry mechanism for robust API calls
- 🛠️ **Flexible Configuration** - Builder pattern API for intuitive customization
- 📦 **Zero Dependencies Audio Processing** - Built-in WAV audio merging without external tools
- 🎯 **Smart Modes** - Automatic direct conversion for short texts, segmented processing for long texts

## Supported AI Models

### Text Segmentation Models
Used for intelligent text splitting and semantic analysis:

- **GLM-4.7** - Latest flagship model with superior semantic understanding
- **GLM-4.6** - Advanced reasoning model for complex text analysis
- **GLM-4.5** - High-performance general-purpose model
- **GLM-4.5-Flash** - Optimized for speed (default)
- **GLM-4.5-Air** - Lightweight and cost-effective model

### Text-to-Speech Model
- **GLM-TTS** - Zhipu AI's dedicated text-to-speech model for high-quality audio generation

## Prerequisites

- **Rust** 1.70 or later
- **Zhipu AI API Key** - Get one from [Zhipu AI Platform](https://open.bigmodel.cn/)
- **Network Connection** - Required for API calls

### Environment Setup

```bash
export ZHIPU_API_KEY="your_api_key_here"
```

## Quick Start

Add to your `Cargo.toml`:

```toml
[dependencies]
text2audio = "0.1.0"
tokio = { version = "1", features = ["full"] }
```

Basic usage:

```rust
use text2audio::Text2Audio;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let api_key = std::env::var("ZHIPU_API_KEY")?;
    let converter = Text2Audio::new(api_key);
    
    converter.convert("你好，世界！", "output.wav").await?;
    println!("Audio saved to output.wav");
    
    Ok(())
}
```

## Usage Examples

### 1. Basic Text to Audio

```rust
use text2audio::Text2Audio;

let converter = Text2Audio::new(&api_key);
converter.convert("Hello, world!", "hello.wav").await?;
```

### 2. Custom Voice and Speed

```rust
use text2audio::{Text2Audio, Voice};

let converter = Text2Audio::new(&api_key)
    .with_voice(Voice::Xiaochen)
    .with_speed(1.5)  // 50% faster
    .with_volume(2.0);  // Louder

converter.convert("加速版语音", "fast.wav").await?;
```

### 3. Long Text with AI Segmentation

```rust
use text2audio::{Text2Audio, Model};

let long_text = "非常长的文本...";
let converter = Text2Audio::new(&api_key)
    .with_model(Model::GLM4_7)  // Use best model for segmentation
    .with_max_segment_length(300)  // Shorter segments for better flow
    .with_thinking(true);  // Enable thinking mode

converter.convert(long_text, "long_audio.wav").await?;
```

### 4. Parallel Processing for Performance

```rust
use text2audio::{Text2Audio, Voice};

let converter = Text2Audio::new(&api_key)
    .with_voice(Voice::Tongtong)
    .with_parallel(5)  // Process up to 5 segments concurrently
    .with_retry_config(5, Duration::from_millis(200));

converter.convert(very_long_text, "output.wav").await?;
```

### 5. Using Builder Pattern

```rust
use text2audio::{Text2Audio, Model, Voice};
use std::time::Duration;

let converter = Text2Audio::builder(&api_key)
    .model(Model::GLM4_7)
    .voice(Voice::Tongtong)
    .speed(1.2)
    .volume(1.5)
    .max_segment_length(500)
    .parallel(3)
    .thinking(true)
    .retry_config(3, Duration::from_millis(100))
    .build();

converter.convert("优化的长文本", "narration.wav").await?;
```

## Configuration Reference

### Text2Audio Methods

| Method | Type | Range | Default | Description |
|--------|------|-------|---------|-------------|
| `with_model()` | `Model` | enum | `GLM4_5Flash` | AI model for text segmentation |
| `with_voice()` | `Voice` | enum | `Tongtong` | Voice selection for TTS |
| `with_speed()` | `f32` | 0.5 - 2.0 | `1.0` | Speech speed multiplier |
| `with_volume()` | `f32` | 0.0 - 10.0 | `1.0` | Audio volume level |
| `with_max_segment_length()` | `usize` | 100 - 1024 | `500` | Max characters per segment |
| `with_parallel()` | `usize` | 1 - 10 | disabled | Enable concurrent processing |
| `with_thinking()` | `bool` | true/false | `false` | Enable AI thinking mode |
| `with_coding_plan()` | `bool` | true/false | `false` | Use coding plan endpoint |
| `with_retry_config()` | `(u32, Duration)` | custom | `(3, 100ms)` | Retry attempts and delay |

### Voice Options

All voices are provided by Zhipu AI's TTS service:

- **`Voice::Tongtong`** (童童) - Default female voice, clear and natural
- **`Voice::Chuichui`** (锤锤) - Warm and friendly male voice
- **`Voice::Xiaochen`** (晓辰) - Professional narration voice
- **`Voice::Jam`** - Youthful and energetic voice
- **`Voice::Kazi`** - Deep and authoritative voice
- **`Voice::Douji`** (豆鸡) - Cute and playful voice
- **`Voice::Luodo`** - Mature and calm voice

### AI Models

Choose the appropriate model based on your needs:

- **GLM-4.7**: Best for long, complex texts requiring deep semantic understanding
- **GLM-4.6**: Good balance of quality and speed for most use cases
- **GLM-4.5**: Reliable general-purpose model
- **GLM-4.5-Flash**: Fastest processing, ideal for simple texts
- **GLM-4.5-Air**: Most cost-effective for high-volume processing

## Error Handling

The library provides detailed error types for robust error handling:

```rust
use text2audio::{Text2Audio, Error};

match converter.convert(text, "output.wav").await {
    Ok(_) => println!("✓ Conversion successful"),
    Err(Error::EmptyInput) => eprintln!("✗ Error: Input text is empty"),
    Err(Error::TtsApi(msg)) => eprintln!("✗ TTS API Error: {}", msg),
    Err(Error::AiApi(msg)) => eprintln!("✗ AI API Error: {}", msg),
    Err(Error::Audio(msg)) => eprintln!("✗ Audio Processing Error: {}", msg),
    Err(Error::Io(e)) => eprintln!("✗ File I/O Error: {}", e),
    Err(e) => eprintln!("✗ Unexpected Error: {}", e),
}
```

## Architecture

```
text2audio/
├── src/
│   ├── lib.rs           # Main API and Text2Audio struct
│   ├── client.rs        # Zhipu AI API client
│   ├── ai_splitter.rs   # AI-powered text segmentation
│   ├── audio_merger.rs  # WAV audio file merging
│   ├── config.rs        # Voice and configuration types
│   └── error.rs         # Error types and Result alias
├── examples/            # Usage examples
├── assets/              # Sample text files
└── target/              # Build output
```

### Workflow

1. **Input Validation**: Check if text is empty
2. **Length Detection**: 
   - Short text (≤ max_segment_length): Direct TTS conversion
   - Long text (> max_segment_length): AI-powered segmentation
3. **Text Segmentation**: AI model splits text at semantic boundaries
4. **Audio Generation**: 
   - Sequential: One segment at a time
   - Parallel: Multiple segments concurrently (if enabled)
5. **Audio Merging**: Combine all audio segments into final WAV file
6. **Retry Handling**: Automatic retry with exponential backoff on failures

## Running Examples

The project includes comprehensive examples demonstrating various features:

### Basic Example
```bash
cargo run --example simple
```
Converts a short Chinese text to audio using default settings.

### AI Segmentation Example
```bash
cargo run --example ai_splitter
```
Demonstrates AI-powered semantic segmentation for long texts.

### Custom Voice Example
```bash
cargo run --example custom_voice
```
Shows voice customization and parameter tuning.

### Parallel Processing Example
```bash
cargo run --example parallel
```
Illustrates concurrent audio generation for performance.

### File Input Example
```bash
cargo run --example from_file
```
Converts text from a file with optimized settings for long-form content.

### Direct AI Splitter Usage
```bash
cargo run --example ai_splitter
```
Demonstrates direct usage of the AiSplitter component.

## Performance Tips

1. **Choose the Right Model**: Use GLM-4.5-Flash for simple texts, GLM-4.7 for complex content
2. **Enable Parallel Processing**: Set `with_parallel(3-5)` for long texts to significantly reduce total time
3. **Optimize Segment Length**: 
   - 300-500 chars for narrative content
   - 800-1024 chars for technical content
4. **Adjust Retry Config**: Increase retries and delays for unstable networks
5. **Use Thinking Mode**: Enable for texts requiring deep semantic understanding

## Requirements

- **Minimum Rust Version**: 1.70.0
- **Dependencies**: tokio (async runtime), zai-rs (Zhipu AI client), hound (WAV handling)
- **Network**: Stable internet connection for API calls
- **API Key**: Valid Zhipu AI API key with TTS service enabled

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Contributing

Contributions are welcome! Areas for improvement:

- Additional audio format support (MP3, OGG)
- Custom voice training integration
- Local model inference support
- Batch processing utilities
- Audio post-processing effects

Please feel free to submit issues, feature requests, or pull requests.

## Acknowledgments

- [Zhipu AI](https://www.zhipu.ai/) - For providing the GLM models and TTS API
- [zai-rs](https://crates.io/crates/zai-rs) - Rust client for Zhipu AI API
- [hound](https://crates.io/crates/hound) - WAV audio format handling