stt-cli 0.2.1 - Docs.rs

# TTS-Groq: Architectural Overview

## High-Level Architecture

TTS-Groq is structured around a modular, event-driven architecture that separates concerns into distinct layers:

- **Application Layer:** Manages configuration, startup, shutdown, and orchestrates all components.
- **Audio Layer:** Handles device selection, audio capture, buffering, and streaming.
- **Transcription Layer:** Manages audio chunking, provider abstraction, and result handling.
- **Provider Layer:** Abstracts over different speech-to-text APIs (Groq, OpenAI, etc.).
- **Platform Layer:** Handles platform-specific text insertion and user interaction.
- **Shutdown/Hotkey Layer:** Manages graceful shutdown and hotkey-based activation.


## Component Diagram (Textual)

```
+-----------------+     +------------------+     +------------------+
|  Application    | --> |   Audio Layer    | --> | Transcription    |
|  (App)          |     | (Device, Buffer) |     | Layer            |
+-----------------+     +------------------+     +------------------+
        |                        |                        |
        v                        v                        v
+-----------------+     +------------------+     +------------------+
|  Hotkey/Shutdown|     | Provider Layer   |     | Platform Layer   |
+-----------------+     +------------------+     +------------------+
```

- **Arrows** indicate data/control flow.
- The Application layer coordinates all others.
- The Audio Layer streams data to the Transcription Layer, which calls the selected Provider and passes results to the Platform Layer.


## Data Flow
1. **Startup:** Application parses config, initializes logging, audio, providers, and platform handlers.
2. **Audio Capture:** The Audio Layer captures microphone input, buffers it, and splits it into chunks.
3. **Transcription:** Chunks are sent (asynchronously) to the selected Provider.
4. **Result Handling:** Transcription results are passed to the Platform Layer for text insertion.
5. **User Interaction:** A hotkey or always-on mode determines when audio is processed.
6. **Shutdown:** Graceful shutdown is coordinated across all layers.
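
The capture, transcription, and insertion steps above can be sketched as a small pipeline. This is an illustration under stated assumptions, not the crate's code: it uses `std::sync::mpsc` and threads in place of Tokio tasks so it runs without dependencies, and `run_pipeline` and `fake_transcribe` are hypothetical names introduced here.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a provider call: it reports the chunk size
/// instead of contacting a real speech-to-text API.
fn fake_transcribe(chunk: &[i16]) -> String {
    format!("<{} samples>", chunk.len())
}

/// Runs capture -> transcription -> insertion over two channels and returns
/// the text that would have been inserted, in arrival order.
pub fn run_pipeline(samples: Vec<i16>, chunk_len: usize) -> Vec<String> {
    assert!(chunk_len > 0, "chunk_len must be non-zero");

    let (chunk_tx, chunk_rx) = mpsc::channel::<Vec<i16>>();
    let (text_tx, text_rx) = mpsc::channel::<String>();

    // Audio Layer: split the captured buffer into chunks and stream them.
    let capture = thread::spawn(move || {
        for chunk in samples.chunks(chunk_len) {
            if chunk_tx.send(chunk.to_vec()).is_err() {
                break; // the transcription side has gone away
            }
        }
        // `chunk_tx` is dropped here, which closes the chunk channel.
    });

    // Transcription Layer: turn each chunk into text and forward it.
    let transcribe = thread::spawn(move || {
        for chunk in chunk_rx {
            if text_tx.send(fake_transcribe(&chunk)).is_err() {
                break; // the insertion side has gone away
            }
        }
        // `text_tx` is dropped here, which closes the text channel.
    });

    // Platform Layer: collect results until the text channel closes.
    let inserted: Vec<String> = text_rx.iter().collect();

    capture.join().expect("capture thread panicked");
    transcribe.join().expect("transcription thread panicked");
    inserted
}
```

Calling `run_pipeline(vec![0; 10], 4)` yields three strings, one per chunk. Closing each channel by dropping its sender lets every stage drain its queue and exit in order, which is one simple way to get the graceful shutdown described in step 6.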


## Technology Choices
- **Rust + Tokio:** For safety, performance, and async orchestration.
- **CPAL:** For cross-platform audio device access.
- **Trait-based Abstractions:** For providers and platform handlers, enabling hexagonal architecture and testability.
- **Broadcast/MPSC Channels:** For decoupled, async task communication.
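
To illustrate why a broadcast channel suits signals such as shutdown, here is a minimal fan-out built from `std::sync::mpsc` senders. It is a sketch only: `Broadcaster`, `Signal`, and `shutdown_workers` are hypothetical names, and a real Tokio broadcast channel differs in detail (for example, it is bounded and async).

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread;

/// Minimal fan-out: every subscriber receives its own clone of each message.
pub struct Broadcaster<T: Clone> {
    subscribers: Vec<Sender<T>>,
}

impl<T: Clone> Broadcaster<T> {
    pub fn new() -> Self {
        Self { subscribers: Vec::new() }
    }

    /// Registers a new subscriber and returns its receiving end.
    pub fn subscribe(&mut self) -> Receiver<T> {
        let (tx, rx) = mpsc::channel();
        self.subscribers.push(tx);
        rx
    }

    /// Sends a clone of `msg` to every live subscriber, forgets subscribers
    /// whose receiver was dropped, and returns how many were reached.
    pub fn send(&mut self, msg: T) -> usize {
        self.subscribers.retain(|tx| tx.send(msg.clone()).is_ok());
        self.subscribers.len()
    }
}

/// Hypothetical control message shared by all layers.
#[derive(Clone, Debug, PartialEq)]
pub enum Signal {
    Shutdown,
}

/// Spawns `count` workers that wait for a shutdown signal, broadcasts it,
/// and returns the worker ids in spawn order once all have exited.
pub fn shutdown_workers(count: usize) -> Vec<usize> {
    let mut bus: Broadcaster<Signal> = Broadcaster::new();

    let handles: Vec<_> = (0..count)
        .map(|id| {
            let rx = bus.subscribe();
            thread::spawn(move || {
                // Block until the signal arrives or the broadcaster is dropped.
                let _ = rx.recv();
                id
            })
        })
        .collect();

    bus.send(Signal::Shutdown);

    handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked"))
        .collect()
}
```

One message reaches every worker, so no layer needs to know which other layers exist. Point-to-point data such as audio chunks is a better fit for an MPSC channel, where each item is consumed exactly once.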


## Extensibility
- **Providers:** Add new transcription APIs by implementing the `TranscriptionProvider` trait.
- **Platforms:** Add new insertion methods by implementing the `TextInserter` trait.
- **Audio Devices:** Device selection is confined to the Audio Layer, so its logic can be swapped or extended without touching other layers.
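
As a rough illustration of the first two extension points, the sketch below defines the two traits and one in-memory implementation of each. The trait names come from the list above, but every signature here is an assumption: the crate's actual traits are probably async and may use different types. `CannedProvider`, `BufferInserter`, and `transcribe_and_insert` are hypothetical helpers.

```rust
/// Assumed shape of the provider trait (synchronous for simplicity).
pub trait TranscriptionProvider {
    fn name(&self) -> &str;
    fn transcribe(&self, audio: &[i16]) -> Result<String, String>;
}

/// Assumed shape of the text insertion trait.
pub trait TextInserter {
    fn insert(&mut self, text: &str) -> Result<(), String>;
}

/// Test double that returns a fixed reply for any non-empty chunk.
pub struct CannedProvider {
    pub reply: String,
}

impl TranscriptionProvider for CannedProvider {
    fn name(&self) -> &str {
        "canned"
    }

    fn transcribe(&self, audio: &[i16]) -> Result<String, String> {
        if audio.is_empty() {
            Err("empty audio chunk".to_string())
        } else {
            Ok(self.reply.clone())
        }
    }
}

/// Inserter that appends to an in-memory buffer instead of typing into a window.
#[derive(Default)]
pub struct BufferInserter {
    pub text: String,
}

impl TextInserter for BufferInserter {
    fn insert(&mut self, text: &str) -> Result<(), String> {
        // Separate consecutive results with a single space.
        if !self.text.is_empty() {
            self.text.push(' ');
        }
        self.text.push_str(text);
        Ok(())
    }
}

/// Core logic depends only on the traits, never on a concrete API or platform.
pub fn transcribe_and_insert(
    provider: &dyn TranscriptionProvider,
    inserter: &mut dyn TextInserter,
    audio: &[i16],
) -> Result<(), String> {
    let text = provider
        .transcribe(audio)
        .map_err(|e| format!("{}: {}", provider.name(), e))?;
    inserter.insert(&text)
}
```

Because `transcribe_and_insert` only sees trait objects, a new provider or insertion method plugs in without changes to the calling code, and the in-memory implementations double as test fixtures.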

---

For module-level and function-level details, see [modules.md](modules.md).