stt-cli 0.1.2

Speech-to-text CLI using the Groq API and OpenAI API

# Task: Enhance Audio Handling using cpal examples

Analyze files in `memories/ways_of_cpal` to find useful patterns for audio device reading, management, and streaming.

## Plan

- [X] List files in `memories/ways_of_cpal`
- [X] Identify `audio_001_source_sink.md` and `screenpipe_read_device.md` as key files.
- [X] Analyze `audio_001_source_sink.md` for audio input/output stream handling.
- [X] Analyze `screenpipe_read_device.md` for device enumeration, selection, and configuration.
- [X] Summarize findings and write plan to scratchpad.

## Key Findings

### `audio_001_source_sink.md`

- Demonstrates basic `cpal` usage for audio input (`AudioSource`) and output (`AudioSink`).
- Shows how to:
    - Get default host and devices (`cpal::default_host()`, `host.default_input_device()`, `host.default_output_device()`).
    - Build input/output streams (`device.build_input_stream`, `device.build_output_stream`).
    - Configure `StreamConfig` (channels, sample rate, buffer size).
    - Use `mpsc` channels for asynchronous data transfer between the audio callback and the main application logic.
    - Start stream playback (`stream.play()`).
    - Handle audio data buffers in the `work` function (or equivalent).
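
The channel hand-off listed above can be sketched with the standard library alone. This is a minimal sketch, not the code from `audio_001_source_sink.md`: a plain thread stands in for the cpal input callback, and `spawn_fake_callback`, `drain_samples`, and the buffer sizes are hypothetical names chosen for illustration.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Spawns a thread that stands in for the cpal input callback (hypothetical helper).
/// It sends `chunks` buffers of `frames` samples each, then drops the sender,
/// which closes the channel.
fn spawn_fake_callback(chunks: usize, frames: usize) -> Receiver<Vec<f32>> {
    let (tx, rx) = mpsc::channel::<Vec<f32>>();
    thread::spawn(move || {
        for i in 0..chunks {
            // A real callback would copy the sample slice that cpal hands it.
            let buffer = vec![i as f32; frames];
            if tx.send(buffer).is_err() {
                break; // receiver dropped, stop producing
            }
        }
    });
    rx
}

/// Drains the channel on the application side and returns the total sample count.
fn drain_samples(rx: Receiver<Vec<f32>>) -> usize {
    // `iter()` blocks until a buffer arrives and ends once the sender is dropped.
    rx.iter().map(|buffer| buffer.len()).sum()
}
```

Calling `drain_samples(spawn_fake_callback(4, 256))` consumes four buffers of 256 samples. Copying into an owned `Vec<f32>` keeps the callback short, which matters because audio callbacks should avoid blocking work.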

### `screenpipe_read_device.md`

- Provides more advanced device management logic.
- Shows how to:
    - Represent audio devices with names and types (`AudioDevice` struct).
    - Parse device names with type identifiers (e.g., "Mic (input)").
    - Enumerate available input and output devices (`host.input_devices()`, `host.output_devices()`).
    - Select devices by name or use the default.
    - Get default stream configurations (`device.default_input_config()`, `device.default_output_config()`).
    - Handle macOS-specific system audio capture via `ScreenCaptureKit`.
    - Retry potentially failing host initialization with a retry helper (`with_retry`).
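
The name parsing mentioned above ("Mic (input)") might look like the following. This is a sketch under assumptions: the field names, the `DeviceType` enum, and the exact suffix rules are illustrative guesses, not the definitions from `screenpipe_read_device.md`.

```rust
/// Whether a device captures or plays audio.
#[derive(Debug, Clone, PartialEq)]
enum DeviceType {
    Input,
    Output,
}

/// A device name paired with its type (illustrative stand-in for the `AudioDevice` struct).
#[derive(Debug, Clone, PartialEq)]
struct AudioDevice {
    name: String,
    device_type: DeviceType,
}

/// Parses strings such as "Mic (input)" into an `AudioDevice`.
/// Returns an error message if the type suffix is missing or the name is empty.
fn parse_audio_device(s: &str) -> Result<AudioDevice, String> {
    let s = s.trim();
    let (name, device_type) = if let Some(name) = s.strip_suffix("(input)") {
        (name, DeviceType::Input)
    } else if let Some(name) = s.strip_suffix("(output)") {
        (name, DeviceType::Output)
    } else {
        return Err(format!("device name must end with (input) or (output): {s}"));
    };
    let name = name.trim();
    if name.is_empty() {
        return Err("device name is empty".to_string());
    }
    Ok(AudioDevice { name: name.to_string(), device_type })
}
```

Keeping the type in the string lets a single CLI argument select either an input or an output device; the suffix match here is case-sensitive, which is an assumption of this sketch.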

## Conclusion

These files provide useful examples for:
1.  **Setting up basic audio input/output streams:** Use patterns from `audio_001_source_sink.md`.
2.  **Enumerating and selecting specific audio devices:** Use patterns from `screenpipe_read_device.md`.
3.  **Handling platform specifics (like macOS system audio):** Refer to `screenpipe_read_device.md`.

---

# Task: Refactor Audio Implementation based on `screenpipe-audio`

Enhance the current audio input/output handling in `tts-groq` by adopting patterns observed in the `screenpipe-audio` example, focusing on robustness, modularity, and concurrency.

## Plan

- [X] **Analyze `screenpipe-audio` Structure:**
    - [X] List files in `memories/ways_of_cpal/screenpipe-audio`.
    - [X] Examine `src/lib.rs`, `src/core/mod.rs`, `src/core/device.rs`, `src/core/stream.rs`.
    - [X] Identify key abstractions (`AudioStream`, `AudioDevice`) and patterns (async task spawning, broadcast channels, error handling).
- [ ] **Define `AudioStream` Abstraction (`src/audio/stream.rs`):**
    - [ ] Create the struct holding `cpal::Stream`, config, control channels, `broadcast::Sender<Vec<f32>>`, and state flags (`AtomicBool`).
    - [ ] Implement an async constructor (`from_device` or similar) that takes device info.
    - [ ] Implement `spawn_blocking` for the audio thread.
    - [ ] Implement `build_input_stream` (handling sample formats, mono conversion, sending to broadcast).
    - [ ] Implement `error_callback` (handling disconnections).
    - [ ] Implement `stop` method using control channels.
    - [X] Implement `subscribe` method.
    - [X] Add `bytemuck` dependency to `Cargo.toml`.
- [ ] **Refactor Device Management (`src/audio/device.rs` / `device_manager.rs`):**
    - [X] Define or refine `AudioDevice` struct.
    - [X] Consolidate device enumeration and selection logic (Added list/default methods to `AudioDevice`).
    - [X] Make `get_cpal_device_and_config` an instance method of `AudioDevice`.
    - [ ] Update callers (e.g., `stream.rs`) to use the new device structure. // Started
- [ ] **Integrate `AudioStream` into Application Logic (`src/app.rs` / `main.rs`):**
    - [ ] Replace old audio handling with `AudioStream::from_device`.
- [ ] **Update Audio Data Consumers:**
    - [ ] Modify existing audio playback logic (e.g., in `app.rs` or related modules) to `subscribe` to the `AudioStream`'s broadcast channel.
    - [ ] Adapt logic to handle incoming `Vec<f32>` chunks.
- [ ] **Integrate with State Management (`src/audio_state.rs`):**
    - [ ] Update `AudioState` to hold and manage the `AudioStream` instance.
    - [ ] Ensure proper starting and stopping of the `AudioStream` based on application state.
- [ ] **Refactor `src/audio_utils.rs`:**
    - [ ] Move relevant functions into the new modules (`stream.rs`, `device.rs`, `common.rs` if needed) or implement them as methods on the new structs.
    - [ ] Remove or deprecate the old utility functions.
- [ ] **Testing:**
    - [ ] Test audio input/output with the refactored implementation.
    - [ ] Test device selection and error handling (if possible, simulate disconnection).
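
The sample-format handling and mono conversion planned for `build_input_stream` can be sketched without cpal. The helpers below are hypothetical (`to_mono` and `i16_to_f32` are names chosen here) and assume interleaved samples, which is how cpal delivers multi-channel input.

```rust
/// Averages interleaved frames down to mono.
/// `channels` is the channel count from the stream config. A trailing partial
/// frame is dropped, and a count of 0 or 1 returns the input unchanged.
fn to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    if channels <= 1 {
        return interleaved.to_vec();
    }
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}

/// Converts signed 16-bit samples to `f32` in the range [-1.0, 1.0).
fn i16_to_f32(samples: &[i16]) -> Vec<f32> {
    samples.iter().map(|&s| s as f32 / 32768.0).collect()
}
```

The resulting `Vec<f32>` is the shape the plan sends over the `broadcast::Sender<Vec<f32>>`, so consumers never need to know the device's native format.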

---

# More Analysis


ALL files we need to modify, with their relevant symbols:

 1 memories/ways_of_cpal/screenpipe-audio/src/core/device.rs
    • list_audio_devices() - Core device enumeration logic
    • get_cpal_device_and_config() - Device initialization
    • parse_audio_device() - Device name validation
    • should_include_output_device() - macOS filtering
    • AudioDevice struct definition
 2 memories/ways_of_cpal/screenpipe-audio/src/core/stream.rs
    • AudioStream struct - Stream lifecycle management
    • spawn_audio_thread() - Stream initialization
    • create_error_callback() - Connection resilience
    • StreamControl enum - Stream command handling


Missing File Recommendation:

 • src/audio_state.rs (already added)
    • RecordingState struct - Critical for state synchronization
    • toggle() method - Hotkey integration


Why This Set Works:

 1 device.rs contains all device detection/initialization primitives
 2 stream.rs handles low-level audio streaming mechanics
 3 audio_state.rs manages recording state synchronization

No need for other files shown unless adding new transcription features. Focus on these three files for core audio pipeline improvements.




Here's how to leverage the files for audio capture/streaming improvements through key architectural patterns:

1. Device Management Enhancements (device.rs)

 • Hot-Swap Detection: Poll list_audio_devices() on an interval and diff the result against the previous list; cpal does not appear to expose a device-change notification API, so polling is the baseline
 • Capability Validation: Extend get_cpal_device_and_config() to validate sample rate/format compatibility with transcription requirements
 • Priority Stack: Implement device scoring in parse_audio_device() based on:
    • Input latency measurements
    • Supported buffer sizes
    • Historical reliability metrics

2. Stream Resilience Patterns (stream.rs)

 • Error Recovery Pipeline:

   graph LR
     Error[Stream Error] --> Callback[create_error_callback]
     Callback --> State[Update RecordingState.active]
     State --> Retry{Retry Policy}
     Retry -->|Immediate| Reinit[Reinitialize AudioStream]
     Retry -->|Backoff| Fallback[Switch to secondary device]

 • Quality-of-Service Monitoring:
    • Track buffer underruns via transmitter timing
    • Monitor sample gaps using stream timestamps
    • Implement quality degradation fallbacks
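
The error recovery graph above could be driven by a small policy type. The sketch below is an assumption about how the "Retry Policy" node might decide, not something taken from `stream.rs`: `RetryPolicy`, `RecoveryAction`, and every threshold are hypothetical.

```rust
use std::time::Duration;

/// What the recovery loop should do after a stream error.
#[derive(Debug, PartialEq)]
enum RecoveryAction {
    /// Rebuild the stream on the same device right away.
    Reinitialize,
    /// Wait for the given delay, then switch to the secondary device.
    FallbackAfter(Duration),
    /// Stop retrying and surface the error.
    GiveUp,
}

/// Hypothetical policy; thresholds and delays are illustrative assumptions.
struct RetryPolicy {
    immediate_attempts: u32,
    max_attempts: u32,
    base_delay: Duration,
}

impl RetryPolicy {
    /// Chooses an action for the given zero-based attempt number.
    fn next_action(&self, attempt: u32) -> RecoveryAction {
        if attempt < self.immediate_attempts {
            RecoveryAction::Reinitialize
        } else if attempt < self.max_attempts {
            // Exponential backoff: base, 2x base, 4x base, ... (shift capped at 16).
            let exponent = (attempt - self.immediate_attempts).min(16);
            RecoveryAction::FallbackAfter(self.base_delay * (1u32 << exponent))
        } else {
            RecoveryAction::GiveUp
        }
    }
}
```

Keeping the decision in a pure function makes the policy testable without a real device; the error callback would only need to count attempts and act on the returned value.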

3. State Synchronization (audio_state.rs)

 • Atomic State Bridge:

   [Stream Thread] ↔ RecordingState.active ↔ [GUI Controls]
             ⬑⇵⬏
     [Error Callback] → [State Rollback]

 • Triple Buffering Pattern for audio data:
    1 Stream thread fills buffer
    2 Processing thread analyzes
    3 Transcription thread consumes
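
The atomic state bridge above can be sketched as a cloneable handle around an `AtomicBool`. This is one possible shape, assuming `RecordingState` exposes an `active` flag and a `toggle()` method as the notes describe; `is_active` and `rollback` are hypothetical additions.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Shared recording flag; clones point at the same atomic.
#[derive(Clone, Default)]
struct RecordingState {
    active: Arc<AtomicBool>,
}

impl RecordingState {
    /// Flips the flag atomically and returns the new value.
    fn toggle(&self) -> bool {
        // `fetch_xor(true)` inverts the flag and returns the previous value.
        !self.active.fetch_xor(true, Ordering::SeqCst)
    }

    /// Reads the current value, e.g. from the stream thread before forwarding samples.
    fn is_active(&self) -> bool {
        self.active.load(Ordering::SeqCst)
    }

    /// Forces the flag off, e.g. from an error callback (the "State Rollback" arrow).
    fn rollback(&self) {
        self.active.store(false, Ordering::SeqCst);
    }
}
```

One clone can go to the hotkey handler and another to the stream thread; because `toggle` is a single atomic operation, two simultaneous key presses cannot leave the flag in an inconsistent state.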

4. Critical Enhancement Path

 1 First: Harden create_error_callback() with reconnect state machine
 2 Then: Add device capability database in device.rs
 3 Finally: Implement QoS-aware buffer management in spawn_audio_thread()

Key Cross-File Dependencies to Manage:

 • Device ↔ Stream: Sample rate validation must sync between device.get_config() and stream initialization
 • State ↔ Error Handling: Error recovery must atomically update both RecordingState and DeviceManager states
 • Buffer ↔ Transcription: Audio chunk sizing must match transcription model's windowing requirements
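
For the Buffer ↔ Transcription dependency, one option is to re-chunk incoming buffers into fixed windows before they reach the transcription step. The sketch below is hypothetical (`WindowChunker` is not part of the analyzed files) and assumes the window length is given in samples.

```rust
/// Accumulates variable-sized audio buffers and emits fixed-size windows.
struct WindowChunker {
    window_len: usize,
    pending: Vec<f32>,
}

impl WindowChunker {
    /// `window_len` is the number of samples the transcription step expects per window.
    fn new(window_len: usize) -> Self {
        assert!(window_len > 0, "window_len must be non-zero");
        Self { window_len, pending: Vec::new() }
    }

    /// Appends a buffer and returns every complete window now available.
    /// Leftover samples stay queued for the next call.
    fn push(&mut self, samples: &[f32]) -> Vec<Vec<f32>> {
        self.pending.extend_from_slice(samples);
        let mut windows = Vec::new();
        while self.pending.len() >= self.window_len {
            // Keep the first `window_len` samples as a window, queue the rest.
            let rest = self.pending.split_off(self.window_len);
            windows.push(std::mem::replace(&mut self.pending, rest));
        }
        windows
    }
}
```

This decouples the device's callback buffer size from the model's window size, so changing either one does not require touching the other.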




Alright, let's go through the user's request step by step. The user wants to improve their implementation of audio device detection using CPAL and manage recording via those devices. They've provided several files and are asking if the current set is correct or if adjustments are needed.

First, I'll review the listed files to determine their relevance to audio device detection and streaming management. The key areas to focus on are device enumeration, stream initialization, error handling, state management, and configuration.

 1 core/device.rs: This file contains list_audio_devices, get_cpal_device_and_config, and parse_audio_device, which are directly related to device detection and initialization. These are crucial for any improvements in device handling.
 2 core/stream.rs: The AudioStream struct and related functions like spawn_audio_thread and create_error_callback manage the audio stream lifecycle and error handling. Essential for streaming stability.
 3 audio_state.rs: The RecordingState struct handles atomic state synchronization, which is vital for managing recording states across threads. Necessary for state consistency.
 4 config.rs: AppConfig and TranscriptionMode control runtime parameters and recording triggers. Important for configuration-driven behavior but might not need modification unless configuration changes are planned.
 5 audio/commands.rs: CpalCommand and conversion functions translate high-level commands to stream operations. Relevant if command handling needs adjustment.

Files like speaker/embedding.rs, transcription/deepgram/mod.rs, and utils/ffmpeg.rs are related to post-processing and output, which aren't directly involved in device detection or streaming. These can be excluded unless the user's enhancements touch on those areas.

The user might be missing device_manager.rs (referenced in mod.rs), which could centralize device lifecycle management. However, since it's not provided, focusing on existing files is safer.

Conclusion: The current set is mostly adequate, but removing unrelated files sharpens focus. If the user plans to implement failover or connection pooling, adding device_manager.rs would be necessary. For now, proceed with the refined list.

**Plan:**
1.  [X] Refactor `src/audio_utils.rs` to use `crate::audio::device::AudioDevice`.
2.  [X] Remove old CPAL thread fields and logic from `App` struct, `App::new`, and `initialize_audio_device`.
3.  [X] Adapt `initialize_audio_device` to work with `AudioDevice` and display capabilities.
4.  [X] Instantiate `AudioStream` in `App::run` using the selected device and initial state.
    *   [X] Add fields `selected_audio_device` and `audio_stream` to `App`.
    *   [X] Fix minor lints in `app.rs` and `audio_utils.rs`.
    *   [X] Call `AudioStream::from_device` in `App::run`.
    *   [X] Store the created `AudioStream` in `self.audio_stream`.
5.  [X] Adapt `start_bridge_task` to subscribe to `AudioStream`'s `data_rx`.
    *   [X] Get the `broadcast::Receiver` from `self.audio_stream`.
    *   [X] Pass the receiver to `start_bridge_task`.
    *   [X] Modify `start_bridge_task` to use the `broadcast::Receiver` instead of the old `futures::channel::mpsc::Receiver`.
    *   [X] Fix subsequent syntax error in `start_processing_task`.
6.  [X] Adapt `start_tasks` to properly manage the bridge task.
7.  [X] Adapt hotkey handling (`setup_hotkeys` / handler) to use `RecordingState`.
    *   [X] Update `setup_hotkeys` to call `recording_state.toggle()`.
    *   [X] Remove `audio_command_sender` from `App`.
    *   [X] Remove `command_sender` from `AudioDeviceManager` and fix constructor call.
8.  [ ] Adapt shutdown logic (`register_shutdown_handlers`) to stop the `AudioStream`.
9.  [ ] Address remaining lints and perform testing.

**Status:**
*   `audio_utils.rs` refactored.
*   `app.rs` refactored (old CPAL logic removed, `App` struct updated, `AudioStream` instantiated, bridge task adapted, hotkey handling adapted).
*   `device_manager.rs` updated (removed command sender).
*   Minor lints fixed, compilation error fixed.
*   Need to adapt shutdown logic.

**Next Step:** Adapt shutdown logic (`register_shutdown_handlers`) to stop the `AudioStream`.
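
A possible shape for the pending shutdown step, sketched with the standard library only. A plain thread stands in for the one that owns the `cpal::Stream`; `StreamHandle` and its methods are hypothetical, and `StreamControl` here only mirrors the idea of the enum named in the notes, not its real definition.

```rust
use std::sync::mpsc::{self, RecvTimeoutError, Sender};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Commands sent to the audio thread.
enum StreamControl {
    Stop,
}

/// Hypothetical handle owning the audio thread and its control channel.
struct StreamHandle {
    control_tx: Sender<StreamControl>,
    worker: Option<JoinHandle<()>>,
}

impl StreamHandle {
    /// Starts a thread that stands in for the one holding the audio stream.
    fn start() -> Self {
        let (control_tx, control_rx) = mpsc::channel();
        let worker = thread::spawn(move || loop {
            match control_rx.recv_timeout(Duration::from_millis(5)) {
                // Exit on an explicit command or when every sender was dropped.
                Ok(StreamControl::Stop) | Err(RecvTimeoutError::Disconnected) => break,
                // No command yet: a real thread would keep the stream alive here.
                Err(RecvTimeoutError::Timeout) => {}
            }
        });
        Self { control_tx, worker: Some(worker) }
    }

    /// Sends `Stop` and waits for the thread to exit; returns true on a clean join.
    /// A second call is a no-op that returns false.
    fn stop(&mut self) -> bool {
        let _ = self.control_tx.send(StreamControl::Stop);
        match self.worker.take() {
            Some(handle) => handle.join().is_ok(),
            None => false,
        }
    }
}
```

A shutdown handler such as `register_shutdown_handlers` could call `stop()` once and ignore later calls, since the method is safe to invoke repeatedly.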



# Future Roadmap

- [ ] Global hotkey (Ctrl+Space) to trigger transcription
- [ ] Auto-copy-to-clipboard mode
- [ ] Interactive CLI to choose the transcription provider, hotkey, and audio device