# AGENTS.md
Guide for AI coding agents working on psyche-subtitle-toolkit.
## Quick Reference
```sh
cargo check # compilation check
cargo test # run all tests (105 unit + integration)
cargo clippy --all-targets --all-features -- -D warnings # strict lint
cargo build --release # release binary
cargo run -- inspect <mkv> # inspect MKV tracks
cargo run -- translate --help # show all translate options
```
All commands run from the project root (`/home/enrell/projects/psyche-subtitle-toolkit/`).
## Project Overview
Rust crate + CLI that extracts ASS subtitles from MKV files, translates them via pluggable providers, and muxes the translation back in-place. Designed for [Psyche](https://github.com/Gitlawb/psyche) but usable standalone.
**Key constraints:**
- Supported subtitle formats: ASS, SRT, and WebVTT
- In-place MKV remuxing (replaces original track, does not append)
- Sequential file processing (one MKV at a time)
- Local-first: no telemetry, no cloud required, every provider is opt-in
## Project Structure
```
src/
  lib.rs              # crate root: module declarations + public re-exports
  main.rs             # CLI binary (clap): inspect + translate subcommands
  error.rs            # thiserror enum: SubtitleToolkitError
  pipeline.rs         # translate_mkv (MKV pipeline) + translate_ass (subtitle-only pipeline)
  media/
    mod.rs
    mkv.rs            # mkvmerge/mkvextract wrappers: inspect, extract, mux
  subtitles/
    mod.rs
    model.rs          # SubtitleDocument, SubtitleCue
    ass.rs            # ASS parser + renderer (AssSubtitle)
    srt.rs            # SRT parser + renderer (SrtSubtitle)
    structured.rs     # numbered text format, tag stripping, chunking
    vtt.rs            # WebVTT parser + renderer (VttSubtitle)
  translation/
    mod.rs            # Translator trait + TranslationRequest struct
    ollama.rs         # Ollama /api/generate provider
    openai.rs         # OpenAI /v1/chat/completions provider
    deepl.rs          # DeepL /v2/translate provider
    google.rs         # Google Translate v2 provider
    gemini.rs         # Gemini generateContent provider
    openrouter.rs     # OpenRouter /api/v1/chat/completions provider
    anthropic.rs      # Anthropic /v1/messages provider
```
## Architecture
### Pipeline Flow
```
MKV file → inspect → extract ASS → parse → strip tags → chunk → translate → apply → reinject → render → mux
```
Two entry points:
- `translate_mkv(options, translator: Arc<dyn Translator>)` — full MKV pipeline (file I/O + external tools)
- `translate_ass(ass, target_language, max_concurrent, translator: Arc<dyn Translator>)` — subtitle-only pipeline (no file I/O)
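
The "strip tags" and "chunk" stages of the flow above can be pictured with a dependency-free sketch. The helper names `strip_override_tags` and `chunk_cues` are hypothetical, and the real logic in `subtitles/structured.rs` may differ; in particular, this sketch simply discards ASS override blocks, whereas the pipeline keeps enough information to reinject them after translation.

```rust
/// Hypothetical helper: remove ASS override blocks such as `{\i1}` from a cue.
/// Everything between `{` and the matching `}` is dropped.
fn strip_override_tags(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut depth = 0usize;
    for ch in text.chars() {
        match ch {
            '{' => depth += 1,
            '}' if depth > 0 => depth -= 1,
            // Keep characters only while outside an override block.
            _ if depth == 0 => out.push(ch),
            _ => {}
        }
    }
    out
}

/// Hypothetical helper: group cues into chunks of at most `max_cues`,
/// keeping each cue's 1-based index so translations can be applied back.
fn chunk_cues(cues: &[&str], max_cues: usize) -> Vec<Vec<(usize, String)>> {
    assert!(max_cues > 0, "chunk size must be non-zero");
    let indexed: Vec<(usize, String)> = cues
        .iter()
        .enumerate()
        .map(|(i, text)| (i + 1, text.to_string()))
        .collect();
    indexed.chunks(max_cues).map(|chunk| chunk.to_vec()).collect()
}
```

Each chunk would then be sent to the translator as a single request, which is why the cue indices travel with the text.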
### Translator Trait
```rust
#[async_trait]
pub trait Translator: Send + Sync {
    async fn translate(&self, request: TranslationRequest<'_>) -> Result<String>;
}
```
- Input: numbered subtitle text (`<1> hello\n<2> world`) + target language code
- Output: translated text in the same `<N> text` format
- Each provider owns its own prompt construction and HTTP client
- Pipeline is generic over `T: Translator + ?Sized` (supports `dyn Translator`)
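
As an illustration of the `<N> text` contract, here is a minimal std-only sketch of rendering cues into the numbered format and parsing a translated reply back. The function names `to_numbered` and `parse_numbered` are hypothetical, and skipping malformed lines is an assumption about how stray model output might be handled, not a description of the crate's behavior.

```rust
/// Hypothetical helper: render cue texts as `<N> text` lines, numbered from 1.
fn to_numbered(cues: &[&str]) -> String {
    cues.iter()
        .enumerate()
        .map(|(i, text)| format!("<{}> {}", i + 1, text))
        .collect::<Vec<_>>()
        .join("\n")
}

/// Hypothetical helper: parse `<N> text` lines back into (index, text) pairs.
/// Lines that do not match the pattern are skipped (an assumption).
fn parse_numbered(text: &str) -> Vec<(usize, String)> {
    text.lines()
        .filter_map(|line| {
            let rest = line.trim().strip_prefix('<')?;
            let (num, body) = rest.split_once('>')?;
            let index = num.trim().parse::<usize>().ok()?;
            Some((index, body.trim().to_string()))
        })
        .collect()
}
```

Carrying the index in every line lets the caller match translations to cues even if a provider reorders or drops a line.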
### Error Handling
All errors use `SubtitleToolkitError` (thiserror). Provider errors use:
```rust
Translation { provider: &'static str, message: String }
```
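
A provider might build this variant from a failed HTTP response roughly as follows. This is a standalone sketch: `DemoError`, `status_error`, and the message format are hypothetical, and `Display` is written by hand only so the snippet compiles without dependencies (the crate itself derives it through `thiserror`).

```rust
use std::fmt;

/// Stand-in for `SubtitleToolkitError`, reduced to the one variant shown above.
#[derive(Debug, PartialEq)]
enum DemoError {
    Translation { provider: &'static str, message: String },
}

impl fmt::Display for DemoError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DemoError::Translation { provider, message } => {
                write!(f, "translation failed ({provider}): {message}")
            }
        }
    }
}

impl std::error::Error for DemoError {}

/// Hypothetical helper: turn a non-success status and body into a provider error.
fn status_error(provider: &'static str, status: u16, body: &str) -> DemoError {
    DemoError::Translation {
        provider,
        message: format!("HTTP {status}: {}", body.trim()),
    }
}
```

Using `&'static str` for `provider` means each provider passes a string literal such as `"foo"`, with no allocation for the name.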
## Adding a New Provider
Five steps, no pipeline changes needed:
1. **Create `src/translation/foo.rs`** — struct implementing `Translator`
2. **Add `pub mod foo;`** to `src/translation/mod.rs`
3. **Update the doc comment** on the `Translator` trait that lists the available providers
4. **Add re-export** `pub use translation::foo::FooTranslator;` to `src/lib.rs`
5. **Add CLI support** in `src/main.rs`:
   - Add `use` import
   - Add `"foo"` to the `--provider` help string
   - Add match arm in the provider dispatch
   - Update the unknown provider error message
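
The dispatch change in step 5 could look roughly like the sketch below. It is not the code in `src/main.rs`: `DemoTranslator` is a simplified stand-in for the async `Translator` trait, the unit structs stand in for real provider types, and `build_translator` and the error wording are hypothetical.

```rust
/// Simplified stand-in for the crate's `Translator` trait (assumption).
trait DemoTranslator {
    fn name(&self) -> &'static str;
}

struct OllamaDemo;
struct FooDemo;

impl DemoTranslator for OllamaDemo {
    fn name(&self) -> &'static str {
        "ollama"
    }
}

impl DemoTranslator for FooDemo {
    fn name(&self) -> &'static str {
        "foo"
    }
}

/// Hypothetical dispatch: map the `--provider` value to a boxed translator.
fn build_translator(provider: &str) -> Result<Box<dyn DemoTranslator>, String> {
    match provider {
        "ollama" => Ok(Box::new(OllamaDemo)),
        // New match arm added for the new provider.
        "foo" => Ok(Box::new(FooDemo)),
        // Keep this list in sync with the `--provider` help string.
        other => Err(format!(
            "unknown provider '{other}' (expected one of: ollama, foo)"
        )),
    }
}
```

Keeping the help string, the match arms, and the error message in one place makes it harder to forget one of the three when adding a provider.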
### Provider Implementation Template
Each provider file follows this structure:
```rust
use serde::{Deserialize, Serialize};

use crate::error::{Result, SubtitleToolkitError};
use super::{TranslationRequest, Translator};

pub struct FooTranslator {
    client: reqwest::Client,
    base_url: String,
    api_key: String,
}

impl FooTranslator {
    pub fn new(api_key: impl Into<String>) -> Result<Self> { /* ... */ }
    pub fn with_base_url(base_url: impl Into<String>, api_key: impl Into<String>) -> Result<Self> { /* ... */ }
}

#[async_trait::async_trait]
impl Translator for FooTranslator {
    async fn translate(&self, request: TranslationRequest<'_>) -> Result<String> {
        // 1. Send HTTP request
        // 2. Check status → Err(SubtitleToolkitError::Translation { provider: "foo", ... })
        // 3. Parse response
        // 4. Return translated text (trimmed)
    }
}

// Private request/response serde types

#[cfg(test)]
mod tests {
    // wiremock mock tests: happy path, auth header, error on non-200, edge cases
}
```
### Provider Design Rules
- Use `reqwest::Client` with 120s timeout
- Constructor returns `Result<Self>` (reqwest build can fail)
- `with_base_url` for custom endpoints; `new` for defaults
- Trim the trailing slash from `base_url`
- Error on non-200 with `SubtitleToolkitError::Translation { provider, message }`
- Trim whitespace from response text
- Each provider owns its own prompt/request construction (no shared prompt module)
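
The two trimming rules above can be sketched without any dependencies. The helper names `normalize_base_url`, `endpoint`, and `clean_response` are hypothetical; providers may well inline this logic instead.

```rust
/// Hypothetical helper: drop trailing slashes so joining paths never yields `//`.
fn normalize_base_url(base_url: &str) -> String {
    base_url.trim_end_matches('/').to_string()
}

/// Hypothetical helper: join a normalized base URL and an endpoint path.
fn endpoint(base_url: &str, path: &str) -> String {
    format!(
        "{}/{}",
        normalize_base_url(base_url),
        path.trim_start_matches('/')
    )
}

/// Hypothetical helper: strip leading and trailing whitespace from provider output.
fn clean_response(text: &str) -> String {
    text.trim().to_string()
}
```

Normalizing once in the constructor keeps `with_base_url("http://host/")` and `with_base_url("http://host")` equivalent, which also matters for the mock server URL used in tests.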
## Testing
### Test Categories (105 tests total)
| Category | Tests | Location | Coverage |
| --- | --- | --- | --- |
| Subtitle parsing | 19 | `subtitles/ass.rs` + `subtitles/srt.rs` + `subtitles/vtt.rs` | ASS + SRT + VTT parse + render |
| Structured text | 15 | `subtitles/structured.rs` | Numbered format, tag strip/reinject, chunking |
| Pipeline integration | 22 | `pipeline.rs` | Full subtitle processing, retry, dry-run, resume, concurrency, race condition regression with FakeTranslator |
| Provider mocks | 6-7 each | `translation/*.rs` | HTTP contracts per provider |
| Doc-tests | 12 | `lib.rs` + provider files | Compile-check public examples |
### Writing Provider Tests
Use `wiremock` (dev-dependency) to mock HTTP:
```rust
#[cfg(test)]
mod tests {
    use super::*;
    use wiremock::matchers::{method, path};
    use wiremock::{Mock, MockServer, ResponseTemplate};

    #[tokio::test]
    async fn translates_numbered_text() {
        let server = MockServer::start().await;
        Mock::given(method("POST"))
            .and(path("/expected/path"))
            .respond_with(ResponseTemplate::new(200).set_body_json(/* ... */))
            .mount(&server)
            .await;

        let translator = FooTranslator::with_base_url(server.uri(), "test-key").unwrap();
        let result = translator.translate(TranslationRequest {
            source_text: "<1> hello",
            target_language: "pt-BR",
        }).await.unwrap();

        assert_eq!(result, "<1> expected");
    }
}
```
### Writing Pipeline Tests
Pipeline tests use `FakeTranslator` (defined in the `tests` module of `pipeline.rs`), which records calls and returns configured responses. No external tools are needed: these tests exercise only the subtitle processing logic.
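
For orientation, a fake of this kind can be sketched as below. This is not the `FakeTranslator` from `pipeline.rs`: it uses a synchronous stand-in trait (`SyncTranslator`, hypothetical) instead of the async `Translator`, and the field names and the "no response configured" fallback are assumptions.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Synchronous stand-in for the crate's async `Translator` trait (assumption).
trait SyncTranslator {
    fn translate(&self, source_text: &str, target_language: &str) -> Result<String, String>;
}

/// Records every request and replays a queue of preconfigured responses.
struct FakeTranslator {
    responses: Mutex<VecDeque<Result<String, String>>>,
    calls: Mutex<Vec<String>>,
}

impl FakeTranslator {
    fn new(responses: Vec<Result<String, String>>) -> Self {
        Self {
            responses: Mutex::new(responses.into()),
            calls: Mutex::new(Vec::new()),
        }
    }

    /// Snapshot of the source texts received so far, in call order.
    fn calls(&self) -> Vec<String> {
        self.calls.lock().unwrap().clone()
    }
}

impl SyncTranslator for FakeTranslator {
    fn translate(&self, source_text: &str, _target_language: &str) -> Result<String, String> {
        self.calls.lock().unwrap().push(source_text.to_string());
        self.responses
            .lock()
            .unwrap()
            .pop_front()
            .unwrap_or_else(|| Err("no response configured".to_string()))
    }
}
```

Queuing an `Err` followed by an `Ok` is one way to exercise retry behavior, and the recorded calls show how many attempts were made.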
### What Is NOT Tested
- MKV file I/O (requires mkvmerge/mkvextract + test media)
- CLI argument parsing (trivial clap derive)
- Real API calls (all provider tests run against wiremock servers)
## Conventions
- **Edition:** Rust 2024
- **Safety:** `#![forbid(unsafe_code)]`
- **Errors:** `thiserror` only, no `anyhow`
- **Async:** `tokio` with `features = ["fs", "macros", "process", "rt-multi-thread"]`
- **HTTP:** `reqwest` with `json` + `rustls-tls` features, no default features
- **Lint:** `cargo clippy --all-targets --all-features -- -D warnings` must pass
- **Format:** `cargo fmt` (default rustfmt)
- **No unnecessary abstractions:** each provider is a standalone module, no factory/registry
- **`Arc<dyn Translator>` in library API:** the pipeline uses `Arc<dyn Translator>` for concurrent chunk translation; `Box<dyn>` is only in the CLI for provider dispatch
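
To show why a shared `Arc<dyn Translator>` suits concurrent chunk translation, here is a std-only sketch that uses OS threads in place of tokio tasks. `SyncTranslator`, `Upper`, and `translate_chunks` are hypothetical, and the sketch does not model the `max_concurrent` limit.

```rust
use std::sync::Arc;
use std::thread;

/// Synchronous stand-in for the async `Translator` trait (assumption).
trait SyncTranslator: Send + Sync {
    fn translate(&self, chunk: &str) -> String;
}

/// Toy translator that upper-cases its input.
struct Upper;

impl SyncTranslator for Upper {
    fn translate(&self, chunk: &str) -> String {
        chunk.to_uppercase()
    }
}

/// Translate every chunk on its own thread, sharing one translator instance.
/// Results are collected in the original chunk order.
fn translate_chunks(chunks: Vec<String>, translator: Arc<dyn SyncTranslator>) -> Vec<String> {
    let handles: Vec<_> = chunks
        .into_iter()
        .map(|chunk| {
            // Cloning the Arc only bumps a reference count; the translator is shared.
            let translator = Arc::clone(&translator);
            thread::spawn(move || translator.translate(&chunk))
        })
        .collect();

    handles
        .into_iter()
        .map(|handle| handle.join().expect("worker thread panicked"))
        .collect()
}
```

A `Box<dyn Translator>` has a single owner and cannot be handed to several workers at once, which is why the library API takes an `Arc`.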
## External Dependencies
Runtime:
- `mkvmerge`, `mkvextract` (MKVToolNix) must be in PATH for MKV operations
Dev:
- `wiremock` for HTTP mock tests