# narrate-this
A Rust SDK that turns text, URLs, or search queries into narrated videos — complete with TTS, captions, and stock visuals.
- Build pipeline from text content to rendered video in a single call
- Pluggable providers — swap any stage by implementing a trait
Here's a cringe demo made using this:
https://github.com/user-attachments/assets/a164e9d6-2fcf-4fd6-905f-0f0d001474d2
I watch random videos while I code: the news, or just random stuff in the background. I made this for an automated pipeline in another personal app, which reads RSS feeds I'm interested in and generates this kind of content in real time to satisfy my ADHD brain.
## Quick start

```toml
[dependencies]
narrate-this = "0.1"
tokio = { version = "1", features = ["full"] }
```

```rust
// Sketch — the glob import and the `tts_provider`/`input` values below are
// placeholders; see the Builder API and Content sources sections for the real shape.
use narrate_this::*;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let pipeline = builder()
        .tts(tts_provider) // the only required stage
        .build()?;
    let output = pipeline.process(input).await?;
    Ok(())
}
```
## How the pipeline works

```
Content Source -> Narration -> Text Transforms -> TTS -> Media -> Audio Storage -> Video Render
```
Only TTS is required. Everything else is optional — skip content sourcing if you pass raw text, skip media if you just want audio, skip rendering if you don't need video.
## Content sources

```rust
// Variant names are from the SDK; the arguments shown are illustrative.

// Scrape and narrate an article
ArticleUrl("https://example.com/some-post".into())

// Search the web and narrate the results
SearchQuery("latest rust async news".into())

// Just narrate some text directly (no content provider needed)
Text("Hello from narrate-this".into())
```
## Builder API

The builder uses type-state to enforce valid configuration at compile time:

```rust
// Sketch — the provider values passed to each method are placeholders.
let pipeline = builder()
    // Content provider (optional — skip for raw text)
    .content(scraper)
    // Text transforms (chainable, applied in order)
    .text_transform(transform)
    // TTS provider (the only required piece)
    .tts(tts)
    // Everything below is optional, in any order
    .media(planner) // media planner — see "Media" section below
    .renderer(renderer)
    .audio_storage(storage)
    .cache(cache)
    .build()?;
```
## Media

The `.media()` builder method takes a `MediaPlanner` — a single trait that owns all media selection logic.
### Stock media only

Use `StockMediaPlanner` for keyword extraction + stock search (e.g. Pexels):

```rust
// Constructor arguments are illustrative.
.media(StockMediaPlanner::new(OpenAiKeywords::new(openai_key), PexelsSearch::new(pexels_key)))
```
### User-provided assets with AI matching

Use `LlmMediaPlanner` to provide your own images/videos with descriptions. An LLM matches them to narration chunks based on semantic relevance:

```rust
use narrate_this::LlmMediaPlanner; // exact import path may differ

// Sketch — `llm` and `assets` (your media plus text descriptions) are placeholders.
.media(LlmMediaPlanner::new(llm, assets))
```
Media sources can be URLs, local file paths, or raw bytes — the renderer handles all three.
## Processing

```rust
// `input` and `on_progress` are placeholders.

// Run the full pipeline
let output = pipeline.process(input).await?;

// With progress callbacks
let output = pipeline.process_with_progress(input, on_progress).await?;

// Or just parts of it
let text = pipeline.narrate(input).await?;         // narration only
let tts_result = pipeline.synthesize(text).await?; // TTS only
```
## Output
## Narration style

You can control how the LLM writes the narration:

```rust
// Sketch — the shape of `config` (tone, length, etc.) is illustrative.
let scraper = FirecrawlScraper::with_config(config);
```
## Background audio

```rust
// Field names are illustrative.
let config = RenderConfig {
    background_audio: Some("assets/music.mp3".into()),
    ..Default::default()
};
```
## Providers

Built-in:

| Provider | Service |
|---|---|
| `ElevenLabsTts` | ElevenLabs |
| `FirecrawlScraper` | Firecrawl |
| `OpenAiKeywords` / `OpenAiTransform` | OpenAI (gpt-4o-mini) |
| `PexelsSearch` | Pexels |
| `StockMediaPlanner` | Keywords + stock search |
| `LlmMediaPlanner` | AI asset matching + stock fallback |
| `FfmpegRenderer` | Local FFmpeg |
| `FsAudioStorage` | Local filesystem |
| `PgCache` | PostgreSQL (feature-gated: `pg-cache`) |
You can swap in your own by implementing the matching trait:
Traits: `TtsProvider`, `ContentProvider`, `MediaPlanner`, `KeywordExtractor`, `MediaSearchProvider`, `TextTransformer`, `AudioStorage`, `CacheProvider`, `VideoRenderer`.
## PostgreSQL cache

```toml
[dependencies]
narrate-this = { version = "0.1", features = ["pg-cache"] }
```

```rust
// Sketch — pool construction shown sqlx-style; adjust to your setup.
let pool = PgPool::connect(&database_url).await?;
let cache = PgCache::new(pool);

let pipeline = builder()
    .tts(tts)
    .cache(cache)
    .build()?;
```
## Prerequisites
- Rust 2024 edition (1.85+)
- FFmpeg on PATH for video rendering
- A Firecrawl instance for URL/search sources
- API keys for whichever providers you use
## Running the examples

```sh
# fill in your API keys, then run an example (the example name here is illustrative):
cargo run --example quick_start
```
## Error handling

All errors come back as `narrate_this::SdkError`, with variants for each stage (`Tts`, `Llm`, `MediaSearch`, `MediaPlanner`, `WebScraper`, etc.). Non-fatal errors (like a media search miss) are logged as warnings via `tracing` and won't stop the pipeline.
## License
MIT