narrate-this 0.2.2

SDK for converting text and articles into narrated audio-visual content

A Rust SDK that turns text, URLs, or search queries into narrated videos — complete with TTS, captions, and stock visuals.

  • Build a pipeline from text content to rendered video in a single call
  • Pluggable providers — swap any stage by implementing a trait

Here's a cringe demo made with it:

https://github.com/user-attachments/assets/a164e9d6-2fcf-4fd6-905f-0f0d001474d2

I watch random videos while I code: the news, or whatever else is on in the background. I built this for an automated pipeline in another personal app, which reads in real time from RSS feeds I'm interested in and generates this kind of content to satisfy my ADHD brain.

Quick start

[dependencies]
narrate-this = "0.2"
tokio = { version = "1", features = ["full"] }

use narrate_this::{
    ContentPipeline, ContentSource, ElevenLabsConfig, ElevenLabsTts,
    FfmpegRenderer, FirecrawlScraper, FsAudioStorage, OpenAiConfig,
    OpenAiKeywords, PexelsSearch, RenderConfig, StockMediaPlanner,
};

#[tokio::main]
async fn main() -> narrate_this::Result<()> {
    let pipeline = ContentPipeline::builder()
        .content(FirecrawlScraper::new("http://localhost:3002"))
        .tts(ElevenLabsTts::new(ElevenLabsConfig {
            api_key: "your-elevenlabs-key".into(),
            ..Default::default()
        }))
        .media(StockMediaPlanner::new(
            OpenAiKeywords::new(OpenAiConfig {
                api_key: "your-openai-key".into(),
                ..Default::default()
            }),
            PexelsSearch::new("your-pexels-key"),
        ))
        .renderer(FfmpegRenderer::new(), RenderConfig::default())
        .audio_storage(FsAudioStorage::new("./output"))
        .build()?;

    let output = pipeline
        .process(ContentSource::ArticleUrl {
            url: "https://example.com/article".into(),
            title: Some("My Article".into()),
        })
        .await?;

    println!("Video: {}", output.video_path.unwrap());
    Ok(())
}

How the pipeline works

Content Source -> Narration -> Text Transforms -> TTS -> Media -> Audio Storage -> Video Render

Only TTS is required. Everything else is optional — skip content sourcing if you pass raw text, skip media if you just want audio, skip rendering if you don't need video.
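
For example, here's a minimal audio-only sketch reusing the types from the quick start. No content provider, media planner, or renderer is configured, so raw text goes in and narrated audio comes out (and video_path stays None):

```rust
use narrate_this::{
    ContentPipeline, ContentSource, ElevenLabsConfig, ElevenLabsTts, FsAudioStorage,
};

#[tokio::main]
async fn main() -> narrate_this::Result<()> {
    // TTS is the only required stage; everything else is skipped.
    let pipeline = ContentPipeline::builder()
        .tts(ElevenLabsTts::new(ElevenLabsConfig {
            api_key: "your-elevenlabs-key".into(),
            ..Default::default()
        }))
        .audio_storage(FsAudioStorage::new("./output"))
        .build()?;

    let output = pipeline
        .process(ContentSource::Text("Hello from narrate-this.".into()))
        .await?;

    // No renderer configured, so output.video_path is None here.
    println!("Audio: {}", output.audio_path.unwrap());
    Ok(())
}
```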

Content sources

// Scrape and narrate an article
ContentSource::ArticleUrl {
    url: "https://example.com/article".into(),
    title: Some("Optional title hint".into()),
}

// Search the web and narrate the results
ContentSource::SearchQuery("latest Rust async developments".into())

// Just narrate some text directly (no content provider needed)
ContentSource::Text("Your text to narrate...".into())

Builder API

The builder uses type-state to enforce valid configuration at compile time:

ContentPipeline::builder()
    // Content provider (optional — skip for raw text)
    .content(FirecrawlScraper::new("http://localhost:3002"))

    // Text transforms (chainable, applied in order)
    .text_transform(OpenAiTransform::new(
        &openai_key,
        "Rewrite in a casual podcast style",
    ))

    // TTS provider (the only required piece)
    .tts(ElevenLabsTts::new(ElevenLabsConfig {
        api_key: tts_key,
        voice_id: "custom-voice-id".into(),  // default: "Gr7mLjPA3HhuWxZidxPW"
        speed: 1.2,
        ..Default::default()
    }))

    // Everything below is optional, in any order

    // Media planner — see "Media" section below
    .media(StockMediaPlanner::new(keywords_provider, search_provider))

    .renderer(FfmpegRenderer::new(), RenderConfig {
        output_path: "./output.mp4".into(),
        audio_tracks: vec![
            AudioTrack::new("./background.mp3").volume(0.15),
        ],
        ..Default::default()
    }.pix_fmt("yuv420p"))
    .audio_storage(FsAudioStorage::new("./audio_cache"))
    .cache(my_cache_provider)
    .build()?;

Media

The .media() builder method takes a MediaPlanner — a single trait that owns all media selection logic.

Stock media only

Use StockMediaPlanner for keyword extraction + stock search (e.g. Pexels):

.media(StockMediaPlanner::new(
    OpenAiKeywords::new(OpenAiConfig {
        api_key: "your-openai-key".into(),
        ..Default::default()
    }),
    PexelsSearch::new("your-pexels-key"),
))

User-provided assets with AI matching

Use LlmMediaPlanner to provide your own images/videos with descriptions. An LLM matches them to narration chunks based on semantic relevance:

use narrate_this::{LlmMediaPlanner, MediaAsset, MediaFallback, OpenAiConfig};

.media(
    LlmMediaPlanner::new(OpenAiConfig {
        api_key: "your-openai-key".into(),
        ..Default::default()
    })
    .assets(vec![
        MediaAsset::image("./hero.jpg", "A rocket launching into space"),
        MediaAsset::video("https://example.com/demo.mp4", "App demo walkthrough"),
        MediaAsset::image_bytes(screenshot_bytes, "Dashboard screenshot"),
    ])
    // Optional: fall back to stock search for unmatched chunks
    .stock_search(
        OpenAiKeywords::new(OpenAiConfig {
            api_key: "your-openai-key".into(),
            ..Default::default()
        }),
        PexelsSearch::new("your-pexels-key"),
    )
    .fallback(MediaFallback::StockSearch) // default
    .allow_reuse(true)                    // same asset can appear multiple times
    .max_reuse(Some(2)),                  // but at most twice
)

Media sources can be URLs, local file paths, or raw bytes — the renderer handles all three.

Processing

// Run the full pipeline
let output = pipeline.process(source).await?;

// With progress callbacks
let output = pipeline.process_with_progress(source, |event| {
    match event {
        PipelineProgress::NarrationStarted => println!("Scraping..."),
        PipelineProgress::TtsComplete { audio_bytes, caption_count } => {
            println!("Audio: {audio_bytes} bytes, {caption_count} captions");
        }
        PipelineProgress::RenderComplete { ref path } => {
            println!("Video saved to {path}");
        }
        _ => {}
    }
}).await?;

// Or just parts of it
let text = pipeline.narrate(source).await?;           // narration only
let tts_result = pipeline.synthesize("Text").await?;   // TTS only

Output

pub struct ContentOutput {
    pub narration: String,
    pub audio: Vec<u8>,                    // MP3
    pub captions: Vec<CaptionSegment>,     // word-level timing
    pub media_segments: Vec<MediaSegment>, // source: MediaSource (URL, file path, or bytes)
    pub audio_path: Option<String>,        // if audio storage configured
    pub video_path: Option<String>,        // if renderer configured
}

Narration style

You can control how the LLM writes the narration:

let scraper = FirecrawlScraper::with_config(FirecrawlConfig {
    base_url: "http://localhost:3002".into(),
    style: NarrationStyle::default()
        .role("podcast host")
        .persona("a friendly tech enthusiast")
        .tone("Casual and upbeat")
        .length("3-5 paragraphs, 60-120 seconds when read aloud")
        .structure("Open with a hook, then dive into details"),
    ..Default::default()
});

Background audio

let config = RenderConfig {
    output_path: "./output.mp4".into(),
    audio_tracks: vec![
        AudioTrack::new("./music.mp3")
            .volume(0.15)       // 0.0–1.0, default 0.3
            .start_at(2000)     // delay start by 2s
            .no_loop(),         // loops by default
    ],
    ..Default::default()
};

FFmpeg configuration

Customize encoding settings via RenderConfig. All fields are optional and fall back to sensible defaults:

let config = RenderConfig {
    output_path: "./output.mp4".into(),
    ..Default::default()
}
.video_codec("libx265")           // default: "libx264"
.audio_codec("libopus")           // default: "aac"
.preset("slow")                   // default: "fast"
.crf(18)                          // default: ffmpeg default (23 for x264)
.pix_fmt("yuv420p")               // broad player compatibility
.subtitle_style("FontSize=32,PrimaryColour=&H00FFFF&,Bold=1")
.extra_output_args(["-movflags", "+faststart"]);
Field              Default                                 FFmpeg flag
video_codec        "libx264"                               -c:v
audio_codec        "aac"                                   -c:a
preset             "fast"                                  -preset
crf                none                                    -crf
pix_fmt            none                                    -pix_fmt
subtitle_style     "FontSize=24,PrimaryColour=&HFFFFFF&"   subtitles force_style
extra_output_args  []                                      any flags before the output path

extra_output_args is a catch-all for anything not covered above (e.g. -tune, -b:v, -movflags).
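
A short sketch of that catch-all, following the builder pattern shown above (the specific flag values here are just illustrative):

```rust
use narrate_this::RenderConfig;

// Flags are appended verbatim before the output path.
let config = RenderConfig {
    output_path: "./output.mp4".into(),
    ..Default::default()
}
.extra_output_args(["-tune", "film", "-b:v", "2M", "-movflags", "+faststart"]);
```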

Providers

Built-in:

Provider                            Service
ElevenLabsTts                       ElevenLabs
FirecrawlScraper                    Firecrawl
OpenAiKeywords / OpenAiTransform    OpenAI (gpt-4o-mini)
PexelsSearch                        Pexels
StockMediaPlanner                   Keywords + stock search
LlmMediaPlanner                     AI asset matching + stock fallback
FfmpegRenderer                      Local FFmpeg
FsAudioStorage                      Local filesystem
PgCache                             PostgreSQL (feature-gated: pg-cache)

You can swap in your own by implementing the matching trait:

use async_trait::async_trait;
use narrate_this::{Result, TtsProvider, TtsResult};

struct MyTtsProvider;

#[async_trait]
impl TtsProvider for MyTtsProvider {
    async fn synthesize(&self, text: &str) -> Result<TtsResult> {
        // call your own TTS backend here
        todo!()
    }
}

Traits: TtsProvider, ContentProvider, MediaPlanner, KeywordExtractor, MediaSearchProvider, TextTransformer, AudioStorage, CacheProvider, VideoRenderer.

PostgreSQL cache

[dependencies]
narrate-this = { version = "0.2", features = ["pg-cache"] }

let pool = sqlx::PgPool::connect("postgres://localhost/narrate").await?;
let cache = PgCache::new(pool);

let pipeline = ContentPipeline::builder()
    .tts(my_tts)
    .cache(cache)
    .build()?;

Prerequisites

  • Rust 2024 edition (1.85+)
  • FFmpeg on PATH for video rendering
  • A Firecrawl instance for URL/search sources
  • API keys for whichever providers you use

Running the examples

cp examples/.env.example examples/.env
# fill in your API keys

cargo run --example basic
cargo run --example local_tts

Error handling

All errors come back as narrate_this::SdkError with variants for each stage (Tts, Llm, MediaSearch, MediaPlanner, WebScraper, etc.). Non-fatal errors (like a media search miss) are logged as warnings via tracing and won't stop the pipeline.
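
A hedged sketch of handling the result. Only the variant names come from the list above; the payload shape in the Tts arm is an assumption and may differ in practice:

```rust
use narrate_this::SdkError;

// `pipeline` and `source` as in the quick-start example.
match pipeline.process(source).await {
    Ok(output) => println!("done: {:?}", output.video_path),
    // Assumption: Tts is a tuple variant; adjust the pattern to the real shape.
    Err(err @ SdkError::Tts(..)) => eprintln!("TTS stage failed: {err}"),
    Err(err) => eprintln!("pipeline failed: {err}"),
}
```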

License

MIT