<p align="center">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="assets/captube-aperture-dark.svg">
<img src="assets/captube-aperture.svg" alt="captube" width="160">
</picture>
</p>
<h1 align="center">captube</h1>
<p align="center">Turn any YouTube video into slides.</p>
<p align="center">
<a href="https://crates.io/crates/captube"><img src="https://img.shields.io/crates/v/captube.svg?style=flat-square" alt="crates.io"></a>
<a href="https://docs.rs/captube"><img src="https://img.shields.io/docsrs/captube?style=flat-square" alt="docs.rs"></a>
<a href="https://crates.io/crates/captube"><img src="https://img.shields.io/crates/d/captube.svg?style=flat-square" alt="downloads"></a>
<a href="#license"><img src="https://img.shields.io/crates/l/captube.svg?style=flat-square" alt="license"></a>
</p>
---
captube takes a YouTube lecture URL and gives you back a PDF where every
page is one unique slide, captured from the video itself.
## Install
```bash
cargo install captube
```
Runtime dependencies (not bundled):
- `ffmpeg` and `ffprobe` — on `PATH`
- `yt-dlp` — on `PATH`
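
Since these tools are not bundled, it can help to confirm they are reachable before a long run. The sketch below is one way to do that check in Rust using only the standard library. The helper names `find_in_dirs` and `missing_tools` are hypothetical and not part of captube. It assumes a Unix-like system where the executables have no file extension.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Look for `tool` in every directory of a PATH-style string.
/// Hypothetical helper; assumes executables have no extension (Unix-like).
fn find_in_dirs(tool: &str, path_var: &OsStr) -> Option<PathBuf> {
    env::split_paths(path_var)
        .map(|dir| dir.join(tool))
        .find(|candidate| candidate.is_file())
}

/// Return the names of the tools that could not be found.
fn missing_tools(tools: &[&str], path_var: &OsStr) -> Vec<String> {
    tools
        .iter()
        .filter(|tool| find_in_dirs(tool, path_var).is_none())
        .map(|tool| tool.to_string())
        .collect()
}

fn main() {
    // The three runtime dependencies listed above.
    let required = ["ffmpeg", "ffprobe", "yt-dlp"];
    let path_var = env::var_os("PATH").unwrap_or_default();

    let missing = missing_tools(&required, &path_var);
    if missing.is_empty() {
        println!("all runtime dependencies found on PATH");
    } else {
        println!("missing from PATH: {}", missing.join(", "));
    }
}
```

The check only tests that a file with the right name exists; it does not verify that the file is executable or that its version is recent enough.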
## Use
```bash
captube 'https://www.youtube.com/watch?v=<VIDEO_ID>' -o slides.pdf
```
All options:
```
captube <URL> [OPTIONS]

  -o, --output <PATH>        Output PDF path [default: output.pdf]
      --scene-threshold <F>  ffmpeg scene score cut-off (0.0-1.0) [default: 0.30]
      --fps <F>              Sampling fps used during scene scanning [default: 2.0]
      --max-width <U32>      Maximum px width of embedded frames [default: 1280]
      --dedup-threshold <F>  Mean pixel diff (0-255) to consider frames
                             the same slide; raise for fewer pages,
                             lower to keep subtler slide variations
                             [default: 20.0]
      --keep-workdir         Keep intermediate files for inspection
  -v, --verbose              Print per-frame dedup decisions
```
## How it works
1. **Download**: `yt-dlp` fetches a video-only mp4 at ≤720p.
2. **Keyframe dump**: `ffmpeg -skip_frame nokey` decodes only keyframes
   (about one per GOP). Modern H.264 encoders usually place keyframes on
   scene boundaries, so in practice these cover the real slide changes,
   plus duplicates for slides that outlast a single GOP.
3. **Perceptual dedup**: each keyframe is reduced to a 256×256 grayscale
   thumbnail and compared to the previous kept frame by mean absolute
   difference. Frames that differ only by mouse-cursor motion collapse
   into one.
4. **Settle re-extract**: every remaining keyframe is re-extracted via
   `-ss pts+0.8`, i.e. 0.8 s after the keyframe's timestamp. This bypasses
   a decoder quirk where `-skip_frame nokey` occasionally hands out
   corrupt-looking frames at cross-fade boundaries, and it also lands on
   the stable post-transition frame if the keyframe happened to fall
   mid-fade.
5. **Final dedup + PDF**: a small-threshold pass collapses any frames
   whose settled versions converged onto the same slide; `printpdf`
   writes one page per remaining frame.
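
The dedup passes in steps 3 and 5 can be illustrated with a minimal sketch of mean-absolute-difference filtering. This is not captube's actual code: the function names are hypothetical, the frames here are tiny 4-pixel buffers rather than 256×256 thumbnails, and treating a frame as new only when the difference is strictly greater than the threshold is an assumption.

```rust
/// Mean absolute difference between two equally sized 8-bit grayscale buffers.
/// The result lies in the range 0.0..=255.0.
fn mean_abs_diff(a: &[u8], b: &[u8]) -> f64 {
    assert_eq!(a.len(), b.len(), "thumbnails must have the same size");
    assert!(!a.is_empty(), "thumbnails must not be empty");
    let total: u64 = a
        .iter()
        .zip(b)
        .map(|(&x, &y)| u64::from(x.abs_diff(y)))
        .sum();
    total as f64 / a.len() as f64
}

/// Return the indices of the frames to keep. Each frame is compared to the
/// most recently kept frame, not to its immediate predecessor.
/// Assumption: a frame counts as a new slide when the difference is
/// strictly greater than `threshold`.
fn dedup_indices(frames: &[Vec<u8>], threshold: f64) -> Vec<usize> {
    let mut kept: Vec<usize> = Vec::new();
    for (i, frame) in frames.iter().enumerate() {
        let is_new = match kept.last() {
            None => true, // the first frame is always kept
            Some(&last) => mean_abs_diff(&frames[last], frame) > threshold,
        };
        if is_new {
            kept.push(i);
        }
    }
    kept
}

fn main() {
    // Toy 4-pixel "thumbnails"; a real one would hold 256 * 256 bytes.
    let frames = vec![
        vec![10, 10, 10, 10],     // slide A
        vec![10, 10, 10, 30],     // slide A with a small local change
        vec![200, 200, 200, 200], // slide B
        vec![200, 200, 190, 200], // slide B with a small local change
    ];
    let kept = dedup_indices(&frames, 20.0);
    println!("kept frame indices: {:?}", kept);
}
```

Comparing against the last kept frame rather than the previous frame means a slow drift across many near-identical frames still triggers a new page once the accumulated change exceeds the threshold.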
On a 58-minute lecture the full pipeline (download → PDF) runs in ~17s
on a modern x86_64 box.
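
For readers who want to reproduce steps 2 and 4 by hand, the sketch below builds plausible `ffmpeg` argument lists for a keyframe-only dump and a settled single-frame re-extract. The exact invocations captube uses are not shown in this README, so the helper names, file names, and the choice of `-fps_mode vfr` and `-frames:v 1` are assumptions. Only the `-skip_frame nokey` option and the `-ss pts+0.8` seek come from the description above.

```rust
/// Arguments for decoding only keyframes into numbered images (step 2).
/// Hypothetical helper; `out_pattern` is an image2 pattern such as "kf_%05d.png".
fn keyframe_dump_args(input: &str, out_pattern: &str) -> Vec<String> {
    [
        "-skip_frame", "nokey", // decoder option, so it must precede -i
        "-i", input,
        "-fps_mode", "vfr", // emit one image per decoded frame; older ffmpeg builds use -vsync vfr
        out_pattern,
    ]
    .iter()
    .map(|s| s.to_string())
    .collect()
}

/// Arguments for re-extracting one frame shortly after a keyframe (step 4).
/// `keyframe_pts` and `settle_offset` are in seconds.
fn settle_args(input: &str, keyframe_pts: f64, settle_offset: f64, output: &str) -> Vec<String> {
    let seek = format!("{:.3}", keyframe_pts + settle_offset);
    vec![
        "-ss".into(), seek, // input seeking: placed before -i
        "-i".into(), input.into(),
        "-frames:v".into(), "1".into(), // write a single video frame
        "-y".into(), // overwrite the output file if it exists
        output.into(),
    ]
}

fn main() {
    let dump = keyframe_dump_args("video.mp4", "kf_%05d.png");
    println!("ffmpeg {}", dump.join(" "));

    // A keyframe at 12.5 s, re-extracted 0.8 s later as described in step 4.
    let settle = settle_args("video.mp4", 12.5, 0.8, "slide.png");
    println!("ffmpeg {}", settle.join(" "));
}
```

The argument lists can be passed to `std::process::Command::new("ffmpeg").args(...)` once `ffmpeg` is on `PATH`.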
## License
Licensed under either of
- Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE))
- MIT license ([LICENSE-MIT](LICENSE-MIT))
at your option.