# 🤝 Contributing to yt-dlp
Thank you for your interest in contributing! This guide explains our codebase conventions so that your code feels like it belongs here. Every rule below is already applied consistently across the codebase, so please follow them all to keep things uniform.
---
## ๐ Table of Contents
- [๐ Getting Started](#-getting-started)
- [๐๏ธ Project Architecture](#๏ธ-project-architecture)
- [โ๏ธ Code Style](#๏ธ-code-style)
- [๐ช Nesting depth](#-nesting-depth)
- [๐จ Error Handling](#-error-handling)
- [๐ง Builder Patterns](#-builder-patterns)
- [๐ฆ Model & Data Types](#-model--data-types)
- [๐งฌ Trait Design](#-trait-design)
- [๐ Shared State & Concurrency](#-shared-state--concurrency)
- [โก Async Programming](#-async-programming)
- [๐ Event System](#-event-system)
- [๐ฏ Feature Flags](#-feature-flags)
- [๐ Tracing & Logging](#-tracing--logging)
- [๐ Documentation](#-documentation)
- [๐ Contributing to media-seek](#-contributing-to-media-seek)
- [✅ Verification Checklist](#-verification-checklist)
---
## ๐ Getting Started
### Prerequisites
- **Rust** (edition 2024): install via [rustup](https://rustup.rs/)
- **Rust nightly** (for rustfmt): `rustup toolchain install nightly --component rustfmt`
- **cargo-hack**: `cargo install cargo-hack`
- **cargo-deny**: `cargo install cargo-deny`
- **cargo-machete** (needed for the unused-dependency check): `cargo install cargo-machete`
### Running the checks
Every PR must pass these commands:
```bash
# Lint all features combined (all backends in a single pass)
cargo clippy --workspace --all-features -- -D warnings
# Check formatting (requires nightly)
cargo +nightly fmt --all -- --check
# Run all doc-tests (workspace-wide)
cargo test --doc --workspace --all-features
# Check dependencies (licenses, advisories, bans)
cargo deny check
# Check for unused dependencies
cargo machete
```
### Branch workflow
1. Fork the repository and create a branch from `develop`
2. Make your changes following the guidelines below
3. Run the verification checks above
4. Open a PR against `develop`
---
## ๐๏ธ Project Architecture
The codebase is a Cargo workspace with two crates. Understanding the layout is essential before making changes:
```
yt-dlp/
├── Cargo.toml                ← workspace manifest ([workspace] + [package])
├── src/                      ← yt-dlp crate source
└── crates/
    └── media-seek/           ← standalone container index parsing crate
        ├── Cargo.toml
        └── src/
            ├── lib.rs        ← RangeFetcher trait + parse() dispatch
            ├── error.rs      ← Error enum + Result<T> alias
            ├── detect.rs     ← magic-byte format detection
            ├── index.rs      ← ContainerIndex, SegmentEntry, Inner
            ├── audio/        ← mp3, ogg, flac, pcm (wav+aiff), adts
            └── video/        ← mp4, webm, flv, avi, ts
```
The `yt-dlp` crate module hierarchy:
```
src/
├── lib.rs                      # Crate root: Downloader struct lives here (NOT in a submodule)
├── prelude.rs                  # 📤 Convenience re-exports for `use yt_dlp::prelude::*`
├── macros.rs                   # 🧩 Macros: youtube!, ytdlp_args!, install_libraries!, ternary!
├── error.rs                    # 🚨 Single unified Error enum + type Result<T>
│
├── client/                     # 🔧 Builder, download builder, proxy, deps, stream orchestration
│   ├── builder.rs              # DownloaderBuilder (fluent builder)
│   ├── download_builder.rs     # DownloadBuilder<'a> (fluent download API)
│   ├── proxy.rs                # ProxyConfig, ProxyType
│   ├── deps/                   # 📦 Auto-installation of yt-dlp & ffmpeg from GitHub releases
│   └── streams/                # 🧩 Format selection (VideoSelection trait), orchestration
│
├── download/                   # 📥 DownloadManager, Fetcher, segment-based parallel downloads
├── events/                     # EventBus, DownloadEvent, EventFilter, hooks, webhooks
├── executor/                   # ⚙️ Process runner, FfmpegArgs builder, temp-file+rename
├── extractor/                  # 📡 VideoExtractor trait, Youtube & Generic extractors
├── metadata/                   # 🏷️ MP3/MP4/FFmpeg/Lofty metadata writing, chapter injection
├── model/                      # Data types: Video, Format, Chapter, Playlist, Caption, etc.
│   ├── utils/                  # Serde helpers
│   └── selector.rs             # VideoQuality, AudioQuality, StoryboardQuality enums
├── cache/                      # VideoCache, DownloadCache, PlaylistCache (feature-gated)
│   └── backend/                # Backend trait + implementations (memory/moka, json, redb, redis)
├── live/                       # 🔴 Live recording/streaming (features: live-recording, live-streaming)
│   ├── hls.rs                  # HLS manifest parsing via m3u8-rs
│   ├── recording.rs            # Reqwest-based HLS segment recorder (primary)
│   └── ffmpeg_recording.rs     # FFmpeg-based recorder (fallback)
├── stats/                      # StatisticsTracker, GlobalSnapshot (feature: statistics)
└── utils/                      # 🛠️ fs, http, platform, retry, validation, url_expiry, subtitle
```
### ๐ Module conventions
| Convention | Example |
|---|---|
| Each directory has a `mod.rs` that declares submodules and re-exports public types | `pub use video::VideoCache;` in `cache/mod.rs` |
| `lib.rs` re-exports the most-used types to crate root | `pub use client::{DownloadBuilder, DownloaderBuilder};` |
| `prelude.rs` re-exports everything for basic usage | Feature-gated with `#[cfg(feature = "...")]` |
| Module-level `//!` doc comments on every `mod.rs` | Describes the module's purpose and architecture |
| Feature-gated modules in `lib.rs` | `#[cfg(feature = "statistics")] pub mod stats;` |
### ๐๏ธ Visibility rules
| Visibility | Used for | Example |
|---|---|---|
| `pub` | Types and methods exposed to library users | `pub fn fetch_video_infos(...)` |
| `pub(crate)` | All fields of `Downloader`, internal helpers | `pub(crate) youtube_extractor: Youtube` |
| Private | Implementation details | `fn audio_codec_for_mux(...)` |
> 💡 Builder struct fields are always **private**. `TypedBuilder` config struct fields are always **`pub`**.
---
## โ๏ธ Code Style
### ๐ Language
All comments, docs, variable names, error messages, and log messages must be in **English**. No exceptions.
### ๐ฅ Imports
```rust
// ✅ GOOD: All imports at the top of the file
use crate::error::Result;
use crate::model::Video;
use std::path::PathBuf;
#[cfg(target_os = "windows")]
use std::os::windows::process::CommandExt;
// ❌ BAD: Never import inside function bodies
fn my_function() {
    use std::collections::HashMap; // WRONG
}
```
> 🧩 **Exception**: inside `macro_rules!` definitions, `$crate::` paths may require local imports.
### ๐ท๏ธ Naming conventions
| Item | Convention | Examples |
|---|---|---|
| Variables & functions | `snake_case` | `download_video`, `is_ready` |
| Types & structs | `PascalCase` | `DownloaderBuilder`, `VideoQuality` |
| Constants | `SCREAMING_SNAKE_CASE` | `DEFAULT_RETRY_ATTEMPTS`, `FORMAT_URL_LIFETIME` |
| Constants prefix | Context prefix | `DEFAULT_`, `CONSERVATIVE_`, `BALANCED_`, `AGGRESSIVE_` |
| Booleans | Intent-driven | `is_ready`, `has_data`, `include_full_data` |
### ๐ Conditional logic
**No more than two raw conditions directly in an `if` (or `while`) guard.** When three or more sub-expressions are combined with `&&` or `||`, each sub-expression must first be bound to a short, descriptively-named `let` boolean before the guard. Boolean variable names must be short and intent-revealing: `is_year`, `is_endlist`, `is_timeout`, etc.
```rust
// ✅ single condition: OK
if probe.len() < 4 { … }
// ✅ two raw conditions combined: OK
if e.starts_with("HTTP 4") && !e.starts_with("HTTP 429") { … }
// ✅ named booleans combined: OK (required when ≥ 3 conditions)
let is_timeout = error.is_timeout();
let is_connect = error.is_connect();
let is_request = error.is_request();
if is_timeout || is_connect || is_request { … }
// ❌ three or more raw expressions inline: NOT OK
if error.is_timeout() || error.is_connect() || error.is_request() { … }
```
### ๐ซ Lint suppressions
`#[allow(…)]` attributes are **forbidden** in this codebase, with one explicit exception:
- `#[allow(clippy::large_enum_variant)]` on `DownloadEvent`: boxing all variants for one large variant would add unnecessary indirection throughout the event system.
**Fix the root cause instead of suppressing the lint:**
| Lint | Fix |
|---|---|
| `dead_code` | Remove the item, or gate with `#[cfg(feature = "…")]` |
| `unreachable_code` | Use `unreachable!("…")` or gate the fallback with `#[cfg(not(…))]` |
| `clippy::too_many_arguments` | Group related parameters into a dedicated struct |
| `unused_*` | Remove unused imports/variables, or prefix with `_` for intentional non-use |
### ๐ช Nesting depth
**Maximum two levels of nesting inside any function body.** Each loop (`for`, `while`, `loop`), conditional (`if`, `else if`, `match`), or closure that contains control flow counts as one level. Deeper nesting quickly pushes a function's [SonarCloud Cognitive Complexity](https://www.sonarsource.com/docs/CognitiveComplexity.pdf) past the enforced threshold of 15, which blocks your PR.
When a third level is needed, **extract the inner logic into a private helper function** that returns an `Option`, `Result`, or a dedicated struct.
```rust
// ❌ BAD: three levels of nesting (loop → if → if)
fn scan_tags(probe: &[u8], keyframes: &mut Vec<u64>) {
    while let Some(tag) = next_tag(probe) {           // level 1
        if tag.kind == TagKind::Video {               // level 2
            if tag.frame_type == FrameType::Key {     // level 3: NOT allowed
                keyframes.push(tag.offset);
            }
        }
    }
}
// ✅ GOOD: max two levels; the inner predicate is extracted
fn is_video_keyframe(tag: &Tag) -> bool {
    tag.kind == TagKind::Video && tag.frame_type == FrameType::Key
}
fn scan_tags(probe: &[u8], keyframes: &mut Vec<u64>) {
    while let Some(tag) = next_tag(probe) {           // level 1
        if is_video_keyframe(&tag) {                  // level 2
            keyframes.push(tag.offset);
        }
    }
}
```
The same rule applies to `match` arms that contain their own `if`/`loop`/`match`:
```rust
// ❌ BAD: match arm body itself opens a new level
match block_type {
    BlockType::StreamInfo => {
        if block_len >= MIN_SIZE { // level 3 when already inside a loop + match
            parse_stream_info(block);
        }
    }
}
// ✅ GOOD: delegate to a helper that handles the guard internally
match block_type {
    BlockType::StreamInfo => parse_stream_info(block), // helper does its own guard
}
```
| Aspect | Rule |
|---|---|
| Hard limit | 2 nesting levels per function |
| What counts | `for`, `while`, `loop`, `if`/`else if`/`else`, `match`, closures with control flow |
| Remedy | Extract inner body into a private `fn`, or use early-return / guard-clause patterns |
| SonarCloud | Max Cognitive Complexity per function: **15** |
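The early-return / guard-clause remedy listed in the table can be sketched as follows. This is a minimal illustration, not code from the repository: `Tag`, `TagKind`, and `collect_keyframes` are hypothetical names invented for the example.

```rust
/// Hypothetical tag kinds, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TagKind {
    Audio,
    Video,
}

/// Hypothetical parsed tag, for illustration only.
#[derive(Debug, Clone, Copy)]
struct Tag {
    kind: TagKind,
    is_keyframe: bool,
    offset: u64,
}

/// Collects the offsets of video keyframes.
///
/// The loop is level 1 and each guard is level 2, so the body never
/// reaches a third level of nesting.
fn collect_keyframes(tags: &[Tag]) -> Vec<u64> {
    let mut keyframes = Vec::new();
    for tag in tags {
        // Guard clause: skip non-video tags instead of wrapping the rest in an `if`.
        if tag.kind != TagKind::Video {
            continue;
        }
        // Guard clause: skip non-keyframes.
        if !tag.is_keyframe {
            continue;
        }
        keyframes.push(tag.offset);
    }
    keyframes
}
```

Each guard handles one rejection reason and leaves the "happy path" at the loop's own indentation level.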
### ๐ฏ Parameter types
Use the most appropriate type for public API parameters:
```rust
// ✅ GOOD: Flexible public API
pub fn new(url: impl Into<String>) -> Self { ... }
pub fn with_cookies(mut self, path: impl Into<PathBuf>) -> Self { ... }
pub fn input(mut self, path: impl AsRef<str>) -> Self { ... }
// ❌ BAD: Too restrictive
pub fn new(url: String) -> Self { ... }
pub fn new(url: &str) -> Self { ... }
```
For internal functions, use the most optimized type for the operations applied:
- `&str` if you only read the string
- `String` if you need ownership
- `&Path` if you only read the path
- `PathBuf` if you need ownership
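To show how the public and internal rules fit together, here is a small sketch under stated assumptions: `CookieRequest` and its helpers are hypothetical and exist only for this example. The public constructor takes `impl Into<…>`, while the private helpers borrow `&str` / `&Path` because they only read their input.

```rust
use std::path::{Path, PathBuf};

/// URL prefix that marks a secure request (named constant, no raw literal in logic).
const HTTPS_PREFIX: &str = "https://";
/// Expected extension of a cookie file in this sketch.
const COOKIE_EXTENSION: &str = "txt";

/// Hypothetical request type, for illustration only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CookieRequest {
    url: String,
    cookies: PathBuf,
}

impl CookieRequest {
    /// Public API: callers may pass `&str`, `String`, `&Path`, `PathBuf`, and so on.
    pub fn new(url: impl Into<String>, cookies: impl Into<PathBuf>) -> Self {
        Self { url: url.into(), cookies: cookies.into() }
    }

    /// Returns whether the URL uses HTTPS.
    pub fn is_secure(&self) -> bool {
        has_https_scheme(&self.url)
    }

    /// Returns whether the cookie path has the expected extension.
    pub fn has_cookie_file(&self) -> bool {
        has_cookie_extension(&self.cookies)
    }
}

/// Internal helper: only reads the string, so it borrows `&str`.
fn has_https_scheme(url: &str) -> bool {
    url.starts_with(HTTPS_PREFIX)
}

/// Internal helper: only reads the path, so it borrows `&Path`.
fn has_cookie_extension(path: &Path) -> bool {
    path.extension().is_some_and(|ext| ext == COOKIE_EXTENSION)
}
```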
### ๐งช Testing
There are **no `#[cfg(test)]` modules** in `src/`. No tests live in `tests/common/` (only shared helpers).
**Test harnesses**: three separate binaries under `tests/`, plus doctests:
| Harness | Command | Scope |
|---|---|---|
| Unit | `cargo test --test unit --all-features` | Pure logic, no I/O, no network |
| Integration | `cargo test --test integration --all-features` | wiremock servers, tempdir I/O, async flows |
| E2E | `cargo test --test e2e --all-features -- --test-threads=1` | Full download pipeline with wiremock |
| Doctests | `cargo test --doc --workspace --all-features` | Code examples in rustdoc |
**Directory conventions**: test directories mirror the `src/` module hierarchy:
```
tests/unit/model/         → matches src/model/
tests/unit/download/      → matches src/download/
tests/integration/cache/  → matches src/cache/
```
Create a subdirectory when a domain has ≥ 2 test files.
**Adding a new test:**
1. Create the test file in the appropriate subdirectory (e.g. `tests/unit/download/new_test.rs`)
2. Register it in the harness entry point (`tests/unit.rs`) with `#[path = "unit/download/new_test.rs"] mod new_test;`
3. Feature-gated tests use `#[cfg(feature = "...")]` on the module declaration in the entry point
**Conventions:**
- Test names follow `fn verb_noun_condition()` (e.g. `fn parse_format_returns_video_type()`)
- All test output goes to `tempfile::tempdir()`, never to project root
- Use `assert_matches!` for error variant checks, `pretty_assertions` for struct comparisons
- Mock servers use `wiremock::MockServer` (dev-dependency)
- Fixtures: JSON in `tests/fixtures/json/`, media in `tests/fixtures/media/`
- **Benchmarks**: `benches/benchmarks.rs` with [criterion](https://crates.io/crates/criterion)
- 🧪 **Integration examples**: `examples/` directory
### ๐ข Magic Numbers & Constants
**Never use raw numeric or byte literals in logic.** Every literal must be extracted to a named `const` at the top of the file.
```rust
// ✅ GOOD: Named constants with clear intent
/// ID3v2 header fixed size in bytes.
const ID3V2_HEADER_SIZE: usize = 10;
/// Maximum bytes to scan for the first sync word.
const SYNC_SEARCH_LIMIT: usize = 8192;
fn skip_id3(data: &[u8]) -> usize {
    if data.len() < ID3V2_HEADER_SIZE { return 0; }
    // ...
}
// ❌ BAD: What does 10 mean? What about 8192?
fn skip_id3(data: &[u8]) -> usize {
    if data.len() < 10 { return 0; }
    // ...
}
```
| Aspect | Rule |
|---|---|
| Location | File top, before any `fn` or `impl` |
| Naming | `SCREAMING_SNAKE_CASE` with context prefix (`DEFAULT_`, `BALANCED_`, etc.) |
| Lookup tables | Bitrate tables, sample rate tables: `const` arrays at file top |
| Magic bytes | `const EBML_MAGIC: &[u8] = &[0x1A, 0x45, 0xDF, 0xA5];` and never raw in conditionals |
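The lookup-table and magic-byte rows can be illustrated with a short sketch. `EBML_MAGIC` is taken from the table above; `MPEG1_SAMPLE_RATES_HZ`, `is_ebml`, and `sample_rate_hz` are hypothetical names chosen for this example and are not claimed to match the parsers in `media-seek`.

```rust
/// EBML magic bytes that open a Matroska/WebM stream.
const EBML_MAGIC: &[u8] = &[0x1A, 0x45, 0xDF, 0xA5];

/// MPEG-1 audio sample rates in Hz, indexed by the 2-bit
/// sampling-frequency field of the frame header (index 3 is reserved).
const MPEG1_SAMPLE_RATES_HZ: [u32; 3] = [44_100, 48_000, 32_000];

/// Returns whether the probe starts with the EBML magic bytes.
fn is_ebml(probe: &[u8]) -> bool {
    probe.starts_with(EBML_MAGIC)
}

/// Looks up a sample rate; reserved or out-of-range indices yield `None`.
fn sample_rate_hz(index: usize) -> Option<u32> {
    MPEG1_SAMPLE_RATES_HZ.get(index).copied()
}
```

Keeping the table as a `const` array means the reserved index falls out naturally as `None` instead of needing an extra raw comparison.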
### ๐ฆ Return Types (No Tuples)
**Never return tuples from functions.** Use a named struct instead, even for two fields.
```rust
// ✅ GOOD: Clear field semantics at call site
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ByteRange {
    pub start: u64,
    pub end: u64,
}
fn find_range(&self, time: f64) -> Option<ByteRange> {
    // ...
}
// ❌ BAD: Opaque meaning, easy to swap fields
fn find_range(&self, time: f64) -> Option<(u64, u64)> {
    // ...
}
```
| Aspect | Rule |
|---|---|
| Scope | Module-private structs are fine if only used internally |
| Derives | At minimum `Debug, Clone`; add `Copy, PartialEq, Eq` when applicable |
| Fields | Descriptive names that convey semantics |
### ๐ Function Call & Type Qualification
Qualify function calls with **at most one `::`**, and import deeper paths at the top of the file.
```rust
// ✅ GOOD: Import then use short paths
use reqwest::header::{self, HeaderMap, HeaderValue};
let mut headers = HeaderMap::new();
headers.insert(header::CONTENT_TYPE, HeaderValue::from_static("text/plain"));
// ❌ BAD: Double-qualified paths
let mut headers = reqwest::header::HeaderMap::new();
headers.insert(reqwest::header::CONTENT_TYPE, reqwest::header::HeaderValue::from_static("text/plain"));
```
| Pattern | Example |
|---|---|
| `Self::` for associated fns in `impl` | `Self::new()`, `Self::parse_header(data)` |
| `module::function()` | `detect::probe(data)` |
| `Type::method()` | `String::from("hello")` |
| Import heavily-used types directly | `use std::collections::HashMap;` then `HashMap::new()` |
---
## ๐จ Error Handling
We use a **single unified error type** in `src/error.rs`. Never introduce new error enums (except `HookError` which already exists for hook-specific failures).
### Rules
| Rule | Details |
|---|---|
| **One `Error` enum** | All variants in one enum, grouped by `// === Category ===` comment banners |
| **Type alias** | `pub type Result<T> = std::result::Result<T, Error>;`, imported as `use crate::error::Result;` |
| **Structured fields** | Every variant uses named fields (`operation`, `url`, `reason`, `path`, `source`), never just a string |
| **`#[source]`** | Always on the inner error field for proper chaining |
| **Helper constructors** | `Error::io(...)`, `Error::http(...)`: each logs `tracing::warn!`/`tracing::error!` before constructing |
| **`From` impls** | For `std::io::Error`, `reqwest::Error`, `serde_json::Error`, `JoinError`, `ZipError`: each logs with `"(automatic conversion)"` suffix |
| **Parameter style** | `impl Into<String>`, not concrete types |
| **Feature-gated** | `#[cfg(feature = "cache-redb")] Database { ... }`, `#[cfg(feature = "cache-redis")] Redis { ... }` |
| **No `anyhow`** | Always use the crate's own `Error` / `Result` |
### Example: Adding a new error variant
```rust
// In src/error.rs, add to the appropriate category section:
// ==================== Video & Format Errors ====================
/// My new error description.
#[error("Something failed for {video_id}: {reason}")]
MyNewError {
video_id: String,
reason: String,
},
```
And add a helper constructor:
```rust
pub fn my_new_error(video_id: impl Into<String>, reason: impl Into<String>) -> Self {
let video_id = video_id.into();
let reason = reason.into();
tracing::warn!(video_id = video_id, reason = reason, "Something failed");
Self::MyNewError { video_id, reason }
}
```
---
## ๐ง Builder Patterns
Two builder styles coexist. Use the right one for the right job:
### A) Manual builder (consuming `mut self`)
Used for: `DownloaderBuilder`, `DownloadBuilder`, `WebhookConfig`, `FfmpegArgs`
```rust
// ✅ Builder methods prefixed with `with_` and consuming `mut self`
pub fn with_timeout(mut self, timeout: Duration) -> Self {
    self.timeout = timeout;
    self
}
// ✅ Terminal method
pub async fn build(self) -> Result<Downloader> { ... }
```
| Aspect | Convention |
|---|---|
| Method prefix | `with_` (e.g. `with_args`, `with_timeout`, `with_proxy`, `with_cache`) |
| Self parameter | Always `mut self` (consuming), **never `&mut self`** |
| Terminal method | `.build()` or `.execute()` |
| Field visibility | Private |
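Put together, the four conventions in this table look roughly like the sketch below. `FetchOptions` and `FetchOptionsBuilder` are hypothetical types made up for the example (and `build()` is synchronous here to keep it dependency-free); they are not part of the crate's API.

```rust
use std::time::Duration;

/// Default timeout used when the caller does not set one.
const DEFAULT_TIMEOUT: Duration = Duration::from_secs(30);

/// Hypothetical product of the builder, for illustration only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FetchOptions {
    timeout: Duration,
    args: Vec<String>,
}

impl FetchOptions {
    /// Returns the configured timeout.
    pub fn timeout(&self) -> Duration {
        self.timeout
    }

    /// Returns the configured arguments.
    pub fn args(&self) -> &[String] {
        &self.args
    }
}

/// Hypothetical manual builder: private fields, `with_` methods, consuming `mut self`.
#[derive(Debug, Clone)]
pub struct FetchOptionsBuilder {
    timeout: Duration,
    args: Vec<String>,
}

impl FetchOptionsBuilder {
    /// Starts a builder with default values.
    pub fn new() -> Self {
        Self { timeout: DEFAULT_TIMEOUT, args: Vec::new() }
    }

    /// Replaces the timeout.
    pub fn with_timeout(mut self, timeout: Duration) -> Self {
        self.timeout = timeout;
        self
    }

    /// Appends one argument.
    pub fn with_arg(mut self, arg: impl Into<String>) -> Self {
        self.args.push(arg.into());
        self
    }

    /// Terminal method: consumes the builder and returns the product.
    pub fn build(self) -> FetchOptions {
        FetchOptions { timeout: self.timeout, args: self.args }
    }
}
```

Because every method takes `mut self` by value, a call chain such as `FetchOptionsBuilder::new().with_timeout(…).with_arg(…).build()` needs no intermediate `let mut` binding.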
### B) `TypedBuilder` derive
Used for: config structs (`ManagerConfig`, `RetryPolicy`, `ExpiryConfig`)
```rust
#[derive(Debug, Clone, TypedBuilder)]
pub struct ManagerConfig {
#[builder(default = SpeedProfile::default().max_concurrent_downloads())]
pub max_concurrent_downloads: usize,
}
```
| Aspect | Convention |
|---|---|
| Field visibility | `pub` |
| Defaults | `#[builder(default = ...)]` |
### C) Post-build mutation on `Downloader`
After `.build()`, use `set_*`/`add_*` methods (not `with_*`) to mutate the `Downloader` instance:
```rust
downloader.set_user_agent("my-agent");
downloader.set_timeout(Duration::from_secs(30));
downloader.set_args(vec!["--no-playlist".into()]);
downloader.add_arg("--flat-playlist");
downloader.set_cookies("cookies.txt");
downloader.set_cookies_from_browser("chrome");
downloader.set_netrc();
```
| Aspect | Convention |
|---|---|
| Self parameter | `&mut self` (borrowing), returning `&mut Self` for chaining |
| Prefix for replacing | `set_` (e.g. `set_cookies`, `set_user_agent`, `set_timeout`) |
| Prefix for appending | `add_` (e.g. `add_arg`) |
> 💡 **Don't confuse** builder `with_*` methods (consuming `mut self`, used before `.build()`) with post-build `set_*`/`add_*` methods (borrowing `&mut self`, used after `.build()`).
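For contrast with the consuming builder style, here is a minimal sketch of the post-build shape: `&mut self` in, `&mut Self` out. `Session` is a hypothetical stand-in for an already-built instance and is not the real `Downloader`.

```rust
/// Hypothetical already-built instance, for illustration only.
#[derive(Debug, Default)]
pub struct Session {
    user_agent: String,
    args: Vec<String>,
}

impl Session {
    /// Replaces the user agent (`set_` prefix: replacing).
    pub fn set_user_agent(&mut self, user_agent: impl Into<String>) -> &mut Self {
        self.user_agent = user_agent.into();
        self
    }

    /// Replaces the whole argument list (`set_` prefix: replacing).
    pub fn set_args(&mut self, args: Vec<String>) -> &mut Self {
        self.args = args;
        self
    }

    /// Appends one argument (`add_` prefix: appending).
    pub fn add_arg(&mut self, arg: impl Into<String>) -> &mut Self {
        self.args.push(arg.into());
        self
    }

    /// Returns the current user agent.
    pub fn user_agent(&self) -> &str {
        &self.user_agent
    }

    /// Returns the current arguments.
    pub fn args(&self) -> &[String] {
        &self.args
    }
}
```

Returning `&mut Self` lets callers chain mutations on an existing value without giving up ownership of it.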
---
## ๐ฆ Model & Data Types
### Standard derive sets
| Kind | Derives |
|---|---|
| **Simple enums** | `Debug, Clone, Copy, PartialEq, Eq, Hash, Serialize, Deserialize` + `Default` with `#[default]` |
| **Complex structs** (with `f64`) | `Debug, Clone, PartialEq, Serialize, Deserialize`, with manual `Eq`/`Hash` |
| **Simple structs** (no floats) | `Debug, Clone, PartialEq, Eq, Serialize, Deserialize` |
### Serde patterns
| Pattern | Used for |
|---|---|
| `#[serde(flatten)]` | Struct composition (e.g. `Format` flattens `CodecInfo`, `VideoResolution`, etc.) |
| `#[serde(rename = "...")]` | Field name mapping from JSON (`"timestamp"`, `"acodec"`) |
| `#[serde(rename_all = "snake_case")]` | Enum variant renaming |
| `#[serde(default)]` | Optional collections and fields |
| `#[serde(other)]` | `Unknown` variant for forward compatibility |
| `#[serde(skip)]` | Derived/internal fields (e.g. `video_id` on `Format`) |
| `json_none` deserializer | Turns `"none"` strings to `Option::None` (in `model/utils/serde.rs`) |
| `#[serde_as(deserialize_as = "DefaultOnNull")]` | From `serde_with`, for nullable JSON fields |
| Custom `Deserialize` visitor | Polymorphic types (e.g. `DrmStatus` accepts bool or string) |
| `ordered_float::OrderedFloat<f64>` | Only when `f64` needs `Hash`/`Eq` |
### ๐จ๏ธ Display format
**Always** use the format `TypeName(key=value, key=value)`:
```rust
impl fmt::Display for Video {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
write!(f, "Video(id={}, title={:?}, channel={:?}, formats={})",
self.id, self.title, self.channel.as_deref().unwrap_or("Unknown"), self.formats.len())
}
}
```
| Case | Rule |
|---|---|
| Only essential fields | Never full serialization |
| `Option` fields | `as_deref().unwrap_or("none")` or `unwrap_or("unknown")` |
| Enum constant variants | `f.write_str("VariantName")` |
| Enum variants with fields | `write!(f, "Variant(key={})", val)` |
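The enum and `Option` rows can be sketched as below. `TransferState` and `Track` are hypothetical types invented for this example; only the output format follows the rules above.

```rust
use std::fmt;

/// Placeholder shown when an optional field is absent.
const UNKNOWN: &str = "unknown";

/// Hypothetical enum, for illustration only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TransferState {
    Queued,
    Failed { attempt: u32, reason: String },
}

impl fmt::Display for TransferState {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Constant variant: just the variant name.
            Self::Queued => f.write_str("Queued"),
            // Variant with fields: `Variant(key=value, key=value)`.
            Self::Failed { attempt, reason } => {
                write!(f, "Failed(attempt={}, reason={:?})", attempt, reason)
            }
        }
    }
}

/// Hypothetical struct with an optional field, for illustration only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Track {
    pub id: String,
    pub artist: Option<String>,
}

impl fmt::Display for Track {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let artist = self.artist.as_deref().unwrap_or(UNKNOWN);
        write!(f, "Track(id={}, artist={})", self.id, artist)
    }
}
```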
### ๐ Custom `Hash` implementations
Hash **only identity fields**, not all struct fields:
```rust
impl Hash for Video {
fn hash<H: Hasher>(&self, state: &mut H) {
self.id.hash(state);
self.title.hash(state);
self.channel.hash(state);
self.channel_id.hash(state);
}
}
```
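One consequence worth spelling out: a manual `Hash` must stay consistent with `Eq` (values that compare equal must hash equally), so both should look at the same identity fields. The sketch below assumes a hypothetical `Clip` type whose identity is its `id` alone; that choice is an assumption of the example, not a description of `Video`.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical type whose identity is `id`; `view_count` is volatile data.
#[derive(Debug, Clone)]
pub struct Clip {
    pub id: String,
    pub view_count: u64,
}

impl PartialEq for Clip {
    fn eq(&self, other: &Self) -> bool {
        // Equality uses the same identity field as `Hash` below.
        self.id == other.id
    }
}

impl Eq for Clip {}

impl Hash for Clip {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Only the identity field is hashed.
        self.id.hash(state);
    }
}

/// Counts distinct clips according to their identity.
fn count_unique(clips: &[Clip]) -> usize {
    clips.iter().collect::<HashSet<_>>().len()
}
```

With this setup, two snapshots of the same clip taken at different times collapse into one entry in a `HashSet` or `HashMap` key.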
---
## ๐งฌ Trait Design
### Which pattern to use?
| Pattern | When to use | Examples |
|---|---|---|
| `#[async_trait]` | Trait used as `dyn Trait` (trait objects) | `VideoExtractor`, `EventHook` |
| RPITIT (`impl Future + Send`) | Dispatched via concrete enum, never `dyn` | Cache backend traits |
| `DynClone + clone_trait_object!` | Need to clone trait objects | `EventHook` |
| `Downcast + impl_downcast!` | Runtime downcasting of trait objects | `VideoExtractor` |
### `#[async_trait]` example
```rust
#[async_trait]
pub trait VideoExtractor: Downcast + Send + Sync + fmt::Debug {
async fn fetch_video(&self, url: &str) -> Result<Video>;
fn name(&self) -> ExtractorName;
fn supports_url(&self, url: &str) -> bool;
}
impl_downcast!(VideoExtractor);
```
### RPITIT example
```rust
pub trait VideoBackend: Send + Sync + std::fmt::Debug {
fn get(&self, url: &str) -> impl Future<Output = Result<Option<Video>>> + Send;
fn put(&self, url: String, video: Video) -> impl Future<Output = Result<()>> + Send;
}
```
> Trait method declarations carry **full rustdoc**; implementations may add only a brief clarifying comment.
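To make the "RPITIT plus concrete enum dispatch" row more tangible, here is a dependency-free sketch. Everything in it is hypothetical (`TitleBackend`, `MapBackend`, `EmptyBackend`, `AnyBackend`), and the tiny `block_on` helper exists only so the example can run without an async runtime; real code in this crate would run on `tokio`.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical backend trait using return-position `impl Trait` in traits.
pub trait TitleBackend: Send + Sync {
    /// Looks up the cached title for `url`.
    fn get(&self, url: &str) -> impl Future<Output = Option<String>> + Send;
}

/// Backend that answers from an in-memory map.
#[derive(Debug, Default)]
pub struct MapBackend {
    titles: HashMap<String, String>,
}

impl MapBackend {
    /// Creates a backend from pre-filled entries.
    pub fn new(titles: HashMap<String, String>) -> Self {
        Self { titles }
    }
}

impl TitleBackend for MapBackend {
    async fn get(&self, url: &str) -> Option<String> {
        self.titles.get(url).cloned()
    }
}

/// Backend that never has an entry.
#[derive(Debug, Default)]
pub struct EmptyBackend;

impl TitleBackend for EmptyBackend {
    async fn get(&self, _url: &str) -> Option<String> {
        None
    }
}

/// Concrete enum used for dispatch instead of `dyn TitleBackend`.
#[derive(Debug)]
pub enum AnyBackend {
    Map(MapBackend),
    Empty(EmptyBackend),
}

impl AnyBackend {
    /// Forwards to the selected backend.
    pub async fn get(&self, url: &str) -> Option<String> {
        match self {
            Self::Map(backend) => backend.get(url).await,
            Self::Empty(backend) => backend.get(url).await,
        }
    }
}

/// Waker that does nothing; sufficient because these futures never suspend.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal executor for this sketch only: polls the future until it is ready.
fn block_on<F: Future>(future: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut context = Context::from_waker(&waker);
    let mut future = pin!(future);
    loop {
        if let Poll::Ready(output) = future.as_mut().poll(&mut context) {
            return output;
        }
    }
}
```

The enum costs one `match` per call but avoids boxing the returned future, which is the trade-off the table describes for traits that are never used as `dyn`.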
---
## ๐ Shared State & Concurrency
### Primitives used
| Primitive | Used for |
|---|---|
| `Arc<reqwest::Client>` | Shared HTTP client with connection pooling |
| `Arc<Mutex<...>>` | Mutable shared state (download queues, task maps, next_id counter) |
| `Arc<Semaphore>` | Concurrency limit for parallel downloads |
| `Arc<AtomicU64>` / `Arc<AtomicBool>` | Lock-free counters and flags |
| `Arc<RwLock<...>>` | Read-heavy shared state (hook registry, stats, webhooks) |
| `Arc<DownloadEvent>` | Events in broadcast channel (efficient cloning) |
| `Arc<dyn Fn(...) + Send + Sync>` | Callbacks and filter predicates |
| `tokio_util::sync::CancellationToken` | Graceful shutdown |
### โ ๏ธ Important rules
| Topic | Rule |
|---|---|
| Async locks | Use `tokio::sync::Mutex` and `tokio::sync::RwLock` |
| Sync locks | `std::sync::Mutex` **only** for progress counters and non-async contexts |
| Lock safety | **Never** hold a `tokio` lock across `.await` points |
| Simple counters | Prefer `Arc<AtomicU64>` over `Arc<Mutex<u64>>` |
| Caches on Downloader | `Option<Arc<VideoCache>>` |
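The "simple counters" rule can be illustrated with plain threads. This is a sketch under stated assumptions: `count_chunks`, `record_chunks`, and both constants are hypothetical, and OS threads stand in for the async tasks the crate actually uses.

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Number of worker threads in this sketch.
const WORKER_COUNT: u64 = 4;
/// Number of chunks each worker reports.
const CHUNKS_PER_WORKER: u64 = 250;

/// Records this worker's chunks on the shared lock-free counter.
fn record_chunks(counter: &AtomicU64) {
    for _ in 0..CHUNKS_PER_WORKER {
        counter.fetch_add(1, Ordering::Relaxed);
    }
}

/// Spawns the workers and returns the total number of chunks counted.
fn count_chunks() -> u64 {
    let counter = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..WORKER_COUNT)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || record_chunks(&counter))
        })
        .collect();
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    counter.load(Ordering::Relaxed)
}
```

No lock is taken at any point, so there is nothing to hold across a suspension point and nothing to poison if a worker panics.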
---
## โก Async Programming
| Concern | Convention |
|---|---|
| Runtime | `tokio` (multi-threaded) |
| Task spawning | `tokio::spawn` for concurrency |
| Multiple tasks | `tokio::select!` for managing cancellations |
| Structured concurrency | Prefer scoped tasks and clean cancellation paths |
| Timeouts | `tokio::time::timeout` with kill on timeout |
| Blocking work | Offload to `tokio::task::spawn_blocking` (used for `serde_json::from_reader`, CPU-intensive parsing) |
| Time operations | `tokio::time::sleep` and `tokio::time::interval` |
| HTTP | `reqwest` with `Arc<Client>` connection pooling |
### Channels
| Channel | Used for |
|---|---|
| `tokio::sync::mpsc` | Webhook delivery queue (bounded, backpressure) |
| `tokio::sync::broadcast` | Event broadcasting to multiple subscribers |
| `tokio::sync::oneshot` | One-time task communication |
---
## ๐ Event System
The event system lives in `src/events/` and follows a three-phase delivery pattern:
### Architecture
| Component | Description |
|---|---|
| `EventBus` | Wraps `broadcast::Sender<Arc<DownloadEvent>>` |
| `DownloadEvent` | Large enum in which **all variants use named fields** (no tuple variants) |
| `EventFilter` | Predicate-based with `Vec<Arc<dyn Fn(&DownloadEvent) -> bool + Send + Sync>>` |
| `HookRegistry` | `Arc<RwLock<Vec<Box<dyn EventHook>>>>` |
| `simple_hook!` | Macro to create hooks from closures |
### Event emission order (in `Downloader::emit_event()`)
1. 🪝 **Hooks**: with timeout (`#[cfg(feature = "hooks")]`)
2. 📡 **Webhooks**: non-blocking (`#[cfg(feature = "webhooks")]`)
3. 📢 **Broadcast bus**: always
### Adding a new event variant
```rust
// In DownloadEvent, always use named fields:
// ✅ GOOD
MyNewEvent {
    download_id: u64,
    reason: String,
},
// ❌ BAD: No tuple variants
MyNewEvent(u64, String),
```
---
## ๐ฏ Feature Flags
### Available features
| Feature | Description | Extra dependency |
|---|---|---|
| `hooks` | Rust event callbacks | None |
| `webhooks` | HTTP event delivery | None |
| `statistics` | Real-time analytics | None |
| `cache-memory` *(default)* | In-memory Moka cache | `moka` |
| `cache-json` | JSON file backend | None |
| `cache-redb` | Embedded redb backend | `redb` |
| `cache-redis` | Distributed Redis backend | `redis` |
| `live-recording` | Live stream recording (HLS) | `m3u8-rs` |
| `live-streaming` | Live fragment streaming (HLS) | `m3u8-rs` |
| `rustls` | TLS backend | `reqwest/rustls` |
| `hickory-dns` | Async DNS resolver | `reqwest/hickory-dns` |
| `profiling` | Heap profiler | `dhat` |
### โ๏ธ `cache` cfg is emitted by `build.rs`
The `cache` cfg is **not** a Cargo feature. It is a custom `cfg` emitted by `build.rs` when any cache backend (`cache-memory`, `cache-json`, `cache-redb`, or `cache-redis`) is enabled. Users cannot activate it directly, and it is invisible in `Cargo.toml`. Use `#[cfg(cache)]` to guard code that requires any cache backend.
### Backend selection
`build.rs` emits `persistent_cache` when any of `cache-json`, `cache-redb`, or `cache-redis` is enabled. Multiple persistent features may be active simultaneously: the `multiple_persistent_backends` cfg and its associated `compile_error!` have been removed.
When exactly one persistent feature is compiled in, `CacheConfig::persistent_backend` is auto-deduced and may be left as `None`. When more than one is compiled in, `persistent_backend` **must** be set explicitly to a `PersistentBackendKind` variant; leaving it `None` causes `CacheLayer::from_config` to return `Error::AmbiguousCacheBackend` at runtime.
```rust
use yt_dlp::prelude::*;
// Multiple backends compiled in, so pick one at runtime:
let config = CacheConfig::builder()
.cache_dir("cache")
.persistent_backend(PersistentBackendKind::Redb) // required when multiple compiled in
.build();
```
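For readers unfamiliar with build-script cfgs, the mechanism behind `cache` and `persistent_cache` can be sketched as follows. This is an illustrative reconstruction, not the repository's actual `build.rs`: the function names are hypothetical, and the only facts relied on are Cargo's documented behavior (enabled features are exposed to build scripts as `CARGO_FEATURE_<NAME>` environment variables, and `cargo::rustc-cfg=<name>` enables a custom cfg).

```rust
use std::env;

/// Environment variables Cargo sets for each cache feature when it is enabled.
const CACHE_FEATURE_VARS: [&str; 4] = [
    "CARGO_FEATURE_CACHE_MEMORY",
    "CARGO_FEATURE_CACHE_JSON",
    "CARGO_FEATURE_CACHE_REDB",
    "CARGO_FEATURE_CACHE_REDIS",
];

/// Subset of the variables above that correspond to persistent backends.
const PERSISTENT_FEATURE_VARS: [&str; 3] = [
    "CARGO_FEATURE_CACHE_JSON",
    "CARGO_FEATURE_CACHE_REDB",
    "CARGO_FEATURE_CACHE_REDIS",
];

/// Returns the custom cfg names to emit, given a predicate that reports
/// whether a feature variable is set.
fn cfgs_to_emit(is_enabled: impl Fn(&str) -> bool) -> Vec<&'static str> {
    let mut cfgs = Vec::new();
    if CACHE_FEATURE_VARS.iter().copied().any(|var| is_enabled(var)) {
        cfgs.push("cache");
    }
    if PERSISTENT_FEATURE_VARS.iter().copied().any(|var| is_enabled(var)) {
        cfgs.push("persistent_cache");
    }
    cfgs
}

/// What a build script's `main` would do in this sketch: declare the custom
/// cfgs, then enable the ones that apply to the current feature set.
fn emit_cache_cfgs() {
    println!("cargo::rustc-check-cfg=cfg(cache)");
    println!("cargo::rustc-check-cfg=cfg(persistent_cache)");
    for cfg in cfgs_to_emit(|var| env::var_os(var).is_some()) {
        println!("cargo::rustc-cfg={cfg}");
    }
}
```

Separating the decision (`cfgs_to_emit`) from the printing keeps the logic testable without setting environment variables.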
### Conditional compilation patterns
```rust
// Module-level guard for all cache code (cfg emitted by build.rs)
#[cfg(cache)]
// Backend-specific modules
#[cfg(feature = "cache-json")]
pub mod json;
// Persistent backend guard (any of json/redb/redis)
#[cfg(persistent_cache)]
// Feature-gated struct fields
#[cfg(feature = "hooks")]
pub(crate) hook_registry: Option<events::HookRegistry>,
```
### โ Forbidden patterns
- **Never use `#[cfg(...)]` on function parameters.** It makes function signatures unreadable and call sites overly complex. If a parameter is feature-dependent, either feature-gate the entire function, or use a config struct / builder pattern where the specific field is feature-gated.
---
## ๐ Tracing & Logging
Tracing is an **unconditional dependency**: every important function must have tracing.
### Rules at a glance
| Rule | Details |
|---|---|
| Macro style | Always fully-qualified: `tracing::debug!(...)`. **Never import the macros** |
| No `#[instrument]` | Never use the `#[instrument]` attribute |
| Structured fields | `key = value`, `key = ?value` (Debug), `key = %value` (Display) |
| No interpolation | Never `tracing::debug!("msg {}", var)`; always use structured fields |
### Log levels
| Level | When to use | Emoji prefix |
|---|---|---|
| `trace` | Hot paths, data transforms (rare; prefer deleting) | ✅ Yes |
| `debug` | Function entry/exit, parameters, config, internal ops | ✅ Yes |
| `info` | Key milestones (download start/end, fetch, install, shutdown) | ✅ Yes |
| `warn` | Recoverable failures, retries, fallbacks | ❌ No emoji |
| `error` | Unrecoverable per-item failures | ❌ No emoji |
### ๐จ Emoji prefixes
Every `trace`/`debug`/`info` message **must** start with one domain emoji:
| Emoji | Domain |
|---|---|
| 📦 | Install / dependencies |
| 📡 | Fetch / extract |
| 📥 | Download |
| 🎬 | Combine / mux |
| โ๏ธ | Postprocess / ffmpeg |
| 🏷️ | Metadata |
| 💬 | Subtitle |
| 🖼️ | Thumbnail |
| ๐ | Playlist |
| ✅ | Success / completion |
| ๐ | Retry / update |
| 🔧 | Config / setup / builder |
| ๐ | Cache / lookup |
| ⚙️ | Internal / utility |
| ๐ | Statistics |
| ๐ | Events |
| 🧩 | Format selection |
| ๐ | Shutdown |
### Example
```rust
// ✅ GOOD
tracing::debug!(url = %url, timeout = ?timeout, "📥 Starting download");
tracing::info!(video_id = video_id, formats = formats.len(), "📡 Video fetched");
tracing::warn!(url = %url, attempt = attempt, "Retry after failure");
// ❌ BAD
tracing::debug!("Starting download for {}", url); // No interpolation
tracing::info!("Video fetched"); // No structured fields
tracing::warn!("⚠️ Retry"); // No emoji on warn
```
### What NOT to trace
- ❌ Trivial getters/setters that just return or set a field
- ❌ Pure transforms (`to_ffmpeg_name`, `is_empty`, enum-to-string)
- ❌ Simple constant lookups / match on enum returning a value
---
## ๐ Documentation
Every public function, method, and trait method must have a **rustdoc comment**:
### Template
```rust
/// Brief one-line description.
///
/// Optional extended description.
///
/// # Arguments
///
/// * `param` - Description
///
/// # Errors
///
/// Returns an error if ...
///
/// # Returns
///
/// Description of return value.
///
/// # Examples
///
/// ```rust,no_run
/// # use yt_dlp::prelude::*;
/// # #[tokio::main]
/// # async fn main() -> Result<(), Box<dyn std::error::Error>> {
/// let downloader = Downloader::builder(libraries, "output").build().await?;
/// # Ok(())
/// # }
/// ```
```
### Section rules
| Section | Include when |
|---|---|
| `# Arguments` | Only if params beyond `&self`/`&mut self` |
| `# Errors` | Only if returns `Result` |
| `# Returns` | Only if returns a value (not `()`) |
| `# Examples` | Main public API entry points (`Downloader::new`, `download`, `fetch`, etc.) |
### Additional rules
| Item | Requirement |
|---|---|
| Trait methods | Full rustdoc on the **trait declaration**; impls may add only a brief comment |
| Getters | Minimum one-liner + `# Returns` |
| Setters | Minimum one-liner + `# Arguments` |
| Builder methods | Minimum one-liner + `# Arguments` |
| Examples | Use `no_run` or `ignore` for network/binary-dependent code |
---
## โ๏ธ Process Execution
The crate runs external processes (`yt-dlp`, `ffmpeg`) through a controlled abstraction:
| Component | Location | Role |
|---|---|---|
| `Executor` | `src/executor/mod.rs` | Wraps `tokio::process::Command` with piped I/O and timeout |
| `ProcessOutput` | `src/executor/process.rs` | `{ stdout, stderr, code }` |
| `FfmpegArgs` | `src/executor/ffmpeg.rs` | Fluent builder: `.input()`, `.codec_copy()`, `.args()`, `.output()`, `.build()` |
| `run_ffmpeg_with_tempfile()` | `src/executor/ffmpeg.rs` | Temp file + rename pattern for atomic writes |
### Key patterns
- ⏱️ **Timeout**: `tokio::time::timeout` + `process.kill()` on timeout
- 🪟 **Windows**: `command.creation_flags(0x08000000)` (CREATE_NO_WINDOW) behind `#[cfg(target_os = "windows")]`
- **Temp + rename**: FFmpeg writes to a temp file, then renames atomically. Never write directly to the final output
- 🧵 **CPU-heavy parsing**: `tokio::task::spawn_blocking` for `serde_json::from_reader` and other CPU-intensive work
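The temp + rename pattern is easy to get subtly wrong, so here is a minimal standard-library sketch of the idea. It is not the implementation of `run_ffmpeg_with_tempfile()`: `write_atomically`, `temp_path_for`, and the `.part` suffix are assumptions made for the example, and it returns `std::io::Result` rather than the crate's own `Result`.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Suffix appended to the final path while the file is being written.
const TEMP_SUFFIX: &str = ".part";

/// Builds the temporary path next to the final output, so that both live
/// on the same filesystem and the rename does not need to copy data.
fn temp_path_for(output: &Path) -> PathBuf {
    let mut name = output.as_os_str().to_owned();
    name.push(TEMP_SUFFIX);
    PathBuf::from(name)
}

/// Writes `contents` to a temp file, then renames it over `output`.
///
/// Readers never observe a half-written `output`: they see either the old
/// file (or nothing) or the complete new one.
fn write_atomically(output: &Path, contents: &[u8]) -> io::Result<()> {
    let temp_path = temp_path_for(output);
    let mut file = fs::File::create(&temp_path)?;
    file.write_all(contents)?;
    // Flush data to disk before the rename makes the file visible.
    file.sync_all()?;
    fs::rename(&temp_path, output)
}
```

This sketch leaves the temp file behind if writing fails midway; a production version would also clean it up on the error path.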
---
## ๐งฉ Macros
Defined in `src/macros.rs` and `src/events/hooks.rs`:
| Macro | Purpose |
|---|---|
| `youtube!($yt_dlp, $ffmpeg, $output)` | Convenience `Downloader` constructor |
| `ytdlp_args![...]` | Args builder (string list or key-value pairs) |
| `install_libraries!($dir)` | Async binary installation |
| `ternary!($cond, $true, $false)` | Ternary operator |
| `simple_hook!` | Create an `EventHook` from a closure |
All macros must use `$crate::` fully-qualified paths for robustness. The `use` inside `macro_rules!` bodies is the **only** exception to the "imports at module top" rule.
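As a rough idea of what a helper like `ternary!` involves, here is a hypothetical re-implementation written from its signature alone. The real macro in `src/macros.rs` may differ; `quality_label` is an invented caller. This particular macro expands to plain control flow, so it needs no paths at all; a macro that names crate items would reference them as `$crate::…`.

```rust
/// Hypothetical ternary helper: evaluates to `$when_true` if `$condition`
/// holds, otherwise to `$when_false`. Only the selected branch is evaluated.
macro_rules! ternary {
    ($condition:expr, $when_true:expr, $when_false:expr) => {
        if $condition { $when_true } else { $when_false }
    };
}

/// Example caller, for illustration only.
fn quality_label(is_hd: bool) -> &'static str {
    ternary!(is_hd, "hd", "sd")
}
```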
---
## ๐ Contributing to media-seek
`crates/media-seek/` is a standalone crate published independently to [crates.io](https://crates.io/crates/media-seek). Changes to it follow the same code conventions as the main crate, with a few important constraints.
### Constraints
| Constraint | Details |
|---|---|
| **No feature flags** | All formats are always compiled in, with no conditional compilation inside `media-seek` |
| **No `reqwest`** | The crate is transport-agnostic. Callers implement `RangeFetcher`. |
| **No `serde`** | No serialization, pure parsing only |
| **No `async_trait`** | `RangeFetcher` uses RPITIT (`impl Future + Send`), not `#[async_trait]` |
| **No tuples** | `ByteRange { start, end }` instead of `(u64, u64)` |
| **Named constants** | All magic numbers (sync bytes, header sizes, bitrate tables) as `const` at file top |
| **dedup safety** | `dedup_by_key` only after sorting by the **same key**; re-sort after dedup if needed |
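The dedup-safety row exists because `Vec::dedup_by_key` only removes *consecutive* elements with equal keys. A minimal sketch of the safe sequence, using a hypothetical `Segment` type (not the crate's `SegmentEntry`):

```rust
/// Hypothetical index entry, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Segment {
    pub time_ms: u64,
    pub offset: u64,
}

/// Removes entries that point at the same byte offset, then restores time order.
fn dedup_by_offset(mut segments: Vec<Segment>) -> Vec<Segment> {
    // Sort by the SAME key used for dedup so equal offsets become adjacent.
    segments.sort_by_key(|segment| segment.offset);
    segments.dedup_by_key(|segment| segment.offset);
    // Offset order is not necessarily time order, so re-sort afterwards.
    segments.sort_by_key(|segment| segment.time_ms);
    segments
}
```

Because `sort_by_key` is stable, the entry that survives for each offset is the one that came first in the input.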
### Where to make changes
| Change | Location |
|---|---|
| Audio format parser | `crates/media-seek/src/audio/` (`mp3.rs`, `ogg.rs`, `flac.rs`, `pcm.rs`, `adts.rs`) |
| Video format parser | `crates/media-seek/src/video/` (`mp4.rs`, `webm.rs`, `flv.rs`, `avi.rs`, `ts.rs`) |
| Format detection | `crates/media-seek/src/detect.rs` |
| Index data types | `crates/media-seek/src/index.rs` |
| Error handling | `crates/media-seek/src/error.rs` |
| Public API | `crates/media-seek/src/lib.rs` |
### Tracing conventions
Every `pub(crate) fn parse()` / `pub(crate) async fn parse()` must have entry and success tracing:
```rust
// At function start:
tracing::debug!(probe_len = probe.len(), "⚙️ Parsing <Format> stream");
// Just before each successful return:
tracing::debug!(segments = result.len(), "✅ <Format> index parsed");
```
Use `⚙️` for internal operations and `✅` for success, same as the main crate. No emoji on `warn!` or `error!`.
### Checking your changes
```bash
# media-seek standalone lint
cargo clippy -p media-seek -- -D warnings
# Run media-seek unit + integration tests
cargo test --test unit --all-features -- media_seek
cargo test --test integration --all-features -- media_seek
# Doc-tests (both crates)
cargo test --doc --workspace --all-features
```
---
## ✅ Verification Checklist
Before submitting your PR, make sure:
- [ ] `cargo clippy --workspace --all-features -- -D warnings`: zero warnings
- [ ] `cargo +nightly fmt --all -- --check`: properly formatted
- [ ] 🧪 `cargo test --test unit --all-features`: all unit tests pass
- [ ] 🧪 `cargo test --test integration --all-features`: all integration tests pass
- [ ] 🧪 `cargo test --test e2e --all-features -- --test-threads=1`: all E2E tests pass
- [ ] 🧪 `cargo test --doc --workspace --all-features`: all doc-tests pass
- [ ] `cargo deny check`: no dependency issues
- [ ] 🧹 `cargo machete`: no unused dependencies
- [ ] All new public items have rustdoc following the template
- [ ] All tracing uses structured fields + emoji prefix
- [ ] 🚨 Errors use the existing `Error` enum with structured fields
- [ ] 📥 All `use` imports are at the top of the file
- [ ] 🔢 No magic numbers: all literals extracted to named `const` at file top
- [ ] 📦 No tuple return types: use named structs instead
- [ ] No double-qualified paths: import types and use short names
- [ ] All text (comments, docs, logs) is in English
- [ ] 🪜 No function exceeds 2 nesting levels: extract deeper logic into private helpers
---
<div align="center">
<strong>Thank you for contributing!</strong>
<br>
<sub>If you have questions, open a <a href="https://github.com/boul2gom/yt-dlp/discussions">Discussion</a>. We're happy to help.</sub>
</div>