yt-transcript-rs
yt-transcript-rs
is a Rust library for fetching and working with YouTube video transcripts. It allows you to retrieve transcripts in various languages, list available transcripts for a video, and fetch video details.

This project is heavily inspired by the Python module youtube-transcript-api originally developed by Jonas Depoix.
📢 Important Note (v0.1.8+): Due to YouTube's permanent API changes, this library now uses YouTube's internal InnerTube API exclusively for transcript fetching. This provides more reliable access than the traditional external API approach and requires no special configuration.
Table of Contents
Features
- Fetch transcripts from YouTube videos in various languages
- List all available transcripts for a video
- Retrieve translations of transcripts
- Get detailed information about YouTube videos
- Access video microformat data including available countries and embed information
- Retrieve streaming formats and quality options for videos
- Fetch all video information in a single request for optimal performance
- Support for proxy configuration and cookie authentication
- Customizable HTML processing with configurable link formatting
- Robust HTML entity handling and whitespace preservation
- InnerTube API Integration - Uses YouTube's internal API for reliable transcript access (v0.1.8+)
- Future-proof architecture that bypasses YouTube's external API limitations
Installation
Add yt-transcript-rs
to your Cargo.toml
:
cargo add yt-transcript-rs
Or manually add it to your Cargo.toml
:
[dependencies]
yt-transcript-rs = "0.1.8"
Usage
Fetch a transcript
use anyhow::Result;
use yt_transcript_rs::api::YouTubeTranscriptApi;
#[tokio::main]
async fn main() -> Result<()> {
let api = YouTubeTranscriptApi::new(None, None, None)?;
let video_id = "5MuIMqhT8DM";
let languages = &["en"];
let preserve_formatting = false;
println!("Fetching transcript for video ID: {}", video_id);
match api.fetch_transcript(video_id, languages, preserve_formatting).await {
Ok(transcript) => {
println!("Successfully fetched transcript!");
println!("Video ID: {}", transcript.video_id);
println!(
"Language: {} ({})",
transcript.language, transcript.language_code
);
println!("Is auto-generated: {}", transcript.is_generated);
println!("Number of snippets: {}", transcript.snippets.len());
println!("\nTranscript content:");
for (_i, snippet) in transcript.snippets.iter().take(5).enumerate() {
println!(
"[{:.1}-{:.1}s] {}",
snippet.start,
snippet.start + snippet.duration,
snippet.text
);
}
println!("... (truncated)");
}
Err(e) => {
println!("Failed to fetch transcript: {:?}", e);
}
}
Ok(())
}
List available transcripts
use anyhow::Result;
use yt_transcript_rs::api::YouTubeTranscriptApi;
#[tokio::main]
async fn main() -> Result<()> {
let api = YouTubeTranscriptApi::new(None, None, None)?;
let video_id = "arj7oStGLkU";
println!("Listing available transcripts for video ID: {}", video_id);
match api.list_transcripts(video_id).await {
Ok(transcript_list) => {
println!("Successfully retrieved transcript list!");
println!("Video ID: {}", transcript_list.video_id);
let mut count = 0;
let mut translatable_count = 0;
println!("\nAvailable transcripts:");
for transcript in &transcript_list {
count += 1;
let translatable = if transcript.is_translatable() {
translatable_count += 1;
"[translatable]"
} else {
""
};
println!(
"{}: {} ({}) {}",
count, transcript.language, transcript.language_code, translatable
);
if transcript.is_translatable() && count == 1 {
println!(" Available translations:");
for (i, lang) in transcript.translation_languages.iter().take(5).enumerate() {
println!(" {}: {} ({})", i + 1, lang.language, lang.language_code);
}
if transcript.translation_languages.len() > 5 {
println!(
" ... and {} more",
transcript.translation_languages.len() - 5
);
}
}
}
println!("\nTotal transcripts: {}", count);
println!("Translatable transcripts: {}", translatable_count);
}
Err(e) => {
println!("Failed to list transcripts: {:?}", e);
}
}
Ok(())
}
Fetch video details
use anyhow::Result;
use yt_transcript_rs::api::YouTubeTranscriptApi;
#[tokio::main]
async fn main() -> Result<()> {
println!("YouTube Video Details Example");
println!("------------------------------");
let api = YouTubeTranscriptApi::new(None, None, None)?;
let video_id = "arj7oStGLkU";
println!("Fetching video details for: {}", video_id);
match api.fetch_video_details(video_id).await {
Ok(details) => {
println!("\nVideo Details:");
println!("-------------");
println!("Video ID: {}", details.video_id);
println!("Title: {}", details.title);
println!("Author: {}", details.author);
println!("Channel ID: {}", details.channel_id);
println!("View Count: {}", details.view_count);
println!("Length: {} seconds", details.length_seconds);
println!("Is Live Content: {}", details.is_live_content);
if let Some(keywords) = details.keywords {
println!("\nKeywords:");
for (i, keyword) in keywords.iter().enumerate().take(10) {
println!(" {}: {}", i + 1, keyword);
}
if keywords.len() > 10 {
println!(" ... and {} more", keywords.len() - 10);
}
}
println!("\nThumbnails: {} available", details.thumbnails.len());
for (i, thumb) in details.thumbnails.iter().enumerate() {
println!(
" {}: {}x{} - {}",
i + 1,
thumb.width,
thumb.height,
thumb.url
);
}
println!("\nDescription:");
let description = if details.short_description.len() > 300 {
format!("{}...", &details.short_description[..300])
} else {
details.short_description.clone()
};
println!("{}", description);
}
Err(e) => {
println!("Failed to fetch video details: {:?}", e);
}
}
Ok(())
}
### Fetch microformat data
```rust
use anyhow::Result;
use yt_transcript_rs::api::YouTubeTranscriptApi;
#[tokio::main]
async fn main() -> Result<()> {
let api = YouTubeTranscriptApi::new(None, None, None)?;
let video_id = "arj7oStGLkU";
println!("Fetching microformat data for: {}", video_id);
match api.fetch_microformat(video_id).await {
Ok(microformat) => {
println!("\nMicroformat Data:");
println!("-----------------");
if let Some(title) = µformat.title {
println!("Title: {}", title);
}
if let Some(channel) = µformat.owner_channel_name {
println!("Channel: {}", channel);
}
if let Some(views) = µformat.view_count {
println!("View Count: {}", views);
}
if let Some(likes) = µformat.like_count {
println!("Like Count: {}", likes);
}
if let Some(category) = µformat.category {
println!("Category: {}", category);
}
if let Some(is_unlisted) = microformat.is_unlisted {
println!("Is Unlisted: {}", is_unlisted);
}
if let Some(is_family_safe) = microformat.is_family_safe {
println!("Is Family Safe: {}", is_family_safe);
}
if let Some(countries) = µformat.available_countries {
println!("Available in {} countries", countries.len());
}
if let Some(embed) = µformat.embed {
if let Some(iframe_url) = &embed.iframe_url {
println!("Embed URL: {}", iframe_url);
}
}
}
Err(e) => {
println!("Failed to fetch microformat data: {:?}", e);
}
}
Ok(())
}
Fetch streaming data
use anyhow::Result;
use yt_transcript_rs::api::YouTubeTranscriptApi;
#[tokio::main]
async fn main() -> Result<()> {
println!("YouTube Streaming Data Example");
println!("------------------------------");
let api = YouTubeTranscriptApi::new(None, None, None)?;
let video_id = "arj7oStGLkU";
println!("Fetching streaming data for: {}", video_id);
match api.fetch_streaming_data(video_id).await {
Ok(streaming_data) => {
println!("\nStreaming Data:");
println!("--------------");
println!("Expires in: {} seconds", streaming_data.expires_in_seconds);
println!("\nCombined Formats (video+audio): {}", streaming_data.formats.len());
println!("Adaptive Formats: {}", streaming_data.adaptive_formats.len());
if let Some(format) = streaming_data.formats.first() {
println!("\nSample format information:");
println!(" ITAG: {}", format.itag);
if let (Some(w), Some(h)) = (format.width, format.height) {
println!(" Resolution: {}x{}", w, h);
}
println!(" Bitrate: {} bps", format.bitrate);
println!(" MIME type: {}", format.mime_type);
}
let video_count = streaming_data.adaptive_formats
.iter()
.filter(|f| f.mime_type.starts_with("video/"))
.count();
let audio_count = streaming_data.adaptive_formats
.iter()
.filter(|f| f.mime_type.starts_with("audio/"))
.count();
println!("\nAdaptive format breakdown:");
println!(" Video formats: {}", video_count);
println!(" Audio formats: {}", audio_count);
}
Err(e) => {
println!("Failed to fetch streaming data: {:?}", e);
}
}
Ok(())
}
Fetch all video information at once
use anyhow::Result;
use yt_transcript_rs::api::YouTubeTranscriptApi;
#[tokio::main]
async fn main() -> Result<()> {
println!("YouTube Video Infos (All-in-One) Example");
println!("----------------------------------------");
let api = YouTubeTranscriptApi::new(None, None, None)?;
let video_id = "arj7oStGLkU";
println!("Fetching all video information in a single request...");
match api.fetch_video_infos(video_id).await {
Ok(infos) => {
println!("\nVideo Details:");
println!("Title: {}", infos.video_details.title);
println!("Author: {}", infos.video_details.author);
println!("Length: {} seconds", infos.video_details.length_seconds);
if let Some(category) = &infos.microformat.category {
println!("Category: {}", category);
}
if let Some(countries) = &infos.microformat.available_countries {
println!("Available in {} countries", countries.len());
}
println!("\nStreaming Options:");
println!("Video formats: {}", infos.streaming_data.formats.len());
println!("Adaptive formats: {}", infos.streaming_data.adaptive_formats.len());
let highest_res = infos.streaming_data.adaptive_formats
.iter()
.filter_map(|f| f.height)
.max()
.unwrap_or(0);
println!("Highest resolution: {}p", highest_res);
let transcript_count = infos.transcript_list.transcripts().count();
println!("\nAvailable transcripts: {}", transcript_count);
println!("\nAll data retrieved in a single network request!");
}
Err(e) => {
println!("Failed to fetch video information: {:?}", e);
}
}
Ok(())
}
Customize transcript formatting
use anyhow::Result;
use yt_transcript_rs::transcript_parser::TranscriptParser;
#[tokio::main]
async fn main() -> Result<()> {
let default_parser = TranscriptParser::new(false);
let formatted_parser = TranscriptParser::new(true);
let markdown_parser = TranscriptParser::with_config(false, "[{text}]({url})")?;
let html_parser = TranscriptParser::with_config(false, "<a href=\"{url}\">{text}</a>")?;
let html = r#"<p>Check out <a href="https://example.com">this link</a> for more info.</p>"#;
let default_text = default_parser.html_to_plain_text(html);
let formatted_text = formatted_parser.process_with_formatting(html);
let markdown_text = markdown_parser.html_to_plain_text(html);
let html_text = html_parser.html_to_plain_text(html);
println!("Default format: {}", default_text);
println!("Preserved HTML: {}", formatted_text);
println!("Markdown links: {}", markdown_text);
println!("HTML links: {}", html_text);
Ok(())
}
This feature is particularly useful when you need to:
- Format transcript links according to specific output needs
- Create transcripts for different display contexts (web, terminal, documents)
- Preserve certain HTML tags for styling while removing others
- Ensure proper entity decoding for symbols like apostrophes and quotes
Requirements
- Rust 1.56 or higher
tokio
for async execution
Advanced Usage
Using Proxies
You can configure the API to use a proxy server:
use anyhow::Result;
use yt_transcript_rs::api::YouTubeTranscriptApi;
use yt_transcript_rs::proxies::ProxyConfig;
#[tokio::main]
async fn main() -> Result<()> {
let proxy = Box::new(ProxyConfig {
url: "http://your-proxy-server:8080".to_string(),
username: Some("username".to_string()),
password: Some("password".to_string()),
});
let api = YouTubeTranscriptApi::new(Some(proxy), None, None)?;
let video_id = "5MuIMqhT8DM";
let languages = &["en"];
let transcript = api.fetch_transcript(video_id, languages, false).await?;
println!("Fetched transcript via proxy!");
Ok(())
}
Using Cookie Authentication
For videos that require authentication:
use anyhow::Result;
use yt_transcript_rs::api::YouTubeTranscriptApi;
use std::path::Path;
#[tokio::main]
async fn main() -> Result<()> {
let cookie_path = Path::new("path/to/cookies.txt");
let api = YouTubeTranscriptApi::new(Some(cookie_path.as_ref()), None, None)?;
let video_id = "private_video_id";
let languages = &["en"];
let transcript = api.fetch_transcript(video_id, languages, false).await?;
println!("Successfully authenticated and fetched transcript!");
Ok(())
}
Serializing and Deserializing Video Information
You can serialize video information for storage or transmission between systems. The library provides full support for serde
serialization and deserialization of the VideoInfos
struct and related types.
use anyhow::Result;
use reqwest::Client;
use yt_transcript_rs::api::YouTubeTranscriptApi;
use yt_transcript_rs::models::VideoInfos;
#[tokio::main]
async fn main() -> Result<()> {
let api = YouTubeTranscriptApi::new(None, None, None)?;
let video_id = "dQw4w9WgXcQ";
let infos = api.fetch_video_infos(video_id).await?;
let json = serde_json::to_string(&infos)?;
println!("Serialized data size: {} bytes", json.len());
std::fs::write("video_info.json", &json)?;
let json = std::fs::read_to_string("video_info.json")?;
let deserialized = serde_json::from_str::<VideoInfos>(&json)?;
println!("Title: {}", deserialized.video_details.title);
let client = Client::new();
if let Ok(transcript) = deserialized.transcript_list.find_transcript(&["en"]) {
let fetched = transcript.fetch(&client, false).await?;
println!("Transcript text: {}", fetched.text());
}
Ok(())
}
The serialization support makes it easy to:
- Cache video information to reduce YouTube API requests
- Send video data between microservices
- Store video information in databases
- Create backup systems for important videos
Note that when deserializing a Transcript
, you'll need to provide a Client
when calling fetch()
, as HTTP clients cannot be serialized.
Error Handling
The library provides specific error types for handling different failure scenarios:
use anyhow::Result;
use yt_transcript_rs::api::YouTubeTranscriptApi;
use yt_transcript_rs::errors::CouldNotRetrieveTranscriptReason;
#[tokio::main]
async fn main() -> Result<()> {
let api = YouTubeTranscriptApi::new(None, None, None)?;
let video_id = "5MuIMqhT8DM";
match api.fetch_transcript(video_id, &["en"], false).await {
Ok(transcript) => {
println!("Successfully fetched transcript with {} snippets", transcript.snippets.len());
Ok(())
},
Err(e) => {
match e.reason {
Some(CouldNotRetrieveTranscriptReason::NoTranscriptFound) => {
println!("No transcript found for this video");
},
Some(CouldNotRetrieveTranscriptReason::TranslationLanguageNotAvailable) => {
println!("The requested translation language is not available");
},
Some(CouldNotRetrieveTranscriptReason::VideoUnavailable) => {
println!("The video is unavailable or does not exist");
},
_ => println!("Other error: {:?}", e),
}
Err(e.into())
}
}
}
License
This project is licensed under the MIT License - see the LICENSE file for details.
Release Process
Releases are automated through GitHub Actions workflows. To create a new release:
- Update the version in
Cargo.toml
- Update the changelog in
CHANGELOG.md
- Create and push a new tag:
VERSION=$(grep '^version = ' Cargo.toml | cut -d'"' -f2)
git tag v$VERSION && git push origin v$VERSION
This will automatically trigger the release workflow which will:
- Build the crate
- Run tests
- Create a GitHub release
- Publish to crates.io
Contributing
Contributions are welcome! Here's how you can contribute:
- Fork the repository
- Create your feature branch (
git checkout -b feature/amazing-feature
)
- Commit your changes (
git commit -m 'Add some amazing feature'
)
- Push to the branch (
git push origin feature/amazing-feature
)
- Open a Pull Request
Development Setup
git clone https://github.com/akinsella/yt-transcript-rs.git
cd yt-transcript-rs
cargo build
cargo test
cargo clippy --all-targets --features ci --all-features -- -D warnings
cargo clippy --all-targets --features ci --all-features --fix -- -D warnings
cargo fmt
cargo fmt --all
Follow these steps to ensure cargo-husky is properly installed and configured:
1. **Install cargo-husky as a dev dependency**:
```bash
cargo add --dev cargo-husky
-
Configure cargo-husky in Cargo.toml:
Add the following to your Cargo.toml
file:
[dev-dependencies]
cargo-husky = { version = "1", features = ["precommit-hook", "run-cargo-fmt", "run-cargo-clippy", "run-cargo-check"] }
-
Verify the installation:
After adding cargo-husky, run cargo build
once to ensure the git hooks are installed:
cargo build
-
Verify the hooks were created:
Check if the pre-commit hook file was created:
ls -la .git/hooks/pre-commit
You should see a pre-commit file, and it should be executable.
-
Test the pre-commit hook:
Make a small change to any file, then try to commit it:
echo "// Test comment" >> src/lib.rs
git add src/lib.rs
git commit -m "Test commit"
If the hook is working correctly, it should run:
cargo fmt
to format the code
cargo check
to verify compilation
cargo clippy
to check for lints
-
Troubleshooting:
If the hooks aren't running:
- Make sure the hook file is executable:
chmod +x .git/hooks/pre-commit
- Try rebuilding the project:
cargo clean && cargo build
- Check the content of the pre-commit file to ensure it's correct
-
Customizing hook behavior:
You can customize the hook behavior by adding a .cargo-husky/hooks/pre-commit
file with your custom script. cargo-husky will use this file instead of generating its own.
-
Skipping hooks when needed:
In rare cases when you need to bypass the hooks, you can use:
git commit -m "Your message" --no-verify
However, this should be used sparingly and only in exceptional circumstances.
By following these steps, cargo-husky will enforce code quality standards on every commit, helping maintain a clean and consistent codebase.
Acknowledgments