ytx-cli 0.1.0

Extract YouTube transcripts from the terminal. Pipe-friendly, no API key needed.
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

A Rust CLI tool that fetches YouTube video transcripts. It scrapes the video page for the InnerTube API key, calls the InnerTube player API as an ANDROID client (to get caption URLs without PoToken requirements), then fetches and parses the caption XML. Output goes to stdout by default.
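The key-scraping step can be illustrated with a small, network-free sketch. It assumes the page embeds the key as `"INNERTUBE_API_KEY":"<key>"` and uses only the standard library; the function name `extract_api_key` is hypothetical and may not match what `src/main.rs` actually does.

```rust
/// Hypothetical helper: pull the InnerTube API key out of a video page body.
/// Assumption: the key appears as `"INNERTUBE_API_KEY":"<key>"` in the HTML.
fn extract_api_key(html: &str) -> Option<&str> {
    const MARKER: &str = "\"INNERTUBE_API_KEY\":";
    let start = html.find(MARKER)? + MARKER.len();
    // Skip optional whitespace after the colon, then expect the opening quote.
    let rest = html[start..].trim_start().strip_prefix('"')?;
    let end = rest.find('"')?;
    let key = &rest[..end];
    // Accept only non-empty keys made of letters, digits, `_` and `-`.
    let valid = !key.is_empty()
        && key
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '-');
    if valid {
        Some(key)
    } else {
        None
    }
}
```

Returning `Option` lets the caller turn a missing key into a clear error on stderr instead of sending a malformed request.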

## Build & Run

```bash
cargo build              # Debug build
cargo build --release    # Release build
ytx <URL_OR_ID>          # Print transcript to stdout
ytx <URL_OR_ID> -t       # Include timestamps
ytx <URL_OR_ID> -l ja    # Specify language (default: en)
ytx <URL_OR_ID> -o out.txt  # Write to file instead of stdout
```

## Architecture

Single-file application (`src/main.rs`) with this flow:

1. **CLI parsing**: `clap` derive API. Accepts a URL or video ID, an optional `-o` output path, a `-t` timestamps flag, and a `-l` language code (default `en`).
2. **Video ID extraction**: regex-based extraction supporting the `?v=`, `youtu.be/`, and `shorts/` URL formats. Bare 11-character IDs are passed through as-is.
3. **Transcript fetching** (`fetch_transcript`): an HTTP pipeline (fetch video page → extract `INNERTUBE_API_KEY` → POST to InnerTube player API with the ANDROID client → pick best caption track by language → fetch caption XML). Rejects URLs containing `exp=xpe` (PoToken-gated).
4. **XML parsing** (`parse_caption_xml`): parses both srv3 (`<p>` tags, `t` attribute in milliseconds) and legacy (`<text>` tags, `start` attribute in seconds) caption formats using `roxmltree`. Decodes HTML entities and strips inner tags.
5. **Output**: joins segment text (optionally with `[MM:SS]` or `[HH:MM:SS]` timestamps), then prints to stdout or writes to the file given by `-o`. Status messages and errors go to stderr.
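Steps 2 and 5 are pure string transformations, so they can be sketched without any network access. The sketch below is illustrative only: it uses plain string searches from the standard library instead of the regex the project uses, the names `is_video_id`, `extract_video_id`, and `format_timestamp` are hypothetical, and switching to `[HH:MM:SS]` once the offset reaches one hour is an assumption.

```rust
/// True if `s` looks like a bare video ID: 11 characters of [A-Za-z0-9_-].
fn is_video_id(s: &str) -> bool {
    s.len() == 11
        && s.bytes()
            .all(|b| b.is_ascii_alphanumeric() || b == b'_' || b == b'-')
}

/// Hypothetical std-only stand-in for the regex-based extraction in step 2.
fn extract_video_id(input: &str) -> Option<&str> {
    let input = input.trim();
    // Bare IDs are passed through as-is.
    if is_video_id(input) {
        return Some(input);
    }
    // In the supported URL formats, the ID directly follows one of these markers.
    for marker in ["?v=", "youtu.be/", "shorts/"] {
        if let Some(pos) = input.find(marker) {
            let rest = &input[pos + marker.len()..];
            // `get` returns None if fewer than 11 bytes remain
            // or the cut would split a multi-byte character.
            if let Some(candidate) = rest.get(..11) {
                if is_video_id(candidate) {
                    return Some(candidate);
                }
            }
        }
    }
    None
}

/// Formats a millisecond offset as `[MM:SS]`, or `[HH:MM:SS]` from one hour on.
/// Assumption: the hour field appears only when it is non-zero.
fn format_timestamp(ms: u64) -> String {
    let total_secs = ms / 1000;
    let hours = total_secs / 3600;
    let minutes = (total_secs % 3600) / 60;
    let seconds = total_secs % 60;
    if hours > 0 {
        format!("[{:02}:{:02}:{:02}]", hours, minutes, seconds)
    } else {
        format!("[{:02}:{:02}]", minutes, seconds)
    }
}
```

Working in milliseconds matches the srv3 `t` attribute; a legacy `start` value in seconds would need to be multiplied by 1000 before calling `format_timestamp`.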