pub struct TranscriptParser { /* private fields */ }
§TranscriptParser
Parses YouTube transcript XML data into structured transcript snippets.
This parser handles YouTube’s XML format for transcripts and can:
- Extract text content, timing information, and duration
- Optionally preserve specified HTML formatting tags
- Remove unwanted HTML tags
§Usage Example
use yt_transcript_rs::transcript_parser::TranscriptParser;
// Create a parser that strips all formatting
let parser = TranscriptParser::new(false);
// Or create a parser that preserves certain formatting tags (bold, italic, etc.)
let formatting_parser = TranscriptParser::new(true);
// Parse XML transcript data
let xml = r#"
<transcript>
<text start="0.0" dur="1.0">This is a transcript</text>
<text start="1.0" dur="1.5">With multiple entries</text>
</transcript>
"#;
let snippets = parser.parse(xml).unwrap();
§Implementations
impl TranscriptParser
pub fn with_config(preserve_formatting: bool, link_format: &str) -> Result<Self, Error>
Creates a new transcript parser with additional configuration options.
§Parameters
- preserve_formatting: If true, certain HTML formatting tags (like bold and italic) are kept in the transcript. If false, all HTML tags are removed.
- link_format: A format string for rendering links. It must contain the {text} and {url} placeholders. For example, “{text} ({url})” renders a link with the text Google pointing to https://google.com as “Google (https://google.com)”.
§Returns
A Result containing a new TranscriptParser instance configured according to the given options, or an Error if the parser cannot be constructed.
§Example
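A minimal, std-only sketch of the link_format contract described above, assuming that a format missing either placeholder should be rejected (whether with_config itself performs this check is an assumption). The helpers validate_link_format and render_link are hypothetical illustrations, not part of yt_transcript_rs. Placeholders are substituted in a single left-to-right pass, so link text that happens to contain a literal {url} is not expanded a second time.

```rust
/// Hypothetical helper (not part of yt_transcript_rs): checks that a link
/// format contains both placeholders required by the docs above.
fn validate_link_format(link_format: &str) -> Result<(), String> {
    if link_format.contains("{text}") && link_format.contains("{url}") {
        Ok(())
    } else {
        Err(format!(
            "link format {:?} must contain {{text}} and {{url}}",
            link_format
        ))
    }
}

/// Hypothetical helper: renders one link by substituting `{text}` and `{url}`
/// in a single left-to-right pass, so placeholder-like text inside the
/// substituted values is copied literally rather than expanded again.
fn render_link(link_format: &str, text: &str, url: &str) -> String {
    let mut out = String::with_capacity(link_format.len() + text.len() + url.len());
    let mut rest = link_format;
    while let Some(pos) = rest.find('{') {
        // Copy everything before the brace unchanged.
        out.push_str(&rest[..pos]);
        let tail = &rest[pos..];
        if let Some(after) = tail.strip_prefix("{text}") {
            out.push_str(text);
            rest = after;
        } else if let Some(after) = tail.strip_prefix("{url}") {
            out.push_str(url);
            rest = after;
        } else {
            // Not a known placeholder: keep the brace and continue after it.
            out.push('{');
            rest = &tail[1..];
        }
    }
    out.push_str(rest);
    out
}
```

With the format from the parameter description, render_link("{text} ({url})", "Google", "https://google.com") yields “Google (https://google.com)”.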
pub fn new(preserve_formatting: bool) -> Self
Creates a new transcript parser.
§Parameters
- preserve_formatting: If true, certain HTML formatting tags (like bold and italic) are kept in the transcript. If false, all HTML tags are removed.
§Returns
A new TranscriptParser instance configured according to the formatting preference.
§Example
use yt_transcript_rs::transcript_parser::TranscriptParser;
// Create a parser that removes all HTML tags
let plain_parser = TranscriptParser::new(false);
// Create a parser that preserves formatting tags
let formatted_parser = TranscriptParser::new(true);
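With preserve_formatting set to false, all HTML tags are removed. A minimal sketch of that behaviour, assuming a tag is any span from '<' to the next '>' (the crate may use a different matching rule); strip_all_tags is a hypothetical helper, not part of the crate API.

```rust
/// Hypothetical sketch (not the crate's code): removes every `<...>` tag and
/// keeps the text between tags. A '<' with no later '>' is kept as text.
fn strip_all_tags(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(open) = rest.find('<') {
        match rest[open..].find('>') {
            Some(close_rel) => {
                // Keep the text before the tag, then skip past the '>'.
                out.push_str(&rest[..open]);
                rest = &rest[open + close_rel + 1..];
            }
            // No closing '>': treat the remainder as literal text.
            None => break,
        }
    }
    out.push_str(rest);
    out
}
```

Requiring a closing '>' means a lone comparison such as "1 < 2" survives unchanged instead of truncating the rest of the line.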
pub fn parse(&self, raw_data: &str) -> Result<Vec<FetchedTranscriptSnippet>, Error>
Parses YouTube transcript XML into a collection of transcript snippets.
This method takes raw XML data from YouTube transcripts and processes it into structured FetchedTranscriptSnippet objects that contain:
- Text content (with optional formatting)
- Start time in seconds
- Duration in seconds
§Parameters
- raw_data: The raw XML string containing transcript data from YouTube
§Returns
Result<Vec<FetchedTranscriptSnippet>, anyhow::Error>: a vector of transcript snippets on success, or an error if parsing fails.
§Errors
This function will return an error if:
- The XML data is malformed and cannot be parsed
- Required attributes are missing or invalid
§Example
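use yt_transcript_rs::transcript_parser::TranscriptParser;
let xml = r#"
<transcript>
<text start="0.0" dur="1.0">This is a transcript</text>
</transcript>
"#;
let parser = TranscriptParser::new(false);
let snippets = parser.parse(xml).unwrap();
for snippet in snippets {
    // Print each snippet as [start-end s] text
    println!("[{:.1}-{:.1}s] {}",
        snippet.start,
        snippet.start + snippet.duration,
        snippet.text);
}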
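A simplified, std-only sketch of the extraction parse performs, assuming the <text start="..." dur="...">...</text> layout shown in the usage example. Snippet is a hypothetical stand-in for FetchedTranscriptSnippet that reuses the field names from the example above, and parse_sketch is not the crate's implementation: unlike a real XML parser it does not decode HTML entities or strip tags inside the text. It does mirror the documented error cases by failing when start or dur is missing or not a number.

```rust
/// Hypothetical stand-in for FetchedTranscriptSnippet, using the field
/// names from the example above.
#[derive(Debug, PartialEq)]
struct Snippet {
    text: String,
    start: f64,
    duration: f64,
}

/// Returns the raw value of `name="..."` inside an opening tag, if present.
fn attr<'a>(tag: &'a str, name: &str) -> Option<&'a str> {
    let key = format!(" {}=\"", name);
    let begin = tag.find(key.as_str())? + key.len();
    let len = tag[begin..].find('"')?;
    Some(&tag[begin..begin + len])
}

/// Reads a required numeric attribute, failing if it is missing or not a number.
fn parse_attr(tag: &str, name: &str) -> Result<f64, String> {
    let value = attr(tag, name).ok_or_else(|| format!("missing {} attribute", name))?;
    value
        .parse::<f64>()
        .map_err(|e| format!("invalid {} value {:?}: {}", name, value, e))
}

/// Hypothetical, simplified parser: collects every <text ...>...</text> element.
/// It does not decode HTML entities or remove tags inside the text.
fn parse_sketch(xml: &str) -> Result<Vec<Snippet>, String> {
    let mut snippets = Vec::new();
    let mut rest = xml;
    while let Some(open) = rest.find("<text") {
        let element = &rest[open..];
        let tag_end = element.find('>').ok_or("unterminated <text> tag")?;
        let tag = &element[..tag_end];
        let start = parse_attr(tag, "start")?;
        let duration = parse_attr(tag, "dur")?;
        let after_tag = &element[tag_end + 1..];
        let (text, remainder) = if tag.ends_with('/') {
            // Self-closing element such as <text start="5" dur="1"/> has no text.
            (String::new(), after_tag)
        } else {
            let close = after_tag.find("</text>").ok_or("missing </text>")?;
            (after_tag[..close].to_string(), &after_tag[close + "</text>".len()..])
        };
        snippets.push(Snippet { text, start, duration });
        rest = remainder;
    }
    Ok(snippets)
}
```

Returning an error on the first bad element, rather than skipping it, matches the Errors list above: malformed data or a missing or invalid attribute makes the whole parse fail.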
pub fn process_with_formatting(&self, text: &str) -> String
Processes text to preserve only specific allowed HTML formatting tags.
This method:
- Identifies all HTML tags in the text
- Keeps only the tags listed in FORMATTING_TAGS
- Removes all other HTML tags
§Parameters
- text: The text containing HTML tags to process
§Returns
A string with only the allowed formatting tags preserved and all others removed.
§Example (internal usage)
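use yt_transcript_rs::transcript_parser::TranscriptParser;
let parser = TranscriptParser::new(true);
// Only <b> and <i> tags would be preserved, <span> would be removed
let input = "<b>Bold</b> and <span>span</span> and <i>italic</i>";
let result = parser.process_with_formatting(input);
// Result would be "<b>Bold</b> and span and <i>italic</i>"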
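A minimal sketch of the allow-list filtering that process_with_formatting describes. The actual contents of FORMATTING_TAGS are not listed in this documentation, so ALLOWED_TAGS below is a hypothetical list. Comparing tag names case-insensitively and keeping allowed tags verbatim (attributes included) are assumptions about the crate's behaviour, and keep_formatting_tags is not part of its API.

```rust
/// Hypothetical allow-list; the crate's real FORMATTING_TAGS may differ.
const ALLOWED_TAGS: &[&str] = &["b", "i", "em", "strong"];

/// Extracts a lowercase tag name from a tag's inner text,
/// e.g. "/b" -> "b", "span class=\"x\"" -> "span", "br/" -> "br".
fn tag_name(inner: &str) -> String {
    inner
        .trim_start_matches('/')
        .split(|c: char| c.is_whitespace() || c == '/')
        .next()
        .unwrap_or("")
        .to_ascii_lowercase()
}

/// Keeps tags whose name is in `allowed` (verbatim, attributes included)
/// and drops every other `<...>` tag, leaving the text between tags intact.
fn keep_formatting_tags(text: &str, allowed: &[&str]) -> String {
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(open) = rest.find('<') {
        let close = match rest[open..].find('>') {
            Some(rel) => open + rel, // index of the matching '>'
            None => break,           // no closing '>': keep the remainder as text
        };
        out.push_str(&rest[..open]);
        let tag = &rest[open..=close];
        if allowed.contains(&tag_name(&rest[open + 1..close]).as_str()) {
            out.push_str(tag);
        }
        rest = &rest[close + 1..];
    }
    out.push_str(rest);
    out
}
```

Applied to the input from the example above, keep_formatting_tags returns "<b>Bold</b> and span and <i>italic</i>", matching the documented result.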