Struct TranscriptParser

pub struct TranscriptParser { /* private fields */ }

§TranscriptParser

Parses YouTube transcript XML data into structured transcript snippets.

This parser handles YouTube’s XML format for transcripts and can:

  • Extract text content, timing information, and duration
  • Optionally preserve specified HTML formatting tags
  • Remove unwanted HTML tags

§Usage Example

use yt_transcript_rs::transcript_parser::TranscriptParser;

// Create a parser that strips all formatting
let parser = TranscriptParser::new(false);

// Or create a parser that preserves certain formatting tags (bold, italic, etc.)
let formatting_parser = TranscriptParser::new(true);

// Parse XML transcript data
let xml = r#"
    <transcript>
        <text start="0.0" dur="1.0">This is a transcript</text>
        <text start="1.0" dur="1.5">With multiple entries</text>
    </transcript>
"#;

let snippets = parser.parse(xml).unwrap();


Implementations§

impl TranscriptParser

pub fn with_config( preserve_formatting: bool, link_format: &str, ) -> Result<Self, Error>

Creates a new transcript parser with additional configuration options.

§Parameters
  • preserve_formatting - If true, certain HTML formatting tags (like bold, italic) will be kept in the transcript. If false, all HTML tags will be removed.
  • link_format - A format string for rendering links. Must contain the {text} and {url} placeholders. For example, "{text} ({url})" renders a link to Google as "Google (https://google.com)".
§Returns

On success, a new TranscriptParser instance configured according to the given preferences, wrapped in Ok; otherwise an Error.

§Example
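As a minimal sketch of how a link_format template with {text} and {url} placeholders could be expanded, consider the standalone helper below. The function name render_link, its error type, and the placeholder check are assumptions made for illustration; they are not part of the crate's API, and the crate's own handling may differ.

```rust
/// Hypothetical helper (not part of yt_transcript_rs): expands a link
/// template by substituting the `{text}` and `{url}` placeholders.
fn render_link(link_format: &str, text: &str, url: &str) -> Result<String, String> {
    // Assumption: a template missing either placeholder is rejected.
    if !link_format.contains("{text}") || !link_format.contains("{url}") {
        return Err(format!(
            "link format {:?} must contain {{text}} and {{url}}",
            link_format
        ));
    }
    // Note: this naive two-pass replace would also expand a literal
    // "{url}" that happens to appear inside `text`.
    Ok(link_format.replace("{text}", text).replace("{url}", url))
}

fn main() {
    match render_link("{text} ({url})", "Google", "https://google.com") {
        Ok(rendered) => println!("{}", rendered),
        Err(message) => eprintln!("error: {}", message),
    }
}
```

With the template "{text} ({url})", the sketch yields "Google (https://google.com)", matching the example given for the link_format parameter.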

pub fn new(preserve_formatting: bool) -> Self

Creates a new transcript parser.

§Parameters
  • preserve_formatting - If true, certain HTML formatting tags (like bold, italic) will be kept in the transcript. If false, all HTML tags will be removed.
§Returns

A new TranscriptParser instance configured according to the formatting preference.

§Example
// Create a parser that removes all HTML tags
let plain_parser = TranscriptParser::new(false);

// Create a parser that preserves formatting tags
let formatted_parser = TranscriptParser::new(true);

pub fn parse( &self, raw_data: &str, ) -> Result<Vec<FetchedTranscriptSnippet>, Error>

Parses YouTube transcript XML into a collection of transcript snippets.

This method takes raw XML data from YouTube transcripts and processes it into structured FetchedTranscriptSnippet objects that contain:

  • Text content (with optional formatting)
  • Start time in seconds
  • Duration in seconds
§Parameters
  • raw_data - The raw XML string containing transcript data from YouTube
§Returns
  • Result<Vec<FetchedTranscriptSnippet>, anyhow::Error> - A vector of transcript snippets on success, or an error if parsing fails
§Errors

This function will return an error if:

  • The XML data is malformed and cannot be parsed
  • Required attributes are missing or invalid
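The second error case can be illustrated with a small standalone sketch that reads a numeric attribute such as start or dur from an opening <text ...> tag. The names AttrError and numeric_attr are hypothetical and the string-based lookup is a simplification; the crate's actual parser and error type are not shown in this documentation.

```rust
/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq)]
enum AttrError {
    /// The attribute does not appear in the tag.
    Missing(String),
    /// The attribute is present but is not a valid number.
    Invalid(String),
}

/// Reads `name="<number>"` from an opening tag such as
/// `<text start="0.0" dur="1.0">` and parses the value as f64.
fn numeric_attr(open_tag: &str, name: &str) -> Result<f64, AttrError> {
    // The leading space avoids matching a longer attribute name's suffix.
    let needle = format!(" {}=\"", name);
    let start = open_tag
        .find(&needle)
        .ok_or_else(|| AttrError::Missing(name.to_string()))?
        + needle.len();
    // Locate the closing quote of the attribute value.
    let len = open_tag[start..]
        .find('"')
        .ok_or_else(|| AttrError::Invalid(name.to_string()))?;
    open_tag[start..start + len]
        .parse::<f64>()
        .map_err(|_| AttrError::Invalid(name.to_string()))
}

fn main() {
    let tag = r#"<text start="1.0" dur="1.5">"#;
    println!("{:?}", numeric_attr(tag, "start"));
    println!("{:?}", numeric_attr(tag, "missing"));
}
```

A real implementation would normally rely on an XML parser rather than substring search; the sketch only shows how missing and invalid attributes can surface as distinct errors.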
§Example
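use yt_transcript_rs::transcript_parser::TranscriptParser;

// A small transcript in the XML format shown in the usage example above
let xml = r#"<transcript><text start="0.0" dur="1.0">This is a transcript</text></transcript>"#;

let parser = TranscriptParser::new(false);
let snippets = parser.parse(xml).unwrap();

// Print each snippet with its start and end time in seconds
for snippet in snippets {
    println!("[{:.1}-{:.1}s] {}",
        snippet.start,
        snippet.start + snippet.duration,
        snippet.text);
}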

pub fn process_with_formatting(&self, text: &str) -> String

Processes text to preserve only specific allowed HTML formatting tags.

This method:

  1. Identifies all HTML tags in the text
  2. Keeps only the tags listed in FORMATTING_TAGS
  3. Removes all other HTML tags
§Parameters
  • text - The text containing HTML tags to process
§Returns

A string with only the allowed formatting tags preserved and all others removed.

§Example (internal usage)
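// With <b> and <i> among the allowed tags, both are preserved and <span> is removed
let parser = TranscriptParser::new(true);
let input = "<b>Bold</b> and <span>span</span> and <i>italic</i>";
let result = parser.process_with_formatting(input);
// Result would be "<b>Bold</b> and span and <i>italic</i>"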
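The three steps listed above can be sketched as a standalone allowlist filter. This is an illustration under stated assumptions, not the crate's implementation: ALLOWED_TAGS stands in for FORMATTING_TAGS (whose real contents are not listed here), and tag_name and keep_allowed_tags are hypothetical helper names.

```rust
/// Hypothetical allowlist; the crate's real FORMATTING_TAGS may differ.
const ALLOWED_TAGS: [&str; 2] = ["b", "i"];

/// Extracts the lowercase tag name from the text between '<' and '>',
/// ignoring a leading '/' and any attributes.
fn tag_name(inner: &str) -> String {
    inner
        .trim_start_matches('/')
        .chars()
        .take_while(|c| c.is_ascii_alphanumeric())
        .collect::<String>()
        .to_ascii_lowercase()
}

/// Copies `text` to the output, keeping only tags whose name is allowed.
fn keep_allowed_tags(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(open) = rest.find('<') {
        // Plain text before the tag is always kept.
        out.push_str(&rest[..open]);
        match rest[open..].find('>') {
            Some(close) => {
                let tag = &rest[open..=open + close];
                let name = tag_name(&tag[1..tag.len() - 1]);
                if ALLOWED_TAGS.iter().any(|allowed| *allowed == name) {
                    out.push_str(tag);
                }
                rest = &rest[open + close + 1..];
            }
            None => {
                // Unterminated '<': treat the remainder as literal text.
                out.push_str(&rest[open..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}

fn main() {
    let input = "<b>Bold</b> and <span>span</span> and <i>italic</i>";
    println!("{}", keep_allowed_tags(input));
}
```

The sketch scans the string by hand to stay dependency-free; a regular expression or an HTML tokenizer would be a reasonable alternative for production code.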

Trait Implementations§

impl Debug for TranscriptParser

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.

impl<T> ErasedDestructor for T
where T: 'static,

impl<T> MaybeSendSync for T