pub struct TranscriptParser { /* private fields */ }
§TranscriptParser
Parses YouTube transcript XML data into structured transcript snippets.
This parser handles YouTube’s XML format for transcripts and can:
- Extract text content, timing information, and duration
- Optionally preserve specified HTML formatting tags
- Remove unwanted HTML tags
§Usage Example
use yt_transcript_rs::transcript_parser::TranscriptParser;
// Create a parser that strips all formatting
let parser = TranscriptParser::new(false);
// Or create a parser that preserves certain formatting tags (bold, italic, etc.)
let formatting_parser = TranscriptParser::new(true);
// Parse XML transcript data
let xml = r#"
<transcript>
<text start="0.0" dur="1.0">This is a transcript</text>
<text start="1.0" dur="1.5">With multiple entries</text>
</transcript>
"#;
let snippets = parser.parse(xml).unwrap();
Implementations§
impl TranscriptParser
pub fn new(preserve_formatting: bool) -> Self
Creates a new transcript parser.
§Parameters
preserve_formatting - If true, certain HTML formatting tags (such as bold and italic) are kept in the transcript. If false, all HTML tags are removed.
§Returns
A new TranscriptParser instance configured according to the formatting preference.
§Example
// Create a parser that removes all HTML tags
let plain_parser = TranscriptParser::new(false);
// Create a parser that preserves formatting tags
let formatted_parser = TranscriptParser::new(true);
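To make the effect of `preserve_formatting = false` concrete, here is a minimal standalone sketch of stripping every tag from a piece of transcript text. The helper name `strip_all_tags` is hypothetical and is not part of this crate; the crate's own implementation may work differently (for example, with a regular expression).

```rust
/// Hypothetical helper (not part of yt_transcript_rs): drops everything
/// between '<' and the next '>' and keeps the remaining text.
/// This is a naive scan: a literal '<' in the text is treated as a tag start.
fn strip_all_tags(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut in_tag = false;
    for ch in text.chars() {
        match ch {
            // A '<' always opens a tag in this simplified model.
            '<' => in_tag = true,
            // A '>' closes the tag we are currently inside.
            '>' if in_tag => in_tag = false,
            // Characters outside a tag are copied through unchanged.
            _ if !in_tag => out.push(ch),
            // Characters inside a tag are discarded.
            _ => {}
        }
    }
    out
}
```

Under these assumptions, `strip_all_tags("<b>Bold</b> and <i>italic</i>")` yields `Bold and italic`.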
pub fn parse(
    &self,
    raw_data: &str,
) -> Result<Vec<FetchedTranscriptSnippet>, Error>
Parses YouTube transcript XML into a collection of transcript snippets.
This method takes raw XML data from a YouTube transcript and processes it into structured FetchedTranscriptSnippet objects that contain:
- Text content (with optional formatting)
- Start time in seconds
- Duration in seconds
§Parameters
raw_data - The raw XML string containing transcript data from YouTube
§Returns
Result<Vec<FetchedTranscriptSnippet>, anyhow::Error> - A vector of transcript snippets on success, or an error if parsing fails
§Errors
This function will return an error if:
- The XML data is malformed and cannot be parsed
- Required attributes are missing or invalid
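The second error condition can be illustrated with a small standalone sketch. It assumes that `start` and `dur` are the required attributes (they are the ones shown in the usage example) and that they must parse as finite floating-point numbers. The names `TimingError`, `parse_attr`, and `parse_timing` are hypothetical; the crate itself reports failures through its own `Error` type.

```rust
/// Hypothetical error type for this sketch; the crate returns its own Error.
#[derive(Debug, PartialEq)]
enum TimingError {
    /// The attribute was not present on the <text> element.
    Missing(&'static str),
    /// The attribute was present but is not a finite number.
    Invalid { name: &'static str, value: String },
}

/// Parses one attribute value as a finite f64.
fn parse_attr(name: &'static str, raw: Option<&str>) -> Result<f64, TimingError> {
    let value = raw.ok_or(TimingError::Missing(name))?;
    match value.trim().parse::<f64>() {
        // Reject NaN and infinities, which `parse` would otherwise accept.
        Ok(v) if v.is_finite() => Ok(v),
        _ => Err(TimingError::Invalid { name, value: value.to_string() }),
    }
}

/// Returns (start, duration) in seconds, or the first problem found.
fn parse_timing(start: Option<&str>, dur: Option<&str>) -> Result<(f64, f64), TimingError> {
    Ok((parse_attr("start", start)?, parse_attr("dur", dur)?))
}
```

In application code, the same idea applies to the real API: match on the `Result` returned by `parse` instead of calling `unwrap()` when the XML comes from an untrusted or unreliable source.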
§Example
// `xml` is the raw transcript XML string, as in the usage example above
let parser = TranscriptParser::new(false);
let snippets = parser.parse(xml).unwrap();
for snippet in snippets {
    println!(
        "[{:.1}-{:.1}s] {}",
        snippet.start,
        snippet.start + snippet.duration,
        snippet.text
    );
}
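The loop above derives each snippet's end time as `start + duration`. The following standalone sketch shows the same arithmetic on a hypothetical `Snippet` struct that mirrors only the three fields the example reads (`text`, `start`, `duration`). The field types are assumptions (`String` and `f64`), and the real `FetchedTranscriptSnippet` may differ.

```rust
/// Hypothetical mirror of the fields used in the example above.
/// Field types are assumed, not taken from the crate.
struct Snippet {
    text: String,
    start: f64,    // seconds from the beginning of the video
    duration: f64, // seconds the snippet stays on screen
}

/// Formats one snippet as "[start-end s] text", with one decimal place.
fn format_snippet(s: &Snippet) -> String {
    format!("[{:.1}-{:.1}s] {}", s.start, s.start + s.duration, s.text)
}

/// Returns the latest end time across all snippets, or None if there are none.
fn transcript_end(snippets: &[Snippet]) -> Option<f64> {
    snippets
        .iter()
        .map(|s| s.start + s.duration)
        .reduce(f64::max)
}
```

For the two entries in the usage example, the second snippet would be formatted as `[1.0-2.5s] With multiple entries`, and the transcript would end at 2.5 seconds.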
pub fn process_with_formatting(&self, text: &str) -> String
Processes text to preserve only specific allowed HTML formatting tags.
This method:
- Identifies all HTML tags in the text
- Keeps only the tags listed in FORMATTING_TAGS
- Removes all other HTML tags
§Parameters
text - The text containing HTML tags to process
§Returns
A string with only the allowed formatting tags preserved and all others removed.
§Example (internal usage)
let parser = TranscriptParser::new(true);
let input = "<b>Bold</b> and <span>span</span> and <i>italic</i>";
// Only the <b> and <i> tags are preserved; <span> is removed
let result = parser.process_with_formatting(input);
// Result: "<b>Bold</b> and span and <i>italic</i>"
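The three steps listed for this method (find every tag, keep the allowed ones, drop the rest) can be sketched without the crate as follows. This is only an illustration: `ALLOWED_TAGS` is a hypothetical stand-in for `FORMATTING_TAGS`, whose actual contents are not shown on this page, and `keep_allowed_tags` is not the crate's implementation.

```rust
/// Hypothetical stand-in for FORMATTING_TAGS; the real list may differ.
const ALLOWED_TAGS: [&str; 2] = ["b", "i"];

/// Keeps tags whose name is in ALLOWED_TAGS and removes all other tags,
/// leaving the text between tags untouched.
fn keep_allowed_tags(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(open) = rest.find('<') {
        // Copy the plain text that precedes the tag.
        out.push_str(&rest[..open]);
        match rest[open..].find('>') {
            Some(close) => {
                let tag = &rest[open..open + close + 1]; // e.g. "</b>"
                let inner = &tag[1..tag.len() - 1];      // e.g. "/b"
                // Tag name: skip a leading '/', stop at whitespace or '/'.
                let name = inner
                    .trim_start_matches('/')
                    .split(|c: char| c.is_whitespace() || c == '/')
                    .next()
                    .unwrap_or("");
                if ALLOWED_TAGS.iter().any(|t| t.eq_ignore_ascii_case(name)) {
                    out.push_str(tag);
                }
                rest = &rest[open + close + 1..];
            }
            None => {
                // Unterminated '<': keep the remainder as literal text.
                out.push_str(&rest[open..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```

With the assumed allow-list, the input `<b>Bold</b> and <span>span</span> and <i>italic</i>` produces `<b>Bold</b> and span and <i>italic</i>`, matching the result described in the example above.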
Trait Implementations§
Auto Trait Implementations§
impl Freeze for TranscriptParser
impl RefUnwindSafe for TranscriptParser
impl Send for TranscriptParser
impl Sync for TranscriptParser
impl Unpin for TranscriptParser
impl UnwindSafe for TranscriptParser
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.