Struct JsVarParser

pub struct JsVarParser { /* private fields */ }

Parser for extracting JavaScript variables from HTML content.

This parser extracts JavaScript object literals assigned to variables in HTML source code and parses them as JSON. It targets YouTube’s page structure specifically, and it handles nested objects and escaped characters inside strings.

§Features

  • Extracts JavaScript variables by name from HTML content
  • Handles nested objects with proper brace matching
  • Tries character-by-character parsing first and falls back to regular expressions
  • Converts extracted JavaScript objects to Rust values via serde_json

§Example

// Create a parser for the "ytInitialPlayerResponse" variable
let parser = JsVarParser::new("ytInitialPlayerResponse");

// HTML content containing the JavaScript variable
let html = r#"
  <script>
    var ytInitialPlayerResponse = {"captions": {"playerCaptionsTracklistRenderer":
      {"captionTracks": [{"baseUrl": "https://example.com", "name": {"simpleText": "English"}}]}}};
  </script>
"#;

// Parse the variable
let json = parser.parse(html, "dQw4w9WgXcQ")?;

// Access extracted data
if let Some(captions) = json.get("captions") {
    println!("Found captions data: {}", captions);
}

Implementations§

impl JsVarParser

pub fn new(var_name: &str) -> Self

Creates a new JavaScript variable parser for the specified variable name.

§Parameters
  • var_name - The name of the JavaScript variable to extract (e.g., “ytInitialPlayerResponse”)
§Returns

A new JsVarParser instance configured to extract the specified variable.

§Example
// Create a parser for YouTube's initial player response
let player_response_parser = JsVarParser::new("ytInitialPlayerResponse");

// Create a parser for YouTube's initial data
let initial_data_parser = JsVarParser::new("ytInitialData");

pub fn parse(&self, html: &str, video_id: &str) -> Result<Value, CouldNotRetrieveTranscript>

Parses a JavaScript variable from HTML content and converts it to a JSON value.

This method tries two parsing strategies in order:

  1. A character-by-character scan that matches braces for precise extraction
  2. Regular expression patterns, used only if the scan fails
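
The character-by-character strategy can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the crate's actual implementation: the helper name extract_object_literal is hypothetical, it takes the first `{` after the variable name as the start of the literal, and it only recognizes double-quoted strings (as in JSON). It tracks brace depth while skipping braces that appear inside strings or after a backslash escape.

```rust
/// Hypothetical helper (not part of the crate's API): returns the object
/// literal that follows `var_name`, or `None` if the variable is missing
/// or its braces never balance.
fn extract_object_literal<'a>(html: &'a str, var_name: &str) -> Option<&'a str> {
    // Assumption: the literal starts at the first `{` after the variable name.
    let name_pos = html.find(var_name)?;
    let after_name = name_pos + var_name.len();
    let start = after_name + html[after_name..].find('{')?;

    let mut depth = 0usize; // current brace nesting level
    let mut in_string = false; // inside a double-quoted string?
    let mut escaped = false; // previous character was a backslash?

    for (offset, ch) in html[start..].char_indices() {
        if in_string {
            // Inside a string, braces are plain text; only watch for the
            // closing quote, taking escapes such as \" into account.
            if escaped {
                escaped = false;
            } else if ch == '\\' {
                escaped = true;
            } else if ch == '"' {
                in_string = false;
            }
            continue;
        }
        match ch {
            '"' => in_string = true,
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    // `}` is one byte wide, so the slice ends just after it.
                    return Some(&html[start..start + offset + 1]);
                }
            }
            _ => {}
        }
    }
    // Input ended before the opening brace was closed.
    None
}
```

The returned slice could then be handed to a JSON parser such as serde_json::from_str. A `None` result here corresponds to the "variable not found" and "mismatched braces" cases, which is presumably where a regex fallback would take over.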
§Parameters
  • html - The HTML content containing the JavaScript variable
  • video_id - The YouTube video ID (used for error reporting)
§Returns
  • Result<serde_json::Value, CouldNotRetrieveTranscript> - The parsed JSON value or an error
§Errors

Returns a CouldNotRetrieveTranscript error with a YouTubeDataUnparsable reason when:

  • The variable is not found in the HTML
  • The variable value cannot be parsed as valid JSON
  • The braces in the JavaScript object are mismatched
§Example
let parser = JsVarParser::new("ytInitialPlayerResponse");
let html = r#"<script>var ytInitialPlayerResponse = {"captions": {"available": true}};</script>"#;

match parser.parse(html, "dQw4w9WgXcQ") {
    Ok(json) => {
        println!("Successfully extracted variable: {}", json);
         
        // Access nested properties
        if let Some(captions) = json.get("captions") {
            if let Some(available) = captions.get("available") {
                println!("Captions available: {}", available);
            }
        }
    },
    Err(e) => {
        println!("Failed to parse: {:?}", e.reason);
    }
}

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.

impl<T> ErasedDestructor for T
where T: 'static,

impl<T> MaybeSendSync for T