pub struct JsVarParser { /* private fields */ }
Parser for extracting JavaScript variables from HTML content.
This parser extracts JavaScript object literals assigned to variables in HTML source code and parses them as JSON, specifically targeting YouTube’s page structure. It handles nested objects and escaped characters inside the literal.
§Features
- Extracts JavaScript variables by name from HTML content
- Handles nested objects with proper brace matching
- Supports both character-by-character parsing and regex fallbacks
- Converts extracted JavaScript objects to Rust values via serde_json
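The brace matching mentioned above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the crate's actual implementation: the helper name `extract_object_literal` is hypothetical, and the sketch assumes the object literal uses double-quoted strings only.

```rust
/// Hypothetical helper (not part of this crate's API): returns the `{...}`
/// literal that follows the first occurrence of `var_name`, if one is found.
fn extract_object_literal<'a>(html: &'a str, var_name: &str) -> Option<&'a str> {
    // Locate the variable name, then the first opening brace after it.
    let name_pos = html.find(var_name)?;
    let after_name = name_pos + var_name.len();
    let start = after_name + html[after_name..].find('{')?;

    let mut depth = 0usize; // current brace nesting level
    let mut in_string = false; // inside a double-quoted string?
    let mut escaped = false; // previous character was a backslash?

    for (offset, ch) in html[start..].char_indices() {
        if in_string {
            // Braces inside strings must not affect the nesting depth.
            if escaped {
                escaped = false;
            } else if ch == '\\' {
                escaped = true;
            } else if ch == '"' {
                in_string = false;
            }
            continue;
        }
        match ch {
            '"' => in_string = true,
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    // `}` is one byte wide, so `offset + 1` is a valid boundary.
                    return Some(&html[start..start + offset + 1]);
                }
            }
            _ => {}
        }
    }
    // Reached the end of the input with unbalanced braces.
    None
}
```

The returned slice could then be handed to a JSON parser such as `serde_json::from_str`. Tracking string and escape state is what keeps a `}` inside a string value from closing the object early.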
§Example
// Create a parser for the "ytInitialPlayerResponse" variable
let parser = JsVarParser::new("ytInitialPlayerResponse");
// HTML content containing the JavaScript variable
let html = r#"
<script>
var ytInitialPlayerResponse = {"captions": {"playerCaptionsTracklistRenderer":
{"captionTracks": [{"baseUrl": "https://example.com", "name": {"simpleText": "English"}}]}}};
</script>
"#;
// Parse the variable
let json = parser.parse(html, "dQw4w9WgXcQ")?;
// Access extracted data
if let Some(captions) = json.get("captions") {
    println!("Found captions data: {}", captions);
}
Implementations§
impl JsVarParser
pub fn new(var_name: &str) -> Self
Creates a new JavaScript variable parser for the specified variable name.
§Parameters
- var_name: The name of the JavaScript variable to extract (e.g., “ytInitialPlayerResponse”)
§Returns
A new JsVarParser instance configured to extract the specified variable.
§Example
// Create a parser for YouTube's initial player response
let player_response_parser = JsVarParser::new("ytInitialPlayerResponse");
// Create a parser for YouTube's initial data
let initial_data_parser = JsVarParser::new("ytInitialData");
pub fn parse(
    &self,
    html: &str,
    video_id: &str,
) -> Result<Value, CouldNotRetrieveTranscript>
Parses a JavaScript variable from HTML content and converts it to a JSON value.
This method tries multiple parsing strategies:
- First, it attempts a character-by-character approach for precise extraction
- If that fails, it falls back to regular expression patterns
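The try-then-fall-back control flow described above might look roughly like the sketch below. It is an illustration under stated assumptions, not the crate's code: all three function names are hypothetical, and the lenient strategy is a plain string search standing in for the regular-expression fallback so that the example needs only the standard library. Both strategies naively stop at the first `};`, which is enough to show the chaining.

```rust
/// Hypothetical strict strategy: expects exactly `var NAME = {...};`.
fn strict_strategy<'a>(html: &'a str, var_name: &str) -> Option<&'a str> {
    let marker = format!("var {} = ", var_name);
    let start = html.find(&marker)? + marker.len();
    // Keep the closing brace, drop the semicolon.
    let end = start + html[start..].find("};")? + 1;
    Some(&html[start..end])
}

/// Hypothetical lenient strategy (a stand-in for a regex fallback):
/// accepts `NAME`, optional whitespace, `=`, optional whitespace, `{...};`.
fn lenient_strategy<'a>(html: &'a str, var_name: &str) -> Option<&'a str> {
    let after = &html[html.find(var_name)? + var_name.len()..];
    let rest = after.trim_start().strip_prefix('=')?.trim_start();
    let end = rest.find("};")? + 1;
    Some(&rest[..end])
}

/// Tries the strict strategy first and falls back to the lenient one.
fn extract_with_fallback<'a>(html: &'a str, var_name: &str) -> Option<&'a str> {
    strict_strategy(html, var_name).or_else(|| lenient_strategy(html, var_name))
}
```

`Option::or_else` evaluates the fallback lazily, so the second strategy runs only when the first returns `None`.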
§Parameters
- html: The HTML content containing the JavaScript variable
- video_id: The YouTube video ID (used for error reporting)
§Returns
Result<serde_json::Value, CouldNotRetrieveTranscript>: The parsed JSON value, or an error
§Errors
Returns a CouldNotRetrieveTranscript error with the YouTubeDataUnparsable reason when:
- The variable is not found in the HTML
- The variable value cannot be parsed as valid JSON
- The braces in the JavaScript object are mismatched
§Example
let parser = JsVarParser::new("ytInitialPlayerResponse");
let html = r#"<script>var ytInitialPlayerResponse = {"captions": {"available": true}};</script>"#;
match parser.parse(html, "dQw4w9WgXcQ") {
    Ok(json) => {
        println!("Successfully extracted variable: {}", json);
        // Access nested properties
        if let Some(captions) = json.get("captions") {
            if let Some(available) = captions.get("available") {
                println!("Captions available: {}", available);
            }
        }
    }
    Err(e) => {
        println!("Failed to parse: {:?}", e.reason);
    }
}
Auto Trait Implementations§
impl Freeze for JsVarParser
impl RefUnwindSafe for JsVarParser
impl Send for JsVarParser
impl Sync for JsVarParser
impl Unpin for JsVarParser
impl UnwindSafe for JsVarParser
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.