§webpage-info
A modern Rust library to extract metadata from web pages: title, description, OpenGraph, Schema.org, links, and more.
§Features
- Parse HTML from strings, files, or URLs
- Extract common metadata (title, description, language)
- Parse OpenGraph protocol data
- Parse Schema.org JSON-LD structured data
- Extract all links from the document
- Async HTTP client with configurable options
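To make the OpenGraph item above concrete: OpenGraph data lives in `<meta property="og:..." content="...">` tags in the page head. The following is a minimal, standard-library-only sketch of collecting those pairs. It is not this crate's implementation or API; the helper names `attr_value` and `opengraph_properties` are hypothetical, and the naive string scanning assumes double-quoted attributes with no `>` inside attribute values.

```rust
use std::collections::HashMap;

/// Hypothetical helper: return the value of `name="..."` inside one tag, if present.
/// Assumes double-quoted attribute values.
fn attr_value<'a>(tag: &'a str, name: &str) -> Option<&'a str> {
    let needle = format!("{}=\"", name);
    let start = tag.find(needle.as_str())? + needle.len();
    let end = tag[start..].find('"')? + start;
    Some(&tag[start..end])
}

/// Hypothetical helper: collect `og:*` properties from `<meta>` tags,
/// keyed by the part after the `og:` prefix.
fn opengraph_properties(html: &str) -> HashMap<String, String> {
    let mut props = HashMap::new();
    for chunk in html.split("<meta").skip(1) {
        // Keep only the text up to the end of this tag.
        let tag = match chunk.find('>') {
            Some(end) => &chunk[..end],
            None => continue,
        };
        if let (Some(property), Some(content)) =
            (attr_value(tag, "property"), attr_value(tag, "content"))
        {
            // Ignore meta tags whose property is not in the `og:` namespace.
            if let Some(key) = property.strip_prefix("og:") {
                props.insert(key.to_string(), content.to_string());
            }
        }
    }
    props
}

fn main() {
    let html = r#"<html><head>
        <meta property="og:title" content="Hello">
        <meta property="og:type" content="website">
        <meta name="description" content="Not OpenGraph">
    </head></html>"#;
    let props = opengraph_properties(html);
    println!("{:?}", props.get("title"));
    println!("{:?}", props.get("type"));
}
```

A real HTML parser handles single quotes, entity escapes, and attribute ordering that this sketch ignores, which is one reason to use a library rather than string matching.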
§Quick Start
use webpage_info::WebpageInfo;

#[tokio::main]
async fn main() -> webpage_info::Result<()> {
    // Fetch and parse a webpage
    let info = WebpageInfo::fetch("https://example.org").await?;
    println!("Title: {:?}", info.html.title);
    println!("Description: {:?}", info.html.description);
    println!("Links: {}", info.html.links.len());
    Ok(())
}
§Parsing Local HTML
use webpage_info::HtmlInfo;

let html = "<html><head><title>Hello</title></head><body>World</body></html>";
let info = HtmlInfo::from_string(html, None).unwrap();
assert_eq!(info.title, Some("Hello".to_string()));
§Custom HTTP Options
use std::time::Duration;
use webpage_info::{WebpageInfo, HttpOptions};

#[tokio::main]
async fn main() -> webpage_info::Result<()> {
    let options = HttpOptions::new()
        .timeout(Duration::from_secs(60))
        .user_agent("MyBot/1.0")
        .allow_insecure(true);
    let info = WebpageInfo::fetch_with_options("https://example.org", options).await?;
    Ok(())
}
§Without HTTP (parsing only)
If you don’t need HTTP fetching, disable the default http feature:
[dependencies]
webpage-info = { version = "1.0", default-features = false }
Structs§
- HtmlInfo - Parsed HTML document information.
- HttpInfo - HTTP response information.
- HttpOptions - Configuration for HTTP requests.
- Link - A link found in the HTML document.
- Opengraph - OpenGraph metadata for a webpage.
- OpengraphMedia - Media object (image, video, or audio) in OpenGraph.
- SchemaOrg - Schema.org structured data item.
- WebpageInfo - Complete webpage information including HTTP and HTML data.
Enums§
- Error - Errors that can occur when fetching or parsing webpage information.
Type Aliases§
- Result - Result type alias for webpage-info operations.