Crate webpage_info


§webpage-info

A Rust library for extracting metadata from web pages: title, description, OpenGraph, Schema.org JSON-LD, links, and more.

§Features

  • Parse HTML from strings, files, or URLs
  • Extract common metadata (title, description, language)
  • Parse OpenGraph protocol data
  • Parse Schema.org JSON-LD structured data
  • Extract all links from the document
  • Async HTTP client with configurable options
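
To give a rough idea of what OpenGraph extraction involves, here is a minimal std-only sketch that collects `og:` properties from `<meta>` tags. It is not this crate's implementation: `attr_value` and `opengraph_properties` are hypothetical helper names, and the scan assumes lowercase tag names and double-quoted attributes, which a real HTML parser would not require.

```rust
use std::collections::HashMap;

/// Returns the value of `name="..."` inside a single tag, if present.
/// Hypothetical helper: only handles double-quoted attribute values.
fn attr_value<'a>(tag: &'a str, name: &str) -> Option<&'a str> {
    let needle = format!("{name}=\"");
    let start = tag.find(&needle)? + needle.len();
    let len = tag[start..].find('"')?;
    Some(&tag[start..start + len])
}

/// Collects `<meta property="og:..." content="...">` pairs from `html`,
/// keyed by the property name with the `og:` prefix removed.
fn opengraph_properties(html: &str) -> HashMap<String, String> {
    let mut found = HashMap::new();
    let mut rest = html;
    while let Some(open) = rest.find("<meta") {
        let after = &rest[open..];
        // Stop if the tag is never closed.
        let Some(close) = after.find('>') else { break };
        let tag = &after[..close];
        if let (Some(prop), Some(content)) =
            (attr_value(tag, "property"), attr_value(tag, "content"))
        {
            if let Some(key) = prop.strip_prefix("og:") {
                found.insert(key.to_string(), content.to_string());
            }
        }
        // Continue scanning after this tag.
        rest = &after[close + 1..];
    }
    found
}
```

Calling `opengraph_properties` on a document head yields a map such as `title` and `type`; non-OpenGraph `<meta>` tags are skipped. The library itself exposes this data through the `Opengraph` struct listed below.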

§Quick Start

use webpage_info::WebpageInfo;

#[tokio::main]
async fn main() -> webpage_info::Result<()> {
    // Fetch and parse a webpage
    let info = WebpageInfo::fetch("https://example.org").await?;

    println!("Title: {:?}", info.html.title);
    println!("Description: {:?}", info.html.description);
    println!("Links: {}", info.html.links.len());

    Ok(())
}

§Parsing Local HTML

use webpage_info::HtmlInfo;

let html = "<html><head><title>Hello</title></head><body>World</body></html>";
let info = HtmlInfo::from_string(html, None).unwrap();
assert_eq!(info.title, Some("Hello".to_string()));

§Custom HTTP Options

use std::time::Duration;
use webpage_info::{WebpageInfo, HttpOptions};

#[tokio::main]
async fn main() -> webpage_info::Result<()> {
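    // Build request options: a 60-second timeout and a custom User-Agent.
    let options = HttpOptions::new()
        .timeout(Duration::from_secs(60))
        .user_agent("MyBot/1.0")
        .allow_insecure(true);

    let info = WebpageInfo::fetch_with_options("https://example.org", options).await?;
    println!("Title: {:?}", info.html.title);
    Ok(())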
}

§Without HTTP (parsing only)

If you don’t need HTTP fetching, disable the default http feature:

[dependencies]
webpage-info = { version = "1.0", default-features = false }
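
For readers unfamiliar with Cargo features, the sketch below shows how a feature flag generally maps to conditional compilation in Rust. The feature name `http` comes from the text above; the function `enabled_capabilities` is hypothetical, and how this crate gates its own items internally is an assumption.

```rust
/// Lists the capabilities compiled into this build.
/// Hypothetical illustration of feature gating, not part of the crate's API.
fn enabled_capabilities() -> Vec<&'static str> {
    // Parsing needs no optional feature, so it is always present.
    let mut caps = vec!["parse"];
    // `cfg!` evaluates at compile time: true only when the build
    // enables a Cargo feature named `http`.
    if cfg!(feature = "http") {
        caps.push("fetch");
    }
    caps
}
```

With `default-features = false`, code behind such a flag is not compiled at all, which typically also removes the HTTP client dependencies from the build.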

Structs§

HtmlInfo
Parsed HTML document information.
HttpInfo
HTTP response information.
HttpOptions
Configuration for HTTP requests.
Link
A link found in the HTML document.
Opengraph
OpenGraph metadata for a webpage.
OpengraphMedia
Media object (image, video, or audio) in OpenGraph.
SchemaOrg
Schema.org structured data item.
WebpageInfo
Complete webpage information including HTTP and HTML data.

Enums§

Error
Errors that can occur when fetching or parsing webpage information.

Type Aliases§

Result
Result type alias for webpage-info operations.