Webpage.rs
Small library to fetch info about a web page: title, description, language, HTTP info, RSS feeds, Opengraph, Schema.org, and more
Usage
use webpage::{Webpage, WebpageOptions};
let info = Webpage::from_url("http://www.rust-lang.org/en-US/", WebpageOptions::default())
.expect("Could not read from URL");
let http = info.http;
assert_eq!(http.ip, "54.192.129.71".to_string());
assert!(http.headers[0].starts_with("HTTP"));
assert!(http.body.starts_with("<!DOCTYPE html>"));
assert_eq!(http.url, "https://www.rust-lang.org/en-US/".to_string()); assert_eq!(http.content_type, "text/html".to_string());
let html = info.html;
assert_eq!(html.title, Some("The Rust Programming Language".to_string()));
assert_eq!(html.description, Some("A systems programming language that runs blazingly fast, prevents segfaults, and guarantees thread safety.".to_string()));
assert_eq!(html.opengraph.og_type, "website".to_string());
You can also get HTML info about local data:
use webpage::HTML;
let html = HTML::from_file("index.html", None);
Features
Serialization
If you need to be able to serialize the data provided by the library using serde, you can include specify the serde
feature while declaring your dependencies in Cargo.toml
:
webpage = { version = "1.1", features = ["serde"] }
No curl dependency
The curl
feature is enabled by default but is optional. This is useful if you do not need a HTTP client but already have the HTML data at hand.
All fields
pub struct Webpage {
pub http: HTTP, pub html: HTML, }
pub struct HTTP {
pub ip: String,
pub transfer_time: Duration,
pub redirect_count: u32,
pub content_type: String,
pub response_code: u32,
pub headers: Vec<String>, pub url: String, pub body: String,
}
pub struct HTML {
pub title: Option<String>,
pub description: Option<String>,
pub url: Option<String>, pub feed: Option<String>,
pub language: Option<String>, pub text_content: String,
pub meta: HashMap<String, String>,
pub opengraph: Opengraph,
pub schema_org: Vec<SchemaOrg>,
}
pub struct Opengraph {
pub og_type: String,
pub properties: HashMap<String, String>,
pub images: Vec<Object>,
pub videos: Vec<Object>,
pub audios: Vec<Object>,
}
pub struct OpengraphObject {
pub url: String,
pub properties: HashMap<String, String>,
}
pub struct SchemaOrg {
pub schema_type: String,
pub value: serde_json::Value,
}
Options
The following configurations are available:
pub struct WebpageOptions {
allow_insecure: false,
follow_location: true,
max_redirections: 5,
timeout: Duration::from_secs(10),
useragent: "Webpage - Rust crate - https:}
let options = WebpageOptions { allow_insecure: true, ..Default::default() };
let info = Webpage::from_url(&url, options).expect("Halp, could not fetch");