html-meta-scraper 0.3.0

Scrape and extract metadata like title, description, images, and favicon from HTML documents.
Repository: 46ki75/html-meta-scraper

Features

  • Extract <title>, OGP metadata (og:title, og:description, og:image)
  • Extract Twitter Card metadata (twitter:title, twitter:description, twitter:image)
  • Extract favicon (<link rel="icon" href="...">)
  • Prioritized fallback (e.g., og:title → twitter:title → <title>)
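
The prioritized fallback can be pictured as a chain of Option values where the first one present wins. The following is a minimal sketch of that idea only, not the crate's implementation: the RawMeta struct and its field names are hypothetical and exist purely for illustration.

```rust
/// Hypothetical container for raw values pulled from a page's <head>.
/// The struct and field names are illustrative, not the crate's types.
#[derive(Default)]
struct RawMeta {
    og_title: Option<String>,
    twitter_title: Option<String>,
    html_title: Option<String>,
}

impl RawMeta {
    /// Resolve the title in priority order:
    /// og:title, then twitter:title, then <title>.
    fn title(&self) -> Option<String> {
        self.og_title
            .clone()
            // or_else is lazy: later sources are only cloned when needed.
            .or_else(|| self.twitter_title.clone())
            .or_else(|| self.html_title.clone())
    }
}

fn main() {
    // og:title is absent, so twitter:title wins over <title>.
    let meta = RawMeta {
        og_title: None,
        twitter_title: Some("Card Title".to_string()),
        html_title: Some("Plain Title".to_string()),
    };
    assert_eq!(meta.title(), Some("Card Title".to_string()));

    // No source is present: the result is None.
    assert_eq!(RawMeta::default().title(), None);
}
```

The same pattern would apply to the description and image lookups, each with its own source order.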

Installation

Add this to your Cargo.toml:

[dependencies]
html-meta-scraper = "0.3.0"

Example

use html_meta_scraper::MetaScraper;

let html = r#"
    <html>
        <head>
            <meta property="og:title" content="Example Title" />
            <meta name="twitter:description" content="Example Description" />
            <link rel="icon" href="/favicon.ico" />
        </head>
    </html>
"#;

let scraper = MetaScraper::new(html);

assert_eq!(scraper.title(), Some("Example Title".to_string()));
assert_eq!(scraper.description(), Some("Example Description".to_string()));
assert_eq!(scraper.favicon(), Some("/favicon.ico".to_string()));

API Overview

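| Method            | Description                                                                                        |
|-------------------|----------------------------------------------------------------------------------------------------|
| title()           | Retrieves the page title (og:title → twitter:title → <title>)                                      |
| description()     | Retrieves the page description (og:description → twitter:description → <meta name="description">) |
| image()           | Retrieves the page image URL (og:image → twitter:image)                                            |
| favicon()         | Retrieves the favicon URL (<link rel="icon">)                                                      |
| lang()            | Retrieves the language from the lang attribute of <html> (e.g., <html lang="en">)                  |
| extract_* methods | Low-level methods to extract specific metadata                                                     |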
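
Because the accessors in the Example return Option<String>, callers usually need to decide what to do when a value is missing. The sketch below shows one way to fill in defaults when building a link preview. It is written under several assumptions: the PageMeta trait is a hypothetical stand-in for the scraper so the snippet compiles without the crate, the Option<String> return type for image() and lang() is assumed rather than confirmed, and Preview, build_preview, and Stub are illustrative names.

```rust
/// Hypothetical stand-in for the accessors listed above, so this sketch
/// compiles without the crate. The Option<String> return type for
/// image() and lang() is an assumption.
trait PageMeta {
    fn title(&self) -> Option<String>;
    fn description(&self) -> Option<String>;
    fn image(&self) -> Option<String>;
    fn lang(&self) -> Option<String>;
}

/// A link-preview card with defaults filled in for missing metadata.
#[derive(Debug, PartialEq)]
struct Preview {
    title: String,
    description: String,
    image: Option<String>,
    lang: String,
}

fn build_preview(meta: &impl PageMeta) -> Preview {
    Preview {
        // Fall back to a placeholder when no title source is present.
        title: meta.title().unwrap_or_else(|| "Untitled".to_string()),
        // A missing description becomes an empty string.
        description: meta.description().unwrap_or_default(),
        // The image stays optional; there is no sensible default URL.
        image: meta.image(),
        // "und" is the BCP 47 tag for an undetermined language.
        lang: meta.lang().unwrap_or_else(|| "und".to_string()),
    }
}

/// Fixed values used only to exercise build_preview.
struct Stub;

impl PageMeta for Stub {
    fn title(&self) -> Option<String> {
        Some("Example Title".to_string())
    }
    fn description(&self) -> Option<String> {
        None
    }
    fn image(&self) -> Option<String> {
        None
    }
    fn lang(&self) -> Option<String> {
        Some("en".to_string())
    }
}

fn main() {
    let preview = build_preview(&Stub);
    assert_eq!(preview.title, "Example Title");
    assert_eq!(preview.description, "");
    assert_eq!(preview.image, None);
    assert_eq!(preview.lang, "en");
}
```

Keeping image as an Option while defaulting the text fields is a design choice for this sketch; a real caller may prefer to propagate None for every field.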