Crate web_capture

§web-capture

A library and CLI/microservice to render web pages as HTML, Markdown, or PNG screenshots.

§Features

  • Fetch HTML content from URLs
  • Convert HTML to Markdown
  • Capture PNG screenshots of web pages
  • Convert relative URLs to absolute URLs
  • Render pages in a headless browser via browser-commander

§Example

use web_capture::{fetch_html, convert_html_to_markdown, capture_screenshot};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Fetch HTML from a URL
    let html = fetch_html("https://example.com").await?;
    println!("HTML length: {}", html.len());

    // Convert HTML to Markdown
    let markdown = convert_html_to_markdown(&html, Some("https://example.com"))?;
    println!("Markdown: {}", markdown);

    // Capture a screenshot
    let screenshot = capture_screenshot("https://example.com").await?;
    println!("Screenshot size: {} bytes", screenshot.len());

    Ok(())
}

Re-exports§

pub use browser::BrowserEngine;

Modules§

animation
Animation capture module (R2).
batch
Batch processing and configuration module (R7).
browser
Browser automation module
extract_images
Extract base64 data URI images from markdown and save as files.
figures
Figure image extraction and download module (R4).
gdocs
Google Docs capture module.
html
HTML processing module
latex
LaTeX formula extraction module (R1).
localize_images
Markdown image localization module (R5).
markdown
Markdown conversion module
metadata
Article metadata extraction module (R1).
postprocess
Markdown post-processing pipeline (R1).
themed_image
Dual-themed screenshot capture module (R3).
verify
Content verification module (R6).

Structs§

EnhancedMarkdownResult
Result of enhanced HTML-to-Markdown conversion.
EnhancedOptions
Options for enhanced HTML-to-Markdown conversion.

Enums§

WebCaptureError
Error types for web-capture operations

Constants§

VERSION
Version of the web-capture library

Functions§

capture_screenshot
Capture a PNG screenshot of a URL
convert_html_to_markdown
Convert HTML content to Markdown
convert_html_to_markdown_enhanced
Convert HTML to Markdown with enhanced options.
convert_relative_urls
Convert relative URLs to absolute URLs in HTML content
convert_to_utf8
Convert HTML content to UTF-8 encoding
fetch_html
Fetch HTML content from a URL
render_html
Render HTML content from a URL using a headless browser
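
To make the idea behind convert_relative_urls concrete, here is a minimal, standard-library-only sketch of resolving one link against a base URL. The helper name `resolve_url` and its signature are hypothetical and are not part of this crate's API; the crate's own implementation may differ, and a real resolver would typically follow the full URL specification rather than this simplified logic.

```rust
/// Resolve `href` against `base`, covering only the common cases.
/// Hypothetical helper for illustration; NOT part of web_capture.
fn resolve_url(base: &str, href: &str) -> String {
    // Links that already carry a scheme, or that do not navigate to a
    // path (data URIs, mailto links, fragments), are returned unchanged.
    if href.contains("://")
        || href.starts_with("data:")
        || href.starts_with("mailto:")
        || href.starts_with('#')
    {
        return href.to_string();
    }

    // Split the base into scheme and the rest; give up if it is not absolute.
    let (scheme, rest) = match base.split_once("://") {
        Some(parts) => parts,
        None => return href.to_string(),
    };
    // Split the rest into host and path (the path defaults to "/").
    let (host, path) = match rest.find('/') {
        Some(i) => (&rest[..i], &rest[i..]),
        None => (rest, "/"),
    };

    // Scheme-relative link: reuse only the base scheme.
    if let Some(stripped) = href.strip_prefix("//") {
        return format!("{scheme}://{stripped}");
    }
    // Root-relative link: reuse the base scheme and host.
    if href.starts_with('/') {
        return format!("{scheme}://{host}{href}");
    }

    // Path-relative link: start from the base directory (drop the query,
    // the fragment, and the final file segment), then apply each segment.
    let path = path.split(|c: char| c == '?' || c == '#').next().unwrap_or("/");
    let mut segments: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();
    if !path.ends_with('/') {
        segments.pop();
    }
    for part in href.split('/') {
        match part {
            "" | "." => {}
            ".." => {
                segments.pop();
            }
            other => segments.push(other),
        }
    }

    let mut out = format!("{scheme}://{host}/{}", segments.join("/"));
    // Keep a trailing slash if the link had one.
    if href.ends_with('/') && !out.ends_with('/') {
        out.push('/');
    }
    out
}
```

With a base of `https://example.com/docs/guide.html`, this sketch maps `images/a.png` to `https://example.com/docs/images/a.png` and `/img/a.png` to `https://example.com/img/a.png`. It deliberately ignores edge cases such as ports with credentials, a `<base>` element in the HTML, and percent-encoding.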

Type Aliases§

Result
Result type for web-capture operations