Crate web_capture

§web-capture

A library and CLI/microservice to render web pages as HTML, Markdown, or PNG screenshots.

§Features

  • Fetch HTML content from URLs
  • Convert HTML to Markdown
  • Capture PNG screenshots of web pages
  • Convert relative URLs to absolute URLs
  • Support for headless browser rendering via browser-commander

§Example

use web_capture::{fetch_html, convert_html_to_markdown, capture_screenshot};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Fetch HTML from a URL
    let html = fetch_html("https://example.com").await?;
    println!("HTML length: {}", html.len());

    // Convert HTML to Markdown
    let markdown = convert_html_to_markdown(&html, Some("https://example.com"))?;
    println!("Markdown: {}", markdown);

    // Capture a screenshot
    let screenshot = capture_screenshot("https://example.com").await?;
    println!("Screenshot size: {} bytes", screenshot.len());

    Ok(())
}

Re-exports§

pub use browser::BrowserEngine;
pub use search::search;
pub use search::SearchDiagnostics;
pub use search::SearchResult;
pub use search::SearchResultItem;
pub use search::DEFAULT_LIMIT;
pub use search::DEFAULT_PROVIDER;
pub use search::SEARCH_PROVIDERS;

Modules§

animation
Animation capture module (R2).
archive
Build a self-contained ZIP archive from raw HTML.
batch
Batch processing and configuration module (R7).
browser
Browser automation module
extract_images
Extract base64 data URI images from markdown and save as files.
figures
Figure image extraction and download module (R4).
gdocs
Google Docs capture module.
github
GitHub repository-page capture helpers.
html
HTML processing module
kreuzberg
Kreuzberg html-to-markdown integration module.
latex
LaTeX formula extraction module (R1).
localize_images
Markdown image localization module (R5).
markdown
Markdown conversion module
metadata
Article metadata extraction module (R1).
postprocess
Markdown post-processing pipeline (R1).
search
Structured search-provider capture (issue #130).
themed_image
Dual-themed screenshot capture module (R3).
verify
Content verification module (R6).
xpaste
xpaste.pro URL helpers shared by the CLI and HTTP server.

Structs§

EnhancedMarkdownResult
Result of enhanced HTML-to-Markdown conversion.
EnhancedOptions
Options for enhanced HTML-to-Markdown conversion.

Enums§

WebCaptureError
Error types for web-capture operations

Constants§

VERSION
Version of the web-capture library

Functions§

capture_screenshot
Capture a PNG screenshot of a URL
convert_html_to_markdown
Convert HTML content to Markdown
convert_html_to_markdown_enhanced
Convert HTML to Markdown with enhanced options.
convert_relative_urls
Convert relative URLs to absolute URLs in HTML content
convert_to_utf8
Convert HTML content to UTF-8 encoding
convert_with_kreuzberg
Convert HTML to Markdown using the kreuzberg html-to-markdown library.
convert_with_kreuzberg_enhanced
Convert HTML to Markdown using kreuzberg after applying enhanced scoping options.
fetch_html
Fetch HTML content from a URL
render_html
Render HTML content from a URL using a headless browser
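
To make the idea behind convert_relative_urls concrete, the following is a minimal, standard-library-only sketch of resolving a single link against a base URL. It is not the crate's implementation: the helper names `has_scheme` and `resolve_url` are hypothetical, and the sketch deliberately ignores `.`/`..` segments, queries, fragments, and non-hierarchical schemes such as `mailto:`.

```rust
/// Returns true if `s` starts with something that looks like `scheme://`.
/// (Hypothetical helper for this sketch only.)
fn has_scheme(s: &str) -> bool {
    match s.split_once("://") {
        Some((scheme, _)) => {
            !scheme.is_empty()
                && scheme
                    .chars()
                    .all(|c| c.is_ascii_alphanumeric() || matches!(c, '+' | '-' | '.'))
        }
        None => false,
    }
}

/// Resolve `href` against `base` (hypothetical, not part of web_capture).
/// Handles absolute, protocol-relative, root-relative, and path-relative links.
fn resolve_url(base: &str, href: &str) -> String {
    // Already absolute: keep unchanged.
    if has_scheme(href) {
        return href.to_string();
    }
    // Split the base into scheme and remainder; give up if it is not absolute.
    let (scheme, rest) = match base.split_once("://") {
        Some(parts) if has_scheme(base) => parts,
        _ => return href.to_string(),
    };
    // Protocol-relative link: reuse only the base scheme.
    if let Some(stripped) = href.strip_prefix("//") {
        return format!("{scheme}://{stripped}");
    }
    // Separate host from path; a base without a path is treated as "/".
    let (host, path) = match rest.find('/') {
        Some(i) => (&rest[..i], &rest[i..]),
        None => (rest, "/"),
    };
    // Root-relative link: replace the whole path.
    if href.starts_with('/') {
        return format!("{scheme}://{host}{href}");
    }
    // Path-relative link: keep the base directory (up to the last '/').
    let dir = &path[..=path.rfind('/').unwrap_or(0)];
    format!("{scheme}://{host}{dir}{href}")
}

fn main() {
    let base = "https://example.com/docs/guide.html";
    for href in ["img/a.png", "/about", "//cdn.example.com/x.js", "http://other.org/p"] {
        println!("{href} -> {}", resolve_url(base, href));
    }
}
```

A production implementation would more likely rely on a full URL parser, since correct resolution also has to handle dot segments, queries, and fragments.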

Type Aliases§

Result
Result type for web-capture operations
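
A crate-level Result alias paired with a single error enum is a common Rust idiom. The sketch below shows how such a pairing typically looks, on the assumption that Result here is an alias over WebCaptureError; this page does not confirm that. `SketchError`, its variants, and `title_of` are hypothetical stand-ins and do not reflect the actual definitions in web_capture.

```rust
use std::fmt;

/// Hypothetical error enum standing in for WebCaptureError.
/// The variants are assumptions made for this sketch.
#[derive(Debug)]
enum SketchError {
    EmptyInput,
    MissingTitle,
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::EmptyInput => write!(f, "input HTML is empty"),
            SketchError::MissingTitle => write!(f, "no <title> element found"),
        }
    }
}

impl std::error::Error for SketchError {}

/// Crate-style alias: every fallible function shares one error type.
type Result<T> = std::result::Result<T, SketchError>;

/// Hypothetical helper that extracts the text of the first <title> element.
fn title_of(html: &str) -> Result<String> {
    if html.is_empty() {
        return Err(SketchError::EmptyInput);
    }
    let open = "<title>";
    let start = html.find(open).ok_or(SketchError::MissingTitle)? + open.len();
    let end = start + html[start..].find("</title>").ok_or(SketchError::MissingTitle)?;
    Ok(html[start..end].trim().to_string())
}

fn main() {
    match title_of("<html><title> Example </title></html>") {
        Ok(title) => println!("title: {title}"),
        Err(err) => eprintln!("error: {err}"),
    }
}
```

Because the hypothetical enum implements std::error::Error, callers can propagate it with `?` into anyhow::Result, as the crate-level example does for the real functions.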