§web-capture
A library and CLI/microservice to render web pages as HTML, Markdown, or PNG screenshots.
§Features
- Fetch HTML content from URLs
- Convert HTML to Markdown
- Capture PNG screenshots of web pages
- Convert relative URLs to absolute URLs
- Support for headless browser rendering via browser-commander
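Converting relative URLs to absolute ones means resolving each link against the page URL. Below is a simplified, std-only sketch of that step. It is not the crate's implementation. `resolve_url` and its helpers are hypothetical names. The base is assumed to be an absolute `scheme://host/path` URL, and RFC 3986 corner cases (userinfo, percent-encoding, unusual authorities) are ignored. Production code would more likely use a URL-parsing crate such as `url` and its `Url::join`.

```rust
/// Returns true if `s` begins with a URI scheme such as `https:` or `data:`.
fn has_scheme(s: &str) -> bool {
    match s.find(':') {
        Some(i) if i > 0 => {
            let scheme = &s[..i];
            scheme.starts_with(|c: char| c.is_ascii_alphabetic())
                && scheme
                    .chars()
                    .all(|c| c.is_ascii_alphanumeric() || matches!(c, '+' | '-' | '.'))
        }
        _ => false,
    }
}

/// Splits an absolute base URL into (scheme, authority, path), dropping query and fragment.
/// Assumption: the base always contains "://".
fn split_base(base: &str) -> (&str, &str, &str) {
    let (scheme, rest) = base.split_once("://").unwrap_or(("https", base));
    let rest = rest.split(|c: char| c == '?' || c == '#').next().unwrap_or("");
    match rest.find('/') {
        Some(i) => (scheme, &rest[..i], &rest[i..]),
        None => (scheme, rest, "/"),
    }
}

/// Removes "." and ".." segments from an absolute path.
fn remove_dot_segments(path: &str) -> String {
    let segments: Vec<&str> = path.split('/').collect();
    let mut out: Vec<&str> = Vec::new();
    for (i, seg) in segments.iter().enumerate() {
        let is_last = i + 1 == segments.len();
        match *seg {
            "." | ".." => {
                // Never pop the leading empty segment that stands for the root "/".
                if *seg == ".." && out.len() > 1 {
                    out.pop();
                }
                // A trailing dot segment names a directory, so keep the trailing slash.
                if is_last {
                    out.push("");
                }
            }
            other => out.push(other),
        }
    }
    out.join("/")
}

/// Hypothetical helper: resolves `reference` against the absolute URL `base`.
fn resolve_url(base: &str, reference: &str) -> String {
    // Absolute references (https:, mailto:, data:, ...) are returned unchanged.
    if has_scheme(reference) {
        return reference.to_string();
    }
    let (scheme, authority, base_path) = split_base(base);
    // Protocol-relative: reuse only the base scheme.
    if let Some(rest) = reference.strip_prefix("//") {
        return format!("{scheme}://{rest}");
    }
    // Empty or fragment-only: the base document itself, with the new fragment.
    if reference.is_empty() || reference.starts_with('#') {
        let without_fragment = base.split('#').next().unwrap_or(base);
        return format!("{without_fragment}{reference}");
    }
    // Query-only: same path as the base, new query.
    if reference.starts_with('?') {
        return format!("{scheme}://{authority}{base_path}{reference}");
    }
    // Set the reference's own query/fragment aside while merging paths.
    let cut = reference
        .find(|c: char| c == '?' || c == '#')
        .unwrap_or(reference.len());
    let (ref_path, suffix) = reference.split_at(cut);
    let merged = if ref_path.starts_with('/') {
        ref_path.to_string()
    } else {
        // Relative path: append to the base "directory" (everything up to the last '/').
        let dir_end = base_path.rfind('/').map_or(0, |i| i + 1);
        format!("{}{}", &base_path[..dir_end], ref_path)
    };
    let path = remove_dot_segments(&merged);
    format!("{scheme}://{authority}{path}{suffix}")
}
```

For a page at `https://example.com/docs/page.html`, `resolve_url(base, "../style.css")` yields `https://example.com/style.css`. Absolute links such as `mailto:` pass through untouched.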
§Example
```rust
// Requires the `tokio` (with the `macros` and `rt-multi-thread` features) and `anyhow` crates.
use web_capture::{fetch_html, convert_html_to_markdown, capture_screenshot};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Fetch HTML from a URL
    let html = fetch_html("https://example.com").await?;
    println!("HTML length: {}", html.len());

    // Convert HTML to Markdown
    let markdown = convert_html_to_markdown(&html, Some("https://example.com"))?;
    println!("Markdown: {}", markdown);

    // Capture a screenshot
    let screenshot = capture_screenshot("https://example.com").await?;
    println!("Screenshot size: {} bytes", screenshot.len());

    Ok(())
}
```
Re-exports§
pub use browser::BrowserEngine;
Modules§
- animation - Animation capture module (R2).
- batch - Batch processing and configuration module (R7).
- browser - Browser automation module.
- extract_images - Extract base64 data URI images from markdown and save as files.
- figures - Figure image extraction and download module (R4).
- gdocs - Google Docs capture module.
- html - HTML processing module.
- latex - LaTeX formula extraction module (R1).
- localize_images - Markdown image localization module (R5).
- markdown - Markdown conversion module.
- metadata - Article metadata extraction module (R1).
- postprocess - Markdown post-processing pipeline (R1).
- themed_image - Dual-themed screenshot capture module (R3).
- verify - Content verification module (R6).
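The `extract_images` module is described as pulling base64 data URI images out of Markdown and saving them as files. The scanning half of that job could be sketched as below. This is only a sketch under assumptions. `ExtractedImage`, `extract_data_uri_images`, and the `images/image-<n>.<ext>` naming scheme are hypothetical. The scan looks only for `](data:` link targets and does not check for the leading `!` of image syntax. Decoding the payload (with a base64 decoder) and writing it out (for example with `std::fs::write`) is left out.

```rust
/// One image cut out of a data URI; the payload is still base64 text.
#[derive(Debug, PartialEq)]
struct ExtractedImage {
    file_name: String,
    mime: String,
    base64_payload: String,
}

/// Splits "data:<mime>;base64,<payload>" into (mime, payload); None for other URIs.
fn parse_base64_data_uri(uri: &str) -> Option<(&str, &str)> {
    let body = uri.strip_prefix("data:")?;
    let (meta, payload) = body.split_once(',')?;
    let mime = meta.strip_suffix(";base64")?;
    Some((mime, payload))
}

/// Hypothetical helper: rewrites `](data:...;base64,...)` link targets to
/// `images/image-<n>.<ext>` paths and returns the extracted payloads.
fn extract_data_uri_images(markdown: &str) -> (String, Vec<ExtractedImage>) {
    const MARKER: &str = "](data:";
    let mut out = String::with_capacity(markdown.len());
    let mut images: Vec<ExtractedImage> = Vec::new();
    let mut rest = markdown;

    while let Some(start) = rest.find(MARKER) {
        let uri_start = start + 2; // skip "](" so the slice starts at "data:"
        let after = &rest[uri_start..];
        let end = match after.find(')') {
            Some(i) => i,
            None => break, // unterminated link: leave the tail untouched
        };
        match parse_base64_data_uri(&after[..end]) {
            Some((mime, payload)) => {
                // "image/png" -> "png"; other subtypes are used verbatim.
                let ext = mime.rsplit('/').next().unwrap_or("bin");
                let file_name = format!("images/image-{}.{}", images.len() + 1, ext);
                out.push_str(&rest[..uri_start]);
                out.push_str(&file_name);
                out.push(')');
                images.push(ExtractedImage {
                    file_name,
                    mime: mime.to_string(),
                    base64_payload: payload.to_string(),
                });
            }
            // Not base64-encoded: copy the link through unchanged.
            None => out.push_str(&rest[..uri_start + end + 1]),
        }
        rest = &after[end + 1..];
    }
    out.push_str(rest);
    (out, images)
}
```

Replacing inline payloads with file paths keeps the Markdown readable. Each `ExtractedImage` carries enough information to write the file afterwards.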
Structs§
- EnhancedMarkdownResult - Result of enhanced HTML-to-Markdown conversion.
- EnhancedOptions - Options for enhanced HTML-to-Markdown conversion.
Enums§
- WebCaptureError - Error types for web-capture operations.
Constants§
- VERSION - Version of the web-capture library.
Functions§
- capture_screenshot - Capture a PNG screenshot of a URL.
- convert_html_to_markdown - Convert HTML content to Markdown.
- convert_html_to_markdown_enhanced - Convert HTML to Markdown with enhanced options.
- convert_relative_urls - Convert relative URLs to absolute URLs in HTML content.
- convert_to_utf8 - Convert HTML content to UTF-8 encoding.
- fetch_html - Fetch HTML content from a URL.
- render_html - Render HTML content from a URL using a headless browser.
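The `convert_to_utf8` entry does not say how the source encoding is detected. As a hedged illustration only, the sketch below uses the simplest fallback: keep input that is already valid UTF-8, and otherwise decode it as ISO-8859-1, where byte value N maps to code point U+00NN. `bytes_to_utf8` is a hypothetical name. Real pages should be decoded according to the HTTP `Content-Type` header or a `<meta charset>` declaration. Browsers also treat the `iso-8859-1` label as windows-1252, which differs in the 0x80 to 0x9F range.

```rust
/// Hypothetical helper: returns valid UTF-8 input unchanged, otherwise
/// decodes the bytes as ISO-8859-1 (Latin-1).
fn bytes_to_utf8(bytes: &[u8]) -> String {
    match std::str::from_utf8(bytes) {
        // Already valid UTF-8: keep the text as-is.
        Ok(text) => text.to_owned(),
        // Latin-1 fallback: every byte becomes the char with the same value.
        Err(_) => bytes.iter().map(|&b| char::from(b)).collect(),
    }
}
```

A Latin-1 fallback cannot fail, since every byte sequence decodes. This makes it a safe last resort, though it can silently mis-decode text in other single-byte encodings.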
Type Aliases§
- Result - Result type for web-capture operations.
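`WebCaptureError` and `Result` follow a common Rust pattern: one crate-wide error enum, plus a `Result<T>` alias with the error type fixed. The sketch below shows that pattern with invented variants, because the real enum's variants are not listed on this page. The type is named `CaptureError` to make clear it is a hypothetical stand-in, and `check_url` is likewise a hypothetical helper. Implementing `std::error::Error` is what would let `?` convert such an error into `anyhow::Error` in the example's `main`.

```rust
use std::fmt;

/// Hypothetical stand-in for a crate error enum; the variants are invented.
#[derive(Debug)]
enum CaptureError {
    InvalidUrl(String),
    Http { status: u16 },
    Browser(String),
}

impl fmt::Display for CaptureError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CaptureError::InvalidUrl(url) => write!(f, "invalid URL: {url}"),
            CaptureError::Http { status } => write!(f, "HTTP request failed with status {status}"),
            CaptureError::Browser(msg) => write!(f, "browser error: {msg}"),
        }
    }
}

// Display + Debug are all std::error::Error needs by default.
impl std::error::Error for CaptureError {}

/// Crate-wide alias: the error type is fixed, only the success type varies.
type Result<T> = std::result::Result<T, CaptureError>;

/// Hypothetical helper showing the alias in a signature.
fn check_url(url: &str) -> Result<&str> {
    if url.starts_with("http://") || url.starts_with("https://") {
        Ok(url)
    } else {
        Err(CaptureError::InvalidUrl(url.to_string()))
    }
}
```

The alias keeps signatures short (`Result<String>` instead of `std::result::Result<String, CaptureError>`), and callers can still box the error as `Box<dyn std::error::Error>`.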