Crate web2llm

§web2llm

web2llm is a high-performance Rust crate designed to fetch web pages and convert their core content into clean, token-efficient Markdown. It’s optimized for feeding data into Large Language Models (LLMs) and RAG pipelines.

§Key Features

High Performance: Zero-copy tree traversal, LTO, and efficient scoring.
Clean Output: Strips navigation, headers, footers, and non-essential attributes.
Shared Browser: Single persistent headless Chromium instance for dynamic pages (requires rendered feature).
Adaptive Fetch: Automatically detects SPAs and uses a browser fallback for full rendering.
SSRF Protection: Validates URLs and blocks private host access by default.
Robots.txt Compliance: Optionally respects robots.txt rules.
Rate Limiting: Built-in support for throttling and concurrency control.
Recursive Crawling: Discovers in-content links breadth-first and fetches them in one batch.
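
To make the SSRF Protection bullet concrete, here is a minimal sketch of the kind of host check such a guard might perform, written against the standard library only. It is not web2llm's actual implementation: the helper names `is_blocked_ip` and `is_blocked_host` are hypothetical, and the set of blocked ranges is an assumption about what "private host access" covers.

```rust
use std::net::IpAddr;

/// Hypothetical helper: returns true if `ip` falls in a range that an
/// SSRF guard would typically refuse (loopback, private, link-local, ...).
fn is_blocked_ip(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => {
            v4.is_loopback()          // 127.0.0.0/8
                || v4.is_private()    // 10/8, 172.16/12, 192.168/16
                || v4.is_link_local() // 169.254.0.0/16 (cloud metadata lives here)
                || v4.is_unspecified() // 0.0.0.0
                || v4.is_broadcast()  // 255.255.255.255
        }
        IpAddr::V6(v6) => {
            // An IPv4-mapped address (::ffff:a.b.c.d) is judged as its IPv4 form.
            if let Some(v4) = v6.to_ipv4_mapped() {
                return is_blocked_ip(IpAddr::V4(v4));
            }
            let first = v6.segments()[0];
            v6.is_loopback()                  // ::1
                || v6.is_unspecified()        // ::
                || (first & 0xfe00) == 0xfc00 // unique local, fc00::/7
                || (first & 0xffc0) == 0xfe80 // link-local, fe80::/10
        }
    }
}

/// Hypothetical helper: checks the host component of a URL.
/// Accepts bracketed IPv6 literals such as "[::1]".
fn is_blocked_host(host: &str) -> bool {
    let host = host.trim_start_matches('[').trim_end_matches(']');
    if host.eq_ignore_ascii_case("localhost") {
        return true;
    }
    match host.parse::<IpAddr>() {
        Ok(ip) => is_blocked_ip(ip),
        // A DNS name: this sketch lets it through. A real guard would also
        // resolve the name and run every resolved address through is_blocked_ip.
        Err(_) => false,
    }
}
```

With these helpers, `is_blocked_host("169.254.169.254")` and `is_blocked_host("[::1]")` return true, while `is_blocked_host("example.com")` returns false. Checking only literal addresses is not sufficient on its own, since a public hostname can resolve to a private address; the DNS resolution step is left out here to keep the sketch dependency-free.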

§Quick Start

The easiest way to get started is to call the convenience fetch function:

use web2llm::fetch;

#[tokio::main]
async fn main() {
    // Fetch a page with default configuration
    match fetch("https://example.com".to_string()).await {
        Ok(result) => {
            println!("Title: {}", result.title);
            println!("Markdown content:\n{}", result.markdown());
        }
        Err(e) => eprintln!("Error: {}", e),
    }
}

For more control, use the Web2llm struct with a custom Web2llmConfig.

Re-exports§

pub use config::Web2llmConfig;
pub use error::Web2llmError;
pub use output::PageResult;

Modules§

config
error
output

Structs§

CrawlConfig: Configuration for recursive crawling.
Web2llm: The main entry point for the web2llm pipeline.

Enums§

FetchMode: Defines the strategy used to fetch a page.

Functions§

batch_fetch: Convenience function — fetches multiple urls using Web2llmConfig::default.
crawl: Convenience function — crawls url using Web2llmConfig::default.
fetch: Convenience function — fetches url using Web2llmConfig::default.
