crawlberg 1.0.1

High-performance web crawling engine
Documentation

crawlberg

Rust crate for crawlberg — the core high-performance web crawling engine used by every language binding. Provides async crawling, structured scrape output, SSRF-safe networking, browser rendering, and Markdown conversion.

What This Package Provides

  • Same crawler as every binding — one Rust engine behind Python, Node.js, Ruby, Go, Java, .NET, PHP, Elixir, Dart, Kotlin Android, Swift, Zig, WASM, and C FFI.
  • Structured scrape output — HTML, Markdown, metadata, links, assets, response headers, and extraction warnings with consistent field names.
  • Crawl controls — depth, page limits, concurrency, URL filters, robots/sitemap handling, rate limits, and partial failure reporting.
  • Rendering path — optional browser rendering for JavaScript-heavy pages; direct HTTP path for fast static pages.

Installation

cargo add crawlberg

Agent plugin

The crawlberg plugin is available via the xberg-io/plugins marketplace.

/plugin marketplace add xberg-io/plugins
/plugin install crawlberg@xberg

Works with Claude Code, Codex, Cursor, Gemini CLI, Factory Droid, GitHub Copilot CLI, and opencode. See the marketplace README for harness-specific install instructions.

Quick Start

use crawlberg::{create_engine, scrape};

#[tokio::main]
async fn main() -> crawlberg::Result<()> {
    let engine = create_engine(None)?;
    let result = scrape(&engine, "https://example.com").await?;

    println!("{}", result.metadata.title.as_deref().unwrap_or("(no title)"));
    println!("{}", result.markdown.content);
    println!("{} links", result.links.len());

    Ok(())
}

API Reference

Full API documentation is available at docs.crawlberg.xberg.io.

Key functions:

  • create_engine(config?) — Create a crawl engine with optional configuration
  • scrape(engine, url) — Scrape a single URL
  • crawl(engine, url) — Crawl a website following links
  • map_urls(engine, url) — Discover all pages on a site
  • batch_scrape(engine, urls) — Scrape multiple URLs concurrently
  • batch_crawl(engine, urls) — Crawl multiple seed URLs concurrently

Contributing

Contributions are welcome! Please see our Contributing Guide for details.

Part of Xberg.dev

  • Xberg — document intelligence: text, tables, metadata from 91+ formats with optional OCR.
  • Xberg Enterprise — managed extraction API with SDKs, dashboards, and observability.
  • crawlberg — web crawling and scraping with HTML→Markdown and headless-Chrome fallback.
  • html-to-markdown — fast, lossless HTML→Markdown engine.
  • liter-llm — universal LLM API client with native bindings for 14 languages and 143 providers.
  • tree-sitter-language-pack — tree-sitter grammars and code-intelligence primitives.
  • alef — the polyglot binding generator that produces every per-language binding across the 5 polyglot repos.
  • Discord — community, roadmap, announcements.

License

This project is licensed under MIT License.

Links