halldyll-core 0.1.0

Core scraping engine for Halldyll - high-performance async web scraper for AI agents
//! # Halldyll Core
//!
//! High-performance async web scraping engine designed for AI data collection.
//!
//! ## Features
//!
//! - **Async HTTP Fetching**: Connection pooling, compression, retries with exponential backoff
//! - **Crawl Management**: URL normalization (RFC 3986), frontier scheduling, deduplication
//! - **Politeness**: robots.txt (RFC 9309), adaptive rate limiting per domain
//! - **Content Extraction**: Text, links, images, videos, structured data (JSON-LD, OpenGraph)
//! - **Security**: SSRF protection, domain allowlists, resource limits
//! - **Storage**: WARC (ISO 28500), snapshots with content hashing
//! - **Observability**: Structured logging, metrics, distributed tracing
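//!
//! As a rough illustration of the "retries with exponential backoff" item above,
//! the delay schedule can be sketched as follows. `backoff_delay` and its
//! parameters are hypothetical names used only for this sketch. They are not part
//! of this crate's API, and the crate's actual retry policy may differ (for
//! example by adding jitter).
//!
//! ```rust
//! use std::time::Duration;
//!
//! // Hypothetical helper (an assumption, not halldyll-core API): delay before
//! // retry number `attempt` (0-based), doubling from `base` and capped at `max`.
//! fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
//!     // 2^attempt, saturating so large attempt counts cannot overflow.
//!     let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
//!     // Fall back to the cap if the multiplication itself overflows.
//!     base.checked_mul(factor).unwrap_or(max).min(max)
//! }
//!
//! fn main() {
//!     let base = Duration::from_millis(100);
//!     let max = Duration::from_secs(5);
//!
//!     // The delay doubles on each attempt until it reaches the cap.
//!     assert_eq!(backoff_delay(0, base, max), Duration::from_millis(100));
//!     assert_eq!(backoff_delay(3, base, max), Duration::from_millis(800));
//!     assert_eq!(backoff_delay(10, base, max), max);
//! }
//! ```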
//!
//! ## Example
//!
//! ```rust,no_run
//! use halldyll_core::{Orchestrator, Config};
//! use url::Url;
//!
//! #[tokio::main]
//! async fn main() -> Result<(), Box<dyn std::error::Error>> {
//!     let config = Config::default();
//!     let orchestrator = Orchestrator::new(config)?;
//!
//!     let url = Url::parse("https://example.com")?;
//!     let result = orchestrator.scrape(&url).await?;
//!
//!     println!("Title: {:?}", result.document.title);
//!     println!("Text length: {}", result.document.main_text.len());
//!     Ok(())
//! }
//! ```

#![warn(missing_docs)]
#![warn(clippy::all)]
#![deny(unsafe_code)]

pub mod types;
pub mod fetch;
pub mod crawl;
pub mod politeness;
pub mod parse;
pub mod render;
pub mod storage;
pub mod security;
pub mod observe;
pub mod sitemap;
pub mod orchestrator;

// Re-exports for convenience
pub use types::{Document, Assets, Provenance, Error, Config};
pub use types::error::Result;
pub use orchestrator::Orchestrator;

// Operational utilities: circuit breaking, health checks, metrics, graceful shutdown
pub use fetch::{CircuitBreaker, CircuitBreakerConfig};
pub use observe::{
    HealthChecker, HealthResponse, HealthStatus, HealthMetrics,
    PrometheusExporter, MetricsCollector, MetricsSnapshot,
    GracefulShutdown, ShutdownResult,
};