Module sitemap


Sitemap / sitemap-index source adapter, exposed through the ScrapingService port.

Parses XML sitemaps (<urlset>) and sitemap index files (<sitemapindex>), emitting discovered URLs with metadata for downstream pipeline nodes.
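The distinction between the two document types can be sketched as follows. This is a minimal, std-only illustration and not the adapter's actual implementation: `SitemapKind`, `classify`, and `extract_locs` are hypothetical names, and the naive string scanning stands in for a real XML parser (it does not decode entities such as `&amp;` or handle namespaced tags).

```rust
/// Which kind of sitemap document was received.
/// Hypothetical type for illustration; not part of the crate's API.
#[derive(Debug, PartialEq, Eq)]
pub enum SitemapKind {
    /// A standard sitemap: `<urlset>` containing `<url>` entries.
    UrlSet,
    /// A sitemap index: `<sitemapindex>` containing `<sitemap>` refs.
    Index,
    /// Neither root element was found.
    Unknown,
}

/// Classify a document by whichever root tag appears first in the text.
pub fn classify(xml: &str) -> SitemapKind {
    let urlset = xml.find("<urlset");
    let index = xml.find("<sitemapindex");
    match (urlset, index) {
        (Some(u), Some(i)) if u < i => SitemapKind::UrlSet,
        (Some(_), Some(_)) => SitemapKind::Index,
        (Some(_), None) => SitemapKind::UrlSet,
        (None, Some(_)) => SitemapKind::Index,
        (None, None) => SitemapKind::Unknown,
    }
}

/// Collect the trimmed text of every `<loc>` element, in document order.
/// In a `<urlset>` these are page URLs; in a `<sitemapindex>` they are
/// the URLs of nested sitemaps that would be fetched next.
pub fn extract_locs(xml: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut rest = xml;
    while let Some(start) = rest.find("<loc>") {
        let after = &rest[start + "<loc>".len()..];
        match after.find("</loc>") {
            Some(end) => {
                out.push(after[..end].trim().to_string());
                rest = &after[end + "</loc>".len()..];
            }
            // Unterminated <loc>: stop rather than guess.
            None => break,
        }
    }
    out
}
```

With this shape, a caller would classify the fetched body first, then either emit the `<loc>` values directly (for a `<urlset>`) or treat them as further sitemaps to fetch (for a `<sitemapindex>`).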

Supports:

  • Standard sitemaps (<urlset> with <url> entries)
  • Sitemap index files (<sitemapindex> with nested <sitemap> refs)
  • Gzipped sitemaps (.xml.gz) via flate2
  • Filtering by lastmod date range or priority threshold
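As a rough illustration of the last two items, the sketch below shows one way gzip detection and entry filtering might look. Everything here is an assumption made for the example: `Entry`, `Filter`, and `looks_gzipped` are hypothetical names, not the crate's `SitemapEntry` or its actual parameters. Dates are assumed to be W3C datetime strings starting with `YYYY-MM-DD`, so the date part can be compared lexicographically. Entries without a `lastmod` are excluded whenever a date bound is set, and a missing priority falls back to 0.5, the default defined by the sitemap protocol.

```rust
/// Hypothetical entry shape for illustration; the real `SitemapEntry`
/// fields may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub loc: String,
    /// W3C datetime, e.g. "2024-05-10" or "2024-05-10T12:00:00+00:00".
    pub lastmod: Option<String>,
    pub priority: Option<f64>,
}

/// Hypothetical filter: inclusive date bounds ("YYYY-MM-DD") and a
/// minimum priority. `None` means "no constraint".
pub struct Filter {
    pub lastmod_from: Option<String>,
    pub lastmod_to: Option<String>,
    pub min_priority: Option<f64>,
}

impl Filter {
    /// Returns true when the entry satisfies every configured constraint.
    pub fn matches(&self, e: &Entry) -> bool {
        // Keep only the leading "YYYY-MM-DD" part so that timestamps with
        // a time component compare correctly against plain dates.
        let date = e.lastmod.as_deref().map(|s| s.get(..10).unwrap_or(s));

        if let Some(from) = &self.lastmod_from {
            match date {
                Some(d) if d >= from.as_str() => {}
                _ => return false, // too old, or no lastmod at all
            }
        }
        if let Some(to) = &self.lastmod_to {
            match date {
                Some(d) if d <= to.as_str() => {}
                _ => return false, // too new, or no lastmod at all
            }
        }
        if let Some(min) = self.min_priority {
            // The sitemap protocol's default priority is 0.5.
            if e.priority.unwrap_or(0.5) < min {
                return false;
            }
        }
        true
    }
}

/// Detect gzip data by its two-byte magic number (0x1f 0x8b), which is
/// more robust than relying on a `.xml.gz` suffix alone.
pub fn looks_gzipped(bytes: &[u8]) -> bool {
    bytes.starts_with(&[0x1f, 0x8b])
}
```

Checking the magic bytes rather than the file extension is a design choice for the sketch; a body that passes this check would then be handed to a decompressor such as flate2 before parsing.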

Example

use stygian_graph::adapters::sitemap::SitemapAdapter;
use stygian_graph::ports::{ScrapingService, ServiceInput};
use serde_json::json;

let adapter = SitemapAdapter::new(reqwest::Client::new(), 5);
let input = ServiceInput {
    url: "https://example.com/sitemap.xml".into(),
    params: json!({}),
};
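// `execute` is async, so this must run inside an async context
// (for example, within a Tokio runtime).
let output = adapter.execute(input).await.unwrap();
println!("{}", output.data); // JSON array of discovered URLs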

Structs

SitemapAdapter
Sitemap / sitemap-index source adapter.
SitemapEntry
A single URL entry extracted from a sitemap.