Sitemap / sitemap-index source adapter
Sitemap / sitemap-index ScrapingService adapter
Parses XML sitemaps (<urlset>) and sitemap index files (<sitemapindex>),
emitting discovered URLs with metadata for downstream pipeline nodes.
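The two document kinds differ only in their root element, and both carry their URLs in <loc> children. The sketch below illustrates that distinction with plain string scanning. It is not the adapter's parser: it assumes well-formed input without namespace prefixes, CDATA, or entity-encoded characters such as &amp;. The helper names (classify, locs, SitemapKind) are hypothetical, and a real XML parser is the safer choice in production.

```rust
/// Kind of sitemap document, decided by its root element (hypothetical type).
#[derive(Debug, PartialEq)]
enum SitemapKind {
    UrlSet,
    Index,
    Unknown,
}

/// Naive root detection: looks for the opening tag of each root element.
fn classify(xml: &str) -> SitemapKind {
    if xml.contains("<sitemapindex") {
        SitemapKind::Index
    } else if xml.contains("<urlset") {
        SitemapKind::UrlSet
    } else {
        SitemapKind::Unknown
    }
}

/// Collect the trimmed text of every <loc>...</loc> element, in document order.
/// For an index these are child sitemap URLs; for a urlset they are page URLs.
fn locs(xml: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut rest = xml;
    while let Some(start) = rest.find("<loc>") {
        let after = &rest[start + "<loc>".len()..];
        match after.find("</loc>") {
            Some(end) => {
                out.push(after[..end].trim().to_string());
                rest = &after[end + "</loc>".len()..];
            }
            // Unterminated <loc>: stop scanning rather than guess.
            None => break,
        }
    }
    out
}

fn main() {
    let index = r#"<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/a.xml</loc></sitemap>
  <sitemap><loc>https://example.com/b.xml.gz</loc></sitemap>
</sitemapindex>"#;
    println!("{:?} {:?}", classify(index), locs(index));
}
```

An index yields further sitemap URLs to fetch (possibly gzipped), while a urlset yields the page URLs themselves.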
Supports:
- Standard sitemaps (<urlset> with <url> entries)
- Sitemap index files (<sitemapindex> with nested <sitemap> refs)
- Gzipped sitemaps (.xml.gz) via flate2
- Filtering by lastmod date range or priority threshold
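The lastmod and priority filters could work roughly as sketched below. Everything here is an assumption for illustration: the Entry type and filter function are hypothetical, lastmod is compared on its "YYYY-MM-DD" prefix (same-format ISO dates sort lexicographically), a missing priority falls back to 0.5 (the default in the sitemaps.org protocol), and entries without lastmod are dropped whenever a date bound is set. The adapter's real parameter names and edge-case behavior are not documented here.

```rust
/// Hypothetical, simplified entry used only for this sketch.
#[derive(Debug, Clone)]
struct Entry {
    loc: String,
    lastmod: Option<String>, // W3C Datetime, e.g. "2024-03-01" or "2024-03-01T10:00:00+00:00"
    priority: Option<f32>,   // 0.0..=1.0 when present
}

/// Keep entries whose lastmod date lies in [from, to] (inclusive, "YYYY-MM-DD")
/// and whose priority is at least `min_priority`. `None` disables a bound.
fn filter(
    entries: &[Entry],
    from: Option<&str>,
    to: Option<&str>,
    min_priority: Option<f32>,
) -> Vec<Entry> {
    entries
        .iter()
        .filter(|e| {
            // Compare only the date part of the timestamp.
            let date = e.lastmod.as_deref().map(|d| &d[..d.len().min(10)]);
            let date_ok = match date {
                Some(d) => from.map_or(true, |f| d >= f) && to.map_or(true, |t| d <= t),
                // Assumption: undated entries pass only when no date bound is set.
                None => from.is_none() && to.is_none(),
            };
            // Assumption: missing priority is treated as the protocol default 0.5.
            let prio = e.priority.unwrap_or(0.5);
            let prio_ok = min_priority.map_or(true, |m| prio >= m);
            date_ok && prio_ok
        })
        .cloned()
        .collect()
}

fn main() {
    let entries = vec![
        Entry { loc: "https://example.com/a".into(), lastmod: Some("2024-03-01".into()), priority: Some(0.8) },
        Entry { loc: "https://example.com/b".into(), lastmod: Some("2023-12-31T10:00:00+00:00".into()), priority: None },
        Entry { loc: "https://example.com/c".into(), lastmod: None, priority: Some(1.0) },
    ];
    // Only "a" is dated on or after 2024-01-01 with priority >= 0.5.
    for e in filter(&entries, Some("2024-01-01"), None, Some(0.5)) {
        println!("{}", e.loc);
    }
}
```

Applying both checks in one pass keeps the filter a single linear scan over the parsed entries.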
Example
```rust
use stygian_graph::adapters::sitemap::SitemapAdapter;
use stygian_graph::ports::{ScrapingService, ServiceInput};
use serde_json::json;

// Assumes a Tokio runtime (tokio with the "macros" and "rt" features),
// since `execute` must be awaited inside an async context.
#[tokio::main]
async fn main() {
    let adapter = SitemapAdapter::new(reqwest::Client::new(), 5);
    let input = ServiceInput {
        url: "https://example.com/sitemap.xml".into(),
        params: json!({}),
    };
    let output = adapter.execute(input).await.unwrap();
    println!("{}", output.data); // JSON array of discovered URLs
}
```

Structs
- SitemapAdapter - Sitemap / sitemap-index source adapter.
- SitemapEntry - A single URL entry extracted from a sitemap.
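The fields of SitemapEntry are not listed on this page. As an assumption only, the sketch below mirrors the child tags the sitemaps.org protocol defines for a <url> element (loc is required; lastmod, changefreq, and priority are optional) and builds such a record from one <url> element by naive string scanning. EntrySketch, child, and parse_url are hypothetical names, not the crate's API, and the real struct's fields and types may differ.

```rust
/// Hypothetical stand-in for SitemapEntry, following the sitemaps.org <url> children.
#[derive(Debug, PartialEq)]
struct EntrySketch {
    loc: String,
    lastmod: Option<String>,
    changefreq: Option<String>,
    priority: Option<f32>,
}

/// Return the trimmed text between <tag> and </tag>, if both are present.
fn child(block: &str, tag: &str) -> Option<String> {
    let open = format!("<{tag}>");
    let close = format!("</{tag}>");
    let start = block.find(&open)? + open.len();
    let end = block[start..].find(&close)? + start;
    Some(block[start..end].trim().to_string())
}

/// Build an entry from one <url> element; returns None when <loc> is missing.
/// An unparseable <priority> is treated as absent.
fn parse_url(block: &str) -> Option<EntrySketch> {
    Some(EntrySketch {
        loc: child(block, "loc")?,
        lastmod: child(block, "lastmod"),
        changefreq: child(block, "changefreq"),
        priority: child(block, "priority").and_then(|p| p.parse().ok()),
    })
}

fn main() {
    let xml = "<url><loc>https://example.com/page</loc><changefreq>weekly</changefreq></url>";
    println!("{:?}", parse_url(xml));
}
```

Keeping the optional tags as Option values lets downstream nodes tell "not provided" apart from an explicit value, which matters for filters such as the priority threshold.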