Cartography engine: sitemap parsing, structured data extraction, feature encoding, and map assembly.
Modules
- action_encoder - Encode extracted actions into ActionRecord OpCodes.
- feature_encoder - Encode extraction results into 128-float feature vectors.
- mapper - Mapper: orchestrates the entire mapping process.
- page_classifier - Classify a page using extraction results + URL patterns.
- rate_limiter - Rate limiter for polite crawling.
- robots - Parse robots.txt files.
- sitemap - Parse sitemap.xml and sitemap index files.
- url_classifier - Classify URLs by pattern into PageType.
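
To illustrate the kind of work the robots module description refers to, here is a minimal sketch of robots.txt parsing. It is not this crate's API: `RobotsSketch`, `parse_robots_sketch`, and `is_allowed_sketch` are hypothetical names invented for this example. It is also deliberately simplified, assuming only the `User-agent: *` group matters, one `User-agent` line per group, and plain prefix matching with no wildcards or `Allow` rules.

```rust
/// Hypothetical, simplified view of a robots.txt file: only the
/// `Disallow` prefixes that apply to every crawler (`User-agent: *`).
#[derive(Debug, Default, PartialEq)]
pub struct RobotsSketch {
    pub disallow: Vec<String>,
}

/// Parse the `User-agent: *` group of a robots.txt body.
/// Assumption: each group starts with a single `User-agent` line.
pub fn parse_robots_sketch(body: &str) -> RobotsSketch {
    let mut rules = RobotsSketch::default();
    let mut in_wildcard_group = false;

    for raw in body.lines() {
        // Drop trailing comments and surrounding whitespace.
        let line = raw.split('#').next().unwrap_or("").trim();

        // Skip blank lines and anything that is not `key: value`.
        let Some((key, value)) = line.split_once(':') else {
            continue;
        };
        let value = value.trim();

        match key.trim().to_ascii_lowercase().as_str() {
            // A new group begins; track whether it targets all crawlers.
            "user-agent" => in_wildcard_group = value == "*",
            // An empty `Disallow:` means "nothing is disallowed", so skip it.
            "disallow" if in_wildcard_group && !value.is_empty() => {
                rules.disallow.push(value.to_string());
            }
            _ => {}
        }
    }
    rules
}

/// Return true when no recorded `Disallow` prefix matches the path.
pub fn is_allowed_sketch(rules: &RobotsSketch, path: &str) -> bool {
    !rules
        .disallow
        .iter()
        .any(|prefix| path.starts_with(prefix.as_str()))
}
```

A crawler built along these lines would call `parse_robots_sketch` once per host and consult `is_allowed_sketch` before each request. A production parser would also need `Allow` rules, wildcard patterns, and groups that name a specific user agent.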