Module cartography

Cartography engine: sitemap parsing, structured data extraction, feature encoding, and map assembly.

Modules

action_encoder: Encode extracted actions into ActionRecord OpCodes.
feature_encoder: Encode extraction results into 128-float feature vectors.
mapper: Orchestrate the entire mapping process.
page_classifier: Classify a page using extraction results and URL patterns.
rate_limiter: Rate limiter for polite crawling.
robots: Parse robots.txt files.
sitemap: Parse sitemap.xml and sitemap index files.
url_classifier: Classify URLs into a PageType by URL pattern.
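The index says only that rate_limiter supports polite crawling; it does not state the policy. One common policy is a per-host minimum interval between requests. The sketch below assumes that policy, and the names HostRateLimiter and reserve are hypothetical, not this crate's API. The current time is passed in explicitly so the logic stays deterministic and easy to test.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical per-host limiter (not the crate's real type):
/// at most one request per host every `min_interval`.
pub struct HostRateLimiter {
    min_interval: Duration,
    // Earliest instant at which the next request to each host may start.
    next_allowed: HashMap<String, Instant>,
}

impl HostRateLimiter {
    pub fn new(min_interval: Duration) -> Self {
        Self { min_interval, next_allowed: HashMap::new() }
    }

    /// Reserve the next free slot for `host` as of `now` and return how
    /// long the caller should wait before sending its request.
    pub fn reserve(&mut self, host: &str, now: Instant) -> Duration {
        // Use the reserved future slot if one exists, otherwise go now.
        let slot = match self.next_allowed.get(host) {
            Some(&t) if t > now => t,
            _ => now,
        };
        // The following request to this host must wait one full interval.
        self.next_allowed.insert(host.to_string(), slot + self.min_interval);
        slot - now
    }
}
```

A crawler would call reserve before each fetch and sleep for the returned duration (for example with std::thread::sleep). Because the slot is reserved up front, tasks that share one limiter (behind a Mutex) queue behind each other instead of all firing at once.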
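To illustrate what parsing robots.txt involves, here is a minimal sketch that keeps only the rules in groups naming User-agent: * and ignores the * and $ path wildcards. Robots, parse, and is_allowed are hypothetical names, not this crate's API. Matching follows RFC 9309: the longest matching path prefix wins, and Allow wins a tie with Disallow.

```rust
/// Hypothetical parsed robots.txt: (is_allow, path prefix) rules from `*` groups.
#[derive(Debug, Default)]
pub struct Robots {
    rules: Vec<(bool, String)>,
}

impl Robots {
    /// Parse robots.txt text, keeping only rules in groups that name `*`.
    pub fn parse(text: &str) -> Self {
        let mut rules = Vec::new();
        let mut in_star_group = false;
        let mut last_was_agent = false;
        for raw in text.lines() {
            // Drop comments and surrounding whitespace.
            let line = raw.split('#').next().unwrap_or("").trim();
            let Some((key, value)) = line.split_once(':') else { continue };
            let key = key.trim().to_ascii_lowercase();
            let value = value.trim();
            match key.as_str() {
                "user-agent" => {
                    // Consecutive User-agent lines share one group; a
                    // User-agent line after rules starts a new group.
                    if !last_was_agent {
                        in_star_group = false;
                    }
                    if value == "*" {
                        in_star_group = true;
                    }
                    last_was_agent = true;
                }
                "allow" | "disallow" => {
                    last_was_agent = false;
                    // An empty Disallow allows everything, so it adds no rule.
                    if in_star_group && !value.is_empty() {
                        rules.push((key == "allow", value.to_string()));
                    }
                }
                // Other fields (e.g. Sitemap) end a run of User-agent lines.
                _ => last_was_agent = false,
            }
        }
        Robots { rules }
    }

    /// Longest matching prefix decides; Allow wins ties. No match means allowed.
    pub fn is_allowed(&self, path: &str) -> bool {
        let mut best: Option<(usize, bool)> = None;
        for (allow, prefix) in &self.rules {
            if path.starts_with(prefix.as_str()) {
                let len = prefix.len();
                best = match best {
                    Some((l, a)) if l > len || (l == len && a) => Some((l, a)),
                    _ => Some((len, *allow)),
                };
            }
        }
        best.map_or(true, |(_, allow)| allow)
    }
}
```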
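In the sitemap protocol, a regular sitemap (a urlset) lists page URLs in loc elements inside url entries, while a sitemap index (a sitemapindex) lists child sitemaps in loc elements inside sitemap entries. The string-scanning sketch below shows that distinction. extract_locs and is_sitemap_index are hypothetical helpers; they handle only unprefixed tags and the &amp; entity, so a production parser would use a real XML library.

```rust
/// Collect the text of every <loc>...</loc> element, in document order.
pub fn extract_locs(xml: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut rest = xml;
    while let Some(start) = rest.find("<loc>") {
        let after = &rest[start + "<loc>".len()..];
        // Stop on an unterminated element rather than guessing.
        let Some(end) = after.find("</loc>") else { break };
        // Trim whitespace and undo the escaping sitemaps require for '&'.
        out.push(after[..end].trim().replace("&amp;", "&"));
        rest = &after[end + "</loc>".len()..];
    }
    out
}

/// True if the document is a sitemap index rather than a urlset.
pub fn is_sitemap_index(xml: &str) -> bool {
    xml.contains("<sitemapindex")
}
```

A mapper would follow each loc of an index to fetch child sitemaps and treat each loc of a urlset as a page URL.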
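The index does not list the PageType variants or the patterns that url_classifier uses. The sketch below only shows what pattern-based URL classification looks like. The variants Home, Product, Article, Search, and Unknown, the function classify_url, and the path prefixes are all hypothetical.

```rust
/// Hypothetical page categories; the crate's real PageType may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PageType {
    Home,
    Product,
    Article,
    Search,
    Unknown,
}

/// Classify a URL by its path using illustrative prefix rules.
pub fn classify_url(url: &str) -> PageType {
    // Strip scheme and host so only the path (plus any query) remains.
    let path = match url.find("://") {
        Some(i) => {
            let rest = &url[i + 3..];
            rest.find('/').map_or("/", |j| &rest[j..])
        }
        None => url,
    };
    let lower = path.to_ascii_lowercase();
    if lower.is_empty() || lower == "/" {
        PageType::Home
    } else if lower.starts_with("/product") || lower.starts_with("/p/") || lower.contains("/dp/") {
        PageType::Product
    } else if lower.starts_with("/blog/") || lower.starts_with("/news/") || lower.starts_with("/article") {
        PageType::Article
    } else if lower.starts_with("/search") || lower.contains("?q=") {
        PageType::Search
    } else {
        PageType::Unknown
    }
}
```

URL rules are cheap and need no page fetch, which is presumably why page_classifier combines them with extraction results rather than relying on either signal alone.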