website_crawler 0.2.3

Crawl all URLs on a website, async & sync.

Crawls websites to gather every reachable page quickly, communicating the results over gRPC.

Getting Started

Make sure Rust is installed, or use Docker. This project requires another gRPC server running on port 50051 that follows the proto spec (see the gRPC node example); a minimal server sketch follows the steps below. The crawler is optimized for low latency and high throughput, handling over 10,000 pages within seconds.

  1. cargo run or docker compose up
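If you need a quick way to stand up the receiving gRPC server in Rust, the sketch below uses tonic. The service and message names (ScanService, ScanRequest, ScanReply) and the package name "website" are assumptions for illustration only; the real names come from website.proto, so adjust them to match the proto spec. It also assumes the proto has been compiled with tonic-build (see the build script sketch in the gRPC section below) and that tonic, prost, and tokio are listed as dependencies.

```rust
// Hypothetical sketch of a gRPC server listening on port 50051 with tonic.
// Service/message names are placeholders -- take the real ones from website.proto.
use tonic::{transport::Server, Request, Response, Status};

// Module generated at build time from website.proto, assuming the proto
// package is named `website` (see the build.rs sketch below).
pub mod website {
    tonic::include_proto!("website");
}

use website::scan_service_server::{ScanService, ScanServiceServer};
use website::{ScanReply, ScanRequest};

#[derive(Default)]
struct Receiver;

#[tonic::async_trait]
impl ScanService for Receiver {
    // Receive a crawled page from the crawler and acknowledge it.
    async fn scan(
        &self,
        request: Request<ScanRequest>,
    ) -> Result<Response<ScanReply>, Status> {
        println!("received page: {:?}", request.into_inner());
        Ok(Response::new(ScanReply::default()))
    }
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let addr = "0.0.0.0:50051".parse()?;
    Server::builder()
        .add_service(ScanServiceServer::new(Receiver::default()))
        .serve(addr)
        .await?;
    Ok(())
}
```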

Docker Image

You can use the program as a Docker image:

a11ywatch/crawler

Crate

You can use the crate to set up a gRPC server that runs on the machine.

gRPC

To use the crawler at the moment, you need to add the gRPC client generated from the proto spec in website.proto. Streaming support is in the works to remove the need for the extra client.
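A minimal build script sketch for generating the stubs with tonic-build is shown below. It assumes website.proto has been copied to proto/website.proto in your project and that tonic-build is declared under [build-dependencies]; both are assumptions, not part of this repository's layout.

```rust
// build.rs -- sketch, assuming website.proto lives at proto/website.proto.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Generates the Rust client and server stubs that
    // `tonic::include_proto!("website")` pulls in at compile time.
    tonic_build::compile_protos("proto/website.proto")?;
    Ok(())
}
```

The generated module can then be included with tonic::include_proto!("website"), as in the server sketch in Getting Started above.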

LICENSE

Check the license file in the root of the project.