maman 0.5.1

Rust Web Crawler

Maman

Maman is a Rust web crawler that saves the pages it fetches to Redis.

Pages are sent to the list <MAMAN_ENV>:queue:maman using the Sidekiq job format:

{
  "class": "Maman",
  "jid": "b4a577edbccf1d805744efa9",
  "retry": true,
  "created_at": 1461789979,
  "enqueued_at": 1461789979,
  "args": {
    "document": "<html><body><a href='#' /><a href='/new' /></html>",
    "urls": ["http://example.net/new"],
    "extra": [],
    "headers": {"content-type": "text/html"},
    "url": "http://example.net/"
  }
}
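
As a rough illustration of the naming scheme and payload shape above, the following standard-library-only sketch builds the list name from an environment value and models the "args" object as a plain struct. The names PageArgs and queue_key are hypothetical and are not taken from Maman's source; the element type of "extra" is an assumption, since the example above only shows an empty array.

```rust
use std::collections::BTreeMap;

/// Hypothetical mirror of the "args" object in the job shown above.
/// Field names follow the JSON keys; the types are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct PageArgs {
    url: String,
    document: String,
    urls: Vec<String>,
    extra: Vec<String>, // assumption: only an empty array appears in the example
    headers: BTreeMap<String, String>,
}

/// Builds the Redis list name "<MAMAN_ENV>:queue:maman" for a given environment value.
fn queue_key(env: &str) -> String {
    format!("{}:queue:maman", env)
}

fn main() {
    let mut headers = BTreeMap::new();
    headers.insert("content-type".to_string(), "text/html".to_string());

    let page = PageArgs {
        url: "http://example.net/".to_string(),
        document: "<html><body><a href='#' /><a href='/new' /></html>".to_string(),
        urls: vec!["http://example.net/new".to_string()],
        extra: Vec::new(),
        headers,
    };

    // "production" is only an example value for MAMAN_ENV.
    println!("{} <- {} ({} links)", queue_key("production"), page.url, page.urls.len());
}
```

A consumer on the Sidekiq side would read jobs from the same list name, so both sides need to agree on the MAMAN_ENV value.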

Dependencies

  • Redis

Installation

cargo install maman

Usage

REDIS_URL="redis://127.0.0.1/" maman URL [LIMIT]

LIMIT must be an integer. It defaults to 0, which means no limit.
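
The LIMIT rule can be sketched as follows. This is not Maman's actual argument handling: parse_limit and limit_reached are hypothetical helpers, and treating LIMIT as a non-negative 64-bit count of crawled pages is an assumption.

```rust
use std::num::ParseIntError;

/// Interprets the optional LIMIT argument.
/// A missing argument yields 0, which stands for "no limit".
/// Assumption: LIMIT is a non-negative integer, so it is parsed as u64.
fn parse_limit(arg: Option<&str>) -> Result<u64, ParseIntError> {
    match arg {
        Some(raw) => raw.trim().parse::<u64>(),
        None => Ok(0),
    }
}

/// Returns true once the crawl should stop; a limit of 0 never stops it.
fn limit_reached(limit: u64, crawled: u64) -> bool {
    limit != 0 && crawled >= limit
}

fn main() {
    // With "maman URL [LIMIT]", LIMIT is the second argument after the program name.
    let arg = std::env::args().nth(2);
    match parse_limit(arg.as_deref()) {
        Ok(0) => println!("no limit"),
        Ok(limit) => println!("limit: {}", limit),
        Err(err) => eprintln!("LIMIT must be an integer: {}", err),
    }
}
```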

LICENSE

The MIT License

Copyright (c) 2016 Laurent Arnoud <laurent@spkdev.net>

