docs.rs failed to build mocra-0.2.7
Please check the build logs for more information.
See Builds for ideas on how to fix a failed build, or Metadata for how to configure docs.rs builds.
If you believe this is docs.rs' fault, open an issue.
mocra
A distributed, event-driven crawling and data collection framework for Rust.
mocra is a Rust framework for building scalable data collection pipelines. It models crawling as queue-driven processing stages orchestrated by a DAG execution engine, with automatic scaling from single-process to distributed clusters.
Features
- DAG-based module system — define linear chains or custom fan-out/fan-in graphs
- Queue-driven pipeline — task → request → download → parse → store, fully decoupled
- Auto-scaling runtime — single-node (in-memory) or distributed (Redis/Kafka) with zero code changes
- Bounded concurrency — semaphore-controlled worker pools with pause/resume/shutdown
- Middleware pipeline — download, data transformation, and storage middleware with weight-based ordering
- Built-in control plane — HTTP API for health, metrics, pause/resume, task injection, and DLQ inspection
- Prometheus metrics — unified `mocra_*` metric families for throughput, latency, errors, and backlog
- Cron scheduling — periodic task execution with cron expressions
- Error recovery — policy-driven retry, fallback gates, circuit breakers, and dead-letter queues
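As a rough illustration of the last bullet, policy-driven retry feeding a dead-letter queue can be sketched in plain Rust. This is a hypothetical sketch of the general pattern; none of the names (`RetryPolicy`, `run_with_retry`) come from mocra's actual API:

```rust
// Hypothetical sketch of policy-driven retry with a dead-letter queue (DLQ).
// Names are illustrative, not mocra's API.

struct RetryPolicy {
    max_attempts: u32,
}

/// Run `task` up to `policy.max_attempts` times; on exhaustion, push the
/// task id onto the dead-letter queue instead of failing the pipeline.
fn run_with_retry<F>(
    policy: &RetryPolicy,
    task_id: u64,
    mut task: F,
    dlq: &mut Vec<u64>,
) -> Option<String>
where
    F: FnMut() -> Result<String, String>,
{
    for _attempt in 0..policy.max_attempts {
        if let Ok(out) = task() {
            return Some(out);
        }
        // A real implementation would sleep with backoff here,
        // and a circuit breaker would trip after repeated failures.
    }
    dlq.push(task_id); // attempts exhausted: dead-letter the task
    None
}

fn main() {
    let policy = RetryPolicy { max_attempts: 3 };
    let mut dlq = Vec::new();

    // A task that fails twice, then succeeds on the third attempt.
    let mut calls = 0;
    let ok = run_with_retry(&policy, 1, || {
        calls += 1;
        if calls < 3 { Err("transient".into()) } else { Ok("page body".into()) }
    }, &mut dlq);
    assert_eq!(ok.as_deref(), Some("page body"));

    // A task that always fails ends up in the DLQ.
    let dead = run_with_retry(&policy, 2, || Err("boom".into()), &mut dlq);
    assert!(dead.is_none());
    assert_eq!(dlq, vec![2]);
}
```

In mocra the equivalent behavior is configured through its error-recovery policies rather than written by hand.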
Quick Start
Add to your Cargo.toml:
```toml
# NOTE: except for tokio, the dependency names below are reconstructed
# guesses (the original key names were lost); only the versions are original.
[dependencies]
mocra = "0.2"
async-trait = "0.1"
tokio = { version = "1", features = ["full"] }
anyhow = "1"
futures = "0.3"
```
Write a module:
```rust
// NOTE: the import paths and module body were partly lost when this README
// was extracted; the mocra:: paths below are reconstructed guesses.
use std::sync::Arc;
use async_trait::async_trait;
use futures::stream;
use mocra::prelude::*; // guessed path
use mocra::{Engine, LoginInfo, State}; // guessed paths

// The module struct, its ModuleTrait impl, and the async main() that starts
// the Engine were lost in extraction; see the Module Development guide.
```
Create config.toml:
```toml
# NOTE: the section and key names below were lost in extraction and are
# reconstructed guesses; see the Configuration reference for the real schema.
name = "my_crawler"

[database]
url = "sqlite://data/crawler.db?mode=rwc"

[cron]
check_interval = 60

[concurrency]
max_workers = 30
queue_workers = 10

[retry]
max_retries = 3

[queue]
capacity = 5000
```
Run:

```bash
cargo run
```
Architecture
```text
┌─────────────────────────────────────────────────────────┐
│                        Engine                           │
│                                                         │
│  TaskEvent ──▶ generate() ──▶ download() ──▶ parser()   │
│      │             │              │             │       │
│  [Task Q]     [Request Q]   [Response Q]   [Parser Q]   │
│                                             │           │
│                                ┌────────────┬──────┘    │
│                                ▼            ▼           │
│                          [Data Store]  [Next Node]      │
│                          [Error Q → DLQ]                │
└─────────────────────────────────────────────────────────┘
```
Each stage is decoupled by a message queue. Queues are local Tokio channels in single-node mode, or Redis Streams / Kafka in distributed mode — same code, zero changes.
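The single-node case can be pictured with standard-library channels and threads standing in for Tokio tasks. The stage names mirror the diagram above; everything else is a hypothetical sketch of the pattern, not mocra's API:

```rust
// Simplified stand-in for the queue-decoupled pipeline: each stage owns a
// receiver and forwards to the next stage's sender, like the diagram's
// Task Q -> Request Q -> Response Q chain. Uses std threads, not Tokio.
use std::sync::mpsc;
use std::thread;

/// Run a tiny three-stage pipeline for `n` tasks and return the parsed pages.
fn run_pipeline(n: u32) -> Vec<String> {
    let (task_tx, task_rx) = mpsc::channel::<u32>();    // Task Q
    let (req_tx, req_rx) = mpsc::channel::<String>();   // Request Q
    let (resp_tx, resp_rx) = mpsc::channel::<String>(); // Response Q

    // generate(): turn task ids into request URLs
    let gen = thread::spawn(move || {
        for task in task_rx {
            req_tx.send(format!("https://example.com/page/{task}")).unwrap();
        }
    });

    // download(): pretend to fetch each URL
    let dl = thread::spawn(move || {
        for url in req_rx {
            resp_tx.send(format!("<html>{url}</html>")).unwrap();
        }
    });

    for id in 1..=n {
        task_tx.send(id).unwrap();
    }
    drop(task_tx); // close Task Q so downstream stages drain and exit

    // parser(): consume responses on the calling thread
    let pages: Vec<String> = resp_rx.iter().collect();
    gen.join().unwrap();
    dl.join().unwrap();
    pages
}

fn main() {
    let pages = run_pipeline(3);
    assert_eq!(pages.len(), 3);
    assert_eq!(pages[0], "<html>https://example.com/page/1</html>");
}
```

Because each stage only sees its input and output channels, swapping the channel for a Redis Stream or Kafka topic leaves the stage logic untouched, which is the property the framework relies on.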
DAG Execution
Define complex pipelines with fan-out and fan-in:
```text
        ┌── branch_a ──┐
start ─┤               ├── merge
        └── branch_b ──┘
```
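As a language-level illustration of that shape (not mocra's DAG builder, whose API is covered in the DAG Guide), fan-out/fan-in can be expressed with plain threads; the function name here is hypothetical:

```rust
use std::thread;

/// Fan a start value out to two branches, then fan the results back in,
/// mirroring the start -> {branch_a, branch_b} -> merge diagram above.
fn fan_out_fan_in(start: i32) -> Vec<String> {
    // fan-out: both branches run concurrently on the same input
    let a = thread::spawn(move || format!("branch_a saw {start}"));
    let b = thread::spawn(move || format!("branch_b saw {start}"));
    // fan-in: the merge step waits for both branches
    // (what the DAG Guide calls an advance gate)
    vec![a.join().unwrap(), b.join().unwrap()]
}

fn main() {
    let merged = fan_out_fan_in(42);
    assert_eq!(merged, vec!["branch_a saw 42", "branch_b saw 42"]);
}
```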
Single-Node vs Distributed
| | Single-Node | Distributed |
|---|---|---|
| Config | No `cache.redis` | Add `cache.redis` |
| Queues | Tokio mpsc (in-memory) | Redis Streams or Kafka |
| Cache | In-memory | Redis |
| Locks | Local mutex | Redis distributed locks |
| Workers | 1 process | N processes, same binary |
| Code changes | None | None |
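The "zero code changes" row holds because pipeline code talks to a queue abstraction rather than a concrete backend. A minimal sketch of that design, with all names hypothetical (mocra's own trait is not shown here):

```rust
use std::collections::VecDeque;

/// Hypothetical backend-agnostic queue interface; pipeline stages are
/// written against this trait and never name a concrete backend.
trait TaskQueue {
    fn push(&mut self, task: String);
    fn pop(&mut self) -> Option<String>;
}

/// Single-node backend: a plain in-memory queue.
struct InMemoryQueue(VecDeque<String>);

impl TaskQueue for InMemoryQueue {
    fn push(&mut self, task: String) {
        self.0.push_back(task);
    }
    fn pop(&mut self) -> Option<String> {
        self.0.pop_front()
    }
}

// A distributed deployment would supply a RedisStreamsQueue or KafkaQueue
// implementing the same trait; stage code like `drain` is unchanged.

fn drain(q: &mut dyn TaskQueue) -> Vec<String> {
    let mut out = Vec::new();
    while let Some(t) = q.pop() {
        out.push(t);
    }
    out
}

fn main() {
    let mut q = InMemoryQueue(VecDeque::new());
    q.push("crawl:1".into());
    q.push("crawl:2".into());
    assert_eq!(drain(&mut q), vec!["crawl:1", "crawl:2"]);
}
```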
Switch to distributed by adding Redis to your config:
```toml
# section/key names reconstructed from the `cache.redis` path in the table above
[cache]
redis = "redis://localhost:6379"
```
Documentation
| Document | Description |
|---|---|
| Getting Started | Installation and first module |
| Architecture | System design and pipeline internals |
| Module Development | ModuleTrait, ModuleNodeTrait, data passing |
| DAG Guide | DAG definition, fan-out/fan-in, advance gates |
| Middleware | Download, data, and storage middleware |
| Configuration | Full TOML configuration reference |
| API Reference | HTTP control plane endpoints |
| Deployment | Single-node, distributed, monitoring |
Examples
See the simple/ directory for runnable examples:
- `simple/module_node_trait_dag.rs` — Fan-out/fan-in DAG module
Monitoring
```bash
# Start Prometheus + Grafana
# Prometheus: http://localhost:9090
# Grafana: http://localhost:3000
# Metrics: http://localhost:8080/metrics
```
License
Licensed under either of:
- MIT license
- Apache License, Version 2.0
at your option.