scatter-proxy
Async request scheduler for unreliable SOCKS5 proxies — multi-path race for maximum throughput.
Features
- Per-host pool isolation — ScatterProxyRouter assigns each target host its own proxy pool, health tracker, and scheduler with its own independent ScatterProxyConfig. A struggling host cannot starve tasks for healthy hosts, and proxy eviction is scoped per host.
- Multi-path race — fan out each request across K proxies simultaneously; first successful response wins, losers are cancelled.
- Adaptive fan-out — K auto-adjusts based on current success rate and the number of healthy proxies; boosts automatically during cold start to collect data fast.
- Never-fail scheduler — the scheduler retries indefinitely; tasks are never dropped due to attempt limits or internal timeouts. Caller-side deadlines are opt-in via handle.with_timeout() or tokio::time::timeout.
- Per-(proxy, host) rate limiting — configurable minimum intervals with per-host overrides prevent tripping upstream abuse detection.
- Health tracking — sliding-window success/failure counters per (proxy, host) pair with latency stats.
- Exponential-backoff cooldown — consecutively-failing (proxy, host) pairs back off automatically.
- Delayed retry queue — temporarily unrunnable tasks are parked until their next eligible time instead of hot-loop requeueing.
- Proxy eviction — proxies with 0% global success rate after sufficient samples are marked dead.
- socks5h:// by default — DNS resolution happens on the proxy side, preventing local DNS leaks.
- State persistence — JSON snapshots with atomic writes enable hot restarts with no warm-up penalty.
- Automatic source refresh — proxy lists are periodically re-fetched from configured URLs.
- Pluggable body classifier — implement the BodyClassifier trait to decide whether a response is good, blocked, or errored.
- Observability — structured logging via tracing with periodic metrics summaries and real-time PoolMetrics.
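The multi-path race in the list above can be illustrated with a minimal, std-only sketch. It uses threads and a channel instead of the crate's async machinery, and attempt and race are hypothetical names introduced only for this illustration. Unlike the real scheduler, the losing threads here are ignored rather than cancelled.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for one proxy attempt: sleeps for `latency_ms`,
/// then reports success or failure.
fn attempt(proxy: &'static str, latency_ms: u64, ok: bool) -> Result<String, String> {
    thread::sleep(Duration::from_millis(latency_ms));
    if ok {
        Ok(format!("response via {proxy}"))
    } else {
        Err(format!("{proxy} failed"))
    }
}

/// Race all paths; return the first successful response,
/// or None if every path fails.
fn race(paths: Vec<(&'static str, u64, bool)>) -> Option<String> {
    let (tx, rx) = mpsc::channel();
    for (proxy, latency_ms, ok) in paths {
        let tx = tx.clone();
        thread::spawn(move || {
            // A send error only means the race is already over; ignore it.
            let _ = tx.send(attempt(proxy, latency_ms, ok));
        });
    }
    // Drop the original sender so the channel closes once all workers report.
    drop(tx);
    // Results arrive in completion order; skip failures, take the first success.
    rx.iter().find_map(Result::ok)
}
```

Calling race with three paths where only one succeeds returns that path's response; if all fail, it returns None and a never-fail scheduler would requeue the task.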
Quick Start
use ;
use std::collections::HashMap;
use std::time::Duration;
async
Per-Host Pool Isolation
When crawling multiple target sites, use ScatterProxyRouter to give each host its own independent proxy pool. A host where all proxies are in cooldown or getting blocked cannot affect throughput for other hosts.
use ;
async
Each host's metrics log lines are automatically prefixed with [hostname] so multi-host
output is easy to grep:
INFO [szse.cn] throughput=2.1/s | success=18% | pool: 4016 healthy …
INFO [sse.com.cn] throughput=8.4/s | success=71% | pool: 4016 healthy …
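The isolation property described in this section can be pictured with a small std-only model. Router and HostPool below are hypothetical stand-ins and do not reflect the ScatterProxyRouter API; the sketch only shows that evicting a proxy for one host leaves every other host's pool untouched.

```rust
use std::collections::HashMap;

/// Hypothetical per-host pool state; a real pool tracks far more.
#[derive(Debug, Default)]
struct HostPool {
    healthy: Vec<String>,
    evicted: Vec<String>,
}

/// Hypothetical router: one independent pool per target host.
#[derive(Debug, Default)]
struct Router {
    pools: HashMap<String, HostPool>,
}

impl Router {
    /// Give every host its own copy of the proxy list.
    fn new(hosts: &[&str], proxies: &[&str]) -> Self {
        let pools = hosts
            .iter()
            .map(|h| {
                let pool = HostPool {
                    healthy: proxies.iter().map(|p| p.to_string()).collect(),
                    evicted: Vec::new(),
                };
                (h.to_string(), pool)
            })
            .collect();
        Router { pools }
    }

    /// Evict a proxy for one host only; other hosts keep it.
    fn evict(&mut self, host: &str, proxy: &str) {
        if let Some(pool) = self.pools.get_mut(host) {
            if let Some(i) = pool.healthy.iter().position(|p| p.as_str() == proxy) {
                let removed = pool.healthy.remove(i);
                pool.evicted.push(removed);
            }
        }
    }

    fn healthy_count(&self, host: &str) -> usize {
        self.pools.get(host).map_or(0, |p| p.healthy.len())
    }
}
```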
Custom Classifier
Implement BodyClassifier to control how responses are categorised:
use ;
;
// Then pass it when constructing the pool:
// let pool = ScatterProxy::new(config, MyClassifier).await?;
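As a rough illustration of what a classifier decides, here is a self-contained sketch. The Classify trait and its classify(status, body) signature are assumptions made for this example; the real BodyClassifier trait may look different. Only the three verdict names are taken from the architecture diagram.

```rust
/// Verdict names mirror the three outcomes in the architecture diagram.
#[derive(Debug, PartialEq)]
enum Verdict {
    Success,
    ProxyBlocked,
    TargetError,
}

/// Hypothetical stand-in for the crate's trait; the real signature may differ.
trait Classify {
    fn classify(&self, status: u16, body: &[u8]) -> Verdict;
}

/// Example policy: treat 403/429 and captcha pages as proxy blocks,
/// other 2xx as success, everything else as a target-side error.
struct CaptchaAware;

impl Classify for CaptchaAware {
    fn classify(&self, status: u16, body: &[u8]) -> Verdict {
        let text = String::from_utf8_lossy(body).to_lowercase();
        if status == 403 || status == 429 || text.contains("captcha") {
            Verdict::ProxyBlocked
        } else if (200..300).contains(&status) {
            Verdict::Success
        } else {
            Verdict::TargetError
        }
    }
}
```

Checking the body as well as the status matters because many block pages are served with a 200 status.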
Configuration
All fields on ScatterProxyConfig with their types and defaults:
| Field | Type | Default | Description |
|---|---|---|---|
| sources | Vec<String> | [] (uses built-in free proxy lists) | URLs of proxy source lists (line-delimited ip:port). When empty, defaults to DEFAULT_PROXY_SOURCES, a curated set of free SOCKS5 lists. |
| source_refresh_interval | Duration | 600s | How often to re-fetch proxy sources |
| rate_limit | RateLimitConfig | (see below) | Per-(proxy, host) rate-limiting settings |
| proxy_timeout | Duration | 8s | Timeout for a single proxy connection attempt |
| max_concurrent_per_request | usize | 3 | Base number of proxy paths raced per request (K); boosted automatically during cold start |
| max_inflight | usize | 100 | Global in-flight concurrency limit |
| task_pool_capacity | usize | 1000 | Maximum number of pending tasks in the pool |
| health_window | usize | 30 | Sliding window size for health tracking |
| cooldown_base | Duration | 30s | Base cooldown after consecutive failures |
| cooldown_max | Duration | 300s | Maximum cooldown duration |
| cooldown_consecutive_fails | usize | 3 | Consecutive failures before entering cooldown |
| eviction_min_samples | usize | 30 | Minimum samples before a proxy can be evicted |
| state_file | Option<PathBuf> | None | File path for JSON state persistence |
| state_save_interval | Duration | 300s | How often to persist state to disk |
| metrics_log_interval | Duration | 30s | How often to log the metrics summary line |
| prefer_remote_dns | bool | true | Use socks5h:// for remote DNS resolution |
| name | Option<String> | None | Label prepended to metrics log lines as [name]; set automatically by ScatterProxyRouter |
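The three cooldown settings work together as an exponential backoff. The exact growth curve is not documented here, so the sketch below assumes the delay doubles for each failure past the threshold and is capped at the maximum. The cooldown function is hypothetical and only illustrates how the defaults (30s base, 300s max, 3 consecutive failures) would play out under that assumption.

```rust
use std::time::Duration;

/// Cooldown for a (proxy, host) pair after `consecutive_fails` failures,
/// or None while still below the threshold.
/// Assumption: the delay doubles per extra failure and is capped at `max`.
fn cooldown(
    consecutive_fails: u32,
    threshold: u32,
    base: Duration,
    max: Duration,
) -> Option<Duration> {
    if consecutive_fails < threshold {
        return None;
    }
    // Bound the exponent so the shift cannot overflow.
    let extra = (consecutive_fails - threshold).min(16);
    let delay = base.saturating_mul(1u32 << extra);
    Some(delay.min(max))
}
```

Under this assumption the defaults give 30s, 60s, 120s, 240s, and then 300s for every further failure.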
RateLimitConfig fields:
| Field | Type | Default | Description |
|---|---|---|---|
| default_interval | Duration | 500ms | Minimum interval between requests per (proxy, host) pair |
| host_overrides | HashMap<String, Duration> | {} | Per-host interval overrides |
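To make the two rate-limit fields concrete, here is a minimal sketch of a per-(proxy, host) limiter. The RateLimiter type and its try_acquire method are hypothetical and not part of the crate; only the field names default_interval and host_overrides come from the table. Time is passed in explicitly so the behaviour is easy to test.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical limiter keyed by (proxy, host).
struct RateLimiter {
    default_interval: Duration,
    host_overrides: HashMap<String, Duration>,
    last_used: HashMap<(String, String), Instant>,
}

impl RateLimiter {
    fn new(default_interval: Duration, host_overrides: HashMap<String, Duration>) -> Self {
        Self { default_interval, host_overrides, last_used: HashMap::new() }
    }

    /// A host override wins over the default interval.
    fn interval_for(&self, host: &str) -> Duration {
        self.host_overrides.get(host).copied().unwrap_or(self.default_interval)
    }

    /// Ok(()) records the use if the pair is eligible at `now`;
    /// Err(wait) reports how long the caller still has to wait.
    fn try_acquire(&mut self, proxy: &str, host: &str, now: Instant) -> Result<(), Duration> {
        let interval = self.interval_for(host);
        let key = (proxy.to_string(), host.to_string());
        if let Some(&last) = self.last_used.get(&key) {
            let elapsed = now.saturating_duration_since(last);
            if elapsed < interval {
                return Err(interval - elapsed);
            }
        }
        self.last_used.insert(key, now);
        Ok(())
    }
}
```

Because the key is the pair rather than the proxy alone, a proxy that is waiting out its interval for one host can still be used immediately for another.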
Architecture
               ┌──────────────┐
               │ Client Code  │
               └──────┬───────┘
                      │  submit(request)
                      ▼
               ┌──────────────┐
               │ ScatterProxy │
               │  (TaskPool)  │
               └──────┬───────┘
                      │  pick K healthy proxies
                      ▼
            ┌─────────────────────┐
            │      Scheduler      │
            │  ┌───────────────┐  │
            │  │ Rate Limiters │  │
            │  │ Health Scores │  │
            │  │ Ready Queue   │  │
            │  │ Delayed Heap  │  │
            │  │ Adaptive K    │  │
            │  └───────────────┘  │
            └─────────┬───────────┘
                      │  fan-out K paths
         ┌────────────┼────────────┐
         ▼            ▼            ▼
    ┌──────────┐ ┌──────────┐ ┌──────────┐
    │ Proxy #1 │ │ Proxy #2 │ │ Proxy #3 │
    │ socks5h  │ │ socks5h  │ │ socks5h  │
    └────┬─────┘ └────┬─────┘ └────┬─────┘
         │            │            │
         └────────────┼────────────┘
                      │  first good response wins
                      ▼
             ┌───────────────────┐
             │  BodyClassifier   │
             │  → Success        │
             │  → ProxyBlocked   │
             │  → TargetError    │
             └────────┬──────────┘
                      │
                      ▼
             ┌───────────────────┐
             │  ScatterResponse  │
             │ {status, headers, │
             │  body}            │
             └───────────────────┘
Background tasks:
• Scheduler workers — drain ready queue, promote delayed tasks, dispatch fan-out attempts
• State persistence — periodic JSON snapshots (atomic writes)
• Metrics logger — periodic tracing summary lines
• Source refresh — re-fetches proxy lists on interval
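The atomic writes mentioned for state persistence are commonly done by writing a temporary file and renaming it over the target. The sketch below shows that general pattern with std only; write_atomic is a hypothetical helper, and whether the crate uses exactly this approach is an assumption. The rename is atomic on POSIX filesystems when both paths are on the same filesystem.

```rust
use std::fs;
use std::io::Write;
use std::path::Path;

/// Write `contents` to `path` so that readers see either the old file or
/// the new one, never a half-written snapshot.
fn write_atomic(path: &Path, contents: &str) -> std::io::Result<()> {
    // Sibling temp file, so the rename stays on the same filesystem.
    let tmp = path.with_extension("tmp");
    {
        let mut file = fs::File::create(&tmp)?;
        file.write_all(contents.as_bytes())?;
        // Flush data to disk before the rename makes it visible.
        file.sync_all()?;
    }
    fs::rename(&tmp, path)
}
```

A crash before the rename leaves the previous snapshot intact, which is what allows a hot restart to trust the state file.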
Integration Testing
Unit tests run without any external dependencies:
Integration tests exercise the full pipeline with real free SOCKS5 proxy sources from the internet. They are marked #[ignore] by default. To run them:
SCATTER_INTEGRATION=1
Note: Integration tests require network access and depend on third-party proxy sources being available. They may be flaky in CI due to proxy churn.
Migration Notes
0.8.0
- ScatterProxyRouter::new() signature changed. The second argument (shared ScatterProxyConfig) is gone; instead pass (host, ScatterProxyConfig) tuples so each host is configured independently. See the example above.
- direct://localhost support has been fully removed from ProxyManager. All connections go through SOCKS5 proxies.
0.5.0
- The circuit breaker subsystem has been removed. Temporarily unrunnable tasks are now handled by the delayed retry queue.
- PoolMetrics no longer exposes circuit-breaker state. New counters include delayed tasks, requeues, zero-available events, dispatch count, and skip reasons.
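For readers migrating from the circuit breaker, the delayed retry queue can be pictured as a min-heap of parked tasks next to a ready queue. The Queues type below is a hypothetical model, not the crate's internals, and it uses a plain millisecond counter instead of real timestamps.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, VecDeque};

/// Hypothetical scheduler queues.
#[derive(Default)]
struct Queues {
    /// Task ids that can run now.
    ready: VecDeque<u64>,
    /// (eligible_at_ms, task_id); Reverse makes the earliest entry pop first.
    delayed: BinaryHeap<Reverse<(u64, u64)>>,
}

impl Queues {
    /// Park a task until `eligible_at_ms` instead of requeueing it in a hot loop.
    fn park(&mut self, task_id: u64, eligible_at_ms: u64) {
        self.delayed.push(Reverse((eligible_at_ms, task_id)));
    }

    /// Move every task whose time has come onto the ready queue;
    /// returns how many were promoted.
    fn promote(&mut self, now_ms: u64) -> usize {
        let mut moved = 0;
        while let Some(Reverse((at, id))) = self.delayed.peek().copied() {
            if at > now_ms {
                break;
            }
            self.delayed.pop();
            self.ready.push_back(id);
            moved += 1;
        }
        moved
    }
}
```

Since the heap keeps the earliest deadline on top, a worker only needs to look at one entry to know whether anything is due.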
License
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT License (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.