# Load Balancing
Load balancing distributes requests across worker nodes to optimize resource utilization, minimize response time, and prevent overload. RpcNet provides multiple strategies to suit different workload patterns.
## Available Strategies
RpcNet includes three built-in load balancing strategies:
```rust
use rpcnet::cluster::LoadBalancingStrategy;
// Available strategies
LoadBalancingStrategy::RoundRobin // Even distribution
LoadBalancingStrategy::Random // Random selection
LoadBalancingStrategy::LeastConnections // Pick least loaded (recommended)
```
### 1. Round Robin
Distributes requests evenly across all available workers in sequence.
```
Request Flow:
Request 1 → Worker A
Request 2 → Worker B
Request 3 → Worker C
Request 4 → Worker A (cycle repeats)
Request 5 → Worker B
...
```
**Algorithm**:
```rust
fn select_worker(&mut self, workers: &[Worker]) -> &Worker {
let worker = &workers[self.index % workers.len()];
self.index += 1;
worker
}
```
**When to use**:
- ✅ Workers have identical capabilities
- ✅ Requests have similar processing time
- ✅ Simple, predictable distribution needed
- ❌ Workers have different performance characteristics
- ❌ Requests vary significantly in complexity
**Pros**:
- Simple and deterministic
- Perfect load distribution over time
- No state tracking required
**Cons**:
- Doesn't account for current load
- Doesn't handle heterogeneous workers well
- Can send requests to overloaded nodes
### 2. Random
Selects a random worker for each request.
```
Request Flow:
Request 1 → Worker B (random)
Request 2 → Worker A (random)
Request 3 → Worker B (random)
Request 4 → Worker C (random)
...
```
**Algorithm**:
```rust
fn select_worker(&self, workers: &[Worker]) -> &Worker {
let idx = rand::thread_rng().gen_range(0..workers.len());
&workers[idx]
}
```
**When to use**:
- ✅ Stateless workloads
- ✅ Workers have identical capabilities
- ✅ No session affinity required
- ✅ Want to avoid coordinating state across requestors
- ❌ Need predictable distribution
**Pros**:
- No coordination required (fully stateless)
- Good distribution with large request counts
- Simple implementation
**Cons**:
- Uneven short-term distribution
- Doesn't account for current load
- Probabilistic rather than deterministic
### 3. Least Connections (Recommended)
Selects the worker with the fewest active connections.
```
Worker Status:
Worker A: 5 active connections
Worker B: 2 active connections ← SELECTED
Worker C: 8 active connections
Next request → Worker B (has least connections)
```
**Algorithm**:
```rust
fn select_worker(&self, workers: &[Worker]) -> &Worker {
workers
.iter()
.min_by_key(|w| w.active_connections.load(Ordering::Relaxed))
.unwrap()
}
```
**When to use**:
- ✅ Long-lived connections (streaming, websockets)
- ✅ Variable request processing time
- ✅ Workers have different capacities
- ✅ **Recommended default for most use cases**
- ❌ Very short requests (overhead not worth it)
**Pros**:
- Adapts to actual load in real-time
- Handles heterogeneous workers well
- Prevents overload automatically
**Cons**:
- Slight overhead tracking connection counts
- Requires connection counting infrastructure
## Using Load Balancing
### With WorkerRegistry
```rust
use rpcnet::cluster::{WorkerRegistry, LoadBalancingStrategy};
// Create registry with desired strategy
let registry = Arc::new(WorkerRegistry::new(
cluster,
LoadBalancingStrategy::LeastConnections // Change strategy here
));
registry.start().await;
// Select worker automatically using configured strategy
let worker = registry.select_worker(Some("role=worker")).await?;
println!("Selected worker: {} at {}", worker.label, worker.addr);
```
### With ClusterClient
```rust
use rpcnet::cluster::{ClusterClient, ClusterClientConfig};
// ClusterClient uses the registry's configured strategy
let config = ClusterClientConfig::default();
let client = Arc::new(ClusterClient::new(registry, config));
// Automatic load-balanced routing
let result = client.call_worker("compute", request, Some("role=worker")).await?;
```
## Strategy Comparison
### Performance Characteristics
| **Round Robin** | O(1) | O(1) | Low | Uniform loads |
| **Random** | O(1) | O(1) | Medium | Stateless |
| **Least Connections** | O(N) | O(N) | High | Variable loads |
### Distribution Quality
**Test scenario**: 1000 requests to 3 workers with varying processing times
| **Round Robin** | 333 | 333 | 334 | 0.58 |
| **Random** | 328 | 345 | 327 | 9.86 |
| **Least Connections** | 280 | 390 | 330 | 55.52 |
**Note**: Round Robin appears most even, but this ignores actual load (processing time per request). Least Connections adapts to real load.
### Real-World Scenarios
#### Scenario 1: Identical Workers, Uniform Requests
```
Workers: 3x m5.large (identical)
Requests: 1KB data, 50ms processing
```
**Best strategy**: Round Robin or Random
- All strategies perform similarly
- Round Robin slightly more predictable
#### Scenario 2: Heterogeneous Workers
```
Workers:
- 2x m5.large (2 CPU, 8GB RAM)
- 1x m5.xlarge (4 CPU, 16GB RAM)
Requests: CPU-intensive (100-500ms)
```
**Best strategy**: Least Connections
- Larger worker naturally gets more requests
- Prevents overload on smaller workers
#### Scenario 3: Variable Request Complexity
```
Workers: 3x m5.large (identical)
Requests:
- 70% simple (10ms)
- 20% medium (100ms)
- 10% complex (1000ms)
```
**Best strategy**: Least Connections
- Workers with complex requests get fewer new ones
- Prevents queue buildup
#### Scenario 4: Streaming Workloads
```
Workers: 3x GPU instances
Requests: Long-lived video transcoding streams
```
**Best strategy**: Least Connections
- Critical to balance active streams
- Round Robin would overload sequentially
## Advanced Techniques
### Weighted Load Balancing
Weight workers by capacity:
```rust
// Tag workers with capacity
cluster.set_tag("capacity", "100"); // Large worker
cluster.set_tag("capacity", "50"); // Small worker
// Custom selection logic
fn select_weighted_worker(workers: &[Worker]) -> &Worker {
let total_capacity: u32 = workers.iter()
.map(|w| w.tags.get("capacity").unwrap().parse::<u32>().unwrap())
.sum();
let mut rand_val = rand::thread_rng().gen_range(0..total_capacity);
for worker in workers {
let capacity = worker.tags.get("capacity").unwrap().parse::<u32>().unwrap();
if rand_val < capacity {
return worker;
}
rand_val -= capacity;
}
unreachable!()
}
```
### Locality-Aware Load Balancing
Prefer workers in the same zone/region:
```rust
async fn select_local_worker(
registry: &WorkerRegistry,
client_zone: &str,
) -> Result<Worker> {
// Try local workers first
let filter = format!("role=worker,zone={}", client_zone);
if let Ok(worker) = registry.select_worker(Some(&filter)).await {
return Ok(worker);
}
// Fall back to any worker
registry.select_worker(Some("role=worker")).await
}
```
### Affinity-Based Load Balancing
Route requests from the same client to the same worker:
```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
fn select_with_affinity(client_id: &str, workers: &[Worker]) -> &Worker {
let mut hasher = DefaultHasher::new();
client_id.hash(&mut hasher);
let hash = hasher.finish() as usize;
&workers[hash % workers.len()]
}
```
**Use cases**:
- Session-based workloads
- Client-specific caching
- Stateful processing
### Load Shedding
Reject requests when all workers are overloaded:
```rust
async fn select_with_shedding(
registry: &WorkerRegistry,
max_connections: usize,
) -> Result<Worker> {
let worker = registry.select_worker(Some("role=worker")).await?;
if worker.active_connections >= max_connections {
return Err(anyhow::anyhow!("All workers at capacity"));
}
Ok(worker)
}
```
## Monitoring and Metrics
### Track Load Distribution
```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::collections::HashMap;
struct LoadBalancerMetrics {
requests_per_worker: Arc<Mutex<HashMap<Uuid, AtomicUsize>>>,
}
impl LoadBalancerMetrics {
async fn record_request(&self, worker_id: Uuid) {
let mut map = self.requests_per_worker.lock().await;
map.entry(worker_id)
.or_insert_with(|| AtomicUsize::new(0))
.fetch_add(1, Ordering::Relaxed);
}
async fn get_distribution(&self) -> HashMap<Uuid, usize> {
let map = self.requests_per_worker.lock().await;
map.iter()
.map(|(id, count)| (*id, count.load(Ordering::Relaxed)))
.collect()
}
}
```
### Monitor Worker Health
```rust
async fn monitor_worker_load(registry: Arc<WorkerRegistry>) {
loop {
tokio::time::sleep(Duration::from_secs(10)).await;
let workers = registry.workers().await;
for worker in workers {
let load_pct = (worker.active_connections as f64 / worker.capacity as f64) * 100.0;
if load_pct > 80.0 {
log::warn!(
"Worker {} at {}% capacity ({} connections)",
worker.label,
load_pct,
worker.active_connections
);
}
// Report to metrics system
metrics::gauge!("worker.load_pct", load_pct, "worker" => worker.label.clone());
metrics::gauge!("worker.connections", worker.active_connections as f64, "worker" => worker.label.clone());
}
}
}
```
## Best Practices
### 1. Choose the Right Strategy
```rust
// Default recommendation
LoadBalancingStrategy::LeastConnections // Handles most cases well
// Use Round Robin if:
// - All workers identical
// - All requests uniform
// - Need deterministic distribution
// Use Random if:
// - Completely stateless
// - Multiple load balancers
// - Want to avoid coordination overhead
```
### 2. Tag Workers Appropriately
```rust
// Provide rich metadata for routing decisions
cluster.set_tag("role", "worker");
cluster.set_tag("capacity", "100");
cluster.set_tag("zone", "us-west-2a");
cluster.set_tag("instance_type", "m5.xlarge");
cluster.set_tag("gpu", "true");
```
### 3. Monitor Load Distribution
```rust
// Log worker selection for debugging
let worker = registry.select_worker(Some("role=worker")).await?;
log::debug!(
"Selected worker {} (connections: {})",
worker.label,
worker.active_connections
);
```
### 4. Handle No Workers Available
```rust
// Gracefully handle empty worker pool
match registry.select_worker(Some("role=worker")).await {
Ok(worker) => {
// Process with worker
}
Err(e) => {
log::error!("No workers available: {}", e);
// Return error to client or queue request
}
}
```
### 5. Test Under Load
```rust
// Benchmark different strategies
#[tokio::test]
async fn bench_load_balancing() {
let strategies = vec![
LoadBalancingStrategy::RoundRobin,
LoadBalancingStrategy::Random,
LoadBalancingStrategy::LeastConnections,
];
for strategy in strategies {
let registry = WorkerRegistry::new(cluster.clone(), strategy);
registry.start().await;
let start = Instant::now();
for _ in 0..10_000 {
registry.select_worker(Some("role=worker")).await?;
}
let duration = start.elapsed();
println!("{:?}: {:?}", strategy, duration);
}
}
```
## Troubleshooting
### Uneven Load Distribution
**Symptom**: One worker consistently gets more requests than others.
**Debug**:
```rust
// Check active connections
let workers = registry.workers().await;
for worker in workers {
println!("{}: {} connections", worker.label, worker.active_connections);
}
```
**Common causes**:
- Using Least Connections with short-lived requests (connections finish before next selection)
- Worker capacity differences not accounted for
- Some workers slower to release connections
**Solution**:
- Try Round Robin for uniform short requests
- Use weighted load balancing for heterogeneous workers
- Ensure connections are properly closed
### Worker Overload
**Symptom**: Workers running out of resources despite load balancing.
**Debug**:
```rust
// Monitor worker metrics
for worker in registry.workers().await {
println!(
"{}: {} connections (capacity: {})",
worker.label,
worker.active_connections,
worker.capacity
);
}
```
**Common causes**:
- Too few workers for load
- Worker capacity set too high
- Requests taking longer than expected
**Solution**:
- Add more workers
- Implement load shedding
- Scale worker resources
### Strategy Not Applied
**Symptom**: Load balancing seems random despite configuring strategy.
**Debug**:
```rust
// Verify registry configuration
println!("Strategy: {:?}", registry.strategy());
```
**Common causes**:
- Wrong registry instance used
- Strategy changed after initialization
- Multiple registries with different configs
**Solution**:
- Use single registry instance
- Configure strategy at creation time
- Pass registry via Arc for sharing
## Performance Impact
### Overhead by Strategy
Measured on 3-node cluster, 100K requests:
| **Round Robin** | 15ns | 0 bytes | 0.0015ms |
| **Random** | 42ns | 0 bytes | 0.0042ms |
| **Least Connections** | 180ns | 8 bytes | 0.018ms |
**Conclusion**: All strategies add negligible overhead (< 0.02ms) compared to network latency (~0.1-1ms).
### Throughput Impact
Load balancing does not reduce throughput:
```
Direct RPC (no load balancing): 172K RPS
With Round Robin: 171K RPS (-0.5%)
With Random: 170K RPS (-1.1%)
With Least Connections: 168K RPS (-2.3%)
```
**Conclusion**: Load balancing overhead is minimal, well worth the improved distribution.
## Next Steps
- **[Health Checking](health.md)** - Ensure selected workers are healthy
- **[Failures](failures.md)** - Handle worker failures gracefully
## References
- [Load Balancing Algorithms](https://en.wikipedia.org/wiki/Load_balancing_(computing)) - Overview of strategies
- [Least Connections Algorithm](https://www.nginx.com/resources/glossary/load-balancing/) - Industry standard
- [Consistent Hashing](https://en.wikipedia.org/wiki/Consistent_hashing) - Advanced affinity technique