photon-etcd-cluster
A lightweight Rust library for cluster coordination using etcd. Provides leader election and node registry with minimal dependencies and no platform lock-in (unlike Kubernetes-native solutions).
Why photon-etcd-cluster?
- Platform-agnostic: Works anywhere etcd runs - bare metal, VMs, containers, or cloud
- Minimal dependencies: Only etcd required, no Kubernetes or other orchestrators
- Reactive API: Event-driven with broadcast channels and watch streams - no polling required
- Lock-free reads: O(1) access to cluster state via `watch::Receiver::borrow()`
- Node metrics: Optional system metrics collection (CPU, memory, load)
Use Cases
- Organize distributed workers into logical groups
- Elect a single leader per group for coordination tasks (cache invalidation, job scheduling)
- Dynamic service discovery for load balancers
- Health monitoring with automatic failure detection
- Weighted load balancing based on real-time node metrics (CPU, memory, queue depth)
Quick Start
Add to your `Cargo.toml`:

```toml
[dependencies]
photon-etcd-cluster = "0.1"
tokio = { version = "1", features = ["rt-multi-thread", "macros"] }
```
For system metrics collection (CPU, memory, load average), enable the `system-metrics` feature:

```toml
[dependencies]
photon-etcd-cluster = { version = "0.1", features = ["system-metrics"] }
```
Worker Process (ClusterNode)
A worker creates a `ClusterNode`, runs it until a shutdown signal arrives over a Tokio `broadcast` channel, and reports its `HealthStatus` while it is up.
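A minimal worker sketch, assuming the import path and a `ClusterNode` constructor that takes the parameters from the configuration table below (only names documented in this README are used; exact signatures may differ):

```rust
use std::net::IpAddr;

use photon_etcd_cluster::ClusterNode; // import path assumed from the crate name
use tokio::sync::broadcast;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Shutdown signal for clean termination; send on this channel
    // (e.g. from a SIGTERM handler) to stop the node gracefully.
    let (_shutdown_tx, shutdown_rx) = broadcast::channel::<()>(1);

    // Constructor arguments follow the configuration table below;
    // the exact signature is an assumption.
    let node = ClusterNode::new(
        vec!["http://127.0.0.1:2379".to_string()], // etcd_endpoints
        "node-1".to_string(),                      // node_id
        "192.168.1.10".parse::<IpAddr>()?,         // node_ip
        "workers".to_string(),                     // group_name
        5,                                         // ttl in seconds
    );

    // Registers the node, campaigns for leadership, and keeps the lease
    // alive until the shutdown signal fires.
    node.run(shutdown_rx).await?;
    Ok(())
}
```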
Load Balancer / Service Discovery
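A minimal discovery sketch, assuming a `ServiceDiscovery` constructor that takes the etcd endpoints and group name; `wait_for_nodes()`, `nodes()`, and `leader()` are documented below, but their exact signatures are assumptions:

```rust
use photon_etcd_cluster::ServiceDiscovery; // import path assumed

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Point at the same etcd cluster and group name the workers register under.
    let discovery = ServiceDiscovery::new(
        vec!["http://127.0.0.1:2379".to_string()],
        "workers".to_string(),
    )
    .await?;

    // Block until at least one node is registered (watch-based, no polling).
    discovery.wait_for_nodes(1).await?;

    // O(1), lock-free snapshot of the current node set.
    for node in discovery.nodes().iter() {
        println!("backend {} at {}", node.id, node.ip);
    }

    // Current leader for the group, if one has been elected.
    if let Some(leader) = discovery.leader() {
        println!("leader is {}", leader.id);
    }
    Ok(())
}
```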
Event-Driven Updates (Recommended)
React to cluster changes as they happen using broadcast events:
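A sketch of an event loop over the documented `ClusterEvent` variants (import paths assumed):

```rust
use photon_etcd_cluster::{ClusterEvent, ServiceDiscovery}; // import paths assumed

async fn handle_cluster_events(discovery: &ServiceDiscovery) {
    // subscribe() hands out a broadcast receiver of ClusterEvent.
    let mut events = discovery.subscribe();

    while let Ok(event) = events.recv().await {
        match event {
            ClusterEvent::NodeJoined(node) => println!("joined: {}", node.id),
            ClusterEvent::NodeLeft(node) => println!("left: {}", node.id),
            ClusterEvent::NodeUpdated { old: _, new } => println!("updated: {}", new.id),
            ClusterEvent::LeaderElected(node) => println!("new leader: {}", node.id),
            ClusterEvent::LeaderLost => println!("leader lost"),
            _ => {} // Ready / Disconnected / Reconnected
        }
    }
}
```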
Watch-Based Metrics (Efficient State Observation)
Use `ServiceDiscovery`'s watch channels for metrics or state observation - more efficient than polling:
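A sketch assuming `watch_nodes()` returns a Tokio `watch::Receiver` over the current node list:

```rust
use photon_etcd_cluster::ServiceDiscovery; // import path assumed

async fn observe_nodes(discovery: &ServiceDiscovery) {
    // watch_nodes() is assumed to return a tokio::sync::watch::Receiver.
    let mut nodes_rx = discovery.watch_nodes();

    loop {
        {
            // borrow() reads the latest value without locking; keep the borrow
            // short and never hold it across an await point.
            let nodes = nodes_rx.borrow();
            println!("{} nodes currently registered", nodes.len());
        }
        // Wakes only when the node set actually changes - no polling.
        if nodes_rx.changed().await.is_err() {
            break; // sender dropped: discovery has shut down
        }
    }
}
```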
Node Metrics for Weighted Load Balancing
Nodes can report system metrics (CPU, memory, load average) that load balancers can use for weighted traffic distribution. Enable the `system-metrics` feature and use `ClusterNodeBuilder`:
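A builder sketch, assuming setter names that mirror the configuration table below; the required-field setters and the `SystemMetricsCollector` construction are assumptions:

```rust
use std::net::IpAddr;

use photon_etcd_cluster::{ClusterNodeBuilder, SystemMetricsCollector}; // paths assumed
use tokio::sync::broadcast;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let (_shutdown_tx, shutdown_rx) = broadcast::channel::<()>(1);

    let node = ClusterNodeBuilder::new()
        .etcd_endpoints(vec!["http://127.0.0.1:2379".to_string()])
        .node_id("node-1")
        .node_ip("192.168.1.10".parse::<IpAddr>()?)
        .group_name("workers")
        // Publish CPU / memory / load average every 10 seconds.
        .metrics_collector(SystemMetricsCollector::default())
        .metrics_update_interval(10)
        .build()?;

    node.run(shutdown_rx).await?;
    Ok(())
}
```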
Custom Metrics Collector
Implement the `MetricsCollector` trait for application-specific metrics (for example, values built with `serde_json::json!`), then attach the collector to the builder via `.metrics_collector()` and `.metrics_update_interval()`.
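A sketch of a custom collector; the trait's exact method signature is not documented here, so a single synchronous method returning a `serde_json::Value` is assumed:

```rust
use photon_etcd_cluster::{ClusterNodeBuilder, MetricsCollector}; // paths assumed
use serde_json::{json, Value};

struct QueueMetrics;

// Assumed trait shape: one method producing the node's metadata document.
impl MetricsCollector for QueueMetrics {
    fn collect(&self) -> Value {
        json!({
            "queue_depth": 42,
            "requests_per_second": 118.5,
        })
    }
}

fn configure() -> Result<(), Box<dyn std::error::Error>> {
    let _node = ClusterNodeBuilder::new()
        // ... etcd_endpoints / node_id / node_ip / group_name as in Quick Start ...
        .metrics_collector(QueueMetrics)
        .metrics_update_interval(5) // seconds between metric publications
        .build()?;
    Ok(())
}
```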
Reading Node Metrics (Load Balancer Side)
A load balancer can subscribe to events and react as metrics change, or query the current snapshot directly via `nodes()`.
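A sketch of both styles, assuming node metadata is exposed as a `serde_json::Value` keyed by the standard names listed below:

```rust
use photon_etcd_cluster::{ClusterEvent, ServiceDiscovery}; // paths assumed

// React to metric changes as they are published.
async fn watch_metric_changes(discovery: &ServiceDiscovery) {
    let mut events = discovery.subscribe();
    while let Ok(event) = events.recv().await {
        if let ClusterEvent::NodeUpdated { old: _, new } = event {
            println!("{} now reports {:?}", new.id, new.metadata);
        }
    }
}

// Or query the current state directly (lock-free read).
fn snapshot_metrics(discovery: &ServiceDiscovery) {
    for node in discovery.nodes().iter() {
        if let Some(cpu) = node.metadata.get("cpu_usage_percent") {
            println!("{}: cpu = {}", node.id, cpu);
        }
    }
}
```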
Standard Metadata Keys
The `metadata_keys` module provides standard key names:
| Key | Type | Description |
|---|---|---|
| `cpu_usage_percent` | `f64` | CPU usage (0-100%) |
| `memory_usage_percent` | `f64` | Memory usage (0-100%) |
| `memory_available_bytes` | `u64` | Available memory in bytes |
| `load_avg_1m` | `f64` | 1-minute load average |
| `active_connections` | `u32` | Active connection count |
| `requests_per_second` | `f64` | Request throughput |
| `queue_depth` | `u32` | Pending work queue size |
Real-World Example: HTTP Load Balancer with Dynamic Backends
This example shows how to build a load balancer that automatically discovers backend servers using `ServiceDiscovery`; it is based on actual production usage with Pingora. A small wrapper type manages dynamic backend discovery and maps registered nodes to `SocketAddr` backends.
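A sketch of such a wrapper (proxy integration omitted; constructor and accessor signatures are assumptions consistent with the rest of this README):

```rust
use std::net::SocketAddr;

use photon_etcd_cluster::ServiceDiscovery; // path assumed

/// Manages dynamic backend discovery for a load balancer.
pub struct BackendDiscovery {
    workers: ServiceDiscovery,
}

impl BackendDiscovery {
    /// One ServiceDiscovery instance per backend group ("workers", "cache", ...).
    pub async fn new(etcd_endpoints: Vec<String>) -> Result<Self, Box<dyn std::error::Error>> {
        let workers = ServiceDiscovery::new(etcd_endpoints, "workers".to_string()).await?;
        // Don't start serving until at least one backend has registered.
        workers.wait_for_nodes(1).await?;
        Ok(Self { workers })
    }

    /// Safe to call on every request: lock-free snapshot of worker addresses.
    pub fn worker_addrs(&self, port: u16) -> Vec<SocketAddr> {
        self.workers
            .nodes()
            .iter()
            .map(|node| SocketAddr::new(node.ip, port))
            .collect()
    }

    /// Route write operations to the elected leader, if any.
    pub fn leader_addr(&self, port: u16) -> Option<SocketAddr> {
        self.workers.leader().map(|node| SocketAddr::new(node.ip, port))
    }
}
```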
Key patterns demonstrated:
- Multiple service groups: Separate discovery instances for different backend types (workers, cache, etc.)
- Lock-free reads: `discovery.nodes()` is safe to call on every request
- Leader routing: Route write operations to the elected leader
- Built-in wait helpers: `wait_for_nodes()` uses watch channels internally - no polling
Architecture
```
┌──────────────────────────────────────────────────────┐
│                     etcd Cluster                     │
│   ┌──────────────────┐        ┌──────────────────┐   │
│   │ registry/{group} │        │ election/{group} │   │
│   │   /node-1        │        │   (leader key)   │   │
│   │   /node-2        │        │                  │   │
│   │   /node-N        │        │                  │   │
│   └──────────────────┘        └──────────────────┘   │
└──────────────────────────────────────────────────────┘
           ▲                            ▲
           │ watch                      │ campaign/proclaim
           │                            │
┌──────────┴──────────┐         ┌───────┴────────┐
│  ServiceDiscovery   │         │  ClusterNode   │
│  - subscribe()      │         │  - run()       │
│  - watch_nodes()    │         │  - is_leader() │
│  - nodes()          │         │  - health      │
│  - leader()         │         │                │
└─────────────────────┘         └────────────────┘
```
Components
| Component | Purpose | Used By |
|---|---|---|
| ClusterNode | Node registration, leader election, health management | Worker processes |
| ClusterNodeBuilder | Fluent builder for ClusterNode with metrics configuration | Worker processes |
| ServiceDiscovery | Reactive cluster state via events and watch channels | Load balancers |
| ClusterEvent | Enum for cluster state changes (join/leave/leader/updated) | Event subscribers |
| Node | Serializable node data (id, ip, last_seen, metadata) | Both |
| MetricsCollector | Trait for collecting node metrics | Worker processes |
| SystemMetricsCollector | Built-in collector for CPU, memory, load (requires system-metrics feature) | Worker processes |
Features
Leader Election
Uses etcd's native campaign/proclaim APIs for distributed consensus:
- Campaign: Blocks until leadership is acquired
- Proclaim: Periodic heartbeat to maintain leadership
- Resign: Graceful leadership handover on shutdown
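In application code the campaign is driven by `run()`; a leader-gated task can then check `is_leader()` (listed in the architecture diagram above - the exact signature is an assumption):

```rust
use std::time::Duration;

use photon_etcd_cluster::ClusterNode; // path assumed

// Only the current leader performs group-wide coordination work; followers
// simply wait until leadership changes. ServiceDiscovery's LeaderElected
// event can be used instead of this periodic check.
async fn coordination_task(node: &ClusterNode) {
    loop {
        if node.is_leader() {
            // e.g. invalidate caches, schedule jobs, rebalance work ...
        }
        tokio::time::sleep(Duration::from_secs(1)).await;
    }
}
```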
Health Monitoring
Heartbeat-based health tracking:
- Lease keep-alive failures mark node as unhealthy after 3 consecutive failures
- Node triggers reconnection after 10 consecutive failures
- Automatic recovery when connectivity restores
Resilient Connectivity
- Exponential backoff on etcd connection failures (1s → 30s max)
- Graceful reconnection after network partitions
- Shutdown signal integration for clean termination
Node Metrics (Optional)
Nodes can publish arbitrary JSON metadata (CPU, memory, custom metrics) for load balancer consumption:
- Schema-less: Any JSON-serializable data via `serde_json::Value`
- Pluggable collection: Implement the `MetricsCollector` trait for custom metrics
- Built-in system metrics: `SystemMetricsCollector` provides CPU, memory, load average (requires the `system-metrics` feature)
- Separate update task: Metrics updates run independently from lease keep-alive
- Change detection: `NodeUpdated` events emitted when metadata changes
Reactive API
ServiceDiscovery provides three ways to observe cluster state:
- Event subscription via `subscribe()`: Push-based `ClusterEvent` notifications
- Watch channels via `watch_nodes()` / `watch_leader()`: Efficient state observation
- Direct access via `nodes()` / `leader()`: O(1) lock-free reads

Events emitted:
- `NodeJoined(Node)` / `NodeLeft(Node)` / `NodeUpdated { old, new }`
- `LeaderElected(Node)` / `LeaderLost`
- `Ready` / `Disconnected` / `Reconnected`
etcd Key Structure
```
registry/
└── {group_name}/
    ├── node-1 → {"id":"node-1","ip":"192.168.1.10","last_seen":1234567890,"metadata":{...}}
    ├── node-2 → {"id":"node-2","ip":"192.168.1.11","last_seen":1234567891,"metadata":{...}}
    └── ...

election/
└── {group_name} → (etcd election key, value = current leader ID)
```
Node metadata example (with SystemMetricsCollector):
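Illustrative values only, using the standard keys from the table above:

```json
{
  "cpu_usage_percent": 23.5,
  "memory_usage_percent": 61.2,
  "memory_available_bytes": 8321499136,
  "load_avg_1m": 0.42
}
```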
Configuration
ClusterNode / ClusterNodeBuilder
| Parameter | Type | Default | Description |
|---|---|---|---|
| `etcd_endpoints` | `Vec<String>` | Required | etcd cluster endpoints |
| `node_id` | `String` | Required | Unique node identifier |
| `node_ip` | `IpAddr` | Required | Node's IP address |
| `group_name` | `String` | Required | Logical group for nodes |
| `ttl` | `i64` | `5` | Lease TTL in seconds |
| `metrics_collector` | `impl MetricsCollector` | `NoopMetricsCollector` | Metrics collection implementation |
| `metrics_update_interval` | `u64` | `0` (disabled) | Seconds between metrics updates |
Performance
- `ServiceDiscovery::nodes()`: O(1) lock-free read via `watch::Receiver::borrow()`
- Throughput: 10M accesses in <2s (benchmark validated)
- Memory: ~5 MiB for 1000 nodes
- Event channel: 256-message buffer for broadcast subscribers
Requirements
- Rust 1.85+ (edition 2024)
- etcd v3.5+
- Docker (for integration tests only)
Tested with etcd v3.5.21.
Build System
The project builds and tests with standard Cargo commands (`cargo build`, `cargo test`).
Testing
Integration Test Coverage
- Single node self-election
- Leader re-election on failure
- Node reconnection after etcd restarts
- Scalability with 20+ concurrent nodes
- Node metadata storage and retrieval
- Metadata update propagation via `NodeUpdated` events
Comparison with Alternatives
| Feature | photon-etcd-cluster | kube-leader-election | memberlist | chitchat |
|---|---|---|---|---|
| Language | Rust | Rust | Go | Rust |
| Backend | etcd | Kubernetes | None (P2P) | None (P2P) |
| Leader Election | Yes | Yes | No | No |
| Node Registry | Yes | No | Yes | Yes |
| Node Metrics | Yes | No | Limited | Limited |
| Platform Independent | Yes | No (K8s only) | Yes | Yes |
| External Dependencies | etcd | Kubernetes | None | None |
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Author
Roman Gushel
Roadmap
- Leader Priority/Weighting
- Graceful Leadership Transfer
- Node Tagging & Filtering
- TLS/mTLS Support for etcd Connections
- etcd Authentication Support
- Graceful Degradation & Health Check States
- Application-Level Health Checks
- Topology/Zone Awareness
- Observability: Prometheus Metrics
- Circuit Breaker for etcd Operations