flux-limiter 0.6.3

A rate limiter based on the Generic Cell Rate Algorithm (GCRA).
# Flux Limiter Architecture


This document provides a comprehensive overview of the Flux Limiter's architecture, design decisions, and implementation details.

## Table of Contents


- [Overview](#overview)
- [Core Architecture](#core-architecture)
- [GCRA Algorithm](#gcra-algorithm)
- [Component Design](#component-design)
- [Concurrency Model](#concurrency-model)
- [Error Handling Architecture](#error-handling-architecture)
- [Performance Design](#performance-design)
- [Testing Architecture](#testing-architecture)
- [Design Decisions](#design-decisions)
- [Future Extensibility](#future-extensibility)

## Overview


Flux Limiter is a high-performance, thread-safe rate limiter implementing the Generic Cell Rate Algorithm (GCRA). The architecture prioritizes:

- **Performance**: O(1) operations with nanosecond precision
- **Correctness**: Mathematically precise algorithm implementation
- **Reliability**: Comprehensive error handling and graceful degradation
- **Flexibility**: Generic client IDs and configurable policies
- **Observability**: Rich metadata for monitoring and HTTP headers

### Key Architectural Principles


1. **Lock-Free Concurrency**: Uses atomic operations and lock-free data structures
2. **Zero-Allocation Hot Path**: Minimizes memory allocation in rate limiting decisions
3. **Clock Abstraction**: Enables testing and handles time-related failures
4. **Type Safety**: Leverages Rust's type system for correctness guarantees
5. **Graceful Degradation**: Continues operation despite partial failures

## Core Architecture


```
┌─────────────────────────────────────────────────────────────┐
│                    Flux Limiter                             │
├─────────────────────────────────────────────────────────────┤
│  Client API                                                 │
│  ├─ check_request(client_id) -> Result<Decision, Error>     │
│  ├─ cleanup_stale_clients(threshold) -> Result<(), Error>   │
│  └─ rate(), burst() -> f64                                  │
├─────────────────────────────────────────────────────────────┤
│  Core Components                                            │
│  ├─ FluxLimiter<T, C>     │ Main rate limiter struct        │
│  ├─ FluxLimiterConfig     │ Configuration management        │
│  ├─ FluxLimiterDecision   │ Rich decision metadata          │
│  └─ FluxLimiterError      │ Comprehensive error handling    │
├─────────────────────────────────────────────────────────────┤
│  Algorithm Layer                                            │
│  ├─ GCRA Implementation   │ Generic Cell Rate Algorithm     │
│  ├─ Nanosecond Precision  │ u64 nanosecond calculations     │
│  └─ TAT Tracking          │ Theoretical Arrival Time        │
├─────────────────────────────────────────────────────────────┤
│  Storage Layer                                              │
│  ├─ DashMap<T, u64>       │ Lock-free concurrent hash map   │
│  ├─ Atomic Operations     │ Thread-safe state updates       │
│  └─ Memory Management     │ Automatic cleanup mechanisms    │
├─────────────────────────────────────────────────────────────┤
│  Time Abstraction                                           │
│  ├─ Clock Trait           │ Pluggable time source           │
│  ├─ SystemClock           │ Production time implementation   │
│  └─ TestClock             │ Deterministic test time         │
└─────────────────────────────────────────────────────────────┘
```

### Data Flow


```
Request → check_request(client_id)
        → Clock::now()            → current time (nanoseconds)
        → DashMap::get(client_id) → previous TAT
        → GCRA calculation
        → Decision: allow/deny + metadata
        → DashMap::insert(client_id, new TAT)
        → return FluxLimiterDecision
```

## GCRA Algorithm


### Algorithm Choice


**Generic Cell Rate Algorithm (GCRA)** was chosen over Token Bucket for several reasons:

1. **Mathematical Precision**: Avoids floating-point precision issues
2. **Stateless Calculation**: No background token refill processes
3. **Efficient State**: One timestamp per client vs. token count + last refill
4. **Deterministic**: Exact timing calculations with integer arithmetic

### GCRA Implementation Details


The algorithm maintains a **Theoretical Arrival Time (TAT)** for each client:

```rust
// Core GCRA logic (simplified; the DashMap guard is dereferenced for clarity)
let current_time_nanos = clock.now()?;
let previous_tat_nanos = client_state
    .get(&client_id)
    .map(|tat| *tat)
    .unwrap_or(current_time_nanos);

// Check whether the request conforms (is within tolerance)
let is_conforming = current_time_nanos >= previous_tat_nanos.saturating_sub(tolerance_nanos);

if is_conforming {
    // Allow the request and advance the TAT by one rate interval
    let new_tat_nanos = current_time_nanos.max(previous_tat_nanos).saturating_add(rate_nanos);
    client_state.insert(client_id, new_tat_nanos);
    // Return an allowed decision
} else {
    // Deny the request and compute how long the client must wait
    let retry_after_nanos = previous_tat_nanos
        .saturating_sub(tolerance_nanos)
        .saturating_sub(current_time_nanos);
    // Return a denied decision carrying retry_after
}
```

### Mathematical Foundation


- **Rate Interval (T)**: `1 / rate_per_second` seconds = `1_000_000_000 / rate_per_second` nanoseconds
- **Tolerance (τ)**: `burst_capacity * rate_interval` nanoseconds
- **TAT Update**: `TAT' = max(current_time, previous_TAT) + T`
- **Conformance Test**: `current_time >= TAT - τ`
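
The formulas above can be exercised as a small, self-contained sketch using only `u64` nanoseconds. Here `t` is the rate interval T and `tau` the tolerance τ; the function name and tuple return are illustrative, not the crate's API:

```rust
/// One GCRA step: returns (allowed, new TAT) from the current time and prior TAT.
fn gcra_step(now: u64, prev_tat: u64, t: u64, tau: u64) -> (bool, u64) {
    // Conformance test: now >= TAT - τ (saturating so TAT < τ cannot underflow).
    if now >= prev_tat.saturating_sub(tau) {
        // TAT' = max(now, TAT) + T
        (true, now.max(prev_tat).saturating_add(t))
    } else {
        // Non-conforming: the TAT is left unchanged.
        (false, prev_tat)
    }
}

fn main() {
    // T = 10 ns, τ = 20 ns (i.e. a burst of 2 at this rate).
    let (allowed, tat) = gcra_step(0, 0, 10, 20);
    println!("allowed={allowed}, new TAT={tat}");
}
```

Note that a denied request never advances the TAT, which is why GCRA needs no background refill process.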

### Precision Guarantees


- All calculations use `u64` nanoseconds (no floating-point drift)
- Supports rates up to 1 billion requests/second with nanosecond precision
- Handles edge cases with saturating arithmetic to prevent overflow

## Component Design


### FluxLimiter<T, C>


The core struct uses generics for flexibility:

```rust
pub struct FluxLimiter<T, C = SystemClock>
where
    T: Hash + Eq + Clone,  // Client identifier type
    C: Clock,              // Time source
{
    rate_nanos: u64,                    // Rate interval in nanoseconds
    tolerance_nanos: u64,               // Burst tolerance in nanoseconds
    client_state: Arc<DashMap<T, u64>>, // Client TAT storage
    clock: C,                           // Time abstraction
}
```

**Design Rationale**:
- **Generic Client ID**: Supports `String`, `IpAddr`, `u64`, custom types
- **Clock Abstraction**: Enables testing and handles time failures
- **Arc<DashMap>**: Thread-safe, lock-free concurrent access
- **Nanosecond Storage**: Maintains precision throughout calculations

### Configuration System


```rust
#[derive(Debug, Clone)]
pub struct FluxLimiterConfig {
    rate_per_second: f64,   // User-friendly rate specification
    burst_capacity: f64,    // User-friendly burst specification
}
```

**Builder Pattern Support**:
```rust
let config = FluxLimiterConfig::new(0.0, 0.0)
    .rate(100.0)
    .burst(50.0);
```

**Validation**:
- Rate must be positive (> 0.0)
- Burst must be non-negative (≥ 0.0)
- Validation occurs at construction time
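
A minimal sketch of that validation logic, assuming error names that mirror the `FluxLimiterError` variants described later (this is not the crate's exact API):

```rust
#[derive(Debug, PartialEq)]
enum ConfigError {
    InvalidRate,
    InvalidBurst,
}

/// Validate a (rate, burst) pair before constructing a limiter.
fn validate(rate_per_second: f64, burst_capacity: f64) -> Result<(), ConfigError> {
    // `!(x > 0.0)` also rejects NaN, which a plain `x <= 0.0` would let through.
    if !(rate_per_second > 0.0) {
        return Err(ConfigError::InvalidRate);
    }
    if !(burst_capacity >= 0.0) {
        return Err(ConfigError::InvalidBurst);
    }
    Ok(())
}

fn main() {
    println!("{:?}", validate(100.0, 50.0)); // Ok(())
}
```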

### Decision Metadata


```rust
pub struct FluxLimiterDecision {
    pub allowed: bool,                    // Primary decision
    pub retry_after_seconds: Option<f64>, // When to retry (if denied)
    pub remaining_capacity: Option<f64>,  // Remaining burst capacity
    pub reset_time_nanos: u64,           // When window resets
}
```

**Rich Metadata Enables**:
- HTTP rate limit headers (X-RateLimit-Remaining, Retry-After)
- Client-side backoff strategies
- Monitoring and observability
- Debugging and diagnostics
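
One way to consume that metadata is to map it to HTTP headers. The struct below mirrors `FluxLimiterDecision`; the rounding policy (floor the remaining capacity, ceil the retry delay) is an assumption for illustration, not crate behavior:

```rust
struct Decision {
    allowed: bool,
    retry_after_seconds: Option<f64>,
    remaining_capacity: Option<f64>,
}

/// Translate a decision into (header name, value) pairs.
fn to_headers(d: &Decision) -> Vec<(&'static str, String)> {
    let mut headers = Vec::new();
    if let Some(remaining) = d.remaining_capacity {
        // Floor so we never advertise more capacity than is actually left.
        headers.push(("X-RateLimit-Remaining", format!("{}", remaining.floor() as u64)));
    }
    if let Some(retry) = d.retry_after_seconds {
        // Ceil so clients never retry too early.
        headers.push(("Retry-After", format!("{}", retry.ceil() as u64)));
    }
    headers
}

fn main() {
    let denied = Decision {
        allowed: false,
        retry_after_seconds: Some(1.2),
        remaining_capacity: Some(0.0),
    };
    println!("allowed={}", denied.allowed);
    for (name, value) in to_headers(&denied) {
        println!("{name}: {value}");
    }
}
```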

## Concurrency Model


### Lock-Free Design


```
Multiple Threads
       ↓
FluxLimiter (Shared)
       ↓
Arc<DashMap<ClientId, TAT>>
       ↓
Lock-free hash map operations
```

**Thread Safety Guarantees**:
1. **Read Operations**: Multiple concurrent readers without contention
2. **Write Operations**: Atomic updates with minimal contention
3. **Memory Ordering**: Relaxed ordering sufficient for TAT updates
4. **ABA Prevention**: TAT values are monotonically increasing

### DashMap Choice


**Why DashMap over alternatives**:
- **Better than `std::HashMap` + `Mutex`**: avoids a single global lock on every access
- **Better than `RwLock<HashMap>`**: no whole-map reader/writer contention
- **Better than a custom atomic map**: mature, battle-tested implementation
- **Segmented (sharded) locking**: contention is confined to one shard rather than the whole map

### Memory Consistency


```rust
// TAT updates are atomic and isolated per client
client_state.insert(client_id, new_tat);  // Atomic operation

// Concurrent access to different clients is lock-free
// Concurrent access to same client is serialized by DashMap
```

## Error Handling Architecture


### Error Type Hierarchy


```rust
pub enum FluxLimiterError {
    InvalidRate,           // Configuration: rate ≤ 0
    InvalidBurst,          // Configuration: burst < 0  
    ClockError(ClockError), // Runtime: clock failure
}

pub enum ClockError {
    SystemTimeError,       // System time unavailable
}
```

### Error Propagation Flow


```
Clock::now() → Result<u64, ClockError>
FluxLimiter::check_request() → Result<Decision, FluxLimiterError>
Application Layer → Implements error policy
```

### Recovery Strategies


1. **Configuration Errors**: Fail fast at startup
2. **Clock Errors**: Graceful degradation options:
   - **Fail-open**: Allow requests during clock failures
   - **Fail-closed**: Deny requests during clock failures
   - **Fallback**: Use alternative rate limiting
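
A fail-open policy can be sketched as a thin wrapper: a clock error maps to "allow". The `check` closure stands in for `check_request`, and the error type is simplified to the clock case:

```rust
#[derive(Debug)]
enum LimiterError {
    Clock,
}

/// Fail-open wrapper: normal decisions pass through, clock errors allow traffic.
fn allow_with_fail_open<F>(check: F) -> bool
where
    F: Fn() -> Result<bool, LimiterError>,
{
    match check() {
        Ok(allowed) => allowed,           // normal rate-limiting decision
        Err(LimiterError::Clock) => true, // fail-open: let the request through
    }
}

fn main() {
    println!("{}", allow_with_fail_open(|| Err(LimiterError::Clock))); // true
}
```

A fail-closed variant is the same wrapper with `false` in the error arm; which policy is right depends on whether availability or strict enforcement matters more for the protected resource.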

### Clock Abstraction Benefits


```rust
pub trait Clock: Send + Sync {
    fn now(&self) -> Result<u64, ClockError>;
}
```

**Handles Real-World Issues**:
- System clock going backwards (NTP adjustments)
- Clock resolution limitations
- System suspend/resume
- Virtualization time skips
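
A minimal production implementation of the trait might look like the sketch below. The unit-struct error type is a simplification for illustration; the key point is that reading the system clock is fallible and that failure surfaces as a `Result`:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

#[derive(Debug)]
struct ClockError;

trait Clock: Send + Sync {
    fn now(&self) -> Result<u64, ClockError>;
}

struct SystemClock;

impl Clock for SystemClock {
    fn now(&self) -> Result<u64, ClockError> {
        // duration_since fails if the system clock reads before the Unix epoch,
        // e.g. after a large backwards adjustment; surface that as ClockError.
        SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_nanos() as u64)
            .map_err(|_| ClockError)
    }
}

fn main() {
    println!("{:?}", SystemClock.now());
}
```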

## Performance Design


### Hot Path Optimization


The `check_request()` method is optimized for minimal latency:

1. **Single Clock Call**: One time source access per request
2. **Single Map Operation**: Either get or get+insert
3. **Integer Arithmetic**: No floating-point operations
4. **No Allocations**: Reuses existing memory
5. **Minimal Branching**: Straight-line execution

### Memory Layout


```rust
FluxLimiter {
    rate_nanos: u64,         // 8 bytes - cache-friendly
    tolerance_nanos: u64,    // 8 bytes
    client_state: Arc<..>,   // 8 bytes - pointer to shared state
    clock: C,                // Usually zero-sized for SystemClock
}
```

**Cache Efficiency**:
- Small struct size (24-32 bytes)
- Frequently accessed fields grouped together
- Arc enables sharing without duplication

### Algorithmic Complexity


- **Time**: O(1) for `check_request()`
- **Space**: O(number of active clients)
- **Cleanup**: O(number of clients) but infrequent
- **Contention**: O(1) per client, lock-free across clients

### Scalability Characteristics


```
Clients     Memory Usage    Latency
1K          ~32KB          < 1μs
100K        ~3.2MB         < 1μs  
1M          ~32MB          < 1μs
10M         ~320MB         < 1μs
```

## Testing Architecture


### Test Clock Design


```rust
pub struct TestClock {
    time: Arc<AtomicU64>,        // Current time in nanoseconds
    should_fail: Arc<AtomicBool>, // Failure simulation flag
}
```

**Key Features**:
- **Deterministic**: Controlled time progression
- **Thread-safe**: Can be shared across test threads
- **Failure Simulation**: Can simulate clock errors
- **Precise Control**: Nanosecond-level time manipulation
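
A runnable sketch of the struct above; `advance` and `fail_next_call` are assumed helper names for illustration, not necessarily the crate's exact API:

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;

struct TestClock {
    time: Arc<AtomicU64>,         // current time in nanoseconds
    should_fail: Arc<AtomicBool>, // failure simulation flag
}

impl TestClock {
    fn new() -> Self {
        Self {
            time: Arc::new(AtomicU64::new(0)),
            should_fail: Arc::new(AtomicBool::new(false)),
        }
    }

    /// Move the clock forward deterministically.
    fn advance(&self, nanos: u64) {
        self.time.fetch_add(nanos, Ordering::SeqCst);
    }

    /// Make exactly the next `now()` call return an error.
    fn fail_next_call(&self) {
        self.should_fail.store(true, Ordering::SeqCst);
    }

    fn now(&self) -> Result<u64, ()> {
        // swap resets the flag, so the failure affects only one call.
        if self.should_fail.swap(false, Ordering::SeqCst) {
            Err(()) // simulated clock failure
        } else {
            Ok(self.time.load(Ordering::SeqCst))
        }
    }
}

fn main() {
    let clock = TestClock::new();
    clock.advance(1_000);
    println!("{:?}", clock.now()); // Ok(1000)
}
```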

### Test Organization


```
tests/ratelimiter/
├── fixtures/
│   ├── test_clock.rs      # TestClock implementation
│   └── mod.rs
├── gcra_algorithm_tests.rs # Core algorithm correctness
├── config_tests.rs        # Configuration validation
├── error_tests.rs         # Error handling and recovery
├── cleanup_tests.rs       # Memory management
├── performance_tests.rs   # Performance characteristics
└── main.rs               # Test module organization
```

### Error Testing Strategy


```rust
// Simulate clock failure
clock.fail_next_call();
let result = limiter.check_request("client1");
assert!(result.is_err());

// Verify recovery
let result = limiter.check_request("client1");  
assert!(result.is_ok());
```

**Comprehensive Error Coverage**:
- Clock failures during rate limiting
- Clock failures during cleanup
- Multiple consecutive failures
- Recovery after failures
- Partial operation failures

## Design Decisions


### Algorithm Selection: GCRA vs Token Bucket


**GCRA Advantages**:
- ✅ Exact mathematical precision
- ✅ No background processes
- ✅ Simpler state management
- ✅ Better burst handling

**Token Bucket Drawbacks**:
- ❌ Floating-point precision drift
- ❌ Requires background token refill
- ❌ More complex state (tokens + timestamp)
- ❌ Timer management overhead

### Data Structure Selection: DashMap vs Alternatives


**DashMap Benefits**:
- ✅ Lock-free reads and writes
- ✅ Mature and well-tested
- ✅ Good performance characteristics
- ✅ Segmented locking reduces contention

**Alternative Rejections**:
- **HashMap + Mutex**: Global locking bottleneck
- **RwLock<HashMap>**: Reader/writer contention
- **Custom lock-free map**: Development complexity

### Time Representation: Nanoseconds vs Duration


**Nanosecond Choice**:
- ✅ Integer arithmetic (no floating-point errors)
- ✅ Maximum precision for high-rate scenarios
- ✅ Simple overflow handling with saturating operations
- ✅ Direct system time compatibility

### Error Handling: Result Types vs Panics


**Result Type Benefits**:
- ✅ Explicit error handling
- ✅ Graceful degradation options
- ✅ Better observability
- ✅ Production-ready reliability

**Panic Rejection**:
- ❌ Difficult to recover from
- ❌ Poor user experience
- ❌ Hard to monitor and debug

### Generic Design: Client ID Types


**Generic Benefits**:
- ✅ Flexibility for different use cases
- ✅ Zero-cost abstractions
- ✅ Type safety
- ✅ Performance (no boxing/dynamic dispatch)

**Common Client ID Types**:
- `String`: User IDs, API keys
- `IpAddr`: IP-based rate limiting  
- `u64`: High-performance numeric IDs
- Custom types: Complex client identification
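
The generic-key pattern can be shown with a plain `HashMap` standing in for the DashMap TAT store: any `Hash + Eq + Clone` type works through the same monomorphized function, with no boxing or dynamic dispatch. The function name is illustrative:

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::net::{IpAddr, Ipv4Addr};

/// Store a TAT under any key type satisfying the limiter's bounds.
fn record_tat<T: Hash + Eq + Clone>(state: &mut HashMap<T, u64>, id: &T, tat: u64) {
    state.insert(id.clone(), tat);
}

fn main() {
    // IP-based keys…
    let mut by_ip: HashMap<IpAddr, u64> = HashMap::new();
    record_tat(&mut by_ip, &IpAddr::V4(Ipv4Addr::LOCALHOST), 42);

    // …and numeric keys, through the same generic function.
    let mut by_id: HashMap<u64, u64> = HashMap::new();
    record_tat(&mut by_id, &7, 42);

    println!("{} {}", by_ip.len(), by_id.len());
}
```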

## Future Extensibility


### Planned Architecture Enhancements


1. **Async Support**:
   ```rust
   async fn check_request_async(&self, client_id: T) -> Result<Decision, Error>
   ```
   - Non-blocking I/O integration
   - Async clock sources
   - Future-based cleanup

2. **Persistence Layer**:
   ```rust
   trait StateStore: Send + Sync {
       async fn load_state(&self, client_id: &T) -> Result<Option<u64>, Error>;
       async fn save_state(&self, client_id: &T, tat: u64) -> Result<(), Error>;
   }
   ```
   - Redis backend
   - Database persistence
   - Distributed state management

3. **Observability Integration**:
   ```rust
   trait RateLimiterMetrics {
       fn record_decision(&self, client_id: &T, allowed: bool, latency: Duration);
       fn record_error(&self, error: &FluxLimiterError);
   }
   ```
   - Prometheus metrics
   - OpenTelemetry traces
   - Custom metric backends

4. **Alternative Algorithms**:
   ```rust
   trait RateLimitAlgorithm<T> {
       fn check_request(&self, client_id: &T, current_time: u64) -> Decision;
   }
   ```
   - Token bucket implementation
   - Sliding window algorithms
   - Leaky bucket variants

### Extensibility Patterns


1. **Trait-Based Design**: Core algorithms as pluggable traits
2. **Generic Storage**: Abstract storage backend
3. **Middleware Pattern**: Composable rate limiting policies
4. **Plugin Architecture**: Runtime-configurable behaviors

### Backward Compatibility Strategy


- **Semantic Versioning**: Clear breaking change communication
- **Feature Flags**: Optional new functionality
- **Deprecation Path**: Gradual migration for breaking changes
- **Type Aliases**: Maintain existing API surface

## Conclusion


The Flux Limiter architecture prioritizes correctness, performance, and reliability through:

- **Mathematical Precision**: GCRA algorithm with nanosecond timing
- **Lock-Free Concurrency**: High-performance multi-threaded access
- **Comprehensive Error Handling**: Graceful degradation and recovery
- **Rich Observability**: Detailed metadata for monitoring
- **Flexible Design**: Generic client IDs and pluggable time sources
- **Robust Testing**: Deterministic test infrastructure

This architecture provides a solid foundation for high-performance rate limiting while maintaining the flexibility needed for diverse production environments.