tracing_throttle/lib.rs
//! # tracing-throttle
//!
//! High-performance log deduplication and rate limiting for the `tracing` ecosystem.
//!
//! This crate provides a `tracing_subscriber::Layer` that suppresses repetitive log events
//! based on configurable policies. Events are deduplicated by their signature (level,
//! message, and fields), so identical log events are throttled together.
//!
//! ## Quick Start
//!
//! ```rust,no_run
//! use tracing_throttle::{TracingRateLimitLayer, Policy};
//! use tracing_subscriber::prelude::*;
//! use std::time::Duration;
//!
//! // Uses safe defaults: 100 events, 10k signature limit
//! let rate_limit = TracingRateLimitLayer::builder()
//!     .with_policy(Policy::count_based(100).unwrap())
//!     .build()
//!     .unwrap();
//!
//! // Or customize:
//! let rate_limit = TracingRateLimitLayer::builder()
//!     .with_policy(Policy::count_based(100).unwrap())
//!     .with_max_signatures(50_000)  // Custom limit
//!     .with_summary_interval(Duration::from_secs(30))
//!     .build()
//!     .unwrap();
//!
//! // Apply the rate limit as a filter to your fmt layer
//! tracing_subscriber::registry()
//!     .with(tracing_subscriber::fmt::layer().with_filter(rate_limit))
//!     .init();
//! ```
//!
//! ## Features
//!
//! - **Count-based limiting**: Allow N events, then suppress the rest
//! - **Time-window limiting**: Allow K events per time period
//! - **Exponential backoff**: Emit at exponentially increasing intervals (1st, 2nd, 4th, 8th...)
//! - **Custom policies**: Implement your own rate limiting logic
//! - **Per-signature throttling**: Different messages are throttled independently
//! - **LRU eviction**: Optional memory limits with automatic eviction of least recently used signatures
//! - **Observability metrics**: Built-in tracking of allowed, suppressed, and evicted events
//! - **Fail-safe circuit breaker**: Fails open during errors to preserve observability
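//!
//! As a rough illustration of the count-based and exponential-backoff policies listed
//! above, the emit/suppress decision can be sketched with the standard library alone.
//! The function names here are hypothetical and are not part of this crate's API; the
//! crate's own policy types may be implemented differently.
//!
//! ```rust
//! /// Count-based: allow the first `limit` occurrences, then suppress the rest.
//! /// `occurrence` is 1-based.
//! fn count_based_allows(occurrence: u64, limit: u64) -> bool {
//!     occurrence <= limit
//! }
//!
//! /// Exponential backoff: emit the 1st, 2nd, 4th, 8th... occurrence,
//! /// i.e. whenever the 1-based occurrence number is a power of two.
//! fn backoff_allows(occurrence: u64) -> bool {
//!     occurrence.is_power_of_two()
//! }
//! ```
//!
//! Under this sketch, filtering occurrences 1 through 20 with `backoff_allows` keeps
//! only 1, 2, 4, 8, and 16, so a noisy event is still seen but at a rapidly thinning rate.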
//!
//! ## Observability
//!
//! Monitor rate limiting behavior with built-in metrics:
//!
//! ```rust,no_run
//! # use tracing_throttle::{TracingRateLimitLayer, Policy};
//! # let rate_limit = TracingRateLimitLayer::builder()
//! #     .with_policy(Policy::count_based(100).unwrap())
//! #     .build()
//! #     .unwrap();
//! // Get current metrics
//! let metrics = rate_limit.metrics();
//! println!("Events allowed: {}", metrics.events_allowed());
//! println!("Events suppressed: {}", metrics.events_suppressed());
//! println!("Signatures evicted: {}", metrics.signatures_evicted());
//!
//! // Get snapshot for calculations
//! let snapshot = metrics.snapshot();
//! println!("Suppression rate: {:.2}%", snapshot.suppression_rate() * 100.0);
//! ```
//!
//! ## Fail-Safe Operation
//!
//! The library uses a circuit breaker to fail open during errors, preserving
//! observability over strict rate limiting:
//!
//! ```rust,no_run
//! # use tracing_throttle::{TracingRateLimitLayer, CircuitState};
//! # let rate_limit = TracingRateLimitLayer::new();
//! // Check circuit breaker state
//! let cb = rate_limit.circuit_breaker();
//! match cb.state() {
//!     CircuitState::Closed => println!("Normal operation"),
//!     CircuitState::Open => println!("Failing open - allowing all events"),
//!     CircuitState::HalfOpen => println!("Testing recovery"),
//! }
//! ```
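//!
//! For readers unfamiliar with the pattern, the three states can be sketched as a small
//! state machine. This is a generic illustration of the circuit-breaker pattern, not this
//! crate's implementation: `State`, `Breaker`, the failure threshold, and the recovery
//! timeout are all assumptions made for the example.
//!
//! ```rust
//! use std::time::{Duration, Instant};
//!
//! #[derive(Debug, Clone, Copy, PartialEq, Eq)]
//! enum State {
//!     Closed,   // normal operation
//!     Open,     // failing open: rate limiting is bypassed
//!     HalfOpen, // probing whether the failure has cleared
//! }
//!
//! struct Breaker {
//!     state: State,
//!     consecutive_failures: u32,
//!     failure_threshold: u32,
//!     recovery_timeout: Duration,
//!     opened_at: Option<Instant>,
//! }
//!
//! impl Breaker {
//!     fn new(failure_threshold: u32, recovery_timeout: Duration) -> Self {
//!         Self {
//!             state: State::Closed,
//!             consecutive_failures: 0,
//!             failure_threshold,
//!             recovery_timeout,
//!             opened_at: None,
//!         }
//!     }
//!
//!     /// Record a failure; trip to `Open` at the threshold or on a failed probe.
//!     fn record_failure(&mut self, now: Instant) {
//!         self.consecutive_failures += 1;
//!         if self.state == State::HalfOpen
//!             || self.consecutive_failures >= self.failure_threshold
//!         {
//!             self.state = State::Open;
//!             self.opened_at = Some(now);
//!         }
//!     }
//!
//!     /// Record a success; this resets the failure count and closes the circuit.
//!     fn record_success(&mut self) {
//!         self.consecutive_failures = 0;
//!         self.state = State::Closed;
//!         self.opened_at = None;
//!     }
//!
//!     /// Current state, moving `Open` to `HalfOpen` once the timeout has elapsed.
//!     fn state(&mut self, now: Instant) -> State {
//!         if self.state == State::Open {
//!             if let Some(opened) = self.opened_at {
//!                 if now.duration_since(opened) >= self.recovery_timeout {
//!                     self.state = State::HalfOpen;
//!                 }
//!             }
//!         }
//!         self.state
//!     }
//! }
//! ```
//!
//! Passing `now` explicitly keeps the sketch deterministic in tests; a production
//! breaker would more likely read an injected clock.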
//!
//! ## Memory Management
//!
//! By default, the layer tracks up to 10,000 unique event signatures with LRU eviction.
//! Each signature uses approximately 150-250 bytes.
//!
//! **Typical memory usage:**
//! - 10,000 signatures (default): ~1.5-2.5 MB
//! - 50,000 signatures: ~7.5-12.5 MB
//! - 100,000 signatures: ~15-25 MB
//!
//! **Configuration:**
//! ```rust,no_run
//! # use tracing_throttle::TracingRateLimitLayer;
//! // Increase limit for high-cardinality applications
//! let rate_limit = TracingRateLimitLayer::builder()
//!     .with_max_signatures(50_000)
//!     .build()
//!     .unwrap();
//!
//! // Monitor usage
//! let sig_count = rate_limit.signature_count();
//! let evictions = rate_limit.metrics().signatures_evicted();
//! ```
//!
//! ### Memory Usage Breakdown
//!
//! Each tracked signature consumes memory for:
//!
//! ```text
//! Per-Signature Memory:
//! ├─ EventSignature (hash key)     ~32 bytes (u64 hash)
//! ├─ EventState (value)            ~120-200 bytes
//! │  ├─ Policy state               ~40-80 bytes (depends on policy type)
//! │  ├─ SuppressionCounter         ~40 bytes (atomic counters + timestamp)
//! │  └─ Metadata overhead          ~40 bytes (DashMap internals)
//! └─ Total per signature           ~150-250 bytes (varies with policy)
//! ```
//!
//! **Estimated memory usage at different signature limits:**
//!
//! | Signatures | Memory (typical) | Memory (worst case) | Use Case |
//! |------------|------------------|---------------------|----------|
//! | 1,000 | ~150 KB | ~250 KB | Small apps, few event types |
//! | 10,000 (default) | ~1.5 MB | ~2.5 MB | Most applications |
//! | 50,000 | ~7.5 MB | ~12.5 MB | High-cardinality apps |
//! | 100,000 | ~15 MB | ~25 MB | Very large systems |
//!
//! **Additional overhead:**
//! - Metrics: ~100 bytes (atomic counters)
//! - Circuit breaker: ~200 bytes (state tracking)
//! - Layer structure: ~500 bytes
//! - **Total fixed overhead: ~800 bytes**
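//!
//! The table rows above follow from simple arithmetic, which can be sketched as a helper
//! for other signature limits. The function and constant names are hypothetical (they
//! are not exported by this crate), and the per-signature figures are the approximate
//! ones quoted in this section rather than measured values.
//!
//! ```rust
//! // Approximate figures taken from the tables in this section.
//! const BYTES_PER_SIGNATURE_TYPICAL: u64 = 150;
//! const BYTES_PER_SIGNATURE_WORST: u64 = 250;
//! const FIXED_OVERHEAD_BYTES: u64 = 800;
//!
//! /// Returns `(typical, worst_case)` estimated bytes for `signatures` tracked signatures.
//! fn estimate_memory_bytes(signatures: u64) -> (u64, u64) {
//!     (
//!         signatures * BYTES_PER_SIGNATURE_TYPICAL + FIXED_OVERHEAD_BYTES,
//!         signatures * BYTES_PER_SIGNATURE_WORST + FIXED_OVERHEAD_BYTES,
//!     )
//! }
//! ```
//!
//! For the default limit, `estimate_memory_bytes(10_000)` yields roughly 1.5 MB typical
//! and 2.5 MB worst case; the fixed overhead is negligible at that scale.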
//!
//! ### Signature Cardinality Analysis
//!
//! **What affects signature cardinality?**
//!
//! Event signatures are computed from `(level, message, fields)`. Your cardinality
//! depends on how many unique combinations you emit:
//!
//! ```rust,no_run
//! # use tracing::info;
//! // Low cardinality (good) - same signature for all occurrences
//! info!("User login successful");  // Always same signature
//!
//! // Medium cardinality - signatures vary by field values
//! # let id = 123;
//! info!(user_id = %id, "User login");  // One signature per unique user_id
//!
//! // High cardinality (danger) - unique signature per event
//! # let uuid = "abc";
//! info!(request_id = %uuid, "Processing");  // New signature every time!
//! ```
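//!
//! To make the effect of field values concrete, a signature can be modelled as a hash
//! over the level, message, and field pairs. The sketch below uses the standard library's
//! `DefaultHasher`; it is an assumption for illustration only, and the crate's actual
//! `EventSignature` may hash different inputs or use a different hasher.
//!
//! ```rust
//! use std::collections::hash_map::DefaultHasher;
//! use std::collections::HashSet;
//! use std::hash::{Hash, Hasher};
//!
//! /// Hypothetical signature: a hash over level, message, and (name, value) field pairs.
//! fn signature(level: &str, message: &str, fields: &[(&str, &str)]) -> u64 {
//!     let mut hasher = DefaultHasher::new();
//!     level.hash(&mut hasher);
//!     message.hash(&mut hasher);
//!     fields.hash(&mut hasher);
//!     hasher.finish()
//! }
//!
//! /// Counts distinct signatures when `message` is logged once per entry in `values`.
//! fn distinct_signatures(message: &str, field: &str, values: &[&str]) -> usize {
//!     values
//!         .iter()
//!         .map(|v| signature("INFO", message, &[(field, *v)]))
//!         .collect::<HashSet<u64>>()
//!         .len()
//! }
//! ```
//!
//! Logging the same message for user IDs `1`, `2`, `2`, `3` produces three distinct
//! signatures under this model: repeated values share a signature, new values add one.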
//!
//! **Cardinality examples:**
//!
//! | Pattern | Unique Signatures | Memory Impact |
//! |---------|-------------------|---------------|
//! | Static messages only | ~10-100 | Minimal (<25 KB) |
//! | Messages + stable IDs (user, tenant) | ~1,000-10,000 | Low (~0.15-2.5 MB) |
//! | Messages + session IDs | ~10,000-100,000 | Medium (~1.5-25 MB) |
//! | Messages + request UUIDs | Unbounded | **High risk** |
//!
//! **How to estimate your cardinality:**
//!
//! 1. **Count unique log templates** in your codebase
//! 2. **Multiply by field cardinality** (unique values per field)
//! 3. **Example calculation:**
//!    - 50 unique log messages, each emitted at a single fixed level
//!    - Average 20 unique user IDs per message
//!    - **Estimated: 50 × 20 = 1,000 signatures** (✓ well below the default limit)
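//!
//! The estimation steps above amount to a multiplication and a comparison against the
//! configured limit. A minimal sketch, with hypothetical helper names that are not part
//! of this crate:
//!
//! ```rust
//! /// Estimated signatures = log templates x average unique field-value
//! /// combinations per template. Saturates instead of overflowing.
//! fn estimate_signatures(templates: u64, values_per_template: u64) -> u64 {
//!     templates.saturating_mul(values_per_template)
//! }
//!
//! /// Fraction of the signature limit the estimate would occupy
//! /// (1.0 means exactly at the limit).
//! fn limit_utilization(estimated: u64, max_signatures: u64) -> f64 {
//!     estimated as f64 / max_signatures as f64
//! }
//! ```
//!
//! With the example figures, `estimate_signatures(50, 20)` is 1,000, which is about 10%
//! of the default 10,000-signature limit.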
//!
//! ### Configuration Guidelines
//!
//! **When to use the default (10k signatures):**
//! - ✅ Most applications with structured logging
//! - ✅ Log messages use stable identifiers (user_id, tenant_id, service_name)
//! - ✅ You're unsure about cardinality
//! - ✅ Memory is not severely constrained
//!
//! **When to increase the limit:**
//!
//! ```rust,no_run
//! # use tracing_throttle::TracingRateLimitLayer;
//! let rate_limit = TracingRateLimitLayer::builder()
//!     .with_max_signatures(50_000)  // ~7.5-12.5 MB overhead
//!     .build()
//!     .expect("valid config");
//! ```
//!
//! - ✅ High log volume with many unique event types (>10k)
//! - ✅ Large distributed system with many services/endpoints
//! - ✅ You've measured cardinality and need more capacity
//! - ✅ Memory is available (10+ MB is acceptable)
//!
//! **When to use unlimited signatures:**
//!
//! ```rust,no_run
//! # use tracing_throttle::TracingRateLimitLayer;
//! let rate_limit = TracingRateLimitLayer::builder()
//!     .with_unlimited_signatures()  // ⚠️ Unbounded memory growth
//!     .build()
//!     .expect("valid config");
//! ```
//!
//! - ⚠️ **Use with extreme caution** - can cause unbounded memory growth
//! - ✅ Controlled environments (short-lived processes, tests)
//! - ✅ Known bounded cardinality with monitoring in place
//! - ✅ Memory constraints are not a concern
//! - ❌ **Never use** if logging includes UUIDs, timestamps, or other high-cardinality data
//!
//! ### Monitoring Memory Usage
//!
//! **Check signature count in production:**
//!
//! ```rust,no_run
//! # use tracing_throttle::TracingRateLimitLayer;
//! # use tracing::warn;
//! # let rate_limit = TracingRateLimitLayer::new();
//! // In a periodic health check or metrics reporter:
//! let sig_count = rate_limit.signature_count();
//! let evictions = rate_limit.metrics().signatures_evicted();
//!
//! if sig_count > 8000 {
//!     warn!("Approaching signature limit: {}/10000", sig_count);
//! }
//!
//! if evictions > 1000 {
//!     warn!("High eviction count: {} signatures evicted", evictions);
//! }
//! ```
//!
//! **Integrate with memory profilers:**
//!
//! ```bash
//! # Use Valgrind Massif for heap profiling
//! valgrind --tool=massif --massif-out-file=massif.out ./your-app
//!
//! # Analyze with ms_print
//! ms_print massif.out
//!
//! # Look for DashMap and EventState allocations
//! ```
//!
//! **Signs you need to adjust signature limits:**
//!
//! | Symptom | Likely Cause | Action |
//! |---------|--------------|--------|
//! | High eviction rate (>1000/min) | Cardinality > limit | Increase `max_signatures` |
//! | Memory growth over time | Unbounded cardinality | Fix logging (remove UUIDs), add limit |
//! | Low signature count (<100) | Over-provisioned | Can reduce limit safely |
//! | Frequent evictions + suppression | Limit too low | Increase limit or reduce cardinality |
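//!
//! The ">1000/min" threshold in the table is a rate, whereas a counter reading is a
//! total. Assuming `signatures_evicted()` reports a cumulative count (an assumption, not
//! something this section states), a per-minute rate can be derived from two readings.
//! The helper below is hypothetical and uses only the standard library:
//!
//! ```rust
//! use std::time::Duration;
//!
//! /// Evictions per minute between two readings of a cumulative eviction counter.
//! /// Returns `None` if no time has elapsed or the counter went backwards.
//! fn evictions_per_minute(previous: u64, current: u64, elapsed: Duration) -> Option<f64> {
//!     let delta = current.checked_sub(previous)?;
//!     let secs = elapsed.as_secs_f64();
//!     if secs <= 0.0 {
//!         return None;
//!     }
//!     Some(delta as f64 * 60.0 / secs)
//! }
//! ```
//!
//! A reporter could store the previous reading and timestamp, call this on each tick,
//! and warn when the result exceeds the threshold from the table.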

// Domain layer - pure business logic
pub mod domain;

// Application layer - orchestration
pub mod application;

// Infrastructure layer - external adapters
pub mod infrastructure;

// Re-export commonly used types for convenience
pub use domain::{
    policy::{
        CountBasedPolicy, ExponentialBackoffPolicy, Policy, PolicyDecision, PolicyError,
        RateLimitPolicy, TimeWindowPolicy,
    },
    signature::EventSignature,
    summary::{SuppressionCounter, SuppressionSummary},
};

pub use application::{
    circuit_breaker::{CircuitBreaker, CircuitBreakerConfig, CircuitState},
    emitter::EmitterConfigError,
    limiter::RateLimiter,
    metrics::{Metrics, MetricsSnapshot},
    ports::{Clock, Storage},
    registry::SuppressionRegistry,
};

pub use infrastructure::{
    clock::SystemClock,
    layer::{BuildError, TracingRateLimitLayer, TracingRateLimitLayerBuilder},
    storage::ShardedStorage,
};
292};