tracing_throttle/lib.rs
//! # tracing-throttle
//!
//! High-performance log deduplication and rate limiting for the `tracing` ecosystem.
//!
//! This crate provides a `tracing_subscriber::Layer` that suppresses repetitive log events
//! based on configurable policies. Events are deduplicated by their signature (level,
//! message, and fields), so identical log events are throttled together.
//!
//! ## Quick Start
//!
//! ```rust,no_run
//! use tracing_throttle::{TracingRateLimitLayer, Policy};
//! use tracing_subscriber::prelude::*;
//! use std::time::Duration;
//!
//! // Use sensible defaults: 50 burst capacity, 1 token/sec (60/min), 10k signature limit
//! let rate_limit = TracingRateLimitLayer::new();
//!
//! // Or customize for high-volume applications:
//! let rate_limit = TracingRateLimitLayer::builder()
//!     .with_policy(Policy::token_bucket(100.0, 10.0).unwrap()) // 100 burst, 600/min
//!     .with_max_signatures(50_000) // Custom limit
//!     .with_summary_interval(Duration::from_secs(30))
//!     .build()
//!     .unwrap();
//!
//! // Apply the rate limit as a filter to your fmt layer
//! tracing_subscriber::registry()
//!     .with(tracing_subscriber::fmt::layer().with_filter(rate_limit))
//!     .init();
//! ```
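//!
//! The burst and refill numbers above follow the usual token bucket model: the bucket
//! starts full, each allowed event spends one token, and tokens refill at a steady rate
//! up to the burst capacity. The sketch below is a standalone illustration of that model
//! only. `SketchTokenBucket` and `try_acquire` are hypothetical names, not this crate's
//! `TokenBucketPolicy`, whose internals may differ.
//!
//! ```rust
//! // Hypothetical standalone model of a token bucket (illustration only).
//! struct SketchTokenBucket {
//!     capacity: f64,    // maximum burst size
//!     refill_rate: f64, // tokens added per second
//!     tokens: f64,      // tokens currently available
//! }
//!
//! impl SketchTokenBucket {
//!     fn new(capacity: f64, refill_rate: f64) -> Self {
//!         // Start full so an initial burst of `capacity` events passes.
//!         Self { capacity, refill_rate, tokens: capacity }
//!     }
//!
//!     // Credit tokens for `elapsed_secs` since the previous call, capped at
//!     // `capacity`, then try to spend one. Returns true if the event passes.
//!     fn try_acquire(&mut self, elapsed_secs: f64) -> bool {
//!         self.tokens = (self.tokens + elapsed_secs * self.refill_rate).min(self.capacity);
//!         if self.tokens >= 1.0 {
//!             self.tokens -= 1.0;
//!             true
//!         } else {
//!             false
//!         }
//!     }
//! }
//! ```
//!
//! With a capacity of 3 and a refill rate of 1 token/sec, three back-to-back events pass,
//! a fourth is suppressed, and one more passes after a second of idle time.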
//!
//! ## Features
//!
//! - **Token bucket limiting**: Burst tolerance with smooth recovery (recommended default)
//! - **Time-window limiting**: Allow K events per time period with natural reset
//! - **Count-based limiting**: Allow N events, then suppress the rest (no recovery)
//! - **Exponential backoff**: Emit at exponentially increasing intervals (1st, 2nd, 4th, 8th...)
//! - **Custom policies**: Implement your own rate limiting logic
//! - **Per-signature throttling**: Different messages are throttled independently
//! - **LRU eviction**: Optional memory limits with automatic eviction of least recently used signatures
//! - **Observability metrics**: Built-in tracking of allowed, suppressed, and evicted events
//! - **Fail-safe circuit breaker**: Fails open during errors to preserve observability
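//!
//! Reading "1st, 2nd, 4th, 8th..." as occurrence counts, an event is emitted whenever its
//! repeat count is a power of two. A minimal sketch under that reading follows. The helper
//! names are hypothetical, and the crate's `ExponentialBackoffPolicy` may be implemented
//! differently.
//!
//! ```rust
//! // Hypothetical helper: should the `occurrence`-th repeat (1-based) be emitted
//! // under a powers-of-two backoff schedule?
//! fn should_emit(occurrence: u64) -> bool {
//!     occurrence.is_power_of_two()
//! }
//!
//! // Lists which of the first `total` occurrences would be emitted.
//! fn emitted_occurrences(total: u64) -> Vec<u64> {
//!     (1..=total).filter(|&n| should_emit(n)).collect()
//! }
//! ```
//!
//! Out of the first 10 occurrences, only occurrences 1, 2, 4, and 8 would be emitted.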
//!
//! ## Observability
//!
//! Monitor rate limiting behavior with built-in metrics:
//!
//! ```rust,no_run
//! # use tracing_throttle::{TracingRateLimitLayer, Policy};
//! # let rate_limit = TracingRateLimitLayer::builder()
//! #     .with_policy(Policy::count_based(100).unwrap())
//! #     .build()
//! #     .unwrap();
//! // Get current metrics
//! let metrics = rate_limit.metrics();
//! println!("Events allowed: {}", metrics.events_allowed());
//! println!("Events suppressed: {}", metrics.events_suppressed());
//! println!("Signatures evicted: {}", metrics.signatures_evicted());
//!
//! // Get snapshot for calculations
//! let snapshot = metrics.snapshot();
//! println!("Suppression rate: {:.2}%", snapshot.suppression_rate() * 100.0);
//! ```
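//!
//! Because the example multiplies the rate by 100 to print a percentage, the rate is
//! presumably a fraction between 0 and 1. One plausible definition is suppressed events
//! divided by all events seen. The sketch below assumes that definition;
//! `sketch_suppression_rate` is a hypothetical name and not necessarily how
//! `suppression_rate()` is computed.
//!
//! ```rust
//! // Assumed definition: suppressed / (allowed + suppressed), or 0.0 when no
//! // events have been seen yet (avoids dividing by zero).
//! fn sketch_suppression_rate(allowed: u64, suppressed: u64) -> f64 {
//!     let total = allowed + suppressed;
//!     if total == 0 {
//!         0.0
//!     } else {
//!         suppressed as f64 / total as f64
//!     }
//! }
//! ```
//!
//! Under this definition, 75 allowed and 25 suppressed events give a rate of 0.25.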
//!
//! ## Fail-Safe Operation
//!
//! The library uses a circuit breaker to fail open during errors, preserving
//! observability over strict rate limiting:
//!
//! ```rust,no_run
//! # use tracing_throttle::{TracingRateLimitLayer, CircuitState};
//! # let rate_limit = TracingRateLimitLayer::new();
//! // Check circuit breaker state
//! let cb = rate_limit.circuit_breaker();
//! match cb.state() {
//!     CircuitState::Closed => println!("Normal operation"),
//!     CircuitState::Open => println!("Failing open - allowing all events"),
//!     CircuitState::HalfOpen => println!("Testing recovery"),
//! }
//! ```
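//!
//! The three states can be pictured as a small state machine. The sketch below is an
//! illustration under assumptions, not this crate's `CircuitBreaker`: the type names,
//! the consecutive-failure threshold, and the explicit `begin_probe` step are all
//! hypothetical.
//!
//! ```rust
//! #[derive(Debug, Clone, Copy, PartialEq)]
//! enum SketchState {
//!     Closed,   // normal operation, rate limiting applied
//!     Open,     // failing open, every event is allowed through
//!     HalfOpen, // probing whether normal operation can resume
//! }
//!
//! struct SketchBreaker {
//!     state: SketchState,
//!     consecutive_failures: u32,
//!     failure_threshold: u32, // assumed tuning knob
//! }
//!
//! impl SketchBreaker {
//!     fn new(failure_threshold: u32) -> Self {
//!         Self { state: SketchState::Closed, consecutive_failures: 0, failure_threshold }
//!     }
//!
//!     // A failure during a probe, or too many failures in a row, opens the breaker.
//!     fn record_failure(&mut self) {
//!         self.consecutive_failures += 1;
//!         if self.state == SketchState::HalfOpen
//!             || self.consecutive_failures >= self.failure_threshold
//!         {
//!             self.state = SketchState::Open;
//!         }
//!     }
//!
//!     // Any success resets the failure count and closes the breaker.
//!     fn record_success(&mut self) {
//!         self.consecutive_failures = 0;
//!         self.state = SketchState::Closed;
//!     }
//!
//!     // Called after some cooldown (not modeled here) to start testing recovery.
//!     fn begin_probe(&mut self) {
//!         if self.state == SketchState::Open {
//!             self.state = SketchState::HalfOpen;
//!         }
//!     }
//!
//!     // Fail open: while the breaker is Open, rate limiting is skipped entirely.
//!     fn should_rate_limit(&self) -> bool {
//!         self.state != SketchState::Open
//!     }
//! }
//! ```
//!
//! The key design point is the last method: when the limiter itself is misbehaving,
//! events are let through rather than dropped.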
//!
//! ## Memory Management
//!
//! By default, the layer tracks up to 10,000 unique event signatures with LRU eviction.
//! Each signature uses approximately 150-250 bytes.
//!
//! **Typical memory usage:**
//! - 10,000 signatures (default): ~1.5-2.5 MB
//! - 50,000 signatures: ~7.5-12.5 MB
//! - 100,000 signatures: ~15-25 MB
//!
//! **Configuration:**
//! ```rust,no_run
//! # use tracing_throttle::TracingRateLimitLayer;
//! // Increase limit for high-cardinality applications
//! let rate_limit = TracingRateLimitLayer::builder()
//!     .with_max_signatures(50_000)
//!     .build()
//!     .unwrap();
//!
//! // Monitor usage
//! let sig_count = rate_limit.signature_count();
//! let evictions = rate_limit.metrics().signatures_evicted();
//! ```
//!
//! ### Memory Usage Breakdown
//!
//! Each tracked signature consumes memory for:
//!
//! ```text
//! Per-Signature Memory:
//! ├─ EventSignature (hash key)      ~32 bytes (u64 hash)
//! ├─ EventState (value)             ~120-200 bytes
//! │  ├─ Policy state                ~40-80 bytes (depends on policy type)
//! │  ├─ SuppressionCounter          ~40 bytes (atomic counters + timestamp)
//! │  └─ Metadata overhead           ~40 bytes (DashMap internals)
//! └─ Total per signature            ~150-250 bytes (varies with policy)
//! ```
//!
//! **Estimated memory usage at different signature limits:**
//!
//! | Signatures | Memory (typical) | Memory (worst case) | Use Case |
//! |------------|------------------|---------------------|----------|
//! | 1,000 | ~150 KB | ~250 KB | Small apps, few event types |
//! | 10,000 (default) | ~1.5 MB | ~2.5 MB | Most applications |
//! | 50,000 | ~7.5 MB | ~12.5 MB | High-cardinality apps |
//! | 100,000 | ~15 MB | ~25 MB | Very large systems |
//!
//! **Additional overhead:**
//! - Metrics: ~100 bytes (atomic counters)
//! - Circuit breaker: ~200 bytes (state tracking)
//! - Layer structure: ~500 bytes
//! - **Total fixed overhead: ~800 bytes**
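//!
//! The table rows can be reproduced with simple arithmetic: signature count times the
//! per-signature bounds, plus the fixed overhead. The helper below is a hypothetical
//! back-of-the-envelope calculator built from the approximate figures in this section;
//! it is not part of the crate's API and does not measure real allocations.
//!
//! ```rust
//! // Fixed overhead quoted above: metrics + circuit breaker + layer structure.
//! const FIXED_OVERHEAD_BYTES: u64 = 800;
//! // Approximate per-signature bounds quoted above.
//! const TYPICAL_BYTES_PER_SIGNATURE: u64 = 150;
//! const WORST_CASE_BYTES_PER_SIGNATURE: u64 = 250;
//!
//! // Returns (typical, worst-case) estimated bytes for `signatures` tracked entries.
//! fn estimate_memory_bytes(signatures: u64) -> (u64, u64) {
//!     (
//!         signatures * TYPICAL_BYTES_PER_SIGNATURE + FIXED_OVERHEAD_BYTES,
//!         signatures * WORST_CASE_BYTES_PER_SIGNATURE + FIXED_OVERHEAD_BYTES,
//!     )
//! }
//! ```
//!
//! For the default limit of 10,000 signatures this gives roughly 1.5 MB typical and
//! 2.5 MB worst case, in line with the table.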
//!
//! ### Signature Cardinality Analysis
//!
//! **What affects signature cardinality?**
//!
//! Event signatures are computed from `(level, message, fields)`. Your cardinality
//! depends on how many unique combinations you emit:
//!
//! ```rust,no_run
//! # use tracing::info;
//! // Low cardinality (good) - same signature for all occurrences
//! info!("User login successful"); // Always same signature
//!
//! // Medium cardinality - signatures vary by field values
//! # let id = 123;
//! info!(user_id = %id, "User login"); // One signature per unique user_id
//!
//! // High cardinality (danger) - unique signature per event
//! # let uuid = "abc";
//! info!(request_id = %uuid, "Processing"); // New signature every time!
//! ```
//!
//! **Cardinality examples:**
//!
//! | Pattern | Unique Signatures | Memory Impact |
//! |---------|-------------------|---------------|
//! | Static messages only | ~10-100 | Minimal (<25 KB) |
//! | Messages + stable IDs (user, tenant) | ~1,000-10,000 | Low (~0.15-2.5 MB) |
//! | Messages + session IDs | ~10,000-100,000 | Medium (~1.5-25 MB) |
//! | Messages + request UUIDs | Unbounded | **High risk** |
//!
//! **How to estimate your cardinality:**
//!
//! 1. **Count unique log templates** in your codebase
//! 2. **Multiply by field cardinality** (unique values per field)
//! 3. **Example calculation:**
//!    - 50 unique log messages, each emitted at one fixed level
//!    - Average 20 unique user IDs per message
//!    - **Estimated: 50 × 20 = 1,000 signatures** (✓ well below default)
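//!
//! The estimate above is a plain product of template count and per-field cardinalities.
//! A small hypothetical helper (not part of the crate) makes the arithmetic explicit
//! and extends it to several fields:
//!
//! ```rust
//! // Multiplies the number of log templates by the number of distinct values of
//! // each field that feeds into the signature. Saturates instead of overflowing.
//! fn estimate_signatures(templates: u64, field_cardinalities: &[u64]) -> u64 {
//!     field_cardinalities
//!         .iter()
//!         .fold(templates, |acc, &values| acc.saturating_mul(values))
//! }
//! ```
//!
//! For 50 templates and one field with 20 distinct values this returns 1,000. Adding a
//! second field with 100 distinct values raises the estimate to 100,000, which shows how
//! quickly extra fields multiply cardinality.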
//!
//! ### Configuration Guidelines
//!
//! **When to use the default (10k signatures):**
//! - ✅ Most applications with structured logging
//! - ✅ Log messages use stable identifiers (user_id, tenant_id, service_name)
//! - ✅ You're unsure about cardinality
//! - ✅ Memory is not severely constrained
//!
//! **When to increase the limit:**
//!
//! ```rust,no_run
//! # use tracing_throttle::TracingRateLimitLayer;
//! let rate_limit = TracingRateLimitLayer::builder()
//!     .with_max_signatures(50_000) // ~7.5-12.5 MB at 150-250 bytes per signature
//!     .build()
//!     .expect("valid config");
//! ```
//!
//! - ✅ High log volume with many unique event types (>10k)
//! - ✅ Large distributed system with many services/endpoints
//! - ✅ You've measured cardinality and need more capacity
//! - ✅ Memory is available (10+ MB is acceptable)
//!
//! **When to use unlimited signatures:**
//!
//! ```rust,no_run
//! # use tracing_throttle::TracingRateLimitLayer;
//! let rate_limit = TracingRateLimitLayer::builder()
//!     .with_unlimited_signatures() // ⚠️ Unbounded memory growth
//!     .build()
//!     .expect("valid config");
//! ```
//!
//! - ⚠️ **Use with extreme caution** - can cause unbounded memory growth
//! - ✅ Controlled environments (short-lived processes, tests)
//! - ✅ Known bounded cardinality with monitoring in place
//! - ✅ Memory constraints are not a concern
//! - ❌ **Never use** if logging includes UUIDs, timestamps, or other high-cardinality data
//!
//! ### Monitoring Memory Usage
//!
//! **Check signature count in production:**
//!
//! ```rust,no_run
//! # use tracing_throttle::TracingRateLimitLayer;
//! # use tracing::warn;
//! # let rate_limit = TracingRateLimitLayer::new();
//! // In a periodic health check or metrics reporter:
//! let sig_count = rate_limit.signature_count();
//! let evictions = rate_limit.metrics().signatures_evicted();
//!
//! // Thresholds below assume the default limit of 10,000 signatures.
//! if sig_count > 8000 {
//!     warn!("Approaching signature limit: {}/10000", sig_count);
//! }
//!
//! if evictions > 1000 {
//!     warn!("High eviction count: {} signatures evicted", evictions);
//! }
//! ```
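//!
//! The symptom table below states its eviction threshold as a rate (>1000/min), while
//! `signatures_evicted()` appears to report a running total. Assuming it is cumulative,
//! a per-minute rate can be derived from two readings taken some time apart. The helper
//! is hypothetical and not part of the crate:
//!
//! ```rust
//! // Converts two readings of a cumulative eviction counter into evictions per
//! // minute. Returns 0.0 for a non-positive interval; a counter reset (current
//! // below previous) is treated as zero new evictions.
//! fn evictions_per_minute(previous: u64, current: u64, elapsed_secs: f64) -> f64 {
//!     if elapsed_secs <= 0.0 {
//!         return 0.0;
//!     }
//!     let delta = current.saturating_sub(previous) as f64;
//!     delta * 60.0 / elapsed_secs
//! }
//! ```
//!
//! For example, readings of 1,000 and 1,500 taken 30 seconds apart correspond to 1,000
//! evictions per minute.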
//!
//! **Integrate with memory profilers:**
//!
//! ```bash
//! # Use Valgrind Massif for heap profiling
//! valgrind --tool=massif --massif-out-file=massif.out ./your-app
//!
//! # Analyze with ms_print
//! ms_print massif.out
//!
//! # Look for DashMap and EventState allocations
//! ```
//!
//! **Signs you need to adjust signature limits:**
//!
//! | Symptom | Likely Cause | Action |
//! |---------|--------------|--------|
//! | High eviction rate (>1000/min) | Cardinality > limit | Increase `max_signatures` |
//! | Memory growth over time | Unbounded cardinality | Fix logging (remove UUIDs), add limit |
//! | Low signature count (<100) | Over-provisioned | Can reduce limit safely |
//! | Frequent evictions + suppression | Limit too low | Increase limit or reduce cardinality |

// Domain layer - pure business logic
pub mod domain;

// Application layer - orchestration
pub mod application;

// Infrastructure layer - external adapters
pub mod infrastructure;

// Re-export commonly used types for convenience
pub use domain::{
    policy::{
        CountBasedPolicy, ExponentialBackoffPolicy, Policy, PolicyDecision, PolicyError,
        RateLimitPolicy, TimeWindowPolicy, TokenBucketPolicy,
    },
    signature::EventSignature,
    summary::{SuppressionCounter, SuppressionSummary},
};

pub use application::{
    circuit_breaker::{CircuitBreaker, CircuitBreakerConfig, CircuitState},
    emitter::EmitterConfigError,
    limiter::RateLimiter,
    metrics::{Metrics, MetricsSnapshot},
    ports::{Clock, Storage},
    registry::SuppressionRegistry,
};

#[cfg(feature = "async")]
pub use application::emitter::{EmitterHandle, ShutdownError};

pub use infrastructure::{
    clock::SystemClock,
    layer::{BuildError, TracingRateLimitLayer, TracingRateLimitLayerBuilder},
    storage::ShardedStorage,
};

#[cfg(feature = "async")]
pub use infrastructure::layer::SummaryFormatter;