Crate tracing_throttle

§tracing-throttle

High-performance log deduplication and rate limiting for the tracing ecosystem.

This crate provides a tracing::Layer that suppresses repetitive log events based on configurable policies. Events are deduplicated by their signature (level, target, message, and ALL field values by default). Use .with_excluded_fields() to exclude high-cardinality fields like request_id or trace_id.

§Best Practices

For detailed guidance on using tracing-throttle effectively, including:

  • How event signatures work and why message templates matter
  • Choosing the right policy for your use case
  • Per-entity throttling patterns (per-user, per-endpoint, etc.)
  • Memory management strategies for high-cardinality scenarios
  • Common anti-patterns to avoid

See the Best Practices Guide for a comprehensive treatment of these topics, with examples.

§Quick Start

use tracing_throttle::{TracingRateLimitLayer, Policy};
use tracing_subscriber::prelude::*;
use std::time::Duration;

// Use sensible defaults: 50 burst capacity, 1 token/sec (60/min), 10k signature limit
let rate_limit = TracingRateLimitLayer::new();

// Or customize for high-volume applications:
let rate_limit = TracingRateLimitLayer::builder()
    .with_policy(Policy::token_bucket(100.0, 10.0).unwrap())  // 100 burst, 600/min
    .with_max_signatures(50_000)  // Custom limit
    .with_excluded_fields(vec!["request_id".to_string(), "trace_id".to_string()])
    .with_summary_interval(Duration::from_secs(30))
    .build()
    .unwrap();

// Apply the rate limit as a filter to your fmt layer
tracing_subscriber::registry()
    .with(tracing_subscriber::fmt::layer().with_filter(rate_limit))
    .init();

§Features

§Rate Limiting Policies

  • Token bucket limiting: Burst tolerance with smooth recovery (recommended default)
  • Time-window limiting: Allow K events per time period with natural reset
  • Count-based limiting: Allow N events, then suppress the rest (no recovery)
  • Exponential backoff: Emit at exponentially increasing intervals (1st, 2nd, 4th, 8th…)
  • Custom policies: Implement your own rate limiting logic
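
The token-bucket semantics can be sketched as a standalone model (illustrative only, not the crate's implementation); the numbers mirror the `Policy::token_bucket(100.0, 10.0)` example from Quick Start:

```rust
// A bucket holds up to `capacity` tokens; each allowed event spends one,
// and tokens refill continuously at `refill_per_sec`.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
}

impl TokenBucket {
    fn new(capacity: f64, refill_per_sec: f64) -> Self {
        Self { capacity, tokens: capacity, refill_per_sec }
    }

    /// Advance time by `secs`, refilling tokens up to capacity.
    fn tick(&mut self, secs: f64) {
        self.tokens = (self.tokens + secs * self.refill_per_sec).min(self.capacity);
    }

    /// Try to emit one event: allowed while at least one token is available.
    fn try_emit(&mut self) -> bool {
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    // Mirrors Policy::token_bucket(100.0, 10.0): 100-event burst, 10 tokens/sec.
    let mut bucket = TokenBucket::new(100.0, 10.0);
    let burst = (0..150).filter(|_| bucket.try_emit()).count();
    println!("allowed during burst: {burst}"); // 100 pass, the remaining 50 are suppressed
    bucket.tick(1.0); // one second later, 10 more tokens are available
    println!("allowed after 1s: {}", (0..20).filter(|_| bucket.try_emit()).count());
}
```

This is why token bucket is the recommended default: a burst is absorbed up to capacity, and allowance recovers smoothly instead of resetting at window boundaries.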

§Eviction Strategies

  • LRU eviction: Evict least recently used signatures (default)
  • Priority-based: Custom priority functions to keep important events (ERROR over INFO)
  • Memory-based: Enforce byte limits with automatic memory tracking
  • Combined: Use both priority and memory constraints together
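
The priority idea reduces to "evict the lowest-scoring signature first". A minimal sketch of victim selection (the names and the level-to-score mapping are illustrative, not the crate's internals):

```rust
use std::collections::HashMap;

// Higher scores survive longer; the lowest-priority entry is evicted first.
fn priority(level: &str) -> u32 {
    match level {
        "ERROR" => 100,
        "WARN" => 50,
        "INFO" => 10,
        _ => 5,
    }
}

// Pick the eviction victim: the tracked signature with the lowest priority.
fn pick_victim<'a>(entries: &'a HashMap<&'a str, &'a str>) -> Option<&'a str> {
    entries
        .iter()
        .min_by_key(|(_, level)| priority(level))
        .map(|(sig, _)| *sig)
}

fn main() {
    let mut entries = HashMap::new();
    entries.insert("db timeout", "ERROR");
    entries.insert("cache miss", "DEBUG");
    entries.insert("slow query", "WARN");
    // The DEBUG signature goes first; the ERROR signature is kept the longest.
    println!("evict: {:?}", pick_victim(&entries)); // evict: Some("cache miss")
}
```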

§Other Features

  • Per-signature throttling: Different messages are throttled independently
  • Observability metrics: Built-in tracking of allowed, suppressed, and evicted events
  • Fail-safe circuit breaker: Fails open during errors to preserve observability

§Event Signatures

Events are deduplicated based on their signature. By default, signatures include:

  • Event level (INFO, WARN, ERROR, etc.)
  • Target (module path)
  • Message text
  • ALL event field values

Event field values ARE included by default. This ensures that semantically different events are not accidentally deduplicated:

error!(user_id = 123, "Failed to fetch user");  // Signature: (ERROR, target, "Failed to fetch user", user_id=123)
error!(user_id = 456, "Failed to fetch user");  // DIFFERENT signature - both logged!

These are different failures for different users and should both be logged.
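
The deduplication behavior above can be modeled by hashing the signature components; this is an illustrative sketch (function names and hashing details are assumptions, not the crate's internals):

```rust
use std::collections::BTreeMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Derive a signature from level, target, message, and all field values.
// BTreeMap keeps fields in a stable order so hashing is deterministic.
fn signature(level: &str, target: &str, message: &str, fields: &BTreeMap<&str, String>) -> u64 {
    let mut h = DefaultHasher::new();
    (level, target, message).hash(&mut h);
    for (k, v) in fields {
        (k, v).hash(&mut h);
    }
    h.finish()
}

fn main() {
    let a = BTreeMap::from([("user_id", "123".to_string())]);
    let b = BTreeMap::from([("user_id", "456".to_string())]);
    let sig_a = signature("ERROR", "myapp", "Failed to fetch user", &a);
    let sig_b = signature("ERROR", "myapp", "Failed to fetch user", &b);
    // Different user_id => different signature => both events are logged.
    assert_ne!(sig_a, sig_b);
}
```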

§Excluding High-Cardinality Fields

To prevent memory issues from high-cardinality fields that don’t change the event’s meaning (like request_id, trace_id, timestamp), use .with_excluded_fields():

let layer = TracingRateLimitLayer::builder()
    .with_excluded_fields(vec!["request_id".to_string(), "trace_id".to_string()])
    .build()
    .unwrap();

Now events with the same meaningful fields but different request IDs are deduplicated:

error!(user_id = 123, request_id = "abc", "Failed to fetch user");  // Logged
error!(user_id = 123, request_id = "def", "Failed to fetch user");  // Throttled (same signature once request_id is ignored)

See tests/event_fields.rs for complete examples.
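
Conceptually, exclusion just filters field names out before the signature is computed. An illustrative sketch (not the crate's internals) of why the two events above collapse to one signature:

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Drop excluded field names before hashing, so events differing only in
// request_id produce the same signature.
fn signature(msg: &str, fields: &BTreeMap<&str, &str>, excluded: &BTreeSet<&str>) -> u64 {
    let mut h = DefaultHasher::new();
    msg.hash(&mut h);
    for (k, v) in fields.iter().filter(|(k, _)| !excluded.contains(*k)) {
        (k, v).hash(&mut h);
    }
    h.finish()
}

fn main() {
    let excluded = BTreeSet::from(["request_id"]);
    let a = BTreeMap::from([("user_id", "123"), ("request_id", "abc")]);
    let b = BTreeMap::from([("user_id", "123"), ("request_id", "def")]);
    // Same signature once request_id is ignored: the second event is throttled.
    assert_eq!(
        signature("Failed to fetch user", &a, &excluded),
        signature("Failed to fetch user", &b, &excluded),
    );
}
```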

§Exempting Critical Events

Some events should never be throttled, such as security alerts, audit logs, or compliance events. Use .with_exempt_targets() to bypass rate limiting for specific targets:

let layer = TracingRateLimitLayer::builder()
    .with_exempt_targets(vec![
        "myapp::security".to_string(),
        "myapp::audit".to_string(),
    ])
    .build()
    .unwrap();

Events from exempt targets always pass through:

// These are never throttled (exempt target)
error!(target: "myapp::security", "Security breach detected");
info!(target: "myapp::audit", user = "alice", action = "login", "Audit log");

// These get throttled normally
info!("Regular application log");

Exempt events still count toward metrics (recorded as “allowed”), ensuring visibility into total event volume.

See tests/exempt_targets.rs for complete examples.

§Observability

Monitor rate limiting behavior with built-in metrics:

// Get current metrics
let metrics = rate_limit.metrics();
println!("Events allowed: {}", metrics.events_allowed());
println!("Events suppressed: {}", metrics.events_suppressed());
println!("Signatures evicted: {}", metrics.signatures_evicted());

// Get snapshot for calculations
let snapshot = metrics.snapshot();
println!("Suppression rate: {:.2}%", snapshot.suppression_rate() * 100.0);

§Eviction Strategies

Control which event signatures are kept when storage limits are reached:

§LRU (Default)

let layer = TracingRateLimitLayer::builder()
    .with_max_signatures(10_000)  // Uses LRU eviction by default
    .build()
    .unwrap();

§Priority-Based

Keep important events (ERROR) over less important ones (INFO):

use std::sync::Arc;
use tracing_throttle::{EvictionStrategy, TracingRateLimitLayer};

let layer = TracingRateLimitLayer::builder()
    .with_max_signatures(5_000)
    .with_eviction_strategy(EvictionStrategy::Priority {
        max_entries: 5_000,
        priority_fn: Arc::new(|_sig, state| {
            match state.metadata.as_ref().map(|m| m.level.as_str()) {
                Some("ERROR") => 100,
                Some("WARN") => 50,
                Some("INFO") => 10,
                _ => 5,
            }
        }),
    })
    .build()
    .unwrap();

§Memory-Based

Enforce memory limits with automatic tracking:

use tracing_throttle::{EvictionStrategy, TracingRateLimitLayer};

let layer = TracingRateLimitLayer::builder()
    .with_eviction_strategy(EvictionStrategy::Memory {
        max_bytes: 5 * 1024 * 1024,  // 5MB limit
    })
    .build()
    .unwrap();

§Combined

Use both priority and memory constraints:

use std::sync::Arc;
use tracing_throttle::{EvictionStrategy, TracingRateLimitLayer};

let layer = TracingRateLimitLayer::builder()
    .with_eviction_strategy(EvictionStrategy::PriorityWithMemory {
        max_entries: 10_000,
        priority_fn: Arc::new(|_sig, state| {
            match state.metadata.as_ref().map(|m| m.level.as_str()) {
                Some("ERROR") => 100,
                _ => 10,
            }
        }),
        max_bytes: 10 * 1024 * 1024,
    })
    .build()
    .unwrap();

See examples/eviction.rs for complete working examples.

§Fail-Safe Operation

The library uses a circuit breaker to fail open during errors, preserving observability over strict rate limiting:

use tracing_throttle::CircuitState;

// Check circuit breaker state
let cb = rate_limit.circuit_breaker();
match cb.state() {
    CircuitState::Closed => println!("Normal operation"),
    CircuitState::Open => println!("Failing open - allowing all events"),
    CircuitState::HalfOpen => println!("Testing recovery"),
}
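
The fail-open behavior can be sketched as a tiny state machine (illustrative only; the HalfOpen recovery state is omitted for brevity, and none of these names are the crate's internals):

```rust
#[derive(Debug, PartialEq)]
enum State {
    Closed,
    Open,
}

// After `threshold` consecutive internal errors the breaker opens and every
// event passes through untouched: observability wins over strict limiting.
struct Breaker {
    failures: u32,
    threshold: u32,
    state: State,
}

impl Breaker {
    fn new(threshold: u32) -> Self {
        Self { failures: 0, threshold, state: State::Closed }
    }

    fn record_failure(&mut self) {
        self.failures += 1;
        if self.failures >= self.threshold {
            self.state = State::Open; // fail open: stop throttling entirely
        }
    }

    /// When open, always allow; otherwise defer to the rate-limit decision.
    fn allow(&self, limiter_says_allow: bool) -> bool {
        self.state == State::Open || limiter_says_allow
    }
}

fn main() {
    let mut cb = Breaker::new(3);
    assert!(!cb.allow(false)); // closed: the limiter's "suppress" stands
    for _ in 0..3 {
        cb.record_failure();
    }
    assert_eq!(cb.state, State::Open);
    assert!(cb.allow(false)); // open: events pass even if the limiter says no
}
```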

§Memory Management

By default, the layer tracks up to 10,000 unique event signatures with LRU eviction. Each signature uses approximately 200-400 bytes (including event metadata retained for summaries).

Typical memory usage:

  • 10,000 signatures (default): ~2-4 MB
  • 50,000 signatures: ~10-20 MB
  • 100,000 signatures: ~20-40 MB

Configuration:

// Increase limit for high-cardinality applications
let rate_limit = TracingRateLimitLayer::builder()
    .with_max_signatures(50_000)
    .build()
    .unwrap();

// Monitor usage
let sig_count = rate_limit.signature_count();
let evictions = rate_limit.metrics().signatures_evicted();

§Memory Usage Breakdown

Each tracked signature consumes memory for:

Per-Signature Memory:
├─ EventSignature (hash key)      ~32 bytes  (u64 hash)
├─ EventState (value)              ~170-370 bytes
│  ├─ Policy state                 ~40-80 bytes (depends on policy type)
│  ├─ SuppressionCounter           ~40 bytes (atomic counters + timestamp)
│  ├─ EventMetadata (Optional)     ~50-200 bytes (level, message, target, fields)
│  │  ├─ Level string              ~8 bytes
│  │  ├─ Message string            ~20-100 bytes (depends on message length)
│  │  ├─ Target string             ~20-50 bytes (module path)
│  │  └─ Fields (BTreeMap)         ~0-50 bytes (depends on field count)
│  └─ Metadata overhead            ~40 bytes (DashMap internals)
└─ Total per signature             ~200-400 bytes (varies with policy & message length)

Estimated memory usage at different signature limits:

Signatures         Memory (typical)   Memory (worst case)   Use Case
1,000              ~200 KB            ~400 KB               Small apps, few event types
10,000 (default)   ~2 MB              ~4 MB                 Most applications
50,000             ~10 MB             ~20 MB                High-cardinality apps
100,000            ~20 MB             ~40 MB                Very large systems

Additional overhead:

  • Metrics: ~100 bytes (atomic counters)
  • Circuit breaker: ~200 bytes (state tracking)
  • Layer structure: ~500 bytes
  • Total fixed overhead: ~800 bytes

§Signature Cardinality Analysis

What affects signature cardinality?

By default, signatures include (level, target, message, ALL field values). This means each unique combination of field values creates a new signature.

// Low cardinality (good) - no fields, same signature every time
info!("User login successful");  // Always same signature

// Medium cardinality - one signature per unique user_id value
info!(user_id = 123, "User login");  // Signature includes user_id=123
info!(user_id = 456, "User login");  // DIFFERENT signature (user_id=456)

// High cardinality (danger) - new signature for every request
info!(request_id = %uuid, "Processing");  // New signature every time!
// Solution: Exclude high-cardinality fields
// .with_excluded_fields(vec!["request_id".to_string()])

Cardinality examples:

Pattern                                       Config          Unique Signatures   Memory Impact
Static messages only                          Default         ~10-100             Minimal (~10 KB)
Messages with stable IDs                      Default         ~1,000-10,000       Low (1-2 MB)
Per-user + per-endpoint                       Default         ~100,000+           Medium (10-25 MB)
With request_id field                         Default         Unbounded           High risk
With .with_excluded_fields(["request_id"])    Exclude UUIDs   ~1,000-10,000       Low (1-2 MB)

How to estimate your cardinality:

  1. Count unique log templates in your codebase
  2. Multiply by field cardinality (unique values per field)
  3. Example calculation:
    • 50 unique log messages
    • Each call site logs at a fixed level, so levels add no extra combinations
    • An average of 20 unique user IDs per message
    • Estimated: 50 × 20 = 1,000 signatures (✓ well below the 10,000 default)
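
The estimation above is just multiplication across independent dimensions; a back-of-the-envelope helper (the function name is illustrative, not part of the crate):

```rust
// Estimate signature cardinality: unique log templates times the number of
// unique values of each included high-cardinality field.
fn estimate_signatures(templates: usize, values_per_field: &[usize]) -> usize {
    values_per_field.iter().product::<usize>() * templates
}

fn main() {
    // 50 templates, ~20 unique user IDs per message (the worked example above).
    let estimate = estimate_signatures(50, &[20]);
    println!("estimated signatures: {estimate}"); // 1000: well below the 10k default
    assert!(estimate <= 10_000);
}
```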

§Configuration Guidelines

When to use the default (10k signatures):

  • ✅ Most applications with structured logging
  • ✅ Log messages use stable identifiers (user_id, tenant_id, service_name)
  • ✅ You’re unsure about cardinality
  • ✅ Memory is not severely constrained

When to increase the limit:

let rate_limit = TracingRateLimitLayer::builder()
    .with_max_signatures(50_000)  // 5-10 MB overhead
    .build()
    .expect("valid config");

  • ✅ High log volume with many unique event types (>10k)
  • ✅ Large distributed system with many services/endpoints
  • ✅ You’ve measured cardinality and need more capacity
  • ✅ Memory is available (10+ MB is acceptable)

When to use unlimited signatures:

let rate_limit = TracingRateLimitLayer::builder()
    .with_unlimited_signatures()  // ⚠️ Unbounded memory growth
    .build()
    .expect("valid config");

  • ⚠️ Use with extreme caution - can cause unbounded memory growth
  • ✅ Controlled environments (short-lived processes, tests)
  • ✅ Known bounded cardinality with monitoring in place
  • ✅ Memory constraints are not a concern
  • ❌ Never use if logging includes UUIDs, timestamps, or other high-cardinality data

§Monitoring Memory Usage

Check signature count in production:

// In a periodic health check or metrics reporter:
let sig_count = rate_limit.signature_count();
let evictions = rate_limit.metrics().signatures_evicted();

if sig_count > 8000 {
    warn!("Approaching signature limit: {}/10000", sig_count);
}

if evictions > 1000 {
    warn!("High eviction rate: {} signatures evicted", evictions);
}

Integrate with memory profilers:

# Use Valgrind Massif for heap profiling
valgrind --tool=massif --massif-out-file=massif.out ./your-app

# Analyze with ms_print
ms_print massif.out

# Look for DashMap and EventState allocations

Signs you need to adjust signature limits:

Symptom                            Likely Cause            Action
High eviction rate (>1000/min)     Cardinality > limit     Increase max_signatures
Memory growth over time            Unbounded cardinality   Fix logging (remove UUIDs), add limit
Low signature count (<100)         Over-provisioned        Can reduce limit safely
Frequent evictions + suppression   Limit too low           Increase limit or reduce cardinality

Re-exports§

pub use domain::policy::CountBasedPolicy;
pub use domain::policy::ExponentialBackoffPolicy;
pub use domain::policy::Policy;
pub use domain::policy::PolicyDecision;
pub use domain::policy::PolicyError;
pub use domain::policy::RateLimitPolicy;
pub use domain::policy::TimeWindowPolicy;
pub use domain::policy::TokenBucketPolicy;
pub use domain::signature::EventSignature;
pub use domain::summary::SuppressionCounter;
pub use domain::summary::SuppressionSummary;
pub use application::circuit_breaker::CircuitBreaker;
pub use application::circuit_breaker::CircuitBreakerConfig;
pub use application::circuit_breaker::CircuitState;
pub use application::emitter::EmitterConfigError;
pub use application::limiter::RateLimiter;
pub use application::metrics::Metrics;
pub use application::metrics::MetricsSnapshot;
pub use application::ports::Clock;
pub use application::ports::EvictionCandidate;
pub use application::ports::EvictionPolicy;
pub use application::ports::Storage;
pub use application::registry::SuppressionRegistry;
pub use application::emitter::EmitterHandle;
pub use application::emitter::ShutdownError;
pub use infrastructure::clock::SystemClock;
pub use infrastructure::eviction::LruEviction;
pub use infrastructure::eviction::MemoryEviction;
pub use infrastructure::eviction::PriorityEviction;
pub use infrastructure::eviction::PriorityFn;
pub use infrastructure::eviction::PriorityWithMemoryEviction;
pub use infrastructure::layer::BuildError;
pub use infrastructure::layer::EvictionStrategy;
pub use infrastructure::layer::TracingRateLimitLayer;
pub use infrastructure::layer::TracingRateLimitLayerBuilder;
pub use infrastructure::storage::ShardedStorage;
pub use infrastructure::layer::SummaryFormatter;

Modules§

application
Application layer - orchestration of domain logic.
domain
Domain layer - pure business logic with no external dependencies.
infrastructure
Infrastructure layer - external adapters and integrations.