Expand description
§CL-TDS
Cache-Locked Temporal Decay Sketch — a fixed-memory, lock-free streaming algorithm for detecting heavy hitters in real-time data streams.
CL-TDS answers one question: “What items are appearing most frequently in this stream, right now?” — using only 1 MB of memory, with old data fading automatically.
§Getting Started
Add to your Cargo.toml:
[dependencies]
cl-tds = { git = "https://github.com/ddsha441981/cl-tds" }Basic usage:
use cl_tds::ClTds;
let sketch = ClTds::new();
// Record items (accepts any u64 hash)
sketch.increment(0xDEAD_BEEF);
sketch.increment(0xDEAD_BEEF);
sketch.increment(0xCAFE_BABE);
// Query frequency estimates
assert!(sketch.query(0xDEAD_BEEF) >= 2);
assert!(sketch.query(0xCAFE_BABE) >= 1);§How To Use
CL-TDS is a library. You integrate it into your application and feed it your data. The algorithm does not care where data comes from — network packets, log files, database queries, Kafka streams, or anything else.
§Step 1: Hash your data into u64
CL-TDS operates on u64 identifiers. Hash your domain objects:
use std::hash::{Hash, Hasher};
use std::collections::hash_map::DefaultHasher;
fn to_id<T: Hash>(item: &T) -> u64 {
let mut h = DefaultHasher::new();
item.hash(&mut h);
h.finish()
}
// Works with any type:
// to_id(&"192.168.1.1") → for IP addresses
// to_id(&"POST /api/pay") → for API endpoints
// to_id(&"#trending") → for hashtags
// to_id(&12345u64) → for user IDs§Step 2: Choose your mode
use cl_tds::ClTds;
// Manual mode — you control when data decays (good for batch processing)
let sketch = ClTds::new();
sketch.increment(42);
sketch.tick_epoch(); // data halves each tick
// Auto mode — data decays based on wall clock (good for real-time)
let sketch = ClTds::with_epoch_interval(1000); // decay every 1 second
sketch.increment(42);
// count automatically halves every second — no manual ticking needed§Step 3: Feed your stream and query
use cl_tds::ClTds;
let sketch = ClTds::with_epoch_interval(5000); // decay every 5s
// Your data loop (network, logs, events — anything)
for packet in packets {
sketch.increment(packet);
}
// Check if something is a heavy hitter
let threshold = 1000;
if sketch.query(suspect) > threshold {
// alert! this item appeared too frequently
}§Real-World Integration Patterns
All patterns use the same core:
sketch.increment(hash) to record, sketch.query(hash) to check.
Only the data source and threshold change per domain.
§Network Security — DDoS Detection
let sketch = ClTds::with_epoch_interval(1000); // 1s window
// Feed source IPs from packet capture
sketch.increment(hash_str("1.2.3.4"));
if sketch.query(hash_str("1.2.3.4")) > 10_000 { /* DDoS alert */ }§API Rate Limiting
let limiter = ClTds::with_epoch_interval(60_000); // 1-min window
// Feed user IDs from request handler
limiter.increment(user_id);
if limiter.query(user_id) > 100 { /* throttle: >100 req/min */ }§Log Analysis — Error Surge Detection
let monitor = ClTds::with_epoch_interval(10_000); // 10s window
// Feed endpoint+status from log pipeline
monitor.increment(hash_str("POST /api/checkout 500"));
if monitor.query(hash_str("POST /api/checkout 500")) > 50 { /* alert */ }§Gaming — Anti-Cheat
let detector = ClTds::with_epoch_interval(5_000); // 5s window
// Feed player action events
detector.increment(player_id);
if detector.query(player_id) > 500 { /* abnormal speed — possible cheat */ }§Telecom — SIM-Box Fraud
let cdr = ClTds::with_epoch_interval(3600_000); // 1-hour window
// Feed caller numbers from CDR stream
cdr.increment(hash_str("+91-9999000001"));
if cdr.query(hash_str("+91-9999000001")) > 1_000 { /* fraud alert */ }§IoT — Faulty Sensor Detection
let sensors = ClTds::with_epoch_interval(60_000); // 1-min window
// Feed sensor alert events
sensors.increment(sensor_id);
if sensors.query(sensor_id) > 100 { /* sensor #42 firing too often */ }§Social Media — Trending Hashtags
let trends = ClTds::with_epoch_interval(30_000); // 30s window
// Feed hashtags from post stream
trends.increment(hash_str("#BreakingNews"));
if trends.query(hash_str("#BreakingNews")) > 5_000 { /* trending! */ }§Fintech — Card Fraud
let fraud = ClTds::with_epoch_interval(3600_000); // 1-hour window
// Feed card numbers from transaction stream
fraud.increment(hash_str("card-XXXX-4567"));
if fraud.query(hash_str("card-XXXX-4567")) > 50 { /* suspicious activity */ }Same pattern works for all 17 proven domains — DNS monitoring, CDN caching,
click fraud, spam detection, search query tracking, smart grid monitoring, and more.
See Applicable Domains section for the full list.
§Key Properties
| Property | Value |
|---|---|
| Memory | Fixed 1 MB (4 rows × 65536 cols × 4 bytes) |
| Operations | increment() O(1), query() O(1) |
| Thread safety | Lock-free atomic CAS — safe from any number of threads |
| Decay | Lazy — O(1) per touch, no background thread |
| Dependencies | Zero — pure std::sync::atomic |
| Error bound (ε) | ≈ 0.0000414 (overcount per stream item) |
| Failure probability (δ) | ≤ 1.8% (4 independent rows) |
| Persistence | ClTds::to_bytes / ClTds::from_bytes |
§Multi-Instance Scaling
Each sketch uses exactly 1 MB. A typical CPU L3 cache is 6–12 MB, so you can run 5 independent sketches simultaneously — each monitoring a different stream — and all fit in L3 cache without evicting each other.
use cl_tds::ClTds;
use std::sync::Arc;
use std::thread;
// 5 sketches × 1 MB = 5 MB — fits in any modern L3 cache
let sketches: Vec<Arc<ClTds>> = (0..5)
.map(|_| Arc::new(ClTds::with_epoch_interval(1000)))
.collect();
// Each thread monitors a different domain — zero contention
// thread::spawn(move || {
// // Thread 0: Network DDoS monitoring
// // Thread 1: API rate limiting
// // Thread 2: Log error tracking
// // Thread 3: Click fraud detection
// // Thread 4: DNS query monitoring
// sketch.increment(i as u64);
// });Benchmarked: 5 threads running concurrently, each processing its own domain (Network, Crypto, Analytics, Gaming, IP Flood) — all maintaining full throughput with zero speed loss.
§Bring Your Own Data
CL-TDS accepts any u64 hash, so you can feed it data from any source —
CSV files, JSON logs, database streams, Kafka topics, or raw sockets.
Read from a file:
use cl_tds::ClTds;
use std::io::{BufRead, BufReader};
use std::fs::File;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
fn hash_line(s: &str) -> u64 {
let mut h = DefaultHasher::new();
s.hash(&mut h);
h.finish()
}
let sketch = ClTds::with_epoch_interval(1000);
let file = File::open("traffic.csv").unwrap();
for line in BufReader::new(file).lines() {
if let Ok(item) = line {
sketch.increment(hash_line(&item));
}
}
// Now query any item's frequency
println!("Frequency: {}", sketch.query(hash_line("suspicious_item")));§Applicable Domains
CL-TDS works for any domain that needs: heavy hitter detection in a continuous stream with bounded memory and temporal decay.
17 domains benchmarked and proven: Network security (DDoS), financial trading (HFT), web analytics, gaming anti-cheat, DNS monitoring, API rate limiting, click fraud, CDN caching, telecom fraud, IoT sensors, log analysis, spam detection, social trending, search tracking, smart grid, fintech fraud, and IP flood detection.
Not suitable for: Exact counting (ledgers/voting), listing unique items,
item deletion (GDPR), small datasets (< 100K — use HashMap),
relationship or sequence detection.
§CL-TDS vs HashMap
| Unique Items | HashMap | CL-TDS | Savings |
|---|---|---|---|
| 100K | 1.9 MB | 1 MB | 2x |
| 1M | 29.8 MB | 1 MB | 30x |
| 5M | 119 MB | 1 MB | 119x |
| 10M | 238 MB | 1 MB | 238x |
CL-TDS is also 1.8x faster on insert and 4.8x faster on query because it fits entirely in L3 cache. Plus HashMap can’t do temporal decay — stale data stays forever.
§Testing
cargo test # 40 unit tests + 14 doc tests
cargo run --example basic # Insert/query demo
cargo run --example decay # Temporal decay visualization
cargo run --example multithread # Concurrent stress test
cargo run --example persistence # Save/restore to diskStructs§
- ClTds
- The main sketch data structure. Uses 1 MB of memory to track item frequencies in a stream, with old data fading out automatically.
- Row
- One row of the sketch matrix (65536 atomic counters, cache-line aligned).
Constants§
- DEPTH
- Matrix depth — number of independent hash rows. δ = e^{-DEPTH} ≈ 1.8% with DEPTH=4.
- MAX_
COUNT - Maximum counter value before saturation.
- WIDTH
- Matrix width — columns per row. 2^16 = 65536.
Functions§
- decay_
steps - Calculates how many epochs have passed since a cell was last touched. Handles 8-bit timestamp wraparound. Result is capped at 24 because shifting a 24-bit counter more than 24 times always gives zero.
- pack
- Packs a timestamp (upper 8 bits) and a counter (lower 24 bits) into one
u32cell. - unpack
- Extracts the timestamp and counter from a packed
u32cell.