kelora 1.1.0

A command-line log analysis tool with embedded Rhai scripting
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
# Missing Feature Ideas for Kelora

This document contains creative, useful features that fit Kelora's design principles and architecture but haven't been implemented yet. All suggestions are:

- ✅ Simple to implement (30-200 lines each)
- ✅ Fit the streaming pipeline architecture
- ✅ Actually missing (not currently possible without verbose workarounds)
- ✅ Broadly useful for common log analysis patterns
- ✅ Follow Kelora's existing conventions

## Datetime & Time-Based Features

### 1. Time Bucketing/Rounding for Timestamps (`dt.round_to()`)

**What:** Round timestamps to nearest time interval for easy grouping and aggregation.

**Why useful:** When analyzing logs, you often want to group events by hour, minute, or 5-minute intervals. Currently you'd need manual arithmetic with timestamps. This would be a simple datetime method.

**Example:**
```rhai
// Round to nearest hour for grouping
e.ts = to_datetime(e.timestamp).round_to("1h")

// Round to 5-minute buckets for rate analysis
bucket_time = to_datetime(e.ts).round_to("5m")
track_count(bucket_time.to_string())
```

**Implementation:** Simple datetime manipulation using existing Duration types. ~50 lines in `datetime.rs`.

---

### 9. Burst/Session Detection (`track_burst()`)

**What:** Detect when events cluster within a time window - identify attack bursts, user sessions, batch processing windows.

**Why useful:** Common security/monitoring pattern. "Alert when >50 failed logins in 5 minutes" is currently complex. This makes it trivial.

**Example:**
```rhai
// Detect authentication attacks
burst_size = track_burst("failed_login_" + e.ip, "5m")
if burst_size > 50 {
    e.alert = "POSSIBLE_BRUTE_FORCE"
    e.burst_size = burst_size
}

// Identify batch processing windows
batch_events = track_burst("job_" + e.job_id, "10s")
e.batch_position = batch_events

// Auto-reset after idle period
user_session_size = track_burst("user_" + e.user_id, "30m")
```

**Implementation:** Time-windowed counter with expiration, ~100 lines in `tracking.rs`.

---

## Tracking & State Management

### 2. Event Deduplication (`track_seen()`)

**What:** Track whether a value has been seen before, perfect for deduplicating events based on request IDs, hashes, or combinations of fields.

**Why useful:** Log shippers often produce duplicates. Currently you'd need to manually maintain a hash set in the state map. A built-in function would make this trivial.

**Example:**
```rhai
// Skip duplicate request IDs
if track_seen("request_ids", e.request_id) {
    skip()
}

// Dedupe by content hash
content_hash = e.to_json().hash()
if !track_seen("events", content_hash) {
    e  // Only emit first occurrence
}
```

**Implementation:** Uses internal HashSet in tracking state. ~80 lines in `tracking.rs`.

---

### 6. First/Last Value Tracking (`track_first()` / `track_last()`)

**What:** Remember the first or last seen value for a key - useful for baseline comparisons, time range detection, and status changes.

**Why useful:** Common pattern in monitoring: "what was the initial value?" or "what's the latest status?" Currently requires manual state management.

**Example:**
```rhai
// Track when each service first appeared
first_seen = track_first("service_start_" + e.service, e.timestamp)
e.uptime_since = first_seen

// Compare current vs initial memory
baseline_mem = track_first("baseline_memory", e.memory_mb)
e.mem_growth = e.memory_mb - baseline_mem

// Track latest status per host
latest_status = track_last("host_" + e.hostname, e.status)
if latest_status == "degraded" && e.status == "healthy" {
    print("✅ " + e.hostname + " recovered")
}
```

**Implementation:** Simple key-value store in tracking state, ~40 lines.

---

### 8. Auto-Incrementing Sequence Numbers (`track_sequence()`)

**What:** Add monotonic sequence IDs to events - critical for detecting missing events, ordering guarantees, or adding synthetic keys.

**Why useful:** When correlating across systems or detecting gaps in event streams. Dead simple feature that's surprisingly useful.

**Example:**
```rhai
// Add sequence number to every event
e.seq = track_sequence("global")

// Per-service sequences
e.service_seq = track_sequence("seq_" + e.service_name)

// Detect gaps: if seq jumped by >1, events were lost
prev = track_last("prev_seq", e.seq)
if e.seq - prev > 1 {
    print("⚠️ Missing " + (e.seq - prev - 1) + " events!")
}
```

**Implementation:** Simple counter in tracking state, ~25 lines.

---

### 13. Event Correlation Tracker (`track_correlation()`)

**What:** Match paired events (request/response, start/end) and emit when both arrive.

**Why useful:** Distributed systems generate paired events. Correlating them reveals latency, completion rates, and failures. Currently requires complex state management.

**Example:**
```rhai
// Match HTTP requests with responses
if e.event_type == "request" {
    track_correlation("http_" + e.request_id, e, "request")
} else if e.event_type == "response" {
    pair = track_correlation("http_" + e.request_id, e, "response")
    if pair != () {
        // Emit combined event with both request and response
        #{
            request_id: e.request_id,
            latency: e.timestamp - pair.timestamp,
            status: e.status,
            endpoint: pair.endpoint
        }
    }
}
```

**Implementation:** State storage with timeout/cleanup, ~120 lines in `tracking.rs`.

---

### 15. Threshold-Based Heavy Hitters (`track_if_above()`)

**What:** Only track items that appear above a threshold - automatic filtering of rare values.

**Why useful:** When tracking top items, you don't want to waste memory on singletons. This auto-prunes low-frequency values.

**Example:**
```rhai
// Only track IPs with >10 requests (ignore one-offs)
track_if_above("frequent_ips", e.ip, 10)

// Track only popular endpoints
track_if_above("hot_paths", e.path, 100)

// Auto-filter noise, metrics shows only significant items
```

**Implementation:** Delayed insertion until threshold met, ~70 lines in `tracking.rs`.

---

## Statistical & Metrics Functions

### 3. Rate/Throughput Calculation (`track_rate()`)

**What:** Automatically calculate events per second/minute/hour for performance monitoring.

**Why useful:** Understanding throughput is critical for monitoring. Currently you'd need complex window logic or manual timestamp tracking. This would make it one-liner.

**Example:**
```rhai
// Track overall throughput
track_rate("total_eps")  // events per second

// Track per-endpoint rates
track_rate("endpoint_" + e.path, "1m")  // per minute

// At end, metrics shows: endpoint_/api/users: 145.2/min
```

**Implementation:** Tracks timestamps internally, calculates rolling rate. ~100 lines in `tracking.rs`.

---

### 5. Field Change Detection (`track_delta()`)

**What:** Track numeric changes between consecutive events - perfect for monitoring counters, detecting anomalies, or calculating derivatives.

**Why useful:** Many systems log cumulative counters (bytes processed, requests served). You often want the delta between readings. Currently requires manual state tracking.

**Example:**
```rhai
// Track request count increases
delta = track_delta("api_requests", e.total_requests)
if delta > 1000 {
    print("🔥 Spike detected: " + delta + " requests")
}

// Monitor memory growth
mem_delta = track_delta("memory_" + e.service, e.memory_mb)
if mem_delta > 500 {
    e.mem_increase = mem_delta
    e  // Flag events with big memory jumps
}
```

**Implementation:** Stores previous value in state, returns difference. ~60 lines in `tracking.rs`.

---

### 7. Streaming Approximate Percentiles (`track_percentiles()`)

**What:** Calculate percentiles on-the-fly without storing all values - essential for latency monitoring at scale.

**Why useful:** Current `percentile()` requires collecting all values in an array. For millions of events, this is memory-prohibitive. Streaming approximation using t-digest solves this.

**API Design:** `track_percentiles()` (plural) is the ONLY `track_*()` function that auto-suffixes metric names, because:
- You almost always want multiple percentiles (p50, p95, p99)
- They're parameterized by the percentile value
- Auto-suffixing prevents awkward repetition: `track_percentiles("api_latency_p95", e.dur, 95)`
- Plural name signals the special behavior

**Example:**
```rhai
// Track multiple percentiles - creates api_latency_p50, api_latency_p95, api_latency_p99
track_percentiles("api_latency", e.duration_ms, [50, 95, 99])

// Single percentile (use array with one element)
track_percentiles("api_latency", e.duration_ms, [95])
// → creates api_latency_p95

// Per-endpoint tracking
track_percentiles("latency_" + e.path, e.response_time, [95, 99])
// → creates latency_/api/users_p95, latency_/api/users_p99

// Metrics output:
//   api_latency_p50 = 123
//   api_latency_p95 = 245
//   api_latency_p99 = 890
```

**Implementation:**
- t-digest algorithm (1-2% accuracy, industry standard)
- Each percentile gets own t-digest: `api_latency_p95`, `api_latency_p99`
- Fully parallel-safe: each metric merges independently
- ~200-250 lines in `tracking.rs`

**Accuracy:** 1-2% relative error (e.g., true p95=250ms → reports 248-252ms). Good enough for operational monitoring, not for SLO compliance billing.

---

### 11. Moving Average for Smoothing (`track_moving_avg()`)

**What:** Calculate moving average over the last N values - smooth out spikes and see trends.

**Why useful:** Raw metrics are noisy. Moving averages reveal patterns. Great for dashboards and anomaly detection.

**Example:**
```rhai
// Smooth response times over last 100 requests
smooth_latency = track_moving_avg("api_latency", e.duration_ms, 100)
if e.duration_ms > smooth_latency * 3 {
    e.anomaly = "LATENCY_SPIKE"
}

// Track rolling error rate
errors = track_moving_avg("error_rate", e.is_error ? 1 : 0, 1000)
if errors > 0.05 {
    print("⚠️ Error rate above 5%: " + (errors * 100) + "%")
}
```

**Implementation:** Circular buffer in tracking state, ~80 lines in `tracking.rs`.

---

## Sampling & Filtering

### 4. Simple Counter-Based Sampling (`sample_every()`)

**What:** Sample every Nth event - simpler alternative to hash-based bucket sampling for when you just want "every 10th event."

**Why useful:** `bucket()` does deterministic sampling (great for consistent sampling across runs), but sometimes you just want "give me 10% of events" or "every 100th line." This complements the existing sampling.

**Example:**
```rhai
// Keep only every 100th event for high-volume logs
if !sample_every(100) { skip() }

// 1% sampling for metrics
if sample_every(100) {
    track_count("sampled_errors")
}
```

**Implementation:** Simple counter in state. ~30 lines in `rhai_functions/random.rs`.

---

## String & Parsing Functions

### 10. Regex Multi-Field Extraction (`absorb_regex()`)

**What:** Extract multiple named capture groups from a regex into event fields in one operation.

**Why useful:** Common pattern: parse unstructured text into structured fields. Currently needs multiple `extract_regex()` calls. This does it in one shot. Consistent with existing `absorb_kv()` function.

**Example:**
```rhai
// Parse custom log format in one go - modifies event in-place
e.absorb_regex(
    r"User (?P<user>\w+) from (?P<ip>[\d.]+) (?P<action>\w+) (?P<resource>.*)"
)
// Now e.user, e.ip, e.action, e.resource are all populated

// Parse error messages from a field
e.msg.absorb_regex(
    r"Error (?P<code>\d+): (?P<message>.*) at (?P<location>.*)"
)
// Extract named groups and add to event: e.code, e.message, e.location
```

**Implementation:** Use existing regex infrastructure, ~60 lines in `strings.rs`.

---

### 12. Smart Text Truncation (`truncate_words()`)

**What:** Truncate strings but preserve word boundaries - avoid cutting mid-word for cleaner output.

**Why useful:** When limiting message length for display or storage, cutting mid-word looks broken. This keeps it readable.

**Example:**
```rhai
// Truncate long messages cleanly
e.summary = e.message.truncate_words(50)  // Max 50 chars, break at word boundary

// Preserve formatting
e.preview = e.description.truncate_words(100, "...")

// Compare:
// Bad:  "The quick brown fox jum..."
// Good: "The quick brown fox..."
```

**Implementation:** String splitting with UTF-8 awareness, ~40 lines in `strings.rs`.

---

## Array & Collection Functions

### 14. Array Set Operations (`intersect()`, `difference()`, `union()`)

**What:** Set operations on arrays - find common elements, differences, combinations.

**Why useful:** Common when comparing tags, roles, permissions, or features between events. Currently requires verbose filtering.

**Example:**
```rhai
// Find common tags between events
common_tags = current_tags.intersect(previous_tags)

// Find newly added permissions
new_perms = after.permissions.difference(before.permissions)

// Combine feature flags
all_features = baseline_features.union(experimental_features)

// Alert on tag changes
added = e.tags.difference(baseline_tags)
if added.len() > 0 {
    e.tags_added = added
}
```

**Implementation:** HashSet operations, ~50 lines in `arrays.rs`.

---

## Summary by Category

### High Priority (Simple + High Impact)
- **Time Bucketing (`dt.round_to()`)** - Essential for grouping by time intervals
- **Multi-Field Extraction (`absorb_regex()`)** - Parse unstructured text in one shot
- **Streaming Percentiles (`track_percentiles()`)** - Parallel-safe, memory-efficient latency tracking
- **Counter Sampling (`sample_every()`)** - Complements existing bucket sampling
- **First/Last Tracking** - Common pattern, easy to implement

### Medium Priority (More Complex but Very Useful)
- **Burst Detection** - Security and monitoring use cases (sequential only)
- **Delta Tracking** - Counter monitoring and anomaly detection (sequential only)
- **Moving Average** - Noise reduction and trend detection
- **Set Operations** - Collection comparison utilities
- **Smart Truncation** - Polish for text handling

### Sequential-Mode Only (Cannot Work in Parallel)
- **Deduplication (`track_seen()`)** - Requires global state lookup
- **Sequence Numbers (`track_sequence()`)** - Inherently serial
- **Event Correlation (`track_correlation()`)** - Paired events may hit different workers
- **Rate Calculation (`track_rate()`)** - Requires event ordering by time

### Advanced Features (Require More Design)
- **Threshold Tracking** - Heavy hitter detection with auto-pruning
- **Cardinality Estimation (`track_unique()`)** - HyperLogLog for parallel-safe unique counts

## Implementation Notes

All features:
- Use existing infrastructure (tracking state, datetime wrappers, etc.)
- Follow Kelora's conventions (method-style calls, error handling)
- Require no external dependencies
- Fit within the streaming pipeline model
- Work in both sequential and parallel modes (where applicable)

Features that require `--metrics` flag:
- All `track_*` functions (consistent with existing tracking functions)

Features that work only in sequential mode:
- `track_seen()` (requires global state lookup)
- `track_sequence()` (sequences are inherently serial)
- `track_correlation()` (requires consistent state)
- `track_burst()` (time-window tracking needs ordering)
- `track_rate()` (requires event ordering)
- `track_delta()` (requires consecutive events)
- `track_moving_avg()` (needs ordered sliding window)

Features fully compatible with `--parallel`:
- `dt.round_to()` (stateless transformation)
- `absorb_regex()` (stateless parsing)
- `track_percentiles()` (t-digest merges, fully parallel-safe)
- `sample_every()` (with thread-local counters)
- `truncate_words()` (stateless string operation)
- Array set operations (stateless)
- `track_if_above()` (threshold filtering, mergeable)

## Special Note on Auto-Suffixing

After analyzing all existing `track_*()` usage patterns, **auto-suffixing is NOT applied to existing functions** because:
- 90% of usage is single-operation counting where suffix adds noise: `track_count("requests")``requests` is clearer than `requests_count`
- Users already encode semantics in names: `latency_total_ms`, `latency_p99`, `latency_samples`
- Pipe namespace patterns would break: `metric|service_sum` puts suffix in wrong place

**ONLY `track_percentiles()` auto-suffixes** because:
- Plural name signals the behavior
- Percentiles are always multi-valued (p50, p95, p99 together)
- Prevents awkward repetition: `track_percentiles("api_latency", e.dur, [95, 99])` is cleaner than manually repeating the key three times

See `dev/auto-suffix-examples-review.md` for detailed analysis of existing patterns.