kelora 0.8.1

A command-line log analysis tool with embedded Rhai scripting
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
Kelora Span Spec (tumbling, sequential)

Intent

Provide non-overlapping aggregation over a stream with a minimal, deterministic UX. Spans are batches formed by count or event time. Per-event scripts still run; a hook runs once per span.

⸻

CLI

--span <N|DURATION>
--span-close '<Rhai>'

	•	--span <N> → count-based: close after every N events that pass --filter (if specified). The count applies to the filtered event stream.
	•	--span <DURATION> → time-based: close on fixed time boundaries derived from the events' timestamp field (see "Time source").
	•	--span-close → Rhai snippet executed once when a span closes. Span helpers are available only here.

Sequential only. If --parallel is supplied, Kelora prints a warning and runs sequentially.

⸻

Time source & alignment (time-based mode)
	•	Time is taken from the event's ts (Kelora's canonical timestamp field).
	•	First event with valid ts anchors the cadence to absolute boundaries:
	•	Compute the boundary period containing the first event, aligned to the duration.
	•	Example: first ts = 12:03:27 with --span 1m → first span is [12:03:00, 12:04:00), then [12:04:00, 12:05:00), etc.
	•	Events with missing/invalid ts do not affect anchor selection.
	•	Intervals are half-open: [start, end).
	•	Implementation note: Use integer milliseconds for timestamp comparisons to avoid floating-point precision issues.

Missing/invalid ts:
	•	Strict mode (`--strict`): event is an error (fail-fast).
	•	Resilient (default): event is processed normally (--filter and --exec run) but excluded from time-span aggregation. It is NOT added to any span buffer and does NOT appear in span.events or contribute to span.size. Metadata assigned: meta.span_status = "unassigned", meta.span_id = null, meta.span_start and meta.span_end are null.

⸻

Late events (time-based)

Definitions assume duration_ms is the configured span length (milliseconds) and anchor_start_ms is the start of the first window (derived from the first valid timestamp).

An event is late iff its ts falls into a span that has already been closed, including windows that chronologically precede the anchor event.

Policy (no flags):
	1.	We do not reopen or mutate closed spans.
	2.	Event is tagged:
		•	meta.span_status = "late"
		•	meta.span_id = "<start ISO8601>/<duration>" of the closed span it would've belonged to.
		•	meta.span_start, meta.span_end reflect that window's bounds.
	•	Window bounds use integer math to avoid float precision bugs:
		•	delta_ms = event_ts_ms - anchor_start_ms
		•	k = delta_ms.div_euclid(duration_ms) // floor division handles negatives
		•	window_start_ms = anchor_start_ms + k * duration_ms
		•	window_end_ms = window_start_ms + duration_ms
	3.	Per-event scripts (--exec, --filter) still run.
	4.	Late events are emitted to output (unless suppressed by --filter or --exec).
	5.	Internal counter late_events increments (visible in --stats).

If an event belongs to the currently open span, it's included and meta.span_status = "included".

Count-based mode has no late events (spans follow arrival order).

Note: For accurate time-based aggregation, users should pre-sort logs by timestamp (e.g., `sort -k <timestamp-field>`).

⸻

Span context (Rhai) — available only during --span-close

Note: Window helpers (window_events(), window_size(), etc.) are NOT available in --span-close context. Use the span object exclusively.
The hook fires after the event that closed the span finishes all per-event stages, so span.events only contains events that survived the pipeline. The hook still runs even if span.size == 0 (e.g., every event was filtered or marked unassigned); scripts decide whether to emit anything.

Read-only span object:
	•	span.start → DateTime (start bound; () for count spans)
	•	span.end → DateTime (end bound; () for count spans)
	•	span.id → String
	•	Time: "{ISO_START}/{DURATION}" (e.g., 2025-10-15T12:03:00Z/1m)
	•	Count: "#<index>" (0-based)
	•	span.events → Array of events in the span, in arrival order
	•	span.size → Int (event count)
	•	span.metrics → Map snapshot of metrics recorded during the span

Metadata (assigned during per-event processing, available in --exec, --filter, and --span-close):
	•	meta.span_status (String: "included" | "late" | "unassigned" | "filtered")
	•	meta.span_id (String or null)
	•	meta.span_start (DateTime or null)
	•	meta.span_end (DateTime or null)

Status values:
	•	"included" → Event is buffered in current span (normal case)
	•	"late" → Valid ts but arrived after its span closed (time mode only)
	•	"unassigned" → Missing/invalid ts (time mode only)
	•	"filtered" → Failed --filter, not buffered or emitted

⸻

Processor semantics
	1.	Per-event phase (always):
	•	Before the first user-provided stage runs, compute span alignment for the event and assign provisional metadata: meta.span_id/span_start/span_end plus meta.span_status = "included", "late", or "unassigned". This metadata is visible to every --exec/--filter stage.
	•	Run CLI-ordered --exec/--filter stages. Each stage can inspect meta.span_*.
	•	If a --filter returns false, immediately set meta.span_status = "filtered", skip any later stages, and do not emit or buffer the event.
	•	When the event exits the per-event pipeline without being filtered, emit it normally. If meta.span_status is still "included", append it to the current span buffer for span.events; "late" and "unassigned" events continue to flow but never enter the buffer.
	2.	Boundary detection:
	•	Count: when N events with status "included" have accumulated → close.
	•	Time: when an event's ts crosses into a newer interval → close the current span.
	•	Filtered events: in count mode, don't count toward N; in time mode, advance boundaries but aren't buffered.
	3.	Close phase:
	•	After the event that triggered the boundary finishes the per-event pipeline, run --span-close once with the span object; access buffered events via span.events. span.size may be 0 if every event was filtered or marked unassigned.
	•	span.metrics snapshots tracked values for the span; the snapshot is cleared after --span-close finishes.
	•	--span-close can emit span-level summaries via emit_each().
	•	Reset span state and start the next span.
	4.	End of input or interrupt signal (SIGINT/SIGTERM):
	•	If a span is open with buffered events, close it and run --span-close before termination.
	•	First SIGINT/SIGTERM during --span-close: defer signal until script completes.
	•	Second SIGINT/SIGTERM within 2 seconds: force immediate exit (code 130/143).

No synthetic empty spans are emitted (gaps with no events produce no span).

Example: With --span 1m and events at 12:00, 12:05, only the two spans containing events are emitted (not 12:01, 12:02, 12:03, 12:04).

⸻

Interactions
	•	--window N (sliding) is unchanged and orthogonal. If both are provided, --span still governs close hooks; window helpers reflect the sliding window within each per-event execution.
	•	--take limits overall output. If the take limit is reached mid-span, the current span is closed and --span-close runs before termination.
	•	--since/--until filter which events enter the stream; spans form over what remains.
	•	--stats shows:
	•	total_spans_closed (number of spans that closed)
	•	avg_events_per_span (mean span size)
	•	late_events (time-based only; count of events with meta.span_status = "late")
	•	unassigned_events (time-based only; count of events with meta.span_status = "unassigned")
	•	--metrics works as usual; use track_* in per-event or --span-close. See "Span metrics access" below for reading metrics.

⸻

Span metrics access

In --span-close context, two read-only maps are available:

	•	span.metrics → Metrics collected since the current span opened. The map is refreshed before --span-close runs and cleared afterward.
	•	metrics → Global metrics accumulated for the whole run (same as --end stage with --metrics enabled).

Patterns:
	•	Per-span aggregation: Use track_* in --exec to accumulate, then read values directly from span.metrics during --span-close (e.g., `let metrics = span.metrics; let hits = if metrics.contains("hits") { metrics["hits"] } else { 0 };`). The map is empty if the span produced no tracked values.
	•	Cross-span state: Read metrics[...] for cumulative totals and continue updating them with track_* if you need rolling aggregates across spans.

Both maps are immutable from Rhai scripts; functions such as metrics.pop() are unavailable. Span consumption never mutates the global metrics map, so the final --metrics report remains intact.

⸻

Filter interaction decision table

Event disposition by mode and condition:

┌─────────────────┬──────────────┬────────────────┬──────────────────┬─────────────────┐
│ Condition       │ Counted for  │ Buffered for   │ Emitted to       │ span_status     │
│                 │ Boundary?    │ span.events?   │ Output?          │                 │
├─────────────────┼──────────────┼────────────────┼──────────────────┼─────────────────┤
│ COUNT MODE                                                                            │
├─────────────────┼──────────────┼────────────────┼──────────────────┼─────────────────┤
│ Passes --filter │ Yes (N++)    │ Yes            │ Yes              │ "included"      │
├─────────────────┼──────────────┼────────────────┼──────────────────┼─────────────────┤
│ Fails --filter  │ No           │ No             │ No               │ "filtered"      │
├─────────────────┼──────────────┼────────────────┼──────────────────┼─────────────────┤
│ TIME MODE                                                                             │
├─────────────────┼──────────────┼────────────────┼──────────────────┼─────────────────┤
│ Valid ts,       │ Yes (closes  │ Yes            │ Yes              │ "included"      │
│ passes filter   │ if new       │                │                  │                 │
│                 │ interval)    │                │                  │                 │
├─────────────────┼──────────────┼────────────────┼──────────────────┼─────────────────┤
│ Valid ts,       │ Yes (closes  │ No             │ No               │ "filtered"      │
│ fails filter    │ if new       │                │                  │                 │
│                 │ interval)    │                │                  │                 │
├─────────────────┼──────────────┼────────────────┼──────────────────┼─────────────────┤
│ Missing/invalid │ No           │ No             │ Yes              │ "unassigned"    │
│ ts, passes      │              │                │                  │                 │
│ filter          │              │                │                  │                 │
├─────────────────┼──────────────┼────────────────┼──────────────────┼─────────────────┤
│ Late ts (after  │ No           │ No             │ Yes              │ "late"          │
│ span closed),   │              │                │                  │                 │
│ passes filter   │              │                │                  │                 │
└─────────────────┴──────────────┴────────────────┴──────────────────┴─────────────────┘

Key principle: In time mode, all events with valid ts participate in boundary detection (deterministic span boundaries regardless of filter logic). Only "included" events are buffered.

⸻

Error handling
	•	Invalid --span argument → usage error (Exit 2).
	•	Accepted forms:
	•	Count: /^[1-9]\d*$/
	•	Duration: <int>(ms|s|m|h) (e.g., 500ms, 10s, 1m, 2h)
	•	Time-based with missing ts:
	•	Strict: error per event (Exit 1 if encountered).
	•	Resilient: mark as "unassigned" status; proceed.
	•	If --parallel is set: print a warning "--span forces sequential execution" and continue sequentially.
	•	If count N > 100,000: print a warning about potential memory usage. Consider breaking into smaller spans or using time-based mode.

⸻

Performance & memory
	•	Bounded memory: only the current span is buffered (plus normal pipeline state).
	•	Time-based: a late event never reopens prior spans, so memory does not grow with disorder.
	•	Count-based: buffer size ≤ N events.

⸻

Examples

Count spans, rolling averages

kelora -j --span 500 \
  --exec 'track_sum("lat", e.latency.to_int())' \
  --span-close '
    let n = span.size;
    let metrics = span.metrics;
    let sum = if metrics.contains("lat") { metrics["lat"] } else { 0 };
    emit_each([#{span: span.id, n: n, avg_latency: if n > 0 { sum / n } else { 0 }}]);
  '

Time spans, log late arrivals

kelora -j --span 1m \
  --exec '
    if meta.span_status == "late" {
      eprint("⚠️  Late: " + e.ts + " → " + meta.span_id);
    }
    if meta.span_status == "included" {
      track_count("hits");
    }
  ' \
  --span-close '
    let metrics = span.metrics;
    let hits = if metrics.contains("hits") { metrics["hits"] } else { 0 };
    emit_each([#{start: span.start, end: span.end, hits: hits}]);
  '

Time spans, emit histogram per minute

kelora -j --span 1m \
  --exec 'track_bucket("status", e.status.to_int())' \
  --span-close '
    let metrics = span.metrics;
    let hist = if metrics.contains("status") { metrics["status"] } else { [] };
    emit_each([#{start: span.start, end: span.end, status_hist: hist}]);
  '

Running total across spans

kelora -j --span 500 \
  --exec 'track_count("hits")' \
  --span-close '
    let metrics = span.metrics;
    let span_hits = if metrics.contains("hits") { metrics["hits"] } else { 0 };
    track_sum("total", span_hits); // add just this span's delta
    let cumulative = if metrics.contains("total") { metrics["total"] } else { 0 };
    emit_each([#{
      span: span.id,
      span_hits: span_hits,
      total: cumulative
    }]);
  '


⸻

Help text additions (concise)

--span <N|DUR>         Form non-overlapping spans by count (N) or time (DUR: 500ms, 2s, 1m, 1h).
                        Sequential only; forces sequential mode if --parallel is set.
                        Time mode uses event ts; spans are [start, end). Late events never mutate closed spans.

--span-close <RHAI>    Run once whenever a span closes.
                        Available: span.id, span.start, span.end, span.events, span.size, span.metrics.
                        Metrics: span.metrics (per span, read-only), metrics dict (global, read-only).
                        Each event carries meta.span_status, meta.span_id, meta.span_start, meta.span_end.


⸻

Design philosophy

	1.	Arrival order is truth: Spans form based on when events arrive, not when they claim to have occurred. Late events never mutate history.

	2.	Streaming by default: Events flow through immediately with metadata assigned. Buffering is internal and minimal (current span only).

	3.	Explicit over implicit: Event disposition is visible in meta.span_status. No silent drops or state mutations.

	4.	Fail-safe defaults: Missing timestamps mark events as unassigned but don't halt processing. Run with --strict if you prefer fail-fast timestamp handling.

	5.	Single responsibility: Spans aggregate; filters filter. A filtered event in time mode still advances the clock (deterministic boundaries).

	6.	Two knobs only: Minimal surface area keeps the feature legible and composable.

⸻

Implementation notes

Edge cases & clarifications:

1. Timestamp comparison precision
	•	Use integer milliseconds (or nanoseconds) for all timestamp comparisons internally.
	•	Avoid floating-point comparisons to prevent boundary detection bugs.
	•	Events with ts at exact boundary (e.g., 12:04:00.000 when span ends at 12:04:00) belong to the next span per half-open interval semantics.

2. Filter interaction with boundaries (time-based)
	•	All events (including those failing --filter) participate in span boundary detection.
	•	Filtered-out events advance the span clock but are not buffered.
	•	This ensures deterministic span boundaries regardless of filter logic.

3. First event anchoring (time-based)
	•	Only the first event with a valid ts anchors the time alignment.
	•	Events with missing/invalid ts are skipped during anchor selection.
	•	Once anchored, all subsequent boundary calculations are deterministic.

4. Large count spans
	•	Count mode buffers all N events in memory until close.
	•	For N > 100,000, warn about potential OOM risk.
	•	Recommendation: use time-based mode for high-volume streams.

5. Signal handling during close
	•	First SIGINT/SIGTERM during --span-close: defer signal until script completes.
	•	This ensures span summaries are not corrupted by interruption.
	•	Second SIGINT/SIGTERM within 2 seconds: force immediate exit (code 130/143).
	•	Recommendation: Print "Received signal, waiting for span close... (Ctrl+C again to force quit)" on first signal.

6. Empty time spans
	•	Time gaps between events do NOT produce synthetic empty spans.
	•	Example: --span 1m with events at 12:00 and 12:05 → only 2 spans emitted.
	•	Spans can still close with span.size == 0 if every event in the interval was filtered or marked unassigned; --span-close still runs so scripts can emit zeros or diagnostics.
	•	Users expecting regular time series should generate synthetic events upstream.

7. Metrics access patterns
	•	track_* functions operate globally; span-level diffs are derived automatically.
	•	Read span.metrics during --span-close for per-span summaries (e.g., `let metrics = span.metrics; let hits = metrics["hits"];`). The map empties after each close.
	•	Global metrics remain available via metrics[...] and feed the final --metrics report.
	•	No manual reset step is required; span metrics never accumulate across spans.

⸻

Troubleshooting

Common issues and solutions:

Q: Why is span.size always 0?
A: Check if events are failing --filter. Filtered events (span_status="filtered") don't enter the buffer. Only "included" events count.

Q: Late events not appearing in output?
A: They are emitted (check span_status="late" in output) but not in span.events. Late events never enter closed spans.

Q: Metrics growing unexpectedly across spans?
A: track_* updates the global metrics map. Use span.metrics to read the per-span delta; it clears automatically after each close. Global metrics continue to grow until you intentionally track them to a new value.

Q: First span seems wrong or partial?
A: First event with valid ts anchors time alignment. If first event is mid-interval (e.g., 12:03:27 with --span 1m), first span is [12:03:00, 12:04:00). Pre-sort logs by timestamp for accuracy.

Q: Getting "unassigned" events in time mode?
A: Events have missing or invalid ts field. Check timestamp parsing. Run with --strict to turn them into hard errors, or fix upstream data.

Q: Span boundaries seem inconsistent?
A: In time mode, all events with valid ts advance the clock, even filtered ones. This ensures deterministic boundaries regardless of filter logic.

⸻

Testing recommendations

Critical edge cases to cover in integration tests:

1. End-of-stream handling
	•	Partial span at EOF is closed and --span-close runs.
	•	SIGINT/SIGTERM mid-stream: current span closes gracefully.

2. Events without timestamps (time-based)
	•	Missing ts: span_status="unassigned", not buffered, but emitted.
	•	Invalid ts format: same handling as missing.
	•	First event has missing ts: next valid ts anchors alignment.

3. Boundary precision
	•	Events at exact boundary time (e.g., 12:04:00.000) belong to next span.
	•	Multiple events with identical timestamps stay in same span.
	•	Millisecond-level precision maintained throughout.

4. Filter interactions
	•	Count mode: --filter affects span size (only events that pass every filter count toward N).
	•	Time mode: --filter doesn't affect boundary detection, but filtered events not buffered.
	•	Late events still pass through --filter.

5. Empty spans
	•	Time gap produces no synthetic spans (e.g., 12:00 → 12:05 with --span 1m = 2 spans, not 6).
	•	First span after anchor may be partial if first event is mid-interval.

6. Late events
	•	Event arrives after span closed: span_status="late", not buffered.
	•	Late event still runs through --exec and --filter.
	•	late_events counter increments.

7. Large counts
	•	--span 100000 triggers warning but works.
	•	Memory usage scales linearly with N in count mode.

8. Interaction with other flags
	•	--take N: closes current span before stopping.
	•	--since/--until: spans form over filtered stream.
	•	--window N: orthogonal; window helpers work in per-event context.

9. Signal handling
	•	SIGINT during --span-close: script completes before exit.
	•	SIGINT during per-event --exec: current span closes gracefully.

10. Metrics access operations
	•	span.metrics: contains only the metrics recorded during the current span; cleared after use.
	•	metrics dict or metrics[key]: read cumulative values without side effects.
	•	Span metrics do not require manual reset and never pollute later spans.