Kelora Span Spec (tumbling, sequential)
Intent
Provide non-overlapping aggregation over a stream with a minimal, deterministic UX. Spans are batches formed by count or event time. Per-event scripts still run; a hook runs once per span.
⸻
CLI
• --span <N> → count-based: close after every N events that pass --filter (if specified). The count applies to the filtered event stream.
• --span <DURATION> → time-based: close on fixed time boundaries derived from the events' timestamp field (see "Time source").
• --span-close → Rhai snippet executed once when a span closes. Span helpers are available only here.
Sequential only. If --parallel is supplied, Kelora prints a warning and runs sequentially.
⸻
Time source & alignment (time-based mode)
• Time is taken from the event's ts (Kelora's canonical timestamp field).
• First event with valid ts anchors the cadence to absolute boundaries:
• Compute the boundary period containing the first event, aligned to the duration.
• Example: first ts = 12:03:27 with --span 1m → first span is [12:03:00, 12:04:00), then [12:04:00, 12:05:00), etc.
• Events with missing/invalid ts do not affect anchor selection.
• Intervals are half-open: [start, end).
• Implementation note: Use integer milliseconds for timestamp comparisons to avoid floating-point precision issues.
Missing/invalid ts:
• Strict mode (`--strict`): event is an error (fail-fast).
• Resilient (default): event is processed normally (--filter and --exec run) but excluded from time-span aggregation. It is NOT added to any span buffer and does NOT appear in span.events or contribute to span.size. Metadata assigned: meta.span_status = "unassigned", meta.span_id = null, meta.span_start and meta.span_end are null.
⸻
Late events (time-based)
Definitions assume duration_ms is the configured span length (milliseconds) and anchor_start_ms is the start of the first window (derived from the first valid timestamp).
An event is late iff its ts falls into a span that has already been closed, including windows that chronologically precede the anchor event.
Policy (no flags):
1. We do not reopen or mutate closed spans.
2. Event is tagged:
• meta.span_status = "late"
• meta.span_id = "<start ISO8601>/<duration>" of the closed span it would've belonged to.
• meta.span_start, meta.span_end reflect that window's bounds.
• Window bounds use integer math to avoid float precision bugs:
• delta_ms = event_ts_ms - anchor_start_ms
• k = delta_ms.div_euclid(duration_ms) // floor division handles negatives
• window_start_ms = anchor_start_ms + k * duration_ms
• window_end_ms = window_start_ms + duration_ms
3. Per-event scripts (--exec, --filter) still run.
4. Late events are emitted to output (unless suppressed by --filter or --exec).
5. Internal counter late_events increments (visible in --stats).
If an event belongs to the currently open span, it's included and meta.span_status = "included".
Count-based mode has no late events (spans follow arrival order).
Note: For accurate time-based aggregation, users should pre-sort logs by timestamp (e.g., `sort -k <timestamp-field>`).
⸻
Span context (Rhai) — available only during --span-close
Note: Window helpers (window_events(), window_size(), etc.) are NOT available in --span-close context. Use the span object exclusively.
The hook fires after the event that closed the span finishes all per-event stages, so span.events only contains events that survived the pipeline. The hook still runs even if span.size == 0 (e.g., every event was filtered or marked unassigned); scripts decide whether to emit anything.
Read-only span object:
• span.start → DateTime (start bound; () for count spans)
• span.end → DateTime (end bound; () for count spans)
• span.id → String
• Time: "{ISO_START}/{DURATION}" (e.g., 2025-10-15T12:03:00Z/1m)
• Count: "#<index>" (0-based)
• span.events → Array of events in the span, in arrival order
• span.size → Int (event count)
• span.metrics → Map snapshot of metrics recorded during the span
Metadata (assigned during per-event processing, available in --exec, --filter, and --span-close):
• meta.span_status (String: "included" | "late" | "unassigned" | "filtered")
• meta.span_id (String or null)
• meta.span_start (DateTime or null)
• meta.span_end (DateTime or null)
Status values:
• "included" → Event is buffered in current span (normal case)
• "late" → Valid ts but arrived after its span closed (time mode only)
• "unassigned" → Missing/invalid ts (time mode only)
• "filtered" → Failed --filter, not buffered or emitted
⸻
Processor semantics
1. Per-event phase (always):
• Before the first user-provided stage runs, compute span alignment for the event and assign provisional metadata: meta.span_id/span_start/span_end plus meta.span_status = "included", "late", or "unassigned". This metadata is visible to every --exec/--filter stage.
• Run CLI-ordered --exec/--filter stages. Each stage can inspect meta.span_*.
• If a --filter returns false, immediately set meta.span_status = "filtered", skip any later stages, and do not emit or buffer the event.
• When the event exits the per-event pipeline without being filtered, emit it normally. If meta.span_status is still "included", append it to the current span buffer for span.events; "late" and "unassigned" events continue to flow but never enter the buffer.
2. Boundary detection:
• Count: when N events with status "included" have accumulated → close.
• Time: when an event's ts crosses into a newer interval → close the current span.
• Filtered events: in count mode, don't count toward N; in time mode, advance boundaries but aren't buffered.
3. Close phase:
• After the event that triggered the boundary finishes the per-event pipeline, run --span-close once with the span object; access buffered events via span.events. span.size may be 0 if every event was filtered or marked unassigned.
• span.metrics snapshots tracked values for the span; the snapshot is cleared after --span-close finishes.
• --span-close can emit span-level summaries via emit_each().
• Reset span state and start the next span.
4. End of input or interrupt signal (SIGINT/SIGTERM):
• If a span is open with buffered events, close it and run --span-close before termination.
• First SIGINT/SIGTERM during --span-close: defer signal until script completes.
• Second SIGINT/SIGTERM within 2 seconds: force immediate exit (code 130/143).
No synthetic empty spans are emitted (gaps with no events produce no span).
Example: With --span 1m and events at 12:00, 12:05, only the two spans containing events are emitted (not 12:01, 12:02, 12:03, 12:04).
⸻
Interactions
• --window N (sliding) is unchanged and orthogonal. If both are provided, --span still governs close hooks; window helpers reflect the sliding window within each per-event execution.
• --take limits overall output. If the take limit is reached mid-span, the current span is closed and --span-close runs before termination.
• --since/--until filter which events enter the stream; spans form over what remains.
• --stats shows:
• total_spans_closed (number of spans that closed)
• avg_events_per_span (mean span size)
• late_events (time-based only; count of events with meta.span_status = "late")
• unassigned_events (time-based only; count of events with meta.span_status = "unassigned")
• --metrics works as usual; use track_* in per-event or --span-close. See "Span metrics access" below for reading metrics.
⸻
Span metrics access
In --span-close context, two read-only maps are available:
• span.metrics → Metrics collected since the current span opened. The map is refreshed before --span-close runs and cleared afterward.
• metrics → Global metrics accumulated for the whole run (same as --end stage with --metrics enabled).
Patterns:
• Per-span aggregation: Use track_* in --exec to accumulate, then read values directly from span.metrics during --span-close (e.g., `let metrics = span.metrics; let hits = if metrics.contains("hits") { metrics["hits"] } else { 0 };`). The map is empty if the span produced no tracked values.
• Cross-span state: Read metrics[...] for cumulative totals and continue updating them with track_* if you need rolling aggregates across spans.
Both maps are immutable from Rhai scripts; functions such as metrics.pop() are unavailable. Span consumption never mutates the global metrics map, so the final --metrics report remains intact.
⸻
Filter interaction decision table
Event disposition by mode and condition:
┌─────────────────┬──────────────┬────────────────┬──────────────────┬─────────────────┐
│ Condition │ Counted for │ Buffered for │ Emitted to │ span_status │
│ │ Boundary? │ span.events? │ Output? │ │
├─────────────────┼──────────────┼────────────────┼──────────────────┼─────────────────┤
│ COUNT MODE │
├─────────────────┼──────────────┼────────────────┼──────────────────┼─────────────────┤
│ Passes --filter │ Yes (N++) │ Yes │ Yes │ "included" │
├─────────────────┼──────────────┼────────────────┼──────────────────┼─────────────────┤
│ Fails --filter │ No │ No │ No │ "filtered" │
├─────────────────┼──────────────┼────────────────┼──────────────────┼─────────────────┤
│ TIME MODE │
├─────────────────┼──────────────┼────────────────┼──────────────────┼─────────────────┤
│ Valid ts, │ Yes (closes │ Yes │ Yes │ "included" │
│ passes filter │ if new │ │ │ │
│ │ interval) │ │ │ │
├─────────────────┼──────────────┼────────────────┼──────────────────┼─────────────────┤
│ Valid ts, │ Yes (closes │ No │ No │ "filtered" │
│ fails filter │ if new │ │ │ │
│ │ interval) │ │ │ │
├─────────────────┼──────────────┼────────────────┼──────────────────┼─────────────────┤
│ Missing/invalid │ No │ No │ Yes │ "unassigned" │
│ ts, passes │ │ │ │ │
│ filter │ │ │ │ │
├─────────────────┼──────────────┼────────────────┼──────────────────┼─────────────────┤
│ Late ts (after │ No │ No │ Yes │ "late" │
│ span closed), │ │ │ │ │
│ passes filter │ │ │ │ │
└─────────────────┴──────────────┴────────────────┴──────────────────┴─────────────────┘
Key principle: In time mode, all events with valid ts participate in boundary detection (deterministic span boundaries regardless of filter logic). Only "included" events are buffered.
⸻
Error handling
• Invalid --span argument → usage error (Exit 2).
• Accepted forms:
• Count: /^[1-9]\d*$/
• Duration: <int>(ms|s|m|h) (e.g., 500ms, 10s, 1m, 2h)
• Time-based with missing ts:
• Strict: error per event (Exit 1 if encountered).
• Resilient: mark as "unassigned" status; proceed.
• If --parallel is set: print a warning "--span forces sequential execution" and continue sequentially.
• If count N > 100,000: print a warning about potential memory usage. Consider breaking into smaller spans or using time-based mode.
⸻
Performance & memory
• Bounded memory: only the current span is buffered (plus normal pipeline state).
• Time-based: a late event never reopens prior spans, so memory does not grow with disorder.
• Count-based: buffer size ≤ N events.
⸻
Examples
Count spans, rolling averages
kelora -j --span 500 \
--exec 'track_sum("lat", e.latency.to_int())' \
--span-close '
let n = span.size;
let metrics = span.metrics;
let sum = if metrics.contains("lat") { metrics["lat"] } else { 0 };
emit_each([#{span: span.id, n: n, avg_latency: if n > 0 { sum / n } else { 0 }}]);
'
Time spans, log late arrivals
kelora -j --span 1m \
--exec '
if meta.span_status == "late" {
eprint("⚠️ Late: " + e.ts + " → " + meta.span_id);
}
if meta.span_status == "included" {
track_count("hits");
}
' \
--span-close '
let metrics = span.metrics;
let hits = if metrics.contains("hits") { metrics["hits"] } else { 0 };
emit_each([#{start: span.start, end: span.end, hits: hits}]);
'
Time spans, emit histogram per minute
kelora -j --span 1m \
--exec 'track_bucket("status", e.status.to_int())' \
--span-close '
let metrics = span.metrics;
let hist = if metrics.contains("status") { metrics["status"] } else { [] };
emit_each([#{start: span.start, end: span.end, status_hist: hist}]);
'
Running total across spans
kelora -j --span 500 \
--exec 'track_count("hits")' \
--span-close '
let metrics = span.metrics;
let span_hits = if metrics.contains("hits") { metrics["hits"] } else { 0 };
track_sum("total", span_hits); // add just this span's delta
let cumulative = if metrics.contains("total") { metrics["total"] } else { 0 };
emit_each([#{
span: span.id,
span_hits: span_hits,
total: cumulative
}]);
'
⸻
Help text additions (concise)
Time mode uses event ts; spans are [start, end). Late events never mutate closed spans.
--span-close <RHAI> Run once whenever a span closes.
Available: span.id, span.start, span.end, span.events, span.size, span.metrics.
Metrics: span.metrics (per span, read-only), metrics dict (global, read-only).
Each event carries meta.span_status, meta.span_id, meta.span_start, meta.span_end.
⸻
Design philosophy
1. Arrival order is truth: Spans form based on when events arrive, not when they claim to have occurred. Late events never mutate history.
2. Streaming by default: Events flow through immediately with metadata assigned. Buffering is internal and minimal (current span only).
3. Explicit over implicit: Event disposition is visible in meta.span_status. No silent drops or state mutations.
4. Fail-safe defaults: Missing timestamps mark events as unassigned but don't halt processing. Run with --strict if you prefer fail-fast timestamp handling.
5. Single responsibility: Spans aggregate; filters filter. A filtered event in time mode still advances the clock (deterministic boundaries).
6. Two knobs only: Minimal surface area keeps the feature legible and composable.
⸻
Implementation notes
Edge cases & clarifications:
1. Timestamp comparison precision
• Use integer milliseconds (or nanoseconds) for all timestamp comparisons internally.
• Avoid floating-point comparisons to prevent boundary detection bugs.
• Events with ts at exact boundary (e.g., 12:04:00.000 when span ends at 12:04:00) belong to the next span per half-open interval semantics.
2. Filter interaction with boundaries (time-based)
• All events (including those failing --filter) participate in span boundary detection.
• Filtered-out events advance the span clock but are not buffered.
• This ensures deterministic span boundaries regardless of filter logic.
3. First event anchoring (time-based)
• Only the first event with a valid ts anchors the time alignment.
• Events with missing/invalid ts are skipped during anchor selection.
• Once anchored, all subsequent boundary calculations are deterministic.
4. Large count spans
• Count mode buffers all N events in memory until close.
• For N > 100,000, warn about potential OOM risk.
• Recommendation: use time-based mode for high-volume streams.
5. Signal handling during close
• First SIGINT/SIGTERM during --span-close: defer signal until script completes.
• This ensures span summaries are not corrupted by interruption.
• Second SIGINT/SIGTERM within 2 seconds: force immediate exit (code 130/143).
• Recommendation: Print "Received signal, waiting for span close... (Ctrl+C again to force quit)" on first signal.
6. Empty time spans
• Time gaps between events do NOT produce synthetic empty spans.
• Example: --span 1m with events at 12:00 and 12:05 → only 2 spans emitted.
• Spans can still close with span.size == 0 if every event in the interval was filtered or marked unassigned; --span-close still runs so scripts can emit zeros or diagnostics.
• Users expecting regular time series should generate synthetic events upstream.
7. Metrics access patterns
• track_* functions operate globally; span-level diffs are derived automatically.
• Read span.metrics during --span-close for per-span summaries (e.g., `let metrics = span.metrics; let hits = metrics["hits"];`). The map empties after each close.
• Global metrics remain available via metrics[...] and feed the final --metrics report.
• No manual reset step is required; span metrics never accumulate across spans.
⸻
Troubleshooting
Common issues and solutions:
Q: Why is span.size always 0?
A: Check if events are failing --filter. Filtered events (span_status="filtered") don't enter the buffer. Only "included" events count.
Q: Late events not appearing in output?
A: They are emitted (check span_status="late" in output) but not in span.events. Late events never enter closed spans.
Q: Metrics growing unexpectedly across spans?
A: track_* updates the global metrics map. Use span.metrics to read the per-span delta; it clears automatically after each close. Global metrics continue to grow until you intentionally track them to a new value.
Q: First span seems wrong or partial?
A: First event with valid ts anchors time alignment. If first event is mid-interval (e.g., 12:03:27 with --span 1m), first span is [12:03:00, 12:04:00). Pre-sort logs by timestamp for accuracy.
Q: Getting "unassigned" events in time mode?
A: Events have missing or invalid ts field. Check timestamp parsing. Run with --strict to turn them into hard errors, or fix upstream data.
Q: Span boundaries seem inconsistent?
A: In time mode, all events with valid ts advance the clock, even filtered ones. This ensures deterministic boundaries regardless of filter logic.
⸻
Testing recommendations
Critical edge cases to cover in integration tests:
1. End-of-stream handling
• Partial span at EOF is closed and --span-close runs.
• SIGINT/SIGTERM mid-stream: current span closes gracefully.
2. Events without timestamps (time-based)
• Missing ts: span_status="unassigned", not buffered, but emitted.
• Invalid ts format: same handling as missing.
• First event has missing ts: next valid ts anchors alignment.
3. Boundary precision
• Events at exact boundary time (e.g., 12:04:00.000) belong to next span.
• Multiple events with identical timestamps stay in same span.
• Millisecond-level precision maintained throughout.
4. Filter interactions
• Count mode: --filter affects span size (only events that pass every filter count toward N).
• Time mode: --filter doesn't affect boundary detection, but filtered events not buffered.
• Late events still pass through --filter.
5. Empty spans
• Time gap produces no synthetic spans (e.g., 12:00 → 12:05 with --span 1m = 2 spans, not 6).
• First span after anchor may be partial if first event is mid-interval.
6. Late events
• Event arrives after span closed: span_status="late", not buffered.
• Late event still runs through --exec and --filter.
• late_events counter increments.
7. Large counts
• --span 100000 triggers warning but works.
• Memory usage scales linearly with N in count mode.
8. Interaction with other flags
• --take N: closes current span before stopping.
• --since/--until: spans form over filtered stream.
• --window N: orthogonal; window helpers work in per-event context.
9. Signal handling
• SIGINT during --span-close: script completes before exit.
• SIGINT during per-event --exec: current span closes gracefully.
10. Metrics access operations
• span.metrics: contains only the metrics recorded during the current span; cleared after use.
• metrics dict or metrics[key]: read cumulative values without side effects.
• Span metrics do not require manual reset and never pollute later spans.