kelora 0.6.3

A command-line log analysis tool with embedded Rhai scripting
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
# Kelora

<p align="center">
  <img src="kelora-logo.svg" alt="Kelora Logo" width="300">
</p>

Scriptable log processor for the command line. Treats logs as structured events and lets you filter, transform, and analyze them using embedded [Rhai](https://rhai.rs) scripts with 40+ built-in functions.

> [!WARNING]
> Experimental tool. [Vibe-coded]https://en.wikipedia.org/wiki/Vibe_coding. APIs and behaviour may change without notice.

## Table of Contents
- [Overview]#overview
- [Getting Started]#getting-started
  - [First Commands]#first-commands
  - [Installation]#installation
- [Everyday Use]#everyday-use
  - [Core Concepts]#core-concepts
  - [CLI Feature Tour]#cli-feature-tour
  - [Example Pipelines]#example-pipelines
- [Deep Dives]#deep-dives
  - [Parsers & Formats]#parsers--formats
  - [Rhai Building Blocks]#rhai-building-blocks
  - [Configuration & Defaults]#configuration--defaults
  - [Troubleshooting]#troubleshooting
- [Reference]#reference
  - [Documentation Shortcuts]#documentation-shortcuts
  - [When to Reach for Kelora]#when-to-reach-for-kelora
  - [License]#license

## Overview

Kelora parses log streams into structured events and runs them through a programmable pipeline powered by Rhai scripting.

- Turns lines into structured events you can access as `e.field` or `e["field-name"]`.
- Embeds 40+ built-in Rhai helpers for parsing, enrichment, metrics, and windowed analysis.
- Speaks JSON, logfmt, syslog, CSV/TSV, column specs, and gzip-compressed inputs out of the box.
- Handles streaming or batch workloads with sequential and `--parallel` execution modes.
- Emits metrics and processing stats so you can observe pipelines while they run.

## Getting Started

### First Commands

```bash
# Filter error-level events from the logfmt sample
kelora -f logfmt --level error examples/simple_logfmt.log

# Focus on database events and surface slow queries
kelora -j examples/simple_json.jsonl \
  --filter 'e.service == "database"' \
  --exec 'e.duration_s = e.get_path("duration_ms", 0) / 1000' \
  --keys timestamp,message,duration_s

# Parse Apache/Nginx access logs, keep key fields, and surface stats
kelora -f combined examples/web_access_large.log.gz \
  --keys ip,status,request_time,request \
  --stats

# Show login-related events with surrounding context (like grep -A/-B/-C)
kelora -j examples/simple_json.jsonl \
  --filter 'e.message.contains("login")' \
  --after-context 2 --before-context 1
```

More quick commands to copy-paste:

- Stream-level error watch: `tail -f examples/simple_json.jsonl | kelora -j --level warn,error --exec 'track_count(e.service)' --metrics`
- Fan out nested arrays: `kelora -j examples/json_arrays.jsonl --exec 'emit_each(e.get_path(\"users\", []))' --keys id,name,score`
- Alias sensitive fields: `kelora -j examples/security_audit.jsonl --exec 'e.user_alias = pseudonym(e.user, \"users\"); e.ip_masked = e.ip.mask_ip(1)' --keys timestamp,event,user_alias,ip_masked`

> [!TIP]
> The sample logs in `examples/` map to the categories in [examples/README.md]examples/README.md#file-categories. Start there before pointing Kelora at production data. Need a fast reminder of the core flags? Run `kelora --help-quick`.

### Installation

#### Prebuilt binaries (recommended)

1. Download the archive for your platform from the [GitHub releases]https://github.com/dloss/kelora/releases page (macOS, Linux, and Windows builds are provided).
2. Unpack the archive and move `kelora` or `kelora.exe` somewhere on your `PATH`.
3. Run `kelora --help` to ensure the binary starts up correctly.

If you prefer to manage the build yourself, Kelora targets stable Rust; keep your toolchain fresh (`rustup update`) for best results.

#### Install from crates.io or source

```bash
# Install from crates.io
cargo install kelora

# Build from source
git clone https://github.com/dloss/kelora
cd kelora
cargo install --path .
```





## Everyday Use

### Core Concepts

- **Events** - Every log line becomes a structured map. Fields are accessible as `e.level`, `e["user-agent"]`, or via helper functions like `e.get_path("payload.id")`.
- **Pipeline** - Logs flow through `Input -> Parse -> Filter -> Transform -> Format -> Output`. Mix and match parsers, filters, exec scripts, and formatters freely.
- **Scripts** - Rhai expressions drive filtering (`--filter`), transformations (`--exec`/`--exec-file`), initialization (`--begin`), and teardown (`--end`).
- **Windows** - `--window N` exposes recent events for sequence detection via helpers like `window_values("field")`.

| Stage | Key switches | Purpose |
| --- | --- | --- |
| Input | `kelora [FILES]`, `--parallel`, `--file-order` | Select sources and decide sequential vs. parallel ingestion |
| Parse | `-f/--input-format`, `--extract-prefix`, `-M/--multiline` | Turn raw text into structured events |
| Filter | `--filter`, `--level`, `--since/--until` | Keep only the events you care about |
| Transform | `-e/--exec`, `--begin`, `--window` | Enrich, fan out, and compute stateful metrics |
| Format | `-F/--output-format`, `-k/--keys`, `--stats` | Control output shape and statistics |

### CLI Feature Tour

#### Input & Parsing

- `-f, --input-format <FORMAT>` selects a parser (`json`, `logfmt`, `syslog`, `combined`, `cols:<spec>`, ...); `-j` is shorthand for JSON.
- `--file-order {cli|name|mtime}` controls multi-file processing order.
- `--skip-lines`, `--keep-lines`, and `--ignore-lines` filter raw input before parsing.
- `-M, --multiline <strategy>` enables multi-line reconstruction (see [Multiline Strategies]#multiline-strategies).
- `--extract-prefix FIELD` + `--prefix-sep SEP` slices service prefixes before parsing.
- `--cols-sep SEP` provides a custom delimiter when using column specs.

#### Filtering & Selection

- `--filter 'expression'` runs boolean Rhai expressions; chain multiple occurrences.
- `--level` and `--exclude-levels` gate standard log levels.
- `--since`, `--until`, and `--take` trim by time range or limit output volume.
- `-A/--after-context`, `-B/--before-context`, and `-C/--context` show surrounding lines around matches (requires filtering).
- `--keep-lines`/`--ignore-lines` pair well with `--strict` to enforce hygiene.
- `--allow-fs-writes` enables Rhai file helpers (`mkdir`, `truncate_file`, `append_file`) so scripts can shard or persist results; without it these functions return `false` and perform no I/O.

#### Transformations & State

- `-e/--exec` and `-E/--exec-file` mutate events (`e`), emit new ones (`emit_each`), or track metrics.
- `--begin` seeds global read-only configuration through the `conf` map; `--end` performs final aggregation or reporting.
- `--window N` exposes `window` helpers for sliding analyses; combine with `window_values(...)`, `window_numbers(...)`, or custom loops.
- Default mode is resilient—errors skip the offending event. Use `--strict` for fail-fast pipelines.
- Dial verbosity with `-v` / `-vvv` for debugging or `-q` / `-qqq` for quiet pipelines.
- `-I/--include` prepends Rhai files to define reusable functions for `--exec`, `--begin`, and `--end` stages.

#### Output & Reporting

- `-F/--output-format` controls formatting (`default`, `json`, `logfmt`, `inspect`, `levelmap`, CSV/TSV variants, or `none`). `-J` is shorthand for JSON.
- `-k/--keys` and `-K/--exclude-keys` shape the output payload; `-c` keeps only core fields; `-b` switches to brief mode.
- `--convert-ts` converts timestamp fields to RFC3339 format (modifies event data); `-z/--show-ts-local` and `-Z/--show-ts-utc` format display timestamps (default formatter only).
- `--stats`, `--metrics`, and `-S/--stats-only` expose processing telemetry; `--metrics-file` writes JSON metrics to disk.
- `--mark-gaps` inserts visual separators when time jumps exceed a duration; `--no-emoji` disables emoji prefixes.
- Context lines are marked with visual prefixes: `*` for matches, `/` for before-context, `\` for after-context.

#### Performance & Reliability

- `--parallel` and `--threads` control concurrency; pair with `--batch-size` (lines per worker batch, default 1000) and `--batch-timeout` (max ms to flush a partial batch, default 200) to balance parallel throughput and latency.
- `--unordered` relaxes output ordering for faster parallel flushes.
- Sequential mode (default) shines for streaming sources; `--parallel` excels at log archives.
- Combine with `--config-file` or aliases for repeatable pipelines at scale.

## Deep Dives

### Parsers & Formats

Kelora defaults to `-f line`, which trims trailing newline/CR characters and exposes the result as `e.line`. Reach for `-f raw` when you need a byte-for-byte copy (including trailing newlines or escape markers), and reserve `-f 'cols:<spec>'` for bespoke formats that the built-in parsers do not cover. All parsers expect UTF-8 text; binary or other encodings will raise input errors.

| Format | Fields Produced | Typical Source |
| --- | --- | --- |
| `line` (default) | `line` | Newline-delimited text where trimming the trailing newline is acceptable |
| `raw` | `raw` | Exact text preservation (newline-sensitive data, continuation markers, binary artifacts) |
| `json` | Original JSON keys | JSONL or JSON arrays |
| `logfmt` | Key-value pairs | Logfmt structured logs |
| `syslog` | `timestamp`, `host`, `facility`, `message`, ... | RFC3164/RFC5424 syslog |
| `cef` | Header fields + extension map | ArcSight/Common Event Format |
| `csv` / `tsv` | Column headers as fields (strings by default) | Delimited datasets |
| `csv:<spec>` / `tsv:<spec>` | Column headers with type conversions | Typed CSV/TSV data |
| `combined` | `ip`, `status`, `method`, `path`, `request`, `request_time`, ... | Apache/Nginx access logs |
| `cols:<spec>` | Named fields defined by your spec (`ts`, `level`, `*rest`, ...) | Custom or proprietary log formats |

All parsers auto-detect gzip compression (files and stdin) by magic bytes—no extra flags required.

#### Format Recipes

##### Raw vs Line

Choose the right baseline for text pre-processing. `-f line` is the default: it trims the trailing newline/CR and gives you a tidy `e.line` field for downstream filters. `-f raw` keeps every byte (including trailing delimiters and escape markers) in `e.raw`, which is invaluable when you need to preserve continuation characters, feed the data into another parser verbatim, or re-emit the original payload.

```bash
# Preserve every byte (newline-sensitive analyses)
kelora -f raw examples/simple_line.log \
  --exec 'e.byte_len = e.raw.len()'

# Treat each line as plain text for simple filtering
kelora -f line examples/simple_line.log \
  --filter 'e.line.contains("ERROR")'
```

##### Prefix Extraction

Strip infrastructure prefixes before parsing structured payloads.

```bash
cat examples/prefix_docker.log | \
  kelora --extract-prefix container --prefix-sep " | " --filter 'e.container == "web_1"'
```

##### Type Annotations

Type annotations allow you to convert string fields to specific types during parsing. Supported for CSV, TSV, and cols formats.

```bash
# CSV with typed columns (status, bytes, and duration_ms as integers)
kelora -f "csv status:int bytes:int duration_ms:int" examples/simple_csv.csv

# TSV with type conversions - space-separated field specs
kelora -f "tsv: user_id:int success:bool" examples/simple_tsv.tsv

# Cols format with multi-column capture
kelora -f "cols:email ts status ip" examples/cols_mixed.log

# Cols with count specifiers and custom separator
kelora -f "cols:date(2) level *msg:string" examples/cols_fixed.log

# Strict mode: fail on conversion errors instead of falling back to strings
kelora -f "csv status:int" --strict examples/errors_csv_ragged.csv

# Works in pipes and with compressed data
gzip -dc examples/web_access_large.log.gz | kelora -f combined --stats
```

**Supported types:**
- `int` - Convert to 64-bit integer
- `float` - Convert to 64-bit float
- `bool` - Convert to boolean (recognizes `true/false`, `yes/no`, `1/0`, case-insensitive)
- (no annotation) - Keep as string (default)

**Error handling:**
- Resilient mode (default): Invalid conversions fall back to original string value
- Strict mode (`--strict`): Invalid conversions abort processing with error

**Format availability:**
- CSV/TSV: Type annotations work with headers (`csv`, `tsv`). Headerless formats (`csvnh`, `tsvnh`) use auto-generated column names (c1, c2, ...) and don't support type annotations.
- Cols: Type annotations can be embedded directly in the column spec (e.g., `status:int`, `ts(2):string`, `*msg:bool`)

##### Column Specs (`cols:<spec>`)

Declarative column parsing with skips, joins, and tail captures. This mode shines when your data has a repeatable column layout but no dedicated parser—think bespoke appliance logs, legacy flat files, or regex capture groups you want to map into fields.

Spec tokens are space-separated:

- `field` — consume one column into `field`.
- `field(n)` — consume `n ≥ 2` columns, joined together (whitespace joins in default mode, literal separator joins when you set one).
- `-` / `-(n)` — skip one or `n` columns with no output field.
- `*field` — capture the remaining text verbatim; must appear once and always at the end.

Whitespace is the default separator; add `--cols-sep "|"` (or another literal) when your columns are delimited. You can also feed pre-split arrays to `parse_cols` from Rhai (`let caps = e.line.extract_all_re(...); caps.parse_cols("ip user ts *msg");`). Missing data fills fields with `()` in resilient mode, while `--strict` turns shortages/extras into errors.

```bash
echo "2025-09-22 12:33:44 INFO alice login success" | \
  kelora -f 'cols:date time level user *message' \
    --exec 'e.timestamp = (e.date + " " + e.time).to_datetime()' \
    --keys timestamp,level,user,message
```

### Rhai Building Blocks

Kelora exposes the full Rhai language plus domain-specific helpers.

- **Text & parsing** - `extract_re`, `parse_logfmt`, `parse_cols`, `mask_ip`, `encode_*` / `decode_*`.
- **Arrays & maps** - `sorted`, `sorted_by`, `array.flatten`, `map.get_path`, `map.flatten`, `emit_each`.
- **Hashing & pseudonymization** - `bucket` (fast sampling), `hash` (multi-algo), `pseudonym` (short IDs).
- **Metrics** - `track_count`, `track_avg`, `track_bucket`, `track_unique` power `--metrics` and `--end` reports.
- **Datetime** - `to_datetime`, `to_duration`, `now_utc`, formatting helpers, arithmetic.
- **Environment & control** - `get_env`, `read_file`, `read_lines`, `exit`.

Example pipeline with shared configuration and sliding logic:

```bash
kelora -j examples/simple_json.jsonl \
  --begin 'conf.error_levels = ["ERROR", "FATAL"]; conf.retry_threshold = 2' \
  --filter 'conf.error_levels.contains(e.level)' \
  --exec 'if e.get_path("retry", 0) >= conf.retry_threshold { track_count("retries"); }' \
  --window 1 \
  --exec 'let comps = window_values("component"); if comps.len() > 1 && comps[0] == comps[1] { e.context = "repeat_component"; }' \
  --metrics
```

See `kelora --help-rhai` for syntax essentials and `kelora --help-functions` for the complete catalog.

### Multiline Strategies

Kelora now offers four explicit multiline modes:

- `timestamp` — detect leading timestamps using the adaptive parser. Add
  `:format=<chrono>` when you need to seed a specific layout (`%b %e %H:%M:%S`, etc.).
- `indent` — treat any line that begins with indentation as a continuation of the
  current event.
- `regex:match=<REGEX>[:end=<REGEX>]` — provide your own record headers (and
  optional terminators) when you need full control.
- `all` — buffer the entire input as a single event when you already have
  pre-chunked payloads.

The option stays off unless you pass `-M/--multiline`. Detection always runs before
parsing, so pair the strategy with the input format you expect (for example, `-f raw`
before handing events to a JSON parser). Buffering continues until the next detected
start (or end regex) arrives; if you run with `--parallel`, tune `--batch-size` or
`--batch-timeout` to keep memory bounded. Remember that `--multiline all` keeps the
entire stream in memory until it flushes.

### Configuration & Defaults

Define repeatable pipelines in `~/.config/kelora/config.ini`:

```ini
# Defaults applied to every run
defaults = --stats --parallel --input-tz UTC

[aliases]
errors = --level error --since 1h --stats
warnings = --filter 'e.level == "WARN" || e.level == "WARNING"'
slow-queries = --filter 'e.duration > 1000' --exec 'e.slow = true' --keys timestamp,query,duration
```

Usage:

```bash
kelora examples/simple_json.jsonl            # Uses defaults
kelora --config-file custom.ini examples/simple_line.log  # Swap configuration files
kelora --no-stats examples/simple_line.log                # Override a default
kelora -a errors examples/simple_json.jsonl  # Run the alias
kelora --show-config                        # Inspect the merged configuration
kelora --level error --stats --save-alias errors # Save current command as alias
```

Pair configs with `--ignore-config` for hermetic runs or CI pipelines.

### Example Pipelines

```bash
# Monitor access logs for server/client errors and count them
kelora -f combined examples/web_access_large.log.gz \
  --exec 'let status = e.status.to_int(); e.class = if status >= 500 { "server_error" } else if status >= 400 { "client_error" } else { "ok" };' \
  --filter 'e.class != "ok"' \
  --exec 'track_count("errors"); if e.class == "server_error" { eprint("🚨 " + e.status + " " + e.request); }' \
  --metrics

# Authentication watch with sliding windows and unique counters
kelora -f syslog examples/simple_syslog.log \
  --filter '"msg" in e && e.msg.contains("Failed login")' \
  --window 4 \
  --exec 'let count = 0; for msg in window_values("msg") { if msg.contains("Failed login") { count += 1; } } if count >= 3 { eprint("🚨 repeated failures: " + e.msg.extract_ip()); track_unique("alert_ips", e.msg.extract_ip()); }' \
  --metrics

# Convert syslog to structured JSON and redact sensitive fields
kelora -f syslog examples/simple_syslog.log \
  --exec 'e.severity_label = if e.severity <= 3 { "critical" } else if e.severity <= 4 { "error" } else { "info" }; e.host = e.host.mask_ip(1);' \
  -J

# Anonymize sensitive fields while keeping sessions linkable
kelora -j examples/security_audit.jsonl \
  --exec 'e.user_alias = pseudonym(e.user, "users"); e.ip_masked = e.ip.mask_ip(1)' \
  --filter 'e.event == "login"' \
  --keys timestamp,event,user_alias,ip_masked
```

### Advanced Pipelines

```bash
# Track slow requests and bucket them by severity
kelora -f logfmt examples/simple_logfmt.log \
  --filter '"duration" in e && e.duration.to_int_or(0) >= 100' \
  --exec 'track_count("slow_requests"); e.slow_bucket = if e.duration.to_int_or(0) >= 2000 { "very_slow" } else { "slow" }' \
  --metrics

# Expand nested arrays into individual events
kelora -j examples/json_arrays.jsonl \
  --exec 'emit_each(e.get_path("users", []))' \
  --keys id,name,score

# Rolling average over windowed metrics
kelora -j examples/window_metrics.jsonl \
  --window 5 \
  --metrics \
  --exec 'let values = window_numbers("value"); if values.len() == 5 { let total = 0.0; for v in values { total += v; } e.moving_avg = total / values.len(); }'
```

### Troubleshooting

- **No events printed**: run with `--verbose` to surface Rhai errors, or temporarily drop filters (`--filter`) to confirm parsing succeeds.
- **Timestamp parsing failures**: confirm the field name via `-F json`, then add `--ts-field`/`--ts-format` from [`kelora --help-time`]#documentation-shortcuts. In resilient mode failed timestamps are dropped silently; add `--strict` to fail fast.
- **Rhai script panics**: wrap risky lookups with helpers like `e.get_path("field", ())` or use `to_int_or` to coerce strings safely. `kelora --help-rhai` documents the available guards.
- **Performance dips with `--parallel`**: trim `--window` sizes, avoid heavy per-event printing, and tune `--batch-size`/`--batch-timeout` as described in [Performance & Reliability]#cli-feature-tour.
- **Unexpected empty fields**: inspect raw input using `-F inspect` or `-F logfmt` to ensure the parser chosen in `-f/--input-format` matches the data.

### Learning Path

1. **Events** - Practice accessing and mutating `e.field` values on JSON or logfmt samples.
2. **Parsing** - Try multiple formats (`-f json`, `-f combined`, `-f 'cols:...'`) and experiment with `--extract-prefix`.
3. **Basic Scripts** - Layer `--filter`, `--exec`, and `--keys` for simple transformations.
4. **Metrics & Stats** - Introduce `track_count`, `track_avg`, `--metrics`, and `--stats`.
5. **Pipelines** - Chain multiple `--filter`/`--exec` stages, add `--begin` configuration, and export results.
6. **Output Control** - Switch between `-F` modes, apply `-k`/`-K`, and format timestamps.
7. **Windows** - Explore `--window`, `window_values`, and `window_numbers` for sequence detection.
8. **Multi-stage Workloads** - Combine `--parallel`, config aliases, and `--end` summarization for production-style jobs.

Each milestone builds on the previous one; you can be productive early, then layer in advanced concepts as needed.

> [!TIP]
> See [examples/README.md]examples/README.md for 37 working examples covering all formats and complexity levels—from basic filtering to advanced pipelines.

## Reference

### Documentation Shortcuts

```bash
kelora --help-quick      # One-screen cheat sheet of common flags and recipes
kelora --help            # Full CLI reference grouped by stage
kelora --help-rhai       # Rhai syntax and scripting patterns
kelora --help-functions  # Built-in functions grouped by domain
kelora --help-multiline  # Multiline strategy reference
kelora --help-time       # Timestamp parsing and formatting guide
```

### When to Reach for Kelora

- **Use Kelora when** you need programmable filtering, enrichment, aggregation, or custom metrics directly in the terminal.
- **Pair with other tools**: pipe results into `jq`, `ripgrep`, or `lnav`—Kelora focuses on transformation, not visualization.
- **Prefer other tools when** you need interactive browsing (`lnav`), raw text search (`ripgrep`), heavy-duty JSON querying (`jq`), dashboards (`Grafana`), or centralized log shipping (`Fluentd`).

See also: [angle-grinder](https://github.com/rcoh/angle-grinder), [pq](https://github.com/iximiuz/pq), [Miller](https://github.com/johnkerl/miller).

### License

[MIT](LICENSE) - see the license file for full details.