kelora 0.6.1 - Docs.rs

# Scripting Stages

Deep dive into Kelora's three Rhai scripting stages: `--begin`, `--exec`, and `--end`.

## Overview

Kelora provides three scripting stages for transforming log data with Rhai scripts:

```
--begin  →  [Process Events]  →  --end
              ↓
           --exec (per event)
```

| Stage | Runs | Purpose | Access |
|-------|------|---------|--------|
| `--begin` | Once before processing | Initialize state, load data | `conf` map, file helpers |
| `--exec` | Once per event | Transform events | `e` (event), `conf`, tracking |
| `--end` | Once after processing | Summarize, report | `metrics`, `conf` |

## Begin Stage

### Purpose

The `--begin` stage runs **once** before any events are processed. Use it to:

- Initialize lookup tables
- Load reference data from files
- Set up shared configuration
- Prepare the `conf` map for use in other stages

### The `conf` Map

The global `conf` map is **read-write** in `--begin` and **read-only** in later stages.

```bash
> kelora -j \
    --begin 'conf.valid_users = ["alice", "bob", "charlie"]' \
    --exec 'e.is_valid = e.user in conf.valid_users' \
    app.log
```

### Available Helpers

Special functions available **only** in `--begin`:

#### `read_lines(path)`

Read file as array of strings (one per line, UTF-8).

```bash
> kelora -j \
    --begin 'conf.blocked_ips = read_lines("blocked.txt")' \
    --exec 'if e.ip in conf.blocked_ips { e = () }' \
    app.log
```

#### `read_file(path)`

Read entire file as single string (UTF-8).

```bash
> kelora -j \
    --begin 'conf.template = read_file("template.txt")' \
    --end 'print(conf.template.replace("{count}", metrics["total"].to_string()))' \
    app.log
```

#### `read_json(path)`

Parse JSON file (convenience helper).

```bash
> kelora -j \
    --begin 'conf.users = read_json("users.json")' \
    --exec 'e.user_name = conf.users.get(e.user_id, "unknown")' \
    app.log
```

### Examples

#### Load Lookup Table

```bash
> kelora -j \
    --begin 'conf.services = #{api: "API Gateway", db: "Database", cache: "Redis"}' \
    --exec 'e.service_name = conf.services.get(e.service, e.service)' \
    app.log
```

#### Load IP Geolocation Data

```bash
> kelora -j \
    --begin 'conf.ip_to_country = read_json("geoip.json")' \
    --exec 'e.country = conf.ip_to_country.get(e.ip, "unknown")' \
    app.log
```

#### Initialize Counters

```bash
> kelora -j \
    --begin 'conf.start_time = now_utc()' \
    --end 'let duration = now_utc() - conf.start_time; print("Processed in " + duration + "s")' \
    app.log
```

#### Load Configuration

```bash
> kelora -j \
    --begin 'conf.threshold = 1000; conf.alert_email = "ops@company.com"' \
    --exec 'if e.duration_ms > conf.threshold { eprint("⚠️  Slow request: " + e.path) }' \
    app.log
```

## Exec Stage

### Purpose

The `--exec` stage runs **once per event**. Use it to:

- Transform event fields
- Add computed fields
- Filter events (via `e = ()`)
- Track metrics
- Emit multiple events from arrays

### The Event Variable

The current event is available as `e`. Modifications to `e` persist through subsequent `--exec` scripts.

```bash
> kelora -j \
    --exec 'e.duration_s = e.duration_ms / 1000' \
    --exec 'e.is_slow = e.duration_s > 1.0' \
    app.log
```

### Multiple Exec Scripts

Multiple `--exec` scripts run in order. Each sees changes from previous scripts.

```bash
> kelora -j \
    --exec 'e.duration_s = e.duration_ms / 1000' \
    --exec 'track_avg("duration", e.duration_s)' \
    --exec 'if e.duration_s > 5.0 { e.alert = true }' \
    app.log
```

**Execution order:**
1. Convert `duration_ms` to `duration_s`
2. Track average duration
3. Add `alert` field for slow requests

### Atomic Execution

In resilient mode (default), exec scripts execute **atomically**:

- If an error occurs, changes are rolled back
- Original event is returned unchanged
- Processing continues with next event

```bash
> kelora -j \
    --exec 'e.result = e.value.to_int() * 2' \
    app.log
```

If `e.value` is not a valid integer:

- Error is recorded
- Event passes through unchanged
- No partial modifications

In strict mode (`--strict`), errors abort immediately.

### Common Patterns

#### Transform Fields

```bash
> kelora -j \
    --exec 'e.level = e.level.to_upper()' \
    --exec 'e.message = e.message.trim()' \
    app.log
```

#### Add Computed Fields

```bash
> kelora -j \
    --exec 'e.duration_s = e.duration_ms / 1000' \
    --exec 'e.timestamp_unix = e.timestamp.to_unix()' \
    app.log
```

#### Conditional Field Creation

```bash
> kelora -j \
    --exec 'if e.status >= 500 { e.severity = "critical" } else if e.status >= 400 { e.severity = "error" }' \
    app.log
```

#### Remove Events

```bash
> kelora -j \
    --exec 'if e.level == "DEBUG" { e = () }' \
    app.log
```

#### Track Metrics

```bash
> kelora -j \
    --exec 'track_count(e.service)' \
    --exec 'track_avg("response_time", e.duration_ms)' \
    --metrics \
    app.log
```

#### Fan-Out Arrays

```bash
> kelora -j \
    --exec 'emit_each(e.items)' \
    app.log
```

Each array element becomes a separate event.

### Access to conf

The `conf` map from `--begin` is **read-only** in `--exec`:

```bash
> kelora -j \
    --begin 'conf.multiplier = 2.5' \
    --exec 'e.adjusted = e.value * conf.multiplier' \
    app.log
```

## End Stage

### Purpose

The `--end` stage runs **once** after all events are processed. Use it to:

- Summarize metrics
- Generate reports
- Print final statistics
- Export aggregated data

### The metrics Map

The global `metrics` map contains all tracked data from `track_*()` functions:

```bash
> kelora -j \
    --exec 'track_count(e.service)' \
    --end 'for key in metrics.keys() { print(key + ": " + metrics[key]) }' \
    app.log
```

### Available Data

In `--end`, you have access to:

- `metrics` - All tracked metrics (counts, sums, averages, etc.)
- `conf` - Read-only configuration from `--begin`
- Standard Rhai functions (print, file helpers if `--allow-fs-writes`)

### Examples

#### Print Summary Statistics

```bash
> kelora -j \
    --exec 'track_count("total"); if e.level == "ERROR" { track_count("errors") }' \
    --end 'let error_rate = metrics.errors / metrics.total * 100; print("Error rate: " + error_rate + "%")' \
    app.log
```

#### Generate Report

```bash
> kelora -j \
    --exec 'track_count(e.service)' \
    --end 'print("=== Service Report ==="); for svc in metrics.keys() { print(svc + ": " + metrics[svc] + " requests") }' \
    app.log
```

#### Export Metrics to File

```bash
> kelora -j --allow-fs-writes \
    --exec 'track_count(e.service)' \
    --end 'append_file("report.txt", "Total services: " + metrics.len().to_string())' \
    app.log
```

#### Calculate Percentages

```bash
> kelora -j \
    --exec 'track_count("total"); track_count(e.level)' \
    --end 'for level in ["INFO", "WARN", "ERROR"] { let pct = metrics.get(level, 0) / metrics.total * 100; print(level + ": " + pct + "%") }' \
    app.log
```

## Stage Interaction

### Data Flow Between Stages

```
--begin:  Initialize conf map
    ↓
    conf (read-only)
    ↓
--exec:   Process events, track metrics
    ↓
    metrics + conf (both read-only)
    ↓
--end:    Summarize and report
```

### Complete Example

```bash
> kelora -j \
    --begin 'conf.threshold = 1000; conf.start = now_utc()' \
    --exec 'if e.duration_ms > conf.threshold { track_count("slow") }' \
    --exec 'track_count("total")' \
    --end 'let elapsed = now_utc() - conf.start; print("Processed " + metrics.total + " events in " + elapsed + "s"); print("Slow requests: " + metrics.get("slow", 0))' \
    app.log
```

**Flow:**
1. `--begin`: Set threshold to 1000ms, record start time
2. `--exec` (per event): Track slow requests, track total
3. `--end`: Calculate elapsed time, print summary

## Using Exec Files

### `-E, --exec-file`

Load Rhai script from file for the exec stage:

**transform.rhai:**
```rhai
// Convert duration to seconds
e.duration_s = e.duration_ms / 1000;

// Add severity based on status
if e.status >= 500 {
    e.severity = "critical";
} else if e.status >= 400 {
    e.severity = "error";
} else {
    e.severity = "ok";
}

// Track metrics
track_count(e.severity);
track_avg("response_time", e.duration_s);
```

**Usage:**
```bash
> kelora -j -E transform.rhai --metrics app.log
```

### `-I, --include`

Include Rhai library files before script stages:

**helpers.rhai:**
```rhai
fn classify_status(status) {
    if status >= 500 {
        "server_error"
    } else if status >= 400 {
        "client_error"
    } else if status >= 300 {
        "redirect"
    } else if status >= 200 {
        "success"
    } else {
        "other"
    }
}
```

**Usage:**
```bash
> kelora -j \
    -I helpers.rhai \
    --exec 'e.status_class = classify_status(e.status)' \
    app.log
```

## Best Practices

### Use --begin for Initialization

**Good:**
```bash
> kelora -j \
    --begin 'conf.lookup = read_json("data.json")' \
    --exec 'e.name = conf.lookup.get(e.id, "unknown")' \
    app.log
```

**Bad:**
```bash
> kelora -j \
    --exec 'let lookup = read_json("data.json"); e.name = lookup.get(e.id, "unknown")' \
    app.log
```

The bad example reads the file **once per event** (slow and wasteful).

### Keep --exec Scripts Simple

Break complex logic into multiple `--exec` scripts:

**Good:**
```bash
> kelora -j \
    --exec 'e.duration_s = e.duration_ms / 1000' \
    --exec 'e.is_slow = e.duration_s > 1.0' \
    --exec 'if e.is_slow { track_count("slow_requests") }' \
    app.log
```

**Bad:**
```bash
> kelora -j \
    --exec 'e.duration_s = e.duration_ms / 1000; e.is_slow = e.duration_s > 1.0; if e.is_slow { track_count("slow_requests") }' \
    app.log
```

The good example is easier to read and debug.

### Use --end for Summaries

**Good:**
```bash
> kelora -j \
    --exec 'track_count(e.service)' \
    --end 'print("Total services: " + metrics.len())' \
    app.log
```

**Bad:**
```bash
> kelora -j \
    --exec 'track_count(e.service); print("Processing...")' \
    app.log
```

The bad example prints on every event (noisy).

### Leverage File Helpers

For complex logic, use `-E` and `-I`:

```bash
> kelora -j -I helpers.rhai -E transform.rhai --metrics app.log
```

This keeps command lines clean and logic maintainable.

## Performance Considerations

### Begin Stage Overhead

The `--begin` stage runs once, so file I/O here is acceptable:

```bash
> kelora -j \
    --begin 'conf.large_dataset = read_json("10mb.json")' \
    --exec 'e.enriched = conf.large_dataset.get(e.id, #{})' \
    app.log
```

### Exec Stage Optimization

The `--exec` stage runs per event. Avoid expensive operations:

**Slow:**
```bash
> kelora -j \
    --exec 'let lookup = read_json("data.json"); e.name = lookup.get(e.id, "unknown")' \
    app.log
```

**Fast:**
```bash
> kelora -j \
    --begin 'conf.lookup = read_json("data.json")' \
    --exec 'e.name = conf.lookup.get(e.id, "unknown")' \
    app.log
```

### End Stage Overhead

The `--end` stage runs once, so complex calculations are fine:

```bash
> kelora -j \
    --exec 'track_count(e.service)' \
    --end 'let sorted = metrics.keys().sort(); for key in sorted { print(key + ": " + metrics[key]) }' \
    app.log
```

## Parallel Processing

When using `--parallel`, scripting stages behave differently:

### Begin and End

`--begin` and `--end` run **once** (not parallelized):

```bash
> kelora -j --parallel \
    --begin 'conf.start = now_utc()' \
    --exec 'track_count(e.service)' \
    --end 'print("Duration: " + (now_utc() - conf.start))' \
    app.log
```

### Exec Stage

`--exec` runs in parallel across worker threads:

- Each thread has its own copy of `conf` (read-only)
- Tracking functions aggregate across threads
- Event modifications are isolated per thread

```bash
> kelora -j --parallel \
    --exec 'e.duration_s = e.duration_ms / 1000' \
    --exec 'track_count(e.service)' \
    app.log
```

### Thread Safety

Kelora handles thread safety automatically:

- `conf` is cloned per thread (immutable)
- `metrics` uses thread-safe aggregation
- Event modifications are isolated

You don't need to worry about race conditions in scripts.

## Troubleshooting

### conf is Read-Only in --exec

**Problem:**
```bash
> kelora -j --exec 'conf.value = 42' app.log
# Error: conf is read-only in exec stage
```

**Solution:** Initialize in `--begin`:
```bash
> kelora -j --begin 'conf.value = 42' --exec 'e.result = conf.value * 2' app.log
```

### metrics Not Available in --exec

**Problem:**
```bash
> kelora -j --exec 'print(metrics["total"])' app.log
# Error: metrics not available in exec stage
```

**Solution:** Use `--end`:
```bash
> kelora -j --exec 'track_count("total")' --end 'print(metrics["total"])' app.log
```

### File Helpers Not Working

**Problem:**
```bash
> kelora -j --exec 'append_file("out.txt", e.message)' app.log
# Error: filesystem writes not allowed
```

**Solution:** Add `--allow-fs-writes`:
```bash
> kelora -j --allow-fs-writes --exec 'append_file("out.txt", e.message)' app.log
```

### Script Execution Order

**Problem:** Later `--exec` doesn't see earlier changes.

**Check:** Are you using `--filter` between them?

```bash
# This works - both --exec scripts run on same events
> kelora -j --exec 'e.a = 1' --exec 'e.b = e.a + 1' app.log

# This doesn't - filter may remove events before second --exec
> kelora -j --exec 'e.a = 1' --filter 'e.level == "ERROR"' --exec 'e.b = e.a + 1' app.log
```

Filters run **between** exec stages in the pipeline.

## See Also

- [Pipeline Model](pipeline-model.md) - How stages fit into the processing pipeline
- [Events and Fields](events-and-fields.md) - Working with event data
- [Function Reference](../reference/functions.md) - All available Rhai functions
- [CLI Reference](../reference/cli-reference.md) - Complete flag documentation