# Kelora Performance Optimization Plan
## Overview
This document outlines behavior-preserving performance optimizations for kelora's log processing pipeline. All changes maintain identical output and test compatibility while targeting identified hot paths. Expected improvement: **15-30% throughput increase** for typical workloads.
## Hot Paths Identified
1. **JSON parsing** (serde_json deserialization) - Primary bottleneck for JSON workloads
2. **Rhai script execution** - Dynamic type conversions and field access per event
3. **Dynamic type conversions** - Between JSON and Rhai types (recursive for nested data)
4. **Parallel processing overhead** - Batch distribution and state merging
5. **String operations** - Line trimming, clone operations, allocations
6. **Regex operations** - Pattern compilation for dynamic patterns
7. **Output formatting** - Dynamic-to-JSON conversions and string escaping
## Recommended Optimizations
### Phase 1: High-Impact String Operations (8-12% improvement)
#### 1.1 Line Trimming Optimization ⭐ CRITICAL
**Problem:** Every parsed line calls `.trim_end_matches('\n').trim_end_matches('\r')` - two passes over string tail.
**Files to modify:**
- `src/parsers/json.rs:25`
- `src/parsers/logfmt.rs:136`
- `src/parsers/csv.rs:254`
- `src/parsers/syslog.rs:226`
- `src/parsers/combined.rs:296`
- `src/parsers/cef.rs:161`
- `src/parsers/line.rs:16`
- `src/rhai_functions/columns.rs:333, 356`
**Change:**
```rust
// BEFORE
let line = line.trim_end_matches('\n').trim_end_matches('\r');
// AFTER
let line = line.trim_end_matches(['\n', '\r']);
```
**Testing:**
- Run parser tests: `cargo test parsers::`
- Run benchmarks: `just bench`
- Verify output byte-for-byte identical with CRLF and LF line endings
---
### Phase 2: Memory Pre-allocation (4-8% improvement)
#### 2.1 Pre-allocate Map Capacity ⭐ HIGH PRIORITY
**Problem:** Creating Rhai Maps without capacity hints causes multiple reallocations per event.
**Files to modify:**
- `src/engine.rs:2396` - Event map conversion (hot path)
- `src/engine.rs:2403` - Metadata map
- `src/engine.rs:2456` - Window event maps
- `src/rhai_functions/strings.rs:52` - event_to_map helper
- `src/rhai_functions/span.rs:82` - span event_to_map
**Change example (engine.rs:2396):**
```rust
// BEFORE
let mut event_map = rhai::Map::new();
for (k, v) in &event.fields {
event_map.insert(k.clone().into(), v.clone());
}
// AFTER
let mut event_map = rhai::Map::with_capacity(event.fields.len());
for (k, v) in &event.fields {
event_map.insert(k.clone().into(), v.clone());
}
```
**Metadata map optimization:**
```rust
// Metadata typically has 2-7 fields
let mut meta_map = rhai::Map::with_capacity(8);
```
**Testing:**
- All tests must pass
- Benchmark parallel mode (hot path)
- Memory profiler to verify reduced allocations
---
#### 2.2 Optimize Window Event Conversion
**Problem:** Window events converted to maps without capacity hints.
**File:** `src/engine.rs:2449-2489`
**Change:**
```rust
let window_array: rhai::Array = window
.iter()
.map(|event| {
let mut event_map = rhai::Map::with_capacity(event.fields.len() + 8);
// ... rest unchanged
})
.collect();
```
**Testing:**
- Window-specific tests
- Benchmark with `--window` flag
- Verify multi-event window correctness
---
### Phase 3: Parser-Specific Optimizations (2-4% improvement)
#### 3.1 JSON Numeric Conversion Optimization
**Problem:** Numeric conversion tries i64, u64, f64 sequentially with unwrapping.
**File:** `src/event.rs:223-236` (json_to_dynamic function)
**Change:**
```rust
// BEFORE
serde_json::Value::Number(n) => {
if let Some(i) = n.as_i64() {
Dynamic::from(i)
} else if let Some(u) = n.as_u64() {
Dynamic::from(u)
} else if let Some(f) = n.as_f64() {
Dynamic::from(f)
} else {
Dynamic::from(n.to_string())
}
}
// AFTER - check type characteristics first
serde_json::Value::Number(n) => {
if n.is_i64() {
Dynamic::from(n.as_i64().unwrap())
} else if n.is_u64() {
Dynamic::from(n.as_u64().unwrap())
} else {
Dynamic::from(n.as_f64().unwrap())
}
}
```
**Testing:**
- Test positive/negative integers, large u64, floats
- Ensure all numeric types preserved correctly
- Benchmark with number-heavy JSON
---
### Phase 4: Collection Pre-allocation (1-3% improvement)
#### 4.1 Pre-allocate Known-Size Collections
**Files with opportunities:**
- `src/rhai_functions/absorb.rs:227` - data_map for JSON absorption
- `src/formatters.rs:150, 194` - JSON output objects
**Pattern:**
```rust
// When source size is known
let mut map = Map::with_capacity(source.len());
```
**Testing:** Existing tests, spot-check affected features
---
### Phase 5: Speculative Optimizations (OPTIONAL - Profile First)
#### 5.1 Scope Creation Optimization (Speculative)
**Problem:** `scope_template.clone()` called per event (engine.rs:2390).
**Constraint:** Scope must be unique per event (Rhai requirement).
**Approach:** Cache empty map instances to reduce Map::new() overhead:
```rust
struct ScopeCache {
empty_e_map: rhai::Map,
empty_meta_map: rhai::Map,
empty_conf_map: rhai::Map,
}
// Clone from cached instances instead of creating fresh
```
**Important:** Profile `scope.clone()` cost BEFORE implementing. If negligible, skip.
---
#### 5.2 Lazy Metadata Construction (Speculative)
**Problem:** Meta map always built even if filter doesn't access `meta`.
**Approach:** Use AST analysis to detect `meta` access, only construct if needed.
**Concern:** Complex to implement, benefits unclear (metadata commonly used).
**Recommendation:** SKIP unless profiling shows metadata construction is a bottleneck.
---
## Implementation Sequence
### Week 1: High-Confidence Quick Wins
1. **Phase 1.1** - Line trimming (1 day)
2. **Phase 2.1** - Map pre-allocation (2 days)
3. Run benchmarks, validate (1 day)
### Week 2: Parser & Collection Improvements
1. **Phase 3.1** - JSON numeric path (1 day)
2. **Phase 2.2** - Window conversion (1 day)
3. **Phase 4.1** - Collection pre-allocation (1 day)
4. Final validation (1 day)
### Week 3: Optional/Speculative (If Pursuing)
1. Profile scope and metadata construction
2. Implement Phase 5.1/5.2 only if profiling justifies
3. Final benchmarking and validation
---
## Testing & Validation Strategy
### Per-Phase Checklist
1. ✅ `cargo fmt --all`
2. ✅ `cargo clippy --all-targets --all-features -- -D warnings`
3. ✅ `cargo test` - ALL tests must pass
4. ✅ `just bench` - compare to baseline
5. ✅ Spot-test with real log files
6. ✅ Output verification (byte-for-byte where applicable)
### Establish Baseline
```bash
# Before starting
just bench > baseline_results.txt
```
### After Each Phase
```bash
just bench > phase_N_results.txt
# Compare improvements
```
### Specific Test Commands
```bash
# Quick format-specific test
time ./target/release/kelora -f json benchmarks/bench_100k.jsonl \
--filter "e.level == 'ERROR'" > /dev/null
# Memory profiling
cargo build --release
valgrind --tool=massif ./target/release/kelora -f json \
benchmarks/bench_100k.jsonl > /dev/null
```
---
## Risk Mitigation
| Breaking behavior | Comprehensive test suite, output validation |
| Performance regression | Benchmark each phase, revert if slower |
| Parallel mode issues | Extensive parallel tests in suite |
| Memory regression | Profile before/after with valgrind/massif |
| Test failures | Immediate rollback of changes |
---
## Expected Results
| 1.1 | Line trimming | 2-4% | Very Low ⭐ |
| 2.1 | Map pre-alloc (events) | 3-5% | Very Low ⭐ |
| 2.2 | Map pre-alloc (windows) | 1-3% | Low |
| 3.1 | JSON numerics | 1-2% | Low |
| 4.1 | Collection alloc | 1-2% | Very Low |
| 5.1 | Scope caching (optional) | 1-2% | Medium |
| **TOTAL** | **Cumulative** | **15-30%** | **Low-Medium** |
**Note:** Improvements are multiplicative. Conservative estimate: 15%, optimistic: 30%.
---
## Critical Files
These files are central to the hot paths and will need modification:
1. **`src/parsers/json.rs`** - JSON parsing entry point, line trimming (L25)
2. **`src/event.rs`** - Core json_to_dynamic conversion (L220-257), used throughout
3. **`src/engine.rs`** - Event-to-map hot paths (L2390-2489), executed per-event
4. **`src/parsers/logfmt.rs`** - Second most common format, line trimming (L136)
5. **`src/parallel.rs`** - Batch processing context (readonly, for understanding)
---
## Success Criteria
✅ All existing tests pass
✅ No clippy warnings
✅ Benchmark improvements match estimates (±3%)
✅ Output identical for representative workloads
✅ No memory regressions (verify with profiler)
✅ Code remains readable and maintainable
---
## Key Insights from Exploration
### What's Already Optimized
- ✅ AST compilation is properly cached (scripts compiled once)
- ✅ Regex caching with thread-local LRU (1000 entries)
- ✅ JSON parser pre-allocates HashMap with capacity
- ✅ Scope template reuse pattern (clone + modify)
### Biggest Bottlenecks Identified
- ❌ Per-event Map creation without capacity hints
- ❌ Line trimming with two passes
- ❌ Window event conversion multiplies Map allocation overhead
- ❌ Field value cloning between Rust and Rhai types
### Why These Optimizations Are Safe
- No change to output format or values
- No change to script semantics
- Pre-allocation doesn't change behavior, only performance
- Single-pass string operations produce identical results
- All changes are internal implementation details
---
## Notes
- **No backwards compatibility requirement:** Breaking API changes acceptable (though not needed here)
- **Conservative approach selected:** Capacity hints only, no aggressive refactoring
- **Speculative items clearly marked:** Profile before implementing Phase 5
- **Incremental validation:** Test after each phase, easy rollback
- **Honest benchmarking:** Use existing `just bench` suite with external tool comparisons