# Documentation Improvements
Review completed: 2025-11-11
## Executive Summary
**Overall Rating:** 7.5/10 - Excellent foundation, needs new-user onboarding polish
**Strengths:**
- Excellent Diátaxis structure (Tutorials, How-To, Concepts, Reference)
- Comprehensive coverage with executable examples
- Logical learning path and good cross-referencing
- High-quality tutorial content
**Gaps:**
- Written for users who already understand log processing concepts
- Terminology used before definition
- Stage ordering confusion (filter/exec/levels)
- Missing scaffolding for complete newcomers (glossary, diagrams, FAQ)
---
## High Priority Fixes (Do First)
### 1. ✅ Add a Glossary
**Problem:** Terms like "event", "field", "span", "stage", "window" used throughout but never formally defined up front.
**Solution:** Create `docs/glossary.md` with clear definitions and link prominently from:
- Landing page
- Quickstart
- Basics tutorial
- Navigation footer
**Impact:** Reduces cognitive load for new users trying to understand examples.
---
### 2. Fix Title/Navigation Inconsistencies
**Problem:** `pipeline-model.md` has title "Processing Architecture" but nav shows "Pipeline Model"
**Solution:** Pick one name and use consistently. Recommend "Processing Architecture" as it's more descriptive.
**Files to update:**
- `docs/concepts/pipeline-model.md` (title)
- `mkdocs.yml` (nav entry)
- All cross-references
---
### 3. Add Prominent Stage Ordering Callout
**Problem:** New users will put `--filter 'e.computed_field > 10'` before the `--exec` that creates it.
**Solution:** Add a prominent admonition box in `docs/tutorials/basics.md` (Part 2.5 or Part 3):
```markdown
!!! warning "Stage Order Matters"
Filters and transforms run in the order you specify them on the CLI.
Create fields with `--exec` BEFORE filtering on them with `--filter`.
❌ Wrong: `kelora --filter 'e.duration_s > 1' -e 'e.duration_s = e.duration_ms / 1000'`
✅ Right: `kelora -e 'e.duration_s = e.duration_ms / 1000' --filter 'e.duration_s > 1'`
```
**Also add to:**
- `docs/intro-to-rhai.md` (Step 7)
- `docs/quickstart.md` (after command 3)
---
### 4. Create Quick Reference Page
**Problem:** Users need to search through multiple pages for common patterns.
**Solution:** Create `docs/reference/quick-reference.md` with one-page cheatsheet:
- Common flags and what they do
- Common patterns (filter errors, extract fields, export CSV)
- Stage ordering rules
- Format selection shortcuts
- Debugging tips
**Link from:**
- Navigation (Reference section)
- Landing page footer
- README
---
### 5. Add "Common Issues" Section
**Problem:** New users hit predictable errors with no guidance.
**Solution:** Add FAQ or "Common Issues" section covering:
**In Quickstart or as separate `docs/troubleshooting.md`:**
```markdown
## Common Issues
### "Field not found" or empty output
- Check format is specified: `-j` for JSON, `-f logfmt`, etc.
- Default is `-f line` (plain text)
- Use `-F inspect` to see field names and types
### Filter not working
- Fields must exist before filtering
- Create fields with `--exec` BEFORE filtering on them
- Use `--verbose` to see errors
### Type comparison errors
- String "500" ≠ Integer 500
- Use `.to_int_or(0)` to convert safely
- Use `--filter 'e.status.to_int() >= 500'` for string fields
### No events match filter
- Check level filtering: `-l error` only shows errors
- Use `--stats` to see how many events filtered out
- Verify timestamp filtering with `--since`/`--until`
```
---
## Medium Priority
### 6. Consolidate Performance Pages
**Problem:** Three separate performance pages create navigation overhead.
**Current:**
- `concepts/performance-model.md`
- `concepts/performance-comparisons.md`
- `concepts/benchmark-results.md`
**Solution:** Merge into single `concepts/performance.md` with sections:
1. **When to Optimize** - Decision tree for sequential vs parallel
2. **Performance Model** - Streaming, memory usage, throughput
3. **Benchmarks** - Real-world numbers
4. **Comparisons** - vs jq, awk, grep
5. **Tuning Guide** - Batch size, threads, optimization tips
---
### 7. Add Visual Pipeline Diagram
**Problem:** Text-only explanation of pipeline is hard to grasp.
**Solution:** Add a simple flowchart to landing page and pipeline-model page:
```
┌─────────┐ ┌────────┐ ┌──────────┐ ┌────────┐
│ Input │ -> │ Parse │ -> │ Stages │ -> │ Output │
│ (files) │ │ (json) │ │ (filter) │ │ (json) │
└─────────┘ └────────┘ └──────────┘ └────────┘
|
┌─────┴─────┐
│ --exec │
│ --filter │
│ --levels │
└───────────┘
```
Create simple ASCII or mermaid diagram showing data flow.
---
### 8. Split Dense Pipeline Model Page
**Problem:** 588 lines covering everything is overwhelming.
**Solution:** Split into two pages:
- `concepts/pipeline-basics.md` - Input → Parse → Filter → Output (core concepts)
- `concepts/pipeline-advanced.md` - Spans, parallel mode, batching, context lines
**Or:** Add executive summary/TL;DR at top with jump links.
---
### 9. Add "When NOT to Use Kelora" Section
**Problem:** Users waste time trying to use tool for wrong use cases.
**Solution:** Add to landing page or Concepts:
```markdown
## When NOT to Use Kelora
Kelora excels at streaming log analysis with custom logic, but isn't ideal for:
- **Quick grep/awk jobs** - Use grep/awk for simple pattern matching
- **SQL-style joins** - Use DuckDB, ClickHouse, or SQLite for relational queries
- **Real-time dashboards** - Use Grafana, Kibana, or Datadog for visualization
- **Log storage** - Use Elasticsearch, Loki, or S3 for archival
- **Binary log formats** - Kelora works with text-based formats only
**Best for:** Ad-hoc analysis, ETL pipelines, streaming transforms, custom parsing
```
---
### 10. Create Troubleshooting Guide
**Problem:** No systematic debugging approach for users stuck on errors.
**Solution:** Create `docs/troubleshooting.md` with:
**Debugging Workflow:**
1. Use `-F inspect` to see field types
2. Use `--verbose` to see error messages
3. Use `--stats` to see processing summary
4. Test with `--take 10` for quick iteration
5. Use `--strict` to fail fast on errors
**Common Error Patterns:**
- Parse errors → Check format selection
- Filter errors → Check field existence with `e.has("field")`
- Type errors → Use safe conversions like `.to_int_or(0)`
- Performance issues → Consider `--parallel` for large files
- Empty output → Check level filtering, format detection
**Exit Codes:**
- 0 = Success
- 1 = Parse/runtime errors
- 2 = Invalid CLI usage
---
## Low Priority (Nice to Have)
### 11. Comparison Table with Other Tools
Add to landing page or concepts:
| grep | Pattern matching | Structured field access, transformations |
| awk | Column processing | Type-aware, nested fields, metrics |
| jq | JSON manipulation | Multi-format, filtering, aggregation |
| Python | Complex logic | No setup, streaming, CLI-first |
| Logstash | ETL pipelines | Lightweight, no JVM, Rhai scripting |
---
### 12. Video/GIF Walkthroughs
Create short screencasts for:
- Quickstart (5 min)
- Error triage workflow (3 min)
- Custom parsing (3 min)
---
### 13. Migration Guides
Create guides for users coming from:
- jq users → Kelora equivalents
- awk users → Kelora patterns
- grep users → Filtering approaches
---
### 14. Extended Examples Repository
Separate from how-tos, create `docs/examples/` with:
- One-liners for common tasks
- Script snippets library
- Copy-paste templates
---
## Content Issues to Address
### Landing Page (index.md)
**Problem:** "What It Does" section overlaps heavily with Quickstart.
**Solution Options:**
1. **Remove section** - Link directly to Quickstart instead
2. **Tell a story** - Make three examples show progression: Problem → Detection → Solution
3. **Keep one example** - Show most compelling use case, link to Quickstart for more
**Recommendation:** Keep one compelling example, link prominently to Quickstart.
---
### Core Concepts Introduced Too Late
**Problem:** "Event" mentioned everywhere but not explained until tutorial Part 2.5.
**Solution:** Add brief explanation on landing page:
```markdown
## How It Works
Kelora parses each log line into an **event** - a structured object (map) with fields you can access and manipulate using Rhai scripts.
For example, after parsing this JSON:
\`\`\`json
{"timestamp": "...", "level": "ERROR", "message": "..."}
\`\`\`
You can filter with `--filter 'e.level == "ERROR"'` where `e` is the event and `e.level` accesses the level field.
```
---
### Format Flag Inconsistency
**Problem:** Examples use `-j`, `-f json`, `--input-format json` interchangeably without stating equivalence.
**Solution:** Add to Quickstart after first command:
```markdown
!!! tip "Format Shortcuts"
`-j` is shorthand for `-f json`. These are equivalent:
- `kelora -j app.log`
- `kelora -f json app.log`
- `kelora --input-format json app.log`
This guide uses `-j` for brevity.
```
---
### Terminology Consistency
**Current inconsistencies:**
- "150+ functions" vs "150+ built-in functions" vs "150+ built-in Rhai functions"
- "Pipeline Model" vs "Processing Architecture"
- Mix of short/long flag notation
**Recommendation:**
- Use "150+ built-in functions" everywhere
- Pick one: "Processing Architecture" (clearer for new users)
- Short flags in examples, long flags in prose: "Use `-e` (short for `--exec`) to transform events"
---
### Development Approach Section
**Problem:** Appears on both README and landing page (index.md).
**Solution Options:**
1. Remove from landing page, keep in README only
2. Move to separate "About" page
3. Move to footer with link "About This Project"
**Recommendation:** Keep on landing page but move to bottom (after License). It's important context but shouldn't be in "Get Started" path.
---
## Missing Documentation
### For New Users
1. ✅ Glossary (HIGH PRIORITY)
2. Visual pipeline diagram
3. "When NOT to use Kelora"
4. Common error messages with solutions
5. FAQ section
### For Learning
6. Quick reference card (HIGH PRIORITY)
7. Comparison table with other tools
8. Common gotchas page
9. Troubleshooting guide (MEDIUM PRIORITY)
### For Operations
10. Performance tuning guide (part of consolidated performance page)
11. Debugging workflow (part of troubleshooting guide)
12. Exit code reference (exists but needs better linking)
### For Reference
13. Complete CLI flag table (single page with all flags)
14. Rhai script snippets library
---
## Structural Improvements
### Navigation Improvements
**Add to mkdocs.yml nav:**
```yaml
- Getting Started:
- Home: index.md
- Quickstart: quickstart.md
- Glossary: glossary.md
- Troubleshooting: troubleshooting.md
```
**Add footer links:**
- Quick Reference
- Glossary
- FAQ
- Troubleshooting
---
### Learning Path Improvements
**Current flow:** Home → Quickstart → Basics → Intro to Rhai → ...
**Enhancement:** Add "learning checkpoint" boxes at end of each tutorial:
```markdown
## ✓ You've Learned
- [ ] Specify formats with `-f` and `-j`
- [ ] Filter levels with `-l error,warn`
- [ ] Select fields with `-k field1,field2`
- [ ] Export with `-F csv` or `-J`
**Next:** Learn to write custom filters and transforms in [Introduction to Rhai](intro-to-rhai.md)
```
---
### Cross-Reference Improvements
Add consistent "See Also" sections with:
- Related concepts
- Related how-tos
- Related reference pages
Format:
```markdown
## See Also
**Concepts:**
- [Events and Fields](../concepts/events-and-fields.md) - Event structure details
**How-To:**
- [Triage Production Errors](../how-to/find-errors-in-logs.md) - Apply these basics
**Reference:**
- [CLI Reference](../reference/cli-reference.md) - Complete flag list
```
---
## Specific Page Improvements
### Quickstart
- Add explanation of "why each step matters" between three commands
- Add "Format Shortcuts" tip box
- Add "Common Issues" section at end
- Add learning checkpoint
### Basics Tutorial
- Add stage ordering warning prominently
- Add "Understanding Events" earlier (currently Part 2.5, should be Part 1.5)
- Add learning checkpoint at end
### Intro to Rhai Tutorial
- Emphasize type awareness earlier
- Add more examples of type conversion failures
- Add debugging workflow section
### Pipeline Model (Processing Architecture)
- Add TL;DR section at top
- Add visual diagram
- Consider splitting into basics/advanced
### How-To Guides
- Add "Prerequisites" section to each (what you need to know)
- Add "Time to complete" estimate
- Add "Common variations" for each recipe
### Reference Pages
- CLI Reference: Add single-table overview of all flags
- Functions Reference: Already excellent, no changes needed
- Add Quick Reference page (new)
---
## Implementation Order
### Phase 1: Critical Fixes (Week 1)
1. Create glossary.md
2. Fix title/nav inconsistencies
3. Add stage ordering callouts
4. Add common issues to Quickstart
5. Fix terminology consistency
### Phase 2: New Content (Week 2)
6. Create quick-reference.md
7. Create troubleshooting.md
8. Consolidate performance pages
9. Add visual pipeline diagram
10. Split pipeline-model.md
### Phase 3: Polish (Week 3)
11. Add "When NOT to use" section
12. Add learning checkpoints
13. Improve cross-references
14. Add comparison table
15. Update navigation structure
### Phase 4: Nice-to-Have (Future)
16. Video walkthroughs
17. Migration guides
18. Extended examples repository
19. FAQ page
---
## Success Metrics
**How to measure improvement:**
1. **Time to first success** - Can a new user run a successful filter in < 10 minutes?
2. **Concept clarity** - Do users understand "event", "stage", "format" before using them?
3. **Error recovery** - Can users debug common issues without external help?
4. **Feature discovery** - Can users find advanced features when needed?
**Before/After Test:**
- Have 3-5 new users try Quickstart
- Note where they get stuck
- Measure completion time
- Collect confusion points
- Repeat after improvements
---
## Notes
- Documentation is already very good - these are polish improvements
- Executable examples (markdown-exec) are a huge strength
- Tutorial quality is high - just needs better scaffolding
- Main gap is onboarding for complete newcomers
- Structure (Diátaxis) is excellent and should be preserved