rsigma 0.14.0

CLI for parsing, validating, linting and evaluating Sigma detection rules
# rsigma

[![CI](https://github.com/timescale/rsigma/actions/workflows/ci.yml/badge.svg)](https://github.com/timescale/rsigma/actions/workflows/ci.yml)

`rsigma` is a command-line interface for parsing, validating, linting, evaluating, and converting [Sigma](https://github.com/SigmaHQ/sigma) detection rules, inspecting their field usage, and running them as a long-running daemon.

This binary is part of the [rsigma workspace].

## Installation

```bash
cargo install rsigma
```

## Quick Start

```bash
# Single event (inline JSON)
rsigma engine eval -r path/to/rules/ -e '{"CommandLine": "cmd /c whoami"}'

# Stream NDJSON from stdin (auto-selected when piped)
cat events.ndjson | rsigma engine eval -r path/to/rules/

# Table view for interactive triage
rsigma engine eval -r path/to/rules/ -e @events.ndjson --output-format table

# CSV for spreadsheets / data tools
rsigma engine eval -r path/to/rules/ -e @events.ndjson --output-format csv

# Long-running daemon with hot-reload, health checks, and Prometheus metrics
hel run | rsigma engine daemon -r rules/ -p ecs_windows --api-addr 0.0.0.0:9090

# With a builtin pipeline (no external file needed)
rsigma engine eval -r rules/ -p ecs_windows -e '{"process.command_line": "whoami"}'

# Or use a custom pipeline YAML file
rsigma engine eval -r rules/ -p pipelines/custom.yml -e '{"src_ip": "10.0.0.1"}'

# Convert rules to backend-native queries
rsigma backend convert -r rules/ -t test

# Convert to PostgreSQL SQL
rsigma backend convert -r rules/ -t postgres

# List all fields referenced by rules (with optional pipeline mapping)
rsigma rule fields -r rules/ -p ecs_windows

# List available conversion backends
rsigma backend targets
```

## Global flags

These flags work with every subcommand, just as `--log-format` does. They also resolve from the YAML config and the `RSIGMA_*` environment layer (see [Output Formats](https://timescale.github.io/rsigma/reference/output/)).

| Flag | Default | Description |
|------|---------|-------------|
| `--output-format <FORMAT>` | TTY-aware (pretty `json` on a terminal, `ndjson` when piped) | One of `json`, `ndjson`, `table`, `csv`, `tsv`. `engine eval`, `rule fields`, and `rule lint` honour every value; `backend convert` honours `json` (wraps queries) and warns + falls back to raw text for `table`/`csv`/`tsv`. |
| `--color <CHOICE>` | `auto` | One of `auto`, `always`, `never`. `auto` honours `NO_COLOR` and disables colour when stdout is not a TTY. |
| `--quiet`, `-q` | off | Suppress every non-data line (progress, stats, fallback warnings). Errors still go to stderr. |
| `--no-stats` | off | Suppress only the trailing summary; progress messages still appear. |
| `--log-format <FORMAT>` | unset | When set, initialises a stderr `tracing` subscriber in `text` or `json`. Diagnostic logs only; does not affect stdout. |

## Subcommands

Commands are organized into four noun-led groups: `engine` (eval / daemon), `rule` (parse / validate / lint / fields / condition / stdin), `backend` (convert / targets / formats), and `pipeline` (resolve).

### Migrating from the old flat commands

Every flat top-level command still works as a hidden, undocumented forwarder. Invoking the old form prints a stderr warning and dispatches to the same implementation; stdout, exit codes, and every flag are unchanged. The aliases no longer appear in `rsigma --help`, but `rsigma <alias> --help` is still routable so existing scripts that introspect a subcommand keep working.

| Old (flat, deprecated) | New (grouped) |
|------------------------|---------------|
| `rsigma eval ...` | `rsigma engine eval ...` |
| `rsigma daemon ...` | `rsigma engine daemon ...` |
| `rsigma parse ...` | `rsigma rule parse ...` |
| `rsigma validate ...` | `rsigma rule validate ...` |
| `rsigma lint ...` | `rsigma rule lint ...` |
| `rsigma fields ...` | `rsigma rule fields ...` |
| `rsigma condition ...` | `rsigma rule condition ...` |
| `rsigma stdin ...` | `rsigma rule stdin ...` |
| `rsigma convert RULES ...` | `rsigma backend convert RULES ...` |
| `rsigma list-targets` | `rsigma backend targets` |
| `rsigma list-formats TARGET` | `rsigma backend formats TARGET` |
| `rsigma resolve ...` | `rsigma pipeline resolve ...` |

Deprecation timeline: flat aliases are **hidden** from `rsigma --help` but stay functional with a stderr migration warning, and will be **removed** in v1.0. Migrate at your convenience within that window.
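
When auditing existing scripts for the old flat form, the mapping in the table above can be encoded as a small lookup. The sketch below is only an illustration; the function name `migrate_cmd` is hypothetical and not part of rsigma.

```bash
# Print the grouped replacement for a deprecated flat subcommand.
# Mirrors the migration table; returns 1 for names that are not flat aliases.
migrate_cmd() {
  case "$1" in
    eval|daemon) echo "engine $1" ;;
    parse|validate|lint|fields|condition|stdin) echo "rule $1" ;;
    convert) echo "backend convert" ;;
    list-targets) echo "backend targets" ;;
    list-formats) echo "backend formats" ;;
    resolve) echo "pipeline resolve" ;;
    *) return 1 ;;
  esac
}

# Example: rewrite one invocation by hand.
# rsigma $(migrate_cmd lint) rules/    # runs: rsigma rule lint rules/
```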

### `config`: YAML configuration

Both `engine daemon` and `engine eval` accept their settings via a YAML config file in addition to CLI flags and environment variables. Precedence is **CLI flag > env > project file > user file > system file > compiled default**, applied per leaf (a project `.rsigmarc` that only sets `eval.rules` does not erase the rest of the user config).
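
The per-leaf rule can be pictured as "take the first layer that sets this particular key". The following is a minimal sketch of that idea in plain shell, not rsigma's actual implementation; `resolve_leaf` and the `eval.pipeline` key are hypothetical names used only for illustration.

```bash
# Resolve one config leaf: print the first non-empty candidate.
# Arguments are ordered from highest to lowest precedence:
#   CLI flag, env, project file, user file, system file, compiled default
resolve_leaf() {
  for candidate in "$@"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Hypothetical values: the project file sets only eval.rules and the user
# file sets only eval.pipeline. Each leaf is resolved independently, so the
# project file does not erase the user-level pipeline.
rules=$(resolve_leaf "" "" "./rules" "" "" "")
pipeline=$(resolve_leaf "" "" "" "ecs_windows" "" "")
printf 'eval.rules=%s eval.pipeline=%s\n' "$rules" "$pipeline"
```

To see what rsigma itself resolved, and from which layer, `rsigma config show` prints the effective config with the source of each leaf.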

```bash
# Scaffold a commented template at ./rsigma.yaml.
rsigma config init

# Discover files, deserialize, warn on unknown keys and inert sections.
rsigma config validate

# Print the effective config with the source of each leaf.
rsigma config show --for daemon

# Emit the JSON Schema (paired with a $schema header in the template).
rsigma config schema > rsigma.schema.json

# List the config files that would load.
rsigma config path

# Hot-reload a running daemon (POST /api/v1/reload, cross-platform).
rsigma config reload
```

`engine daemon` and `engine eval` also support `--config <PATH>` (load only that file) and `--dry-run` (print the effective section and exit `0`).

Discovery walks: `/etc/rsigma/config.yaml` → `~/.config/rsigma/config.yaml` → nearest `.rsigmarc` (walked up from CWD) → `./rsigma.yaml`. Override with `--config`. The full schema, environment-variable scheme (`RSIGMA_<SECTION>__<KEY>`), and secrets policy live in the [Configuration Reference](https://timescale.github.io/rsigma/reference/configuration/).

### `rule parse`: Parse a single rule

Parse a Sigma YAML file and output the AST as JSON.

| Argument | Type | Default | Description |
|----------|------|---------|-------------|
| `path` | positional | required | Path to a Sigma YAML file |
| `--pretty` / `-p` | flag | **true** | Pretty-print JSON output |

```bash
rsigma rule parse rule.yml            # print AST as pretty-printed JSON
rsigma rule parse rule.yml --pretty   # same (default)
```

Note: pretty-print is on by default and cannot be disabled.

### `rule validate`: Validate rules in a directory

Parse and compile all rules in a directory, reporting errors.

| Argument | Type | Default | Description |
|----------|------|---------|-------------|
| `path` | positional | required | Path to a directory of Sigma YAML files |
| `--verbose` / `-v` | flag | `false` | Show details for each file (parse errors, compile errors) |
| `--pipeline` / `-p` | repeatable | `[]` | Processing pipeline YAML file(s) to apply before compilation |
| `--resolve-sources` | flag | `false` | Also resolve dynamic pipeline sources during validation. Sources must be reachable (file/command/HTTP) for validation to pass |
| `--source` | repeatable | `[]` | External source file(s) or directory to load alongside pipeline-declared sources (for validating pipelines that reference external sources) |

```bash
rsigma rule validate path/to/rules/ -v              # verbose output
rsigma rule validate rules/ -p pipelines/ecs.yml    # validate with pipeline
rsigma rule validate rules/ -p dynamic.yml --resolve-sources  # validate + test source resolution
rsigma rule validate rules/ -p pipe.yml --source sources.yml --resolve-sources  # validate with external sources
```

### `rule lint`: Lint rules against the Sigma specification

Run 66 built-in lint rules with optional JSON schema validation.

| Argument | Type | Default | Description |
|----------|------|---------|-------------|
| `path` | positional | required | Path to a Sigma rule file or directory |
| `--schema` / `-s` | string | none | `"default"` to download the official schema (cached 7 days), or a path to a local JSON schema file |
| `--verbose` / `-v` | flag | `false` | Show details for all files, including those that pass |
| `--disable` | string | `""` | Comma-separated lint rule IDs to suppress |
| `--config` | path | none | Explicit path to `.rsigma-lint.yml` (otherwise auto-discovered by walking ancestor directories) |
| `--exclude` | string | none | Glob pattern for paths to skip (repeatable, relative to lint root) |
| `--tag-namespace` | string | none | Allow an additional tag namespace beyond the built-in spec set (repeatable). Tags using the namespace no longer trigger `unknown_tag_namespace`. Values are lowercased. |
| `--fix` | flag | `false` | Automatically apply safe fixes (lowercase keys, correct typos, remove duplicates, etc.) |
| `--fail-level` | string | `"error"` | Minimum severity for non-zero exit: `error` (default), `warning`, or `info` |

```bash
rsigma rule lint path/to/rules/                     # lint all rules
rsigma rule lint path/to/rules/ -v                  # verbose (show passing files + info-only)
rsigma rule lint path/to/rules/ --schema default    # + JSON schema validation (downloads + caches)
rsigma rule lint rule.yml --schema my-schema.json   # local JSON schema
rsigma rule lint path/to/rules/ --color always      # force color (global flag)
rsigma rule lint path/to/rules/ --output-format json # machine-readable {summary,findings}
rsigma rule lint rules/ --disable missing_description,missing_author  # suppress specific rules
rsigma rule lint rules/ --config my-lint.yml        # explicit config file
rsigma rule lint rules/ --exclude "config/**"       # skip non-rule files
rsigma rule lint rules/ --exclude "config/**" --exclude "**/unsupported/**"  # multiple patterns
rsigma rule lint rules/ --tag-namespace myorg --tag-namespace internal  # allow custom tag namespaces
rsigma rule lint rules/ --fix                       # auto-fix safe issues
rsigma rule lint rules/ --fail-level warning        # CI: fail on warnings too
rsigma rule lint rules/ --fail-level info           # CI: fail on any finding
```

**Lint output summary format:**

```
Checked N file(s): X passed, Y failed (A error(s), B warning(s), C info(s))
```
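
If a script only has the text summary to work with, the counters can be pulled out of that line with `sed`. This is a hedged sketch that assumes the summary keeps exactly the shape shown above; the helper name `lint_count` and the sample numbers are made up for the example. For anything long-lived, `--output-format json` is likely the sturdier input.

```bash
# Extract one counter from a lint summary line.
# $1 = summary line, $2 = one of: error, warning, info
lint_count() {
  printf '%s\n' "$1" | sed -n "s/.*[(, ]\([0-9][0-9]*\) $2(s).*/\1/p"
}

# Sample line with invented numbers, following the documented format.
summary='Checked 12 file(s): 9 passed, 3 failed (4 error(s), 2 warning(s), 1 info(s))'
errors=$(lint_count "$summary" error)
warnings=$(lint_count "$summary" warning)
printf 'errors=%s warnings=%s\n' "$errors" "$warnings"
```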

**Schema validation skips** documents with `action: global`, `action: reset`, or `action: repeat` (action fragments).

### `engine daemon`: Run as a long-running detection service

Run rsigma as a long-running daemon that continuously reads NDJSON from stdin, evaluates against rules, writes matches to stdout, and exposes health/metrics/management APIs over HTTP.

Unlike `engine eval`, the daemon stays alive after stdin reaches EOF and supports hot-reload: adding, modifying, or removing `.yml`/`.yaml` files in the rules directory or any pipeline file passed via `-p` triggers an automatic reload (rules and pipelines are re-read together). SIGHUP and the `/api/v1/reload` endpoint also trigger reloads. The daemon is designed for production deployment behind a log collector (e.g. `hel run | rsigma engine daemon ...`) or an event bus.

> [!TIP]
> Correlation rules also work in `engine eval` mode within a single run (via stdin or `@file`), but `engine daemon` mode is recommended for continuous stateful tracking with hot-reload and state persistence.

| Argument | Type | Default | Description |
|----------|------|---------|-------------|
| `--rules` / `-r` | path | required | Path to Sigma rule file or directory |
| `--pipeline` / `-p` | repeatable | `[]` | Processing pipeline YAML file(s), applied in priority order |
| `--input` | string | `"stdin"` | Event input source: `stdin`, `http`, or `nats://<host>:<port>/<subject>` |
| `--output` | repeatable | `["stdout"]` | Detection output sink (fan-out): `stdout`, `file://<path>`, `nats://<host>:<port>/<subject>` |
| `--input-format` | string | `"auto"` | Input log format: `auto`, `json`, `syslog`, `plain`, `logfmt`\*, `cef`\* |
| `--syslog-tz` | string | `"+00:00"` | Default timezone for RFC 3164 syslog (e.g. `+05:00`, `-08:00`) |
| `--jq` | string | none | jq filter to extract event payload (conflicts with `--jsonpath`) |
| `--jsonpath` | string | none | JSONPath (RFC 9535) query (conflicts with `--jq`) |
| `--include-event` | flag | `false` | Include full event JSON in each detection match |
| `--pretty` | flag | `false` | Pretty-print JSON output |
| `--api-addr` | string | `0.0.0.0:9090` | Address for health, metrics, and management API server |
| `--suppress` | string | none | Suppression window for correlation alerts (e.g. `5m`, `1h`) |
| `--action` | string | none | Action taken after a correlation fires: `alert` or `reset` |
| `--no-detections` | flag | `false` | Suppress detection-level output (only show correlation alerts) |
| `--correlation-event-mode` | string | `"none"` | `none`, `full`, or `refs` |
| `--max-correlation-events` | integer | **10** | Max events stored per correlation window |
| `--timestamp-field` | repeatable | `[]` | Event field(s) for timestamp extraction |
| `--bloom-prefilter` | flag | `false` | Enable bloom-filter pre-filtering of positive substring matchers (workload-dependent; see `crates/rsigma-eval/README.md`) |
| `--bloom-max-bytes` | integer | **1048576** | Memory budget for the bloom index (no effect without `--bloom-prefilter`) |
| `--cross-rule-ac` | flag | `false` | Enable cross-rule Aho-Corasick pre-filter (requires `--features daachorse-index`; see `crates/rsigma-eval/README.md`) |
| `--observe-fields` | flag | `false` | Record the field keys of every event evaluated by the engine so `/api/v1/fields*` can report gap and broken-coverage signals. Off by default; when off the engine task does not iterate event fields at all |
| `--observe-fields-max-keys` | integer | **10000** | Hard ceiling on distinct field names tracked by the observer. Overflow drops are counted via `rsigma_fields_observer_overflow_dropped_total`. No effect without `--observe-fields` |
| `--buffer-size` | integer | **10000** | Bounded channel capacity for source-to-engine and engine-to-sink queues |
| `--batch-size` | integer | **1** | Maximum events per engine lock acquisition (reduces mutex overhead under load) |
| `--drain-timeout` | integer | **5** | Seconds to wait for in-flight events to drain on shutdown |
| `--dlq` | string | none | Dead-letter queue: `stdout`, `file://<path>`, or `nats://<host>:<port>/<subject>` |
| `--state-db` | path | none | Path to SQLite database for persisting correlation state across restarts |
| `--state-save-interval` | integer | **30** | Seconds between periodic state snapshots (only with `--state-db`) |
| `--clear-state` | flag | `false` | Clear correlation state on startup (conflicts with `--keep-state`) |
| `--keep-state` | flag | `false` | Force restore correlation state on startup, even during replay (conflicts with `--clear-state`) |
| `--timestamp-fallback` | string | `"wallclock"` | `wallclock` (substitute current time) or `skip` (omit from correlation) when events lack parseable timestamps |
| `--source` | repeatable | `[]` | External source file(s) or directory of source files. Loads dynamic source declarations independently of pipeline files. See [External source files](#external-source-files) |
| `--allow-remote-include` | flag | `false` | Allow `include` directives in dynamic pipelines to reference remote (HTTP/NATS) sources. Local sources (file/command) are always permitted |

**NATS flags** (require `daemon-nats` feature):

| Argument | Type | Default | Description |
|----------|------|---------|-------------|
| `--replay-from-sequence` | integer | none | Replay from a specific JetStream stream sequence number |
| `--replay-from-time` | string | none | Replay from a timestamp (ISO 8601, e.g. `2026-01-15T10:00:00Z`) |
| `--replay-from-latest` | flag | `false` | Start from the latest message, skipping stream history |
| `--consumer-group` | string | none | Shared durable consumer name for load balancing across daemon instances (env: `RSIGMA_CONSUMER_GROUP`) |
| `--nats-creds` | path | none | Credentials file (`.creds`) for JWT + NKey auth (env: `NATS_CREDS`) |
| `--nats-token` | string | none | Authentication token (env: `NATS_TOKEN`) |
| `--nats-user` | string | none | Username (requires `--nats-password`, env: `NATS_USER`) |
| `--nats-password` | string | none | Password (requires `--nats-user`, env: `NATS_PASSWORD`) |
| `--nats-nkey` | string | none | NKey seed (env: `NATS_NKEY`) |
| `--nats-tls-cert` | path | none | Client certificate for mutual TLS (requires `--nats-tls-key`) |
| `--nats-tls-key` | path | none | Client private key for mutual TLS (requires `--nats-tls-cert`) |
| `--nats-require-tls` | flag | `false` | Require TLS on NATS connections |

**TLS flags** (require `daemon-tls` feature):

| Argument | Type | Default | Description |
|----------|------|---------|-------------|
| `--tls-cert` | path | none | PEM-encoded leaf certificate (chain) for the API listener. Requires `--tls-key`. |
| `--tls-key` | path | none | PEM-encoded private key (PKCS#8, PKCS#1, or SEC1). Requires `--tls-cert`. |
| `--tls-key-password` | string | none | Password for an encrypted `--tls-key` (env: `RSIGMA_TLS_KEY_PASSWORD`). Currently rejected with a clear hint pointing at `openssl rsa` for offline decryption. |
| `--tls-client-ca` | path | none | PEM bundle of trusted CAs for inbound client certificate verification (mTLS). |
| `--tls-min-version` | string | `"1.3"` | Minimum TLS protocol version: `1.2` or `1.3`. |
| `--allow-plaintext` | flag | `false` | Permit plaintext on a non-loopback `--api-addr`. Loopback always allows plaintext. |

When the `daemon-tls` feature is built in, the daemon refuses to start on a non-loopback `--api-addr` without `--tls-cert`/`--tls-key` or an explicit `--allow-plaintext` opt-in. With TLS configured, the same socket serves HTTP REST, `/metrics`, OTLP/HTTP, and OTLP/gRPC over a single TLS connection via ALPN (advertises both `h2` and `http/1.1`). The crypto provider is `aws-lc-rs`, matching the NATS client TLS path. Certificate hot-reload is cross-platform: the reload triggers listed below (file changes, `SIGHUP`, and `POST /api/v1/reload`) re-read the cert/key from disk and atomically swap the active `rustls::ServerConfig` via `Arc<ArcSwap<…>>` without dropping in-flight TLS connections.
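
The startup rule in the first sentence can be restated as a small decision function. The sketch below only mirrors the documented behaviour for pre-flight checks in deployment scripts; it is not the daemon's code, the helper names are hypothetical, and the loopback test is a simplification that recognises only `127.*`, `localhost`, and `[::1]`.

```bash
# Return 0 if the host part of HOST:PORT looks like a loopback address.
is_loopback() {
  case "${1%:*}" in
    127.*|localhost|"[::1]") return 0 ;;
    *) return 1 ;;
  esac
}

# $1 = --api-addr value
# $2 = "yes" if both --tls-cert and --tls-key are set
# $3 = "yes" if --allow-plaintext is set
startup_allowed() {
  is_loopback "$1" && return 0   # loopback always allows plaintext
  [ "$2" = "yes" ] && return 0   # TLS configured
  [ "$3" = "yes" ] && return 0   # explicit plaintext opt-in
  return 1
}
```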

\* Feature-gated `--input-format` values: `logfmt` requires the `logfmt` feature, `cef` requires the `cef` feature.

**Usage:**

```bash
# Basic daemon: stream events, detect, output matches
hel run | rsigma engine daemon -r rules/ -p ecs.yml

# Accept events via HTTP POST instead of stdin
rsigma engine daemon -r rules/ --input http
# Then: curl -X POST http://localhost:9090/api/v1/events -d '{"CommandLine":"whoami"}'

# NATS JetStream source and sink
rsigma engine daemon -r rules/ --input nats://localhost:4222/events.> --output nats://localhost:4222/detections

# Fan-out: write detections to both stdout and a file
hel run | rsigma engine daemon -r rules/ --output stdout --output file:///tmp/detections.ndjson

# NATS with authentication (credentials file)
rsigma engine daemon -r rules/ --input nats://nats.example.com:4222/events.> --nats-creds /etc/rsigma/nats.creds

# NATS with token auth (via environment variable)
NATS_TOKEN=secret rsigma engine daemon -r rules/ --input nats://localhost:4222/events.>

# NATS with mutual TLS
rsigma engine daemon -r rules/ --input nats://localhost:4222/events.> \
  --nats-tls-cert /etc/rsigma/client.pem --nats-tls-key /etc/rsigma/client-key.pem --nats-require-tls

# Dead-letter queue for failed events
rsigma engine daemon -r rules/ --input nats://localhost:4222/events.> \
  --dlq file:///var/log/rsigma-dlq.ndjson

# Replay from a specific stream sequence
rsigma engine daemon -r rules/ --input nats://localhost:4222/events.> --replay-from-sequence 42

# Replay from a point in time
rsigma engine daemon -r rules/ --input nats://localhost:4222/events.> --replay-from-time 2026-04-30T00:00:00Z

# Start from the latest message, ignoring history
rsigma engine daemon -r rules/ --input nats://localhost:4222/events.> --replay-from-latest

# Consumer groups for horizontal scaling
rsigma engine daemon -r rules/ --input nats://localhost:4222/events.> --consumer-group detection-workers

# Terminate TLS in-process (requires the daemon-tls feature)
rsigma engine daemon -r rules/ --input http --api-addr 0.0.0.0:9090 \
  --tls-cert /etc/rsigma/tls/server.crt \
  --tls-key  /etc/rsigma/tls/server.key

# Mutual TLS: every agent must present a CA-signed client cert
rsigma engine daemon -r rules/ --input http --api-addr 0.0.0.0:9090 \
  --tls-cert /etc/rsigma/tls/server.crt \
  --tls-key  /etc/rsigma/tls/server.key \
  --tls-client-ca /etc/rsigma/tls/clients-ca.crt

# With SQLite state persistence (correlation state survives restarts)
hel run | rsigma engine daemon -r rules/ -p ecs.yml --state-db ./rsigma-state.db

# Force clear state on startup (ignore any saved state)
rsigma engine daemon -r rules/ --state-db ./state.db --clear-state

# Force restore state during replay (forward catch-up scenario)
rsigma engine daemon -r rules/ --input nats://localhost:4222/events.> \
  --state-db ./state.db --replay-from-sequence 1001 --keep-state

# Skip events without timestamps for correlation (forensic replay)
rsigma engine daemon -r rules/ --timestamp-fallback skip

# Tune pipeline: micro-batch 64 events per lock, 50K buffer, 10s drain on shutdown
rsigma engine daemon -r rules/ --batch-size 64 --buffer-size 50000 --drain-timeout 10

# With all options
rsigma engine daemon \
  -r rules/ \
  -p ecs.yml \
  --jq '.event' \
  --suppress 5m \
  --action reset \
  --api-addr 0.0.0.0:9090 \
  --state-db /var/lib/rsigma/state.db
```
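
A wrapper script that starts the daemon and then sends events may want to wait until rules are loaded. Since `/readyz` returns 200 when rules are loaded and 503 otherwise (see the endpoint table below), a polling loop is one way to do that. This is a sketch under assumptions: the function names are hypothetical, and the address `127.0.0.1:9090` with plain HTTP is only an example.

```bash
# Poll a status probe until it prints 200 or the attempts run out.
# $1 = command that prints an HTTP status code, $2 = max attempts
wait_ready() {
  probe=$1
  attempts=$2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if [ "$($probe)" = "200" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Probe for a local daemon; the address is an assumption for this example.
readyz_status() {
  curl -s -o /dev/null -w '%{http_code}' http://127.0.0.1:9090/readyz
}

# Usage: wait_ready readyz_status 30 && echo "rules loaded"
```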

**HTTP endpoints:**

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/healthz` | GET | Always returns `{"status": "ok"}` |
| `/readyz` | GET | Returns 200 when rules are loaded, 503 otherwise |
| `/metrics` | GET | Prometheus metrics (events processed, matches, latency, rules loaded, etc.) |
| `/api/v1/status` | GET | Full daemon status (rules, state entries, counters, uptime) |
| `/api/v1/rules` | GET | Rule counts and rules path |
| `/api/v1/reload` | POST | Trigger a manual reload of rules, pipelines, enrichers, and (with `daemon-tls`) the TLS certificate. Cross-platform alternative to `SIGHUP`. |
| `/api/v1/events` | POST | Ingest events (NDJSON body, one event per line). Only available with `--input http` |
| `/api/v1/sources` | GET | List dynamic sources and their resolution status |
| `/api/v1/sources/resolve` | POST | Trigger re-resolution of all dynamic sources (or specific ones via request body) |
| `/api/v1/sources/cache/{source_id}` | DELETE | Invalidate the cached value for a specific source |
| `/api/v1/fields` | GET | Combined snapshot with summary, unknown (gap signal), and missing (broken coverage). Returns 503 unless `--observe-fields` is set. Paginated via `?limit=&offset=` |
| `/api/v1/fields/unknown` | GET | Event fields no rule references, sorted by descending count. Requires `--observe-fields`. Paginated |
| `/api/v1/fields/missing` | GET | Rule fields never observed in events, with sample rule titles. Requires `--observe-fields`. Paginated |
| `/api/v1/fields/observer` | DELETE | Clear the observer's counters and return `{previous_keys, previous_events}`. Requires `--observe-fields` |
| `/v1/logs` | POST | OTLP log ingestion (`application/x-protobuf` or `application/json`, gzip supported). Requires `daemon-otlp` feature |

**OTLP log ingestion** (requires `daemon-otlp` feature):

When built with `daemon-otlp`, the daemon accepts [OpenTelemetry Protocol (OTLP)](https://opentelemetry.io/docs/specs/otlp/) log export requests on `/v1/logs` (HTTP) and via gRPC on the same port. The endpoint is always active regardless of `--input`. This lets you point any OpenTelemetry-compatible agent (Grafana Alloy, Vector, Fluent Bit, OTel Collector) at rsigma for real-time detection.

Both transports support gzip compression. OTLP `LogRecord` fields are flattened to JSON: resource attributes are prefixed with `resource.`, log attributes are unprefixed, and key-value map bodies are flattened to top-level fields for direct Sigma rule matching.

```bash
# Send OTLP logs via protobuf
curl -X POST http://localhost:9090/v1/logs \
  -H 'Content-Type: application/x-protobuf' \
  --data-binary @export_logs_request.pb

# Send OTLP logs via JSON
curl -X POST http://localhost:9090/v1/logs \
  -H 'Content-Type: application/json' \
  -d '{"resourceLogs":[...]}'

# Grafana Alloy config (forward to rsigma)
# otelcol.exporter.otlphttp "rsigma" {
#   client { endpoint = "http://rsigma:9090" }
# }

# Vector config
# [sinks.rsigma]
# type = "http"
# uri = "http://rsigma:9090/v1/logs"
# encoding.codec = "native"  # protobuf
```

**Hot-reload triggers:**

- File system changes to `.yml`/`.yaml` files in the rules directory or to any pipeline file passed via `-p` (debounced 500 ms)
- `SIGHUP` signal (Unix only) -- triggers both rule reload and dynamic source re-resolution
- `POST /api/v1/reload` -- cross-platform; the recommended cert-rotation path on Windows
- `POST /api/v1/sources/resolve` -- re-resolves dynamic sources without reloading rules
- NATS control subject `rsigma.control.resolve` (when using NATS sources) -- payload can be empty (resolve all) or `{"source_id": "..."}` (resolve one)

The first three triggers all funnel through one debounced reload task that re-reads rules, pipelines, enrichers, and (when `daemon-tls` is built in) the TLS certificate and key. A failed reload of any component bumps `rsigma_reloads_failed_total`, logs an error, and leaves the previous in-memory state active so a typo on disk cannot black-hole the daemon.
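
The two manual triggers can be wrapped in small helpers for operational scripts. This is a minimal sketch: the function names are hypothetical, the URL assumes a plaintext listener (use `https://` when TLS is configured), and the PID has to be supplied by the caller. `rsigma config reload` covers the HTTP path as a built-in command.

```bash
# Build the reload URL for a daemon API address (HOST:PORT).
reload_url() {
  printf 'http://%s/api/v1/reload\n' "$1"
}

# Cross-platform trigger: POST to the management API.
# $1 = API address, e.g. 127.0.0.1:9090
reload_via_http() {
  curl -fsS -X POST "$(reload_url "$1")"
}

# Unix-only trigger: send SIGHUP to the daemon process.
# $1 = PID of the running daemon
reload_via_signal() {
  kill -HUP "$1"
}
```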

**Prometheus metrics:**

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `rsigma_events_processed_total` | counter | | Total events processed |
| `rsigma_detection_matches_total` | counter | | Total detection matches (aggregate) |
| `rsigma_detection_matches_by_rule_total` | counter | `rule_title`, `level` | Detection matches per rule |
| `rsigma_correlation_matches_total` | counter | | Total correlation matches (aggregate) |
| `rsigma_correlation_matches_by_rule_total` | counter | `rule_title`, `level`, `correlation_type` | Correlation matches per rule |
| `rsigma_events_parse_errors_total` | counter | | JSON parse errors on input |
| `rsigma_detection_rules_loaded` | gauge | | Number of detection rules loaded |
| `rsigma_correlation_rules_loaded` | gauge | | Number of correlation rules loaded |
| `rsigma_correlation_state_entries` | gauge | | Active correlation state entries |
| `rsigma_reloads_total` | counter | | Total rule reload attempts |
| `rsigma_reloads_failed_total` | counter | | Failed rule reload attempts |
| `rsigma_event_processing_seconds` | histogram | | Per-event processing latency |
| `rsigma_pipeline_latency_seconds` | histogram | | End-to-end latency from event dequeue to sink send |
| `rsigma_batch_size` | histogram | | Number of events processed per batch |
| `rsigma_input_queue_depth` | gauge | | Current events buffered in source-to-engine channel |
| `rsigma_output_queue_depth` | gauge | | Current results buffered in engine-to-sink channel |
| `rsigma_back_pressure_events_total` | counter | | Times a source was blocked on a full event channel |
| `rsigma_uptime_seconds` | gauge | | Daemon uptime in seconds |
| `rsigma_dlq_events_total` | counter | | Events routed to the dead-letter queue |
| `rsigma_source_resolves_total` | counter | `source_id` | Total dynamic source resolution attempts |
| `rsigma_source_resolve_errors_total` | counter | `source_id` | Total dynamic source resolution failures |
| `rsigma_source_resolve_latency_seconds` | histogram | | Source resolution latency |
| `rsigma_source_cache_hits_total` | counter | | Times a cached value was served instead of fetching fresh |
| `rsigma_source_last_resolved_timestamp` | gauge | `source_id` | Unix timestamp of last successful resolution per source |
| `rsigma_otlp_requests_total` | counter | `transport`, `encoding` | OTLP export requests received (requires `daemon-otlp`) |
| `rsigma_otlp_log_records_total` | counter | | Log records ingested via OTLP (requires `daemon-otlp`) |
| `rsigma_otlp_errors_total` | counter | `transport`, `reason` | OTLP request errors (requires `daemon-otlp`) |
| `rsigma_tls_certificate_expiry_seconds` | gauge | | Seconds until the active TLS server certificate's `not_after` (signed; negative once expired). Requires `daemon-tls` |
| `rsigma_tls_active_connections` | gauge | | Currently active TLS-terminated connections on the API listener (requires `daemon-tls`) |

The per-rule labeled counters (`_by_rule_total`) enable per-rule alerting in Grafana or other Prometheus-based tools. A single PromQL query like `increase(rsigma_detection_matches_by_rule_total[5m]) > 0` produces separate alert instances for each `{rule_title, level}` combination. The aggregate counters (`_total`) remain for lightweight total-throughput monitoring.

**Logging:** structured JSON to stderr, configurable via `RUST_LOG` environment variable (default: `info`). Useful filter targets:

- `RUST_LOG=info,tower_http=debug` -- HTTP API access logs (method, URI, status, latency) for every request to `/api/v1/*`, `/healthz`, `/metrics`.
- `RUST_LOG=info,rsigma=debug` -- verbose batch processing (`Batch processed` events with `batch_size`, `matches`, `elapsed_ms`), DLQ routing, source resolution timing, state snapshot duration, and OTLP per-request fields.
- `RUST_LOG=info,rsigma_runtime::sources=debug` -- dynamic source resolution and refresh scheduler details.
- `RUST_LOG=info,rsigma_eval=debug` -- correlation engine internals. Chain depth limit and hard-cap eviction warnings are already emitted at `warn`, so they appear without this filter.

The `tracing` spans installed on hot paths (batch processing, source resolution, OTLP ingest, rule loading) double as profiling hooks that `tokio-console` or `tracing-timing` can consume without changes to the instrumented code; only the corresponding subscriber layer needs to be swapped in.

**CLI subcommand logging:** non-daemon subcommands (everything outside `rsigma engine daemon`: `rsigma engine eval`, the `rsigma rule *` group, `rsigma backend *`, and `rsigma pipeline resolve`) emit only human-readable output on stdout/stderr by default. Pass the global `--log-format json` (or `--log-format text`) to additionally install a tracing subscriber on stderr for CI or log-aggregation use. Verbosity follows `RUST_LOG` (default `info`). The human-readable output is unchanged when the flag is set.

**State persistence:** when `--state-db` is set, correlation state (window entries, suppression timestamps, event buffers) is persisted to a SQLite database. State is loaded on startup, saved periodically (default every 30s, configurable via `--state-save-interval`), and saved on graceful shutdown. This allows correlation windows to survive daemon restarts. For example, an `event_count` correlation that saw 2 of 3 required events before a restart will resume from 2 after restarting. The database uses WAL journal mode and stores a single JSON snapshot row. Correlation entries are keyed by stable rule identifiers (id/name), so state survives rule reloads even if internal ordering changes.
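A single-row JSON snapshot in a WAL-mode SQLite database can be sketched like this. The table name, column names, and the shape of the state dictionary are assumptions made for illustration; the actual schema used by `--state-db` is not documented in this section.

```python
import json
import sqlite3


def open_state_db(path: str) -> sqlite3.Connection:
    """Open the database, enable WAL, and create the one-row snapshot table."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS snapshot ("
        "id INTEGER PRIMARY KEY CHECK (id = 1), data TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def save_snapshot(conn: sqlite3.Connection, state: dict) -> None:
    """Overwrite the single snapshot row with the current state."""
    conn.execute(
        "INSERT OR REPLACE INTO snapshot (id, data) VALUES (1, ?)",
        (json.dumps(state),),
    )
    conn.commit()


def load_snapshot(conn: sqlite3.Connection) -> dict:
    """Return the stored state, or an empty dict on first start."""
    row = conn.execute("SELECT data FROM snapshot WHERE id = 1").fetchone()
    return json.loads(row[0]) if row else {}
```

Keying the state dictionary by rule id or name (for example `{"brute_force": {"count": 2}}`) is what lets an entry be matched back to its rule after a reload changes internal ordering.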

**State restore during replay:** when restarting with a NATS replay flag (`--replay-from-sequence`, `--replay-from-time`, `--replay-from-latest`), the daemon automatically decides whether to restore or clear correlation state based on the replay direction. The last-acked NATS stream sequence and timestamp are stored in SQLite alongside the snapshot. If the replay starts after the stored position (forward catch-up), state is restored safely. If the replay starts at or before the stored position (backward replay or forensic investigation), state is cleared to prevent double-counting. Use `--keep-state` to override the automatic decision and always restore, or `--clear-state` to always start fresh.
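The restore-or-clear decision can be expressed as a small function. This sketch covers only the sequence-based case; the function name, the treatment of a missing stored position, and the rejection of both override flags together are assumptions.

```python
from typing import Optional


def should_restore_state(
    replay_start_seq: int,
    stored_seq: Optional[int],
    keep_state: bool = False,
    clear_state: bool = False,
) -> bool:
    """Return True to restore correlation state, False to start fresh."""
    if keep_state and clear_state:
        # Assumption: the two override flags are mutually exclusive.
        raise ValueError("--keep-state and --clear-state cannot be combined")
    if keep_state:
        return True
    if clear_state:
        return False
    if stored_seq is None:
        return False  # assumption: nothing stored means nothing to restore
    # Forward catch-up restores; a replay at or before the stored position clears.
    return replay_start_seq > stored_seq
```

For example, with a stored position of 100, a replay from sequence 101 restores state, while a replay from sequence 100 or 50 clears it unless `keep_state` is set.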

**Timestamp fallback:** the `--timestamp-fallback` flag controls how correlation windows handle events without parseable timestamp fields. The default `wallclock` substitutes the current time (suitable for live streaming). The `skip` mode omits the event from correlation state updates while still firing stateless detections, which prevents wall-clock times from corrupting temporal windows during forensic replay of historical data.
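The two fallback modes can be sketched as a function that returns the timestamp to use for correlation, or `None` when the event should be left out of correlation state. The function name and the use of epoch seconds are assumptions for illustration.

```python
import time
from typing import Optional


def correlation_timestamp(
    event_ts: Optional[float], fallback: str = "wallclock"
) -> Optional[float]:
    """Pick the timestamp used for correlation windowing.

    event_ts is the parsed event time in epoch seconds, or None when no
    timestamp field could be parsed. A None result means: skip the
    correlation update (stateless detections are unaffected).
    """
    if event_ts is not None:
        return event_ts
    if fallback == "wallclock":
        return time.time()  # live streaming: "now" is a reasonable stand-in
    if fallback == "skip":
        return None         # replay: avoid mixing wall-clock time into old windows
    raise ValueError(f"unknown timestamp fallback: {fallback}")
```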

**At-least-once delivery:** when using NATS JetStream input, messages are held in an `AckToken` until the sink confirms delivery. If the daemon crashes before acknowledging, NATS redelivers the message after the consumer's `ack_wait` expires.

**Dead-letter queue:** events that fail processing (parse errors, sink delivery failures) are routed to the `--dlq` target instead of being silently discarded. Each DLQ entry is a JSON object containing `original_event`, `error`, and `timestamp`.
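A DLQ entry with the three documented keys could be built as follows. The helper name is hypothetical, and two details are assumptions: the original event is carried as the raw input string, and the timestamp is rendered in RFC 3339 form.

```python
import json
from datetime import datetime, timezone


def dlq_entry(original_event: str, error: str) -> str:
    """Serialize one dead-letter record as a single JSON line."""
    return json.dumps({
        "original_event": original_event,  # assumption: raw input, even if unparseable
        "error": error,
        "timestamp": datetime.now(timezone.utc).isoformat(),  # assumption: RFC 3339
    })
```

Keeping the raw input in the entry makes it possible to fix the cause and feed the same event back in later.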

**Consumer groups:** the `--consumer-group` flag sets a shared durable consumer name. Multiple daemon instances using the same group pull from a single JetStream consumer, and NATS distributes messages for load balancing. When not specified, the consumer name is derived from the subject.

**Feature flags:** the daemon subcommand requires the `daemon` feature (enabled by default). NATS flags require the `daemon-nats` feature. OTLP log ingestion (HTTP and gRPC) requires the `daemon-otlp` feature. To build without daemon dependencies: `cargo build --no-default-features`.

### `engine eval`: Evaluate events against rules

Evaluate JSON events against Sigma detection and correlation rules.

> [!TIP]
> Eval mode builds correlation state in memory for the duration of a single run, so correlation rules fire when multiple events are processed together (via stdin or `@file`). State is not persisted between runs. For continuous correlation over time, use `daemon` mode.

| Argument | Type | Default | Description |
|----------|------|---------|-------------|
| `--rules` / `-r` | path | required | Path to Sigma rule file or directory |
| `--event` / `-e` | string | none | A single event as a JSON string, or `@path` to read from a file. Supports NDJSON files and `.evtx` (Windows Event Log) files (requires `evtx` feature). If omitted, reads NDJSON from stdin |
| `--pretty` | flag | `false` | Pretty-print JSON output |
| `--pipeline` / `-p` | repeatable | `[]` | Processing pipeline YAML file(s), applied in priority order |
| `--jq` | string | none | jq filter to extract event payload (conflicts with `--jsonpath`) |
| `--jsonpath` | string | none | JSONPath (RFC 9535) query (conflicts with `--jq`) |
| `--suppress` | string | none | Suppression window for correlation alerts (e.g. `5m`, `1h`, `30s`) |
| `--action` | string | none | `alert` or `reset`, the action taken after correlation fires |
| `--no-detections` | flag | `false` | Suppress detection-level output (only show correlation alerts) |
| `--include-event` | flag | `false` | Include full event JSON in each detection match |
| `--correlation-event-mode` | string | `"none"` | `none`, `full`, or `refs` |
| `--max-correlation-events` | integer | `10` | Max events stored per correlation window |
| `--timestamp-field` | repeatable | `[]` | Event field(s) for timestamp extraction (prepended to the default list) |
| `--input-format` | string | `"auto"` | Input log format: `auto`, `json`, `syslog`, `plain`, `logfmt`\*, `cef`\* |
| `--syslog-tz` | string | `"+00:00"` | Default timezone for RFC 3164 syslog (e.g. `+05:00`, `-08:00`) |
| `--fail-on-detection` | flag | `false` | Exit with code 1 when any detection or correlation fires. Useful for CI/CD pipelines |
| `--bloom-prefilter` | flag | `false` | Enable bloom-filter pre-filtering of positive substring matchers (see `crates/rsigma-eval/README.md` for the trade-off) |
| `--bloom-max-bytes` | integer | `1048576` | Memory budget for the bloom index (no effect without `--bloom-prefilter`) |
| `--cross-rule-ac` | flag | `false` | Enable cross-rule Aho-Corasick pre-filter (requires `--features daachorse-index`; see `crates/rsigma-eval/README.md`) |
| `--observe-fields` | flag | `false` | Record the field keys of every evaluated event and emit a coverage report at end-of-run (gap signal + broken-coverage signal). Same JSON shape as the daemon's `GET /api/v1/fields` endpoint |
| `--observe-fields-max-keys` | integer | `10000` | Hard ceiling on distinct field names tracked. Overflow is counted via `overflow_dropped` in the report. No effect without `--observe-fields` |
| `--observe-fields-report` | path | none | Path to write the report. Defaults to stderr when omitted so detections on stdout stay machine-consumable. No effect without `--observe-fields` |

\* Feature-gated: `logfmt` requires the `logfmt` feature, `cef` requires the `cef` feature, `evtx` requires the `evtx` feature.

**Basic evaluation:**

```bash
# Single event (inline JSON)
rsigma engine eval -r path/to/rules/ -e '{"CommandLine": "whoami"}'

# Read events from a file (@file syntax, streams as NDJSON, one event per line)
rsigma engine eval -r path/to/rules/ -e @events.ndjson

# Stream NDJSON from stdin
cat events.ndjson | rsigma engine eval -r path/to/rules/

# With processing pipeline(s), applied in priority order
rsigma engine eval -r rules/ -p sysmon.yml -p custom.yml -e '...'
```

The `@file` syntax is equivalent to piping the file via stdin but avoids the pipe:

```bash
# These are equivalent:
rsigma engine eval -r rules/ -e @events.ndjson
cat events.ndjson | rsigma engine eval -r rules/
```
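For CI use, the `--fail-on-detection` flag from the options table can drive a pass/fail gate. The following Python sketch only wraps flags documented above; the helper names are hypothetical, and treating every non-zero exit code as a failed gate is an assumption, since only exit code 1 is described here.

```python
import subprocess
from typing import Sequence


def build_eval_argv(
    rules: str, events_file: str, pipelines: Sequence[str] = ()
) -> list[str]:
    """Assemble an `rsigma engine eval` command line for a CI gate."""
    argv = ["rsigma", "engine", "eval", "-r", rules,
            "-e", f"@{events_file}", "--fail-on-detection"]
    for pipeline in pipelines:
        argv += ["-p", pipeline]
    return argv


def gate_passes(argv: Sequence[str]) -> bool:
    """Run the command; the gate passes only when the exit code is 0."""
    result = subprocess.run(list(argv), capture_output=True, text=True)
    return result.returncode == 0
```

A CI job could call `gate_passes(build_eval_argv("rules/", "events.ndjson"))` and fail the build when it returns `False`.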

**EVTX (Windows Event Log) files** (requires `evtx` feature):

Files with a `.evtx` extension are automatically detected and parsed as binary Windows Event Log files. Each record is converted to JSON and evaluated against the loaded rules.

```bash
# Evaluate Sigma rules against a Windows Event Log file
rsigma engine eval -r rules/ -e @security.evtx

# With a pipeline and pretty output
rsigma engine eval -r rules/ -p sysmon.yml -e @Microsoft-Windows-Sysmon.evtx --pretty
```

**Event extraction (jq / JSONPath):**

`--jq` and `--jsonpath` are mutually exclusive. Both can return multiple values (e.g. `.records[]`, `$.records[*]`), and each returned value is evaluated as a separate event.

```bash
# Unwrap nested payloads with jq syntax
rsigma engine eval -r rules/ --jq '.event' -e '{"ts":"...","event":{"CommandLine":"whoami"}}'

# JSONPath (RFC 9535)
rsigma engine eval -r rules/ --jsonpath '$.event' -e '{"ts":"...","event":{"CommandLine":"whoami"}}'

# Array unwrapping: yields one event per element
rsigma engine eval -r rules/ --jq '.records[]' -e '{"records":[{"CommandLine":"whoami"},{"CommandLine":"id"}]}'

# Stream with extraction
hel run | rsigma engine eval -r rules/ -p ecs.yml --jq '.event'
```

**Detection output:**

```bash
# Include the full matched event JSON in detection output
rsigma engine eval -r rules/ --include-event -e '{"CommandLine": "whoami"}'
```

**Correlation options:**

```bash
# Suppress duplicate correlation alerts within a time window
rsigma engine eval -r rules/ --suppress 5m < events.ndjson

# Reset state after alert fires (default: alert)
rsigma engine eval -r rules/ --suppress 5m --action reset < events.ndjson

# Include full contributing events in correlation output (compressed in memory)
rsigma engine eval -r rules/ --correlation-event-mode full < events.ndjson

# Include lightweight event references (timestamp + ID) instead
rsigma engine eval -r rules/ --correlation-event-mode refs < events.ndjson

# Cap stored events per correlation window (default: 10)
rsigma engine eval -r rules/ --correlation-event-mode full --max-correlation-events 20 < events.ndjson

# Suppress detection output (only show correlation alerts)
rsigma engine eval -r rules/ --no-detections < events.ndjson

# Custom timestamp field for correlation windowing
rsigma engine eval -r rules/ --timestamp-field time < events.ndjson
```

### Custom rule attributes

Sigma rules can include `rsigma.*` custom attributes to override CLI defaults on a per-rule basis. These attributes are set in the rule YAML under `custom_attributes` (or via pipeline `SetCustomAttribute` transformations) and take precedence over engine-level settings.

| Attribute | Applies to | Description |
|-----------|-----------|-------------|
| `rsigma.include_event` | detection rules | `"true"` or `"false"`, include the matched event in detection output |
| `rsigma.suppress` | correlation rules | Suppression window (e.g. `"5m"`, `"1h"`), overrides `--suppress` |
| `rsigma.action` | correlation rules | `"alert"` or `"reset"`, overrides `--action` |
| `rsigma.correlation_event_mode` | correlation rules | `"none"`, `"full"`, or `"refs"`, overrides `--correlation-event-mode` |
| `rsigma.max_correlation_events` | correlation rules | Integer as string (e.g. `"25"`), overrides `--max-correlation-events` |

Example rule YAML:

```yaml
title: Brute Force Detection
logsource:
    product: okta
    service: system
correlation:
    type: event_count
    rules: failed_login
    group-by: actor.displayName
    timespan: 5m
    condition:
        gte: 10
custom_attributes:
    rsigma.suppress: "10m"
    rsigma.action: "reset"
    rsigma.correlation_event_mode: "refs"
    rsigma.max_correlation_events: "50"
level: high
```
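The precedence rule (per-rule attribute over engine-level setting) can be sketched as a merge. The function name and the dictionary keys used for the CLI settings are hypothetical; the attribute names and their string encoding come from the table above.

```python
def effective_correlation_settings(cli: dict, custom_attributes: dict) -> dict:
    """Overlay rsigma.* rule attributes on top of CLI-level settings."""
    overrides = {
        "rsigma.suppress": ("suppress", str),
        "rsigma.action": ("action", str),
        "rsigma.correlation_event_mode": ("correlation_event_mode", str),
        # Integers are written as strings in the rule YAML, e.g. "25".
        "rsigma.max_correlation_events": ("max_correlation_events", int),
    }
    settings = dict(cli)  # start from the engine-level values
    for attr, (key, convert) in overrides.items():
        if attr in custom_attributes:
            settings[key] = convert(custom_attributes[attr])
    return settings
```

Applied to the example rule above with CLI settings of `--suppress 5m` and default event limits, the rule would run with a `10m` suppression window, the `reset` action, `refs` event mode, and a cap of 50 events.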

### `backend convert`: Convert rules to backend-native queries

Convert Sigma rules into query strings for a specific backend (SQL, SPL, KQL, Lucene, etc.).

| Argument | Type | Default | Description |
|----------|------|---------|-------------|
| `<rules>` | positional, repeatable | required | Path(s) to Sigma rule file(s) or directory |
| `--target` / `-t` | string | required | Backend target name (see `backend targets`) |
| `--pipeline` / `-p` | repeatable | `[]` | Processing pipeline YAML file(s) |
| `--format` / `-f` | string | `"default"` | Output format (see `backend formats`) |
| `-O` / `--option` | repeatable | `[]` | Backend options as `key=value` pairs (e.g. `-O table=logs -O schema=public`) |
| `--output` / `-o` | path | stdout | Write output to a file instead of stdout |
| `--skip-unsupported` / `-s` | flag | `false` | Skip unsupported rules instead of failing |
| `--without-pipeline` | flag | `false` | Skip pipeline requirement check |

Available backends: `test`, `postgres` (aliases: `postgresql`, `pg`).

```bash
# Convert rules using the test backend
rsigma backend convert rules/ -t test

# Convert with a pipeline and specific output format
rsigma backend convert rules/ -t test -p pipelines/ecs.yml -f state

# Convert a single rule
rsigma backend convert rule.yml -t test

# Convert to PostgreSQL SQL
rsigma backend convert rules/ -t postgres

# Convert to PostgreSQL with OCSF field mapping (single table)
rsigma backend convert rules/ -t postgres -p pipelines/ocsf_postgres.yml

# Convert with per-logsource table routing (multi-table)
rsigma backend convert rules/ -t postgres -p pipelines/ocsf_postgres_multi_table.yml

# Generate PostgreSQL views
rsigma backend convert rules/ -t postgres -f view

# Generate TimescaleDB continuous aggregates
rsigma backend convert rules/ -t postgres -f continuous_aggregate

# Custom backend options (table, schema, timestamp field, etc.)
rsigma backend convert rules/ -t postgres -O table=security_logs -O schema=public -O timestamp_field=created_at

# JSONB mode: access fields inside a JSONB column
rsigma backend convert rules/ -t postgres -O table=okta_events -O json_field=data -O timestamp_field=time

# Skip rules that the backend does not support
rsigma backend convert rules/ -t postgres --skip-unsupported

# Write output to a file
rsigma backend convert rules/ -t postgres -o queries.sql
```

### `backend targets`: List available conversion backends

List all registered conversion backend targets.

```bash
rsigma backend targets
# Output:
#   test      Backend-neutral text queries for testing
#   postgres  PostgreSQL/TimescaleDB SQL
```

### `backend formats`: List output formats for a backend

List the output formats supported by a specific backend.

| Argument | Type | Default | Description |
|----------|------|---------|-------------|
| `<target>` | positional | required | Backend target name |

```bash
rsigma backend formats postgres
# Output:
#   default              Plain PostgreSQL SQL
#   view                 CREATE OR REPLACE VIEW for each rule
#   timescaledb          TimescaleDB-optimized queries with time_bucket()
#   continuous_aggregate CREATE MATERIALIZED VIEW ... WITH (timescaledb.continuous)
#   sliding_window       Correlation queries using window functions for per-row sliding detection
```

### `rule fields`: List all fields referenced by Sigma rules

Extract and display every field name referenced across detection rules, correlation rules, filter rules, and rule metadata. Useful for building a field catalog, auditing pipeline coverage, or understanding which fields a ruleset depends on.

When pipelines are provided, fields are shown after pipeline transformations (field name mappings, prefixes, suffixes), so you can verify that your pipeline maps every field your rules need.

| Argument | Type | Default | Description |
|----------|------|---------|-------------|
| `--rules` / `-r` | path | required | Path to a Sigma rule file or directory |
| `--pipeline` / `-p` | repeatable | `[]` | Processing pipeline YAML file(s). When provided, fields are shown after transformations |
| `--no-filters` | flag | `false` | Exclude fields contributed by filter rules |
| `--json` | flag | `false` | Output as JSON instead of a table |

**Field sources:** each field is annotated with where it was found:

| Source | Description |
|--------|-------------|
| `detection` | Field names from detection block items (`selection`, `filter`, etc.) |
| `correlation` | `group-by` fields, `condition.field`, and alias mapping values |
| `filter` | Fields from filter rule detection blocks |
| `metadata` | Fields listed in the rule's `fields:` metadata section |

```bash
# List all fields in a ruleset
rsigma rule fields -r rules/

# Show fields after ECS pipeline mapping
rsigma rule fields -r rules/ -p pipelines/ecs.yml

# Exclude filter-contributed fields
rsigma rule fields -r rules/ --no-filters

# JSON output for scripting
rsigma rule fields -r rules/ --json

# Pipe JSON to jq for further analysis
rsigma rule fields -r rules/ --json | jq '.fields[] | select(.sources[] == "detection") | .field'
```

**Table output** writes field data to stdout and a summary line to stderr, so you can pipe the table or redirect it without mixing in summary text.

**JSON output** includes a `summary` object (rule/correlation/filter counts, unique fields, pipelines applied), a `fields` array, and when pipelines are applied, a `pipeline_mappings` array showing each field name transformation.
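For scripting without jq, the same selection can be done in Python. This sketch relies only on the `fields` array with `field` and `sources` keys, as used by the jq example above; the function name is hypothetical.

```python
import json


def fields_by_source(report_json: str, source: str) -> list[str]:
    """Return the sorted field names annotated with the given source."""
    report = json.loads(report_json)
    return sorted(
        entry["field"]
        for entry in report["fields"]
        if source in entry["sources"]
    )
```

Feeding it the output of `rsigma rule fields -r rules/ --json` with `source="detection"` yields the fields an event schema must provide for detection rules to match.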

### `pipeline resolve`: Test dynamic source resolution

Resolve all dynamic sources declared in the given pipeline(s) and/or external source files and print the resulting data as JSON. Useful for testing source configuration without running the daemon.

| Argument | Type | Default | Description |
|----------|------|---------|-------------|
| `--pipeline` / `-p` | repeatable | required | Processing pipeline(s) containing dynamic sources |
| `--source` / `-s` | string | none | Resolve only a specific source by ID |
| `--source-file` | repeatable | `[]` | External source file(s) or directory of source files (same as daemon `--source`) |
| `--pretty` | flag | `false` | Pretty-print JSON output |
| `--dry-run` | flag | `false` | Show what would be resolved (source metadata) without performing resolution |

```bash
# Resolve all sources in a dynamic pipeline
rsigma pipeline resolve -p pipelines/dynamic.yml --pretty

# Resolve a specific source by ID
rsigma pipeline resolve -p pipelines/dynamic.yml --source threat_intel

# Resolve external source files alongside pipeline sources
rsigma pipeline resolve -p pipelines/dynamic.yml --source-file sources.yml --pretty

# Dry-run: list sources and metadata without fetching
rsigma pipeline resolve -p pipelines/dynamic.yml --dry-run

# Test multiple pipelines at once
rsigma pipeline resolve -p pipeline1.yml -p pipeline2.yml
```

**Output format (normal mode):**

```json
{
  "pipeline": "dynamic_example",
  "source_id": "field_map",
  "status": "ok",
  "data": { "CommandLine": "process.command_line", "User": "user.name" }
}
```

**Output format (dry-run mode):**

```json
{
  "pipeline": "dynamic_example",
  "source_id": "field_map",
  "source_type": "File",
  "required": true,
  "refresh": "Watch"
}
```

### `rule condition`: Parse a condition expression

Parse a Sigma condition expression and output the AST as JSON. Output is always pretty-printed.

| Argument | Type | Default | Description |
|----------|------|---------|-------------|
| `expr` | positional | required | The condition expression to parse |

```bash
rsigma rule condition 'selection and not filter'
```

### `rule stdin`: Parse YAML from stdin

Read a single Sigma YAML document from stdin and output the AST as JSON.

| Argument | Type | Default | Description |
|----------|------|---------|-------------|
| `--pretty` / `-p` | flag | `true` | Pretty-print JSON output |

```bash
cat rule.yml | rsigma rule stdin
```

### `rule migrate-sources`: Extract pipeline sources into standalone files

Extract pipeline-embedded `sources:` blocks into standalone source files. Pipeline-embedded sources are deprecated; this tool automates the migration to the `--source` flag.

| Argument | Type | Default | Description |
|----------|------|---------|-------------|
| `--pipeline` / `-p` | repeatable | required | Pipeline file or directory to migrate |
| `--output` / `-o` | path | required | Output file (single strategy) or directory (per-pipeline strategy) |
| `--strategy` | string | `"single"` | `single` consolidates all sources into one file; `per-pipeline` writes one file per pipeline |
| `--dry-run` | flag | `false` | Preview extracted sources on stdout without writing |

```bash
rsigma rule migrate-sources -p pipelines/ -o sources.yml                     # consolidate
rsigma rule migrate-sources -p pipelines/ -o sources.d/ --strategy per-pipeline  # one file per pipeline
rsigma rule migrate-sources -p pipeline.yml -o sources.yml --dry-run         # preview
```

## File Discovery

All subcommands that accept a directory path scan recursively for `.yml` and `.yaml` files only.

- **Rule loading:** Files are parsed individually; parse errors are accumulated (not fatal). Rules, correlations, and filters from all files are merged into a single collection.
- **Lint config discovery:** Walks ancestor directories from the target path upward, looking for `.rsigma-lint.yml` or `.rsigma-lint.yaml`. The `--config` flag overrides auto-discovery.

## Event Input Modes

| Mode | Input format | Behavior |
|------|-------------|----------|
| `rsigma engine eval -e '...'` | Inline JSON string | Parses the string as a single JSON object and evaluates it |
| `rsigma engine eval -e @path` | NDJSON file | Reads the file line-by-line as NDJSON (same behavior as stdin) |
| `rsigma engine eval -e @path.evtx` | EVTX binary file | Parses the binary Windows Event Log file and evaluates each record (requires `evtx` feature) |
| `rsigma engine eval` (no `--event`) | NDJSON from stdin | Each non-blank line is parsed as JSON. Blank lines are skipped. Exits after EOF |
| `rsigma engine daemon` | NDJSON from stdin | Continuous stdin reader; stays alive after EOF. Exposes HTTP APIs for management |
| `rsigma engine daemon --input http` | NDJSON via HTTP POST | Events sent to `POST /api/v1/events`. Stays alive, exposes all APIs |
| `rsigma engine daemon --input nats://...` | NATS JetStream | Subscribes to a JetStream subject. At-least-once delivery with deferred ack |
| OTLP (any `--input` mode) | OTLP protobuf/JSON via HTTP POST or gRPC | Agents send `ExportLogsServiceRequest` to `/v1/logs` (HTTP) or the gRPC `LogsService/Export` endpoint. Requires `daemon-otlp` feature |
| `rsigma rule stdin` | Single YAML document | Parses as Sigma YAML → outputs AST as JSON |

Event filters (`--jq`/`--jsonpath`) are applied to every event regardless of input mode.

## Output Format

### Detection match (JSON)

```json
{
  "rule_title": "Detect Whoami",
  "rule_id": "abc-123-...",
  "level": "medium",
  "tags": ["attack.execution"],
  "matched_selections": ["selection"],
  "matched_fields": [
    { "field": "CommandLine", "value": "cmd /c whoami" }
  ],
  "event": null
}
```

The `event` field is populated only when `--include-event` is set (or the rule sets `rsigma.include_event`); otherwise it is `null`, as in the example above.

### Correlation match (JSON)

```json
{
  "rule_title": "Brute Force",
  "rule_id": null,
  "level": "high",
  "tags": [],
  "correlation_type": "event_count",
  "group_key": [["User", "admin"]],
  "aggregated_value": 3.0,
  "timespan_secs": 300,
  "events": null,
  "event_refs": null
}
```

`events` is populated when `--correlation-event-mode full`; `event_refs` when `--correlation-event-mode refs`.
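Downstream tooling can tell the two match shapes apart by the `correlation_type` key, which appears only in correlation matches in the examples above. The sketch below assumes one compact JSON object per line on stdout (that is, `--pretty` is not set); the function name is hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def tally_matches(lines: Iterable[str]) -> Counter:
    """Count matches per (kind, rule_title, level)."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        match = json.loads(line)
        kind = "correlation" if "correlation_type" in match else "detection"
        counts[(kind, match["rule_title"], match["level"])] += 1
    return counts
```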

### Stderr messages

- `Loaded N rules from PATH` (detection-only) or `Loaded N detection rules + M correlation rules from PATH`
- `Loaded pipeline: NAME (priority N)` per pipeline
- `Event filter: jq 'EXPR'` or `Event filter: jsonpath 'EXPR'` when using `--jq`/`--jsonpath`
- `No matches.` when a single event yields no matches
- `Invalid JSON event: ...` on parse error (single event)
- `Invalid JSON on line N` for NDJSON parse errors (continues processing)
- `Processed N events, M matches.` (detection-only) or `Processed N events, M detection matches, K correlation matches.` (with correlations)

## Pipeline Loading

- Each `-p NAME_OR_PATH` loads one pipeline. The argument is first checked against builtin names; if no builtin matches, it is treated as a file path.
- **Builtin pipelines** (no external file needed):
  - `ecs_windows` -- maps Sigma/Sysmon field names to Elastic Common Schema (ECS) fields (e.g. `CommandLine` becomes `process.command_line`). Use with Winlogbeat/Elastic Agent output.
  - `sysmon` -- adds Sysmon `EventID` conditions for logsource routing. Use when evaluating against raw Sysmon JSON that includes `EventID`.
- Pipelines are sorted by `priority` (ascending); lower priority runs first.
- All pipelines are applied in sequence to each rule before compilation.
- In daemon mode, pipeline files are watched for changes and re-read on reload (alongside rules). Builtin pipelines are embedded at compile time and are not file-watched. If a pipeline file becomes invalid, the reload fails and the previous configuration stays active.
- `merge_pipelines` is not used by the CLI; each pipeline remains separate with its own state.
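The lookup and ordering rules in this list can be sketched as follows. The function names and the dictionary representation of a pipeline are hypothetical; keeping command-line order for pipelines with equal priority is an assumption (it is simply what a stable sort does).

```python
BUILTIN_PIPELINES = {"ecs_windows", "sysmon"}


def classify_pipeline_arg(arg: str) -> tuple[str, str]:
    """Builtin names are checked first; anything else is treated as a path."""
    return ("builtin", arg) if arg in BUILTIN_PIPELINES else ("file", arg)


def order_pipelines(pipelines: list[dict]) -> list[dict]:
    """Sort ascending by priority, so lower priority values run first."""
    return sorted(pipelines, key=lambda p: p["priority"])
```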

## Dynamic Pipelines

Dynamic pipelines extend static Sigma pipelines with external data sources. Any string, list, or mapping value in the pipeline YAML can contain `${source.<id>}` template references that are resolved at runtime.

### Source declaration

Declare sources in a standalone YAML file with a top-level `sources:` list and load them with the repeatable `--source` flag. The pipeline file references each source by `id`:

```yaml
# sources.yml
sources:
  - id: ip_blocklist
    type: http
    url: https://feeds.example.com/blocklist.json
    format: json
    extract: ".ips"
    refresh: 300s
    timeout: 10s
    on_error: use_cached
    required: true

  - id: field_config
    type: file
    path: /etc/rsigma/fields.json
    format: json
    refresh: watch

  - id: enrichment_rules
    type: command
    command: ["generate-transformations", "--format", "json"]
    format: json
    refresh: once
```

```yaml
# pipelines/dynamic_threat_intel.yml
name: dynamic_threat_intel
transformations:
  - id: map_fields
    type: field_name_mapping
    mapping: ${source.field_config}

  - id: block_known_bad
    type: add_condition
    conditions:
      - field: DestinationIp
        value: ${source.ip_blocklist}

  - include: ${source.enrichment_rules}
```

```bash
rsigma engine daemon -r rules/ -p pipelines/ --source sources.yml
rsigma engine daemon -r rules/ -p pipelines/ --source sources.d/   # loads all *.yml/*.yaml in directory
```

External source files decouple source configuration from pipeline logic, so pipelines stay reusable across environments. Source IDs must be unique across all `--source` files. Because the flag is repeatable, several files can be combined, for example one per team or per data source.
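Template resolution over a parsed pipeline can be sketched as a recursive walk. This Python illustration handles only values that consist entirely of one `${source.<id>}` reference; whether rsigma also interpolates references embedded in longer strings is not covered here, and the set of characters accepted in an ID is an assumption.

```python
import re
from typing import Any

# Matches a value that is exactly one template reference.
_TEMPLATE = re.compile(r"^\$\{source\.([A-Za-z0-9_.\-]+)\}$")


def resolve_templates(node: Any, sources: dict[str, Any]) -> Any:
    """Replace ${source.<id>} values with the resolved source data."""
    if isinstance(node, dict):
        return {k: resolve_templates(v, sources) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve_templates(v, sources) for v in node]
    if isinstance(node, str):
        m = _TEMPLATE.match(node)
        if m:
            return sources[m.group(1)]  # KeyError for an undeclared source ID
    return node
```

Resolving `mapping: ${source.field_config}` this way replaces the string with the whole mapping loaded from the file source, which is why a single reference can stand in for a string, a list, or a mapping.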

> **Deprecated.** Declaring `sources:` inline in a pipeline file is deprecated and will be removed in v1.0 (tracked in [#137](https://github.com/timescale/rsigma/issues/137)). The parser still accepts it but prints a `warning:` line on stderr at every load. Migrate with `rsigma rule migrate-sources -p <dir-or-file> -o sources.yml` and load the result via `--source sources.yml`.

### Source types

| Type | Description |
|------|-------------|
| `file` | Read from a local file. Supports `refresh: watch` for automatic reload on change |
| `http` | Fetch from an HTTP endpoint. Supports `method`, `headers`, `timeout` |
| `command` | Run a local command and capture stdout |
| `nats` | Subscribe to a NATS subject for push-based updates (requires `daemon-nats` feature) |

### Data formats

| Format | Description |
|--------|-------------|
| `json` | Parsed with serde_json |
| `yaml` | Parsed with yaml_serde |
| `lines` | One value per line (produces a JSON array of strings) |
| `csv` | Comma-separated values |

### Extraction languages

After parsing the source data, an optional `extract` expression selects a subset:

```yaml
# jq (default) -- plain string is always jq
extract: ".indicators[].ip"

# JSONPath -- structured syntax
extract:
  type: jsonpath
  expr: "$.indicators[*].ip"

# CEL (Common Expression Language) -- structured syntax
extract:
  type: cel
  expr: "data.indicators.filter(i, i.severity > 7)"
```

| Language | Syntax | Library |
|----------|--------|---------|
| jq | Plain string or `{ type: jq, expr: "..." }` | jaq |
| JSONPath | `{ type: jsonpath, expr: "..." }` | jsonpath-rust |
| CEL | `{ type: cel, expr: "..." }` | cel-rust |

### Refresh policies

| Policy | Behavior |
|--------|----------|
| `once` | Fetch at startup only |
| `<duration>` (e.g. `300s`, `5m`) | Re-fetch on a fixed interval |
| `watch` | Watch the file for changes (file sources only) |
| `push` | Updated on each incoming NATS message (NATS sources only) |
| `on_demand` | Fetch at startup, then only when triggered via API or SIGHUP |
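
As a rough illustration of how the `refresh` values in the table could map onto a typed setting, here is a minimal sketch. The `Refresh` enum and `parse_refresh` function are hypothetical, and only the `s`, `m` and `h` suffixes are handled; the duration syntax rsigma actually accepts may be broader.

```rust
use std::time::Duration;

/// Hypothetical in-memory form of the `refresh` setting; not rsigma's actual type.
#[derive(Debug, PartialEq)]
enum Refresh {
    Once,
    Watch,
    Push,
    OnDemand,
    Interval(Duration),
}

/// Parse a refresh value. Only the `s`, `m` and `h` suffixes are handled in this sketch.
fn parse_refresh(s: &str) -> Result<Refresh, String> {
    match s {
        "once" => return Ok(Refresh::Once),
        "watch" => return Ok(Refresh::Watch),
        "push" => return Ok(Refresh::Push),
        "on_demand" => return Ok(Refresh::OnDemand),
        _ => {}
    }
    // Split e.g. "300s" into the digits and the unit suffix.
    let split = s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
    let (digits, unit) = s.split_at(split);
    let n: u64 = digits
        .parse()
        .map_err(|_| format!("invalid refresh value: {s}"))?;
    let mult = match unit {
        "s" => 1,
        "m" => 60,
        "h" => 3600,
        _ => return Err(format!("unsupported duration unit in: {s}")),
    };
    let secs = n
        .checked_mul(mult)
        .ok_or_else(|| format!("duration too large: {s}"))?;
    Ok(Refresh::Interval(Duration::from_secs(secs)))
}

fn main() {
    for v in ["once", "300s", "5m", "watch", "soon"] {
        println!("{v:>6} -> {:?}", parse_refresh(v));
    }
}
```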

### Error policies

| Policy | Behavior |
|--------|----------|
| `use_cached` | Serve the last successfully fetched value on failure |
| `fail` | For required sources: block startup. For optional sources: log and use null |
| `use_default` | Fall back to the `default` value declared in the source config |
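
The table can be read as a small decision function. The sketch below is illustrative only: the type and function names are hypothetical, and combinations the table does not describe (for example `use_cached` with an empty cache) are returned as `Unspecified` rather than guessed.

```rust
/// The three `on_error` policies from the table above.
#[derive(Clone, Copy)]
enum OnError {
    UseCached,
    Fail,
    UseDefault,
}

/// What happens after a failed fetch (names are illustrative).
#[derive(Debug, PartialEq)]
enum Resolution {
    Value(String), // serve this value
    Null,          // log and continue with null
    BlockStartup,  // required source with `fail`
    Unspecified,   // combination not covered by the table
}

fn on_fetch_failure(
    policy: OnError,
    required: bool,
    cached: Option<&str>,
    default: Option<&str>,
) -> Resolution {
    match (policy, required) {
        // Serve the last successfully fetched value, if there is one.
        (OnError::UseCached, _) => match cached {
            Some(v) => Resolution::Value(v.to_string()),
            None => Resolution::Unspecified,
        },
        (OnError::Fail, true) => Resolution::BlockStartup,
        (OnError::Fail, false) => Resolution::Null,
        // Fall back to the `default` declared in the source config.
        (OnError::UseDefault, _) => match default {
            Some(v) => Resolution::Value(v.to_string()),
            None => Resolution::Unspecified,
        },
    }
}

fn main() {
    println!("{:?}", on_fetch_failure(OnError::UseCached, true, Some("[\"10.0.0.1\"]"), None));
    println!("{:?}", on_fetch_failure(OnError::Fail, false, None, None));
    println!("{:?}", on_fetch_failure(OnError::UseDefault, true, None, Some("[]")));
}
```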

### Include directives

The `include` transformation type injects an entire block of transformations from a resolved source:

```yaml
transformations:
  - include: ${source.dynamic_transforms}
```

The source must resolve to a JSON array of transformation objects. Nested includes are rejected (max depth 1). Remote sources (HTTP/NATS) require `--allow-remote-include` for security.


### Startup behavior

- **Required sources** (`required: true`, the default): the daemon blocks until resolution succeeds. If `on_error: fail`, it exits on failure.
- **Optional sources** (`required: false`): if resolution fails at startup, the daemon starts with a null fallback and retries in the background.

### Caching

Resolved values are cached in memory (and optionally SQLite). The cache supports TTL-based expiration. The `use_cached` error policy serves stale data from the cache when a fresh fetch fails. Cache entries can be invalidated per-source via `DELETE /api/v1/sources/cache/{source_id}`.

## Enrichers

Post-evaluation enrichers run after `engine.evaluate()` produces a `ProcessResult` and before each result is serialized to a sink. They inject contextual data (asset info, IP reputation, identity, GeoIP, runbook URLs, ...) into `enrichments.<field>` on each detection or correlation.

Enrichers are configured in a YAML file passed via `--enrichers <path>` to `engine daemon`:

```bash
rsigma engine daemon -r rules/ --enrichers /etc/rsigma/enrichers.yml
```

The config is hot-reloaded on `SIGHUP`, on file-watcher changes, and on `POST /api/v1/reload`. A reload that fails validation logs the error and keeps the previous pipeline active; the daemon never silently degrades to "no enrichment" because of a typo.

### Config schema

```yaml
# Bound on concurrent enrichment chains across all results in a batch.
# Defaults to 16 if omitted.
max_concurrent_enrichments: 16

enrichers:
  - id: <unique-string>            # required, used as a Prometheus label
    kind: detection | correlation  # required, see "Kind and template namespaces"
    type: template | lookup | http | command  # required, the primitive
    inject_field: <field-name>     # required, key under enrichments.<...>
    timeout: 5s                    # optional, humantime; default 5s
    on_error: skip | null | drop   # optional; default skip
    scope:                         # optional; see "Scope filtering"
      rules: [<rule-id-or-glob>, ...]
      tags:  [<tag-or-prefix.*>, ...]
      levels: [low, medium, high, critical, informational]
    # ... primitive-specific fields below ...
```

### Kind and template namespaces

Every enricher declares a `kind: detection | correlation`. The kind drives two checks:

1. **Config-load-time template validation.** A `kind: detection` enricher may only reference `${detection.*}` variables in its templated fields (`url`, `template`, `headers`, `body`, `command`, `env`, `extract`); a `kind: correlation` enricher may only reference `${correlation.*}`. Cross-namespace references are rejected at startup with a clear error pointing at the offending field. `${ENV_VAR}` is allowed in both namespaces.
2. **Runtime body matching.** The pipeline skips enrichers whose declared kind does not match the current `EvaluationResult` body variant before invoking `enrich()`, so a detection-kind enricher pays no cost on correlation results and vice versa.
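
The config-load-time check in item 1 can be approximated as follows. This is a sketch under stated assumptions, not rsigma's validator: `references` and `validate` are hypothetical helpers, and any `${...}` reference outside both namespaces is treated as an environment variable.

```rust
/// The two enricher kinds from the config schema.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Detection,
    Correlation,
}

/// Collect every `${...}` reference found in a templated field.
fn references(template: &str) -> Vec<&str> {
    let mut out = Vec::new();
    let mut rest = template;
    while let Some(start) = rest.find("${") {
        let after = &rest[start + 2..];
        match after.find('}') {
            Some(end) => {
                out.push(&after[..end]);
                rest = &after[end + 1..];
            }
            None => break, // unterminated reference: ignored in this sketch
        }
    }
    out
}

/// Reject references to the other kind's namespace.
/// Anything outside both namespaces is assumed to be an environment variable.
fn validate(kind: Kind, field: &str, template: &str) -> Result<(), String> {
    let forbidden = match kind {
        Kind::Detection => "correlation.",
        Kind::Correlation => "detection.",
    };
    for r in references(template) {
        if r.starts_with(forbidden) {
            return Err(format!(
                "field `{field}`: `${{{r}}}` is not valid for a {kind:?} enricher"
            ));
        }
    }
    Ok(())
}

fn main() {
    let url = "https://wiki.internal/runbooks/${detection.rule.id}";
    println!("{:?}", validate(Kind::Detection, "template", url));
    println!("{:?}", validate(Kind::Correlation, "template", url));
}
```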

Detection variables (`${detection.*}`):

| Variable | Resolves to |
|---|---|
| `${detection.rule.title}` / `.id` / `.level` | Rule metadata from `RuleHeader` |
| `${detection.tags}` | Comma-joined `tags` |
| `${detection.fields.<name>}` | The matched value of `<name>` from `matched_fields` |
| `${detection.event.<dotted.path>}` | JSON path into the original event (when `rsigma.include_event: "true"` on the rule) |

Correlation variables (`${correlation.*}`):

| Variable | Resolves to |
|---|---|
| `${correlation.rule.title}` / `.id` / `.level` | Rule metadata from `RuleHeader` |
| `${correlation.tags}` | Comma-joined `tags` |
| `${correlation.type}` | `event_count`, `temporal`, `value_sum`, ... |
| `${correlation.aggregated_value}` | The value that crossed the condition threshold |
| `${correlation.timespan_secs}` | Window size in seconds |
| `${correlation.group_key.<field>}` | Look up a group-by field by name |
| `${correlation.group_key}` | Joined `field=value,field=value` string |

For enrichers that conceptually apply to both kinds (identity lookups, runbook URLs, any tag-based enricher), declare two YAML entries, one per kind. See [YAML-anchor pattern for kind-agnostic enrichers](#yaml-anchor-pattern-for-kind-agnostic-enrichers) below for a worked example.

### Scope filtering

`scope` limits when an enricher fires within its declared `kind`:

- `scope.rules`: list of rule IDs (exact match) or rule-title globs (`Suspicious *`)
- `scope.tags`: tag-set intersection with prefix wildcards (`attack.*` matches `attack.t1059.001`)
- `scope.levels`: severity membership against `RuleHeader::level`
- No scope = fires for every result of the enricher's declared kind (use for cheap enrichers like `template`)

There is no `scope.kinds` axis: the top-level `kind` already gates which result variant the enricher sees. Axes are AND-ed; an empty axis is not a filter.
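
To make the AND-ed axes concrete, here is a minimal matching sketch. The `Rule` and `Scope` structs are hypothetical stand-ins for rsigma's internal types, and the `glob` helper only understands a trailing `*`, which is enough for the `attack.*` and `Suspicious *` examples above but may be narrower than the real matcher.

```rust
/// Hypothetical view of the rule metadata a scope is checked against.
struct Rule<'a> {
    id: &'a str,
    title: &'a str,
    level: &'a str,
    tags: &'a [&'a str],
}

/// Hypothetical in-memory form of a `scope` block; empty axes impose no restriction.
#[derive(Default)]
struct Scope<'a> {
    rules: Vec<&'a str>,
    tags: Vec<&'a str>,
    levels: Vec<&'a str>,
}

/// Minimal glob: only a trailing `*` is treated as a wildcard.
fn glob(pattern: &str, text: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => text.starts_with(prefix),
        None => pattern == text,
    }
}

/// An enricher fires only if every non-empty axis matches.
fn scope_matches(scope: &Scope, rule: &Rule) -> bool {
    let rules_ok = scope.rules.is_empty()
        || scope.rules.iter().any(|p| *p == rule.id || glob(p, rule.title));
    let tags_ok = scope.tags.is_empty()
        || scope.tags.iter().any(|p| rule.tags.iter().any(|t| glob(p, t)));
    let levels_ok = scope.levels.is_empty() || scope.levels.iter().any(|l| *l == rule.level);
    rules_ok && tags_ok && levels_ok
}

fn main() {
    let rule = Rule {
        id: "rule-pwsh-enc",
        title: "Suspicious PowerShell encoded command",
        level: "high",
        tags: &["attack.t1059.001"],
    };
    let scope = Scope {
        tags: vec!["attack.*"],
        levels: vec!["high", "critical"],
        ..Default::default()
    };
    println!("fires: {}", scope_matches(&scope, &rule));
}
```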

### `template`: pure string interpolation

Cheapest primitive. No I/O. Cannot fail at runtime; the only possible errors are template parse errors, which are caught at config load.

```yaml
- id: runbook_det
  kind: detection
  type: template
  inject_field: runbook_url
  template: "https://wiki.internal/runbooks/${detection.rule.id}"
```
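
Conceptually, the `template` primitive is a placeholder substitution over a variable map. The sketch below shows that idea with a hypothetical `expand` function; treating unknown variables as empty strings is an assumption of the sketch, not documented rsigma behavior.

```rust
use std::collections::HashMap;

/// Expand `${name}` placeholders from a flat variable map.
/// Unknown variables expand to an empty string here; rsigma's behavior may differ.
fn expand(template: &str, vars: &HashMap<&str, &str>) -> String {
    let mut out = String::new();
    let mut rest = template;
    while let Some(start) = rest.find("${") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find('}') {
            Some(end) => {
                out.push_str(vars.get(&after[..end]).copied().unwrap_or(""));
                rest = &after[end + 1..];
            }
            None => {
                // Unterminated placeholder: keep the remaining text verbatim.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}

fn main() {
    let mut vars = HashMap::new();
    vars.insert("detection.rule.id", "rule-pwsh-enc");
    let url = expand("https://wiki.internal/runbooks/${detection.rule.id}", &vars);
    println!("{url}");
}
```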

### `lookup`: read from the dynamic-pipelines source cache

Reads a value from the dynamic-pipelines `SourceCache` by `source_id` and applies an `extract` expression (jq/JSONPath/CEL) with template-expanded variables to slice it. Zero-network-cost for anything already loaded as a dynamic source.

```yaml
- id: asset_context_corr
  kind: correlation
  type: lookup
  inject_field: asset_context
  source: asset_inventory          # id of a dynamic source configured on the daemon
  extract: '.assets[] | select(.hostname == "${correlation.group_key.HostName}")'
  extract_type: jq                 # jq | jsonpath | cel; defaults to jq
  default: "unknown"               # injected on cache miss / no extract match
  on_error: skip                   # applied only when default is not configured
```

The decision matrix:

- **Cache hit + extract matches** → inject the extracted value
- **Cache hit + no extract match** → if `default` is configured, inject it; otherwise apply `on_error`
- **Cache miss** → if `default` is configured, inject it; otherwise apply `on_error`
- **Extract evaluation error** (invalid jq, type mismatch) → always applies `on_error`, even with `default` set
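
The four rows of the decision matrix translate directly into a `match`. The sketch below uses hypothetical `Extract` and `Outcome` types and plain strings in place of JSON values; what "apply `on_error`" then does (`skip`, `null` or `drop`) is left out.

```rust
/// Result of evaluating the `extract` expression against a cached source value.
enum Extract {
    Match(String),
    NoMatch,
    Error,
}

/// What the enricher ends up doing for one result.
#[derive(Debug, PartialEq)]
enum Outcome {
    Inject(String),
    ApplyOnError,
}

/// `cached` is `None` on a cache miss; otherwise it carries the extract result.
fn decide(cached: Option<Extract>, default: Option<&str>) -> Outcome {
    match cached {
        Some(Extract::Match(v)) => Outcome::Inject(v),
        // An evaluation error bypasses `default` entirely.
        Some(Extract::Error) => Outcome::ApplyOnError,
        // Cache miss and "no match" both fall back to `default` when it is set.
        Some(Extract::NoMatch) | None => match default {
            Some(d) => Outcome::Inject(d.to_string()),
            None => Outcome::ApplyOnError,
        },
    }
}

fn main() {
    println!("{:?}", decide(Some(Extract::Match("dc01".into())), Some("unknown")));
    println!("{:?}", decide(Some(Extract::NoMatch), Some("unknown")));
    println!("{:?}", decide(None, None));
    println!("{:?}", decide(Some(Extract::Error), Some("unknown")));
}
```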

`lookup` requires at least one dynamic source to be configured on the daemon via `--source <file>`. The loader surfaces a clear error at startup if a `lookup` enricher is configured without a source cache. (Pipeline-embedded `sources:` blocks also populate the cache but are deprecated; see [Source declaration](#source-declaration).)

### `http`: per-result HTTP fetch with optional response cache

Per-result `reqwest` request with template-expanded URL, headers, and optional body. Parses the response as JSON, optionally sliced by an `extract` expression. The optional response cache is keyed on `(method, url, body_hash)` with a configurable TTL; mandatory in practice for any rate-limited API.

```yaml
- id: hash_virustotal
  kind: detection
  type: http
  inject_field: file_reputation
  url: "https://www.virustotal.com/api/v3/files/${detection.fields.SHA256}"
  method: GET                      # default GET
  headers:
    x-apikey: "${VIRUSTOTAL_API_KEY}"
  cache_ttl: 1h                    # mandatory for the 4 req/min free tier
  extract: ".data.attributes.last_analysis_stats"
  extract_type: jq
  on_error: skip
  scope:
    tags: ["attack.execution", "attack.defense_evasion"]
```

Each enricher instance owns its own response cache: two enrichers hitting the same URL with different `Authorization` headers do not share entries.
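
A TTL cache keyed on `(method, url, body_hash)` can be sketched with the standard library alone. `ResponseCache` and `key` are hypothetical names, the hash function is whatever `DefaultHasher` provides, and the caller passes `now` explicitly so that expiry is easy to reason about; none of this is taken from rsigma's source.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::time::{Duration, Instant};

/// Cache key as described above: (method, url, body_hash).
type Key = (String, String, u64);

/// Per-enricher response cache sketch; each enricher instance would own one.
struct ResponseCache {
    ttl: Duration,
    entries: HashMap<Key, (Instant, String)>,
}

/// Build a key, hashing the request body instead of storing it.
fn key(method: &str, url: &str, body: &[u8]) -> Key {
    let mut h = DefaultHasher::new();
    body.hash(&mut h);
    (method.to_string(), url.to_string(), h.finish())
}

impl ResponseCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Return a cached response if it is younger than the TTL; evict it otherwise.
    fn get(&mut self, k: &Key, now: Instant) -> Option<String> {
        let fresh = match self.entries.get(k) {
            Some((stored, _)) => now.duration_since(*stored) < self.ttl,
            None => return None,
        };
        if fresh {
            self.entries.get(k).map(|(_, body)| body.clone())
        } else {
            self.entries.remove(k); // expired entry
            None
        }
    }

    fn put(&mut self, k: Key, body: String, now: Instant) {
        self.entries.insert(k, (now, body));
    }
}

fn main() {
    let mut cache = ResponseCache::new(Duration::from_secs(3600)); // cache_ttl: 1h
    let k = key("GET", "https://www.virustotal.com/api/v3/files/<sha256>", b"");
    let t0 = Instant::now();
    assert!(cache.get(&k, t0).is_none()); // miss: a real enricher would issue the request
    cache.put(k.clone(), "{\"malicious\":3}".to_string(), t0);
    println!("{:?}", cache.get(&k, t0 + Duration::from_secs(60))); // within the TTL
    println!("{:?}", cache.get(&k, t0 + Duration::from_secs(7200))); // past the TTL
}
```

Because the cache lives inside the enricher instance rather than in a shared map, request headers do not need to be part of the key.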

### `command`: per-result local-process execution

Per-result `tokio::process::Command` invocation with template-expanded argv and environment. Stdout is captured (capped at 10 MB) and parsed as JSON or as a raw string. Non-zero exit codes map to `EnrichErrorKind::Fetch` with a snippet of stderr attached to the error message.

```yaml
- id: ip_reputation
  kind: detection
  type: command
  inject_field: ip_reputation
  command:
    - "/usr/local/bin/check-ip-rep"
    - "${detection.fields.SourceIp}"
  env:
    REP_LOCAL_DB: "/var/lib/iprep.db"
  output: json                     # json (default) | raw
  timeout: 3s
  on_error: skip
```

### YAML-anchor pattern for kind-agnostic enrichers

Some enrichers conceptually apply to both kinds (identity lookups, runbook URLs, any tag-based enricher). Declare two entries with the same `type` and `inject_field` but different `kind` and template namespaces; YAML anchors cut the duplication where the loader allows them, but the safest portable form is two explicit entries:

```yaml
enrichers:
  - id: runbook_det
    kind: detection
    type: template
    inject_field: runbook_url
    template: "https://wiki.internal/runbooks/${detection.rule.id}"

  - id: runbook_corr
    kind: correlation
    type: template
    inject_field: runbook_url
    template: "https://wiki.internal/runbooks/${correlation.rule.id}"
```

### Output shape

The pipeline writes into `RuleHeader::enrichments` lazily, so detections and correlations that no enricher touched still serialize without an empty `enrichments` object. A typical NDJSON line looks like:

```json
{"rule_title":"Suspicious PowerShell encoded command","rule_id":"rule-pwsh-enc","level":"high","tags":["attack.t1059.001"],"matched_selections":["selection"],"matched_fields":[{"field":"CommandLine","value":"powershell -enc ..."}],"enrichments":{"asset_info":{"hostname":"dc01","owner":"IT-Ops"},"runbook_url":"https://wiki.internal/runbooks/rule-pwsh-enc"}}
```

### Composing enrichers (recipe catalog)

Operators rarely need new Rust code. Every recipe below composes one of the four primitives against a dynamic-pipelines source or a remote API. Field names like `${detection.fields.SourceIp}` are illustrative; substitute the names your pipeline actually produces.

#### `enrich_ip_employee`: identity lookup by source IP

```yaml
# sources.yml -- loaded via `rsigma engine daemon --source sources.yml`
sources:
  - id: employee_directory
    type: file
    path: /etc/rsigma/employees.json
    format: json
    extract:
      expr: 'with_entries(.value |= {user: .user, team: .team})'
      type: jq
```

```yaml
# enrichers.yml -- loaded via `--enrichers enrichers.yml`
enrichers:
  - id: enrich_ip_employee
    kind: detection
    type: lookup
    inject_field: employee
    source: employee_directory
    extract: '."${detection.fields.SourceIp}"'
    extract_type: jq
    default: "unknown"
    scope:
      levels: [high, critical]
```

Expected `enrichments.employee` shape: `{"user": "alice", "team": "Platform"}` or `"unknown"` on miss.

#### `enrich_username_employee`: identity lookup by username

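Same lookup pattern as above, keyed by username instead. Note that the enricher references a separate source, `employee_directory_by_user`, which must be declared alongside `employee_directory` and indexed by username: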

```yaml
- id: enrich_username_employee
  kind: detection
  type: lookup
  inject_field: employee
  source: employee_directory_by_user
  extract: '."${detection.fields.User}"'
  extract_type: jq
  default: null
```

#### `enrich_ip_geoip`: country/city/ASN by IP

Prefer `lookup` if a GeoIP dump fits in memory; fall back to `http` for vendor APIs:

```yaml
# sources.yml
sources:
  - id: geoip_db
    type: file
    path: /var/lib/geoip/cidr-to-country.json
    format: json
    refresh: 24h
```

```yaml
# enrichers.yml
enrichers:
  - id: enrich_ip_geoip
    kind: detection
    type: lookup
    inject_field: geoip
    source: geoip_db
    extract: '.[] | select(.cidr | test("${detection.fields.SourceIp}"))'
    extract_type: jq
    default: { country: "unknown" }
```

#### `enrich_hash_virustotal`: hash reputation with cache

`cache_ttl` is mandatory for the 4 req/min free tier and a major win for duplicate-detection bursts on any tier:

```yaml
- id: enrich_hash_virustotal
  kind: detection
  type: http
  inject_field: file_reputation
  url: "https://www.virustotal.com/api/v3/files/${detection.fields.SHA256}"
  headers:
    x-apikey: "${VIRUSTOTAL_API_KEY}"
  cache_ttl: 1h
  extract: ".data.attributes.last_analysis_stats"
  extract_type: jq
  on_error: skip
  scope:
    tags: ["attack.execution"]
```

#### `enrich_cve_kev`: known-exploited-vulnerability flag

Pulls the CISA KEV catalog as a dynamic-pipelines source, then flags CVEs that appear in it:

```yaml
# sources.yml
sources:
  - id: kev_catalog
    type: http
    url: https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json
    format: json
    refresh: 1h
```

```yaml
# enrichers.yml
enrichers:
  - id: enrich_cve_kev
    kind: detection
    type: lookup
    inject_field: kev
    source: kev_catalog
    extract: '.vulnerabilities[] | select(.cveID == "${detection.fields.CveId}")'
    extract_type: jq
    default: null
```

#### `enrich_url_runbook`: synthesized runbook URL

Pure string interpolation, no I/O. Use this any time a downstream consumer (Slack, PagerDuty, RSoar) needs a per-detection link:

```yaml
- id: enrich_url_runbook
  kind: detection
  type: template
  inject_field: runbook_url
  template: "https://wiki.internal/runbooks/${detection.rule.id}"
```

#### When to pick which primitive

- Prefer `lookup` if the data is bounded and refreshes infrequently (employee directory, KEV catalog, GeoIP dump).
- Prefer `http` only when the data is genuinely per-result or too large to cache. Always set `cache_ttl` for rate-limited APIs.
- Prefer `command` only when no other primitive will do (a binary parser, a vendored CLI tool, anything that already exists as a script).
- Never use `template` for anything that could be a YAML literal.

### Bespoke (Rust-coded) enrichers

The four primitives cover almost every use case via composition. A bespoke Rust-coded enricher is justified only when at least one of these holds:

1. **It bundles non-trivial data** (e.g. a dataset committed to the repo and `include_bytes!`-ed at compile time). Recipes can't express vendored data.
2. **It needs a parser the YAML primitives don't expose** (e.g. MaxMind's binary GeoLite2 format, the STIX 2.1 graph with parent/child resolution). Adding the parser as a generic source might cost more than just shipping the enricher.
3. **It provides a stable named contract**: downstream consumers reference a specific `enrichments.<field>` shape directly. A recipe-driven approach lets every operator choose their own `inject_field`, which is fine for ad-hoc enrichment but bad for a contract that crosses team or organisational boundaries.
4. **It implements a non-obvious algorithm** (e.g. coalescing per-result hash lookups into one batched-GET request). This is implementable as a recipe but the implementation is fragile.

Otherwise, ship a recipe and only promote when one of the criteria above bites.

#### `register_builtin(name, factory)`

External crates register a bespoke enricher type via:

```rust
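use rsigma_runtime::{Enricher, register_builtin};

// `MyConfig` (a `serde::Deserialize` struct) and `MyEnricher` (an `Enricher`
// implementation) are placeholders for types defined in your own crate.
register_builtin(
    "enrich_my_thing",
    // The factory receives the enricher's raw config as a `serde_json::Value`.
    std::sync::Arc::new(|raw_config: &serde_json::Value| -> Result<Box<dyn Enricher>, String> {
        let cfg: MyConfig = serde_json::from_value(raw_config.clone()).map_err(|e| e.to_string())?;
        Ok(Box::new(MyEnricher::new(cfg)))
    }),
).unwrap(); // registration fails for reserved or duplicate names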
```

Reserved names (`template`, `lookup`, `http`, `command`) are rejected at registration time; duplicate registrations of the same name are rejected to keep the global registry append-only. Bespoke types follow the same `kind` / `scope` / template rules as the four primitives; promotion does not change the YAML shape, only the `type:` value.

### Metrics

The enrichment pipeline reports six Prometheus metrics:

| Metric | Labels | Description |
|---|---|---|
| `rsigma_enrichment_total` | `enricher_id`, `kind`, `status` | Per-call outcome counter; `status` is `success` / `skip` / `error` / `timeout` / `drop` |
| `rsigma_enrichment_duration_seconds` | `enricher_id`, `kind` | Per-enricher latency histogram |
| `rsigma_enrichment_queue_depth` | – | Pending enrichment calls (sum across both kinds) |
| `rsigma_enrichment_http_cache_hits_total` | `enricher_id` | HTTP enricher response-cache hits |
| `rsigma_enrichment_http_cache_misses_total` | `enricher_id` | HTTP enricher response-cache misses |
| `rsigma_enrichment_http_cache_expirations_total` | `enricher_id` | HTTP enricher response-cache entries evicted on expiry |

The `kind` label is carried even though `enricher_id` typically already encodes it (`asset_lookup_det` vs `asset_lookup_corr`), so dashboards can compute `sum by (kind)` without depending on a naming convention. Every label combination is pre-registered at startup, so all six families render at zero on the first `/metrics` scrape, before any event has fired. Filtered (kind- or scope-mismatched) calls do not increment any counters.

## Environment Variables

| Variable | Scope | Behavior |
|----------|-------|----------|
| `NO_COLOR` | `lint` only | When set, disables color output regardless of `--color` setting |
| `RUST_LOG` | `daemon` only | Log level filter (e.g. `info`, `debug`, `rsigma=debug`). Default: `info` |
| `NATS_CREDS` | `daemon` | NATS credentials file path (alternative to `--nats-creds`) |
| `NATS_TOKEN` | `daemon` | NATS authentication token (alternative to `--nats-token`) |
| `NATS_USER` | `daemon` | NATS username (alternative to `--nats-user`) |
| `NATS_PASSWORD` | `daemon` | NATS password (alternative to `--nats-password`) |
| `NATS_NKEY` | `daemon` | NATS NKey seed (alternative to `--nats-nkey`) |
| `RSIGMA_CONSUMER_GROUP` | `daemon` | Consumer group name (alternative to `--consumer-group`) |

## Feature Flags

| Flag | Default | Description |
|------|---------|-------------|
| `daemon` | **on** | Enables the `engine daemon` subcommand (tokio, axum, prometheus, notify, rusqlite) |
| `daemon-nats` | off | Enables NATS JetStream input/output, authentication, replay, and consumer groups (implies `daemon`) |
| `daemon-otlp` | off | Enables OTLP log ingestion via HTTP (protobuf/JSON) and gRPC on `/v1/logs` (implies `daemon`) |
| `logfmt` | off | Enables `logfmt` input format in `daemon` and `eval` |
| `cef` | off | Enables CEF (ArcSight) input format in `daemon` and `eval` |
| `evtx` | off | Enables EVTX (Windows Event Log) input format |
| `daachorse-index` | off | Enables the `--cross-rule-ac` flag and links in [daachorse](https://crates.io/crates/daachorse) for cross-rule Aho-Corasick pre-filtering of large substring-heavy rule sets |

```bash
# Build with all features
cargo build --release --features daemon-nats,daemon-otlp,logfmt,cef,evtx,daachorse-index

# Build without daemon (parser, eval, convert, lint only)
cargo build --release --no-default-features
```

## Exit Codes

| Code | Meaning |
|------|---------|
| `0` | Success. For `eval`: events processed (detections may or may not have fired, unless `--fail-on-detection` is set). For `lint`: no findings at the configured `--fail-level`. For `validate`: all rules parsed and compiled. |
| `1` | Findings. For `eval --fail-on-detection`: at least one detection or correlation fired. For `lint`: at least one finding at or above `--fail-level` severity. |
| `2` | Rule error. A Sigma rule could not be parsed, compiled, or converted. |
| `3` | Configuration error. A pipeline file could not be loaded, a CLI argument was invalid, or the tool was otherwise misconfigured. |

### CI/CD flags

Use `--fail-on-detection` with `eval` to fail a CI pipeline when a detection rule matches:

```bash
rsigma engine eval -r rules/ --fail-on-detection -e @test-events.ndjson
echo $?  # 0 = no detections, 1 = detections fired, 2 = rule error, 3 = config error
```

Use `--fail-level` with `lint` to control the minimum severity that triggers a non-zero exit:

```bash
rsigma rule lint rules/ --fail-level warning   # exit 1 on warnings or errors
rsigma rule lint rules/ --fail-level info      # exit 1 on any finding
rsigma rule lint rules/                        # default: exit 1 only on errors
```
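
For CI tooling written in Rust, the exit codes above can be mapped onto a small enum. This is a sketch: the `Outcome` enum and `classify` function are hypothetical, and `main` simply shells out to the lint invocation shown above, so it needs `rsigma` on `PATH` to do anything useful.

```rust
use std::process::Command;

/// Exit-code meanings from the "Exit Codes" table.
#[derive(Debug, PartialEq)]
enum Outcome {
    Success,
    Findings,
    RuleError,
    ConfigError,
    Unknown,
}

fn classify(code: Option<i32>) -> Outcome {
    match code {
        Some(0) => Outcome::Success,
        Some(1) => Outcome::Findings,
        Some(2) => Outcome::RuleError,
        Some(3) => Outcome::ConfigError,
        _ => Outcome::Unknown, // any other code, or terminated by a signal
    }
}

fn main() {
    // Same invocation as the first lint example above.
    let status = Command::new("rsigma")
        .args(["rule", "lint", "rules/", "--fail-level", "warning"])
        .status();
    match status {
        Ok(s) => println!("lint outcome: {:?}", classify(s.code())),
        Err(e) => eprintln!("could not run rsigma: {e}"),
    }
}
```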

## License

MIT License.

[rsigma workspace]: https://github.com/timescale/rsigma