rsigma-convert
rsigma-convert is a Sigma rule conversion engine that transforms parsed Sigma rules into backend-native query strings (SQL, SPL, KQL, Lucene, etc.).
This library is part of rsigma.
Overview
The crate provides a generic conversion framework that any backend can plug into:
Backendtrait with ~30 methods covering condition dispatch, detection item conversion, field/value escaping, regex, CIDR, comparison operators, field existence, field references, keywords, IN-list optimization, deferred expressions, and query finalization.TextQueryConfigwith ~90 configuration fields mirroring pySigma'sTextQueryBackendclass variables: precedence, boolean operators, wildcards, string/field quoting, match expressions (startswith/endswith/contains + case-sensitive variants), regex/CIDR templates, compare ops, IN-list optimization, unbound values, deferred parts, and query envelope.- Condition tree walker that recursively converts
ConditionExprnodes into query strings with selector/quantifier support. - Orchestrator via
convert_collection(), which applies pipelines, converts each rule, and collects results and errors. - Deferred expressions through the
DeferredExpressiontrait andDeferredTextExpressionfor backends that need post-query appendages (e.g. Splunk| regex,| where). - Test backend with
TextQueryTestBackendandMandatoryPipelineTestBackendfor backend-neutral foundation testing. - PostgreSQL/TimescaleDB backend with native
ILIKE, regex (~*), CIDR (inet/cidr), full-text search (tsvector/tsquery), JSONB field access, correlation via CTEs and window functions, and TimescaleDB-specific output formats (continuous aggregates,time_bucketqueries, view generation). - LynxDB backend generating SPL2-compatible
FROM <index> | search ...queries with glob wildcards, deferred| whereclauses for regex and CIDR,CASE()case-sensitive matching, and correct parenthesization for LynxDB's non-standard boolean precedence (NOT > OR > AND).
Backends
| Backend | Target names | Description |
|---|---|---|
| Test | test |
Backend-neutral text queries for foundation testing |
| PostgreSQL | postgres, postgresql, pg |
Native PostgreSQL SQL with TimescaleDB support |
| LynxDB | lynxdb |
SPL2-compatible search queries for LynxDB log analytics engine |
Usage
Test backend
use parse_sigma_yaml;
use ;
use TextQueryTestBackend;
let yaml = r#"
title: Detect Whoami
logsource:
category: process_creation
product: windows
detection:
selection:
CommandLine|contains: 'whoami'
condition: selection
level: medium
"#;
let collection = parse_sigma_yaml.unwrap;
let backend = new;
let output = convert_collection.unwrap;
for result in &output.queries
PostgreSQL backend
use parse_sigma_yaml;
use ;
use PostgresBackend;
let yaml = r#"
title: Detect Whoami
logsource:
category: process_creation
product: windows
detection:
selection:
CommandLine|contains: 'whoami'
condition: selection
level: medium
"#;
let collection = parse_sigma_yaml.unwrap;
let backend = new;
let output = convert_collection.unwrap;
for result in &output.queries
LynxDB backend
use parse_sigma_yaml;
use ;
use LynxDbBackend;
let yaml = r#"
title: Detect Whoami
logsource:
category: process_creation
product: windows
detection:
selection:
CommandLine|contains: 'whoami'
condition: selection
level: medium
"#;
let collection = parse_sigma_yaml.unwrap;
let backend = new;
let output = convert_collection.unwrap;
for result in &output.queries
LynxDB output formats
| Format | Description |
|---|---|
default |
Full query with index prefix: FROM main | search ... |
minimal |
Search expression only (no FROM prefix), useful for the LynxDB API q parameter |
LynxDB index selection
The target index defaults to main. Set it via pipeline state:
# In a pipeline YAML
transformations:
- type: set_state
key: index
value: security_logs
# Output: FROM security_logs | search ...
PostgreSQL output formats
| Format | Description |
|---|---|
default |
Plain SELECT * FROM {table} WHERE ... queries |
view |
CREATE OR REPLACE VIEW sigma_{id} AS SELECT ... |
timescaledb |
Queries with time_bucket() for TimescaleDB optimization |
continuous_aggregate |
CREATE MATERIALIZED VIEW ... WITH (timescaledb.continuous) |
sliding_window |
Correlation queries using window functions for per-row sliding detection |
SELECT column selection
When a Sigma rule specifies fields:, the backend emits SELECT field1, field2, ... instead of SELECT *. Function calls (e.g. count(*)) pass through unchanged, and field as alias is supported with both sides quoted independently.
CLI backend options
Backend configuration can be set via -O key=value flags on the CLI, which are wired through to PostgresBackend::from_options. Recognized keys: table, schema, database, timestamp_field, json_field, case_sensitive_re.
Custom table, schema, and database
The target table and schema can be set at three levels (highest precedence first):
- Rule-level
custom_attributes:postgres.table,postgres.schema,postgres.database - Pipeline state:
set_statewithkey: table,key: schema - CLI backend options:
-O table=...,-O schema=...,-O database=... - Backend defaults:
PostgresBackend.table,.schema,.database
Example rule with custom attributes:
title: Process Creation
logsource:
category: process_creation
detection:
selection:
CommandLine|contains: 'whoami'
condition: selection
custom_attributes:
postgres.table: process_events
postgres.schema: siem
OCSF pipelines
Two OCSF processing pipelines are included:
| Pipeline | Description |
|---|---|
pipelines/ocsf_postgres.yml |
Single-table: all events go to security_events |
pipelines/ocsf_postgres_multi_table.yml |
Per-logsource routing: each category gets its own table (process_events, network_events, etc.) |
# Single-table pipeline
# Multi-table pipeline (per-logsource routing)
# With output format
Multi-table temporal correlations
When a temporal correlation rule references detection rules that target different tables (via per-logsource pipeline routing or custom attributes), the backend automatically generates a UNION ALL CTE:
-- Rules targeting different tables produce UNION ALL
WITH matched AS (
SELECT *, 'process_rule' AS rule_name FROM process_events
WHERE time >= NOW - INTERVAL '300 seconds'
UNION ALL
SELECT *, 'network_rule' AS rule_name FROM network_events
WHERE time >= NOW - INTERVAL '300 seconds'
)
SELECT "User", COUNT(DISTINCT rule_name) AS distinct_rules,
MIN(time) AS first_seen, MAX(time) AS last_seen
FROM matched
GROUP BY "User"
HAVING COUNT(DISTINCT rule_name) >= 2
When all referenced rules share the same table, the simpler single-table approach is used instead.
Per-rule schemas are also tracked: if different detection rules set different schemas (via postgres.schema custom attribute or set_state key: schema in the pipeline), each leg of the UNION ALL uses the correct schema.table.
Important: The multi-table
UNION ALLusesSELECT *in each leg, so PostgreSQL requires all referenced tables to have the same column count and compatible column types. This works well when tables share a normalized event schema. If your tables have different column layouts, either normalize them through pipeline field-mappings or use a single-table approach with a discriminator column (e.g.rule_name) instead.
Reference schema
A reference TimescaleDB schema is provided at schema/timescaledb_security_events.sql with hypertable setup, indexes (B-tree, GIN for full-text and JSONB), compression, retention policies, and an example continuous aggregate.
Backend Trait
Backends implement the Backend trait to produce query strings from Sigma AST nodes. The trait operates on parsed types from rsigma-parser (not compiled matchers) because conversion needs the original field names, modifiers, and values.
Key methods:
| Method | Description |
|---|---|
convert_rule |
Convert a single SigmaRule into query strings |
convert_condition |
Walk a ConditionExpr tree |
convert_detection |
Convert a Detection (AllOf/AnyOf/Keywords) |
convert_detection_item |
Convert a single DetectionItem (field + modifiers + values) |
convert_field_eq_str |
String value matching with modifier dispatch |
convert_field_eq_re |
Regex matching |
convert_field_eq_cidr |
CIDR matching |
convert_field_compare |
Numeric comparison (gt, gte, lt, lte) |
convert_field_exists |
Field existence check |
convert_keyword |
Unbound/keyword value matching |
finish_query |
Assemble final query with deferred parts |
finalize_query |
Apply output format to a query |
finalize_output |
Finalize the complete output |
TextQueryConfig
For text-based query backends (the vast majority), create a TextQueryConfig with your backend's tokens and expressions, then delegate to the text_convert_* free functions:
| Function | Description |
|---|---|
text_escape_and_quote_field |
Escape and optionally quote a field name |
text_convert_value_str |
Convert a SigmaString with escaping and quoting |
text_convert_value_re |
Escape a regex pattern |
text_convert_condition_and |
Join expressions with AND token |
text_convert_condition_or |
Join expressions with OR token |
text_convert_condition_not |
Negate an expression |
text_convert_condition_group |
Precedence-aware grouping |
text_convert_field_eq_str |
String match dispatch (contains/startswith/endswith/wildcard/exact) |
text_finish_query |
Assemble query with deferred parts and state substitution |
Implementing a Backend
- Define a
TextQueryConfigconstant with your backend's tokens and expressions. - Create a struct that implements
Backend, delegating most methods to thetext_convert_*helpers. - Override specific methods for backend-specific behavior (e.g. deferred regex for Splunk, SQL-specific CIDR handling for PostgreSQL).
- Register your backend in the CLI's
get_backend()registry.
See backends/test.rs for a complete reference implementation, backends/postgres.rs for a production backend with SQL-specific overrides, and backends/lynxdb/ for a TextQueryConfig-based backend with deferred expressions and custom precedence handling.
PostgreSQL Backend Details
The PostgreSQL backend (PostgresBackend) leverages native PostgreSQL features that map cleanly to Sigma modifiers:
| Sigma Modifier | PostgreSQL SQL |
|---|---|
contains |
ILIKE (case-insensitive) |
startswith / endswith |
ILIKE |
cased |
LIKE (case-sensitive) |
re |
~* (case-insensitive regex) or ~ (with cased) |
cidr |
field::inet <<= 'value'::cidr |
exists |
IS NOT NULL / IS NULL |
| keywords | to_tsvector() @@ plainto_tsquery() |
Correlation rules are converted to SQL using GROUP BY / HAVING for aggregation types (event_count, value_count, value_sum, value_avg, value_percentile, value_median) and CTEs for temporal correlation. Multi-table temporal correlations automatically generate UNION ALL CTEs when referenced rules target different tables.
Non-temporal correlations support CTE-based pre-filtering: when the correlation references detection rules that were converted in the same collection, the backend wraps their queries in a WITH combined_events AS (q1 UNION ALL q2 ...) CTE so the aggregate only counts events matching the detection logic.
The sliding_window output format uses SQL window functions for event_count correlations, producing a per-row sliding window that emits every event crossing the threshold:
WITH combined_events AS (...),
event_counts AS (
SELECT *, COUNT(*) OVER (
PARTITION BY "User"
ORDER BY time
RANGE BETWEEN INTERVAL '300 seconds' PRECEDING AND CURRENT ROW
) AS correlation_event_count
FROM combined_events
)
SELECT * FROM event_counts WHERE correlation_event_count >= 5
Configuration
PostgresBackend fields:
| Field | Type | Default | Description |
|---|---|---|---|
table |
String |
"security_events" |
Default table name (overridden by pipeline state or postgres.table custom attribute) |
timestamp_field |
String |
"time" |
Timestamp column for time-windowed queries |
json_field |
Option<String> |
None |
If set, fields are accessed via JSONB extraction (see JSONB field access) |
case_sensitive_re |
bool |
false |
Use ~ instead of ~* for regex |
schema |
Option<String> |
None |
PostgreSQL schema name (overridden by pipeline state or postgres.schema custom attribute) |
database |
Option<String> |
None |
PostgreSQL database name (connection-level metadata) |
timescaledb |
bool |
false |
Enable TimescaleDB-specific features |
JSONB field access
When json_field is set (e.g. -O json_field=data), all Sigma field references are translated to PostgreSQL JSONB extraction operators instead of bare column names.
For top-level fields, the backend uses the ->> operator:
-- Sigma field: eventType
data->>'eventType'
For dotted field names (nested paths like securityContext.isProxy), the backend generates chained operators where intermediate segments use -> (returns jsonb) and the final segment uses ->> (returns text):
-- Sigma field: securityContext.isProxy
data->'securityContext'->>'isProxy'
-- Sigma field: actor.detail.alternateId
data->'actor'->'detail'->>'alternateId'
This matches the nested traversal behavior of the evaluation engine (rsigma-eval), which splits dotted field names on . and walks into nested JSON objects.
# Convert rules with JSONB field access against a "data" column
LynxDB Backend Details
The LynxDB backend (LynxDbBackend) generates SPL2/Lynx Flow queries for the LynxDB log analytics engine. It produces FROM <index> | search <predicates> queries with deferred | where clauses for operations that LynxDB's search syntax does not natively support.
| Sigma Modifier | LynxDB Query |
|---|---|
contains |
field=*"value"* |
startswith |
field="value"* |
endswith |
field=*"value" |
re |
| where field=~"pattern" (deferred) |
cidr |
| where cidrmatch("cidr", field) (deferred) |
cased (exact) |
field=CASE("value") |
wildcards (*) |
field="va*lue" (glob) |
wildcards (?) |
| where field=~"va.lue" (deferred, converted to regex) |
exists |
field=* |
null |
NOT field=* |
| keywords | "value" (unbound search) |
Boolean precedence
LynxDB's parser uses non-standard boolean operator precedence: NOT > OR > AND. This differs from most query languages where AND binds tighter than OR. The backend explicitly parenthesizes AND groups to preserve Sigma's intended logic:
Sigma: A AND B OR C (intended: (A AND B) OR C)
Query: (A AND B) OR C (explicit parens prevent misparse as A AND (B OR C))
Deferred expressions
Regex patterns, CIDR matches, and single-character wildcard (?) patterns cannot be expressed in LynxDB's search syntax and are instead emitted as | where pipeline stages appended after the search clause:
FROM main | search status=500 | where Path=~"/api/.*"
When a detection contains only deferred expressions, the search clause uses * (match all) followed by the deferred stages:
FROM main | search * | where SourceIP=~"^10\.0\." | where cidrmatch("192.168.1.0/24", DestIP)
License
MIT License.