rsigma-convert

rsigma-convert is a Sigma rule conversion engine that transforms parsed Sigma rules into backend-native query strings (SQL, SPL, KQL, Lucene, etc.).

This library is part of rsigma.

Overview

The crate provides a generic conversion framework that any backend can plug into:

Backend trait with ~30 methods covering condition dispatch, detection item conversion, field/value escaping, regex, CIDR, comparison operators, field existence, field references, keywords, IN-list optimization, deferred expressions, and query finalization.
TextQueryConfig with ~90 configuration fields mirroring pySigma's TextQueryBackend class variables: precedence, boolean operators, wildcards, string/field quoting, match expressions (startswith/endswith/contains + case-sensitive variants), regex/CIDR templates, compare ops, IN-list optimization, unbound values, deferred parts, and query envelope.
Condition tree walker that recursively converts ConditionExpr nodes into query strings with selector/quantifier support.
Orchestrator via convert_collection(), which applies pipelines, converts each rule, and collects results and errors.
Deferred expressions through the DeferredExpression trait and DeferredTextExpression for backends that need post-query appendages (e.g. Splunk | regex, | where).
Test backend with TextQueryTestBackend and MandatoryPipelineTestBackend for backend-neutral foundation testing.
PostgreSQL/TimescaleDB backend with native ILIKE, regex (~*), CIDR (inet/cidr), full-text search (tsvector/tsquery), JSONB field access, correlation via CTEs and window functions, and TimescaleDB-specific output formats (continuous aggregates, time_bucket queries, view generation).
LynxDB backend generating SPL2-compatible FROM <index> | search ... queries with glob wildcards, deferred | where clauses for regex and CIDR, CASE() case-sensitive matching, and correct parenthesization for LynxDB's non-standard boolean precedence (NOT > OR > AND).

Backends

Backend	Target names	Description
Test	`test`	Backend-neutral text queries for foundation testing
PostgreSQL	`postgres`, `postgresql`, `pg`	Native PostgreSQL SQL with TimescaleDB support
LynxDB	`lynxdb`	SPL2-compatible search queries for LynxDB log analytics engine

Usage

Test backend

use rsigma_parser::parse_sigma_yaml;
use rsigma_convert::{convert_collection, Backend};
use rsigma_convert::backends::test::TextQueryTestBackend;

let yaml = r#"
title: Detect Whoami
logsource:
    category: process_creation
    product: windows
detection:
    selection:
        CommandLine|contains: 'whoami'
    condition: selection
level: medium
"#;

let collection = parse_sigma_yaml(yaml).unwrap();
let backend = TextQueryTestBackend::new();

let output = convert_collection(&backend, &collection, &[], "default").unwrap();
for result in &output.queries {
    for query in &result.queries {
        println!("{query}");
        // Output: CommandLine contains "whoami"
    }
}

PostgreSQL backend

use rsigma_parser::parse_sigma_yaml;
use rsigma_convert::{convert_collection, Backend};
use rsigma_convert::backends::postgres::PostgresBackend;

let yaml = r#"
title: Detect Whoami
logsource:
    category: process_creation
    product: windows
detection:
    selection:
        CommandLine|contains: 'whoami'
    condition: selection
level: medium
"#;

let collection = parse_sigma_yaml(yaml).unwrap();
let backend = PostgresBackend::new();

let output = convert_collection(&backend, &collection, &[], "default").unwrap();
for result in &output.queries {
    for query in &result.queries {
        println!("{query}");
        // Output: SELECT * FROM security_events WHERE "CommandLine" ILIKE '%whoami%'
    }
}

LynxDB backend

use rsigma_parser::parse_sigma_yaml;
use rsigma_convert::{convert_collection, Backend};
use rsigma_convert::backends::lynxdb::LynxDbBackend;

let yaml = r#"
title: Detect Whoami
logsource:
    category: process_creation
    product: windows
detection:
    selection:
        CommandLine|contains: 'whoami'
    condition: selection
level: medium
"#;

let collection = parse_sigma_yaml(yaml).unwrap();
let backend = LynxDbBackend::new();

let output = convert_collection(&backend, &collection, &[], "default").unwrap();
for result in &output.queries {
    for query in &result.queries {
        println!("{query}");
        // Output: FROM main | search CommandLine=*"whoami"*
    }
}

LynxDB output formats

Format	Description
`default`	Full query with index prefix: `FROM main \| search ...`
`minimal`	Search expression only (no `FROM` prefix), useful for the LynxDB API `q` parameter

LynxDB index selection

The target index defaults to main. Set it via pipeline state:

# In a pipeline YAML
transformations:
  - type: set_state
    key: index
    value: security_logs

rsigma convert -r rules/ -t lynxdb -p pipeline.yml
# Output: FROM security_logs | search ...

PostgreSQL output formats

Format	Description
`default`	Plain `SELECT * FROM {table} WHERE ...` queries
`view`	`CREATE OR REPLACE VIEW sigma_{id} AS SELECT ...`
`timescaledb`	Queries with `time_bucket()` for TimescaleDB optimization
`continuous_aggregate`	`CREATE MATERIALIZED VIEW ... WITH (timescaledb.continuous)`
`sliding_window`	Correlation queries using window functions for per-row sliding detection

SELECT column selection

When a Sigma rule specifies fields:, the backend emits SELECT field1, field2, ... instead of SELECT *. Function calls (e.g. count(*)) pass through unchanged, and field as alias is supported with both sides quoted independently.

CLI backend options

Backend configuration can be set via -O key=value flags on the CLI, which are wired through to PostgresBackend::from_options. Recognized keys: table, schema, database, timestamp_field, json_field, case_sensitive_re.

rsigma convert -r rules/ -t postgres -O table=security_logs -O schema=public -O timestamp_field=created_at

Custom table, schema, and database

The target table and schema can be set at three levels (highest precedence first):

Rule-level custom_attributes: postgres.table, postgres.schema, postgres.database
Pipeline state: set_state with key: table, key: schema
CLI backend options: -O table=..., -O schema=..., -O database=...
Backend defaults: PostgresBackend.table, .schema, .database

Example rule with custom attributes:

title: Process Creation
logsource:
    category: process_creation
detection:
    selection:
        CommandLine|contains: 'whoami'
    condition: selection
custom_attributes:
    postgres.table: process_events
    postgres.schema: siem

OCSF pipelines

Two OCSF processing pipelines are included:

Pipeline	Description
`pipelines/ocsf_postgres.yml`	Single-table: all events go to `security_events`
`pipelines/ocsf_postgres_multi_table.yml`	Per-logsource routing: each category gets its own table (`process_events`, `network_events`, etc.)

# Single-table pipeline
rsigma convert -r rules/ -t postgres -p pipelines/ocsf_postgres.yml

# Multi-table pipeline (per-logsource routing)
rsigma convert -r rules/ -t postgres -p pipelines/ocsf_postgres_multi_table.yml

# With output format
rsigma convert -r rules/ -t postgres -p pipelines/ocsf_postgres.yml -f view
rsigma convert -r rules/ -t postgres -p pipelines/ocsf_postgres.yml -f continuous_aggregate

Multi-table temporal correlations

When a temporal correlation rule references detection rules that target different tables (via per-logsource pipeline routing or custom attributes), the backend automatically generates a UNION ALL CTE:

-- Rules targeting different tables produce UNION ALL
WITH matched AS (
    SELECT *, 'process_rule' AS rule_name FROM process_events
        WHERE time >= NOW() - INTERVAL '300 seconds'
    UNION ALL
    SELECT *, 'network_rule' AS rule_name FROM network_events
        WHERE time >= NOW() - INTERVAL '300 seconds'
)
SELECT "User", COUNT(DISTINCT rule_name) AS distinct_rules,
    MIN(time) AS first_seen, MAX(time) AS last_seen
FROM matched
GROUP BY "User"
HAVING COUNT(DISTINCT rule_name) >= 2

When all referenced rules share the same table, the simpler single-table approach is used instead.

Per-rule schemas are also tracked: if different detection rules set different schemas (via postgres.schema custom attribute or set_state key: schema in the pipeline), each leg of the UNION ALL uses the correct schema.table.

Important: The multi-table UNION ALL uses SELECT * in each leg, so PostgreSQL requires all referenced tables to have the same column count and compatible column types. This works well when tables share a normalized event schema. If your tables have different column layouts, either normalize them through pipeline field-mappings or use a single-table approach with a discriminator column (e.g. rule_name) instead.

Reference schema

A reference TimescaleDB schema is provided at schema/timescaledb_security_events.sql with hypertable setup, indexes (B-tree, GIN for full-text and JSONB), compression, retention policies, and an example continuous aggregate.

Backend Trait

Backends implement the Backend trait to produce query strings from Sigma AST nodes. The trait operates on parsed types from rsigma-parser (not compiled matchers) because conversion needs the original field names, modifiers, and values.

Key methods:

Method	Description
`convert_rule`	Convert a single `SigmaRule` into query strings
`convert_condition`	Walk a `ConditionExpr` tree
`convert_detection`	Convert a `Detection` (AllOf/AnyOf/Keywords)
`convert_detection_item`	Convert a single `DetectionItem` (field + modifiers + values)
`convert_field_eq_str`	String value matching with modifier dispatch
`convert_field_eq_re`	Regex matching
`convert_field_eq_cidr`	CIDR matching
`convert_field_compare`	Numeric comparison (`gt`, `gte`, `lt`, `lte`)
`convert_field_exists`	Field existence check
`convert_keyword`	Unbound/keyword value matching
`finish_query`	Assemble final query with deferred parts
`finalize_query`	Apply output format to a query
`finalize_output`	Finalize the complete output

TextQueryConfig

For text-based query backends (the vast majority), create a TextQueryConfig with your backend's tokens and expressions, then delegate to the text_convert_* free functions:

Function	Description
`text_escape_and_quote_field`	Escape and optionally quote a field name
`text_convert_value_str`	Convert a `SigmaString` with escaping and quoting
`text_convert_value_re`	Escape a regex pattern
`text_convert_condition_and`	Join expressions with AND token
`text_convert_condition_or`	Join expressions with OR token
`text_convert_condition_not`	Negate an expression
`text_convert_condition_group`	Precedence-aware grouping
`text_convert_field_eq_str`	String match dispatch (contains/startswith/endswith/wildcard/exact)
`text_finish_query`	Assemble query with deferred parts and state substitution

Implementing a Backend

Define a TextQueryConfig constant with your backend's tokens and expressions.
Create a struct that implements Backend, delegating most methods to the text_convert_* helpers.
Override specific methods for backend-specific behavior (e.g. deferred regex for Splunk, SQL-specific CIDR handling for PostgreSQL).
Register your backend in the CLI's get_backend() registry.

See backends/test.rs for a complete reference implementation, backends/postgres.rs for a production backend with SQL-specific overrides, and backends/lynxdb/ for a TextQueryConfig-based backend with deferred expressions and custom precedence handling.

PostgreSQL Backend Details

The PostgreSQL backend (PostgresBackend) leverages native PostgreSQL features that map cleanly to Sigma modifiers:

Sigma Modifier	PostgreSQL SQL
`contains`	`ILIKE` (case-insensitive)
`startswith` / `endswith`	`ILIKE`
`cased`	`LIKE` (case-sensitive)
`re`	`~*` (case-insensitive regex) or `~` (with `cased`)
`cidr`	`field::inet <<= 'value'::cidr`
`exists`	`IS NOT NULL` / `IS NULL`
keywords	`to_tsvector() @@ plainto_tsquery()`

Correlation rules are converted to SQL using GROUP BY / HAVING for aggregation types (event_count, value_count, value_sum, value_avg, value_percentile, value_median) and CTEs for temporal correlation. Multi-table temporal correlations automatically generate UNION ALL CTEs when referenced rules target different tables.

Non-temporal correlations support CTE-based pre-filtering: when the correlation references detection rules that were converted in the same collection, the backend wraps their queries in a WITH combined_events AS (q1 UNION ALL q2 ...) CTE so the aggregate only counts events matching the detection logic.

The sliding_window output format uses SQL window functions for event_count correlations, producing a per-row sliding window that emits every event crossing the threshold:

WITH combined_events AS (...),
event_counts AS (
    SELECT *, COUNT(*) OVER (
        PARTITION BY "User"
        ORDER BY time
        RANGE BETWEEN INTERVAL '300 seconds' PRECEDING AND CURRENT ROW
    ) AS correlation_event_count
    FROM combined_events
)
SELECT * FROM event_counts WHERE correlation_event_count >= 5

Configuration

PostgresBackend fields:

Field	Type	Default	Description
`table`	`String`	`"security_events"`	Default table name (overridden by pipeline state or `postgres.table` custom attribute)
`timestamp_field`	`String`	`"time"`	Timestamp column for time-windowed queries
`json_field`	`Option<String>`	`None`	If set, fields are accessed via JSONB extraction (see JSONB field access)
`case_sensitive_re`	`bool`	`false`	Use `~` instead of `~*` for regex
`schema`	`Option<String>`	`None`	PostgreSQL schema name (overridden by pipeline state or `postgres.schema` custom attribute)
`database`	`Option<String>`	`None`	PostgreSQL database name (connection-level metadata)
`timescaledb`	`bool`	`false`	Enable TimescaleDB-specific features

JSONB field access

When json_field is set (e.g. -O json_field=data), all Sigma field references are translated to PostgreSQL JSONB extraction operators instead of bare column names.

For top-level fields, the backend uses the ->> operator:

-- Sigma field: eventType
data->>'eventType'

For dotted field names (nested paths like securityContext.isProxy), the backend generates chained operators where intermediate segments use -> (returns jsonb) and the final segment uses ->> (returns text):

-- Sigma field: securityContext.isProxy
data->'securityContext'->>'isProxy'

-- Sigma field: actor.detail.alternateId
data->'actor'->'detail'->>'alternateId'

This matches the nested traversal behavior of the evaluation engine (rsigma-eval), which splits dotted field names on . and walks into nested JSON objects.

# Convert rules with JSONB field access against a "data" column
rsigma convert -r rules/ -t postgres -O table=okta_events -O json_field=data -O timestamp_field=time

LynxDB Backend Details

The LynxDB backend (LynxDbBackend) generates SPL2/Lynx Flow queries for the LynxDB log analytics engine. It produces FROM <index> | search <predicates> queries with deferred | where clauses for operations that LynxDB's search syntax does not natively support.

Sigma Modifier	LynxDB Query
`contains`	`field="value"`
`startswith`	`field="value"*`
`endswith`	`field=*"value"`
`re`	`\| where field=~"pattern"` (deferred)
`cidr`	`\| where cidrmatch("cidr", field)` (deferred)
`cased` (exact)	`field=CASE("value")`
wildcards (`*`)	`field="va*lue"` (glob)
wildcards (`?`)	`\| where field=~"va.lue"` (deferred, converted to regex)
`exists`	`field=*`
`null`	`NOT field=*`
keywords	`"value"` (unbound search)

Boolean precedence

LynxDB's parser uses non-standard boolean operator precedence: NOT > OR > AND. This differs from most query languages where AND binds tighter than OR. The backend explicitly parenthesizes AND groups to preserve Sigma's intended logic:

Sigma: A AND B OR C    (intended: (A AND B) OR C)
Query: (A AND B) OR C  (explicit parens prevent misparse as A AND (B OR C))

Deferred expressions

Regex patterns, CIDR matches, and single-character wildcard (?) patterns cannot be expressed in LynxDB's search syntax and are instead emitted as | where pipeline stages appended after the search clause:

FROM main | search status=500 | where Path=~"/api/.*"

When a detection contains only deferred expressions, the search clause uses * (match all) followed by the deferred stages:

FROM main | search * | where SourceIP=~"^10\.0\." | where cidrmatch("192.168.1.0/24", DestIP)

License

MIT License.

rsigma-convert 0.10.0