rsigma-convert 0.10.0

Sigma rule conversion engine — convert rules to backend-native query strings
Documentation

rsigma-convert

CI

rsigma-convert is a Sigma rule conversion engine that transforms parsed Sigma rules into backend-native query strings (SQL, SPL, KQL, Lucene, etc.).

This library is part of rsigma.

Overview

The crate provides a generic conversion framework that any backend can plug into:

  • Backend trait with ~30 methods covering condition dispatch, detection item conversion, field/value escaping, regex, CIDR, comparison operators, field existence, field references, keywords, IN-list optimization, deferred expressions, and query finalization.
  • TextQueryConfig with ~90 configuration fields mirroring pySigma's TextQueryBackend class variables: precedence, boolean operators, wildcards, string/field quoting, match expressions (startswith/endswith/contains + case-sensitive variants), regex/CIDR templates, compare ops, IN-list optimization, unbound values, deferred parts, and query envelope.
  • Condition tree walker that recursively converts ConditionExpr nodes into query strings with selector/quantifier support.
  • Orchestrator via convert_collection(), which applies pipelines, converts each rule, and collects results and errors.
  • Deferred expressions through the DeferredExpression trait and DeferredTextExpression for backends that need post-query appendages (e.g. Splunk | regex, | where).
  • Test backend with TextQueryTestBackend and MandatoryPipelineTestBackend for backend-neutral foundation testing.
  • PostgreSQL/TimescaleDB backend with native ILIKE, regex (~*), CIDR (inet/cidr), full-text search (tsvector/tsquery), JSONB field access, correlation via CTEs and window functions, and TimescaleDB-specific output formats (continuous aggregates, time_bucket queries, view generation).
  • LynxDB backend generating SPL2-compatible FROM <index> | search ... queries with glob wildcards, deferred | where clauses for regex and CIDR, CASE() case-sensitive matching, and correct parenthesization for LynxDB's non-standard boolean precedence (NOT > OR > AND).

Backends

Backend Target names Description
Test test Backend-neutral text queries for foundation testing
PostgreSQL postgres, postgresql, pg Native PostgreSQL SQL with TimescaleDB support
LynxDB lynxdb SPL2-compatible search queries for LynxDB log analytics engine

Usage

Test backend

use rsigma_parser::parse_sigma_yaml;
use rsigma_convert::{convert_collection, Backend};
use rsigma_convert::backends::test::TextQueryTestBackend;

let yaml = r#"
title: Detect Whoami
logsource:
    category: process_creation
    product: windows
detection:
    selection:
        CommandLine|contains: 'whoami'
    condition: selection
level: medium
"#;

let collection = parse_sigma_yaml(yaml).unwrap();
let backend = TextQueryTestBackend::new();

let output = convert_collection(&backend, &collection, &[], "default").unwrap();
for result in &output.queries {
    for query in &result.queries {
        println!("{query}");
        // Output: CommandLine contains "whoami"
    }
}

PostgreSQL backend

use rsigma_parser::parse_sigma_yaml;
use rsigma_convert::{convert_collection, Backend};
use rsigma_convert::backends::postgres::PostgresBackend;

let yaml = r#"
title: Detect Whoami
logsource:
    category: process_creation
    product: windows
detection:
    selection:
        CommandLine|contains: 'whoami'
    condition: selection
level: medium
"#;

let collection = parse_sigma_yaml(yaml).unwrap();
let backend = PostgresBackend::new();

let output = convert_collection(&backend, &collection, &[], "default").unwrap();
for result in &output.queries {
    for query in &result.queries {
        println!("{query}");
        // Output: SELECT * FROM security_events WHERE "CommandLine" ILIKE '%whoami%'
    }
}

LynxDB backend

use rsigma_parser::parse_sigma_yaml;
use rsigma_convert::{convert_collection, Backend};
use rsigma_convert::backends::lynxdb::LynxDbBackend;

let yaml = r#"
title: Detect Whoami
logsource:
    category: process_creation
    product: windows
detection:
    selection:
        CommandLine|contains: 'whoami'
    condition: selection
level: medium
"#;

let collection = parse_sigma_yaml(yaml).unwrap();
let backend = LynxDbBackend::new();

let output = convert_collection(&backend, &collection, &[], "default").unwrap();
for result in &output.queries {
    for query in &result.queries {
        println!("{query}");
        // Output: FROM main | search CommandLine=*"whoami"*
    }
}

LynxDB output formats

Format Description
default Full query with index prefix: FROM main | search ...
minimal Search expression only (no FROM prefix), useful for the LynxDB API q parameter

LynxDB index selection

The target index defaults to main. Set it via pipeline state:

# In a pipeline YAML
transformations:
  - type: set_state
    key: index
    value: security_logs
rsigma convert -r rules/ -t lynxdb -p pipeline.yml
# Output: FROM security_logs | search ...

PostgreSQL output formats

Format Description
default Plain SELECT * FROM {table} WHERE ... queries
view CREATE OR REPLACE VIEW sigma_{id} AS SELECT ...
timescaledb Queries with time_bucket() for TimescaleDB optimization
continuous_aggregate CREATE MATERIALIZED VIEW ... WITH (timescaledb.continuous)
sliding_window Correlation queries using window functions for per-row sliding detection

SELECT column selection

When a Sigma rule specifies fields:, the backend emits SELECT field1, field2, ... instead of SELECT *. Function calls (e.g. count(*)) pass through unchanged, and field as alias is supported with both sides quoted independently.

CLI backend options

Backend configuration can be set via -O key=value flags on the CLI, which are wired through to PostgresBackend::from_options. Recognized keys: table, schema, database, timestamp_field, json_field, case_sensitive_re.

rsigma convert -r rules/ -t postgres -O table=security_logs -O schema=public -O timestamp_field=created_at

Custom table, schema, and database

The target table and schema can be set at three levels (highest precedence first):

  1. Rule-level custom_attributes: postgres.table, postgres.schema, postgres.database
  2. Pipeline state: set_state with key: table, key: schema
  3. CLI backend options: -O table=..., -O schema=..., -O database=...
  4. Backend defaults: PostgresBackend.table, .schema, .database

Example rule with custom attributes:

title: Process Creation
logsource:
    category: process_creation
detection:
    selection:
        CommandLine|contains: 'whoami'
    condition: selection
custom_attributes:
    postgres.table: process_events
    postgres.schema: siem

OCSF pipelines

Two OCSF processing pipelines are included:

Pipeline Description
pipelines/ocsf_postgres.yml Single-table: all events go to security_events
pipelines/ocsf_postgres_multi_table.yml Per-logsource routing: each category gets its own table (process_events, network_events, etc.)
# Single-table pipeline
rsigma convert -r rules/ -t postgres -p pipelines/ocsf_postgres.yml

# Multi-table pipeline (per-logsource routing)
rsigma convert -r rules/ -t postgres -p pipelines/ocsf_postgres_multi_table.yml

# With output format
rsigma convert -r rules/ -t postgres -p pipelines/ocsf_postgres.yml -f view
rsigma convert -r rules/ -t postgres -p pipelines/ocsf_postgres.yml -f continuous_aggregate

Multi-table temporal correlations

When a temporal correlation rule references detection rules that target different tables (via per-logsource pipeline routing or custom attributes), the backend automatically generates a UNION ALL CTE:

-- Rules targeting different tables produce UNION ALL
WITH matched AS (
    SELECT *, 'process_rule' AS rule_name FROM process_events
        WHERE time >= NOW() - INTERVAL '300 seconds'
    UNION ALL
    SELECT *, 'network_rule' AS rule_name FROM network_events
        WHERE time >= NOW() - INTERVAL '300 seconds'
)
SELECT "User", COUNT(DISTINCT rule_name) AS distinct_rules,
    MIN(time) AS first_seen, MAX(time) AS last_seen
FROM matched
GROUP BY "User"
HAVING COUNT(DISTINCT rule_name) >= 2

When all referenced rules share the same table, the simpler single-table approach is used instead.

Per-rule schemas are also tracked: if different detection rules set different schemas (via postgres.schema custom attribute or set_state key: schema in the pipeline), each leg of the UNION ALL uses the correct schema.table.

Important: The multi-table UNION ALL uses SELECT * in each leg, so PostgreSQL requires all referenced tables to have the same column count and compatible column types. This works well when tables share a normalized event schema. If your tables have different column layouts, either normalize them through pipeline field-mappings or use a single-table approach with a discriminator column (e.g. rule_name) instead.

Reference schema

A reference TimescaleDB schema is provided at schema/timescaledb_security_events.sql with hypertable setup, indexes (B-tree, GIN for full-text and JSONB), compression, retention policies, and an example continuous aggregate.

Backend Trait

Backends implement the Backend trait to produce query strings from Sigma AST nodes. The trait operates on parsed types from rsigma-parser (not compiled matchers) because conversion needs the original field names, modifiers, and values.

Key methods:

Method Description
convert_rule Convert a single SigmaRule into query strings
convert_condition Walk a ConditionExpr tree
convert_detection Convert a Detection (AllOf/AnyOf/Keywords)
convert_detection_item Convert a single DetectionItem (field + modifiers + values)
convert_field_eq_str String value matching with modifier dispatch
convert_field_eq_re Regex matching
convert_field_eq_cidr CIDR matching
convert_field_compare Numeric comparison (gt, gte, lt, lte)
convert_field_exists Field existence check
convert_keyword Unbound/keyword value matching
finish_query Assemble final query with deferred parts
finalize_query Apply output format to a query
finalize_output Finalize the complete output

TextQueryConfig

For text-based query backends (the vast majority), create a TextQueryConfig with your backend's tokens and expressions, then delegate to the text_convert_* free functions:

Function Description
text_escape_and_quote_field Escape and optionally quote a field name
text_convert_value_str Convert a SigmaString with escaping and quoting
text_convert_value_re Escape a regex pattern
text_convert_condition_and Join expressions with AND token
text_convert_condition_or Join expressions with OR token
text_convert_condition_not Negate an expression
text_convert_condition_group Precedence-aware grouping
text_convert_field_eq_str String match dispatch (contains/startswith/endswith/wildcard/exact)
text_finish_query Assemble query with deferred parts and state substitution

Implementing a Backend

  1. Define a TextQueryConfig constant with your backend's tokens and expressions.
  2. Create a struct that implements Backend, delegating most methods to the text_convert_* helpers.
  3. Override specific methods for backend-specific behavior (e.g. deferred regex for Splunk, SQL-specific CIDR handling for PostgreSQL).
  4. Register your backend in the CLI's get_backend() registry.

See backends/test.rs for a complete reference implementation, backends/postgres.rs for a production backend with SQL-specific overrides, and backends/lynxdb/ for a TextQueryConfig-based backend with deferred expressions and custom precedence handling.

PostgreSQL Backend Details

The PostgreSQL backend (PostgresBackend) leverages native PostgreSQL features that map cleanly to Sigma modifiers:

Sigma Modifier PostgreSQL SQL
contains ILIKE (case-insensitive)
startswith / endswith ILIKE
cased LIKE (case-sensitive)
re ~* (case-insensitive regex) or ~ (with cased)
cidr field::inet <<= 'value'::cidr
exists IS NOT NULL / IS NULL
keywords to_tsvector() @@ plainto_tsquery()

Correlation rules are converted to SQL using GROUP BY / HAVING for aggregation types (event_count, value_count, value_sum, value_avg, value_percentile, value_median) and CTEs for temporal correlation. Multi-table temporal correlations automatically generate UNION ALL CTEs when referenced rules target different tables.

Non-temporal correlations support CTE-based pre-filtering: when the correlation references detection rules that were converted in the same collection, the backend wraps their queries in a WITH combined_events AS (q1 UNION ALL q2 ...) CTE so the aggregate only counts events matching the detection logic.

The sliding_window output format uses SQL window functions for event_count correlations, producing a per-row sliding window that emits every event crossing the threshold:

WITH combined_events AS (...),
event_counts AS (
    SELECT *, COUNT(*) OVER (
        PARTITION BY "User"
        ORDER BY time
        RANGE BETWEEN INTERVAL '300 seconds' PRECEDING AND CURRENT ROW
    ) AS correlation_event_count
    FROM combined_events
)
SELECT * FROM event_counts WHERE correlation_event_count >= 5

Configuration

PostgresBackend fields:

Field Type Default Description
table String "security_events" Default table name (overridden by pipeline state or postgres.table custom attribute)
timestamp_field String "time" Timestamp column for time-windowed queries
json_field Option<String> None If set, fields are accessed via JSONB extraction (see JSONB field access)
case_sensitive_re bool false Use ~ instead of ~* for regex
schema Option<String> None PostgreSQL schema name (overridden by pipeline state or postgres.schema custom attribute)
database Option<String> None PostgreSQL database name (connection-level metadata)
timescaledb bool false Enable TimescaleDB-specific features

JSONB field access

When json_field is set (e.g. -O json_field=data), all Sigma field references are translated to PostgreSQL JSONB extraction operators instead of bare column names.

For top-level fields, the backend uses the ->> operator:

-- Sigma field: eventType
data->>'eventType'

For dotted field names (nested paths like securityContext.isProxy), the backend generates chained operators where intermediate segments use -> (returns jsonb) and the final segment uses ->> (returns text):

-- Sigma field: securityContext.isProxy
data->'securityContext'->>'isProxy'

-- Sigma field: actor.detail.alternateId
data->'actor'->'detail'->>'alternateId'

This matches the nested traversal behavior of the evaluation engine (rsigma-eval), which splits dotted field names on . and walks into nested JSON objects.

# Convert rules with JSONB field access against a "data" column
rsigma convert -r rules/ -t postgres -O table=okta_events -O json_field=data -O timestamp_field=time

LynxDB Backend Details

The LynxDB backend (LynxDbBackend) generates SPL2/Lynx Flow queries for the LynxDB log analytics engine. It produces FROM <index> | search <predicates> queries with deferred | where clauses for operations that LynxDB's search syntax does not natively support.

Sigma Modifier LynxDB Query
contains field=*"value"*
startswith field="value"*
endswith field=*"value"
re | where field=~"pattern" (deferred)
cidr | where cidrmatch("cidr", field) (deferred)
cased (exact) field=CASE("value")
wildcards (*) field="va*lue" (glob)
wildcards (?) | where field=~"va.lue" (deferred, converted to regex)
exists field=*
null NOT field=*
keywords "value" (unbound search)

Boolean precedence

LynxDB's parser uses non-standard boolean operator precedence: NOT > OR > AND. This differs from most query languages where AND binds tighter than OR. The backend explicitly parenthesizes AND groups to preserve Sigma's intended logic:

Sigma: A AND B OR C    (intended: (A AND B) OR C)
Query: (A AND B) OR C  (explicit parens prevent misparse as A AND (B OR C))

Deferred expressions

Regex patterns, CIDR matches, and single-character wildcard (?) patterns cannot be expressed in LynxDB's search syntax and are instead emitted as | where pipeline stages appended after the search clause:

FROM main | search status=500 | where Path=~"/api/.*"

When a detection contains only deferred expressions, the search clause uses * (match all) followed by the deferred stages:

FROM main | search * | where SourceIP=~"^10\.0\." | where cidrmatch("192.168.1.0/24", DestIP)

License

MIT License.