rsigma-eval 0.5.0

Evaluator for Sigma detection and correlation rules — match rules against events

rsigma-eval

rsigma-eval is an evaluator for Sigma detection rules. It compiles Sigma rules into optimized in-memory matchers and evaluates them against JSON events. Rules are compiled once; evaluation is zero-allocation on the hot path.

This library is part of rsigma.

Public API

Engine

Method Description
Engine::new() Create an empty engine
Engine::new_with_pipeline(pipeline) Create engine with an initial pipeline
set_include_event(include: bool) Global override: include full event JSON in all match results
add_pipeline(pipeline) Add a pipeline (sorted by priority after add)
add_rule(rule: &SigmaRule) Apply pipelines and compile a rule
add_collection(collection: &SigmaCollection) Add all rules, then apply all filters
add_collection_with_pipelines(collection, pipelines) Temporarily replace pipelines, add collection, restore
add_compiled_rule(rule: CompiledRule) Add a pre-compiled rule directly
apply_filter(filter: &FilterRule) Inject filter as AND NOT into referenced rules
evaluate(event: &Event) Evaluate all rules against an event
evaluate_with_logsource(event, logsource) Evaluate with logsource-based pre-filtering
rule_count() Number of loaded rules
rules() Access the compiled rules slice

Correlation Engine

Method Description
CorrelationEngine::new(config) Create with a CorrelationConfig
set_include_event(include: bool) Global override for event inclusion
add_collection(collection) Add rules and correlations
add_rule(rule: &SigmaRule) Add a single detection rule
add_correlation(corr: &CorrelationRule) Add a single correlation rule
process_event(event: &Event) Evaluate + update correlation state (wall-clock time)
process_event_at(event, timestamp_secs) Evaluate + update state with explicit timestamp
evict_expired(now) Manually evict expired state entries
state_count() Number of active correlation state entries
event_buffer_count() Total events stored across all buffers
event_buffer_bytes() Total bytes of compressed event data

Compilation

Function Description
compile_rule(rule: &SigmaRule) Compile a parsed rule into a CompiledRule
compile_detection(detection: &Detection) Compile a detection tree
evaluate_rule(rule: &CompiledRule, event: &Event) Evaluate one compiled rule
eval_condition(expr, detections, event, matched) Evaluate a condition expression tree

Pipeline

Function Description
parse_pipeline(yaml: &str) Parse a pipeline from a YAML string
parse_pipeline_file(path: &Path) Parse a pipeline from a YAML file
apply_pipelines(pipelines, rule) Apply all pipelines to a rule in priority order
merge_pipelines(pipelines) Merge multiple pipelines into one (sorted by priority)

Detection Engine

  • Compiled matchers: optimized matching for all 30 modifier combinations — exact, contains, startswith, endswith, regex, CIDR, numeric comparison, base64 offset (3 alignment variants), windash expansion (5 replacement characters), field references, placeholder expansion, timestamp part extraction
  • Logsource routing: optional pre-filtering by category/product/service to reduce the number of rules evaluated per event
  • Condition tree evaluation: short-circuit boolean logic, selector patterns with quantifiers (1 of selection_*, all of them)
  • Filter application: runtime injection of filter rules as AND NOT conditions on referenced rules

Compilation Pipeline

  1. Rule compilation (compile_rule): For each named detection, call compile_detection. Reads rsigma.include_event from custom_attributes.
  2. Detection compilation (compile_detection):
    • AllOf → compile each item, reject empty.
    • AnyOf → recursively compile each sub-detection, reject empty.
    • Keywords → compile each value as case-insensitive contains, combine with AnyOf.
  3. Value compilation (compile_value): Handles modifiers in this order: |expand → timestamp part → |fieldref / |re / |cidr → numeric comparison → |neq → string modifiers. String modifiers: |wide / |utf16le / |utf16be / |utf16 / |base64 / |base64offset / |windash → string match.

Compiled Matcher Types

Matcher Modifier Notes
Exact (default) Case-insensitive by default; |cased makes it sensitive
Contains |contains Substring match
StartsWith |startswith Prefix match
EndsWith |endswith Suffix match
Regex |re |i adds (?i), |m adds multiline, |s adds dotall
Cidr |cidr IP network matching via IpNet
NumericEq/Gt/Gte/Lt/Lte |gt, |gte, etc. f64 comparison
Exists |exists Accepts true/yes/false/no as values
FieldRef |fieldref Compares against another field's value
Null Matches null or missing values
BoolEq Boolean equality
Expand |expand Placeholder template expansion
TimestampPart |minute, |hour, |day, |week, |month, |year Extract timestamp component, match inner value
Not |neq Wraps inner matcher with negation
AnyOf / AllOf Multiple values combined (OR / AND with |all)

Value Coercion

  • Arrays: string matchers use OR semantics (any element matches).
  • Numbers: coerced to string for string matchers.
  • Booleans: "true", "1", "yes" → true; "false", "0", "no" → false.
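
The boolean coercion rule can be illustrated with a minimal sketch. The function name `coerce_bool` is hypothetical, and exact, case-sensitive matching of the listed strings is an assumption; the crate may be more lenient.

```rust
/// Hypothetical helper: maps the string forms listed above to a boolean.
/// Assumption: only these exact spellings are recognised.
fn coerce_bool(raw: &str) -> Option<bool> {
    match raw {
        "true" | "1" | "yes" => Some(true),
        "false" | "0" | "no" => Some(false),
        // Anything else is not treated as a boolean in this sketch.
        _ => None,
    }
}
```

Returning `Option<bool>` keeps "not a boolean" distinct from `false`, so a caller can fall back to string matching.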

Filter Rule Behavior

  • Filters match by rule.id or rule.title (from filter.rules).
  • If the filter has a logsource, the rule must be compatible (symmetric check).
  • Empty filter.rules applies the filter to all rules.
  • Filter detections are added as __filter_{counter}_{name} (counter prevents key collisions when multiple filters share detection names); the condition is wrapped as original AND NOT filter.
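
The renaming and wrapping described above can be sketched on a toy condition tree. `Cond`, `prefixed`, `rename_refs`, and `inject_filter` are illustrative names, not the crate's types; only the `__filter_{counter}_{name}` scheme and the `original AND NOT filter` shape come from the text.

```rust
/// Toy condition tree (hypothetical, much smaller than a real Sigma condition).
#[derive(Debug, Clone, PartialEq)]
enum Cond {
    Ref(String),
    And(Box<Cond>, Box<Cond>),
    Not(Box<Cond>),
}

/// Builds the collision-free detection name for a filter detection.
fn prefixed(counter: usize, name: &str) -> String {
    format!("__filter_{}_{}", counter, name)
}

/// Rewrites every detection reference in the filter condition to its prefixed name.
fn rename_refs(cond: &Cond, counter: usize) -> Cond {
    match cond {
        Cond::Ref(name) => Cond::Ref(prefixed(counter, name)),
        Cond::And(a, b) => Cond::And(
            Box::new(rename_refs(a, counter)),
            Box::new(rename_refs(b, counter)),
        ),
        Cond::Not(inner) => Cond::Not(Box::new(rename_refs(inner, counter))),
    }
}

/// Wraps the rule condition as `original AND NOT filter`.
fn inject_filter(original: Cond, filter: &Cond, counter: usize) -> Cond {
    Cond::And(
        Box::new(original),
        Box::new(Cond::Not(Box::new(rename_refs(filter, counter)))),
    )
}
```

Because the counter is part of the name, two filters that both define a detection called `admins` end up as `__filter_0_admins` and `__filter_1_admins`.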

Selector Pattern Matching

  • * — matches any detection name.
  • selection_* — prefix match.
  • *_filter — suffix match.
  • exact — exact match.
  • them — matches all names except those starting with _.
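
A minimal sketch of these five cases, assuming a single leading or trailing `*` and no wildcard in the middle of a pattern. The function name `selector_matches` is hypothetical.

```rust
/// Hypothetical helper: does `pattern` select the detection called `name`?
fn selector_matches(pattern: &str, name: &str) -> bool {
    if pattern == "them" {
        // `them` covers every detection except names starting with `_`.
        return !name.starts_with('_');
    }
    if pattern == "*" {
        return true;
    }
    if let Some(prefix) = pattern.strip_suffix('*') {
        // `selection_*` style: prefix match.
        return name.starts_with(prefix);
    }
    if let Some(suffix) = pattern.strip_prefix('*') {
        // `*_filter` style: suffix match.
        return name.ends_with(suffix);
    }
    // No wildcard: exact match.
    pattern == name
}
```

Note that `*` and `them` differ only for underscore-prefixed names, which is what keeps injected `__filter_...` detections out of `all of them`.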

Event Model

The Event wrapper provides flexible field access over serde_json::Value:

  • Flat-key precedence: "actor.user.name" as a literal top-level key takes priority over nested traversal.
  • Dot-notation: if no flat key matches and the path contains ., split and traverse nested objects.
  • Array traversal: arrays are searched with OR semantics (first matching element wins).
  • Keyword detection: matches_keyword searches all string values across all fields recursively.
  • Max nesting depth: recursive traversal stops at depth 64 (MAX_NESTING_DEPTH).
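
The lookup order (flat key first, then dot traversal) can be sketched with a tiny stand-in for `serde_json::Value`. The `Json` enum, `obj`, and `get_field` are hypothetical; array traversal and the depth limit are left out for brevity.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for a JSON value (strings and objects only).
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Str(String),
    Object(BTreeMap<String, Json>),
}

/// Convenience constructor for objects in examples.
fn obj(pairs: Vec<(&str, Json)>) -> Json {
    Json::Object(pairs.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

/// Looks up `path`: a literal top-level key wins over nested traversal.
fn get_field<'a>(root: &'a Json, path: &str) -> Option<&'a Json> {
    let map = match root {
        Json::Object(map) => map,
        _ => return None,
    };
    // 1. Flat-key precedence: "actor.user.name" as a literal key.
    if let Some(value) = map.get(path) {
        return Some(value);
    }
    // 2. Dot-notation: only attempted when the path contains a dot.
    if !path.contains('.') {
        return None;
    }
    let mut current = root;
    for part in path.split('.') {
        current = match current {
            Json::Object(m) => m.get(part)?,
            _ => return None,
        };
    }
    Some(current)
}
```

Checking the flat key first means events from shippers that emit dotted keys and events with genuinely nested objects can be matched by the same rule.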

Correlation Engine

Stateful processing with sliding time windows, group-by aggregation, and all 8 correlation types.

CorrelationConfig

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| timestamp_fields | Vec<String> | ["@timestamp", "timestamp", "EventTime", "TimeCreated", "eventTime"] | Field names to try for timestamp extraction, in priority order |
| timestamp_fallback | TimestampFallback | WallClock | WallClock (use Utc::now()) or Skip (skip event from correlation) |
| max_state_entries | usize | 100,000 | Hard cap across all correlations and group keys |
| suppress | Option<u64> | None | Default suppression window in seconds |
| action_on_match | CorrelationAction | Alert | Alert (keep state) or Reset (clear window state) |
| emit_detections | bool | true | Whether to emit detection-level matches for correlation-only rules |
| correlation_event_mode | CorrelationEventMode | None | None, Full (deflate-compressed), or Refs (timestamp + ID) |
| max_correlation_events | usize | 10 | Max events stored per (correlation, group_key) window |

Core Features

  • Group-by partitioning: composite keys with field aliasing across referenced rules
  • Correlation chaining: correlation results propagate to higher-level correlations (max depth: 10, MAX_CHAIN_DEPTH)
  • Extended temporal conditions: boolean expressions over rule references (e.g. rule_a and rule_b and not rule_c)
  • Cycle detection: DFS-based validation of the correlation reference graph at load time
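
As an illustration of the cycle check, here is a generic three-state DFS over a reference graph. This is a textbook sketch, not the crate's implementation; the graph shape (name to referenced names) and the function name `has_cycle` are assumptions.

```rust
use std::collections::HashMap;

/// DFS colouring: a node is either on the current path or fully explored.
#[derive(Clone, Copy)]
enum Mark {
    Visiting,
    Done,
}

/// Returns true if the reference graph contains a cycle.
fn has_cycle<'a>(graph: &HashMap<&'a str, Vec<&'a str>>) -> bool {
    fn visit<'a>(
        node: &'a str,
        graph: &HashMap<&'a str, Vec<&'a str>>,
        marks: &mut HashMap<&'a str, Mark>,
    ) -> bool {
        match marks.get(node) {
            // Reaching a node that is still on the current path closes a cycle.
            Some(Mark::Visiting) => return true,
            Some(Mark::Done) => return false,
            None => {}
        }
        marks.insert(node, Mark::Visiting);
        if let Some(children) = graph.get(node) {
            for &child in children {
                if visit(child, graph, marks) {
                    return true;
                }
            }
        }
        marks.insert(node, Mark::Done);
        false
    }

    let mut marks = HashMap::new();
    graph.keys().any(|&node| visit(node, graph, &mut marks))
}
```

Running the check once at load time means chained correlations never need a cycle guard while events are being processed, apart from the depth limit.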

Alert Management

  • Suppression: per-correlation or global suppression windows to prevent alert floods. After a (correlation, group_key) fires, suppress re-alerts for the configured duration
  • Action-on-fire (action_on_match): Alert (keep state, re-fire on next match) or Reset (clear window state, require fresh threshold)
  • Generate flag: Sigma-standard generate support — suppress detection output for correlation-only rules
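
Suppression per (correlation, group_key) pair can be sketched as a map from the pair to the time of its last alert. `Suppressor` and `should_alert` are hypothetical names, and the exact boundary behaviour at the end of the window is an assumption.

```rust
use std::collections::HashMap;

/// Tracks the last alert time for each (correlation, group_key) pair.
struct Suppressor {
    window_secs: i64,
    last_alert: HashMap<(String, String), i64>,
}

impl Suppressor {
    fn new(window_secs: i64) -> Self {
        Suppressor { window_secs, last_alert: HashMap::new() }
    }

    /// Returns true if an alert may be emitted at `now` (seconds),
    /// and records it; returns false while the pair is suppressed.
    fn should_alert(&mut self, correlation: &str, group_key: &str, now: i64) -> bool {
        let key = (correlation.to_string(), group_key.to_string());
        if let Some(&last) = self.last_alert.get(&key) {
            if now - last < self.window_secs {
                return false;
            }
        }
        self.last_alert.insert(key, now);
        true
    }
}
```

Keying on the group as well as the correlation means a flood from one user does not hide the first alert for another.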

Event Inclusion

  • Full mode: contributing events stored as individually deflate-compressed blobs (compression level 1, 3-5x memory savings on typical JSON)
  • Refs mode: lightweight references (timestamp + optional event ID) at ~40 bytes per event
  • Event ID extraction (Refs mode): tries fields in order: id, _id, event_id, EventRecordID, event.id
  • Configurable cap: max_correlation_events bounds memory per window
  • Zero cost when disabled: buffers are not allocated unless mode is Full or Refs
  • Per-correlation override: set rsigma.correlation_event_mode via custom_attributes in YAML
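
A capped per-window buffer in Refs mode might look like the following sketch. `RefBuffer` is a hypothetical type; dropping the oldest reference when the cap is reached is an assumption based on the "keep last N events" wording in the usage example.

```rust
use std::collections::VecDeque;

/// Lightweight event reference: timestamp plus optional event ID.
#[derive(Debug, Clone, PartialEq)]
struct EventRef {
    timestamp: i64,
    id: Option<String>,
}

/// Bounded buffer holding at most `cap` references for one window.
struct RefBuffer {
    cap: usize,
    refs: VecDeque<EventRef>,
}

impl RefBuffer {
    fn new(cap: usize) -> Self {
        RefBuffer { cap, refs: VecDeque::new() }
    }

    /// Appends a reference, evicting the oldest one when the buffer is full.
    fn push(&mut self, event_ref: EventRef) {
        if self.cap == 0 {
            return;
        }
        if self.refs.len() == self.cap {
            self.refs.pop_front();
        }
        self.refs.push_back(event_ref);
    }
}
```

With the default cap of 10, each window holds at most 10 references regardless of how many events contributed to the aggregate.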

Memory Management

  • Max state entries: configurable hard cap (default: 100,000) across all correlations and group keys
  • Time-based eviction: entries outside their correlation window are evicted automatically
  • Hard-cap eviction: when over the limit, entries are evicted until 90% of the cap is reached (the stalest 10% are dropped in bulk to avoid evicting on every event)
  • Stale alert cleanup: expired suppression entries are garbage-collected
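
The hard-cap rule can be sketched as follows, assuming state is a map from key to last-update time and "stalest" means the smallest timestamp. `evict_to_target` is a hypothetical helper, not the engine's internal routine.

```rust
use std::collections::HashMap;

/// When `state` exceeds `cap`, removes the stalest entries until only
/// 90% of the cap remains. Returns the number of evicted entries.
fn evict_to_target(state: &mut HashMap<String, i64>, cap: usize) -> usize {
    if state.len() <= cap {
        return 0;
    }
    let target = cap * 9 / 10;
    // Sort keys by last-update time, oldest first.
    let mut by_age: Vec<(i64, String)> =
        state.iter().map(|(key, &ts)| (ts, key.clone())).collect();
    by_age.sort();
    let to_remove = state.len() - target;
    for (_, key) in by_age.into_iter().take(to_remove) {
        state.remove(&key);
    }
    to_remove
}
```

Evicting down to 90% rather than exactly to the cap leaves headroom, so the sort is paid once per batch instead of on every new entry.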

Timestamp Extraction

  • Field priority list: configurable ordered list of fields to try (default: @timestamp, timestamp, EventTime, TimeCreated, eventTime)
  • Format support: RFC 3339, %Y-%m-%dT%H:%M:%S, %Y-%m-%dT%H:%M:%S%.f, %Y-%m-%d %H:%M:%S, epoch seconds, epoch milliseconds (auto-detected if value > 10^12)
  • Fallback policy: WallClock (use Utc::now(), good for real-time streaming) or Skip (skip event from correlation, recommended for batch/replay)
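
The epoch auto-detection and fallback policy can be sketched like this. The names `normalize_epoch`, `Fallback`, and `resolve_timestamp` are illustrative, and string formats such as RFC 3339 are not handled here.

```rust
/// Numeric timestamps above this value are treated as milliseconds.
const EPOCH_MILLIS_THRESHOLD: i64 = 1_000_000_000_000; // 10^12

/// Converts a numeric epoch value to seconds.
fn normalize_epoch(raw: i64) -> i64 {
    if raw > EPOCH_MILLIS_THRESHOLD {
        raw / 1000
    } else {
        raw
    }
}

/// Stand-in for the fallback policy described above.
enum Fallback {
    WallClock,
    Skip,
}

/// Picks the timestamp for correlation, or None if the event is skipped.
/// `now_secs` is passed in so the sketch stays deterministic.
fn resolve_timestamp(extracted: Option<i64>, fallback: &Fallback, now_secs: i64) -> Option<i64> {
    match (extracted, fallback) {
        (Some(raw), _) => Some(normalize_epoch(raw)),
        (None, Fallback::WallClock) => Some(now_secs),
        (None, Fallback::Skip) => None,
    }
}
```

For replayed logs, `Skip` avoids stamping old events with the current time, which would place them in the wrong window.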

Value Percentile

value_percentile uses linear interpolation (C=1 method). The condition threshold represents the percentile rank (0-100), clamped to that range.
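
A minimal sketch of linear-interpolation percentiles under the C=1 convention, where the position in the sorted sample is `rank / 100 * (n - 1)`. The function name `percentile_c1` is hypothetical and the crate's handling of edge cases may differ.

```rust
/// Percentile with linear interpolation between the two nearest ranks (C=1).
/// `rank` is clamped to 0..=100. Returns None for an empty sample.
fn percentile_c1(values: &[f64], rank: f64) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let p = rank.clamp(0.0, 100.0) / 100.0;
    // Fractional index into the sorted sample.
    let pos = p * (sorted.len() - 1) as f64;
    let lo = pos.floor() as usize;
    let hi = pos.ceil() as usize;
    let frac = pos - lo as f64;
    Some(sorted[lo] + (sorted[hi] - sorted[lo]) * frac)
}
```

For the sample 1, 2, 3, 4 the 50th percentile sits at position 1.5, halfway between 2 and 3, giving 2.5.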

Output Types

MatchResult

Field Type Description
rule_title String Rule title
rule_id Option<String> Rule UUID
level Option<Level> Severity level
tags Vec<String> Tags
matched_selections Vec<String> Detection names that matched
matched_fields Vec<FieldMatch> Field/value pairs that contributed to the match
event Option<Value> Full event JSON when include_event is enabled

FieldMatch

Field Type
field String
value serde_json::Value

CorrelationResult

Field Type Description
rule_title String Correlation rule title
rule_id Option<String> Rule UUID
level Option<Level> Severity level
tags Vec<String> Tags
correlation_type CorrelationType e.g. event_count, temporal
group_key Vec<(String, String)> Group-by field/value pairs
aggregated_value f64 Computed aggregate (count, sum, avg, percentile, median)
timespan_secs u64 Correlation window duration
events Option<Vec<Value>> Contributing events (Full mode)
event_refs Option<Vec<EventRef>> Event references (Refs mode)

EventRef

Field Type
timestamp i64
id Option<String>

Processing Pipelines

pySigma-compatible pipeline system for field mapping, logsource transformation, and backend-specific configuration. Supports multi-pipeline chaining with priority ordering.

Pipeline Chaining

  • Priority: Pipeline.priority (default 0); lower runs first.
  • Sorting: pipelines are sorted by priority on add.
  • State isolation: each pipeline gets its own PipelineState; state is not shared across pipelines.
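
Priority ordering on add can be sketched with a stable sort. The `Pipeline` struct here is a two-field stand-in, the `i32` priority type is an assumption, and keeping insertion order for equal priorities is a property of this sketch rather than a documented guarantee.

```rust
/// Two-field stand-in for a pipeline (hypothetical).
#[derive(Debug, Clone, PartialEq)]
struct Pipeline {
    name: String,
    priority: i32,
}

/// Adds a pipeline and re-sorts so that lower priorities run first.
fn add_pipeline(pipelines: &mut Vec<Pipeline>, pipeline: Pipeline) {
    pipelines.push(pipeline);
    // `sort_by_key` is stable: equal priorities keep their insertion order.
    pipelines.sort_by_key(|p| p.priority);
}
```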

Transformation Item Fields

Each transformation item in a pipeline can have:

Field Description
id Identifier for processing_item_applied conditions
rule_conditions All must match (AND logic) for the transformation to apply
rule_cond_expression Logical expression over rule condition IDs (alternative to rule_conditions)
detection_item_conditions Conditions on individual detection items
field_name_conditions Conditions on field names
field_name_cond_not Negate field name conditions

Transformations (26 types)

Type Fields Description
field_name_mapping mapping: {k: v} Rename fields via a mapping dict
field_name_prefix_mapping mapping: {prefix: replacement} Rename fields matching a prefix
field_name_prefix prefix Add a prefix to all field names
field_name_suffix suffix Add a suffix to all field names
field_name_transform transform_func, mapping Case transformation (see below)
drop_detection_item Remove matching detection items
add_condition conditions: {k: v}, negated (default: false) Inject additional detection conditions
change_logsource category, product, service Modify logsource fields
replace_string regex, replacement, skip_special (default: false) Regex-based string replacement (skip_special preserves wildcards)
map_string mapping: {k: v | [v1, v2]} Map string values to replacements (supports one-to-many)
set_value value Replace detection item values
convert_type target_type (str/int/float/bool, default: str) Convert values between types
value_placeholders Expand %placeholder% in values
wildcard_placeholders Expand placeholders to wildcards
query_expression_placeholders expression (default: "") Backend query placeholders (no-op in eval)
set_state key, value Store key-value pairs in pipeline state
rule_failure message (default: "rule failure") Raise an error for matching rules
detection_item_failure message (default: "detection item failure") Raise an error for matching detection items
hashes_fields valid_hash_algos, field_prefix (default: "File"), drop_algo_prefix (default: false) Transform hash field names
add_field field Add a new detection item with a fixed value
remove_field field Remove a field from detection items
set_field fields: [...] Rename the field of a detection item
set_custom_attribute attribute, value Set key-value attributes on rules
case_transformation case_type / case (lower/upper/snake_case) Transform case of field values
nest items or transformations Apply a group of transformations conditionally
regex Regex transformation (no-op in eval)

Aliases: case is accepted as an alias for case_transformation.

field_name_transform Functions

Value Behavior
lower / lowercase to_lowercase
upper / uppercase to_uppercase
title Capitalize each word, join with _ (e.g. hello_world → Hello_World)
snake_case camelCase → snake_case
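
The `snake_case` and `title` behaviors can be sketched as below. Both helpers are hypothetical, and the treatment of acronyms (for example `ProcessID`) is an assumption: this sketch inserts an underscore before every uppercase letter.

```rust
/// camelCase -> snake_case: underscore before each uppercase letter (except the first).
fn to_snake_case(name: &str) -> String {
    let mut out = String::with_capacity(name.len() + 4);
    for (i, c) in name.chars().enumerate() {
        if c.is_uppercase() {
            if i > 0 {
                out.push('_');
            }
            out.extend(c.to_lowercase());
        } else {
            out.push(c);
        }
    }
    out
}

/// Capitalizes each `_`-separated word and joins the words with `_` again.
fn to_title(name: &str) -> String {
    name.split('_')
        .map(|word| {
            let mut chars = word.chars();
            match chars.next() {
                Some(first) => first.to_uppercase().chain(chars).collect::<String>(),
                None => String::new(),
            }
        })
        .collect::<Vec<_>>()
        .join("_")
}
```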

Conditions (3 levels)

Rule Conditions

Type Fields
logsource category, product, service
contains_detection_item field, value (optional)
processing_item_applied processing_item_id
processing_state key, val
is_sigma_rule
is_sigma_correlation_rule
rule_attribute attribute (level/status/author/title/id/date/description), value
tag tag

Detection Item Conditions

Type Fields
match_string pattern (default: ".*"), negate (default: false)
is_null negate
processing_item_applied processing_item_id
processing_state key, val

Field Name Conditions

Type Fields
include_fields fields, match_type (plain or regex, default: plain)
exclude_fields fields, match_type
processing_item_applied processing_item_id
processing_state key, val

Finalizers (3 types)

Type Fields Defaults
concat separator, prefix, suffix " ", "", ""
json indent
template template ""

Finalizers are stored in the pipeline but not executed in eval mode.

Custom Attributes (rsigma.*)

Pipeline transformations can configure engine behavior via SetCustomAttribute, following the same pattern as pySigma backends (e.g. pySigma-backend-loki):

| Attribute | Effect | CLI equivalent | Scope |
| --- | --- | --- | --- |
| rsigma.timestamp_field | Prepends a field name to the timestamp extraction priority list | --timestamp-field | Engine |
| rsigma.suppress | Sets the suppression window (e.g. 5m) | --suppress | Engine + per-correlation |
| rsigma.action | Sets the post-fire action (alert or reset) | --action | Engine + per-correlation |
| rsigma.include_event | Embeds the full event JSON in detection output | --include-event | Per-rule |
| rsigma.correlation_event_mode | Sets event inclusion mode (full or refs) | --correlation-event-mode | Per-correlation |
| rsigma.max_correlation_events | Caps stored events per correlation window (integer) | --max-correlation-events | Per-correlation |

CLI flags and the library API always take precedence over pipeline attributes. Engine-level attributes (timestamp_field, suppress, action) are only applied when the CLI did not already set the corresponding flag. Per-correlation attributes override engine defaults for individual correlation rules.

# Example pipeline with custom attributes
transformations:
  - type: set_custom_attribute
    attribute: rsigma.timestamp_field
    value: time
  - type: set_custom_attribute
    attribute: rsigma.suppress
    value: 5m

Constants and Limits

Constant Value Purpose
MAX_NESTING_DEPTH 64 Recursive JSON traversal depth for keyword search
MAX_WINDASH_DASHES 8 Maximum dash characters expanded by windash (5^8 variants)
WINDASH_CHARS 5 -, /, U+2013 (en-dash), U+2014 (em-dash), U+2015 (horizontal bar)
MAX_CHAIN_DEPTH 10 Maximum correlation chaining depth
max_state_entries 100,000 Default hard cap for correlation state
Eviction target 90% Hard-cap eviction drops the stalest 10%
max_correlation_events 10 Default per-window event cap
Epoch threshold 10^12 Numeric timestamps above this are treated as milliseconds
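
To see why the windash limit matters, here is a sketch of the combinatorial expansion: each dash position can take any of the 5 characters, so n dashes yield 5^n variants. The function `windash_variants` is hypothetical, and treating every `-` in the value as a dash position is a simplification; the real modifier may only consider dashes that introduce a command-line flag.

```rust
/// The five replacement characters: -, /, en-dash, em-dash, horizontal bar.
const WINDASH_CHARS: [char; 5] = ['-', '/', '\u{2013}', '\u{2014}', '\u{2015}'];
const MAX_WINDASH_DASHES: usize = 8;

/// Expands every `-` in `input` into all five dash characters.
/// Fails when the input has more than MAX_WINDASH_DASHES dashes.
fn windash_variants(input: &str) -> Result<Vec<String>, String> {
    let dash_count = input.chars().filter(|&c| c == '-').count();
    if dash_count > MAX_WINDASH_DASHES {
        return Err(format!(
            "{} dashes exceed the limit of {}",
            dash_count, MAX_WINDASH_DASHES
        ));
    }
    let mut variants = vec![String::new()];
    for c in input.chars() {
        if c == '-' {
            // Branch: every partial variant is extended with each dash character.
            let mut next = Vec::with_capacity(variants.len() * WINDASH_CHARS.len());
            for prefix in &variants {
                for &dash in &WINDASH_CHARS {
                    let mut s = prefix.clone();
                    s.push(dash);
                    next.push(s);
                }
            }
            variants = next;
        } else {
            for v in &mut variants {
                v.push(c);
            }
        }
    }
    Ok(variants)
}
```

At the limit of 8 dashes this already produces 5^8 = 390,625 strings for a single value, which is why larger inputs are rejected at compile time rather than expanded.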

Error Types

Error When
InvalidRegex Regex compilation failure
InvalidCidr CIDR parse failure
Base64 Base64 encoding error
UnknownDetection Condition references missing detection (caught at compile time)
InvalidModifiers Invalid modifier combo, empty AllOf/AnyOf, windash overflow, pipeline failure
IncompatibleValue Wrong type for modifier (e.g. null for string)
ExpectedNumeric Numeric modifier with non-numeric value
Parser Parser error (from rsigma-parser)
CorrelationError Correlation compile/runtime error
UnknownRuleRef Correlation references unknown rule (caught at add_collection time)
CorrelationCycle Cycle in correlation references

Usage

Detection only:

use rsigma_parser::parse_sigma_yaml;
use rsigma_eval::{Engine, Event, parse_pipeline};
use serde_json::json;

let yaml = r#"
title: Detect Whoami
logsource:
    product: windows
    category: process_creation
detection:
    selection:
        CommandLine|contains: 'whoami'
    condition: selection
level: medium
"#;

let collection = parse_sigma_yaml(yaml).unwrap();

let pipeline = parse_pipeline(r#"
name: ECS Mapping
transformations:
  - type: field_name_mapping
    mapping:
      CommandLine: process.command_line
    rule_conditions:
      - type: logsource
        product: windows
"#).unwrap();

let mut engine = Engine::new_with_pipeline(pipeline);
engine.add_collection(&collection).unwrap();

// Rule now expects ECS field names
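// Bind the JSON to a name first so it outlives the Event that wraps it
let event_json = json!({"process.command_line": "whoami"});
let event = Event::from_value(&event_json);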
let matches = engine.evaluate(&event);

With correlations:

use rsigma_eval::{CorrelationEngine, CorrelationConfig, CorrelationAction, CorrelationEventMode};

let config = CorrelationConfig {
    suppress: Some(300),                         // 5-minute suppression window
    action_on_match: CorrelationAction::Reset,   // clear state after firing
    emit_detections: false,                      // only emit correlation alerts
    correlation_event_mode: CorrelationEventMode::Full, // include full events
    max_correlation_events: 20,                        // keep last 20 events per window
    ..Default::default()
};

let mut engine = CorrelationEngine::new(config);
engine.set_include_event(true);                  // embed event JSON in all match results
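engine.add_collection(&collection).unwrap();     // `collection`: parsed as in the detection example
// `event`: an Event as in the detection example; `timestamp_secs`: caller-supplied timestamp in seconds
let result = engine.process_event_at(&event, timestamp_secs);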
// result.detections: Vec<MatchResult>
// result.correlations: Vec<CorrelationResult>
// result.correlations[0].events: Option<Vec<serde_json::Value>>     (Full mode)
// result.correlations[0].event_refs: Option<Vec<EventRef>>          (Refs mode)

Benchmarks

Criterion.rs benchmarks with synthetic rules and events (Apple M-series, single-threaded):

Detection Evaluation

Scenario Time Throughput
Compile 1,000 rules 669 us --
Compile 5,000 rules 3.4 ms --
1 event vs 100 rules 4.8 us --
1 event vs 1,000 rules 65 us --
1 event vs 5,000 rules 336 us --
100K events vs 100 rules 458 ms 218K events/sec
Wildcard-heavy (1,000 rules, 100 events) 5.9 ms --
Regex-heavy (1,000 rules, 100 events) 7.3 ms --

Correlation Engine

Scenario Time Throughput
1K events, 20 event_count correlations 727 us 1.37M events/sec
1K events, 10 temporal correlations 411 us 2.43M events/sec
100K events, 50 detection + 10 correlation rules 217 ms 462K events/sec
50K unique group keys (state pressure) 35.8 ms 1.40M events/sec

Run the benchmarks with:

cargo bench --bench eval
cargo bench --bench correlation

License

MIT License.