Module forensic

Imputation-like forensic baseline: answers “what would this dim have looked like if the point were normal?” by aggregating the per-dim distribution of the forest’s currently held sample points.

Inspired by AWS’s ImputeVisitor but repurposed: instead of imputing a NaN feature, this helper tells an SOC analyst how far an observed point sits from the forest’s current idea of “normal” on every dimension — the expected value under normality plus a z-score-style delta.

Semantics

  • expected[d] — mean of dim d across every point currently held in any tree’s reservoir (the forest’s live baseline).
  • stddev[d] — population standard deviation of the same set.
  • observed[d] — the caller’s raw query value.
  • delta[d] = observed[d] − expected[d].
  • zscore[d] = delta[d] / stddev[d] (set to 0 when the baseline stddev is zero on a dim, since a constant baseline gives no meaningful z-score).
  • live_points — number of unique points contributing to the baseline.
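
The field definitions above can be sketched as a small standalone function. This is an illustration only, not the crate's implementation: `BaselineSketch` and `baseline_sketch` are hypothetical names, and the sketch assumes the caller has already gathered the forest's unique live points into a slice in raw coordinates.

```rust
/// Hypothetical per-dim summary. Field names mirror the list above,
/// but this is NOT the crate's actual `ForensicBaseline` definition.
#[derive(Debug, Clone, PartialEq)]
pub struct BaselineSketch {
    pub expected: Vec<f64>,
    pub stddev: Vec<f64>,
    pub observed: Vec<f64>,
    pub delta: Vec<f64>,
    pub zscore: Vec<f64>,
    pub live_points: usize,
}

/// Build the baseline from `points` (assumed deduplicated, raw coordinates)
/// and compare `observed` against it. Returns `None` when there are no
/// points or when any point's dimensionality differs from `observed`.
pub fn baseline_sketch(points: &[Vec<f64>], observed: &[f64]) -> Option<BaselineSketch> {
    let dims = observed.len();
    if points.is_empty() || points.iter().any(|p| p.len() != dims) {
        return None;
    }
    let n = points.len() as f64;

    // expected[d]: mean of dim d across all live points.
    let mut expected = vec![0.0_f64; dims];
    for p in points {
        for (e, x) in expected.iter_mut().zip(p) {
            *e += *x;
        }
    }
    for e in expected.iter_mut() {
        *e /= n;
    }

    // stddev[d]: population standard deviation (divide by n, not n - 1).
    let mut sq_sum = vec![0.0_f64; dims];
    for p in points {
        for ((s, x), e) in sq_sum.iter_mut().zip(p).zip(&expected) {
            let diff = *x - *e;
            *s += diff * diff;
        }
    }
    let stddev: Vec<f64> = sq_sum.iter().map(|s| (s / n).sqrt()).collect();

    // delta[d] = observed[d] - expected[d].
    let delta: Vec<f64> = observed.iter().zip(&expected).map(|(o, e)| o - e).collect();

    // zscore[d] = delta[d] / stddev[d], or 0 when the baseline is constant.
    let zscore: Vec<f64> = delta
        .iter()
        .zip(&stddev)
        .map(|(d, s)| if *s == 0.0 { 0.0 } else { d / s })
        .collect();

    Some(BaselineSketch {
        expected,
        stddev,
        observed: observed.to_vec(),
        delta,
        zscore,
        live_points: points.len(),
    })
}
```

With live points `[1, 5]` and `[3, 5]` and an observed point `[4, 7]`, the sketch yields `expected = [2, 5]`, `stddev = [1, 0]`, `delta = [2, 2]`, and `zscore = [2, 0]`: the second dim has a constant baseline, so its z-score is reported as 0 even though the delta is nonzero.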

The baseline is reported in raw-point space: feature_scales is applied to the stored points for averaging and then inverted, so that expected, stddev, and delta live in the caller’s original coordinate system. SOC dashboards do not need to know about the internal scaling.
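
To make the inversion step concrete, here is a minimal sketch under a stated assumption: that scaling is a per-dim multiplication (`scaled = raw * feature_scales[d]`). The module text does not confirm this form, and `unscale_stats` is a hypothetical helper, not part of the crate. Under that assumption the mean divides by the scale and the standard deviation divides by its absolute value.

```rust
/// Map per-dim mean and stddev from scaled space back to raw space.
/// ASSUMPTION: scaled = raw * scale per dim; the real crate may differ.
/// Returns `None` on length mismatch or a zero scale (not invertible).
pub fn unscale_stats(
    mean_scaled: &[f64],
    stddev_scaled: &[f64],
    feature_scales: &[f64],
) -> Option<(Vec<f64>, Vec<f64>)> {
    let dims = feature_scales.len();
    if mean_scaled.len() != dims || stddev_scaled.len() != dims {
        return None;
    }
    if feature_scales.iter().any(|s| *s == 0.0) {
        return None;
    }

    // The mean is linear in the data, so it divides by the scale directly.
    let mean_raw: Vec<f64> = mean_scaled
        .iter()
        .zip(feature_scales)
        .map(|(m, s)| m / s)
        .collect();

    // A spread is non-negative, so it divides by the scale's magnitude.
    let stddev_raw: Vec<f64> = stddev_scaled
        .iter()
        .zip(feature_scales)
        .map(|(sd, s)| sd / s.abs())
        .collect();

    Some((mean_raw, stddev_raw))
}
```

Because delta and the z-score are derived from these raw-space values, a dashboard consuming them never has to see the scales themselves.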

Structs

ForensicBaseline
Per-dim forensic baseline comparing an observed point against the forest’s current live sample distribution.