Module parse


Shell AST front-end for the classifier (pure-Rust, via brush-parser).

The Tier-1 classifier historically tokenized the command line with a small hand-rolled splitter. That is fast and dependency-light, but it cannot see the true shell structure: commands hidden inside command substitutions ($(…) or backticks), here-documents fed to a shell, or unusual quoting. This module parses the line into a real bash AST and flattens it into the list of simple commands the line would run, descending into:

  • pipelines, &&/||/; lists, and compound commands (subshells, groups, if/for/while/case/functions),
  • command substitutions $(…) and backticks found in any word,
  • here-document bodies fed to a shell (bash <<EOF … EOF),
  • the -c script of a shell, and find -exec / xargs payloads.
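The descent described above can be sketched as a recursive walk. This is a minimal illustration over a toy AST, not the module's code: the `Node` and `Word` types and the `flatten` function are hypothetical stand-ins, and the real brush-parser AST is considerably richer (redirections, here-documents, `-c` scripts and `find -exec` / `xargs` payloads are not modeled here).

```rust
/// Hypothetical, simplified AST used only for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    /// A simple command: program followed by its argument words.
    Simple(Vec<Word>),
    /// A pipeline, an `&&` / `||` / `;` list, or the body of a compound command.
    Sequence(Vec<Node>),
}

/// Hypothetical word type: plain text or a nested command substitution.
#[derive(Debug, Clone, PartialEq)]
pub enum Word {
    Literal(String),
    /// A `$(...)` or backtick substitution, already parsed into a sub-tree.
    Subst(Box<Node>),
}

/// Collect every simple command reachable from `node` into `out`.
/// Commands nested in a substitution are emitted before the command
/// that contains them.
pub fn flatten(node: &Node, out: &mut Vec<Vec<String>>) {
    match node {
        Node::Simple(words) => {
            let mut argv = Vec::new();
            for word in words {
                match word {
                    Word::Literal(text) => argv.push(text.clone()),
                    Word::Subst(inner) => {
                        // Keep a placeholder in the outer command and
                        // descend into the substitution body.
                        argv.push("$(...)".to_string());
                        flatten(inner, out);
                    }
                }
            }
            out.push(argv);
        }
        Node::Sequence(children) => {
            for child in children {
                flatten(child, out);
            }
        }
    }
}
```

For a line such as `echo $(curl x | sh)`, this walk yields three simple commands (`curl x`, `sh`, and the outer `echo`), which is the visibility a flat token splitter lacks.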

The classifier combines this pass with the existing tokenizer pass on a worst-wins basis: the more severe of the two verdicts stands, so the AST can only ever add detections (deeper, obfuscated payloads) and never downgrade a tokenizer verdict. A parse failure yields None; the caller treats that as “the AST found nothing”, and the tokenizer verdict (and the fail-toward-caution default) still stands.
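One way to picture the worst-wins rule is as a `max` over an ordered severity type. The sketch below is an assumption for illustration only: the `Verdict` enum, its three levels, and the `combine` function are hypothetical names, not the classifier's actual types.

```rust
/// Hypothetical severity scale; variants are declared from least to most
/// severe so the derived `Ord` ranks them in that order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Verdict {
    Safe,
    Caution,
    Dangerous,
}

/// Combine the tokenizer verdict with the optional AST verdict.
/// `None` models a parse failure: the AST contributes nothing and the
/// tokenizer verdict stands unchanged.
pub fn combine(tokenizer: Verdict, ast: Option<Verdict>) -> Verdict {
    match ast {
        // Worst wins: the result is never lower than the tokenizer verdict.
        Some(found) => tokenizer.max(found),
        None => tokenizer,
    }
}
```

Because the result is always at least the tokenizer verdict, the AST pass can raise severity but cannot lower it, and an unparseable line degrades to tokenizer-only behavior.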

Structs§

Analysis
What the AST pass found: every simple command the line would run (flattened, including those nested in substitutions and compound commands), plus the raw text of every command substitution ($(…) or backticks), so the classifier can also run its whole-line scan over substitution bodies (e.g. a curl … | sh hidden in $(…)).
SimpleCmd
One simple command extracted from the AST: program plus its argument words. Word text is raw (quotes/expansions preserved), exactly as the agent wrote it — the classifier trims quotes where it matters.
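Since word text is kept raw, a consumer has to strip quoting itself before comparing a word against a program name. A minimal sketch of such a step follows; `trim_quotes` is a hypothetical helper, not part of this module, and it handles only one matching pair of surrounding quotes (no escapes, no partially quoted words).

```rust
/// Strip one pair of matching surrounding quotes from a raw word.
/// Hypothetical helper for illustration; anything that is not wrapped in
/// a matching pair of `"` or `'` is returned unchanged.
pub fn trim_quotes(word: &str) -> &str {
    for quote in ['"', '\''] {
        // Both quote characters are one byte in UTF-8, so slicing one byte
        // off each end stays on a character boundary.
        if word.len() >= 2 && word.starts_with(quote) && word.ends_with(quote) {
            return &word[1..word.len() - 1];
        }
    }
    word
}
```

With this, a raw word written as `"rm"` or `'rm'` compares equal to `rm`, while an expansion such as `$HOME` passes through untouched.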

Functions§

analyze
Parse raw into an Analysis. Returns None if the line can’t be parsed (caller falls back to the tokenizer pass + the cautious default).
ast_commands
Just the flattened simple commands (used in tests).