Crate shell_sanitize

Type-safe input sanitization for shell arguments and file paths.

This crate provides the core framework: the Rule trait, the Sanitizer builder, and the Sanitized<T> proof type that can only be constructed by passing all rules.
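
The proof-type idea can be illustrated without relying on this crate's actual definitions. The sketch below is standalone Rust. The names `proof`, `Proof`, `Check`, `run_checks`, and `ArgMarker` are hypothetical and chosen only for illustration; they are not the signatures of `Rule`, `Sanitizer`, or `Sanitized<T>`. It shows why private fields plus a single checked constructor mean that holding the wrapper implies every check passed.

```rust
// Standalone illustration of a "proof type"; NOT the shell_sanitize API.
// Every name below is hypothetical.
mod proof {
    use std::marker::PhantomData;

    /// A check returns `Ok(())` or a human-readable reason for rejection.
    pub type Check = fn(&str) -> Result<(), String>;

    /// The fields are private to this module, so code outside it cannot
    /// build a `Proof` directly and has to go through `run_checks`.
    #[derive(Debug)]
    pub struct Proof<M> {
        value: String,
        _marker: PhantomData<M>,
    }

    impl<M> Proof<M> {
        pub fn as_str(&self) -> &str {
            &self.value
        }
    }

    /// Runs every check and wraps the input only if none of them objects.
    pub fn run_checks<M>(input: &str, checks: &[Check]) -> Result<Proof<M>, Vec<String>> {
        let failures: Vec<String> = checks
            .iter()
            .filter_map(|check| check(input).err())
            .collect();
        if failures.is_empty() {
            Ok(Proof { value: input.to_owned(), _marker: PhantomData })
        } else {
            Err(failures)
        }
    }
}

/// Hypothetical marker for "usable as a shell argument".
#[derive(Debug)]
pub struct ArgMarker;

fn no_semicolon(s: &str) -> Result<(), String> {
    if s.contains(';') { Err("contains ';'".to_string()) } else { Ok(()) }
}

fn not_empty(s: &str) -> Result<(), String> {
    if s.is_empty() { Err("input is empty".to_string()) } else { Ok(()) }
}

/// Accepts only values that went through `run_checks`.
fn use_checked(arg: &proof::Proof<ArgMarker>) -> String {
    format!("arg={}", arg.as_str())
}
```

A caller would pass `&[no_semicolon, not_empty]` to `proof::run_checks::<ArgMarker>` and hand the resulting value to `use_checked`; a raw `&str` is not accepted there, so the type system carries the evidence that validation happened.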

For built-in rules and ready-made presets, see the companion crate shell_sanitize_rules.

§When to use this crate

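Prefer std::process::Command when possible. Passing arguments via Command::new("git").arg(user_input) bypasses the shell entirely: each argument reaches the child process verbatim, so shell metacharacters in it are never interpreted. When it is available, this is the safest option.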
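
As a minimal sketch of the shell-free approach, the snippet below builds a `git clone` invocation with the standard library only. The helper names `git_clone_command` and `argv_of` are made up for this example. It inspects the argument vector rather than spawning a process, to show that a hostile-looking value stays a single argument.

```rust
use std::process::Command;

/// Builds `git clone <url>` without involving a shell.
/// The URL is handed over as one argv entry, so characters such as `;`
/// or `$(...)` are never interpreted by a shell.
fn git_clone_command(url: &str) -> Command {
    let mut cmd = Command::new("git");
    cmd.arg("clone").arg(url);
    cmd
}

/// Returns the arguments the child process would receive (program excluded).
fn argv_of(cmd: &Command) -> Vec<String> {
    cmd.get_args()
        .map(|a| a.to_string_lossy().into_owned())
        .collect()
}
```

Calling `git_clone_command("https://example.com/repo.git; rm -rf /")` yields exactly two arguments, `clone` and the whole string including the `; rm -rf /` suffix. Git would then fail on an invalid URL instead of a shell running a second command.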

This crate is for situations where shell evaluation is unavoidable:

Scenario	Why you can’t avoid the shell
SSH remote commands	Remote side evaluates through shell
`docker exec ctr sh -c "..."`	Container-side shell
CI/CD pipeline `run:` blocks	YAML → shell evaluation
AI agent tool execution	LLM output may reach a shell
Legacy `system()` / `popen()`	API forces shell involvement

It is also valuable for path validation even without shell involvement: blocking ../../etc/passwd in upload paths, config file references, and template includes.
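
To make the path case concrete, here is a rough standard-library sketch of a traversal check. The function `is_contained_relative_path` is a hypothetical helper, not one of this crate's rules or presets, and it is stricter than many applications need: it accepts only relative paths built from ordinary components.

```rust
use std::path::{Component, Path};

/// Hypothetical helper (not part of this crate): accepts only non-empty
/// relative paths made of normal components. `..`, a leading `/`, a leading
/// `./`, Windows prefixes, and embedded NUL bytes are all rejected.
fn is_contained_relative_path(input: &str) -> bool {
    if input.is_empty() || input.contains('\0') {
        return false;
    }
    Path::new(input)
        .components()
        .all(|component| matches!(component, Component::Normal(_)))
}
```

With this check, `uploads/photo.jpg` is accepted while `../../etc/passwd`, `uploads/../../etc/passwd`, and `/etc/passwd` are rejected. It inspects the string only and does not resolve symlinks on disk.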

§Design principle: reject, don’t escape

Escaping is fragile: it depends on the target shell, can be applied twice by mistake, and can break legitimate commands. This crate rejects dangerous input with a clear error instead of trying to transform it into something “safe”.
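
The difference between rejecting and escaping can be sketched in a few lines. Everything below is illustrative: the `REJECTED` character list, the `Rejection` type, and `reject_metachars` are assumptions made for this example and do not reflect the rules or error types this crate actually ships.

```rust
/// Characters treated as dangerous in this sketch. The list is an assumption
/// for illustration, not the rule set of any crate.
const REJECTED: &[char] = &[
    ';', '|', '&', '$', '`', '>', '<', '(', ')', '\n', '\'', '"', '\\',
];

/// Hypothetical error detail: which character was rejected and where.
#[derive(Debug, PartialEq)]
struct Rejection {
    byte_offset: usize,
    found: char,
}

/// Returns the input unchanged if it is clean, or every offending character
/// otherwise. The input is never rewritten or escaped.
fn reject_metachars(input: &str) -> Result<&str, Vec<Rejection>> {
    let found: Vec<Rejection> = input
        .char_indices()
        .filter(|(_, c)| REJECTED.contains(c))
        .map(|(byte_offset, found)| Rejection { byte_offset, found })
        .collect();
    if found.is_empty() {
        Ok(input)
    } else {
        Err(found)
    }
}
```

Because the function either returns the original string or an error, there is no transformed value that could be escaped a second time or interpreted differently by another shell.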

§Scope: argument validation, not command validation

This crate validates individual arguments and paths — it does not parse or validate entire shell command strings.

"git clone https://example.com"   ← full command: out of scope
                                      (use sandbox + command allowlist)

"https://example.com"             ← individual argument: in scope
                                      (validate with shell_command preset)

"uploads/photo.jpg"               ← file path: in scope
                                      (validate with file_path preset)

Sanitizing an entire command string would break legitimate syntax (pipes, redirects, subshells are valid command constructs). Instead, separate the trusted command structure from untrusted data, then validate only the data.
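
One way to picture the separation of trusted structure from untrusted data is a fixed template with validated slots. In the sketch below, `is_safe_slot` is a stand-in allowlist written for this example (an assumption, not one of the presets), and `build_make_script` is a hypothetical helper. The `&&` belongs to the trusted template and is never subject to validation.

```rust
/// Stand-in validator for this sketch (an assumption, not a crate preset):
/// non-empty, ASCII letters and digits plus `_ - . /`, and no leading `-`.
/// It only guards against shell metacharacters and option injection;
/// it does not check for `..` traversal.
fn is_safe_slot(value: &str) -> bool {
    !value.is_empty()
        && !value.starts_with('-')
        && value
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || matches!(c, '_' | '-' | '.' | '/'))
}

/// The command structure is a fixed, trusted template. Only the two slots
/// carry untrusted data, and each is validated before substitution.
fn build_make_script(dir: &str, target: &str) -> Result<String, String> {
    for (name, value) in [("dir", dir), ("target", target)] {
        if !is_safe_slot(value) {
            return Err(format!("rejected {name}: {value:?}"));
        }
    }
    Ok(format!("cd {dir} && make {target}"))
}
```

The resulting string could then be passed to something like `sh -c` on the remote or container side. A slot value such as `repo; rm -rf /` produces an error instead of a script.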

§AI agent threat model

LLM output should be treated as untrusted input, because indirect prompt injection can manipulate what the AI produces. In practice, AI agents generate commands at different levels of structure, and how well this crate applies depends on that level:

Pattern	Example	shell-sanitize applicability
Structured tool call	`{ tool: "read", path: "src/lib.rs" }`	High — validate `path` with `file_path`
Single command + args	`Bash("git diff HEAD~3")`	Medium — tokenize first, then validate each arg
Free-form command string	`Bash("cd repo && make && ./run")`	Out of scope — use sandbox/container
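
For the middle row, "tokenize first, then validate each arg" might look roughly like the following. This is a sketch under stated assumptions: `ALLOWED_PROGRAMS`, `no_shell_metachars`, and `tokenize_and_validate` are hypothetical names, the metacharacter list is illustrative, and whitespace splitting is deliberately naive (it does not understand quoting).

```rust
/// Hypothetical command allowlist for this sketch.
const ALLOWED_PROGRAMS: &[&str] = &["git"];

/// Stand-in argument check (an assumption, not a crate preset):
/// rejects a handful of shell metacharacters.
fn no_shell_metachars(arg: &str) -> bool {
    !arg.chars().any(|c| {
        matches!(c, ';' | '|' | '&' | '$' | '`' | '<' | '>' | '(' | ')' | '\'' | '"' | '\\')
    })
}

/// Splits a single command line into program and arguments, checks the
/// program against the allowlist, then validates every argument.
/// Quoted input is not parsed; the quote characters cause a rejection.
fn tokenize_and_validate(line: &str) -> Result<(String, Vec<String>), String> {
    let mut tokens = line.split_whitespace();
    let program = tokens.next().ok_or_else(|| "empty command".to_string())?;
    if !ALLOWED_PROGRAMS.contains(&program) {
        return Err(format!("program not allowed: {program:?}"));
    }
    let mut args = Vec::new();
    for arg in tokens {
        if !no_shell_metachars(arg) {
            return Err(format!("rejected argument: {arg:?}"));
        }
        args.push(arg.to_string());
    }
    Ok((program.to_string(), args))
}
```

The returned program and arguments can be fed to `std::process::Command` so that no shell is involved at all. A free-form string such as `cd repo && make && ./run` fails both the allowlist and the argument check, which matches the "out of scope" row of the table.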

§Where this crate fits in defense-in-depth

AI Agent Framework
┌──────────────────────────────────────────────────┐
│                                                  │
│   Path-based tools        Bash tool              │
│   (Read/Write/Glob)       (free-form)            │
│         │                      │                 │
│         ▼                      ▼                 │
│   ★ shell-sanitize ★     Sandbox/Container       │
│   file_path() preset     (OS-level isolation)    │
│   file_path_absolute()                           │
│                                                  │
└──────────────────────────────────────────────────┘

Path arguments (Read, Write, Include) → this crate’s primary value
Structured tool call arguments → effective with appropriate preset
Trusted template + sanitized slots → sh -c "cd {safe} && make {safe}"
Free-form bash strings → out of scope; rely on sandbox, container isolation, and command allowlists

Structs§

FilePath: Marker: value is safe to use as a file path component.
RuleViolation: A single rule violation with context.
SanitizeError: Error returned when sanitization fails.
Sanitized: A value that has passed all sanitization rules for marker type T.
Sanitizer: A composable sanitizer that runs a chain of Rules against input.
ShellArg: Marker: value is safe to use as a shell argument.

Traits§

MarkerType: Marker trait for sanitized value categories.
Rule: A sanitization rule that inspects an input string and reports violations.

Type Aliases§

RuleResult: Result of a rule check: either pass or one or more violations.