execkit
The safety layer that lets an AI agent run shell on real infrastructure — without you holding your breath.
Persistent local + SSH sessions · structured results · secret-safe · default-deny policy · embeddable · open source
What libssh2 is to SSH, execkit is to agent shell sessions.
Status: v0.1.0, early release. The core is built and reviewed (local + SSH transports, structured results, advisory policy, secret redaction, and an MCP server), all verified end-to-end (see poc/ and the test suite). It is not production-ready (see Limitations). The plan is ROADMAP.md; the vision is FEATURE_VISION.md.
The problem
Letting an autonomous agent run shell commands is the most useful — and most terrifying — thing you can give it. Today your options are bad:
- Built-in harness shells (Claude Code, Cursor) are local-only and have no real guardrails for autonomous, unsupervised runs.
- Managed sandboxes (E2B, Daytona) are great but cloud-hosted — you can't embed them, and you inherit vendor lock-in and latency.
- Raw SSH / tmux hacks are stateless-per-command, leak escape codes, and have zero notion of "is this command allowed?"
So most teams just... don't let agents touch real infrastructure. execkit exists to remove that fear.
The core idea: the agent is the adversary
A traditional tool trusts its caller. execkit can't — the LLM driving it can be hijacked by prompt injection from any data it reads (a poisoned file, a web page, a CI log). So execkit's first job is to contain its own caller.
Every command passes through a fence before it reaches a shell:
agent ──▶ execkit ──▶ [ default-deny policy ] ──▶ [ dangerous-pattern intercept ]
                                 │ blocked                       │ HITL approval
                                 ▼                               ▼
                          never executed              human approves / denies
                                                                 │ allowed
                                                                 ▼
                                              transport (local · SSH · Docker · K8s)
                                                                 │
               structured result ◀── [ secret redaction ] ◀── output
A blocked rm -rf never touches the filesystem. An AWS key in the output is
redacted before it ever reaches the model or your logs. A changed SSH host
key fails loudly instead of silently reconnecting into a MITM. These aren't
roadmap promises — each gate is verified in poc/run_flashy.py.
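As a rough illustration of the first two gates, here is a minimal sketch of a default-deny allowlist followed by a dangerous-pattern check with a human approval hook. Every name in it (`gate`, `Decision`, `ALLOWED_PROGRAMS`, `DANGEROUS_PATTERNS`, the `approve` callback) is hypothetical and chosen for this example; it is not execkit's API, and the single regex is deliberately naive.

```python
import re
import shlex
from dataclasses import dataclass
from typing import Callable

# All names here are hypothetical; this is not execkit's API.
ALLOWED_PROGRAMS = {"ls", "cat", "systemctl", "rm"}
DANGEROUS_PATTERNS = [re.compile(r"\brm\s+-\w*r\w*f")]  # deliberately naive


@dataclass
class Decision:
    allowed: bool
    reason: str


def gate(command: str, approve: Callable[[str], bool] = lambda cmd: False) -> Decision:
    """Decide whether a command may reach the shell. Nothing is executed here."""
    try:
        argv = shlex.split(command)
    except ValueError:
        return Decision(False, "unparseable command")
    # Default-deny: anything not explicitly allowed is blocked.
    if not argv or argv[0] not in ALLOWED_PROGRAMS:
        return Decision(False, "not on allowlist")
    # Dangerous patterns need a human decision; the default approver says no.
    if any(p.search(command) for p in DANGEROUS_PATTERNS):
        if approve(command):
            return Decision(True, "approved by human")
        return Decision(False, "dangerous pattern")
    return Decision(True, "allowed")
```

Because the check runs before any transport is involved, a denied command is never handed to a shell at all. A string match like this is easy to evade (for example `rm -fr`), which is why it can only be a tripwire and not a security boundary.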
What you get
# target API (v0.1) — illustrative
=
=
# ExecResult(exit_code=0, stdout="● api active (running)...",
# stderr="", duration_ms=120, cwd="/home/deploy")
# Blocked(reason="dangerous pattern") — the shell never saw it
# ExecResult(stdout="AWS_SECRET_ACCESS_KEY=[REDACTED]") # never leaves the box
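The redacted result in the last line could be produced by an output filter along these lines. This is a sketch under assumptions: the function name `redact` and both patterns are invented for this example, and the source does not say which rules execkit actually applies.

```python
import re

# Hypothetical patterns; a real redactor would need a much broader rule set.
# 1. NAME=value assignments whose name suggests a credential.
ASSIGNMENT = re.compile(
    r"\b([A-Z0-9_]*(?:SECRET|TOKEN|PASSWORD|API_KEY)[A-Z0-9_]*)=(\S+)",
    re.IGNORECASE,
)
# 2. Strings shaped like AWS access key IDs (AKIA + 16 uppercase alphanumerics).
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def redact(text: str) -> str:
    """Replace likely secrets with a placeholder before the text leaves the host."""
    text = ASSIGNMENT.sub(r"\1=[REDACTED]", text)
    text = AWS_KEY_ID.sub("[REDACTED]", text)
    return text
```

Running the filter on captured output before building the result object means neither the model nor the logs ever receive the raw value.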
- Safe autonomy: default-deny capability fence, dangerous-command interception (human-in-the-loop), secret redaction, tamper-evident audit.
- Persistent sessions: cd/env/state stick across commands, like a real terminal left open. Not a new connection per command.
- One API, every transport: local PTY, SSH, Docker exec, K8s exec return the identical structured result.
- Token-aware output: compress a 4,000-line log to the part that matters, so agent context (and cost) doesn't blow up.
- Embeddable, never a service: cargo add / pip install, in your process. No daemon you don't control, no vendor.
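To make the token-aware output idea concrete, here is one naive way to shrink a long log: keep the first and last lines and replace the middle with a marker. The source does not describe execkit's compression strategy, so the function name `compress_output` and the head/tail approach are assumptions for illustration only.

```python
def compress_output(text: str, head: int = 20, tail: int = 40) -> str:
    """Keep the first `head` and last `tail` lines; summarize what was dropped."""
    lines = text.splitlines()
    if len(lines) <= head + tail:
        return text  # short enough, leave untouched
    omitted = len(lines) - head - tail
    marker = f"[... {omitted} lines omitted ...]"
    # Slice from an explicit index so tail=0 yields no trailing lines.
    return "\n".join(lines[:head] + [marker] + lines[len(lines) - tail:])
```

The tail is weighted more heavily than the head here because failures in build and deploy logs tend to appear near the end; a smarter version might also keep lines that look like errors.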
Structured output is a feature, not the pitch. LLMs read raw terminal text fine. execkit's value is trust: persistence, multi-transport reach, and the safety to point an agent at infrastructure you actually care about.
Using it from an AI agent
- Claude Code · Cursor · Gemini CLI — execkit ships an MCP server (v0.1). Add it to your MCP config and the agent calls its tools directly; no model changes, no special access.
- Custom agents (Claude / Gemini / OpenAI APIs, LangChain, CrewAI, OpenHands) — native Python SDK (v0.1), Node (v0.2), Go (v0.3).
Why Rust
Concurrent session handling, zero-cost FFI to every language SDK, and PTY
correctness via portable-pty — memory-safe where a C core would get ugly fast.
The critical path is already proven in Rust: poc/rust/.
Status & roadmap
| Version | Theme |
|---|---|
| v0.1 | Proven core + non-negotiable safety: PTY+SSH, ExecResult, capability fence, secret redaction, MCP mode, Python SDK |
| v0.2 | Docker/K8s transports, pooling, output budgets, Node SDK |
| v0.3 | Streaming, interactive stdin, semantic events, token-aware compression, Go SDK |
| v0.4 | Sandbox transport, host-key-verified reconnect, encrypted snapshots, audit + OTel |
| v1.0 | Windows ConPTY, stable API, framework guides, benchmarks |
Full detail in ROADMAP.md. Cut on purpose: cross-host
federated sessions (attack surface > value).
Limitations (v0.1)
To be upfront: this is a young library. Today:
- Not a sandbox. The command policy is an advisory tripwire (string-matching, bypassable). The load-bearing control is the environment — run the agent and SSH user with least privilege. A real sandbox transport is on the roadmap (v0.4).
- A timed-out command poisons the session. There's no interrupt-and-resync yet; on timeout you get a clear error and should create a new session.
- Unix-only. Local transport needs a POSIX shell (bash); Windows (ConPTY) is v1.0.
- Synchronous core. Fine for typical agent use; not yet tuned for thousands of concurrent sessions.
- SSH AcceptAny host-key mode exists for testing and is gated behind an explicit insecure opt-in. Never use it in production.
- Recovery/time-travel, Docker/K8s transports, streaming, and more SDKs are roadmap, not built. See ROADMAP.md.
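One way a caller might cope with the timeout limitation is to wrap session creation so that a timed-out session is discarded and replaced. This is only a sketch: `SessionTimeout`, `FreshOnTimeout`, and the `run(command, timeout)` method are hypothetical stand-ins, since the source does not show execkit's session or error types.

```python
from typing import Callable, Protocol


class SessionTimeout(Exception):
    """Hypothetical stand-in for whatever error a timed-out command raises."""


class Session(Protocol):
    def run(self, command: str, timeout: float) -> str: ...


class FreshOnTimeout:
    """Replace the underlying session whenever a command times out."""

    def __init__(self, factory: Callable[[], Session]) -> None:
        self._factory = factory
        self._session = factory()

    def run(self, command: str, timeout: float = 30.0) -> str:
        try:
            return self._session.run(command, timeout)
        except SessionTimeout:
            # The old session's state is unknown, so drop it and start clean.
            # Any cwd or environment changes made in it are lost.
            self._session = self._factory()
            raise
```

The error is re-raised on purpose: the agent still needs to learn that the command did not finish, and that state from the old session no longer applies.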
Found something rough? Please open an issue.
Contributing & security
- Contributions: see CONTRIBUTING.md.
- Found a vulnerability? Please follow SECURITY.md. Do not open a public issue for security reports.
License
Apache 2.0 — embed it freely, including commercially. See LICENSE
and NOTICE.