Evaluating VM / sandbox systems for agentic AI use.
An agent runtime does not run one long-lived VM; it spawns a fleet of short-lived, isolated execution environments — one per tool call, code run, or sub-agent — and tears them down. That workload rewards different properties than a classic datacenter VM, so this module scores VM/sandbox systems on five agent-native axes:
- start-latency — how fast a fresh, isolated sandbox is ready. Agent loops spawn constantly; cold-start dominates wall-clock.
- density — sandboxes per host (per-instance memory/CPU overhead). Fleet economics: how many concurrent agents fit on a box.
- isolation — strength of the security boundary for untrusted, agent-generated code. Hardware virtualization beats a shared kernel.
- snapshotting — instant fork / snapshot-restore of a warm template (copy-on-write), so an agent can branch a primed context per call and keep warm pools.
- agent-control — is the control plane programmatic and agent/tool-native (an agent can discover and drive lifecycle directly), or bring-your-own glue?
Profiles are curated 0.0–1.0 static judgments with evidence, like the languages and frameworks profiles: deterministic, serializable, comparable. Scores reflect each system’s design center for the ephemeral agent-sandbox workload, not its fitness for every use; a great long-lived datacenter VM can still rank low here, and that is the point.
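The shape of such a profile can be sketched as below. This is a minimal illustration, not the crate's definition: the struct name `AxisScores`, its field names, and the unweighted mean used for `fitness` are all assumptions. The module docs only state that there are five axes scored 0.0–1.0 and that a `VmProfile::fitness` method exists; how it weights the axes is not documented here.

```rust
/// Hypothetical stand-in for a curated profile: five axis scores in 0.0..=1.0.
/// Field names mirror the axes listed above; the real `VmProfile` may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct AxisScores {
    pub start_latency: f64,
    pub density: f64,
    pub isolation: f64,
    pub snapshotting: f64,
    pub agent_control: f64,
}

impl AxisScores {
    /// All five scores in a fixed order, so iteration is deterministic.
    pub fn axes(&self) -> [f64; 5] {
        [
            self.start_latency,
            self.density,
            self.isolation,
            self.snapshotting,
            self.agent_control,
        ]
    }

    /// True when every score lies in the documented 0.0..=1.0 range.
    /// NaN fails the range check, so it is rejected as well.
    pub fn is_valid(&self) -> bool {
        self.axes().iter().all(|s| (0.0..=1.0).contains(s))
    }

    /// ASSUMPTION: fitness as the unweighted mean of the five axes.
    /// The crate may weight axes differently.
    pub fn fitness(&self) -> f64 {
        self.axes().iter().sum::<f64>() / 5.0
    }
}
```

Because the scores are plain static numbers, two calls with the same inputs always produce the same fitness, which is what makes profiles comparable across systems.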
use agentic_eval::vms::{profile, rank_vms, Vm};
let fc = profile(Vm::Firecracker);
assert!(fc.evidence.len() >= 3);
let ranked = rank_vms();
assert!(ranked[0].fitness() >= ranked[ranked.len() - 1].fitness());

Structs
- VmComparison: Compare two systems; positive deltas mean `a` fits agentic use better.
- VmProfile: A curated agentic profile of a VM/sandbox system across the five agent-native axes, with evidence.
Enums
- Vm: VM / sandbox systems with curated agentic profiles.
Functions
- compare_vms: Compare system `a` against baseline `b` across all five axes.
- profile: The curated profile for `vm` (static, documented judgments; see module docs).
- profiles: Profiles for all systems, in `Vm::all` order (deterministic).
- rank_vms: All profiles ranked best-first by `VmProfile::fitness` (stable order on ties).
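
The two behaviors described above, per-axis deltas against a baseline and a best-first ranking that is stable on ties, can be sketched independently of the crate. Everything named here (`Scored`, `axis_deltas`, `rank`) is hypothetical and only illustrates the technique; the actual signatures of `compare_vms` and `rank_vms` are not shown in this listing and may differ.

```rust
use std::cmp::Ordering;

/// Hypothetical scored entry: a label plus a single fitness value.
#[derive(Debug, Clone, PartialEq)]
pub struct Scored {
    pub name: &'static str,
    pub fitness: f64,
}

/// Per-axis deltas `a - b` over five axes.
/// A positive entry means `a` scores higher than baseline `b` on that axis.
pub fn axis_deltas(a: [f64; 5], b: [f64; 5]) -> [f64; 5] {
    let mut out = [0.0; 5];
    for i in 0..5 {
        out[i] = a[i] - b[i];
    }
    out
}

/// Rank best-first by fitness.
/// `sort_by` is a stable sort, so entries with equal fitness keep their input order.
/// Incomparable values (NaN) are treated as equal here; that is a choice of this sketch.
pub fn rank(mut items: Vec<Scored>) -> Vec<Scored> {
    items.sort_by(|x, y| {
        y.fitness
            .partial_cmp(&x.fitness)
            .unwrap_or(Ordering::Equal)
    });
    items
}
```

Stability matters for determinism: if the input always arrives in one fixed order (as `profiles` is documented to do), tied entries come out in the same relative order on every run.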