vmdk-cli 0.3.0

CLI tool for inspecting VMDK disk images
vmdk-cli-0.3.0 is not a library.

vmdk

Crates.io docs.rs License: MIT CI Sponsor

Pure-Rust, read-only reader for VMware VMDK disk images. Presents the virtual disk as a plain Read + Seek byte stream — and, uniquely, recovers data from a damaged disk through the redundant grain directory that qemu-img and libvmdk throw away, while surfacing the forensic metadata they discard.

Command-line tool

$ cargo run --bin vmdk -- info disk.vmdk
File:              disk.vmdk
Format:            VMDK v1 (monolithicSparse)
Virtual disk size: 4,194,304 bytes (4.00 MiB)
Sector size:       512 bytes
Sectors:           8,192
Grain size:        128 sectors (64 KiB)
Compressed:        no
CID:               dc80b6c7
Descriptor:        17 lines (see --descriptor)

Six subcommands — info, map, dump, hash, verify, diff — fold the common qemu-img workflows into one binary:

$ vmdk verify disk.vmdk
RGD:     OK (matches primary GD)
Allocated grains: 3 (196,608 bytes)
Integrity: OK (3 grains checked, no out-of-bounds pointers)
Status:  OK

dump, hash, map, and verify accept --recover: when the primary grain directory is damaged, the read is resolved through the redundant grain directory instead, so data behind the corruption is still extractable.

$ vmdk verify damaged.vmdk            # primary GD is corrupt
Integrity: FAIL — 1 out-of-bounds grain table(s) … Status: FAILED

$ vmdk verify --recover damaged.vmdk  # resolve through the redundant GD
Integrity: OK (1 grains checked, no out-of-bounds pointers)
Recovered 1 grain(s) via the redundant grain directory
Status:  OK

dump writes raw virtual-disk bytes to stdout or a file (-o), a byte range (--offset / --length), or a hex view (--hex) — pipe it straight into a filesystem tool (NTFS, ext4, …) to read the guest's files. verify exits 0 when clean and 1 on corruption, so it drops into a triage pipeline.

Rust library

[dependencies]
vmdk = "0.3"

Quick start

use vmdk::VmdkReader;
use std::io::{Read, Seek, SeekFrom};

// Open any `Read + Seek` source — a File, a Cursor, another container reader.
let mut disk = VmdkReader::open(std::fs::File::open("disk.vmdk")?)?;

println!("virtual size: {} bytes", disk.virtual_disk_size());

// Read decoded virtual sectors like any byte stream — sparse/compressed grains
// are decompressed and zero-filled transparently.
let mut first_mib = vec![0u8; 1 << 20];
disk.seek(SeekFrom::Start(0))?;
disk.read_exact(&mut first_mib)?;
# Ok::<(), Box<dyn std::error::Error>>(())

For path-based images with companion files — monolithicFlat, the twoGbMaxExtent* split sets, raw-device maps — use VmdkFileReader::open_path, which locates and opens the extent files for you. For snapshot/delta trees, use VmdkChainReader::open, which layers a delta on its parent chain.

What makes this different from qemu-img and libvmdk

Most VMDK readers answer one question: "give me the bytes." vmdk answers the questions a digital forensics examiner actually needs — and reads disks the others give up on:

Capability qemu-img / libvmdk vmdk
Sparse / streamOptimized / flat read
COWD (vmfsSparse/vmfsThin) + seSparse (VMFS6) partial
Snapshot / delta chain traversal
Recover data behind a damaged primary GD (redundant-GD fallback)
Recover an individual lost grain-table entry from the redundant copy
Redundant-GD validation (grain-table contents, not pointers)
Structural integrity scan (dangling GD/GT/grain pointers)
ddb.* disk database (adapter, geometry, UUID, tools/HW version) discarded
Header provenance — unclean-shutdown flag, FTP-ASCII-mangling check
Change Block Tracking (-ctk) reference
longContentID resolution (the CID == 0xFFFFFFFE sentinel)
Raw Device Mapping (VMFSRDM) extent enumeration
Streaming SHA-256 + MD5 of the virtual disk
Adversarial-input hardening + fuzz testing
Pure Rust, zero unsafe, no C library

Formats

Every VMDK createType and extent type in the VMware Virtual Disk Format spec (cross-checked against QEMU block/vmdk.c and libvmdk):

createType Notes
monolithicSparse, streamOptimized header v1/v2/v3; DEFLATE grains; GD_AT_END footer
monolithicFlat, vmfs, vmfsPreallocated, vmfsEagerZeroedThick preallocated flat extents
twoGbMaxExtentSparse, twoGbMaxExtentFlat split 2 GB extent sets
vmfsSparse, vmfsThin ESXi COWD copy-on-write sparse
seSparse vSphere 6.5+ space-efficient sparse (nibble-typed, bit-rotated grains)
vmfsRaw, vmfsRawDeviceMap, vmfsPassthroughRawDeviceMap, fullDevice, partitionedDevice device / raw-LUN maps
custom arbitrary extent mix, routed by extent type

Extent types: FLAT, VMFS, VMFSRAW, VMFSRDM, ZERO, SPARSE, VMFSSPARSE, SESPARSE; access RW / RDONLY / NOACCESS. ZERO and NOACCESS regions read as zeros without touching disk.

Forensic recovery

VMware writes the grain tables twice — the grain directory (GD) and a redundant copy (RGD) point to separate physical copies. qemu-img and libvmdk read only the primary and fail when it is damaged. vmdk uses the redundant copy to keep reading:

use vmdk::VmdkReader;
use std::io::Read;

let mut disk = VmdkReader::open(std::fs::File::open("damaged.vmdk")?)?;

// Triage: how much of the primary grain directory is recoverable via the RGD?
let report = disk.grain_directory_recovery()?;
println!(
    "{} entries, {} damaged, {} recoverable via RGD",
    report.total_entries, report.primary_damaged, report.recoverable_via_rgd,
);

// Opt in to recovery, then read normally — damaged pointers resolve through the RGD.
disk.enable_rgd_fallback();
let mut buf = vec![0u8; 1 << 20];
let _ = disk.read(&mut buf);
println!("recovered {} grain(s) via the RGD", disk.rgd_recovery_count());
# Ok::<(), Box<dyn std::error::Error>>(())

Recovery is opt-in and never changes a healthy read; without it a dangling pointer simply errors (the safe default).

Forensic metadata

The text descriptor carries provenance that other readers parse and then throw away. vmdk surfaces all of it:

use vmdk::VmdkReader;

let mut disk = VmdkReader::open(std::fs::File::open("disk.vmdk")?)?;

let ddb = disk.disk_database();                 // ddb.* disk database
println!("adapter:   {:?}", ddb.adapter_type);  // ide / lsilogic / pvscsi …
println!("geometry:  {:?}", ddb.geometry);      // CHS cylinders/heads/sectors
println!("disk UUID: {:?}", ddb.uuid);
println!("HW / tools: {:?} / {:?}", ddb.virtual_hw_version, ddb.tools_version);

if let Some(p) = disk.header_provenance()? {
    println!("unclean shutdown:  {}", p.unclean_shutdown);    // crash / live image
    println!("newline check ok:  {}", p.newline_check_intact); // false ⇒ FTP-mangled
}
println!("CBT file:   {:?}", disk.change_track_path());       // -ctk.vmdk reference
println!("content ID: {}",  disk.effective_content_id());     // resolves longContentID
# Ok::<(), Box<dyn std::error::Error>>(())

API highlights

Method Purpose
VmdkReader::open(reader) open any Read + Seek source
VmdkFileReader::open_path(path) open path-based images (flat / multi-extent / device maps)
VmdkChainReader::open(path) layer a delta on its parent snapshot chain
read / seek (std::io) decoded virtual-sector byte stream
info()VmdkInfo version, CID, geometry, compression, descriptor, disk database
is_allocated(lba) / iter_allocated_grains() sparse-map queries
hash()VmdkDigest streaming SHA-256 + MD5 of the virtual disk
validate_rgd() / check_integrity() redundant-GD + structural integrity
grain_directory_recovery() / enable_rgd_fallback() / rgd_recovery_count() RGD recovery
disk_database() / header_provenance() / change_track_path() / effective_content_id() forensic metadata

serde derives on the public report types are available behind the serde feature.

Security

vmdk is built to run on untrusted, potentially crafted disk images:

  • No panics on malicious input — every allocation derived from header fields is bounds-checked; reads are clamped; compressed-grain sizes are capped.
  • Allocation-amplification hardenednumGTEsPerGT is capped at the spec value (512), matching QEMU, so a crafted header can't drive a multi-gigabyte grain-table allocation.
  • Zero unsafeunsafe_code = "forbid" workspace-wide; no C dependency.
  • Fuzz-tested — three cargo fuzz targets cover the open path, the full read/scan/integrity surface, and the RGD recovery paths; run in CI on every change and deeper on a schedule.
# Requires nightly Rust and cargo-fuzz
rustup install nightly
cargo install cargo-fuzz

cargo +nightly fuzz run fuzz_open
cargo +nightly fuzz run fuzz_read
cargo +nightly fuzz run fuzz_recover

Testing

280+ tests (unit + integration) covering every public API, every format branch, the recovery paths, and adversarial inputs. COWD and seSparse output is cross-validated byte-for-byte against qemu-img convert -O raw — the synthetic fixtures and the reader cannot share a blind spot. Coverage is enforced in CI.

cargo test
cargo +stable llvm-cov --workspace --all-features --summary-only

Related

vmdk gives you the virtual disk as bytes. These crates read other container formats the same way — a pure Read + Seek over the decoded sector stream:

Crate Format
ewf E01 / Expert Witness Format (EnCase, FTK Imager)
vhdx Microsoft VHDX (Hyper-V, Azure)
vhd Legacy VHD (Virtual PC / Hyper-V Gen-1)
qcow2 QEMU / KVM QCOW2
dd Raw / flat / dd images

Once you have the bytes, these parsers analyse the partition layout inside:

Crate Scheme
mbr-forensic Master Boot Record — anomalies, slack carving, boot-code ID
gpt-forensic GUID Partition Table — backup-header reconciliation, CRC32
disk-forensic Orchestrator — auto-detects MBR/GPT/APM and dispatches

Container-format knowledge (magic numbers, header layouts, encoding rules) lives in forensicnomicon.


Privacy Policy · Terms of Service · © 2026 Security Ronin Ltd