
A parser for the minidump file format.

The minidump module provides a parser for the minidump file format as produced by Microsoft’s MinidumpWriteDump API and the Google Breakpad library.

Usage

The primary API for this library is the Minidump struct, which can be instantiated by calling the Minidump::read or Minidump::read_path methods.

Successfully parsing a Minidump struct means the minidump has a minimally valid header and stream directory. Individual streams are only parsed when they’re requested.

Although you may enumerate the streams in a minidump with methods like Minidump::all_streams, this is only really useful for debugging. Instead you should statically request streams with Minidump::get_stream. Depending on what analysis you’re trying to perform, you may:

  • Consider it an error for a stream to be missing (using ? or unwrap)
  • Branch on the presence of a stream to conditionally refine your analysis
  • Use a stream’s Default implementation to get an “empty” instance (with unwrap_or_default)
use minidump::*;

fn main() -> Result<(), Error> {
    // Read the minidump from a file
    let mut dump = minidump::Minidump::read_path("../testdata/test.dmp")?;

    // Statically request (and require) several streams we care about:
    let system_info = dump.get_stream::<MinidumpSystemInfo>()?;
    let exception = dump.get_stream::<MinidumpException>()?;

    // Combine the contents of the streams to perform more refined analysis
    let crash_reason = exception.get_crash_reason(system_info.os, system_info.cpu);

    // Conditionally analyze a stream
    if let Ok(threads) = dump.get_stream::<MinidumpThreadList>() {
        // Use `Default` to try to make progress when a stream is missing.
        // This is especially natural for MinidumpMemoryList because
        // everything needs to handle memory lookups failing anyway.
        let mem = dump.get_stream::<MinidumpMemoryList>().unwrap_or_default();

        for thread in &threads.threads {
            let stack = thread.stack_memory(&mem);
            // ...
        }
    }

    Ok(())
}

Generally speaking, there isn’t any reason to distinguish between a stream being absent and it being corrupt. Just ask for what you want and we’ll do our best to give it to you.

Everything else you would want to do with a Minidump is specific to the individual streams:

Notable Streams

There are a lot of different Minidump Streams, but some are especially notable/fundamental:

MinidumpSystemInfo includes details about the hardware and operating system that the crash occurred on. This information is often required to properly interpret the other streams of the minidump, as they contain platform-specific values.

MinidumpException includes actual details about where and why the crash occurred.

MinidumpThreadList includes the registers and stack memory of every thread in the program at the time of the crash. This enables generating backtraces for every thread.

MinidumpMemoryList maps the crashing program’s runtime addresses (such as $rsp) to ranges of memory in the Minidump.

MinidumpModuleList includes info on all the modules (libraries) that were linked into the crashing program. This enables symbolication, as you can map instruction addresses back to offsets in a specific library’s binary.

What is a Minidump?

Minidumps capture the state of a crashing process (threads, stack memory, registers, dlls), why it crashed (crashing thread, error codes, error messages), and details about the system the program was running on (os, cpu).

The information in a minidump is divided up into a series of independent “streams”. If you want a specific piece of information, you must know the stream that contains it, and then look up that stream in the minidump’s directory. Most streams are pretty straight-forward – you can guess what you might find in MinidumpThreadList or MinidumpSystemInfo – but others – like MinidumpMiscInfo – are a bit more random.

This format was initially defined by Microsoft, as Windows has long included system APIs to generate minidumps. But lots of software gets made for operating systems other than Windows, where no such native support for minidumps is present. google-breakpad was created to extend Microsoft’s minidump format to other platforms, and defines minidump generators for things like Linux and macOS.

I do not believe that Microsoft and Breakpad officially collaborate on the format; it’s just designed to be very extensible, so it’s easy to add random stuff to a minidump in ways that don’t break old tools and likely won’t interfere with future versions. That said, Microsoft does now develop cross-platform products that make use of Breakpad, such as VSCode, so at the very least their crash reporting infra deals with Breakpad minidumps.

The rust-minidump crates are specifically designed to support Breakpad’s extended minidump format (and native Windows minidumps, which should in theory just be a subset). That said, rust-minidump doesn’t yet (and probably won’t ever) support everything. There’s a lot of random stuff that either Microsoft or Breakpad have defined over the years that we just do not have any use for at the moment. Not a lot of demand for handling minidumps for PlayStation 3, SPARC, or Windows CE these days.

The Minidump Format

This section is dedicated to describing how to parse minidumps, for anyone wanting to maintain this code or write their own parser.

Minidumps are a binary format. This format is simultaneously very simple and very complicated.

The simple part of a minidump is that it’s basically just an array of pointers to different typed “Streams” (system info, exception info, threads, memory mappings, etc.). So if you want to look up the system info, you just search the array for a system info stream and interpret that range of memory as that stream.

The complicated part of a minidump is the fact that every stream contains totally different information in totally different formats. Sure, there are families of streams that have the same general structure, but you’ve still got to write custom code to interpret the values meaningfully and figure out what on earth that information is useful for.

Sometimes the answer to “what is it useful for?” is “I don’t know but maybe we’ll find a use for it later”. This is genuinely useful because it allows us to add new analyses long after a crash occurs and gain new insights that the minidump format wasn’t explicitly designed to provide.

This is all to say that, beyond the basic layout of the minidump header and directory, it’s basically just a big ball of unrelated formats, each with its own layout – and everyone is technically free to come up with their own custom Streams that they can just toss in there, so trying to cover everything is kind of impossible? Let’s see how far we get!

The Minidump Header and Directory

The first thing in a Minidump is the MINIDUMP_HEADER, which has the following layout:

pub struct MINIDUMP_HEADER {
    pub signature: u32,
    pub version: u32,
    pub stream_count: u32,
    pub stream_directory_rva: RVA,
    pub checksum: u32,
    pub time_date_stamp: u32,
    pub flags: u64,
}

/// Offset into the minidump
pub type RVA = u32;

The signature is always MINIDUMP_SIGNATURE = 0x504d444d (“MDMP” in ASCII). You can use this to detect whether the minidump is little-endian or big-endian (minidumps always have the endianness of the platform they were generated on, since they contain lots of raw memory from the process, but at this point we don’t know what that platform is).
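The endianness check described above can be sketched as follows. This is a minimal illustration using only the standard library, not rust-minidump's actual implementation; the `Endian` enum and `detect_endian` function are hypothetical names introduced for this example.

```rust
/// The minidump signature, as a u32 read in the file's own byte order.
const MINIDUMP_SIGNATURE: u32 = 0x504d444d;

/// Byte order of a minidump file (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Endian {
    Little,
    Big,
}

/// Guess the byte order from the first four bytes of the file.
/// Returns `None` if they are not the signature in either byte order.
fn detect_endian(bytes: &[u8]) -> Option<Endian> {
    let s = bytes.get(..4)?;
    let raw = [s[0], s[1], s[2], s[3]];
    if u32::from_le_bytes(raw) == MINIDUMP_SIGNATURE {
        Some(Endian::Little)
    } else if u32::from_be_bytes(raw) == MINIDUMP_SIGNATURE {
        Some(Endian::Big)
    } else {
        None
    }
}
```

A little-endian dump starts with the bytes `MDMP`; a big-endian dump starts with the same bytes reversed.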

The lower 16 bits of version are always MINIDUMP_VERSION = 42899. (The high bits contain implementation-specific values that you should just ignore).

stream_directory_rva and stream_count are the location (offset from the start of the file, in bytes) and size (number of entries) of the stream directory, respectively.

checksum is some kind of checksum of the minidump itself (which may be zero), but the algorithm isn’t specified, and rust-minidump doesn’t check it.

time_date_stamp is a Windows time_t of when the minidump was generated.

flags are a MINIDUMP_TYPE which largely just specify what you can expect to find in the minidump. This is unused by rust-minidump since this information is generally redundant with the stream directory and flags within the streams that we need to check anyway. (e.g. instead of checking that this is a MiniDumpWithUnloadedModules, you can just check the directory for the MinidumpUnloadedModuleList stream.)
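To make the header layout concrete, here is a rough sketch of reading the 32-byte header by hand. It assumes the file has already been identified as little-endian, and the `Header` struct, `read_u32_le`, and `parse_header_le` are hypothetical helpers for illustration rather than this crate's API.

```rust
const MINIDUMP_SIGNATURE: u32 = 0x504d444d;
const MINIDUMP_VERSION: u32 = 42899;
/// Six u32 fields followed by one u64 field.
const HEADER_SIZE: usize = 32;

/// The header fields, minus the signature (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
struct Header {
    version: u32,
    stream_count: u32,
    stream_directory_rva: u32,
    checksum: u32,
    time_date_stamp: u32,
    flags: u64,
}

/// Read a little-endian u32 at `offset`, or `None` if out of bounds.
fn read_u32_le(bytes: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    let s = bytes.get(offset..end)?;
    Some(u32::from_le_bytes([s[0], s[1], s[2], s[3]]))
}

/// Parse and minimally validate a little-endian minidump header.
fn parse_header_le(bytes: &[u8]) -> Option<Header> {
    if bytes.len() < HEADER_SIZE {
        return None;
    }
    if read_u32_le(bytes, 0)? != MINIDUMP_SIGNATURE {
        return None;
    }
    let version = read_u32_le(bytes, 4)?;
    // Only the low 16 bits are checked; the high bits are implementation-specific.
    if (version & 0xffff) != MINIDUMP_VERSION {
        return None;
    }
    // The u64 flags field is assembled from its two little-endian halves.
    let flags_low = u64::from(read_u32_le(bytes, 24)?);
    let flags_high = u64::from(read_u32_le(bytes, 28)?);
    Some(Header {
        version,
        stream_count: read_u32_le(bytes, 8)?,
        stream_directory_rva: read_u32_le(bytes, 12)?,
        checksum: read_u32_le(bytes, 16)?,
        time_date_stamp: read_u32_le(bytes, 20)?,
        flags: flags_low | (flags_high << 32),
    })
}
```

The checksum and flags are read but deliberately not validated, mirroring the description above.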

At stream_directory_rva (typically immediately after the header) you will find an array of stream_count MINIDUMP_DIRECTORY entries, with the following layout:

pub struct MINIDUMP_DIRECTORY {
    /// The type of the stream
    pub stream_type: u32,
    /// The location of the stream contents within the dump.
    pub location: MINIDUMP_LOCATION_DESCRIPTOR,
}

/// A "slice" of the minidump
pub struct MINIDUMP_LOCATION_DESCRIPTOR {
    /// The size of this data (in bytes)
    pub data_size: u32,
    /// The offset to this data within the minidump file.
    pub rva: RVA,
}

/// Offset into the minidump
pub type RVA = u32;

Known stream_type values are defined in MINIDUMP_STREAM_TYPE, but users are allowed to define their own stream types, so it’s normal to see unknown types (this is the primary mechanism breakpad uses to extend the format without causing upstream problems).
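As an illustration of the directory walk, the sketch below reads `stream_count` entries of 12 bytes each and then slices out the bytes of the first stream with a given type. It assumes a little-endian file, and `DirectoryEntry`, `read_directory_le`, and `stream_bytes` are hypothetical names, not the crate's own functions.

```rust
/// One directory entry: a stream type plus its location descriptor, flattened.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
struct DirectoryEntry {
    stream_type: u32,
    data_size: u32,
    rva: u32,
}

/// stream_type (u32) + data_size (u32) + rva (u32).
const DIRECTORY_ENTRY_SIZE: usize = 12;

/// Read a little-endian u32 at `offset`, or `None` if out of bounds.
fn read_u32_le(bytes: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    let s = bytes.get(offset..end)?;
    Some(u32::from_le_bytes([s[0], s[1], s[2], s[3]]))
}

/// Read the whole stream directory; fails if any entry is out of bounds.
fn read_directory_le(
    file: &[u8],
    directory_rva: u32,
    stream_count: u32,
) -> Option<Vec<DirectoryEntry>> {
    let mut entries = Vec::new();
    for i in 0..stream_count as usize {
        let relative = i.checked_mul(DIRECTORY_ENTRY_SIZE)?;
        let offset = (directory_rva as usize).checked_add(relative)?;
        entries.push(DirectoryEntry {
            stream_type: read_u32_le(file, offset)?,
            data_size: read_u32_le(file, offset.checked_add(4)?)?,
            rva: read_u32_le(file, offset.checked_add(8)?)?,
        });
    }
    Some(entries)
}

/// Return the raw bytes of the first stream with the given type, if any.
/// Unknown stream types are simply never asked for, so they are ignored.
fn stream_bytes<'a>(
    file: &'a [u8],
    entries: &[DirectoryEntry],
    stream_type: u32,
) -> Option<&'a [u8]> {
    let entry = entries.iter().find(|e| e.stream_type == stream_type)?;
    let start = entry.rva as usize;
    let end = start.checked_add(entry.data_size as usize)?;
    file.get(start..end)
}
```

Returning `None` both for a missing stream and for an out-of-bounds one is a simplification that fits the earlier advice not to distinguish absent streams from corrupt ones.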

And that’s it! Everything else in a minidump is just all the different types of stream. As of this writing, rust-minidump is aware of 51 different types of stream, and implements 18 of them (there’s a long tail of platform-specific and domain-specific streams, so that isn’t as bad as it sounds).

Stream Format Families

Although every stream can do whatever it wants, there’s a lot of streams that are basically “a struct” or “a list of structs”, so the same header formats and layouts are used in several places. (This is descriptive, so these aren’t necessarily official terms/concepts.)

Plain Old Struct Streams

A stream that’s just a struct.

That’s it. Just read the struct out of the stream. Although it might contain RVAs to other data, which may or may not be relative to the start of the stream or the start of the file (annoyingly inconsistent between streams).

Known members of this family:

List Streams

A list of some entry type.

A u32 count of entries followed by an array of entries. There may be padding between the count and the entries. The array should be “right-justified” in the stream (the stream ends exactly where the array does), so you can use the difference between the array’s expected size and the rest of the stream’s size to determine the padding.
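The padding calculation can be sketched like this. It assumes a little-endian stream and that the caller knows the entry size for the stream type; `list_stream_layout_le` is a hypothetical helper, and a real parser may want to be stricter about how much padding it accepts.

```rust
/// Given the raw bytes of a list stream and the size of one entry, return
/// `(count, offset_of_first_entry)`, where the offset is relative to the
/// start of the stream. Returns `None` if the stream is too small to hold
/// `count` entries.
fn list_stream_layout_le(stream: &[u8], entry_size: usize) -> Option<(usize, usize)> {
    let c = stream.get(..4)?;
    let count = u32::from_le_bytes([c[0], c[1], c[2], c[3]]) as usize;

    // The array is "right-justified": it ends exactly where the stream ends,
    // so whatever is left over between the count and the array is padding.
    let array_size = count.checked_mul(entry_size)?;
    let after_count = stream.len() - 4;
    let padding = after_count.checked_sub(array_size)?;

    Some((count, 4 + padding))
}
```

For a stream holding a count of 2, four bytes of padding, and two 8-byte entries, this yields a first-entry offset of 8; without the padding it yields 4.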

This format is used by a lot of the oldest (and therefore most important) minidump streams.

Known members of this family:

The stream MinidumpMemory64List is a variant of list stream. It starts with a u64 count of entries and a 64-bit RVA shared by all entries, followed by an array of MINIDUMP_MEMORY_DESCRIPTOR64 entries.
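A sketch of how the shared RVA might be used is below. It relies on two assumptions that are not stated above: that each MINIDUMP_MEMORY_DESCRIPTOR64 is a pair of u64 values (start address, then data size), and that the memory contents are stored back to back starting at the shared RVA, in descriptor order. `Memory64Region` and the function names are hypothetical.

```rust
/// A memory region plus where its bytes live in the file (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
struct Memory64Region {
    base_address: u64,
    size: u64,
    file_offset: u64,
}

/// Read a little-endian u64 at `offset`, or `None` if out of bounds.
fn read_u64_le(bytes: &[u8], offset: usize) -> Option<u64> {
    let end = offset.checked_add(8)?;
    let s = bytes.get(offset..end)?;
    let mut raw = [0u8; 8];
    raw.copy_from_slice(s);
    Some(u64::from_le_bytes(raw))
}

/// Parse a little-endian Memory64List stream into regions with file offsets.
fn parse_memory64_list_le(stream: &[u8]) -> Option<Vec<Memory64Region>> {
    let count = read_u64_le(stream, 0)?;
    let base_rva = read_u64_le(stream, 8)?;

    let mut regions = Vec::new();
    let mut file_offset = base_rva; // assumed: contents are contiguous from here
    let mut entry_offset = 16usize; // the descriptors follow the two u64 fields
    for _ in 0..count {
        let base_address = read_u64_le(stream, entry_offset)?;
        let size = read_u64_le(stream, entry_offset + 8)?;
        regions.push(Memory64Region { base_address, size, file_offset });
        // The next region's bytes start right after this one's.
        file_offset = file_offset.checked_add(size)?;
        entry_offset += 16;
    }
    Some(regions)
}
```

Because every read is bounds-checked, an absurdly large count fails on the first descriptor that falls outside the stream instead of looping indefinitely.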

EX List Streams

A newer and more flexible version of list streams. (so EXtreme!!!)

EX list streams start with this header:

struct EX_LIST_HEADER {
  /// Size (in bytes) of this header (array starts immediately after)
  pub size_of_header: u32,
  /// Size (in bytes) of an entry in the array
  pub size_of_entry: u32,
  /// The number of entries in the array
  pub number_of_entries: u32,
}

This design allows newer versions of the stream to be introduced, and for fields to be added to the end of an entry type. I am not aware of an instance where this flexibility has been used yet, but in theory you could identify “versions” of the stream format by size, and older versions don’t need to worry about unknown future revisions, because they can just ignore the trailing bytes of each entry.
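The forward-compatibility idea can be sketched as follows: step through the array using the sizes recorded in the header rather than the sizes the parser was compiled with, and let the caller read only the prefix of each entry it understands. This assumes a little-endian stream; `ex_list_entries_le` is a hypothetical helper.

```rust
/// Read a little-endian u32 at `offset`, or `None` if out of bounds.
fn read_u32_le(bytes: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    let s = bytes.get(offset..end)?;
    Some(u32::from_le_bytes([s[0], s[1], s[2], s[3]]))
}

/// Split an EX list stream into one byte slice per entry.
fn ex_list_entries_le(stream: &[u8]) -> Option<Vec<&[u8]>> {
    let size_of_header = read_u32_le(stream, 0)? as usize;
    let size_of_entry = read_u32_le(stream, 4)? as usize;
    let number_of_entries = read_u32_le(stream, 8)? as usize;

    // The header must at least hold its own three fields, and zero-sized
    // entries would make the count meaningless.
    if size_of_header < 12 || size_of_entry == 0 {
        return None;
    }

    let mut entries = Vec::new();
    for i in 0..number_of_entries {
        // Stride by the size the *file* declares, so entries that grew in a
        // newer format revision are still stepped over correctly.
        let start = size_of_header.checked_add(i.checked_mul(size_of_entry)?)?;
        let end = start.checked_add(size_of_entry)?;
        entries.push(stream.get(start..end)?);
    }
    Some(entries)
}
```

A parser that knows only the first N bytes of an entry can then take `entry.get(..N)` and ignore any trailing bytes.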

Known members of this family:

Linux List Streams

A dump of a special Linux file like /proc/cpuinfo.

These streams are plain text (strings::LinuxOsString) files containing line-delimited key-value pairs, like:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 45
model name      : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz

Whitespace and separators vary from stream to stream.
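A minimal sketch of parsing such a stream is shown below. It assumes the stream's bytes have already been decoded into a `&str` (real streams are not guaranteed to be valid UTF-8) and takes the separator as a parameter because it varies between streams; `parse_key_value_lines` is a hypothetical helper.

```rust
/// Split each line at the first `separator` and trim surrounding whitespace.
/// Lines without a separator or with an empty key are skipped.
fn parse_key_value_lines(text: &str, separator: char) -> Vec<(String, String)> {
    text.lines()
        .filter_map(|line| {
            // Only the first separator splits the line, so values may
            // themselves contain the separator character.
            let (key, value) = line.split_once(separator)?;
            let key = key.trim();
            if key.is_empty() {
                None
            } else {
                Some((key.to_string(), value.trim().to_string()))
            }
        })
        .collect()
}
```

For /proc/cpuinfo-style text the separator would be `':'`, while a file like /etc/lsb-release uses `'='`.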

Known members of this family:

Modules

Minidump structure definitions.

Information about the system that produced a Minidump.

Structs

An iterator over registers and values in a CpuContext.

An index into the contents of a minidump.

Information about an assertion that caused a crash.

Additional information about process state.

CPU context such as register states.

Additional Crashpad-specific information carried within a minidump file.

Information about the exception that caused the minidump to be generated.

A stream in the minidump that this implementation can interpret,

Interesting values extracted from /proc/cpuinfo

Interesting values extracted from /proc/self/environ

Interesting values extracted from /etc/lsb-release

A memory mapping entry for the process we are analyzing.

The contents of /proc/self/maps for the crashing process.

Interesting values extracted from /proc/self/status

A region of memory from the process that wrote the minidump. This is the underlying generic type for MinidumpMemory and MinidumpMemory64.

Metadata about a region of memory (whether it is executable, freed, private, and so on).

A list of memory regions included in a minidump. This is the underlying generic type for MinidumpMemoryList and MinidumpMemory64List.

Miscellaneous information about the process that wrote the minidump.

An executable or shared library loaded in the process at the time the Minidump was written.

Additional Crashpad-specific information about a module carried within a minidump file.

A list of MinidumpModules contained in a Minidump.

Information about the system that generated the minidump.

The state of a thread from the process when the minidump was written.

A list of MinidumpThreads contained in a Minidump.

A mapping of thread ids to their names.

A stream in the minidump that this implementation is aware of but doesn’t yet support.

A stream in the minidump that this implementation has no knowledge of.

An executable or shared library that was once loaded into the process, but was unloaded by the time the Minidump was written.

A list of MinidumpUnloadedModules contained in a Minidump.

Enums

CodeView data describes how to locate debug symbols

Errors encountered while reading a MinidumpContext.

The reason for a process crash.

The endianness (byte order) of a stream of bytes

Errors encountered while reading a Minidump.

A typed annotation object.

Information about which registers are valid in a MinidumpContext.

A broad classification of the mapped memory described by a MinidumpLinuxMapInfo.

The CPU-specific context structure.

A UnifiedMemoryInfoList entry, providing metadata on a region of memory in the crashed process.

Provides a unified interface for getting metadata about the process’s mapped memory regions at the time of the crash.

Traits

Generic over the specifics of a CPU context.

The fundamental unit of data in a Minidump.

An executable or shared library loaded in a process.

Shorthand for Read + Seek

Type Definitions

A region of memory from the process that wrote the minidump.

A large region of memory from the process that wrote the minidump (usually a full dump).

A list of large memory regions included in a minidump (usually a full dump).

A list of memory regions included in a minidump.