A parser for the minidump file format.
The minidump module provides a parser for the minidump file format as produced by Microsoft’s MinidumpWriteDump API and the Google Breakpad library.
§Usage
The primary API for this library is the Minidump struct, which can be instantiated by calling the Minidump::read or Minidump::read_path methods.
Successfully parsing a Minidump struct means the minidump has a minimally valid header and stream directory. Individual streams are only parsed when they’re requested.
Although you may enumerate the streams in a minidump with methods like Minidump::all_streams, this is only really useful for debugging. Instead you should statically request streams with Minidump::get_stream.
Depending on what analysis you’re trying to perform, you may:
- Consider it an error for a stream to be missing (using ? or unwrap)
- Branch on the presence of a stream to conditionally refine your analysis
- Use a stream’s Default implementation to get an “empty” instance (with unwrap_or_default)
use minidump::*;

fn main() -> Result<(), Error> {
    // Read the minidump from a file
    let mut dump = minidump::Minidump::read_path("../testdata/test.dmp")?;

    // Statically request (and require) several streams we care about:
    let system_info = dump.get_stream::<MinidumpSystemInfo>()?;
    let exception = dump.get_stream::<MinidumpException>()?;

    // Combine the contents of the streams to perform more refined analysis
    let crash_reason = exception.get_crash_reason(system_info.os, system_info.cpu);

    // Conditionally analyze a stream
    if let Ok(threads) = dump.get_stream::<MinidumpThreadList>() {
        // Use `Default` to try to make progress when a stream is missing.
        // This is especially natural for MinidumpMemoryList because
        // everything needs to handle memory lookups failing anyway.
        let mem = dump.get_memory().unwrap_or_default();
        for thread in &threads.threads {
            let stack = thread.stack_memory(&mem);
            // ...
        }
    }
    Ok(())
}
Generally speaking, there isn’t any reason to distinguish between a stream being absent and it being corrupt. Just ask for what you want and we’ll do our best to give it to you.
Everything else you would want to do with a Minidump is specific to the individual streams:
MinidumpAssertion
MinidumpBreakpadInfo
MinidumpCrashpadInfo
MinidumpException
MinidumpLinuxCpuInfo
MinidumpLinuxEnviron
MinidumpLinuxLsbRelease
MinidumpLinuxMaps
MinidumpLinuxProcStatus
MinidumpMacCrashInfo
MinidumpMacBootargs
MinidumpMemoryList
MinidumpMemoryInfoList
MinidumpMiscInfo
MinidumpModuleList
MinidumpSystemInfo
MinidumpThreadList
MinidumpThreadNames
MinidumpUnloadedModuleList
MinidumpLinuxProcLimits
§Notable Streams
There are a lot of different minidump streams, but some are especially notable/fundamental:
MinidumpSystemInfo includes details about the hardware and operating system that the crash occurred on. This information is often required to properly interpret the other streams of the minidump, as they contain platform-specific values.
MinidumpException includes actual details about where and why the crash occurred.
MinidumpThreadList includes the registers and stack memory of every thread in the program at the time of the crash. This enables generating backtraces for every thread.
MinidumpMemoryList maps the crashing program’s runtime addresses (such as $rsp) to ranges of memory in the Minidump.
MinidumpModuleList includes info on all the modules (libraries) that were linked into the crashing program. This enables symbolication, as you can map instruction addresses back to offsets in a specific library’s binary.
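To make the symbolication point concrete, here is a minimal sketch of mapping an instruction address to a module-relative offset. ModuleRange and locate are hypothetical stand-ins invented for this illustration, not types or functions from this crate; in practice the module list stream gives you this information.

```rust
/// Hypothetical stand-in for a loaded module; not a type from this crate.
struct ModuleRange {
    name: &'static str,
    base_address: u64,
    size: u64,
}

/// Returns the name of the module containing `address` together with the
/// offset of `address` from that module's base, or `None` if no module
/// covers the address.
fn locate(modules: &[ModuleRange], address: u64) -> Option<(&'static str, u64)> {
    modules.iter().find_map(|m| {
        // `checked_sub` rejects addresses below the module's base.
        let offset = address.checked_sub(m.base_address)?;
        // Only accept offsets that fall inside the module's mapped range.
        (offset < m.size).then(|| (m.name, offset))
    })
}
```

The resulting offset is what you would hand to a symbol file for that library to recover a function name.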
§What is a Minidump?
Minidumps capture the state of a crashing process (threads, stack memory, registers, dlls), why it crashed (crashing thread, error codes, error messages), and details about the system the program was running on (os, cpu).
The information in a minidump is divided up into a series of independent “streams”. If you want a specific piece of information, you must know the stream that contains it, and then look up that stream in the minidump’s directory. Most streams are pretty straight-forward – you can guess what you might find in MinidumpThreadList or MinidumpSystemInfo – but others – like MinidumpMiscInfo – are a bit more random.
This format was initially defined by Microsoft, as Windows has long included system APIs to generate minidumps. But lots of software gets made for operating systems other than Windows, where no such native support for minidumps is present. google-breakpad was created to extend Microsoft’s minidump format to other platforms, and defines minidump generators for things like Linux and macOS.
I do not believe that Microsoft and Breakpad officially collaborate on the format; it’s just designed to be very extensible, so it’s easy to add random stuff to a minidump in ways that don’t break old tools and likely won’t interfere with future versions. That said, Microsoft does now develop cross-platform products that make use of Breakpad, such as VSCode, so at the very least their crash reporting infra deals with Breakpad minidumps.
The rust-minidump crates are specifically designed to support Breakpad’s extended minidump format (and native Windows minidumps, which should in theory just be a subset). That said, rust-minidump doesn’t yet (and probably won’t ever) support everything. There’s a lot of random stuff that either Microsoft or Breakpad have defined over the years that we just do not have any use for at the moment. Not a lot of demand for handling minidumps for PlayStation 3, SPARC, or Windows CE these days.
§The Minidump Format
This section is dedicated to describing how to parse minidumps, for anyone wanting to maintain this code or write their own parser.
Minidumps are a binary format. This format is simultaneously very simple and very complicated.
The simple part of a minidump is that it’s basically just an array of pointers to different typed “Streams” (system info, exception info, threads, memory mappings, etc.). So if you want to look up the system info, you just search the array for a system info stream and interpret that range of memory as that stream.
The complicated part of a minidump is the fact that every stream contains totally different information in totally different formats. Sure, there are families of streams that have the same general structure, but you’ve still got to write custom code to interpret the values meaningfully and figure out what on earth that information is useful for.
Sometimes the answer to “what is it useful for?” is “I don’t know but maybe we’ll find a use for it later”. This is genuinely useful because it allows us to add new analyses long after a crash occurs and gain new insights that the minidump format wasn’t explicitly designed to provide.
This is all to say that, beyond the basic layout of the minidump header and directory, it’s basically just a big ball of unrelated streams, each with its own format and layout – and everyone is technically free to come up with their own custom Streams that they can just toss in there, so trying to cover everything is kind of impossible? Let’s see how far we get!
§The Minidump Header and Directory
The first thing in a Minidump is the MINIDUMP_HEADER, which has the following layout:
pub struct MINIDUMP_HEADER {
    pub signature: u32,
    pub version: u32,
    pub stream_count: u32,
    pub stream_directory_rva: RVA,
    pub checksum: u32,
    pub time_date_stamp: u32,
    pub flags: u64,
}

/// Offset into the minidump
pub type RVA = u32;
The signature is always MINIDUMP_SIGNATURE = 0x504d444d (“MDMP” in ascii). You can use this to detect whether the minidump is little-endian or big-endian (minidumps always have the endianness of the platform they were generated on, since they contain lots of raw memory from the process, but at this point we don’t know what that platform is).
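A minimal sketch of that endianness check, using only the standard library, could look like this. The Endian enum and detect_endian function here are illustrative names for this example, not the crate's own implementation.

```rust
/// The signature value described above: reads as "MDMP" when the four
/// bytes are laid out in little-endian order.
const MINIDUMP_SIGNATURE: u32 = 0x504d444d;

/// Illustrative enum for this sketch only.
#[derive(Debug, PartialEq)]
enum Endian {
    Little,
    Big,
}

/// Interprets the first four bytes of the file in both byte orders and
/// reports which one (if either) yields the minidump signature.
fn detect_endian(file: &[u8]) -> Option<Endian> {
    let s = file.get(..4)?;
    let raw = [s[0], s[1], s[2], s[3]];
    if u32::from_le_bytes(raw) == MINIDUMP_SIGNATURE {
        Some(Endian::Little)
    } else if u32::from_be_bytes(raw) == MINIDUMP_SIGNATURE {
        Some(Endian::Big)
    } else {
        // Too short, or not a minidump at all.
        None
    }
}
```

Every later multi-byte read would then use whichever byte order this returns.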
The lower 16 bits of version are always MINIDUMP_VERSION = 42899. (The high bits contain implementation-specific values that you should just ignore).
stream_directory_rva and stream_count are the location (offset from the start of the file, in bytes) and size (number of entries) of the stream directory, respectively.
checksum is some kind of checksum of the minidump itself (which may be null), but the algorithm isn’t specified, and rust-minidump doesn’t check it.
time_date_stamp is a Windows time_t of when the minidump was generated.
flags are a MINIDUMP_TYPE which largely just specify what you can expect to find in the minidump. This is unused by rust-minidump since this information is generally redundant with the stream directory and flags within the streams that we need to check anyway. (e.g. instead of checking that this is a MiniDumpWithUnloadedModules, you can just check the directory for the MinidumpUnloadedModuleList stream.)
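Putting the header fields together, a parser for the 32-byte header might be sketched as follows. This is a simplified illustration that assumes a little-endian dump; Header, u32_at and parse_header are names made up for the example, and the checksum is skipped just as described above.

```rust
const MINIDUMP_SIGNATURE: u32 = 0x504d444d;
const MINIDUMP_VERSION: u32 = 42899;

/// The header fields this sketch keeps (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
struct Header {
    stream_count: u32,
    stream_directory_rva: u32,
    time_date_stamp: u32,
    flags: u64,
}

/// Reads a little-endian u32 at byte offset `at`, or `None` if out of bounds.
fn u32_at(bytes: &[u8], at: usize) -> Option<u32> {
    let s = bytes.get(at..at.checked_add(4)?)?;
    Some(u32::from_le_bytes([s[0], s[1], s[2], s[3]]))
}

/// Parses the header from the start of `file`, assuming little-endian data.
fn parse_header(file: &[u8]) -> Option<Header> {
    if u32_at(file, 0)? != MINIDUMP_SIGNATURE {
        return None;
    }
    // Only the low 16 bits of the version are meaningful.
    if (u32_at(file, 4)? & 0xffff) != MINIDUMP_VERSION {
        return None;
    }
    // `flags` is a u64 stored as two consecutive little-endian halves.
    let flags_lo = u32_at(file, 24)? as u64;
    let flags_hi = u32_at(file, 28)? as u64;
    Some(Header {
        stream_count: u32_at(file, 8)?,
        stream_directory_rva: u32_at(file, 12)?,
        // The checksum at offset 16 is deliberately not read.
        time_date_stamp: u32_at(file, 20)?,
        flags: flags_lo | (flags_hi << 32),
    })
}
```

A real parser would first detect the byte order from the signature rather than assume it.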
At stream_directory_rva (typically immediately after the header) you will find an array of stream_count MINIDUMP_DIRECTORY entries, with the following layout:
pub struct MINIDUMP_DIRECTORY {
    /// The type of the stream
    pub stream_type: u32,
    /// The location of the stream contents within the dump.
    pub location: MINIDUMP_LOCATION_DESCRIPTOR,
}

/// A "slice" of the minidump
pub struct MINIDUMP_LOCATION_DESCRIPTOR {
    /// The size of this data (in bytes)
    pub data_size: u32,
    /// The offset to this data within the minidump file.
    pub rva: RVA,
}

/// Offset into the minidump
pub type RVA = u32;
Known stream_type values are defined in MINIDUMP_STREAM_TYPE, but users are allowed to define their own stream types, so it’s normal to see unknown types (this is the primary mechanism Breakpad uses to extend the format without causing upstream problems).
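As a rough sketch of how the directory is used, the following reads the 12-byte entries and then slices out one stream's bytes by type. It assumes little-endian data, and DirectoryEntry, read_directory and stream_bytes are hypothetical names for this example rather than the crate's API.

```rust
/// One flattened directory entry (hypothetical type for illustration).
#[derive(Debug, PartialEq, Clone, Copy)]
struct DirectoryEntry {
    stream_type: u32,
    data_size: u32,
    rva: u32,
}

/// Reads a little-endian u32 at byte offset `at`, or `None` if out of bounds.
fn u32_at(bytes: &[u8], at: usize) -> Option<u32> {
    let s = bytes.get(at..at.checked_add(4)?)?;
    Some(u32::from_le_bytes([s[0], s[1], s[2], s[3]]))
}

/// Reads `stream_count` entries of 12 bytes each starting at `directory_rva`.
/// Returns `None` as soon as any entry would fall outside the file.
fn read_directory(file: &[u8], directory_rva: u32, stream_count: u32) -> Option<Vec<DirectoryEntry>> {
    (0..stream_count as usize)
        .map(|i| {
            let at = (directory_rva as usize).checked_add(i.checked_mul(12)?)?;
            Some(DirectoryEntry {
                stream_type: u32_at(file, at)?,
                data_size: u32_at(file, at.checked_add(4)?)?,
                rva: u32_at(file, at.checked_add(8)?)?,
            })
        })
        .collect()
}

/// Finds the first entry with the given type and returns its slice of the file.
fn stream_bytes<'a>(file: &'a [u8], directory: &[DirectoryEntry], stream_type: u32) -> Option<&'a [u8]> {
    let entry = directory.iter().find(|e| e.stream_type == stream_type)?;
    let start = entry.rva as usize;
    file.get(start..start.checked_add(entry.data_size as usize)?)
}
```

Unknown stream types simply show up as entries nobody asks for, which is why they are harmless to older tools.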
And that’s it! Everything else in a minidump is just all the different types of stream. As of this writing, rust-minidump is aware of 51 different types of stream, and implements 18 of them (there’s a long tail of platform-specific and domain-specific streams, so that isn’t as bad as it sounds).
§Stream Format Families
Although every stream can do whatever it wants, there are a lot of streams that are basically “a struct” or “a list of structs”, so the same header formats and layouts are used in several places. (This is descriptive, so these aren’t necessarily official terms/concepts.)
§Plain Old Struct Streams
A stream that’s just a struct.
That’s it. Just read the struct out of the stream. Although it might contain RVAs to other data, which may or may not be relative to the start of the stream or the start of the file (annoyingly inconsistent between streams).
Known members of this family:
- MinidumpAssertion (contains MINIDUMP_ASSERTION_INFO)
- MinidumpBreakpadInfo (contains MINIDUMP_BREAKPAD_INFO)
- MinidumpCrashpadInfo (contains MINIDUMP_CRASHPAD_INFO)
- MinidumpException (contains MINIDUMP_EXCEPTION_STREAM)
- MinidumpSystemInfo (contains MINIDUMP_SYSTEM_INFO)
§List Streams
A list of some entry type.
A u32 count of entries followed by an array of entries. There may be padding between the count and the entries. The array should be “right-justified” in the stream (the stream ends exactly where the array does), so you can use the difference between the array’s expected size and the rest of the stream’s size to determine the padding.
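The padding calculation can be sketched like this. It is a simplified illustration that assumes little-endian data and accepts any amount of padding; list_entries and u32_at are hypothetical helpers, and a stricter parser may only tolerate specific padding sizes.

```rust
/// Reads a little-endian u32 at byte offset `at`, or `None` if out of bounds.
fn u32_at(bytes: &[u8], at: usize) -> Option<u32> {
    let s = bytes.get(at..at.checked_add(4)?)?;
    Some(u32::from_le_bytes([s[0], s[1], s[2], s[3]]))
}

/// Splits a list stream into its entries, each `entry_size` bytes long.
/// Returns the number of padding bytes found between the count and the
/// array, along with the raw bytes of each entry.
fn list_entries(stream: &[u8], entry_size: usize) -> Option<(usize, Vec<&[u8]>)> {
    if entry_size == 0 {
        return None;
    }
    let count = u32_at(stream, 0)? as usize;
    let array_size = count.checked_mul(entry_size)?;
    // The array is right-justified, so it occupies the last `array_size` bytes.
    let array_start = stream.len().checked_sub(array_size)?;
    // Whatever sits between the 4-byte count and the array is padding.
    let padding = array_start.checked_sub(4)?;
    let entries = stream[array_start..].chunks_exact(entry_size).collect();
    Some((padding, entries))
}
```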
This format is used by a lot of the oldest (and therefore most important) minidump streams.
Known members of this family:
- MinidumpMemoryList (entries are MINIDUMP_MEMORY_DESCRIPTOR)
- MinidumpModuleList (entries are MINIDUMP_MODULE)
- MinidumpThreadList (entries are MINIDUMP_THREAD)
- MinidumpThreadNames (entries are MINIDUMP_THREAD_NAME)
- MINIDUMP_THREAD_EX_LIST (yes, the stream with “EX_LIST” in the name isn’t an EX list, names are hard.)
The stream MinidumpMemory64List is a variant of list stream. It starts with a u64 count of entries and a 64-bit shared RVA for all entries, followed by an array of MINIDUMP_MEMORY_DESCRIPTOR64 entries.
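Here is a hedged sketch of reading that variant. It assumes little-endian data, that each descriptor is a u64 start address followed by a u64 size, and that the regions' bytes are stored back to back starting at the shared RVA (so each region's file offset is the running sum of the sizes before it). That layout is my understanding of the format rather than something stated above, and Region64 and the function names are invented for this example.

```rust
/// One memory region with its computed location in the file
/// (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
struct Region64 {
    base_address: u64,
    size: u64,
    file_offset: u64,
}

/// Reads a little-endian u64 at byte offset `at`, or `None` if out of bounds.
fn u64_at(bytes: &[u8], at: usize) -> Option<u64> {
    let s = bytes.get(at..at.checked_add(8)?)?;
    let mut raw = [0u8; 8];
    raw.copy_from_slice(s);
    Some(u64::from_le_bytes(raw))
}

/// Parses the count, the shared RVA, and then each 16-byte descriptor.
fn read_memory64_list(stream: &[u8]) -> Option<Vec<Region64>> {
    let count = u64_at(stream, 0)?;
    // Assumption: region contents are contiguous, starting at the shared RVA.
    let mut file_offset = u64_at(stream, 8)?;
    let mut regions = Vec::new();
    for i in 0..count {
        // Descriptors begin right after the 16 bytes of count and RVA.
        let at = (i as usize).checked_mul(16)?.checked_add(16)?;
        let base_address = u64_at(stream, at)?;
        let size = u64_at(stream, at.checked_add(8)?)?;
        regions.push(Region64 { base_address, size, file_offset });
        file_offset = file_offset.checked_add(size)?;
    }
    Some(regions)
}
```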
§EX List Streams
A newer and more flexible version of list streams. (so EXtreme!!!)
EX list streams start with this header:
struct EX_LIST_HEADER {
    /// Size (in bytes) of this header (array starts immediately after)
    pub size_of_header: u32,
    /// Size (in bytes) of an entry in the array
    pub size_of_entry: u32,
    /// The number of entries in the array
    pub number_of_entries: u32,
}
This design allows newer versions of the stream to be introduced, and for fields to be added to the end of an entry type. I am not aware of an instance where this flexibility has been used yet, but in theory you could identify “versions” of the stream format by size, and older versions don’t need to worry about unknown future revisions, because they can just ignore the trailing bytes of each entry.
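The "ignore the trailing bytes" idea can be sketched as below: the reader steps through the array using the stream's own size_of_entry, but only looks at the prefix of each entry that it understands. This assumes little-endian data; ex_list_entries and its known_entry_size parameter are hypothetical names for this illustration.

```rust
/// Reads a little-endian u32 at byte offset `at`, or `None` if out of bounds.
fn u32_at(bytes: &[u8], at: usize) -> Option<u32> {
    let s = bytes.get(at..at.checked_add(4)?)?;
    Some(u32::from_le_bytes([s[0], s[1], s[2], s[3]]))
}

/// Returns the first `known_entry_size` bytes of every entry in an EX list
/// stream, skipping any trailing bytes a newer writer may have appended.
fn ex_list_entries(stream: &[u8], known_entry_size: usize) -> Option<Vec<&[u8]>> {
    let size_of_header = u32_at(stream, 0)? as usize;
    let size_of_entry = u32_at(stream, 4)? as usize;
    let number_of_entries = u32_at(stream, 8)? as usize;
    // The header must at least hold its own three fields, and entries must
    // be at least as large as the version this reader understands.
    if size_of_header < 12 || size_of_entry < known_entry_size {
        return None;
    }
    (0..number_of_entries)
        .map(|i| {
            // Stride by the size the stream declares, not the size we know.
            let start = size_of_header.checked_add(i.checked_mul(size_of_entry)?)?;
            stream.get(start..start.checked_add(known_entry_size)?)
        })
        .collect()
}
```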
Known members of this family:
MinidumpMemoryInfoList
(entries areMINIDUMP_MEMORY_INFO
)MinidumpUnloadedModuleList
(entries areMINIDUMP_UNLOADED_MODULE
)MinidumpHandleDataStream
is a slight variation of this format with different filed names and a trailingu32
member reserved for future use (entries areMINIDUMP_HANDLE_DESCRIPTOR
andMINIDUMP_HANDLE_DESCRIPTOR_2
)MinidumpThreadInfoList
(entries areMINIDUMP_THREAD_INFO
)
§Linux List Streams
A dump of a special Linux file like /proc/cpuinfo.
These streams are plain text (strings::LinuxOsString) files containing line-delimited key-value pairs, like:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 45
model name : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
Whitespace and separators vary from stream to stream.
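A minimal sketch of splitting such a stream into pairs is shown below. It assumes the text is valid UTF-8 (the real streams are arbitrary bytes) and does no unquoting; parse_key_values is a hypothetical helper, not part of this crate.

```rust
/// Splits each line at the first occurrence of `separator` and trims
/// whitespace around the key and the value. Lines with no separator or
/// with an empty key are skipped.
fn parse_key_values(text: &str, separator: char) -> Vec<(&str, &str)> {
    text.lines()
        .filter_map(|line| {
            // Split only once, so values may themselves contain the separator.
            let mut parts = line.splitn(2, separator);
            let key = parts.next()?.trim();
            let value = parts.next()?.trim();
            if key.is_empty() {
                None
            } else {
                Some((key, value))
            }
        })
        .collect()
}
```

The caller picks the separator per stream, since it differs between the members of this family.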
Known members of this family:
- MinidumpLinuxCpuInfo (separator is :)
- MinidumpLinuxEnviron (separator is =)
- MinidumpLinuxLsbRelease (separator is =)
- MinidumpLinuxProcStatus (separator is :)
- MinidumpLinuxProcLimits (separator is
Modules§
- format: Minidump structure definitions.
- strings
- system_info: Information about the system that produced a Minidump.
Structs§
- CpuRegisters: An iterator over registers and values in a CpuContext.
- Minidump: An index into the contents of a minidump.
- MinidumpAssertion: Information about an assertion that caused a crash.
- MinidumpBreakpadInfo: Additional information about process state.
- MinidumpContext: CPU context such as register states.
- MinidumpCrashpadInfo: Additional Crashpad-specific information carried within a minidump file.
- MinidumpException: Information about the exception that caused the minidump to be generated.
- MinidumpHandleDataStream: A stream holding all the system handles at the time the minidump was written. On Linux this is the list of open file descriptors.
- MinidumpHandleDescriptor: Describes the state of an individual system handle at the time the minidump was written.
- MinidumpHandleObjectInformation: Contains object-specific information for a handle. Microsoft documentation doesn’t describe the contents of this type.
- MinidumpImplementedStream: A stream in the minidump that this implementation can interpret,
- MinidumpLinuxCpuInfo: Interesting values extracted from /proc/cpuinfo
- MinidumpLinuxEnviron: Interesting values extracted from /proc/self/environ
- MinidumpLinuxLsbRelease: Interesting values extracted from /etc/lsb-release
- MinidumpLinuxMapInfo: A memory mapping entry for the process we are analyzing.
- MinidumpLinuxMaps: The contents of /proc/self/maps for the crashing process.
- MinidumpLinuxProcLimits: Interesting values extracted from /proc/self/limits
- MinidumpLinuxProcStatus: Interesting values extracted from /proc/self/status
- MinidumpMacBootargs
- MinidumpMacCrashInfo
- MinidumpMemoryBase: A region of memory from the process that wrote the minidump. This is the underlying generic type for MinidumpMemory and MinidumpMemory64.
- MinidumpMemoryInfo: Metadata about a region of memory (whether it is executable, freed, private, and so on).
- MinidumpMemoryInfoList
- MinidumpMemoryListBase: A list of memory regions included in a minidump. This is the underlying generic type for MinidumpMemoryList and MinidumpMemory64List.
- MinidumpMiscInfo: Miscellaneous information about the process that wrote the minidump.
- MinidumpModule: An executable or shared library loaded in the process at the time the Minidump was written.
- MinidumpModuleCrashpadInfo: Additional Crashpad-specific information about a module carried within a minidump file.
- MinidumpModuleList: A list of MinidumpModules contained in a Minidump.
- MinidumpSoftErrors: Soft errors encountered by minidump-writer during generation
- MinidumpSystemInfo: Information about the system that generated the minidump.
- MinidumpThread: The state of a thread from the process when the minidump was written.
- MinidumpThreadInfo: The state of a thread from the process when the minidump was written.
- MinidumpThreadInfoList: A list of MinidumpThreads contained in a Minidump.
- MinidumpThreadList: A list of MinidumpThreads contained in a Minidump.
- MinidumpThreadNames: A mapping of thread ids to their names.
- MinidumpUnimplementedStream: A stream in the minidump that this implementation is aware of but doesn’t yet support.
- MinidumpUnknownStream: A stream in the minidump that this implementation has no knowledge of.
- MinidumpUnloadedModule: An executable or shared library that was once loaded into the process, but was unloaded by the time the Minidump was written.
- MinidumpUnloadedModuleList: A list of MinidumpUnloadedModules contained in a Minidump.
Enums§
- CodeView: CodeView data describes how to locate debug symbols
- ContextError: Errors encountered while reading a MinidumpContext.
- CrashReason: The reason for a process crash.
- Endian: The endianness (byte order) of a stream of bytes
- Error: Errors encountered while reading a Minidump.
- MinidumpAnnotation: A typed annotation object.
- MinidumpContextValidity: Information about which registers are valid in a MinidumpContext.
- MinidumpRawContext: The CPU-specific context structure.
- RawHandleDescriptor
- RawMacCrashInfo
- RawMiscInfo
- UnifiedMemory: Provides a unified interface for MinidumpMemory and MinidumpMemory64
- UnifiedMemoryInfo: A UnifiedMemoryInfoList entry, providing metadata on a region of memory in the crashed process.
- UnifiedMemoryInfoList: Provides a unified interface for getting metadata about the process’s mapped memory regions at the time of the crash.
- UnifiedMemoryList: Provides a unified interface for MinidumpMemoryList and MinidumpMemory64List
Traits§
- CpuContext: Generic over the specifics of a CPU context.
- MinidumpStream: The fundamental unit of data in a Minidump.
- Module: An executable or shared library loaded in a process.
- Readable: Shorthand for Read + Seek
Type Aliases§
- MinidumpMemory: A region of memory from the process that wrote the minidump.
- MinidumpMemory64: A large region of memory from the process that wrote the minidump (usually a full dump).
- MinidumpMemory64List: A list of large memory regions included in a minidump (usually a full dump).
- MinidumpMemoryList: A list of memory regions included in a minidump.
- MmapMinidump: An index into the contents of a memory-mapped minidump.