Crate wholesym

wholesym is a fully-featured library for fetching symbol files and for resolving code addresses to symbols and debug info. It supports Windows, macOS and Linux. It is lightning-fast and optimized for minimal time-to-first-symbol.

Use it as follows:

  1. Create a SymbolManager using SymbolManager::with_config.
  2. Load a SymbolMap with SymbolManager::load_symbol_map_for_binary_at_path.
  3. Look up an address with SymbolMap::lookup.
  4. Inspect the returned AddressInfo, which gives you the symbol name, and potentially file and line information, along with inlined function info.

Behind the scenes, wholesym loads symbol files much like a debugger would. It supports symbol servers, collecting information from multiple files, and all kinds of different ways to embed symbol information in various file formats.

§Example

use wholesym::{SymbolManager, SymbolManagerConfig, LookupAddress};
use std::path::Path;

// The lookups are async, so this has to run inside an async function
// that is driven by an executor of your choice.
async fn run() -> Result<(), wholesym::Error> {
    let symbol_manager = SymbolManager::with_config(SymbolManagerConfig::default());
    let symbol_map = symbol_manager
        .load_symbol_map_for_binary_at_path(Path::new("/usr/bin/ls"), None)
        .await?;
    println!("Looking up 0xd6f4 in /usr/bin/ls. Results:");
    if let Some(address_info) = symbol_map.lookup(LookupAddress::Relative(0xd6f4)).await {
        println!(
            "Symbol: {:#x} {}",
            address_info.symbol.address, address_info.symbol.name
        );
        // Frames are only present if debug info was found for this address.
        if let Some(frames) = address_info.frames {
            for (i, frame) in frames.into_iter().enumerate() {
                let function = frame.function.unwrap();
                let file = frame.file_path.unwrap().display_path();
                let line = frame.line_number.unwrap();
                println!("  #{i:02} {function} at {file}:{line}");
            }
        }
    } else {
        println!("No symbol for 0xd6f4 was found.");
    }
    Ok(())
}

This example prints the following output on my machine:

Looking up 0xd6f4 in /usr/bin/ls. Results:
Symbol: 0xd5d4 gobble_file.constprop.0
  #00 do_lstat at ./src/ls.c:1184
  #01 gobble_file at ./src/ls.c:3403

The example demonstrates support for debuglink and debugaltlink. It gets the symbol information from local debug files at /usr/lib/debug/.build-id/63/260a3e6e46db57abf718f6a3562c6eedccf269.debug and at /usr/lib/debug/.dwz/aarch64-linux-gnu/coreutils.debug, which were installed by the coreutils-dbgsym package. If these files are not present, it will fall back to whichever information is available.
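
The build-ID path quoted above follows the layout that GDB-style debuggers conventionally use: the first two hex digits of the build ID name a directory under `.build-id`, and the remaining digits plus a `.debug` suffix name the file. The following is a minimal sketch of that mapping only. The function name `build_id_debug_path` is hypothetical and is not part of the wholesym API, and wholesym's actual search logic may consider more locations.

```rust
use std::path::PathBuf;

/// Builds the conventional location of a separate debug file for a build ID.
/// `debug_root` is usually "/usr/lib/debug"; `build_id_hex` is the build ID in hex.
/// Returns `None` if the build ID is not hex or is too short to split.
/// (Hypothetical helper for illustration, not a wholesym function.)
fn build_id_debug_path(debug_root: &str, build_id_hex: &str) -> Option<PathBuf> {
    if build_id_hex.len() < 3 || !build_id_hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    // The first two hex digits become the directory, the rest the file stem.
    let (dir, rest) = build_id_hex.split_at(2);
    let mut path = PathBuf::from(debug_root);
    path.push(".build-id");
    path.push(dir);
    path.push(format!("{rest}.debug"));
    Some(path)
}
```

Calling it with `"/usr/lib/debug"` and the build ID `63260a3e6e46db57abf718f6a3562c6eedccf269` yields the first path mentioned in the paragraph above.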

§Features

§Windows

Supported symbol file sources:

  • Local PDB files at the absolute PDB path that is recorded in the .exe / .dll
  • PDB files on Windows symbol servers, including servers listed in the _NT_SYMBOL_PATH environment variable
  • Breakpad symbol files, local or on a server
  • DWARF-in-PE debug info
  • Fallback symbols from exported functions and function start addresses

Unsupported for now (patches accepted):

  • /DEBUG:FASTLINK PDB files (issue #53)
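
To make the _NT_SYMBOL_PATH item above more concrete, here is a rough sketch of how such a value can be split into entries. It assumes only the common forms: entries separated by `;`, symbol servers written as `srv*<cache>*<url>` or `srv*<url>`, and everything else treated as a local directory. The real syntax has more variants (for example `cache*` and `symsrv*` entries) that this sketch ignores, and the type and function names are hypothetical, not wholesym's.

```rust
/// One entry of an _NT_SYMBOL_PATH-style value (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum SymbolPathEntry {
    /// `srv*[cache]*url`: a symbol server with an optional local cache directory.
    Server { cache: Option<String>, url: String },
    /// Anything else is treated as a plain local directory.
    LocalDir(String),
}

/// Splits a symbol path on `;` and classifies each entry.
/// Only the `srv*url` and `srv*cache*url` forms are recognized.
fn parse_nt_symbol_path(value: &str) -> Vec<SymbolPathEntry> {
    value
        .split(';')
        .map(str::trim)
        .filter(|entry| !entry.is_empty())
        .map(|entry| {
            let parts: Vec<&str> = entry.split('*').collect();
            match parts.as_slice() {
                [kind, url] if kind.eq_ignore_ascii_case("srv") => SymbolPathEntry::Server {
                    cache: None,
                    url: url.to_string(),
                },
                [kind, cache, url] if kind.eq_ignore_ascii_case("srv") => SymbolPathEntry::Server {
                    // An empty cache field (`srv**url`) means "no explicit cache directory".
                    cache: if cache.is_empty() { None } else { Some(cache.to_string()) },
                    url: url.to_string(),
                },
                _ => SymbolPathEntry::LocalDir(entry.to_string()),
            }
        })
        .collect()
}
```

A value such as `srv*C:\symcache*https://msdl.microsoft.com/download/symbols;C:\local\symbols` would produce one `Server` entry with a cache directory followed by one `LocalDir` entry.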

§macOS

Supported symbol file sources:

  • Local dSYM bundles with symbol tables + DWARF, found in the vicinity of the binary
  • dSYM bundles found via Spotlight
  • DWARF found in object files which are referred to from a linked binary (via OSO stabs symbols)
  • Breakpad symbol files, local or on a server
  • Symbols from the regular symbol table
  • Fallback symbols from exported functions and function start addresses

Unsupported for now (patches accepted):

§Linux

Supported symbol file sources:

  • DWARF and symbol tables in binaries
  • DWARF and symbol tables in separate debug files found via build ID or debug link
  • Symbol tables in MiniDebugInfo
  • Combining multiple files with DWARF if debug info has been partially moved with dwz (using debugaltlink)
  • debuginfod servers and the DEBUGINFOD_URLS environment variable
  • Breakpad symbol files, local or on a server
  • Symbols from the regular symbol table
  • Fallback symbols from exported functions and function start addresses
  • Split DWARF with .dwo files
  • Split DWARF with .dwp files
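
As an illustration of the debuginfod item in the list above: DEBUGINFOD_URLS holds a whitespace-separated list of server base URLs, and the debuginfod HTTP interface serves debug files under `/buildid/<BUILDID>/debuginfo`. The sketch below only builds the candidate URLs and performs no network access. The function name is hypothetical and does not reflect how wholesym implements this internally.

```rust
/// Builds the candidate debuginfo URLs for a build ID from a
/// DEBUGINFOD_URLS-style value (whitespace-separated server base URLs).
/// (Hypothetical helper for illustration, not a wholesym function.)
fn debuginfod_debuginfo_urls(debuginfod_urls: &str, build_id_hex: &str) -> Vec<String> {
    debuginfod_urls
        .split_whitespace()
        .map(|base| {
            // Avoid a double slash if the base URL already ends with '/'.
            format!("{}/buildid/{}/debuginfo", base.trim_end_matches('/'), build_id_hex)
        })
        .collect()
}
```

In a real program the first argument would typically come from `std::env::var("DEBUGINFOD_URLS")`, and the servers would be tried in order until one of them has the file.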

§Performance

The most computationally intensive part of symbol resolution is parsing debug info. Debug info can be very large, for example 700 MB to 1500 MB for Firefox’s libxul. wholesym uses the addr2line and pdb-addr2line crates for parsing DWARF and PDB, respectively. It also has its own code for parsing the Breakpad sym format. All of these parsers have been optimized extensively to minimize the time it takes to get the first symbol result, and to cache things so that repeated lookups in the same functions are fast. This means:

  • No expensive preprocessing happens when the symbol file is first loaded.
  • Parsing is as lazy as possible: If possible, we only parse the bytes that are needed for the function which contains the looked-up address.
  • The first parse is as shallow and fast as possible, and just builds an index.
  • Strings (e.g. function names and file paths) are only looked up when needed.
  • Symbol lists, line records, and inlines are cached in sorted structures, and queried via binary search.
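
The last point can be sketched in a few lines: keep the symbols sorted by start address, then use a binary search to find the last symbol that starts at or before the looked-up address. This is a simplified illustration with hypothetical types, not wholesym's internal data structures. In particular it ignores symbol sizes, so it attributes any address to the nearest preceding symbol.

```rust
/// A symbol with its start address (hypothetical type for illustration).
#[derive(Debug)]
struct Symbol {
    address: u32,
    name: String,
}

/// A list of symbols kept sorted by start address.
struct SymbolTable {
    symbols: Vec<Symbol>,
}

impl SymbolTable {
    /// Sorts once up front so that every later lookup is a binary search.
    fn new(mut symbols: Vec<Symbol>) -> Self {
        symbols.sort_by_key(|s| s.address);
        Self { symbols }
    }

    /// Returns the last symbol whose start address is <= `address`, if any.
    fn lookup(&self, address: u32) -> Option<&Symbol> {
        // partition_point does a binary search for the first symbol that
        // starts after `address`; the one just before it is the match.
        let idx = self.symbols.partition_point(|s| s.address <= address);
        idx.checked_sub(1).map(|i| &self.symbols[i])
    }
}
```

With a table that contains a symbol starting at 0xd5d4, looking up 0xd6f4 returns that symbol, which mirrors the `Symbol: 0xd5d4 gobble_file.constprop.0` line in the example output.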

Enums§

  • An enum carrying an identifier for a binary. This stores the same information as a debugid::CodeId, but without projecting it down to a string.
  • The error type used in this crate.
  • Information to find an address within an external file, for debug info lookup.
  • Information to find an external file with debug information.
  • Contains address debug info (inlined functions, file names, line numbers) if available.
  • An address that can be looked up in a SymbolMap.
  • A special source file path for source files which are hosted online.
  • In case the loaded binary contains multiple architectures, this specifies how to resolve the ambiguity. This is only needed on macOS.