razgad is a Rust library for decoding, classifying, normalizing, and re-emitting mangled, decorated, and runtime symbol names across a wide spread of compiler, platform, and language ecosystems.
It also does something most demangling libraries do not: it exposes a reusable parser for already-demangled function names and tool-generated declaration strings. That matters in real reverse-engineering workflows, where you often need to consume both raw manglings and post-processed names from disassemblers, decompilers, crash logs, symbol servers, game engines, or metadata dumps.
The crate gives you two complementary layers:
- A cross-scheme Symbol model for mangled-name decoding, heuristic detection, and re-encoding.
- A function-name parser for extracting access modifiers, calling conventions, return types, templates, argument names, return-location annotations, and trailing qualifiers from display strings.
use ;
Why razgad exists
Most symbol tooling falls into one of three buckets:
- it understands one ABI deeply, but falls apart on mixed corpora
- it only produces a display string, throwing away structure you need for indexing or analysis
- it assumes the input is either fully mangled or fully clean, and has no answer for the messy middle
razgad exists to handle that messy middle.
It treats symbol handling as a normalization problem, not only a pretty-printing problem. Wrappers stay separate from inner grammars. Platform decorations stay orthogonal to semantic identity. Exact byte replay stays possible when the normalized model would otherwise be lossy. And already-demangled names can still be parsed into useful structure instead of being left as opaque strings.
In practice this makes the crate useful for reverse engineering, corpus analysis, binary indexing, signature databases, symbol cleanup, crash-symbol normalization, and any workflow that has to cross boundaries between compilers, languages, and tooling conventions.
Public API
The public surface now has two distinct halves.
Mangling / demangling API
| Function | Purpose |
|---|---|
| decode(scheme, input) | Decode with an explicit, caller-chosen scheme |
| heuristic_decode(input) | Detect likely scheme, attach confidence, then decode |
| encode(scheme, &symbol) | Re-emit a Symbol back into a scheme-specific spelling |
The reusable model is built around:
- Scheme - the scheme requested by the caller or selected heuristically
- Symbol - the normalized symbol record
- Name, Type, Signature - structured identity and callable type information
- PlatformDecorations - wrappers such as import prefixes, leading underscores, and ELF versions
- Confidence - certainty level for heuristic discovery
Function-name parsing API
| Function / type | Purpose |
|---|---|
| normalize_symbol_display() | Normalize Rust-style escape sequences and common display artifacts |
| parse_function_name() | Parse C++-style scoped declarations using :: |
| parse_function_name_with_separator() | Parse alternate scope conventions such as . |
| parse_template_node() | Parse a template tree from a qualified type or callable |
| parse_template_node_with_separator() | Same parser with custom scope separator |
| split_scope() / split_scope_with_separator() | Split qualified paths without breaking nested templates |
| split_argument_name() / split_argument_name_with_separator() | Separate type text from argument names |
| template_depth() | Measure nested template depth in a declaration |
| ParsedFunctionName, ParsedArgument, TemplateNode | Structured outputs for downstream analysis |
This parser layer is not decorative. It is now part of how the crate enriches Plain, dotted naming schemes, Swift displays, MSVC demangled outputs, function-pointer return styles, and receiver-like method displays.
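To make "split qualified paths without breaking nested templates" concrete, here is a minimal, self-contained sketch of the idea. It is not razgad's implementation: `split_scope_sketch` and `template_depth_sketch` are hypothetical names invented for illustration, and the sketch deliberately ignores operator names such as `operator<`, which a real parser has to special-case.

```rust
/// Split `path` on `sep`, ignoring separators nested inside `<...>`,
/// `(...)`, or `[...]`. Hypothetical helper, not part of razgad's API.
fn split_scope_sketch<'a>(path: &'a str, sep: &str) -> Vec<&'a str> {
    let bytes = path.as_bytes();
    let mut parts = Vec::new();
    let mut depth = 0usize; // current bracket nesting level
    let mut start = 0usize; // byte offset where the current segment begins
    let mut i = 0usize;
    while i < bytes.len() {
        match bytes[i] {
            b'<' | b'(' | b'[' => depth += 1,
            b'>' | b')' | b']' => depth = depth.saturating_sub(1),
            _ => {}
        }
        // Only a separator seen at nesting level zero ends a segment.
        if depth == 0 && !sep.is_empty() && bytes[i..].starts_with(sep.as_bytes()) {
            parts.push(&path[start..i]);
            i += sep.len();
            start = i;
            continue;
        }
        i += 1;
    }
    parts.push(&path[start..]);
    parts
}

/// Deepest `<...>` nesting level found in `decl`. Hypothetical helper.
fn template_depth_sketch(decl: &str) -> usize {
    let mut depth = 0usize;
    let mut deepest = 0usize;
    for byte in decl.bytes() {
        match byte {
            b'<' => {
                depth += 1;
                deepest = deepest.max(depth);
            }
            b'>' => depth = depth.saturating_sub(1),
            _ => {}
        }
    }
    deepest
}
```

With `::` as the separator, `std::map<std::string, int>::iterator` splits into three segments and the `::` inside the template arguments is left alone. With `.` as the separator, `main.(*T).Method` keeps the parenthesized receiver as a single segment.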
Function-name parser example
The parser is designed for already-readable declarations that still carry useful structure:
use ;
let parsed = parse_function_name
.unwrap;
assert_eq!;
assert_eq!;
assert_eq!;
assert_eq!;
assert_eq!;
assert_eq!;
assert_eq!;
assert_eq!;
It also supports non-C++ scope separators for ecosystems that prefer dotted names:
use parse_function_name_with_separator;
let parsed = parse_function_name_with_separator
.unwrap;
assert_eq!;
The parser handles function-pointer return styles such as void (__cdecl *demo::signal(int))(char const *) as well as pointer-to-member declarator forms such as int (demo::Widget::*demo::Factory::slot()), and it avoids mistaking Go receiver forms like main.(*T).Method for signatures.
The normalized symbol model
The core idea is a scheme-neutral Symbol tree:
Symbol
|- scheme
|- concrete_family
|- kind
|- path
|- signature
|- special
|- platform
`- verbatim
This split matters.
- scheme records the route the caller cares about: MachO, CoffPe, Elf, IntelNativeCpp, and so on.
- concrete_family records the inner grammar actually doing the work: for example, a MachO symbol may still be an Itanium C++ symbol under the wrapper.
- kind separates normal functions from methods, constructors, destructors, vtables, thunks, metadata, imports, module initializers, type encodings, closures, and runtime artifacts.
- platform keeps transport details out of semantic identity: leading underscores, import thunk prefixes, inner scheme hints, ELF versions.
- verbatim preserves byte-for-byte replay safety for decoded inputs.
This gives you a model that is useful for programmatic analysis while still remaining practical for exact round-tripping.
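As a rough illustration of why the split matters, the sketch below models a reduced version of the tree in plain Rust. Every type and field type here is an assumption made for the example: the real Symbol, Scheme, and PlatformDecorations definitions live in the crate and will differ, and the signature and special fields are omitted for brevity.

```rust
// Hypothetical, simplified stand-ins for the crate's model types.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum SchemeSketch {
    ItaniumCpp,
    MachO,
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum KindSketch {
    Function,
    Method,
}

#[allow(dead_code)]
#[derive(Debug, Clone, Default, PartialEq)]
struct PlatformSketch {
    leading_underscore: bool,
    import_thunk: bool,
    elf_version: Option<String>,
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct SymbolSketch {
    scheme: SchemeSketch,          // route the caller asked for
    concrete_family: SchemeSketch, // inner grammar that did the work
    kind: KindSketch,
    path: Vec<String>,
    platform: PlatformSketch,      // transport details only
    verbatim: Option<String>,      // original spelling, for exact replay
}

/// Compare two symbols by semantic identity, ignoring the wrapper route,
/// platform decorations, and the preserved original text.
fn same_identity(a: &SymbolSketch, b: &SymbolSketch) -> bool {
    a.concrete_family == b.concrete_family && a.kind == b.kind && a.path == b.path
}
```

Under these assumptions, `_ZN4demo3runEv` decoded as an Itanium symbol and `__ZN4demo3runEv` decoded through a Mach-O wrapper are different records, but `same_identity` treats them as the same function because only the wrapper fields differ.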
Supported schemes
Scheme::all_public() currently exposes 50 public schemes.
| Group | Schemes |
|---|---|
| Core ABIs and mainstream languages | ItaniumCpp, MicrosoftCpp, Dlang, RustLegacy, RustV0, Swift, ObjectiveC, Jni |
| Legacy and vendor C++ families | BorlandCpp, WatcomCpp, DigitalMars, IbmXlCppLegacy, HpAccCppLegacy, SunStudioCppLegacy, CfrontCpp, ArmCppLegacy, GreenHillsCpp, IntelNativeCpp, EdgCppLegacy, CrayCpp, SgiMipsproCpp, MetrowerksCpp, Os400Cpp, Vms, CarbonCpp |
| Calling conventions and binary wrappers | Cdecl, Stdcall, Fastcall, Vectorcall, MachO, CoffPe, Elf |
| Naming and runtime ecosystems | Pascal, FortranExternal, DotNet, Haskell, AdaGnat, GfortranModule, Ocaml, Go, Zig, Nim, PascalDelphi, Modula, Crystal, Vlang, WebAssembly, Plain, UnityIl2Cpp, MonoManaged |
Some important subtleties:
- IntelNativeCpp is treated as a target-dependent family that can resolve to MSVC or Itanium.
- MachO, CoffPe, and Elf are wrappers, not standalone inner grammars.
- Several historical schemes are intentionally modeled as stable naming conventions rather than full ABI-rich type systems.
- Dotted naming families such as Ada, Modula, Pascal/Delphi, Go receiver forms, and parts of Swift / managed-name handling now benefit from the shared declaration parser instead of ad hoc path splitting alone.
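The "wrappers, not inner grammars" point can be sketched as a peeling step that runs before any inner decoder sees the name. The code below is an illustrative assumption, not the crate's wrapper handling: `peel_wrappers` and `Decorations` are hypothetical names, and only the COFF `__imp_` prefix and the ELF `@@VERSION` suffix are covered.

```rust
/// Decorations removed from a symbol before inner-grammar decoding.
/// Hypothetical type for this sketch only.
#[derive(Debug, Default, PartialEq)]
struct Decorations {
    import_thunk: bool,
    elf_version: Option<String>,
}

/// Strip wrapper-level decorations and return the inner name alongside
/// a record of what was removed.
fn peel_wrappers(input: &str) -> (&str, Decorations) {
    let mut decorations = Decorations::default();
    let mut inner = input;

    // COFF import thunks prefix the real symbol with `__imp_`.
    if let Some(rest) = inner.strip_prefix("__imp_") {
        decorations.import_thunk = true;
        inner = rest;
    }

    // ELF default versions look like `name@@VERSION`. MSVC names start
    // with `?` and use `@@` internally, so they are left untouched.
    if !inner.starts_with('?') {
        if let Some((name, version)) = inner.split_once("@@") {
            if !name.is_empty() && !version.is_empty() {
                decorations.elf_version = Some(version.to_string());
                inner = name;
            }
        }
    }

    (inner, decorations)
}
```

The inner string that comes back is what an Itanium or MSVC decoder would then parse, while the decorations stay in a separate record instead of leaking into the symbol's path.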
Round-tripping philosophy
razgad is deliberately normalized first and lossless by escape hatch.
When you decode a symbol, the original text is preserved in Symbol::verbatim. That means encode() can replay the exact original bytes even when the normalized model does not fully describe every vendor-specific token.
This is a deliberate tradeoff:
- callers get a usable cross-scheme AST
- obscure vendor spellings still survive round-trips intact
- canonical fresh construction stays honest instead of faking precision it does not really have
Fresh canonical encoding is currently implemented for a focused subset, with especially solid coverage for:
- Itanium-family construction
- Windows C decoration families (cdecl, stdcall, fastcall, vectorcall)
- D, JNI, Ada GNAT, gfortran modules, Fortran externals, and V names
- platform wrappers such as Mach-O, COFF import thunks, and ELF versioned symbols
- plain, Unity IL2CPP, and Mono-style managed naming forms
The canonical encoder surface is intentionally narrower than the decoder surface. The crate is conservative about what it claims to synthesize from structured data.
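The "normalized first, lossless by escape hatch" policy can be sketched in a few lines. This is a simplified model under stated assumptions, not the crate's encode(): `DecodedSketch` and `encode_sketch` are hypothetical, and the "canonical" branch only handles bare identifiers so that the refusal path is visible.

```rust
/// Hypothetical reduced symbol: a display name plus the preserved input.
struct DecodedSketch {
    name: String,
    verbatim: Option<String>,
}

/// Replay the original bytes when they were preserved; otherwise build a
/// fresh spelling only when the sketch knows how to do so honestly.
fn encode_sketch(symbol: &DecodedSketch) -> Result<String, String> {
    // Escape hatch: a decoded symbol round-trips byte-for-byte.
    if let Some(original) = &symbol.verbatim {
        return Ok(original.clone());
    }

    // Fresh construction: this sketch only claims bare identifiers.
    let is_plain_identifier = !symbol.name.is_empty()
        && symbol
            .name
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '_');
    if is_plain_identifier {
        Ok(symbol.name.clone())
    } else {
        // Refuse instead of guessing at a mangling it cannot justify.
        Err(format!("no canonical encoder for {:?}", symbol.name))
    }
}
```

A symbol that carries its original text always round-trips, while a hand-built symbol with a scoped name and no preserved text is rejected instead of being given an invented mangling.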
Detection strategy
heuristic_decode() runs ordered sniffers and returns both the chosen Scheme and a Confidence value.
Examples of strong signals:
- _R / __R -> Rust v0
- _ZN...17h...E / __ZN...17h...E -> Rust legacy
- _Z..., __Z..., _ZTV... -> Itanium-family
- ?name@@... -> MSVC-family
- Java_... -> JNI
- _OBJC_..., -[...], +[...], v@: -> Objective-C forms
- __imp_... -> COFF import thunk wrapper
- ...@@GLIBCXX_... -> ELF versioned wrapper
- Unity IL2CPP and Mono-managed forms are recognized before generic naming fallbacks
For genuinely ambiguous forms the API returns Medium or Low confidence rather than pretending certainty.
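A toy version of ordered sniffing shows why the order matters: Rust legacy symbols also start with `_ZN`, so that check has to run before the generic Itanium one. The `Guess` and `Certainty` enums, the `sniff` function, and the exact confidence assignments are assumptions for this sketch. The crate's own sniffers cover far more schemes and may weigh signals differently.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Guess {
    RustV0,
    RustLegacy,
    Itanium,
    Msvc,
    Jni,
    Plain,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Certainty {
    High,
    Medium,
    Low,
}

/// True when the name has the shape `_ZN...17h<16 hex digits>E`.
fn looks_like_rust_legacy(input: &str) -> bool {
    let body = input
        .strip_prefix("__ZN")
        .or_else(|| input.strip_prefix("_ZN"));
    let body = match body.and_then(|b| b.strip_suffix('E')) {
        Some(b) => b.as_bytes(),
        None => return false,
    };
    if body.len() < 19 {
        return false;
    }
    // The hash component is `17h` followed by 16 hex digits.
    let tail = &body[body.len() - 19..];
    tail.starts_with(b"17h") && tail[3..].iter().all(|b| b.is_ascii_hexdigit())
}

/// Run the checks from most specific to least specific.
fn sniff(input: &str) -> (Guess, Certainty) {
    if input.starts_with("_R") || input.starts_with("__R") {
        return (Guess::RustV0, Certainty::High);
    }
    if looks_like_rust_legacy(input) {
        return (Guess::RustLegacy, Certainty::High);
    }
    if input.starts_with("_Z") || input.starts_with("__Z") {
        return (Guess::Itanium, Certainty::High);
    }
    if input.starts_with('?') {
        // Assumption of this sketch: a `?` without `@@` is a weaker signal.
        let certainty = if input.contains("@@") {
            Certainty::High
        } else {
            Certainty::Medium
        };
        return (Guess::Msvc, certainty);
    }
    if input.starts_with("Java_") {
        return (Guess::Jni, Certainty::High);
    }
    (Guess::Plain, Certainty::Low)
}
```

Returning the confidence alongside the guess lets callers decide whether to accept a Low result or fall back to an explicit scheme.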
Architecture at a glance
Internally the crate is organized around a few clear layers:
- src/schemes/ contains per-family decoders plus wrapper handling for Mach-O / COFF / ELF.
- src/heuristics.rs handles scheme discovery and confidence assignment.
- src/model.rs defines the shared, scheme-neutral symbol representation.
- src/codec.rs handles canonical encoding and exact verbatim replay.
- src/function_names.rs parses already-readable declarations, templates, arguments, calling conventions, and return-location annotations.
- src/text.rs is the bridge layer that projects demangled or parsed text back into Name, Type, Signature, and Symbol structures.
One of the more important recent architectural shifts is that the generic function-name parser is no longer just a side utility. It now participates directly in:
- Plain symbol decoding
- dotted naming families in src/schemes/naming.rs
- MSVC demangled-display enrichment
- function-pointer and pointer-to-member declaration projection through the shared parser path
That keeps the crate from having four separate half-parsers for the same declaration features.
For high-confidence families, razgad leans on battle-tested ecosystem crates where that makes sense:
- cpp_demangle for Itanium-family parsing
- msvc-demangler for Microsoft C++
- rustc-demangle for Rust forms
- an in-tree pure-Rust Swift demangler derived from Swift's demangling sources
The important part is what happens after that: vendor-specific outputs are normalized into one common model instead of being left as unrelated display strings.
Validation
The test suite is deliberately behavior-first.
- tests/exhaustive.rs contains 102 fixture cases spanning every public scheme in Scheme::all_public()
- fixture tests assert explicit decode, heuristic detection, and decode-then-encode round-trips
- tests/function_names.rs exercises declaration parsing, nested templates, Rust display normalization, alternate scope separators, function-pointer and pointer-to-member declarators, Go receiver displays, and plain-scheme enrichment
- tests/model.rs checks that templates, wrappers, metadata, runtime artifacts, dotted naming schemes, Go receiver methods, Objective-C runtime wrappers, Swift, and MSVC all project correctly into the same normalized tree
- cargo test currently passes with 33 total tests in this repository
Run the suite with cargo test.
There is also a corpus utility for bulk validation against large symbol lists:
That tool reports coverage, scheme distribution, sample failures, and can emit a TSV dump of undecoded symbols for follow-up work.
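The kind of report described here can be approximated with a small aggregation loop. The sketch below is not the corpus utility itself: `CorpusReport`, `summarize`, the classifier callback, and the two-column TSV layout (input index, raw symbol) are all assumptions chosen for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical summary of one pass over a symbol list.
struct CorpusReport {
    total: usize,
    decoded: usize,
    by_scheme: BTreeMap<String, usize>,
    undecoded_tsv: String,
}

/// Classify each symbol with `classify`, which returns the scheme name on
/// success and `None` on failure, then aggregate the results.
fn summarize<'a, I, F>(symbols: I, classify: F) -> CorpusReport
where
    I: IntoIterator<Item = &'a str>,
    F: Fn(&str) -> Option<String>,
{
    let mut report = CorpusReport {
        total: 0,
        decoded: 0,
        by_scheme: BTreeMap::new(),
        undecoded_tsv: String::new(),
    };
    for (index, symbol) in symbols.into_iter().enumerate() {
        report.total += 1;
        match classify(symbol) {
            Some(scheme) => {
                report.decoded += 1;
                *report.by_scheme.entry(scheme).or_insert(0) += 1;
            }
            // One TSV row per failure: input index, then the raw symbol.
            None => report
                .undecoded_tsv
                .push_str(&format!("{}\t{}\n", index, symbol)),
        }
    }
    report
}
```

Coverage is then `decoded` divided by `total`, and the TSV string can be written to a file for follow-up work on the symbols that failed.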
Building and using it
Build
Test
Use as a dependency
[dependencies]
razgad = { path = "../razgad" }
Then:
use ;
let symbol = decode?;
assert_eq!;
let detected = heuristic_decode?;
assert_eq!;
let parsed = parse_function_name
.unwrap;
assert_eq!;
Current shape of the project
Today, razgad is already good at a very specific kind of work:
- decoding a broad range of mangled and decorated symbol forms through one API
- preserving wrapper semantics instead of flattening everything into one string
- giving callers a normalized symbol representation they can inspect and transform
- parsing human-readable declaration strings into structured parts
- round-tripping decoded inputs safely
- expanding coverage through fixture-driven and corpus-driven validation
It is not pretending to be a perfect canonical encoder for every ABI on day one. The implementation is intentionally incremental: broad decode coverage first, faithful normalization second, shared declaration parsing across schemes, and canonical fresh encoding where it can be done honestly.
That bias is what makes the crate useful in real reverse-engineering and binary-analysis workflows instead of only in toy examples.