droidsaw
droidsaw takes a DEX file or Hermes bundle apart and puts it back together byte-for-byte — 5,767 DEX files from F-Droid recovered bit-identically under preservation mode, Hermes bytecode round-trip verified on v84, v96, v98, and v99. That test fails loudly when the format model has a hole: a string-table offset wrong by one, an alignment requirement missed, a padding byte forgotten — the re-emitted bytes diverge and the test says where. Most Android RE tooling works per-layer; droidsaw traces a JS value through the React Native bridge into Java as a single taint path. Hand it an APK and it unpacks the container, decompiles every layer, and pipes the output through Semgrep and TruffleHog in the same pass. CLI and MCP server share one command surface. Pure Rust, BSD-3-Clause.
What it finds
Patterns surfaced across a corpus of production apps spanning dating, social, fintech, banking, and health:
- Key material in the JS heap: private-key operations running in Hermes with no native boundary, no memory zeroing. Visible because droidsaw reads the Hermes layer.
- Signing-chain weaknesses at the cryptographic level: ROCA fingerprint (CVE-2017-15361), Fermat-factorable close primes (returns (p, q)), batch-GCD shared-prime recovery across a corpus (Bernstein quasilinear), Wiener-regime exponent fingerprint (e > N^0.75; full recovery).
- Data crossing the React Native bridge into sinks it shouldn't reach: JS network input landing in Runtime.exec(), analytics SDKs receiving health or financial signals. Reported as a single taint path from JS source to Java sink.
- Exported components without permission guards: activities and services reachable from any app, leaking auth tokens, OAuth codes, refresh tokens.
- Staging infrastructure surviving release: internal endpoints, debug flags, hardcoded credentials across the manifest, DEX string pool, and Hermes global string table.
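To make the "Fermat-factorable close primes" item concrete, here is a minimal sketch of Fermat's method on a toy 64-bit modulus. It is not droidsaw's implementation: the function names and the `max_steps` bound are assumptions for illustration, and real RSA moduli need arbitrary-precision integers.

```rust
/// Floor of the square root, by Newton's method. Inputs here are at most
/// u64-sized values widened to u128, so `x + 1` cannot overflow.
fn isqrt(n: u128) -> u128 {
    if n < 2 {
        return n;
    }
    let mut x = n;
    let mut y = (x + 1) / 2;
    while y < x {
        x = y;
        y = (x + n / x) / 2;
    }
    x
}

/// Fermat's method: search for `a` such that `a^2 - n` is a perfect square
/// `b^2`, which gives n = (a - b)(a + b). When the two primes are close,
/// `a` is found within a few steps above sqrt(n).
/// Returns None for even n, for primes, or when `max_steps` is exhausted.
fn fermat_factor(n: u64, max_steps: u64) -> Option<(u64, u64)> {
    if n < 3 || n % 2 == 0 {
        return None;
    }
    let n = n as u128;
    let mut a = isqrt(n);
    if a * a < n {
        a += 1; // start at ceil(sqrt(n))
    }
    for _ in 0..max_steps {
        let b2 = a * a - n;
        let b = isqrt(b2);
        if b * b == b2 {
            let (p, q) = (a - b, a + b);
            if p == 1 {
                return None; // only the trivial factorization exists
            }
            return Some((p as u64, q as u64));
        }
        a += 1;
    }
    None
}
```

With the close pair 10007 and 10009, the product 100160063 is factored on the first step, because ceil(sqrt(n)) is already 10008.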
Bidirectional xref is for understanding what an app does. Trace a suspicious string to every function that touches it, then walk the callers to map the full dispatch surface. A worked example on a stalkerware sample: Cerberus Anti-theft: Stalkerware RE.
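The string-to-functions-to-callers walk described above can be pictured as two indexes and a breadth-first search. This is a sketch of the idea only; the `Xrefs` type, its fields, and `dispatch_surface` are hypothetical names, not droidsaw's data model.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Hypothetical bidirectional index: strings to functions, and call edges
/// stored in both directions so either side can be queried directly.
#[derive(Default)]
struct Xrefs {
    string_to_funcs: BTreeMap<String, BTreeSet<String>>,
    callers: BTreeMap<String, BTreeSet<String>>, // callee -> its callers
    callees: BTreeMap<String, BTreeSet<String>>, // caller -> its callees
}

impl Xrefs {
    fn add_string_ref(&mut self, func: &str, s: &str) {
        self.string_to_funcs
            .entry(s.to_string())
            .or_default()
            .insert(func.to_string());
    }

    fn add_call(&mut self, caller: &str, callee: &str) {
        self.callers
            .entry(callee.to_string())
            .or_default()
            .insert(caller.to_string());
        self.callees
            .entry(caller.to_string())
            .or_default()
            .insert(callee.to_string());
    }

    /// Direct callees of `func` (the forward direction).
    fn callees_of(&self, func: &str) -> Vec<String> {
        self.callees.get(func).into_iter().flatten().cloned().collect()
    }

    /// Every function that touches `s`, plus all of their transitive callers.
    fn dispatch_surface(&self, s: &str) -> BTreeSet<String> {
        let mut seen = BTreeSet::new();
        let mut queue: VecDeque<String> = self
            .string_to_funcs
            .get(s)
            .into_iter()
            .flatten()
            .cloned()
            .collect();
        while let Some(f) = queue.pop_front() {
            if !seen.insert(f.clone()) {
                continue; // already visited; also guards against call cycles
            }
            if let Some(cs) = self.callers.get(&f) {
                queue.extend(cs.iter().cloned());
            }
        }
        seen
    }
}
```

Keeping the reverse edge set alongside the forward one is what turns "who references this key?" into a lookup rather than a scan.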
Commands
| Command | Output |
|---|---|
| audit | Security audit (--mode=<basic\|full\|semgrep\|trufflehog>) |
| decompile | Decompile DEX classes to Java or Hermes functions to JavaScript |
| taint | Cross-layer taint paths (JS → bridge → Java → sink) |
| xrefs | Bidirectional cross-reference: strings to functions and callers to callees. "Who references this key?" is a query, not a grep. |
| manifest | AndroidManifest.xml analysis |
| signing | v1 / v2 / v3 / v4 signing block analysis |
| apk-info | Container report: manifest + signing + ELF + entropy |
| info | Bytecode layer summary (version, function / string / class counts) |
| hbc | Hermes subcommands (info, functions, strings, decompile, disassemble) |
| dex | DEX subcommands (classes, methods, strings) |
Other commands (elf, entries, resources, frida, sbom, export, yara, diff, call-graph, native-modules, module-list, npm-packages, scan-corpus, corpus-ingest, trufflehog, semgrep) — see droidsaw --help.
Cross-layer taint
audit --mode=basic and --mode=full run three taint passes. Results land in the taint_flows SQLite table.
- HBC pass. User-controlled inputs seeded and propagated through Hermes functions. Detects DirectEval (CWE-95) and tainted args to NativeModuleCall* ops. Records which arg positions carried taint.
- DEX pass. Follows invoke-direct, invoke-static, and monomorphic invoke-virtual/invoke-interface across DEX boundaries via a cross-DEX class hierarchy index (CHA). Interprocedural depth: 4.
- Bridge pass. @ReactMethod params seeded as taint sources, then run through the DEX pass. Seeding is restricted to the arg positions the HBC pass found tainted: only the JS-side values that actually carry taint become Java-side seeds. Falls back to all params when no HBC bundle is present.
15 taint sources × 15 sinks defined in droidsaw-common, including bridge edges that span the layers.
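The bridge pass's seeding rule (tainted positions only, all params as a fallback) can be sketched as a small pure function. The name `bridge_seeds` and the use of `Option` to model "no HBC bundle present" are assumptions for illustration, not droidsaw's API.

```rust
use std::collections::BTreeSet;

/// Choose which parameter positions of a bridge method become Java-side
/// taint seeds.
/// - `Some(positions)`: the HBC pass ran; seed only the positions it found
///   tainted (positions outside the method's arity are ignored).
/// - `None`: no HBC bundle was available; fall back to seeding every param.
fn bridge_seeds(param_count: usize, hbc_tainted: Option<&BTreeSet<usize>>) -> BTreeSet<usize> {
    match hbc_tainted {
        Some(positions) => positions
            .iter()
            .copied()
            .filter(|&p| p < param_count)
            .collect(),
        None => (0..param_count).collect(),
    }
}
```

Note the difference between `Some` of an empty set (the HBC pass ran and found nothing, so nothing is seeded) and `None` (no HBC evidence at all, so everything is seeded).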
Per-layer scope
| Layer | Status |
|---|---|
| APK container | Signing v1–v4, crypto (ROCA / Fermat / Wiener / batch-GCD), YARA (credential / packer / crypto / anti-analysis), SBOM, ELF metadata. |
| DEX → Java | Decompiler. 100% byte-identical roundtrip on F-Droid corpus under preservation mode. |
| Hermes → JS | Decompiler. v40–v100. Byte-exact reconstruction. OXC round-trip validated on decompile output. |
| Native ELF | Hardening flags, JNI exports, relocation counts. No disassembly. |
| Dart AOT | Deferred. |
| IL2CPP | Deferred. |
MCP server
droidsaw exposes its full command surface over MCP. An agent can load an APK, run a cross-layer audit, query taint findings, decompile a specific class, and diff two Hermes bundles — all in one session, one schema, without spawning subprocesses or reassembling output.
14 tools: load, info, manifest, signing, audit, query, xrefs, decompile, taint, frida, triage, investigate, strings, diff. Large outputs stream to a tempfile rather than the context window.
HTTP transport: streamable-http binds loopback by default. Non-loopback binds emit a stderr warning. v1 ships without authentication — for non-loopback exposure, terminate at a reverse proxy that adds auth and TLS.
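The loopback rule is straightforward to express with the standard library. A minimal sketch, assuming the bind address arrives as a `host:port` string; the function name and the warning wording are illustrative, not droidsaw's actual output.

```rust
use std::net::{AddrParseError, SocketAddr};

/// Return a warning message when `addr` is not a loopback address, or None
/// when it is. The caller would print the message to stderr before binding.
fn bind_warning(addr: &str) -> Result<Option<String>, AddrParseError> {
    let sock: SocketAddr = addr.parse()?;
    if sock.ip().is_loopback() {
        Ok(None)
    } else {
        Ok(Some(format!(
            "warning: binding {sock} exposes an unauthenticated server beyond loopback"
        )))
    }
}
```

`IpAddr::is_loopback` covers both 127.0.0.0/8 and `::1`, so IPv4 and IPv6 binds are treated alike.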
Integrations
| Tool | Surface |
|---|---|
| SQLite | export / audit / corpus-ingest. Every layer queryable as relational tables. findings + taint_flows schemas at rev 4. query MCP tool runs read-only SELECT against the audit DB. |
| Semgrep | audit --mode=semgrep / scan semgrep --persist. Decompiled DEX source extracted to disk and fed to Semgrep. Custom rules via --rules <path> or DROIDSAW_SEMGREP_RULES. |
| TruffleHog | audit --mode=trufflehog / trufflehog subcommand. Extracted strings from every layer piped to TruffleHog. Verified hits land in the credentials view. |
| YARA-X | yara / bundled in audit. YARA-X (Rust port). Bundled rule packs; custom rules accepted. Provenance-aware suppression. |
| STIX 2.1 | audit --stix-feed <path>. Loads any STIX 2.1 bundle (file path; no network I/O). IOC matches against parsed APK content. |
| Frida | frida subcommand. Auto-generated hook stubs against functions that touch matched strings. |
Architecture
Both decompilers follow the same pipeline. The middle stages live in droidsaw-common (generic over an I: Instr trait); the bundle crate supplies its own Insn type and language-specific sugar.
| Stage | Module | Input → Output |
|---|---|---|
| decode | <bundle>/decode.rs, <bundle>/parser/ | &[u8] → Vec<Insn> |
| CFG | <bundle>/cfg.rs, oracle in common/graph/ | Vec<Insn> → basic blocks + edges |
| dominators | common/graph/dominators.rs | basic blocks → idom map |
| SSA (Braun) | common/ssa/, <bundle>/ssa.rs | basic blocks → SsaFunction |
| Expr IR | <bundle>/expr.rs (Hermes), common/region/ | SsaFunction → expression tree |
| structure | common/region/, <bundle>/structure.rs | expression tree → RegionTree |
| sugar | <bundle>/sugar.rs, hermes/decompile/ | RegionTree → RegionTree |
| emit | dex/emit_dex.rs, hermes/emit.rs | RegionTree → source bytes |
| validate | tests/byte_identity_smoke.rs, tests/hbc_corpus_roundtrip.rs | source bytes ≡ input (round-trip) |
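For readers unfamiliar with the dominators stage, the sketch below shows one common way to turn a CFG into an idom map: the iterative Cooper-Harvey-Kennedy scheme over reverse postorder. Which algorithm common/graph/dominators.rs actually uses is not stated here; the function names and the adjacency-list input are assumptions.

```rust
use std::collections::BTreeMap;

/// Walk two blocks up the partial idom tree until they meet.
/// `order` holds postorder numbers, so a smaller number is deeper in the tree.
fn intersect(idom: &[Option<usize>], order: &[usize], mut a: usize, mut b: usize) -> usize {
    while a != b {
        while order[a] < order[b] {
            a = idom[a].expect("processed block has an idom");
        }
        while order[b] < order[a] {
            b = idom[b].expect("processed block has an idom");
        }
    }
    a
}

/// Immediate dominator of every block reachable from `entry` (entry excluded).
/// `succs[b]` lists the successors of block `b`; all indices must be in range.
fn idom_map(succs: &[Vec<usize>], entry: usize) -> BTreeMap<usize, usize> {
    let n = succs.len();
    // Postorder via DFS with an explicit stack of (block, next successor index).
    let mut post = Vec::with_capacity(n);
    let mut seen = vec![false; n];
    let mut stack = vec![(entry, 0usize)];
    seen[entry] = true;
    while let Some((b, i)) = stack.pop() {
        if i < succs[b].len() {
            stack.push((b, i + 1));
            let s = succs[b][i];
            if !seen[s] {
                seen[s] = true;
                stack.push((s, 0));
            }
        } else {
            post.push(b);
        }
    }
    let mut order = vec![usize::MAX; n];
    for (i, &b) in post.iter().enumerate() {
        order[b] = i;
    }
    // Predecessor lists, built from reachable blocks only.
    let mut preds: Vec<Vec<usize>> = vec![Vec::new(); n];
    for (b, ss) in succs.iter().enumerate() {
        if order[b] != usize::MAX {
            for &s in ss {
                preds[s].push(b);
            }
        }
    }
    let mut idom: Vec<Option<usize>> = vec![None; n];
    idom[entry] = Some(entry);
    let mut changed = true;
    while changed {
        changed = false;
        // Visit blocks in reverse postorder, skipping the entry.
        for &b in post.iter().rev().filter(|&&b| b != entry) {
            let mut new_idom = None;
            for &p in preds[b].iter().filter(|&&p| idom[p].is_some()) {
                new_idom = Some(match new_idom {
                    None => p,
                    Some(cur) => intersect(&idom, &order, p, cur),
                });
            }
            if new_idom != idom[b] {
                idom[b] = new_idom;
                changed = true;
            }
        }
    }
    idom.iter()
        .enumerate()
        .filter(|&(b, _)| b != entry)
        .filter_map(|(b, &d)| d.map(|d| (b, d)))
        .collect()
}
```

On a diamond CFG (0 → 1, 0 → 2, 1 → 3, 2 → 3), the join block 3 is dominated immediately by 0, since neither branch is on every path to it.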
Deterministic IR — BTreeMap throughout, so output is stable across runs. Typed opcode enum. Cross-validated against reference disassemblers — DEX against dexdump, Hermes against hbcdump.
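The BTreeMap point comes down to iteration order: a `BTreeMap` iterates in key order regardless of insertion order, whereas `HashMap` order can vary between runs. A small illustration with a hypothetical `render` helper, unrelated to droidsaw's actual emitters:

```rust
use std::collections::BTreeMap;

/// Serialize a symbol table. Because BTreeMap iterates in sorted key order,
/// the output depends only on the contents, never on insertion order.
fn render(symbols: &BTreeMap<&str, u32>) -> String {
    symbols
        .iter()
        .map(|(name, id)| format!("{name}={id}"))
        .collect::<Vec<_>>()
        .join(",")
}
```

Two maps holding the same entries render to the same string even when they were populated in opposite orders.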
CrossLayerContext is built once per invocation from a single path:
Apk::parse(path) → apk
apk.hbc → HbcOwned::parse(data) → hbc (self_cell, no Box::leak)
apk.dex → DexFile::parse(data) → dex[]
HbcOwned holds the byte buffer and parsed view together; MCP sessions don't accumulate leaked buffers across loads.
Correctness
Five gates. Each catches what the layer above it can't.
1. Round-trip disassembly
The strongest claim about a format parser is that it understands every byte. One way to test that: parse the file, regenerate the bytes from what was parsed, and check that they are identical to what you started with. A wrong rule anywhere — a string-table offset off by one, a missed alignment requirement, a padding byte misclassified — produces a divergence the test catches precisely.
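The property can be stated in a few lines of Rust. The sketch below is generic over a parser and an emitter and reports the first byte offset at which the re-emitted output diverges; `round_trip` and `first_divergence` are illustrative names, not the functions in droidsaw's test suite.

```rust
/// First offset at which the two byte strings differ. If one is a strict
/// prefix of the other, the divergence is at the shorter length.
fn first_divergence(original: &[u8], reemitted: &[u8]) -> Option<usize> {
    let common = original.len().min(reemitted.len());
    (0..common)
        .find(|&i| original[i] != reemitted[i])
        .or_else(|| (original.len() != reemitted.len()).then_some(common))
}

/// Parse `input` into a model, emit the model back to bytes, and compare.
/// Ok(None) means the round trip was byte-identical; Ok(Some(offset)) points
/// at the first byte the format model failed to reproduce.
fn round_trip<M, E>(
    input: &[u8],
    parse: impl Fn(&[u8]) -> Result<M, E>,
    emit: impl Fn(&M) -> Vec<u8>,
) -> Result<Option<usize>, E> {
    let model = parse(input)?;
    Ok(first_divergence(input, &emit(&model)))
}
```

Reporting an offset rather than a plain pass/fail is what lets a failing run point at the specific table, alignment gap, or padding byte that the model got wrong.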
DEX: 100% byte-identical on the F-Droid corpus (5,767 DEX files across 3,782 apps) under preservation mode. On default emit, a 5.4% subset (309 files) differs from the input in 24 header bytes only. These are exclusively files whose legacy dx toolchain wrote non-canonical SHA-1 inputs; droidsaw recomputes correct checksums for them by default and preserves the original bytes verbatim under audit mode.
Hermes: byte-exact reconstruction on bytecode versions v84, v96, v98, v99 — header, global string table, function table, alignment, and metadata layout. Verified clean on public v96 corpus samples.
Verify locally:
2. Fixture ratchets
DEX: 68 in-repo Java + Kotlin + R8 sources. COMPILE_FAIL = 1. SEMANTIC_FAIL = 0.
Hermes: v96 fixture matrix. COMPILE_FAIL = 0. SEMANTIC_FAIL = 0.
UNRECOGNIZED_REGION ratchet pinned per-fixture in tests/unrecognized_ratchet.rs. Any new region a recognizer fails to handle is a build break.
The ratchet only decreases. A fixture flip blocks merge.
3. Adversarial fuzz
libFuzzer targets: fuzz_parser, fuzz_opcode_decode, fuzz_cfg, fuzz_ssa, fuzz_emit_roundtrip (DEX), fuzz_emit_roundtrip_hbc (Hermes), fuzz_protector_recognizer, fuzz_enum_cross_class.
The parser and decoder targets for both DEX and Hermes ran for extended campaigns with zero panics, zero artifacts.
fuzz_emit_roundtrip{_hbc} runs the round-trip property under libFuzzer instrumentation on the full input space, not just the fixture set.
4. Cross-tool differential
DEX vs dexdump (Android SDK's DEX disassembler), used as a code-unit coverage oracle. Every class descriptor and every method (class, name, proto) triple dexdump -d enumerates must also appear in droidsaw-dex output. A missed class or method is a build break.
Hermes vs hbcdump (Meta's official disassembler):
- Parse-side: header, global string table, function table compared byte-for-byte.
- Instruction-level: 12,000 sampled (opname, operand_count) tuples compared across functions. Zero opcode disagreements.
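The dexdump coverage check is a one-directional set comparison: everything the oracle enumerates must appear in the candidate output, while extra candidate entries are not an error. A minimal sketch, assuming both tools' output has already been parsed into (class, name, proto) triples; the type alias and function name are hypothetical.

```rust
use std::collections::BTreeSet;

/// (class descriptor, method name, proto) as enumerated by either tool.
type MethodKey = (String, String, String);

/// Triples the oracle lists that the candidate missed. An empty result means
/// the coverage gate passes; entries only the candidate has are ignored.
fn missing_from_candidate(
    oracle: &BTreeSet<MethodKey>,
    candidate: &BTreeSet<MethodKey>,
) -> Vec<MethodKey> {
    oracle.difference(candidate).cloned().collect()
}
```

Because both sides are `BTreeSet`s, the list of misses comes back sorted, which keeps failure reports stable between runs.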
5. Formal proofs
Kani (bounded model checking). 96 harnesses across the workspace on statements with decidable input domains: MUTF-8 codec totality, signing-block padding gate, LEB128 read/write round-trip, bit-field bounds, Hermes function_get u64-overflow guard (against a u128 oracle), MANIFEST.MF base64 positional gate, recursion-depth caps, per-tag truncation guards, base64 capacity arithmetic.
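As an example of a statement with a decidable input domain, consider the LEB128 read/write round trip: for every u32, decoding the encoding must return the same value and consume exactly the bytes written. The sketch below is plain Rust with ordinary functions, not a Kani harness and not droidsaw's codec; the names are assumptions.

```rust
/// Encode `v` as unsigned LEB128: 7 payload bits per byte, low bits first,
/// with the high bit set on every byte except the last.
fn write_uleb128(mut v: u32, out: &mut Vec<u8>) {
    loop {
        let byte = (v & 0x7f) as u8;
        v >>= 7;
        if v == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Decode an unsigned LEB128 u32 from the front of `buf`.
/// Returns the value and the number of bytes consumed, or None when the
/// input is truncated, runs past five bytes, or overflows 32 bits.
fn read_uleb128(buf: &[u8]) -> Option<(u32, usize)> {
    let mut result: u32 = 0;
    for (i, &b) in buf.iter().enumerate().take(5) {
        let chunk = (b & 0x7f) as u32;
        // The fifth byte may only supply the top 4 bits of a u32.
        if i == 4 && chunk > 0x0f {
            return None;
        }
        result |= chunk << (7 * i);
        if b & 0x80 == 0 {
            return Some((result, i + 1));
        }
    }
    None
}
```

A bounded model checker can explore every u32 for this property, which is why it suits that tool rather than a proof over unbounded inputs.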
Lean 4. 20 proved theorems on statements that quantify over arbitrary input length or arbitrary CFG shape — out of Kani's bounded reach. AXML parser totality and acyclicity. Dominator antisymmetry, transitivity, and unique idom. Lattice monotonicity of the dataflow fixed point. Hermes try-catch RPO ordering. No sorry, no axiom. Each .lean file names the Rust function it backs via a RUST: comment; the correspondence is asserted in source and maintained by hand, not mechanically verified. See the droidsaw-lean workspace crate.
OXC round-trip. Every Hermes decompile output is parsed back by OXC (a Rust-native JavaScript parser and codegen). Output OXC rejects is annotated and returned, never silently dropped.
The compile-time floor on every non-test module:
Suppressions on the panic family require a written PROOF: obligation and code-review sign-off. panic = "abort" is set workspace-wide — a stale PROOF obligation terminates the process at runtime, not just at lint.
Inputs
APK, XAPK, .hbc, .dex. Hermes bundles and DEX files are extracted from APKs automatically.
Output
stdout is one JSON object, JSON array, or NDJSON stream. Nothing else.
stderr carries progress, each line prefixed with "droidsaw: ".
Exit code 0 on success, 2 on failure. Every failure produces a typed JSON error envelope on stdout:
Repeated runs on the same input produce bit-identical output.
License
BSD-3-Clause.