# petriage Feature Scope
## MVP (v0.1) — Must Have
These features are the minimum for a useful surface analysis tool. Every serious PE analysis tool provides these, and without them petriage would not be competitive.
1. **File Info**: file size, file type detection, file hashes (MD5, SHA1, SHA256, imphash) — **implemented**
- imphash: Mandiant-compatible import hash (DLL name normalization, `.dll`/`.ocx`/`.sys` extension removal, ordinal fallback) — **implemented**
2. **DOS Header**: e_magic, e_lfanew — **implemented** (minimal fields, not full 64-byte DOS header)
3. **PE Signature**: PE signature validation ("PE\0\0") — **implemented** (via goblin)
4. **COFF/File Header**: machine type, number of sections, timestamp, characteristics — **implemented**
5. **Optional Header**: magic (PE32/PE32+), entry point, image base, subsystem, DLL characteristics, data directory count — **implemented**
6. **Data Directories**: list all 16 data directories with RVA and size — **implemented**
7. **Section Headers**: name, virtual/raw size, virtual/raw address, characteristics, per-section entropy — **implemented**
8. **Import Table**: DLL names and imported function names (by name) — **implemented**
- Note: display of imports by ordinal defers to goblin's output
9. **Export Table**: exported function names, ordinals, RVAs — **implemented** (ordinal_base correction applied)
- Note: forwarded-export detection is **not yet implemented**
10. **Strings**: ASCII and UTF-16LE string extraction (configurable min length, default 4, max 100K strings) — **implemented**
11. **Overlay Detection**: detect data appended after the last section (offset and size) — **implemented**
12. **Output Formats**: human-readable table output (default) + JSON output (`--json`) + NDJSON output (`--ndjson`) + file output (`-o`) — **implemented**
- `--batch <dir>`: Batch-analyze all PE files in a directory — **implemented**
- `--ndjson`: Newline-delimited JSON (one JSON object per line, ideal for streaming/piping) — **implemented**
- `--fail-on <severity>`: Exit with code 3 if any anomaly meets or exceeds the given severity (critical/warning/info) — **implemented**
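The string extraction in item 10 can be illustrated with a minimal, std-only sketch. It is not petriage's actual code: the function names `ascii_strings`, `utf16le_strings`, and `flush` are hypothetical, and the sketch assumes that "UTF-16LE" means runs of printable ASCII code units (a printable byte followed by `0x00`), which is a common simplification in triage tools.

```rust
/// Push the current run into `out` if it is long enough and the cap is not hit.
/// (Hypothetical helper; the names in this sketch are not petriage's API.)
fn flush(run: &mut Vec<u8>, min_len: usize, max_strings: usize, out: &mut Vec<String>) {
    if run.len() >= min_len && out.len() < max_strings {
        // Every byte in `run` is printable ASCII, so the conversion is lossless.
        out.push(String::from_utf8_lossy(run).into_owned());
    }
    run.clear();
}

/// Extract runs of printable ASCII (0x20..=0x7e) of at least `min_len` bytes.
pub fn ascii_strings(data: &[u8], min_len: usize, max_strings: usize) -> Vec<String> {
    let mut out = Vec::new();
    let mut run: Vec<u8> = Vec::new();
    for &b in data {
        if (0x20..=0x7e).contains(&b) {
            run.push(b);
        } else {
            flush(&mut run, min_len, max_strings, &mut out);
        }
    }
    flush(&mut run, min_len, max_strings, &mut out);
    out
}

/// Extract UTF-16LE runs, assuming each character is a printable ASCII byte
/// followed by a zero byte.
pub fn utf16le_strings(data: &[u8], min_len: usize, max_strings: usize) -> Vec<String> {
    let mut out = Vec::new();
    let mut run: Vec<u8> = Vec::new();
    let mut i = 0;
    while i + 1 < data.len() {
        if (0x20..=0x7e).contains(&data[i]) && data[i + 1] == 0 {
            run.push(data[i]);
            i += 2;
        } else {
            flush(&mut run, min_len, max_strings, &mut out);
            i += 1; // resync: wide strings need not start on an even offset
        }
    }
    flush(&mut run, min_len, max_strings, &mut out);
    out
}
```

Calling `ascii_strings(bytes, 4, 100_000)` mirrors the documented defaults (minimum length 4, cap of 100K strings); the cap simply stops collecting once the limit is reached.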
## v0.2 — Important
These features differentiate a good tool from a basic one. PEStudio, PE-bear, and PPEE all provide these.
13. **Resource Directory**: resource tree parsing (types, names, languages, sizes), VS_VERSIONINFO parsing, manifest extraction, embedded icon extraction (RT_GROUP_ICON / RT_ICON → ICO reconstruction) and GUI display — **implemented**
14. **Rich Header**: parsing, XOR key extraction, compiler/linker tool entries (comp.id, product.id, count), Rich Hash (MD5, YARA/VirusTotal compatible), checksum verification (tampering detection), Product ID database (~70 entries, VS 6.0–2022), comp_id hex display — **implemented**
15. **TLS Directory**: TLS callback detection (critical for malware analysis, since callbacks run before the entry point), PE32/PE32+ support, callback VA listing — **implemented**
16. **Debug Directory**: PDB path, debug type (CodeView, COFF, etc.), GUID, age. PDB paths are always parsed (not gated by `--all`) and surfaced as OPSEC indicators in CLI (yellow highlight + dedicated section) and GUI (orange badge on Debug/File Info tabs) — **implemented**
17. **Suspicious API Indicators**: ~130 APIs across 12 categories with 3-level severity (high/medium/low), CLI color-coding, GUI filtering — **implemented**
18. **Anomaly Detection**: 21 heuristic rules with `rule_id`/`evidence`/`threshold` for JSON traceability. Covers packing (entropy, W^X, expansion ratio), security features (ASLR/DEP/CFG/SEH), timestamp anomalies, structural issues, suspicious API combos, OPSEC indicators (OPSEC-001: PDB path leakage), and Rich Header integrity (RICH-001: checksum tampering, RICH-002: missing Rich Header). All arithmetic uses checked/float operations to prevent overflow panics on crafted PEs — **implemented**
19. **PE Header Editor (GUI)**: CFF Explorer-style header editing in the Editor tab. Editable fields: COFF header (TimeDateStamp, Characteristics with flag checkboxes), Optional header (AddressOfEntryPoint, ImageBase PE32/PE32+, SectionAlignment, FileAlignment, SizeOfImage, SizeOfHeaders, CheckSum, Subsystem, DllCharacteristics with 7 individual flag checkboxes), Section headers (Name, VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData, Characteristics with flag checkboxes). Modified fields highlighted, pending edits tracked, Save As writes patched PE. Boundary-checked: truncated optional headers show error instead of editable fields; OOB edits skipped on save; no-op edits are not tracked — **implemented**
20. **Load Config Directory**: SEH handler table, CFG function table, guard flags — **not yet implemented**
21. **TUI Hex Viewer**: interactive terminal hex viewer with PE region navigation, alternate screen mode (`--features tui`, `-x`/`--view` flag) — **implemented**
22. **Authenticode**: digital signature presence detection, PKCS#7/CMS parsing, X.509 certificate chain extraction (subject, issuer, serial, validity, SHA-1 thumbprint), signer identification, expiry/self-signed/chain warnings (`-c`/`--authenticode`, GUI "Signing" tab) — **implemented**
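To make the anomaly-detection item more concrete, here is a small sketch of what a rule carrying `rule_id`/`evidence`/`threshold` might look like. Everything in it is illustrative: the `Section` and `Anomaly` structs, the `check_section` function, and the `SKETCH-*` rule IDs are hypothetical and do not correspond to petriage's real rules. Only the two section flags come from the PE/COFF specification. The bounds check uses `checked_add` in the spirit of the "checked arithmetic" note above.

```rust
/// Section flags from the PE/COFF specification.
const IMAGE_SCN_MEM_EXECUTE: u32 = 0x2000_0000;
const IMAGE_SCN_MEM_WRITE: u32 = 0x8000_0000;

/// Hypothetical, simplified view of a section header.
#[derive(Debug, Clone, PartialEq)]
pub struct Section {
    pub name: String,
    pub pointer_to_raw_data: u32,
    pub size_of_raw_data: u32,
    pub characteristics: u32,
    pub entropy: f64,
}

/// Finding shaped after the `rule_id`/`evidence`/`threshold` fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Anomaly {
    pub rule_id: &'static str, // placeholder IDs, not petriage's real ones
    pub evidence: String,
    pub threshold: Option<f64>,
}

pub fn check_section(s: &Section, file_len: u64, entropy_threshold: f64) -> Vec<Anomaly> {
    let mut found = Vec::new();

    // W^X: a section that is both writable and executable.
    let wx = IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_MEM_WRITE;
    if (s.characteristics & wx) == wx {
        found.push(Anomaly {
            rule_id: "SKETCH-WX",
            evidence: format!("{} is writable and executable", s.name),
            threshold: None,
        });
    }

    // High entropy often indicates packed or encrypted data.
    if s.entropy > entropy_threshold {
        found.push(Anomaly {
            rule_id: "SKETCH-ENTROPY",
            evidence: format!("{} entropy {:.2}", s.name, s.entropy),
            threshold: Some(entropy_threshold),
        });
    }

    // Checked arithmetic: a crafted header must not overflow or panic.
    match s.pointer_to_raw_data.checked_add(s.size_of_raw_data) {
        Some(end) if u64::from(end) <= file_len => {}
        _ => found.push(Anomaly {
            rule_id: "SKETCH-BOUNDS",
            evidence: format!("{} raw data extends past end of file", s.name),
            threshold: None,
        }),
    }

    found
}
```

The entropy threshold is passed in rather than hard-coded; values around 7.0 are a common heuristic, but the threshold petriage actually uses is not stated here.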
## v0.3 — Advanced
These features make petriage a comprehensive professional-grade tool.
23. **.NET Metadata**: CLR header, metadata tables, streams, managed entry point
24. **Bound/Delay Imports**: parsing and display
25. **Relocation Table**: parsing (base relocation entries)
26. **Entropy Histogram**: per-section and overall entropy with visual bar chart in terminal
27. **Packer Detection**: signature-based packer/compiler identification (PEiD-compatible signatures)
28. **Exception Directory**: exception handler table (x64)
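For the planned entropy histogram, one possible shape is sketched below: Shannon entropy over byte frequencies (0.0 to 8.0 bits per byte) plus a fixed-width text bar for terminal output. The function names `shannon_entropy` and `entropy_bar` and the `#`/`.` bar characters are assumptions made for illustration, not a description of the eventual implementation.

```rust
/// Shannon entropy of a byte slice, in bits per byte (0.0 ..= 8.0).
pub fn shannon_entropy(data: &[u8]) -> f64 {
    if data.is_empty() {
        return 0.0;
    }
    // Count how often each byte value occurs.
    let mut counts = [0u64; 256];
    for &b in data {
        counts[b as usize] += 1;
    }
    let len = data.len() as f64;
    let mut h = 0.0;
    for &c in counts.iter() {
        if c > 0 {
            let p = c as f64 / len;
            h -= p * p.log2();
        }
    }
    h
}

/// Render an entropy value as a text bar scaled to `width` cells.
pub fn entropy_bar(entropy: f64, width: usize) -> String {
    let filled = ((entropy / 8.0) * width as f64).round() as usize;
    let filled = filled.min(width); // clamp in case of out-of-range input
    format!(
        "[{}{}] {:.2}",
        "#".repeat(filled),
        ".".repeat(width - filled),
        entropy
    )
}
```

Running `entropy_bar(shannon_entropy(section_bytes), 32)` once per section, and once over the whole file, would give the per-section and overall views described in item 26.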
## Technical Approach
### Rust Crate Selection
| Crate | Purpose | Justification |
|-------|---------|---------------|
| **goblin** | Primary PE parser | Best-maintained Rust PE library; handles headers, sections, imports, exports; fuzz-tested against 100M+ inputs |
| **clap** | CLI argument parsing | Industry standard for Rust CLIs; derive macro for clean code |
| **md-5, sha1, sha2** | Hash computation | Standard RustCrypto crates |
| **serde + serde_json** | JSON output | De facto Rust serialization |
| **image** | ICO/PNG/BMP decoding for icon display (GUI only) | Standard Rust image library; optional dependency gated behind `gui` feature |
| **ratatui** | TUI hex viewer (optional, `tui` feature) | Terminal UI framework; alternate screen mode for interactive PE browsing |
| **crossterm** | Terminal I/O for TUI (optional, `tui` feature) | Cross-platform terminal manipulation |
| **cms** | PKCS#7/CMS SignedData parsing for Authenticode | Standard RustCrypto crate for CMS/PKCS#7 |
| **x509-cert** | X.509 certificate parsing | Standard RustCrypto crate for X.509 |
| **der** | ASN.1 DER encoding/decoding | Required by cms and x509-cert |
| **const-oid** | OID constants (e.g., CN = 2.5.4.3) | Required for X.509 attribute extraction |
| Manual parsing | Rich header, TLS, debug, resources, overlay | goblin doesn't expose these; straightforward to parse from raw bytes |
**Why goblin over pelite?** goblin is more actively maintained (recent releases, larger community), handles both PE32 and PE32+ uniformly, and is heavily fuzz-tested. pelite has deeper PE coverage but slower release cadence. We supplement goblin's gaps with targeted manual parsing rather than pulling in a second full PE library.
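As an example of the targeted manual parsing mentioned above, the Rich header can be decoded from raw bytes with a few dozen lines. The sketch below is a simplified illustration, not petriage's code: `RichEntry`, `read_u32`, and `parse_rich` are hypothetical names, and it relies on the commonly documented layout (an XOR-encoded `DanS` marker, three padding dwords, `(comp.id, count)` pairs, then a plaintext `Rich` marker followed by the key). It does not verify the checksum.

```rust
/// One decoded Rich header entry (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub struct RichEntry {
    pub product_id: u16, // high 16 bits of comp.id
    pub build: u16,      // low 16 bits of comp.id
    pub count: u32,
}

/// Bounds-checked little-endian u32 read; returns None instead of panicking.
fn read_u32(data: &[u8], off: usize) -> Option<u32> {
    let b = data.get(off..off.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Decode the Rich header located before the PE header at `e_lfanew`.
/// Returns the XOR key and the decoded entries, or None if absent/malformed.
pub fn parse_rich(data: &[u8], e_lfanew: usize) -> Option<(u32, Vec<RichEntry>)> {
    const DANS: u32 = u32::from_le_bytes(*b"DanS");

    // Only the region between the DOS header and the PE header is searched.
    let region = data.get(..e_lfanew)?;
    let rich_off = region.windows(4).position(|w| w == b"Rich")?;
    let key = read_u32(region, rich_off + 4)?;

    // Walk backwards one dword at a time until the encoded "DanS" marker.
    let mut start = rich_off;
    loop {
        start = start.checked_sub(4)?;
        if read_u32(region, start)? ^ key == DANS {
            break;
        }
    }

    // Skip "DanS" plus three padding dwords, then decode (comp.id, count) pairs.
    let mut entries = Vec::new();
    let mut off = start + 16;
    while off + 8 <= rich_off {
        let comp_id = read_u32(region, off)? ^ key;
        let count = read_u32(region, off + 4)? ^ key;
        entries.push(RichEntry {
            product_id: (comp_id >> 16) as u16,
            build: (comp_id & 0xFFFF) as u16,
            count,
        });
        off += 8;
    }
    Some((key, entries))
}
```

Returning `Option` throughout keeps the parser panic-free on truncated or crafted input, which matters more here than detailed error reporting.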
### Architecture
```
petriage <file> [OPTIONS]
Options:
  -a, --all              Show all information (default)
  -H, --headers          Show headers only (DOS + COFF + Optional)
  -i, --imports          Show imports
  -e, --exports          Show exports
  -s, --sections         Show sections
  -S, --strings          Show strings
  -r, --resources        Show resources
  -c, --authenticode     Show Authenticode/code signing info
      --hashes           Show file hashes
      --overlay          Show overlay information
      --json             Output as JSON
      --min-str-len <N>  Minimum string length (default: 4)
  -o, --output <FILE>    Write output to file
  -x, --view             Launch TUI hex viewer (--features tui)
                         (GUI is a separate binary: petriage-gui)
  -h, --help             Print help
  -V, --version          Print version
```
### Module Structure (actual)
```
src/
  main.rs            # CLI entry point, argument parsing, exit code contract
  rich_db.rs         # Rich Header Product ID database (~70 entries, VS 6.0–2022)
  analysis.rs        # All PE analysis logic in one module:
                     #   headers, sections, imports, exports, strings,
                     #   hashes, entropy, overlay, resources (tree/version/manifest/icons),
                     #   suspicious API indicators (~130 APIs, 12 categories),
                     #   anomaly detection (21 rules with rule_id/evidence/threshold),
                     #   authenticode (PKCS#7/CMS + X.509 certificate chain)
  output.rs          # Human-readable and JSON formatting
  gui/mod.rs         # egui GUI entry point (optional, --features gui)
  gui/app_state.rs   # GUI application state
  gui/panels/        # GUI tab panels (file_info, headers, sections, imports, ..., authenticode, editor)
  tui.rs             # ratatui TUI hex viewer (optional, --features tui)
```
> Note: The original plan split the analysis logic across multiple files (strings.rs, hashes.rs, etc.), but the actual implementation consolidates it in `analysis.rs` for simplicity. This may be refactored as the codebase grows.
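The exit code contract owned by `main.rs` includes the `--fail-on <severity>` behavior from the MVP list: exit with code 3 if any anomaly meets or exceeds the given severity. A minimal sketch of that rule follows. The `Severity` enum and `exit_code` function are hypothetical names, and returning 0 when the threshold is not met is an assumption, since the other exit codes are not described here.

```rust
use std::str::FromStr;

/// Severity levels ordered from least to most severe, so that the derived
/// `Ord` makes `Critical > Warning > Info`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Severity {
    Info,
    Warning,
    Critical,
}

impl FromStr for Severity {
    type Err = String;

    /// Parse the value given to `--fail-on` (critical/warning/info).
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "info" => Ok(Severity::Info),
            "warning" => Ok(Severity::Warning),
            "critical" => Ok(Severity::Critical),
            other => Err(format!("unknown severity: {}", other)),
        }
    }
}

/// Return 3 if any finding meets or exceeds the threshold.
/// Assumption: 0 is returned otherwise, including when `--fail-on` is absent.
pub fn exit_code(found: &[Severity], fail_on: Option<Severity>) -> i32 {
    match fail_on {
        Some(threshold) if found.iter().any(|&s| s >= threshold) => 3,
        _ => 0,
    }
}
```

Declaring the variants in ascending order lets the comparison `s >= threshold` express "meets or exceeds" directly, with no hand-written ranking table.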
### Cross-Compilation
Rust's cross-compilation support enables single-binary distribution:
- `cargo build --target x86_64-unknown-linux-gnu`
- `cargo build --target aarch64-unknown-linux-gnu`
- `cargo build --target x86_64-apple-darwin`
- `cargo build --target aarch64-apple-darwin`
- `cargo build --target x86_64-pc-windows-gnu`
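Each target produces a single executable with no separate runtime to install. Not every binary is fully static, however: the `*-linux-gnu` targets link dynamically against glibc by default, and macOS binaries always link against the system libraries. For a fully static Linux binary, build for a musl target such as `x86_64-unknown-linux-musl`.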
### Design Principles
1. **No execution**: petriage never executes or loads the PE — pure static/surface analysis
2. **Robust parsing**: Handle malformed and truncated PEs gracefully (common in malware)
3. **Fast**: Process files in milliseconds, suitable for batch analysis of thousands of samples
4. **Offline**: No network calls by default (no VirusTotal, no update checks)
5. **Composable**: JSON output enables piping to jq, integration with SIEM, etc.
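Principle 2 (robust parsing) largely comes down to never indexing raw bytes without a bounds check. The sketch below shows the idea for the first two MVP steps, reading `e_magic` and `e_lfanew` from the DOS header and validating the `PE\0\0` signature. The names `HeaderError` and `locate_pe_header` are hypothetical; petriage itself delegates this step to goblin.

```rust
/// Reasons the PE header could not be located (hypothetical error type).
#[derive(Debug, PartialEq)]
pub enum HeaderError {
    Truncated,
    BadDosMagic,
    BadPeSignature,
}

/// Return the file offset of the PE signature, or an error for malformed input.
/// Every read goes through `get`, so truncated files cannot cause a panic.
pub fn locate_pe_header(data: &[u8]) -> Result<usize, HeaderError> {
    // e_magic: the first two bytes must be "MZ".
    let magic = data.get(0..2).ok_or(HeaderError::Truncated)?;
    if magic != b"MZ" {
        return Err(HeaderError::BadDosMagic);
    }

    // e_lfanew: little-endian u32 at offset 0x3C.
    let raw = data.get(0x3C..0x40).ok_or(HeaderError::Truncated)?;
    let e_lfanew = u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]) as usize;

    // checked_add guards against overflow from an attacker-chosen offset.
    let end = e_lfanew.checked_add(4).ok_or(HeaderError::Truncated)?;
    let sig = data.get(e_lfanew..end).ok_or(HeaderError::Truncated)?;
    if sig != b"PE\0\0" {
        return Err(HeaderError::BadPeSignature);
    }
    Ok(e_lfanew)
}
```

A crafted `e_lfanew` of `0xFFFFFFFF` or a file cut off mid-header yields `Err(HeaderError::Truncated)` rather than a crash, which is the behavior batch analysis over malware samples depends on.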