hermes_rs 0.1.3

A dependency-free disassembler and assembler for the Hermes bytecode
# hermes_rs

Note: Still a WIP - A PR is always welcome.  


A *nearly* dependency-free disassembler and assembler for the Hermes bytecode, written in Rust. 

A special thanks to [P1sec](https://github.com/P1sec/hermes-dec) for digging through the Hermes git repo, pulling all of the BytecodeDef files out, and tagging them. This made writing this tool much easier.

- [hermes\_rs](#hermes_rs)
    - [Supported HBC Versions](#supported-hbc-versions)
      - [Project Goals](#project-goals)
        - [Potential Use cases](#potential-use-cases)
      - [Features](#features)
  - [Installation](#installation)
  - [Usage](#usage)
      - [Reading File Header](#reading-file-header)
      - [Reading Strings](#reading-strings)
      - [Reading Function Headers](#reading-function-headers)
      - [Dumping Bytecode](#dumping-bytecode)
      - [Encoding Instructions](#encoding-instructions)
      - [Creating Binaries From Scratch](#creating-binaries-from-scratch)
      - [Using specific HBC Versions](#using-specific-hbc-versions)
- [Hermes Resources](#hermes-resources)
- [Development](#development)
    - [Supporting new versions of Hermes](#supporting-new-versions-of-hermes)

### Supported HBC Versions

| HBC Version | Disassembler | (Binary) Assembler | (Textual) Assembler | Decompiler |
| ----------- | ------------ | ------------------ | ------------------- | ---------- |
| 89          |||||
| 90          |||||
| 93          |||||
| 94          |||||
| 95          |||||
| 96          |||||

A couple of features are missing currently, as they're low priority for me at the moment.

- Regular Expression deserialization and serialization*  
- Debug Info deserialization and serialization*  

\* _Supports u8 buffer for manual population_  


#### Project Goals

- Full coverage for all public HBC versions  
- The ability to inject code stubs directly into the .hbc file for instrumentation  
- Textual HBC assembly  
- Eventually a halfway decent decompiler, but that may be another project that uses this one  

##### Potential Use cases

- Find which functions reference specific strings
- Generate frida hooks for mobile implementations
  - hermes loader -> hook loading the package -> feed to hermes_rs -> patch code
    for bidirectional communication or even just logging
- Writing fuzzers

#### Features

- Disassemble Hermes Bytecode (HBC)  
- Assemble Hermes Bytecode (HBC)  
- Type-safe instruction building across multiple versions of HBC  
- The ability to reduce binary size by [only enabling certain versions of HBC](#using-specific-hbc-versions)

## Installation

`cargo add hermes_rs`

## Usage

#### Reading File Header

```rust
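use std::fs::File;
use std::io;

// `HermesFile` is provided by the hermes_rs crate.
let filename = "./input_data/index.android.bundle";

// Open the bundle and wrap it in a buffered reader
let f = File::open(filename).expect("no file found");
let mut reader = io::BufReader::new(f);

// Parse the whole file; the header is available as a field afterwards
let mut hermes_file = HermesFile::deserialize(&mut reader);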

println!("{:?}", hermes_file.header);
```

Output:

```go
HermesHeader {
  magic: 2240826417119764422,
  version: 94,
  sha1: [20, 178, 139, 133, 105, 198, 134, 29, 58, 101, 194, 248, 210, 173, 84, 79, 162, 174, 43, 205],
  file_length: 11059884,
  global_code_index: 0,
  function_count: 54483,
  string_kind_count: 3,
  identifier_count: 35878,
  string_count: 65091,
  overflow_string_count: 425,
  string_storage_size: 2238216,
  big_int_count: 0,
  big_int_storage_size: 0,
  reg_exp_count: 448,
  reg_exp_storage_size: 49719,
  array_buffer_size: 132510,
  obj_key_buffer_size: 43517,
  obj_value_buffer_size: 137207,
  segment_id: 0,
  cjs_module_count: 0,
  cjs_module_offset: 0,
  function_source_count: 1361,
  debug_info_offset: 11059836,
  options: BytecodeOptions {
    static_builtins: false,
    cjs_modules_statically_resolved: false,
    has_async: false,
    flags: false
  },
  _padding: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
}
```
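The `magic` value in the dump above is easier to recognise in hex: 2240826417119764422 is `0x1F1903C103BC1FC6`. The sketch below uses it to check a buffer before doing any real parsing. It assumes the file starts with the magic as a little-endian `u64`, immediately followed by the version as a little-endian `u32`. The name `sniff_hbc_version` is hypothetical and not part of hermes_rs.

```rust
use std::convert::TryInto;

/// Magic from the header dump above: 2240826417119764422 in decimal.
const HERMES_MAGIC: u64 = 0x1F19_03C1_03BC_1FC6;

/// Returns the HBC version if `bytes` starts with the Hermes magic.
/// Assumed layout: `magic: u64`, then `version: u32`, both little-endian.
fn sniff_hbc_version(bytes: &[u8]) -> Option<u32> {
    let magic = u64::from_le_bytes(bytes.get(0..8)?.try_into().ok()?);
    if magic != HERMES_MAGIC {
        return None;
    }
    let version = u32::from_le_bytes(bytes.get(8..12)?.try_into().ok()?);
    Some(version)
}

fn main() {
    // Synthetic 12-byte prefix: the magic followed by version 94
    let mut buf = HERMES_MAGIC.to_le_bytes().to_vec();
    buf.extend_from_slice(&94u32.to_le_bytes());

    println!("{:?}", sniff_hbc_version(&buf)); // Some(94)
    println!("{:?}", sniff_hbc_version(b"not hermes bytecode")); // None
}
```

A check like this can be handy for skipping plain-JavaScript bundles before handing a file to the full parser.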

#### Reading Strings  

```rust
println!("Strings: {:?}", hermes_file.get_strings());
```

Output:  

```sh
Strings: ["$$typeof", "type", "API", "isArray", "Array", ... ]
```
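Instructions refer to strings by their index in this table, so a common first step is to turn a search term into a set of string indices. The helper below is a minimal sketch of that lookup. It assumes the strings are available as a slice of `String`, and `find_string_ids` is a hypothetical name rather than a crate function.

```rust
/// Returns the string-table index of every entry containing `needle`.
fn find_string_ids(strings: &[String], needle: &str) -> Vec<usize> {
    strings
        .iter()
        .enumerate()
        .filter(|(_, s)| s.contains(needle))
        .map(|(idx, _)| idx)
        .collect()
}

fn main() {
    // Stand-in for the first few strings printed above
    let strings: Vec<String> = ["$$typeof", "type", "API", "isArray", "Array"]
        .iter()
        .map(|s| s.to_string())
        .collect();

    println!("{:?}", find_string_ids(&strings, "Array")); // [3, 4]
}
```

The resulting indices can then be compared against the string operands that show up in disassembled instructions.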

#### Reading Function Headers

```rust
// Borrow the headers so `hermes_file` stays usable afterwards
for func in &hermes_file.function_headers {
    println!("{:?}", func);
}

// Prints the following:
// Small(SmallFunctionHeader { offset: 252641, param_count: 3, byte_size: 63, func_name: 5168, info_offset: 842244, frame_size: 16, env_size: 0, highest_read_cache_index: 1, highest_write_cache_index: 0, flags: FunctionHeaderFlag { prohibit_invoke: ProhibitNone, strict_mode: false, has_exception_handler: false, has_debug_info: false, overflowed: false }, exception_handlers: [], debug_info: None })
// Small(SmallFunctionHeader { offset: 252704, param_count: 2, byte_size: 41, func_name: 3756, info_offset: 842244, frame_size: 14, env_size: 0, highest_read_cache_index: 1, highest_write_cache_index: 0, flags: FunctionHeaderFlag { prohibit_invoke: ProhibitNone, strict_mode: false, has_exception_handler: false, has_debug_info: false, overflowed: false }, exception_handlers: [], debug_info: None })
// ...
```

#### Dumping Bytecode

**Single Function**

Call `print_bytecode_for_fn(fidx)`, where `fidx` is the function index of the element in `hermes_file.function_headers`.

```rust
hermes_file.print_bytecode_for_fn(1337);
```

Output:

```asm
Function<setCurrentTarget>(3 params, 11 registers, 0 symbols):
0x00000000	GetEnvironment  r0,  0
0x00000001	LoadFromEnvironment  r2,  r0,  6
0x00000002	LoadConstUndefined  r0
0x00000003	LoadParam  r1,  1
0x00000004	Call2  r2,  r2,  r0,  r1
0x00000005	LoadConstNull  r1
0x00000006	PutById  r2,  r1,  1,  "currentTarget"
0x00000007	Ret  r0
```

**Entire File**  

To print the bytecode for the entire file in a human-readable format, call `print_bytecode`.  

```rust
hermes_file.print_bytecode();
```

Which outputs:  

```smali
Function<global>(1 params, 19 registers, 0 symbols):
0x00000000	DeclareGlobalVar  "__BUNDLE_START_TIME__"
0x00000001	DeclareGlobalVar  "__DEV__"
0x00000002	DeclareGlobalVar  "process"
0x00000003	DeclareGlobalVar  "__METRO_GLOBAL_PREFIX__"
0x00000004	CreateEnvironment  r3
0x00000005	LoadThisNS  r4
0x00000006	GetById  r1,  r4,  1,  "nativePerformanceNow"
0x00000007	GetGlobalObject  r0
0x00000008	JmpTrue  2L1,  r1
0x00000009	TryGetById  r2,  r0,  2,  "Date"
0x0000000A	GetByIdShort  r1,  r2,  3,  "now"
0x0000000B	Call1  r1,  r1,  r2
0x0000000C	Jmp  L1
0x0000000D	TryGetById  r5,  r0,  1,  "nativePerformanceNow"
    L1:
0x0000000E	LoadConstUndefined  r2
0x0000000F	Call1  r1,  r5,  r2
0x00000010	PutById  r0,  r1,  1,  "__BUNDLE_START_TIME__"
0x00000011	LoadConstFalse  r1
0x00000012	PutById  r0,  r1,  2,  "__DEV__"
0x00000013	GetByIdShort  r1,  r4,  4,  "process"
0x00000014	JmpTrue  5,  r1
...
```

**Raw Bytes**  

If you want *just* the raw bytes of each function, call `hermes_file.get_bytecode()` and iterate over the result.
It returns a `Vec<FunctionBytecode>`, where each element pairs a function index (`func_index`) with that function's bytecode (`Vec<u8>`).

```rust
let bc = hermes_file.get_bytecode();

for func in bc {
  println!("{:?}", func);
}
```

Output:  

```sh
FunctionBytecode { func_index: 0, bytecode: [52, 2, 11, 0, 0, 52, 3, 11, 0, 0, 52, 217, 0, 0, 0, 52, ..<truncated>... ] }
FunctionBytecode { func_index: 1, bytecode: [50, 2, 108, 8, 1, 42, 2, 0, 8, 100, 7, 2, 2, 0, 100, 4, ..<truncated>... ] }
FunctionBytecode { func_index: 2, bytecode: [48, 0, 57, 0, 0, 1, 19, 0, 54, 1, 0, 2, 219, 106, 1, 1, ..<truncated>... ] }
FunctionBytecode { func_index: 3, bytecode: [108, 3, 2, 41, 0, 0, 46, 2, 0, 1, 54, 1, 2, 1, 163, 83, ..<truncated>... ] }
FunctionBytecode { func_index: 4, bytecode: [108, 3, 1, 41, 0, 0, 46, 2, 0, 1, 54, 1, 2, 1, 47, 83,  ..<truncated>... ] }
FunctionBytecode { func_index: 5, bytecode: [108, 4, 1, 41, 2, 0, 46, 1, 2, 1, 54, 0, 1, 1, 47, 83,  ..<truncated>... ] }
FunctionBytecode { func_index: 6, bytecode: [108, 4, 1, 41, 2, 0, 46, 1, 2, 1, 54, 0, 1, 1, 47, 83,  ..<truncated>... ] }
```
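The raw bytes line up with the disassembly shown earlier: `func_index: 0` starts with four 5-byte groups that each begin with `52`, and the `global` function starts with four `DeclareGlobalVar` instructions. The sketch below decodes that prefix by hand. The opcode value and the operand width (a little-endian `u32` string ID) are inferred from those two dumps and are assumptions, not values taken from the crate. `leading_global_var_ids` is a hypothetical helper.

```rust
/// Opcode value inferred from the dumps above (an assumption).
const DECLARE_GLOBAL_VAR: u8 = 52;

/// Reads the leading `DeclareGlobalVar` instructions of a function and
/// returns their string IDs. Assumes each instruction is 5 bytes long:
/// the opcode followed by a little-endian u32.
fn leading_global_var_ids(bytecode: &[u8]) -> Vec<u32> {
    let mut ids = Vec::new();
    for chunk in bytecode.chunks_exact(5) {
        if chunk[0] != DECLARE_GLOBAL_VAR {
            break; // first instruction that is not DeclareGlobalVar
        }
        ids.push(u32::from_le_bytes([chunk[1], chunk[2], chunk[3], chunk[4]]));
    }
    ids
}

fn main() {
    // First 15 bytes of func_index 0 from the output above
    let bytes = [52, 2, 11, 0, 0, 52, 3, 11, 0, 0, 52, 217, 0, 0, 0];
    println!("{:?}", leading_global_var_ids(&bytes)); // [2818, 2819, 217]
}
```

For anything beyond a quick sanity check, the disassembler is the better tool, since instruction sizes differ per opcode and per HBC version.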

#### Encoding Instructions

Encoding instructions is trivial - each `Instruction` implements a trait with `deserialize` and `serialize` methods.


```rust
use hermes_rs::{define_instructions, InstructionParser};

/*
 * Use the define_instructions macro to get a vec of the correct instructions
 * for the version of Hermes you're targeting.
 * The first parameter is the Hermes version you'd like to use.
 *
 * The bytecode below represents: eval(`print(123);`)
 * The `print(123);` string is the second (index 1) string in the string table.
 */
let instructions = define_instructions!(
    hermes_rs::v96,
    LoadConstString { r0: 0, p0: 1 },   // Load `print(123);` into r0
    DirectEval { r0: 0, r1: 0, p0: 0 }, // Evaluate the string
    Ret { r0: 0 },                      // Return
).unwrap();

let mut writer = Vec::new();

for instr in instructions {
    instr.serialize(&mut writer);
}

// Make sure the encoded bytes are valid
assert_eq!(writer, vec![115, 0, 1, 0, 94, 0, 0, 0, 92, 0], "Bytecode is incorrect!");
```
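To make the byte layout behind that assertion concrete, the following standalone sketch builds the same ten bytes by hand without using hermes_rs at all. The opcode values (115, 94, 92) are read off the expected bytes above. The operand widths are inferred from that output: one byte for registers and a little-endian `u16` for the string index. `encode_eval_stub` is a hypothetical name.

```rust
// Opcode values taken from the expected bytes in the assertion above (v96)
const OP_RET: u8 = 92;
const OP_DIRECT_EVAL: u8 = 94;
const OP_LOAD_CONST_STRING: u8 = 115;

/// Hand-encodes: LoadConstString r0, <string_id>; DirectEval r0, r0, 0; Ret r0
fn encode_eval_stub(string_id: u16) -> Vec<u8> {
    let mut out = Vec::new();

    // LoadConstString: opcode, destination register, u16 string index (LE)
    out.push(OP_LOAD_CONST_STRING);
    out.push(0);
    out.extend_from_slice(&string_id.to_le_bytes());

    // DirectEval: opcode followed by three single-byte operands
    out.extend_from_slice(&[OP_DIRECT_EVAL, 0, 0, 0]);

    // Ret: opcode followed by the register to return
    out.extend_from_slice(&[OP_RET, 0]);

    out
}

fn main() {
    println!("{:?}", encode_eval_stub(1)); // [115, 0, 1, 0, 94, 0, 0, 0, 92, 0]
}
```

The macro-based approach is preferable in practice because it checks operand names and types per HBC version, which this manual encoding does not.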

#### Creating Binaries From Scratch  

Take a look at the [Creating Binaries](./CreatingBinaries.md) example.

#### Using specific HBC Versions

Want to use a specific version of the Hermes bytecode and reduce your binary size?

In Cargo.toml, find the `hermes_rs` dependency and select which HBC version(s) you'd like to use in your application.

Example:

```toml
[dependencies]
hermes_rs = { version = "0.1.3", features = ["v89", "v90", "v93", "v94", "v95", "v96"] }
```

# Hermes Resources

**My Stuff**  

- **Github\.io Page**: https://pilfer.github.io/mobile-reverse-engineering/react-native/  

**Other Resources**  
- **Official docs**: https://hermesengine.dev/  
  - Source: https://github.com/facebook/hermes  
- **hermes-dec** disassembler/decompiler:  
  - https://github.com/P1sec/hermes-dec  
  - Opcode Docs: https://p1sec.github.io/hermes-dec/opcodes_table.html  
- **hbctool**: https://github.com/bongtrop/hbctool  
- **hasmer**: https://github.com/lucasbaizer2/hasmer  

---

# Development

### Supporting new versions of Hermes

This section assumes that only instructions have been modified, and not core parsing logic (struct fields, RegExp bytecode, Debug Info fields, etc). If any of the latter have changed, those changes will need to be implemented as well.  

There is a script in `./def_versions/_gen_macros.js` that reads and parses a Bytecode Definition file passed to it as the first argument and outputs a file containing the macro body to support the updated instructions.


```sh
# How I generated them

cd ./def_versions

node _gen_macros.js 89.def > ../src/hermes/v89/mod.rs
node _gen_macros.js 90.def > ../src/hermes/v90/mod.rs
node _gen_macros.js 93.def > ../src/hermes/v93/mod.rs
node _gen_macros.js 94.def > ../src/hermes/v94/mod.rs
node _gen_macros.js 95.def > ../src/hermes/v95/mod.rs
```

Example with a hypothetical `v100` version:

```sh
node _gen_macros.js v100.def
```

Which outputs:

```rust
use crate::hermes;

build_instructions!(
  (0, Unreachable, ),
  (1, NewObjectWithBuffer, r0: Reg8, p0: UInt16, p1: UInt16, p2: UInt16, p3: UInt16),
  (2, NewObjectWithBufferLong, r0: Reg8, p0: UInt16, p1: UInt16, p2: UInt32, p3: UInt32),
  (3, NewObject, r0: Reg8),
  (4, NewObjectWithParent, r0: Reg8, r1: Reg8),
  // ... other instructions here
);
```

From here, you'll add a new directory and `mod.rs` file for this version (`./src/hermes/v100/mod.rs`) and paste the output from the script into it.

This could (and probably should) be a `build.rs` process.
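As a rough idea of what the Rust side of such a `build.rs` step might look like, here is a minimal sketch of the per-line conversion. It is not a port of `_gen_macros.js`. It assumes definition lines of the form `DEFINE_OPCODE_N(Name, Type, ...)`, ignores every other kind of line, and takes the opcode number from the caller. The naming rule (`r0`, `r1`, ... for `Reg*` operands and `p0`, `p1`, ... for everything else) is inferred from the sample output above. `def_line_to_entry` is a hypothetical name.

```rust
/// Converts one `DEFINE_OPCODE_N(Name, Type, ...)` line into a
/// `build_instructions!` entry. Returns None for any other line.
fn def_line_to_entry(opcode: u8, line: &str) -> Option<String> {
    let rest = line.trim().strip_prefix("DEFINE_OPCODE_")?;
    let open = rest.find('(')?;
    let close = rest.rfind(')')?;
    if close <= open {
        return None;
    }

    // First argument is the instruction name, the rest are operand types
    let mut parts = rest[open + 1..close].split(',').map(str::trim);
    let name = parts.next().filter(|n| !n.is_empty())?;

    let (mut regs, mut params) = (0, 0);
    let mut entry = format!("({}, {}", opcode, name);
    for ty in parts {
        // Register operands are numbered separately from all other operands
        let field = if ty.starts_with("Reg") {
            regs += 1;
            format!("r{}", regs - 1)
        } else {
            params += 1;
            format!("p{}", params - 1)
        };
        entry.push_str(&format!(", {}: {}", field, ty));
    }
    entry.push(')');
    Some(entry)
}

fn main() {
    let line = "DEFINE_OPCODE_2(NewObjectWithParent, Reg8, Reg8)";
    println!("{:?}", def_line_to_entry(4, line));
}
```

A real `build.rs` would also need to assign opcode numbers in file order and handle the remaining definition kinds, which this sketch leaves out.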

After creating this file, open `./src/hermes/mod.rs`, navigate to the Instruction module imports, and add the import. Then populate the `Instruction` enum, the trait, and the match statements in the other functions with the new version. You'll likely need to rely on the compiler to complain about missing match arms, but there are only a few.

As this codebase evolves, you may need to add match arms in other places.

```rust
#[macro_use]
#[cfg(feature = "v100")]
pub mod v100;

// ...

pub enum Instruction {
  // ...
  #[cfg(feature = "v100")]
  V100(v100::Instruction),
}

// ...

impl Instruction {
  // Implement the methods of the trait
  fn display(&self, _hermes: &HermesHeader) -> String {
      match self {
        // ...
        #[cfg(feature = "v100")]
        Instruction::V100(instruction) => instruction.display(_hermes),
      }
  }

  fn size(&self) -> usize {
      match self {
          // ...
          #[cfg(feature = "v100")]
          Instruction::V100(instruction) => instruction.size(),
      }
  }
}


// ...

// In parse_bytecode there's currently a match statement that will also need to be populated...
let ins_obj: Option<Instruction> = match self.version {
  #[cfg(feature = "v89")]
  89 => Some(Instruction::V89(v89::Instruction::deserialize(&mut r_cursor, op))),
  #[cfg(feature = "v90")]
  90 => Some(Instruction::V90(v90::Instruction::deserialize(&mut r_cursor, op))),
  #[cfg(feature = "v93")]
  93 => Some(Instruction::V93(v93::Instruction::deserialize(&mut r_cursor, op))),
  #[cfg(feature = "v94")]
  94 => Some(Instruction::V94(v94::Instruction::deserialize(&mut r_cursor, op))),
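  #[cfg(feature = "v95")]
  95 => Some(Instruction::V95(v95::Instruction::deserialize(&mut r_cursor, op))),
  // New arm for the hypothetical v100
  #[cfg(feature = "v100")]
  100 => Some(Instruction::V100(v100::Instruction::deserialize(&mut r_cursor, op))),
  _ => None,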
};
```

Finally, add the `feature` (`v100 = []`) to Cargo.toml.