hhh 1.0.1

The hhh Binary File Processor
Documentation
# Reading Binary Descriptions

**hhh** can parse a binary description file to create a binary file.  Use the `-p` or `--parse` flag to signal that you want to parse a file and create a binary file.  Input files are expected to be UTF-8 text files, but may also be encoded in UTF-16.

```
$ hhh -p description.hhh -o file.bin
```

## Basics

At its most basic, parsing operates as follows.

- If an offset is found (signaled by a number followed by a colon `:`), then this becomes the new offset for the following bytes
- Groups of bytes are processed as they are encountered, separated either by spaces or by commas
- Groups are assumed to be in big endian order, unless little endian is specified
- Comments can be C (`/*..*/`) or C++ (`//`), and are ignored
- Bytes not explicitly specified are assumed to be 0x00

Offsets are typically, but do not have to be, at the start of a line.  Except for `//` comments that require an end of line, whitespace is not significant.

Parsing the following creates a ten byte file, where the first byte is 0x40 and the last byte is 0x41 and all other bytes are zero.

```hhh -p 
40 9:41

// Result
// 00000000: 40 00 00 00 00 00 00 00 00 41
```

This is the same as the following.

```hhh -p 
0000: 40 00 00 00 00 00 00 00 00 41

// Result
// 00000000: 40 00 00 00 00 00 00 00 00 41
```

The same output can also be produced by the following, which specifies the offsets out of order.

```hhh -p 
9:41 0:40

// Result
// 00000000: 40 00 00 00 00 00 00 00 00 41
```
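
The rules above can be modeled in a few lines of Python.  This is only an illustrative sketch, not how **hhh** is implemented: the `assemble` function is hypothetical, handles only unprefixed hex byte groups and `offset:` tokens separated by whitespace, and ignores comments, strings, and every other feature.

```python
def assemble(description: str) -> bytes:
    """Model offsets, byte groups, and zero fill for a tiny subset of the syntax."""
    out = bytearray()
    pos = 0
    for token in description.split():
        if ":" in token:
            # "9:41" sets the offset to 9 and may be followed directly by a group.
            offset_text, token = token.split(":", 1)
            pos = int(offset_text, 16)
            if not token:
                continue
        # Pad odd-length groups so that "0" and "00" are both one byte.
        if len(token) % 2:
            token = "0" + token
        data = bytes.fromhex(token)
        end = pos + len(data)
        if end > len(out):
            out.extend(bytes(end - len(out)))  # unspecified bytes are 0x00
        out[pos:end] = data                    # later writes replace earlier ones
        pos = end
    return bytes(out)


print(assemble("40 9:41").hex(" "))
print(assemble("9:41 0:40").hex(" "))
```

Both calls produce the same ten bytes, because the output depends only on which byte ends up at which offset and not on the order in which the offsets are given.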

## Overwriting

Because you can specify offsets out of order, you can overwrite prior bytes in the stream.  For example, you can start with a simple hex dump of a file, and then add "patches" to the end of the file that overwrite selected prior bytes.

```hhh -p 
// Final byte sequence is 01 02 03 04 05
0000: fe ff 21
0002: 03 04 05    // Overwrites 21
0000: 01 02       // Overwrites fe ff

// Result
// 00000000: 01 02 03 04 05
```

## Underscores

Underscores may appear in a number, and are ignored.

```hhh -p 
0000_0000: 48_ae_21_45

// Result (big endian)
// 00000000: 48 ae 21 45
```

```hhh -p --little-endian
0000_0000: 48_ae_21_45

// Result (little endian)
// 00000000: 45 21 ae 48
```

## Strings

Double-quoted strings can appear in the file.  These are interpreted as byte sequences in UTF-8 and permit the usual escapes, detailed below in [String Escapes](#string-escapes).

```hhh -p 
0: "你好世界!"
10: "हैलो वर्ल्ड!"
20: "Hola Mundo!"
30: "Hello world!"

// Result
// 00000000: e4 bd a0 e5 a5 bd e4 b8 96 e7 95 8c 21 00 00 00
// 00000010: e0 a4 b9 e0 a5 88 e0 a4 b2 e0 a5 8b 20 e0 a4 b5
// 00000020: 48 6f 6c 61 20 4d 75 6e 64 6f 21 8d e0 a4 a1 21
// 00000030: 48 65 6c 6c 6f 20 77 6f 72 6c 64 21
```
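
Note that the offsets in this example are hexadecimal and that the strings overlap.  The Hindi string encodes to 32 bytes and so runs from 0x10 through 0x2f, and the shorter Spanish string written at 0x20 replaces only the first 11 bytes of its second half.  The following Python snippet (Python is used here only as a calculator and is not part of **hhh**) reproduces the row at offset 0x20.

```python
hindi = "हैलो वर्ल्ड!".encode("utf-8")
spanish = "Hola Mundo!".encode("utf-8")

image = bytearray(0x30)
image[0x10:0x10 + len(hindi)] = hindi        # occupies 0x10 through 0x2f
image[0x20:0x20 + len(spanish)] = spanish    # replaces 0x20 through 0x2a

row = image[0x20:0x30].hex(" ")
print(row)
```

The trailing `8d e0 a4 a1 21` in that row is what remains of the Hindi string after the overwrite.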

## String Escapes

**hhh** allows a superset of the escapes used elsewhere, including those from C, C++, and Python.

|Escape              |Meaning                  |
|--------------------|-------------------------|
|`\\`                |Backslash (U+005C)       |
|`\'`                |Apostrophe (U+0027)      |
|`\"`                |Quotation mark (U+0022)  |
|`\a`                |Bell (U+0007)            |
|`\b`                |Backspace (U+0008)       |
|`\e`                |Escape (U+001B)          |
|`\f`                |Form feed (U+000C)       |
|`\n`                |Newline (U+000A)         |
|`\r`                |Carriage return (U+000D) |
|`\t`                |Horizontal tab (U+0009)  |
|`\v`                |Vertical tab (U+000B)    |
|`\x`                |Byte escape <sup>1</sup>    |
|`\u`                |Unicode escape <sup>2</sup> |
|`\N`                |Named Unicode escape <sup>3</sup> |
|`\?`                |Question mark (U+003F)   |

<sup>1</sup> Byte escapes require exactly two hexadecimal digits.  For example, a newline (U+000A) can also be written as `\x0a` or `\x0A`.

<sup>2</sup> Unicode escapes have one or more hex digits inside curly braces.  For example, a newline (U+000A) can be written in several ways, including `\u{a}` and `\u{00000A}`.

<sup>3</sup> Named Unicode escapes interpret the Unicode code point name inside curly braces and generate the associated character.  For example, a newline (U+000A) can be written as `\N{new line}` or `\N{NEW LINE}`.  Find the list of all Unicode character names in [the Unicode standard](https://home.unicode.org/).

```hhh -p 
"\a\N{new line}\N{asterisk}\x40\x41\u{214e}"

// Result
// 00000000: 07 0a 2a 40 41 e2 85 8e
```
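
As a cross-check, the same bytes can be computed in Python, whose string escapes overlap with the table above.  This is a sketch for comparison only: Python spells the Unicode escape as `\u214e` rather than `\u{214e}`, and the standard `unicodedata.lookup` call stands in for the named escape.

```python
import unicodedata

text = (
    "\a"                              # bell, U+0007
    + "\n"                            # newline, U+000A
    + unicodedata.lookup("ASTERISK")  # same character as \N{asterisk}
    + "\x40\x41"                      # '@' and 'A'
    + "\u214e"                        # U+214E, three bytes in UTF-8
)
encoded = text.encode("utf-8")
print(encoded.hex(" "))
```

One difference matters for values of 0x80 and above: in a Python `str`, `\x` names a code point that UTF-8 may encode as two bytes, whereas the table above describes `\x` as a byte escape.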

## Width

By default the width of a simple hex number is the ceiling of the number of digits divided by two.  That is, `0` and `00` are encoded as single bytes, `000` and `0000` are encoded as two bytes, etc.  This is intended to match the user's intuition and correctly decode groups produced by a hexdump.

If radix prefixes are used, the rule changes.  Now the width of a number is taken to be the smallest number of bytes that can hold the number.  So `0x0` is encoded as a single byte, as is `0x0000000000`, but `0x10_00_00` is encoded as *three* bytes, because this is the smallest number of bytes that can hold the value.

A value may be followed by a slash `/`.  This can then be followed by a decimal number to indicate the number of bytes for the value.  Note that this is similar to [Ghidra](https://ghidra-sre.org/)'s P-Code (but the colon is used for something else in **hhh**).

This specifies a four-byte (32-bit) zero.

```hhh -p 
0/4

// Result
// 00000000: 00 00 00 00
```

Because of truncation, this also specifies a 32-bit zero for big endian byte order.

```hhh -p 
fffe00000000/4

// Result
// 00000000: 00 00 00 00
```

Consider `24/2`.  By default this specifies the sequence `00 24` (big endian).  If little endian is selected, then this specifies the sequence `24 00`.
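
The width rules in this section can be summarized in a short Python sketch.  The function names are hypothetical and the code only models the behavior described here for non-negative values; it is not taken from the **hhh** source.

```python
def group_width(digits: str) -> int:
    """Width of an unprefixed hex group: ceiling of the digit count over two."""
    count = len(digits.replace("_", ""))
    return (count + 1) // 2


def minimal_width(value: int) -> int:
    """Width of a non-negative prefixed number: fewest bytes that can hold it."""
    return max(1, (value.bit_length() + 7) // 8)


def encode(value: int, width: int, little_endian: bool = False) -> bytes:
    """Truncate value to `width` bytes, then put the bytes in the requested order."""
    truncated = value % (1 << (8 * width))
    return truncated.to_bytes(width, "little" if little_endian else "big")


print(group_width("000"), minimal_width(0x10_00_00))
print(encode(0xFFFE00000000, 4).hex(" "))
print(encode(0x24, 2).hex(" "), "|", encode(0x24, 2, little_endian=True).hex(" "))
```

Truncation keeps the low-order bytes, which is why `fffe00000000/4` yields four zero bytes.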

An offset is not a byte sequence, so you can't use a width specifier with it.

## Radix Prefixes

Radix prefixes specify the radix (hexadecimal, decimal, octal, binary) of the following number, and are enabled by the `--radix-prefixes` or `-r` switch or by the `[[prefix]]` directive.

|Prefix    |Meaning    |Example          |
|----------|-----------|-----------------|
|`0b`      |Binary     |`0b1001100` = 76 |
|`0o`      |Octal      |`0o114` = 76     |
|none      |Decimal    |76 = 76          |
|`0x`      |Hexadecimal|`0x4c` = 76      |

Underscores can appear in these numbers to break up groups, such as `65_535`.

If prefixes are enabled, then values are assumed to be in decimal unless a radix prefix is explicitly given.  The following encodes the string "Hello world!".

```hhh -p 
[[prefix]] 0x00000000: 0x48 0x65 "l" "l" "o" 32 0x77/1 0x6f 0b1110010 0x6c 0x64 33/1

// Result
// 00000000: 48 65 6c 6c 6f 20 77 6f 72 6c 64 21
```
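
Python happens to use the same `0b`, `0o`, and `0x` prefixes, so the table and the example above can be checked with it.  This is an analogy rather than a description of the **hhh** parser; `parse_number` is a hypothetical helper, and Python's rules differ in small ways (for instance, it rejects decimal numbers with leading zeros).

```python
def parse_number(text: str) -> int:
    """Parse a number with an optional 0b, 0o, or 0x prefix; decimal otherwise."""
    return int(text, 0)


values = [parse_number(t) for t in ("0b1001100", "0o114", "76", "0x4c")]
print(values)

# The byte values from the "Hello world!" example, with the quoted letters as text.
hello = bytes([0x48, 0x65]) + b"llo" + bytes([32, 0x77, 0x6F, 0b1110010, 0x6C, 0x64, 33])
print(hello)
```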

When using radix prefixes, numbers are allowed to be negative, and are interpreted as the two's complement byte sequence using the same width rules as elsewhere.

```hhh -p 
[[prefix]] 0x00000000: -16/8

// Result
// 00000000: ff ff ff ff ff ff ff f0
```
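
For readers less familiar with two's complement, the result above can be reproduced by reducing the value modulo 2 to the power of the bit width.  The helper below is a hypothetical sketch in Python, not the code **hhh** uses.

```python
def twos_complement(value: int, width: int) -> bytes:
    """Encode a signed integer as `width` big endian bytes in two's complement."""
    return (value % (1 << (8 * width))).to_bytes(width, "big")


print(twos_complement(-16, 8).hex(" "))
```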

Numbers support arbitrary precision signed values thanks to the [`num`] package.

## Byte Order (Endianness)

Endianness can be confusing.  There are two forms.

- In big endian form, the most significant byte (the "big end") is written first, so the number `213e` is written as the byte sequence `21 3e`.
- In little endian form, the least significant byte (the "little end") is written first, so the number `213e` is written as the byte sequence `3e 21`.

The ordering set by the directives `[[little_endian]]` and `[[big_endian]]` as well as by the command line switch *only* changes how unprefixed byte groups are read.

### Without Prefixes

If radix prefixes are not used, then byte groups are interpreted according to the current default ordering, set by directive or command line switch.  This effectively "undoes" the same setting on the hex output.

For example, if group is set to four and order to little endian, then the byte sequence `ff 01 02 03` is written as `030201ff` when creating a hex dump.  In order to read this back correctly, we set the order to little endian and interpret `030201ff` as the byte sequence `ff 01 02 03`.

That is, the default ordering changes how byte groups are read.

Suppose we use `hhh --little-endian -b2 --no-ascii` with the following input.

```hhh --little-endian -b2 --no-ascii
HI!
// Result
// 00000000: 4948 0a21
```

Then we can recover the original file with `hhh -p --little-endian`.

```hhh -p --little-endian
00000000: 4948 0a21

// Result
// 00000000: 48 49 21 0a
```
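
The round trip works because reversing each group twice restores the original bytes.  A minimal Python model of the two directions follows; `dump_groups` and `read_groups` are hypothetical names, and the code ignores offsets, the ASCII column, and everything else a real hex dump contains.

```python
def dump_groups(data: bytes, group: int, little_endian: bool) -> str:
    """Format bytes as hex groups, reversing each group for little endian."""
    groups = []
    for start in range(0, len(data), group):
        chunk = data[start:start + group]
        if little_endian:
            chunk = chunk[::-1]
        groups.append(chunk.hex())
    return " ".join(groups)


def read_groups(text: str, little_endian: bool) -> bytes:
    """Read hex groups back, undoing the per-group reversal."""
    out = bytearray()
    for group in text.split():
        chunk = bytes.fromhex(group)
        out += chunk[::-1] if little_endian else chunk
    return bytes(out)


dumped = dump_groups(b"HI!\n", group=2, little_endian=True)
print(dumped)
print(read_groups(dumped, little_endian=True))
```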

### With Prefixes

If radix prefixes (see the section on [Radix Prefixes](#radix-prefixes)) are used, then numbers are presumed to be numbers and not simply byte sequences.  This means that `0x21e3` represents the *number* `0x21e3`.

The ordering set by directives has *no effect* on the order of bytes in a prefixed number.  If you wish to change the order, use a *suffix*, described in the section [Suffixes](#suffixes).


### Suffixes

If a width specifier is given, then an order specifier can also be given.  This takes the form of either an `LE` (for little endian) or a `BE` (for big endian) and specifies that the number should be interpreted as written in little-endian or big-endian order.  (This means `BE` is effectively a no-op.)

For numbers with radix prefixes, this simply changes the output order of the bytes, so (for example) `0x0102/2LE` *always* produces the byte sequence `0x02` `0x01`.

For byte groups it is applied after the default ordering set by directives or command line switch. This can be confusing.  Essentially, suffixing the byte group with `LE` reverses it, but it may *already* have been reversed by setting the default to little endian.  In short, **you should only use the suffixes with the default big endian ordering**.

- `[[little_endian]] 0102/2LE` represents the byte order `01 02`
- `[[little_endian]] 0102/2BE` represents the byte order `02 01`

```hhh -p
[[little_endian]] 0102/2LE
[[little_endian]] 0102/2BE

// Result
// 00000000: 01 02 02 01
```
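
The double reversal is easier to see in code.  The following Python sketch models only the ordering steps described above for unprefixed byte groups; `emit_group` is a hypothetical function, and width handling is left out.

```python
def emit_group(digits: str, default_little: bool, suffix: str = "BE") -> bytes:
    """Apply the default ordering first, then the suffix, as described above."""
    data = bytes.fromhex(digits)
    if default_little:
        data = data[::-1]   # reversal from [[little_endian]] or the switch
    if suffix == "LE":
        data = data[::-1]   # reversal from the LE suffix, applied afterwards
    return data


for default_little in (False, True):
    for suffix in ("BE", "LE"):
        print(default_little, suffix, emit_group("0102", default_little, suffix).hex(" "))
```

With the default big endian ordering, the suffix alone decides the order, which is why that combination is the recommended one.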

## Expressions

Expressions in square brackets can be used to specify a number, either for an offset or for a byte or sequence of bytes.  If a byte sequence is intended, a width specifier *should* follow the closing square bracket.  Radix prefixes are enabled in the brackets, and decimal is assumed.

```hhh -p
0: [4 * 16 + 2]/2

// Result
// 00000000: 00 42
```

See the chapter [Expressions](expressions.md) for details.

## Repeat

A byte sequence can be enclosed in parentheses, with the closing parenthesis followed by a `*` and decimal number.  This repeats the byte sequence the given number of times.  Parenthesized expressions cannot contain an offset specification, assignments, or a nested parenthesized sequence.

```hhh -p
00000020: (6f 40)*8

// Result
// 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
// 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
// 00000020: 6f 40 6f 40 6f 40 6f 40 6f 40 6f 40 6f 40 6f 40
```

## Variables

Variables begin with a dollar sign `$` and consist of numbers, underscores, and letters.  The variables `$_` and `$__` are special, and are described later.

Variables cannot use forward references; you cannot refer to a variable's value before it is assigned.

### Assignment

Assign (or re-assign) a name to a byte sequence using an equal sign.  You can then use the variable to refer to that byte sequence, and can further modify it with length or order specifications.

```hhh -p
$x = 1000/4     // Now $x is the sequence 00 00 10 00
$y = 2000       // Now $y is the sequence 20 00

$x/3            // Emit the sequence 00 10 00
$y/2LE          // Emit the sequence 00 20

// Result
// 00000000: 00 10 00 00 20
```

If the byte sequence consists of multiple groups, enclose them in parentheses.  If it is a single group, parentheses are optional.  You cannot nest parentheses.

When a byte sequence is assigned a name, the endianness rules are applied *before* assigning the value, so that the value is stored in the correct byte order.

Symbolic names can be referenced by giving the name with an initial dollar sign.  You can include length and endianness specifiers.  A specification of `LE` reverses the stored byte order.

The assigned value can be the value of an expression, but assignments cannot occur inside expressions.  Variable references, however, can occur in expressions.

```hhh -p
$a = [17]
$x = [((2**4) | (2**3) + $a) <<2]
$x/2

// Result
// 00000000: 00 a4
```

### Capture

Alternatively you can *capture* a byte group after it is emitted using `->` followed by a variable name.  That is, the sequence `214e->$x` both emits the byte sequence and assigns it to `$x`.

The following illustrates the difference between simple assignment and capture.

```hhh -p
$x = 214a    // Assign the bytes 21 4a to $x
214a -> $x   // Emit 21 4a and then capture those bytes to $x
// The above will only produce 21 4a.

// Result
// 00000000: 21 4a
```

Here is a longer example.

```hhh -p
// Define three symbolic names.
00000020->$here $here: (20 24)->$bytes1 0040/2LE->$bytes2 $bytes1 $bytes2
// $here = 0x20
// $bytes1 = 0x2024/2
// $bytes2 = 0x4000/2

// Result
// 00000000: 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00
// 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
// 00000020: 20 24 40 00 20 24 40 00
```

The following produces a byte sequence in addition to assigning `$x`.

```hhh -p
$a = [17]
[((2**4) | (2**3) + 17) <<2]/4 -> $x

// Result
// 00000000: 00 00 00 a4
```

### Special Variables

The special symbolic name `$_` refers to the *current* offset and is always immediately replaced by that value.  Thus you *cannot* assign to this variable.

```hhh -p
00000020: /* $_ is 0x20 */ $x=$_  3f 41 2d 8a  /* $_ is 0x24 */ $y=$_
00000030: $x/4 $y/4
// Result
// 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
// 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
// 00000020: 3f 41 2d 8a 00 00 00 00 00 00 00 00 00 00 00 00
// 00000030: 00 00 00 20 00 00 00 24
```

The special symbolic name `$__` (two underscores) refers to the enclosing biased base address.  Most of the time, this is likely 0.  The real use for this is determining the base address of a structure.  See the section on [Structures](#structures) for details.

## Directives

Directives may be given inside double square brackets `[[..]]` and take effect immediately.  Each `[[..]]` pair holds a single directive.

```hhh -p
// Enable radix prefixes.
[[prefix]]

// Specify a bias.
[[bias(0x1000)]]

0x1000: 20 30 40 50

// Result
// 00000000: 14 1e 28 32
```

Directives are discussed in their own chapter: [Directives](directive.md).  Variables and expressions may be used in directives, and directives can be used to set variables.

```hhh -p
[[set("a",17)]]
$a/2LE

// Result
// 00000000: 11 00
```

## Bias

The offsets can be treated as absolute values; the sequence `10:ff` means the byte `ff` should occur at absolute offset 16 (hex `10`).  Alternatively, a *bias* can be subtracted from the explicitly given offsets.  By default the bias is zero, so the final offset is the same as the explicitly given offset.

**NB**: When generating a hexdump, the bias is *added*.  When parsing a binary description, the bias is *subtracted*.

The bias can be non-zero.  For example, consider the following.

```hhh -p
[[bias(0x4000)]]
4000: fe ff 21 4e 8a

// Result
// 00000000: fe ff 21 4e 8a
```

The purpose of this is to allow incorporating hexdumps or other information where there is already an offset present.  Because the bias is subtracted, a positive bias moves the bytes toward the start of the output by the given amount.  The bias can also be *negative*, moving the bytes toward the end by some amount.

Suppose I have a chunk of bytes that starts at 0x4000 and I want to incorporate them at an arbitrary position.  That is, I have some bytes and I want the chunk (starting at 0x4000) to be added *here*.  There is a directive that does that: `relative`.

```hhh -p
0000: 86 00 0f 8a 01
[[relative(0x4000)]]
4000: fe ff 21 4e 8a

// Result
// 00000000: 86 00 0f 8a 01 fe ff 21 4e 8a
```
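
The arithmetic behind `bias` and `relative` is small enough to sketch in Python.  The `OffsetMapper` class below is hypothetical and models only the description in this section: an offset maps to the output position `offset - bias`, and `relative` picks the bias that makes the given offset land on the current position.  It assumes that changing the bias does not move the current position.

```python
class OffsetMapper:
    """Map offsets written in a description to positions in the output."""

    def __init__(self) -> None:
        self.bias = 0
        self.position = 0   # where the next byte would be written

    def set_bias(self, bias: int) -> None:
        self.bias = bias

    def set_relative(self, offset: int) -> None:
        # Choose the bias so that `offset` maps to the current position.
        self.bias = offset - self.position

    def seek(self, offset: int) -> int:
        self.position = offset - self.bias
        return self.position

    def write(self, count: int) -> None:
        self.position += count


mapper = OffsetMapper()
mapper.seek(0x0000)
mapper.write(5)               # 86 00 0f 8a 01
mapper.set_relative(0x4000)
print(mapper.seek(0x4000))    # position of the byte fe in the output
```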

You can undo the effect of either directive with `[[bias(0)]]`, explicitly setting the bias back to zero.

```hhh -p
0000: 21
[[relative(0x4000)]]
4000: 22
[[bias(0)]]
23 24

// Result
// 00000000: 21 22 23 24
```

The following example captures the current offset with `$_` at several points and then emits each captured offset as a single byte.

```hhh -p
00000002: $x=$_ 21 $y=$_ 22 $z=$_ 23 $x/1 $y/1 $z/1
00000008: $x=$_ 21 22 23 24 $y=$_ 25 26 27 28 $z=$_ $x/1 $y/1 $z/1

// Result
// 00000000: 00 00 21 22 23 02 03 04 21 22 23 24 25 26 27 28
// 00000010: 08 0c 10
```

## Structures

A *structure* can be embedded in a file using curly braces `{..}`.  These can be nested, and create a new scope with a bias such that offsets start at zero.  The closing brace then restores the outer bias.

The actual base offset of the current scope can be obtained by `$__` (two underscores).  Thus within a `{..}` pair, `$__` gives the offset of the *start* of the current scope.

Suppose we have the following C structure.

```c
struct mystruct {
    long link_off;
    long name_ptr;
    long next_ptr;
    long prev_ptr;
};
```

We might capture that directly (and provide an example) as follows.

```hhh -p
[[prefix]]

// Set up the offsets to elements of the structure.
$link_off = 0
$name_ptr = 8
$next_ptr = 16
$prev_ptr = 24

// Record the structure's total length.
$length = 32

0x0: 0xfe 0xff 0x10 0x22

0x10: $1name = $_ "First Element" 0     // $1name captures the address of this string
0x20: $2name = $_ "Second Element" 0    // $2name captures the address of this string

// Align on a 16-byte boundary.
[($_ + 15) & -16]:

// An instance of the structure.
{
    $name_ptr: $1name/8LE           // Pointer to the name.
    $next_ptr: [$__ + $length]/8LE  // Pointer to the next structure.
    $prev_ptr: 0/8LE                // Null
    $length:
}

// Second instance of the structure.
{
    $name_ptr: $2name/8LE           // Pointer to the name.
    $next_ptr: 0/8LE                // Null
    $prev_ptr: [$__ - $length]/8LE  // Pointer to the previous structure.
    $length:
}

0x21 0x43

// Result
// 00000000: fe ff 10 22 00 00 00 00 00 00 00 00 00 00 00 00
// 00000010: 46 69 72 73 74 20 45 6c 65 6d 65 6e 74 00 00 00
// 00000020: 53 65 63 6f 6e 64 20 45 6c 65 6d 65 6e 74 00 00
// 00000030: 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00
// 00000040: 50 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
// 00000050: 00 00 00 00 00 00 00 00 20 00 00 00 00 00 00 00
// 00000060: 00 00 00 00 00 00 00 00 30 00 00 00 00 00 00 00
// 00000070: 21 43
```

In the above the bias is initially zero.  The structure contains four 8-byte values at the specific zero-based offsets.  To ensure that the correct size is used, the length is given as the final offset just prior to closing the curly braces.

Note that `$link_off` is not used in the structure, so its bytes remain zero.  Also note the use of `$__` to reference the offset of the start of the structure.
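
The alignment step `[($_ + 15) & -16]:` in the example rounds the current offset up to the next multiple of 16.  The same bit trick, written as a hypothetical Python helper for illustration, works for any power-of-two alignment.

```python
def align_up(offset: int, alignment: int = 16) -> int:
    """Round offset up to a multiple of alignment, which must be a power of two."""
    return (offset + alignment - 1) & -alignment


# After the second string and its terminating zero the offset is 0x2f.
print(hex(align_up(0x2F)))
```

An offset that is already aligned is left unchanged, so the expression is safe to use unconditionally.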

This results in the following byte sequence.  (Bytes not explicitly shown are zero.)

```
00000000: fe ff 10 22
00000010: 46 69 72 73 74 20 45 6c 65 6d 65 6e 74 00         // The first string
00000020: 53 65 63 6f 6e 64 20 45 6c 65 6d 65 6e 74 00      // The second string

// First structure.
00000030: 00 00 00 00 00 00 00 00                           // link_off
00000038: 10 00 00 00 00 00 00 00                           // name_ptr
00000040: 50 00 00 00 00 00 00 00                           // next_ptr
00000048: 00 00 00 00 00 00 00 00                           // prev_ptr

// Second structure.
00000050: 00 00 00 00 00 00 00 00                           // link_off
00000058: 20 00 00 00 00 00 00 00                           // name_ptr
00000060: 00 00 00 00 00 00 00 00                           // next_ptr
00000068: 30 00 00 00 00 00 00 00                           // prev_ptr

00000070: 21 43
```

Internally this is implemented with a simple stack.  The opening `{` pushes the current bias onto the stack and then sets the bias so that the first address inside the braces is zero.  The closing `}` restores the original bias from the stack.
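
A minimal Python sketch of that stack is shown below.  The `Scopes` class and its method names are hypothetical, and the model covers only what this section describes: offsets map to `offset - bias`, the current position survives a closing brace, and the outer bias is zero as in the example.

```python
class Scopes:
    """Model the bias stack used by `{` and `}`."""

    def __init__(self) -> None:
        self.bias = 0
        self.position = 0
        self.stack = []

    def seek(self, offset: int) -> int:
        self.position = offset - self.bias
        return self.position

    def open_brace(self) -> None:
        self.stack.append(self.bias)
        self.bias = -self.position   # offset 0 now maps to the current position

    def close_brace(self) -> None:
        self.bias = self.stack.pop()


scopes = Scopes()
scopes.seek(0x30)                # aligned start of the first structure
scopes.open_brace()
name_ptr_at = scopes.seek(8)     # $name_ptr: lands at 0x38
scopes.seek(32)                  # $length: moves to the end of the structure
scopes.close_brace()
print(hex(name_ptr_at), hex(scopes.position), scopes.bias)
```

Because the position is kept when the brace closes, the second structure in the example starts at 0x50 without any explicit offset.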


[`num`]: https://crates.io/crates/num