Function pelite::pattern::parse

pub fn parse(pat: &str) -> Result<Pattern, ParsePatError>

Pattern parser.

Remarks

The following examples illustrate the pattern syntax, which takes inspiration from YARA hexadecimal strings.

55 89 e5 83 ? ec

Case-insensitive hexadecimal digits match exact byte values, and question marks serve as placeholders for unknown bytes.

Note that a single question mark matches a whole byte. The syntax to mask part of a byte is not yet available.
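
To make these rules concrete, here is a minimal standalone sketch covering only hex bytes, question marks and optional spaces. The names `parse_simple` and `matches_at` are hypothetical and this is not pelite's parser; it merely mirrors the behaviour described above.

```rust
/// Toy parser for hex byte pairs, `?` wildcards and optional spaces only.
/// Hypothetical helper for illustration; not part of pelite.
fn parse_simple(pat: &str) -> Option<Vec<Option<u8>>> {
    let mut out = Vec::new();
    // Spaces carry no meaning, so drop them up front.
    let mut chars = pat.chars().filter(|&c| c != ' ');
    while let Some(c) = chars.next() {
        if c == '?' {
            // A single question mark stands for one whole unknown byte.
            out.push(None);
        } else {
            // Two hex digits (either case) form one exact byte.
            let hi = c.to_digit(16)?;
            let lo = chars.next()?.to_digit(16)?;
            out.push(Some((hi * 16 + lo) as u8));
        }
    }
    Some(out)
}

/// Returns true if `bytes` begins with the parsed pattern.
fn matches_at(bytes: &[u8], pattern: &[Option<u8>]) -> bool {
    bytes.len() >= pattern.len()
        && pattern
            .iter()
            .zip(bytes)
            .all(|(p, b)| p.map_or(true, |v| v == *b))
}
```

In this sketch `parse_simple("5589E583?EC")` and `parse_simple("55 89 e5 83 ? ec")` produce the same pattern, and a dangling half byte such as `"5"` is rejected.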

Spaces (code point 32) are completely optional and carry no semantic meaning; their only purpose is to visually group things together.

b9 ' 37 13 00 00

Single quotes are used as bookmarks: each one saves the current cursor rva in the save array passed to the scanner.

It is no longer necessary to do tedious address calculations to read information out of the byte stream after a match was found. This power really comes to life with the capability to follow relative and absolute references.

The first entry in the save array is reserved for the rva where the pattern was matched. The rest of the save array is filled in order of appearance of the quotes. Here the rva of the quote can be found in save[1].
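
As an illustration of how a bookmark removes manual address arithmetic, the sketch below reads the 32-bit immediate that follows `b9` using the offset stored in `save[1]`. It assumes a flat byte buffer in which an rva can be used directly as an index (which does not hold for a PE file as laid out on disk), and the save contents are written by hand rather than produced by the scanner. `read_u32_le` and `example_save` are hypothetical helpers.

```rust
/// Reads a little-endian u32 at `offset`, or None if out of bounds.
fn read_u32_le(image: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    let b = image.get(offset..end)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Hand-written stand-in for what a scan of `b9 ' 37 13 00 00` would record
/// if the pattern matched at offset 0 of the buffer.
fn example_save() -> [u32; 2] {
    // save[0]: where the pattern matched; save[1]: cursor at the quote.
    [0, 1]
}
```

With the buffer `b9 37 13 00 00`, reading at `save[1]` yields the immediate `0x1337` without computing the offset by hand.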

b8 [16] 50 [13-42] ff

Pairs of decimal numbers separated by a hyphen in square brackets indicate the lower and upper bound of the number of bytes to skip. The scanner is non-greedy and takes the first match, skipping as few bytes as possible.

A single decimal number in square brackets without a hyphen is a fixed-size jump, equivalent to writing that number of consecutive question marks.
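
The non-greedy behaviour can be sketched as trying skip lengths in ascending order and stopping at the first one that lets the rest of the pattern match. `shortest_skip` is a hypothetical helper, the tail is simplified to literal bytes, and this is not pelite's scanner.

```rust
/// Finds the smallest skip in `lo..=hi` after which `tail` matches.
/// Hypothetical helper illustrating non-greedy skipping.
fn shortest_skip(bytes: &[u8], lo: usize, hi: usize, tail: &[u8]) -> Option<usize> {
    // Ascending order means the first hit skips as little as possible.
    (lo..=hi).find(|&n| bytes.get(n..).map_or(false, |rest| rest.starts_with(tail)))
}
```

A fixed-size jump such as `[16]` corresponds to calling this with `lo == hi`.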

31 c0 74 % ' c3
e8 $ ' 31 c0 c3
68 * ' 31 c0 c3

These symbols follow a reference: % follows a signed 1-byte relative jump, $ follows a signed 4-byte relative jump, and * follows an absolute pointer.

They let the scanner follow short jumps, calls and longer jumps, and absolute pointers.

They compose well with bookmarks to find the addresses of referenced functions and other data without tedious address calculations.
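
The arithmetic behind the two relative forms can be sketched as follows. This assumes the usual x86 convention that a displacement is relative to the end of its operand, and a flat buffer indexed directly by offset. The function names are hypothetical, and the absolute pointer form is left out because it depends on the image base and pointer width.

```rust
/// Follows a signed 1-byte relative jump whose operand is at `cursor`.
fn follow_rel8(image: &[u8], cursor: usize) -> Option<usize> {
    let disp = *image.get(cursor)? as i8 as i64;
    // Assumed x86 convention: displacement is relative to the end of the operand.
    resolve(image, cursor.checked_add(1)?, disp)
}

/// Follows a signed 4-byte little-endian relative jump whose operand is at `cursor`.
fn follow_rel32(image: &[u8], cursor: usize) -> Option<usize> {
    let end = cursor.checked_add(4)?;
    let b = image.get(cursor..end)?;
    let disp = i32::from_le_bytes([b[0], b[1], b[2], b[3]]) as i64;
    resolve(image, end, disp)
}

/// Adds the displacement and rejects destinations outside the buffer.
fn resolve(image: &[u8], next: usize, disp: i64) -> Option<usize> {
    let dest = (next as i64).checked_add(disp)?;
    if dest >= 0 && (dest as usize) < image.len() {
        Some(dest as usize)
    } else {
        None
    }
}
```

For the bytes `31 c0 74 02 90 90 c3`, following the operand at offset 3 lands on offset 6, the `c3` byte, which is where a quote placed after `%` would record its bookmark.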

b8 * "STRING" 00

String literals appear in double quotes and will be matched as UTF-8.

Escape sequences are not supported; switch back to matching with hex digits as needed. For UTF-16 support, you are welcome to send a PR.
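
One way to apply the hex digit fallback to UTF-16 text is to generate the hex form of the encoded string and paste it into the pattern. The sketch below assumes little-endian code units (typical for Windows binaries); `utf16le_hex` is a hypothetical helper, not something pelite provides.

```rust
/// Renders `s` as space-separated hex bytes of its UTF-16LE encoding.
/// Hypothetical helper for building a pattern by hand.
fn utf16le_hex(s: &str) -> String {
    let mut parts = Vec::new();
    for unit in s.encode_utf16() {
        // Low byte first, then high byte.
        parts.push(format!("{:02x} {:02x}", unit & 0xff, unit >> 8));
    }
    parts.join(" ")
}
```

For example, `utf16le_hex("AB")` gives `41 00 42 00`, which can be used in place of a string literal.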

e8 $ { ' } 83 f0 5c c3

Curly braces must follow a jump symbol (see above).

The sub pattern enclosed within the curly braces is matched at the destination after following the jump. After the sub pattern has matched successfully, the cursor returns to where it was before the jump was followed. The bytes defining the jump are skipped and matching continues from there.
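
The save and restore of the cursor can be sketched for the `e8 $ { ... }` case as below. The sub pattern is simplified to literal bytes, the displacement is assumed to be relative to the end of the operand (x86 convention), and `match_call_then_resume` is a hypothetical name.

```rust
/// Checks `sub` at the target of the 4-byte relative jump at `cursor`, then
/// resumes after the displacement bytes. Returns the resumed cursor on success.
fn match_call_then_resume(image: &[u8], cursor: usize, sub: &[u8]) -> Option<usize> {
    let after_operand = cursor.checked_add(4)?;
    let b = image.get(cursor..after_operand)?;
    let disp = i32::from_le_bytes([b[0], b[1], b[2], b[3]]) as i64;
    // Assumed x86 convention: target is relative to the end of the operand.
    let dest = (after_operand as i64).checked_add(disp)?;
    if dest < 0 {
        return None;
    }
    let dest = dest as usize;
    // Match the sub pattern at the destination...
    if !image.get(dest..)?.starts_with(sub) {
        return None;
    }
    // ...then continue in the original stream, past the jump bytes.
    Some(after_operand)
}
```

On success the returned cursor points at the byte right after the displacement, so the remainder of the outer pattern (`83 f0 5c c3` in the example) is matched there.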

e8 $ @4

An @ followed by a number checks that the cursor is aligned at this point in the scan. The alignment is (1 << arg); in this example the cursor must be aligned to 16.
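
The check amounts to testing that the low `arg` bits of the cursor are zero. A minimal sketch, assuming a 32-bit cursor and a hypothetical helper name `is_aligned`:

```rust
/// Returns true if `cursor` is a multiple of `1 << arg`.
fn is_aligned(cursor: u32, arg: u32) -> bool {
    match 1u32.checked_shl(arg) {
        // The alignment is a power of two, so `align - 1` masks the low bits.
        Some(align) => (cursor & (align - 1)) == 0,
        // Shift amounts of 32 or more do not fit a 32-bit cursor.
        None => false,
    }
}
```

With `@4`, a cursor of `0x1010` passes while `0x1018` fails; the latter would pass `@3`.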

e8 i1 a0 u4

An i or u followed by an operand size indicates a memory read: i reads a signed value and u reads an unsigned value.

The read values are stored in the save array alongside the bookmarked addresses (single quotes). This means the values are sign-extended (i) or zero-extended (u) to the width of a save slot before being stored. Operand sizes are 1 (byte), 2 (word) or 4 (dword).

The cursor is advanced by the size of the operand.
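
The difference between the two extensions is easiest to see on a single byte. The sketch below assumes 32-bit save slots; `read_i1` and `read_u1` are hypothetical helpers that return the widened value together with the advanced cursor.

```rust
/// `i1`: reads one byte as signed and sign-extends it to a 32-bit slot.
fn read_i1(image: &[u8], cursor: usize) -> Option<(u32, usize)> {
    let value = *image.get(cursor)? as i8 as i32 as u32;
    // The cursor advances by the operand size (1 byte).
    Some((value, cursor + 1))
}

/// `u1`: reads one byte as unsigned and zero-extends it to a 32-bit slot.
fn read_u1(image: &[u8], cursor: usize) -> Option<(u32, usize)> {
    let value = *image.get(cursor)? as u32;
    Some((value, cursor + 1))
}
```

For the byte `a0`, the signed read stores `0xffffffa0` while the unsigned read stores `0x000000a0`.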

83 c0 2a ( 6a ? | 68 ? ? ? ? ) e8

Parentheses indicate alternate subpatterns separated by a pipe character.

The scanner attempts to match the alternate subpatterns from left to right and fails if none of them match.
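
Left-to-right alternation can be sketched as a loop that returns on the first alternative that matches. The alternatives are simplified to byte patterns with wildcards, and `matches_prefix` and `first_alternative` are hypothetical names rather than pelite functions.

```rust
/// Returns true if `bytes` begins with `pattern`; `None` matches any byte.
fn matches_prefix(bytes: &[u8], pattern: &[Option<u8>]) -> bool {
    bytes.len() >= pattern.len()
        && pattern
            .iter()
            .zip(bytes)
            .all(|(p, b)| p.map_or(true, |v| v == *b))
}

/// Tries each alternative in order and returns the index and length of the
/// first one that matches, or None if none of them match.
fn first_alternative(bytes: &[u8], alts: &[&[Option<u8>]]) -> Option<(usize, usize)> {
    for (i, alt) in alts.iter().enumerate() {
        if matches_prefix(bytes, alt) {
            return Some((i, alt.len()));
        }
    }
    None
}
```

For `( 6a ? | 68 ? ? ? ? )`, input starting with `68` selects the second alternative and consumes five bytes, after which matching would continue with `e8`.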