Struct Parser 

pub struct Parser<'a> { /* private fields */ }

Provides access to low-level file and memory parsing utilities.

The crate::Parser type is used for decoding CIL bytecode and metadata streams.

§Usage Examples

use dotscope::{Parser, assembly::decode_instruction};
let code = [0x2A]; // ret
let mut parser = Parser::new(&code);
let instr = decode_instruction(&mut parser, 0x1000)?;
assert_eq!(instr.mnemonic, "ret");

A generic binary data parser for reading .NET metadata structures.

Parser provides a cursor-based interface for reading binary data in both little-endian and big-endian formats. It’s designed specifically for parsing .NET metadata structures that follow ECMA-335 specifications, including method signatures, type signatures, custom attributes, and marshalling data.

The parser maintains an internal position cursor and provides bounds checking to prevent buffer overruns when reading malformed or truncated data.

§Features

  • Bounds checking: All read operations validate data availability
  • Endianness support: Both little-endian and big-endian reading
  • Position tracking: Maintains current offset for sequential parsing
  • Flexible seeking: Random access to any position within the data
  • Type safety: Strongly typed reading methods for common data types
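The cursor-plus-bounds-check pattern behind these features can be sketched in a few lines of standalone Rust. This is an illustration only, not dotscope's implementation: `Cursor`, `OutOfBounds`, `take`, `read_u32_le` and `read_u16_be` are hypothetical names, and the choice to let `seek` accept an offset equal to the data length is an assumption of the sketch.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub struct OutOfBounds;

/// Hypothetical cursor over a borrowed byte slice.
pub struct Cursor<'a> {
    data: &'a [u8],
    pos: usize,
}

impl<'a> Cursor<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        Cursor { data, pos: 0 }
    }

    pub fn pos(&self) -> usize {
        self.pos
    }

    /// Move to an absolute offset; offsets past the end are rejected.
    pub fn seek(&mut self, pos: usize) -> Result<(), OutOfBounds> {
        if pos > self.data.len() {
            return Err(OutOfBounds);
        }
        self.pos = pos;
        Ok(())
    }

    /// Take the next `n` bytes, or fail without moving the cursor.
    fn take(&mut self, n: usize) -> Result<&'a [u8], OutOfBounds> {
        let end = self.pos.checked_add(n).ok_or(OutOfBounds)?;
        let bytes = self.data.get(self.pos..end).ok_or(OutOfBounds)?;
        self.pos = end;
        Ok(bytes)
    }

    /// Read four bytes as a little-endian u32.
    pub fn read_u32_le(&mut self) -> Result<u32, OutOfBounds> {
        let b = self.take(4)?;
        Ok(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    }

    /// Read two bytes as a big-endian u16.
    pub fn read_u16_be(&mut self) -> Result<u16, OutOfBounds> {
        let b = self.take(2)?;
        Ok(u16::from_be_bytes([b[0], b[1]]))
    }
}
```

Because every read goes through `take`, a truncated buffer produces an error and leaves the position where it was, rather than panicking or reading past the end.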

§Examples

use dotscope::Parser;

let data = [0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08];
let mut parser = Parser::new(&data);

// Read little-endian values
let first = parser.read_le::<u32>()?;
assert_eq!(first, 0x04030201);

// Seek to a specific position
parser.seek(6)?;
let last_bytes = parser.read_le::<u16>()?;
assert_eq!(last_bytes, 0x0807);

§Metadata Parsing Usage

The Parser handles compressed integers, variable-length encodings, and complex metadata structures found in .NET assemblies. It supports reading calling conventions, parameter counts, type signatures, and other binary metadata formats efficiently.

Implementations§

impl<'a> Parser<'a>

pub fn new(data: &'a [u8]) -> Self

Create a new crate::file::parser::Parser from a byte slice.

§Arguments
  • data - The byte slice to read from
§Examples
use dotscope::Parser;
let data = [0x01, 0x02, 0x03, 0x04];
let parser = Parser::new(&data);
assert_eq!(parser.len(), 4);

pub fn len(&self) -> usize

Returns the length of the underlying data buffer.

§Examples
use dotscope::Parser;
let data = [0x01, 0x02, 0x03];
let parser = Parser::new(&data);
assert_eq!(parser.len(), 3);

pub fn is_empty(&self) -> bool

Returns true if the parser has no data.

§Examples
use dotscope::Parser;
let empty_data = [];
let parser = Parser::new(&empty_data);
assert!(parser.is_empty());

let data = [0x01];
let parser = Parser::new(&data);
assert!(!parser.is_empty());

pub fn has_more_data(&self) -> bool

Returns true if there is more data available to parse.

This checks if the current position is before the end of the data buffer.

§Examples
use dotscope::Parser;
let data = [0x01, 0x02];
let mut parser = Parser::new(&data);
assert!(parser.has_more_data());

let _byte = parser.read_le::<u8>()?;
assert!(parser.has_more_data());

let _byte = parser.read_le::<u8>()?;
assert!(!parser.has_more_data());

pub fn seek(&mut self, pos: usize) -> Result<()>

Move the current position to the specified index.

§Arguments
  • pos - The position to move the cursor to
§Errors

Returns crate::Error::OutOfBounds if position is beyond the data length.

§Examples
use dotscope::Parser;
let data = [0x01, 0x02, 0x03, 0x04];
let mut parser = Parser::new(&data);

parser.seek(2)?;
assert_eq!(parser.pos(), 2);
let value = parser.read_le::<u8>()?;
assert_eq!(value, 0x03);

pub fn advance(&mut self) -> Result<()>

Move the position forward by one byte.

§Errors

Returns crate::Error::OutOfBounds if advancing would exceed the data length.

§Examples
use dotscope::Parser;
let data = [0x01, 0x02, 0x03];
let mut parser = Parser::new(&data);

assert_eq!(parser.pos(), 0);
parser.advance()?;
assert_eq!(parser.pos(), 1);

pub fn advance_by(&mut self, step: usize) -> Result<()>

Move the position forward by the specified number of bytes.

§Arguments
  • step - Amount of bytes to advance
§Errors

Returns crate::Error::OutOfBounds if advancing by step would exceed the data length.

§Examples
use dotscope::Parser;
let data = [0x01, 0x02, 0x03, 0x04, 0x05];
let mut parser = Parser::new(&data);

assert_eq!(parser.pos(), 0);
parser.advance_by(3)?;
assert_eq!(parser.pos(), 3);

pub fn pos(&self) -> usize

Get the current position of the parser within the data buffer.

§Examples
use dotscope::Parser;
let data = [0x01, 0x02, 0x03];
let mut parser = Parser::new(&data);

assert_eq!(parser.pos(), 0);
let _byte = parser.read_le::<u8>()?;
assert_eq!(parser.pos(), 1);

pub fn data(&self) -> &[u8]

Get access to the underlying data buffer.

§Examples
use dotscope::Parser;
let data = [0x01, 0x02, 0x03];
let parser = Parser::new(&data);
assert_eq!(parser.data(), &[0x01, 0x02, 0x03]);

pub fn peek_byte(&self) -> Result<u8>

Peek at the next byte without advancing the position.

§Errors

Returns crate::Error::OutOfBounds if position is at or beyond the data length.

§Examples
use dotscope::Parser;
let data = [0x01, 0x02, 0x03];
let mut parser = Parser::new(&data);

assert_eq!(parser.peek_byte()?, 0x01);
assert_eq!(parser.pos(), 0); // Position unchanged
let value = parser.read_le::<u8>()?;
assert_eq!(value, 0x01);
assert_eq!(parser.pos(), 1); // Now position advanced

pub fn peek_le<T: CilIO>(&self) -> Result<T>

Peek at a value of type T in little-endian format without advancing the position.

This method reads a value from the current position but does not modify the parser state, allowing inspection of upcoming data before deciding how to proceed.

§Errors

Returns crate::Error::OutOfBounds if reading T would exceed the data length.

§Examples
use dotscope::Parser;
let data = [0x01, 0x02, 0x03, 0x04];
let mut parser = Parser::new(&data);

// Peek at a u16 without advancing
let peeked: u16 = parser.peek_le()?;
assert_eq!(peeked, 0x0201);
assert_eq!(parser.pos(), 0); // Position unchanged

// Now read it for real
let value: u16 = parser.read_le()?;
assert_eq!(value, 0x0201);
assert_eq!(parser.pos(), 2); // Position advanced

pub fn transactional<T, F>(&mut self, f: F) -> Result<T>
where F: FnOnce(&mut Self) -> Result<T>,

Execute a closure transactionally, rolling back on failure.

This method saves the current parser position, executes the provided closure, and only commits the position change if the closure succeeds. If the closure returns Err, the parser position is restored to its original value.

This is useful for speculative parsing where you want to try parsing something and only consume the input if parsing succeeds.

§Arguments
  • f - A closure that takes a mutable reference to the parser and returns a Result<T>
§Returns

Returns the result of the closure. On success, the parser position reflects any advances made during parsing. On failure, the parser position is restored.

§Errors

Returns any error produced by the closure f. When an error is returned, the parser position is automatically restored to its state before the call.

§Examples
use dotscope::Parser;
let data = [0x01, 0x02, 0x03, 0x04];
let mut parser = Parser::new(&data);

// Try to parse - on success, position advances
let result: Result<u16, _> = parser.transactional(|p| p.read_le());
assert!(result.is_ok());
assert_eq!(parser.pos(), 2); // Position advanced on success

// Try to parse something that fails - position restored
let mut parser2 = Parser::new(&[0x01]);
let result: Result<u32, _> = parser2.transactional(|p| p.read_le());
assert!(result.is_err());
assert_eq!(parser2.pos(), 0); // Position restored on failure
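The save-and-restore idea matters most when a parse consumes some input before failing. The following standalone sketch shows that case; it is not the crate's code, and `Cursor`, `read_u8` and `read_pair` are hypothetical names with a unit error type chosen only to keep the example short.

```rust
/// Hypothetical minimal parser state for this sketch.
pub struct Cursor<'a> {
    data: &'a [u8],
    pos: usize,
}

impl<'a> Cursor<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        Cursor { data, pos: 0 }
    }

    pub fn pos(&self) -> usize {
        self.pos
    }

    /// Read one byte and advance; `Err(())` when the data is exhausted.
    pub fn read_u8(&mut self) -> Result<u8, ()> {
        let byte = *self.data.get(self.pos).ok_or(())?;
        self.pos += 1;
        Ok(byte)
    }

    /// Run `f`; keep its position changes only if it returns `Ok`.
    pub fn transactional<T, F>(&mut self, f: F) -> Result<T, ()>
    where
        F: FnOnce(&mut Self) -> Result<T, ()>,
    {
        let saved = self.pos;
        let result = f(self);
        if result.is_err() {
            self.pos = saved; // roll back on failure
        }
        result
    }
}

/// A two-step parse that can fail after the first step has advanced.
pub fn read_pair(c: &mut Cursor<'_>) -> Result<(u8, u8), ()> {
    Ok((c.read_u8()?, c.read_u8()?))
}
```

With three bytes of input, a first `transactional(|p| read_pair(p))` succeeds and leaves the position at 2. A second call reads the one remaining byte, fails on the next read, and the position is restored to 2 instead of being left at 3.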

pub fn align(&mut self, alignment: usize) -> Result<()>

Align the position to a specific boundary.

This advances the position to the next multiple of the specified alignment, which is useful when parsing data structures that require specific memory alignment.

§Arguments
  • alignment - The boundary to align to (must be a power of 2)
§Errors

Returns crate::Error::OutOfBounds if aligning would exceed the data length.

§Examples
use dotscope::Parser;
let data = [0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08];
let mut parser = Parser::new(&data);

parser.advance()?; // Position is now 1
parser.align(4)?;  // Align to 4-byte boundary
assert_eq!(parser.pos(), 4); // Position advanced to next 4-byte boundary
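For power-of-two alignments, rounding up is usually done with a mask. The helper below is a standalone sketch of that arithmetic; `align_up` is a hypothetical name, and the assumption that an already-aligned position stays where it is belongs to the sketch rather than to the documented behaviour.

```rust
/// Round `pos` up to the next multiple of `alignment`.
///
/// Returns `None` if `alignment` is not a power of two or if the
/// rounded value would overflow `usize`.
pub fn align_up(pos: usize, alignment: usize) -> Option<usize> {
    if !alignment.is_power_of_two() {
        return None;
    }
    let mask = alignment - 1;
    // Adding `mask` and then clearing the low bits rounds up.
    pos.checked_add(mask).map(|p| p & !mask)
}
```

Under these assumptions `align_up(1, 4)` gives `Some(4)`, matching the example above, and `align_up(4, 4)` also gives `Some(4)`.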

pub fn read_le<T: CilIO>(&mut self) -> Result<T>

Read a type T from the current position in little-endian format and advance the position.

§Errors

Returns crate::Error::OutOfBounds if reading would exceed the data length.

§Examples
use dotscope::Parser;
let data = [0x01, 0x02, 0x03, 0x04];
let mut parser = Parser::new(&data);

let value: u16 = parser.read_le()?;
assert_eq!(value, 0x0201); // Little-endian interpretation
assert_eq!(parser.pos(), 2);

pub fn read_be<T: CilIO>(&mut self) -> Result<T>

Read a type T from the current position in big-endian format and advance the position.

§Errors

Returns crate::Error::OutOfBounds if reading would exceed the data length.

§Examples
use dotscope::Parser;
let data = [0x01, 0x02, 0x03, 0x04];
let mut parser = Parser::new(&data);

let value: u16 = parser.read_be()?;
assert_eq!(value, 0x0102); // Big-endian interpretation
assert_eq!(parser.pos(), 2);

pub fn read_compressed_uint(&mut self) -> Result<u32>

Read a compressed unsigned integer as defined in ECMA-335 II.23.2.

Compressed integers use variable-length encoding to efficiently store small values:

  • Values 0-127: 1 byte (0xxxxxxx)
  • Values 128-16383: 2 bytes (10xxxxxx xxxxxxxx)
  • Values 16384-536870911: 4 bytes (110xxxxx xxxxxxxx xxxxxxxx xxxxxxxx)
§Errors

Returns crate::Error::OutOfBounds if reading would exceed the data length or crate::Error::Malformed for invalid compressed uint format.

§Examples
use dotscope::Parser;

// Single byte encoding (value < 128)
let data = [0x7F]; // Represents 127
let mut parser = Parser::new(&data);
assert_eq!(parser.read_compressed_uint()?, 127);

// Two byte encoding
let data = [0x80, 0x80]; // Represents 128
let mut parser = Parser::new(&data);
assert_eq!(parser.read_compressed_uint()?, 128);
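The three encodings can be told apart from the leading bits of the first byte. The standalone decoder below sketches that dispatch; it is not the crate's implementation. `decode_compressed_uint` is a hypothetical name, and returning `None` for both truncated and malformed input is a simplification (the method documented here distinguishes the two error kinds).

```rust
/// Decode an ECMA-335 compressed unsigned integer from the start of `data`.
/// Returns the value and the number of bytes consumed.
pub fn decode_compressed_uint(data: &[u8]) -> Option<(u32, usize)> {
    let first = *data.first()?;
    if first & 0x80 == 0 {
        // 0xxxxxxx: 7-bit value in one byte
        Some((u32::from(first), 1))
    } else if first & 0xC0 == 0x80 {
        // 10xxxxxx xxxxxxxx: 14-bit value, most significant bits first
        let second = *data.get(1)?;
        Some(((u32::from(first & 0x3F) << 8) | u32::from(second), 2))
    } else if first & 0xE0 == 0xC0 {
        // 110xxxxx plus three more bytes: 29-bit value
        let rest = data.get(1..4)?;
        let value = (u32::from(first & 0x1F) << 24)
            | (u32::from(rest[0]) << 16)
            | (u32::from(rest[1]) << 8)
            | u32::from(rest[2]);
        Some((value, 4))
    } else {
        // 111xxxxx is not a valid lead byte
        None
    }
}
```

Note that the multi-byte forms store the most significant bits first, unlike the little-endian fixed-width reads elsewhere in this type.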

pub fn read_compressed_int(&mut self) -> Result<i32>

Read a compressed signed integer as defined in ECMA-335 II.23.2.

Compressed signed integers use the same variable-length encoding as unsigned integers but with the least significant bit indicating the sign and the remaining bits shifted right.

§Errors

Returns crate::Error::OutOfBounds if reading would exceed the data length or crate::Error::Malformed for invalid encoding.

§Examples
use dotscope::Parser;

// Positive number: 10 encoded as 20 (10 << 1 | 0)
let data = [20];
let mut parser = Parser::new(&data);
assert_eq!(parser.read_compressed_int()?, 10);

// Negative number: -5 encoded as 9 ((5-1) << 1 | 1)
let data = [9];
let mut parser = Parser::new(&data);
assert_eq!(parser.read_compressed_int()?, -5);

pub fn read_compressed_token(&mut self) -> Result<Token>

Read a compressed token as defined in ECMA-335 II.23.2.8 (TypeDefOrRefOrSpecEncoded).

Compressed tokens encode type references using the 2 lowest bits as a tag and the remaining bits as the table row index. The tag determines which metadata table:

Tag   Table                Token Prefix
0x0   TypeDef              0x0200_0000
0x1   TypeRef              0x0100_0000
0x2   TypeSpec             0x1B00_0000
0x3   (reserved/invalid)   -

Tag 0x3 is reserved and currently unused by the ECMA-335 specification. Encountering this tag value indicates a malformed compressed token.

§Errors

Returns crate::Error::OutOfBounds if reading would exceed the data length or crate::Error::Malformed if tag 0x3 is encountered (invalid encoding).

§Examples
use dotscope::Parser;

// TypeRef token (tag 0x1, index 1) encoded as (1 << 2) | 0x1 = 5
let data = [5];
let mut parser = Parser::new(&data);
let token = parser.read_compressed_token()?;
assert_eq!(token.value(), 0x01000001); // TypeRef table with index 1
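The tag-and-index split can be written as a small standalone function. This sketch works on the already-decoded compressed value and returns a raw `u32` rather than the crate's `Token` type; `expand_type_token` is a hypothetical name.

```rust
/// Expand a decoded TypeDefOrRefOrSpecEncoded value into a full token.
/// `encoded` is the value obtained after compressed-uint decoding.
pub fn expand_type_token(encoded: u32) -> Option<u32> {
    let prefix = match encoded & 0x3 {
        0x0 => 0x0200_0000, // TypeDef
        0x1 => 0x0100_0000, // TypeRef
        0x2 => 0x1B00_0000, // TypeSpec
        _ => return None,   // tag 0x3 is reserved
    };
    // The remaining bits are the row index within the selected table.
    Some(prefix | (encoded >> 2))
}
```

For the encoded value 5 this yields `Some(0x0100_0001)`, the same token as in the example above.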

pub fn read_7bit_encoded_int(&mut self) -> Result<u32>

Read a 7-bit encoded integer (used in .NET for variable-length encoding).

This encoding uses the most significant bit of each byte as a continuation flag. If set, the next byte is part of the value. The value is reconstructed by concatenating the lower 7 bits of each byte in little-endian order.

§Errors

Returns crate::Error::OutOfBounds if reading would exceed the data length or crate::Error::Malformed for invalid encoding (overflow).

§Examples
use dotscope::Parser;

// Single byte: 127 (0x7F)
let data = [0x7F];
let mut parser = Parser::new(&data);
assert_eq!(parser.read_7bit_encoded_int()?, 127);

// Two bytes: 128 (0x80 0x01)
let data = [0x80, 0x01];
let mut parser = Parser::new(&data);
assert_eq!(parser.read_7bit_encoded_int()?, 128);
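The continuation-bit scheme can be sketched as a loop that shifts each 7-bit group into place. The function below is a standalone illustration, not the crate's code; `decode_7bit_encoded_int` is a hypothetical name, and the overflow rule (at most five bytes, with the fifth limited to four payload bits) is the sketch's reading of "overflow" for a `u32` result.

```rust
/// Decode a 7-bit encoded integer from the start of `data`.
/// Returns the value and the number of bytes consumed.
pub fn decode_7bit_encoded_int(data: &[u8]) -> Option<(u32, usize)> {
    let mut value: u32 = 0;
    for (i, &byte) in data.iter().enumerate() {
        let shift = 7 * i as u32;
        let chunk = u32::from(byte & 0x7F);
        // Reject a sixth byte, and a fifth byte with bits above bit 31.
        if shift >= 32 || (shift == 28 && chunk > 0x0F) {
            return None;
        }
        value |= chunk << shift;
        if byte & 0x80 == 0 {
            // Continuation bit clear: this was the last byte.
            return Some((value, i + 1));
        }
    }
    // Ran out of data while the continuation bit was still set.
    None
}
```

The bytes `[0xAC, 0x02]` decode to 300 with this sketch: the low group contributes 0x2C (44) and the second group contributes 2 shifted left by 7 (256).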

pub fn read_string_utf8(&mut self) -> Result<String>

Read a UTF-8 encoded null-terminated string.

Reads bytes from the current position until a null terminator (0x00) is found, then decodes the bytes as UTF-8. The position is advanced past the null terminator.

§Errors

Returns crate::Error::OutOfBounds if reading would exceed the data length or crate::Error::Malformed for invalid UTF-8 encoding.

§Examples
use dotscope::Parser;

let data = b"Hello\0World\0";
let mut parser = Parser::new(data);

let first = parser.read_string_utf8()?;
assert_eq!(first, "Hello");

let second = parser.read_string_utf8()?;
assert_eq!(second, "World");
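The scan-for-terminator step can be illustrated on a plain byte slice. This is a standalone sketch rather than the crate's implementation: `split_cstr` is a hypothetical name, it returns a borrowed `&str` instead of an owned `String`, and it reports every failure as `None`.

```rust
/// Split a NUL-terminated UTF-8 string off the front of `data`.
/// Returns the string and the bytes that follow the terminator.
pub fn split_cstr(data: &[u8]) -> Option<(&str, &[u8])> {
    // Find the terminator; input without one is rejected.
    let nul = data.iter().position(|&b| b == 0)?;
    // Validate only the bytes before the terminator.
    let text = std::str::from_utf8(&data[..nul]).ok()?;
    Some((text, &data[nul + 1..]))
}
```

Calling it repeatedly on the returned remainder walks a sequence of strings in the same way as the two consecutive reads in the example above.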

pub fn read_prefixed_string_utf8(&mut self) -> Result<String>

Read a length-prefixed UTF-8 string.

The string length is encoded as a 7-bit encoded integer, followed by that many UTF-8 bytes. This format is commonly used in .NET metadata streams.

§Errors

Returns crate::Error::OutOfBounds if reading would exceed the data length or crate::Error::Malformed for invalid UTF-8 encoding.

§Examples
use dotscope::Parser;

// Length 5, followed by "Hello"
let data = [5, b'H', b'e', b'l', b'l', b'o'];
let mut parser = Parser::new(&data);

let result = parser.read_prefixed_string_utf8()?;
assert_eq!(result, "Hello");

pub fn read_prefixed_string_utf8_ref(&mut self) -> Result<&'a str>

Read a 7-bit encoded length-prefixed UTF-8 string as a borrowed slice (zero-copy).

This is the zero-copy variant of read_prefixed_string_utf8. Instead of allocating a new String, it returns a borrowed &str slice directly into the underlying data buffer. This is ideal for large strings or performance-critical code where you want to avoid allocations.

The string length is encoded as a 7-bit encoded integer (the same prefix format as read_prefixed_string_utf8), followed by that many UTF-8 bytes.

§Lifetime

The returned string slice borrows from the parser’s underlying data buffer and has the same lifetime 'a as the parser.

§Errors

Returns crate::Error::OutOfBounds if reading would exceed the data length or crate::Error::Malformed for invalid UTF-8 encoding.

§Examples
use dotscope::Parser;

// Length 5, followed by "Hello"
let data = [5, b'H', b'e', b'l', b'l', b'o'];
let mut parser = Parser::new(&data);

let result = parser.read_prefixed_string_utf8_ref()?;
assert_eq!(result, "Hello");
// No allocation - result borrows from data
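What the `'a` lifetime buys in practice is that returned slices can outlive the reader itself. The standalone sketch below demonstrates this; it is not the crate's code. `Reader`, `read_short_str` and `read_all` are hypothetical names, and the length prefix is simplified to a single byte (the one-byte case of the 7-bit encoding).

```rust
/// Hypothetical reader for this sketch.
pub struct Reader<'a> {
    data: &'a [u8],
    pos: usize,
}

impl<'a> Reader<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        Reader { data, pos: 0 }
    }

    /// Read a one-byte length followed by that many UTF-8 bytes.
    /// The result borrows from the input (`'a`), not from `&mut self`.
    pub fn read_short_str(&mut self) -> Option<&'a str> {
        let len = usize::from(*self.data.get(self.pos)?);
        let start = self.pos + 1;
        let bytes = self.data.get(start..start + len)?;
        let text = std::str::from_utf8(bytes).ok()?;
        self.pos = start + len;
        Some(text)
    }
}

/// Collect every string in `data`. The returned slices are still valid
/// after `reader` is dropped because they point into `data`.
pub fn read_all<'a>(data: &'a [u8]) -> Vec<&'a str> {
    let mut reader = Reader::new(data);
    let mut out = Vec::new();
    while let Some(s) = reader.read_short_str() {
        out.push(s);
    }
    out
}
```

If `read_short_str` returned `&str` tied to `&mut self` instead of `&'a str`, `read_all` would not compile, since each pushed slice would keep the reader mutably borrowed.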

pub fn read_compressed_string_utf8(&mut self) -> Result<String>

Read a compressed uint length-prefixed UTF-8 string.

The string length is encoded as a compressed unsigned integer according to ECMA-335, followed by that many UTF-8 bytes. This format is used for strings in custom attributes, security permissions, and other metadata structures that follow ECMA-335 blob format.

§Errors

Returns crate::Error::OutOfBounds if reading would exceed the data length or crate::Error::Malformed for invalid UTF-8 encoding.

§Examples
use dotscope::Parser;

// Length 5 (compressed uint), followed by "Hello"
let data = [5, b'H', b'e', b'l', b'l', b'o'];
let mut parser = Parser::new(&data);

let result = parser.read_compressed_string_utf8()?;
assert_eq!(result, "Hello");

pub fn remaining(&self) -> usize

Returns the number of bytes remaining from the current position.

This is useful for checking available data before reading operations or for implementing consistent bounds checking patterns.

§Examples
use dotscope::Parser;
let data = [0x01, 0x02, 0x03, 0x04, 0x05];
let mut parser = Parser::new(&data);

assert_eq!(parser.remaining(), 5);
parser.advance_by(2)?;
assert_eq!(parser.remaining(), 3);

pub fn ensure_remaining(&self, needed: usize) -> Result<()>

Ensures that at least needed bytes are available from the current position.

This method provides a standardized way to validate data availability before performing read operations. It returns a descriptive error when insufficient data is available.

§Arguments
  • needed - The number of bytes required from the current position
§Errors

Returns crate::Error::OutOfBounds if fewer than needed bytes remain.

§Examples
use dotscope::Parser;
let data = [0x01, 0x02, 0x03];
let mut parser = Parser::new(&data);

parser.ensure_remaining(3)?;  // OK
parser.advance()?;
parser.ensure_remaining(2)?;  // OK
// parser.ensure_remaining(3)?;  // Would fail - only 2 bytes remaining

pub fn calc_end_position(&self, length: usize) -> Result<usize>

Calculates an end position safely with overflow checking.

Computes self.position + length while checking for arithmetic overflow and ensuring the result doesn’t exceed the data bounds.

§Arguments
  • length - The length to add to the current position
§Errors

Returns crate::Error::OutOfBounds if the calculation would overflow or if the resulting position exceeds the data length.

§Examples
use dotscope::Parser;
let data = [0x01, 0x02, 0x03, 0x04, 0x05];
let mut parser = Parser::new(&data);

let end = parser.calc_end_position(3)?;
assert_eq!(end, 3);

parser.seek(2)?;
let end = parser.calc_end_position(2)?;
assert_eq!(end, 4);
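The overflow check is the part that is easy to get wrong by hand. A standalone sketch of the computation, using the standard library's `checked_add`, follows; `end_position` is a hypothetical free function that takes the position and data length as arguments instead of reading them from a parser.

```rust
/// Compute `pos + length`, rejecting overflow and results past `data_len`.
pub fn end_position(pos: usize, length: usize, data_len: usize) -> Option<usize> {
    // `checked_add` turns wrap-around into `None` instead of a small bogus value.
    let end = pos.checked_add(length)?;
    if end <= data_len {
        Some(end)
    } else {
        None
    }
}
```

A length field taken from untrusted input, such as `usize::MAX`, is therefore rejected rather than wrapping around to an in-bounds offset.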

pub fn read_bytes(&mut self, length: usize) -> Result<&'a [u8]>

Reads a slice of bytes of the specified length from the current position.

This method performs bounds checking and advances the position after reading. It’s useful when you need to read a chunk of raw bytes rather than a specific type.

§Arguments
  • length - The number of bytes to read
§Errors

Returns crate::Error::OutOfBounds if reading length bytes would exceed the data.

§Examples
use dotscope::Parser;
let data = [0x01, 0x02, 0x03, 0x04, 0x05];
let mut parser = Parser::new(&data);

let chunk = parser.read_bytes(3)?;
assert_eq!(chunk, &[0x01, 0x02, 0x03]);
assert_eq!(parser.pos(), 3);

pub fn read_prefixed_string_utf16(&mut self) -> Result<String>

Read a length-prefixed UTF-16 string.

The string length is encoded as a 7-bit encoded integer (in bytes), followed by that many UTF-16 bytes in little-endian format. This format is used for wide character strings in .NET metadata.

§Errors

Returns crate::Error::OutOfBounds if reading would exceed the data length or crate::Error::Malformed for invalid UTF-16 encoding or odd byte length.

§Examples
use dotscope::Parser;

// Length 10 bytes (5 UTF-16 chars), followed by "Hello" in UTF-16 LE
let data = [10, 0x48, 0x00, 0x65, 0x00, 0x6C, 0x00, 0x6C, 0x00, 0x6F, 0x00];
let mut parser = Parser::new(&data);

let result = parser.read_prefixed_string_utf16()?;
assert_eq!(result, "Hello");
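The decoding step after the length prefix, including the two failure cases named under Errors, can be sketched with the standard library alone. This is an illustration, not the crate's implementation; `decode_utf16_le` is a hypothetical name and it collapses both failures into `None`.

```rust
/// Decode little-endian UTF-16 bytes into a `String`.
pub fn decode_utf16_le(bytes: &[u8]) -> Option<String> {
    if bytes.len() % 2 != 0 {
        // An odd byte length cannot hold whole UTF-16 code units.
        return None;
    }
    let units: Vec<u16> = bytes
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
        .collect();
    // `from_utf16` fails on unpaired surrogates.
    String::from_utf16(&units).ok()
}
```

Applied to the ten payload bytes from the example above (everything after the length byte), the sketch returns `Some("Hello".to_string())`.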

Auto Trait Implementations§

impl<'a> Freeze for Parser<'a>

impl<'a> RefUnwindSafe for Parser<'a>

impl<'a> Send for Parser<'a>

impl<'a> Sync for Parser<'a>

impl<'a> Unpin for Parser<'a>

impl<'a> UnwindSafe for Parser<'a>

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, A> IntoAst<A> for T
where T: Into<A>, A: Ast,

fn into_ast(self, _a: &A) -> A

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> Pointable for T

const ALIGN: usize

The alignment of pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T> Same for T

type Output = T

Should always be Self

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.