Struct Parser

pub struct Parser<'a> { /* private fields */ }
Provides access to low-level file and memory parsing utilities.

The crate::Parser type is used for decoding CIL bytecode and metadata streams.

§Usage Examples

use dotscope::{Parser, disassembler::decode_instruction};
let code = [0x2A]; // ret
let mut parser = Parser::new(&code);
let instr = decode_instruction(&mut parser, 0x1000)?;
assert_eq!(instr.mnemonic, "ret");

A generic binary data parser for reading .NET metadata structures.

Parser provides a cursor-based interface for reading binary data in both little-endian and big-endian formats. It’s designed specifically for parsing .NET metadata structures that follow ECMA-335 specifications, including method signatures, type signatures, custom attributes, and marshalling data.

The parser maintains an internal position cursor and provides bounds checking to prevent buffer overruns when reading malformed or truncated data.

§Features

  • Bounds checking: All read operations validate data availability
  • Endianness support: Both little-endian and big-endian reading
  • Position tracking: Maintains current offset for sequential parsing
  • Flexible seeking: Random access to any position within the data
  • Type safety: Strongly typed reading methods for common data types

§Examples

use dotscope::Parser;

let data = [0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08];
let mut parser = Parser::new(&data);

// Read little-endian values
let first = parser.read_le::<u32>()?;
assert_eq!(first, 0x04030201);

// Seek to a specific position
parser.seek(6)?;
let last_bytes = parser.read_le::<u16>()?;
assert_eq!(last_bytes, 0x0807);

§Metadata Parsing Usage

The Parser handles compressed integers, variable-length encodings, and complex metadata structures found in .NET assemblies. It supports reading calling conventions, parameter counts, type signatures, and other binary metadata formats efficiently.

Implementations§

impl<'a> Parser<'a>

pub fn new(data: &'a [u8]) -> Self

Creates a new Parser from a byte slice. The cursor starts at position 0.

§Arguments
  • data - The byte slice to read from
§Examples
use dotscope::Parser;
let data = [0x01, 0x02, 0x03, 0x04];
let parser = Parser::new(&data);
assert_eq!(parser.len(), 4);

pub fn len(&self) -> usize

Returns the length of the underlying data buffer.

§Examples
use dotscope::Parser;
let data = [0x01, 0x02, 0x03];
let parser = Parser::new(&data);
assert_eq!(parser.len(), 3);

pub fn is_empty(&self) -> bool

Returns true if the parser has no data.

§Examples
use dotscope::Parser;
let empty_data = [];
let parser = Parser::new(&empty_data);
assert!(parser.is_empty());

let data = [0x01];
let parser = Parser::new(&data);
assert!(!parser.is_empty());

pub fn has_more_data(&self) -> bool

Returns true if there is more data available to parse.

This checks if the current position is before the end of the data buffer.

§Examples
use dotscope::Parser;
let data = [0x01, 0x02];
let mut parser = Parser::new(&data);
assert!(parser.has_more_data());

let _byte = parser.read_le::<u8>()?;
assert!(parser.has_more_data());

let _byte = parser.read_le::<u8>()?;
assert!(!parser.has_more_data());

pub fn seek(&mut self, pos: usize) -> Result<()>

Move the current position to the specified index.

§Arguments
  • pos - The position to move the cursor to
§Errors

Returns crate::Error::OutOfBounds if position is beyond the data length.

§Examples
use dotscope::Parser;
let data = [0x01, 0x02, 0x03, 0x04];
let mut parser = Parser::new(&data);

parser.seek(2)?;
assert_eq!(parser.pos(), 2);
let value = parser.read_le::<u8>()?;
assert_eq!(value, 0x03);

pub fn advance(&mut self) -> Result<()>

Move the position forward by one byte.

§Errors

Returns crate::Error::OutOfBounds if advancing would exceed the data length.

§Examples
use dotscope::Parser;
let data = [0x01, 0x02, 0x03];
let mut parser = Parser::new(&data);

assert_eq!(parser.pos(), 0);
parser.advance()?;
assert_eq!(parser.pos(), 1);

pub fn advance_by(&mut self, step: usize) -> Result<()>

Move the position forward by the specified number of bytes.

§Arguments
  • step - Amount of bytes to advance
§Errors

Returns crate::Error::OutOfBounds if advancing by step would exceed the data length.

§Examples
use dotscope::Parser;
let data = [0x01, 0x02, 0x03, 0x04, 0x05];
let mut parser = Parser::new(&data);

assert_eq!(parser.pos(), 0);
parser.advance_by(3)?;
assert_eq!(parser.pos(), 3);

pub fn pos(&self) -> usize

Get the current position of the parser within the data buffer.

§Examples
use dotscope::Parser;
let data = [0x01, 0x02, 0x03];
let mut parser = Parser::new(&data);

assert_eq!(parser.pos(), 0);
let _byte = parser.read_le::<u8>()?;
assert_eq!(parser.pos(), 1);

pub fn data(&self) -> &[u8]

Get access to the underlying data buffer.

§Examples
use dotscope::Parser;
let data = [0x01, 0x02, 0x03];
let parser = Parser::new(&data);
assert_eq!(parser.data(), &[0x01, 0x02, 0x03]);

pub fn peek_byte(&self) -> Result<u8>

Peek at the next byte without advancing the position.

§Errors

Returns crate::Error::OutOfBounds if position is at or beyond the data length.

§Examples
use dotscope::Parser;
let data = [0x01, 0x02, 0x03];
let mut parser = Parser::new(&data);

assert_eq!(parser.peek_byte()?, 0x01);
assert_eq!(parser.pos(), 0); // Position unchanged
let value = parser.read_le::<u8>()?;
assert_eq!(value, 0x01);
assert_eq!(parser.pos(), 1); // Now position advanced

pub fn align(&mut self, alignment: usize) -> Result<()>

Align the position to a specific boundary.

This advances the position to the next multiple of the specified alignment, which is useful when parsing data structures that require specific memory alignment.

§Arguments
  • alignment - The boundary to align to (must be a power of 2)
§Errors

Returns crate::Error::OutOfBounds if aligning would exceed the data length.

§Examples
use dotscope::Parser;
let data = [0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08];
let mut parser = Parser::new(&data);

parser.advance()?; // Position is now 1
parser.align(4)?;  // Align to 4-byte boundary
assert_eq!(parser.pos(), 4); // Position advanced to next 4-byte boundary
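
The boundary arithmetic can be illustrated in isolation. The helper below is a hypothetical `align_up` function, not part of dotscope; it assumes a power-of-two alignment and leaves an already aligned position unchanged, which may or may not match what `Parser::align` does in that case.

```rust
/// Round `pos` up to the next multiple of `alignment`.
/// Returns `None` if `alignment` is not a power of two or the sum overflows.
fn align_up(pos: usize, alignment: usize) -> Option<usize> {
    if !alignment.is_power_of_two() {
        return None;
    }
    // For a power of two, `alignment - 1` is a mask of the low bits to clear.
    let mask = alignment - 1;
    pos.checked_add(mask).map(|p| p & !mask)
}
```

A parser would additionally compare the result against the buffer length before moving the cursor, which is where an out-of-bounds error would come from.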

pub fn read_le<T: CilIO>(&mut self) -> Result<T>

Read a type T from the current position in little-endian format and advance the position.

§Errors

Returns crate::Error::OutOfBounds if reading would exceed the data length.

§Examples
use dotscope::Parser;
let data = [0x01, 0x02, 0x03, 0x04];
let mut parser = Parser::new(&data);

let value: u16 = parser.read_le()?;
assert_eq!(value, 0x0201); // Little-endian interpretation
assert_eq!(parser.pos(), 2);

pub fn read_be<T: CilIO>(&mut self) -> Result<T>

Read a type T from the current position in big-endian format and advance the position.

§Errors

Returns crate::Error::OutOfBounds if reading would exceed the data length.

§Examples
use dotscope::Parser;
let data = [0x01, 0x02, 0x03, 0x04];
let mut parser = Parser::new(&data);

let value: u16 = parser.read_be()?;
assert_eq!(value, 0x0102); // Big-endian interpretation
assert_eq!(parser.pos(), 2);
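
For comparison, the same two byte orders can be decoded with the standard library's `from_le_bytes` and `from_be_bytes`. The helper names `read_u16_le` and `read_u16_be` are hypothetical and only mirror what `read_le` and `read_be` are documented to return for a `u16`.

```rust
/// Decode a little-endian u16 at `pos`, or `None` if two bytes are not available.
fn read_u16_le(data: &[u8], pos: usize) -> Option<u16> {
    let end = pos.checked_add(2)?;
    let s = data.get(pos..end)?;
    Some(u16::from_le_bytes([s[0], s[1]]))
}

/// Decode a big-endian u16 at `pos`, or `None` if two bytes are not available.
fn read_u16_be(data: &[u8], pos: usize) -> Option<u16> {
    let end = pos.checked_add(2)?;
    let s = data.get(pos..end)?;
    Some(u16::from_be_bytes([s[0], s[1]]))
}
```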

pub fn read_compressed_uint(&mut self) -> Result<u32>

Read a compressed unsigned integer as defined in ECMA-335 II.23.2.

Compressed integers use variable-length encoding to efficiently store small values:

  • Values 0-127: 1 byte (0xxxxxxx)
  • Values 128-16383: 2 bytes (10xxxxxx xxxxxxxx)
  • Values 16384-536870911: 4 bytes (110xxxxx xxxxxxxx xxxxxxxx xxxxxxxx)
§Errors

Returns crate::Error::OutOfBounds if reading would exceed the data length or crate::Error::Malformed for invalid compressed uint format.

§Examples
use dotscope::Parser;

// Single byte encoding (value < 128)
let data = [0x7F]; // Represents 127
let mut parser = Parser::new(&data);
assert_eq!(parser.read_compressed_uint()?, 127);

// Two byte encoding
let data = [0x80, 0x80]; // Represents 128
let mut parser = Parser::new(&data);
assert_eq!(parser.read_compressed_uint()?, 128);
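
The decoding rule can also be written out by hand. The function below is a standalone sketch of the ECMA-335 II.23.2 unsigned encoding, not dotscope's code: `decode_compressed_uint` is a hypothetical name, it returns the value together with the number of bytes consumed, and it uses `Option` where the crate would report OutOfBounds or Malformed. The four-byte form is recognised by the leading bits 110.

```rust
/// Decode one ECMA-335 compressed unsigned integer from the start of `data`.
/// Returns `(value, bytes_consumed)`, or `None` for truncated or invalid input.
fn decode_compressed_uint(data: &[u8]) -> Option<(u32, usize)> {
    let first = *data.first()?;
    if first & 0x80 == 0 {
        // 0xxxxxxx: 7-bit value in one byte.
        Some((u32::from(first), 1))
    } else if first & 0xC0 == 0x80 {
        // 10xxxxxx xxxxxxxx: 14-bit value, big-endian.
        let b1 = *data.get(1)?;
        Some(((u32::from(first & 0x3F) << 8) | u32::from(b1), 2))
    } else if first & 0xE0 == 0xC0 {
        // 110xxxxx + 3 bytes: 29-bit value, big-endian.
        let rest = data.get(1..4)?;
        let value = (u32::from(first & 0x1F) << 24)
            | (u32::from(rest[0]) << 16)
            | (u32::from(rest[1]) << 8)
            | u32::from(rest[2]);
        Some((value, 4))
    } else {
        // 111xxxxx is not a valid leading byte.
        None
    }
}
```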

pub fn read_compressed_int(&mut self) -> Result<i32>

Read a compressed signed integer as defined in ECMA-335 II.23.2.

Compressed signed integers use the same variable-length encoding as unsigned integers. After the unsigned value is decoded, its least significant bit gives the sign and the remaining bits, shifted right by one, give the magnitude.

§Errors

Returns crate::Error::OutOfBounds if reading would exceed the data length or crate::Error::Malformed for invalid encoding.

§Examples
use dotscope::Parser;

// Positive number: 10 encoded as 20 (10 << 1 | 0)
let data = [20];
let mut parser = Parser::new(&data);
assert_eq!(parser.read_compressed_int()?, 10);

// Negative number: -5 encoded as 9 ((5-1) << 1 | 1)
let data = [9];
let mut parser = Parser::new(&data);
assert_eq!(parser.read_compressed_int()?, -5);
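
The arithmetic in the two comments above can be spelled out as a small function. This is inferred only from those two examples (10 stored as 20, -5 stored as 9) and is not taken from dotscope's source; `unpack_sign` is a hypothetical name, and `raw` is assumed to be the value already obtained from the unsigned decoding step.

```rust
/// Map a decoded unsigned value to a signed one using the scheme shown in the
/// examples: low bit = sign, remaining bits = magnitude (offset by one if negative).
fn unpack_sign(raw: u32) -> i32 {
    let magnitude = (raw >> 1) as i32;
    if raw & 1 == 0 {
        magnitude
    } else {
        // Written as `-m - 1` so the largest magnitude cannot overflow.
        -magnitude - 1
    }
}
```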

pub fn read_compressed_token(&mut self) -> Result<Token>

Read a compressed token (TypeDefOrRefOrSpecEncoded) as defined in ECMA-335 II.23.2.8.

Compressed tokens encode type references using 2 tag bits and the table index. The tag bits determine which metadata table the token refers to:

  • 0x0: TypeDef table
  • 0x1: TypeRef table
  • 0x2: TypeSpec table
§Errors

Returns crate::Error::OutOfBounds if reading would exceed the data length or crate::Error::Malformed for invalid token encoding.

§Examples
use dotscope::Parser;

// TypeRef token (tag 0x1, index 1) encoded as (1 << 2) | 0x1 = 5
let data = [5];
let mut parser = Parser::new(&data);
let token = parser.read_compressed_token()?;
assert_eq!(token.value(), 0x01000001); // TypeRef table with index 1
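
The step from the decoded integer to a full token value can be sketched as follows. `decode_type_def_or_ref` is a hypothetical helper, not a dotscope function; it takes the integer after compressed-uint decoding and returns the raw 32-bit token, using the ECMA-335 table numbers 0x02 (TypeDef), 0x01 (TypeRef), and 0x1B (TypeSpec).

```rust
/// Expand a TypeDefOrRefOrSpecEncoded value into a metadata token.
/// Returns `None` for the unused tag value 3.
fn decode_type_def_or_ref(raw: u32) -> Option<u32> {
    let table: u32 = match raw & 0x3 {
        0 => 0x02, // TypeDef
        1 => 0x01, // TypeRef
        2 => 0x1B, // TypeSpec
        _ => return None,
    };
    // The table index sits above the two tag bits.
    let index = raw >> 2;
    Some((table << 24) | index)
}
```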

pub fn read_7bit_encoded_int(&mut self) -> Result<u32>

Read a 7-bit encoded integer (used in .NET for variable-length encoding).

This encoding uses the most significant bit of each byte as a continuation flag. If set, the next byte is part of the value. The value is reconstructed by concatenating the lower 7 bits of each byte in little-endian order.

§Errors

Returns crate::Error::OutOfBounds if reading would exceed the data length or crate::Error::Malformed for invalid encoding (overflow).

§Examples
use dotscope::Parser;

// Single byte: 127 (0x7F)
let data = [0x7F];
let mut parser = Parser::new(&data);
assert_eq!(parser.read_7bit_encoded_int()?, 127);

// Two bytes: 128 (0x80 0x01)
let data = [0x80, 0x01];
let mut parser = Parser::new(&data);
assert_eq!(parser.read_7bit_encoded_int()?, 128);
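
A standalone decoder for this encoding might look like the sketch below. `decode_7bit` is a hypothetical name and the overflow policy (reject anything that does not fit in 32 bits) is an assumption; dotscope's exact limits are not stated here.

```rust
/// Decode a 7-bit encoded u32 from the start of `data`.
/// Returns `(value, bytes_consumed)`, or `None` if the input is truncated
/// or the value would not fit in 32 bits.
fn decode_7bit(data: &[u8]) -> Option<(u32, usize)> {
    let mut value: u32 = 0;
    for (i, &byte) in data.iter().enumerate() {
        let shift = 7 * i as u32;
        if shift >= 32 {
            return None; // more than five bytes
        }
        let chunk = u32::from(byte & 0x7F);
        if shift == 28 && chunk > 0x0F {
            return None; // fifth byte may only contribute four bits
        }
        // Low-order groups come first, so each chunk is shifted further left.
        value |= chunk << shift;
        if byte & 0x80 == 0 {
            return Some((value, i + 1)); // continuation flag clear: done
        }
    }
    None // ran out of bytes while the continuation flag was still set
}
```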

pub fn read_string_utf8(&mut self) -> Result<String>

Read a UTF-8 encoded null-terminated string.

Reads bytes from the current position until a null terminator (0x00) is found, then decodes the bytes as UTF-8. The position is advanced past the null terminator.

§Errors

Returns crate::Error::OutOfBounds if reading would exceed the data length or crate::Error::Malformed for invalid UTF-8 encoding.

§Examples
use dotscope::Parser;

let data = b"Hello\0World\0";
let mut parser = Parser::new(data);

let first = parser.read_string_utf8()?;
assert_eq!(first, "Hello");

let second = parser.read_string_utf8()?;
assert_eq!(second, "World");
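
The scan-then-decode behaviour described above can be reproduced with the standard library. `read_cstr_utf8` is a hypothetical helper, not dotscope's method: it borrows the text instead of allocating a `String`, and returns the position just past the terminator.

```rust
/// Read a null-terminated UTF-8 string starting at `pos`.
/// Returns the text and the position after the terminator, or `None` if no
/// terminator is found or the bytes are not valid UTF-8.
fn read_cstr_utf8(data: &[u8], pos: usize) -> Option<(&str, usize)> {
    let tail = data.get(pos..)?;
    // Length of the string, excluding the 0x00 terminator.
    let len = tail.iter().position(|&b| b == 0)?;
    let text = std::str::from_utf8(&tail[..len]).ok()?;
    Some((text, pos + len + 1))
}
```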

pub fn read_prefixed_string_utf8(&mut self) -> Result<String>

Read a length-prefixed UTF-8 string.

The string length is encoded as a 7-bit encoded integer, followed by that many UTF-8 bytes. This format is commonly used in .NET metadata streams.

§Errors

Returns crate::Error::OutOfBounds if reading would exceed the data length or crate::Error::Malformed for invalid UTF-8 encoding.

§Examples
use dotscope::Parser;

// Length 5, followed by "Hello"
let data = [5, b'H', b'e', b'l', b'l', b'o'];
let mut parser = Parser::new(&data);

let result = parser.read_prefixed_string_utf8()?;
assert_eq!(result, "Hello");

pub fn read_prefixed_string_utf16(&mut self) -> Result<String>

Read a length-prefixed UTF-16 string.

The string length in bytes is encoded as a 7-bit encoded integer, followed by that many bytes of little-endian UTF-16 data. This format is used for wide character strings in .NET metadata.

§Errors

Returns crate::Error::OutOfBounds if reading would exceed the data length or crate::Error::Malformed for invalid UTF-16 encoding or odd byte length.

§Examples
use dotscope::Parser;

// Length 10 bytes (5 UTF-16 chars), followed by "Hello" in UTF-16 LE
let data = [10, 0x48, 0x00, 0x65, 0x00, 0x6C, 0x00, 0x6C, 0x00, 0x6F, 0x00];
let mut parser = Parser::new(&data);

let result = parser.read_prefixed_string_utf16()?;
assert_eq!(result, "Hello");
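
The payload decoding step, after the length prefix has been consumed, can be sketched with `String::from_utf16`. `decode_utf16_le` is a hypothetical helper rather than dotscope's code; it rejects an odd byte count and unpaired surrogates, matching the two Malformed cases listed above.

```rust
/// Decode little-endian UTF-16 bytes into a String.
/// Returns `None` for an odd byte length or invalid UTF-16.
fn decode_utf16_le(bytes: &[u8]) -> Option<String> {
    if bytes.len() % 2 != 0 {
        return None;
    }
    // Pair up the bytes into 16-bit code units, low byte first.
    let units: Vec<u16> = bytes
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
        .collect();
    String::from_utf16(&units).ok()
}
```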

Auto Trait Implementations§

impl<'a> Freeze for Parser<'a>

impl<'a> RefUnwindSafe for Parser<'a>

impl<'a> Send for Parser<'a>

impl<'a> Sync for Parser<'a>

impl<'a> Unpin for Parser<'a>

impl<'a> UnwindSafe for Parser<'a>

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> Pointable for T

const ALIGN: usize

The alignment of pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T> Same for T

type Output = T

Should always be Self

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.