Module assembly

CIL instruction processing: disassembly, analysis, and assembly based on ECMA-335.

This module processes CIL (Common Intermediate Language) instructions in both directions: disassembly (bytecode to instructions) and assembly (instructions to bytecode). It implements the complete ECMA-335 instruction set and supports control flow analysis and stack effect tracking.

§Architecture

The assembly module is built around several core concepts:

  • Instruction Decoding: Binary CIL bytecode to structured instruction representation
  • Instruction Encoding: Structured instructions back to binary CIL bytecode
  • Control Flow Analysis: Building basic blocks and analyzing program flow
  • Stack Effect Analysis: Tracking how instructions affect the evaluation stack
  • Label Resolution: Automatic resolution of branch targets and labels
  • Type Safety: Compile-time validation of instruction operand types
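To make the label resolution concept concrete, here is a minimal, standalone sketch of fixup-based branch patching. It does not use this crate's API: `MiniEncoder` and `Fixup` are hypothetical names invented for illustration, and only `nop` (0x00), `ret` (0x2A), and `br.s` (0x2B) are supported. It assumes the ECMA-335 rule that a short branch operand is a signed 8-bit offset relative to the start of the next instruction.

```rust
use std::collections::HashMap;

/// Hypothetical fixup record: where a branch operand lives and which label it targets.
struct Fixup {
    operand_pos: usize, // index of the 1-byte operand inside `code`
    label: String,
}

/// Hypothetical mini-encoder (not the crate's encoder); supports nop, ret, br.s only.
#[derive(Default)]
struct MiniEncoder {
    code: Vec<u8>,
    labels: HashMap<String, usize>,
    fixups: Vec<Fixup>,
}

impl MiniEncoder {
    fn nop(&mut self) {
        self.code.push(0x00);
    }

    fn ret(&mut self) {
        self.code.push(0x2A);
    }

    /// Bind `name` to the current offset in the code buffer.
    fn label(&mut self, name: &str) {
        self.labels.insert(name.to_string(), self.code.len());
    }

    /// Emit `br.s` with a placeholder operand and remember where to patch it.
    fn br_s(&mut self, target: &str) {
        self.code.push(0x2B);
        self.fixups.push(Fixup {
            operand_pos: self.code.len(),
            label: target.to_string(),
        });
        self.code.push(0x00); // placeholder, patched in `finalize`
    }

    /// Patch every recorded branch once all labels are known.
    fn finalize(mut self) -> Result<Vec<u8>, String> {
        for fixup in &self.fixups {
            let target = *self
                .labels
                .get(&fixup.label)
                .ok_or_else(|| format!("undefined label: {}", fixup.label))?;
            // The offset is measured from the byte after the operand.
            let next = fixup.operand_pos as i64 + 1;
            let delta = target as i64 - next;
            if delta < i8::MIN as i64 || delta > i8::MAX as i64 {
                return Err(format!("label {} out of range for br.s", fixup.label));
            }
            self.code[fixup.operand_pos] = delta as i8 as u8;
        }
        Ok(self.code)
    }
}
```

Deferring the patch to `finalize` is what allows forward references: a branch can name a label that is only bound later in the instruction stream.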

§Key Components

§Disassembly Components

§Assembly Components

§Shared Components

§Usage Examples

§Disassembly

use dotscope::{assembly::decode_instruction, Parser};

let bytecode = &[0x00, 0x2A]; // nop, ret
let mut parser = Parser::new(bytecode);
let instruction = decode_instruction(&mut parser, 0x1000)?;

println!("Mnemonic: {}", instruction.mnemonic);
println!("Flow type: {:?}", instruction.flow_type);

§High-Level Assembly

use dotscope::assembly::InstructionAssembler;

let mut asm = InstructionAssembler::new();
asm.ldarg_0()?      // Load first argument
   .ldarg_1()?      // Load second argument
   .add()?          // Add them together
   .ret()?;         // Return result
let bytecode = asm.finish()?;

§Low-Level Assembly

use dotscope::assembly::{InstructionEncoder, Operand, Immediate};

let mut encoder = InstructionEncoder::new();
encoder.emit_instruction("nop", None)?;
encoder.emit_instruction("ldarg.s", Some(Operand::Immediate(Immediate::Int8(1))))?;
encoder.emit_instruction("ret", None)?;
let bytecode = encoder.finalize()?;

§Integration

The assembly module integrates with the metadata system to resolve tokens and provide semantic information about method calls, field access, and type operations. The encoder and assembler use the same instruction metadata as the disassembler, so assembly and disassembly stay consistent with each other.
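The benefit of sharing one instruction table between both directions can be sketched without the crate itself. In the standalone example below, `OPCODES`, `encode`, and `decode` are hypothetical names, and the table is a tiny subset of operand-free single-byte opcodes. Because both functions read the same table, a round trip cannot drift.

```rust
/// Hypothetical shared table (illustration only): mnemonic and single-byte opcode.
const OPCODES: &[(&str, u8)] = &[
    ("nop", 0x00),
    ("ldarg.0", 0x02),
    ("ldarg.1", 0x03),
    ("ret", 0x2A),
    ("add", 0x58),
];

/// Encode operand-free mnemonics; returns None if any mnemonic is unknown.
fn encode(mnemonics: &[&str]) -> Option<Vec<u8>> {
    mnemonics
        .iter()
        .map(|m| OPCODES.iter().find(|(name, _)| name == m).map(|(_, op)| *op))
        .collect()
}

/// Decode operand-free opcodes; returns None if any byte is unknown.
fn decode(bytes: &[u8]) -> Option<Vec<&'static str>> {
    bytes
        .iter()
        .map(|b| OPCODES.iter().find(|(_, op)| op == b).map(|(name, _)| *name))
        .collect()
}
```

Under these assumptions, encoding `ldarg.0, ldarg.1, add, ret` and decoding the result yields the original mnemonic sequence.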

§Thread Safety

All assembly types implement std::marker::Send and std::marker::Sync for safe concurrent processing.

CIL (Common Intermediate Language) instruction processing engine.

This module processes CIL bytecode from .NET assemblies according to the ECMA-335 specification. It implements both disassembly and assembly pipelines: instruction parsing and encoding, control flow analysis, stack effect tracking, and basic block construction for static analysis and code generation.

§Architecture

The module is organized into several cooperating components: instruction decoding and encoding transform between raw bytecode and structured instruction objects, control flow analysis builds basic blocks with predecessor/successor relationships, and metadata integration provides semantic context for method-level analysis and code generation.
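As a rough illustration of how basic blocks with predecessor/successor relationships can be derived from decoded instructions, the standalone sketch below uses the classic leader-based approach. It is not this crate's implementation: `Flow`, `Insn`, `Block`, and `build_blocks` are hypothetical, and branch targets are assumed to be already resolved to absolute offsets.

```rust
use std::collections::BTreeSet;

/// Hypothetical flow classification for this sketch.
#[derive(Clone, Copy)]
enum Flow {
    Sequential,
    Branch(usize),     // unconditional, absolute target offset
    CondBranch(usize), // conditional, absolute target offset
    Return,
}

struct Insn {
    offset: usize,
    size: usize,
    flow: Flow,
}

#[derive(Debug)]
struct Block {
    start: usize,
    end: usize, // exclusive
    successors: Vec<usize>,
    predecessors: Vec<usize>,
}

fn build_blocks(insns: &[Insn]) -> Vec<Block> {
    let code_end = insns.last().map_or(0, |i| i.offset + i.size);

    // Pass 1: leaders are the entry point, every branch target,
    // and every instruction following a branch or return.
    let mut leaders = BTreeSet::new();
    if let Some(first) = insns.first() {
        leaders.insert(first.offset);
    }
    for insn in insns {
        let next = insn.offset + insn.size;
        match insn.flow {
            Flow::Branch(t) | Flow::CondBranch(t) => {
                leaders.insert(t);
                leaders.insert(next);
            }
            Flow::Return => {
                leaders.insert(next);
            }
            Flow::Sequential => {}
        }
    }

    // Pass 2: start a new block at each leader; the last instruction
    // seen in a block determines that block's successors.
    let mut blocks: Vec<Block> = Vec::new();
    for insn in insns {
        let next = insn.offset + insn.size;
        if leaders.contains(&insn.offset) {
            blocks.push(Block { start: insn.offset, end: next, successors: vec![], predecessors: vec![] });
        }
        let block = blocks.last_mut().expect("first instruction is a leader");
        block.end = next;
        block.successors = match insn.flow {
            Flow::Sequential => vec![next],
            Flow::Branch(t) => vec![t],
            Flow::CondBranch(t) => vec![t, next],
            Flow::Return => vec![],
        };
        block.successors.retain(|&s| s < code_end);
    }

    // Pass 3: invert successor edges to obtain predecessors.
    let edges: Vec<(usize, usize)> = blocks
        .iter()
        .flat_map(|b| b.successors.iter().map(move |&s| (b.start, s)))
        .collect();
    for (from, to) in edges {
        if let Some(target) = blocks.iter_mut().find(|b| b.start == to) {
            target.predecessors.push(from);
        }
    }
    blocks
}
```

For a conditional branch at offset 0 that skips a `nop` at offset 2 to reach a `ret` at offset 3, this produces three blocks, with the `ret` block having both earlier blocks as predecessors.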

§Key Components

§Usage Examples

§Disassembly Examples

use dotscope::assembly::{decode_instruction, decode_stream, decode_blocks};
use dotscope::Parser;

// Decode a single instruction
let bytecode = &[0x2A]; // ret
let mut parser = Parser::new(bytecode);
let instruction = decode_instruction(&mut parser, 0x1000)?;
println!("Instruction: {}", instruction.mnemonic);

// Decode a sequence of instructions
let bytecode = &[0x00, 0x2A]; // nop, ret
let mut parser = Parser::new(bytecode);
let instructions = decode_stream(&mut parser, 0x1000)?;
assert_eq!(instructions.len(), 2);

// Decode with control flow analysis
let bytecode = &[0x00, 0x2A]; // nop, ret
let blocks = decode_blocks(bytecode, 0, 0x1000, None)?;
assert_eq!(blocks.len(), 1);

§Assembly Examples

use dotscope::assembly::{InstructionAssembler, InstructionEncoder};
use dotscope::assembly::{Operand, Immediate};

// High-level fluent API
let mut assembler = InstructionAssembler::new();
assembler
    .ldarg_0()?
    .ldarg_1()?
    .add()?
    .ret()?;
let bytecode = assembler.finish()?;

// Low-level encoder API
let mut encoder = InstructionEncoder::new();
encoder.emit_instruction("ldarg.0", None)?;
encoder.emit_instruction("ldarg.1", None)?;
encoder.emit_instruction("add", None)?;
encoder.emit_instruction("ret", None)?;
let bytecode2 = encoder.finalize()?;

assert_eq!(bytecode, bytecode2); // Both produce identical results

§Thread Safety

All public types in this module are designed to be thread-safe where appropriate. crate::assembly::Instruction, crate::assembly::BasicBlock, and related types implement std::marker::Send and std::marker::Sync as they contain only thread-safe data. The decoder functions can be called concurrently from different threads with separate parser instances.
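The pattern described above (plain-data result types plus one parser per thread) can be sketched with the standard library alone. In this standalone example, `DecodedInsn`, `decode_all`, and `decode_concurrently` are hypothetical stand-ins rather than this crate's types, and the toy decoder simply treats every byte as a one-byte instruction.

```rust
use std::thread;

/// Hypothetical stand-in for a decoded instruction: owned, plain data only.
#[derive(Debug, Clone, PartialEq)]
struct DecodedInsn {
    offset: u64,
    opcode: u8,
}

/// Compiles only if `T` is both `Send` and `Sync`.
fn assert_send_sync<T: Send + Sync>() {}

/// Toy decoder for illustration: every byte is treated as one instruction.
fn decode_all(bytes: &[u8], base: u64) -> Vec<DecodedInsn> {
    bytes
        .iter()
        .enumerate()
        .map(|(i, &b)| DecodedInsn { offset: base + i as u64, opcode: b })
        .collect()
}

/// Decode several method bodies concurrently, one scoped thread per body.
/// Each thread works on its own input slice, so no locking is needed.
fn decode_concurrently(bodies: &[Vec<u8>]) -> Vec<Vec<DecodedInsn>> {
    // Checked at compile time: fails to build if the type is not Send + Sync.
    assert_send_sync::<DecodedInsn>();
    thread::scope(|s| {
        let handles: Vec<_> = bodies
            .iter()
            .map(|body| s.spawn(move || decode_all(body, 0)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("decoder thread panicked"))
            .collect()
    })
}
```

A helper like `assert_send_sync` is a common way to turn a thread-safety claim into a compile-time check, so a later change that adds a non-thread-safe field fails the build instead of going unnoticed.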

§Integration

This module integrates with:

Structs§

BasicBlock
Represents a basic block in the control flow graph.
CilInstruction
Metadata for a CIL instruction definition.
Instruction
A decoded CIL instruction with all metadata needed for analysis and emulation.
InstructionAssembler
High-level fluent API for assembling CIL instructions.
InstructionEncoder
Core CIL instruction encoder.
LabelFixup
Label fixup information for branch instruction resolution.
StackBehavior
Stack effect of an instruction.
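To illustrate what tracking a per-instruction stack effect enables, the standalone sketch below computes the maximum evaluation stack depth of a straight-line instruction sequence and reports underflow. `Effect` and `max_stack` are hypothetical names and do not reflect the fields of StackBehavior; branches and merges are deliberately ignored.

```rust
/// Hypothetical per-instruction stack effect (not the crate's StackBehavior).
#[derive(Clone, Copy)]
struct Effect {
    pops: u32,
    pushes: u32,
}

/// Walk a straight-line sequence and return the maximum stack depth reached,
/// or an error naming the first instruction that pops from an empty stack.
fn max_stack(effects: &[Effect]) -> Result<u32, String> {
    let mut depth: u32 = 0;
    let mut max: u32 = 0;
    for (index, effect) in effects.iter().enumerate() {
        depth = depth
            .checked_sub(effect.pops)
            .ok_or_else(|| format!("stack underflow at instruction {}", index))?;
        depth += effect.pushes;
        max = max.max(depth);
    }
    Ok(max)
}
```

For `ldarg.0, ldarg.1, add, ret` in a method that returns a value, the effects are (0, 1), (0, 1), (2, 1), and (1, 0), giving a maximum depth of 2.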

Enums§

FlowType
How an instruction affects control flow.
Immediate
Represents an immediate value type embedded in CIL instructions.
InstructionCategory
Categorization of instructions by their primary function.
Operand
Represents an operand in a more structured way.
OperandType
Types of operands for CIL instructions.

Constants§

INSTRUCTIONS
Lookup table for single-byte CIL instruction metadata.
INSTRUCTIONS_FE
Lookup table for double-byte CIL instruction metadata (0xFE prefix).
INSTRUCTIONS_FE_MAX
Maximum opcode value for double-byte CIL instructions prefixed with 0xFE.
INSTRUCTIONS_MAX
Maximum opcode value for single-byte CIL instructions.
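The split into a single-byte table and a 0xFE-prefixed table mirrors the ECMA-335 opcode encoding. The standalone sketch below shows the dispatch idea only: `single_byte_mnemonic`, `prefixed_mnemonic`, and `lookup` are hypothetical, and each "table" is modelled as a `match` over a handful of opcodes rather than the crate's actual constants.

```rust
/// Hypothetical, heavily truncated stand-in for the single-byte table.
fn single_byte_mnemonic(op: u8) -> Option<&'static str> {
    match op {
        0x00 => Some("nop"),
        0x2A => Some("ret"),
        0x58 => Some("add"),
        _ => None,
    }
}

/// Hypothetical, heavily truncated stand-in for the 0xFE-prefixed table.
fn prefixed_mnemonic(op: u8) -> Option<&'static str> {
    match op {
        0x00 => Some("arglist"),
        0x01 => Some("ceq"),
        0x02 => Some("cgt"),
        _ => None,
    }
}

/// Return the mnemonic and opcode length (1 or 2 bytes) at the start of `bytes`.
fn lookup(bytes: &[u8]) -> Option<(&'static str, usize)> {
    match bytes {
        // 0xFE selects the second table; the next byte is the index into it.
        [0xFE, second, ..] => prefixed_mnemonic(*second).map(|m| (m, 2)),
        [first, ..] => single_byte_mnemonic(*first).map(|m| (m, 1)),
        [] => None,
    }
}
```

A lone 0xFE with no following byte falls through to the single-byte arm and is rejected there, so a truncated stream yields `None` rather than a panic.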

Functions§

decode_blocks
Decodes bytecode into a collection of basic blocks with control flow analysis.
decode_instruction
Decodes a single CIL instruction from the current parser position.
decode_stream
Decodes a continuous stream of CIL instructions from a byte stream.