bhc-loop-ir

Loop Intermediate Representation for the Basel Haskell Compiler.

Overview

This crate defines the Loop IR, a low-level representation with explicit loops, memory operations, and SIMD primitives. It bridges the gap between high-level Tensor IR and machine code, enabling vectorization, parallelization, and efficient code generation.

Features

Explicit loop nests with iteration bounds
SIMD vector types and operations
Parallel loop primitives (ParFor, ParMap, ParReduce)
Memory references with aliasing information
Loop transformations (tiling, unrolling, interchange)
Target-specific SIMD: SSE, AVX, AVX-512, NEON

Key Types

Type	Description
`LoopIR`	Root of the Loop IR tree
`Loop`	Loop construct with bounds and body
`LoopType`	Loop classification (Sequential, Parallel, SIMD)
`Stmt`	Statement enum (assignments, stores, loops)
`Value`	Value representation (registers, constants)
`MemRef`	Memory reference with type and aliasing
`VectorType`	SIMD vector types

Usage

Creating Loop Structures

use bhc_loop_ir::{Loop, LoopType, Stmt, Value, MemRef};

// Create a simple loop: for i in 0..n
let loop_ir = Loop {
    var: "i",
    lower: Value::Const(0),
    upper: Value::Var("n"),
    step: Value::Const(1),
    loop_type: LoopType::Sequential,
    body: vec![
        Stmt::Store {
            dst: MemRef::index("output", "i"),
            value: Value::Load(MemRef::index("input", "i")),
        },
    ],
};

// Parallel loop
let par_loop = Loop {
    loop_type: LoopType::Parallel { num_threads: 8 },
    ..loop_ir
};

// SIMD vectorized loop
let simd_loop = Loop {
    loop_type: LoopType::SIMD { width: 8 },
    step: Value::Const(8),
    ..loop_ir
};

SIMD Operations

use bhc_loop_ir::{VectorType, VectorOp};

// Vector types
let v4f32 = VectorType::VEC4F32;  // 4x f32 (SSE)
let v8f32 = VectorType::VEC8F32;  // 8x f32 (AVX)
let v4f64 = VectorType::VEC4F64;  // 4x f64 (AVX)

// Vector operations
let vadd = Stmt::VectorOp {
    op: VectorOp::Add,
    dst: "v_result",
    lhs: Value::VecReg("v_a"),
    rhs: Value::VecReg("v_b"),
    ty: VectorType::VEC8F32,
};

// Vector load/store
let vload = Stmt::VectorLoad {
    dst: "v_data",
    src: MemRef::aligned("array", "i", 32),
    ty: VectorType::VEC8F32,
};

Statement Variants

pub enum Stmt {
    // Scalar operations
    Assign { dst: String, value: Value },
    Store { dst: MemRef, value: Value },

    // Control flow
    Loop(Loop),
    If { cond: Value, then_body: Vec<Stmt>, else_body: Vec<Stmt> },

    // Vector operations
    VectorOp { op: VectorOp, dst: String, lhs: Value, rhs: Value, ty: VectorType },
    VectorLoad { dst: String, src: MemRef, ty: VectorType },
    VectorStore { dst: MemRef, value: Value, ty: VectorType },
    VectorBroadcast { dst: String, scalar: Value, ty: VectorType },
    VectorReduce { op: ReduceOp, dst: String, src: String, ty: VectorType },

    // Parallel primitives
    ParFor { var: String, range: Range, body: Vec<Stmt>, num_threads: usize },
    ParMap { input: MemRef, output: MemRef, kernel: Vec<Stmt> },
    ParReduce { input: MemRef, op: ReduceOp, init: Value },

    // Synchronization
    Barrier,
    Atomic { op: AtomicOp, dst: MemRef, value: Value },
}

Loop Types

pub enum LoopType {
    /// Standard sequential loop
    Sequential,

    /// Parallel loop with thread count
    Parallel { num_threads: usize },

    /// SIMD vectorized loop
    SIMD { width: usize },

    /// Unrolled loop
    Unrolled { factor: usize },

    /// Tiled loop for cache optimization
    Tiled { tile_size: usize },
}

Vector Types

Type	Elements	Size	Target
`VEC4F32`	4 × f32	128-bit	SSE, NEON
`VEC8F32`	8 × f32	256-bit	AVX, AVX2
`VEC16F32`	16 × f32	512-bit	AVX-512
`VEC2F64`	2 × f64	128-bit	SSE2, NEON
`VEC4F64`	4 × f64	256-bit	AVX
`VEC8F64`	8 × f64	512-bit	AVX-512
`VEC4I32`	4 × i32	128-bit	SSE2, NEON
`VEC8I32`	8 × i32	256-bit	AVX2

Memory References

pub struct MemRef {
    /// Base pointer name
    pub base: String,

    /// Index expression
    pub index: IndexExpr,

    /// Element type
    pub elem_ty: ScalarType,

    /// Alignment requirement
    pub align: usize,

    /// Aliasing information
    pub alias_set: AliasSet,
}

impl MemRef {
    /// Simple indexed access: base[i]
    pub fn index(base: &str, idx: &str) -> Self;

    /// Aligned access for SIMD
    pub fn aligned(base: &str, idx: &str, align: usize) -> Self;

    /// Strided access: base[i * stride]
    pub fn strided(base: &str, idx: &str, stride: usize) -> Self;
}

Target Architectures

pub enum Target {
    X86_64 {
        features: X86Features,  // SSE, AVX, AVX2, AVX512
    },
    AArch64 {
        features: ArmFeatures,  // NEON, SVE, SVE2
    },
    WASM {
        features: WasmFeatures, // SIMD128
    },
}

pub struct X86Features {
    pub sse: bool,
    pub sse2: bool,
    pub sse4_1: bool,
    pub avx: bool,
    pub avx2: bool,
    pub avx512f: bool,
    pub fma: bool,
}

Loop Transformations

The Loop IR supports these transformations:

Transformation	Description
Tiling	Break loops into tiles for cache locality
Unrolling	Repeat loop body to reduce overhead
Vectorization	Convert scalar ops to SIMD ops
Interchange	Reorder nested loops
Fusion	Combine adjacent loops
Fission	Split loops for parallelism
Peeling	Handle edge cases separately

Design Notes

Loop IR is the last stage before target-specific codegen
SIMD width adapts to target architecture
Aliasing information enables safe optimizations
Parallel loops generate threading runtime calls
Alignment requirements are explicit for SIMD

Related Crates

bhc-tensor-ir - Input tensor operations
bhc-codegen - Machine code generation
bhc-target - Target architecture definitions
bhc-gpu - GPU-specific lowering (bypasses Loop IR)

Specification References

H26-SPEC Section 3.5: Loop IR
H26-SPEC Section 9: SIMD Model (M3)
H26-SPEC Section 10: Parallelism Model

bhc-loop-ir 0.2.8