bhc-loop-ir
Loop Intermediate Representation for the Basel Haskell Compiler.
Overview
This crate defines the Loop IR, a low-level representation with explicit loops, memory operations, and SIMD primitives. It bridges the gap between high-level Tensor IR and machine code, enabling vectorization, parallelization, and efficient code generation.
Features
- Explicit loop nests with iteration bounds
- SIMD vector types and operations
- Parallel loop primitives (ParFor, ParMap, ParReduce)
- Memory references with aliasing information
- Loop transformations (tiling, unrolling, interchange)
- Target-specific SIMD: SSE, AVX, AVX-512, NEON
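As a rough illustration of what a parallel reduction primitive such as ParReduce expresses, the sketch below splits an iteration space into contiguous chunks, reduces each chunk on its own thread, and combines the partial results. It uses only the Rust standard library; the function name `par_reduce_sum` and the chunking strategy are assumptions made for illustration and do not describe how this crate lowers parallel loops.

```rust
use std::thread;

/// Hypothetical helper (not part of bhc-loop-ir): sum `data` by giving each
/// worker one contiguous chunk of the iteration space, then combining the
/// per-chunk partial sums.
fn par_reduce_sum(data: &[u64], workers: usize) -> u64 {
    assert!(workers > 0, "need at least one worker");
    if data.is_empty() {
        return 0;
    }
    // Ceiling division so that at most `workers` chunks are created.
    let chunk = (data.len() + workers - 1) / workers;
    thread::scope(|s| {
        // Spawn one scoped thread per chunk; each computes a partial sum.
        let handles: Vec<_> = data
            .chunks(chunk)
            .map(|part| s.spawn(move || part.iter().sum::<u64>()))
            .collect();
        // Join in order and fold the partial sums into the final result.
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .sum::<u64>()
    })
}
```

Because addition on `u64` is associative, the chunked result matches a sequential sum regardless of how many workers are used; a reduction over floating-point values would not carry the same guarantee.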
Key Types
| Type | Description |
|---|---|
| LoopIR | Root of the Loop IR tree |
| Loop | Loop construct with bounds and body |
| LoopType | Loop classification (Sequential, Parallel, SIMD) |
| Stmt | Statement enum (assignments, stores, loops) |
| Value | Value representation (registers, constants) |
| MemRef | Memory reference with type and aliasing |
| VectorType | SIMD vector types |
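To make the relationship between these types more concrete, here is a toy model of a loop tree. Only the type names and the variant categories come from the table above; every field name, the register numbering, and the helper functions `nest_depth` and `example_nest` are assumptions made for illustration, not the crate's actual definitions.

```rust
// Hypothetical shapes only: fields and helpers are illustrative assumptions
// and are not taken from the bhc-loop-ir source.

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum LoopType {
    Sequential,
    Parallel,
    Simd,
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Reg(u32),   // virtual register
    Const(i64), // integer constant
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Stmt {
    Assign { dst: u32, src: Value },
    Store { buffer: String, index: Value, value: Value },
    Loop(Loop),
}

#[derive(Debug, Clone, PartialEq)]
struct Loop {
    var: u32,     // register holding the induction variable
    lower: Value, // inclusive lower bound
    upper: Value, // exclusive upper bound
    kind: LoopType,
    body: Vec<Stmt>,
}

/// Maximum loop nesting depth found in a list of statements.
fn nest_depth(stmts: &[Stmt]) -> usize {
    stmts
        .iter()
        .map(|s| match s {
            Stmt::Loop(l) => 1 + nest_depth(&l.body),
            _ => 0,
        })
        .max()
        .unwrap_or(0)
}

/// Builds `for i in 0..n { for j in 0..m { out[j] = i } }` as a two-level nest,
/// with the outer loop marked parallel and the inner loop sequential.
fn example_nest(n: i64, m: i64) -> Stmt {
    let inner = Loop {
        var: 1,
        lower: Value::Const(0),
        upper: Value::Const(m),
        kind: LoopType::Sequential,
        body: vec![Stmt::Store {
            buffer: "out".to_string(),
            index: Value::Reg(1),
            value: Value::Reg(0),
        }],
    };
    Stmt::Loop(Loop {
        var: 0,
        lower: Value::Const(0),
        upper: Value::Const(n),
        kind: LoopType::Parallel,
        body: vec![Stmt::Loop(inner)],
    })
}
```

Making a loop just another statement variant is what lets nests of arbitrary depth be represented and walked recursively, as `nest_depth` does.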
Usage
Creating Loop Structures
use ;
// Create a simple loop: for i in 0..n
let loop_ir = Loop ;
// Parallel loop
let par_loop = Loop ;
// SIMD vectorized loop
let simd_loop = Loop ;
SIMD Operations
use ;
// Vector types
let v4f32 = VEC4F32; // 4x f32 (SSE)
let v8f32 = VEC8F32; // 8x f32 (AVX)
let v4f64 = VEC4F64; // 4x f64 (AVX)
// Vector operations
let vadd = VectorOp ;
// Vector load/store
let vload = VectorLoad ;
Statement Variants
Loop Types
Vector Types
| Type | Elements | Size | Target |
|---|---|---|---|
| VEC4F32 | 4 × f32 | 128-bit | SSE, NEON |
| VEC8F32 | 8 × f32 | 256-bit | AVX, AVX2 |
| VEC16F32 | 16 × f32 | 512-bit | AVX-512 |
| VEC2F64 | 2 × f64 | 128-bit | SSE2, NEON |
| VEC4F64 | 4 × f64 | 256-bit | AVX |
| VEC8F64 | 8 × f64 | 512-bit | AVX-512 |
| VEC4I32 | 4 × i32 | 128-bit | SSE2, NEON |
| VEC8I32 | 8 × i32 | 256-bit | AVX2 |
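The Size column in this table is the lane count multiplied by the element width. The sketch below models that arithmetic; `Elem` and `VecTy` are hypothetical stand-ins written for this example and are not the crate's `VectorType`.

```rust
// Hypothetical model of the table's arithmetic; not the crate's VectorType.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Elem {
    F32,
    F64,
    I32,
}

impl Elem {
    /// Width of one element in bits.
    fn bits(self) -> u32 {
        match self {
            Elem::F32 | Elem::I32 => 32,
            Elem::F64 => 64,
        }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct VecTy {
    elem: Elem,
    lanes: u32,
}

impl VecTy {
    /// Total vector width in bits: lanes × element width.
    fn bits(self) -> u32 {
        self.elem.bits() * self.lanes
    }

    /// Total vector width in bytes.
    fn bytes(self) -> u32 {
        self.bits() / 8
    }

    /// The vector of `elem` that exactly fills a register of `register_bits` bits.
    fn filling(elem: Elem, register_bits: u32) -> VecTy {
        VecTy {
            elem,
            lanes: register_bits / elem.bits(),
        }
    }
}
```

For example, `VecTy::filling(Elem::F32, 256)` has 8 lanes, which corresponds to the VEC8F32 row, and `VecTy::filling(Elem::F64, 128)` has 2 lanes, which corresponds to VEC2F64.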
Memory References
Target Architectures
Loop Transformations
The Loop IR supports these transformations:
| Transformation | Description |
|---|---|
| Tiling | Break loops into tiles for cache locality |
| Unrolling | Replicate the loop body to reduce per-iteration branch and counter overhead |
| Vectorization | Convert scalar ops to SIMD ops |
| Interchange | Reorder nested loops |
| Fusion | Combine adjacent loops |
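| Fission | Split one loop into several so each part can be parallelized or vectorized independently |
| Peeling | Split off the first or last iterations so the main loop body stays uniform |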
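As an illustration of tiling, the sketch below applies the transformation by hand to a row-major matrix transpose written in plain Rust. It operates on ordinary slices rather than on Loop IR nodes, and both function names are hypothetical; the point is only to show how two loops become four, with partial tiles at the edges clamped to the matrix bounds.

```rust
/// Reference version: plain transpose of a row-major `rows x cols` matrix.
fn transpose_naive(src: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    let mut dst = vec![0.0; rows * cols];
    for i in 0..rows {
        for j in 0..cols {
            dst[j * rows + i] = src[i * cols + j];
        }
    }
    dst
}

/// Tiled version: the `i` and `j` loops are each split into a tile loop
/// (`ii`, `jj`) and an intra-tile loop, so every `tile x tile` block is
/// finished before moving on to the next one.
fn transpose_tiled(src: &[f32], rows: usize, cols: usize, tile: usize) -> Vec<f32> {
    assert!(tile > 0, "tile size must be non-zero");
    let mut dst = vec![0.0; rows * cols];
    for ii in (0..rows).step_by(tile) {
        for jj in (0..cols).step_by(tile) {
            // Clamp so partial tiles at the right and bottom edges stay in bounds.
            let i_end = (ii + tile).min(rows);
            let j_end = (jj + tile).min(cols);
            for i in ii..i_end {
                for j in jj..j_end {
                    dst[j * rows + i] = src[i * cols + j];
                }
            }
        }
    }
    dst
}
```

Both functions visit exactly the same `(i, j)` pairs, only in a different order, so they produce identical output. The tiled order keeps the reads and writes of one block close together in memory, which is where the cache-locality benefit comes from.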
Design Notes
- Loop IR is the last stage before target-specific codegen
- SIMD width adapts to target architecture
- Aliasing information enables safe optimizations
- Parallel loops generate threading runtime calls
- Alignment requirements are explicit for SIMD
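One way to picture how SIMD width adapts to the target is shown below. The lane counts per target are taken from the f32 rows of the vector types table; the `Target` enum, `f32_lanes`, and `add_chunked` are hypothetical names for this sketch. The inner lane loop stands in for what would be a single vector instruction, and the elements that do not fill a whole vector are handled by a scalar remainder loop.

```rust
// Hypothetical sketch; not the crate's target model or vectorizer.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Target {
    Sse,
    Avx,
    Avx512,
    Neon,
}

/// Number of f32 lanes in the widest vector listed for each target.
fn f32_lanes(target: Target) -> usize {
    match target {
        Target::Sse | Target::Neon => 4, // 128-bit
        Target::Avx => 8,                // 256-bit
        Target::Avx512 => 16,            // 512-bit
    }
}

/// Computes `out[i] = a[i] + b[i]` in the shape of a vectorized loop.
/// Returns the output plus the number of vector-width iterations and the
/// number of scalar remainder iterations that were executed.
fn add_chunked(a: &[f32], b: &[f32], target: Target) -> (Vec<f32>, usize, usize) {
    assert_eq!(a.len(), b.len(), "inputs must have equal length");
    let lanes = f32_lanes(target);
    // Largest multiple of `lanes` that fits in the input length.
    let main_len = a.len() - a.len() % lanes;
    let mut out = vec![0.0f32; a.len()];

    // Main loop: each iteration covers `lanes` elements (one vector op).
    let mut vector_iters = 0;
    for base in (0..main_len).step_by(lanes) {
        for lane in 0..lanes {
            out[base + lane] = a[base + lane] + b[base + lane];
        }
        vector_iters += 1;
    }

    // Remainder loop: leftover elements are processed one at a time.
    let mut scalar_iters = 0;
    for i in main_len..a.len() {
        out[i] = a[i] + b[i];
        scalar_iters += 1;
    }
    (out, vector_iters, scalar_iters)
}
```

For 19 elements, the same loop runs 4 vector iterations on an SSE-width target, 2 on AVX, and 1 on AVX-512, each followed by 3 scalar remainder iterations.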
Related Crates
- bhc-tensor-ir - Input tensor operations
- bhc-codegen - Machine code generation
- bhc-target - Target architecture definitions
- bhc-gpu - GPU-specific lowering (bypasses Loop IR)
Specification References
- H26-SPEC Section 3.5: Loop IR
- H26-SPEC Section 9: SIMD Model (M3)
- H26-SPEC Section 10: Parallelism Model