bhc-tensor-ir

Tensor Intermediate Representation for the Basel Haskell Compiler.

Overview

This crate defines the Tensor IR, a specialized representation for numeric computations that enables aggressive optimization of array and matrix operations. It implements the M9 tensor model from H26-SPEC (Section 7), including the fusion guarantees of Section 8 and explicit control over memory layout.

Features

Shape-indexed tensor types with compile-time dimension checking
Guaranteed fusion for composable operations
Memory layout optimization (row-major, column-major, strided)
Automatic broadcasting semantics
Element-wise, reduction, and contraction operations
Fusion-friendly kernel representation
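
The broadcasting rule itself is not spelled out in this README. As a minimal sketch, assuming NumPy-style semantics (shapes are aligned from the right, and an extent of 1 stretches to match the other operand), the result shape of a binary operation could be computed as below. `broadcast_shapes` is a hypothetical helper for illustration, not part of the crate's API.

```rust
/// Hypothetical helper (not part of bhc-tensor-ir): computes the broadcast
/// shape of two operands under NumPy-style rules, which are assumed here.
/// Returns `None` when the shapes are incompatible.
pub fn broadcast_shapes(a: &[usize], b: &[usize]) -> Option<Vec<usize>> {
    let rank = a.len().max(b.len());
    let mut out = vec![0; rank];
    for i in 0..rank {
        // Walk dimensions from the right; missing leading dimensions count as 1.
        let da = if i < a.len() { a[a.len() - 1 - i] } else { 1 };
        let db = if i < b.len() { b[b.len() - 1 - i] } else { 1 };
        out[rank - 1 - i] = match (da, db) {
            (x, y) if x == y => x,
            (1, y) => y,
            (x, 1) => x,
            _ => return None, // extents differ and neither is 1
        };
    }
    Some(out)
}
```

For example, a `[1024, 768]` tensor combined with a `[768]` bias vector yields `[1024, 768]`, while `[3]` combined with `[4]` is rejected.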

Key Types

Type	Description
`TensorOp`	Tensor operation enum
`TensorMeta`	Tensor metadata (dtype, shape, strides, layout)
`Kernel`	Fused operation kernel
`Shape`	Tensor dimensions
`DType`	Element data type
`Strides`	Memory strides for each dimension
`Layout`	Memory layout (RowMajor, ColMajor, Strided)

Usage

Creating Tensor Operations

use bhc_tensor_ir::{BinOp, DType, Layout, Shape, Strides, TensorMeta, TensorOp};

// Create tensor metadata
let meta = TensorMeta {
    dtype: DType::F32,
    shape: Shape::from([1024, 768]),
    strides: Strides::contiguous(&[1024, 768]),
    layout: Layout::RowMajor,
    alias: None,
};

// Element-wise operation: a + b
// (`tensor_a` and `tensor_b` are `TensorRef` values assumed to be in scope)
let add = TensorOp::Binary {
    op: BinOp::Add,
    lhs: tensor_a,
    rhs: tensor_b,
    out_meta: meta,
};

// Matrix multiplication
// (`matrix_a`, `matrix_b`, and `matmul_meta` are assumed to be in scope)
let matmul = TensorOp::Contraction {
    lhs: matrix_a,  // [m, k]
    rhs: matrix_b,  // [k, n]
    axes: (1, 0),   // Contract along k
    out_meta: matmul_meta,  // [m, n]
};
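
The `[m, k] x [k, n] -> [m, n]` comments above suggest how `out_meta` relates to the operands. A rough sketch of that shape rule follows, assuming the output keeps the remaining `lhs` dimensions followed by the remaining `rhs` dimensions. `contraction_shape` is a hypothetical helper working on plain slices rather than the crate's `Shape` type.

```rust
/// Hypothetical helper (not part of bhc-tensor-ir): output shape of a
/// single-axis contraction. Assumes the result lists the uncontracted
/// lhs dimensions first, then the uncontracted rhs dimensions.
pub fn contraction_shape(
    lhs: &[usize],
    rhs: &[usize],
    axes: (usize, usize),
) -> Result<Vec<usize>, String> {
    let (la, ra) = axes;
    let l = *lhs.get(la).ok_or_else(|| format!("lhs axis {la} out of range"))?;
    let r = *rhs.get(ra).ok_or_else(|| format!("rhs axis {ra} out of range"))?;
    if l != r {
        return Err(format!("contracted extents differ: {l} vs {r}"));
    }
    let mut out = Vec::with_capacity(lhs.len() + rhs.len() - 2);
    // Keep every dimension except the contracted one on each side.
    out.extend(lhs.iter().enumerate().filter(|(i, _)| *i != la).map(|(_, d)| *d));
    out.extend(rhs.iter().enumerate().filter(|(i, _)| *i != ra).map(|(_, d)| *d));
    Ok(out)
}
```

With `axes: (1, 0)`, a `[1024, 768]` operand contracted against `[768, 512]` gives `[1024, 512]`; mismatched inner extents produce an error instead of a shape.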

Tensor Metadata (H26-SPEC Section 7.3)

pub struct TensorMeta {
    /// Element type: f32, f64, i32, i64, etc.
    pub dtype: DType,

    /// Shape: dimensions of the tensor
    pub shape: Shape,

    /// Strides: memory stride for each dimension
    pub strides: Strides,

    /// Layout: memory organization
    pub layout: Layout,

    /// Alias: memory aliasing information
    pub alias: Option<AliasInfo>,
}

Operation Variants

pub enum TensorOp {
    // Creation
    Zeros(TensorMeta),
    Ones(TensorMeta),
    Full(Scalar, TensorMeta),
    FromData(Vec<Scalar>, TensorMeta),

    // Element-wise unary
    Unary { op: UnaryOp, input: TensorRef, out_meta: TensorMeta },

    // Element-wise binary
    Binary { op: BinOp, lhs: TensorRef, rhs: TensorRef, out_meta: TensorMeta },

    // Reductions
    Reduce { op: ReduceOp, input: TensorRef, axes: Vec<usize>, out_meta: TensorMeta },

    // Shape operations
    Reshape { input: TensorRef, new_shape: Shape },
    Transpose { input: TensorRef, perm: Vec<usize> },
    Broadcast { input: TensorRef, new_shape: Shape },
    Slice { input: TensorRef, ranges: Vec<Range> },

    // Contractions (matmul, einsum)
    Contraction { lhs: TensorRef, rhs: TensorRef, axes: (usize, usize), out_meta: TensorMeta },

    // Memory
    Copy { input: TensorRef, out_meta: TensorMeta },
    View { input: TensorRef, out_meta: TensorMeta },
}

Guaranteed Fusion (H26-SPEC Section 8)

The Tensor IR guarantees fusion for these patterns:

// Pattern 1: Element-wise chains
// a.map(f).map(g).map(h) → a.map(h ∘ g ∘ f)

// Pattern 2: Map-reduce
// a.map(f).reduce(+) → fused map-reduce kernel

// Pattern 3: Broadcast-binary
// a + broadcast(b) → single pass with inline broadcast

// Pattern 4: Transpose-matmul
// matmul(transpose(a), b) → single matmul with transposed access
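
To make the first two patterns concrete, here is a small sketch on plain `f32` slices, independent of the crate. The three closures stand in for `f`, `g`, and `h` (applied in that order); the function names are illustrative only. The unfused version materializes two intermediate vectors, while the fused versions touch each element once.

```rust
/// Three passes: each step collects an intermediate vector.
pub fn unfused(a: &[f32]) -> Vec<f32> {
    let t1: Vec<f32> = a.iter().map(|x| x + 1.0).collect(); // f
    let t2: Vec<f32> = t1.iter().map(|x| x * 2.0).collect(); // g
    t2.iter().map(|x| x - 3.0).collect() // h
}

/// Pattern 1: one pass applying f, then g, then h to each element.
pub fn fused(a: &[f32]) -> Vec<f32> {
    a.iter().map(|&x| (x + 1.0) * 2.0 - 3.0).collect()
}

/// Pattern 2: the map feeds the reduction directly, with no temporary.
pub fn fused_map_sum(a: &[f32]) -> f32 {
    a.iter().map(|&x| (x + 1.0) * 2.0).sum()
}
```

Both `unfused` and `fused` produce the same values; the difference is only in the number of passes and the memory traffic, which is what a fusion guarantee is meant to pin down.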

Fusion in Practice

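use bhc_tensor_ir::{BinOp, FusedOp, Kernel, Parallelism, ReduceOp, UnaryOp};

// Computes sum(exp(a) * b) along axis 1 as a single fused kernel.
// `tensor_a`, `tensor_b`, `output`, and `output_shape` are assumed to be in scope.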
let kernel = Kernel {
    ops: vec![
        FusedOp::Load { src: tensor_a },
        FusedOp::Unary { op: UnaryOp::Exp },
        FusedOp::Binary { op: BinOp::Mul, rhs: tensor_b },
        FusedOp::Reduce { op: ReduceOp::Sum, axis: 1 },
        FusedOp::Store { dst: output },
    ],
    shape: output_shape,
    parallelism: Parallelism::DataParallel,
};
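
As a rough picture of what such a kernel could compute once lowered, the sketch below evaluates the same Load, Exp, Mul, Reduce(Sum, axis 1), Store sequence over row-major 2-D buffers in one loop nest. It is a hand-written stand-in under those assumptions, not the crate's lowering; `run_fused` and its parameters are hypothetical.

```rust
/// Hypothetical reference loop for the kernel above: for each row,
/// accumulates exp(a[r, c]) * b[r, c] over the columns (axis 1).
/// Both inputs are assumed to be contiguous row-major `rows x cols` buffers.
pub fn run_fused(a: &[f32], b: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    assert_eq!(a.len(), rows * cols);
    assert_eq!(b.len(), rows * cols);
    let mut out = vec![0.0f32; rows];
    for r in 0..rows {
        let mut acc = 0.0f32;
        for c in 0..cols {
            let i = r * cols + c; // row-major offset
            acc += a[i].exp() * b[i]; // Load, Exp, Mul with no temporaries
        }
        out[r] = acc; // Reduce(Sum, axis 1), then Store
    }
    out
}
```

No intermediate tensor for `exp(a)` or `exp(a) * b` is ever allocated, which is the point of expressing the chain as one kernel.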

Data Types

pub enum DType {
    F16,    // 16-bit float (half)
    F32,    // 32-bit float
    F64,    // 64-bit float (double)
    BF16,   // Brain float 16
    I8,     // 8-bit signed integer
    I16,    // 16-bit signed integer
    I32,    // 32-bit signed integer
    I64,    // 64-bit signed integer
    U8,     // 8-bit unsigned
    U32,    // 32-bit unsigned
    Bool,   // Boolean
}
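
Element sizes follow directly from the variant names. The sketch below mirrors the enum locally so it compiles on its own and derives the byte size of a densely packed tensor. `size_in_bytes` and `dense_bytes` are hypothetical helpers, and storing `Bool` as one byte is an assumption this README does not confirm.

```rust
/// Local mirror of the README's `DType` enum so the sketch is self-contained.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DType { F16, F32, F64, BF16, I8, I16, I32, I64, U8, U32, Bool }

/// Hypothetical helper: storage size of one element in bytes.
/// `Bool` as one byte is an assumption.
pub fn size_in_bytes(dtype: DType) -> usize {
    match dtype {
        DType::I8 | DType::U8 | DType::Bool => 1,
        DType::F16 | DType::BF16 | DType::I16 => 2,
        DType::F32 | DType::I32 | DType::U32 => 4,
        DType::F64 | DType::I64 => 8,
    }
}

/// Bytes needed for a densely packed tensor of the given shape.
/// An empty shape is treated as a scalar (one element).
pub fn dense_bytes(dtype: DType, shape: &[usize]) -> usize {
    shape.iter().product::<usize>() * size_in_bytes(dtype)
}
```

Under these assumptions, the `[1024, 768]` `F32` tensor from the usage example occupies 3,145,728 bytes.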

Memory Layouts

pub enum Layout {
    RowMajor,      // C-style: last dimension contiguous
    ColMajor,      // Fortran-style: first dimension contiguous
    Strided,       // Arbitrary strides (slices, transposes)
}

impl Strides {
    /// Contiguous row-major strides for shape [d0, d1, d2]
    /// → [d1*d2, d2, 1]
    pub fn contiguous(shape: &[usize]) -> Self;

    /// Contiguous column-major strides for shape [d0, d1, d2]
    /// → [1, d0, d0*d1]
    pub fn fortran(shape: &[usize]) -> Self;
}
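
The two doc comments above fully determine the stride formulas. A minimal sketch of both, written as free functions returning `Vec<usize>` rather than the crate's `Strides` type (whose internals are not shown here), with strides counted in elements:

```rust
/// Row-major strides: the last dimension has stride 1 and each earlier
/// stride is the product of all later extents.
pub fn contiguous(shape: &[usize]) -> Vec<usize> {
    let mut strides = vec![1; shape.len()];
    for i in (0..shape.len().saturating_sub(1)).rev() {
        strides[i] = strides[i + 1] * shape[i + 1];
    }
    strides
}

/// Column-major strides: the first dimension has stride 1 and each later
/// stride is the product of all earlier extents.
pub fn fortran(shape: &[usize]) -> Vec<usize> {
    let mut strides = vec![1; shape.len()];
    for i in 1..shape.len() {
        strides[i] = strides[i - 1] * shape[i - 1];
    }
    strides
}
```

For shape `[2, 3, 4]` these give `[12, 4, 1]` and `[1, 2, 6]` respectively, matching the `[d1*d2, d2, 1]` and `[1, d0, d0*d1]` patterns in the comments.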

Design Notes

Tensor IR operates on logical tensor operations, not loops
Shape information enables static dimension checking
Stride representation allows zero-copy views and slices
Fusion decisions are made before lowering to Loop IR
Alias analysis prevents incorrect optimizations
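
The note on zero-copy views can be illustrated with a small sketch: a transpose only permutes the shape and strides, and element lookup then goes through the stride arithmetic without moving data. The `View` struct and its methods are hypothetical stand-ins, not the crate's types.

```rust
/// Hypothetical strided view over a flat buffer (strides in elements).
pub struct View {
    pub shape: Vec<usize>,
    pub strides: Vec<usize>,
    pub offset: usize,
}

impl View {
    /// Flat buffer position of a multi-dimensional index.
    pub fn index(&self, idx: &[usize]) -> usize {
        self.offset + idx.iter().zip(&self.strides).map(|(i, s)| i * s).sum::<usize>()
    }

    /// Transpose by permuting shape and strides; the buffer is untouched.
    pub fn transpose(&self, perm: &[usize]) -> View {
        View {
            shape: perm.iter().map(|&p| self.shape[p]).collect(),
            strides: perm.iter().map(|&p| self.strides[p]).collect(),
            offset: self.offset,
        }
    }
}
```

A row-major `[2, 3]` view with strides `[3, 1]` becomes a `[3, 2]` view with strides `[1, 3]`, and element `[2, 1]` of the transposed view resolves to the same buffer position as element `[1, 2]` of the original.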

Related Crates

bhc-core - Input Core IR
bhc-loop-ir - Output Loop IR for codegen
bhc-types - Type-level shape representation (M9)
bhc-gpu - GPU code generation from Tensor IR
bhc-codegen - CPU code generation

Specification References

H26-SPEC Section 7: Tensor Model (M9)
H26-SPEC Section 7.3: TensorMeta Requirements
H26-SPEC Section 8: Fusion Guarantees

bhc-tensor-ir 0.2.4