bhc-tensor-ir 0.2.3

Tensor IR for numeric optimization in BHC
# bhc-tensor-ir

Tensor Intermediate Representation for the Basel Haskell Compiler.

## Overview

This crate defines the Tensor IR, a specialized representation for numeric computations that enables aggressive optimization of array and matrix operations. It implements the M9 tensor model from H26-SPEC Section 7, together with the fusion guarantees of Section 8 and memory layout optimization.

## Features

- Shape-indexed tensor types with compile-time dimension checking
- Guaranteed fusion for composable operations
- Memory layout optimization (row-major, column-major, strided)
- Automatic broadcasting semantics
- Element-wise, reduction, and contraction operations
- Fusion-friendly kernel representation
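
To make the broadcasting feature concrete, here is a minimal sketch of how two shapes could be combined. It assumes NumPy-style rules (align dimensions from the right, treat a dimension of size 1 as stretchable); the README does not state which rules the crate uses, and `broadcast_shapes` is a hypothetical helper, not part of the crate's API.

```rust
/// Hypothetical helper: compute the broadcast shape of `a` and `b`,
/// assuming NumPy-style rules. Returns `None` if the shapes are incompatible.
fn broadcast_shapes(a: &[usize], b: &[usize]) -> Option<Vec<usize>> {
    let rank = a.len().max(b.len());
    let mut out = vec![0; rank];
    for i in 0..rank {
        // Read dimensions from the right; missing leading dimensions count as 1.
        let da = if i < a.len() { a[a.len() - 1 - i] } else { 1 };
        let db = if i < b.len() { b[b.len() - 1 - i] } else { 1 };
        out[rank - 1 - i] = match (da, db) {
            (x, y) if x == y => x,
            (1, y) => y,
            (x, 1) => x,
            _ => return None, // sizes differ and neither is 1
        };
    }
    Some(out)
}
```

Under these assumed rules, a `[1024, 768]` tensor and a `[768]` bias vector broadcast to `[1024, 768]`, while `[2, 3]` and `[4]` are rejected.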

## Key Types

| Type | Description |
|------|-------------|
| `TensorOp` | Tensor operation enum |
| `TensorMeta` | Tensor metadata (dtype, shape, strides, layout) |
| `Kernel` | Fused operation kernel |
| `Shape` | Tensor dimensions |
| `DType` | Element data type |
| `Strides` | Memory strides for each dimension |
| `Layout` | Memory layout (RowMajor, ColMajor, Strided) |

## Usage

### Creating Tensor Operations

```rust
use bhc_tensor_ir::{BinOp, DType, Layout, Shape, Strides, TensorMeta, TensorOp};

// Create tensor metadata
let meta = TensorMeta {
    dtype: DType::F32,
    shape: Shape::from([1024, 768]),
    strides: Strides::contiguous(&[1024, 768]),
    layout: Layout::RowMajor,
    alias: None,
};

// Element-wise operation: a + b
// (tensor_a and tensor_b are existing `TensorRef` handles)
let add = TensorOp::Binary {
    op: BinOp::Add,
    lhs: tensor_a,
    rhs: tensor_b,
    out_meta: meta,
};

// Matrix multiplication
// (matrix_a and matrix_b are `TensorRef` handles; matmul_meta describes the [m, n] result)
let matmul = TensorOp::Contraction {
    lhs: matrix_a,  // [m, k]
    rhs: matrix_b,  // [k, n]
    axes: (1, 0),   // Contract along k
    out_meta: matmul_meta,  // [m, n]
};
```

### Tensor Metadata (H26-SPEC Section 7.3)

```rust
pub struct TensorMeta {
    /// Element type: f32, f64, i32, i64, etc.
    pub dtype: DType,

    /// Shape: dimensions of the tensor
    pub shape: Shape,

    /// Strides: memory stride for each dimension
    pub strides: Strides,

    /// Layout: memory organization
    pub layout: Layout,

    /// Alias: memory aliasing information
    pub alias: Option<AliasInfo>,
}
```

## Operation Variants

```rust
pub enum TensorOp {
    // Creation
    Zeros(TensorMeta),
    Ones(TensorMeta),
    Full(Scalar, TensorMeta),
    FromData(Vec<Scalar>, TensorMeta),

    // Element-wise unary
    Unary { op: UnaryOp, input: TensorRef, out_meta: TensorMeta },

    // Element-wise binary
    Binary { op: BinOp, lhs: TensorRef, rhs: TensorRef, out_meta: TensorMeta },

    // Reductions
    Reduce { op: ReduceOp, input: TensorRef, axes: Vec<usize>, out_meta: TensorMeta },

    // Shape operations
    Reshape { input: TensorRef, new_shape: Shape },
    Transpose { input: TensorRef, perm: Vec<usize> },
    Broadcast { input: TensorRef, new_shape: Shape },
    Slice { input: TensorRef, ranges: Vec<Range> },

    // Contractions (matmul, einsum)
    Contraction { lhs: TensorRef, rhs: TensorRef, axes: (usize, usize), out_meta: TensorMeta },

    // Memory
    Copy { input: TensorRef, out_meta: TensorMeta },
    View { input: TensorRef, out_meta: TensorMeta },
}
```
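
The `Contraction` variant carries a single pair of axes, so its output shape can be derived from the operand shapes. The sketch below shows one plausible rule: drop the contracted axis from each operand and concatenate what remains, left operand first. Both the ordering and the helper names (`contraction_shape`, `without_axis`) are assumptions for illustration, not the crate's API.

```rust
/// Copy of `shape` with the dimension at `axis` removed.
fn without_axis(shape: &[usize], axis: usize) -> Vec<usize> {
    shape
        .iter()
        .enumerate()
        .filter(|(i, _)| *i != axis)
        .map(|(_, d)| *d)
        .collect()
}

/// Hypothetical shape rule for a single-axis contraction.
/// `axes.0` indexes into `lhs`, `axes.1` into `rhs`.
fn contraction_shape(
    lhs: &[usize],
    rhs: &[usize],
    axes: (usize, usize),
) -> Result<Vec<usize>, String> {
    let (la, ra) = axes;
    if la >= lhs.len() || ra >= rhs.len() {
        return Err(format!("axes {:?} out of range", axes));
    }
    if lhs[la] != rhs[ra] {
        return Err(format!(
            "contracted dimensions differ: {} vs {}",
            lhs[la], rhs[ra]
        ));
    }
    // Remaining lhs dimensions first, then remaining rhs dimensions.
    let mut out = without_axis(lhs, la);
    out.extend(without_axis(rhs, ra));
    Ok(out)
}
```

For a matrix product, `contraction_shape(&[m, k], &[k, n], (1, 0))` yields `[m, n]`, matching the comments in the usage example.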

## Guaranteed Fusion (H26-SPEC Section 8)

The Tensor IR guarantees fusion for these patterns:

```rust
// Pattern 1: Element-wise chains
// a.map(f).map(g).map(h) → a.map(h ∘ g ∘ f)

// Pattern 2: Map-reduce
// a.map(f).reduce(+) → fused map-reduce kernel

// Pattern 3: Broadcast-binary
// a + broadcast(b) → single pass with inline broadcast

// Pattern 4: Transpose-matmul
// matmul(transpose(a), b) → single matmul with transposed access
```
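
What fusion buys for Patterns 1 and 2 can be illustrated in plain Rust, independent of the crate. The functions below are illustrative stand-ins, not generated code: the unfused version materializes two intermediate buffers, while the fused versions make a single pass over the input.

```rust
/// Unfused element-wise chain: three passes, two intermediate buffers.
fn unfused(a: &[f32]) -> Vec<f32> {
    let t1: Vec<f32> = a.iter().map(|x| x + 1.0).collect();
    let t2: Vec<f32> = t1.iter().map(|x| x * 2.0).collect();
    t2.iter().map(|x| x - 3.0).collect()
}

/// Pattern 1, fused: the three functions are composed and applied in one pass.
fn fused(a: &[f32]) -> Vec<f32> {
    a.iter().map(|x| (x + 1.0) * 2.0 - 3.0).collect()
}

/// Pattern 2, fused map-reduce: squares are accumulated as they are
/// produced, so the mapped tensor is never stored.
fn fused_sum_of_squares(a: &[f32]) -> f32 {
    a.iter().map(|x| x * x).sum()
}
```

Both versions of the chain compute the same values; the difference is memory traffic, which is what a fusion guarantee is meant to control.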

### Fusion in Practice

```rust
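use bhc_tensor_ir::{BinOp, FusedOp, Kernel, Parallelism, ReduceOp, UnaryOp};

// Multiple operations fuse into a single kernel.
// (tensor_a, tensor_b, output, and output_shape are assumed to be defined elsewhere.)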
let kernel = Kernel {
    ops: vec![
        FusedOp::Load { src: tensor_a },
        FusedOp::Unary { op: UnaryOp::Exp },
        FusedOp::Binary { op: BinOp::Mul, rhs: tensor_b },
        FusedOp::Reduce { op: ReduceOp::Sum, axis: 1 },
        FusedOp::Store { dst: output },
    ],
    shape: output_shape,
    parallelism: Parallelism::DataParallel,
};
```
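
Read as a pipeline, the kernel above appears to compute `out[i] = Σ_j exp(a[i][j]) * b[i][j]`. The loop below is a reference sketch of that interpretation over contiguous row-major `f32` buffers. It is an assumption about the kernel's semantics, and `fused_exp_mul_rowsum` is a hypothetical name, not code the crate emits.

```rust
/// Reference sketch: load, exp, multiply, and sum along axis 1 in one pass.
/// `a` and `b` are row-major buffers of shape [rows, cols].
fn fused_exp_mul_rowsum(a: &[f32], b: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    assert_eq!(a.len(), rows * cols);
    assert_eq!(b.len(), rows * cols);
    let mut out = vec![0.0f32; rows];
    for i in 0..rows {
        let mut acc = 0.0f32;
        for j in 0..cols {
            let idx = i * cols + j;
            // Load -> Unary(Exp) -> Binary(Mul) -> Reduce(Sum), with no temporaries.
            acc += a[idx].exp() * b[idx];
        }
        // Store
        out[i] = acc;
    }
    out
}
```

Each output row is independent of the others, which is consistent with the `DataParallel` setting in the kernel.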

## Data Types

```rust
pub enum DType {
    F16,    // 16-bit float (half)
    F32,    // 32-bit float
    F64,    // 64-bit float (double)
    BF16,   // Brain float 16
    I8,     // 8-bit signed integer
    I16,    // 16-bit signed integer
    I32,    // 32-bit signed integer
    I64,    // 64-bit signed integer
    U8,     // 8-bit unsigned integer
    U32,    // 32-bit unsigned integer
    Bool,   // Boolean
}
```
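
Element width matters when sizing buffers or converting element strides to byte strides. The sketch below uses a local copy of the enum so that it compiles on its own. `size_in_bytes` and `buffer_bytes` are hypothetical helpers, and storing `Bool` as one byte is an assumption the README does not confirm.

```rust
/// Local copy of the enum above, so the sketch is self-contained.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum DType { F16, F32, F64, BF16, I8, I16, I32, I64, U8, U32, Bool }

/// Hypothetical helper: storage size of one element, in bytes.
fn size_in_bytes(dtype: DType) -> usize {
    match dtype {
        // Assumption: Bool occupies one byte (not bit-packed).
        DType::I8 | DType::U8 | DType::Bool => 1,
        DType::F16 | DType::BF16 | DType::I16 => 2,
        DType::F32 | DType::I32 | DType::U32 => 4,
        DType::F64 | DType::I64 => 8,
    }
}

/// Bytes needed for a densely packed tensor of the given shape.
fn buffer_bytes(dtype: DType, shape: &[usize]) -> usize {
    shape.iter().product::<usize>() * size_in_bytes(dtype)
}
```

For the `[1024, 768]` `F32` tensor from the usage example, this gives 1024 × 768 × 4 bytes.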

## Memory Layouts

```rust
pub enum Layout {
    RowMajor,      // C-style: last dimension contiguous
    ColMajor,      // Fortran-style: first dimension contiguous
    Strided,       // Arbitrary strides (slices, transposes)
}

impl Strides {
    /// Contiguous row-major strides for shape [d0, d1, d2]
    /// → [d1*d2, d2, 1]
    pub fn contiguous(shape: &[usize]) -> Self;

    /// Contiguous column-major strides for shape [d0, d1, d2]
    /// → [1, d0, d0*d1]
    pub fn fortran(shape: &[usize]) -> Self;
}
```
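
The two stride formulas in the doc comments above can be written out as standalone functions. This is a sketch on plain `Vec<usize>` rather than the crate's `Strides` type. It assumes strides are measured in elements (not bytes), and the function names are illustrative.

```rust
/// Row-major (C-style) strides in elements: the last dimension is contiguous.
fn row_major_strides(shape: &[usize]) -> Vec<usize> {
    let mut strides = vec![1; shape.len()];
    // Walk from the second-to-last dimension back to the first.
    for i in (0..shape.len().saturating_sub(1)).rev() {
        strides[i] = strides[i + 1] * shape[i + 1];
    }
    strides
}

/// Column-major (Fortran-style) strides in elements: the first dimension is contiguous.
fn col_major_strides(shape: &[usize]) -> Vec<usize> {
    let mut strides = vec![1; shape.len()];
    for i in 1..shape.len() {
        strides[i] = strides[i - 1] * shape[i - 1];
    }
    strides
}

/// Linear element offset of a multi-dimensional index under the given strides.
fn offset(index: &[usize], strides: &[usize]) -> usize {
    index.iter().zip(strides).map(|(i, s)| i * s).sum()
}
```

For shape `[2, 3, 4]`, the row-major strides are `[12, 4, 1]` and the column-major strides are `[1, 2, 6]`, matching `[d1*d2, d2, 1]` and `[1, d0, d0*d1]`.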

## Design Notes

- Tensor IR operates on logical tensor operations, not loops
- Shape information enables static dimension checking
- Stride representation allows zero-copy views and slices
- Fusion decisions are made before lowering to Loop IR
- Alias information (`TensorMeta::alias`) guards against optimizations that would be incorrect when tensors share memory
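
The note about zero-copy views can be illustrated with a small sketch: a transpose only swaps shape and stride entries and leaves the underlying buffer untouched. `View2` is a hypothetical type written for this example and is not part of the crate. Strides are assumed to be in elements.

```rust
/// A borrowed 2-D view: shape and strides describe how to walk `data`.
struct View2<'a> {
    data: &'a [f32],
    shape: [usize; 2],
    strides: [usize; 2],
}

impl<'a> View2<'a> {
    /// View a contiguous buffer as a row-major [rows, cols] matrix.
    fn row_major(data: &'a [f32], rows: usize, cols: usize) -> Self {
        assert_eq!(data.len(), rows * cols);
        View2 { data, shape: [rows, cols], strides: [cols, 1] }
    }

    /// Transpose by swapping shape and strides; no element is copied.
    fn transpose(&self) -> View2<'a> {
        View2 {
            data: self.data,
            shape: [self.shape[1], self.shape[0]],
            strides: [self.strides[1], self.strides[0]],
        }
    }

    /// Read element (i, j) through the strides.
    fn get(&self, i: usize, j: usize) -> f32 {
        assert!(i < self.shape[0] && j < self.shape[1]);
        self.data[i * self.strides[0] + j * self.strides[1]]
    }
}
```

A transposed view has strides `[1, cols]`, which is neither of the two contiguous layouts for its new shape unless a dimension is 1. That is the kind of case the `Strided` layout variant exists to describe.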

## Related Crates

- `bhc-core` - Input Core IR
- `bhc-loop-ir` - Output Loop IR for codegen
- `bhc-types` - Type-level shape representation (M9)
- `bhc-gpu` - GPU code generation from Tensor IR
- `bhc-codegen` - CPU code generation

## Specification References

- H26-SPEC Section 7: Tensor Model (M9)
- H26-SPEC Section 7.3: TensorMeta Requirements
- H26-SPEC Section 8: Fusion Guarantees