
Module gpu

GPU backend using wgpu (Vulkan/Metal/DX12/WebGPU)

This backend provides GPU-accelerated compute for large-scale operations. It uses wgpu for cross-platform GPU access and WGSL compute shaders.

§Performance

The GPU backend is optimal for very large workloads (>100K elements for reductions, 1000×1000 for matrix operations), where the host-to-device transfer overhead is amortized over the computation.

Expected speedups vs SIMD:

  • Matrix multiplication (large): 10-50x
  • Reductions (large): 5-20x

§Architecture

  • Device initialization is lazy (first GPU operation)
  • Compute shaders written in WGSL
  • Asynchronous execution with pollster for blocking
  • Automatic fallback to CPU if GPU unavailable
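The lazy-initialization and fallback behavior above can be sketched with the standard library's `OnceLock`; the real backend blocks on a wgpu adapter request via pollster at this point, but the control flow is the same. All names in this sketch are illustrative, not the crate's API:

```rust
use std::sync::OnceLock;

/// Illustrative stand-in for the real GPU device handle.
#[derive(Debug, PartialEq)]
enum Device {
    Gpu,         // a wgpu adapter/device pair in the real backend
    CpuFallback, // used when no suitable adapter is found
}

// The device is created on the first GPU operation, never at startup.
static DEVICE: OnceLock<Device> = OnceLock::new();

fn device() -> &'static Device {
    DEVICE.get_or_init(|| {
        // The real backend would `pollster::block_on` an async adapter
        // request here; if no adapter is available, it falls back to CPU.
        match try_init_gpu() {
            Some(d) => d,
            None => Device::CpuFallback,
        }
    })
}

fn try_init_gpu() -> Option<Device> {
    None // pretend no GPU is available in this sketch
}

fn main() {
    // The first call performs (and caches) initialization; later calls
    // return the cached device with no further work.
    println!("{:?}", device());
}
```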

§Memory Hierarchy Abstractions

  • TensorView - Structured view into GPU memory with shape/stride metadata
  • PartitionView - Tiling strategy for efficient GPU work distribution

Based on cuda-tile-behavior.md Section 3.2.
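A minimal CPU-side model of what `TensorView` captures, a shape plus strides over a flat contiguous buffer, can be sketched as follows; the real type additionally carries GPU buffer handles, so the field names and constructor here are assumptions for illustration:

```rust
/// Illustrative shape/stride view over a flat, contiguous buffer.
struct TensorView<'a> {
    data: &'a [f32],
    shape: [usize; 2],
    strides: [usize; 2], // measured in elements, not bytes, in this sketch
}

impl<'a> TensorView<'a> {
    /// Row-major view: the stride of a row is the row length.
    fn new_row_major(data: &'a [f32], rows: usize, cols: usize) -> Self {
        assert_eq!(data.len(), rows * cols);
        Self { data, shape: [rows, cols], strides: [cols, 1] }
    }

    /// Index using the stride metadata rather than assuming a layout.
    fn get(&self, r: usize, c: usize) -> f32 {
        assert!(r < self.shape[0] && c < self.shape[1]);
        self.data[r * self.strides[0] + c * self.strides[1]]
    }
}

fn main() {
    // 2×3 matrix stored row-major: [[1, 2, 3], [4, 5, 6]]
    let buf = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    let view = TensorView::new_row_major(&buf, 2, 3);
    println!("{}", view.get(1, 2)); // element at row 1, column 2
}
```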

Re-exports§

pub use wgpu;

Modules§

runtime
Cross-platform async runtime helpers for GPU operations.
shaders
WGSL compute shaders for GPU operations

Structs§

BufferId
Unique identifier for a buffer in a batch
GpuBackend
GPU backend for compute operations (native only, uses sync wrappers)
GpuCommandBatch
Command batch for async GPU execution
GpuDevice
GPU device manager
GpuDevicePool
Pool of GPU devices for multi-GPU workloads
GpuMatmulCache
PMAT-322: Cached matmul state for LLM inference: a compute pipeline plus pre-uploaded persistent weight and I/O buffers.
MaxOp
Max reduction operation
MinOp
Min reduction operation
PartitionView
A tiling strategy over a TensorView.
QkvLoRA
PMAT-324: Optional LoRA buffers for the Q/K/V projections in a layer's forward pass.
SumOp
Sum reduction operation
TensorView
A view into a contiguous memory region with shape and stride information.
TileInfo
Information about a single tile within a partition.
WgslForwardPass
PMAT-324: GPU-resident transformer layer state. All buffers persist across tokens; only the input and output buffers change per step.

Enums§

MemoryLayout
Memory layout for tensor storage

Constants§

TILE_SIZE
Default tile size for 2D reductions (matches GPU workgroup size)

Traits§

ReduceOp
Reduction operation trait for generic tile reduction
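A reduction-op trait of this kind can be modeled as an identity element plus an associative combine step, with `SumOp`, `MaxOp`, and `MinOp` as trivial implementations. This is a sketch with assumed method names, not the crate's actual trait:

```rust
/// Illustrative reduction-op trait: an identity plus an associative combine.
trait ReduceOp {
    const IDENTITY: f32;
    fn combine(a: f32, b: f32) -> f32;
}

struct SumOp;
impl ReduceOp for SumOp {
    const IDENTITY: f32 = 0.0;
    fn combine(a: f32, b: f32) -> f32 { a + b }
}

struct MaxOp;
impl ReduceOp for MaxOp {
    const IDENTITY: f32 = f32::NEG_INFINITY;
    fn combine(a: f32, b: f32) -> f32 { a.max(b) }
}

/// Generic reduction over a slice, usable with any ReduceOp.
fn reduce<O: ReduceOp>(xs: &[f32]) -> f32 {
    xs.iter().fold(O::IDENTITY, |acc, &x| O::combine(acc, x))
}

fn main() {
    let xs = [3.0, 1.0, 4.0, 1.0, 5.0];
    println!("{}", reduce::<SumOp>(&xs)); // 14
    println!("{}", reduce::<MaxOp>(&xs)); // 5
}
```

Keeping the identity and combine in a trait is what lets one tile-reduction kernel driver serve sum, max, and min without duplication.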

Functions§

tiled_max_2d
Convenience function for tiled max reduction
tiled_min_2d
Convenience function for tiled min reduction
tiled_reduce_2d
Perform tiled reduction on 2D data (CPU fallback)
tiled_reduce_partial
Compute partial tile results for verification
tiled_sum_2d
Convenience function for tiled sum reduction
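The shape of a tiled 2D reduction, per-tile partial results followed by a combine pass, can be sketched as a CPU analogue of the GPU path; the `TILE_SIZE` value and function signature below are assumptions for illustration, not the crate's exports:

```rust
const TILE_SIZE: usize = 4; // the real constant matches the GPU workgroup size

/// Sum a rows×cols grid by reducing each TILE_SIZE×TILE_SIZE tile to a
/// partial result, then combining the partials (the CPU fallback pattern).
fn tiled_sum_2d(data: &[f32], rows: usize, cols: usize) -> f32 {
    assert_eq!(data.len(), rows * cols);
    let mut total = 0.0;
    // Walk tiles in row-major tile order.
    for tr in (0..rows).step_by(TILE_SIZE) {
        for tc in (0..cols).step_by(TILE_SIZE) {
            // Partial result for one tile; edge tiles are clamped to bounds.
            let mut partial = 0.0;
            for r in tr..(tr + TILE_SIZE).min(rows) {
                for c in tc..(tc + TILE_SIZE).min(cols) {
                    partial += data[r * cols + c];
                }
            }
            total += partial; // combine pass over per-tile partials
        }
    }
    total
}

fn main() {
    // 5×5 grid of ones: the sum is 25 regardless of how it is tiled.
    let data = vec![1.0f32; 25];
    println!("{}", tiled_sum_2d(&data, 5, 5));
}
```

On the GPU, each tile maps to a workgroup producing one partial; exposing the partials separately (as `tiled_reduce_partial` does) lets the CPU fallback and the GPU path be checked against each other.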

Type Aliases§

PipelineCache
Pipeline cache keyed by shader source pointer address.