
Module gpu_ndarray

Real wgpu-backed GpuNdarray<T> that implements ArrayProtocol.

Enabled only with the array_protocol_wgpu feature, which implies wgpu_backend.

Supported operations (GPU dispatch)

  • add, subtract, multiply: elementwise binary ops; workgroup (256,1,1); bounds-checked with WGSL arrayLength
  • multiply_by_scalar_f32: elementwise scalar multiply; workgroup (256,1,1)
  • matmul: naive kernel with one invocation per output element; workgroup (16,16,1)
  • sum(axis=None): two-pass reduction (per-workgroup partial sums, then a pass that combines the partials); workgroup (256,1,1)
  • transpose (2-D): 16×16 tile in workgroup memory, padded to avoid bank conflicts; workgroup (16,16,1). A 32×32 tile would need 1024 invocations, which exceeds Metal's 256-invocations-per-workgroup limit.
  • concatenate(axis=0): plain buffer copies via copy_buffer_to_buffer, no shader (each input is one contiguous block of the row-major output)
  • concatenate(axis>0): WGSL gather kernel (CONCAT_AXISN_WGSL); strides are passed in a storage buffer
  • sum(axis=Some(ax)): WGSL kernel with one invocation per output element, reducing along ax (REDUCE_SUM_AXIS_WGSL)
  • reshape: zero-copy; clones the Arc<Buffer> and records the new shape and strides
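The workgroup sizes above fix how many invocations run per workgroup; the host still has to decide how many workgroups to launch. A minimal sketch of that arithmetic, assuming one invocation per element (or per output element for the 2-D kernels). The helper names workgroups_1d, workgroups_2d and fits_limit are hypothetical; the resulting counts are what would be passed to wgpu's ComputePass::dispatch_workgroups.

```rust
/// Hypothetical helper: workgroups needed so that each of `n` items gets
/// one invocation when a workgroup has `size` invocations.
fn workgroups_1d(n: u32, size: u32) -> u32 {
    // Ceiling division: the last workgroup may be only partly used.
    n.div_ceil(size)
}

/// Hypothetical helper: grid for a 2-D kernel (e.g. matmul or transpose)
/// using `tile` x `tile` workgroups over a `rows` x `cols` output.
/// Assumes x indexes columns and y indexes rows.
fn workgroups_2d(rows: u32, cols: u32, tile: u32) -> (u32, u32, u32) {
    (workgroups_1d(cols, tile), workgroups_1d(rows, tile), 1)
}

/// Checks a workgroup size against a per-workgroup invocation limit.
fn fits_limit(x: u32, y: u32, z: u32, max_invocations: u32) -> bool {
    x * y * z <= max_invocations
}
```

Because the element count is rounded up to whole workgroups, the last group can contain idle invocations, which is why each kernel must bounds-check its global index. The limit check also shows the transpose constraint: a 16×16 workgroup is 256 invocations, a 32×32 one is 1024.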
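The per-output-element axis reduction can be illustrated with a CPU reference that uses the same indexing: each output element turns its flat index into coordinates over the kept axes, then walks the reduced axis using row-major strides. This is a sketch, not the contents of REDUCE_SUM_AXIS_WGSL; sum_axis and row_major_strides are hypothetical names, and contiguous row-major (C-order) storage is assumed.

```rust
/// Row-major (C-order) strides, in elements, for `shape`.
fn row_major_strides(shape: &[usize]) -> Vec<usize> {
    let mut strides = vec![1; shape.len()];
    for i in (0..shape.len().saturating_sub(1)).rev() {
        strides[i] = strides[i + 1] * shape[i + 1];
    }
    strides
}

/// CPU reference: sum a row-major array along `axis`.
/// Each outer-loop iteration stands in for one GPU invocation that owns
/// one output element and walks the reduced axis serially.
fn sum_axis(data: &[f32], shape: &[usize], axis: usize) -> (Vec<f32>, Vec<usize>) {
    let strides = row_major_strides(shape);
    let out_shape: Vec<usize> = shape
        .iter()
        .enumerate()
        .filter(|&(d, _)| d != axis)
        .map(|(_, &len)| len)
        .collect();
    let out_len: usize = out_shape.iter().product();
    let mut out = vec![0.0f32; out_len];
    for (o, slot) in out.iter_mut().enumerate() {
        // Flat output index -> input offset of the first element on `axis`.
        let mut rem = o;
        let mut base = 0;
        for d in (0..shape.len()).rev() {
            if d == axis {
                continue;
            }
            base += (rem % shape[d]) * strides[d];
            rem /= shape[d];
        }
        // Accumulate along the reduced axis.
        let mut acc = 0.0f32;
        for k in 0..shape[axis] {
            acc += data[base + k * strides[axis]];
        }
        *slot = acc;
    }
    (out, out_shape)
}
```

A GPU version parallelizes the outer loop only; the inner loop stays serial per invocation, which is simple but leaves the reduced axis unparallelized.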
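A gather-style concatenate along axis > 0 can also be expressed per output element: convert the output index to coordinates, find which input owns that coordinate along the concatenation axis, and read from that input. A CPU sketch under a row-major assumption; concat_axis is a hypothetical name, and CONCAT_AXISN_WGSL may lay out its strides buffer differently.

```rust
/// CPU reference: concatenate row-major arrays along `axis`.
/// Each iteration mirrors one invocation of a gather kernel: locate the
/// input that owns this output element, then read that element.
/// All inputs must have the same rank and agree on every other axis.
fn concat_axis(inputs: &[(&[f32], &[usize])], axis: usize) -> (Vec<f32>, Vec<usize>) {
    let mut out_shape = inputs[0].1.to_vec();
    out_shape[axis] = inputs.iter().map(|(_, s)| s[axis]).sum();
    let out_len: usize = out_shape.iter().product();
    let mut out = Vec::with_capacity(out_len);
    let mut coords = vec![0usize; out_shape.len()];
    for o in 0..out_len {
        // Flat output index -> coordinates (last axis varies fastest).
        let mut rem = o;
        for d in (0..out_shape.len()).rev() {
            coords[d] = rem % out_shape[d];
            rem /= out_shape[d];
        }
        // Walk the inputs' extents along `axis` to find the owner.
        let mut src = 0;
        while coords[axis] >= inputs[src].1[axis] {
            coords[axis] -= inputs[src].1[axis];
            src += 1;
        }
        // Coordinates -> flat index into the owner (Horner form of the
        // dot product with its row-major strides).
        let (data, shape) = inputs[src];
        let idx = shape
            .iter()
            .zip(&coords)
            .fold(0, |acc, (&len, &c)| acc * len + c);
        out.push(data[idx]);
    }
    (out, out_shape)
}
```

For axis 0 the same routine would work, but there each input is a contiguous slice of the output, so a buffer-to-buffer copy avoids the shader entirely.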
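Zero-copy reshape only rewrites metadata. A minimal sketch, with Arc<Vec<f32>> standing in for the shared Arc<Buffer>; SharedArray and its methods are hypothetical, since the real GpuNdarray fields are not shown in these docs.

```rust
use std::sync::Arc;

/// Row-major (C-order) strides, in elements, for `shape`.
fn row_major_strides(shape: &[usize]) -> Vec<usize> {
    let mut strides = vec![1; shape.len()];
    for i in (0..shape.len().saturating_sub(1)).rev() {
        strides[i] = strides[i + 1] * shape[i + 1];
    }
    strides
}

/// Hypothetical stand-in for GpuNdarray: `Arc<Vec<f32>>` plays the role
/// of the shared `Arc<wgpu::Buffer>`.
struct SharedArray {
    buffer: Arc<Vec<f32>>,
    shape: Vec<usize>,
    strides: Vec<usize>,
}

impl SharedArray {
    fn new(data: Vec<f32>, shape: Vec<usize>) -> Self {
        let strides = row_major_strides(&shape);
        SharedArray { buffer: Arc::new(data), shape, strides }
    }

    /// Zero-copy reshape: bump the refcount, replace shape and strides.
    /// Returns None if the element count would change.
    fn reshape(&self, new_shape: &[usize]) -> Option<SharedArray> {
        if new_shape.iter().product::<usize>() != self.buffer.len() {
            return None;
        }
        Some(SharedArray {
            buffer: Arc::clone(&self.buffer), // no element data is copied
            shape: new_shape.to_vec(),
            strides: row_major_strides(new_shape),
        })
    }
}
```

This relies on the data being contiguous in row-major order; a non-contiguous layout would need a copy before its strides could be recomputed this way.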

CPU-fallback operations

  • svd: falls back to the CPU NdarrayWrapper
  • inverse: falls back to the CPU NdarrayWrapper
  • multiply_by_scalar_f64, divide_by_scalar_f64: the f64 scalar is converted to f32 and the GPU kernel runs, so precision is limited to f32
  • GPU kernel errors in axis operations: the operation is recomputed on the CPU (graceful degradation)
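The graceful-degradation rule amounts to "try the GPU, recompute on the CPU on error". A minimal sketch, assuming a hypothetical GpuError type and a closure-based helper (with_cpu_fallback and narrow_scalar are not part of the documented API). It also shows the f32 narrowing applied to f64 scalars, which rounds values such as 0.1.

```rust
/// Hypothetical error type for a failed GPU kernel (not the module's real type).
#[derive(Debug)]
struct GpuError(String);

/// Run the GPU path; if it returns an error, recompute on the CPU.
fn with_cpu_fallback<T>(
    gpu: impl FnOnce() -> Result<T, GpuError>,
    cpu: impl FnOnce() -> T,
) -> T {
    match gpu() {
        Ok(value) => value,
        Err(err) => {
            // This sketch reports to stderr; the real module may log differently.
            eprintln!("GPU kernel failed ({}); falling back to CPU", err.0);
            cpu()
        }
    }
}

/// f64 scalar on an f32 array: narrow the scalar before dispatch.
fn narrow_scalar(s: f64) -> f32 {
    s as f32
}
```

Passing the CPU path as a closure means it is only evaluated when the GPU path fails, so the common case pays nothing for the fallback.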

GPU threshold

Arrays with fewer than 4096 elements skip GPU dispatch entirely and fall back to CPU.
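A sketch of the routing decision, assuming the check combines array length with adapter availability (GPU_MIN_ELEMENTS, Backend and choose_backend are hypothetical names). The docs do not give the reason for 4096; a plausible one is that for small arrays the fixed cost of buffer upload, dispatch and readback outweighs the kernel time.

```rust
/// Threshold taken from the module docs: arrays with fewer elements
/// than this skip GPU dispatch. (The constant name is hypothetical.)
const GPU_MIN_ELEMENTS: usize = 4096;

#[derive(Debug, PartialEq)]
enum Backend {
    Cpu,
    Gpu,
}

/// Hypothetical routing helper combining the size threshold with
/// adapter availability.
fn choose_backend(len: usize, gpu_available: bool) -> Backend {
    if gpu_available && len >= GPU_MIN_ELEMENTS {
        Backend::Gpu
    } else {
        Backend::Cpu
    }
}
```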

Structs

GpuNdarray
An n-dimensional array whose data lives in a real wgpu Buffer on the GPU.

Traits

GpuScalar
Marker trait for element types that wgpu 29 supports natively: f32 only, because WGSL has no f64 type unless a native extension is enabled (for example wgpu's SHADER_F64 feature).
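A sealed marker trait is one common way to express "f32 only" in Rust. This sketch shows the pattern; GpuScalarSketch, the sealed module, WGSL_TYPE and buffer_size_bytes are all hypothetical, and the docs do not say whether the real GpuScalar is sealed or which bounds it carries.

```rust
/// Private supertrait: code outside this file cannot implement it,
/// so the set of GPU element types stays closed.
mod sealed {
    pub trait Sealed {}
    impl Sealed for f32 {}
}

/// Hypothetical marker trait in the spirit of GpuScalar.
pub trait GpuScalarSketch: sealed::Sealed + Copy + Send + Sync + 'static {
    /// WGSL spelling of the element type, e.g. for shader templating.
    const WGSL_TYPE: &'static str;
}

impl GpuScalarSketch for f32 {
    const WGSL_TYPE: &'static str = "f32";
}

/// Bytes a buffer of `n` elements of `T` would occupy.
fn buffer_size_bytes<T: GpuScalarSketch>(n: usize) -> u64 {
    (n * std::mem::size_of::<T>()) as u64
}
```

With only an f32 impl, a call such as buffer_size_bytes::<f64>(n) is rejected at compile time instead of failing later at shader compilation.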

Functions

global_context
Returns the shared WebGPUContext, or None if no adapter is available.
is_gpu_available
Returns true if a wgpu adapter was found. The adapter lookup runs on the first call, and its result is cached for all later calls.
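Probing once and sharing the result maps naturally onto std::sync::OnceLock. A minimal sketch, assuming the cached value is an Option so that "no adapter" is remembered too; ContextSketch, probe_adapter, global_context_sketch and is_gpu_available_sketch are hypothetical stand-ins, and the probe here always reports no adapter instead of calling into wgpu.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for WebGPUContext; a real one would hold the
/// wgpu adapter, device and queue.
struct ContextSketch;

static CONTEXT: OnceLock<Option<ContextSketch>> = OnceLock::new();
static PROBES: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical adapter probe. Here it always reports "no adapter";
/// a real probe would request an adapter from wgpu.
fn probe_adapter() -> Option<ContextSketch> {
    PROBES.fetch_add(1, Ordering::SeqCst);
    None
}

/// Probe on the first call only; later calls reuse the cached result,
/// including a cached "no adapter" (None).
fn global_context_sketch() -> Option<&'static ContextSketch> {
    CONTEXT.get_or_init(probe_adapter).as_ref()
}

fn is_gpu_available_sketch() -> bool {
    global_context_sketch().is_some()
}
```

Caching None means a machine without an adapter pays the probe cost once; the tradeoff is that an adapter that becomes available later in the process lifetime is not picked up.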