yscv-kernels 0.1.1

CPU and GPU compute backends with SIMD dispatch and BLAS integration
Documentation

Execution kernels and backend abstraction for yscv.

GPU Inference (Cross-Platform via wgpu)

The gpu feature enables compute shader acceleration via wgpu — Vulkan (Linux/Windows/Android), Metal (macOS/iOS), DX12 (Windows). No CUDA dependency. GPU-accelerated operations:

  • Matrix multiplication (tiled 16×16 workgroups)
  • Elementwise: add, sub, mul
  • Activations: relu, sigmoid
  • Normalization: batch_norm, layer_norm, group_norm, rms_norm, softmax
  • Convolution: conv2d, depthwise_conv2d, separable_conv2d, transpose_conv2d
  • Pooling: max_pool2d, avg_pool2d

GPU training (backward passes) is on the roadmap. CPU backend is fully optimized with NEON/AVX/SSE SIMD on all platforms.