Expand description
GPU-accelerated batch inference for the MoE classifier via wgpu compute shaders.
Processes N feature vectors in a single GPU dispatch, achieving ~10-100x throughput over CPU for large batches. Falls back to CPU when no GPU is available or for batches smaller than the crossover threshold.
Architecture mirrors ml_scorer.rs exactly:
- Gate: Linear(41→6) + softmax
- 6 experts: Linear(41→32)+ReLU → Linear(32→16)+ReLU → Linear(16→1)
- Output: sigmoid(weighted sum of expert logits)
Functions§
- batch_
ml_ inference - Score multiple (credential, context) pairs in a single batch.
- gpu_
available - Check if GPU acceleration is available.
Return
truewhen GPU scoring support is available in this build/runtime. - gpu_
probe - Probe GPU availability and adapter metadata without panicking.