oxillama-gpu
Optional wgpu-based GPU compute backend for OxiLLaMa — zero C, zero OpenCL, zero CUDA.
Part of the OxiLLaMa workspace — a Pure Rust LLM inference engine.
What It Provides
- wgpu compute shaders (WGSL) for Q4_0 and Q8_0 dequantization
- Async GPU tensor dispatch with pollster for synchronous usage
- Graceful CPU fallback when no compatible GPU adapter is found
- Works on Vulkan, Metal, DX12, and WebGPU backends via wgpu
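For orientation, here is a minimal CPU-side sketch of what Q4_0 dequantization computes. It assumes the ggml block layout (32 weights per block, one scale, 16 bytes of packed 4-bit values with an offset of 8). The names BlockQ4_0 and dequantize_q4_0 are hypothetical and are not taken from this crate's API, and the scale is held as f32 rather than f16 so that the example needs no external crates.

```rust
/// Number of weights in one Q4_0 block (ggml convention, assumed here).
const QK4_0: usize = 32;

/// One Q4_0 block (hypothetical type for illustration).
/// ggml stores the scale as f16; it is widened to f32 in this sketch.
pub struct BlockQ4_0 {
    pub scale: f32,
    /// 16 bytes, each packing two 4-bit quantized values.
    pub quants: [u8; QK4_0 / 2],
}

/// Expand Q4_0 blocks into f32 weights.
/// The low nibble of byte j is element j of the block; the high nibble
/// is element j + 16. Both are shifted by -8 before scaling.
pub fn dequantize_q4_0(blocks: &[BlockQ4_0]) -> Vec<f32> {
    let mut out = vec![0.0f32; blocks.len() * QK4_0];
    for (block, chunk) in blocks.iter().zip(out.chunks_exact_mut(QK4_0)) {
        for (j, &byte) in block.quants.iter().enumerate() {
            let lo = (byte & 0x0F) as i32 - 8;
            let hi = (byte >> 4) as i32 - 8;
            chunk[j] = lo as f32 * block.scale;
            chunk[j + QK4_0 / 2] = hi as f32 * block.scale;
        }
    }
    out
}
```

A WGSL shader would typically perform the same per-block arithmetic with one invocation per block or per element; the loop above is only a reference for the expected values.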
Status
Tests: 77 passing
Feature Flags
| Feature | Default | Description |
|---|---|---|
| gpu | no | Enable wgpu, pollster, and bytemuck; compile WGSL shaders |
The crate compiles and links with zero GPU dependencies when the gpu feature is not enabled; in that configuration it exports only stub types that delegate to the CPU quant kernels.
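The stub arrangement can be pictured with a small sketch of feature-gated code. Everything named here (GpuDequantizer, is_accelerated, dequantize_q8_0_cpu) is hypothetical and chosen for illustration; the crate's real types may differ. The Q8_0 arithmetic assumes the ggml layout of 32 signed bytes per block with a single scale, held as f32 instead of f16 to avoid external crates.

```rust
/// Hypothetical CPU reference kernel for one Q8_0 block:
/// each weight is the signed 8-bit value times the block scale.
pub fn dequantize_q8_0_cpu(scale: f32, quants: &[i8; 32]) -> [f32; 32] {
    let mut out = [0.0f32; 32];
    for (o, &q) in out.iter_mut().zip(quants.iter()) {
        *o = q as f32 * scale;
    }
    out
}

/// Stub type compiled only when the `gpu` feature is off.
/// It carries no GPU state and forwards every call to the CPU kernel.
#[cfg(not(feature = "gpu"))]
pub struct GpuDequantizer;

#[cfg(not(feature = "gpu"))]
impl GpuDequantizer {
    pub fn new() -> Self {
        GpuDequantizer
    }

    /// Always false for the stub: no GPU adapter is ever requested.
    pub fn is_accelerated(&self) -> bool {
        false
    }

    /// Same signature a GPU-backed type could expose, served by the CPU path.
    pub fn dequantize_q8_0(&self, scale: f32, quants: &[i8; 32]) -> [f32; 32] {
        dequantize_q8_0_cpu(scale, quants)
    }
}
```

With this shape, calling code does not need its own cfg attributes: it constructs the type and calls the same methods whether or not the feature is enabled. A GPU-backed implementation of the same type would sit behind a matching cfg(feature = "gpu") attribute.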
Usage
use ;
Enable at build time:
[]
= { = "...", = ["gpu"] }
License
Apache-2.0 — COOLJAPAN OU (Team Kitasan)