oxiui-compute-wgpu 0.1.2

Pure-Rust wgpu GPU-compute abstraction for the COOLJAPAN ecosystem
Documentation

oxiui-compute-wgpu — Pure-Rust wgpu GPU-compute abstraction for OxiUI

Crates.io License

oxiui-compute-wgpu is the shared GPU-compute layer for the COOLJAPAN ecosystem. It consolidates the repeated Instance → Adapter → Device → Queue initialisation boilerplate that headless compute workloads (sparse solvers, Lattice-Boltzmann, Monte-Carlo) each duplicated, and adds typed buffers, buffer pooling and sub-allocation, pipeline caching, dispatch helpers, a WGSL preprocessor and validator, and a set of validated built-in compute kernels. #![forbid(unsafe_code)] is enforced crate-wide; pollster blocks on wgpu's async adapter/device requests so the public API stays synchronous; bytemuck handles zero-copy Pod casting between CPU and GPU.

Targets wgpu 29. See the wgpu 29 Notes for the renamed APIs.

Installation

[dependencies]
oxiui-compute-wgpu = "0.1"

wgpu, bytemuck, and pollster are re-exported, so a single dependency declaration is enough.

Quick Start

Configure a context with the builder, double every value in a buffer on the GPU with a DispatchBuilder, and read the result back:

use oxiui_compute_wgpu::{
    bytemuck, compute_pipeline, read_back, storage_buffer_init, wgpu,
    ComputeContext, DispatchBuilder,
};

// 1. Configure and build a compute context (Err on hosts with no GPU adapter).
let Ok(ctx) = ComputeContext::builder()
    .with_power_preference(wgpu::PowerPreference::HighPerformance)
    .build()
else {
    return; // no GPU — skip gracefully
};

// 2. Upload input data to a storage buffer.
let input: Vec<f32> = vec![1.0, 2.0, 3.0, 4.0];
let buffer = storage_buffer_init(&ctx.device, "values", bytemuck::cast_slice(&input));

// 3. Compile a WGSL compute shader (auto-layout from reflection).
const SHADER: &str = r#"
    @group(0) @binding(0) var<storage, read_write> data: array<f32>;

    @compute @workgroup_size(64)
    fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
        if gid.x < arrayLength(&data) {
            data[gid.x] = data[gid.x] * 2.0;
        }
    }
"#;
let pipeline = compute_pipeline(&ctx.device, SHADER, "main");

// 4. Bind, dispatch (ceil-div grid), and submit in one call.
let bind_group = ctx.device.create_bind_group(&wgpu::BindGroupDescriptor {
    label: Some("values-bind"),
    layout: &pipeline.get_bind_group_layout(0),
    entries: &[wgpu::BindGroupEntry {
        binding: 0,
        resource: buffer.as_entire_binding(),
    }],
});

DispatchBuilder::new(&pipeline)
    .bind(0, &bind_group)
    .dispatch_1d(input.len() as u32, 64)
    .label("double")
    .submit(&ctx.device, &ctx.queue);

// 5. Read the results back to the CPU.
let output: Vec<f32> = read_back(&ctx.device, &ctx.queue, &buffer, input.len());
assert_eq!(output, vec![2.0, 4.0, 6.0, 8.0]);

API Overview

Module Key exports
context ComputeContext, ContextBuilder
buffer storage_buffer_init, uniform_buffer, staging_buffer, read_back, read_back_range, read_back_async, TypedBuffer<T>, BufferPool, SubAllocator, mapped_storage_init
pipeline compute_pipeline, checked_compute_pipeline, PipelineCache, DispatchBuilder, dispatch_1d, dispatch_2d, dispatch_3d, validate_immediates, encode_indirect_dispatch
wgsl preprocess, validate, SHADER_PREFIX_SUM, SHADER_REDUCTION_SUM, SHADER_HISTOGRAM, SHADER_MATMUL
error ComputeError (NoAdapter, DeviceRequest, OutOfMemory, ShaderCompilation, Operation)

Built-in Kernels

Four validated WGSL kernels ship as pub const source strings in wgsl. Each uses the entry point main_cs and is compiled with compute_pipeline (or checked_compute_pipeline).

Constant Entry point Constraints
SHADER_PREFIX_SUM main_cs Inclusive scan, f32. Single workgroup; input length ≤ 256. Dispatch one workgroup of size 256.
SHADER_REDUCTION_SUM main_cs Sum reduction, f32 → f32. Single workgroup; input length ≤ 256. Dispatch one workgroup of size 256.
SHADER_HISTOGRAM main_cs u32 values → bin counts. Up to 256 bins; each element binned as input[i] % num_bins. Workgroup size 64; workgroup-local atomic histogram.
SHADER_MATMUL main_cs Tiled f32 matmul, M×K · K×N → M×N (row-major). 16×16 shared-memory tiles; bind MatDims { M, K, N } uniform at @binding(3). Dispatch ceil(N/16) × ceil(M/16) × 1.

wgpu 29 Notes

This crate targets wgpu 29. Relevant API changes from earlier versions:

  • Push constants are now immediates: wgpu::Features::IMMEDIATES, ComputePass::set_immediates, and Limits::max_immediate_size. Alignment is wgpu::IMMEDIATE_DATA_ALIGNMENT (4); validate with validate_immediates.
  • Instance::request_adapter returns Result (not Option).
  • Adapter::request_device takes a single &DeviceDescriptor argument.

Feature Flags

None. All functionality is on by default. GPU access is optional at runtime: ComputeContext::try_new returns None on headless hosts (CI, VMs without GPU pass-through), and ComputeContext::new / ContextBuilder::build return ComputeError::NoAdapter, so callers can skip gracefully rather than fail.

Related Crates

  • oxiui — the OxiUI facade crate.
  • oxiui-coreRenderBackend, DrawList, geometry types, shared require_gpu! macro.
  • oxiui-render-soft — CPU render backend; will use compute shaders for blur/dithering.
  • oxiui-render-wgpu — GPU render backend; shares the device/queue with this crate.
  • oxiui-text — text pipeline; GPU glyph rasterization is a planned consumer.
  • oxiui-theme — design tokens and ShadowSpec.

License

Apache-2.0 — COOLJAPAN OU (Team Kitasan)