cuneus 0.4.7

A WGPU-based shader development tool
# Cuneus Usage Guide

Cuneus is a GPU compute shader engine with a unified backend for single-pass, multi-pass, and atomic compute shaders. It features built-in UI controls, hot-reloading, media integration, and GPU-driven audio synthesis.

**Key Philosophy:** Declare what you need in the builder → get predictable bindings in WGSL. No manual binding management, no boilerplate. Add `.with_mouse()` in Rust, access `@group(2) mouse` in your shader. The **4-Group Binding Convention** guarantees where every resource lives: Group 0 (time), Group 1 (output/params), Group 2 (engine resources), Group 3 (user data/multi-pass). Everything flows from the builder.

## Shadertoy Mapping

| Shadertoy | Cuneus WGSL |
|-----------|-------------|
| `iResolution.xy` | `vec2<f32>(textureDimensions(output))` |
| `iTime` | `time_data.time` |
| `iTimeDelta` | `time_data.delta` |
| `iFrame` | `time_data.frame` |
| `iMouse` | `mouse` (requires `.with_mouse()`) |
| `iChannel0` | `channel0` (requires `.with_channels(1)`) |
| `fragCoord` | `vec2<f32>(id.xy)` from `@builtin(global_invocation_id)` |
| `fragColor = ...` | `textureStore(output, id.xy, color)` |

## Core Concepts

### 1. The Unified Compute Pipeline

In Cuneus, almost everything is a compute shader. Instead of writing traditional vertex/fragment shaders, you write compute kernels that write directly to an output texture. The framework provides a simple renderer to blit this texture to the screen. This approach gives you maximum control and performance for GPU tasks.

### 2. The Builder Pattern (`ComputeShaderBuilder`)

The `ComputeShader::builder()` is the single entry point for configuring your shader. It specifies exactly what resources your shader needs, and Cuneus handles all the complex WGPU boilerplate — including hot reload.

```rust
let config = ComputeShader::builder()
    .with_label("My Awesome Shader")
    .with_custom_uniforms::<MyParams>() // Custom parameters
    .with_mouse()                       // Enable mouse input
    .with_channels(1)                   // Enable one external texture (e.g., video)
    .build();

// The compute_shader! macro embeds the shader source AND enables hot reload automatically.
let compute_shader = cuneus::compute_shader!(core, "shaders/my_shader.wgsl", config);
```

### 3. The 4-Group Binding Convention

Cuneus enforces a standard bind group layout to create a stable and predictable contract between your Rust code and your WGSL shader. This eliminates the need to manually track binding numbers.

| Group | Binding(s) | Description | Configuration |
| :--- | :--- | :--- | :--- |
| **0** | `@binding(0)` | **Per-Frame Data** (Time, frame count). | Engine-managed. Always available. |
| **1** | `@binding(0)`<br/>`@binding(1)`<br/>`@binding(2..)` | **Primary I/O & Params**. Output texture, your custom `UniformProvider`, and an optional input texture. | User-configured via builder (`.with_custom_uniforms()`, `.with_input_texture()`). |
| **2** | `@binding(0..N)` | **Global Engine Resources**. Mouse, fonts, audio buffer, audio spectrum, atomics, and media channels. The relative order is fixed; the actual binding numbers depend on which resources you enable. | User-configured via builder (`.with_mouse()`, `.with_fonts()`, etc.). |
| **3** | `@binding(0..N)` | **User Data & Multi-Pass I/O**. User-defined storage buffers or textures for multi-pass feedback loops. | User-configured via builder (`.with_storage_buffer()` or `.with_multi_pass()`). |

### 4. Execution Models (Dispatching)

- **Automatic (`.dispatch()`):** This is the recommended method. It executes the entire pipeline you defined in the builder (including all multi-pass stages) and automatically increments the frame counter.
- **Manual (`.dispatch_stage()`):** This gives you fine-grained control to run specific compute kernels from your WGSL file. It is essential for advanced patterns like path tracing accumulation or conditional updates. **You must manually increment `compute_shader.current_frame` when using this method.**

### 5. Multi-Pass Models

The framework supports two types of multi-pass computation:

1. **Texture-Based (Ping-Pong):** Ideal for image processing and feedback effects. Intermediate results are stored in textures. Each buffer independently tracks its write state, so any pass can read from any previous pass's output — and cross-frame feedback (self-referencing passes) works automatically.
   - *Examples with cross-frame feedback: `lich.rs`, `currents.rs`, `rorschach.rs`*
   - *Examples with within-frame only: `kuwahara.rs`, `fluid.rs`, `jfa.rs`, `2dneuron.rs`*

2. **Storage-Buffer-Based (Shared Memory):** Ideal for GPU algorithms such as FFTs or neural-network workloads such as CNNs. All passes read from and write to the same large, user-defined storage buffers. This is enabled by using `.with_multi_pass()` *and* `.with_storage_buffer()`.
   - *Examples: `fft.rs`, `cnn.rs`*
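
The ping-pong idea from the texture-based model can be illustrated on the CPU. The sketch below is not Cuneus code: `PingPongBuffer`, `read`, and `run_pass` are hypothetical names, and plain `Vec<f32>` values stand in for textures. It only shows how tracking the last-written side lets a pass read its own previous output while writing the new one.

```rust
/// A CPU-side model of one ping-pong buffer: two sides plus a record of
/// which side was written last. All names here are hypothetical.
#[derive(Debug, Clone)]
pub struct PingPongBuffer {
    sides: [Vec<f32>; 2],
    last_written: usize,
}

impl PingPongBuffer {
    /// Both sides start zeroed; side 1 is treated as "last written" so the
    /// first pass writes side 0.
    pub fn new(len: usize) -> Self {
        Self {
            sides: [vec![0.0; len], vec![0.0; len]],
            last_written: 1,
        }
    }

    /// The side holding the most recent output (what any reader should see).
    pub fn read(&self) -> &[f32] {
        &self.sides[self.last_written]
    }

    /// Runs one pass: reads the previous output, writes the other side,
    /// then flips the tracker so later readers see the new data.
    pub fn run_pass(&mut self, kernel: impl Fn(&[f32], &mut [f32])) {
        let write_idx = 1 - self.last_written;
        let (a, b) = self.sides.split_at_mut(1);
        let (src, dst): (&[f32], &mut [f32]) = if write_idx == 1 {
            (&a[0][..], &mut b[0][..])
        } else {
            (&b[0][..], &mut a[0][..])
        };
        kernel(src, dst);
        self.last_written = write_idx;
    }
}
```

Calling `run_pass` three times with a kernel that writes `src[i] + 1.0` leaves `read()` returning all `3.0`, which is the cross-frame feedback pattern in miniature.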

## Getting Started: Shader Structure

Every shader application follows a similar pattern implementing the `ShaderManager` trait.

```rust
use cuneus::prelude::*;
use cuneus::compute::*;

// 1. Define custom parameters for the UI
#[repr(C)]
#[derive(Copy, Clone, Debug, bytemuck::Pod, bytemuck::Zeroable)]
struct MyParams {
    strength: f32,
    color: [f32; 3],
    _padding: f32,
}

impl UniformProvider for MyParams {
    fn as_bytes(&self) -> &[u8] { bytemuck::bytes_of(self) }
}

// 2. Define the main application struct
struct MyShader {
    base: RenderKit,
    compute_shader: ComputeShader,
    current_params: MyParams,
}

// 3. Implement the ShaderManager trait
impl ShaderManager for MyShader {
    fn init(core: &Core) -> Self {
        // RenderKit handles the final blit to screen and UI (vertex/blit shaders built-in)
        let texture_bind_group_layout = RenderKit::create_standard_texture_layout(&core.device);
        let base = RenderKit::new(core, &texture_bind_group_layout, None);
        let initial_params = MyParams { /* ... */ };

        // --- To convert this to a Multi-Pass shader, make the following changes: ---
        
        // 1. (Multi-Pass) Define your passes and their dependencies.
        //    The string in `new()` is the WGSL entry point name.
        //    The slice `&[]` lists buffers to bind as input_texture0, input_texture1, etc.
        //    Self-reference (e.g., "buffer_a" in its own inputs) enables cross-frame feedback.
        /*
        let passes = vec![
            PassDescription::new("buffer_a", &[]),              // No inputs
            PassDescription::new("buffer_b", &["buffer_a"]),    // input_texture0 = buffer_a
            PassDescription::new("main_image", &["buffer_b"]),
        ];
        // For cross-frame feedback (temporal effects), add self to inputs:
        // PassDescription::new("buffer_b", &["buffer_a", "buffer_b"])
        // Then input_texture1 = buffer_b's PREVIOUS frame output (automatic)
        */

        // Configure the compute shader using the builder
        let config = ComputeShader::builder()
            // For Single-Pass, use .with_entry_point():
            .with_entry_point("main")
            // 2. (Multi-Pass) Comment out .with_entry_point() and use .with_multi_pass() instead: (we define the passes above)
            // .with_multi_pass(&passes)
            .with_custom_uniforms::<MyParams>()
            .with_mouse()
            .with_label("My Shader")
            .build();

        // Create the compute shader with automatic hot reload.
        let compute_shader = cuneus::compute_shader!(core, "shaders/my_shader.wgsl", config);

        // Set initial parameters
        compute_shader.set_custom_params(initial_params, &core.queue);

        Self { base, compute_shader, current_params: initial_params }
    }

    fn update(&mut self, core: &Core) {
        // Update time uniform, check for hot-reloads, etc.
        let time = self.base.controls.get_time(&self.base.start_time);
        self.compute_shader.set_time(time, 1.0/60.0, &core.queue);
        self.compute_shader.check_hot_reload(&core.device);
    }

    fn render(&mut self, core: &Core) -> Result<(), wgpu::SurfaceError> {
        // begin_frame() bundles surface texture + view + encoder into a FrameContext
        let mut frame = self.base.begin_frame(core)?;

        // Build the UI (apply_default_style sets the standard theme)
        let full_output = if self.base.key_handler.show_ui {
            self.base.render_ui(core, |ctx| {
                RenderKit::apply_default_style(ctx);
                // ... egui windows here ...
            })
        } else {
            self.base.render_ui(core, |_ctx| {})
        };

        // Apply UI control requests after the UI closure:
        // - Non-media examples: use apply_control_request (handles time reset + param updates)
        //   self.base.apply_control_request(controls_request.clone());
        // - Media examples (video/webcam/hdri): use apply_media_requests (bundles
        //   apply_control_request + handle_video/webcam/hdri_requests in one call)
        //   self.base.apply_media_requests(core, &controls_request);

        // Execute the entire compute pipeline.
        // This works for both single-pass and multi-pass shaders automatically.
        self.compute_shader.dispatch(&mut frame.encoder, core);

        // Blit the compute shader's output texture to the screen
        self.base.renderer.render_to_view(&mut frame.encoder, &frame.view, &self.compute_shader);

        // end_frame() handles UI overlay + submit + present in one call
        self.base.end_frame(core, frame, full_output);

        // Cross-frame feedback (self-referencing passes like buffer_a reading buffer_a)
        // works automatically

        Ok(())
    }
    
    fn resize(&mut self, core: &Core) {
        // default_resize updates resolution uniform + resizes compute shader
        self.base.default_resize(core, &mut self.compute_shader);
    }

    fn handle_input(&mut self, core: &Core, event: &WindowEvent) -> bool {
        // default_handle_input handles egui events + keyboard shortcuts (H to toggle UI, etc.)
        // For DroppedFile support, check the event after this call.
        // For mouse input, add: self.base.handle_mouse_input(core, event, false)
        self.base.default_handle_input(core, event)
    }
}
```
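
One thing the listing leaves implicit is that the bytes of `MyParams` must line up with the WGSL struct that reads them. In WGSL, `vec3<f32>` has 16-byte alignment, so if the shader declares `color` as a `vec3<f32>` after an `f32`, it expects `color` at offset 16, while the `#[repr(C)]` struct above places it at offset 4. The std-only sketch below measures both layouts; `PackedParams`, `offset_in`, and the two `*_layout` helpers are hypothetical names, and the reordered struct is one possible fix rather than the layout Cuneus requires.

```rust
use std::mem::size_of;

/// The struct from the listing above, minus the bytemuck derives so the
/// sketch builds with the standard library alone.
#[repr(C)]
#[derive(Copy, Clone, Debug, Default)]
pub struct MyParams {
    pub strength: f32,
    pub color: [f32; 3],
    pub _padding: f32,
}

/// Hypothetical reordering: a 12-byte vec3 followed by one f32 fills a
/// 16-byte slot, matching a WGSL struct `{ color: vec3<f32>, strength: f32 }`.
#[repr(C)]
#[derive(Copy, Clone, Debug, Default)]
pub struct PackedParams {
    pub color: [f32; 3],
    pub strength: f32,
}

/// Byte offset of `field` inside `base`, computed from the two addresses.
pub fn offset_in<T, F>(base: &T, field: &F) -> usize {
    field as *const F as usize - base as *const T as usize
}

/// (offset of `color`, total size) for the struct in the listing.
pub fn my_params_layout() -> (usize, usize) {
    let p = MyParams::default();
    (offset_in(&p, &p.color), size_of::<MyParams>())
}

/// (offset of `color`, offset of `strength`, total size) for the reordering.
pub fn packed_params_layout() -> (usize, usize, usize) {
    let p = PackedParams::default();
    (
        offset_in(&p, &p.color),
        offset_in(&p, &p.strength),
        size_of::<PackedParams>(),
    )
}
```

If your WGSL struct uses only scalar `f32` fields, the original field order is fine; the check matters once vector types appear on the shader side.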

## Standard Bind Group Layout

Your WGSL shaders should follow this layout for predictable resource access.

```wgsl
// Group 0: Per-Frame Data (Engine-Managed)
struct TimeUniform { time: f32, delta: f32, frame: u32, _padding: u32 };
@group(0) @binding(0) var<uniform> time_data: TimeUniform;

// Group 1: Primary Pass I/O & Custom Parameters
@group(1) @binding(0) var output: texture_storage_2d<rgba16float, write>;
// Optional: Your custom uniform struct
@group(1) @binding(1) var<uniform> params: MyParams; 
// Optional: Input texture for image processing
@group(1) @binding(2) var input_texture: texture_2d<f32>;
@group(1) @binding(3) var input_sampler: sampler;

// Group 2: Global Engine Resources
// IMPORTANT: Binding numbers are DYNAMIC based on what you enable in the builder.
// Resources are added in this order: mouse → fonts → audio → audio_spectrum → atomics → channels
// Example 1: Only .with_audio_spectrum() → audio_spectrum is @binding(0)
// Example 2: .with_audio_spectrum() + .with_atomic_buffer() → audio_spectrum @binding(0), atomic_buffer @binding(1)
// Example 3: .with_mouse() + .with_fonts() + .with_audio() → mouse @binding(0), fonts @binding(1-2), audio @binding(3)

// Mouse (if .with_mouse() is used) - takes 1 binding
@group(2) @binding(N) var<uniform> mouse: MouseUniform;
// Fonts (if .with_fonts() is used) - takes 2 bindings (uses textureLoad, no sampler needed)
@group(2) @binding(N) var<uniform> font_uniform: FontUniforms;
@group(2) @binding(N+1) var font_texture: texture_2d<f32>;
// Audio buffer (if .with_audio() is used) - takes 1 binding
@group(2) @binding(N) var<storage, read_write> audio_buffer: array<f32>;
// Audio spectrum (if .with_audio_spectrum() is used) - takes 1 binding
@group(2) @binding(N) var<storage, read> audio_spectrum: array<f32>;
// Atomic buffer (if .with_atomic_buffer() is used) - takes 1 binding
@group(2) @binding(N) var<storage, read_write> atomic_buffer: array<atomic<u32>>;
// Media channels (if .with_channels(2) is used) - takes 2 bindings per channel
@group(2) @binding(N) var channel0: texture_2d<f32>;
@group(2) @binding(N+1) var channel0_sampler: sampler;

// Group 3: User Data & Multi-Pass I/O
// User-defined storage buffers (if .with_storage_buffer() is used, this takes priority)
@group(3) @binding(0) var<storage, read_write> my_data: array<f32>;
// OR: Multi-pass input textures (if .with_multi_pass() is used without storage buffers)
@group(3) @binding(0) var input_texture0: texture_2d<f32>;
@group(3) @binding(1) var input_sampler0: sampler;
```
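
Because Group 2 binding numbers depend on what you enable, it can help to compute them mechanically. The following is a minimal sketch of the ordering rule stated in the comments above (mouse → fonts → audio → audio_spectrum → atomics → channels, with fonts taking two bindings and each channel taking two). `Group2Config`, `Group2Bindings`, and `assign_group2_bindings` are hypothetical names, not part of the Cuneus API.

```rust
/// Which Group 2 resources were enabled in the builder (hypothetical mirror).
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct Group2Config {
    pub mouse: bool,
    pub fonts: bool,
    pub audio: bool,
    pub audio_spectrum: bool,
    pub atomic_buffer: bool,
    pub channels: u32,
}

/// The binding number each enabled resource would receive.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct Group2Bindings {
    pub mouse: Option<u32>,
    pub font_uniform: Option<u32>,
    pub font_texture: Option<u32>,
    pub audio_buffer: Option<u32>,
    pub audio_spectrum: Option<u32>,
    pub atomic_buffer: Option<u32>,
    /// One (texture, sampler) pair per media channel.
    pub channels: Vec<(u32, u32)>,
}

/// Hands out consecutive binding numbers in the documented order.
pub fn assign_group2_bindings(cfg: Group2Config) -> Group2Bindings {
    let mut next = 0u32;
    // Reserves `count` consecutive bindings and returns the first one.
    let mut take = |count: u32| {
        let first = next;
        next += count;
        first
    };
    let mut out = Group2Bindings::default();
    if cfg.mouse {
        out.mouse = Some(take(1));
    }
    if cfg.fonts {
        let first = take(2);
        out.font_uniform = Some(first);
        out.font_texture = Some(first + 1);
    }
    if cfg.audio {
        out.audio_buffer = Some(take(1));
    }
    if cfg.audio_spectrum {
        out.audio_spectrum = Some(take(1));
    }
    if cfg.atomic_buffer {
        out.atomic_buffer = Some(take(1));
    }
    for _ in 0..cfg.channels {
        let first = take(2);
        out.channels.push((first, first + 1));
    }
    out
}
```

With mouse, fonts, and audio enabled this yields mouse at 0, fonts at 1 and 2, and the audio buffer at 3, matching Example 3 in the comments.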

## Advanced Topics

### Multi-Pass Texture Dependencies

When using `.with_multi_pass()`, the framework uses **ping-pong double-buffering** with per-buffer write tracking. Each buffer independently remembers which side was last written, so **any pass can read from any previous pass's output** — no adjacency restrictions.

**How dependencies map to input textures:**

The `&["dep1", "dep2"]` array in `PassDescription::new()` maps directly by position:

- `deps[0]` → `input_texture0` in WGSL
- `deps[1]` → `input_texture1` in WGSL
- `deps[2]` → `input_texture2` in WGSL (max 3)

If fewer than 3 dependencies are listed, the remaining slots repeat the first dependency.

```rust
// Each pass can read from any earlier pass, adjacent or not
PassDescription::new("structure_tensor", &[]),
PassDescription::new("tensor_field", &["structure_tensor"]),     // input_texture0 = structure_tensor
PassDescription::new("kuwahara", &["tensor_field"]),             // input_texture0 = tensor_field
// Reading from non-adjacent passes is fine:
PassDescription::new("lic_edges", &["tensor_field", "kuwahara"]), // input_texture0 = tensor_field, 1 = kuwahara
PassDescription::new("main_image", &["lic_edges"]),
```
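
The positional mapping, including the "repeat the first dependency" rule for unused slots, can be sketched as a small pure function. `resolve_input_slots` and `MAX_INPUTS` are hypothetical names for illustration. Returning `None` for an empty dependency list is an assumption of this sketch, since the text above does not say what is bound in that case.

```rust
/// Maximum number of input textures per pass, per the list above.
pub const MAX_INPUTS: usize = 3;

/// Maps a pass's dependency list onto input_texture0..2 by position.
/// Unused slots repeat the first dependency; entries beyond the third
/// are ignored. Returns `None` when there are no dependencies (assumed).
pub fn resolve_input_slots<'a>(deps: &[&'a str]) -> Option<[&'a str; MAX_INPUTS]> {
    let first = *deps.first()?;
    let mut slots = [first; MAX_INPUTS];
    for (slot, dep) in slots.iter_mut().zip(deps.iter()) {
        *slot = *dep;
    }
    Some(slots)
}
```

For the `lic_edges` pass above, `resolve_input_slots(&["tensor_field", "kuwahara"])` gives `tensor_field`, `kuwahara`, `tensor_field` for slots 0, 1, and 2.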

### Workgroup Sizes

- **WGSL is the Source of Truth:** A workgroup size defined in your shader with `@workgroup_size(x, y, z)` will always be used to compile the pipeline.
- **Builder is a Fallback:** `.with_workgroup_size()` is only used if the WGSL entry point has no size decorator.
- **Per-Pass Specificity:** For multi-pass shaders, you can specify a unique workgroup size for each stage. This is critical for performance in algorithms like FFTs or CNNs.

```rust
// Excerpt (other passes omitted); see cnn.rs for a practical example
let passes = vec![
    PassDescription::new("conv_layer1", &["canvas_update"])
        .with_workgroup_size([12, 12, 8]), // Custom size for this pass
    PassDescription::new("main_image", &["fully_connected"]), // Uses default or WGSL size
];
```
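
The precedence rule can be written down in a few lines, together with the usual ceiling division that turns an output size into a workgroup count. This is a sketch under assumptions: `effective_workgroup_size` and `workgroup_count` are hypothetical helpers, the `default` argument stands in for whatever Cuneus falls back to when neither source gives a size, and the ceiling division is the standard compute-shader convention rather than something taken from the Cuneus source.

```rust
/// Picks the workgroup size: the WGSL attribute wins, then the builder
/// value, then a caller-supplied default (the default is an assumption).
pub fn effective_workgroup_size(
    wgsl: Option<[u32; 3]>,
    builder: Option<[u32; 3]>,
    default: [u32; 3],
) -> [u32; 3] {
    wgsl.or(builder).unwrap_or(default)
}

/// Number of workgroups needed to cover `extent` invocations per axis,
/// rounding up so partial tiles at the edges are still covered.
/// Every component of `size` must be non-zero.
pub fn workgroup_count(extent: [u32; 3], size: [u32; 3]) -> [u32; 3] {
    [
        (extent[0] + size[0] - 1) / size[0],
        (extent[1] + size[1] - 1) / size[1],
        (extent[2] + size[2] - 1) / size[2],
    ]
}
```

Because the count rounds up, a 1920×1080 output with 16×16 workgroups dispatches 120×68 groups, and the last row of groups extends past the texture, so shaders typically guard against out-of-range `id.xy` before writing.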

### Manual Dispatching

For effects like path tracing that require conditional accumulation, use `dispatch_stage()`. This prevents the frame counter from advancing automatically, allowing you to build up an image over multiple real frames that all correspond to a single logical `time_data.frame`.

```rust
// See mandelbulb.rs for a practical example
fn render(&mut self, core: &Core) -> Result<(), wgpu::SurfaceError> {
    // ...
    // Set frame uniform manually for accumulation
    self.compute_shader.time_uniform.data.frame = self.frame_count;
    self.compute_shader.time_uniform.update(&core.queue);
    
    // Dispatch the single stage of the path tracer
    self.compute_shader.dispatch_stage(&mut encoder, core, 0);

    // Only increment the logical frame count when accumulation is active
    if self.current_params.accumulate > 0 {
        self.frame_count += 1;
    }
    // ...
}
```
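
To see why the logical frame count matters for accumulation, here is a CPU-side sketch of a progressive average, the blend a path tracer commonly performs in its shader. `Accumulator` and `add_frame` are hypothetical names. Resetting the counter when accumulation is off is an assumption of this sketch, since the excerpt above only shows the increment.

```rust
/// Running mean of per-pixel samples, weighted by the logical frame count.
#[derive(Debug, Clone)]
pub struct Accumulator {
    pub mean: Vec<f32>,
    pub frame_count: u32,
}

impl Accumulator {
    pub fn new(len: usize) -> Self {
        Self { mean: vec![0.0; len], frame_count: 0 }
    }

    /// Blends one new frame of samples into the running mean.
    /// `sample` must have the same length as `mean`.
    pub fn add_frame(&mut self, sample: &[f32], accumulate: bool) {
        assert_eq!(sample.len(), self.mean.len());
        if !accumulate {
            // No accumulation: show the latest frame and restart the count.
            self.mean.copy_from_slice(sample);
            self.frame_count = 0;
            return;
        }
        // Incremental mean: new = old + (sample - old) / (n + 1).
        let weight = 1.0 / (self.frame_count as f32 + 1.0);
        for (m, s) in self.mean.iter_mut().zip(sample) {
            *m += (*s - *m) * weight;
        }
        self.frame_count += 1;
    }
}
```

If the counter advanced on frames that did not contribute a sample, the weight would shrink too quickly and new samples would be under-represented, which is why the automatic increment of `.dispatch()` is not a good fit here.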

### Mid-Frame Buffer Updates (`flush_encoder`)

When doing ping-pong buffer simulations, you may need buffer updates to take effect before the next dispatch. wgpu batches all `write_buffer` calls before any dispatches in the same submit, so use `core.flush_encoder()` to force changes through:

```rust
// Update params, submit, get new encoder
self.params.ping = 1 - self.params.ping;
self.compute_shader.set_custom_params(self.params, &core.queue);
frame.encoder = core.flush_encoder(frame.encoder);

// Now the next dispatch sees the updated ping value
self.compute_shader.dispatch_stage(&mut frame.encoder, core, NEXT_PASS);
```

*See `fluidsim.rs` for a full example with 20+ pressure iterations per frame.*
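
The ordering problem is easier to see in a toy model. The sketch below is not wgpu: `MockGpu` and its methods are hypothetical and only imitate the behavior described above, where buffer writes queued before a submit land before any dispatch recorded in that submit runs.

```rust
/// Toy model of "writes are applied before the dispatches of a submit".
#[derive(Debug, Default)]
pub struct MockGpu {
    uniform: i32,               // value currently visible to shaders
    pending_write: Option<i32>, // latest queued write since the last submit
    recorded: usize,            // dispatches recorded in the current encoder
    pub observed: Vec<i32>,     // value each executed dispatch actually saw
}

impl MockGpu {
    /// Queues a buffer write; a later write replaces an earlier one.
    pub fn write_buffer(&mut self, value: i32) {
        self.pending_write = Some(value);
    }

    /// Records a dispatch; nothing executes until `submit`.
    pub fn dispatch(&mut self) {
        self.recorded += 1;
    }

    /// Applies the queued write first, then runs every recorded dispatch.
    pub fn submit(&mut self) {
        if let Some(value) = self.pending_write.take() {
            self.uniform = value;
        }
        for _ in 0..self.recorded {
            self.observed.push(self.uniform);
        }
        self.recorded = 0;
    }
}

/// Two writes and two dispatches in a single submit.
pub fn run_without_flush() -> Vec<i32> {
    let mut gpu = MockGpu::default();
    gpu.write_buffer(0);
    gpu.dispatch();
    gpu.write_buffer(1);
    gpu.dispatch();
    gpu.submit();
    gpu.observed
}

/// Same work, with a submit between the two halves (the "flush").
pub fn run_with_flush() -> Vec<i32> {
    let mut gpu = MockGpu::default();
    gpu.write_buffer(0);
    gpu.dispatch();
    gpu.submit();
    gpu.write_buffer(1);
    gpu.dispatch();
    gpu.submit();
    gpu.observed
}
```

Without the flush both dispatches observe `1`; with it they observe `0` and then `1`, which is the effect `core.flush_encoder()` is used for in the snippet above.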

## Media & Integration

### GPU Music Generation & Synthesis

Cuneus supports **bidirectional GPU-CPU audio workflows** using two complementary systems:

**1. Audio Visualization (`.with_audio_spectrum()`)** - Analyze loaded audio/video:
- **Flow**: Media file → GStreamer spectrum analyzer → CPU writes to buffer → GPU reads for visualization
- **Shader Access**: `@group(2) var<storage, read> audio_spectrum: array<f32>` (read-only)
- **Use Case**: Audio visualizers like `audiovis.rs`

**2. Audio Synthesis (`.with_audio()`)** - Generate music on GPU:
- **Flow**: GPU calculates frequencies/amplitudes → writes to buffer → CPU reads → GStreamer plays audio
- **Shader Access**: `@group(2) var<storage, read_write> audio_buffer: array<f32>` (read-write)
- **Use Case**: Music generators like `synth.rs`, `veridisquo.rs`

> **Note:** Audio synthesis examples use two different patterns depending on *who decides what to play*:
> - **GPU-driven** (`veridisquo.rs`): The shader composes the music (frequencies, amplitudes) and writes to the audio buffer. The CPU reads it back with `pollster::block_on(read_audio_buffer(...))` and feeds it to `SynthesisManager`. The GPU is the "composer".
> - **CPU-driven** (`synth.rs`): The user presses keys on the keyboard and the CPU calls `synth.set_voice(...)` directly. No GPU→CPU readback is needed — the GPU only handles visualization.

#### Composing Music on the GPU

You can write entire songs in your compute shader by calculating note sequences, melodies, and synthesis parameters:

```wgsl
// In WGSL: Compose music and write synthesis parameters
// This pattern is from veridisquo.wgsl - a complete GPU-composed song
if (global_id.x == 0u && global_id.y == 0u) {
    // Calculate melody notes based on time
    let beat = u32(u_time.time * tempo / 60.0);
    let melody_note = get_melody_for_beat(beat);
    let bass_note = get_bass_for_beat(beat);

    // Write to audio buffer for CPU playback
    audio_buffer[0] = melody_note.frequency;
    audio_buffer[1] = melody_note.amplitude;
    audio_buffer[2] = bass_note.frequency;
    audio_buffer[3] = bass_note.amplitude;
}
```

```rust
// In Rust: Read GPU-composed music and play it
// This pattern is from veridisquo.rs
if let Ok(data) = pollster::block_on(
    compute.read_audio_buffer(&core.device, &core.queue)
) {
    synth.set_voice(0, data[0], data[1], true);  // Melody
    synth.set_voice(1, data[2], data[3], true);  // Bass
}
```
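
For reference, the beat arithmetic and the four-float buffer layout used in the two snippets above can be mirrored on the CPU. `beat_index`, `Voice`, and `decode_voices` are hypothetical helpers. The interleaved `[frequency, amplitude, ...]` layout is taken from the WGSL snippet, and everything else is an assumption made for illustration.

```rust
/// Beat number at `time_secs` for a tempo in beats per minute. Mirrors the
/// WGSL expression `u32(time * tempo / 60.0)`, which truncates toward zero.
pub fn beat_index(time_secs: f32, tempo_bpm: f32) -> u32 {
    (time_secs * tempo_bpm / 60.0) as u32
}

/// One synthesizer voice as written by the shader.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Voice {
    pub frequency: f32,
    pub amplitude: f32,
}

/// Decodes interleaved `[frequency, amplitude, ...]` pairs into voices.
/// A trailing unpaired value is ignored instead of causing a panic.
pub fn decode_voices(data: &[f32]) -> Vec<Voice> {
    data.chunks_exact(2)
        .map(|pair| Voice { frequency: pair[0], amplitude: pair[1] })
        .collect()
}
```

Decoding through a helper like this avoids indexing `data[0]` through `data[3]` directly, which would panic if the readback ever returned fewer than four values.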

**Examples:**

- `veridisquo.rs` - Complete GPU-composed song with melody and bassline
- `synth.rs` - Interactive polyphonic synthesizer with ADSR envelopes
- `debugscreen.rs` - Simple tone generation for testing

**Pro-tip - Generic Storage:** The `.with_audio()` buffer is just a `storage, read_write` array of floats. You don't have to use it for audio! Any shader can use it as generic persistent storage:

- `blockgame.rs` - Uses the "audio buffer" to store game state (score, block positions, camera) - no audio at all!
- The buffer persists across frames, making it suitable for stateful GPU applications beyond audio synthesis

### External Textures

Two methods for external texture input:

**`.with_input_texture()`** - Single input in **Group 1** (bindings 2-3).

```wgsl
@group(1) @binding(2) var input_texture: texture_2d<f32>;
@group(1) @binding(3) var input_sampler: sampler;
```

```rust
compute_shader.update_input_texture(&tm.view, &tm.sampler, &core.device);
```

**Important for multi-pass:** When using `.dispatch()`, `input_texture` is only available in the `main_image` pass. Intermediate passes do not receive it. To access `input_texture` from all passes, use `dispatch_stage()` instead. See `fft.rs` and `computecolors.rs` for this pattern.

**`.with_channels(N)`** - N texture/sampler pairs in **Group 2**. Accessible from **all passes** with both `.dispatch()` and `dispatch_stage()`.

```wgsl
@group(2) @binding(0) var channel0: texture_2d<f32>;
@group(2) @binding(1) var channel0_sampler: sampler;
```

```rust
compute_shader.update_channel_texture(0, &tm.view, &tm.sampler, &core.device, &core.queue);
```

*See `kuwahara.wgsl` where `channel0` is sampled from multiple passes via a helper function.*

**Summary:**

| Method                  | Single-pass | Multi-pass `.dispatch()` | Multi-pass `dispatch_stage()` |
|-------------------------|-------------|--------------------------|-------------------------------|
| `.with_input_texture()` | All passes  | `main_image` only        | All stages                    |
| `.with_channels()`      | All passes  | All passes               | All stages                    |

### Audio Spectrum Analysis (`.with_audio_spectrum()`)

Use `.with_audio_spectrum(69)` to **visualize** audio from loaded media files (the 69 matches the 69-entry buffer layout below). GStreamer's spectrum analyzer processes the audio stream and writes frequency data to a GPU buffer that your shader can read.

- **Buffer Layout**:
  - Indices 0-63: frequency band magnitudes (RMS-normalized)
  - Index 64: BPM value
  - Index 65: bass energy (pre-computed, ~0-200Hz)
  - Index 66: mid energy (pre-computed, ~200-4000Hz)
  - Index 67: high energy (pre-computed, ~4000-20000Hz)
  - Index 68: total energy (weighted average)
- **Shader Access**: `@group(2) var<storage, read> audio_spectrum: array<f32>` (read-only)
- **Data Source**: Loaded audio/video files (mp3, wav, ogg, mp4, etc.)
- **Features**: RMS-normalized, real-time BPM detection, pre-computed energy bands
- **Example**: `audiovis.rs` - Spectrum visualizer with beat-synced animations
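
The index layout above translates directly into named constants. The sketch below parses a CPU-side copy of the buffer into named fields; `SpectrumFrame`, `parse_spectrum`, and the constant names are hypothetical, and only the index positions come from the list above.

```rust
/// Index layout of the audio spectrum buffer, as listed above.
pub const NUM_BANDS: usize = 64;
pub const BPM_INDEX: usize = 64;
pub const BASS_INDEX: usize = 65;
pub const MID_INDEX: usize = 66;
pub const HIGH_INDEX: usize = 67;
pub const TOTAL_INDEX: usize = 68;
pub const SPECTRUM_LEN: usize = 69;

/// A borrowed, named view over one spectrum buffer.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SpectrumFrame<'a> {
    pub bands: &'a [f32],
    pub bpm: f32,
    pub bass: f32,
    pub mid: f32,
    pub high: f32,
    pub total: f32,
}

/// Splits a raw buffer into named fields; `None` if it is too short.
pub fn parse_spectrum(data: &[f32]) -> Option<SpectrumFrame<'_>> {
    if data.len() < SPECTRUM_LEN {
        return None;
    }
    Some(SpectrumFrame {
        bands: &data[..NUM_BANDS],
        bpm: data[BPM_INDEX],
        bass: data[BASS_INDEX],
        mid: data[MID_INDEX],
        high: data[HIGH_INDEX],
        total: data[TOTAL_INDEX],
    })
}
```

The same constants can be mirrored in WGSL (for example `audio_spectrum[64u]` for the BPM value) so that shader and host code agree on the layout.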

### Fonts

The `.with_fonts()` method provides the font texture (see `assets/fonts/fonttexture.png`) needed to render text directly inside your shader.

- *Examples: `debugscreen.rs` uses this for its UI, and `cnn.rs` uses it to label its output bars.*