ringkernel-wavesim 0.1.0

Interactive 2D wave propagation showcase for RingKernel
RingKernel WaveSim

Interactive 2D acoustic wave propagation simulator showcasing RingKernel's GPU compute and actor model capabilities.

Overview

WaveSim implements a Finite-Difference Time-Domain (FDTD) solver for the 2D wave equation, demonstrating several RingKernel features:

  • Tile-based Actor Model: 16×16 cell tiles run as actors and use K2K (kernel-to-kernel) messaging for halo exchange
  • Multiple Backends: CPU (SoA + SIMD + Rayon), CUDA, and WGPU
  • GPU-Only Halo Exchange: Zero host transfers during simulation (CUDA Packed backend)

Performance

Backend               256×256           512×512          Notes
CPU SimulationGrid    35,418 steps/s    7,229 steps/s    SoA + SIMD + Rayon
CPU TileKernelGrid    1,357 steps/s                      Actor model with K2K
CUDA Packed           112,837 steps/s   71,324 steps/s   GPU-only halo exchange

GPU vs CPU speedup (CUDA Packed over CPU SimulationGrid): about 3.2x at 256×256 and 9.9x at 512×512
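The speedup figures follow directly from the throughput numbers in the table. The helper below is a small sketch of that arithmetic; the function names are illustrative and are not part of the crate. It also derives cell updates per second, which makes throughput comparable across grid sizes.

```rust
/// Ratio of two throughputs given in steps per second.
fn speedup(fast_steps_per_sec: f64, slow_steps_per_sec: f64) -> f64 {
    fast_steps_per_sec / slow_steps_per_sec
}

/// Cell updates per second for an n×n grid advancing at the given rate.
fn cell_updates_per_sec(n: u64, steps_per_sec: f64) -> f64 {
    (n * n) as f64 * steps_per_sec
}
```

Using the table values, `speedup(112_837.0, 35_418.0)` is roughly 3.19 and `speedup(71_324.0, 7_229.0)` is roughly 9.87. For CUDA Packed, `cell_updates_per_sec` gives about 7.4 billion updates per second at 256×256 and about 18.7 billion at 512×512.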

See PERFORMANCE.md for detailed analysis.

Quick Start

Interactive GUI

# CPU backend (default)
cargo run -p ringkernel-wavesim --release

# With GPU compute (requires wgpu feature)
cargo run -p ringkernel-wavesim --release --features wgpu

Click anywhere on the canvas to inject wave impulses.

Benchmarks

# Full benchmark suite (CPU, CUDA, WGPU)
cargo run -p ringkernel-wavesim --bin full_benchmark --release --features "cuda,wgpu"

# CUDA Packed backend benchmark (GPU-only halo exchange)
cargo run -p ringkernel-wavesim --bin bench_packed --release --features cuda

# Verify CUDA correctness against CPU
cargo run -p ringkernel-wavesim --bin verify_packed --release --features cuda

Architecture

Backends

  1. CPU SimulationGrid: Optimized SoA layout with SIMD (f32x8) and Rayon parallelization. Best for small-to-medium grids.

  2. CPU TileKernelGrid: 16×16 tile actors demonstrating RingKernel's K2K messaging. Each tile exchanges halo data with neighbors.

  3. CUDA Packed (recommended for large grids): All tiles are packed in a single GPU buffer. Halo exchange happens entirely on the GPU via memory copies, with zero host transfers during simulation.
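To make the tile-actor idea in backend 2 concrete, here is a minimal sketch of two neighboring tiles swapping edge data by message passing. It uses standard-library threads and channels as a stand-in. It is not RingKernel's K2K API, and the `Edge` type and function names are hypothetical.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

const TILE: usize = 16; // cells along one tile edge

/// One edge of a tile, sent to a neighbor as its halo (hypothetical message type).
type Edge = [f32; TILE];

/// A tile actor for one exchange round: send our edge, then wait for the neighbor's.
fn run_tile(my_edge: Edge, to_neighbor: Sender<Edge>, from_neighbor: Receiver<Edge>) -> Edge {
    to_neighbor.send(my_edge).expect("neighbor hung up");
    from_neighbor.recv().expect("neighbor hung up")
}

/// Run a left and a right tile on separate threads.
/// Returns the halo each tile received: (left tile's east halo, right tile's west halo).
fn exchange_pair(left_east_edge: Edge, right_west_edge: Edge) -> (Edge, Edge) {
    let (tx_lr, rx_lr) = channel(); // left -> right
    let (tx_rl, rx_rl) = channel(); // right -> left
    let left = thread::spawn(move || run_tile(left_east_edge, tx_lr, rx_rl));
    let right = thread::spawn(move || run_tile(right_west_edge, tx_rl, rx_lr));
    (left.join().unwrap(), right.join().unwrap())
}
```

Calling `exchange_pair([1.0; TILE], [2.0; TILE])` hands each tile the other's edge. Because the channels are unbounded, both sends complete before either tile blocks on its receive, so the pair cannot deadlock.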

Memory Layout (CUDA Packed)

GPU Buffer: [Tile(0,0)][Tile(1,0)]...[Tile(n,m)]

Each tile: 18×18 floats (16×16 interior plus a one-cell halo on every side)
  ┌───┬────────────────┬───┐
  │ NW│  North Halo    │NE │  ← Row 0 (from neighbor)
  ├───┼────────────────┼───┤
  │ W │  16×16 Interior│ E │  ← Owned cells
  ├───┼────────────────┼───┤
  │ SW│  South Halo    │SE │  ← Row 17 (from neighbor)
  └───┴────────────────┴───┘
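
The layout above implies a simple index calculation. The sketch below assumes tiles are stored with the x index varying fastest (as the `[Tile(0,0)][Tile(1,0)]...` ordering suggests) and that cells within a tile are row-major. The function names are illustrative and not taken from the crate.

```rust
/// Interior cells per tile edge.
const TILE: usize = 16;
/// Tile edge including the one-cell halo on each side.
const PADDED: usize = TILE + 2;
/// Floats stored per tile in the packed buffer.
const TILE_FLOATS: usize = PADDED * PADDED;

/// Offset of the first float of tile (tx, ty) in the packed buffer.
fn tile_offset(tx: usize, ty: usize, tiles_x: usize) -> usize {
    (ty * tiles_x + tx) * TILE_FLOATS
}

/// Buffer index of interior cell (lx, ly), with lx and ly in 0..TILE.
/// The +1 on each axis skips the halo row and column.
fn cell_index(tx: usize, ty: usize, lx: usize, ly: usize, tiles_x: usize) -> usize {
    debug_assert!(lx < TILE && ly < TILE);
    tile_offset(tx, ty, tiles_x) + (ly + 1) * PADDED + (lx + 1)
}

/// Total buffer length in floats for a tiles_x × tiles_y grid of tiles.
fn buffer_len(tiles_x: usize, tiles_y: usize) -> usize {
    tiles_x * tiles_y * TILE_FLOATS
}
```

Under these assumptions, a 256×256 grid is 16×16 tiles, so `buffer_len(16, 16)` is 82,944 floats per buffer, compared with 65,536 for the unpadded grid.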

Simulation Step (CUDA Packed)

1. exchange_all_halos kernel  ─┐
   (GPU-to-GPU memory copies)  │  Zero host transfers
2. apply_boundary_conditions   │  Only 2-3 kernel launches
3. fdtd_all_tiles kernel      ─┘
4. Swap buffer pointers (host-side, trivial)
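
The same sequence can be sketched on the CPU to show the data flow. This is an illustrative stand-in, not the CUDA kernels: the function names mirror the kernel names above, but the `PackedGrid` struct, the row-major indexing, and the fixed zero boundary are assumptions.

```rust
const TILE: usize = 16; // interior cells per tile edge
const PADDED: usize = TILE + 2; // plus one halo cell on each side
const TILE_FLOATS: usize = PADDED * PADDED;

/// Two packed pressure buffers (hypothetical CPU stand-in for the GPU buffers).
struct PackedGrid {
    cur: Vec<f32>,
    prev: Vec<f32>,
    tiles_x: usize,
    tiles_y: usize,
}

impl PackedGrid {
    fn new(tiles_x: usize, tiles_y: usize) -> Self {
        let len = tiles_x * tiles_y * TILE_FLOATS;
        Self { cur: vec![0.0; len], prev: vec![0.0; len], tiles_x, tiles_y }
    }
}

/// Buffer index of padded cell (px, py) in tile (tx, ty).
fn idx(tx: usize, ty: usize, px: usize, py: usize, tiles_x: usize) -> usize {
    (ty * tiles_x + tx) * TILE_FLOATS + py * PADDED + px
}

/// Step 1: copy each tile's edge cells into the facing halo of its neighbor.
/// Corner halos are skipped because the 5-point stencil never reads them.
fn exchange_all_halos(buf: &mut [f32], tiles_x: usize, tiles_y: usize) {
    for ty in 0..tiles_y {
        for tx in 0..tiles_x {
            for i in 1..=TILE {
                if tx + 1 < tiles_x {
                    // East edge -> right neighbor's west halo, and the reverse.
                    buf[idx(tx + 1, ty, 0, i, tiles_x)] = buf[idx(tx, ty, TILE, i, tiles_x)];
                    buf[idx(tx, ty, TILE + 1, i, tiles_x)] = buf[idx(tx + 1, ty, 1, i, tiles_x)];
                }
                if ty + 1 < tiles_y {
                    // Last interior row -> next tile's row-0 halo, and the reverse.
                    buf[idx(tx, ty + 1, i, 0, tiles_x)] = buf[idx(tx, ty, i, TILE, tiles_x)];
                    buf[idx(tx, ty, i, TILE + 1, tiles_x)] = buf[idx(tx, ty + 1, i, 1, tiles_x)];
                }
            }
        }
    }
}

/// Step 3: update every interior cell, writing the new value over `prev`
/// so that a buffer swap makes it the current field.
fn fdtd_all_tiles(cur: &[f32], prev: &mut [f32], tiles_x: usize, tiles_y: usize, c2: f32, damping: f32) {
    for ty in 0..tiles_y {
        for tx in 0..tiles_x {
            for py in 1..=TILE {
                for px in 1..=TILE {
                    let i = idx(tx, ty, px, py, tiles_x);
                    let lap = cur[i - 1] + cur[i + 1] + cur[i - PADDED] + cur[i + PADDED] - 4.0 * cur[i];
                    prev[i] = 2.0 * cur[i] - prev[i] + c2 * lap - damping * (cur[i] - prev[i]);
                }
            }
        }
    }
}

/// One full simulation step in the order listed above.
fn step(g: &mut PackedGrid, c2: f32, damping: f32) {
    exchange_all_halos(&mut g.cur, g.tiles_x, g.tiles_y);
    // Step 2 is a no-op here: outer halos are never written, so they stay at
    // 0.0, which acts as a fixed p = 0 boundary (an assumption of this sketch).
    fdtd_all_tiles(&g.cur, &mut g.prev, g.tiles_x, g.tiles_y, c2, damping);
    std::mem::swap(&mut g.cur, &mut g.prev); // step 4: swap buffers
}
```

Here `c2` stands for the coefficient c²Δt²/Δx². With two tiles side by side, an impulse placed in the last interior column of the left tile reaches the first interior column of the right tile after a single `step`, which only works if the halo exchange ran first.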

Features

Feature        Description
cpu            CPU backend (default)
cuda           NVIDIA CUDA backend
wgpu           WebGPU cross-platform backend
simd           SIMD optimizations (requires nightly)
all-backends   Enable all GPU backends

Files

src/
├── simulation/
│   ├── grid.rs           # CPU SimulationGrid (SoA + SIMD)
│   ├── tile_grid.rs      # TileKernelGrid (actor model)
│   ├── cuda_packed.rs    # CUDA Packed backend
│   ├── cuda_compute.rs   # CUDA per-tile backend
│   └── wgpu_compute.rs   # WGPU backend
├── shaders/
│   ├── fdtd_packed.cu    # CUDA kernels (packed layout)
│   └── fdtd_tile.cu      # CUDA kernels (per-tile)
└── bin/
    ├── wavesim.rs        # Interactive GUI
    ├── benchmark.rs      # Quick benchmark
    ├── full_benchmark.rs # Comprehensive benchmark
    ├── bench_packed.rs   # CUDA Packed benchmark
    └── verify_packed.rs  # Correctness verification

Physics

The simulation solves the 2D acoustic wave equation:

∂²p/∂t² = c² (∂²p/∂x² + ∂²p/∂y²) - γ·∂p/∂t

Where:

  • p = pressure field
  • c = speed of sound (343 m/s default)
  • γ = damping coefficient

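Discretized with central differences in time and space, plus a backward difference for the damping term (FDTD):

p_new = 2p - p_prev + c²Δt²/Δx² · (p_N + p_S + p_E + p_W - 4p) - γΔt·(p - p_prev)

Here p_N, p_S, p_E, and p_W are the four neighboring cells, Δt is the time step, and Δx is the grid spacing.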
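
The update rule translates almost directly into code. The sketch below is a plain single-grid version, not the crate's implementation: the function names, the row-major layout, and the zero-valued edge cells are assumptions. It also checks the standard stability bound for the undamped explicit scheme in 2D, c·Δt/Δx ≤ 1/√2.

```rust
/// Squared Courant number c²Δt²/Δx², the coefficient on the neighbor sum.
fn courant_sq(c: f32, dt: f32, dx: f32) -> f32 {
    let r = c * dt / dx;
    r * r
}

/// The undamped explicit scheme in 2D is stable when c·Δt/Δx <= 1/√2,
/// which is the same as courant_sq <= 0.5.
fn is_stable(c: f32, dt: f32, dx: f32) -> bool {
    courant_sq(c, dt, dx) <= 0.5
}

/// Update one cell from its value, its previous value, and its four neighbors.
/// `damping` is the factor that multiplies (p - p_prev) in the update rule.
fn update_cell(p: f32, p_prev: f32, neighbors: [f32; 4], c2: f32, damping: f32) -> f32 {
    let sum: f32 = neighbors.iter().sum();
    2.0 * p - p_prev + c2 * (sum - 4.0 * p) - damping * (p - p_prev)
}

/// Advance a row-major width×height grid by one step.
/// Edge cells are held at zero, an assumed boundary condition.
fn step(p: &[f32], p_prev: &[f32], width: usize, height: usize, c2: f32, damping: f32) -> Vec<f32> {
    let mut next = vec![0.0; p.len()];
    for y in 1..height.saturating_sub(1) {
        for x in 1..width.saturating_sub(1) {
            let i = y * width + x;
            let neighbors = [p[i - width], p[i + width], p[i + 1], p[i - 1]];
            next[i] = update_cell(p[i], p_prev[i], neighbors, c2, damping);
        }
    }
    next
}
```

With c = 343 m/s and an illustrative grid spacing of Δx = 1 m (not a value from the source), the bound allows time steps up to roughly 2.06 ms, so `is_stable(343.0, 0.001, 1.0)` holds while `is_stable(343.0, 0.01, 1.0)` does not.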

License

Apache-2.0