# Advanced CUDA-Rust-WASM Patterns
## Introduction
This tutorial covers advanced patterns and techniques for building high-performance GPU applications with CUDA-Rust-WASM. We'll explore sophisticated optimization strategies, neural network integration, and real-world applications.
## Table of Contents
1. [Advanced Memory Patterns](#advanced-memory-patterns)
2. [Kernel Fusion and Optimization](#kernel-fusion-and-optimization)
3. [Neural Network Acceleration](#neural-network-acceleration)
4. [Real-Time Applications](#real-time-applications)
5. [Multi-Kernel Workflows](#multi-kernel-workflows)
6. [Advanced Profiling](#advanced-profiling)
7. [Production Deployment](#production-deployment)
## Advanced Memory Patterns
### Tiled Memory Access with Shared Memory
This pattern is essential for matrix operations and convolutions:
```cuda
// advanced_matmul.cu - Optimized matrix multiplication
#define TILE_SIZE 16
__global__ void tiledMatMul(
float* A, float* B, float* C,
int M, int N, int K
) {
// Shared memory tiles
__shared__ float As[TILE_SIZE][TILE_SIZE];
__shared__ float Bs[TILE_SIZE][TILE_SIZE];
// Thread and block indices
int bx = blockIdx.x, by = blockIdx.y;
int tx = threadIdx.x, ty = threadIdx.y;
// Calculate row and column
int row = by * TILE_SIZE + ty;
int col = bx * TILE_SIZE + tx;
float sum = 0.0f;
// Loop over tiles
for (int t = 0; t < (K + TILE_SIZE - 1) / TILE_SIZE; t++) {
// Load tiles into shared memory with bounds checking
if (row < M && t * TILE_SIZE + tx < K) {
As[ty][tx] = A[row * K + t * TILE_SIZE + tx];
} else {
As[ty][tx] = 0.0f;
}
if (col < N && t * TILE_SIZE + ty < K) {
Bs[ty][tx] = B[(t * TILE_SIZE + ty) * N + col];
} else {
Bs[ty][tx] = 0.0f;
}
__syncthreads();
// Compute partial result
for (int k = 0; k < TILE_SIZE; k++) {
sum += As[ty][k] * Bs[k][tx];
}
__syncthreads();
}
// Write result
if (row < M && col < N) {
C[row * N + col] = sum;
}
}
```
#### Transpilation with Advanced Analysis
```javascript
// Advanced matrix multiplication with optimization
import {
transpileCuda,
analyzeKernel,
NeuralOptimizer,
detectHardware
} from 'cuda-rust-wasm';
class AdvancedMatrixMultiplier {
constructor() {
this.optimizer = new NeuralOptimizer();
this.variants = new Map();
}
async initialize() {
// Load and analyze the kernel
const kernelCode = await this.loadKernel('./advanced_matmul.cu');
// Deep analysis with hardware profiling
const analysis = await analyzeKernel(kernelCode, {
deepAnalysis: true,
hardwareProfile: await detectHardware(),
includeVisualization: true,
performanceModeling: true
});
console.log('Memory access pattern:', analysis.memoryPattern);
console.log('Shared memory efficiency:', analysis.sharedMemoryEfficiency);
console.log('Bank conflicts detected:', analysis.bankConflicts);
// Generate multiple optimized variants
await this.generateVariants(kernelCode, analysis);
}
async generateVariants(kernelCode, analysis) {
const strategies = [
{ name: 'bandwidth_optimal', tileSize: 16, unrollFactor: 4 },
{ name: 'latency_optimal', tileSize: 32, unrollFactor: 2 },
{ name: 'memory_optimal', tileSize: 8, unrollFactor: 8 },
{ name: 'balanced', tileSize: 24, unrollFactor: 3 }
];
for (const strategy of strategies) {
const optimized = await transpileCuda(kernelCode, {
target: 'webgpu',
optimize: true,
applyAnalysis: analysis,
constants: {
TILE_SIZE: strategy.tileSize,
UNROLL_FACTOR: strategy.unrollFactor
},
optimizationStrategy: strategy.name
});
const kernel = await createWebGPUKernel(device, optimized.code, {
workgroupSize: [strategy.tileSize, strategy.tileSize, 1],
enableProfiling: true
});
this.variants.set(strategy.name, {
kernel,
strategy,
performance: null
});
}
}
async benchmarkVariants(M, N, K) {
const results = [];
for (const [name, variant] of this.variants) {
const performance = await this.benchmarkVariant(variant, M, N, K);
variant.performance = performance;
results.push({ name, performance });
}
// Neural network selects optimal variant
const optimal = this.optimizer.selectOptimalVariant(results);
console.log(`Optimal strategy for ${M}x${N}x${K}: ${optimal.name}`);
return optimal;
}
async benchmarkVariant(variant, M, N, K) {
const iterations = 10;
const times = [];
// Generate test matrices
const A = new Float32Array(M * K).map(() => Math.random());
const B = new Float32Array(K * N).map(() => Math.random());
for (let i = 0; i < iterations; i++) {
const start = performance.now();
// Execute kernel
await variant.kernel.writeBuffer(0, A.buffer);
await variant.kernel.writeBuffer(1, B.buffer);
variant.kernel.setArgs({ M, N, K });
const profile = await variant.kernel.dispatchWithProfiling(
Math.ceil(N / variant.strategy.tileSize),
Math.ceil(M / variant.strategy.tileSize)
);
times.push(profile.executionTime);
}
const avgTime = times.reduce((a, b) => a + b) / times.length;
const throughput = (2 * M * N * K) / (avgTime / 1000) / (1000 * 1000 * 1000); // GFLOPS
return { avgTime, throughput, variance: this.calculateVariance(times) };
}
}
```
### Memory Coalescing Optimization
```cuda
// memory_coalescing.cu - Demonstrate coalescing patterns
__global__ void coalesced_transpose(
float* input, float* output,
int width, int height
) {
__shared__ float tile[32][33]; // +1 to avoid bank conflicts
int x = blockIdx.x * 32 + threadIdx.x;
int y = blockIdx.y * 32 + threadIdx.y;
// Coalesced read from input
if (x < width && y < height) {
tile[threadIdx.y][threadIdx.x] = input[y * width + x];
}
__syncthreads();
// Calculate transposed position
x = blockIdx.y * 32 + threadIdx.x;
y = blockIdx.x * 32 + threadIdx.y;
// Coalesced write to output
if (x < height && y < width) {
output[y * height + x] = tile[threadIdx.x][threadIdx.y];
}
}
```
## Kernel Fusion and Optimization
### Automatic Kernel Fusion
```javascript
// Kernel fusion for better performance
class KernelFusion {
constructor() {
this.fusionCandidates = [];
this.optimizer = new NeuralOptimizer();
}
async analyzeFusionOpportunities(kernels) {
const opportunities = [];
for (let i = 0; i < kernels.length - 1; i++) {
for (let j = i + 1; j < kernels.length; j++) {
const analysis = await this.analyzeKernelPair(kernels[i], kernels[j]);
if (analysis.fusionBenefit > 0.2) { // 20% improvement threshold
opportunities.push({
kernels: [i, j],
benefit: analysis.fusionBenefit,
strategy: analysis.recommendedStrategy
});
}
}
}
return opportunities.sort((a, b) => b.benefit - a.benefit);
}
async fuseKernels(kernel1Code, kernel2Code, strategy) {
switch (strategy) {
case 'data_flow':
return await this.fuseDataFlow(kernel1Code, kernel2Code);
case 'memory_bandwidth':
return await this.fuseMemoryBandwidth(kernel1Code, kernel2Code);
case 'compute_bound':
return await this.fuseComputeBound(kernel1Code, kernel2Code);
default:
return await this.fuseGeneric(kernel1Code, kernel2Code);
}
}
async fuseDataFlow(kernel1, kernel2) {
// Analyze data dependencies
const deps = await this.analyzeDataDependencies(kernel1, kernel2);
if (deps.canFuse) {
// Generate fused kernel
const fusedCode = this.generateFusedKernel(kernel1, kernel2, {
eliminateIntermediateBuffers: true,
optimizeMemoryAccess: true,
shareComputations: deps.sharedComputations
});
return await transpileCuda(fusedCode, {
target: 'webgpu',
optimize: true,
fusionOptimizations: true
});
}
return null;
}
}
// Example usage
const fusion = new KernelFusion();
const preprocessing = await loadKernel('./preprocess.cu');
const mainCompute = await loadKernel('./compute.cu');
const postprocessing = await loadKernel('./postprocess.cu');
const opportunities = await fusion.analyzeFusionOpportunities([
preprocessing, mainCompute, postprocessing
]);
console.log('Fusion opportunities:', opportunities);
if (opportunities.length > 0) {
const fusedKernel = await fusion.fuseKernels(
preprocessing,
mainCompute,
opportunities[0].strategy
);
console.log('Fused kernel speedup:', fusedKernel.expectedSpeedup);
}
```
## Neural Network Acceleration
### Integration with ruv-FANN
```javascript
// Neural network acceleration with CUDA-Rust-WASM
import { RuvFANN } from 'ruv-fann';
import { transpileCuda, NeuralOptimizer } from 'cuda-rust-wasm';
class NeuralAcceleratedTraining {
constructor(topology) {
this.network = new RuvFANN(topology);
this.optimizer = new NeuralOptimizer();
this.gpuKernels = new Map();
}
async accelerateTraining() {
// Define training kernels
const kernels = {
forward: this.createForwardKernel(),
backward: this.createBackwardKernel(),
weightUpdate: this.createWeightUpdateKernel(),
activation: this.createActivationKernel()
};
// Transpile and optimize each kernel
for (const [name, kernelCode] of Object.entries(kernels)) {
const optimized = await transpileCuda(kernelCode, {
target: 'webgpu',
optimize: true,
neuralOptimization: true,
performanceTarget: 'throughput'
});
this.gpuKernels.set(name, await createWebGPUKernel(device, optimized.code, {
enableProfiling: true,
optimizationLevel: 3
}));
}
// Setup GPU memory buffers
await this.setupGPUMemory();
}
createForwardKernel() {
return `
__global__ void forward_pass(
float* input,
float* weights,
float* bias,
float* output,
int input_size,
int output_size,
int batch_size
) {
int tid = blockIdx.x * blockDim.x + threadIdx.x;
int batch = tid / output_size;
int neuron = tid % output_size;
if (batch < batch_size && neuron < output_size) {
float sum = bias[neuron];
for (int i = 0; i < input_size; i++) {
sum += input[batch * input_size + i] *
weights[neuron * input_size + i];
}
// Apply activation function (sigmoid)
output[batch * output_size + neuron] = 1.0f / (1.0f + expf(-sum));
}
}`;
}
createBackwardKernel() {
return `
__global__ void backward_pass(
float* output_error,
float* weights,
float* input_error,
float* activations,
int input_size,
int output_size,
int batch_size
) {
int tid = blockIdx.x * blockDim.x + threadIdx.x;
int batch = tid / input_size;
int input_idx = tid % input_size;
if (batch < batch_size && input_idx < input_size) {
float error = 0.0f;
for (int o = 0; o < output_size; o++) {
// Sigmoid derivative
float activation = activations[batch * output_size + o];
float derivative = activation * (1.0f - activation);
error += output_error[batch * output_size + o] *
derivative * weights[o * input_size + input_idx];
}
input_error[batch * input_size + input_idx] = error;
}
}`;
}
async trainBatch(inputs, targets) {
const batchSize = inputs.length;
const startTime = performance.now();
// Forward pass
await this.runForwardPass(inputs, batchSize);
// Calculate error
const error = await this.calculateError(targets);
// Backward pass
await this.runBackwardPass(batchSize);
// Update weights
await this.updateWeights(batchSize);
const trainingTime = performance.now() - startTime;
return {
error: error,
trainingTime: trainingTime,
throughput: batchSize / (trainingTime / 1000) // samples/second
};
}
async runForwardPass(inputs, batchSize) {
const forwardKernel = this.gpuKernels.get('forward');
// Copy inputs to GPU
await forwardKernel.writeBuffer(0, new Float32Array(inputs.flat()).buffer);
// Launch kernel
const totalThreads = batchSize * this.network.getOutputSize();
const blockSize = 256;
const gridSize = Math.ceil(totalThreads / blockSize);
const profile = await forwardKernel.dispatchWithProfiling(gridSize, 1, 1);
console.log(`Forward pass: ${profile.executionTime}ms`);
}
// Advanced optimization: adaptive batch sizing
async adaptiveBatchTraining(dataset, initialBatchSize = 32) {
let currentBatchSize = initialBatchSize;
const performanceHistory = [];
for (let epoch = 0; epoch < 100; epoch++) {
const epochStart = performance.now();
let totalError = 0;
// Train with current batch size
for (let i = 0; i < dataset.length; i += currentBatchSize) {
const batch = dataset.slice(i, i + currentBatchSize);
const result = await this.trainBatch(batch.inputs, batch.targets);
totalError += result.error;
}
const epochTime = performance.now() - epochStart;
const throughput = dataset.length / (epochTime / 1000);
performanceHistory.push({
epoch,
batchSize: currentBatchSize,
throughput,
error: totalError / Math.ceil(dataset.length / currentBatchSize)
});
// Adaptive batch size adjustment
if (epoch > 10 && epoch % 10 === 0) {
currentBatchSize = this.optimizer.optimizeBatchSize(performanceHistory);
console.log(`Adjusted batch size to: ${currentBatchSize}`);
}
}
return performanceHistory;
}
}
// Usage example
const neuralTrainer = new NeuralAcceleratedTraining([784, 256, 128, 10]);
await neuralTrainer.accelerateTraining();
// Train with adaptive optimization
const dataset = generateMNISTDataset();
const trainingHistory = await neuralTrainer.adaptiveBatchTraining(dataset);
console.log('Training completed:', trainingHistory);
```
## Real-Time Applications
### Real-Time Video Processing
```javascript
// Real-time video processing with GPU acceleration
class GPUVideoProcessor {
constructor() {
this.kernels = new Map();
this.frameBuffer = null;
this.isProcessing = false;
}
async initialize() {
// Initialize WebGPU
this.device = await this.initializeWebGPU();
// Load and compile video processing kernels
await this.loadKernels();
// Setup video capture
await this.setupVideoCapture();
// Setup output canvas
this.setupCanvas();
}
async loadKernels() {
const kernelSources = {
gaussian_blur: `
__global__ void gaussian_blur(
float* input, float* output,
int width, int height,
float sigma
) {
int x = blockIdx.x * blockDim.x + threadIdx.x;
int y = blockIdx.y * blockDim.y + threadIdx.y;
if (x >= width || y >= height) return;
float sum = 0.0f;
float weight_sum = 0.0f;
int radius = (int)(3.0f * sigma);
for (int dy = -radius; dy <= radius; dy++) {
for (int dx = -radius; dx <= radius; dx++) {
int nx = x + dx;
int ny = y + dy;
if (nx >= 0 && nx < width && ny >= 0 && ny < height) {
float distance = sqrt(dx*dx + dy*dy);
float weight = exp(-(distance*distance) / (2*sigma*sigma));
sum += input[ny * width + nx] * weight;
weight_sum += weight;
}
}
}
output[y * width + x] = sum / weight_sum;
}
`,
edge_detection: `
__global__ void sobel_edge_detection(
float* input, float* output,
int width, int height
) {
int x = blockIdx.x * blockDim.x + threadIdx.x;
int y = blockIdx.y * blockDim.y + threadIdx.y;
if (x >= width-1 || y >= height-1 || x < 1 || y < 1) return;
// Sobel X kernel
float gx = -input[(y-1)*width + (x-1)] + input[(y-1)*width + (x+1)]
+ -2*input[y*width + (x-1)] + 2*input[y*width + (x+1)]
+ -input[(y+1)*width + (x-1)] + input[(y+1)*width + (x+1)];
// Sobel Y kernel
float gy = -input[(y-1)*width + (x-1)] - 2*input[(y-1)*width + x] - input[(y-1)*width + (x+1)]
+ input[(y+1)*width + (x-1)] + 2*input[(y+1)*width + x] + input[(y+1)*width + (x+1)];
output[y * width + x] = sqrt(gx*gx + gy*gy);
}
`,
color_correction: `
__global__ void color_correction(
float* input, float* output,
int width, int height,
float brightness, float contrast, float saturation
) {
int idx = blockIdx.x * blockDim.x + threadIdx.x;
if (idx >= width * height) return;
// Assuming RGB packed format
int pixel_idx = idx * 3;
float r = input[pixel_idx];
float g = input[pixel_idx + 1];
float b = input[pixel_idx + 2];
// Apply brightness and contrast
r = (r - 0.5f) * contrast + 0.5f + brightness;
g = (g - 0.5f) * contrast + 0.5f + brightness;
b = (b - 0.5f) * contrast + 0.5f + brightness;
// Convert to HSV for saturation adjustment
float max_val = fmaxf(fmaxf(r, g), b);
float min_val = fminf(fminf(r, g), b);
float delta = max_val - min_val;
if (delta > 0) {
float h, s, v = max_val;
s = delta / max_val;
// Apply saturation
s *= saturation;
s = fminf(1.0f, fmaxf(0.0f, s));
// Convert back to RGB
if (max_val == r) h = (g - b) / delta;
else if (max_val == g) h = 2 + (b - r) / delta;
else h = 4 + (r - g) / delta;
h *= 60;
if (h < 0) h += 360;
float c = v * s;
float x = c * (1 - fabsf(fmodf(h / 60, 2) - 1));
float m = v - c;
if (h >= 0 && h < 60) { r = c; g = x; b = 0; }
else if (h >= 60 && h < 120) { r = x; g = c; b = 0; }
else if (h >= 120 && h < 180) { r = 0; g = c; b = x; }
else if (h >= 180 && h < 240) { r = 0; g = x; b = c; }
else if (h >= 240 && h < 300) { r = x; g = 0; b = c; }
else { r = c; g = 0; b = x; }
r += m; g += m; b += m;
}
// Clamp values
output[pixel_idx] = fminf(1.0f, fmaxf(0.0f, r));
output[pixel_idx + 1] = fminf(1.0f, fmaxf(0.0f, g));
output[pixel_idx + 2] = fminf(1.0f, fmaxf(0.0f, b));
}
`
};
// Transpile and create kernels
for (const [name, source] of Object.entries(kernelSources)) {
const optimized = await transpileCuda(source, {
target: 'webgpu',
optimize: true,
realTimeOptimization: true
});
this.kernels.set(name, await createWebGPUKernel(this.device, optimized.code, {
workgroupSize: [16, 16, 1],
enableProfiling: true
}));
}
}
async setupVideoCapture() {
this.stream = await navigator.mediaDevices.getUserMedia({
video: {
width: { ideal: 1280 },
height: { ideal: 720 },
frameRate: { ideal: 30 }
}
});
this.video = document.createElement('video');
this.video.srcObject = this.stream;
this.video.autoplay = true;
return new Promise(resolve => {
this.video.onloadedmetadata = () => {
this.width = this.video.videoWidth;
this.height = this.video.videoHeight;
console.log(`Video resolution: ${this.width}x${this.height}`);
resolve();
};
});
}
setupCanvas() {
this.canvas = document.createElement('canvas');
this.canvas.width = this.width;
this.canvas.height = this.height;
this.ctx = this.canvas.getContext('2d');
document.body.appendChild(this.canvas);
}
async processFrame() {
if (this.isProcessing) return;
this.isProcessing = true;
try {
const frameStart = performance.now();
// Capture frame from video
this.ctx.drawImage(this.video, 0, 0);
const imageData = this.ctx.getImageData(0, 0, this.width, this.height);
// Convert to float array
const floatData = new Float32Array(imageData.data.length);
for (let i = 0; i < imageData.data.length; i++) {
floatData[i] = imageData.data[i] / 255.0;
}
// Apply GPU processing pipeline
const processed = await this.runProcessingPipeline(floatData);
// Convert back to image data
const resultData = new Uint8ClampedArray(processed.length);
for (let i = 0; i < processed.length; i++) {
resultData[i] = Math.min(255, Math.max(0, processed[i] * 255));
}
// Display result
const resultImageData = new ImageData(resultData, this.width, this.height);
this.ctx.putImageData(resultImageData, 0, 0);
const frameTime = performance.now() - frameStart;
const fps = 1000 / frameTime;
// Display performance info
this.ctx.fillStyle = 'white';
this.ctx.font = '16px Arial';
this.ctx.fillText(`FPS: ${fps.toFixed(1)} | Frame time: ${frameTime.toFixed(2)}ms`, 10, 30);
} catch (error) {
console.error('Frame processing error:', error);
} finally {\n this.isProcessing = false;\n }\n \n // Schedule next frame\n requestAnimationFrame(() => this.processFrame());\n }\n \n async runProcessingPipeline(inputData) {\n const pipeline = [\n { kernel: 'gaussian_blur', params: { sigma: 1.5 } },\n { kernel: 'edge_detection', params: {} },\n { kernel: 'color_correction', params: { brightness: 0.1, contrast: 1.2, saturation: 1.1 } }\n ];\n \n let currentData = inputData;\n \n for (const stage of pipeline) {\n currentData = await this.runKernelStage(stage.kernel, currentData, stage.params);\n }\n \n return currentData;\n }\n \n async runKernelStage(kernelName, inputData, params) {\n const kernel = this.kernels.get(kernelName);\n \n // Setup buffers\n const inputBuffer = kernel.createBuffer(inputData.byteLength, GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST);\n const outputBuffer = kernel.createBuffer(inputData.byteLength, GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC);\n \n // Copy input data\n await kernel.writeBuffer(0, inputData.buffer);\n \n // Set parameters\n kernel.setArgs({ \n width: this.width, \n height: this.height, \n ...params \n });\n \n // Launch kernel\n const gridX = Math.ceil(this.width / 16);\n const gridY = Math.ceil(this.height / 16);\n \n await kernel.dispatch(gridX, gridY, 1);\n \n // Read result\n const resultBuffer = await kernel.readBuffer(1);\n return new Float32Array(resultBuffer);\n }\n \n startProcessing() {\n console.log('Starting real-time GPU video processing...');\n this.processFrame();\n }\n \n stopProcessing() {\n this.isProcessing = false;\n if (this.stream) {\n this.stream.getTracks().forEach(track => track.stop());\n }\n }\n}\n\n// Usage\nconst processor = new GPUVideoProcessor();\nawait processor.initialize();\nprocessor.startProcessing();\n```\n\n## Production Deployment\n\n### Error Handling and Fallback Strategies\n\n```javascript\n// Production-ready deployment with robust error handling\nclass ProductionGPUApplication {\n constructor() {\n this.primaryRuntime = null;\n this.fallbackRuntime = null;\n this.performanceMonitor = new PerformanceMonitor();\n this.errorRecovery = new ErrorRecovery();\n }\n \n async initialize() {\n try {\n // Try WebGPU first\n this.primaryRuntime = await this.initializeWebGPU();\n console.log('✅ WebGPU runtime initialized');\n \n } catch (webgpuError) {\n console.warn('WebGPU initialization failed:', webgpuError.message);\n \n try {\n // Fallback to WebAssembly\n this.fallbackRuntime = await this.initializeWebAssembly();\n console.log('✅ WebAssembly fallback initialized');\n \n } catch (wasmError) {\n console.error('All GPU runtimes failed:', wasmError.message);\n throw new Error('No GPU runtime available');\n }\n }\n \n // Setup monitoring and error recovery\n this.setupMonitoring();\n this.setupErrorRecovery();\n }\n \n async executeKernel(kernelName, data, params) {\n const startTime = performance.now();\n \n try {\n let result;\n \n if (this.primaryRuntime && await this.checkRuntimeHealth(this.primaryRuntime)) {\n result = await this.primaryRuntime.execute(kernelName, data, params);\n } else if (this.fallbackRuntime) {\n console.warn('Using fallback runtime');\n result = await this.fallbackRuntime.execute(kernelName, data, params);\n } else {\n throw new Error('No runtime available');\n }\n \n // Record successful execution\n this.performanceMonitor.record({\n kernel: kernelName,\n executionTime: performance.now() - startTime,\n dataSize: data.length,\n success: true,\n runtime: this.primaryRuntime ? 'webgpu' : 'wasm'\n });\n \n return result;\n \n } catch (error) {\n // Record failure and attempt recovery\n this.performanceMonitor.record({\n kernel: kernelName,\n executionTime: performance.now() - startTime,\n success: false,\n error: error.message\n });\n \n return await this.errorRecovery.handleError(error, kernelName, data, params);\n }\n }\n \n async checkRuntimeHealth(runtime) {\n try {\n // Simple health check\n const testResult = await runtime.execute('health_check', new Float32Array(10), {});\n return testResult !== null;\n } catch (error) {\n return false;\n }\n }\n \n setupMonitoring() {\n // Monitor performance metrics\n setInterval(() => {\n const metrics = this.performanceMonitor.getMetrics();\n \n if (metrics.errorRate > 0.1) { // 10% error rate threshold\n console.warn('High error rate detected:', metrics.errorRate);\n this.triggerRuntimeSwitch();\n }\n \n if (metrics.averageExecutionTime > metrics.baseline * 2) {\n console.warn('Performance degradation detected');\n this.optimizeRuntime();\n }\n \n }, 10000); // Check every 10 seconds\n }\n \n async triggerRuntimeSwitch() {\n if (this.primaryRuntime && this.fallbackRuntime) {\n console.log('Switching to fallback runtime due to errors');\n [this.primaryRuntime, this.fallbackRuntime] = [this.fallbackRuntime, this.primaryRuntime];\n }\n }\n \n generateDeploymentReport() {\n const metrics = this.performanceMonitor.getMetrics();\n \n return {\n deployment: {\n timestamp: new Date().toISOString(),\n runtime: this.primaryRuntime ? 'webgpu' : 'wasm',\n fallbackAvailable: !!this.fallbackRuntime\n },\n performance: {\n averageExecutionTime: metrics.averageExecutionTime,\n throughput: metrics.throughput,\n errorRate: metrics.errorRate,\n uptime: metrics.uptime\n },\n browser: {\n userAgent: navigator.userAgent,\n webgpuSupported: !!navigator.gpu,\n wasmSupported: typeof WebAssembly !== 'undefined'\n },\n hardware: {\n deviceMemory: navigator.deviceMemory || 'unknown',\n hardwareConcurrency: navigator.hardwareConcurrency,\n platform: navigator.platform\n }\n };\n }\n}\n\n// Usage in production\nconst app = new ProductionGPUApplication();\nawait app.initialize();\n\n// Regular execution with monitoring\nconst result = await app.executeKernel('matrix_multiply', matrices, { size: 1024 });\n\n// Generate deployment report\nconst report = app.generateDeploymentReport();\nconsole.log('Deployment report:', report);\n```\n\nThis advanced tutorial covers sophisticated patterns for building production-ready GPU applications with CUDA-Rust-WASM. The techniques shown here enable developers to create high-performance, robust applications that can handle real-world workloads efficiently.\n\nNext, explore our [Performance Optimization Guide](../performance.md) for even more advanced optimization techniques!