Crate ringkernel


§RingKernel

GPU-native persistent actor model framework for Rust.

RingKernel is a Rust port of DotCompute’s Ring Kernel system, enabling GPU-accelerated actor systems with persistent kernels, lock-free message passing, and hybrid logical clocks for causal ordering.

§Features

  • Persistent GPU-resident state across kernel invocations
  • Lock-free message passing between kernels (K2K messaging)
  • Hybrid Logical Clocks (HLC) for temporal ordering
  • Multiple GPU backends: CUDA, Metal, WebGPU
  • Type-safe serialization via rkyv/zerocopy
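
To make the "Hybrid Logical Clocks" bullet concrete, here is a minimal sketch of the standard HLC update rules (a physical component plus a logical counter). It is not the crate's `hlc` module: `Timestamp`, `Clock`, `tick`, and `update` are hypothetical names, and the physical time is passed in explicitly so the behavior is deterministic.

```rust
use std::cmp::max;

/// Hypothetical timestamp: physical time plus a logical counter.
/// Field order matters: the derived `Ord` compares `physical` first.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Timestamp {
    pub physical: u64,
    pub logical: u32,
}

/// Illustrative hybrid logical clock (not the crate's implementation).
#[derive(Debug, Default)]
pub struct Clock {
    last: Timestamp,
}

impl Clock {
    /// Local or send event: advance past the last issued timestamp.
    pub fn tick(&mut self, now: u64) -> Timestamp {
        let physical = max(self.last.physical, now);
        let logical = if physical == self.last.physical {
            self.last.logical + 1
        } else {
            0
        };
        self.last = Timestamp { physical, logical };
        self.last
    }

    /// Receive event: merge a remote timestamp so the result is
    /// strictly greater than both the local and the remote one.
    pub fn update(&mut self, now: u64, remote: Timestamp) -> Timestamp {
        let prev = self.last;
        let physical = max(max(prev.physical, remote.physical), now);
        let logical = if physical == prev.physical && physical == remote.physical {
            max(prev.logical, remote.logical) + 1
        } else if physical == prev.physical {
            prev.logical + 1
        } else if physical == remote.physical {
            remote.logical + 1
        } else {
            0
        };
        self.last = Timestamp { physical, logical };
        self.last
    }
}
```

Timestamps produced this way are totally ordered by `(physical, logical)`, and a receive always yields a timestamp later than the message that triggered it, which is what gives causal ordering.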

§Quick Start

use ringkernel::prelude::*;

#[derive(RingMessage)]
struct AddRequest {
    #[message(id)]
    id: MessageId,
    a: f32,
    b: f32,
}

#[tokio::main]
async fn main() -> Result<()> {
    // Create runtime with auto-detected backend
    let runtime = RingKernel::builder()
        .backend(Backend::Auto)
        .build()
        .await?;

    // Launch a kernel
    let kernel = runtime.launch("adder", LaunchOptions::default()).await?;
    kernel.activate().await?;

    // Send a message
    kernel.send(AddRequest {
        id: MessageId::generate(),
        a: 1.0,
        b: 2.0,
    }).await?;

    // Receive response
    let response = kernel.receive().await?;
    println!("Result: {:?}", response);

    kernel.terminate().await?;
    Ok(())
}

§Backends

RingKernel supports the following backends:

  • CPU - Testing and fallback (always available)
  • CUDA - NVIDIA GPUs (requires cuda feature)
  • Metal - Apple GPUs (requires metal feature, macOS only)
  • WebGPU - Cross-platform via wgpu (requires wgpu feature)

Enable backends via Cargo features:

[dependencies]
ringkernel = { version = "0.1", features = ["cuda", "wgpu"] }
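
As a rough mental model for `Backend::Auto`, the sketch below picks the first available backend from a preference list and falls back to the CPU. The preference order (CUDA, then Metal, then WebGPU), the `BackendKind` enum, and `pick_backend` are assumptions for illustration only; the crate's actual detection logic is not described here.

```rust
/// Hypothetical stand-in for the crate's `Backend` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BackendKind {
    Cpu,
    Cuda,
    Metal,
    Wgpu,
}

/// Pick the most preferred backend that is available.
/// The CPU backend is the fallback because it is always available.
pub fn pick_backend(available: &[BackendKind]) -> BackendKind {
    // Assumed preference order; not taken from the crate.
    const PREFERENCE: [BackendKind; 3] =
        [BackendKind::Cuda, BackendKind::Metal, BackendKind::Wgpu];

    PREFERENCE
        .iter()
        .copied()
        .find(|backend| available.contains(backend))
        .unwrap_or(BackendKind::Cpu)
}
```

In the real crate, the `availability` module is the place to look for checking at runtime which backends were compiled in.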

§Architecture

┌─────────────────────────────────────────────────────────┐
│                    Host (CPU)                           │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────┐  │
│  │ Application │──│   Runtime   │──│  Message Bridge │  │
│  └─────────────┘  └─────────────┘  └─────────────────┘  │
└──────────────────────────┬──────────────────────────────┘
                           │ DMA Transfers
┌──────────────────────────┴──────────────────────────────┐
│                   Device (GPU)                          │
│  ┌───────────┐  ┌───────────────┐  ┌───────────────┐    │
│  │ Control   │  │ Input Queue   │  │ Output Queue  │    │
│  │ Block     │  │ (lock-free)   │  │ (lock-free)   │    │
│  └───────────┘  └───────────────┘  └───────────────┘    │
│  ┌─────────────────────────────────────────────────┐    │
│  │         Persistent Kernel (your code)           │    │
│  └─────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────┘
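
The input and output queues in the diagram are described as lock-free. One common way to build such a queue is a bounded single-producer/single-consumer ring buffer driven by two atomic indices. The sketch below is an assumption about the general technique, not the crate's `queue` module: `SpscRing` is a hypothetical name, and messages are reduced to `u64` payloads so that the example needs no `unsafe`.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};

/// Illustrative bounded ring buffer for exactly one producer thread
/// and one consumer thread. Not the crate's queue implementation.
pub struct SpscRing {
    slots: Vec<AtomicU64>,
    mask: usize,
    head: AtomicUsize, // next slot to read; written only by the consumer
    tail: AtomicUsize, // next slot to write; written only by the producer
}

impl SpscRing {
    /// Capacity must be a power of two so indices can wrap with a mask.
    pub fn with_capacity(capacity: usize) -> Self {
        assert!(capacity.is_power_of_two(), "capacity must be a power of two");
        Self {
            slots: (0..capacity).map(|_| AtomicU64::new(0)).collect(),
            mask: capacity - 1,
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        }
    }

    /// Producer side: returns the value back if the ring is full.
    pub fn try_push(&self, value: u64) -> Result<(), u64> {
        let tail = self.tail.load(Ordering::Relaxed);
        let head = self.head.load(Ordering::Acquire);
        if tail.wrapping_sub(head) == self.slots.len() {
            return Err(value);
        }
        self.slots[tail & self.mask].store(value, Ordering::Relaxed);
        // Release publishes the slot write to the consumer.
        self.tail.store(tail.wrapping_add(1), Ordering::Release);
        Ok(())
    }

    /// Consumer side: returns `None` if the ring is empty.
    pub fn try_pop(&self) -> Option<u64> {
        let head = self.head.load(Ordering::Relaxed);
        let tail = self.tail.load(Ordering::Acquire);
        if head == tail {
            return None;
        }
        let value = self.slots[head & self.mask].load(Ordering::Relaxed);
        // Release hands the slot back to the producer.
        self.head.store(head.wrapping_add(1), Ordering::Release);
        Some(value)
    }
}
```

Neither side ever blocks or takes a lock: a full or empty ring is reported to the caller, who decides whether to retry. The same head/tail scheme carries over to a buffer shared between host and device, though the memory-ordering details there depend on the backend.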

Re-exports§

pub use ringkernel_codegen as codegen;

Modules§

availability
Check availability of backends at runtime.
context
Ring context providing GPU intrinsics facade for kernel handlers.
control
Control block for kernel state management.
error
Error types for RingKernel operations.
hlc
Hybrid Logical Clock (HLC) implementation for causal ordering.
k2k
Kernel-to-Kernel (K2K) direct messaging.
memory
GPU and host memory management abstractions.
message
Message types and traits for kernel-to-kernel communication.
multi_gpu
Multi-GPU coordination and load balancing.
prelude
Prelude module for convenient imports.
pubsub
Topic-based publish/subscribe messaging.
queue
Lock-free message queue implementation.
runtime
Runtime traits and types for kernel management.
telemetry
Telemetry and metrics collection for kernel monitoring.
telemetry_pipeline
Real-time telemetry pipeline for streaming metrics.
types
Core type definitions for GPU thread identification and coordination.
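
The thread-identification types in `types` follow the usual GPU hierarchy of grid, block, warp, and thread. As a sketch of how those indices relate in the one-dimensional case, the helpers below use plain integers; the function names are hypothetical, and the warp size is a parameter because it varies by hardware (32 on NVIDIA GPUs).

```rust
/// Global thread index for a 1-D launch:
/// `block_id * block_size + thread_id`.
/// Computed in `u64` so large grids cannot overflow.
pub fn global_thread_id(block_id: u32, thread_id: u32, block_size: u32) -> u64 {
    assert!(thread_id < block_size, "thread_id must be in 0..block_size");
    u64::from(block_id) * u64::from(block_size) + u64::from(thread_id)
}

/// Warp index of a thread within its block.
pub fn warp_id(thread_id: u32, warp_size: u32) -> u32 {
    assert!(warp_size > 0, "warp_size must be non-zero");
    thread_id / warp_size
}

/// Lane index of a thread within its warp.
pub fn lane_id(thread_id: u32, warp_size: u32) -> u32 {
    assert!(warp_size > 0, "warp_size must be non-zero");
    thread_id % warp_size
}
```

For example, thread 5 of block 2 in a launch with 256 threads per block has global index 2 * 256 + 5 = 517.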

Structs§

BlockId
Block ID within a grid (0 to grid_size - 1).
ControlBlock
Kernel control block (128 bytes, cache-line aligned).
CpuRuntime
CPU-based implementation of RingKernelRuntime.
GlobalThreadId
Global thread ID across all blocks.
HlcTimestamp
Hybrid Logical Clock timestamp.
KernelHandle
Handle to a launched kernel.
KernelId
Unique kernel identifier.
KernelStatus
Kernel status including state and metrics.
LaunchOptions
Options for launching a kernel.
MemoryPool
Memory pool for efficient allocation/deallocation.
MessageHeader
Fixed-size message header (256 bytes, cache-line aligned).
MessageId
Unique message identifier.
PinnedMemory
Pinned (page-locked) host memory for efficient DMA transfers.
QueueStats
Statistics for a message queue.
RingContext
GPU intrinsics facade for kernel handlers.
RingKernel
Main RingKernel runtime facade.
RingKernelBuilder
Builder for RingKernel runtime.
TelemetryBuffer
Telemetry buffer (64 bytes, cache-line aligned).
ThreadId
Thread ID within a block (0 to block_size - 1).
WarpId
Warp ID within a block.
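
Several of the structs above advertise a fixed size and alignment. In Rust, such a guarantee is typically expressed with a `repr` attribute and checked at compile time. The sketch below shows the idea for a 128-byte block; the struct name, its fields, and the choice of 128-byte alignment are all assumptions for illustration and do not reflect the real `ControlBlock` layout.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical control block. `repr(C)` fixes the field order, and
/// `align(128)` rounds the size up to a multiple of 128 bytes.
#[repr(C, align(128))]
#[derive(Debug, Default)]
pub struct ControlBlockSketch {
    pub is_active: u32,
    pub should_terminate: u32,
    pub messages_processed: u64,
    pub input_head: u64,
    pub input_tail: u64,
}

// Compile-time checks: the build fails if the layout ever drifts.
const _: () = assert!(size_of::<ControlBlockSketch>() == 128);
const _: () = assert!(align_of::<ControlBlockSketch>() == 128);
```

Pinning the layout this way matters when the same bytes are read by both host and device code, since both sides must agree on every field offset.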

Enums§

Backend
GPU backend type.
KernelState
Kernel lifecycle state.
Priority
Message priority levels.
RingKernelError
Comprehensive error type for RingKernel operations.

Traits§

DeviceMemory
Trait for device memory allocation.
GpuBuffer
Trait for GPU buffer operations.
MessageQueue
Trait for message queue implementations.
RingKernelRuntime
Backend-agnostic runtime trait for kernel management.
RingMessage
Trait for types that can be sent as kernel messages.

Functions§

registered_kernels
Get list of registered kernels from the inventory.

Type Aliases§

Result
Result type alias for RingKernel operations.

Attribute Macros§

ring_kernel
Attribute macro for defining ring kernel handlers.
stencil_kernel
Attribute macro for defining stencil kernels that transpile to CUDA.

Derive Macros§

GpuType
Derive macro for GPU-compatible types.
RingMessage
Derive macro for implementing the RingMessage trait.