Kernel checkpointing for persistent state snapshot and restore.
This module provides infrastructure for checkpointing persistent GPU kernels, enabling fault tolerance, migration, and debugging capabilities.
§Architecture
┌─────────────────────────────────────────────────────────────────┐
│                      CheckpointableKernel                       │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │   Control   │  │    Queue    │  │      Device Memory      │  │
│  │    Block    │  │    State    │  │ (pressure, halo, etc.)  │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                           Checkpoint                            │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │   Header    │  │  Metadata   │  │ Compressed Data Chunks  │  │
│  │ (magic,ver) │  │ (kernel_id) │  │ (control,queues,memory) │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                        CheckpointStorage                        │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │    File     │  │   Memory    │  │     Cloud (S3/GCS)      │  │
│  │   Backend   │  │   Backend   │  │         Backend         │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
§Example
use ringkernel_core::checkpoint::{Checkpoint, FileStorage, CheckpointableKernel};
// Create checkpoint from running kernel
let checkpoint = kernel.create_checkpoint()?;
// Save to file
let storage = FileStorage::new("/checkpoints");
storage.save(&checkpoint, "sim_step_1000")?;
// Later: restore from checkpoint
let checkpoint = storage.load("sim_step_1000")?;
kernel.restore_from_checkpoint(&checkpoint)?;
Structs§
- Checkpoint - Complete checkpoint containing all kernel state.
- CheckpointBuilder - Builder for creating checkpoints incrementally.
- CheckpointHeader - Checkpoint file header (64 bytes, fixed size).
- CheckpointMetadata - Kernel-specific metadata stored in checkpoint.
- ChunkHeader - Header for each data chunk (32 bytes).
- DataChunk - A single data chunk in a checkpoint.
- FileStorage - File-based checkpoint storage.
- MemoryStorage - In-memory checkpoint storage (for testing and fast operations).
Enums§
- ChunkType - Chunk types for checkpoint data sections.
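To illustrate how a typed chunk with a fixed 32-byte header might be laid out, here is a minimal sketch. Only the 32-byte size and the three data sections (control, queues, memory) come from this page. The names `ChunkKind`, `ChunkHeaderSketch`, and `CHUNK_HEADER_LEN`, the field set, the field order, and the little-endian encoding are all assumptions made for illustration and may not match the crate's actual `ChunkType` and `ChunkHeader`.

```rust
/// Hypothetical chunk kinds mirroring the sections in the architecture diagram.
/// The discriminant values are assumptions, not the crate's real encoding.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u32)]
pub enum ChunkKind {
    ControlBlock = 1,
    QueueState = 2,
    DeviceMemory = 3,
}

impl ChunkKind {
    /// Map a raw tag back to a kind; unknown tags are rejected.
    pub fn from_u32(raw: u32) -> Option<Self> {
        match raw {
            1 => Some(Self::ControlBlock),
            2 => Some(Self::QueueState),
            3 => Some(Self::DeviceMemory),
            _ => None,
        }
    }
}

/// Size stated in the docs for each chunk header.
pub const CHUNK_HEADER_LEN: usize = 32;

/// Hypothetical header fields; the real `ChunkHeader` layout may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ChunkHeaderSketch {
    pub kind: ChunkKind,
    pub checksum: u32,
    pub uncompressed_len: u64,
    pub stored_len: u64,
}

fn read_u32(buf: &[u8], at: usize) -> u32 {
    let mut raw = [0u8; 4];
    raw.copy_from_slice(&buf[at..at + 4]);
    u32::from_le_bytes(raw)
}

fn read_u64(buf: &[u8], at: usize) -> u64 {
    let mut raw = [0u8; 8];
    raw.copy_from_slice(&buf[at..at + 8]);
    u64::from_le_bytes(raw)
}

impl ChunkHeaderSketch {
    /// Encode into exactly 32 bytes (little-endian); bytes 24..32 stay zero as padding.
    pub fn to_bytes(&self) -> [u8; CHUNK_HEADER_LEN] {
        let mut out = [0u8; CHUNK_HEADER_LEN];
        out[0..4].copy_from_slice(&(self.kind as u32).to_le_bytes());
        out[4..8].copy_from_slice(&self.checksum.to_le_bytes());
        out[8..16].copy_from_slice(&self.uncompressed_len.to_le_bytes());
        out[16..24].copy_from_slice(&self.stored_len.to_le_bytes());
        out
    }

    /// Decode a header; returns `None` when the kind tag is unknown.
    pub fn from_bytes(buf: &[u8; CHUNK_HEADER_LEN]) -> Option<Self> {
        let kind = ChunkKind::from_u32(read_u32(buf, 0))?;
        Some(Self {
            kind,
            checksum: read_u32(buf, 4),
            uncompressed_len: read_u64(buf, 8),
            stored_len: read_u64(buf, 16),
        })
    }
}
```

A fixed-size header lets a reader skip over chunks it does not understand by reading the stored length, which is one common reason to keep both compressed and uncompressed sizes in the header.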
Constants§
- CHECKPOINT_MAGIC - Magic number for checkpoint files: “RKCKPT01” in ASCII.
- CHECKPOINT_VERSION - Current checkpoint format version.
- MAX_CHECKPOINT_SIZE - Maximum supported checkpoint size (1 GB).
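Constants like these are typically checked before any chunk is decoded. The sketch below shows one way such a pre-check could look. The function `preflight`, the error type `PreflightError`, and the local constants are hypothetical. The version value `1`, the little-endian version field at byte offset 8, and the reading of "1 GB" as 2^30 bytes are assumptions, not facts taken from the crate.

```rust
/// ASCII magic as described in the docs; storing it as raw bytes is an assumption.
pub const MAGIC_BYTES: [u8; 8] = *b"RKCKPT01";
/// Placeholder format version (hypothetical value).
pub const FORMAT_VERSION: u32 = 1;
/// "1 GB" interpreted as 2^30 bytes (assumption; it could also mean 10^9).
pub const MAX_SIZE_BYTES: u64 = 1 << 30;

#[derive(Debug, PartialEq, Eq)]
pub enum PreflightError {
    TooShort,
    BadMagic,
    UnsupportedVersion(u32),
    TooLarge(u64),
}

/// Validate the leading bytes of a checkpoint and its total size
/// before attempting to parse the rest.
pub fn preflight(bytes: &[u8], total_len: u64) -> Result<(), PreflightError> {
    // Need at least the 8-byte magic plus a 4-byte version field.
    if bytes.len() < 12 {
        return Err(PreflightError::TooShort);
    }
    if bytes[0..8] != MAGIC_BYTES {
        return Err(PreflightError::BadMagic);
    }
    let mut raw = [0u8; 4];
    raw.copy_from_slice(&bytes[8..12]);
    let version = u32::from_le_bytes(raw);
    // Reject versions newer than this reader understands.
    if version > FORMAT_VERSION {
        return Err(PreflightError::UnsupportedVersion(version));
    }
    if total_len > MAX_SIZE_BYTES {
        return Err(PreflightError::TooLarge(total_len));
    }
    Ok(())
}
```

Checking the magic and size first means a truncated or foreign file is rejected cheaply, before any decompression work is done.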
Traits§
- CheckpointStorage - Trait for checkpoint storage backends.
- CheckpointableKernel - Trait for kernels that support checkpointing.
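To make the storage-backend idea concrete, the following sketch defines a small trait and an in-memory implementation. `StorageSketch` and `InMemorySketch` are hypothetical stand-ins, not the crate's `CheckpointStorage` or `MemoryStorage`. The method set, the `String` error type, and the use of `&mut self` for `save` are simplifying assumptions. Only the `(data, name)` argument order of `save` and the name-based `load` follow the example earlier on this page.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a storage backend trait.
/// Checkpoints are treated as opaque, already-serialized bytes.
pub trait StorageSketch {
    /// Store `bytes` under `name`, replacing any previous entry.
    fn save(&mut self, bytes: &[u8], name: &str) -> Result<(), String>;
    /// Fetch a previously saved checkpoint by name.
    fn load(&self, name: &str) -> Result<Vec<u8>, String>;
    /// List stored checkpoint names in sorted order.
    fn list(&self) -> Vec<String>;
}

/// In-memory backend, useful for tests; nothing is persisted.
#[derive(Default)]
pub struct InMemorySketch {
    entries: HashMap<String, Vec<u8>>,
}

impl StorageSketch for InMemorySketch {
    fn save(&mut self, bytes: &[u8], name: &str) -> Result<(), String> {
        if name.is_empty() {
            return Err("checkpoint name must not be empty".to_string());
        }
        self.entries.insert(name.to_string(), bytes.to_vec());
        Ok(())
    }

    fn load(&self, name: &str) -> Result<Vec<u8>, String> {
        self.entries
            .get(name)
            .cloned()
            .ok_or_else(|| format!("checkpoint not found: {name}"))
    }

    fn list(&self) -> Vec<String> {
        let mut names: Vec<String> = self.entries.keys().cloned().collect();
        names.sort();
        names
    }
}
```

Keeping the trait limited to named byte blobs is one way to let file, memory, and cloud backends share the same interface. The real trait may instead take a `Checkpoint` value and use interior mutability so that `save` works through a shared reference.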