
Module checkpoint


Kernel checkpointing for persistent state snapshot and restore.

This module provides infrastructure for checkpointing persistent GPU kernels, enabling fault tolerance, migration, and debugging capabilities.

§Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    CheckpointableKernel                         │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │ Control     │  │ Queue       │  │ Device Memory           │  │
│  │ Block       │  │ State       │  │ (pressure, halo, etc.)  │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                        Checkpoint                               │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │ Header      │  │ Metadata    │  │ Compressed Data Chunks  │  │
│  │ (magic,ver) │  │ (kernel_id) │  │ (control,queues,memory) │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                   CheckpointStorage                             │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │ File        │  │ Memory      │  │ Cloud (S3/GCS)          │  │
│  │ Backend     │  │ Backend     │  │ Backend                 │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
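The flow from kernel state to a serialized checkpoint can be sketched as follows. This is a minimal illustration, not the crate's actual format: `SketchChunkKind`, `SketchChunk`, and `encode` are hypothetical names, the per-chunk layout (one kind byte plus a little-endian `u64` length) is an assumption and does not match the documented 32-byte `ChunkHeader`, and compression is left out entirely.

```rust
/// Hypothetical chunk kinds, named after the three state regions in the
/// diagram. The real `ChunkType` variants may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SketchChunkKind {
    Control = 1,
    Queues = 2,
    Memory = 3,
}

/// Hypothetical uncompressed chunk: a kind tag plus its raw payload.
struct SketchChunk {
    kind: SketchChunkKind,
    data: Vec<u8>,
}

/// Flattens chunks into one buffer: the 8-byte magic, then for each chunk
/// a kind byte, a little-endian u64 payload length, and the payload.
/// Illustrative layout only (assumption).
fn encode(chunks: &[SketchChunk]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(b"RKCKPT01");
    for chunk in chunks {
        out.push(chunk.kind as u8);
        out.extend_from_slice(&(chunk.data.len() as u64).to_le_bytes());
        out.extend_from_slice(&chunk.data);
    }
    out
}
```

Tagging each chunk with its kind and length lets a reader skip chunks it does not understand, which is one common reason for chunked container formats.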

§Example

use ringkernel_core::checkpoint::{Checkpoint, FileStorage, CheckpointableKernel};

// Create checkpoint from running kernel
let checkpoint = kernel.create_checkpoint()?;

// Save to file
let storage = FileStorage::new("/checkpoints");
storage.save(&checkpoint, "sim_step_1000")?;

// Later: restore from checkpoint
let checkpoint = storage.load("sim_step_1000")?;
kernel.restore_from_checkpoint(&checkpoint)?;

Structs§

Checkpoint
Complete checkpoint containing all kernel state.
CheckpointBuilder
Builder for creating checkpoints incrementally.
CheckpointHeader
Checkpoint file header (64 bytes, fixed size).
CheckpointMetadata
Kernel-specific metadata stored in checkpoint.
ChunkHeader
Header for each data chunk (32 bytes).
DataChunk
A single data chunk in a checkpoint.
FileStorage
File-based checkpoint storage.
MemoryStorage
In-memory checkpoint storage (for testing and fast operations).

Enums§

ChunkType
Chunk types for checkpoint data sections.

Constants§

CHECKPOINT_MAGIC
Magic number for checkpoint files: “RKCKPT01” in ASCII.
CHECKPOINT_VERSION
Current checkpoint format version.
MAX_CHECKPOINT_SIZE
Maximum supported checkpoint size (1 GB).
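As a rough illustration of how these constants might be used, the sketch below checks a buffer before any parsing takes place. Everything in it is a stand-in: `MAGIC`, `HEADER_SIZE`, `MAX_SIZE`, `HeaderError`, and `validate_prefix` are hypothetical names, the magic is assumed to be stored as eight raw ASCII bytes at offset 0, and "1 GB" is assumed to mean 2^30 bytes. The real constants may have different types or values.

```rust
/// Hypothetical stand-ins for the module's constants (assumptions).
const MAGIC: [u8; 8] = *b"RKCKPT01";
const HEADER_SIZE: usize = 64; // documented fixed header size
const MAX_SIZE: u64 = 1 << 30; // assumes "1 GB" means 1 GiB

#[derive(Debug, PartialEq)]
enum HeaderError {
    TooShort,
    BadMagic,
    TooLarge,
}

/// Rejects buffers that are shorter than one header, do not start with
/// the magic bytes, or declare a size above the maximum.
fn validate_prefix(buf: &[u8], declared_size: u64) -> Result<(), HeaderError> {
    if buf.len() < HEADER_SIZE {
        return Err(HeaderError::TooShort);
    }
    if !buf.starts_with(&MAGIC) {
        return Err(HeaderError::BadMagic);
    }
    if declared_size > MAX_SIZE {
        return Err(HeaderError::TooLarge);
    }
    Ok(())
}
```

Checking the magic and the size limit up front keeps a loader from allocating memory for, or misinterpreting, a file that is not a checkpoint at all.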

Traits§

CheckpointStorage
Trait for checkpoint storage backends.
CheckpointableKernel
Trait for kernels that support checkpointing.
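To make the storage abstraction concrete, here is a minimal sketch of a backend trait with an in-memory implementation, loosely modelled on the `save` and `load` calls in the example above. `SketchStorage` and `SketchMemoryStorage` are hypothetical names, and the signatures are assumptions: this version stores raw bytes rather than a `Checkpoint`, takes `&mut self` in `save`, and reports errors as `String`, none of which is confirmed for the real `CheckpointStorage` trait.

```rust
use std::collections::HashMap;

/// Hypothetical minimal storage interface (assumption, not the crate's trait).
trait SketchStorage {
    /// Stores `bytes` under `name`, replacing any previous entry.
    fn save(&mut self, bytes: &[u8], name: &str) -> Result<(), String>;
    /// Returns a copy of the bytes stored under `name`.
    fn load(&self, name: &str) -> Result<Vec<u8>, String>;
}

/// Hypothetical in-memory backend keyed by checkpoint name.
#[derive(Default)]
struct SketchMemoryStorage {
    entries: HashMap<String, Vec<u8>>,
}

impl SketchStorage for SketchMemoryStorage {
    fn save(&mut self, bytes: &[u8], name: &str) -> Result<(), String> {
        self.entries.insert(name.to_string(), bytes.to_vec());
        Ok(())
    }

    fn load(&self, name: &str) -> Result<Vec<u8>, String> {
        self.entries
            .get(name)
            .cloned()
            .ok_or_else(|| format!("no checkpoint named {name}"))
    }
}
```

Hiding the backend behind a trait lets the same save and restore code run against a file, memory, or cloud store, with the in-memory variant being convenient for tests.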