Module checkpoint

Kernel checkpointing for persistent state snapshot and restore.

This module provides infrastructure for checkpointing persistent GPU kernels, enabling fault tolerance, migration, and debugging capabilities.

§Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    CheckpointableKernel                         │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │ Control     │  │ Queue       │  │ Device Memory           │  │
│  │ Block       │  │ State       │  │ (pressure, halo, etc.)  │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                        Checkpoint                               │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │ Header      │  │ Metadata    │  │ Compressed Data Chunks  │  │
│  │ (magic,ver) │  │ (kernel_id) │  │ (control,queues,memory) │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                   CheckpointStorage                             │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │ File        │  │ Memory      │  │ Cloud (S3/GCS)          │  │
│  │ Backend     │  │ Backend     │  │ Backend                 │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘

§Example

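use ringkernel_core::checkpoint::{Checkpoint, FileStorage, CheckpointableKernel};

// `kernel` is assumed to be an existing, running kernel that implements
// `CheckpointableKernel`; the surrounding function must return a `Result`
// for the `?` operator to compile.

// Create a checkpoint from the running kernel
let checkpoint = kernel.create_checkpoint()?;

// Save it to file-backed storage under a chosen name
let storage = FileStorage::new("/checkpoints");
storage.save(&checkpoint, "sim_step_1000")?;

// Later: load the checkpoint by the same name and restore the kernel
let checkpoint = storage.load("sim_step_1000")?;
kernel.restore_from_checkpoint(&checkpoint)?;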
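
The save-then-load flow can be tried without a GPU or the crate itself. The following is a minimal, self-contained sketch of a name-keyed storage backend held in memory. The names `SketchStorage` and `SketchMemoryStorage`, the `String` error type, and the method signatures are all hypothetical and chosen for illustration; they are not the signatures of `CheckpointStorage` or `MemoryStorage`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a storage backend (not the crate's actual trait).
trait SketchStorage {
    fn save(&mut self, name: &str, bytes: &[u8]) -> Result<(), String>;
    fn load(&self, name: &str) -> Result<Vec<u8>, String>;
}

/// Hypothetical in-memory backend: serialized checkpoints keyed by name.
#[derive(Default)]
struct SketchMemoryStorage {
    entries: HashMap<String, Vec<u8>>,
}

impl SketchStorage for SketchMemoryStorage {
    fn save(&mut self, name: &str, bytes: &[u8]) -> Result<(), String> {
        // Saving under an existing name overwrites the previous entry.
        self.entries.insert(name.to_string(), bytes.to_vec());
        Ok(())
    }

    fn load(&self, name: &str) -> Result<Vec<u8>, String> {
        self.entries
            .get(name)
            .cloned()
            .ok_or_else(|| format!("no checkpoint named {}", name))
    }
}

fn main() {
    let mut storage = SketchMemoryStorage::default();

    // Placeholder bytes standing in for a serialized checkpoint.
    let snapshot = vec![1u8, 2, 3, 4];
    storage.save("sim_step_1000", &snapshot).unwrap();

    // Loading by the same name returns the same bytes.
    let restored = storage.load("sim_step_1000").unwrap();
    assert_eq!(restored, snapshot);

    // An unknown name is reported as an error rather than a panic.
    assert!(storage.load("missing").is_err());
}
```

Keeping the backend behind a trait is what would let file, memory, and cloud storage be swapped without touching the code that creates or restores checkpoints.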

Structs§

Checkpoint
Complete checkpoint containing all kernel state.
CheckpointBuilder
Builder for creating checkpoints incrementally.
CheckpointConfig
Configuration for periodic actor state checkpointing.
CheckpointHeader
Checkpoint file header (64 bytes, fixed size).
CheckpointManager
Manages periodic checkpointing for persistent GPU actors.
CheckpointMetadata
Kernel-specific metadata stored in checkpoint.
ChunkHeader
Header for each data chunk (32 bytes).
DataChunk
A single data chunk in a checkpoint.
FileStorage
File-based checkpoint storage.
MemoryStorage
In-memory checkpoint storage (for testing and fast operations).
SnapshotRequest
A request to snapshot a specific actor’s state.
SnapshotResponse
A completed snapshot response from the device.
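
`CheckpointHeader` and `ChunkHeader` are documented as fixed-size (64 and 32 bytes). One common way to pin such sizes in Rust is a `#[repr(C)]` struct with a compile-time size assertion. The sketch below illustrates that technique only: the struct names and every field in them are hypothetical, since the real field layout is not shown on this page.

```rust
use std::mem::size_of;

/// Hypothetical 64-byte file header; the field names and order are assumptions.
#[allow(dead_code)]
#[repr(C)]
struct SketchHeader {
    magic: [u8; 8],     // offset 0
    version: u32,       // offset 8
    flags: u32,         // offset 12
    total_size: u64,    // offset 16
    chunk_count: u32,   // offset 24
    _pad: u32,          // offset 28
    reserved: [u8; 32], // offset 32
}

/// Hypothetical 32-byte chunk header; the field names and order are assumptions.
#[allow(dead_code)]
#[repr(C)]
struct SketchChunkHeader {
    chunk_type: u32,
    flags: u32,
    compressed_len: u64,
    uncompressed_len: u64,
    checksum: u64,
}

// Compilation fails if a later field change alters either size.
const _: () = assert!(size_of::<SketchHeader>() == 64);
const _: () = assert!(size_of::<SketchChunkHeader>() == 32);

fn main() {
    println!("header: {} bytes", size_of::<SketchHeader>());
    println!("chunk header: {} bytes", size_of::<SketchChunkHeader>());
}
```

With `#[repr(C)]` the compiler keeps fields in declaration order, so the explicit padding and reserved fields account for every byte and the size stays stable across compiler versions.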

Enums§

ChunkType
Chunk types for checkpoint data sections.

Constants§

CHECKPOINT_MAGIC
Magic number for checkpoint files: “RKCKPT01” in ASCII.
CHECKPOINT_VERSION
Current checkpoint format version.
DELTA_PARENT_DIGEST_KEY
Metadata custom key that records a delta checkpoint’s parent.
MAX_CHECKPOINT_SIZE
Maximum supported checkpoint size (1 GB).
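
A reader of checkpoint files would typically check the magic number and the size limit before parsing anything else. The sketch below shows one way to do that under stated assumptions: the magic is compared as the eight ASCII bytes of "RKCKPT01" (the actual type and byte order of `CHECKPOINT_MAGIC` are not shown here), and "1 GB" is taken to mean 2^30 bytes. `MAGIC_BYTES`, `MAX_SIZE_BYTES`, `HeaderError`, and `validate_prefix` are hypothetical names, not items from this module.

```rust
/// Hypothetical mirror of the documented magic, as raw ASCII bytes.
const MAGIC_BYTES: [u8; 8] = *b"RKCKPT01";

/// Hypothetical mirror of the size limit; assumes 1 GB means 2^30 bytes.
const MAX_SIZE_BYTES: u64 = 1 << 30;

#[derive(Debug, PartialEq)]
enum HeaderError {
    TooShort,
    BadMagic,
    TooLarge(u64),
}

/// Checks the first 8 bytes against the magic, then the declared total
/// size against the limit.
fn validate_prefix(bytes: &[u8], declared_size: u64) -> Result<(), HeaderError> {
    let prefix = bytes.get(..8).ok_or(HeaderError::TooShort)?;
    if prefix != &MAGIC_BYTES[..] {
        return Err(HeaderError::BadMagic);
    }
    if declared_size > MAX_SIZE_BYTES {
        return Err(HeaderError::TooLarge(declared_size));
    }
    Ok(())
}

fn main() {
    // A 64-byte buffer that starts with the magic and is otherwise zeroed.
    let mut file = b"RKCKPT01".to_vec();
    file.extend_from_slice(&[0u8; 56]);

    println!("{:?}", validate_prefix(&file, 4096));
    println!("{:?}", validate_prefix(b"NOTACKPT", 4096));
}
```

Rejecting a bad magic or an oversized declaration up front avoids allocating buffers for, or decompressing, data that is not a valid checkpoint.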

Traits§

CheckpointStorage
Trait for checkpoint storage backends.
CheckpointableKernel
Trait for kernels that support checkpointing.