# Pepita: Sovereign AI Kernel Interfaces
Pepita is a pure Rust library providing minimal kernel interfaces and distributed computing primitives for Sovereign AI workloads. It combines low-level Linux kernel interfaces (ublk, io_uring, blk-mq) with high-level infrastructure (scheduler, executor, vmm, simd, gpu).
## Design Principles (Iron Lotus Framework)
- First-Principles Rust: Zero external dependencies in kernel mode
- Pure Rust Sovereignty: 100% auditable, zero C/C++ dependencies
- Work-Stealing Scheduler: Blumofe-Leiserson algorithm
- Toyota Way Quality: Jidoka, Poka-yoke, Genchi Genbutsu
- EXTREME TDD: 417 tests, comprehensive coverage
## Installation

```toml
[dependencies]
pepita = { path = "../pepita" }
```
## Module Overview
### Core Kernel Interfaces (no_std compatible)
These modules define data structures compatible with the Linux kernel ABI, enabling userspace block devices and async I/O without any kernel modifications.
| Module | Purpose | Key Types |
|---|---|---|
| `io_uring` | Linux async I/O interface. Submit I/O operations and receive completions without per-operation syscall overhead. | `IoUringSqe`, `IoUringCqe` |
| `ublk` | Userspace block device driver. Implement virtual disks entirely in userspace (like loop devices, but programmable). | `UblkCtrlCmd`, `UblkIoDesc`, `UblkIoCmd` |
| `blk_mq` | Multi-queue block layer. Manage parallel I/O queues for high-performance NVMe-style storage. | `TagSetConfig`, `Request`, `RequestOp` |
| `memory` | Physical/virtual memory management: DMA-safe allocations, page management, address translation (see the sketch after this table). | `DmaBuffer`, `PageAllocator`, `Pfn`, `PhysAddr`, `VirtAddr` |
| `error` | Unified error types for all pepita operations. | `KernelError`, `Result` |
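To make the `memory` row concrete, here is a minimal sketch of a DMA allocation. The constructor and accessor names (`alloc`, `phys_addr`, `virt_addr`) are assumptions for illustration, not the confirmed API:

```rust
use pepita::error::Result;
use pepita::memory::{DmaBuffer, PhysAddr, VirtAddr};

// Hypothetical DMA allocation flow; method names are illustrative.
fn dma_example() -> Result<()> {
    let buf = DmaBuffer::alloc(4096)?;    // one page, physically contiguous
    let phys: PhysAddr = buf.phys_addr(); // the address a device programs into DMA
    let virt: VirtAddr = buf.virt_addr(); // the address userspace reads/writes
    println!("phys={:?} virt={:?}", phys, virt);
    Ok(())
}
```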
### Distributed Computing (std required)
These modules provide the runtime for executing tasks across CPU cores with work-stealing load balancing.
| Module | Purpose | Key Types |
|---|---|---|
| `scheduler` | Work-stealing scheduler (Blumofe-Leiserson). Each worker has a deque: it pushes and pops from the bottom, while thieves steal from the top. Provides automatic load balancing. | `Scheduler`, `WorkerDeque` |
| `executor` | Execution backends. Takes tasks and runs them on CPU threads, returning stdout/stderr/exit code. | `CpuExecutor`, `Backend` |
| `task` | Task definitions. Wraps a binary with arguments, environment, timeout, priority, and backend selection. | `Task`, `TaskId`, `ExecutionResult`, `Priority` |
| `pool` | High-level API combining scheduler + executor. Simple `submit(task)` interface for common use cases. | `Pool`, `PoolBuilder` |
| `transport` | Wire protocol for distributed communication. Message framing, length-prefixed serialization. | `Message`, `Transport` |
| `fault` | Fault tolerance primitives: retry policies, circuit breakers, failure detection for distributed systems (see the sketch after this table). | `RetryPolicy`, `CircuitBreaker` |
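A sketch of how the `fault` primitives might compose around an unreliable call; the constructor arguments and the `run`/`call` method names are assumptions, not the confirmed API:

```rust
use pepita::fault::{CircuitBreaker, RetryPolicy};

// Hypothetical composition; constructor and method names are illustrative.
fn resilient_send() {
    let retry = RetryPolicy::new(3);      // up to 3 attempts
    let breaker = CircuitBreaker::new(5); // open after 5 consecutive failures

    let _ = retry.run(|| {
        // Once open, the breaker fails fast instead of hammering a dead peer.
        breaker.call(|| send_request())
    });
}

fn send_request() -> Result<(), ()> {
    Ok(()) // stand-in for a real network call
}
```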
### Sovereign Infrastructure (std required)
These modules provide the building blocks for a complete Docker/Lambda/Kubernetes replacement in pure Rust.
| Module | Purpose | Key Types |
|---|---|---|
| `zram` | Compressed RAM block device. Stores pages in LZ4-compressed form in memory. Same-page deduplication, zero-page optimization. Typically 3-4x compression ratio. | `ZramDevice`, `ZramConfig`, `ZramCompressor`, `ZramStats` |
| `vmm` | KVM-based MicroVM runtime. Creates lightweight VMs with configurable vCPUs, memory, and kernel. Sub-100ms boot time. Used for serverless isolation. | `MicroVm`, `VmConfig`, `VmState`, `ExitReason` |
| `virtio` | Virtio device implementations for VM communication. Standard Linux virtio protocol for high-performance VM I/O. | `VirtQueue`, `VirtioVsock`, `VirtioBlock`, `VsockAddr` |
| `simd` | SIMD-accelerated vector operations. Auto-detects AVX-512/AVX2/SSE4.1/NEON and uses the best available. | `SimdCapabilities`, `SimdOps`, `MatrixOps` |
| `gpu` | GPU compute via wgpu (Vulkan/Metal/DX12). Cross-platform GPU detection and compute shader execution (see the sketch after this table). | `GpuDevice`, `ComputeKernel`, `GpuBuffer` |
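Since `gpu` has no dedicated section below, here is a hedged sketch of what a dispatch might look like. Every method name (`detect`, `upload`, `from_wgsl`, `dispatch`, `read_back`) is an assumption for illustration; only the types come from the table above:

```rust
use pepita::error::Result;
use pepita::gpu::{ComputeKernel, GpuBuffer, GpuDevice};

// Hypothetical flow: detect a device, upload data, run a compute shader.
fn gpu_compute(data: &[f32]) -> Result<Vec<f32>> {
    let device = GpuDevice::detect()?;            // Vulkan/Metal/DX12 via wgpu
    let input: GpuBuffer = device.upload(data)?;  // host -> GPU copy
    let kernel = ComputeKernel::from_wgsl(&device, "/* WGSL shader source */")?;
    let output = kernel.dispatch(&input)?;        // run the compute shader
    device.read_back(&output)                     // GPU -> host copy
}
```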
## Module Details
### io_uring - Async I/O

```rust
use pepita::io_uring::{IoUringCqe, IoUringSqe};

// Submission queue entry - describes an I/O operation
// (constructor arguments elided; illustrative)
let sqe = IoUringSqe::new(/* opcode, fd, buffer, offset, ... */);

// Completion queue entry - result of the operation
// user_data links back to the original submission
let cqe: IoUringCqe = /* from kernel */;
assert_eq!(cqe.res, 0); // Success
```
**Why it matters:** io_uring eliminates syscall overhead by batching I/O operations. One syscall can submit hundreds of operations and reap hundreds of completions.
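A sketch of that batched pattern; the `ring` handle and its methods (`push_sqe`, `submit_and_wait`, `pop_cqe`) are hypothetical names used for illustration:

```rust
// Hypothetical ring API; method names are illustrative.
for i in 0..256 {
    let sqe = IoUringSqe::new(/* read request i */);
    ring.push_sqe(sqe); // queued in shared memory - no syscall yet
}
ring.submit_and_wait(256)?; // ONE syscall submits all 256 operations

while let Some(cqe) = ring.pop_cqe() {
    handle_completion(cqe.user_data, cqe.res); // reaping is also syscall-free
}
```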
### ublk - Userspace Block Devices

```rust
use pepita::ublk::{UblkCtrlCmd, UblkIoDesc};

// Control command - add a new block device
// (constructor arguments elided; illustrative)
let cmd = UblkCtrlCmd::new(/* device id, queue depth, ... */);

// I/O descriptor - describes a read/write request
let io_desc: UblkIoDesc = /* from kernel */;
let sector = io_desc.start_sector;
let num_sectors = io_desc.nr_sectors;
```
**Why it matters:** ublk allows implementing block devices (virtual disks, compressed storage, network-backed storage) entirely in userspace with near-native performance.
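For flavor, the serving side of a RAM-backed ublk device might look like this; `queue.next_io`, `queue.complete_io`, and the `ram_backing` buffer are hypothetical names for illustration:

```rust
// Hypothetical service loop for a RAM-backed virtual disk.
loop {
    let io_desc: UblkIoDesc = queue.next_io()?;    // wait for a kernel request
    let offset = (io_desc.start_sector as usize) * 512;
    let len = (io_desc.nr_sectors as usize) * 512;
    let data = &ram_backing[offset..offset + len]; // serve the read from memory
    queue.complete_io(&io_desc, data)?;            // hand the result back to the kernel
}
```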
### zram - Compressed Memory

```rust
use pepita::zram::{ZramCompressor, ZramConfig, ZramDevice};

// Create a 1GB compressed RAM device
// (builder-method and stats-field names are illustrative)
let config = ZramConfig::with_size(1 << 30)
    .compressor(ZramCompressor::Lz4);
let device = ZramDevice::new(config)?;

// Write a page (4KB)
let data = [0u8; 4096];
device.write_page(0, &data)?;

// Check compression stats
let stats = device.stats();
println!("stored: {} bytes", stats.orig_data_size);
println!("compressed: {} bytes", stats.compr_data_size);
```
**Why it matters:** zram provides swap/storage that lives in compressed RAM. At the typical 3-4x ratio, a 4GB system can effectively have 12-16GB of memory for compressible workloads.
### vmm - MicroVM Runtime

```rust
use pepita::vmm::{MicroVm, VmConfig, VmState};

// Configure a MicroVM
// (builder methods and VmState variants are illustrative reconstructions)
let config = VmConfig::builder()
    .vcpus(2)
    .memory_mb(256)
    .kernel_path("/path/to/vmlinux")
    .build()?;

// Create and run
let vm = MicroVm::create(config)?;
assert_eq!(vm.state(), VmState::Created);
vm.start()?;
assert_eq!(vm.state(), VmState::Running);

// VM runs until exit
let exit_reason = vm.run()?;
```
**Why it matters:** MicroVMs provide hardware-level isolation (like Docker) with sub-100ms cold start (like Lambda). Each function runs in its own VM with dedicated vCPUs and memory.
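What happens after `run` returns depends on the exit reason. A hedged sketch of the dispatch; the `ExitReason` variant names below are assumptions for illustration:

```rust
use pepita::vmm::ExitReason;

// Illustrative dispatch on the VM's exit reason; variant names are assumed.
match exit_reason {
    ExitReason::Halt => println!("guest halted cleanly"),
    ExitReason::Shutdown => println!("guest requested shutdown"),
    other => eprintln!("unexpected exit: {:?}", other),
}
```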
### virtio - VM Device Communication

```rust
use pepita::virtio::{VirtioBlock, VirtioVsock};

// Vsock - socket communication between VM and host
// (arguments are illustrative)
let vsock = VirtioVsock::new(3); // CID 3
vsock.activate();
vsock.connect(/* host port */)?;
vsock.send(b"hello from guest")?;

// Block device - virtual disk for the VM
let block = VirtioBlock::new(1 << 30); // 1GB
block.activate();
block.write(/* sector */ 0, b"data")?;
```
**Why it matters:** virtio is the standard interface for high-performance VM I/O. Vsock enables host-guest communication without a virtual NIC, and block devices provide the VM's storage.
### simd - Vector Operations

```rust
use pepita::simd::{MatrixOps, SimdCapabilities, SimdOps};

// Detect CPU capabilities
// (field and argument names are illustrative)
let caps = SimdCapabilities::detect();
println!("vector width: {}", caps.vector_width); // 512 for AVX-512

// Vector operations
let ops = SimdOps::new();
let a = vec![1.0f32; 1024];
let b = vec![2.0f32; 1024];
let mut c = vec![0.0f32; 1024];
ops.vadd_f32(&a, &b, &mut c); // c = a + b (SIMD accelerated)
ops.vmul_f32(&a, &b, &mut c); // c = a * b
let dot = ops.dot_f32(&a, &b); // dot product

// Matrix multiplication
let matrix_ops = MatrixOps::new();
matrix_ops.matmul_f32(&a, &b, &mut c, /* m, n, k */ 32, 32, 32);
```
**Why it matters:** SIMD provides 4-16x speedup for numerical operations. AVX-512 processes 16 floats per instruction vs 1 for scalar code.
### scheduler - Work Stealing

```rust
use pepita::scheduler::Scheduler;
use pepita::task::{Priority, Task};

let scheduler = Scheduler::with_workers(8);

// Submit tasks with priorities
// (builder methods, Priority variant, and the binary path are illustrative)
let task = Task::builder()
    .binary("/usr/bin/my-compute-job")
    .priority(Priority::High)
    .build()?;
scheduler.submit(task).await?;

// Work stealing happens automatically:
// - Idle workers steal from busy workers' queues
// - Provably optimal load balancing (Blumofe-Leiserson)
```
**Why it matters:** Work stealing provides automatic load balancing. If one worker finishes early, it steals work from others rather than sitting idle.
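The deque discipline behind this is simple enough to sketch with std primitives: the owner pushes and pops at the bottom (LIFO, cache-friendly), thieves steal from the top (FIFO, oldest work first). A minimal lock-based illustration; pepita's actual `WorkerDeque` may differ, and this shows only the access pattern:

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

// Minimal illustration of the Blumofe-Leiserson deque discipline.
struct ToyDeque<T>(Mutex<VecDeque<T>>);

impl<T> ToyDeque<T> {
    fn new() -> Self {
        Self(Mutex::new(VecDeque::new()))
    }
    // Owner side: LIFO at the bottom keeps recently spawned tasks cache-local.
    fn push(&self, t: T) {
        self.0.lock().unwrap().push_back(t);
    }
    fn pop(&self) -> Option<T> {
        self.0.lock().unwrap().pop_back()
    }
    // Thief side: FIFO from the top steals the oldest (usually largest) work.
    fn steal(&self) -> Option<T> {
        self.0.lock().unwrap().pop_front()
    }
}
```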
## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│ User Code │
└──────────────────────────────┬──────────────────────────────────┘
│
┌──────────────────────────────▼──────────────────────────────────┐
│ pool.rs │
│ (High-level Pool API) │
└──────────────────────────────┬──────────────────────────────────┘
│
┌──────────────────────────────▼──────────────────────────────────┐
│ scheduler.rs │
│ (Work-Stealing, Blumofe-Leiserson) │
└──────────────────────────────┬──────────────────────────────────┘
│
┌──────────────────────────────▼──────────────────────────────────┐
│ executor.rs │
│ (Backend Dispatch) │
├─────────────┬─────────────┬─────────────┬───────────────────────┤
│ CPU │ GPU │ MicroVM │ SIMD │
│ (threads) │ (wgpu) │ (KVM) │ (AVX/NEON) │
└─────────────┴──────┬──────┴──────┬──────┴───────────┬───────────┘
│ │ │
┌──────▼──────┐ ┌────▼─────┐ ┌───────▼───────┐
│ gpu.rs │ │ vmm.rs │ │ simd.rs │
│ (wgpu) │ │ (KVM) │ │ (AVX-512/NEON)│
└─────────────┘ └────┬─────┘ └───────────────┘
│
┌──────▼──────┐
│ virtio.rs │
│(vsock,block)│
└─────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ Kernel Interfaces (no_std) │
├─────────────┬─────────────┬─────────────┬───────────────────────┤
│ io_uring │ ublk │ blk_mq │ memory │
│ (async I/O) │(block dev) │ (multiqueue)│ (DMA/pages) │
└─────────────┴─────────────┴─────────────┴───────────────────────┘
```
## Integration with Repartir
Pepita provides the low-level primitives that repartir uses for its high-level distributed computing API:
```rust
// repartir uses pepita's SIMD executor
// (module paths, constructors, and the task type are illustrative)
use repartir::SimdExecutor;

let executor = SimdExecutor::new(); // Uses pepita::simd internally
let task = SimdTask::vadd_f32(a, b); // hypothetical task constructor
let result = executor.execute_simd(task).await?;

// repartir uses pepita's MicroVM for serverless
use repartir::MicroVmExecutor;

let executor = MicroVmExecutor::new()?; // Uses pepita::vmm internally
```
## Test Results

```
running 417 tests
test result: ok. 417 passed; 0 failed; 0 ignored
```
## License
MIT License - see LICENSE file for details.
Built with the Iron Lotus Framework. *Quality is not inspected in; it is built in.*