Repartir: Sovereign AI-Grade Distributed Computing

Repartir is a pure Rust library for distributed execution across CPUs, GPUs, and remote machines. Built on the Iron Lotus Framework (Toyota Way principles for systems programming) and validated by the certeza testing methodology.

Features
Installation
Quick Start
Feature Flags
Architecture
Iron Lotus Framework
Testing (Certeza Methodology)
Sovereign AI Principles
Roadmap
Contributing
Documentation
Academic Foundations
Comparison with Existing Systems
License
Acknowledgments

Features

✅ 100% Rust, Zero C/C++: True digital sovereignty through complete auditability
✅ Memory Safety Guaranteed: Provably safe via RustBelt formal verification
✅ Work-Stealing Scheduler: Based on Blumofe & Leiserson (1999)
✅ Priority-Based Execution: High, Normal, and Low priority queues
✅ Fault Tolerance: Task retry, timeout handling, graceful failure
✅ Supply Chain Security: Dependency pinning, binary signing, license enforcement
✅ Iron Lotus Quality: ≥95% coverage target, ≥80% mutation score, formal verification
✅ Certeza Testing: Three-tiered testing (sub-second → minutes → hours)

Installation

# From crates.io
cargo add repartir

# From source
git clone https://github.com/paiml/repartir
cd repartir
cargo install --path .

Quick Start

Add to your Cargo.toml:

[dependencies]
repartir = "0.1"
tokio = { version = "1.35", features = ["rt-multi-thread", "macros"] }

Basic Example

use repartir::{Pool, task::{Task, Backend}};

#[tokio::main]
async fn main() -> repartir::error::Result<()> {
    // Create a pool with 4 CPU workers
    let pool = Pool::builder()
        .cpu_workers(4)
        .build()?;

    // Submit a task
    let task = Task::builder()
        .binary("/bin/echo")
        .arg("Hello from Repartir!")
        .backend(Backend::Cpu)
        .build()?;

    let result = pool.submit(task).await?;

    if result.is_success() {
        println!("Output: {}", result.stdout_str()?.trim());
    }

    pool.shutdown().await;
    Ok(())
}

Run the Example

cargo run --example hello_repartir

Comprehensive v1.1 Showcase

See all v1.1 features in action:

# Generate TLS certificates first
./scripts/generate-test-certs.sh ./certs

# Run comprehensive showcase
cargo run --example v1_1_showcase --features full

Demonstrates:

✅ CPU executor with work-stealing (48 workers, Blumofe & Leiserson algorithm)
✅ GPU detection (NVIDIA RTX 4090, 2048 compute units)
✅ TLS encryption (certificate-based auth, TLS 1.3)
✅ Priority scheduling (High/Normal/Low queues)
✅ Parallel speedup (3.82x with 4 workers)
✅ Fault tolerance (graceful error handling)

v2.0 Data Integration Features

Repartir v2.0 introduces Parquet checkpoint storage and data-locality aware scheduling for enterprise-grade distributed computing.

Checkpoint with Parquet Storage (v2.0)

Enable persistent checkpointing with Apache Parquet format (5-10x compression vs JSON):

use repartir::checkpoint::{CheckpointManager, CheckpointId};
use repartir::task::TaskState;

#[tokio::main]
async fn main() -> repartir::error::Result<()> {
    // Create checkpoint manager with Parquet backend
    let manager = CheckpointManager::new("./checkpoints")?;

    // Save checkpoint (automatically uses Parquet format)
    let checkpoint_id = CheckpointId::new();
    let state = TaskState::new(vec![1, 2, 3, 4]);
    manager.save(checkpoint_id, &state).await?;

    // Restore checkpoint (supports both Parquet and legacy JSON)
    let restored = manager.restore(checkpoint_id).await?;

    println!("Restored iteration: {}", restored.iteration);
    Ok(())
}

Storage efficiency:

SNAPPY compression: 5-10x smaller checkpoint files
Columnar format: Optimized for analytical queries
Backward compatible: Reads both .parquet and .json formats

Run the example:

cargo run --example checkpoint_example --features checkpoint

Locality-Aware Scheduling (v2.0)

Minimize network transfers by scheduling tasks on workers that already have required data:

use repartir::{Pool, task::Task, scheduler::Scheduler};

#[tokio::main]
async fn main() -> repartir::error::Result<()> {
    let scheduler = Scheduler::with_capacity(100);

    // Track data locations
    let worker_id = uuid::Uuid::new_v4();
    scheduler.data_tracker()
        .track_data("dataset_A", worker_id).await;
    scheduler.data_tracker()
        .track_data("dataset_B", worker_id).await;

    // Submit task with affinity (automatically prefers worker_id)
    let task = Task::builder()
        .binary("/usr/bin/process")
        .build()?;

    scheduler.submit_with_data_locality(
        task,
        &["dataset_A".to_string(), "dataset_B".to_string()]
    ).await?;

    // Check locality metrics
    let metrics = scheduler.locality_metrics().await;
    println!("Locality hit rate: {:.1}%", metrics.hit_rate() * 100.0);

    Ok(())
}

Scheduling intelligence:

Affinity scoring: (data_items_present / total_data_items)
Automatic optimization: Prefers workers with matching data
Real-time metrics: Track locality hit rate (0.0 to 1.0)

Performance benefits:

Reduces network I/O for data-intensive workloads
Improves cache utilization
Enables efficient batch processing

Tensor Operations with SIMD (v2.0)

High-performance tensor operations with automatic SIMD optimization:

use repartir::tensor::{TensorExecutor, Tensor};
use repartir::task::Backend;

#[tokio::main]
async fn main() -> repartir::error::Result<()> {
    // Create tensor executor with CPU backend
    let executor = TensorExecutor::builder()
        .backend(Backend::Cpu)
        .build()?;

    // SIMD-accelerated operations
    let a = Tensor::from_slice(&[1.0, 2.0, 3.0, 4.0]);
    let b = Tensor::from_slice(&[5.0, 6.0, 7.0, 8.0]);

    // Element-wise operations
    let sum = executor.add(&a, &b).await?;
    let product = executor.mul(&a, &b).await?;

    // Dot product
    let dot = executor.dot(&a, &b).await?;
    println!("Dot product: {}", dot);

    // Scalar operations
    let scaled = executor.scalar_mul(&a, 2.5).await?;

    Ok(())
}

SIMD acceleration:

Leverages trueno library's AVX2/AVX-512 optimizations
2-8x speedup vs scalar operations on modern CPUs
Automatic backend selection based on CPU features
f32 precision for optimal SIMD register utilization

Operations:

Element-wise: add(), sub(), mul(), div()
Dot product: dot()
Scalar: scalar_mul()

Run the example:

cargo run --example tensor_example --features tensor

Feature Flags

Repartir supports multiple execution backends via feature flags:

[dependencies]
# CPU only (default)
repartir = "0.1"

# With GPU support (v1.1+)
repartir = { version = "0.1", features = ["gpu"] }

# With remote execution (v1.1+)
repartir = { version = "0.1", features = ["remote"] }

# With TLS encryption (v1.1+)
repartir = { version = "0.1", features = ["remote-tls"] }

# With Parquet checkpointing (v2.0+)
repartir = { version = "0.1", features = ["checkpoint"] }

# With SIMD tensor operations (v2.0+)
repartir = { version = "0.1", features = ["tensor"] }

# All features
repartir = { version = "0.1", features = ["full"] }

GPU Executor (v1.1+)

The GPU executor uses wgpu for cross-platform GPU compute:

use repartir::executor::gpu::GpuExecutor;
use repartir::executor::Executor;

#[tokio::main]
async fn main() -> repartir::error::Result<()> {
    let executor = GpuExecutor::new().await?;
    println!("GPU: {}", executor.device_name());
    println!("Compute units: {}", executor.capacity());
    Ok(())
}

Supported backends:

Vulkan (Linux/Windows/Android)
Metal (macOS/iOS)
DirectX 12 (Windows)
WebGPU (browsers)

Note (v1.1): GPU detection and initialization only. Binary task execution on GPU requires compute shader compilation (v1.2+ with rust-gpu).

cargo run --example gpu_detect --features gpu

TLS Encryption (v1.1+)

Secure remote execution with TLS/SSL encryption using rustls:

use repartir::executor::tls::TlsConfig;

#[tokio::main]
async fn main() -> repartir::error::Result<()> {
    // Generate test certificates:
    // ./scripts/generate-test-certs.sh ./certs

    let tls_config = TlsConfig::builder()
        .client_cert("./certs/client.pem")
        .client_key("./certs/client.key")
        .server_cert("./certs/server.pem")
        .server_key("./certs/server.key")
        .ca_cert("./certs/ca.pem")
        .build()?;

    println!("TLS enabled!");
    Ok(())
}

Security features:

TLS 1.3 end-to-end encryption
Certificate-based authentication
Perfect forward secrecy
MITM attack protection

Generate test certificates:

./scripts/generate-test-certs.sh ./certs
cargo run --example tls_example --features remote-tls

⚠️ WARNING: The included certificate generator creates self-signed certificates for TESTING ONLY. For production, use certificates from a trusted CA (Let's Encrypt, DigiCert, etc.).

Messaging Patterns (v1.1+)

Advanced messaging for distributed coordination with PUB/SUB and PUSH/PULL patterns:

Publish-Subscribe (PUB/SUB)

One publisher broadcasts to multiple subscribers:

use repartir::messaging::{PubSubChannel, Message};

#[tokio::main]
async fn main() -> repartir::error::Result<()> {
    let channel = PubSubChannel::new();

    // Subscribe to topics
    let mut events = channel.subscribe("events").await;
    let mut alerts = channel.subscribe("alerts").await;

    // Publish messages
    channel.publish("events", Message::text("Task completed")).await?;

    // All subscribers receive broadcast
    if let Some(msg) = events.recv().await {
        println!("Event: {}", msg.as_text()?);
    }

    Ok(())
}

Use cases: Event notifications, logging, monitoring, real-time updates

Push-Pull (PUSH/PULL)

Work distribution with automatic load balancing:

use repartir::messaging::{PushPullChannel, Message};

#[tokio::main]
async fn main() -> repartir::error::Result<()> {
    let channel = PushPullChannel::new(100);

    // Producers push work
    channel.push(Message::text("Work item 1")).await?;
    channel.push(Message::text("Work item 2")).await?;

    // Consumers pull work (load balanced)
    let work = channel.pull().await;

    Ok(())
}

Use cases: Work queues, job scheduling, pipeline processing, task distribution

Run examples:

cargo run --example pubsub_example
cargo run --example pushpull_example

Architecture

Repartir follows a clean, layered architecture with pepita providing low-level primitives:

┌─────────────────────────────────────────────────────────────────┐
│                         repartir                                 │
│                  (High-level Distributed API)                    │
├─────────────────────────────────────────────────────────────────┤
│  Pool │ Scheduler │ Serverless │ Checkpoint │ Messaging         │
├─────────────────────────────────────────────────────────────────┤
│                      Executor Backends                           │
│  ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐    │
│  │  CPU   │  │  GPU   │  │MicroVM │  │  SIMD  │  │ Remote │    │
│  │(v1.0)  │  │(v1.1+) │  │(v2.0)  │  │(v2.0)  │  │(v1.1+) │    │
│  └────────┘  └────────┘  └───┬────┘  └───┬────┘  └────────┘    │
└──────────────────────────────┼───────────┼──────────────────────┘
                               │           │
┌──────────────────────────────▼───────────▼──────────────────────┐
│                          pepita                                  │
│              (Sovereign AI Kernel Interfaces)                    │
├─────────────┬─────────────┬─────────────┬───────────────────────┤
│   vmm.rs    │  virtio.rs  │  simd.rs    │      zram.rs          │
│  (MicroVMs) │  (devices)  │  (vectors)  │   (compression)       │
├─────────────┼─────────────┼─────────────┼───────────────────────┤
│   gpu.rs    │ scheduler   │  executor   │     pool.rs           │
│  (compute)  │(work-steal) │ (backends)  │   (high-level)        │
├─────────────┴─────────────┴─────────────┴───────────────────────┤
│          io_uring │ ublk │ blk_mq │ memory (Kernel ABI)         │
└─────────────────────────────────────────────────────────────────┘

Module Overview

Module	Purpose	Backend
`executor::cpu`	Execute binaries on CPU threads with work-stealing	Native threads
`executor::gpu`	GPU compute detection and shader execution	wgpu (Vulkan/Metal/DX12)
`executor::microvm`	Hardware-isolated execution in KVM MicroVMs	pepita::vmm
`executor::simd`	SIMD-accelerated vector operations	pepita::simd (AVX-512/NEON)
`executor::remote`	Distributed execution over TCP	rustls TLS 1.3
`serverless`	Function-as-a-Service with warm pools	pepita::vmm + pepita::virtio
`scheduler`	Priority-based work-stealing scheduler	Blumofe-Leiserson algorithm
`checkpoint`	Parquet-based checkpoint storage	Apache Parquet
`messaging`	PUB/SUB and PUSH/PULL patterns	tokio channels

Pepita Integration (v2.0+)

Repartir v2.0 integrates with pepita for hardware-accelerated execution:

SIMD Executor

Execute vectorized operations using AVX-512/AVX2/SSE/NEON:

use repartir::executor::simd::{SimdExecutor, SimdTask};

#[tokio::main]
async fn main() -> repartir::error::Result<()> {
    let executor = SimdExecutor::new();

    // Check capabilities
    println!("SIMD: {}-bit vectors", executor.vector_width());

    // Vector addition (SIMD accelerated)
    let a = vec![1.0f32; 10000];
    let b = vec![2.0f32; 10000];
    let task = SimdTask::vadd_f32(a, b);
    let result = executor.execute_simd(task).await?;

    println!("Throughput: {:.2}M elem/s", result.throughput() / 1_000_000.0);
    Ok(())
}

Supported operations: vadd_f32, vadd_f64, vmul_f32, dot_f32, matmul_f32

MicroVM Executor

Hardware-isolated execution with sub-100ms cold start:

use repartir::executor::microvm::{MicroVmExecutor, MicroVmExecutorConfig};

#[tokio::main]
async fn main() -> repartir::error::Result<()> {
    let config = MicroVmExecutorConfig::builder()
        .memory_mib(256)
        .vcpus(2)
        .warm_pool(true, 3)  // Keep 3 VMs warm
        .build()?;

    let executor = MicroVmExecutor::new(config)?;

    // VMs from warm pool start in <5ms
    // Cold start is <100ms
    Ok(())
}

Features: KVM isolation, warm pools, jailer security, virtio devices

Serverless Functions

Function-as-a-Service with automatic warm pool management:

use repartir::serverless::{Function, FunctionService, Runtime, Trigger, HttpMethod};
use std::path::PathBuf;

#[tokio::main]
async fn main() -> repartir::error::Result<()> {
    let mut service = FunctionService::new();

    let function = Function::builder()
        .name("process-data")
        .runtime(Runtime::RustNative { binary: PathBuf::from("./target/release/worker") })
        .memory_mib(256)
        .trigger(Trigger::Http {
            path: "/api/process".to_string(),
            methods: vec![HttpMethod::Post],
        })
        .build()?;

    service.register(function)?;

    // Invoke function
    let request = InvocationRequest::new("process-data", b"input data");
    let response = service.invoke(request).await?;

    Ok(())
}

Run examples:

cargo run --example simd_example --features simd
cargo run --example microvm_example --features microvm
cargo run --example serverless_example --features serverless

Iron Lotus Framework

Repartir embodies Toyota Production System principles:

Genchi Genbutsu (現地現物 - "Go and See")

Radical Transparency: Every operation traceable from API → scheduler → executor
No Black Boxes: 100% pure Rust, zero opaque C/C++ libraries
AST-Level Inspection: Code structure visible via pmat

Jidoka (自働化 - "Automation with Human Touch")

Automated Quality Gates: CI enforces clippy, rustfmt, tests, coverage
Andon Cord: Build fails immediately on any defect
No Manual Checks: Machines verify before humans review

Kaizen (改善 - "Continuous Improvement")

Technical Debt Grading: TDG score must never decrease
Ratchet Effect: Each PR improves or maintains quality
Five Whys: Root cause analysis for all incidents

Muda (無駄 - "Waste Elimination")

No Overproduction: Zero YAGNI features
No Waiting: Fast compilation with sccache
No Transportation: Zero-copy data flow, single language
No Defects: EXTREME TDD with mutation testing

Testing (Certeza Methodology)

Repartir uses a three-tiered testing approach:

Tier 1: ON-SAVE (Sub-Second)

Fast feedback for flow state:

make tier1

Unit tests (21 tests)
cargo check
cargo clippy
cargo fmt

Target: < 3 seconds

Tier 2: ON-COMMIT (1-5 Minutes)

Comprehensive pre-commit gate:

make tier2

All tests (21 unit + 4 property + 4 doc = 29 tests)
Property-based tests (proptest)
Coverage analysis (target ≥95%)
Documentation tests
Security audit (cargo-audit, cargo-deny)

Target: 1-5 minutes

Tier 3: ON-MERGE (Hours)

Exhaustive validation:

make tier3

Mutation testing (cargo-mutants, target ≥80%)
Formal verification (Kani, for critical paths)
Extended fuzzing
Performance benchmarks

Target: 1-6 hours (run overnight or in CI)

Test Results (v2.0)

✓ 190 unit tests         (0.12s)  - lib tests
✓ 32 integration tests   (0.10s)  - pepita integration
✓ 4 property-based tests (1.53s)  - proptest
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  226 tests PASSED

pepita (dependency):
✓ 417 unit tests         (0.71s)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  643 total tests across pepita + repartir

Sovereign AI Principles

Digital Sovereignty Requirements

Auditability: Trace execution from user API → hardware instruction
Supply Chain Independence: Deterministic rebuild from source
No Foreign Dependencies: Zero opaque binary blobs
Memory Safety Guarantees: Provable absence of vulnerabilities

Supply Chain Security

Dependency Pinning: Cargo.lock committed and reviewed
License Enforcement: Only MIT/Apache-2.0/BSD allowed (via cargo-deny)
Binary Signing: ed25519 signatures for distributed binaries
Audit Trail: cargo tree logged in CI

Memory Safety (NSA/CISA Mandate)

Per NSA/CISA joint guidance on memory-safe languages:

✅ Rust provides compile-time memory safety guarantees
✅ RustBelt formal verification proves soundness
✅ #![deny(unsafe_code)] in v1.0 (no unsafe code)
✅ Eliminates buffer overflows, use-after-free, data races

Roadmap

v1.0: Sovereign Foundation (Current)

✅ CPU executor with work-stealing scheduler
✅ Priority-based task scheduling
✅ High-level Pool API
✅ Comprehensive testing (29 tests)
✅ Iron Lotus quality gates
✅ Supply chain security

v1.1: Production Hardening (Complete)

✅ GPU executor skeleton (wgpu detection, v1.2 for rust-gpu compute)
✅ Remote executor (TCP transport, length-prefixed bincode protocol)
✅ TLS encryption (rustls, certificate-based auth, TLS 1.3)
✅ Performance benchmarks vs Ray/Dask (5 benchmark suites)
✅ Mutation testing ≥85% (framework + documentation)
✅ Comprehensive Makefile (tier1/tier2/tier3, coverage enforcement)
✅ bashrs purification (POSIX-compliant shell code)
✅ Advanced messaging patterns (PUB/SUB, PUSH/PULL)

v2.0: Data Integration (In Progress)

✅ Parquet Checkpoint Storage (Phase 1): SNAPPY compression, 5-10x size reduction
✅ Data-Locality Tracking (Phase 1): DataLocationTracker with batch affinity queries
✅ Affinity-Based Scheduling (Phase 2): Automatic locality-aware task assignment
✅ Locality Metrics (Phase 2): Real-time hit rate tracking (0.0 to 1.0)
✅ Tensor Operations (Phase 3): SIMD-accelerated operations via trueno (2-8x speedup)
✅ Performance Benchmarks (Phase 2): Locality scheduling validated (<1ms overhead)
Advanced ML patterns (pipeline/tensor parallelism) - Phase 4

v3.0: Enterprise & Cloud

RDMA support (low-latency networking)
Multi-tenant isolation
Kubernetes operator
FIPS 140-2 compliance mode

Contributing

Contributions welcome! Please ensure:

All tests pass: make tier2
Coverage ≥95%: make coverage (when configured)
Clippy passes: cargo clippy -- -D warnings
Code formatted: cargo fmt
No SATD comments (TODO without ticket number)

See Iron Lotus Code Review Framework for detailed guidelines.

Documentation

Full Specification (1,500 lines, 10+ academic citations)
API Documentation (run cargo doc --open)
Examples - Hands-on demonstrations

Academic Foundations

Repartir is grounded in peer-reviewed research:

Jung et al. (2017) - RustBelt: Formal verification of Rust's safety
Blumofe & Leiserson (1999) - Provably optimal work-stealing
Chandra & Toueg (1996) - Unreliable failure detectors
NSA/CISA (2023) - Memory-safe languages guidance
Pereira et al. (2017) - Energy efficiency of Rust vs C++

See specification for complete citations.

Comparison with Existing Systems

Feature	Repartir (v1.1)	Ray	Dask
Language	Rust	Python	Python
C Dependencies	Zero*	Many	Some
GPU Support	Yes (wgpu)	Limited	No
Work Stealing	Yes	No	Yes
Fault Tolerance	Yes	Yes	Limited
Memory Safety	Guaranteed	Runtime	Runtime
Binary Execution	Yes	No	No
Remote Execution	Yes (TCP)	Yes	Yes

*Note: rustls (used for TLS) currently depends on aws-lc-rs (C). Pure Rust alternatives under evaluation for v1.2+.

License

MIT License - see LICENSE file for details.

Acknowledgments

Iron Lotus Framework: Toyota Production System for systems programming
Certeza Project: Asymptotic test effectiveness methodology
PAIML Stack: trueno, aprender, paiml-mcp-agent-toolkit, bashrs
Rust Community: rust-gpu, wgpu, tokio, and the broader ecosystem

Built with the Iron Lotus Framework Quality is not inspected in; it is built in.

repartir 2.0.1