Kronos Compute
Release Candidate 2 (v0.1.5-rc2): This project has reached release candidate status! The core functionality is stable, the unified safe API is complete, and all critical issues have been resolved. We welcome beta testing and feedback.
A high-performance, compute-only Vulkan implementation in Rust, featuring state-of-the-art GPU compute optimizations.
Overview
Kronos Compute is a streamlined Vulkan implementation that removes all graphics functionality to achieve maximum GPU compute performance. This Rust port not only provides memory-safe abstractions over the C API but also implements cutting-edge optimizations that deliver:
- Zero descriptor updates per dispatch
- ≤0.5 barriers per dispatch (83% reduction)
- 30-50% reduction in CPU submit time
- Zero memory allocations in steady state
- 13.9% reduction in structure sizes
Key Features
1. Safe Unified API
- Zero unsafe code required
- Automatic resource management (RAII)
- Builder patterns and fluent interfaces
- Type-safe abstractions
- All optimizations work transparently
2. Advanced Optimizations
Persistent Descriptors
- Set0 reserved for storage buffers with zero updates in hot path
- Parameters passed via push constants (≤128 bytes)
- Eliminates descriptor set allocation and update overhead
Intelligent Barrier Policy
- Smart tracking reduces barriers from 3 per dispatch to ≤0.5
- Only three transition types: upload→read, read→write, write→read
- Vendor-specific optimizations for AMD, NVIDIA, and Intel GPUs
Timeline Semaphore Batching
- One timeline semaphore per queue
- Batch multiple submissions with a single fence
- 30-50% reduction in CPU overhead
Advanced Memory Allocator
- Three-pool system: DEVICE_LOCAL, HOST_VISIBLE|COHERENT, HOST_VISIBLE|CACHED
- Slab-based sub-allocation with 256MB slabs
- Power-of-2 block sizes for O(1) allocation/deallocation
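The sub-allocation scheme described above can be sketched in a few lines. This is a minimal illustration, not the Kronos allocator: the names `Slab`, `Block`, and `MIN_BLOCK`, the 256-byte minimum block size, and the bump-pointer fallback are all assumptions made for the example. Only the 256MB slab size and the power-of-2 size classes come from the list above.

```rust
/// Slab size quoted in this README (256 MB).
pub const SLAB_SIZE: u64 = 256 * 1024 * 1024;
/// Smallest block handed out. Assumed value, chosen for the example only.
pub const MIN_BLOCK: u64 = 256;
/// Size classes from 256 B (2^8) up to 256 MB (2^28), inclusive.
const NUM_CLASSES: usize = 21;

/// A sub-allocated range inside one slab.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Block {
    pub offset: u64,
    pub size: u64,
}

/// Hypothetical single-slab allocator with one free list per size class.
pub struct Slab {
    bump: u64,                // next never-used offset
    free_lists: Vec<Vec<u64>>, // recycled offsets, indexed by size class
}

impl Slab {
    pub fn new() -> Self {
        Slab { bump: 0, free_lists: vec![Vec::new(); NUM_CLASSES] }
    }

    /// Round a request up to its power-of-2 block size and size class.
    fn class_of(size: u64) -> Option<(usize, u64)> {
        if size == 0 || size > SLAB_SIZE {
            return None;
        }
        let block = size.max(MIN_BLOCK).next_power_of_two();
        let class = (block.trailing_zeros() - MIN_BLOCK.trailing_zeros()) as usize;
        Some((class, block))
    }

    /// O(1): pop a recycled block, or carve a new aligned one from the slab.
    pub fn alloc(&mut self, size: u64) -> Option<Block> {
        let (class, block) = Self::class_of(size)?;
        if let Some(offset) = self.free_lists[class].pop() {
            return Some(Block { offset, size: block });
        }
        // Align the bump pointer to the block size (a power of two).
        let offset = (self.bump + block - 1) & !(block - 1);
        if offset + block > SLAB_SIZE {
            return None;
        }
        self.bump = offset + block;
        Some(Block { offset, size: block })
    }

    /// O(1) amortized: push the offset back onto its size class.
    pub fn free(&mut self, block: Block) {
        let class = (block.size.trailing_zeros() - MIN_BLOCK.trailing_zeros()) as usize;
        self.free_lists[class].push(block.offset);
    }
}
```

Because every block size is a power of two, the size class is a bit-count subtraction rather than a search, which is where the O(1) behaviour comes from. The trade-off is internal fragmentation: a 1000-byte request occupies a 1024-byte block.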
3. Type-Safe Implementation
- Safe handles with phantom types
- Proper error handling with Result types
- Zero-cost abstractions
- Memory safety guarantees
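As an illustration of the phantom-type idea, the sketch below wraps a raw 64-bit handle in a generic type whose parameter exists only at compile time. `Handle`, `BufferTag`, `PipelineTag`, and `bind_buffer` are hypothetical names for this example and are not taken from the Kronos API.

```rust
use std::marker::PhantomData;

/// Marker types: never instantiated, used only to tell handles apart.
pub enum BufferTag {}
pub enum PipelineTag {}

/// A raw 64-bit handle tagged with the kind of object it refers to.
/// `PhantomData<T>` is zero-sized, so the wrapper is as large as a `u64`.
#[repr(transparent)]
pub struct Handle<T> {
    raw: u64,
    _kind: PhantomData<T>,
}

impl<T> Handle<T> {
    pub const fn from_raw(raw: u64) -> Self {
        Handle { raw, _kind: PhantomData }
    }

    pub const fn as_raw(&self) -> u64 {
        self.raw
    }

    pub const fn is_null(&self) -> bool {
        self.raw == 0
    }
}

/// Accepts buffer handles only. Passing a `Handle<PipelineTag>` here is a
/// compile-time type error rather than a runtime failure.
pub fn bind_buffer(buffer: &Handle<BufferTag>) -> u64 {
    buffer.as_raw()
}
```

The tag costs nothing at runtime: `size_of::<Handle<BufferTag>>()` equals `size_of::<u64>()`, which is what "zero-cost abstraction" means in this context.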
4. Optimized Structures
- VkPhysicalDeviceFeatures: 32 bytes (vs 220 in standard Vulkan)
- VkBufferCreateInfo: Reordered fields for better packing
- VkMemoryTypeCache: O(1) memory type lookups
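To show why reordering fields can shrink a C-compatible structure, here is a small sketch. The two structs are simplified, hypothetical stand-ins with three fields each. They are not the real `VkBufferCreateInfo` layout and not the layout Kronos uses.

```rust
use std::mem::size_of;

/// Declaration order that forces padding: the 8-byte `size` field must be
/// aligned, so padding follows `flags` and more padding follows `usage`.
#[repr(C)]
pub struct UnorderedInfo {
    pub flags: u32,
    pub size: u64,
    pub usage: u32,
}

/// Same fields with the largest alignment first: no interior padding.
#[repr(C)]
pub struct ReorderedInfo {
    pub size: u64,
    pub flags: u32,
    pub usage: u32,
}

/// Bytes saved by the reordering on the current target.
pub const fn bytes_saved() -> usize {
    size_of::<UnorderedInfo>() - size_of::<ReorderedInfo>()
}
```

On typical 64-bit targets, where `u64` is 8-byte aligned, the first struct occupies 24 bytes and the second 16. With `#[repr(C)]` the compiler may not reorder fields itself, so the declaration order decides the size.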
Project Structure
kronos/
├── src/
│   ├── lib.rs              # Main library entry point
│   ├── sys/                # Low-level FFI types
│   ├── core/               # Core Kronos types
│   ├── ffi/                # C-compatible function signatures
│   └── implementation/     # Kronos optimizations
├── benches/                # Performance benchmarks
├── examples/               # Usage examples
├── tests/                  # Integration and unit tests
├── shaders/                # SPIR-V compute shaders
├── scripts/                # Build and validation scripts
└── docs/                   # Documentation
    ├── architecture/       # Design documents
    │   ├── OPTIMIZATION_SUMMARY.md
    │   ├── VULKAN_COMPARISON.md
    │   ├── ICD_SUCCESS.md
    │   └── COMPATIBILITY.md
    ├── benchmarks/         # Performance results
    │   └── BENCHMARK_RESULTS.md
    ├── qa/                 # Quality assurance
    │   ├── QA_REPORT.md
    │   ├── MINI_REVIEW.md
    │   └── TEST_RESULTS.md
    ├── EPIC.md             # Project epic and vision
    └── TODO.md             # Development roadmap
Installation
From crates.io
From Source
Prerequisites
- Rust 1.70 or later
- Vulkan SDK (for ICD loader and validation layers)
- A Vulkan-capable GPU with compute support
- Build tools (gcc/clang on Linux, Visual Studio on Windows, Xcode on macOS)
- (Optional) SPIR-V compiler (glslc or glslangValidator) for shader development
See Development Setup Guide for detailed installation instructions.
Build Steps
# Clone the repository
# Build SPIR-V shaders (optional, pre-built shaders included)
# Build with optimizations enabled
# Run tests
# Run benchmarks
# Run validation scripts
Benchmarks
Kronos includes comprehensive benchmarks for common compute workloads:
- SAXPY: Vector multiply-add operations (c = a*x + b)
- Reduction: Parallel array summation
- Prefix Sum: Parallel scan algorithm
- GEMM: Dense matrix multiplication (C = A * B)
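For readers who want to check GPU output against a known-good answer, the four workloads can be written as plain CPU functions. This is a sketch and not part of the Kronos benchmark suite. The function names are made up for the example, the prefix sum is assumed to be inclusive (the list above does not say), and matrices are assumed to be row-major.

```rust
/// SAXPY in the form given above: c[i] = a * x[i] + b[i].
pub fn saxpy(a: f32, x: &[f32], b: &[f32]) -> Vec<f32> {
    assert_eq!(x.len(), b.len(), "input vectors must have equal length");
    x.iter().zip(b).map(|(xi, bi)| a * xi + bi).collect()
}

/// Sequential reference for the parallel array summation.
pub fn reduce_sum(values: &[f32]) -> f32 {
    values.iter().sum()
}

/// Inclusive scan: out[i] = values[0] + ... + values[i] (wrapping on overflow).
pub fn inclusive_prefix_sum(values: &[u32]) -> Vec<u32> {
    values
        .iter()
        .scan(0u32, |acc, &v| {
            *acc = acc.wrapping_add(v);
            Some(*acc)
        })
        .collect()
}

/// Dense C = A * B with A of shape m x k and B of shape k x n, row-major.
pub fn gemm(a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    let mut c = vec![0.0f32; m * n];
    for i in 0..m {
        for j in 0..n {
            for p in 0..k {
                c[i * n + j] += a[i * k + p] * b[p * n + j];
            }
        }
    }
    c
}
```

Floating-point sums computed in parallel on a GPU accumulate in a different order than these loops, so comparisons for the reduction and GEMM usually need a tolerance rather than exact equality.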
Each benchmark tests multiple configurations:
- Sizes: 64KB (small), 8MB (medium), 64MB (large)
- Batch sizes: 1, 16, 256 dispatches
- Metrics: descriptor updates, barriers, CPU time, memory allocations
# Run specific benchmark
# Run with custom parameters
Usage Example
Safe Unified API (Recommended)
use ;
// No unsafe code needed!
let ctx = new?;
// Load shader and create pipeline
let shader = ctx.load_shader(...)?;
let pipeline = ctx.create_pipeline(...)?;
// Create buffers
let input = ctx.create_buffer(...)?;
let output = ctx.create_buffer_uninit(...)?;
// Dispatch compute work
ctx.dispatch(...)
    .bind_buffer(...)
    .bind_buffer(...)
    .workgroups(...)
    .execute()?;
// Read results
let results: = output.read?;
All optimizations work transparently through the safe API!
Low-Level FFI (Advanced)
use *;
unsafe
Performance
Based on Mini's optimization targets:
| Metric | Baseline Vulkan | Kronos | Improvement |
|---|---|---|---|
| Descriptor updates/dispatch | 3-5 | 0 | 100% ⬇️ |
| Barriers/dispatch | 3 | ≤0.5 | 83% ⬇️ |
| CPU submit time | 100% | 50-70% | 30-50% ⬇️ |
| Memory allocations | Continuous | 0* | 100% ⬇️ |
| Structure size (avg) | 100% | 86.1% | 13.9% ⬇️ |
*After initial warm-up
Configuration
Kronos can be configured via environment variables:
- KRONOS_ICD_SEARCH_PATHS: Custom Vulkan ICD search paths
- VK_ICD_FILENAMES: Standard Vulkan ICD override
- RUST_LOG: Logging level (info, debug, trace)
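A sketch of how a search-path variable like this might be read in Rust follows. It is not the Kronos loader code. The helper names are hypothetical, and the example assumes the value is a platform path list (`:`-separated on Unix, `;`-separated on Windows), which this README does not specify.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Split a path-list value into individual directories, skipping empty
/// entries such as those produced by a trailing separator.
/// Assumption: the platform's usual path-list separator is used.
pub fn parse_search_paths(raw: &OsStr) -> Vec<PathBuf> {
    env::split_paths(raw)
        .filter(|p| !p.as_os_str().is_empty())
        .collect()
}

/// Read KRONOS_ICD_SEARCH_PATHS from the environment.
/// Returns an empty list when the variable is unset.
pub fn icd_search_paths() -> Vec<PathBuf> {
    env::var_os("KRONOS_ICD_SEARCH_PATHS")
        .map(|raw| parse_search_paths(&raw))
        .unwrap_or_default()
}
```

Using `var_os` and `split_paths` instead of `var` and `str::split` keeps the helper working with paths that are not valid UTF-8.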
Runtime configuration through the API:
// Set timeline batch size
set_batch_size?;
// Configure memory pools
set_slab_size?;
How It Works
Persistent Descriptors
Traditional Vulkan requires updating descriptor sets for each dispatch. Kronos pre-allocates all storage buffer descriptors in Set0 and uses push constants for parameters:
// Traditional: 3-5 descriptor updates per dispatch
vkUpdateDescriptorSets(...);
vkCmdBindDescriptorSets(...);
// Kronos: 0 descriptor updates
vkCmdPushConstants(...);
vkCmdDispatch(...);
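On the Rust side, passing parameters through push constants comes down to keeping a small, fixed-layout struct within the byte budget. The sketch below shows one way to enforce that at compile time. `SaxpyParams` and its fields are hypothetical and chosen for illustration. Only the 128-byte limit is taken from this README.

```rust
use std::mem::size_of;

/// Push-constant budget quoted in this README (128 bytes).
pub const PUSH_CONSTANT_LIMIT: usize = 128;

/// Hypothetical parameter block for a SAXPY-style kernel.
/// `#[repr(C)]` fixes the field order so the bytes can be matched against
/// the push-constant block declared in the shader.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct SaxpyParams {
    pub alpha: f32,
    pub element_count: u32,
}

// Compile-time guard: the build fails if the struct outgrows the budget.
const _: () = assert!(size_of::<SaxpyParams>() <= PUSH_CONSTANT_LIMIT);

impl SaxpyParams {
    /// Serialize the fields in declaration order with native endianness,
    /// producing the bytes a push-constant upload would carry.
    pub fn to_bytes(&self) -> [u8; 8] {
        let mut out = [0u8; 8];
        out[..4].copy_from_slice(&self.alpha.to_ne_bytes());
        out[4..].copy_from_slice(&self.element_count.to_ne_bytes());
        out
    }
}
```

Because the parameters travel inside the command buffer, changing them between dispatches needs no descriptor write. The buffers themselves stay bound in Set0.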
Smart Barriers
Kronos tracks buffer usage patterns and inserts only the minimum required barriers:
// Traditional: 3 barriers per dispatch
vkCmdPipelineBarrier(...); // upload→compute
vkCmdPipelineBarrier(...); // compute→compute
vkCmdPipelineBarrier(...); // compute→download
// Kronos: ≤0.5 barriers per dispatch (automatic)
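The tracking idea can be sketched as a small state machine that remembers the last access per buffer and asks for a barrier only on the three transitions this README lists (upload→read, read→write, write→read). `BarrierTracker` and `Access` are hypothetical names, and buffers are identified by a plain `u64` for the example. How Kronos handles other hazards, such as write→write, is not covered here.

```rust
use std::collections::HashMap;

/// The kinds of access this sketch distinguishes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Access {
    Upload,
    Read,
    Write,
}

/// Hypothetical tracker: last access per buffer plus a barrier counter.
#[derive(Default)]
pub struct BarrierTracker {
    last_access: HashMap<u64, Access>,
    barriers_emitted: usize,
}

impl BarrierTracker {
    pub fn new() -> Self {
        Self::default()
    }

    /// Record that `buffer` is about to be used with access `next`.
    /// Returns true when a barrier has to be recorded first.
    pub fn access(&mut self, buffer: u64, next: Access) -> bool {
        // `insert` returns the previous state, if the buffer was seen before.
        let prev = self.last_access.insert(buffer, next);
        let needed = matches!(
            (prev, next),
            (Some(Access::Upload), Access::Read)
                | (Some(Access::Read), Access::Write)
                | (Some(Access::Write), Access::Read)
        );
        if needed {
            self.barriers_emitted += 1;
        }
        needed
    }

    pub fn barriers_emitted(&self) -> usize {
        self.barriers_emitted
    }
}
```

Repeated accesses of the same kind, such as many dispatches reading one input buffer, return `false`, which is how the average number of barriers per dispatch can fall below one.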
Timeline Batching
Instead of submitting each command buffer individually:
// Traditional: N submits, N fences
for cmd in commands
// Kronos: 1 submit, 1 timeline semaphore
new
.add_command_buffer
.add_command_buffer
.submit?;
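The batching pattern can be modelled without a GPU: a builder collects command buffers, and a single submit call consumes one value from a per-queue counter that only increases. This is a sketch only. `QueueTimeline`, `BatchBuilder`, and `Submission` are hypothetical types, and command buffers are represented as plain `u64` handles.

```rust
/// Stand-in for a queue that owns one monotonically increasing timeline.
pub struct QueueTimeline {
    next_value: u64,
    submit_calls: usize,
}

/// Collects command buffers until `submit` is called.
pub struct BatchBuilder<'q> {
    queue: &'q mut QueueTimeline,
    command_buffers: Vec<u64>,
}

/// What a single submit call would hand to the driver in this model.
pub struct Submission {
    pub command_buffers: Vec<u64>,
    /// Timeline value that is signalled once the whole batch has finished.
    pub signal_value: u64,
}

impl QueueTimeline {
    pub fn new() -> Self {
        QueueTimeline { next_value: 1, submit_calls: 0 }
    }

    pub fn batch(&mut self) -> BatchBuilder<'_> {
        BatchBuilder { queue: self, command_buffers: Vec::new() }
    }

    pub fn submit_calls(&self) -> usize {
        self.submit_calls
    }
}

impl<'q> BatchBuilder<'q> {
    pub fn add_command_buffer(mut self, cmd: u64) -> Self {
        self.command_buffers.push(cmd);
        self
    }

    /// One submit for the whole batch, one new timeline value to wait on.
    pub fn submit(self) -> Result<Submission, &'static str> {
        let BatchBuilder { queue, command_buffers } = self;
        if command_buffers.is_empty() {
            return Err("cannot submit an empty batch");
        }
        let signal_value = queue.next_value;
        queue.next_value += 1;
        queue.submit_calls += 1;
        Ok(Submission { command_buffers, signal_value })
    }
}
```

A caller waits for `signal_value` on the timeline instead of holding one fence per command buffer, so the number of synchronization objects stays constant no matter how many command buffers are batched.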
Documentation
Comprehensive documentation is available in the docs/ directory:
- API Documentation:
  - Unified Safe API - Safe, ergonomic Rust API (recommended)
- Architecture: Design decisions, optimization details, and comparisons
  - Optimization Summary - Mini's 4 optimizations explained
  - Vulkan Comparison - Differences from standard Vulkan
  - ICD Integration - How Kronos integrates with existing drivers
- Quality Assurance: Test results and validation reports
  - QA Report - Comprehensive validation for Sporkle integration
  - Test Results - Unit and integration test details
- Benchmarks: Performance measurements and analysis
  - Benchmark Results - Detailed performance metrics
Contributing
Contributions are welcome! Areas of interest:
- SPIR-V shader integration for benchmarks
- Additional vendor-specific optimizations
- Performance profiling on different GPUs
- Safe wrapper API design
- Documentation improvements
Please read our Contributing Guide for details.
Safety
This crate uses unsafe for FFI compatibility but provides safe abstractions where possible:
// Unsafe C-style API (required for compatibility)
let result = unsafe ;
// Safe Rust wrapper (see the Safe Unified API)
let buffer = device.create_buffer?;
All unsafe functions include comprehensive safety documentation.
Features
- implementation - Enable Kronos optimizations and ICD forwarding
- validation - Enable additional safety checks (default)
- compare-ash - Enable comparison benchmarks with ash
Status
- ✅ Core implementation complete
- ✅ All optimizations integrated
- ✅ ICD loader with Vulkan forwarding
- ✅ Comprehensive benchmark suite
- ✅ Basic examples working
- ✅ Published to crates.io (v0.1.0)
- ✅ C header generation
- ✅ SPIR-V shader build scripts
- ✅ Safe unified API (NEW!)
- ✅ Compute correctness fixed (1024/1024 correct results)
- ✅ Safety documentation complete (100% coverage)
- ✅ CI/CD pipeline with multi-platform testing
- ✅ Test suite expanded (46 tests passing)
- ⏳ Production testing
Roadmap
v0.2.0 (Q1 2025)
- NVIDIA & Intel GPU optimizations
- Multi-queue concurrent dispatch support
- Dynamic memory pool resizing
- Vulkan validation layer support
v0.3.0 (Q2 2025)
- Enhanced Sporkle integration
- Advanced timeline semaphore patterns
- Ray query & cooperative matrix support
- Performance regression testing
v1.0.0 (Q3 2025)
- Production-ready status
- Full Vulkan 1.3 compute coverage
- Platform-specific optimizations
- Enterprise support
See TODO.md for the complete roadmap and contribution opportunities.
Acknowledgments
- Mini (@notmini) for the groundbreaking optimization techniques
- The Vulkan community for driver support
- Contributors who helped port these optimizations to Rust
License
This project is dual-licensed under MIT OR Apache-2.0. See LICENSE-MIT and LICENSE-APACHE for details.
Built with ❤️ and 🦀 for maximum GPU compute performance.
Citation
If you use Kronos in your research, please cite: