Kronos Compute
A high-performance, compute-only Vulkan implementation in Rust, featuring state-of-the-art GPU compute optimizations.
Overview
Kronos Compute is a streamlined Vulkan implementation that removes all graphics functionality to focus on GPU compute performance. This Rust port provides memory-safe abstractions over the C API where possible and implements optimizations that target:
- Zero descriptor updates per dispatch
- ≤0.5 barriers per dispatch (83% reduction)
- 30-50% reduction in CPU submit time
- Zero memory allocations in steady state
- 13.9% reduction in structure sizes
Key Features
1. Advanced Optimizations
Persistent Descriptors
- Set0 reserved for storage buffers with zero updates in hot path
- Parameters passed via push constants (≤128 bytes)
- Eliminates descriptor set allocation and update overhead
Intelligent Barrier Policy
- Smart tracking reduces barriers from 3 per dispatch to ≤0.5
- Only three transition types: upload→read, read→write, write→read
- Vendor-specific optimizations for AMD, NVIDIA, and Intel GPUs
Timeline Semaphore Batching
- One timeline semaphore per queue
- Batch multiple submissions with a single fence
- 30-50% reduction in CPU overhead
Advanced Memory Allocator
- Three-pool system: DEVICE_LOCAL, HOST_VISIBLE|COHERENT, HOST_VISIBLE|CACHED
- Slab-based sub-allocation with 256MB slabs
- Power-of-2 block sizes for O(1) allocation/deallocation
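The allocator bullets above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Kronos's actual allocator: the names `Slab`, `size_class`, and `MIN_BLOCK`, the 256-byte minimum block, and the bump-pointer fallback are all hypothetical, and "256MB" is read as 256 MiB. Requests are rounded up to a power of two, and each size class keeps its own free list, so both allocation and deallocation are a single (amortized) O(1) push or pop.

```rust
/// Slab size from the list above (assumed to mean 256 MiB).
pub const SLAB_SIZE: u64 = 256 * 1024 * 1024;
/// Smallest block handed out (an assumption, not taken from the source).
pub const MIN_BLOCK: u64 = 256;
/// Size classes cover 2^8 ..= 2^28 bytes.
const NUM_CLASSES: usize = 21;

/// Map a request to the index of the smallest power-of-two block that fits.
pub fn size_class(size: u64) -> Option<usize> {
    if size == 0 || size > SLAB_SIZE {
        return None;
    }
    let block = size.max(MIN_BLOCK).next_power_of_two();
    Some((block.trailing_zeros() - MIN_BLOCK.trailing_zeros()) as usize)
}

/// One slab: a free list of offsets per size class plus a bump pointer
/// for space that has never been handed out.
pub struct Slab {
    free_lists: Vec<Vec<u64>>,
    bump: u64,
}

impl Slab {
    pub fn new() -> Self {
        Self { free_lists: vec![Vec::new(); NUM_CLASSES], bump: 0 }
    }

    /// Returns `(offset, size_class)` or `None` if the slab is exhausted.
    pub fn alloc(&mut self, size: u64) -> Option<(u64, usize)> {
        let class = size_class(size)?;
        // Fast path: reuse a previously freed block of the same class.
        if let Some(offset) = self.free_lists[class].pop() {
            return Some((offset, class));
        }
        // Slow path: carve a new block, aligned to its own size.
        let block = MIN_BLOCK << class;
        let offset = (self.bump + block - 1) & !(block - 1);
        if offset + block > SLAB_SIZE {
            return None;
        }
        self.bump = offset + block;
        Some((offset, class))
    }

    /// Return a block to the free list of its size class.
    pub fn free(&mut self, offset: u64, class: usize) {
        self.free_lists[class].push(offset);
    }
}
```

The trade-off of power-of-two classes is internal fragmentation: a 1,000-byte request occupies a 1,024-byte block, and in the worst case almost half of a block is unused.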
2. Type-Safe Rust API
3. Optimized Structures
- VkPhysicalDeviceFeatures: 32 bytes (vs 220 in standard Vulkan)
- VkBufferCreateInfo: Reordered fields for better packing
- VkMemoryTypeCache: O(1) memory type lookups
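To make the size figures concrete, here is a small sketch of where such savings can come from. The 220-byte figure matches the 55 `VkBool32` members of the standard `VkPhysicalDeviceFeatures`. Everything else is an assumption for illustration: `ComputeOnlyFeatures` (eight flags) is only one hypothetical way to reach 32 bytes, and `Unordered`/`Reordered` are made-up structs showing the general effect of field ordering under `#[repr(C)]`, not the real `VkBufferCreateInfo` layout.

```rust
use std::mem::size_of;

/// Vulkan's boolean type is a 32-bit integer.
pub type VkBool32 = u32;

/// Same size as the standard struct: 55 flags * 4 bytes = 220 bytes.
#[repr(C)]
pub struct StandardFeatures {
    pub flags: [VkBool32; 55],
}

/// Hypothetical compute-only subset: 8 flags * 4 bytes = 32 bytes.
#[repr(C)]
pub struct ComputeOnlyFeatures {
    pub flags: [VkBool32; 8],
}

/// A u64 between two u32 fields forces padding on targets where u64 is
/// 8-byte aligned (24 bytes on typical 64-bit platforms).
#[repr(C)]
pub struct Unordered {
    pub flags: u32,
    pub size: u64,
    pub usage: u32,
}

/// Widest field first: the two u32 fields share one 8-byte slot (16 bytes).
#[repr(C)]
pub struct Reordered {
    pub size: u64,
    pub flags: u32,
    pub usage: u32,
}

/// Bytes saved by reordering on the current target.
pub fn bytes_saved_by_reordering() -> usize {
    size_of::<Unordered>() - size_of::<Reordered>()
}
```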
Project Structure
kronos/
├── src/
│   ├── lib.rs              # Main library entry point
│   ├── sys/                # Low-level FFI types
│   ├── core/               # Core Kronos types
│   ├── ffi/                # C-compatible function signatures
│   └── implementation/     # Kronos optimizations
├── benches/                # Performance benchmarks
├── examples/               # Usage examples
├── tests/                  # Integration and unit tests
├── shaders/                # SPIR-V compute shaders
├── scripts/                # Build and validation scripts
└── docs/                   # Documentation
    ├── architecture/       # Design documents
    │   ├── OPTIMIZATION_SUMMARY.md
    │   ├── VULKAN_COMPARISON.md
    │   ├── ICD_SUCCESS.md
    │   └── COMPATIBILITY.md
    ├── benchmarks/         # Performance results
    │   └── BENCHMARK_RESULTS.md
    ├── qa/                 # Quality assurance
    │   ├── QA_REPORT.md
    │   ├── MINI_REVIEW.md
    │   └── TEST_RESULTS.md
    ├── EPIC.md             # Project epic and vision
    └── TODO.md             # Development roadmap
Installation
From crates.io (Coming Soon)
From Source
Prerequisites
- Rust 1.70 or later
- Vulkan SDK (for ICD loader)
- A Vulkan-capable GPU
Build Steps
# Clone the repository
# Build with optimizations enabled
# Run tests
# Run benchmarks
# Run validation scripts
Benchmarks
Kronos includes comprehensive benchmarks for common compute workloads:
- SAXPY: Vector multiply-add operations (c = a*x + b)
- Reduction: Parallel array summation
- Prefix Sum: Parallel scan algorithm
- GEMM: Dense matrix multiplication (C = A * B)
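For orientation, CPU reference versions of the first three workloads might look like the sketch below. These are not taken from the Kronos benchmark suite. The function names are hypothetical, the element types are assumptions, and the prefix sum is shown as an inclusive scan (the source does not say whether the benchmark uses an inclusive or exclusive one). References like these are commonly used to validate GPU output.

```rust
/// SAXPY as written above: c[i] = a * x[i] + b[i], with scalar `a`.
/// Stops at the shorter of the two slices.
pub fn saxpy(a: f32, x: &[f32], b: &[f32]) -> Vec<f32> {
    x.iter().zip(b).map(|(xi, bi)| a * xi + bi).collect()
}

/// Reduction: sum of all elements.
pub fn reduce_sum(values: &[f32]) -> f32 {
    values.iter().sum()
}

/// Inclusive prefix sum: out[i] = values[0] + ... + values[i].
/// Uses wrapping addition so overflow behaves like typical GPU integer math.
pub fn inclusive_prefix_sum(values: &[u32]) -> Vec<u32> {
    values
        .iter()
        .scan(0u32, |acc, &v| {
            *acc = acc.wrapping_add(v);
            Some(*acc)
        })
        .collect()
}
```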
Each benchmark tests multiple configurations:
- Sizes: 64KB (small), 8MB (medium), 64MB (large)
- Batch sizes: 1, 16, 256 dispatches
- Metrics: descriptor updates, barriers, CPU time, memory allocations
# Run specific benchmark
# Run with custom parameters
Usage Example
use *;
unsafe
Performance
Based on Mini's optimization targets:
| Metric | Baseline Vulkan | Kronos | Improvement |
|---|---|---|---|
| Descriptor updates/dispatch | 3-5 | 0 | 100% ⬇️ |
| Barriers/dispatch | 3 | ≤0.5 | 83% ⬇️ |
| CPU submit time | 100% | 50-70% | 30-50% ⬇️ |
| Memory allocations | Continuous | 0* | 100% ⬇️ |
| Structure size (avg) | 100% | 86.1% | 13.9% ⬇️ |
*After initial warm-up
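The Improvement column follows from the Baseline and Kronos columns as a relative reduction. A small helper (the name `reduction_percent` is hypothetical) reproduces the figures: going from 3 barriers to 0.5 is a reduction of about 83.3%, and 100% to 86.1% is 13.9%.

```rust
/// Percentage reduction going from `baseline` to `optimized`.
pub fn reduction_percent(baseline: f64, optimized: f64) -> f64 {
    (baseline - optimized) / baseline * 100.0
}
```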
Configuration
Kronos can be configured via environment variables:
- KRONOS_ICD_SEARCH_PATHS: Custom Vulkan ICD search paths
- VK_ICD_FILENAMES: Standard Vulkan ICD override
- RUST_LOG: Logging level (info, debug, trace)
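As an illustration of how a search-path variable like this is typically consumed, the sketch below reads `KRONOS_ICD_SEARCH_PATHS` and splits it into directories. It assumes the value is a list separated by the platform's path separator (`:` on Unix, `;` on Windows), which the source does not specify. The function names are hypothetical and this is not Kronos's actual loader code.

```rust
use std::env;
use std::path::PathBuf;

/// Split a raw search-path string into directories, skipping empty entries.
/// Kept separate from the environment lookup so it can be tested directly.
pub fn icd_search_paths_from(raw: Option<&str>) -> Vec<PathBuf> {
    match raw {
        Some(value) => env::split_paths(value)
            .filter(|path| !path.as_os_str().is_empty())
            .collect(),
        None => Vec::new(),
    }
}

/// Read KRONOS_ICD_SEARCH_PATHS from the process environment.
/// An unset or non-Unicode variable yields an empty list.
pub fn icd_search_paths() -> Vec<PathBuf> {
    icd_search_paths_from(env::var("KRONOS_ICD_SEARCH_PATHS").ok().as_deref())
}
```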
Runtime configuration through the API:
// Set timeline batch size
set_batch_size?;
// Configure memory pools
set_slab_size?;
How It Works
Persistent Descriptors
Traditional Vulkan requires updating descriptor sets for each dispatch. Kronos pre-allocates all storage buffer descriptors in Set0 and uses push constants for parameters:
// Traditional: 3-5 descriptor updates per dispatch
vkUpdateDescriptorSets;
vkCmdBindDescriptorSets;
// Kronos: 0 descriptor updates
vkCmdPushConstants;
vkCmdDispatch;
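On the Rust side, a parameter block destined for push constants could be sketched as below. `SaxpyParams` and `push_constant_bytes` are hypothetical names, not part of the Kronos API. The 128-byte budget matches the limit quoted earlier, which is also the minimum `maxPushConstantsSize` that the Vulkan specification guarantees. The compile-time assertion makes the build fail if the struct ever outgrows that budget.

```rust
use std::mem::size_of;

/// Push-constant budget assumed by this sketch.
pub const MAX_PUSH_CONSTANT_BYTES: usize = 128;

/// Hypothetical parameter block for a SAXPY kernel. `#[repr(C)]` keeps the
/// field order fixed so it can mirror the shader's push-constant block.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct SaxpyParams {
    pub alpha: f32,
    pub element_count: u32,
}

// Compile-time check: fails the build if the struct exceeds the budget.
const _: () = assert!(size_of::<SaxpyParams>() <= MAX_PUSH_CONSTANT_BYTES);

/// Serialize the parameters, field by field in native byte order, into the
/// bytes that would be handed to a push-constant call.
pub fn push_constant_bytes(params: &SaxpyParams) -> [u8; 8] {
    let mut bytes = [0u8; 8];
    bytes[0..4].copy_from_slice(&params.alpha.to_ne_bytes());
    bytes[4..8].copy_from_slice(&params.element_count.to_ne_bytes());
    bytes
}
```

Because the buffers stay bound in Set0, only these few bytes change between dispatches, which is what removes the per-dispatch descriptor update.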
Smart Barriers
Kronos tracks buffer usage patterns and inserts only the minimum required barriers:
// Traditional: 3 barriers per dispatch
vkCmdPipelineBarrier; // upload→compute
vkCmdPipelineBarrier; // compute→compute
vkCmdPipelineBarrier; // compute→download
// Kronos: ≤0.5 barriers per dispatch (automatic)
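One way to picture the tracking is a per-buffer record of the last access, with a barrier emitted only for the three transitions listed under Key Features (upload→read, read→write, write→read). The sketch below is a simplified model with hypothetical names (`Access`, `BarrierTracker`), not the Kronos implementation. It treats every other transition, including a buffer's first use, as barrier-free because the source lists only those three. A production tracker would also need to decide how to handle cases such as write→write.

```rust
use std::collections::HashMap;

/// How a buffer was last touched.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Access {
    Upload,
    Read,
    Write,
}

/// Remembers the last access per buffer (keyed by an opaque handle value)
/// and counts how many barriers were actually required.
#[derive(Default)]
pub struct BarrierTracker {
    last_access: HashMap<u64, Access>,
    pub barriers_emitted: u32,
}

impl BarrierTracker {
    pub fn new() -> Self {
        Self::default()
    }

    /// Record an access and return `true` if a barrier must precede it.
    pub fn access(&mut self, buffer: u64, next: Access) -> bool {
        let previous = self.last_access.insert(buffer, next);
        let needs_barrier = matches!(
            (previous, next),
            (Some(Access::Upload), Access::Read)
                | (Some(Access::Read), Access::Write)
                | (Some(Access::Write), Access::Read)
        );
        if needs_barrier {
            self.barriers_emitted += 1;
        }
        needs_barrier
    }
}
```

Repeated reads of the same buffer cost nothing in this model, which is how an average below one barrier per dispatch becomes possible.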
Timeline Batching
Instead of submitting each command buffer individually:
// Traditional: N submits, N fences
for cmd in commands
// Kronos: 1 submit, 1 timeline semaphore
new
.add_command_buffer
.add_command_buffer
.submit?;
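The batching idea can be modeled without any GPU at all. In the sketch below, `QueueTimeline` stands in for a queue that owns one timeline semaphore, and `BatchBuilder` mirrors the builder-style calls shown above. Both types, their fields, and the use of plain `u64` values as command-buffer handles are assumptions made for illustration, not the Kronos API. Each batch results in one submit and one monotonically increasing timeline value that a caller could later wait on.

```rust
/// Stand-in for a queue with a single timeline semaphore (hypothetical).
pub struct QueueTimeline {
    next_value: u64,
    pub submit_calls: u32,
}

impl QueueTimeline {
    pub fn new() -> Self {
        Self { next_value: 1, submit_calls: 0 }
    }

    /// Pretend to submit a whole batch at once. Returns the timeline value
    /// that would be signaled on completion, or `None` for an empty batch.
    fn submit(&mut self, command_buffers: &[u64]) -> Option<u64> {
        if command_buffers.is_empty() {
            return None;
        }
        let value = self.next_value;
        self.next_value += 1;
        self.submit_calls += 1;
        Some(value)
    }
}

/// Collects command buffers so they can go out in a single submit.
pub struct BatchBuilder {
    command_buffers: Vec<u64>,
}

impl BatchBuilder {
    pub fn new() -> Self {
        Self { command_buffers: Vec::new() }
    }

    pub fn add_command_buffer(mut self, command_buffer: u64) -> Self {
        self.command_buffers.push(command_buffer);
        self
    }

    /// One submit for the whole batch, regardless of its length.
    pub fn submit(self, queue: &mut QueueTimeline) -> Option<u64> {
        queue.submit(&self.command_buffers)
    }
}
```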
Documentation
Comprehensive documentation is available in the docs/ directory:
- Architecture: Design decisions, optimization details, and comparisons
  - Optimization Summary - Mini's 4 optimizations explained
  - Vulkan Comparison - Differences from standard Vulkan
  - ICD Integration - How Kronos integrates with existing drivers
- Quality Assurance: Test results and validation reports
  - QA Report - Comprehensive validation for Sporkle integration
  - Test Results - Unit and integration test details
- Benchmarks: Performance measurements and analysis
  - Benchmark Results - Detailed performance metrics
Contributing
Contributions are welcome! Areas of interest:
- SPIR-V shader integration for benchmarks
- Additional vendor-specific optimizations
- Performance profiling on different GPUs
- Safe wrapper API design
- Documentation improvements
Please read our Contributing Guide for details.
Safety
This crate uses unsafe for FFI compatibility but provides safe abstractions where possible:
// Unsafe C-style API (required for compatibility)
let result = unsafe ;
// Safe Rust wrapper (future work)
let buffer = device.create_buffer?;
All unsafe functions include comprehensive safety documentation.
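Since the safe wrapper is listed as future work, the following is only a sketch of the general pattern, using a made-up C-style function rather than any real Kronos entry point. `raw_create_buffer`, `Buffer`, `BufferError`, and the status codes are all hypothetical. The wrapper owns the out-parameter, upholds the pointer validity requirement itself, and converts the status code into a `Result`.

```rust
/// Hypothetical C-style entry point: writes a handle through an out-pointer
/// and returns a status code (0 = success, negative = error).
///
/// # Safety
/// `out_handle` must be non-null and valid for a write of one `u64`.
pub unsafe fn raw_create_buffer(size: u64, out_handle: *mut u64) -> i32 {
    if size == 0 {
        return -1;
    }
    unsafe { *out_handle = 1 };
    0
}

/// Error carrying the raw status code.
#[derive(Debug, PartialEq, Eq)]
pub struct BufferError(pub i32);

/// Safe handle type returned by the wrapper.
#[derive(Debug)]
pub struct Buffer {
    handle: u64,
    size: u64,
}

impl Buffer {
    pub fn handle(&self) -> u64 {
        self.handle
    }

    pub fn size(&self) -> u64 {
        self.size
    }
}

/// Safe wrapper: callers never see the raw pointer or the status code.
pub fn create_buffer(size: u64) -> Result<Buffer, BufferError> {
    let mut handle = 0u64;
    // SAFETY: `handle` is a live local variable, so the pointer derived
    // from it is non-null and valid for a single write.
    let status = unsafe { raw_create_buffer(size, &mut handle) };
    if status == 0 {
        Ok(Buffer { handle, size })
    } else {
        Err(BufferError(status))
    }
}
```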
Features
- implementation - Enable Kronos optimizations and ICD forwarding
- validation - Enable additional safety checks (default)
- compare-ash - Enable comparison benchmarks with ash
Status
- ✅ Core implementation complete
- ✅ All optimizations integrated
- ✅ ICD loader with Vulkan forwarding
- ✅ Comprehensive benchmark suite
- ✅ Basic examples working
- ⏳ SPIR-V shader integration for benchmarks
- ⏳ Safe wrapper API
- ⏳ Production testing
Acknowledgments
- Mini (@notmini) for the groundbreaking optimization techniques
- The Vulkan community for driver support
- Contributors who helped port these optimizations to Rust
License
This project is dual-licensed under MIT OR Apache-2.0. See LICENSE-MIT and LICENSE-APACHE for details.
Built with ❤️ and 🦀 for maximum GPU compute performance.
Citation
If you use Kronos in your research, please cite: