kronos-compute 0.2.0-rc7

A high-performance compute-only Vulkan implementation with cutting-edge GPU optimizations
# Kronos Compute 🚀

> **📦 Release Candidate 7 (v0.2.0-rc7): adds entry-point logging to verify that the Kronos implementation is actually in use. 🍬 This helps diagnose whether `vkCreateBuffer` calls are reaching Kronos code.**

[![Crates.io](https://img.shields.io/crates/v/kronos-compute.svg)](https://crates.io/crates/kronos-compute)
[![Documentation](https://docs.rs/kronos-compute/badge.svg)](https://docs.rs/kronos-compute)
[![Windows CI](https://github.com/LynnColeArt/kronos-compute/actions/workflows/windows.yml/badge.svg)](https://github.com/LynnColeArt/kronos-compute/actions/workflows/windows.yml)
[![License](https://img.shields.io/crates/l/kronos-compute.svg)](https://github.com/LynnColeArt/kronos-compute)

A high-performance, compute-only Vulkan implementation in Rust, featuring state-of-the-art GPU compute optimizations.

## Overview

Kronos Compute is a streamlined Vulkan implementation that removes all graphics functionality to achieve maximum GPU compute performance. This Rust port not only provides memory-safe abstractions over the C API but also implements cutting-edge optimizations that deliver:

- **Zero descriptor updates** per dispatch
- **≤0.5 barriers** per dispatch (83% reduction)
- **30-50% reduction** in CPU submit time
- **Zero memory allocations** in steady state
- **13.9% reduction** in structure sizes

## 🎯 Key Features

### 1. **Safe Unified API** 🆕

- Zero unsafe code required
- Automatic resource management (RAII)
- Builder patterns and fluent interfaces
- Type-safe abstractions
- All optimizations work transparently

### 2. **Advanced Optimizations**

#### Persistent Descriptors
- Set0 reserved for storage buffers with zero updates in hot path
- Parameters passed via push constants (≤128 bytes)
- Eliminates descriptor set allocation and update overhead

#### Intelligent Barrier Policy
- Smart tracking reduces barriers from 3 per dispatch to ≤0.5
- Only three transition types: upload→read, read→write, write→read
- Vendor-specific optimizations for AMD, NVIDIA, and Intel GPUs

#### Timeline Semaphore Batching
- One timeline semaphore per queue
- Batch multiple submissions with a single fence
- 30-50% reduction in CPU overhead

#### Advanced Memory Allocator
- Three-pool system: DEVICE_LOCAL, HOST_VISIBLE|COHERENT, HOST_VISIBLE|CACHED
- Slab-based sub-allocation with 256MB slabs
- Power-of-2 block sizes for O(1) allocation/deallocation
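
The power-of-2 sizing described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the crate's actual allocator: `SlabPool`, `MIN_BLOCK`, and the free-list layout are hypothetical names invented for the example, and only the 256MB slab size and the power-of-2 rounding come from the list above.

```rust
/// Slab size taken from the README (256 MB); everything else is illustrative.
const SLAB_SIZE: u64 = 256 * 1024 * 1024;
/// Hypothetical minimum block size for this sketch.
const MIN_BLOCK: u64 = 256;

/// Round a request up to its power-of-2 block size.
fn block_size(request: u64) -> u64 {
    request.max(MIN_BLOCK).next_power_of_two()
}

/// Index of the free list serving a block size (0 for MIN_BLOCK, 1 for 2 * MIN_BLOCK, ...).
fn size_class(block: u64) -> usize {
    (block.trailing_zeros() - MIN_BLOCK.trailing_zeros()) as usize
}

/// One slab: a bump pointer for fresh blocks plus one free list per size class.
struct SlabPool {
    next_offset: u64,
    free_lists: Vec<Vec<u64>>,
}

impl SlabPool {
    fn new() -> Self {
        let classes = size_class(SLAB_SIZE) + 1;
        SlabPool { next_offset: 0, free_lists: vec![Vec::new(); classes] }
    }

    /// O(1): pop a recycled block of the right class, or bump-allocate a fresh one.
    fn alloc(&mut self, request: u64) -> Option<u64> {
        if request > SLAB_SIZE {
            return None;
        }
        let block = block_size(request);
        if let Some(offset) = self.free_lists[size_class(block)].pop() {
            return Some(offset);
        }
        // Align the bump pointer to the block size before carving out the block.
        let start = (self.next_offset + block - 1) & !(block - 1);
        if start + block > SLAB_SIZE {
            return None;
        }
        self.next_offset = start + block;
        Some(start)
    }

    /// O(1): push the block back onto its size-class free list.
    fn free(&mut self, offset: u64, request: u64) {
        self.free_lists[size_class(block_size(request))].push(offset);
    }
}
```

Because every block in a class has the same size, allocation and deallocation are a single push or pop, at the cost of internal fragmentation when a request falls just above a power of two.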

### 3. **Type-Safe Implementation**
- Safe handles with phantom types
- Proper error handling with Result types
- Zero-cost abstractions
- Memory safety guarantees
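
For readers unfamiliar with phantom-typed handles, the idea can be sketched as follows. The names here (`Handle`, `BufferTag`, `PipelineTag`, `bind_buffer`) are hypothetical and chosen for illustration; they are not taken from the crate's source.

```rust
use std::marker::PhantomData;

/// Marker types: they have no values and exist only at compile time.
enum BufferTag {}
enum PipelineTag {}

/// A 64-bit handle tagged with the kind of object it names.
struct Handle<T> {
    raw: u64,
    _kind: PhantomData<T>,
}

// Manual impls so that `Handle<T>` is Copy regardless of `T`.
impl<T> Clone for Handle<T> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<T> Copy for Handle<T> {}

impl<T> Handle<T> {
    const NULL: Self = Handle { raw: 0, _kind: PhantomData };

    fn from_raw(raw: u64) -> Self {
        Handle { raw, _kind: PhantomData }
    }

    fn is_null(self) -> bool {
        self.raw == 0
    }

    fn as_raw(self) -> u64 {
        self.raw
    }
}

type Buffer = Handle<BufferTag>;
type Pipeline = Handle<PipelineTag>;

/// Accepts only buffer handles; passing a `Pipeline` is a compile error.
fn bind_buffer(_slot: u32, buffer: Buffer) -> Result<u64, &'static str> {
    if buffer.is_null() {
        Err("null buffer handle")
    } else {
        Ok(buffer.as_raw())
    }
}
```

`PhantomData` occupies no space, so a tagged handle is the same size as the raw `u64` it wraps, which is what makes the abstraction zero-cost.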

### 4. **Smart ICD Loader** (Enhanced in v0.2.0)
- Automatically discovers all available Vulkan drivers
- Prioritizes hardware drivers (AMD, NVIDIA, Intel) over software renderers
- No manual `VK_ICD_FILENAMES` configuration needed
- Falls back to software rendering only when no hardware is available
- Clear logging of available and selected drivers
- Robust library resolution: resolves `library_path` as provided (via dynamic linker search) and relative to the manifest directory
- Detailed discovery logs: search paths, discovered JSON files, load attempts, and per-candidate errors
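
The hardware-first prioritization could look roughly like the sketch below. `IcdCandidate`, `prioritize`, and `select` are hypothetical names; the two fields mirror the `library_path` and `is_software` fields printed in the ICD Selection example, but the loader's real data structures and ordering rules may differ.

```rust
use std::path::PathBuf;

/// Hypothetical record for one discovered driver.
#[derive(Debug, Clone, PartialEq)]
struct IcdCandidate {
    library_path: PathBuf,
    is_software: bool,
}

/// Put hardware drivers first. The sort is stable, so discovery order is
/// preserved within the hardware group and within the software group.
fn prioritize(mut candidates: Vec<IcdCandidate>) -> Vec<IcdCandidate> {
    candidates.sort_by_key(|c| c.is_software); // false sorts before true
    candidates
}

/// Pick the best candidate; a software renderer wins only if nothing else exists.
fn select(candidates: Vec<IcdCandidate>) -> Option<IcdCandidate> {
    prioritize(candidates).into_iter().next()
}
```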

### 5. **Optimized Structures**
- `VkPhysicalDeviceFeatures`: 32 bytes (vs 220 in standard Vulkan)
- `VkBufferCreateInfo`: Reordered fields for better packing
- `VkMemoryTypeCache`: O(1) memory type lookups

## 📁 Project Structure

```
kronos/
├── src/
│   ├── lib.rs              # Main library entry point
│   ├── sys/                # Low-level FFI types
│   ├── core/               # Core Kronos types
│   ├── ffi/                # C-compatible function signatures
│   └── implementation/     # Kronos optimizations
├── benches/                # Performance benchmarks
├── examples/               # Usage examples
├── tests/                  # Integration and unit tests
├── shaders/                # SPIR-V compute shaders
├── scripts/                # Build and validation scripts
└── docs/                   # Documentation
    ├── architecture/       # Design documents
    │   ├── OPTIMIZATION_SUMMARY.md
    │   ├── VULKAN_COMPARISON.md
    │   ├── ICD_SUCCESS.md
    │   └── COMPATIBILITY.md
    ├── benchmarks/         # Performance results
    │   └── BENCHMARK_RESULTS.md
    ├── qa/                 # Quality assurance
    │   ├── QA_REPORT.md
    │   ├── MINI_REVIEW.md
    │   └── TEST_RESULTS.md
    ├── EPIC.md             # Project epic and vision
    └── TODO.md             # Development roadmap
```

## 🛠️ Installation

### From crates.io
```bash
cargo add kronos-compute
```

### From Source

#### Prerequisites
- Rust 1.70 or later
- Vulkan SDK (for ICD loader and validation layers)
- A Vulkan-capable GPU with compute support
- Build tools (gcc/clang on Linux, Visual Studio on Windows, Xcode on macOS)
- (Optional) SPIR-V compiler (glslc or glslangValidator) for shader development

See [Development Setup Guide](docs/DEVELOPMENT_SETUP.md) for detailed installation instructions.

#### Build Steps
```bash
# Clone the repository
git clone https://github.com/LynnColeArt/kronos-compute
cd kronos-compute

# Build SPIR-V shaders (optional, pre-built shaders included)
./scripts/build_shaders.sh

# Build with optimizations enabled
cargo build --release --features implementation

# Run tests
cargo test --features implementation

# Run benchmarks
cargo bench --features implementation

# Run validation scripts
./scripts/validate_bench.sh      # Run all validation tests
./scripts/amd_bench.sh          # AMD-specific validation
```

## 📊 Benchmarks

Kronos includes comprehensive benchmarks for common compute workloads:

- **SAXPY**: Vector multiply-add operations (c = a*x + b)
- **Reduction**: Parallel array summation
- **Prefix Sum**: Parallel scan algorithm
- **GEMM**: Dense matrix multiplication (C = A * B)

Each benchmark tests multiple configurations:
- Sizes: 64KB (small), 8MB (medium), 64MB (large)
- Batch sizes: 1, 16, 256 dispatches
- Metrics: descriptor updates, barriers, CPU time, memory allocations

```bash
# Run specific benchmark
cargo bench --bench compute_workloads --features implementation

# Run with custom parameters
cargo bench --bench compute_workloads -- --warm-up-time 5 --measurement-time 10
```
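
When validating GPU output for these workloads, a plain CPU reference is a convenient oracle. The functions below are a sketch written for this README, not code from the benchmark suite; in particular, the inclusive form of the prefix sum and the row-major GEMM layout are assumptions.

```rust
/// SAXPY as described above: c[i] = a * x[i] + b[i].
fn saxpy(a: f32, x: &[f32], b: &[f32]) -> Vec<f32> {
    x.iter().zip(b).map(|(xi, bi)| a * xi + bi).collect()
}

/// Reduction: sum of all elements, accumulated in f64 to limit rounding error.
fn reduce_sum(values: &[f32]) -> f64 {
    values.iter().map(|&v| v as f64).sum()
}

/// Inclusive prefix sum (scan): out[i] = values[0] + ... + values[i].
fn prefix_sum(values: &[u32]) -> Vec<u32> {
    values
        .iter()
        .scan(0u32, |acc, &v| {
            *acc = acc.wrapping_add(v);
            Some(*acc)
        })
        .collect()
}

/// Row-major GEMM: C (m x n) = A (m x k) * B (k x n).
fn gemm(a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    let mut c = vec![0.0; m * n];
    for i in 0..m {
        for p in 0..k {
            let a_ip = a[i * k + p];
            for j in 0..n {
                c[i * n + j] += a_ip * b[p * n + j];
            }
        }
    }
    c
}
```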

## 🚀 Usage Example

### Safe Unified API (Recommended)

```rust
use kronos_compute::api::{ComputeContext, PipelineConfig, BufferBinding};

// No unsafe code needed!
let ctx = ComputeContext::new()?;

// Load shader and create pipeline
let shader = ctx.load_shader("compute.spv")?;
let pipeline = ctx.create_pipeline(&shader)?;

// Create buffers
let input = ctx.create_buffer(&data)?;
let output = ctx.create_buffer_uninit(size)?;

// Dispatch compute work
ctx.dispatch(&pipeline)
    .bind_buffer(0, &input)
    .bind_buffer(1, &output)
    .workgroups(1024, 1, 1)
    .execute()?;

// Read results
let results: Vec<f32> = output.read()?;
```

All optimizations work transparently through the safe API!

### Low-Level FFI (Advanced)

```rust
use kronos_compute::*;
use std::ptr;

unsafe {
    // Traditional Vulkan-style API also available
    initialize_kronos()?;
    let mut instance = VkInstance::NULL;
    vkCreateInstance(&create_info, ptr::null(), &mut instance);
    // ... etc
}
```

## 📈 Performance

Based on Mini's optimization targets:

| Metric | Baseline Vulkan | Kronos | Improvement |
|--------|----------------|---------|-------------|
| Descriptor updates/dispatch | 3-5 | 0 | 100% ⬇️ |
| Barriers/dispatch | 3 | ≤0.5 | 83% ⬇️ |
| CPU submit time | 100% | 50-70% | 30-50% ⬇️ |
| Memory allocations | Continuous | 0* | 100% ⬇️ |
| Structure size (avg) | 100% | 86.1% | 13.9% ⬇️ |

*After initial warm-up

## 🔧 Configuration

Kronos can be configured via environment variables:

- `KRONOS_ICD_SEARCH_PATHS`: Custom Vulkan ICD search paths
- `VK_ICD_FILENAMES`: Standard Vulkan ICD override
- `RUST_LOG`: Logging level (info, debug, trace)

### ICD Discovery Logging
Enable detailed logs to debug ICD discovery and loading:

```bash
RUST_LOG=kronos_compute=info,kronos_compute::implementation::icd_loader=debug cargo run
```

Logs include:
- Search paths scanned
- Each discovered manifest JSON
- Each library load attempt (as-provided and manifest-relative)
- Errors per candidate and the selected ICD summary

### ICD Selection
You can enumerate available ICDs and select one explicitly when creating a context.

- Enumerate programmatically:

```rust
use kronos_compute::implementation::icd_loader;
let icds = icd_loader::available_icds();
for (i, icd) in icds.iter().enumerate() {
    println!("[{i}] {} ({}), api=0x{:x}",
        icd.library_path.display(),
        if icd.is_software { "software" } else { "hardware" },
        icd.api_version);
}
```

- Select via `ContextBuilder`:

```rust
use kronos_compute::api;
let ctx = api::ComputeContext::builder()
    .prefer_icd_index(0)               // or .prefer_icd_path("/path/to/libvulkan_*.so")
    .build()?;
println!("Using ICD: {:?}", ctx.icd_info());
```

- Example CLI:

```bash
cargo run --example icd_select -- list
cargo run --example icd_select -- index 0
cargo run --example icd_select -- path /usr/lib/x86_64-linux-gnu/libvulkan_radeon.so
```

### Aggregated Mode (Experimental)
Aggregated mode exposes physical devices from multiple ICDs in a single instance and routes calls to the correct ICD by handle provenance.

- Enable:
```bash
KRONOS_AGGREGATE_ICD=1 RUST_LOG=kronos_compute=info,kronos_compute::implementation::icd_loader=debug cargo run
```

- Behavior:
  - `vkCreateInstance` creates a meta-instance wrapping per‑ICD instances.
  - `vkEnumeratePhysicalDevices` returns a combined list across all ICDs.
  - `vkCreateDevice` routes by the physical device’s owning ICD.
  - Subsequent queue, pool, command buffer and all `vkCmd*` calls route by handle.

- Caveats:
  - Experimental: Intended for orchestration and testing; API surface remains Vulkan-compatible, but behavior is meta-loader-like.
  - Performance: Routing adds a small handle→ICD lookup; negligible vs GPU work.
  - Diagnostics: enable debug logs for provenance and routing visibility.
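
Routing by handle provenance can be pictured as a lookup table from handles to the ICD that created them. The sketch below is an assumption about the general shape, with hypothetical names (`ProvenanceTable`, `IcdIndex`); it also simplifies by treating raw handle values as unique across ICDs, which a real meta-loader cannot assume without wrapping handles.

```rust
use std::collections::HashMap;

/// Index of an ICD in the loader's list (hypothetical).
type IcdIndex = usize;

/// Maps every handle given to the application back to its owning ICD.
#[derive(Default)]
struct ProvenanceTable {
    owner: HashMap<u64, IcdIndex>,
}

impl ProvenanceTable {
    /// Record a root object, e.g. a physical device enumerated from one ICD.
    fn record(&mut self, handle: u64, icd: IcdIndex) {
        self.owner.insert(handle, icd);
    }

    /// A child object (device, queue, command buffer) inherits its parent's ICD.
    fn record_child(&mut self, parent: u64, child: u64) -> Option<IcdIndex> {
        let icd = *self.owner.get(&parent)?;
        self.owner.insert(child, icd);
        Some(icd)
    }

    /// Look up which ICD a call on `handle` should be forwarded to.
    fn route(&self, handle: u64) -> Option<IcdIndex> {
        self.owner.get(&handle).copied()
    }
}
```

A hash lookup per call is the "small handle→ICD lookup" cost mentioned in the caveats.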

### Windows CI / Headless Testing
- Linking: on Windows, linking to `vulkan-1` is opt-in. Set `KRONOS_LINK_VULKAN=1` if the Vulkan runtime is installed. CI uses direct ICD loading by default.
- Unit tests: run on `windows-latest` via `.github/workflows/windows.yml` without a GPU.
- Optional ICD tests: provide a software ICD (e.g., SwiftShader) and set:
  - `VK_ICD_FILENAMES` to the SwiftShader JSON path
  - `KRONOS_ALLOW_UNTRUSTED_LIBS=1` (if path is outside trusted prefixes)
  - `KRONOS_RUN_ICD_TESTS=1` to enable ignored tests
  - (Optional) `KRONOS_AGGREGATE_ICD=1` to test aggregated enumeration

### Security Notes (ICD Loading)
- Paths from `VK_ICD_FILENAMES` and discovery directories are canonicalized and validated.
- Libraries must resolve to regular files under trusted prefixes (Linux defaults: `/usr/lib`, `/usr/lib64`, `/usr/local/lib`, `/lib`, `/lib64`, `/usr/lib/x86_64-linux-gnu`).
- For development on non-standard locations, set `KRONOS_ALLOW_UNTRUSTED_LIBS=1` to override the trust policy (not recommended for production).
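
The trust policy above amounts to "canonicalize, then check the prefix". A minimal sketch using only the standard library follows; the function names are hypothetical, the override is passed in as a plain `bool` rather than read from `KRONOS_ALLOW_UNTRUSTED_LIBS`, and the crate's actual checks may be stricter.

```rust
use std::path::{Path, PathBuf};

/// Linux default prefixes as listed in the README.
const TRUSTED_PREFIXES: &[&str] = &[
    "/usr/lib",
    "/usr/lib64",
    "/usr/local/lib",
    "/lib",
    "/lib64",
    "/usr/lib/x86_64-linux-gnu",
];

/// True if an already-canonical path sits under a trusted prefix.
/// `Path::starts_with` compares whole components, so `/usr/libevil`
/// does not match the `/usr/lib` prefix.
fn under_trusted_prefix(canonical: &Path) -> bool {
    TRUSTED_PREFIXES.iter().any(|p| canonical.starts_with(p))
}

/// Canonicalize (resolving symlinks and `..`), then require a regular file
/// under a trusted prefix unless the override is set.
fn validate_library(path: &Path, allow_untrusted: bool) -> std::io::Result<PathBuf> {
    let canonical = std::fs::canonicalize(path)?;
    let is_file = std::fs::metadata(&canonical)?.is_file();
    if is_file && (allow_untrusted || under_trusted_prefix(&canonical)) {
        Ok(canonical)
    } else {
        Err(std::io::Error::new(
            std::io::ErrorKind::PermissionDenied,
            "untrusted ICD library path",
        ))
    }
}
```

Checking the canonical path rather than the path as written is what defeats symlink and `../` tricks.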

Runtime configuration through the API:
```rust
// Set timeline batch size
kronos_compute::implementation::timeline_batching::set_batch_size(32)?;

// Configure memory pools
kronos_compute::implementation::pool_allocator::set_slab_size(512 * 1024 * 1024)?;
```

## ⚡ How It Works

### Persistent Descriptors
Traditional Vulkan requires updating descriptor sets for each dispatch. Kronos pre-allocates all storage buffer descriptors in Set0 and uses push constants for parameters:

```rust
// Traditional: 3-5 descriptor updates per dispatch
vkUpdateDescriptorSets(device, 5, writes, 0, nullptr);
vkCmdBindDescriptorSets(cmd, COMPUTE, layout, 0, 1, &set, 0, nullptr);

// Kronos: 0 descriptor updates
vkCmdPushConstants(cmd, layout, COMPUTE, 0, 128, &params);
vkCmdDispatch(cmd, x, y, z);
```
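
On the host side, staying within the 128-byte push constant budget (the minimum `maxPushConstantsSize` that the Vulkan specification guarantees) can be enforced at compile time. The sketch below uses a hypothetical `SaxpyParams` struct and helper; neither is part of the crate.

```rust
/// Hypothetical per-dispatch parameters for a SAXPY-style kernel.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct SaxpyParams {
    alpha: f32,
    element_count: u32,
    input_offset: u32,
    output_offset: u32,
}

/// Push constant budget from the README.
const MAX_PUSH_CONSTANT_BYTES: usize = 128;

// Compilation fails if the struct ever outgrows the budget.
const _: () = assert!(std::mem::size_of::<SaxpyParams>() <= MAX_PUSH_CONSTANT_BYTES);

/// Serialize field by field in little-endian order, without any unsafe code.
fn to_push_constant_bytes(p: &SaxpyParams) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(std::mem::size_of::<SaxpyParams>());
    bytes.extend_from_slice(&p.alpha.to_le_bytes());
    bytes.extend_from_slice(&p.element_count.to_le_bytes());
    bytes.extend_from_slice(&p.input_offset.to_le_bytes());
    bytes.extend_from_slice(&p.output_offset.to_le_bytes());
    bytes
}
```

The resulting byte slice is what would be handed to a `vkCmdPushConstants`-style call, with the buffers themselves addressed through the persistent Set0 bindings.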

### Smart Barriers
Kronos tracks buffer usage patterns and inserts only the minimum required barriers:

```rust
// Traditional: 3 barriers per dispatch
vkCmdPipelineBarrier(cmd, TRANSFER, COMPUTE, ...);  // upload→compute
vkCmdPipelineBarrier(cmd, COMPUTE, COMPUTE, ...);   // compute→compute  
vkCmdPipelineBarrier(cmd, COMPUTE, TRANSFER, ...);  // compute→download

// Kronos: ≤0.5 barriers per dispatch (automatic)
```
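
One way to picture the tracking is a per-buffer "last access" map that emits a barrier only on the three transitions listed under Intelligent Barrier Policy. The sketch below is illustrative only: `BarrierTracker` and its enums are hypothetical, and it deliberately ignores hazards outside those three transitions (for example write after write), which a production tracker would need to consider.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Access {
    Upload,
    Read,
    Write,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Barrier {
    UploadToRead,
    ReadToWrite,
    WriteToRead,
}

/// Remembers the last access per buffer and counts emitted barriers.
#[derive(Default)]
struct BarrierTracker {
    last: HashMap<u64, Access>,
    emitted: usize,
}

impl BarrierTracker {
    /// Record the next access to `buffer`; return a barrier only when the
    /// previous and next access form one of the three tracked transitions.
    fn access(&mut self, buffer: u64, next: Access) -> Option<Barrier> {
        let prev = self.last.insert(buffer, next);
        let barrier = match (prev, next) {
            (Some(Access::Upload), Access::Read) => Some(Barrier::UploadToRead),
            (Some(Access::Read), Access::Write) => Some(Barrier::ReadToWrite),
            (Some(Access::Write), Access::Read) => Some(Barrier::WriteToRead),
            _ => None,
        };
        if barrier.is_some() {
            self.emitted += 1;
        }
        barrier
    }
}
```

Repeated reads of the same buffer produce no barrier at all, which is how the average can drop below one barrier per dispatch.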

### Timeline Batching
Instead of submitting each command buffer individually:

```rust
// Traditional: N submits, N fences
for cmd in commands {
    vkQueueSubmit(queue, 1, &submit, fence);
}

// Kronos: 1 submit, 1 timeline semaphore
kronos_compute::BatchBuilder::new(queue)
    .add_command_buffer(cmd1)
    .add_command_buffer(cmd2)
    .submit()?;
```
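
The bookkeeping behind batching can be modelled without a GPU: command buffers accumulate, and each flush performs one submit that signals the next value of a monotonically increasing counter, as a timeline semaphore would. `TimelineQueue` below is a hypothetical model for illustration and does not reflect the crate's `BatchBuilder` internals.

```rust
/// Hypothetical stand-in for a command buffer handle.
type CommandBuffer = u64;

/// Models one queue with one timeline counter.
struct TimelineQueue {
    last_signal: u64,
    pending: Vec<CommandBuffer>,
    submits: usize,
    batch_size: usize,
}

impl TimelineQueue {
    fn new(batch_size: usize) -> Self {
        TimelineQueue {
            last_signal: 0,
            pending: Vec::new(),
            submits: 0,
            batch_size: batch_size.max(1),
        }
    }

    /// Queue a command buffer; flush automatically once the batch is full.
    /// Returns the timeline value to wait on if a submit happened.
    fn add(&mut self, cmd: CommandBuffer) -> Option<u64> {
        self.pending.push(cmd);
        if self.pending.len() >= self.batch_size {
            self.flush()
        } else {
            None
        }
    }

    /// Submit everything pending as a single batch.
    fn flush(&mut self) -> Option<u64> {
        if self.pending.is_empty() {
            return None;
        }
        // A real implementation would hand `pending` to the driver here.
        self.pending.clear();
        self.submits += 1;
        self.last_signal += 1;
        Some(self.last_signal)
    }
}
```

With a batch size of 16, forty command buffers cost three submits instead of forty, and the caller waits on a single counter value rather than one fence per submit.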

## 📚 Documentation

Comprehensive documentation is available in the `docs/` directory:

- **API Documentation**:
  - [Unified Safe API](docs/UNIFIED_API.md) - 🆕 Safe, ergonomic Rust API (recommended)

- **Architecture**: Design decisions, optimization details, and comparisons
  - [Optimization Summary](docs/architecture/OPTIMIZATION_SUMMARY.md) - Mini's 4 optimizations explained
  - [Vulkan Comparison](docs/architecture/VULKAN_COMPARISON.md) - Differences from standard Vulkan
  - [ICD Integration](docs/architecture/ICD_SUCCESS.md) - How Kronos integrates with existing drivers
  - [Troubleshooting](docs/TROUBLESHOOTING.md) - Common issues and ICD loader diagnostics

- **Quality Assurance**: Test results and validation reports
  - [QA Report](docs/qa/QA_REPORT.md) - Comprehensive validation for Sporkle integration
  - [Test Results](docs/qa/TEST_RESULTS.md) - Unit and integration test details

- **Benchmarks**: Performance measurements and analysis
  - [Benchmark Results](docs/benchmarks/BENCHMARK_RESULTS.md) - Detailed performance metrics

## 🤝 Contributing

Contributions are welcome! Areas of interest:

1. SPIR-V shader integration for benchmarks
2. Additional vendor-specific optimizations
3. Performance profiling on different GPUs
4. Safe wrapper API design
5. Documentation improvements

Please read our [Contributing Guide](CONTRIBUTING.md) for details.

## 🔐 Safety

This crate uses `unsafe` for FFI compatibility but provides safe abstractions where possible:

```rust
// Unsafe C-style API (required for compatibility)
let result = unsafe { 
    vkCreateBuffer(device, &info, ptr::null(), &mut buffer) 
};

// Safe Rust wrapper (unified API, see Usage Example)
let buffer = ctx.create_buffer(&data)?;
```

All unsafe functions include comprehensive safety documentation.

## 📦 Features

- `implementation` - Enable Kronos optimizations and ICD forwarding
- `validation` - Enable additional safety checks (default)
- `compare-ash` - Enable comparison benchmarks with ash

## 📝 Status

- ✅ Core implementation complete
- ✅ All optimizations integrated  
- ✅ ICD loader with Vulkan forwarding
- ✅ Comprehensive benchmark suite
- ✅ Basic examples working
- ✅ Published to crates.io (v0.1.0)
- ✅ C header generation
- ✅ SPIR-V shader build scripts
- ✅ Safe unified API (NEW!)
- ✅ Compute correctness fixed (1024/1024 correct results)
- ✅ Safety documentation complete (100% coverage)
- ✅ CI/CD pipeline with multi-platform testing
- ✅ Test suite expanded (46 tests passing)
- ⏳ Production testing

## 🗺️ Roadmap

### v0.2.0 (Q1 2025)
- NVIDIA & Intel GPU optimizations
- Multi-queue concurrent dispatch support
- Dynamic memory pool resizing
- Vulkan validation layer support

### v0.3.0 (Q2 2025)
- Enhanced Sporkle integration
- Advanced timeline semaphore patterns
- Ray query & cooperative matrix support
- Performance regression testing

### v1.0.0 (Q3 2025)
- Production-ready status
- Full Vulkan 1.3 compute coverage
- Platform-specific optimizations
- Enterprise support

See [TODO.md](TODO.md) for the complete roadmap and contribution opportunities.

## 🙏 Acknowledgments

- Mini (@notmini) for the groundbreaking optimization techniques
- The Vulkan community for driver support
- Contributors who helped port these optimizations to Rust

## 📜 License

This project is dual-licensed under MIT OR Apache-2.0. See [LICENSE-MIT](LICENSE-MIT) and [LICENSE-APACHE](LICENSE-APACHE) for details.

---

Built with ❤️ and 🦀 for maximum GPU compute performance.

## Citation

If you use Kronos in your research, please cite:

```bibtex
@software{kronoscompute2025,
  author = {Cole, Lynn},
  title = {Kronos Compute: A High-Performance Compute-Only Vulkan Implementation},
  year = {2025},
  publisher = {GitHub},
  journal = {GitHub repository},
  url = {https://github.com/LynnColeArt/kronos-compute}
}
```