CubeK: high-performance multi-platform kernels in CubeCL
Algorithms
| Algorithms | Variants |
|---|---|
| Random | bernoulli normal uniform |
| Quantization | symmetric per-block per-tensor q2 q4 q8 fp4 |
| Reduction | mean sum prod max min arg[max\|min] per-cube per-plane |
| Matmul | mma unit tma multi-stage specialization ordered multi-rows |
| Convolution | mma unit tma multi-stage im2col |
| Attention | mma unit multi-rows |
Contributing
If you want to contribute new kernels, please read the GUIDE.md.
Running tests
Note: This applies to most kernels, but `reduce` works slightly differently for now, see its README.
Test suites
Four test suites are available:
- Light test suite: a tractable subset of representative tests that run on the CI.
- Basic test suite: adds to the light suite some tests that are considered basic but may hang on CI (slow on CPU).
- Extended test suite: usually auto-generated combinatorial tests covering many configurations. Good to run when developing kernels. Normally kept tractable.
- Full test suite: all generable test combinations; may be too large to compile or run practically.
Run tests with
# Replace <runtime> with cpu, cuda, rocm, wgpu, vulkan or metal
# Basic test suite (light on cpu)
# Extended test suite
# Full test suite
Cube test mode
You can control test behavior by setting the CUBE_TEST_MODE environment variable.
For more details, see Test Mode.
Modes
- `CUBE_TEST_MODE=correct` (default)
  Tests pass if results are numerically correct or if the kernel was launched with an invalid configuration.
  - Useful when tests are auto-generated from multiple parameter combinations, where some invalid configurations are expected.
  - Failing tests display only the first index with a discrepancy.
- `CUBE_TEST_MODE=strict`
  Tests pass only if they compile, run, and produce numerically accurate results.
  - Ideal for debugging to avoid false positives that can occur in `correct` mode.
- `CUBE_TEST_MODE=printfail`
  Similar to `correct` mode: tests pass if results are correct or if the kernel is invalid.
  - Failing tests show all tensor discrepancies.
  - Supports filtering, e.g. `CUBE_TEST_MODE=printfail:0,.,10-20` shows index 0 of the first dimension, all of the second, and elements 10–20 of the third.
- `CUBE_TEST_MODE=printall`
  All tests fail, displaying all tensor discrepancies.
  - Filtering works the same as in `printfail`.
- `CUBE_TEST_MODE=failifrun`
  Only tests that compile and run will fail; others succeed.
  - Useful for tracking critical tests in large suites.
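
To make the pass/fail rules and the filter syntax above concrete, here is a minimal standalone sketch in plain Rust. It is not CubeK's actual implementation: the names `TestMode`, `DimFilter`, `Outcome`, `parse_filter`, `parse_mode` and `passes` are hypothetical, the range `10-20` is assumed to be inclusive, and `printall` is read literally as "every test fails".

```rust
use std::ops::RangeInclusive;

/// One per-dimension entry of a print filter (hypothetical type).
#[derive(Debug, PartialEq)]
enum DimFilter {
    All,                          // "."
    Index(usize),                 // e.g. "0"
    Range(RangeInclusive<usize>), // e.g. "10-20", assumed inclusive
}

/// The five documented modes (hypothetical type).
#[derive(Debug, PartialEq)]
enum TestMode {
    Correct,
    Strict,
    PrintFail(Vec<DimFilter>),
    PrintAll(Vec<DimFilter>),
    FailIfRun,
}

/// What happened when a test tried to launch its kernel.
#[derive(Debug, Clone, Copy)]
enum Outcome {
    InvalidConfig, // the kernel did not compile or run
    Correct,       // ran, results numerically correct
    Incorrect,     // ran, results wrong
}

/// Parses a filter such as "0,.,10-20"; returns None on malformed input.
fn parse_filter(s: &str) -> Option<Vec<DimFilter>> {
    s.split(',')
        .map(|part| {
            let part = part.trim();
            if part == "." {
                Some(DimFilter::All)
            } else if let Some((lo, hi)) = part.split_once('-') {
                let lo: usize = lo.parse().ok()?;
                let hi: usize = hi.parse().ok()?;
                Some(DimFilter::Range(lo..=hi))
            } else {
                part.parse().ok().map(DimFilter::Index)
            }
        })
        .collect()
}

/// Parses a CUBE_TEST_MODE value such as "strict" or "printfail:0,.,10-20".
/// In this sketch, a filter attached to a non-printing mode is ignored.
fn parse_mode(value: &str) -> Option<TestMode> {
    let (name, filter) = match value.split_once(':') {
        Some((name, filter)) => (name, parse_filter(filter)?),
        None => (value, Vec::new()),
    };
    match name {
        "correct" => Some(TestMode::Correct),
        "strict" => Some(TestMode::Strict),
        "printfail" => Some(TestMode::PrintFail(filter)),
        "printall" => Some(TestMode::PrintAll(filter)),
        "failifrun" => Some(TestMode::FailIfRun),
        _ => None,
    }
}

/// Applies the documented pass/fail rule of each mode to an outcome.
fn passes(mode: &TestMode, outcome: Outcome) -> bool {
    match (mode, outcome) {
        (TestMode::PrintAll(_), _) => false,
        (TestMode::FailIfRun, Outcome::InvalidConfig) => true,
        (TestMode::FailIfRun, _) => false,
        (TestMode::Strict, Outcome::Correct) => true,
        (TestMode::Strict, _) => false,
        // `correct` and `printfail`: only a wrong result fails.
        (_, Outcome::Incorrect) => false,
        (_, _) => true,
    }
}

fn main() {
    // Fall back to the documented default when the variable is unset.
    let raw = std::env::var("CUBE_TEST_MODE").unwrap_or_else(|_| "correct".into());
    let mode = parse_mode(&raw).expect("unrecognised CUBE_TEST_MODE value");
    for outcome in [Outcome::InvalidConfig, Outcome::Correct, Outcome::Incorrect] {
        println!("{mode:?} / {outcome:?} -> pass = {}", passes(&mode, outcome));
    }
}
```

The sketch shows why `strict` suits debugging: it is the only mode in which an invalid configuration counts as a failure, whereas `correct` and `printfail` let it pass silently.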