cubek-std 0.2.0-pre.4

CubeK: Standard Library
Documentation


CubeK: high-performance multi-platform kernels in CubeCL

Algorithms

Algorithms     Variants
Random         bernoulli, normal, uniform
Quantization   symmetric, per-block, per-tensor, q2, q4, q8, fp4
Reduction      mean, sum, prod, max, min, arg[max|min], per-cube, per-plane
Matmul         mma, unit, tma, multi-stage, specialization, ordered, multi-rows
Convolution    mma, unit, tma, multi-stage, im2col
Attention      mma, unit, multi-rows

Contributing

If you want to contribute new kernels, please read the GUIDE.md.

Running tests

Note: This applies to most kernels, but reduce works slightly differently for now; see its README.

Test suites

Four test suites are available:

  • Light test suite: a tractable subset of representative tests that run on the CI.
  • Basic test suite: the light suite plus tests that are still considered basic but may hang on the CI because they are slow on CPU.
  • Extended test suite: usually auto-generated combinatorial tests covering many configurations. Good to run when developing kernels. Normally kept tractable.
  • Full test suite: all generable test combinations; may be too large to compile or run practically.

Run tests with:

# Replace <runtime> with cpu, cuda, rocm, wgpu, vulkan or metal

# Basic test suite (light on cpu)
cargo test-<runtime>

# Extended test suite
cargo test-<runtime>-extended

# Full test suite
cargo test-<runtime>-full

Cube test mode

You can control test behavior by setting the CUBE_TEST_MODE environment variable.
For more details, see Test Mode.

Modes

  • CUBE_TEST_MODE=correct (default)
    Tests pass if results are numerically correct or if the kernel was launched with an invalid configuration.

    • Useful when tests are auto-generated from multiple parameter combinations, where some invalid configurations are expected.
    • Failing tests display only the first index with a discrepancy.
  • CUBE_TEST_MODE=strict
    Tests pass only if they compile, run, and produce numerically accurate results.

    • Ideal for debugging to avoid false positives that can occur in correct mode.
  • CUBE_TEST_MODE=printfail
    Similar to correct mode: tests pass if results are correct or if the kernel is invalid.

    • Failing tests show all tensor discrepancies.
    • Supports filtering, e.g.: CUBE_TEST_MODE=printfail:0,.,10-20 shows index 0 of the first dimension, every index of the second dimension, and indices 10–20 of the third dimension.
  • CUBE_TEST_MODE=printall
    All tests fail, displaying all tensor discrepancies.

    • Filtering works the same as in printfail.
  • CUBE_TEST_MODE=failifrun
    Only tests that compile and run will fail; others succeed.

    • Useful for tracking critical tests in large suites.
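
The pass/fail rules and the filter syntax described above can be summarized in a small model. This is only a sketch of one reading of this section, not the actual CubeK or CubeCL implementation: the names Mode, Outcome, DimFilter, parse_mode, parse_filter and passes are hypothetical, and the range 10-20 is assumed to be inclusive on both ends.

```rust
// Illustrative model only: every type and function here is hypothetical and
// is not part of the CubeK or CubeCL API.

#[derive(Clone, Copy, Debug, PartialEq)]
enum Mode {
    Correct,
    Strict,
    PrintFail,
    PrintAll,
    FailIfRun,
}

/// What happened when a test tried to launch its kernel.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Outcome {
    InvalidConfig, // the kernel was rejected as an invalid configuration
    RanCorrect,    // the kernel ran and matched the reference
    RanIncorrect,  // the kernel ran but produced discrepancies
}

/// One per-dimension entry of a printfail/printall filter.
#[derive(Debug, PartialEq)]
enum DimFilter {
    All,                 // "."
    Index(usize),        // "0"
    Range(usize, usize), // "10-20", assumed inclusive on both ends
}

/// Splits a value such as "printfail:0,.,10-20" into the mode and the
/// optional filter text after the colon.
fn parse_mode(value: &str) -> Option<(Mode, Option<&str>)> {
    let (name, filter) = match value.split_once(':') {
        Some((name, filter)) => (name, Some(filter)),
        None => (value, None),
    };
    let mode = match name {
        "correct" => Mode::Correct,
        "strict" => Mode::Strict,
        "printfail" => Mode::PrintFail,
        "printall" => Mode::PrintAll,
        "failifrun" => Mode::FailIfRun,
        _ => return None,
    };
    Some((mode, filter))
}

/// Parses the comma-separated filter, one entry per tensor dimension.
fn parse_filter(spec: &str) -> Option<Vec<DimFilter>> {
    spec.split(',')
        .map(|part| {
            if part == "." {
                Some(DimFilter::All)
            } else if let Some((lo, hi)) = part.split_once('-') {
                Some(DimFilter::Range(lo.parse().ok()?, hi.parse().ok()?))
            } else {
                Some(DimFilter::Index(part.parse().ok()?))
            }
        })
        .collect()
}

/// Whether a test is reported as passing, following the mode list above.
fn passes(mode: Mode, outcome: Outcome) -> bool {
    match mode {
        // Invalid configurations are tolerated; only wrong results fail.
        Mode::Correct | Mode::PrintFail => outcome != Outcome::RanIncorrect,
        // The kernel must actually run and be accurate.
        Mode::Strict => outcome == Outcome::RanCorrect,
        // Every test is reported as failing so its tensors get printed.
        Mode::PrintAll => false,
        // Anything that ran fails; only non-running tests succeed.
        Mode::FailIfRun => outcome == Outcome::InvalidConfig,
    }
}
```

Under this reading, passes(Mode::Correct, Outcome::InvalidConfig) is true while passes(Mode::Strict, Outcome::InvalidConfig) is false, which is the false positive that strict mode is meant to rule out.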