cubecl_runtime/tune/mod.rs
//! # Autotuning
//!
//! Autotuning allows running different kernels or comptime parameters to find the fastest one
//! for any given input. Kernels must implement [`Tunable`](crate::tune::Tunable) (see below).
//!
//! # Example
//!
//! ```ignore
//! #[derive(AutotuneKey)]
//! struct KernelKey {
//!     size: u32,
//! }
//!
//! fn run_kernel_tuned(lhs: Tensor, rhs: Tensor) -> Tensor {
//!     // Declared as a local static, so the tuner is shared across calls.
//!     static TUNER: LocalTuner<String, KernelKey> = local_tuner!();
//!
//!     // Candidate kernels; `.ok()` adapts a kernel that does not return a `Result`.
//!     let tunables = TunableSet::new(KernelKey::new, |_key, lhs, rhs| (lhs.clone(), rhs.clone()))
//!         .with_tunable(kernel_1)
//!         .with_tunable(kernel_2.ok())
//!         .with_tunable(kernel_3);
//!
//!     // No trailing semicolon, so the value produced by `execute` is returned.
//!     TUNER.execute("hello".to_string(), &lhs.client, &tunables, (lhs, rhs))
//! }
//! ```
//!
//! # Tunable
//!
//! [`Tunable`](crate::tune::Tunable) is implemented automatically for all functions and closures
//! that take a set of cloneable inputs and return a `Result<Out, impl Into<AutotuneError>>`. If the
//! kernel does not return a [`Result`], call `kernel_fn.ok()` to wrap its output in `Ok` and turn
//! it into a tunable.
//!
//! ## Implementation details
//!
//! To implement `Tunable` for all valid tunable functions, a set of patterns is employed.
//! Tunable functions don't implement `Tunable` directly; they implement `IntoTunable` instead.
//! The reason is that the Rust trait resolver can't detect that traits like `Fn(A, B)` and
//! `Fn(A)` are mutually exclusive, so implementing `Tunable` for both would cause conflicting
//! implementations. To solve this, a `Marker` generic is employed. It stores a dummy type (like
//! `IsFunction`) along with the function pointer type equivalent to the signature (a type, not
//! a trait), which allows the trait resolver to identify the implementations as distinct.
//! However, since different kinds of `Tunable` have different `Marker` generics, the
//! `IntoTunable` trait is needed to erase the marker. This way, only
//! [`TunableSet::with_tunable`](crate::tune::TunableSet::with_tunable) requires the marker as a
//! generic, which it then erases by calling
//! [`IntoTunable::into_tunable`](crate::tune::IntoTunable::into_tunable).
//! The same technique is used for [`KeyGenerator`](crate::tune::KeyGenerator) and
//! [`InputGenerator`](crate::tune::InputGenerator).
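//!
//! The marker technique described above can be sketched in isolation. The block below is a
//! minimal, self-contained illustration and not this crate's actual code: the names `Tunable`,
//! `IntoTunable`, `IsFunction` and `TunableSet` are borrowed from this module, but every
//! definition here (including the hypothetical `FunctionTunable` wrapper and `run_all` helper)
//! is a simplified stand-in assumed for the example.
//!
//! ```rust
//! use std::marker::PhantomData;
//!
//! /// Marker-free trait the rest of the sketch works with (simplified stand-in).
//! trait Tunable<Inputs, Output> {
//!     fn execute(&self, inputs: Inputs) -> Output;
//! }
//!
//! /// Dummy marker type tagging "plain function" tunables.
//! struct IsFunction;
//!
//! /// Conversion trait that carries the disambiguating `Marker`.
//! trait IntoTunable<Inputs, Output, Marker> {
//!     type Tunable: Tunable<Inputs, Output> + 'static;
//!     fn into_tunable(self) -> Self::Tunable;
//! }
//!
//! /// Hypothetical wrapper that records the signature as a function pointer *type*.
//! struct FunctionTunable<F, Sig> {
//!     func: F,
//!     _sig: PhantomData<Sig>,
//! }
//!
//! // `fn(A) -> Out` and `fn(A, B) -> Out` are distinct types, so these impls cannot overlap.
//! impl<F, A, Out> Tunable<A, Out> for FunctionTunable<F, fn(A) -> Out>
//! where
//!     F: Fn(A) -> Out,
//! {
//!     fn execute(&self, a: A) -> Out {
//!         (self.func)(a)
//!     }
//! }
//!
//! impl<F, A, B, Out> Tunable<(A, B), Out> for FunctionTunable<F, fn(A, B) -> Out>
//! where
//!     F: Fn(A, B) -> Out,
//! {
//!     fn execute(&self, (a, b): (A, B)) -> Out {
//!         (self.func)(a, b)
//!     }
//! }
//!
//! // Without the third trait parameter, these two blanket impls over `F` would conflict.
//! impl<F, A, Out> IntoTunable<A, Out, (IsFunction, fn(A) -> Out)> for F
//! where
//!     F: Fn(A) -> Out + 'static,
//!     A: 'static,
//!     Out: 'static,
//! {
//!     type Tunable = FunctionTunable<F, fn(A) -> Out>;
//!     fn into_tunable(self) -> Self::Tunable {
//!         FunctionTunable { func: self, _sig: PhantomData }
//!     }
//! }
//!
//! impl<F, A, B, Out> IntoTunable<(A, B), Out, (IsFunction, fn(A, B) -> Out)> for F
//! where
//!     F: Fn(A, B) -> Out + 'static,
//!     A: 'static,
//!     B: 'static,
//!     Out: 'static,
//! {
//!     type Tunable = FunctionTunable<F, fn(A, B) -> Out>;
//!     fn into_tunable(self) -> Self::Tunable {
//!         FunctionTunable { func: self, _sig: PhantomData }
//!     }
//! }
//!
//! /// Simplified set: stores type-erased tunables, with no marker in its own type.
//! struct TunableSet<Inputs, Output> {
//!     tunables: Vec<Box<dyn Tunable<Inputs, Output>>>,
//! }
//!
//! impl<Inputs: Clone + 'static, Output: 'static> TunableSet<Inputs, Output> {
//!     fn new() -> Self {
//!         Self { tunables: Vec::new() }
//!     }
//!
//!     /// Only this method names `Marker`; it is erased once `into_tunable` returns.
//!     fn with_tunable<T, Marker>(mut self, tunable: T) -> Self
//!     where
//!         T: IntoTunable<Inputs, Output, Marker>,
//!     {
//!         self.tunables.push(Box::new(tunable.into_tunable()));
//!         self
//!     }
//!
//!     /// Hypothetical helper: runs every tunable on a clone of the inputs.
//!     fn run_all(&self, inputs: Inputs) -> Vec<Output> {
//!         self.tunables.iter().map(|t| t.execute(inputs.clone())).collect()
//!     }
//! }
//!
//! fn add(a: i32, b: i32) -> i32 {
//!     a + b
//! }
//!
//! fn sub_pair(pair: (i32, i32)) -> i32 {
//!     pair.0 - pair.1
//! }
//!
//! fn run_sketch() -> Vec<i32> {
//!     TunableSet::<(i32, i32), i32>::new()
//!         .with_tunable(add) // marker inferred as (IsFunction, fn(i32, i32) -> i32)
//!         .with_tunable(sub_pair) // marker inferred as (IsFunction, fn((i32, i32)) -> i32)
//!         .run_all((5, 3))
//! }
//! ```
//!
//! In this sketch, a two-argument function and a one-argument function taking a tuple both
//! convert into tunables over `(i32, i32)` inputs. The marker is inferred at each
//! `with_tunable` call and never appears in the type of the set itself.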
//!
//! The last set of traits are [`AsFunctionTunable`](crate::tune::AsFunctionTunable) and
//! [`AsFunctionTunableResult`](crate::tune::AsFunctionTunableResult). These traits are
//! implemented directly by all tunable functions and let us annotate function-like tunables
//! specifically, for example to override the name or to wrap the return type in `Ok`
//! ([`AsFunctionTunable::ok`](crate::tune::AsFunctionTunable::ok)). They also improve error
//! messages, which is done with
//! [`#[diagnostic::on_unimplemented(...)]`](https://doc.rust-lang.org/reference/attributes/diagnostics.html#the-diagnosticon_unimplemented-attribute).
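//!
//! A rough sketch of how such an extension trait might look is given below. It is an
//! illustration under stated assumptions, not this crate's definition: the trait here is
//! restricted to two-argument functions, `String` stands in for the real error type, and the
//! diagnostic message text and the `mul` and `run_ok_sketch` functions are made up for the
//! example.
//!
//! ```rust
//! /// Extension trait for two-argument functions (simplified stand-in).
//! /// The attribute customizes the compiler message shown when a type is required to
//! /// implement this trait but does not.
//! #[diagnostic::on_unimplemented(
//!     message = "`{Self}` is not a valid tunable in this sketch",
//!     note = "expected a function or closure taking exactly two arguments"
//! )]
//! trait AsFunctionTunable<A, B, Out>: Sized + Fn(A, B) -> Out {
//!     /// Wraps the return value in `Ok`, turning an infallible function into a fallible one.
//!     /// `String` is a placeholder error type assumed for this example.
//!     fn ok(self) -> impl Fn(A, B) -> Result<Out, String> {
//!         move |a: A, b: B| Ok((self)(a, b))
//!     }
//! }
//!
//! // Blanket impl: every two-argument function or closure gets `.ok()`.
//! impl<F, A, B, Out> AsFunctionTunable<A, B, Out> for F where F: Fn(A, B) -> Out {}
//!
//! fn mul(a: i32, b: i32) -> i32 {
//!     a * b
//! }
//!
//! fn run_ok_sketch() -> Result<i32, String> {
//!     let fallible = mul.ok();
//!     fallible(6, 7)
//! }
//! ```
//!
//! Keeping `ok` on a trait implemented directly by functions lets it be called with method
//! syntax on a plain function item, as in `kernel_2.ok()` in the example at the top.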

mod function_tunable;
mod input_generator;
mod key_generator;
mod local;
mod operation;
mod tune_benchmark;
mod tune_cache;
mod tuner;
mod util;

pub use function_tunable::*;
pub use input_generator::*;
pub use key_generator::*;
pub use local::*;
pub use operation::*;
pub use tune_benchmark::AutotuneOutput;
pub use tune_benchmark::*;
pub use tune_cache::*;
pub use tuner::*;
pub use util::*;