cubecl_runtime/tune/mod.rs
//! # Autotuning
//!
//! Autotuning allows running different kernels or comptime parameters to find the fastest one
//! for any given input. Kernels must implement [`TuneFn`](crate::tune::TuneFn) (see below).
//!
//! # Example
//!
//! ```ignore
//! #[derive(AutotuneKey)]
//! struct KernelKey {
//!     size: u32,
//! }
//!
//! fn run_kernel_tuned(lhs: Tensor, rhs: Tensor) -> Tensor {
//!     static TUNER: LocalTuner<String, KernelKey> = local_tuner!();
//!
//!     let tunables = TUNER.init(|| {
//!         TunableSet::new(KernelKey::new, |_key, lhs, rhs| (lhs.clone(), rhs.clone()))
//!             .with(Tunable::new(kernel_1))
//!             // `.ok()` wraps the output of a kernel that does not return a `Result`.
//!             .with(Tunable::new(kernel_2.ok()))
//!             .with(Tunable::new(kernel_3))
//!     });
//!
//!     TUNER.execute("hello".to_string(), &lhs.client, &tunables, (lhs, rhs))
//! }
//! ```
//!
//! # Tunable
//!
//! [`TuneFn`](crate::tune::TuneFn) is implemented automatically for all functions and closures
//! that take a set of cloneable inputs, and return a `Result<Out, impl Into<AutotuneError>>`. If the
//! kernel does not return a [`Result`], use `kernel_fn.ok()` to wrap it in `Ok` and turn it into a
//! tunable.
//!
//! ## Implementation details
//!
//! Several patterns are combined to implement `TuneFn` for all valid tunable functions.
//! Tunable functions don't implement `TuneFn` directly; they implement `IntoTuneFn` instead.
//! The reason is that the Rust trait resolver can't detect that traits like `Fn(A, B)` and
//! `Fn(A)` are mutually exclusive, so implementing `TuneFn` for both would cause conflicting
//! implementations. To solve this, a `Marker` generic is used. It holds a dummy type (like
//! `IsFunction`) together with the function pointer type equivalent to the signature (a type,
//! not a trait), which lets the trait resolver identify the implementations as distinct.
//! However, since different kinds of tunable functions have different `Marker` generics, the
//! `IntoTuneFn` trait is needed to erase the marker. This way, only
//! [`Tunable::new`](crate::tune::Tunable::new) requires the marker as a generic, which it then
//! erases by calling [`IntoTuneFn::into_tunable`](crate::tune::IntoTuneFn::into_tunable).
//! The same technique is used for [`KeyGenerator`](crate::tune::KeyGenerator) and
//! [`InputGenerator`](crate::tune::InputGenerator).
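//!
//! The marker pattern described above can be sketched outside of this crate as follows. This is
//! a simplified, self-contained illustration and not the actual implementation: `Op`, `IntoOp`,
//! `make_op`, `double`, `add` and `run_both` are hypothetical names standing in for the erased
//! tunable, `IntoTuneFn`, `Tunable::new` and two kernels, and inputs are fixed to `i32` for
//! brevity.
//!
//! ```rust
//! // Marker-free value, standing in for the erased tunable (hypothetical).
//! struct Op {
//!     arity: usize,
//!     call: Box<dyn Fn(&[i32]) -> i32>,
//! }
//!
//! impl Op {
//!     fn run(&self, args: &[i32]) -> i32 {
//!         (self.call)(args)
//!     }
//! }
//!
//! // Dummy marker type, playing the role of `IsFunction`.
//! struct IsFunction;
//!
//! // `Marker` is not used by the method; it only keeps the impls below apart.
//! trait IntoOp<Marker> {
//!     fn into_op(self) -> Op;
//! }
//!
//! // One-argument functions. The marker pairs the dummy type with a function pointer type.
//! impl<F> IntoOp<(IsFunction, fn(i32) -> i32)> for F
//! where
//!     F: Fn(i32) -> i32 + 'static,
//! {
//!     fn into_op(self) -> Op {
//!         let func = self;
//!         Op { arity: 1, call: Box::new(move |args: &[i32]| func(args[0])) }
//!     }
//! }
//!
//! // Two-argument functions. A different marker type means a different trait, so no conflict.
//! impl<F> IntoOp<(IsFunction, fn(i32, i32) -> i32)> for F
//! where
//!     F: Fn(i32, i32) -> i32 + 'static,
//! {
//!     fn into_op(self) -> Op {
//!         let func = self;
//!         Op { arity: 2, call: Box::new(move |args: &[i32]| func(args[0], args[1])) }
//!     }
//! }
//!
//! // Only this constructor is generic over the marker; the returned `Op` no longer mentions it.
//! fn make_op<F: IntoOp<Marker>, Marker>(func: F) -> Op {
//!     func.into_op()
//! }
//!
//! fn double(x: i32) -> i32 {
//!     x * 2
//! }
//!
//! fn add(a: i32, b: i32) -> i32 {
//!     a + b
//! }
//!
//! // Both kinds of function end up in the same collection once the marker is erased.
//! fn run_both() -> Vec<(usize, i32)> {
//!     let ops = vec![make_op(double), make_op(add)];
//!     ops.iter().map(|op| (op.arity, op.run(&[3, 4]))).collect()
//! }
//! ```
//!
//! Without the `Marker` parameter, the two blanket implementations in this sketch would be
//! rejected as conflicting, even though no function can satisfy both bounds.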
//!
//! The last set of traits are [`AsFunctionTunable`](crate::tune::AsFunctionTunable) and
//! [`AsFunctionTunableResult`](crate::tune::AsFunctionTunableResult). They are implemented
//! directly by all tunable functions and make it possible to annotate function-like tunables
//! specifically, for example to override the name or to wrap the return type in `Ok`
//! ([`AsFunctionTunable::ok`](crate::tune::AsFunctionTunable::ok)). They also improve error
//! messages by using
//! [`#[diagnostic::on_unimplemented(...)]`](https://doc.rust-lang.org/reference/attributes/diagnostics.html#the-diagnosticon_unimplemented-attribute).
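//!
//! As a rough illustration of how such an attribute can be attached to a trait, here is a
//! minimal sketch. `SketchTunable`, `run_tunable` and `increment` are hypothetical names that do
//! not exist in this crate, the message text is invented for the example, and the attribute
//! requires Rust 1.78 or newer.
//!
//! ```rust
//! // Hypothetical trait; the attribute customizes the error shown when it is not implemented.
//! #[diagnostic::on_unimplemented(
//!     message = "`{Self}` is not a valid tunable in this sketch",
//!     label = "expected a function or closure of the form `Fn(i32) -> i32`",
//!     note = "wrap other values in a closure before passing them to `run_tunable`"
//! )]
//! trait SketchTunable {
//!     fn call(&self, input: i32) -> i32;
//! }
//!
//! // Implemented directly for every matching function or closure.
//! impl<F: Fn(i32) -> i32> SketchTunable for F {
//!     fn call(&self, input: i32) -> i32 {
//!         (self)(input)
//!     }
//! }
//!
//! fn run_tunable(tunable: impl SketchTunable, input: i32) -> i32 {
//!     tunable.call(input)
//! }
//!
//! fn increment(x: i32) -> i32 {
//!     x + 1
//! }
//! ```
//!
//! In this sketch `run_tunable(increment, 41)` compiles, while passing a value such as a string
//! literal is rejected at compile time with the custom message instead of the generic
//! "trait bound is not satisfied" error.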

mod base;
mod function_tunable;
mod input_generator;
mod key_generator;
mod local;
mod operation;
mod tune_benchmark;
mod tune_cache;
mod tuner;
mod util;

pub use base::*;
pub use function_tunable::*;
pub use input_generator::*;
pub use key_generator::*;
pub use local::*;
pub use operation::*;
pub use tune_benchmark::AutotuneOutput;
pub use tune_benchmark::*;
pub use tune_cache::*;
pub use tuner::*;
pub use util::*;