Autotune module
§Autotuning
Autotuning runs several candidate kernels on reference inputs and caches the fastest one per key.
#[derive(AutotuneKey)]
struct KernelKey { size: u32 }

fn run_kernel_tuned(lhs: Tensor, rhs: Tensor) -> Tensor {
    static TUNER: LocalTuner<String, KernelKey> = local_tuner!();
    let tunables = TUNER.init(|| {
        TunableSet::new(KernelKey::new, |_key, (lhs, rhs)| (lhs.clone(), rhs.clone()))
            .with(Tunable::new("k1", |(lhs, rhs)| kernel_1(lhs, rhs)))
            .with(Tunable::new("k2", |(lhs, rhs)| kernel_2(lhs, rhs)))
    });
    // No trailing semicolon: the tuned output is the function's return value.
    TUNER.execute(&device_id, &lhs.client, tunables, (lhs, rhs))
}

Kernels are closures returning Result<Out, impl Into<String>>. Multi-input kernels
take a single tuple argument and destructure: |(lhs, rhs, out)| body.
See [TuneInputs] for the borrowed-inputs story, and [Tunable::new] for why its
HRTB bound is spelled out directly (closure inference).
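
The closure shape described above can be illustrated without the crate itself. The following is a minimal sketch using only the standard library: `Tensor`, `add_kernel`, `call_kernel`, and `demo_add` are hypothetical names invented for this illustration and are not part of the module's API. It only shows a closure that takes one tuple argument, destructures it in the parameter list, and returns a `Result` whose error converts into a `String`.

```rust
// Hypothetical stand-in for a tensor type; not part of the crate.
#[derive(Clone, Debug, PartialEq)]
struct Tensor(Vec<f32>);

// Hypothetical kernel: element-wise addition that fails on mismatched lengths.
fn add_kernel(lhs: &Tensor, rhs: &Tensor) -> Result<Tensor, &'static str> {
    if lhs.0.len() != rhs.0.len() {
        return Err("length mismatch");
    }
    Ok(Tensor(lhs.0.iter().zip(&rhs.0).map(|(a, b)| a + b).collect()))
}

// Hypothetical caller accepting any closure of the described shape:
// one tuple argument in, `Result<Out, E>` out, with `E: Into<String>`.
fn call_kernel<In, Out, E, F>(kernel: F, inputs: In) -> Result<Out, String>
where
    F: Fn(In) -> Result<Out, E>,
    E: Into<String>,
{
    kernel(inputs).map_err(Into::into)
}

// Usage: the closure destructures its single tuple argument in the parameter list.
fn demo_add(lhs: Tensor, rhs: Tensor) -> Result<Tensor, String> {
    call_kernel(|(lhs, rhs): (Tensor, Tensor)| add_kernel(&lhs, &rhs), (lhs, rhs))
}
```

Because the error type is only required to convert into a `String`, a kernel in this sketch can return a `&'static str`, a `String`, or any other type with that conversion.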
Macros§
- local_tuner - Create a local tuner with the provided name.
Structs§
- AutotuneOutcome - The measured outcome for a given autotune invocation.
- AutotuneResult - The result of an autotune job.
- CloneInputGenerator - InputGenerator that clones the reference inputs verbatim.
- LocalTuner - A local tuner lets you create a tuner for a specific key that can differ from the server key.
- Tunable - A single candidate for autotune: a named TuneFn plus the groups it belongs to. A tunable is autotuned whenever any of its groups is prioritized.
- TunableSet - A set of candidate tunable functions for autotune, sharing a key generator and an input generator. See TuneInputs for the F parameter.
- TuneFn - A named, type-erased tunable function stored in a TunableSet. Constructed via Tunable::new; callers don't name this type directly.
- TuneGroup - A priority bucket for tunables, computed from the autotune key.
- Tuner - Runs autotune benchmarks for a single device and caches the results.
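
The benchmark-then-cache behavior described for the tuner can be sketched in a few lines of standard-library Rust. This is not the crate's implementation: `MiniTuner` and `demo_tuner` are hypothetical names, timing uses a single wall-clock run per candidate, and there is no device or persistent cache involved. It only illustrates the idea of running every candidate once on cloned reference inputs, remembering the fastest index per key, and reusing it on later calls.

```rust
use std::cell::Cell;
use std::collections::HashMap;
use std::hash::Hash;
use std::time::{Duration, Instant};

/// Hypothetical miniature tuner; the name and layout are illustrative only.
struct MiniTuner<K> {
    fastest: HashMap<K, usize>,
}

impl<K: Hash + Eq> MiniTuner<K> {
    fn new() -> Self {
        Self { fastest: HashMap::new() }
    }

    /// Runs the cached candidate for `key`, benchmarking all candidates first
    /// if the key has not been seen before.
    fn execute<In: Clone, Out>(
        &mut self,
        key: K,
        candidates: &[&dyn Fn(In) -> Out],
        input: In,
    ) -> Out {
        if let Some(&index) = self.fastest.get(&key) {
            return candidates[index](input);
        }
        let mut best: Option<(usize, Duration)> = None;
        for (index, candidate) in candidates.iter().enumerate() {
            let reference = input.clone(); // benchmark on a copy of the inputs
            let start = Instant::now();
            let _ = candidate(reference);
            let elapsed = start.elapsed();
            if best.map_or(true, |(_, fastest)| elapsed < fastest) {
                best = Some((index, elapsed));
            }
        }
        let (index, _) = best.expect("at least one candidate is required");
        self.fastest.insert(key, index);
        candidates[index](input)
    }
}

/// Returns both results plus the number of candidate calls made by the second,
/// cached invocation.
fn demo_tuner() -> (u64, u64, usize) {
    let calls = Cell::new(0usize);
    let loop_sum = |n: u64| {
        calls.set(calls.get() + 1);
        (1..=n).sum::<u64>()
    };
    let closed_form = |n: u64| {
        calls.set(calls.get() + 1);
        n * (n + 1) / 2
    };
    let candidates: [&dyn Fn(u64) -> u64; 2] = [&loop_sum, &closed_form];
    let mut tuner = MiniTuner::new();
    let first = tuner.execute("sum", &candidates[..], 1000);
    let calls_after_first = calls.get();
    let second = tuner.execute("sum", &candidates[..], 1000);
    (first, second, calls.get() - calls_after_first)
}
```

In this sketch the first call for a key pays for one run of every candidate plus the real run, while later calls with the same key run only the cached winner.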
Enums§
- AutotuneError - Error from running autotune.
- TuneCacheResult - Result of a cache lookup attempt.
Traits§
- AutotuneKey - Trait alias with support for persistent caching.
- AutotuneOutput - The trait to be implemented by an autotune output.
- InputGenerator - Produces the benchmark inputs for a given key and reference inputs.
- KeyGenerator - Produces an autotune key from a borrowed view of the tuning inputs.
- TuneInputs - Describes the set of inputs a TunableSet accepts.
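
The two generator roles listed above can be pictured as a pair of plain functions. The sketch below does not implement the crate's traits, whose exact signatures are not shown on this page. `KernelKey` mirrors the key in the module example, while `Inputs`, `make_key`, and `make_bench_inputs` are hypothetical names used only for illustration.

```rust
// Hypothetical key type mirroring the `KernelKey` from the module example.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct KernelKey {
    size: u32,
}

// Hypothetical input bundle: two flat buffers standing in for tensors.
type Inputs = (Vec<f32>, Vec<f32>);

// Key-generator role: derive the key from a borrowed view of the inputs.
fn make_key(inputs: &Inputs) -> KernelKey {
    KernelKey { size: inputs.0.len() as u32 }
}

// Input-generator role: build benchmark inputs from the key and the
// reference inputs. This version clones the reference inputs verbatim.
fn make_bench_inputs(_key: &KernelKey, reference: &Inputs) -> Inputs {
    (reference.0.clone(), reference.1.clone())
}
```

Cloning verbatim is the simplest choice. An input generator could also use the key to synthesize inputs of a representative size instead.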
Functions§
- anchor - Anchor a number to a power of the provided base.
- tune_benchmark - Benchmark how long this operation takes for a number of samples.
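
To make "anchoring to a power of a base" concrete, here is a minimal sketch of one plausible interpretation: rounding a value up to the nearest power of the base. `anchor_up` is a hypothetical helper, not the crate's `anchor` function, and both the rounding direction and the two-argument signature are assumptions. Such a helper would presumably let nearby problem sizes share one autotune key.

```rust
/// Hypothetical helper: rounds `x` up to the nearest power of `base`.
/// The rounding direction is an assumption; the crate's `anchor` may differ.
fn anchor_up(x: usize, base: usize) -> usize {
    assert!(base >= 2, "base must be at least 2");
    let mut power = 1usize;
    while power < x {
        // Saturate instead of overflowing for very large inputs.
        power = power.saturating_mul(base);
    }
    power
}
```

Under this interpretation, sizes 100 and 120 with base 2 both anchor to 128, so they would map to the same key, while 129 anchors to 256.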