
Module tune

Autotune module

Autotuning

Autotuning runs several candidate kernels on reference inputs and caches the fastest one per key.
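The caching idea can be sketched with the standard library alone. This is not this module's API: `Candidate`, `SketchTuner`, `execute`, and `best_name` are hypothetical names. Each candidate is timed once with `Instant` on the caller's inputs, and there is no device handle, input generator, or persistent cache.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical candidate kernel: a name plus an infallible function of the inputs.
struct Candidate<In, Out> {
    name: &'static str,
    run: Box<dyn Fn(&In) -> Out>,
}

impl<In, Out> Candidate<In, Out> {
    fn new(name: &'static str, run: impl Fn(&In) -> Out + 'static) -> Self {
        Self { name, run: Box::new(run) }
    }
}

/// Hypothetical per-key tuner: benchmarks every candidate on a cache miss and
/// remembers the index of the fastest one for that key.
struct SketchTuner<K, In, Out> {
    candidates: Vec<Candidate<In, Out>>,
    cache: HashMap<K, usize>,
    benchmarks_run: usize,
}

impl<K: Eq + Hash, In, Out> SketchTuner<K, In, Out> {
    fn new(candidates: Vec<Candidate<In, Out>>) -> Self {
        assert!(!candidates.is_empty(), "need at least one candidate");
        Self { candidates, cache: HashMap::new(), benchmarks_run: 0 }
    }

    fn execute(&mut self, key: K, inputs: &In) -> Out {
        // Cache hit: run the previously selected candidate directly.
        if let Some(index) = self.cache.get(&key).copied() {
            return (self.candidates[index].run)(inputs);
        }
        // Cache miss: time each candidate once and keep the fastest.
        let mut best = 0;
        let mut best_time = Duration::MAX;
        for (index, candidate) in self.candidates.iter().enumerate() {
            let start = Instant::now();
            black_box((candidate.run)(inputs));
            let elapsed = start.elapsed();
            self.benchmarks_run += 1;
            if elapsed < best_time {
                best_time = elapsed;
                best = index;
            }
        }
        self.cache.insert(key, best);
        (self.candidates[best].run)(inputs)
    }

    fn best_name(&self, key: &K) -> Option<&'static str> {
        self.cache.get(key).map(|&index| self.candidates[index].name)
    }
}

fn main() {
    let mut tuner: SketchTuner<usize, Vec<f32>, f32> = SketchTuner::new(vec![
        Candidate::new("iter_sum", |xs: &Vec<f32>| -> f32 { xs.iter().sum() }),
        Candidate::new("fold_sum", |xs: &Vec<f32>| -> f32 { xs.iter().fold(0.0, |acc, x| acc + x) }),
    ]);
    let data = vec![1.0_f32; 1024];
    let first = tuner.execute(data.len(), &data); // miss: benchmarks both candidates
    let second = tuner.execute(data.len(), &data); // hit: runs the cached winner
    println!("{first} {second} {:?} ({} runs)", tuner.best_name(&data.len()), tuner.benchmarks_run);
}
```

Timing a single run is noisy; a production tuner would typically take several samples per candidate before choosing.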

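#[derive(AutotuneKey)]
struct KernelKey { size: u32 }

// `device_id` identifies the device whose results the tuner caches.
fn run_kernel_tuned(device_id: String, lhs: Tensor, rhs: Tensor) -> Tensor {
    // A `static` tuner shared by every call: key type first, device-id type second.
    static TUNER: LocalTuner<KernelKey, String> = local_tuner!();

    // Build the candidate set once, on first use.
    let tunables = TUNER.init(|| {
        // Key generator, then an input generator that clones the reference inputs.
        TunableSet::new(KernelKey::new, |_key, (lhs, rhs)| (lhs.clone(), rhs.clone()))
            .with(Tunable::new("k1", |(lhs, rhs)| kernel_1(lhs, rhs)))
            .with(Tunable::new("k2", |(lhs, rhs)| kernel_2(lhs, rhs)))
    });

    // Take the client before `lhs` is moved into the inputs tuple.
    let client = lhs.client.clone();
    // Benchmarks on a cache miss, otherwise runs the cached fastest kernel.
    TUNER.execute(&device_id, &client, tunables, (lhs, rhs))
}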

Kernels are closures returning Result<Out, impl Into<String>>. Multi-input kernels take a single tuple argument and destructure: |(lhs, rhs, out)| body.
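One way to accept closures with any error type that converts into `String` is to erase the error at construction time. The sketch below uses a hypothetical `ErasedFn` wrapper (it is not `Tunable` or `TuneFn`) and shows a two-input kernel that takes a single tuple and destructures it in the closure pattern.

```rust
/// Hypothetical type-erased tunable: the concrete error type is converted
/// to `String` when the wrapper is built.
struct ErasedFn<In, Out> {
    name: &'static str,
    f: Box<dyn Fn(In) -> Result<Out, String>>,
}

impl<In: 'static, Out: 'static> ErasedFn<In, Out> {
    fn new<E, F>(name: &'static str, f: F) -> Self
    where
        E: Into<String> + 'static,
        F: Fn(In) -> Result<Out, E> + 'static,
    {
        // Wrap the user closure so every error becomes a String.
        let erased = move |inputs: In| -> Result<Out, String> { f(inputs).map_err(Into::into) };
        Self { name, f: Box::new(erased) }
    }

    fn call(&self, inputs: In) -> Result<Out, String> {
        (self.f)(inputs)
    }
}

/// A two-input kernel: the closure takes one tuple and destructures it.
fn elementwise_add() -> ErasedFn<(Vec<f32>, Vec<f32>), Vec<f32>> {
    ErasedFn::new("elementwise_add", |(lhs, rhs): (Vec<f32>, Vec<f32>)| {
        if lhs.len() != rhs.len() {
            return Err("shape mismatch"); // &str implements Into<String>
        }
        Ok(lhs.iter().zip(&rhs).map(|(a, b)| a + b).collect::<Vec<f32>>())
    })
}

fn main() {
    let kernel = elementwise_add();
    println!("{}: {:?}", kernel.name, kernel.call((vec![1.0, 2.0], vec![3.0, 4.0])));
    println!("{}: {:?}", kernel.name, kernel.call((vec![1.0], vec![])));
}
```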

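See TuneInputs for how borrowed inputs are handled, and Tunable::new for why its higher-ranked trait bound (HRTB) is written out directly: rustc only deduces a closure's higher-ranked signature from an Fn bound it can see at the call site.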
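The closure-inference point can be shown in plain Rust. In this hypothetical `BorrowingCandidate`, the bound `for<'a> Fn(&'a [f32]) -> &'a f32` sits directly on `new`, so rustc deduces the closure's full signature, including the tie between the input and output lifetimes, and `|xs| ...` needs no annotation. If the bound were hidden behind a helper trait with a blanket impl, rustc would not use it for closure signature deduction and the same closure would typically fail to compile.

```rust
/// Hypothetical candidate that reads borrowed inputs and returns a value
/// borrowed from them.
struct BorrowingCandidate {
    f: Box<dyn for<'a> Fn(&'a [f32]) -> &'a f32>,
}

impl BorrowingCandidate {
    /// The higher-ranked bound is spelled out here, on the function the
    /// closure is passed to, so closure signature deduction can see it.
    fn new<F>(f: F) -> Self
    where
        F: for<'a> Fn(&'a [f32]) -> &'a f32,
        F: 'static,
    {
        Self { f: Box::new(f) }
    }

    fn run<'a>(&self, xs: &'a [f32]) -> &'a f32 {
        (self.f)(xs)
    }
}

fn main() {
    // No parameter annotation: the signature comes from `new`'s bound.
    // (Panics on an empty slice because of `unwrap`.)
    let largest = BorrowingCandidate::new(|xs| xs.iter().max_by(|a, b| a.total_cmp(b)).unwrap());
    let data = [3.0_f32, 7.5, 1.0];
    println!("{}", largest.run(&data));
}
```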

Macros

local_tuner
Create a local tuner with the provided name.
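A name-qualifying macro of this kind might look like the following. `NamedTuner` and `named_tuner!` are hypothetical; the sketch only assumes that prefixing the name with `module_path!()` keeps tuners declared in different modules distinct, and that calling the macro with no name falls back to the module path alone.

```rust
/// Hypothetical stand-in for a local tuner; only its name is modeled.
struct NamedTuner {
    name: &'static str,
}

impl NamedTuner {
    const fn new(name: &'static str) -> Self {
        Self { name }
    }
}

/// Hypothetical macro: qualifies the tuner's name with the calling module's path.
macro_rules! named_tuner {
    () => {
        NamedTuner::new(module_path!())
    };
    ($name:literal) => {
        NamedTuner::new(concat!(module_path!(), "::", $name))
    };
}

static MATMUL_TUNER: NamedTuner = named_tuner!("matmul");
static DEFAULT_TUNER: NamedTuner = named_tuner!();

fn main() {
    println!("{} / {}", MATMUL_TUNER.name, DEFAULT_TUNER.name);
}
```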

Structs

AutotuneOutcome
The measured outcome for a given autotune invocation.
AutotuneResult
The result of an autotune job.
CloneInputGenerator
InputGenerator that clones the reference inputs verbatim.
LocalTuner
A local tuner lets you create a tuner for a specific key type, which can differ from the server's key type.
Tunable
A single candidate for autotune: a named TuneFn plus the groups it belongs to. A tunable is autotuned whenever any of its groups is prioritized.
TunableSet
A set of candidate tunable functions for autotune, sharing a key generator and an input generator. See TuneInputs for the F parameter.
TuneFn
A named, type-erased tunable function stored in a TunableSet. Constructed via Tunable::new; callers don’t name this type directly.
TuneGroup
A priority bucket for tunables, computed from the autotune key.
Tuner
Runs autotune benchmarks for a single device and caches the results.
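The group rule from the Tunable and TuneGroup entries can be sketched as a selection function. What "prioritized" means is an assumption here: every group computes an integer priority from the key, the groups sharing the highest priority count as prioritized, and a tunable is benchmarked if any of its groups is among them. `Group`, `Entry`, and `select` are hypothetical names.

```rust
/// Hypothetical priority bucket: its priority is a function of the autotune key.
struct Group<K> {
    priority: fn(&K) -> i32,
}

/// Hypothetical tunable entry: a name plus the indices of the groups it belongs to.
struct Entry {
    name: &'static str,
    groups: Vec<usize>,
}

fn select<K>(groups: &[Group<K>], entries: &[Entry], key: &K) -> Vec<&'static str> {
    // Evaluate every group's priority for this key.
    let priorities: Vec<i32> = groups.iter().map(|g| (g.priority)(key)).collect();
    // Assumption: the groups sharing the highest priority are the prioritized ones.
    let Some(&top) = priorities.iter().max() else {
        return Vec::new();
    };
    // A tunable is selected if ANY of its groups is prioritized.
    entries
        .iter()
        .filter(|e| e.groups.iter().any(|&g| priorities[g] == top))
        .map(|e| e.name)
        .collect()
}

fn demo_groups() -> Vec<Group<usize>> {
    vec![
        Group { priority: |size: &usize| if *size < 256 { 1 } else { 0 } },  // "small"
        Group { priority: |size: &usize| if *size >= 256 { 1 } else { 0 } }, // "large"
    ]
}

fn demo_entries() -> Vec<Entry> {
    vec![
        Entry { name: "simple", groups: vec![0] },
        Entry { name: "tiled", groups: vec![1] },
        Entry { name: "generic", groups: vec![0, 1] },
    ]
}

fn main() {
    println!("{:?}", select(&demo_groups(), &demo_entries(), &64));
    println!("{:?}", select(&demo_groups(), &demo_entries(), &1024));
}
```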

Enums

AutotuneError
Error from running autotune.
TuneCacheResult
Result of a cache lookup.
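A cache lookup and the error path can be sketched together. The variants below (`Hit`, `Miss`, `AllCandidatesFailed`) are hypothetical and are not this module's actual enums. The sketch assumes that a candidate returning `Err` is skipped, that tuning fails only when every candidate fails, and that a failed tuning run is not cached; plain `f64` costs stand in for measured timings.

```rust
use std::collections::HashMap;

/// Hypothetical cache-lookup outcome.
#[derive(Debug, PartialEq)]
enum CacheLookup {
    Hit(usize),
    Miss,
}

/// Hypothetical error: every candidate failed; each failure message is kept.
#[derive(Debug, PartialEq)]
enum TuneError {
    AllCandidatesFailed(Vec<String>),
}

fn lookup(cache: &HashMap<u32, usize>, key: u32) -> CacheLookup {
    match cache.get(&key) {
        Some(&index) => CacheLookup::Hit(index),
        None => CacheLookup::Miss,
    }
}

/// Picks the lowest-cost candidate among those that succeeded.
fn pick_fastest(results: &[Result<f64, String>]) -> Result<usize, TuneError> {
    let mut best: Option<(usize, f64)> = None;
    let mut errors = Vec::new();
    for (index, result) in results.iter().enumerate() {
        match result {
            Ok(cost) => {
                if best.map_or(true, |(_, best_cost)| *cost < best_cost) {
                    best = Some((index, *cost));
                }
            }
            Err(message) => errors.push(message.clone()),
        }
    }
    best.map(|(index, _)| index).ok_or(TuneError::AllCandidatesFailed(errors))
}

/// Returns the cached winner, or measures, picks, and caches on a miss.
fn resolve(
    cache: &mut HashMap<u32, usize>,
    key: u32,
    measure: impl FnOnce() -> Vec<Result<f64, String>>,
) -> Result<usize, TuneError> {
    match lookup(cache, key) {
        CacheLookup::Hit(index) => Ok(index),
        CacheLookup::Miss => {
            let best = pick_fastest(&measure())?; // failures are not cached
            cache.insert(key, best);
            Ok(best)
        }
    }
}

fn main() {
    let mut cache = HashMap::new();
    let measured = || vec![Err("unsupported".to_string()), Ok(2.0), Ok(1.5)];
    println!("{:?}", resolve(&mut cache, 7, measured));
    println!("{:?}", lookup(&cache, 7));
}
```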

Traits

AutotuneKey
Trait alias for autotune keys, with support for persistent caching.
AutotuneOutput
The trait to be implemented by an autotune output.
InputGenerator
Produces the benchmark inputs for a given key and reference inputs.
KeyGenerator
Produces an autotune key from a borrowed view of the tuning inputs.
TuneInputs
Describes the set of inputs a TunableSet accepts.
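The generator traits can be sketched as small traits with closure-friendly impls. `KeyGen`, `InputGen`, `CloneInputs`, and `SizeKey` are hypothetical stand-ins for KeyGenerator, InputGenerator, CloneInputGenerator, and a user-defined key; the blanket impl lets any `Fn(&In) -> K` closure serve as a key generator.

```rust
/// Hypothetical key-generator trait: derives a key from borrowed inputs.
trait KeyGen<In, K> {
    fn key(&self, inputs: &In) -> K;
}

/// Blanket impl: any `Fn(&In) -> K` closure is a key generator.
impl<In, K, F: Fn(&In) -> K> KeyGen<In, K> for F {
    fn key(&self, inputs: &In) -> K {
        self(inputs)
    }
}

/// Hypothetical input-generator trait: builds benchmark inputs for a key.
trait InputGen<K, In> {
    fn generate(&self, key: &K, reference: &In) -> In;
}

/// Clones the reference inputs verbatim and ignores the key.
struct CloneInputs;

impl<K, In: Clone> InputGen<K, In> for CloneInputs {
    fn generate(&self, _key: &K, reference: &In) -> In {
        reference.clone()
    }
}

/// Hypothetical autotune key.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct SizeKey {
    size: usize,
}

fn main() {
    let inputs = (vec![1.0_f32, 2.0, 3.0], vec![4.0_f32, 5.0, 6.0]);
    // The key only looks at the left-hand side's length.
    let key_gen = |(lhs, _rhs): &(Vec<f32>, Vec<f32>)| SizeKey { size: lhs.len() };
    let key = key_gen.key(&inputs);
    let bench_inputs = CloneInputs.generate(&key, &inputs);
    println!("{} {:?}", key.size, bench_inputs);
}
```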

Functions

anchor
Anchor a number to a power of the provided base.
tune_benchmark
Benchmark how long this operation takes for a number of samples.
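Both helpers are simple to sketch. `anchor_sketch` rounds up to the smallest power of `base` that is at least `x`, so nearby sizes share one key; rounding up (rather than to the nearest power, or with min/max clamping) is an assumption. `benchmark_sketch` times a CPU closure for a fixed number of samples; timing asynchronous GPU work would additionally need a device synchronization, which this sketch leaves out. Both names are hypothetical.

```rust
use std::time::{Duration, Instant};

/// Rounds `x` up to the smallest power of `base` that is >= `x` (1 for x <= 1).
/// Saturates at u64::MAX instead of overflowing for very large inputs.
fn anchor_sketch(x: u64, base: u64) -> u64 {
    assert!(base >= 2, "base must be at least 2");
    let mut power: u64 = 1;
    while power < x {
        power = power.saturating_mul(base);
    }
    power
}

/// Times `op` once per sample and returns the individual wall-clock durations.
fn benchmark_sketch<F: FnMut()>(mut op: F, samples: usize) -> Vec<Duration> {
    (0..samples)
        .map(|_| {
            let start = Instant::now();
            op();
            start.elapsed()
        })
        .collect()
}

/// Middle sample (upper median for even counts), or None if there are no samples.
fn median(mut samples: Vec<Duration>) -> Option<Duration> {
    samples.sort();
    samples.get(samples.len() / 2).copied()
}

fn main() {
    println!("{} {}", anchor_sketch(100, 2), anchor_sketch(11, 10));
    let data: Vec<u64> = (0..10_000).collect();
    let timings = benchmark_sketch(|| { std::hint::black_box(data.iter().sum::<u64>()); }, 10);
    println!("{:?}", median(timings));
}
```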