Module compile_cache


Shape-bucketed compile cache.

Lets variable-shape callers (e.g., embedding-model wrappers whose batch size and sequence length vary per request) amortize the per-shape compile cost. Cache keys are caller-provided u64 values, so the caller decides what counts as a shape bucket. Typical recipe: ((batch as u64) << 32) | seq as u64, which is collision-free as long as both dimensions fit in 32 bits.
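The key recipe above can be wrapped in a small helper that refuses dimensions too wide for their 32-bit field. This is a minimal sketch: shape_key and unpack_shape_key are hypothetical names chosen for illustration, not part of this module.

```rust
/// Hypothetical helper (not part of this module): pack (batch, seq) into one
/// u64 key, with batch in the high 32 bits and seq in the low 32 bits.
/// Returns None when either dimension does not fit in 32 bits, because a
/// wider value would spill into the other field and alias a different shape.
fn shape_key(batch: usize, seq: usize) -> Option<u64> {
    let b = u32::try_from(batch).ok()?;
    let s = u32::try_from(seq).ok()?;
    Some((u64::from(b) << 32) | u64::from(s))
}

/// Inverse of `shape_key`: recover (batch, seq) from a packed key.
fn unpack_shape_key(key: u64) -> (u32, u32) {
    ((key >> 32) as u32, key as u32)
}
```

Callers that want coarser buckets can round seq up (for example, to the next multiple of 64) before packing, so that nearby sequence lengths share one compiled entry.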

The cache stores one CompiledGraph per key. Params loaded onto a cached entry persist for that entry, so re-fetching it from the cache does not require re-running set_param. Eviction is FIFO, capped at capacity entries: once the cache is full, inserting a new key evicts the oldest-inserted entry, regardless of how recently it was used. This is good enough for the current “a handful of common shapes” usage pattern; switch to LRU if a real workload shows churn.
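The FIFO policy described above can be sketched with a map plus an insertion-order queue. This is an illustrative stand-in, not the module's implementation: FifoCache and get_or_insert_with are hypothetical names, and the generic V plays the role of a compiled graph.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical stand-in for the real cache; `V` plays the role of a
/// compiled graph. Not the module's actual implementation.
struct FifoCache<V> {
    capacity: usize,
    entries: HashMap<u64, V>,
    /// Keys in insertion order; the front is the oldest entry.
    order: VecDeque<u64>,
}

impl<V> FifoCache<V> {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be at least 1");
        Self { capacity, entries: HashMap::new(), order: VecDeque::new() }
    }

    /// Return the entry for `key`, calling `build` only on a miss.
    /// A hit does not refresh the entry's position (FIFO, not LRU).
    fn get_or_insert_with(&mut self, key: u64, build: impl FnOnce() -> V) -> &mut V {
        if !self.entries.contains_key(&key) {
            // Full: drop the oldest-inserted entry before adding a new one.
            if self.entries.len() >= self.capacity {
                if let Some(oldest) = self.order.pop_front() {
                    self.entries.remove(&oldest);
                }
            }
            self.order.push_back(key);
            self.entries.insert(key, build());
        }
        self.entries.get_mut(&key).expect("entry was just ensured")
    }

    fn contains(&self, key: u64) -> bool {
        self.entries.contains_key(&key)
    }
}
```

Because the method hands back a mutable reference to the stored value, any state set on an entry stays with it until eviction. Note that a hit on the oldest key does not protect it from being evicted next, which is the behavior an LRU policy would change.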

Example

let mut cache = CompileCache::new(Device::Metal, 8);
let key = ((batch as u64) << 32) | seq as u64;
let mut compiled = cache.get_or_compile(key, || build_my_graph(batch, seq));
// First call for `key`: compiles. Subsequent calls: cache hit.
compiled.run(&[("x", &input_data)]);

Structs

BucketedCompileCache
CacheRunInput: Named runtime input for BucketedCompileCache::run_padded_mixed.
CompileCache
DynamicDimCompileCache: Compile-once / specialize-at-runtime cache for symbolic HIR modules.

Functions

pad_rows: Pad data (interpreted as [actual, inner] row-major) up to upper rows by appending zeros. Returns a Vec<f32> of length upper * inner. Companion of slice_rows for the “compile at max, run at less” workflow with BucketedCompileCache.
slice_rows: Slice data (interpreted as [upper, inner] row-major) down to actual rows. Companion of pad_rows.
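The padding and slicing behavior described above can be sketched as follows. The real signatures of pad_rows and slice_rows are not shown on this page, so the parameter order here is an assumption, and the functions carry a _sketch suffix to mark them as illustrations.

```rust
/// Sketch of the described padding: treat `data` as [actual, inner] row-major
/// and append zero rows until there are `upper` rows.
/// The parameter order is an assumption, not the module's real signature.
fn pad_rows_sketch(data: &[f32], actual: usize, inner: usize, upper: usize) -> Vec<f32> {
    assert_eq!(data.len(), actual * inner, "data must hold actual * inner values");
    assert!(actual <= upper, "upper must be at least actual");
    let mut out = Vec::with_capacity(upper * inner);
    out.extend_from_slice(data);
    // Row-major layout means the padding rows are simply trailing zeros.
    out.resize(upper * inner, 0.0);
    out
}

/// Sketch of the companion: treat `data` as [upper, inner] row-major and keep
/// only the first `actual` rows.
fn slice_rows_sketch(data: &[f32], upper: usize, inner: usize, actual: usize) -> Vec<f32> {
    assert_eq!(data.len(), upper * inner, "data must hold upper * inner values");
    assert!(actual <= upper, "actual must not exceed upper");
    data[..actual * inner].to_vec()
}
```

In the “compile at max, run at less” workflow, the input would be padded to the compiled upper bound before the run and the output sliced back to the actual row count afterwards.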