Module model_cache


In-process model cache for GGUF files.

Avoids reloading model weights for each request by keeping a bounded set of ModelEntry values in a ModelCache. The cache uses LRU-like eviction (evict the entry with the longest idle time) when the slot limit is reached.
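The eviction rule described above can be sketched as follows. This is a minimal illustration, not the crate's implementation: `SketchCache`, `SketchEntry`, and `get_or_load` are hypothetical names, the sketch is single-threaded (the real `ModelCache` is documented as thread-safe), and a logical tick counter stands in for whatever idle-time measurement the real cache uses.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a cached entry (not the crate's `ModelEntry`).
struct SketchEntry {
    /// Logical tick of the most recent access. Assumption: the real cache
    /// presumably records a timestamp instead of a counter.
    last_used: u64,
}

/// Hypothetical bounded cache that evicts the longest-idle entry.
struct SketchCache {
    max_slots: usize,
    tick: u64,
    entries: HashMap<String, SketchEntry>,
}

impl SketchCache {
    fn new(max_slots: usize) -> Self {
        // Keep at least one slot so an insert always has room after eviction.
        Self { max_slots: max_slots.max(1), tick: 0, entries: HashMap::new() }
    }

    /// Touches `key`, inserting it on a miss. Returns the key that was
    /// evicted to make room, if any.
    fn get_or_load(&mut self, key: &str) -> Option<String> {
        self.tick += 1;
        let now = self.tick;

        // Cache hit: refresh the access tick and keep everything in place.
        if let Some(entry) = self.entries.get_mut(key) {
            entry.last_used = now;
            return None;
        }

        // Cache miss with all slots taken: evict the entry with the oldest
        // access tick, i.e. the one that has been idle the longest.
        let mut evicted = None;
        if self.entries.len() >= self.max_slots {
            let victim = self
                .entries
                .iter()
                .min_by_key(|(_, entry)| entry.last_used)
                .map(|(k, _)| k.clone());
            if let Some(k) = victim {
                self.entries.remove(&k);
                evicted = Some(k);
            }
        }

        // A real cache would load the GGUF weights here; the sketch only
        // records the access.
        self.entries.insert(key.to_string(), SketchEntry { last_used: now });
        evicted
    }

    fn contains(&self, key: &str) -> bool {
        self.entries.contains_key(key)
    }
}
```

With two slots, loading `a.gguf` and `b.gguf`, touching `a.gguf` again, and then loading `c.gguf` evicts `b.gguf`, because it has gone longest without an access. The linear scan for the victim is acceptable when the slot limit is small, which is the usual case for loaded models.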

A companion ModelWarmup helper runs a small number of dummy inference passes on a freshly-loaded engine so that internal caches and JIT paths are primed before the first real request.
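The warmup idea can be sketched in a few lines. Everything here is an assumption made for illustration: the `SketchEngine` trait, the `warm_up` function, and the fixed `"warmup"` prompt are hypothetical and do not reflect the actual `ModelWarmup` or `InferenceEngine` API, which this page does not show.

```rust
/// Hypothetical minimal engine interface (not the crate's `InferenceEngine`).
trait SketchEngine {
    fn infer(&mut self, prompt: &str) -> Result<String, String>;
}

/// Runs `passes` dummy inference calls and discards their outputs.
/// Returns the number of completed passes, or the first error encountered.
fn warm_up<E: SketchEngine>(engine: &mut E, passes: usize) -> Result<usize, String> {
    for _ in 0..passes {
        // The output is irrelevant; the call exists only to exercise the
        // engine's internal paths before real traffic arrives.
        engine.infer("warmup")?;
    }
    Ok(passes)
}

/// Toy engine that counts calls, used only to demonstrate the helper.
struct CountingEngine {
    calls: usize,
}

impl SketchEngine for CountingEngine {
    fn infer(&mut self, prompt: &str) -> Result<String, String> {
        self.calls += 1;
        Ok(format!("echo: {prompt}"))
    }
}
```

Calling `warm_up(&mut engine, 3)` on a fresh `CountingEngine` leaves `engine.calls` at 3. Propagating the first error rather than ignoring it lets the caller decide whether a model that fails during warmup should be cached at all.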

Structs§

ModelCache: Thread-safe in-process model cache.
ModelCacheConfig: Configuration for ModelCache.
ModelCacheStats: Snapshot of cache utilisation metrics, suitable for serialisation to JSON.
ModelEntry: A single cached model entry, storing metadata about a loaded model.
ModelWarmup: Runs a small number of dummy inference passes on a freshly-initialised InferenceEngine to prime internal allocation caches and JIT paths before the first real request arrives.
