Module semantic_cache


Semantic caching layer for LLM inference.

When a new query's cosine similarity to a previously cached query exceeds a threshold, the cached response is returned instead of running the model again, avoiding redundant inference. Queries are embedded with TF-IDF and matched by cosine similarity; entries are evicted in LRU style and expire after a TTL.
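To make the matching step concrete, here is a minimal sketch of TF-IDF embedding plus cosine similarity. It is not this module's implementation. It assumes lowercase tokenization on non-alphanumeric characters, raw term counts as TF, and the smoothed IDF `ln((1 + N) / (1 + df)) + 1`; the `Idf`, `tokenize`, and `cosine` names are hypothetical.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical tokenizer: split on non-alphanumeric chars and lowercase.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Hypothetical IDF table fitted on a small corpus of queries.
struct Idf {
    n: f64,
    table: HashMap<String, f64>,
}

impl Idf {
    fn fit(docs: &[&str]) -> Self {
        let n = docs.len() as f64;
        // Document frequency: in how many documents each term appears.
        let mut df: HashMap<String, f64> = HashMap::new();
        for doc in docs {
            let unique: HashSet<String> = tokenize(doc).into_iter().collect();
            for term in unique {
                *df.entry(term).or_insert(0.0) += 1.0;
            }
        }
        // Smoothed IDF: ln((1 + N) / (1 + df)) + 1.
        let table = df
            .into_iter()
            .map(|(term, d)| (term, ((1.0 + n) / (1.0 + d)).ln() + 1.0))
            .collect();
        Idf { n, table }
    }

    /// IDF weight; terms unseen during fitting are treated as df = 0.
    fn weight(&self, term: &str) -> f64 {
        self.table
            .get(term)
            .copied()
            .unwrap_or_else(|| (1.0 + self.n).ln() + 1.0)
    }

    /// Sparse TF-IDF vector: term count multiplied by IDF weight.
    fn embed(&self, text: &str) -> HashMap<String, f64> {
        let mut v: HashMap<String, f64> = HashMap::new();
        for t in tokenize(text) {
            *v.entry(t).or_insert(0.0) += 1.0;
        }
        for (t, w) in v.iter_mut() {
            *w *= self.weight(t);
        }
        v
    }
}

/// Cosine similarity of two sparse vectors; 0.0 if either is all zeros.
fn cosine(a: &HashMap<String, f64>, b: &HashMap<String, f64>) -> f64 {
    let dot: f64 = a.iter().filter_map(|(t, x)| b.get(t).map(|y| x * y)).sum();
    let norm = |v: &HashMap<String, f64>| v.values().map(|x| x * x).sum::<f64>().sqrt();
    let denom = norm(a) * norm(b);
    if denom == 0.0 { 0.0 } else { dot / denom }
}

fn main() {
    let corpus = [
        "What is Rust programming language?",
        "Tell me about the Rust language",
        "How do I bake sourdough bread?",
    ];
    let idf = Idf::fit(&corpus);
    let related = cosine(&idf.embed(corpus[0]), &idf.embed(corpus[1]));
    let unrelated = cosine(&idf.embed(corpus[0]), &idf.embed(corpus[2]));
    // A cache would count a pair as a hit only if its score meets the threshold.
    println!("related = {related:.3}, unrelated = {unrelated:.3}");
}
```

The first two queries share "rust" and "language", so they score above zero, while the bread question shares no terms and scores exactly zero. Whether a given positive score counts as a hit depends entirely on the threshold.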

§Example

```rust
use oxibonsai_runtime::semantic_cache::{CachedInference, SemanticCacheConfig};

let config = SemanticCacheConfig::default();
let ci = CachedInference::new(config);

// The cache starts empty, so this is a miss: the closure runs and its output is cached.
let (response, was_hit) = ci.run_or_cache(
    "What is Rust programming language?",
    || "Rust is a systems programming language focused on safety.".to_string(),
);
assert!(!was_hit);

// A differently worded query. It is served from the cache only if its similarity to
// the first query clears the similarity threshold; otherwise the closure runs.
let (response2, was_hit2) = ci.run_or_cache(
    "Tell me about the Rust language",
    || "Rust is a memory-safe systems language.".to_string(),
);
let _ = (response, response2, was_hit2);
```
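The example relies on `run_or_cache` returning the response together with a hit flag. A minimal sketch of that check-then-infer pattern follows, under several assumptions: `NaiveCachedInference`, `similarity`, and the 0.3 threshold are hypothetical; word-set Jaccard overlap is swapped in for the module's TF-IDF cosine similarity to keep it short; and it takes `&mut self`, whereas the real `CachedInference` is called through a non-`mut` binding (`let ci`), which suggests it uses interior mutability.

```rust
use std::collections::HashSet;

/// Hypothetical matcher: Jaccard overlap of lowercase word sets
/// (a stand-in for TF-IDF cosine similarity).
fn similarity(a: &str, b: &str) -> f64 {
    let words = |s: &str| -> HashSet<String> {
        s.split(|c: char| !c.is_alphanumeric())
            .filter(|w| !w.is_empty())
            .map(str::to_lowercase)
            .collect()
    };
    let (wa, wb) = (words(a), words(b));
    let union = wa.union(&wb).count();
    if union == 0 {
        return 0.0;
    }
    wa.intersection(&wb).count() as f64 / union as f64
}

/// Hypothetical wrapper mirroring the shape of `run_or_cache`.
struct NaiveCachedInference {
    threshold: f64,
    entries: Vec<(String, String)>, // (query, response)
}

impl NaiveCachedInference {
    fn new(threshold: f64) -> Self {
        Self { threshold, entries: Vec::new() }
    }

    /// Returns the best-scoring cached response at or above the threshold with
    /// `true`; otherwise runs `infer`, caches its output, and returns it with `false`.
    fn run_or_cache<F: FnOnce() -> String>(&mut self, query: &str, infer: F) -> (String, bool) {
        let best = self
            .entries
            .iter()
            .map(|(q, r)| (similarity(query, q), r))
            .filter(|(score, _)| *score >= self.threshold)
            .max_by(|a, b| a.0.total_cmp(&b.0));
        if let Some((_, response)) = best {
            return (response.clone(), true);
        }
        let response = infer();
        self.entries.push((query.to_string(), response.clone()));
        (response, false)
    }
}

fn main() {
    let mut ci = NaiveCachedInference::new(0.3);
    let (first, hit1) = ci.run_or_cache("What is Rust?", || "Rust is a systems language.".to_string());
    // Same words, different case and punctuation: served from the cache, closure never runs.
    let (second, hit2) = ci.run_or_cache("what is rust", || panic!("expected a cache hit"));
    println!("{hit1} {hit2} {}", first == second);
}
```

Passing inference as a closure means the model only runs on a miss, which is the point of placing the cache in front of it.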

Structs§

CachedInference: Middleware wrapper that checks the semantic cache before running inference.
CachedResponse: A cached LLM response returned on a semantic cache hit.
SemanticCache: Semantic cache using TF-IDF embeddings and cosine similarity.
SemanticCacheConfig: Configuration for semantic caching.
SemanticCacheStats: Statistics about the cache, suitable for monitoring and dashboards.
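These docs name LRU-style eviction and TTL-based expiry but do not spell out the policy. A minimal sketch, assuming a fixed entry capacity, one TTL measured from insertion, and a recency list in which a hit moves the entry to the front. `BoundedCache`, `Entry`, `Stats`, and the `matches` predicate are hypothetical; a real semantic cache would compare embeddings instead of calling a string predicate, and `SemanticCacheStats` may track different counters.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Hypothetical entry; a semantic cache would also store the query embedding.
struct Entry {
    query: String,
    response: String,
    inserted: Instant,
}

/// Hypothetical counters, loosely in the spirit of `SemanticCacheStats`.
#[derive(Debug, Default)]
struct Stats {
    hits: u64,
    misses: u64,
    evictions: u64,
    expirations: u64,
}

/// Hypothetical bounded cache: front of the deque = most recently used.
struct BoundedCache {
    capacity: usize,
    ttl: Duration,
    entries: VecDeque<Entry>,
    stats: Stats,
}

impl BoundedCache {
    fn new(capacity: usize, ttl: Duration) -> Self {
        assert!(capacity > 0, "capacity must be at least 1");
        Self { capacity, ttl, entries: VecDeque::new(), stats: Stats::default() }
    }

    /// TTL expiry: drop every entry whose age has reached the TTL.
    fn purge_expired(&mut self, now: Instant) {
        let before = self.entries.len();
        let ttl = self.ttl;
        self.entries.retain(|e| now.duration_since(e.inserted) < ttl);
        self.stats.expirations += (before - self.entries.len()) as u64;
    }

    /// `matches` stands in for the similarity test; a hit moves the entry to the front.
    fn get(&mut self, now: Instant, matches: impl Fn(&str) -> bool) -> Option<String> {
        self.purge_expired(now);
        match self.entries.iter().position(|e| matches(e.query.as_str())) {
            Some(i) => {
                let entry = self.entries.remove(i).expect("index is in bounds");
                let response = entry.response.clone();
                self.entries.push_front(entry);
                self.stats.hits += 1;
                Some(response)
            }
            None => {
                self.stats.misses += 1;
                None
            }
        }
    }

    /// LRU eviction: when full, drop the least recently used entry (the back).
    fn insert(&mut self, now: Instant, query: &str, response: &str) {
        self.purge_expired(now);
        if self.entries.len() >= self.capacity && self.entries.pop_back().is_some() {
            self.stats.evictions += 1;
        }
        self.entries.push_front(Entry {
            query: query.to_string(),
            response: response.to_string(),
            inserted: now,
        });
    }
}

fn main() {
    let t0 = Instant::now();
    let mut cache = BoundedCache::new(2, Duration::from_secs(300));
    cache.insert(t0, "What is Rust?", "Rust is a systems language.");
    let hit = cache.get(t0, |q| q.eq_ignore_ascii_case("what is rust?"));
    println!("{hit:?} {:?}", cache.stats);
}
```

Passing `now` explicitly keeps expiry deterministic and testable without sleeping; a production cache would more likely read the clock internally.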
