Module semantic_cache

Semantic caching layer for LLM inference.

Returns a cached response when a new query is semantically similar to a previously cached one (cosine similarity above a threshold), avoiding redundant model inference. Queries are embedded with TF-IDF and compared by cosine similarity; entries are evicted LRU-style and expire after a TTL.
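To make the matching step concrete, here is a minimal sketch of TF-IDF weighting and cosine similarity over sparse vectors. It is an illustration only, not this module's implementation: the helper names (`tokenize`, `idf_table`, `tfidf`, `cosine`), the whitespace/punctuation tokenizer, and the smoothed IDF formula are all assumptions.

```rust
use std::collections::{HashMap, HashSet};

/// Lowercase the text and split on non-alphanumeric characters (assumed tokenizer).
fn tokenize(text: &str) -> Vec<String> {
    text.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(str::to_string)
        .collect()
}

/// Smoothed inverse document frequency per term: ln((1 + N) / (1 + df)) + 1.
fn idf_table(corpus: &[&str]) -> HashMap<String, f64> {
    let n = corpus.len() as f64;
    let mut df: HashMap<String, f64> = HashMap::new();
    for doc in corpus {
        // Count each term at most once per document.
        let unique: HashSet<String> = tokenize(doc).into_iter().collect();
        for term in unique {
            *df.entry(term).or_insert(0.0) += 1.0;
        }
    }
    df.into_iter()
        .map(|(term, d)| (term, ((1.0 + n) / (1.0 + d)).ln() + 1.0))
        .collect()
}

/// Sparse TF-IDF vector: raw term count times IDF.
/// Terms absent from the table fall back to `unseen_idf`.
fn tfidf(text: &str, idf: &HashMap<String, f64>, unseen_idf: f64) -> HashMap<String, f64> {
    let mut v: HashMap<String, f64> = HashMap::new();
    for term in tokenize(text) {
        let w = idf.get(&term).copied().unwrap_or(unseen_idf);
        *v.entry(term).or_insert(0.0) += w;
    }
    v
}

/// Cosine similarity of two sparse vectors; 0.0 if either vector is empty.
fn cosine(a: &HashMap<String, f64>, b: &HashMap<String, f64>) -> f64 {
    let dot: f64 = a
        .iter()
        .filter_map(|(term, wa)| b.get(term).map(|wb| wa * wb))
        .sum();
    let norm_a = a.values().map(|w| w * w).sum::<f64>().sqrt();
    let norm_b = b.values().map(|w| w * w).sum::<f64>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}
```

Under this sketch, a lookup counts as a hit when `cosine(query, cached)` meets or exceeds the threshold. Two queries that share only a few terms land somewhere between 0 and 1, which is why a paraphrased query can fall on either side of the cutoff.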

Example

use oxibonsai_runtime::semantic_cache::{CachedInference, SemanticCacheConfig};

let config = SemanticCacheConfig::default();
let ci = CachedInference::new(config);

let (response, was_hit) = ci.run_or_cache(
    "What is Rust programming language?",
    || "Rust is a systems programming language focused on safety.".to_string(),
);
assert!(!was_hit);

let (response2, was_hit2) = ci.run_or_cache(
    "Tell me about the Rust language",
    || "Rust is a memory-safe systems language.".to_string(),
);
// Whether this is a hit depends on how the similarity score compares to the threshold.
let _ = (response, response2, was_hit2);
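Whether a later call hits also depends on the earlier entry still being present. As a rough picture of TTL expiry combined with LRU-style eviction, the sketch below uses a hypothetical `ToyCache` keyed by exact strings. It is not the crate's `SemanticCache`; the type, its fields, and the explicit `now` parameter (used so the behavior can be exercised without sleeping) are assumptions made for illustration, and it assumes a capacity of at least 1.

```rust
use std::time::{Duration, Instant};

/// Hypothetical entry type; not the crate's `CachedResponse`.
struct Entry {
    key: String,
    response: String,
    inserted_at: Instant,
    last_used: Instant,
}

/// Toy exact-match cache that illustrates eviction only, not semantic matching.
struct ToyCache {
    entries: Vec<Entry>,
    capacity: usize,
    ttl: Duration,
}

impl ToyCache {
    fn new(capacity: usize, ttl: Duration) -> Self {
        Self { entries: Vec::new(), capacity, ttl }
    }

    /// Drop expired entries, then return a hit and refresh its recency.
    fn get(&mut self, key: &str, now: Instant) -> Option<String> {
        let ttl = self.ttl;
        self.entries
            .retain(|e| now.duration_since(e.inserted_at) < ttl);
        let entry = self.entries.iter_mut().find(|e| e.key == key)?;
        entry.last_used = now;
        Some(entry.response.clone())
    }

    /// Insert an entry, evicting the least recently used one when full.
    fn put(&mut self, key: &str, response: &str, now: Instant) {
        // Replace any existing entry for the same key.
        self.entries.retain(|e| e.key != key);
        if self.entries.len() >= self.capacity {
            let oldest = self
                .entries
                .iter()
                .enumerate()
                .min_by_key(|(_, e)| e.last_used)
                .map(|(i, _)| i);
            if let Some(idx) = oldest {
                self.entries.remove(idx);
            }
        }
        self.entries.push(Entry {
            key: key.to_string(),
            response: response.to_string(),
            inserted_at: now,
            last_used: now,
        });
    }
}
```

A linear scan keeps the sketch short; a real cache would more likely track recency with an ordered structure so eviction does not cost a full pass.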

Structs

CachedInference
Middleware wrapper that checks the semantic cache before running inference.
CachedResponse
A cached LLM response returned on a semantic cache hit.
SemanticCache
Semantic cache using TF-IDF embeddings and cosine similarity.
SemanticCacheConfig
Configuration for semantic caching.
SemanticCacheStats
Statistics about the cache, suitable for monitoring and dashboards.