Crate memvid_ask_model

Modules§

cache: Cache for LLM answers to avoid redundant API calls. Uses a BLAKE3 hash of (query + context) as the key.
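The keying scheme described for the cache module can be illustrated with a minimal sketch. It is not the crate's implementation: `AnswerCache`, `cache_key`, and `get_or_compute` are hypothetical names, and the standard library's `DefaultHasher` (64-bit, non-cryptographic) stands in for BLAKE3 so the example has no external dependencies.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Derive a cache key from the query and its retrieved context.
/// Assumption: DefaultHasher is a stand-in for the BLAKE3 hash named in the docs.
pub fn cache_key(query: &str, context: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    // Hashing the pair as a tuple keeps the boundary between the two strings,
    // so ("ab", "c") and ("a", "bc") are fed to the hasher as different inputs.
    (query, context).hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical in-memory cache mapping a (query, context) key to an answer.
pub struct AnswerCache {
    entries: HashMap<u64, String>,
}

impl AnswerCache {
    pub fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Return the cached answer, calling `compute` (for example, an LLM
    /// request) only when this (query, context) pair has not been seen.
    pub fn get_or_compute<F>(&mut self, query: &str, context: &str, compute: F) -> &str
    where
        F: FnOnce() -> String,
    {
        self.entries
            .entry(cache_key(query, context))
            .or_insert_with(compute)
            .as_str()
    }
}
```

Calling `get_or_compute` twice with the same query and context runs the closure once; the second call returns the stored answer without a new API request.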

Functions§

calculate_cost: Calculate cost for a given model based on token usage. Prices are per 1M tokens in USD (December 2025 pricing).
extract_entities: Extract entities from text using an LLM
generate_search_query: Generate optimized search keywords from a question using an LLM. Returns the original question plus extracted search terms for better retrieval.
postprocess_answer: Post-process the LLM answer for quality
run_model_inference
verify_grounding: Verify how well the answer is grounded in the provided context. Returns a GroundingResult whose score (0.0 to 1.0) indicates how well the context supports the answer.
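The per-1M-token pricing noted for calculate_cost comes down to simple arithmetic, sketched below. The real function's signature and price table are not shown on this page, so `ModelPricing` and `estimate_cost` are hypothetical names, and any prices passed in are placeholders rather than actual model rates.

```rust
/// Hypothetical price entry for one model, in USD per 1M tokens.
#[derive(Debug, Clone, Copy)]
pub struct ModelPricing {
    pub input_per_million: f64,
    pub output_per_million: f64,
}

/// Estimate the USD cost of one request from its token usage.
/// Assumption: input and output tokens are billed at separate rates.
pub fn estimate_cost(pricing: ModelPricing, input_tokens: u64, output_tokens: u64) -> f64 {
    const TOKENS_PER_PRICE_UNIT: f64 = 1_000_000.0;
    // Scale each token count to millions, then multiply by the per-million rate.
    let input_cost = input_tokens as f64 / TOKENS_PER_PRICE_UNIT * pricing.input_per_million;
    let output_cost = output_tokens as f64 / TOKENS_PER_PRICE_UNIT * pricing.output_per_million;
    input_cost + output_cost
}
```

With placeholder rates of 2.00 USD (input) and 8.00 USD (output) per 1M tokens, a request using 500,000 input tokens and 250,000 output tokens costs 0.5 × 2.00 + 0.25 × 8.00 = 3.00 USD.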