§Cost Model and Per-Query Budgets (Task 1)
This module provides an explicit cost model with enforceable per-query budgets to stabilize p99 latency under load while preserving recall targets.
§Architecture
The cost model is “bytes-moved first” and enforces runtime limits on:
- RAM bytes scanned for candidate generation
- SSD random reads allowed in hot path (ideally 0)
- SSD sequential bytes allowed for rerank batching
- CPU cycles spent in routing/scan
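As a rough illustration of how the four limits above might be enforced at runtime, the following standalone sketch tracks usage against each limit and reports which one was exceeded first. The names `Limits`, `Usage`, `Tracker`, and `Exhausted` are hypothetical and are not the crate's `CostTracker` or `BudgetExhaustionReason`; the sketch also assumes a limit counts as exhausted only once usage strictly exceeds it, so that a limit of 0 random reads permits none.

```rust
/// Hypothetical limits mirroring the four resources listed above.
#[derive(Debug, Clone, Copy)]
struct Limits {
    ram_bytes: u64,
    ssd_random_reads: u64,
    ssd_sequential_bytes: u64,
    cpu_cycles: u64,
}

/// Running totals for the same four resources.
#[derive(Debug, Default, Clone, Copy)]
struct Usage {
    ram_bytes: u64,
    ssd_random_reads: u64,
    ssd_sequential_bytes: u64,
    cpu_cycles: u64,
}

/// Which limit was exceeded (illustrative only).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Exhausted {
    RamBytes,
    SsdRandomReads,
    SsdSequentialBytes,
    CpuCycles,
}

struct Tracker {
    limits: Limits,
    used: Usage,
}

impl Tracker {
    fn new(limits: Limits) -> Self {
        Tracker { limits, used: Usage::default() }
    }

    // Saturating adds keep the counters from wrapping on overflow.
    fn add_ram_bytes(&mut self, n: u64) {
        self.used.ram_bytes = self.used.ram_bytes.saturating_add(n);
    }

    fn add_ssd_random_reads(&mut self, n: u64) {
        self.used.ssd_random_reads = self.used.ssd_random_reads.saturating_add(n);
    }

    fn add_ssd_sequential_bytes(&mut self, n: u64) {
        self.used.ssd_sequential_bytes = self.used.ssd_sequential_bytes.saturating_add(n);
    }

    fn add_cpu_cycles(&mut self, n: u64) {
        self.used.cpu_cycles = self.used.cpu_cycles.saturating_add(n);
    }

    /// Returns the first limit (in declaration order) whose usage exceeds it.
    fn exhausted(&self) -> Option<Exhausted> {
        if self.used.ram_bytes > self.limits.ram_bytes {
            Some(Exhausted::RamBytes)
        } else if self.used.ssd_random_reads > self.limits.ssd_random_reads {
            Some(Exhausted::SsdRandomReads)
        } else if self.used.ssd_sequential_bytes > self.limits.ssd_sequential_bytes {
            Some(Exhausted::SsdSequentialBytes)
        } else if self.used.cpu_cycles > self.limits.cpu_cycles {
            Some(Exhausted::CpuCycles)
        } else {
            None
        }
    }
}

fn main() {
    let limits = Limits {
        ram_bytes: 16 * 1024 * 1024,
        ssd_random_reads: 0,
        ssd_sequential_bytes: 4 * 1024 * 1024,
        cpu_cycles: 1_000_000_000,
    };
    let mut tracker = Tracker::new(limits);
    tracker.add_ram_bytes(1024);
    tracker.add_cpu_cycles(50_000);
    println!("after scan: {:?}", tracker.exhausted());
    tracker.add_ssd_random_reads(1);
    println!("after random read: {:?}", tracker.exhausted());
}
```

A caller would typically check `exhausted()` between scan batches and return the best results found so far once it reports a reason.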
§Math/Algorithm
Constrained optimization: minimize E[bytes scanned] subject to:
- P(recall@k ≥ ρ) ≥ 1−δ
- p99 ≤ T
Convert the latency SLA T into budgets, where BW_eff is the effective bandwidth and L_io is the latency of a single random I/O:
- Bytes ≤ BW_eff · T
- RandomIO ≤ ⌊T / L_io⌋
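The two inequalities above can be worked through with concrete numbers. The sketch below is a standalone illustration, not the crate's `HardwareProfile` API: the `Hardware` and `DerivedBudget` types, their field names, and the example figures (10 GB/s bandwidth, 100 µs per random read, 5 ms target) are all assumptions chosen for the arithmetic. Integer nanosecond math is used so that the floor in ⌊T / L_io⌋ is exact.

```rust
use std::time::Duration;

/// Hypothetical hardware description; field names are illustrative.
#[derive(Debug, Clone, Copy)]
struct Hardware {
    /// Effective bandwidth in bytes per second (BW_eff).
    bw_eff_bytes_per_sec: u64,
    /// Latency of one random I/O (L_io).
    io_latency: Duration,
}

/// Budgets derived from a latency target T.
#[derive(Debug, PartialEq, Eq)]
struct DerivedBudget {
    max_bytes: u64,
    max_random_ios: u64,
}

fn derive_budget(hw: Hardware, target: Duration) -> DerivedBudget {
    let t_ns = target.as_nanos();

    // Bytes <= BW_eff * T, with T converted from nanoseconds to seconds.
    let max_bytes = (hw.bw_eff_bytes_per_sec as u128 * t_ns / 1_000_000_000) as u64;

    // RandomIO <= floor(T / L_io); integer division floors for us.
    // A zero I/O latency is treated as "no random I/O budget" in this sketch.
    let max_random_ios = t_ns
        .checked_div(hw.io_latency.as_nanos())
        .unwrap_or(0) as u64;

    DerivedBudget { max_bytes, max_random_ios }
}

fn main() {
    let hw = Hardware {
        bw_eff_bytes_per_sec: 10_000_000_000,
        io_latency: Duration::from_micros(100),
    };
    let budget = derive_budget(hw, Duration::from_millis(5));
    // 10 GB/s * 5 ms = 50_000_000 bytes; 5 ms / 100 us = 50 random reads.
    println!("{:?}", budget);
}
```

These are upper bounds if the whole latency target were spent on a single resource; a real budget would split T across RAM scanning, I/O, and CPU work.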
§Usage
use sochdb_vector::cost_model::{QueryBudget, CostTracker, AdmissionController};

// Define budget for query class
let budget = QueryBudget::new("high_recall")
    .ram_bytes(16 * 1024 * 1024)            // 16 MB RAM scan
    .ssd_random_reads(0)                    // No random reads in hot path
    .ssd_sequential_bytes(4 * 1024 * 1024)  // 4 MB sequential for rerank
    .cpu_cycles(1_000_000_000);             // ~1B cycles

// Track costs during query execution
let mut tracker = CostTracker::new(budget);
tracker.add_ram_bytes(1024);
if tracker.is_exhausted() {
    // Return best-known results under budget
}
Structs§
- AdmissionController - Admission controller for backpressure under concurrency
- AdmissionMetrics - Metrics from admission controller
- AdmissionTicket - Handle returned when a query is admitted
- CostSummary - Summary of cost consumption
- CostTracker - Tracks resource consumption during query execution
- CostUtilization - Resource utilization ratios
- HardwareProfile - Hardware characteristics for SLA-to-budget conversion
- QueryBudget - Per-query budget limits derived from SLA targets
- QueryClassRegistry - Registry of query classes with their budgets
Enums§
- BudgetExhaustionReason - Reason why budget was exhausted
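To illustrate the idea behind an admission controller that applies backpressure and hands out a ticket per admitted query, here is a minimal standalone sketch. It is not the crate's `AdmissionController` or `AdmissionTicket`: the `Admission` and `Ticket` types, the `try_admit` method, and the choice of a fixed in-flight cap are assumptions made for the example. The ticket releases its slot when dropped.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical controller that caps the number of in-flight queries.
struct Admission {
    in_flight: Arc<AtomicUsize>,
    max_in_flight: usize,
}

/// Hypothetical handle held for the lifetime of an admitted query.
struct Ticket {
    in_flight: Arc<AtomicUsize>,
}

impl Drop for Ticket {
    fn drop(&mut self) {
        // Release the slot when the query finishes or is abandoned.
        self.in_flight.fetch_sub(1, Ordering::AcqRel);
    }
}

impl Admission {
    fn new(max_in_flight: usize) -> Self {
        Admission {
            in_flight: Arc::new(AtomicUsize::new(0)),
            max_in_flight,
        }
    }

    /// Admits the query if a slot is free; otherwise returns None so the
    /// caller can shed load or retry later.
    fn try_admit(&self) -> Option<Ticket> {
        let mut current = self.in_flight.load(Ordering::Acquire);
        loop {
            if current >= self.max_in_flight {
                return None;
            }
            // Claim a slot only if no other thread changed the count meanwhile.
            match self.in_flight.compare_exchange_weak(
                current,
                current + 1,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => {
                    return Some(Ticket { in_flight: Arc::clone(&self.in_flight) });
                }
                Err(actual) => current = actual,
            }
        }
    }

    fn in_flight(&self) -> usize {
        self.in_flight.load(Ordering::Acquire)
    }
}

fn main() {
    let admission = Admission::new(2);
    let first = admission.try_admit();
    let _second = admission.try_admit();
    let third = admission.try_admit();
    println!("third admitted: {}", third.is_some());
    drop(first);
    println!("in flight after drop: {}", admission.in_flight());
}
```

Tying slot release to `Drop` means a query that returns early because its budget is exhausted still frees its slot without extra bookkeeping.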