Module search_plan

Quantization-aware Search Plan

This module provides a formal runtime plan for vector search that separates policy (what to optimize for) from mechanism (how to execute).

§Architecture

SearchRequest + SLA → Planner → SearchPlan → Executor → Results
                         ↑
                   Cost Model + Statistics

§Policy vs Mechanism

Policy (what to optimize):

  • Target recall@k (e.g., 0.95)
  • Latency budget (e.g., 5ms p99)
  • Token/compute budget

Mechanism (how to execute):

  • BPS coarse scan parameters
  • PQ scoring parameters
  • Rerank depth and method
  • ef_search value
  • Filter evaluation order
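
One way to picture the split is as two separate types: one describing what the caller asks for, one describing what the executor will do. The sketch below is illustrative only. `PolicySketch`, `MechanismSketch`, `plan_for`, their fields, and the widening factors are hypothetical names and numbers chosen for this example; they are not the fields of this module's `SearchSLA` or `SearchPlan`.

```rust
/// Policy: what the caller wants. Hypothetical fields for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct PolicySketch {
    /// Target recall@k, e.g. 0.95.
    pub target_recall: f32,
    /// Latency budget in microseconds, e.g. 5_000 for 5 ms p99.
    pub latency_budget_us: u64,
    /// Compute budget in abstract cost units.
    pub compute_budget: u64,
}

/// Mechanism: how the search is executed. Hypothetical fields for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct MechanismSketch {
    /// Number of candidates kept after the BPS coarse scan.
    pub bps_candidates: usize,
    /// Number of candidates kept after PQ scoring.
    pub pq_candidates: usize,
    /// Number of candidates reranked at full precision.
    pub rerank_depth: usize,
    /// Graph search breadth.
    pub ef_search: usize,
    /// Whether filters run before reranking.
    pub filters_before_rerank: bool,
}

/// Toy planner: maps a policy to mechanism parameters.
/// The multipliers below are arbitrary and exist only to show the direction
/// of the mapping (a stricter recall target widens every stage).
pub fn plan_for(policy: &PolicySketch, k: usize) -> MechanismSketch {
    let widen = if policy.target_recall >= 0.95 { 4 } else { 2 };
    let rerank_depth = k * widen;
    MechanismSketch {
        bps_candidates: rerank_depth * 16,
        pq_candidates: rerank_depth * 4,
        rerank_depth,
        ef_search: rerank_depth.max(k),
        // Arbitrary illustrative rule: filter early under tight latency budgets.
        filters_before_rerank: policy.latency_budget_us < 1_000,
    }
}
```

The caller only ever constructs the policy value; every mechanism parameter is derived from it, so the planner can change how it meets a target without changing the request type.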

§Cost Model

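The planner uses measured per-stage costs, where N is the number of vectors a stage scores, D is the vector dimensionality, M is the number of PQ sub-quantizers, and c_bps, c_pq, and c_f32 are per-operation cost constants: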

  • cost_bps(N, D) = N × D × c_bps
  • cost_pq(N, D, M) = N × M × c_pq
  • cost_rerank(N, D) = N × D × c_f32
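
The three formulas above translate directly into code. The following is a minimal sketch, assuming N is a candidate count, D the vector dimensionality, and M the number of PQ sub-quantizers. `CostConstants` and `total_cost` are hypothetical names for this example and are not the module's `CostModel` or `StageCosts`. Because the PQ formula does not use D, the sketch omits that parameter from `cost_pq`.

```rust
/// Hypothetical per-operation constants, in arbitrary time units.
/// In practice these would be measured on the target hardware.
#[derive(Debug, Clone, Copy)]
pub struct CostConstants {
    pub c_bps: f64,
    pub c_pq: f64,
    pub c_f32: f64,
}

/// cost_bps(N, D) = N × D × c_bps
pub fn cost_bps(n: usize, d: usize, c: &CostConstants) -> f64 {
    n as f64 * d as f64 * c.c_bps
}

/// cost_pq(N, D, M) = N × M × c_pq (D does not appear in the formula).
pub fn cost_pq(n: usize, m: usize, c: &CostConstants) -> f64 {
    n as f64 * m as f64 * c.c_pq
}

/// cost_rerank(N, D) = N × D × c_f32
pub fn cost_rerank(n: usize, d: usize, c: &CostConstants) -> f64 {
    n as f64 * d as f64 * c.c_f32
}

/// Sum of the three stage costs for a pipeline in which each stage
/// scores a different number of candidates.
pub fn total_cost(
    n_bps: usize,
    n_pq: usize,
    n_rerank: usize,
    d: usize,
    m: usize,
    c: &CostConstants,
) -> f64 {
    cost_bps(n_bps, d, c) + cost_pq(n_pq, m, c) + cost_rerank(n_rerank, d, c)
}
```

For example, with constants 0.25, 0.5, and 1.0, D = 128, M = 16, and stage sizes of 1000, 200, and 50 candidates, the stage costs are 32000, 1600, and 6400, for a total of 40000 units.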

§Optimization

Minimize expected latency subject to:

  • recall@k ≥ target_recall
  • total_cost ≤ budget

The planner uses bandit-like adaptation, adjusting its choices based on recent query statistics.
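
As a rough illustration of this optimization, the sketch below treats each candidate plan as an arm with running estimates of latency and recall, picks the feasible arm with the lowest estimated latency, and updates the estimates from observed queries. It is a simplified stand-in, not the planner's actual algorithm: `Arm`, `observe`, and `select` are hypothetical names, the update is a plain running mean, and no exploration step is included.

```rust
/// One candidate plan with running estimates. Hypothetical type for illustration.
#[derive(Debug, Clone)]
pub struct Arm {
    pub name: &'static str,
    /// Estimated latency in microseconds (a prior until the first observation).
    pub est_latency_us: f64,
    /// Estimated recall@k (a prior until the first observation).
    pub est_recall: f64,
    /// Modeled cost of running this plan, in abstract units.
    pub cost: f64,
    /// Number of observations folded into the estimates.
    pub pulls: u32,
}

impl Arm {
    /// Fold one observed query into the running means.
    /// The first observation replaces the prior entirely.
    pub fn observe(&mut self, latency_us: f64, recall: f64) {
        self.pulls += 1;
        let w = 1.0 / self.pulls as f64;
        self.est_latency_us += w * (latency_us - self.est_latency_us);
        self.est_recall += w * (recall - self.est_recall);
    }
}

/// Return the index of the arm with the lowest estimated latency among those
/// satisfying recall >= target_recall and cost <= budget, or None if no arm
/// is feasible.
pub fn select(arms: &[Arm], target_recall: f64, budget: f64) -> Option<usize> {
    arms.iter()
        .enumerate()
        .filter(|(_, a)| a.est_recall >= target_recall && a.cost <= budget)
        .min_by(|(_, a), (_, b)| a.est_latency_us.total_cmp(&b.est_latency_us))
        .map(|(i, _)| i)
}
```

If an arm's observed recall drops below the target, it stops being feasible and `select` falls back to the next-cheapest feasible arm, or returns `None` when the budget excludes all remaining arms. A production planner would also need some exploration so that stale estimates get refreshed.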

Structs§

CostModel
Cost model parameters (calibrated per hardware).
DatasetStats
Statistics about the dataset for planning.
PipelineStage
A single stage in the search pipeline.
PlanExecutor
Plan executor that runs a search plan.
SearchPlan
The search plan: a complete specification for executing a search.
SearchPlanner
Search planner that generates optimal plans.
SearchSLA
Service Level Agreement for search.
StageCosts
Per-stage cost measurements.

Enums§

OptimizationMode
Optimization mode for the planner.
PlanError
Plan validation errors.
StageQuantLevel
Quantization level for a pipeline stage.