Expand description
Filter strategy selection for EdgeVec.
Determines how filtering integrates with HNSW search based on estimated selectivity and configured parameters.
§Strategy Selection
| Selectivity | Strategy | Rationale |
|---|---|---|
| >80% | PreFilter | Most vectors pass; scan all, then search subset |
| <5% | PostFilter | Few pass; oversample heavily, filter results |
| 5%-80% | Hybrid | Adaptive oversample based on estimated selectivity |
§Theoretical Basis
Strategy selection is based on the cost model derived from vector database literature (Milvus, Weaviate, Qdrant engineering blogs):
§PreFilter Threshold (80%)
When selectivity exceeds 80%, the cost equation favors pre-filtering:
Cost_PreFilter = O(N) + O(log(N_filtered))
Cost_PostFilter = O(log(N)) + O(k × oversample)
When selectivity > 0.8:
N_filtered ≈ 0.8N, so PreFilter cost ≈ O(N) + O(log(0.8N))
PostFilter would need oversample ≈ 1/0.8 = 1.25x (low benefit)
→ Full metadata scan amortizes well when most vectors match§PostFilter Threshold (5%)
When selectivity drops below 5%, oversample approaches practical limits:
Required oversample ≈ 1 / selectivity
At 5% selectivity: oversample = 20x (exceeds MAX_OVERSAMPLE)
At 10% selectivity: oversample = 10x (at MAX_OVERSAMPLE cap)
At 3% selectivity: oversample = 33x (capped, recall degradation)
→ Below 5%, aggressive post-filtering with max oversample is used
→ Recall may degrade; user should consider PreFilter for precision§Oversample Calculation
The oversample factor is derived from the probability model:
P(finding k matches in k×o candidates) = 1 - (1-s)^(k×o)
For 95% confidence with selectivity s:
oversample ≈ min(1/s, MAX_OVERSAMPLE)
This ensures high probability of finding k matching vectors
while bounding latency via MAX_OVERSAMPLE cap.§References
- Milvus: “Filtered Vector Search” technical blog
- Qdrant: “Payload Indexing” documentation
- Weaviate: “Filters in Vector Search” engineering notes
§Example
use edgevec::filter::strategy::{FilterStrategy, select_strategy, calculate_oversample};
// High selectivity (90% pass) -> PreFilter
assert_eq!(select_strategy(0.9), FilterStrategy::PreFilter);
// Low selectivity (3% pass) -> PostFilter with high oversample
let strategy = select_strategy(0.03);
assert!(matches!(strategy, FilterStrategy::PostFilter { .. }));
// Medium selectivity (30% pass) -> Hybrid
let strategy = select_strategy(0.3);
assert!(matches!(strategy, FilterStrategy::Hybrid { .. }));Structs§
- Selectivity
Estimate - Selectivity estimation result.
Enums§
- Filter
Strategy - Strategy for combining filtering with HNSW search.
Constants§
- DEFAULT_
OVERSAMPLE - Default oversample when selectivity is unknown.
- EF_CAP
- Absolute cap on ef_search to bound latency.
- MAX_
OVERSAMPLE - Maximum oversample factor to prevent ef explosion.
- POSTFILTER_
THRESHOLD - Selectivity threshold below which post-filter is sufficient.
- PREFILTER_
THRESHOLD - Selectivity threshold above which pre-filter is preferred.
- SELECTIVITY_
SAMPLE_ SIZE - Minimum sample size for selectivity estimation.
Traits§
- Metadata
Store - Trait for accessing vector metadata during selectivity estimation.
Functions§
- calculate_
oversample - Calculate oversample factor from selectivity.
- estimate_
filter_ selectivity - Estimates filter selectivity based on AST structure (W26.3.1).
- estimate_
selectivity - Estimate filter selectivity by sampling random vectors.
- is_
contradiction - Detect contradictory filters (always false).
- is_
tautology - Detect tautological filters (always true).
- overfetch_
from_ selectivity - Calculate overfetch factor from selectivity (W26.3.1).
- select_
strategy - Select strategy based on estimated selectivity.