Module strategy

Module strategy 

Source
Expand description

Filter strategy selection for EdgeVec.

Determines how filtering integrates with HNSW search based on estimated selectivity and configured parameters.

§Strategy Selection

SelectivityStrategyRationale
>80%PreFilterMost vectors pass; scan all, then search subset
<5%PostFilterFew pass; oversample heavily, filter results
5%-80%HybridAdaptive oversample based on estimated selectivity

§Theoretical Basis

Strategy selection is based on the cost model derived from vector database literature (Milvus, Weaviate, Qdrant engineering blogs):

§PreFilter Threshold (80%)

When selectivity exceeds 80%, the cost equation favors pre-filtering:

Cost_PreFilter = O(N) + O(log(N_filtered))
Cost_PostFilter = O(log(N)) + O(k × oversample)

When selectivity > 0.8:
  N_filtered ≈ 0.8N, so PreFilter cost ≈ O(N) + O(log(0.8N))
  PostFilter would need oversample ≈ 1/0.8 = 1.25x (low benefit)
  → Full metadata scan amortizes well when most vectors match

§PostFilter Threshold (5%)

When selectivity drops below 5%, oversample approaches practical limits:

Required oversample ≈ 1 / selectivity
At 5% selectivity: oversample = 20x (exceeds MAX_OVERSAMPLE)
At 10% selectivity: oversample = 10x (at MAX_OVERSAMPLE cap)
At 3% selectivity: oversample = 33x (capped, recall degradation)

→ Below 5%, aggressive post-filtering with max oversample is used
→ Recall may degrade; user should consider PreFilter for precision

§Oversample Calculation

The oversample factor is derived from the probability model:

P(finding k matches in k×o candidates) = 1 - (1-s)^(k×o)

For 95% confidence with selectivity s:
  oversample ≈ min(1/s, MAX_OVERSAMPLE)

This ensures high probability of finding k matching vectors
while bounding latency via MAX_OVERSAMPLE cap.

§References

  • Milvus: “Filtered Vector Search” technical blog
  • Qdrant: “Payload Indexing” documentation
  • Weaviate: “Filters in Vector Search” engineering notes

§Example

use edgevec::filter::strategy::{FilterStrategy, select_strategy, calculate_oversample};

// High selectivity (90% pass) -> PreFilter
assert_eq!(select_strategy(0.9), FilterStrategy::PreFilter);

// Low selectivity (3% pass) -> PostFilter with high oversample
let strategy = select_strategy(0.03);
assert!(matches!(strategy, FilterStrategy::PostFilter { .. }));

// Medium selectivity (30% pass) -> Hybrid
let strategy = select_strategy(0.3);
assert!(matches!(strategy, FilterStrategy::Hybrid { .. }));

Structs§

SelectivityEstimate
Selectivity estimation result.

Enums§

FilterStrategy
Strategy for combining filtering with HNSW search.

Constants§

DEFAULT_OVERSAMPLE
Default oversample when selectivity is unknown.
EF_CAP
Absolute cap on ef_search to bound latency.
MAX_OVERSAMPLE
Maximum oversample factor to prevent ef explosion.
POSTFILTER_THRESHOLD
Selectivity threshold below which post-filter is sufficient.
PREFILTER_THRESHOLD
Selectivity threshold above which pre-filter is preferred.
SELECTIVITY_SAMPLE_SIZE
Minimum sample size for selectivity estimation.

Traits§

MetadataStore
Trait for accessing vector metadata during selectivity estimation.

Functions§

calculate_oversample
Calculate oversample factor from selectivity.
estimate_filter_selectivity
Estimates filter selectivity based on AST structure (W26.3.1).
estimate_selectivity
Estimate filter selectivity by sampling random vectors.
is_contradiction
Detect contradictory filters (always false).
is_tautology
Detect tautological filters (always true).
overfetch_from_selectivity
Calculate overfetch factor from selectivity (W26.3.1).
select_strategy
Select strategy based on estimated selectivity.