iqdb-filter 0.3.0

Canonical metadata-filter evaluator for vector search: validate-on-construction and infallible per-row evaluation - part of the iQDB family.
Documentation
# iqdb-filter -- Roadmap

> Path from scaffold to a stable 1.0. Hard parts are front-loaded; each phase has hard exit criteria.
>
> **Anti-deferral rule:** no listed hard task moves to a later phase unless this file records the move and the reason.

---

## v0.1.0 -- Scaffold (DONE)

Compiles, CI green, structure correct, no domain logic.

- [x] Manifest, README, CHANGELOG, REPS, license, CI, lints in place.
- [x] API surface sketched in `docs/API.md`.

---

## v0.2.0 -- Canonical filter evaluator (DONE)

The hard part of the crate's core: one evaluator that decides what a `Filter`
means, shared by every consumer so the semantics cannot drift.

- [x] `FilterEvaluator` -- validation-on-construction (depth + `In` caps,
      iterative so `new` cannot overflow) and an infallible, allocation-free
      per-row `evaluate`.
- [x] Closed-world null / absent-field semantics, including the
      `Neq(absent)` / `Not(Eq(absent))` distinction.
- [x] `FilterStrategy` vocabulary enum (`#[non_exhaustive]`, variants only).
- [x] `MAX_FILTER_DEPTH` / `MAX_IN_VALUES` public caps; `VERSION`.
- [x] Every public item has rustdoc + a runnable example.
- [x] Core invariants property-tested (`tests/properties.rs`) and pinned as
      conformance integration tests (`tests/conformance.rs`).
- [x] `criterion` bench on the validation and evaluation paths.

**Deferral recorded (anti-deferral rule).** The original plan paired the
evaluator with a *selectivity estimate* in this phase. It moved to v0.3.0
below, because bundling it with the evaluator added surface no consumer
exercised yet. A placeholder estimate ("0.5 always") would have been worse than
none -- it would teach a selector to make confident wrong choices.

---

## v0.3.0 -- Strategy selection (DONE)

The `FilterStrategy` vocabulary becomes actionable, without an inverted index.

- [x] `FilterEvaluator::prefilter` / `postfilter` -- lazy, allocation-free scan
      adapters over `(key, metadata)` pairs, realising the `PreFilter` /
      `PostFilter` shapes. Shaped to the real `iqdb-flat` consumption pattern.
- [x] `estimate_selectivity(&FilterEvaluator) -> f64` -- a **structural**
      estimate (filter shape only, no data) in `[0.0, 1.0]`. Takes a validated
      evaluator, so the walk is depth-bounded.
- [x] `choose_strategy` (Tier 1) + `StrategySelector` (Tier 2, tunable
      `prefilter_threshold`, immutable builder) resolving `PreFilter` /
      `PostFilter`; `DEFAULT_PREFILTER_THRESHOLD = 0.5`.
- [x] New surface property-tested (selectivity is always a probability;
      `prefilter` matches the manual filter; the selector never yields
      `Auto` / `InFilter`) and benchmarked.

**Scope note (anti-deferral rule).** The selectivity estimate here is
deliberately *structural*. An **index-backed** estimate (histograms / sketches)
and the inverted `MetadataIndex` it reads stay deferred below -- the threshold
the selector compares against is tunable precisely because the structural
estimate is coarse. This is the "pre/post + heuristic auto" scope, chosen over
building speculative index machinery with no consumer.

---

## Deferred until the first approximate-index consumer

Everything below was considered for the 0.x core and dropped because it has no
real consumer yet (`iqdb-flat`, the only metadata-aware index today, pre-filters
with a row scan). Each lands when `iqdb-hnsw` / `iqdb-ivf` start honouring
filters and the cost model actually changes -- not before.

### Inverted `MetadataIndex` (opt-in per field)

A per-field index from `Value` to a set of vector ids, so a selective predicate
can hand the index a candidate set instead of scanning every row. Per-field
opt-in (indexing every field inflates resident memory and write amplification),
and gated on a string-cap policy in `iqdb-types` for the unbounded metadata
surface.

### Index-backed selectivity estimate

A data-driven upgrade to the structural `estimate_selectivity` shipped in
v0.3.0: `selectivity_estimate(&Filter, &MetadataIndex) -> f64` reading real
sources -- per-field histograms for ordered domains, sketches (HyperLogLog /
count-min) for high-cardinality and membership domains. Only worth building once
the `MetadataIndex` above exists; until then the structural estimate plus a
tunable threshold is the honest surface.

### Filter pushdown into graph traversal

`FilterStrategy::InFilter`: prune HNSW / IVF branches that provably cannot
produce a surviving candidate, without collapsing recall. Requires
`MetadataIndex` co-design and a clear story for how pruning interacts with the
approximate index's recall guarantees.

---

## Toward 1.0

- **Index phase** -- the deferred items above (inverted `MetadataIndex`,
  index-backed selectivity, `InFilter` pushdown), in order, each with tests and
  benchmarks where it is a hot path, once a real approximate-index consumer
  drives them. Feature freeze declared at the end (no `todo!` /
  `unimplemented!`).
- **API freeze** -- public API frozen and recorded here; `cargo audit` +
  `cargo deny` clean.
- **Alpha / Beta / RC (0.6.x -> 0.9.x)** -- integrate against real consumers
  (MINOR-compatible additions only), broaden testing, final benchmarks, doc
  polish.

## v1.0.0 -- Stable

- [ ] Definition of Done (DIRECTIVES section 7) satisfied.
- [ ] Public API frozen until 2.0.
- [ ] Release note written; published to crates.io; tag pushed.

---

## Out of scope for 1.0

- The `Filter` type itself -- defined in `iqdb-types`.
- A full query language -- expression evaluation only.
- A different evaluator algorithm. The recursive walker is simple, fast on the
  bounded-depth filters this crate accepts, and correct. Replacing it with a
  bytecode compiler or JIT is speculative until a profile shows it costs
  anything.
- `Filter` rewriting / canonicalisation (constant-fold, push `Not` down,
  reorder `And` children by selectivity) -- all depend on the deferred
  selectivity estimates; premature without them.