Retrieval evaluation harness for lean-ctx hybrid search.
Runs a standardized query→expected_file benchmark to measure Recall@k, MRR (Mean Reciprocal Rank), and latency. Outputs NDJSON scorecards.
Usage: lean-ctx benchmark --eval [path]
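As a rough illustration of the two ranking metrics named above, the following sketch computes Recall@k and MRR from a list of query results. It is not the harness's actual code: the `EvalCase` type, its fields, and the function names are hypothetical, and it assumes each query has exactly one expected file and that results arrive ordered best first.

```rust
/// Hypothetical record for one benchmark query (not the crate's real type).
struct EvalCase {
    /// The file the query is expected to retrieve.
    expected_file: String,
    /// Files returned by search, ordered best match first.
    ranked_files: Vec<String>,
}

/// 1-based rank of the expected file in the results, or None if it is absent.
fn rank_of(case: &EvalCase) -> Option<usize> {
    case.ranked_files
        .iter()
        .position(|f| f == &case.expected_file)
        .map(|i| i + 1)
}

/// Recall@k: fraction of queries whose expected file appears in the top k results.
fn recall_at_k(cases: &[EvalCase], k: usize) -> f64 {
    if cases.is_empty() {
        return 0.0;
    }
    let hits = cases
        .iter()
        .filter(|c| matches!(rank_of(c), Some(r) if r <= k))
        .count();
    hits as f64 / cases.len() as f64
}

/// MRR: mean of 1/rank over all queries; a query with no hit contributes 0.
fn mrr(cases: &[EvalCase]) -> f64 {
    if cases.is_empty() {
        return 0.0;
    }
    let sum: f64 = cases
        .iter()
        .map(|c| rank_of(c).map_or(0.0, |r| 1.0 / r as f64))
        .sum();
    sum / cases.len() as f64
}
```

With three queries whose expected files land at rank 1, at rank 3, and not at all, this sketch gives Recall@1 = 1/3, Recall@3 = 2/3, and MRR = (1 + 1/3 + 0) / 3 = 4/9.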
Structs§
Functions§
- generate_self_eval - Generate self-eval queries from an indexed codebase. Picks random symbols/files and constructs retrieval queries.
- run_eval - Run evaluation using the full hybrid search pipeline (BM25 + embeddings + SPLADE). Falls back to BM25-only if embeddings are not available.