Skip to main content

Module eval_harness

Module eval_harness

Expand description

Retrieval evaluation harness for lean-ctx hybrid search.

Runs a standardized query→expected_file benchmark to measure Recall@k, MRR (Mean Reciprocal Rank), and latency. Outputs NDJSON scorecards.

Usage: lean-ctx benchmark --eval [path]

Structs§

CategoryScore
EvalQuery
EvalResult
EvalScorecard

Functions§

generate_self_eval: Generate self-eval queries from an indexed codebase. Picks random symbols/files and constructs retrieval queries.
run_eval: Run evaluation using the full hybrid search pipeline (BM25 + embeddings + SPLADE). Falls back to BM25-only if embeddings are not available.