Downstream task evaluation framework for search quality.
Measures how well the search pipeline supports actual coding tasks:
- Retrieval precision/recall against known-relevant chunks
- Mean Reciprocal Rank (MRR) for expected top results
- Normalized Discounted Cumulative Gain (nDCG)
Designed to compare BM25-only vs hybrid search and track quality over time.
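As a point of reference, here is a minimal sketch of how MRR and binary-relevance nDCG are typically computed over a ranked list of file paths. The helper functions below are illustrative only and are not this crate's implementation:

```rust
use std::collections::HashSet;

/// Mean Reciprocal Rank: reciprocal of the 1-based rank of the first
/// relevant result, or 0.0 if nothing relevant was retrieved.
fn mrr(ranked: &[&str], relevant: &HashSet<&str>) -> f64 {
    ranked
        .iter()
        .position(|path| relevant.contains(path))
        .map_or(0.0, |i| 1.0 / (i as f64 + 1.0))
}

/// Binary-relevance nDCG: DCG of the actual ranking divided by the DCG
/// of an ideal ranking that places every relevant item first.
fn ndcg(ranked: &[&str], relevant: &HashSet<&str>) -> f64 {
    // Gain at 0-based position i is 1 / log2(i + 2), i.e. log2(rank + 1).
    let gain_at = |i: usize| 1.0 / (i as f64 + 2.0).log2();
    let dcg: f64 = ranked
        .iter()
        .enumerate()
        .filter(|(_, path)| relevant.contains(*path))
        .map(|(i, _)| gain_at(i))
        .sum();
    let idcg: f64 = (0..relevant.len().min(ranked.len())).map(gain_at).sum();
    if idcg == 0.0 { 0.0 } else { dcg / idcg }
}

fn main() {
    let relevant: HashSet<&str> = ["src/config.rs"].into_iter().collect();
    let ranked = ["src/main.rs", "src/config.rs", "README.md"];
    println!("MRR  = {:.3}", mrr(&ranked, &relevant));  // 0.500
    println!("nDCG = {:.3}", ndcg(&ranked, &relevant)); // 0.631
}
```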
Structs§
- EvalQuery - A single evaluation query with expected relevant results.
- EvalReport - Result of evaluating a search system against a query set.
- QueryScore
- RetrievedItem - Retrieved result for evaluation (file path + score).
Functions§
- evaluate - Evaluate search results against a set of queries with known relevance.
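A hedged sketch of how these pieces might fit together. The field names and the exact signature of `evaluate` are assumptions made for illustration, since this page lists only the type names and one-line summaries:

```rust
use std::collections::HashSet;

// Stand-ins for the crate's types; the field names are assumptions.
struct EvalQuery {
    text: String,          // the search query to run
    expected: Vec<String>, // known-relevant file paths
}

struct RetrievedItem {
    path: String, // retrieved file path
    score: f64,   // retrieval score (not used by MRR itself)
}

struct EvalReport {
    mean_mrr: f64,
}

// Hedged sketch of what `evaluate` might do: compare each query's ranked
// results against its expected set, score each query (MRR here), and
// average the per-query scores into an aggregate report.
fn evaluate(queries: &[EvalQuery], results: &[Vec<RetrievedItem>]) -> EvalReport {
    let total: f64 = queries
        .iter()
        .zip(results)
        .map(|(q, ranked)| {
            let relevant: HashSet<&str> =
                q.expected.iter().map(String::as_str).collect();
            ranked
                .iter()
                .position(|item| relevant.contains(item.path.as_str()))
                .map_or(0.0, |rank| 1.0 / (rank as f64 + 1.0))
        })
        .sum();
    EvalReport { mean_mrr: total / queries.len().max(1) as f64 }
}
```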