Modules§
- criteria
- Success criteria from the epic.
Structs§
- Bakeoff
Comparison - Bake-off comparison result.
- Document
- A document in the evaluation corpus.
- Evaluation
Config - Configuration for the evaluation harness.
- Evaluation
Corpus - Evaluation corpus containing documents and queries with ground truth.
- Evaluation
Harness - Evaluation harness for running bake-off evaluations.
- Latency
Stats - Latency statistics from a benchmark run.
- Latency
Timer - Timer for measuring operation latency.
- Model
Metadata - Model metadata for eligibility checking.
- Query
Eval Result - Result of evaluating a single query.
- Query
With Judgments - A query with ground truth relevance judgments.
- Relevance
Judgment - Ground truth relevance judgment for a query-document pair.
- Validation
Report - Minimal validation report for bake-off runs.
Constants§
- ELIGIBILITY_
CUTOFF - Hard eligibility cutoff: models must be released on/after this date. Format: YYYY-MM-DD
Functions§
- cosine_
similarity - Compute cosine similarity between two vectors.
- format_
comparison_ table - Format a comparison as a markdown table for reporting.
- ndcg_
at_ k - Compute NDCG@k for a list of relevances in rank order. Non-finite or <= 0 relevances are treated as non-relevant.