
Module bakeoff

Modules

criteria
Success criteria from the epic.

Structs

BakeoffComparison
Bake-off comparison result.
Document
A document in the evaluation corpus.
EvaluationConfig
Configuration for the evaluation harness.
EvaluationCorpus
Evaluation corpus containing documents and queries with ground truth.
EvaluationHarness
Evaluation harness for running bake-off evaluations.
LatencyStats
Latency statistics from a benchmark run.
LatencyTimer
Timer for measuring operation latency.
ModelMetadata
Model metadata for eligibility checking.
QueryEvalResult
Result of evaluating a single query.
QueryWithJudgments
A query with ground truth relevance judgments.
RelevanceJudgment
Ground truth relevance judgment for a query-document pair.
ValidationReport
Minimal validation report for bake-off runs.
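This page lists LatencyTimer and LatencyStats but does not show their fields or methods. Here is a rough illustration of how such a pair is often used. It assumes a timer that records one Duration per measured call, and stats made of a sample count, a mean, and nearest-rank p50/p95 percentiles in milliseconds. Every struct field, method, and helper below (time, latency_stats, nearest_rank) is hypothetical and is not the module's actual API.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for `LatencyTimer`: records one duration per measured call.
struct LatencyTimer {
    samples: Vec<Duration>,
}

impl LatencyTimer {
    fn new() -> Self {
        Self { samples: Vec::new() }
    }

    /// Runs `op`, records how long it took, and passes its result through.
    fn time<T>(&mut self, op: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = op();
        self.samples.push(start.elapsed());
        out
    }
}

/// Hypothetical stand-in for `LatencyStats`; the real field set is not documented here.
#[derive(Debug)]
struct LatencyStats {
    count: usize,
    mean_ms: f64,
    p50_ms: f64,
    p95_ms: f64,
}

/// Nearest-rank percentile over an ascending-sorted, non-empty slice (p in (0, 100]).
fn nearest_rank(sorted_ms: &[f64], p: f64) -> f64 {
    let rank = ((p / 100.0) * sorted_ms.len() as f64).ceil() as usize;
    sorted_ms[rank.max(1) - 1]
}

/// Summarizes recorded samples; returns None when nothing was measured.
fn latency_stats(samples: &[Duration]) -> Option<LatencyStats> {
    if samples.is_empty() {
        return None;
    }
    let mut ms: Vec<f64> = samples.iter().map(|d| d.as_secs_f64() * 1000.0).collect();
    ms.sort_by(|a, b| a.total_cmp(b));
    let mean_ms = ms.iter().sum::<f64>() / ms.len() as f64;
    Some(LatencyStats {
        count: ms.len(),
        mean_ms,
        p50_ms: nearest_rank(&ms, 50.0),
        p95_ms: nearest_rank(&ms, 95.0),
    })
}

fn main() {
    let mut timer = LatencyTimer::new();
    for n in 1..=20u64 {
        // Stand-in workload; in a bake-off this would be one embedding or search call.
        let _sum: u64 = timer.time(|| (0..n * 1000).sum());
    }
    let stats = latency_stats(&timer.samples).expect("at least one sample");
    println!("{stats:?}");
}
```

Nearest-rank percentiles always return an observed sample, so a run with very few samples does not produce interpolated values that were never actually measured.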

Constants

ELIGIBILITY_CUTOFF
Hard eligibility cutoff: a model is eligible only if it was released on or after this date (format: YYYY-MM-DD).
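This page shows neither the value of ELIGIBILITY_CUTOFF nor how the check is performed. Below is a minimal sketch that assumes release dates and the cutoff are both YYYY-MM-DD strings. is_iso_date and is_eligible are hypothetical helpers, and the cutoff used in main is a placeholder, not the module's real constant.

```rust
/// Hypothetical format check: true when `s` looks like YYYY-MM-DD.
/// Month and day ranges are checked loosely; full calendar validity is not.
fn is_iso_date(s: &str) -> bool {
    let b = s.as_bytes();
    if b.len() != 10 || b[4] != b'-' || b[7] != b'-' {
        return false;
    }
    let digits_ok = b
        .iter()
        .enumerate()
        .all(|(i, c)| i == 4 || i == 7 || c.is_ascii_digit());
    if !digits_ok {
        return false;
    }
    // Safe to slice and parse: every byte is now an ASCII digit or a dash.
    let month: u32 = s[5..7].parse().unwrap();
    let day: u32 = s[8..10].parse().unwrap();
    (1..=12).contains(&month) && (1..=31).contains(&day)
}

/// Hypothetical eligibility check: released on or after `cutoff`.
/// Zero-padded YYYY-MM-DD strings sort chronologically, so string comparison suffices.
fn is_eligible(release_date: &str, cutoff: &str) -> Result<bool, String> {
    for d in [release_date, cutoff] {
        if !is_iso_date(d) {
            return Err(format!("expected YYYY-MM-DD, got {d:?}"));
        }
    }
    Ok(release_date >= cutoff)
}

fn main() {
    // Placeholder cutoff for illustration only; not the real ELIGIBILITY_CUTOFF value.
    let cutoff = "2024-01-01";
    println!("{:?}", is_eligible("2024-03-15", cutoff));
}
```

Rejecting malformed dates up front matters here: without the format check, a non-padded date such as "2024-3-15" would still compare as a string but give the wrong chronological answer.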

Functions

cosine_similarity
Compute cosine similarity between two vectors.
format_comparison_table
Format a comparison as a markdown table for reporting.
ndcg_at_k
Compute NDCG@k for a list of relevances in rank order. Relevances that are non-finite or less than or equal to 0 are treated as non-relevant.
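The signatures of cosine_similarity and ndcg_at_k are not shown on this page, so the following is only a sketch under stated assumptions. cosine_similarity is assumed to take two &[f32] slices, panic on a dimension mismatch, and return 0.0 when either vector has zero norm. ndcg_at_k is assumed to take an f64 relevance slice and k, use linear gains (rel / log2(rank + 1)), and build the ideal ranking by sorting the same list in descending order. The real module may instead use exponential gains (2^rel - 1) or handle edge cases differently. Only the rule that non-finite or non-positive relevances count as non-relevant comes from the documentation above.

```rust
/// Sketch of cosine similarity: dot(a, b) / (|a| * |b|).
/// Assumption: a zero-norm vector yields 0.0 instead of NaN.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let (mut dot, mut na, mut nb) = (0.0f32, 0.0f32, 0.0f32);
    for (x, y) in a.iter().zip(b) {
        dot += x * y;
        na += x * x;
        nb += y * y;
    }
    if na == 0.0 || nb == 0.0 {
        return 0.0;
    }
    dot / (na.sqrt() * nb.sqrt())
}

/// Sketch of NDCG@k with linear gains; rank 1 is discounted by log2(2) = 1.
fn ndcg_at_k(relevances: &[f64], k: usize) -> f64 {
    // Documented rule: non-finite or <= 0 relevances contribute nothing.
    let gains: Vec<f64> = relevances
        .iter()
        .map(|&r| if r.is_finite() && r > 0.0 { r } else { 0.0 })
        .collect();
    // DCG over the first k positions: sum of rel_i / log2(i + 1), i starting at 1.
    let dcg = |g: &[f64]| -> f64 {
        g.iter()
            .take(k)
            .enumerate()
            .map(|(i, &rel)| rel / ((i + 2) as f64).log2())
            .sum()
    };
    // Assumed ideal ranking: the same gains sorted from most to least relevant.
    let mut ideal = gains.clone();
    ideal.sort_by(|a, b| b.total_cmp(a));
    let idcg = dcg(&ideal);
    if idcg == 0.0 {
        0.0
    } else {
        dcg(&gains) / idcg
    }
}

fn main() {
    println!("{:.4}", cosine_similarity(&[1.0, 2.0, 3.0], &[2.0, 4.0, 6.0]));
    println!("{:.4}", ndcg_at_k(&[0.0, 1.0, 2.0], 3));
}
```

Because the ideal ranking here is derived from the input list alone, relevant documents that were never retrieved do not lower the score; an evaluation harness with full ground-truth judgments may compute IDCG from those judgments instead.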