Elinor (Evaluation library in information retrieval) is a library for evaluating information retrieval (IR) systems. It provides a comprehensive set of tools and metrics tailored for IR engineers, offering an intuitive and easy-to-use interface.
§Key features
- IR-specific design: Elinor is tailored specifically for evaluating IR systems, with an intuitive interface designed for IR engineers. It offers a streamlined workflow that simplifies common IR evaluation tasks.
- Comprehensive evaluation metrics: Elinor supports a wide range of key evaluation metrics, such as Precision, MAP, MRR, and nDCG. The supported metrics are available in Metric. The evaluation results are validated against trec_eval to ensure accuracy and reliability.
- In-depth statistical testing: Elinor includes several statistical tests, such as Student's t-test and the randomized Tukey HSD test, to verify the generalizability of results. Not only p-values but also other statistics, such as effect sizes and confidence intervals, are provided for thorough reporting. See the statistical_tests module for more details.
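As a rough illustration of what such a test computes, a paired Student's t-test over per-query score differences can be sketched with the standard library alone. The per-query scores below are hypothetical; the statistical_tests module provides the tested implementations, along with p-values and confidence intervals.

```rust
// Paired Student's t-test on per-query metric differences, hand-rolled
// for illustration only. Returns the t-statistic (n - 1 degrees of
// freedom) and the paired-sample effect size (Cohen's d).
fn paired_t_statistic(a: &[f64], b: &[f64]) -> (f64, f64) {
    assert_eq!(a.len(), b.len());
    let n = a.len() as f64;
    let diffs: Vec<f64> = a.iter().zip(b).map(|(x, y)| x - y).collect();
    let mean = diffs.iter().sum::<f64>() / n;
    // Unbiased sample variance of the per-query differences.
    let var = diffs.iter().map(|d| (d - mean).powi(2)).sum::<f64>() / (n - 1.0);
    let sd = var.sqrt();
    let t = mean / (sd / n.sqrt()); // t-statistic
    let effect_size = mean / sd;    // Cohen's d for paired samples
    (t, effect_size)
}

fn main() {
    // Hypothetical per-query nDCG scores for two systems.
    let sys_a = [0.70, 0.80, 0.90, 0.60, 0.75];
    let sys_b = [0.60, 0.60, 0.60, 0.40, 0.55];
    let (t, d) = paired_t_statistic(&sys_a, &sys_b);
    println!("t = {t:.4}, effect size = {d:.4}");
}
```

A large t-statistic relative to the t-distribution with n - 1 degrees of freedom indicates the score difference is unlikely to be due to query sampling alone.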
§Basic usage in evaluating several metrics
You first need to prepare gold relevance judgments and predicted relevance scores through
GoldRelStore and PredRelStore, respectively.
You can build these instances using GoldRelStoreBuilder and PredRelStoreBuilder.
Then, you can evaluate the predicted relevance scores using the evaluate function and
the specified metric. The available metrics are defined in the Metric enum.
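To make the definitions concrete, here is a stdlib-only sketch of three of these metrics: Precision@k, reciprocal rank, and nDCG@k (one common formulation, with linear gain, a log2 discount, and the ideal ranking built from the gold judgments). This illustrates the math only and is not elinor's implementation.

```rust
// `rels` holds the gold labels of the retrieved documents in
// predicted order (best-scored first). Illustration only.

/// Precision@k: fraction of the top-k results that are relevant (label > 0).
fn precision_at_k(rels: &[u32], k: usize) -> f64 {
    let hits = rels.iter().take(k).filter(|&&r| r > 0).count();
    hits as f64 / k as f64
}

/// Reciprocal rank: 1 / rank of the first relevant result (0.0 if none).
fn reciprocal_rank(rels: &[u32]) -> f64 {
    rels.iter()
        .position(|&r| r > 0)
        .map_or(0.0, |i| 1.0 / (i + 1) as f64)
}

/// DCG@k with linear gain: sum over ranks i of rel_i / log2(i + 1).
fn dcg_at_k(rels: &[u32], k: usize) -> f64 {
    rels.iter()
        .take(k)
        .enumerate()
        .map(|(i, &r)| r as f64 / ((i + 2) as f64).log2())
        .sum()
}

/// nDCG@k: DCG of the predicted ranking divided by the DCG of the
/// ideal ranking, i.e., all gold labels sorted in decreasing order.
fn ndcg_at_k(ranked: &[u32], gold: &[u32], k: usize) -> f64 {
    let mut ideal = gold.to_vec();
    ideal.sort_unstable_by(|a, b| b.cmp(a));
    let idcg = dcg_at_k(&ideal, k);
    if idcg == 0.0 { 0.0 } else { dcg_at_k(ranked, k) / idcg }
}

fn main() {
    // A query whose three retrieved documents have gold labels [1, 0, 2].
    let ranked = [1, 0, 2];
    println!("P@3    = {:.4}", precision_at_k(&ranked, 3)); // 2 of 3 relevant
    println!("RR     = {:.4}", reciprocal_rank(&ranked));
    println!("nDCG@3 = {:.4}", ndcg_at_k(&ranked, &ranked, 3));
}
```

Averaging such per-query scores over all queries yields the mean scores reported by evaluate below.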
An example is shown below:
use approx::assert_abs_diff_eq;
use elinor::{GoldRelStoreBuilder, PredRelStoreBuilder, Metric};
// Prepare gold relevance scores.
// In binary-relevance metrics, 0 means non-relevant and any other value means relevant.
let mut b = GoldRelStoreBuilder::new();
b.add_score("q_1", "d_1", 1)?;
b.add_score("q_1", "d_2", 0)?;
b.add_score("q_1", "d_3", 2)?;
b.add_score("q_2", "d_2", 2)?;
b.add_score("q_2", "d_4", 1)?;
let gold_rels = b.build();
// Prepare predicted relevance scores.
let mut b = PredRelStoreBuilder::new();
b.add_score("q_1", "d_1", 0.5.into())?;
b.add_score("q_1", "d_2", 0.4.into())?;
b.add_score("q_1", "d_3", 0.3.into())?;
b.add_score("q_2", "d_4", 0.1.into())?;
b.add_score("q_2", "d_1", 0.2.into())?;
b.add_score("q_2", "d_3", 0.3.into())?;
let pred_rels = b.build();
// Evaluate Precision@3.
let evaluated = elinor::evaluate(&gold_rels, &pred_rels, Metric::Precision { k: 3 })?;
assert_abs_diff_eq!(evaluated.mean_score(), 0.5000, epsilon = 1e-4);
// Evaluate MAP, where all documents are considered via k=0.
let evaluated = elinor::evaluate(&gold_rels, &pred_rels, Metric::AP { k: 0 })?;
assert_abs_diff_eq!(evaluated.mean_score(), 0.5000, epsilon = 1e-4);
// Evaluate MRR, where the metric is specified via a string representation.
let evaluated = elinor::evaluate(&gold_rels, &pred_rels, "rr".parse()?)?;
assert_abs_diff_eq!(evaluated.mean_score(), 0.6667, epsilon = 1e-4);
// Evaluate nDCG@3, where the metric is specified via a string representation.
let evaluated = elinor::evaluate(&gold_rels, &pred_rels, "ndcg@3".parse()?)?;
assert_abs_diff_eq!(evaluated.mean_score(), 0.4751, epsilon = 1e-4);
§Relevance stores from HashMap
GoldRelStore and PredRelStore can also be instantiated from HashMaps.
The following mapping structure is expected:
query_id => { doc_id => score }
This structure allows you to prepare data in JSON or other formats via Serde.
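Without Serde, the same nested shape can also be built directly; a minimal stdlib sketch, using a plain u32 score in place of the crate's gold score type:

```rust
use std::collections::HashMap;

// Build the `query_id => { doc_id => score }` shape by hand.
// A plain u32 stands in for elinor's gold score type here.
fn gold_map() -> HashMap<String, HashMap<String, u32>> {
    let mut map: HashMap<String, HashMap<String, u32>> = HashMap::new();
    for (query, doc, score) in [("q_1", "d_1", 1), ("q_1", "d_3", 2), ("q_2", "d_4", 1)] {
        map.entry(query.to_string())
            .or_default()
            .insert(doc.to_string(), score);
    }
    map
}

fn main() {
    let gold = gold_map();
    assert_eq!(gold.len(), 2);         // two queries
    assert_eq!(gold["q_1"]["d_3"], 2); // judged score for (q_1, d_3)
}
```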
If you use Serde, enable the serde feature in the Cargo.toml:
[dependencies]
elinor = { version = "*", features = ["serde"] }
An example that instantiates relevance stores from JSON is shown below:
use std::collections::HashMap;
use elinor::{GoldRelStore, GoldScore, PredRelStore, PredScore};
let gold_rels_data = r#"
{
"q_1": {
"d_1": 1,
"d_2": 0,
"d_3": 2
},
"q_2": {
"d_2": 2,
"d_4": 1
}
}"#;
let pred_rels_data = r#"
{
"q_1": {
"d_1": 0.5,
"d_2": 0.4,
"d_3": 0.3
},
"q_2": {
"d_3": 0.3,
"d_1": 0.2,
"d_4": 0.1
}
}"#;
let gold_rels_map: HashMap<String, HashMap<String, GoldScore>> =
serde_json::from_str(gold_rels_data)?;
let pred_rels_map: HashMap<String, HashMap<String, PredScore>> =
serde_json::from_str(pred_rels_data)?;
let gold_rels = GoldRelStore::from_map(gold_rels_map);
let pred_rels = PredRelStore::from_map(pred_rels_map);
assert_eq!(gold_rels.n_queries(), 2);
assert_eq!(gold_rels.n_docs(), 5);
assert_eq!(pred_rels.n_queries(), 2);
assert_eq!(pred_rels.n_docs(), 6);
§Crate features
- serde: Enables Serde support for PredScore.
Re-exports§
- pub use errors::ElinorError;
- pub use metrics::Metric;
- pub use relevance::Relevance;
Modules§
- Error handling for Elinor.
- Metrics for evaluating information retrieval systems.
- Data structures for storing relevance scores.
- Statistical tests.
- TREC format parser.
Structs§
- Struct to store evaluated results.
Functions§
- Evaluates the given predicted relevance scores against the gold relevance scores.
- Extracts paired scores from two Evaluated results.
- Extracts tupled scores from multiple Evaluated results.
Type Aliases§
- Data structure to store gold relevance scores.
- Builder for GoldRelStore.
- Data type to store a gold relevance score. In binary relevance, 0 means non-relevant and any other value means relevant.
- Data structure to store predicted relevance scores.
- Builder for PredRelStore.
- Data type to store a predicted relevance score. A higher score means more relevant.