Elinor (Evaluation library in information retrieval) is a library for evaluating information retrieval (IR) systems. It provides a comprehensive set of tools and metrics tailored for IR engineers, offering an intuitive and easy-to-use interface.
§Key features
- IR-specific design: Elinor is tailored specifically for evaluating IR systems, with an intuitive interface designed for IR engineers. It offers a streamlined workflow that simplifies common IR evaluation tasks.
- Comprehensive evaluation metrics: Elinor supports a wide range of key evaluation metrics, such as Precision, MAP, MRR, and nDCG. The supported metrics are available in Metric. The evaluation results are validated against trec_eval to ensure accuracy and reliability.
- In-depth statistical testing: Elinor includes several statistical tests, such as Student's t-test and the randomized Tukey HSD test, to verify the generalizability of results. Not only p-values but also other statistics, such as effect sizes and confidence intervals, are provided for thorough reporting. See the statistical_tests module for more details.
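As a rough illustration of what such a test computes, a paired Student's t-test over per-query score differences can be sketched with the standard library alone. The per-query scores below are hypothetical; the statistical_tests module provides the tested implementations, along with p-values and confidence intervals.

```rust
// Paired Student's t-test on per-query metric differences, hand-rolled
// for illustration only. Returns the t-statistic (n - 1 degrees of
// freedom) and the paired-sample effect size (Cohen's d).
fn paired_t_statistic(a: &[f64], b: &[f64]) -> (f64, f64) {
    assert_eq!(a.len(), b.len());
    let n = a.len() as f64;
    let diffs: Vec<f64> = a.iter().zip(b).map(|(x, y)| x - y).collect();
    let mean = diffs.iter().sum::<f64>() / n;
    // Unbiased sample variance of the per-query differences.
    let var = diffs.iter().map(|d| (d - mean).powi(2)).sum::<f64>() / (n - 1.0);
    let sd = var.sqrt();
    let t = mean / (sd / n.sqrt()); // t-statistic
    let effect_size = mean / sd;    // Cohen's d for paired samples
    (t, effect_size)
}

fn main() {
    // Hypothetical per-query nDCG scores for two systems.
    let sys_a = [0.70, 0.80, 0.90, 0.60, 0.75];
    let sys_b = [0.60, 0.60, 0.60, 0.40, 0.55];
    let (t, d) = paired_t_statistic(&sys_a, &sys_b);
    println!("t = {t:.4}, effect size = {d:.4}");
}
```

A large t-statistic relative to the t-distribution with n - 1 degrees of freedom indicates the score difference is unlikely to be due to query sampling alone.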
§Basic usage in evaluating several metrics
You first need to prepare gold relevance judgments and predicted relevance scores through
GoldRelStore and PredRelStore, respectively.
You can build these instances using GoldRelStoreBuilder and PredRelStoreBuilder.
Then, you can evaluate the predicted relevance scores using the evaluate function and
the specified metric. The available metrics are defined in the Metric enum.
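To make the definitions concrete, here is a stdlib-only sketch of three of these metrics: Precision@k, reciprocal rank, and nDCG@k (one common formulation, with linear gain, a log2 discount, and the ideal ranking built from the gold judgments). This illustrates the math only and is not elinor's implementation.

```rust
// `rels` holds the gold labels of the retrieved documents in
// predicted order (best-scored first). Illustration only.

/// Precision@k: fraction of the top-k results that are relevant (label > 0).
fn precision_at_k(rels: &[u32], k: usize) -> f64 {
    let hits = rels.iter().take(k).filter(|&&r| r > 0).count();
    hits as f64 / k as f64
}

/// Reciprocal rank: 1 / rank of the first relevant result (0.0 if none).
fn reciprocal_rank(rels: &[u32]) -> f64 {
    rels.iter()
        .position(|&r| r > 0)
        .map_or(0.0, |i| 1.0 / (i + 1) as f64)
}

/// DCG@k with linear gain: sum over ranks i of rel_i / log2(i + 1).
fn dcg_at_k(rels: &[u32], k: usize) -> f64 {
    rels.iter()
        .take(k)
        .enumerate()
        .map(|(i, &r)| r as f64 / ((i + 2) as f64).log2())
        .sum()
}

/// nDCG@k: DCG of the predicted ranking divided by the DCG of the
/// ideal ranking, i.e., all gold labels sorted in decreasing order.
fn ndcg_at_k(ranked: &[u32], gold: &[u32], k: usize) -> f64 {
    let mut ideal = gold.to_vec();
    ideal.sort_unstable_by(|a, b| b.cmp(a));
    let idcg = dcg_at_k(&ideal, k);
    if idcg == 0.0 { 0.0 } else { dcg_at_k(ranked, k) / idcg }
}

fn main() {
    // A query whose three retrieved documents have gold labels [1, 0, 2].
    let ranked = [1, 0, 2];
    println!("P@3    = {:.4}", precision_at_k(&ranked, 3)); // 2 of 3 relevant
    println!("RR     = {:.4}", reciprocal_rank(&ranked));
    println!("nDCG@3 = {:.4}", ndcg_at_k(&ranked, &ranked, 3));
}
```

Averaging such per-query scores over all queries yields the mean scores reported by evaluate below.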
An example is shown below:
use approx::assert_abs_diff_eq;
use elinor::{GoldRelStoreBuilder, PredRelStoreBuilder, Metric};
// Prepare gold relevance scores.
// In binary-relevance metrics, 0 means non-relevant and any other value means relevant.
let mut b = GoldRelStoreBuilder::new();
b.add_score("q_1", "d_1", 1)?;
b.add_score("q_1", "d_2", 0)?;
b.add_score("q_1", "d_3", 2)?;
b.add_score("q_2", "d_2", 2)?;
b.add_score("q_2", "d_4", 1)?;
let gold_rels = b.build();
// Prepare predicted relevance scores.
let mut b = PredRelStoreBuilder::new();
b.add_score("q_1", "d_1", 0.5.into())?;
b.add_score("q_1", "d_2", 0.4.into())?;
b.add_score("q_1", "d_3", 0.3.into())?;
b.add_score("q_2", "d_4", 0.1.into())?;
b.add_score("q_2", "d_1", 0.2.into())?;
b.add_score("q_2", "d_3", 0.3.into())?;
let pred_rels = b.build();
// Evaluate Precision@3.
let evaluated = elinor::evaluate(&gold_rels, &pred_rels, Metric::Precision { k: 3 })?;
assert_abs_diff_eq!(evaluated.mean_score(), 0.5000, epsilon = 1e-4);
// Evaluate MAP, where all documents are considered via k=0.
let evaluated = elinor::evaluate(&gold_rels, &pred_rels, Metric::AP { k: 0 })?;
assert_abs_diff_eq!(evaluated.mean_score(), 0.5000, epsilon = 1e-4);
// Evaluate MRR, where the metric is specified via a string representation.
let evaluated = elinor::evaluate(&gold_rels, &pred_rels, "rr".parse()?)?;
assert_abs_diff_eq!(evaluated.mean_score(), 0.6667, epsilon = 1e-4);
// Evaluate nDCG@3, where the metric is specified via a string representation.
let evaluated = elinor::evaluate(&gold_rels, &pred_rels, "ndcg@3".parse()?)?;
assert_abs_diff_eq!(evaluated.mean_score(), 0.4751, epsilon = 1e-4);
§Relevance stores from HashMap
GoldRelStore and PredRelStore can also be instantiated from HashMaps.
The following mapping structure is expected:
query_id => { doc_id => score }
This structure allows you to prepare data in JSON or other formats via Serde.
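Without Serde, the same nested shape can also be built directly; a minimal stdlib sketch, using a plain u32 score in place of the crate's gold score type:

```rust
use std::collections::HashMap;

// Build the `query_id => { doc_id => score }` shape by hand.
// A plain u32 stands in for elinor's gold score type here.
fn gold_map() -> HashMap<String, HashMap<String, u32>> {
    let mut map: HashMap<String, HashMap<String, u32>> = HashMap::new();
    for (query, doc, score) in [("q_1", "d_1", 1), ("q_1", "d_3", 2), ("q_2", "d_4", 1)] {
        map.entry(query.to_string())
            .or_default()
            .insert(doc.to_string(), score);
    }
    map
}

fn main() {
    let gold = gold_map();
    assert_eq!(gold.len(), 2);         // two queries
    assert_eq!(gold["q_1"]["d_3"], 2); // judged score for (q_1, d_3)
}
```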
If you use Serde, enable the serde feature in the Cargo.toml:
[dependencies]
elinor = { version = "*", features = ["serde"] }
An example that instantiates relevance stores from JSON is shown below:
use std::collections::HashMap;
use elinor::{GoldRelStore, GoldScore, PredRelStore, PredScore};
let gold_rels_data = r#"
{
"q_1": {
"d_1": 1,
"d_2": 0,
"d_3": 2
},
"q_2": {
"d_2": 2,
"d_4": 1
}
}"#;
let pred_rels_data = r#"
{
"q_1": {
"d_1": 0.5,
"d_2": 0.4,
"d_3": 0.3
},
"q_2": {
"d_3": 0.3,
"d_1": 0.2,
"d_4": 0.1
}
}"#;
let gold_rels_map: HashMap<String, HashMap<String, GoldScore>> =
serde_json::from_str(gold_rels_data)?;
let pred_rels_map: HashMap<String, HashMap<String, PredScore>> =
serde_json::from_str(pred_rels_data)?;
let gold_rels = GoldRelStore::from_map(gold_rels_map);
let pred_rels = PredRelStore::from_map(pred_rels_map);
assert_eq!(gold_rels.n_queries(), 2);
assert_eq!(gold_rels.n_docs(), 5);
assert_eq!(pred_rels.n_queries(), 2);
assert_eq!(pred_rels.n_docs(), 6);
§Crate features
- serde: Enables Serde support for PredScore.
Re-exports§
- pub use errors::ElinorError;
- pub use metrics::Metric;
- pub use relevance::Relevance;
Modules§
- Error handling for Elinor.
- Metrics for evaluating information retrieval systems.
- Data structures for storing relevance scores.
- Statistical tests.
- TREC format parser.
Structs§
- Struct to store evaluated results.
Functions§
- Evaluates the given predicted relevance scores against the gold relevance scores.
- Extracts paired scores from two Evaluated results.
- Extracts tupled scores from multiple Evaluated results.
Type Aliases§
- Data structure to store gold relevance scores.
- Builder for GoldRelStore.
- Data type to store a gold relevance score. In binary relevance, 0 means non-relevant and any other value means relevant.
- Data structure to store predicted relevance scores.
- Builder for PredRelStore.
- Data type to store a predicted relevance score. A higher score means more relevant.