Text generation evaluation metrics
Provides BLEU, ROUGE (1, 2, L), and Perplexity for evaluating text generation, translation, and summarization models.
Functions
- bleu_score
- Compute BLEU score with modified n-gram precision and brevity penalty.
- perplexity
- Compute perplexity from log-probabilities.
- rouge_l
- Compute ROUGE-L F1 score using longest common subsequence.
- rouge_n
- Compute ROUGE-N F1 score (n-gram overlap between reference and hypothesis).
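The listing above does not show the functions' signatures, so the following is only a minimal sketch of what the perplexity, ROUGE-N, and ROUGE-L computations typically look like. All names here (`perplexity_sketch`, `rouge_n_sketch`, `rouge_l_sketch`, `ngram_counts`, `lcs_len`) and their parameter types are hypothetical and are not this module's API. The sketch assumes pre-tokenized input as `&[&str]` and natural-log probabilities.

```rust
use std::collections::HashMap;

/// Perplexity as exp(-mean(log p_i)), assuming natural-log token probabilities.
/// Hypothetical signature; the module's real `perplexity` may differ.
fn perplexity_sketch(log_probs: &[f64]) -> f64 {
    if log_probs.is_empty() {
        return f64::INFINITY; // assumption: empty input is treated as infinite perplexity
    }
    let mean = log_probs.iter().sum::<f64>() / log_probs.len() as f64;
    (-mean).exp()
}

/// Count every n-gram in a token sequence (helper, hypothetical).
fn ngram_counts<'a>(tokens: &[&'a str], n: usize) -> HashMap<Vec<&'a str>, usize> {
    let mut counts = HashMap::new();
    if n == 0 || tokens.len() < n {
        return counts; // `windows(0)` would panic, so guard explicitly
    }
    for window in tokens.windows(n) {
        *counts.entry(window.to_vec()).or_insert(0) += 1;
    }
    counts
}

/// Harmonic mean of precision and recall, 0.0 when there is no match.
fn f1(matched: usize, hyp_total: usize, ref_total: usize) -> f64 {
    if matched == 0 || hyp_total == 0 || ref_total == 0 {
        return 0.0;
    }
    let precision = matched as f64 / hyp_total as f64;
    let recall = matched as f64 / ref_total as f64;
    2.0 * precision * recall / (precision + recall)
}

/// ROUGE-N F1 from clipped n-gram overlap between reference and hypothesis.
fn rouge_n_sketch<'a>(reference: &[&'a str], hypothesis: &[&'a str], n: usize) -> f64 {
    let ref_counts = ngram_counts(reference, n);
    let hyp_counts = ngram_counts(hypothesis, n);
    let ref_total: usize = ref_counts.values().sum();
    let hyp_total: usize = hyp_counts.values().sum();
    // Each hypothesis n-gram is credited at most as often as it occurs in the reference.
    let overlap: usize = hyp_counts
        .iter()
        .map(|(gram, &count)| count.min(ref_counts.get(gram).copied().unwrap_or(0)))
        .sum();
    f1(overlap, hyp_total, ref_total)
}

/// Length of the longest common subsequence, using two rolling DP rows.
fn lcs_len(a: &[&str], b: &[&str]) -> usize {
    let mut prev = vec![0usize; b.len() + 1];
    for x in a {
        let mut curr = vec![0usize; b.len() + 1];
        for (j, y) in b.iter().enumerate() {
            curr[j + 1] = if x == y { prev[j] + 1 } else { prev[j + 1].max(curr[j]) };
        }
        prev = curr;
    }
    prev[b.len()]
}

/// ROUGE-L F1 based on the longest common subsequence of the two token sequences.
fn rouge_l_sketch(reference: &[&str], hypothesis: &[&str]) -> f64 {
    let lcs = lcs_len(reference, hypothesis);
    f1(lcs, hypothesis.len(), reference.len())
}
```

With the reference `the cat sat on the mat` and the hypothesis `the cat is on the mat`, split on whitespace, this sketch finds 5 of 6 unigrams and 3 of 5 bigrams in common, and an LCS of length 5. Whether the module's own functions tokenize, smooth, or weight precision and recall differently is not stated in the listing.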