Span-based (entity-level) precision / recall / F1 — the seqeval metric.
Token-level accuracy is a poor measure for sequence labelling: a model that
gets every O right but mangles entity boundaries can still score highly.
The standard NER / chunking metric (CoNLL-2000/2003, implemented by the
seqeval library) instead compares entity spans: a predicted span is a
true positive only when an identical span — same entity type and same
start/end boundaries — appears in the gold sequence.
Given gold tags y and predicted tags ŷ, spans are extracted with
crate::tagging::bioes::extract_spans and matched exactly:
precision = |gold ∩ pred| / |pred|
recall = |gold ∩ pred| / |gold|
f1 = 2 P R / (P + R)

Per-type breakdowns and a micro-averaged total are provided.
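The exact-match rule above can be sketched in a few lines. This is only an illustration under stated assumptions, not the crate's implementation: the `Span` tuple, `extract_bio_spans`, and `prf` are hypothetical names; the extractor handles plain BIO tags only (the crate's own extractor lives in crate::tagging::bioes), treats a stray `I-` tag as outside any span, and an empty denominator is scored as 0.0 by convention.

```rust
use std::collections::HashSet;

/// Hypothetical span for this sketch: (entity type, start, end), end exclusive.
type Span = (String, usize, usize);

/// Minimal BIO span extractor (illustrative only, no BIOES support).
fn extract_bio_spans(tags: &[&str]) -> HashSet<Span> {
    let mut spans = HashSet::new();
    let mut current: Option<(String, usize)> = None; // (type, start) of the open span
    for (i, tag) in tags.iter().enumerate() {
        // An I- tag of the same type continues the open span.
        let continues = match (&current, tag.strip_prefix("I-")) {
            (Some((ty, _)), Some(t)) => ty.as_str() == t,
            _ => false,
        };
        if continues {
            continue;
        }
        // Anything else closes the open span, if there is one.
        if let Some((ty, start)) = current.take() {
            spans.insert((ty, start, i));
        }
        // Only a B- tag opens a new span; a stray I- is ignored here.
        if let Some(ty) = tag.strip_prefix("B-") {
            current = Some((ty.to_string(), i));
        }
    }
    if let Some((ty, start)) = current.take() {
        spans.insert((ty, start, tags.len()));
    }
    spans
}

/// Precision, recall and F1 over exactly matching spans.
/// Assumption: a zero denominator yields 0.0.
fn prf(gold: &HashSet<Span>, pred: &HashSet<Span>) -> (f64, f64, f64) {
    let tp = gold.intersection(pred).count() as f64;
    let p = if pred.is_empty() { 0.0 } else { tp / pred.len() as f64 };
    let r = if gold.is_empty() { 0.0 } else { tp / gold.len() as f64 };
    let f1 = if p + r == 0.0 { 0.0 } else { 2.0 * p * r / (p + r) };
    (p, r, f1)
}
```

With gold tags `B-PER I-PER` and predicted tags `B-PER O`, half of the tokens are labelled correctly, yet the predicted span covers [0, 1) while the gold span covers [0, 2), so no span matches and all three scores are 0.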
Structs
- PrfScore - Precision / recall / F1 with the raw TP/FP/FN counts behind them.
- SpanF1Report - Full span-F1 report: a micro-averaged overall score plus per-type scores.
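As a rough illustration of how per-type counts and a micro-averaged total relate, the sketch below tallies TP/FP/FN per entity type and sums them before computing the overall score. Everything here is an assumption made for the example: `Counts`, `ratio`, and `count_by_type` are hypothetical names and do not reflect the actual fields or methods of the structs listed above, and a zero denominator is scored as 0.0.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical span for this sketch: (entity type, start, end).
type Span = (String, usize, usize);

/// Hypothetical raw counts; not the crate's own struct.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct Counts {
    tp: usize,
    fp: usize,
    fn_: usize,
}

/// Safe division; assumption: an empty denominator scores 0.0.
fn ratio(num: usize, den: usize) -> f64 {
    if den == 0 { 0.0 } else { num as f64 / den as f64 }
}

impl Counts {
    fn precision(&self) -> f64 {
        ratio(self.tp, self.tp + self.fp)
    }
    fn recall(&self) -> f64 {
        ratio(self.tp, self.tp + self.fn_)
    }
    fn f1(&self) -> f64 {
        let (p, r) = (self.precision(), self.recall());
        if p + r == 0.0 { 0.0 } else { 2.0 * p * r / (p + r) }
    }
}

/// Tally counts per entity type, then sum them for the micro average.
fn count_by_type(
    gold: &HashSet<Span>,
    pred: &HashSet<Span>,
) -> (Counts, BTreeMap<String, Counts>) {
    let mut per_type: BTreeMap<String, Counts> = BTreeMap::new();
    // Each predicted span is a TP if it appears in gold, otherwise an FP.
    for span in pred {
        let c = per_type.entry(span.0.clone()).or_default();
        if gold.contains(span) {
            c.tp += 1;
        } else {
            c.fp += 1;
        }
    }
    // Each gold span that was never predicted is an FN.
    for span in gold.difference(pred) {
        per_type.entry(span.0.clone()).or_default().fn_ += 1;
    }
    // Micro averaging pools the raw counts across types before dividing.
    let mut micro = Counts::default();
    for c in per_type.values() {
        micro.tp += c.tp;
        micro.fp += c.fp;
        micro.fn_ += c.fn_;
    }
    (micro, per_type)
}
```

Because micro averaging pools counts rather than averaging per-type scores, frequent entity types weigh more heavily in the overall figure than rare ones.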
Functions
- span_f1 - Compute span-F1 directly from gold and predicted tag sequences.
- span_f1_from_spans - Compute the micro-averaged span-F1 from pre-extracted gold/predicted spans.