pub fn build_classifier_dataset(
samples: &[TrainingSample],
num_candidates: usize,
stopwords: &HashSet<String>,
) -> (Vec<NodeFeatures>, Vec<f32>)Expand description
Build a pointwise training set for the classifier from labelled samples.
For every sample that has ground-truth text, each candidate node becomes one
(features, label) example, where label = 1.0 for the best-F1 candidate
(via label_from_f1) and 0.0 otherwise. Samples without ground truth, or
where no candidate matches, are skipped.