Function build_classifier_dataset 

pub fn build_classifier_dataset(
    samples: &[TrainingSample],
    num_candidates: usize,
    stopwords: &HashSet<String>,
) -> (Vec<NodeFeatures>, Vec<f32>)

Build a pointwise training set for the classifier from labelled samples.

For every sample that has ground-truth text, each candidate node becomes one (features, label) example, where label = 1.0 for the best-F1 candidate (via label_from_f1) and 0.0 otherwise. Samples without ground truth, or where no candidate matches, are skipped.
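The labelling rule described above can be sketched in isolation. This is an illustrative sketch only, not the crate's implementation: token_f1 and pointwise_labels are hypothetical helpers, candidates are represented as plain strings rather than NodeFeatures, and the F1 here is assumed to be a set-based token overlap after stopword removal. The actual behaviour of label_from_f1 and the role of num_candidates are not shown on this page and are not modelled.

```rust
use std::collections::HashSet;

/// Hypothetical token-level F1 between a candidate's text and the ground truth.
/// Tokens are lowercased, whitespace-split words with stopwords removed
/// (an assumption; the real tokenisation is not documented here).
fn token_f1(candidate: &str, truth: &str, stopwords: &HashSet<String>) -> f32 {
    let tokens = |s: &str| -> HashSet<String> {
        s.split_whitespace()
            .map(|w| w.to_lowercase())
            .filter(|w| !stopwords.contains(w))
            .collect()
    };
    let (c, t) = (tokens(candidate), tokens(truth));
    let overlap = c.intersection(&t).count() as f32;
    if overlap == 0.0 {
        return 0.0;
    }
    let precision = overlap / c.len() as f32;
    let recall = overlap / t.len() as f32;
    2.0 * precision * recall / (precision + recall)
}

/// Hypothetical pointwise labelling for one sample: 1.0 for the best-F1
/// candidate, 0.0 for every other candidate. Returns None when the sample
/// has no ground truth or no candidate overlaps it, mirroring the
/// "skipped" cases in the description.
fn pointwise_labels(
    candidates: &[&str],
    truth: Option<&str>,
    stopwords: &HashSet<String>,
) -> Option<Vec<f32>> {
    let truth = truth?;
    let scores: Vec<f32> = candidates
        .iter()
        .map(|c| token_f1(c, truth, stopwords))
        .collect();
    // Index and score of the highest-F1 candidate (None if there are no candidates).
    let (best, best_f1) = scores
        .iter()
        .copied()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(&b.1))?;
    if best_f1 <= 0.0 {
        return None; // no candidate matches the ground truth
    }
    Some(
        (0..scores.len())
            .map(|i| if i == best { 1.0 } else { 0.0 })
            .collect(),
    )
}
```

Under these assumptions, a sample with three candidates contributes three examples, exactly one of which is positive. A full dataset builder would concatenate the per-sample feature rows and labels into the two returned vectors, so that the i-th label corresponds to the i-th feature row.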