pub fn predict_quality(features: &[f64]) -> f64
Predict extraction quality (estimated F1 score) from post-extraction features.
Returns a value in [0.0, 1.0] estimating how well the extraction captured
the page’s main content. Low scores (< 0.80) indicate the extraction may be
poor and should be routed to an LLM fallback.
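As an illustration of the routing rule above, the sketch below maps a predicted score to a routing decision. The `Route` enum, `route_extraction`, and `LOW_QUALITY_THRESHOLD` are hypothetical names, not part of this crate. The sketch assumes that a score of exactly 0.80 is *not* low, because the text says "< 0.80".

```rust
/// Hypothetical routing decision for an extracted page.
#[derive(Debug, PartialEq)]
enum Route {
    KeepHeuristic,
    LlmFallback,
}

/// Cutoff taken from the doc text: scores strictly below it are "low".
const LOW_QUALITY_THRESHOLD: f64 = 0.80;

/// Decides where a page goes, given the estimated F1 from `predict_quality`.
fn route_extraction(predicted_f1: f64) -> Route {
    if predicted_f1 < LOW_QUALITY_THRESHOLD {
        // Extraction is likely poor, so hand the page to the LLM fallback.
        Route::LlmFallback
    } else {
        // Extraction is likely good enough, so keep the heuristic output.
        Route::KeepHeuristic
    }
}
```

In a pipeline you would call it as `route_extraction(predict_quality(&features))`. Keeping the threshold as a named constant makes it easy to tune against measured F1 later.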
§Arguments
features - Raw (unscaled) quality features. Must have length N_QUALITY_FEATURES. Features include content statistics, page-type indicators, and HTML-level signals.
§Feature order (27 features)
0: heuristic_conf
1: content_len
2: word_count
3: vocab_ratio
4: avg_word_len
5: sentence_count
6: avg_sentence_len
7: sentence_uniqueness
8: paragraph_count
9: avg_paragraph_len
10: link_count_in_content
11: link_density
12: boilerplate_keywords
13-19: is_article..is_service (one-hot page type)
20: length_ratio
21: html_size
22: extraction_ratio
23: og_overlap
24: script_count
25: has_jsonld
26: top_bigram_freq
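Filling a fixed-order vector by hand is error-prone, so the following is a minimal sketch of one way to do it with named indices. Several things are assumptions. The index constants, `set_page_type`, and `example_features` are hypothetical helpers. `N_FEATURES` is assumed to equal N_QUALITY_FEATURES (27). `has_jsonld` is assumed to be a 0/1 flag. The page type is given only as an offset, because the text names only the first (`is_article`) and last (`is_service`) one-hot slots.

```rust
/// Assumed to match N_QUALITY_FEATURES (the doc lists 27 features).
const N_FEATURES: usize = 27;

// Hypothetical named indices following the documented feature order.
const HEURISTIC_CONF: usize = 0;
const CONTENT_LEN: usize = 1;
const WORD_COUNT: usize = 2;
const PAGE_TYPE_START: usize = 13; // is_article
const PAGE_TYPE_END: usize = 19; // is_service (inclusive)
const HAS_JSONLD: usize = 25;

/// Writes the one-hot page-type block at indices 13..=19.
/// `offset` is 0 for is_article and 6 for is_service.
fn set_page_type(features: &mut [f64; N_FEATURES], offset: usize) {
    assert!(offset <= PAGE_TYPE_END - PAGE_TYPE_START, "page-type offset out of range");
    // Clear the whole block first so exactly one slot ends up set.
    features[PAGE_TYPE_START..=PAGE_TYPE_END].fill(0.0);
    features[PAGE_TYPE_START + offset] = 1.0;
}

/// Builds an illustrative raw (unscaled) vector; the values are made up.
fn example_features() -> [f64; N_FEATURES] {
    let mut f = [0.0; N_FEATURES];
    f[HEURISTIC_CONF] = 0.9;
    f[CONTENT_LEN] = 4_200.0; // raw character count, not scaled
    f[WORD_COUNT] = 700.0;
    set_page_type(&mut f, 0); // is_article
    f[HAS_JSONLD] = 1.0; // assumed boolean flag encoded as 1.0
    f
}
```

Because `predict_quality` takes a slice, you can pass the array directly: `predict_quality(&example_features())`.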
§Panics
Panics if features.len() != N_QUALITY_FEATURES.
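Callers that would rather not panic can check the length first. The sketch below wraps any predictor in a length check and returns a `Result` instead. `try_predict` and `FeatureLenError` are hypothetical. The `predict` closure stands in for `predict_quality`, and `expected` stands in for N_QUALITY_FEATURES.

```rust
/// Hypothetical error describing a feature vector of the wrong length.
#[derive(Debug, PartialEq)]
struct FeatureLenError {
    expected: usize,
    got: usize,
}

/// Validates the length up front, then delegates to `predict`.
/// Returns an error rather than reaching the predictor's panic path.
fn try_predict<F>(features: &[f64], expected: usize, predict: F) -> Result<f64, FeatureLenError>
where
    F: Fn(&[f64]) -> f64,
{
    if features.len() != expected {
        return Err(FeatureLenError { expected, got: features.len() });
    }
    Ok(predict(features))
}
```

Against the real function, a call would look like `try_predict(&features, N_QUALITY_FEATURES, predict_quality)`. Taking the predictor as a parameter keeps the sketch testable without the trained model.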