Module corpus_features

Low-dimensional profile of a corpus — features that meta-learning can map to an optimal PipelineConfig.

Meta-learning across corpora needs a stable, compact characterization of “what kind of data is this?” so that past (corpus, best_config) pairs can be indexed and retrieved for prediction on new corpora. CorpusFeatures is that characterization. Extraction is O(N² · d) for pairwise-similarity features and O(N · d) for shape features; ~100ms at N = 775, d = 128.
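The two cost classes quoted above can be illustrated with a minimal sketch. The concrete features are not listed on this page, so `mean_pairwise_cosine` and `mean_norm` are hypothetical stand-ins: the first visits every unordered pair of embeddings (N·(N−1)/2 pairs, roughly 300,000 at N = 775), the second makes a single pass over the corpus.

```rust
/// Cosine similarity of two equal-length vectors; 0.0 if either has zero norm.
fn cosine(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Hypothetical pairwise-similarity feature: mean cosine similarity over all
/// unordered pairs. The double loop is what makes this O(N^2 * d).
fn mean_pairwise_cosine(embeddings: &[Vec<f64>]) -> f64 {
    let n = embeddings.len();
    if n < 2 {
        return 0.0;
    }
    let mut sum = 0.0;
    for i in 0..n {
        for j in (i + 1)..n {
            sum += cosine(&embeddings[i], &embeddings[j]);
        }
    }
    sum / (n * (n - 1) / 2) as f64
}

/// Hypothetical shape feature: mean L2 norm of the embeddings.
/// One pass over N vectors of dimension d, so O(N * d).
fn mean_norm(embeddings: &[Vec<f64>]) -> f64 {
    if embeddings.is_empty() {
        return 0.0;
    }
    let total: f64 = embeddings
        .iter()
        .map(|v| v.iter().map(|x| x * x).sum::<f64>().sqrt())
        .sum();
    total / embeddings.len() as f64
}
```

Note that a mean cosine similarity lies in [-1, 1] in general; it only falls in [0, 1] when the embeddings have no negative inner products, so a real feature of this kind would need clamping or rescaling to satisfy the field ranges described here.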

Every field is either a scalar in [0, 1] or an unbounded non-negative number. CorpusFeatures::to_vec flattens the fields into a fixed-length Vec<f64> whose order is stable and matches CorpusFeatures::feature_names.
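The flattening contract can be sketched as follows. This is not the real struct: `CorpusFeaturesSketch`, its three fields, and `SKETCH_FEATURE_COUNT` are hypothetical, and the actual signatures of `to_vec` and `feature_names` may differ. The point is only that names and values are emitted in the same fixed order, so index `i` always refers to the same feature.

```rust
/// Hypothetical stand-in for CORPUS_FEATURE_COUNT.
const SKETCH_FEATURE_COUNT: usize = 3;

/// Hypothetical stand-in for CorpusFeatures; the field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct CorpusFeaturesSketch {
    /// Scalar in [0, 1].
    mean_pairwise_similarity: f64,
    /// Scalar in [0, 1].
    duplicate_fraction: f64,
    /// Unbounded, non-negative.
    log_corpus_size: f64,
}

impl CorpusFeaturesSketch {
    /// Names in the same order that `to_vec` emits values.
    fn feature_names() -> [&'static str; SKETCH_FEATURE_COUNT] {
        [
            "mean_pairwise_similarity",
            "duplicate_fraction",
            "log_corpus_size",
        ]
    }

    /// Flatten to a fixed-length vector; the order must never change,
    /// otherwise previously stored vectors become incomparable.
    fn to_vec(&self) -> Vec<f64> {
        vec![
            self.mean_pairwise_similarity,
            self.duplicate_fraction,
            self.log_corpus_size,
        ]
    }
}
```

Zipping `feature_names()` with `to_vec()` yields (name, value) pairs, which is convenient for logging or for checking that a stored vector still matches the current layout.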

Structs§

CorpusFeatures
Low-dimensional profile of a corpus. Computed once per corpus; fed into any MetaModel to predict the pipeline config that’s likely to work best on it.
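To show how a flattened feature vector might be used for retrieval, here is a minimal sketch of a 1-nearest-neighbour lookup over past (features, best_config) pairs. The MetaModel interface is not shown on this page, so `NearestNeighborSketch`, `record`, and `predict` are hypothetical names, and the config type is left generic rather than assuming anything about PipelineConfig.

```rust
/// Hypothetical store of past (feature vector, best config) pairs.
struct NearestNeighborSketch<C> {
    history: Vec<(Vec<f64>, C)>,
}

impl<C: Clone> NearestNeighborSketch<C> {
    fn new() -> Self {
        Self { history: Vec::new() }
    }

    /// Remember the config that worked best for a corpus with these features.
    fn record(&mut self, features: Vec<f64>, best_config: C) {
        self.history.push((features, best_config));
    }

    /// Return the config of the closest recorded corpus, or None if nothing
    /// has been recorded yet.
    fn predict(&self, features: &[f64]) -> Option<C> {
        self.history
            .iter()
            .min_by(|(a, _), (b, _)| {
                sq_dist(a, features).total_cmp(&sq_dist(b, features))
            })
            .map(|(_, config)| config.clone())
    }
}

/// Squared Euclidean distance between two equal-length vectors.
fn sq_dist(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}
```

Because some fields are unbounded, a plain Euclidean distance would let them dominate the [0, 1] fields; a practical lookup would likely rescale each dimension first.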

Constants§

CORPUS_FEATURE_COUNT
Length of the vector returned by CorpusFeatures::to_vec.