Low-dimensional profile of a corpus — features that meta-learning can
map to an optimal PipelineConfig.
Meta-learning across corpora needs a stable, compact characterization
of “what kind of data is this?” so that past (corpus, best_config)
pairs can be indexed and retrieved for prediction on new corpora.
CorpusFeatures is that characterization. Extraction is O(N² · d)
for pairwise-similarity features and O(N · d) for shape features;
~100ms at N = 775, d = 128.
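
The two cost classes above can be illustrated with a minimal sketch. It assumes the corpus is a slice of equal-length embedding vectors and uses mean pairwise cosine similarity and mean L2 norm as stand-ins for a pairwise-similarity feature and a shape feature. The function names `cosine`, `mean_pairwise_cosine`, and `mean_norm` are hypothetical and are not part of this module's API; the actual features computed by CorpusFeatures may differ.

```rust
/// Cosine similarity of two equal-length vectors (hypothetical helper).
/// Returns 0.0 if either vector has zero norm.
fn cosine(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Illustrative pairwise-similarity feature: mean cosine similarity over
/// all unordered pairs. N(N-1)/2 pairs at O(d) each, so O(N^2 * d).
fn mean_pairwise_cosine(embeddings: &[Vec<f64>]) -> f64 {
    let n = embeddings.len();
    if n < 2 {
        return 0.0; // no pairs to compare
    }
    let mut sum = 0.0;
    for i in 0..n {
        for j in (i + 1)..n {
            sum += cosine(&embeddings[i], &embeddings[j]);
        }
    }
    sum / (n * (n - 1) / 2) as f64
}

/// Illustrative shape feature: mean L2 norm. One pass over N vectors of
/// length d, so O(N * d).
fn mean_norm(embeddings: &[Vec<f64>]) -> f64 {
    if embeddings.is_empty() {
        return 0.0;
    }
    let total: f64 = embeddings
        .iter()
        .map(|v| v.iter().map(|x| x * x).sum::<f64>().sqrt())
        .sum();
    total / embeddings.len() as f64
}
```

At N = 775 the double loop visits 775 · 774 / 2 = 299,925 pairs, which is why the pairwise features dominate extraction time.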
Every field is either a scalar in [0, 1] or an unbounded non-negative number.
CorpusFeatures::to_vec flattens to a fixed-length Vec<f64> in a
stable order matching CorpusFeatures::feature_names.
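
The flatten-in-a-stable-order contract can be sketched as follows. This is an illustration of the pattern only: `SketchFeatures`, its three fields, and `SKETCH_FEATURE_COUNT` are hypothetical stand-ins, not the real CorpusFeatures fields or the real CORPUS_FEATURE_COUNT value.

```rust
/// Number of scalar features in the sketch (hypothetical value).
const SKETCH_FEATURE_COUNT: usize = 3;

/// Hypothetical stand-in for a corpus profile; field names are assumptions.
struct SketchFeatures {
    /// Scalar in [0, 1].
    mean_similarity: f64,
    /// Scalar in [0, 1].
    similarity_spread: f64,
    /// Unbounded, non-negative.
    doc_count: f64,
}

impl SketchFeatures {
    /// Names in the same order that `to_vec` emits values.
    /// The array length is tied to the constant at compile time.
    fn feature_names() -> [&'static str; SKETCH_FEATURE_COUNT] {
        ["mean_similarity", "similarity_spread", "doc_count"]
    }

    /// Flattens the struct into a fixed-length vector.
    /// Going through a fixed-size array means a mismatch with
    /// SKETCH_FEATURE_COUNT fails to compile instead of failing at runtime.
    fn to_vec(&self) -> Vec<f64> {
        let values: [f64; SKETCH_FEATURE_COUNT] =
            [self.mean_similarity, self.similarity_spread, self.doc_count];
        values.to_vec()
    }
}
```

Keeping both methods keyed to one constant means a downstream model can assume a fixed input width, and zipping `feature_names()` with `to_vec()` gives labeled values for logging or debugging.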
Structs§
- CorpusFeatures - Low-dimensional profile of a corpus. Computed once per corpus; fed into any MetaModel to predict the pipeline config that’s likely to work best on it.
Constants§
- CORPUS_FEATURE_COUNT - Length of the vector returned by CorpusFeatures::to_vec.