Meta-learning across corpora: predict a PipelineConfig for a new
corpus by consulting past tuner runs on similar corpora.
This is Level 2 of SphereQL’s self-optimization hierarchy (per the metalearning-direction memory):
- L1 (tuner::auto_tune): per-corpus search. Produces a best config.
- L2 (this module): cross-corpus generalization. Takes the (corpus features, best config) pairs produced by L1 and learns a function CorpusFeatures → PipelineConfig so new corpora can skip search or warm-start it.
- L3: online adaptation from query feedback. Deferred.
Today's meta-model is a simple nearest-neighbor lookup over CorpusFeatures::to_vec, using z-score-normalized Euclidean distance. It works with any N ≥ 1 training records, is deterministic, and has no free hyperparameters. Once you have accumulated ≥ 10 diverse corpora, you can swap in something fancier (gradient-boosted trees, a small MLP) behind the same MetaModel trait; the storage format (MetaTrainingRecord) stays stable.
Storage
Records are serialized as a flat JSON array:

```json
[
  { "corpus_id": "built_in_775", "features": {...}, "best_config": {...}, ... },
  ...
]
```

MetaTrainingRecord::save_list and MetaTrainingRecord::load_list are convenience wrappers; the format is plain enough to edit by hand or process with jq.
Structs
- DistanceWeightedMetaModel - Picks the training record that maximizes best_score × w(d), where w(d) = 1 / (d + epsilon) and d is the z-score-normalized Euclidean distance.
- MetaTrainingRecord - One observation for the meta-learner: "on this corpus profile, this config was found to be best under this metric."
- NearestNeighborMetaModel - The simplest useful meta-model: given a new corpus, return the best_config of the training record whose corpus-feature vector is closest in z-score-normalized Euclidean distance.
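A minimal sketch of the DistanceWeightedMetaModel selection rule, assuming the z-score-normalized distance from the query to each training record has already been computed. The function name distance_weighted_pick is hypothetical, the module docs do not state a value for epsilon, and the sketch assumes best_score is non-negative with higher meaning better.

```rust
/// Hypothetical helper: returns the index i maximizing
/// scores[i] * w(distances[i]), with w(d) = 1 / (d + epsilon).
/// Assumes non-negative, higher-is-better scores. On exact ties,
/// `max_by` keeps the later index. Returns None for empty input.
fn distance_weighted_pick(scores: &[f64], distances: &[f64], epsilon: f64) -> Option<usize> {
    scores
        .iter()
        .zip(distances)
        .map(|(s, d)| s / (d + epsilon)) // best_score × 1 / (d + epsilon)
        .enumerate()
        .max_by(|(_, a), (_, b)| a.total_cmp(b))
        .map(|(i, _)| i)
}
```

With epsilon = 0.01, scores [0.9, 0.6] and distances [1.0, 0.1], the closer record wins (0.6 / 0.11 ≈ 5.45 versus 0.9 / 1.01 ≈ 0.89); with distances [0.2, 0.25], the higher-scoring record wins instead. Epsilon also keeps the weight finite when the query exactly matches a training profile (d = 0).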
Traits
- MetaModel - Predicts a PipelineConfig from a CorpusFeatures profile.
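A sketch of how a trait like MetaModel lets callers swap models behind one interface. Everything here is hypothetical: the predict signature, the two-field CorpusFeatures and one-field PipelineConfig stand-ins, the RawNearest model, and the initial_config helper. RawNearest skips z-score normalization purely for brevity and is not the module's NearestNeighborMetaModel.

```rust
// Hypothetical stand-ins; the real types carry more fields.
struct CorpusFeatures {
    n_docs: f64,
    mean_len: f64,
}

impl CorpusFeatures {
    fn to_vec(&self) -> Vec<f64> {
        vec![self.n_docs, self.mean_len]
    }
}

#[derive(Clone, Debug, Default, PartialEq)]
struct PipelineConfig {
    label: String,
}

// Hypothetical shape of the trait: None when the model has nothing to offer.
trait MetaModel {
    fn predict(&self, features: &CorpusFeatures) -> Option<PipelineConfig>;
}

// Toy model: unnormalized nearest neighbor over stored (features, config) pairs.
struct RawNearest {
    records: Vec<(Vec<f64>, PipelineConfig)>,
}

impl MetaModel for RawNearest {
    fn predict(&self, features: &CorpusFeatures) -> Option<PipelineConfig> {
        let q = features.to_vec();
        self.records
            .iter()
            .map(|(v, cfg)| {
                let d2: f64 = v.iter().zip(&q).map(|(a, b)| (a - b).powi(2)).sum();
                (d2, cfg)
            })
            .min_by(|a, b| a.0.total_cmp(&b.0))
            .map(|(_, cfg)| cfg.clone())
    }
}

/// Hypothetical warm-start helper: use the prediction if there is one,
/// otherwise fall back to a default config.
fn initial_config(model: &dyn MetaModel, features: &CorpusFeatures) -> PipelineConfig {
    model.predict(features).unwrap_or_default()
}
```

Code that holds a &dyn MetaModel does not change when a fancier model replaces the nearest-neighbor one. Returning Option in this sketch leaves room for an empty training set, in which case a caller would presumably fall back to a default config and run the full per-corpus search.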