Corpus Management for Online Learning
Provides efficient corpus storage with deduplication, importance sampling, and configurable eviction policies.
§References
- [Vitter 1985] “Random Sampling with a Reservoir”
- [Settles 2009] “Active Learning Literature Survey”
§Toyota Way Principles
- Muda Elimination: Deduplication avoids redundant training data
- Heijunka: Eviction policies level data quality over time
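The interplay of deduplication and eviction can be illustrated with a minimal sketch. It is not this module's implementation: `SketchBuffer`, `SketchPolicy`, and `content_hash` are hypothetical names introduced only for illustration, and the two policies shown are assumptions, not the variants of `EvictionPolicy`. The sketch deduplicates by content hash and, once the buffer is full, either drops the oldest sample or rejects the new one.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{HashSet, VecDeque};
use std::hash::{Hash, Hasher};

/// Hypothetical eviction policy for this sketch (not the crate's `EvictionPolicy`).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SketchPolicy {
    /// Drop the oldest sample to make room for a new one.
    Fifo,
    /// Keep existing samples and reject new ones when full.
    RejectNew,
}

/// Hypothetical deduplicating buffer (not the crate's `CorpusBuffer`).
pub struct SketchBuffer {
    capacity: usize,
    policy: SketchPolicy,
    samples: VecDeque<String>,
    /// Content hashes of stored samples. Two distinct texts whose hashes
    /// collide would be treated as duplicates; acceptable for a sketch.
    seen: HashSet<u64>,
}

/// Hash the sample text so duplicates can be detected in O(1).
fn content_hash(text: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    text.hash(&mut hasher);
    hasher.finish()
}

impl SketchBuffer {
    pub fn new(capacity: usize, policy: SketchPolicy) -> Self {
        Self {
            capacity,
            policy,
            samples: VecDeque::new(),
            seen: HashSet::new(),
        }
    }

    /// Try to store a sample. Returns `true` if it was stored, `false` if it
    /// was a duplicate or was rejected because the buffer is full.
    pub fn add(&mut self, text: &str) -> bool {
        let key = content_hash(text);
        if self.seen.contains(&key) {
            return false; // duplicate: nothing to store
        }
        if self.samples.len() >= self.capacity {
            match self.policy {
                SketchPolicy::RejectNew => return false,
                SketchPolicy::Fifo => match self.samples.pop_front() {
                    // Forget the evicted sample so it can be re-added later.
                    Some(old) => {
                        self.seen.remove(&content_hash(&old));
                    }
                    // Capacity is zero: there is nothing to evict.
                    None => return false,
                },
            }
        }
        self.seen.insert(key);
        self.samples.push_back(text.to_string());
        true
    }

    pub fn len(&self) -> usize {
        self.samples.len()
    }

    pub fn contains(&self, text: &str) -> bool {
        self.seen.contains(&content_hash(text))
    }
}
```

With a capacity of 2 and `SketchPolicy::Fifo`, adding "a", "a", "b", "c" stores three samples, skips the repeated "a", and evicts the first "a" when "c" arrives. A reservoir-sampling policy in the spirit of [Vitter 1985] would replace the `pop_front` step with a random replacement, which needs a random number source that this standard-library-only sketch leaves out.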
Structs§
- CorpusBuffer - Efficient corpus storage with deduplication
- CorpusBufferConfig - Configuration for corpus buffer
- CorpusMerger - Merge multiple data sources with configurable weighting
- CorpusProvenance - Provenance tracking for merged corpus
- CorpusSource - Source for corpus merger
- Sample - A single sample in the corpus
Enums§
- EvictionPolicy - Eviction policy for corpus buffer
- SampleSource - Sample source for provenance tracking