Module corpus

Corpus Management for Online Learning

Provides efficient corpus storage with deduplication, importance sampling, and configurable eviction policies.
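
One common reading of "importance sampling" in this setting is drawing samples with probability proportional to a per-sample importance score. The sketch below illustrates that idea only; `pick_weighted` and its parameters are hypothetical names, not part of this module's API, and the caller is assumed to supply a uniform draw `u` in [0, 1) from whatever random number generator is in use.

```rust
/// Hypothetical helper (not this module's API): choose an index with
/// probability proportional to its importance score.
///
/// `u` is a uniform draw in [0, 1) supplied by the caller.
/// Non-positive and NaN scores are treated as "never pick".
pub fn pick_weighted(importance: &[f64], u: f64) -> Option<usize> {
    let total: f64 = importance.iter().filter(|w| **w > 0.0).sum();
    if total <= 0.0 || !total.is_finite() || !(0.0..1.0).contains(&u) {
        return None;
    }

    // Walk the cumulative distribution until it passes the target mass.
    let target = u * total;
    let mut cumulative = 0.0;
    let mut last_positive = None;
    for (i, &w) in importance.iter().enumerate() {
        if w > 0.0 {
            cumulative += w;
            last_positive = Some(i);
            if target < cumulative {
                return Some(i);
            }
        }
    }

    // Guards against floating-point rounding at the top of the range.
    last_positive
}
```

With scores `[1.0, 0.0, 3.0]`, draws below 0.25 select index 0 and all other draws select index 2; index 1 is never selected.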

§References

  • [Vitter 1985] “Random Sampling with a Reservoir”
  • [Settles 2009] “Active Learning Literature Survey”

§Toyota Way Principles

  • Muda Elimination: Deduplication avoids redundant training data
  • Heijunka: Eviction policies level data quality over time
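
The two principles above can be illustrated with a small self-contained sketch: a bounded buffer that rejects duplicate content and evicts the oldest entry when full. `DedupBuffer` is a hypothetical stand-in, not the `CorpusBuffer` documented below, and first-in-first-out is assumed here only as one possible eviction policy.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{HashSet, VecDeque};
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a deduplicating, bounded corpus buffer.
pub struct DedupBuffer {
    capacity: usize,
    /// Content hashes of the samples currently stored.
    seen: HashSet<u64>,
    /// Stored samples, oldest at the front.
    items: VecDeque<(u64, String)>,
}

impl DedupBuffer {
    pub fn new(capacity: usize) -> Self {
        Self {
            capacity,
            seen: HashSet::new(),
            items: VecDeque::new(),
        }
    }

    /// Hash of the sample text. Two texts with colliding hashes would be
    /// treated as duplicates, a simplification made for this sketch.
    fn content_hash(text: &str) -> u64 {
        let mut hasher = DefaultHasher::new();
        text.hash(&mut hasher);
        hasher.finish()
    }

    /// Returns `true` if the sample was stored, `false` if it was a
    /// duplicate of a stored sample or the buffer has zero capacity.
    pub fn push(&mut self, text: &str) -> bool {
        if self.capacity == 0 {
            return false;
        }
        let key = Self::content_hash(text);
        if !self.seen.insert(key) {
            return false; // already present: skip redundant data
        }
        if self.items.len() == self.capacity {
            // FIFO eviction: drop the oldest sample and forget its hash.
            if let Some((old_key, _)) = self.items.pop_front() {
                self.seen.remove(&old_key);
            }
        }
        self.items.push_back((key, text.to_string()));
        true
    }

    pub fn len(&self) -> usize {
        self.items.len()
    }

    pub fn contains(&self, text: &str) -> bool {
        self.seen.contains(&Self::content_hash(text))
    }
}
```

Because an evicted sample's hash is removed from `seen`, the same content can be admitted again later; a design that should never re-admit evicted content would keep the hash instead.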

Structs§

CorpusBuffer
Efficient corpus storage with deduplication
CorpusBufferConfig
Configuration for corpus buffer
CorpusMerger
Merge multiple data sources with configurable weighting
CorpusProvenance
Provenance tracking for merged corpus
CorpusSource
Source for corpus merger
Sample
A single sample in the corpus
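
As a rough illustration of weighted merging with provenance, the sketch below gives each source a share of the output proportional to its weight and tags every sample with the source it came from. All names here (`Tagged`, `merge_weighted`) are hypothetical; the actual fields and methods of `CorpusMerger`, `CorpusSource`, and `CorpusProvenance` are not shown on this page.

```rust
/// Hypothetical tagged sample: the text plus the name of its source.
#[derive(Debug, Clone, PartialEq)]
pub struct Tagged {
    pub source: String,
    pub text: String,
}

/// Hypothetical helper: take up to `target` samples in total, giving each
/// source a quota proportional to its weight. Each source is described by
/// `(name, weight, samples)`. Quotas are rounded down, so the result may
/// hold fewer than `target` samples.
pub fn merge_weighted(sources: &[(&str, f64, Vec<&str>)], target: usize) -> Vec<Tagged> {
    // Negative and NaN weights count as zero.
    let total: f64 = sources.iter().map(|(_, w, _)| w.max(0.0)).sum();
    let mut merged = Vec::new();
    if total <= 0.0 {
        return merged;
    }

    for (name, weight, samples) in sources {
        let share = weight.max(0.0) / total;
        let quota = (share * target as f64).floor() as usize;
        // Take the first `quota` samples and record where they came from.
        for text in samples.iter().take(quota) {
            merged.push(Tagged {
                source: name.to_string(),
                text: text.to_string(),
            });
        }
    }
    merged
}
```

With weights 3.0 and 1.0 and a target of 8, the first source contributes 6 samples and the second contributes 2, each carrying its source name.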

Enums§

EvictionPolicy
Eviction policy for corpus buffer
SampleSource
Sample source for provenance tracking