Re-exports§
- pub use config::ChunkingStrategy;
- pub use config::NegativeStrategy;
- pub use config::SamplerConfig;
- pub use config::Selector;
- pub use config::TextRecipe;
- pub use config::TripletRecipe;
- pub use data::DataRecord;
- pub use data::PairLabel;
- pub use data::QualityScore;
- pub use data::RecordChunk;
- pub use data::SampleBatch;
- pub use data::SamplePair;
- pub use data::SampleTriplet;
- pub use data::SectionRole;
- pub use data::TextBatch;
- pub use data::TextSample;
- pub use data::TripletBatch;
- pub use ingestion::IngestionManager;
- pub use ingestion::RecordCache;
- pub use kvp::KvpField;
- pub use kvp::KvpPrefixSampler;
- pub use sampler::BatchPrefetcher;
- pub use sampler::PairSampler;
- pub use sampler::Sampler;
- pub use source::DataSource;
- pub use source::SourceCursor;
- pub use splits::DeterministicSplitStore;
- pub use splits::FileSplitStore;
- pub use splits::SplitLabel;
- pub use splits::SplitRatios;
- pub use splits::SplitStore;
- pub use types::CategoryId;
- pub use types::HashPart;
- pub use types::KvpValue;
- pub use types::LogMessage;
- pub use types::MetaValue;
- pub use types::PathString;
- pub use types::RecipeKey;
- pub use types::RecordId;
- pub use types::Sentence;
- pub use types::SourceId;
- pub use types::TaxonomyValue;
Modules§
- config
- Sampling configuration types.
- constants
- Centralized constants used across sampler, splits, and sources.
- data
- Data record and sample batch types.
- example_apps
- Reusable example runners shared by downstream crates.
- heuristics
- Capacity and sampling estimation helpers.
- ingestion
- Background ingestion and caching infrastructure.
- kvp
- Key/value prefix sampling helpers.
- metadata
- Metadata keys and helpers.
- metrics
- Aggregate metrics helpers.
- sampler
- Sampler implementations and public sampling API.
- source
- Data source traits, built-in sources, and paging helpers.
- splits
- Split stores and persistence helpers.
- transport
- Input transports used by sources (filesystem today; DBs later).
- types
- Shared type aliases.
- utils
- Text normalization helpers shared by source implementations.
Enums§
- SamplerError
- Error type for sampler configuration, IO, and persistence failures.