Synthetic Data Generation for AutoML.
This module provides automatic synthetic data generation capabilities
to improve model performance in low-resource domains. Generated data
is validated, quality-scored, and integrated into the AutoML optimization loop.
§Quick Start
use aprender::synthetic::{SyntheticConfig, GenerationStrategy};
// Configure synthetic data generation
let config = SyntheticConfig::default()
    .with_augmentation_ratio(0.5)
    .with_quality_threshold(0.7);
assert_eq!(config.augmentation_ratio, 0.5);
assert_eq!(config.quality_threshold, 0.7);
§Design Principles
- Quality-First: All generated samples validated before inclusion
- Diversity-Aware: Monitors for mode collapse and distribution shift
- AutoML Integration: Augmentation parameters jointly optimized with model hyperparameters
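The quality-first principle can be illustrated with a minimal, self-contained sketch. It does not use the crate's API: `ScoredSample` and `select_synthetic` are hypothetical names, and reading `augmentation_ratio` as "synthetic samples per real sample" is an assumption made only for this example.

```rust
/// Hypothetical sample type (not part of the crate): a payload plus a
/// quality score assumed to lie in [0.0, 1.0].
#[derive(Debug, Clone, PartialEq)]
struct ScoredSample {
    text: String,
    quality: f32,
}

/// Keep only candidates scoring at or above `quality_threshold`, then cap
/// the number kept at `real_count * augmentation_ratio` (rounded down).
/// Interpreting the ratio as synthetic-per-real is an assumption.
fn select_synthetic(
    candidates: Vec<ScoredSample>,
    real_count: usize,
    augmentation_ratio: f32,
    quality_threshold: f32,
) -> Vec<ScoredSample> {
    let budget = (real_count as f32 * augmentation_ratio).floor() as usize;

    // Validation step: reject anything below the threshold.
    let mut accepted: Vec<ScoredSample> = candidates
        .into_iter()
        .filter(|s| s.quality >= quality_threshold)
        .collect();

    // When over budget, prefer the highest-quality samples.
    accepted.sort_by(|a, b| b.quality.total_cmp(&a.quality));
    accepted.truncate(budget);
    accepted
}
```

With four real samples, a ratio of 0.5 and a threshold of 0.7, at most two synthetic samples survive, and a candidate scoring 0.5 is never among them.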
§References
- Cubuk et al. (2019). AutoAugment: Learning Augmentation Strategies. CVPR.
- Wei & Zou (2019). EDA: Easy Data Augmentation. EMNLP.
- Ratner et al. (2017). Snorkel: Weak Supervision. VLDB.
Re-exports§
pub use andon::AndonConfig;
pub use andon::AndonEvent;
pub use andon::AndonHandler;
pub use andon::AndonSeverity;
pub use andon::DefaultAndon;
pub use andon::TestAndon;
Modules§
- andon: Andon mechanism for synthetic data generation (Toyota Jidoka).
- cache: Caching for Synthetic Data Generation.
- code_eda: Code-Specific EDA (Easy Data Augmentation) for source code.
- code_features: Code Feature Extraction for Commit-Level Analysis.
- eda: Easy Data Augmentation (EDA) for text data.
- mixup: MixUp Data Augmentation.
- shell: Shell Autocomplete Synthetic Data Generator.
- template: Template-based synthetic data generation.
- weak_supervision: Weak Supervision for Synthetic Data Generation.
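As an illustration of the technique behind the mixup module, the sketch below shows the commonly described MixUp interpolation, where a new sample is a convex combination of two existing samples and of their labels. It is a standalone example, not the module's implementation: the function names and signatures are hypothetical.

```rust
/// Blend two equal-length slices as `lambda * a + (1 - lambda) * b`.
fn blend(a: &[f32], b: &[f32], lambda: f32) -> Vec<f32> {
    a.iter()
        .zip(b.iter())
        .map(|(p, q)| lambda * *p + (1.0 - lambda) * *q)
        .collect()
}

/// Hypothetical MixUp helper: interpolates two feature vectors and their
/// one-hot labels with the same mixing weight `lambda` in [0.0, 1.0].
/// MixUp usually draws `lambda` from a Beta(alpha, alpha) distribution;
/// it is passed in here to keep the sketch deterministic and free of
/// external crates.
fn mixup(
    x_a: &[f32],
    y_a: &[f32],
    x_b: &[f32],
    y_b: &[f32],
    lambda: f32,
) -> (Vec<f32>, Vec<f32>) {
    assert_eq!(x_a.len(), x_b.len(), "feature lengths must match");
    assert_eq!(y_a.len(), y_b.len(), "label lengths must match");
    (blend(x_a, x_b, lambda), blend(y_a, y_b, lambda))
}
```

Using the same weight for features and labels is what makes the mixed label a soft target that matches the mixed input.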
Structs§
- DiversityMonitor: Monitors diversity of generated synthetic samples over time.
- DiversityScore: Diversity metrics for a batch of generated samples.
- QualityDegradationDetector: Detects when synthetic data is hurting rather than helping model performance.
- SyntheticConfig: Configuration for synthetic data generation.
- SyntheticStream: Streaming iterator for memory-constrained synthetic generation.
- SyntheticValidator: Validates generated synthetic samples before inclusion.
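To give a rough idea of what a batch-level diversity metric might look like, here is a minimal sketch of one simple signal, the fraction of distinct samples in a batch. This is an assumption for illustration only: the metrics actually computed by DiversityScore are not described on this page, and `distinct_ratio` is a hypothetical name.

```rust
use std::collections::HashSet;

/// Hypothetical diversity signal: the fraction of unique samples in a batch.
/// 1.0 means every sample is distinct; values near 0.0 suggest the generator
/// keeps repeating itself (one symptom of mode collapse).
/// An empty batch is reported as 0.0.
fn distinct_ratio(batch: &[&str]) -> f32 {
    if batch.is_empty() {
        return 0.0;
    }
    let unique: HashSet<&str> = batch.iter().copied().collect();
    unique.len() as f32 / batch.len() as f32
}
```

Tracking a value like this over successive batches is one way a monitor could notice diversity dropping over time.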
Enums§
- GenerationStrategy: Available synthetic data generation strategies.
- SyntheticParam: Synthetic data generation hyperparameters.
- ValidationResult: Result of validating a synthetic sample.
Traits§
- SyntheticCallback: Callback trait for monitoring synthetic data generation.
- SyntheticGenerator: Trait for synthetic data generators.
Functions§
- check_andon: Check Andon conditions and trigger events if thresholds exceeded.
- generate_batched: Generate synthetic data in batches to manage memory.
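The idea of generating in batches to bound memory can be sketched as follows. This is not the signature of generate_batched, which is not shown on this page: `generate_in_batches`, its parameters, and the callback-based design are all assumptions chosen for a dependency-free example.

```rust
/// Hypothetical batched generation loop. Calls `generate_one` for indices
/// `0..total` and hands each full batch (plus a final partial batch, if any)
/// to `sink`. One buffer is reused, so at most `batch_size` samples are held
/// in memory at a time.
fn generate_in_batches<T>(
    total: usize,
    batch_size: usize,
    mut generate_one: impl FnMut(usize) -> T,
    mut sink: impl FnMut(&[T]),
) {
    assert!(batch_size > 0, "batch_size must be positive");
    let mut buffer: Vec<T> = Vec::with_capacity(batch_size);

    for i in 0..total {
        buffer.push(generate_one(i));
        if buffer.len() == batch_size {
            sink(&buffer);
            buffer.clear(); // Drop the samples but keep the allocation.
        }
    }

    // Flush whatever is left when `total` is not a multiple of `batch_size`.
    if !buffer.is_empty() {
        sink(&buffer);
    }
}
```

A caller would typically validate or write out each batch inside `sink`, so the full synthetic dataset never has to exist in memory at once.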