Module synthetic

Synthetic Data Generation for AutoML.

This module generates synthetic training data automatically to improve model performance in low-resource domains. Generated samples are validated, quality-scored, and integrated into the AutoML optimization loop.

§Quick Start

use aprender::synthetic::SyntheticConfig;

// Configure synthetic data generation, starting from the defaults
let config = SyntheticConfig::default()
    .with_augmentation_ratio(0.5)
    .with_quality_threshold(0.7);

// The builder methods store the values they are given
assert_eq!(config.augmentation_ratio, 0.5);
assert_eq!(config.quality_threshold, 0.7);

§Design Principles

  • Quality-First: All generated samples are validated before inclusion
  • Diversity-Aware: Generation is monitored for mode collapse and distribution shift
  • AutoML Integration: Augmentation parameters are optimized jointly with model hyperparameters
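
The quality-first principle can be illustrated with a minimal, self-contained sketch. It does not use this module's API: `ScoredSample` and `select_synthetic` are hypothetical names, and the sketch assumes that the augmentation ratio means "synthetic samples per real sample" and that the quality threshold is an inclusive lower bound. The crate's actual semantics may differ.

```rust
/// Hypothetical scored sample; not a type from this crate.
#[derive(Debug, Clone, PartialEq)]
pub struct ScoredSample {
    pub text: String,
    pub quality: f32,
}

/// Keep only candidates whose quality is at or above `quality_threshold`,
/// then cap the result at `real_count * augmentation_ratio` samples
/// (rounded down), preferring the highest-quality candidates.
///
/// Assumption: the ratio is synthetic-to-real; this is an illustration,
/// not the crate's implementation.
pub fn select_synthetic(
    candidates: Vec<ScoredSample>,
    real_count: usize,
    augmentation_ratio: f32,
    quality_threshold: f32,
) -> Vec<ScoredSample> {
    let max_synthetic = (real_count as f32 * augmentation_ratio).floor() as usize;

    // Validation gate: drop everything below the threshold.
    let mut accepted: Vec<ScoredSample> = candidates
        .into_iter()
        .filter(|s| s.quality >= quality_threshold)
        .collect();

    // Highest quality first, so truncation discards the weakest samples.
    accepted.sort_by(|a, b| b.quality.total_cmp(&a.quality));
    accepted.truncate(max_synthetic);
    accepted
}
```

With four real samples, a ratio of 0.5, and a threshold of 0.7, at most two synthetic samples survive, and any candidate scoring below 0.7 is rejected regardless of how many slots remain.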

§References

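  • Cubuk et al. (2019). AutoAugment: Learning Augmentation Strategies from Data. CVPR.
  • Wei & Zou (2019). EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks. EMNLP.
  • Ratner et al. (2017). Snorkel: Rapid Training Data Creation with Weak Supervision. VLDB.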

Re-exports§

pub use andon::AndonConfig;
pub use andon::AndonEvent;
pub use andon::AndonHandler;
pub use andon::AndonSeverity;
pub use andon::DefaultAndon;
pub use andon::TestAndon;

Modules§

andon
Andon mechanism for synthetic data generation (Toyota Jidoka).
cache
Caching for Synthetic Data Generation.
code_eda
Code-Specific EDA (Easy Data Augmentation) for source code.
code_features
Code Feature Extraction for Commit-Level Analysis.
eda
Easy Data Augmentation (EDA) for text data.
mixup
MixUp Data Augmentation.
shell
Shell Autocomplete Synthetic Data Generator.
template
Template-based synthetic data generation.
weak_supervision
Weak Supervision for Synthetic Data Generation.

Structs§

DiversityMonitor
Monitors diversity of generated synthetic samples over time.
DiversityScore
Diversity metrics for a batch of generated samples.
QualityDegradationDetector
Detects when synthetic data is hurting rather than helping model performance.
SyntheticConfig
Configuration for synthetic data generation.
SyntheticStream
Streaming iterator for memory-constrained synthetic generation.
SyntheticValidator
Validates generated synthetic samples before inclusion.
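
To make the idea of diversity monitoring concrete, here is a minimal sketch of one possible batch-level metric: the fraction of distinct samples in a batch. It is an assumption for illustration only. `distinct_ratio` and `looks_collapsed` are hypothetical helpers, and the metrics that `DiversityScore` and `DiversityMonitor` actually compute are not described on this page.

```rust
use std::collections::HashSet;

/// Fraction of distinct samples in a batch, in the range 0.0..=1.0.
/// An empty batch is reported as 0.0.
///
/// Hypothetical metric for illustration; not this crate's API.
pub fn distinct_ratio(batch: &[&str]) -> f64 {
    if batch.is_empty() {
        return 0.0;
    }
    let unique: HashSet<&str> = batch.iter().copied().collect();
    unique.len() as f64 / batch.len() as f64
}

/// Flags a batch as a possible mode collapse when too many samples repeat.
/// `min_ratio` is an assumed tuning knob, not a documented parameter.
pub fn looks_collapsed(batch: &[&str], min_ratio: f64) -> bool {
    distinct_ratio(batch) < min_ratio
}
```

A generator that keeps emitting the same few strings produces a low ratio, which is one simple signal that it has stopped exploring the output space.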

Enums§

GenerationStrategy
Available synthetic data generation strategies.
SyntheticParam
Synthetic data generation hyperparameters.
ValidationResult
Result of validating a synthetic sample.

Traits§

SyntheticCallback
Callback trait for monitoring synthetic data generation.
SyntheticGenerator
Trait for synthetic data generators.

Functions§

check_andon
Check Andon conditions and trigger events if thresholds exceeded.
generate_batched
Generate synthetic data in batches to manage memory.
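
The general pattern behind batched generation can be sketched as follows. This is not the signature of `generate_batched`, which is not shown on this page. `generate_in_batches` and its closure parameters are hypothetical, chosen only to show how bounding the batch size bounds peak memory.

```rust
/// Generate `total` items in chunks of at most `batch_size`, handing each
/// chunk to `consume` before the next one is produced. Only one batch is
/// held in memory at a time.
///
/// Hypothetical helper for illustration; not this crate's API.
pub fn generate_in_batches<T, G, C>(
    total: usize,
    batch_size: usize,
    mut generate: G,
    mut consume: C,
) where
    G: FnMut(usize) -> T,
    C: FnMut(Vec<T>),
{
    assert!(batch_size > 0, "batch_size must be greater than zero");

    let mut start = 0;
    while start < total {
        // The final batch may be shorter than `batch_size`.
        let end = (start + batch_size).min(total);
        let batch: Vec<T> = (start..end).map(&mut generate).collect();
        consume(batch);
        start = end;
    }
}
```

The consumer decides what happens to each batch (validate it, write it to disk, feed it to training), so the caller controls how much generated data is retained.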