Code-Specific EDA (Easy Data Augmentation) for source code.
Implements code augmentation techniques inspired by the EDA paper of Wei & Zou (2019), adapted for programming languages. Operations preserve syntactic validity while introducing meaningful variation for training code analysis models.
§Operations
- Variable Renaming (VR): Rename variables to synonyms (e.g., x -> value)
- Comment Insertion (CI): Insert comments or assertions
- Statement Reorder (SR): Reorder independent statements
- Dead Code Removal (DCR): Remove comments/whitespace
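To make two of the operations above concrete, here is a minimal standalone sketch of Variable Renaming and Dead Code Removal using only the standard library. It is not the crate's implementation: the helper names `rename_variables`, `flush_ident`, and `strip_comments` are hypothetical, and the text scanning is deliberately naive (it does not parse the code, so identifiers inside string literals would also be renamed and a `//` inside a string would be treated as a comment).

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of the crate): rename whole identifiers
/// using a synonym map. Matching is done per identifier token, so the `x`
/// inside `max` is left untouched.
fn rename_variables(code: &str, synonyms: &HashMap<&str, &str>) -> String {
    let mut out = String::with_capacity(code.len());
    let mut ident = String::new();
    for ch in code.chars() {
        if ch.is_alphanumeric() || ch == '_' {
            // Still inside an identifier-like token: keep accumulating.
            ident.push(ch);
        } else {
            // Token ended: emit it (renamed if it has a synonym), then the separator.
            flush_ident(&mut ident, synonyms, &mut out);
            out.push(ch);
        }
    }
    flush_ident(&mut ident, synonyms, &mut out);
    out
}

/// Hypothetical helper: write the pending token to `out`, substituting its
/// synonym when one exists, and reset the token buffer.
fn flush_ident(ident: &mut String, synonyms: &HashMap<&str, &str>, out: &mut String) {
    if ident.is_empty() {
        return;
    }
    match synonyms.get(ident.as_str()) {
        Some(replacement) => out.push_str(replacement),
        None => out.push_str(ident.as_str()),
    }
    ident.clear();
}

/// Hypothetical helper: drop `//` line comments and blank lines.
/// Naive by design: a `//` inside a string literal is also cut.
fn strip_comments(code: &str) -> String {
    code.lines()
        .map(|line| match line.find("//") {
            Some(idx) => line[..idx].trim_end(),
            None => line.trim_end(),
        })
        .filter(|line| !line.trim().is_empty())
        .collect::<Vec<_>>()
        .join("\n")
}
```

With a map containing `"x" -> "value"`, `rename_variables` turns `let x = 42;` into `let value = 42;`. A production augmenter would more likely work on a token stream or syntax tree so that strings, comments, and keywords are handled correctly.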
§Example
use aprender::synthetic::code_eda::{CodeEda, CodeEdaConfig};
use aprender::synthetic::{SyntheticGenerator, SyntheticConfig};
let config = CodeEdaConfig::default();
let generator = CodeEda::new(config);
let code = "let x = 42;\nprintln!(\"{}\", x);";
let augmented = generator.augment(code, 42);
assert!(!augmented.is_empty());
Structs§
- CodeEda - Code-specific EDA generator.
- CodeEdaConfig - Configuration for code-specific EDA augmentation.
- VariableSynonyms - Variable synonym dictionary for code identifiers.
Enums§
- CodeLanguage - Supported programming languages for code augmentation.