Data pipeline for storing verified test cases
This module handles the storage and retrieval of verified (source, target, correctness) tuples in Parquet format.
§Features
- Large-scale parallel generation with progress tracking
- Automatic Parquet sharding for large datasets
- Support for all sampling strategies
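As a rough illustration of the sharding idea, the sketch below splits a slice of records into fixed-size shards and assigns each an output file name. It does not use this module's API: `ExampleTuple`, its field names, `plan_shards`, and the `shard-NNNNN.parquet` naming scheme are all hypothetical, and the actual Parquet writing is left out.

```rust
// Hypothetical stand-in for a verified (source, target, correctness) tuple.
// The field names are assumptions, not the crate's actual definition.
#[derive(Debug, Clone, PartialEq)]
struct ExampleTuple {
    source: String,
    target: String,
    correct: bool,
}

// Split `records` into shards of at most `max_rows` rows each and pair every
// shard with an illustrative file name. Only the partitioning is shown here;
// writing each shard to Parquet is out of scope for this sketch.
fn plan_shards(records: &[ExampleTuple], max_rows: usize) -> Vec<(String, &[ExampleTuple])> {
    assert!(max_rows > 0, "max_rows must be positive");
    records
        .chunks(max_rows)
        .enumerate()
        .map(|(i, chunk)| (format!("shard-{:05}.parquet", i), chunk))
        .collect()
}
```

With five records and `max_rows = 2`, this plan yields three shards, the last holding a single record. Fixed row counts keep the example simple; a real pipeline might shard by byte size instead.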
Re-exports§
pub use corpus::CorpusFormat;
pub use corpus::CorpusManager;
pub use corpus::CorpusMetadata;
pub use corpus::TrainingCorpus;
pub use pipeline::DataPipeline;
pub use pipeline::PipelineConfig;
pub use pipeline::PipelineStats;
pub use pipeline::PipelineStrategy;
Modules§
Structs§
- CodeFeatures - Features extracted from source code for ML
- GenerationMetadata - Metadata about how the test case was generated
- TestCase - Test case with full metadata
- TestCaseBuilder - Builder for test cases with mutations
- VerifiedTuple - Verified transpilation tuple for ML training
Enums§
- TestResult - Test result enum