Skip to main content

generate_data

Function generate_data 

Source
pub fn generate_data(config: GeneratorConfig) -> DataBuffer
Expand description

Generate data with full configuration (ZERO-COPY - returns DataBuffer)

§Algorithm

  1. Fill blocks with Xoshiro256++ keystream (high entropy baseline)
  2. Add local back-references for compression
  3. Use round-robin deduplication across unique blocks
  4. Parallel generation via rayon (NUMA-aware if enabled)

§Performance

  • 5-15 GB/s per core with incompressible data
  • 1-4 GB/s with compression enabled (depends on compress factor)
  • Near-linear scaling with CPU cores

§Returns

DataBuffer that holds the generated data without copying:

  • UMA: Vec wrapper
  • NUMA: hwlocality Bytes wrapper (when numa_node is specified)

Python accesses this memory directly via buffer protocol - ZERO COPY!