pub struct DataGenerator { /* private fields */ }
Streaming data generator (like ObjectGenAlt from s3dlio)
Implementations§
impl DataGenerator
pub fn new(config: GeneratorConfig) -> Self
Create a new streaming generator
pub fn fill_chunk(&mut self, buf: &mut [u8]) -> usize
Fill the next chunk of data
Returns the number of bytes written. When this returns 0, generation is complete.
Performance: When the buffer spans multiple blocks (>= 8 MB), generation is parallelized using rayon. Smaller buffers (< 8 MB) are filled sequentially to avoid threading overhead.
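The size-based dispatch described above can be sketched as follows. This is an illustration only, not the crate's implementation: it substitutes std::thread::scope for rayon so that it compiles with the standard library alone, and `fill_block`, `fill_buffer`, `Strategy`, and the 4 MiB block size (taken from the block size quoted under recommended_chunk_size) are assumptions introduced for the example.

```rust
use std::thread;

const MIB: usize = 1024 * 1024;
const BLOCK_SIZE: usize = 4 * MIB; // assumed block size
const PARALLEL_THRESHOLD: usize = 8 * MIB; // threshold quoted in the text

#[derive(Debug, PartialEq)]
enum Strategy {
    Sequential,
    Parallel,
}

/// Hypothetical per-block fill: writes the block index into every byte.
/// A real generator would write pseudo-random data here.
fn fill_block(block_index: usize, block: &mut [u8]) {
    block.fill((block_index % 256) as u8);
}

/// Fill `buf` block by block, choosing the strategy from the buffer size.
fn fill_buffer(buf: &mut [u8]) -> Strategy {
    if buf.len() < PARALLEL_THRESHOLD {
        // Small buffer: a plain loop avoids thread start-up cost.
        for (i, block) in buf.chunks_mut(BLOCK_SIZE).enumerate() {
            fill_block(i, block);
        }
        Strategy::Sequential
    } else {
        // Large buffer: one scoped thread per block (a thread pool such
        // as rayon would reuse worker threads instead of spawning).
        thread::scope(|s| {
            for (i, block) in buf.chunks_mut(BLOCK_SIZE).enumerate() {
                s.spawn(move || fill_block(i, block));
            }
        });
        Strategy::Parallel
    }
}

fn main() {
    let mut small = vec![0u8; 4 * MIB];
    let mut large = vec![0u8; 32 * MIB];
    println!("4 MiB buffer: {:?}", fill_buffer(&mut small));
    println!("32 MiB buffer: {:?}", fill_buffer(&mut large));
}
```

Because chunks_mut hands out non-overlapping slices, each block can be filled independently without locking, which is what makes the parallel path straightforward.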
pub fn total_size(&self) -> usize
Get total size
pub fn is_complete(&self) -> bool
Check if generation is complete
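A typical consumer drives fill_chunk in a loop until it returns 0. The sketch below shows that pattern against `ToyGenerator`, a hypothetical stand-in that only mirrors the documented contract (bytes written per call, 0 when complete) so the example compiles without the crate. The `drain` helper and the assumption that the final chunk may be shorter than the buffer are illustrative, not taken from the crate.

```rust
/// Hypothetical stand-in for `DataGenerator`; mirrors only the documented
/// contract of `fill_chunk`, `total_size`, and `is_complete`.
struct ToyGenerator {
    total: usize,
    pos: usize,
}

impl ToyGenerator {
    fn new(total: usize) -> Self {
        Self { total, pos: 0 }
    }

    /// Writes up to `buf.len()` bytes and returns the count; 0 means done.
    fn fill_chunk(&mut self, buf: &mut [u8]) -> usize {
        let n = buf.len().min(self.total - self.pos);
        // Placeholder pattern; a real generator writes pseudo-random data.
        for (i, b) in buf[..n].iter_mut().enumerate() {
            *b = ((self.pos + i) % 251) as u8;
        }
        self.pos += n;
        n
    }

    fn total_size(&self) -> usize {
        self.total
    }

    fn is_complete(&self) -> bool {
        self.pos >= self.total
    }
}

/// Drain the generator through one reusable buffer, passing each filled
/// slice to `sink`. Returns the total number of bytes produced.
fn drain(generator: &mut ToyGenerator, chunk_len: usize, mut sink: impl FnMut(&[u8])) -> usize {
    let mut buf = vec![0u8; chunk_len];
    let mut produced = 0;
    loop {
        let n = generator.fill_chunk(&mut buf);
        if n == 0 {
            break; // 0 bytes written signals completion
        }
        sink(&buf[..n]); // only the first n bytes are valid
        produced += n;
    }
    produced
}

fn main() {
    let mut generator = ToyGenerator::new(10_000);
    let mut chunks = 0;
    let produced = drain(&mut generator, 4096, |_chunk| chunks += 1);
    println!(
        "{produced} of {} bytes in {chunks} chunks, complete = {}",
        generator.total_size(),
        generator.is_complete()
    );
}
```

Slicing with `&buf[..n]` rather than passing the whole buffer keeps stale bytes from a previous iteration out of the output when the last chunk is short.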
pub fn set_seed(&mut self, seed: Option<u64>)
Set or reset the random seed for subsequent data generation
This allows changing the data pattern mid-stream while maintaining generation position.
The new seed takes effect on the next fill_chunk() call.
§Arguments
seed - New seed value, or None to use time+urandom entropy (non-deterministic)
§Examples
use dgen_data::{DataGenerator, GeneratorConfig, NumaMode};
let config = GeneratorConfig {
    size: 100 * 1024 * 1024,
    dedup_factor: 1,
    compress_factor: 1,
    numa_mode: NumaMode::Auto,
    max_threads: None,
    numa_node: None,
    block_size: None,
    seed: Some(12345),
};
let mut gen = DataGenerator::new(config);
let mut buffer = vec![0u8; 1024 * 1024];
// Generate some data with initial seed
gen.fill_chunk(&mut buffer);
// Change seed for different pattern
gen.set_seed(Some(67890));
gen.fill_chunk(&mut buffer); // Uses new seed
// Switch to non-deterministic mode
gen.set_seed(None);
gen.fill_chunk(&mut buffer); // Uses time+urandom
pub fn recommended_chunk_size() -> usize
Get recommended chunk size for optimal performance
Returns 32 MB, which provides the best balance between:
- Parallelism: 8 blocks × 4 MB = good distribution across cores
- Cache locality: Fits well in L3 cache
- Memory overhead: Reasonable buffer size
Based on empirical testing showing 32 MB is ~16% faster than 64 MB and significantly better than smaller or larger sizes.
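The arithmetic behind these figures can be checked with a short sketch. The constants below are copied from the text above and read as binary megabytes, which is an assumption; the helper names are hypothetical and not part of the crate.

```rust
const MIB: usize = 1024 * 1024;
// Figures quoted in the text, interpreted as MiB (an assumption).
const CHUNK_SIZE: usize = 32 * MIB;
const BLOCK_SIZE: usize = 4 * MIB;

/// Number of whole blocks that fit in one chunk buffer.
fn blocks_per_chunk(chunk: usize, block: usize) -> usize {
    chunk / block
}

/// Number of data-carrying `fill_chunk` calls needed for `total` bytes
/// (ceiling division). The final call that returns 0 is not counted.
fn calls_needed(total: usize, chunk: usize) -> usize {
    (total + chunk - 1) / chunk
}

/// Length of the last data-carrying chunk.
fn last_chunk_len(total: usize, chunk: usize) -> usize {
    match total % chunk {
        0 if total > 0 => chunk, // total is an exact multiple of the chunk
        rem => rem,
    }
}

fn main() {
    let total = 100 * MIB; // same size as the set_seed example
    println!("blocks per chunk: {}", blocks_per_chunk(CHUNK_SIZE, BLOCK_SIZE));
    println!("calls needed: {}", calls_needed(total, CHUNK_SIZE));
    println!("last chunk: {} MiB", last_chunk_len(total, CHUNK_SIZE) / MIB);
}
```

For a 100 MiB stream this gives three full 32 MiB chunks followed by one 4 MiB chunk, so the last call falls below the 8 MB threshold that fill_chunk describes for parallel generation.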
Auto Trait Implementations§
impl Freeze for DataGenerator
impl !RefUnwindSafe for DataGenerator
impl Send for DataGenerator
impl Sync for DataGenerator
impl Unpin for DataGenerator
impl !UnwindSafe for DataGenerator
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.