pub struct RandomPartitionedDataBuilder {
pub seed: u64,
pub num_partitions: usize,
pub batches_per_partition: usize,
pub rows_per_batch: usize,
/* private fields */
}
Builder for generating test data partitions with random geometries.
This builder allows you to create deterministic test datasets with configurable geometry types, data distribution, and partitioning for testing spatial operations.
The generated data includes:
- id: Unique integer identifier for each row
- dist: Random floating-point distance value (0.0 to 100.0)
- geometry: Random geometry data in the specified format (WKB or WKB View)
The strategy for generating geometries, and the options that control it, are not stable and may change as the needs of testing and benchmarking evolve or better strategies are discovered. The current strategy is as follows:
- Points are uniformly distributed over the area given by Self::bounds.
- Linestrings are generated by calculating the points in a circle of a randomly chosen size (according to Self::size_range) with vertex count sampled using Self::vertices_per_linestring_range. The start and end point of generated linestrings are never connected.
- Polygons are generated using a closed version of the linestring generated. They may or may not have a hole according to Self::polygon_hole_rate.
- MultiPoint, MultiLinestring, and MultiPolygon geometries are constructed with the number of parts sampled according to Self::num_parts_range. The size of the entire feature is constrained to Self::size_range, and this space is subdivided to obtain the exact number of spaces needed. Child features are generated using the global options except with sizes sampled to approach the space given to them.
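The linestring and polygon rules above can be illustrated with a minimal, standalone sketch. It is not the crate's implementation: the function names, the even angular spacing, and the fixed center, radius, and vertex count are all assumptions made for illustration, whereas the builder samples these values from Self::bounds, Self::size_range, and Self::vertices_per_linestring_range.

```rust
use std::f64::consts::TAU;

/// Vertices of an open linestring placed on a circle.
/// Hypothetical helper: even angular spacing is an assumption of this sketch.
fn circle_linestring(center: (f64, f64), radius: f64, num_vertices: usize) -> Vec<(f64, f64)> {
    (0..num_vertices)
        .map(|i| {
            // Fraction of a full turn for vertex i; the last vertex stops
            // short of the first, so the linestring is never closed.
            let angle = TAU * (i as f64) / (num_vertices as f64);
            (center.0 + radius * angle.cos(), center.1 + radius * angle.sin())
        })
        .collect()
}

/// Closes a linestring into a polygon ring by repeating its first vertex.
fn close_ring(mut vertices: Vec<(f64, f64)>) -> Vec<(f64, f64)> {
    if let Some(&first) = vertices.first() {
        vertices.push(first);
    }
    vertices
}
```

Calling close_ring(circle_linestring((50.0, 50.0), 5.0, 8)) yields a ring of nine vertices whose first and last entries coincide, which matches the note that a polygon ring has one more vertex than the sampled count.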
§Example
use sedona_testing::datagen::RandomPartitionedDataBuilder;
use sedona_geometry::types::GeometryTypeId;
use geo_types::{Coord, Rect};
let (schema, partitions) = RandomPartitionedDataBuilder::new()
    .seed(42)
    .num_partitions(4)
    .rows_per_batch(1000)
    .geometry_type(GeometryTypeId::Polygon)
    .bounds(Rect::new(Coord { x: 0.0, y: 0.0 }, Coord { x: 100.0, y: 100.0 }))
    .build()
    .unwrap();
Fields§
§seed: u64
§num_partitions: usize
§batches_per_partition: usize
§rows_per_batch: usize
Implementations§
impl RandomPartitionedDataBuilder
pub fn new() -> Self
Creates a new RandomPartitionedDataBuilder with default values.
Default configuration:
- seed: 42 (for deterministic results)
- num_partitions: 1
- batches_per_partition: 1
- rows_per_batch: 10
- geometry_type: Point
- bounds: (0,0) to (100,100)
- size_range: 1.0 to 10.0
- null_rate: 0.0 (no nulls)
- empty_rate: 0.0 (no empties)
- vertices_per_linestring_range
- num_parts_range: 1 to 3
- polygon_hole_rate: 0.0 (no polygons with holes)
pub fn seed(self, seed: u64) -> Self
Sets the random seed for deterministic data generation.
Using the same seed will produce identical datasets, which is useful for reproducible tests.
§Arguments
- seed: The random seed value
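The reproducibility guarantee can be illustrated without the crate. The sketch below uses a small SplitMix64 generator purely as a stand-in; it is an assumption of this example and not the RNG the builder uses (see default_rng for that). The names SplitMix64, next_dist, and sample_dists are hypothetical.

```rust
/// A tiny SplitMix64 generator, used here only as a stand-in RNG.
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform value in [0.0, 100.0), mimicking the range of the dist column.
    fn next_dist(&mut self) -> f64 {
        // Keep the top 53 bits so the quotient is an exact f64 in [0, 1).
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64 * 100.0
    }
}

/// Draws n values from a generator seeded with `seed`.
fn sample_dists(seed: u64, n: usize) -> Vec<f64> {
    let mut rng = SplitMix64::new(seed);
    (0..n).map(|_| rng.next_dist()).collect()
}
```

Two calls to sample_dists with the same seed return equal vectors, which is the property a test relies on when it rebuilds a dataset from a fixed seed.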
pub fn num_partitions(self, num_partitions: usize) -> Self
Sets the number of data partitions to generate.
Each partition contains batches_per_partition record batches. Multiple partitions are useful for testing distributed processing scenarios.
§Arguments
- num_partitions: Number of partitions to create
pub fn batches_per_partition(self, batches_per_partition: usize) -> Self
Sets the number of batches per partition.
Each batch is a RecordBatch containing the specified number of rows.
§Arguments
- batches_per_partition: Number of batches in each partition
pub fn rows_per_batch(self, rows_per_batch: usize) -> Self
Sets the number of rows per batch.
This determines the size of each RecordBatch that will be generated.
§Arguments
- rows_per_batch: Number of rows in each batch
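The three sizing options multiply together to give the dataset size. The following sketch shows the arithmetic and the partitions-then-batches nesting that build() returns as Vec<Vec<RecordBatch>>. The Sizing struct and its methods are hypothetical and exist only for this illustration; row counts stand in for the actual record batches.

```rust
/// Hypothetical mirror of the builder's three sizing fields.
struct Sizing {
    num_partitions: usize,
    batches_per_partition: usize,
    rows_per_batch: usize,
}

impl Sizing {
    /// Total number of rows across all partitions and batches.
    fn total_rows(&self) -> usize {
        self.num_partitions * self.batches_per_partition * self.rows_per_batch
    }

    /// Row count of every batch, nested as partitions -> batches.
    fn layout(&self) -> Vec<Vec<usize>> {
        vec![vec![self.rows_per_batch; self.batches_per_partition]; self.num_partitions]
    }
}
```

With num_partitions set to 4, rows_per_batch set to 1000, and batches_per_partition left at its default of 1, total_rows() is 4000 and layout() has four partitions of one batch each.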
pub fn geometry_type(self, geom_type: GeometryTypeId) -> Self
Sets the type of geometry to generate.
Currently supports:
- GeometryTypeId::Point: Random points within the specified bounds
- GeometryTypeId::Polygon: Random diamond-shaped polygons
- Other types default to point generation
§Arguments
- geom_type: The geometry type to generate
pub fn sedona_type(self, sedona_type: SedonaType) -> Self
Sets the Sedona data type for the geometry column.
This determines how the geometry data is stored (e.g., WKB or WKB View).
§Arguments
- sedona_type: The Sedona type for geometry storage
pub fn bounds(self, bounds: Rect) -> Self
Sets the spatial bounds for geometry generation.
All generated geometries will be positioned within these bounds. For polygons, the bounds are used to ensure the entire polygon fits within the area.
§Arguments
- bounds: Rectangle defining the spatial bounds (min_x, min_y, max_x, max_y)
pub fn size_range(self, size_range: (f64, f64)) -> Self
Sets the size range for generated geometries.
For polygons, this controls the radius of the generated shapes. For points, this parameter is not used.
§Arguments
- size_range: Tuple of (min_size, max_size) for geometry dimensions
pub fn null_rate(self, null_rate: f64) -> Self
Sets the rate of null values in the geometry column.
§Arguments
- null_rate: Fraction of rows that should have null geometry (0.0 to 1.0)
pub fn empty_rate(self, empty_rate: f64) -> Self
Sets the rate of EMPTY geometries in the geometry column.
§Arguments
- empty_rate: Fraction of rows that should have empty geometry (0.0 to 1.0)
pub fn vertices_per_linestring_range(
    self,
    vertices_per_linestring_range: (usize, usize),
) -> Self
Sets the vertex count range for generated linestrings.
§Arguments
- vertices_per_linestring_range: The minimum and maximum (inclusive) number of vertices in linestring output. This also affects polygon output, although each polygon ring contains one more vertex than the sampled count because the first vertex is repeated to close the ring.
pub fn num_parts_range(self, num_parts_range: (usize, usize)) -> Self
Sets the range for the number of parts in multi-part geometries.
§Arguments
- num_parts_range: The minimum and maximum (inclusive) number of parts in multi-geometry and/or collection output.
pub fn polygon_hole_rate(self, polygon_hole_rate: f64) -> Self
Sets the polygon hole rate.
§Arguments
- polygon_hole_rate: Fraction of polygons that should have an interior ring. Currently only a single interior ring is possible.
pub fn schema(&self) -> SchemaRef
Returns the SchemaRef generated by this builder.
The resulting schema contains three columns:
- id: Int32 - Unique sequential identifier for each row
- dist: Float64 - Random distance value between 0.0 and 100.0
- geometry: SedonaType - Random geometry data (WKB or WKB View format)
pub fn build(&self) -> Result<(SchemaRef, Vec<Vec<RecordBatch>>)>
Builds the random partitioned dataset with the configured parameters.
Generates a deterministic dataset based on the seed and configuration. The resulting schema contains three columns:
- id: Int32 - Unique sequential identifier for each row
- dist: Float64 - Random distance value between 0.0 and 100.0
- geometry: SedonaType - Random geometry data (WKB or WKB View format)
§Returns
A tuple containing:
- SchemaRef: Arrow schema for the generated data
- Vec<Vec<RecordBatch>>: Vector of partitions, each containing a vector of record batches
§Errors
Returns a datafusion_common::Result error if:
- RecordBatch creation fails
- Array conversion fails
- Schema creation fails
pub fn validate(&self) -> Result<()>
Validates the builder options.
This is called internally before generating batches to prevent panics from occurring while creating random output; however, it may also be called at a higher level to generate an error at a more relevant time.
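The source does not list which checks validate() performs, so the following is only a sketch of the kind of up-front checking described above, written against a hypothetical Options struct rather than the real builder. The specific rules (rates within 0.0 to 1.0, range minimum not above maximum) and the String error type are assumptions of this example.

```rust
/// Hypothetical subset of the builder options with obvious invariants.
struct Options {
    null_rate: f64,
    empty_rate: f64,
    polygon_hole_rate: f64,
    size_range: (f64, f64),
    num_parts_range: (usize, usize),
}

/// Rejects rates outside [0.0, 1.0]; NaN is rejected as well.
fn check_rate(name: &str, rate: f64) -> Result<(), String> {
    if (0.0..=1.0).contains(&rate) {
        Ok(())
    } else {
        Err(format!("{} must be between 0.0 and 1.0, got {}", name, rate))
    }
}

/// Returns the first violated invariant as an error instead of panicking later.
fn validate(opts: &Options) -> Result<(), String> {
    check_rate("null_rate", opts.null_rate)?;
    check_rate("empty_rate", opts.empty_rate)?;
    check_rate("polygon_hole_rate", opts.polygon_hole_rate)?;
    if opts.size_range.0 > opts.size_range.1 {
        return Err("size_range minimum exceeds maximum".to_string());
    }
    if opts.num_parts_range.0 > opts.num_parts_range.1 {
        return Err("num_parts_range minimum exceeds maximum".to_string());
    }
    Ok(())
}
```

Checking options once, before any random sampling starts, lets a caller surface a configuration error at setup time rather than in the middle of batch generation.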
pub fn default_rng(seed: u64) -> impl Rng
pub fn partition_reader<R: Rng + Send + 'static>(
    &self,
    rng: R,
    partition_idx: usize,
) -> Box<dyn RecordBatchReader + Send>
Creates a RecordBatchReader that reads a single partition.
Trait Implementations§
impl Clone for RandomPartitionedDataBuilder
fn clone(&self) -> RandomPartitionedDataBuilder
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for RandomPartitionedDataBuilder
Auto Trait Implementations§
impl Freeze for RandomPartitionedDataBuilder
impl !RefUnwindSafe for RandomPartitionedDataBuilder
impl Send for RandomPartitionedDataBuilder
impl Sync for RandomPartitionedDataBuilder
impl Unpin for RandomPartitionedDataBuilder
impl UnsafeUnpin for RandomPartitionedDataBuilder
impl !UnwindSafe for RandomPartitionedDataBuilder
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.