test-data-generation 0.0.5

A simple, lightweight library that analyzes sample data to build algorithms, then uses those algorithms to generate realistic test data.

Test Data Generation




For software development teams who need realistic test data, Test Data Generation is a lightweight library that uses Markov decision process machine learning to quickly profile sample data, create an algorithm, and produce representative test data without the need for persistent data sources, data cleaning, or remote services. Unlike other solutions, this open-source library can be integrated into your test source code or wrapped in a web service or stand-alone utility.

PROBLEM: To make test data representative of production (a.k.a. realistic), you need to do one of the following:

  • load data from a production environment into the non-production environment, which requires ETL (e.g., masking, obfuscation)
  • stand up a pre-loaded "profile" database that is randomly sampled, which requires preparing sample data from either another test data source or the production environment (option #1 above)

SOLUTION: Incorporate this library into your software's test source code by loading an algorithm from a previously analyzed data sample and generating test data while your tests run.


What's New

Here's what's new in 0.0.5:

  • Added the new test_data_generation::shared module with the following function:
      ◦ string_to_static_str(s: String) -> &'static str
  • The following macros have been modified to 'return' a value instead of 'set' one:
      ◦ random_percentage
      ◦ random_between
  • Added the following macros to test_data_generation::profile:
      ◦ symbolize_char -> char
      ◦ factualize_entity -> (String, Vec)
  • The following test_data_generation::data_sample_parser::DataSampleParser functions take &String instead of &'static str as the path parameter:
      ◦ analyze_csv_file
      ◦ from_file
      ◦ generate_csv
      ◦ with_new
      ◦ save
  • The following test_data_generation::configs::Configs function takes &String instead of &'static str as the path parameter:
      ◦ new
  • Added the test_data_generation::data_sample_parser::DataSampleParser function analyze_csv_data so that CSV data doesn't need to 'land' (be written to a file) before it is analyzed. This is helpful, for instance, when wrapping the library in a REST service.
  • Added the test_data_generation::profile::profile::Profile function factualize so that building Facts can be multi-threaded in the future.
  • Added the test_data_generation::profile::pattern::Pattern function factualize so that building Facts can be multi-threaded in the future.
  • Refactored the following items:
      ◦ test_data_generation::profile::Profile function apply_facts renamed to generate_from_pattern
  • Improved documentation
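The new string_to_static_str helper turns an owned String into a &'static str. A common way to do that in Rust is to leak the heap allocation with Box::leak; the sketch below assumes that approach and is not necessarily how the library implements it. Leaked memory is never freed, so a helper like this suits values created once and kept for the life of the process, such as file paths loaded at the start of a test run.

```rust
/// Hypothetical stand-in for `string_to_static_str` (assumed implementation):
/// converts an owned String into a `&'static str` by leaking its heap buffer.
/// The memory is intentionally never freed.
fn string_to_static_str(s: String) -> &'static str {
    Box::leak(s.into_boxed_str())
}

fn main() {
    // The returned slice can be stored anywhere a `&'static str` is required.
    let path: &'static str = string_to_static_str(String::from("./tests/samples/sample-00-profile"));
    println!("{}", path);
}
```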


Test Data Generation uses Markov decision process machine learning to create algorithms that generate test data on the fly, without the overhead of test data databases, security data provisioning (e.g., masking, obfuscation), or standing up remote services.

The algorithm is built on the basis of:

  1. character patterns
  2. frequency of patterns
  3. character locations
  4. beginning and ending characters
  5. length of entity (string, date, number)
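To make the five bases above concrete, here is a toy sketch of what a profile could record for each analyzed sample. It is an illustrative assumption, not the library's data model: ToyProfile and symbolize are hypothetical names, and the symbol alphabet (V/C for upper-case vowels/consonants, v/c for lower-case, # for digits) is chosen only for this example.

```rust
use std::collections::HashMap;

/// Hypothetical symbolizer: maps a character to a class symbol.
/// (The library's own symbolize_char may use different symbols.)
fn symbolize(c: char) -> char {
    let is_vowel = "aeiouAEIOU".contains(c);
    if c.is_ascii_uppercase() {
        if is_vowel { 'V' } else { 'C' }
    } else if c.is_ascii_lowercase() {
        if is_vowel { 'v' } else { 'c' }
    } else if c.is_ascii_digit() {
        '#'
    } else {
        c // punctuation and whitespace are kept as-is
    }
}

/// Hypothetical per-profile counters, one group per basis listed above.
#[derive(Debug, Default)]
struct ToyProfile {
    pattern_counts: HashMap<String, u32>,        // 1 + 2: patterns and their frequency
    char_positions: HashMap<(char, usize), u32>, // 3: which character appears at which index
    first_chars: HashMap<char, u32>,             // 4: beginning characters
    last_chars: HashMap<char, u32>,              // 4: ending characters
    lengths: HashMap<usize, u32>,                // 5: entity length in characters
}

impl ToyProfile {
    fn analyze(&mut self, sample: &str) {
        let pattern: String = sample.chars().map(symbolize).collect();
        *self.pattern_counts.entry(pattern).or_insert(0) += 1;
        for (i, c) in sample.chars().enumerate() {
            *self.char_positions.entry((c, i)).or_insert(0) += 1;
        }
        if let Some(c) = sample.chars().next() {
            *self.first_chars.entry(c).or_insert(0) += 1;
        }
        if let Some(c) = sample.chars().last() {
            *self.last_chars.entry(c).or_insert(0) += 1;
        }
        *self.lengths.entry(sample.chars().count()).or_insert(0) += 1;
    }
}

fn main() {
    let mut profile = ToyProfile::default();
    for name in ["Smith, John", "Doe, John", "Dale, Danny", "Rickets, Ronney"].iter() {
        profile.analyze(name);
    }
    // e.g. "Smith, John" is recorded under the pattern "Ccvcc, Cvcc"
    println!("{:?}", profile.pattern_counts);
}
```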


There are multiple ways to use the Test Data Generation library, depending on your intent.


The easiest way is to use a Profile. The profile module provides functionality to create a profile from a data sample (Strings). Once a profile has been made, data can be generated by calling the pre_generate() and generate() functions, in that order.

extern crate test_data_generation;

use test_data_generation::profile::profile::Profile;

fn main() {
    // create a new, empty profile
    let mut data_profile = Profile::new();

    // analyze the dataset
    data_profile.analyze("Smith, John");
    data_profile.analyze("Doe, John");
    data_profile.analyze("Dale, Danny");
    data_profile.analyze("Rickets, Ronney");

    // confirm 4 data samples were analyzed
    assert_eq!(data_profile.patterns.len(), 4);

    // prepare the generator (call before generate())
    data_profile.pre_generate();

    // generate some data
    println!("The generated name is {:?}", data_profile.generate());

    // save the profile (algorithm) for later
    assert_eq!(data_profile.save(&String::from("./tests/samples/sample-00-profile")).unwrap(), true);

    // later... create a new profile from the saved archive file
    let mut new_profile = Profile::from_file(&String::from("./tests/samples/sample-00-profile"));

    // prepare the loaded generator, then generate some data
    new_profile.pre_generate();
    println!("The generated name is {:?}", new_profile.generate());
}
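The example calls pre_generate() before generate(). One plausible reason for that split, sketched below with hypothetical names (ToyGenerator, and a caller-supplied roll value standing in for a real random number generator), is that one-time preparation such as building a cumulative frequency table of patterns happens once, after which each generate() is a cheap lookup. This is an assumption about the design, not the library's actual implementation.

```rust
use std::collections::BTreeMap;

/// Hypothetical toy generator: patterns are picked in proportion to how
/// often they were seen during analysis.
#[derive(Default)]
struct ToyGenerator {
    pattern_counts: BTreeMap<String, u32>, // pattern -> number of samples seen
    cumulative: Vec<(u32, String)>,        // (running total, pattern), built by pre_generate
    total: u32,
}

impl ToyGenerator {
    /// Record one already-symbolized pattern.
    fn analyze(&mut self, pattern: &str) {
        *self.pattern_counts.entry(pattern.to_string()).or_insert(0) += 1;
    }

    /// One-time preparation: rebuild the cumulative frequency table.
    /// Must be called again if more samples are analyzed afterwards.
    fn pre_generate(&mut self) {
        self.cumulative.clear();
        self.total = 0;
        for (pattern, count) in &self.pattern_counts {
            self.total += count;
            self.cumulative.push((self.total, pattern.clone()));
        }
    }

    /// Pick a pattern weighted by frequency; `roll` stands in for a random
    /// number. Returns None if pre_generate has not been called yet.
    fn generate(&self, roll: u32) -> Option<&str> {
        if self.total == 0 {
            return None;
        }
        let target = roll % self.total;
        self.cumulative
            .iter()
            .find(|(running, _)| target < *running)
            .map(|(_, pattern)| pattern.as_str())
    }
}

fn main() {
    let mut gen = ToyGenerator::default();
    gen.analyze("Ccvcc, Cvcc");
    gen.analyze("Ccvcc, Cvcc");
    gen.analyze("Cvv, Cvcc");
    gen.pre_generate();
    for roll in 0..3 {
        println!("{:?}", gen.generate(roll));
    }
}
```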

Data Sample Parser

If your data samples are in CSV files, you may wish to use a Data Sample Parser. The data_sample_parser module provides functionality to read, parse, and analyze sample data so that test data can be generated based on profiles.

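extern crate test_data_generation;
use test_data_generation::data_sample_parser::DataSampleParser;

fn main() {
    let mut dsp = DataSampleParser::new();

    // analyze a CSV file of sample data (illustrative path)
    dsp.analyze_csv_file(&String::from("./tests/samples/sample-01.csv")).unwrap();

    // generate one record and use its first two fields
    let record = dsp.generate_record();
    println!("My new name is {} {}", record[0], record[1]);
    // e.g.: My new name is Abbon Aady
}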
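Because a CSV record has several fields, a reasonable mental model is that each column is treated as its own data sample. The hypothetical split_columns function below (not part of the library) shows that idea on in-memory CSV text, similar in spirit to analyzing CSV data without it having to 'land' on disk. It is deliberately naive: it assumes the first line is a header and does not handle quoted fields or embedded newlines.

```rust
/// Hypothetical helper: split in-memory CSV text into (header, values) per column.
/// Naive on purpose: no quoted fields, no embedded commas or newlines.
fn split_columns(csv: &str) -> Vec<(String, Vec<String>)> {
    let mut lines = csv.lines().filter(|line| !line.trim().is_empty());
    let headers: Vec<String> = match lines.next() {
        Some(h) => h.split(',').map(|f| f.trim().to_string()).collect(),
        None => return Vec::new(),
    };
    let mut columns: Vec<(String, Vec<String>)> =
        headers.into_iter().map(|h| (h, Vec::new())).collect();
    for line in lines {
        for (i, field) in line.split(',').enumerate() {
            // extra fields beyond the header width are ignored
            if let Some((_, values)) = columns.get_mut(i) {
                values.push(field.trim().to_string());
            }
        }
    }
    columns
}

fn main() {
    let data = "First Name,Last Name\nJohn,Smith\nDanny,Dale\n";
    for (header, values) in split_columns(data) {
        // each column's values could then be profiled separately
        println!("{}: {:?}", header, values);
    }
}
```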

You can also save the Data Sample Parser (the algorithm) as an archive file (JSON)...

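extern crate test_data_generation;
use test_data_generation::data_sample_parser::DataSampleParser;

fn main() {
    let mut dsp = DataSampleParser::new();

    // analyze a CSV file of sample data first (illustrative path)
    dsp.analyze_csv_file(&String::from("./tests/samples/sample-01.csv")).unwrap();

    // save the algorithm as an archive file
    assert_eq!(dsp.save(&String::from("./tests/samples/sample-01-dsp")).unwrap(), true);
}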

and use it at a later time.

extern crate test_data_generation;
use test_data_generation::data_sample_parser::DataSampleParser;

fn main() {
    // load a previously saved Data Sample Parser from its archive file
    let mut dsp = DataSampleParser::from_file(&String::from("./tests/samples/sample-01-dsp"));

    println!("Sample data is {:?}", dsp.generate_record()[0]);
}

You can also generate a new CSV file based on the data sample provided.

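extern crate test_data_generation;
use test_data_generation::data_sample_parser::DataSampleParser;

fn main() {
    let mut dsp = DataSampleParser::new();

    // analyze a CSV file of sample data first (illustrative path)
    dsp.analyze_csv_file(&String::from("./tests/samples/sample-01.csv")).unwrap();

    // write 100 generated records to a new CSV file
    dsp.generate_csv(100, &String::from("./tests/samples/generated-01.csv")).unwrap();
}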
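As a rough picture of what generating a CSV file of N records involves, the sketch below writes a header row followed by the requested number of generated records to any std::io::Write destination. write_csv and its closure-based record source are hypothetical, and whether the library's own output includes a header row is an assumption here; fields are also assumed to contain no commas, quotes, or newlines.

```rust
use std::io::{self, Write};

/// Hypothetical sketch: write a header row, then `rows` records produced by
/// `next_record`, as comma-separated lines (no quoting or escaping).
fn write_csv<W: Write, F: FnMut() -> Vec<String>>(
    out: &mut W,
    headers: &[&str],
    rows: usize,
    mut next_record: F,
) -> io::Result<()> {
    writeln!(out, "{}", headers.join(","))?;
    for _ in 0..rows {
        writeln!(out, "{}", next_record().join(","))?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // A counter stands in for a real generator of realistic values.
    let mut counter = 0;
    let mut buffer: Vec<u8> = Vec::new();
    write_csv(&mut buffer, &["First Name", "Last Name"], 3, || {
        counter += 1;
        vec![format!("First{}", counter), format!("Last{}", counter)]
    })?;
    print!("{}", String::from_utf8_lossy(&buffer));
    Ok(())
}
```

Writing to a generic Write rather than a file path keeps the sketch testable in memory; a std::fs::File works the same way.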

How to Contribute

Details on how to contribute can be found in the CONTRIBUTING file.


test-data-generation is primarily distributed under the terms of the Apache License (Version 2.0).

See LICENSE-APACHE for details.