Struct Profile

Source
pub struct Profile {
    pub id: Option<String>,
    pub patterns: BTreeMap<String, u32>,
    pub pattern_total: u32,
    pub pattern_keys: Vec<String>,
    pub pattern_vals: Vec<u32>,
    pub pattern_percentages: Vec<(String, f64)>,
    pub pattern_ranks: Vec<(String, f64)>,
    pub sizes: BTreeMap<u32, u32>,
    pub size_total: u32,
    pub size_ranks: Vec<(u32, f64)>,
    pub processors: u8,
    pub facts: Vec<Vec<Fact>>,
}

Represents a Profile of sample data that has been analyzed and can be used to generate realistic test data.

Fields§

§id: Option<String>

An identifier (not necessarily unique) that is used to differentiate profiles from one another

§patterns: BTreeMap<String, u32>

A list of symbolic patterns with a distinct count of occurrences

§pattern_total: u32

The total number of patterns in the profile

§pattern_keys: Vec<String>

A list of symbolic patterns in the profile (used for temporary storage due to lifetime issues)

§pattern_vals: Vec<u32>

A list of distinct counts for patterns in the profile (used for temporary storage due to lifetime issues)

§pattern_percentages: Vec<(String, f64)>

A list of symbolic patterns with their percent chance of occurrence

§pattern_ranks: Vec<(String, f64)>

A list of symbolic patterns with a running total of percent chance of occurrence, in increasing order

§sizes: BTreeMap<u32, u32>

A list of pattern lengths with a distinct count of occurrence

§size_total: u32

The total number of pattern sizes (lengths) in the profile

§size_ranks: Vec<(u32, f64)>

A list of pattern sizes (lengths) with a running total of their percent chance of occurrence, in increasing order

§processors: u8

The number of processors used to distribute the work load (multi-thread) while finding Facts to generate data

§facts: Vec<Vec<Fact>>

A list of processors (which are lists of Facts) that store all the Facts in the profile
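
The serialized Profile shown further down this page stores the two Facts for "OK" as `[[O], [K], [], []]` with four processors, which is consistent with a round-robin assignment. The following is a minimal sketch of that idea, not the crate's implementation; `distribute` is a hypothetical helper and plain `char` values stand in for `Fact`.

```rust
/// Hypothetical helper (not part of the crate): spreads items across
/// `processors` buckets in round-robin order.
fn distribute<T>(items: Vec<T>, processors: u8) -> Vec<Vec<T>> {
    // Treat 0 as 1 so the modulo below can never divide by zero.
    let buckets = usize::from(processors.max(1));
    let mut out: Vec<Vec<T>> = (0..buckets).map(|_| Vec::new()).collect();
    for (index, item) in items.into_iter().enumerate() {
        out[index % buckets].push(item);
    }
    out
}

fn main() {
    // Two stand-in "facts" spread over four processors.
    let facts = distribute(vec!['O', 'K'], 4);
    println!("{:?}", facts);
}
```

Each inner list could then be handed to its own worker thread, which is the role the `processors` field describes.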

Implementations§

Source§

impl Profile

Source

pub fn new() -> Profile

Constructs a new Profile

§Example

extern crate test_data_generation;

use test_data_generation::Profile;

fn main() {
	let placeholder = Profile::new();
}
Source

pub fn new_with_id(id: String) -> Profile

Constructs a new Profile using an identifier

§Example

extern crate test_data_generation;

use test_data_generation::Profile;

fn main() {
	let placeholder = Profile::new_with_id("12345".to_string());
}
Source

pub fn new_with_processors(p: u8) -> Profile

Constructs a new Profile with a specified number of processors to analyze the data. Each processor shares the load of generating the data based on the Facts it has been assigned to manage.

§Arguments
  • p: u8 - A number that sets the number of processors to start up to manage the Facts.
    Increasing the number of processors will speed up the generator by distributing the workload. The recommended number of processors is 1 per 10K data points (e.g.: profiling 20K names should be handled by 2 processors).
    NOTE: The default number of processors is 4.
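
The sizing rule above can be turned into a small calculation. This is only a sketch: `recommended_processors` is a hypothetical helper, and rounding up, using at least one processor, and clamping to the `u8` range are assumptions rather than documented behavior.

```rust
/// Hypothetical helper (not part of the crate): one processor per
/// 10,000 data points, rounded up.
fn recommended_processors(data_points: usize) -> u8 {
    const POINTS_PER_PROCESSOR: usize = 10_000;
    // Ceiling division without risking overflow on very large inputs.
    let full = data_points / POINTS_PER_PROCESSOR;
    let partial = usize::from(data_points % POINTS_PER_PROCESSOR != 0);
    // Assumption: always use at least one processor.
    let needed = (full + partial).max(1);
    // The constructor takes a u8, so clamp to that range.
    needed.min(usize::from(u8::MAX)) as u8
}

fn main() {
    // 20K names should be handled by 2 processors, as in the rule above.
    println!("{}", recommended_processors(20_000));
}
```

The result could be passed straight to `Profile::new_with_processors`.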

§Example

extern crate test_data_generation;

use test_data_generation::Profile;

fn main() {
    let processors: u8 = 10;
	let placeholder = Profile::new_with_processors(processors);
}
Source

pub fn from_file(path: &'static str) -> Profile

Constructs a new Profile from an exported JSON file. This is used when restoring from “archive”

§Arguments
  • path: &'static str - The full path of the export file, excluding the file extension (e.g.: “./test/data/custom-names”).

§Example

extern crate test_data_generation;

use test_data_generation::Profile;

fn main() {
	let mut profile = Profile::from_file("./tests/samples/sample-00-profile");

    profile.pre_generate();

    println!("The generated name is {:?}", profile.generate());
}
Source

pub fn from_serialized(serialized: &str) -> Profile

Constructs a new Profile from a serialized (JSON) string of the Profile object. This is used when restoring from “archive”

§Example

extern crate test_data_generation;

use test_data_generation::Profile;

fn main() {
	let serialized = "{\"patterns\":{\"VC\":1},\"pattern_total\":1,\"pattern_keys\":[\"VC\"],\"pattern_vals\":[1],\"pattern_percentages\":[],\"pattern_ranks\":[],\"sizes\":{\"2\":1},\"size_total\":1,\"size_ranks\":[],\"processors\":4,\"facts\":[[{\"key\":\"O\",\"prior_key\":null,\"next_key\":\"K\",\"pattern_placeholder\":\"V\",\"starts_with\":1,\"ends_with\":0,\"index_offset\":0}],[{\"key\":\"K\",\"prior_key\":\"O\",\"next_key\":null,\"pattern_placeholder\":\"C\",\"starts_with\":0,\"ends_with\":1,\"index_offset\":1}],[],[]]}";
	let mut profile = Profile::from_serialized(&serialized);

    profile.pre_generate();

    println!("The generated name is {:?}", profile.generate());
}
Source

pub fn analyze(&mut self, entity: &str)

This function converts a data point (&str) to a pattern and adds it to the profile

§Arguments
  • entity: &str - The textual str of the value to analyze.
§Example
extern crate test_data_generation;

use test_data_generation::Profile;

fn main() {
	let mut profile =  Profile::new();
	profile.analyze("One");
	profile.analyze("Two");
	profile.analyze("Three");
	profile.analyze("Four");

	assert_eq!(profile.patterns.len(), 4);
}
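
To make the idea of a symbolic pattern concrete, here is a rough sketch of the conversion. The symbol mapping is inferred from the patterns that appear on this page ("OK" becomes "VC", "Smith, John" becomes "CcvccpSCvcc", "01/13/2017" becomes "##p##p####"); `symbolize` is a hypothetical function, and the punctuation set in particular is an assumption, so the crate's real rules may differ.

```rust
use std::collections::BTreeMap;

/// Hypothetical sketch (not the crate's code): maps each character to a
/// pattern symbol, using a mapping inferred from the examples on this page.
fn symbolize(entity: &str) -> String {
    entity
        .chars()
        .map(|c| match c {
            'A' | 'E' | 'I' | 'O' | 'U' => 'V', // uppercase vowel
            'a' | 'e' | 'i' | 'o' | 'u' => 'v', // lowercase vowel
            c if c.is_ascii_uppercase() => 'C', // uppercase consonant
            c if c.is_ascii_lowercase() => 'c', // lowercase consonant
            c if c.is_ascii_digit() => '#',     // digit
            c if c.is_whitespace() => 'S',      // whitespace
            // Assumed punctuation set; only ',' and '/' are confirmed by the examples.
            ',' | '.' | '/' | ';' | ':' | '!' | '?' | '-' => 'p',
            _ => '@', // anything else, e.g. the apostrophe in "O'Brian"
        })
        .collect()
}

fn main() {
    // Count how often each pattern occurs, as the `patterns` field does.
    let mut patterns: BTreeMap<String, u32> = BTreeMap::new();
    for entity in ["One", "Two", "Three", "Four"].iter() {
        *patterns.entry(symbolize(entity)).or_insert(0) += 1;
    }
    println!("{:?}", patterns);
}
```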
Source

pub fn apply_facts( &mut self, pattern: String, facts: Vec<Fact>, ) -> Result<i32, String>

This function applies the pattern and list of Facts to the profile

§Arguments
  • pattern: String - The string that represents the pattern of the entity that was analyzed.
  • facts: Vec<Fact> - A Vector containing the Facts based on the analysis (one for each char in the entity).
§Example
extern crate test_data_generation;

use test_data_generation::engine::{Fact, PatternDefinition};
use test_data_generation::Profile;

fn main() {
	let mut profile =  Profile::new();
	let results = PatternDefinition::new().analyze("Word");

	assert_eq!(profile.apply_facts(results.0, results.1).unwrap(), 1);
}
Source

pub fn cum_patternmap(&mut self)

This function ranks the patterns by the chance they will occur, most likely first, and stores the running (cumulative) percentage for each

§Example
extern crate test_data_generation;

use test_data_generation::Profile;

fn main() {
	let mut profile =  Profile::new();

   	profile.analyze("Smith, John");
   	profile.analyze("O'Brian, Henny");
   	profile.analyze("Dale, Danny");
   	profile.analyze("Rickets, Ronnae");
   	profile.analyze("Richard, Richie");
   	profile.analyze("Roberts, Blake");
   	profile.analyze("Conways, Sephen");

   	profile.pre_generate();
   	let test = [("CvccvccpSCvccvv".to_string(), 28.57142857142857 as f64), ("CcvccpSCvcc".to_string(), 42.857142857142854 as f64), ("CvccvccpSCvccvc".to_string(), 57.14285714285714 as f64), ("CvcvcccpSCcvcv".to_string(), 71.42857142857142 as f64), ("CvcvpSCvccc".to_string(), 85.7142857142857 as f64), ("V@CcvvcpSCvccc".to_string(), 99.99999999999997 as f64)];

   	assert_eq!(profile.pattern_ranks, test);
}
Source

pub fn cum_sizemap(&mut self)

This function ranks the sizes by the chance they will occur, most likely first, and stores the running (cumulative) percentage for each

§Example
extern crate test_data_generation;

use test_data_generation::Profile;

fn main() {
	let mut profile =  Profile::new();
	profile.analyze("One");
	profile.analyze("Two");
	profile.analyze("Three");
	profile.analyze("Four");
	profile.analyze("Five");
	profile.analyze("Six");

    profile.cum_sizemap();

	print!("The size ranks are {:?}", profile.size_ranks);
    // The size ranks are [(3, 50), (4, 83.33333333333333), (5, 100)]
}
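
The cumulative ranking can be sketched with the standard library alone. This is an illustration under stated assumptions, not the crate's code: `cumulative_ranks` is a hypothetical function, and breaking ties by ascending size is an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical sketch (not the crate's code): orders sizes from most to
/// least frequent and attaches a running total of their percentages.
fn cumulative_ranks(sizes: &BTreeMap<u32, u32>) -> Vec<(u32, f64)> {
    let total: u32 = sizes.values().sum();
    if total == 0 {
        return Vec::new();
    }
    let mut entries: Vec<(u32, u32)> = sizes.iter().map(|(k, v)| (*k, *v)).collect();
    // Stable sort by count, highest first; equal counts keep ascending size order.
    entries.sort_by(|a, b| b.1.cmp(&a.1));
    let mut running = 0.0_f64;
    entries
        .into_iter()
        .map(|(size, count)| {
            running += f64::from(count) / f64::from(total) * 100.0;
            (size, running)
        })
        .collect()
}

fn main() {
    // Same sample words as the example above, keyed by their length.
    let mut sizes: BTreeMap<u32, u32> = BTreeMap::new();
    for word in ["One", "Two", "Three", "Four", "Five", "Six"].iter() {
        *sizes.entry(word.len() as u32).or_insert(0) += 1;
    }
    println!("The size ranks are {:?}", cumulative_ranks(&sizes));
}
```

Because the percentages are accumulated in floating point, the last running total can land slightly below 100, as the 99.99999999999997 in the cum_patternmap example shows.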
Source

pub fn generate(&mut self) -> String

This function generates realistic test data based on the sample data that was analyzed.

§Example
extern crate test_data_generation;

use test_data_generation::Profile;

fn main() {
	let mut profile =  Profile::new();

	profile.analyze("One");
	profile.analyze("Two");
	profile.analyze("Three");
	profile.analyze("Four");
	profile.analyze("Five");

    profile.pre_generate();

	print!("The test data {:?} was generated.", profile.generate());
}
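
One common way to use cumulative ranks such as `size_ranks` and `pattern_ranks` during generation is to roll a number between 0 and 100 and take the first entry whose running total covers it. Whether the crate selects sizes and patterns exactly this way is an assumption; `pick_by_rank` is a hypothetical helper, and the roll is passed in explicitly so the sketch stays deterministic.

```rust
/// Hypothetical helper (not part of the crate): returns the first entry
/// whose cumulative percentage is at least `roll` (expected range 0..=100).
fn pick_by_rank<T: Clone>(ranks: &[(T, f64)], roll: f64) -> Option<T> {
    ranks
        .iter()
        .find(|(_, cumulative)| roll <= *cumulative)
        .map(|(value, _)| value.clone())
        // Floating-point totals can end just below 100, so fall back to the last entry.
        .or_else(|| ranks.last().map(|(value, _)| value.clone()))
}

fn main() {
    let size_ranks = vec![(3u32, 50.0), (4, 83.33333333333333), (5, 100.0)];
    // In practice the roll would come from a random number generator.
    for roll in [10.0, 60.0, 99.0].iter() {
        println!("roll {} selects size {:?}", roll, pick_by_rank(&size_ranks, *roll));
    }
}
```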
Source

pub fn generate_from_pattern(&self, pattern: String) -> String

This function generates realistic test data that follows the given pattern, based on the sample data that was analyzed.

§Arguments
  • pattern: String - The pattern to reference when generating the test data.
§Example
extern crate test_data_generation;

use test_data_generation::Profile;

fn main() {
	let mut profile =  Profile::new();

	profile.analyze("01/13/2017");
	profile.analyze("11/24/2017");
	profile.analyze("08/05/2017");

    profile.pre_generate();

 	let generated = profile.generate_from_pattern("##p##p####".to_string());

    assert_eq!(generated.len(), 10);
}
Source

pub fn learn_from_entity( &mut self, control_list: Vec<String>, ) -> Result<bool, String>

This function learns by measuring how realistic the test data it generates is compared to the sample data that was provided.

§Arguments
  • control_list: Vec<String> - The list of strings to compare against. This would be the real data from the data sample.
§Example
extern crate test_data_generation;

use test_data_generation::Profile;

fn main() {
	let mut profil =  Profile::new();
	let sample_data = vec!("Smith, John".to_string(),"Doe, John".to_string(),"Dale, Danny".to_string(),"Rickets, Ronney".to_string());

	for sample in sample_data.iter().clone() {
		profil.analyze(&sample);
	}

	// in order to learn, the profile must be prepared with pre_generate()
	// so it can generate data to learn from
	profil.pre_generate();

	let learning = profil.learn_from_entity(sample_data).unwrap();

	assert_eq!(learning, true);
}
Source

pub fn levenshtein_distance( &mut self, control: &String, experiment: &String, ) -> usize

This function calculates the Levenshtein distance between two strings. See: https://crates.io/crates/levenshtein

§Arguments
  • control: &String - The string to compare against. This would be the real data from the data sample.
  • experiment: &String - The string to compare. This would be the generated data for which you want to find the distance.

§Example

extern crate test_data_generation;

use test_data_generation::Profile;

fn main() {
	let mut profile =  Profile::new();

    assert_eq!(profile.levenshtein_distance(&"kitten".to_string(), &"sitting".to_string()), 3 as usize);
}
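
For readers unfamiliar with the metric, the Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. Below is a self-contained sketch of the standard dynamic-programming algorithm; it is an illustration only (the crate delegates to the levenshtein crate linked above), and the free function `levenshtein` here is a hypothetical stand-in.

```rust
/// Illustrative implementation of the standard two-row algorithm.
fn levenshtein(control: &str, experiment: &str) -> usize {
    let a: Vec<char> = control.chars().collect();
    let b: Vec<char> = experiment.chars().collect();
    // prev[j] holds the distance between the processed prefix of `a`
    // and the first j characters of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut curr = Vec::with_capacity(b.len() + 1);
        curr.push(i + 1); // deleting the first i + 1 characters of `a`
        for (j, cb) in b.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != cb);
            let deletion = prev[j + 1] + 1;
            let insertion = curr[j] + 1;
            curr.push(substitution.min(deletion).min(insertion));
        }
        prev = curr;
    }
    prev[b.len()]
}

fn main() {
    assert_eq!(levenshtein("kitten", "sitting"), 3);
    println!("kitten -> sitting: {}", levenshtein("kitten", "sitting"));
}
```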
Source

pub fn realistic_test(&mut self, control: &str, experiment: &str) -> f64

This function calculates the percent difference between 2 strings.

§Arguments
  • control: &str - The string to compare against. This would be the real data from the data sample.
  • experiment: &str - The string to compare. This would be the generated data for which you want to find the percent difference.

§Example

extern crate test_data_generation;

use test_data_generation::Profile;

fn main() {
	let mut profile =  Profile::new();

    assert_eq!(profile.realistic_test(&"kitten".to_string(), &"sitting".to_string()), 76.92307692307692 as f64);
}
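
The value in the example above is consistent with relating the Levenshtein distance to the combined length of both strings: "kitten" (6) and "sitting" (7) have a distance of 3, and (13 - 3) / 13 is roughly 76.92%. The sketch below encodes that formula. It is inferred from the single example rather than confirmed by the crate's source, `similarity_percent` is a hypothetical helper, and returning 100 for two empty strings is an assumption.

```rust
/// Hypothetical helper (not part of the crate): share of the combined
/// length that is left after subtracting the edit distance.
fn similarity_percent(control_len: usize, experiment_len: usize, distance: usize) -> f64 {
    let total = (control_len + experiment_len) as f64;
    if total == 0.0 {
        // Assumption: two empty strings count as identical.
        return 100.0;
    }
    (total - distance as f64) / total * 100.0
}

fn main() {
    // Lengths of "kitten" and "sitting", with their Levenshtein distance of 3.
    println!("{:.2}", similarity_percent(6, 7, 3));
}
```

Under this reading, higher values mean the generated string is closer to the control string.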
Source

pub fn pre_generate(&mut self)

This function prepares the size and pattern accumulated percentages, ordered by increasing cumulative percentage

§Example
extern crate test_data_generation;

use test_data_generation::Profile;

fn main() {
	let mut profile =  Profile::new();
	profile.analyze("One");
	profile.analyze("Two");
	profile.analyze("Three");
	profile.analyze("Four");
	profile.analyze("Five");
	profile.analyze("Six");

    profile.pre_generate();

	print!("The size ranks are {:?}", profile.size_ranks);
    // The size ranks are [(3, 50), (4, 83.33333333333333), (5, 100)]
}
Source

pub fn reset_analyze(&mut self)

This function resets the patterns that the Profile has analyzed. Call this method whenever you wish to “clear” the Profile

§Example
extern crate test_data_generation;

use test_data_generation::Profile;

fn main() {
	let mut profile =  Profile::new();

	profile.analyze("One");
	profile.analyze("Two");
	profile.analyze("Three");

    let x = profile.patterns.len();

    profile.reset_analyze();

	profile.analyze("Four");
	profile.analyze("Five");
	profile.analyze("Six");
	profile.analyze("Seven");
	profile.analyze("Eight");
	profile.analyze("Nine");
	profile.analyze("Ten");

    let y = profile.patterns.len();

    assert_eq!(x, 3);
    assert_eq!(y, 5);
}
Source

pub fn save(&mut self, path: &'static str) -> Result<bool, Error>

This function saves (exports) the Profile to a JSON file. This is useful when you wish to reuse the algorithm to generate more test data later.

§Arguments
  • path: &'static str - The full path of the export file, excluding the file extension (e.g.: “./test/data/custom-names”).

§Errors

If this function encounters any form of I/O or other error, an error variant will be returned. Otherwise, the function returns Ok(true).

§Example

extern crate test_data_generation;

use test_data_generation::Profile;

fn main() {
	// analyze the dataset
	let mut profile =  Profile::new();
    profile.analyze("Smith, John");
	profile.analyze("O'Brian, Henny");
	profile.analyze("Dale, Danny");
	profile.analyze("Rickets, Ronney");

	profile.pre_generate();

    assert_eq!(profile.save("./tests/samples/sample-00-profile").unwrap(), true);
}
Source

pub fn serialize(&mut self) -> String

This function converts the Profile to a serialized JSON string.

§Example

extern crate test_data_generation;

use test_data_generation::Profile;

fn main() {
	// create the profile
	let mut data_profile = Profile::new();

    // analyze the dataset
	data_profile.analyze("OK");

    println!("{}", data_profile.serialize());
    // {"patterns":{"VC":1},"pattern_total":1,"pattern_keys":["VC"],"pattern_vals":[1],"pattern_percentages":[],"pattern_ranks":[],"sizes":{"2":1},"size_total":1,"size_ranks":[],"processors":4,"facts":[[{"key":"O","prior_key":null,"next_key":"K","pattern_placeholder":"V","starts_with":1,"ends_with":0,"index_offset":0}],[{"key":"K","prior_key":"O","next_key":null,"pattern_placeholder":"C","starts_with":0,"ends_with":1,"index_offset":1}],[],[]]}
}

Trait Implementations§

Source§

impl Clone for Profile

Source§

fn clone(&self) -> Profile

Returns a duplicate of the value.
1.0.0 · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.
Source§

impl Debug for Profile

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.
Source§

impl<'de> Deserialize<'de> for Profile

Source§

fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where __D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer.
Source§

impl Serialize for Profile

Source§

fn serialize<__S>(&self, __serializer: __S) -> Result<__S::Ok, __S::Error>
where __S: Serializer,

Serialize this value into the given Serde serializer.

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self.
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value.
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.
Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> Pointable for T

Source§

const ALIGN: usize

The alignment of pointer.
Source§

type Init = T

The type for initializers.
Source§

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.
Source§

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.
Source§

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.
Source§

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.
Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

Source§

fn vzip(self) -> V

Source§

impl<T> DeserializeOwned for T
where T: for<'de> Deserialize<'de>,