Module pbd::dpi

Background

The practice of implementing a Data Privacy Inspector addresses the following Privacy Design Strategies:

  • Control
  • Enforce

Data-driven solutions (e.g., Data as a Service) require that the dynamic nature of the system include the ability to identify and manage data based on its characteristics. A vital characteristic is whether the data is private (PII). DPI is a semi-supervised machine learning PbD feature that scores the probability of data being private and categorizes it accordingly (e.g., NPPI, PCI, Public, Confidential, Restricted).

Special thanks to rs-natural for their work on Phonetics, NGrams, Tokenization, and Tf-Idf.

Usage

Default logic

You can start inspecting content for private data using the default logic.

use pbd::dpi::DPI;

let mut dpi = DPI::default();
let doc = r#"
Dear Aunt Bertha,

I can't believe it has already been 10 years since we moved back to Colorado.
I love Boulder and haven't thought of leaving since. So please don't worry when I tell you that we are moving in less than a week.
We will be upgrading to a larger home on the other side of the city on Peak Crest Lane.
It has a great view of the mountains and we will have a two-car garage.

We will have the same phone number, so you can still reach us. But our new address will be 1345 Peak Crest Lane Boulder, Colorado 125468.

Let us know if you ever want to visit us.

Sincerely,
Robert
"#.to_string();

println!("Score: {}", dpi.inspect(doc));

Custom logic

You can also build your own custom DPI and then train it on sample content before using it to inspect documents.

use pbd::dpi::DPI;

let words = vec!["home".to_string(), "address".to_string()];
let mut dpi = DPI::with_key_words(words);

// train it
let samples: Vec<String> = vec![
    "Our home has a garage".to_string(),
    "My address is 14 Main Street Newtown CA 56743".to_string(),
    "My home phone number is (689) 225-9696".to_string(),
];
let suggestions = dpi.auto_train(samples);

println!("Training limit is {}", DPI::TRAIN_LIMIT);
println!("Suggested words that were automatically applied during training: {:?}", suggestions);

// use it
let doc = r#"
Dear Aunt Bertha,

I can't believe it has already been 10 years since we moved back to Colorado.
I love Boulder and haven't thought of leaving since. So please don't worry when I tell you that we are moving in less than a week.
We will be upgrading to a larger home on the other side of the city on Peak Crest Lane.
It has a great view of the mountains and we will have a two-car garage.

We will have the same phone number, so you can still reach us. But our new address will be 1345 Peak Crest Lane Boulder, Colorado 125468.

Let us know if you ever want to visit us.

Sincerely,
Robert
"#.to_string();

println!("Score: {}", dpi.inspect(doc));

Modules

error

Data Privacy Inspector specific Errors

reference

Structs

DPI

Represents a Data Privacy Inspector (DPI)

Pattern

Represents a symbolic pattern of an entity (String)

PatternDefinition

Represents the object managing all the symbols used in pattern definitions

Score

Represents a Score

Suggestion

Represents a Suggestion

Enums

ScoreKey

Traits

Phonetic

The collection of methods that enable a structure to find words that sound alike

Tfidf

Tokenizer

The collection of methods that enable a structure to tokenize and convert text to ngrams
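
To make the ideas behind the Phonetic and Tokenizer traits more concrete, here is a minimal, standalone sketch of word tokenization, word n-grams, and sound-alike matching. It does not use or reproduce this module's API: the function names `tokenize`, `ngrams`, `soundex`, and `sounds_alike` are hypothetical, and choosing American Soundex as the phonetic algorithm is an assumption made for illustration only.

```rust
/// Splits text into lowercase word tokens; any character that is not
/// alphanumeric or an apostrophe acts as a separator.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !(c.is_alphanumeric() || c == '\''))
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Builds word n-grams by sliding a window of `n` tokens over the input.
/// Returns an empty list when `n` is zero or exceeds the token count.
fn ngrams(tokens: &[String], n: usize) -> Vec<Vec<String>> {
    if n == 0 || n > tokens.len() {
        return Vec::new();
    }
    tokens.windows(n).map(|w| w.to_vec()).collect()
}

/// Maps a letter to its Soundex digit; vowels and all other characters map to `None`.
fn soundex_digit(c: char) -> Option<char> {
    match c.to_ascii_lowercase() {
        'b' | 'f' | 'p' | 'v' => Some('1'),
        'c' | 'g' | 'j' | 'k' | 'q' | 's' | 'x' | 'z' => Some('2'),
        'd' | 't' => Some('3'),
        'l' => Some('4'),
        'm' | 'n' => Some('5'),
        'r' => Some('6'),
        _ => None,
    }
}

/// Computes the four-character American Soundex code of a word.
/// Non-ASCII-alphabetic characters are ignored; an empty input yields "".
fn soundex(word: &str) -> String {
    let mut letters = word.chars().filter(|c| c.is_ascii_alphabetic());
    let first = match letters.next() {
        Some(c) => c,
        None => return String::new(),
    };
    let mut out = String::new();
    out.push(first.to_ascii_uppercase());
    // Digit of the previous letter, used to collapse adjacent duplicates.
    let mut last = soundex_digit(first);
    for c in letters {
        let lower = c.to_ascii_lowercase();
        // 'h' and 'w' are skipped and do not separate letters with equal digits.
        if lower == 'h' || lower == 'w' {
            continue;
        }
        let digit = soundex_digit(c);
        if let Some(d) = digit {
            if digit != last {
                out.push(d);
                if out.len() == 4 {
                    break;
                }
            }
        }
        // A vowel resets `last` to `None`, so a repeated digit after it is kept.
        last = digit;
    }
    // Pad short codes with zeros to the fixed length of four.
    while out.len() < 4 {
        out.push('0');
    }
    out
}

/// Two words "sound alike" in this sketch when their Soundex codes match.
fn sounds_alike(a: &str, b: &str) -> bool {
    let code = soundex(a);
    !code.is_empty() && code == soundex(b)
}
```

With these helpers, `ngrams(&tokenize("we are moving soon"), 2)` yields the three bigrams of the sentence, and `sounds_alike("Robert", "Rupert")` is true because both names encode to `R163`.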