1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
//! Mechanisms to classify, manipulate, and redact sensitive data.
//!
//! Commercial software often needs to handle sensitive data, such as personally identifiable information (PII).
//! A user's name, IP address, email address, and other similar information require special treatment. For
//! example, it's usually not legally acceptable to emit a user's email address in a system's logs.
//! Following these rules can be challenging and error-prone, especially when the data is
//! transferred between different components of a large complex system. This crate provides
//! mechanisms to reduce the risk of unintentionally exposing sensitive data.
//!
//! This crate's approach uses wrapping to isolate sensitive data and avoid accidental exposure.
//! Mechanisms are provided to automatically process sensitive data to make it safe to use in telemetry.
//!
//! # Concepts
//!
//! Before continuing, it's important to understand a few concepts:
//!
//! * **Data Classification**: The process of assigning sensitive data individual data classes.
//! Different data classes may have different rules for handling them. For example, some sensitive
//! data can be put into logs, but only for a limited time, while other data can never be logged.
//!
//! * **Data Taxonomy**: A group of related data classes that together represent a consistent set
//! of rules for handling sensitive data. Different companies or governments usually have their
//! own taxonomies representing the different types of data they manipulate, each with specific
//! policies.
//!
//! * **Redaction**: The process of removing or obscuring sensitive information from data.
//! Redaction is often done by using consistent hashing, replacing the sensitive data with a hash
//! value that is not reversible. This allows the data to be used for analysis or processing
//! without exposing the sensitive information.
//!
//! It's important to note that redaction is different from deletion. Redaction typically replaces sensitive data
//! with something else, while deletion removes the data entirely. Redaction allows for correlation since a given piece
//! of sensitive data will always produce the same redacted value. This makes it possible to look at many different
//! log records and correlate them to a specific user or entity without exposing the sensitive data itself. It's possible
//! to tell over time that an operation is attributed to the same piece of state without knowing what the state is.
//!
//! # Traits
//!
//! This crate is built around two primary traits:
//!
//! * The [`Classified`] trait is used to mark types that hold sensitive data.
//!
//! * The [`Redactor`] trait defines the logic needed by an individual redactor. This crate provides a
//! few implementations of this trait, such as [`SimpleRedactor`](simple_redactor::SimpleRedactor), but others can
//! be implemented and used by applications as well.
//!
//! This crate also exposes additional traits which are usually, but not necessarily, implemented by types that implement the
//! [`Classified`] trait:
//!
//! - The [`RedactedDebug`] trait defines how to produce redacted debug output for classified data.
//!
//! - The [`RedactedDisplay`] trait defines how to produce redacted display output for classified data.
//!
//! - The [`RedactedToString`] trait defines how to produce a redacted string representation of classified data.
//!
//! # Taxonomies and Data Classes
//!
//! A taxonomy is defined using the [`taxonomy`] attribute macro. The macro is applied to an enum
//! declaration. Each variant of the enum represents a data class within the taxonomy.
//!
//! [`DataClass`] is a struct that represents a single data class within a taxonomy. The struct
//! contains the name of the taxonomy and the name of the data class. You can get a `DataClass` instance for a given data class
//! by calling the associated `data_class` method on the taxonomy enum.
//!
//! ```rust
//! use data_privacy::taxonomy;
//!
//! // A simple taxonomy definition for the Contoso organization.
//! #[taxonomy(contoso)]
//! enum ContosoTaxonomy {
//! CustomerContent,
//! CustomerIdentifier,
//! OrganizationIdentifier,
//! }
//!
//! let dc = ContosoTaxonomy::CustomerIdentifier.data_class();
//! assert_eq!(dc.taxonomy(), "contoso");
//! assert_eq!(dc.name(), "customer_identifier");
//! ```
//!
//! # Classified Containers
//!
//! Types that implement the [`Classified`] trait are said to be _classified containers_. They encapsulate
//! an instance of another type. Although containers can be created by hand, they are most commonly created
//! using the [`classified`] attribute. See the documentation for the attribute to learn how you define your own
//! classified type.
//!
//! Applications use the classified container types around application
//! data types to indicate instances of those types hold sensitive data.
//!
//! # Theory of Operation
//!
//! How this all works:
//!
//! * An application defines its own taxonomy using the [`taxonomy`] macro.
//!
//! * An application defines classified container types using the [`classified`] attribute for each piece of sensitive data it needs to manipulate.
//!
//! * The application uses the classified container types to wrap sensitive data throughout the application. This ensures the
//! sensitive data is not accidentally exposed through telemetry or other means.
//!
//! * On startup, the application initializes a [`RedactionEngine`] via [`RedactionEngine::builder`]. The engine is configured
//! with redactors for each data class in the taxonomy. The redactors define how to handle sensitive data for that class.
//! For example, for a given data class, a redactor may substitute the original data for a hash value, or it may replace it with asterisks.
//!
//! * When it's time to log or otherwise process the sensitive data, the application uses the redaction engine to redact the data.
//!
//! # Examples
//!
//! This example shows how to define a simple taxonomy and a few classified container types, and how to manipulate these
//! container types.
//!
//! ```rust
//! use data_privacy::{classified, RedactionEngine, taxonomy};
//! use data_privacy::simple_redactor::{SimpleRedactor, SimpleRedactorMode};
//!
//! // A simple taxonomy definition for the Contoso organization.
//! #[taxonomy(contoso)]
//! enum ContosoTaxonomy {
//! CustomerContent,
//! CustomerIdentifier,
//! OrganizationIdentifier,
//! }
//!
//! // A classified container for customer names.
//! #[classified(ContosoTaxonomy::CustomerIdentifier)]
//! struct Name(String);
//!
//! // A classified container for customer addresses.
//! #[classified(ContosoTaxonomy::CustomerIdentifier)]
//! struct Address(String);
//!
//! // A classified container for customer content.
//! #[classified(ContosoTaxonomy::CustomerContent)]
//! struct Memo(String);
//!
//! // A customer record which contains a bunch of classified data.
//! #[derive(Debug)]
//! struct Customer {
//! name: Name,
//! address: Address,
//! memo : Memo,
//! }
//!
//! let c = Customer {
//! name: Name("John Doe".to_string()),
//! address: Address("123 Main St, Anytown, USA".to_string()),
//! memo: Memo("Leave packages on the front porch.".to_string()),
//! };
//!
//! // Displaying the customer record will not leak sensitive data because the classified containers protect the data
//! println!("Customer record: {:?}", c);
//!
//! // You can get redacted representations of classified data using a [`RedactionEngine`](redaction_engine::RedactionEngine).
//!
//! // Initialize some redactors
//! let asterisk_redactor = SimpleRedactor::new();
//! let erasing_redactor = SimpleRedactor::with_mode(SimpleRedactorMode::Erase);
//!
//! // Create the redaction engine. This is typically done once when the application starts.
//! let engine = RedactionEngine::builder()
//! .add_class_redactor(ContosoTaxonomy::CustomerIdentifier.data_class(), asterisk_redactor)
//! .set_fallback_redactor(erasing_redactor)
//! .build();
//!
//! let mut output_buffer = String::new();
//! _ = engine.redacted_display(&c.name, &mut output_buffer);
//!
//! // check that the data in the output buffer has indeed been redacted as expected.
//! assert_eq!(output_buffer, "********");
//! ```
// Needed for the `taxonomy` macro to be able to use `data_privacy` instead of `crate` in examples
// Workaround for https://github.com/bkchr/proc-macro-crate/issues/14
extern crate self as data_privacy;
pub use Classified;
pub use ;
pub use ;
pub use ;
pub use RedactionEngine;
pub use RedactionEngineBuilder;
pub use rapidhash_redactor;
pub use xxh3_redactor;
pub use ;
pub use Sensitive;