//! A fast, extensible probabilistic cross-categorization engine.
//!
//!
//! Lace is a probabilistic cross-categorization engine written in Rust with an
//! optional interface to Python. Unlike traditional machine learning methods, which
//! learn some function mapping inputs to outputs, Lace learns a joint probability
//! distribution over your dataset, which enables users to...
//!
//! - predict or compute likelihoods of any number of features conditioned on any
//!   number of other features
//! - identify, quantify, and attribute uncertainty from variance in the data,
//!   epistemic uncertainty in the model, and missing features
//! - determine which variables are predictive of which others
//! - determine which records/rows are similar to which others on the whole or
//!   given a specific context
//! - simulate and manipulate synthetic data
//! - work natively with missing data and make inferences about missingness
//!   (missing not-at-random)
//! - work with continuous and categorical data natively, without transformation
//! - identify anomalies, errors, and inconsistencies within the data
//! - edit, backfill, and append data without retraining
//!
//! and more, all in one place, without any explicit model building.
//!
//!
//! # Design
//! Lace learns a probabilistic model of tabular data using cross-categorization.
//! The general workflow is:
//!
//! * Create a [`prelude::Codebook`] which describes your data. One can be
//!   autogenerated but it is best to check it before use.
//! * Create an [`prelude::Engine`] with your data and codebook.
//! * Train the [`prelude::Engine`] and monitor the model likelihood for
//!   convergence.
//! * Ask questions via the [`prelude::OracleT`] implementation of
//!   [`prelude::Engine`] to explore your data.
//!
//!
//! # Example
//!
//! (For a complete tutorial, see the [Lace Book](https://TODO))
//!
//! The following example uses the pre-trained `animals` example dataset.
//! Each row represents an animal and each column represents a feature of that
//! animal.
//! The feature is present if the cell value is 1 and is absent if the value is 0.
//!
//! First, we create an oracle from the example dataset. Rows and columns can
//! then be referred to by name rather than by integer index.
//!
//! ```rust
//! use lace::prelude::*;
//! use lace::examples::Example;
//!
//! let oracle = Example::Animals.oracle().unwrap();
//! ```
//!
//! Let's ask about the statistical dependence between whether something swims
//! and whether it is fast or has flippers. We expect that having flippers is more
//! indicative of whether something swims than being fast is, therefore we
//! expect the dependence between swims and flippers to be higher.
//!
//! ```rust
//! # use lace::prelude::*;
//! # use lace::examples::Example;
//! # let oracle = Example::Animals.oracle().unwrap();
//! let depprob_fast = oracle.depprob(
//!     "swims",
//!     "fast",
//! ).unwrap();
//!
//! let depprob_flippers = oracle.depprob(
//!     "swims",
//!     "flippers",
//! ).unwrap();
//!
//! assert!(depprob_flippers > depprob_fast);
//! ```
//!
//! We have the same expectation of mutual information. Mutual information
//! requires more input from the user: the type of mutual information to compute
//! (here `MiType::Iqr`) and the number of samples to draw when the mutual
//! information has to be estimated (here 1000).
//!
//! ```rust
//! # use lace::prelude::*;
//! # use lace::examples::Example;
//! # let oracle = Example::Animals.oracle().unwrap();
//! let mi_fast = oracle.mi(
//!     "swims",
//!     "fast",
//!     1000,
//!     MiType::Iqr,
//! ).unwrap();
//!
//! let mi_flippers = oracle.mi(
//!     "swims",
//!     "flippers",
//!     1000,
//!     MiType::Iqr,
//! ).unwrap();
//!
//! assert!(mi_flippers > mi_fast);
//! ```
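//!
//! To make the quantity itself concrete, the sketch below computes mutual
//! information and a normalized variant for two binary features directly from a
//! joint probability table. It is a standalone illustration, not Lace's
//! implementation: the function names `mutual_information`, `joint_entropy`,
//! and `information_quality_ratio` are hypothetical, and it assumes the
//! textbook definition of the information quality ratio (mutual information
//! divided by joint entropy). Whether `MiType::Iqr` is computed in exactly this
//! way is an assumption.
//!
//! ```rust
//! // Mutual information (in nats) of two binary variables, given the joint
//! // probability table `joint[x][y]`. Hypothetical helper, not part of Lace.
//! fn mutual_information(joint: &[[f64; 2]; 2]) -> f64 {
//!     // Marginal distributions: sum the table over the other variable.
//!     let px = [joint[0][0] + joint[0][1], joint[1][0] + joint[1][1]];
//!     let py = [joint[0][0] + joint[1][0], joint[0][1] + joint[1][1]];
//!     let mut mi = 0.0;
//!     for x in 0..2 {
//!         for y in 0..2 {
//!             let p = joint[x][y];
//!             // Cells with zero probability contribute nothing.
//!             if p > 0.0 {
//!                 mi += p * (p / (px[x] * py[y])).ln();
//!             }
//!         }
//!     }
//!     mi
//! }
//!
//! // Joint entropy (in nats) of the same table.
//! fn joint_entropy(joint: &[[f64; 2]; 2]) -> f64 {
//!     joint
//!         .iter()
//!         .flatten()
//!         .filter(|&&p| p > 0.0)
//!         .map(|&p| -p * p.ln())
//!         .sum()
//! }
//!
//! // Assumed definition: mutual information normalized by joint entropy,
//! // which lies between 0 and 1.
//! fn information_quality_ratio(joint: &[[f64; 2]; 2]) -> f64 {
//!     mutual_information(joint) / joint_entropy(joint)
//! }
//! ```
//!
//! For a table in which the two features are independent (for example, every
//! cell equal to 0.25) the mutual information is zero. For two perfectly coupled
//! features (0.5 on the diagonal, 0 elsewhere) it equals ln 2 nats and the
//! ratio is 1.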
//!
//! We can likewise ask about the similarity between rows -- in this case,
//! animals.
//!
//! ```rust
//! # use lace::prelude::*;
//! # use lace::examples::Example;
//! # let oracle = Example::Animals.oracle().unwrap();
//! let wrt: Option<&[usize]> = None;
//! let rowsim_wolf = oracle.rowsim(
//!     "wolf",
//!     "chihuahua",
//!     wrt,
//!     RowSimilarityVariant::ViewWeighted,
//! ).unwrap();
//!
//! let rowsim_rat = oracle.rowsim(
//!     "rat",
//!     "chihuahua",
//!     wrt,
//!     RowSimilarityVariant::ViewWeighted,
//! ).unwrap();
//!
//! assert!(rowsim_rat > rowsim_wolf);
//! ```
//!
//! We can also add context to similarity by computing it with respect to
//! specific columns, in this case `swims`.
//!
//! ```rust
//! # use lace::prelude::*;
//! # use lace::examples::Example;
//! # let oracle = Example::Animals.oracle().unwrap();
//! let context = vec!["swims"];
//! let rowsim_otter = oracle.rowsim(
//!     "beaver",
//!     "otter",
//!     Some(&context),
//!     RowSimilarityVariant::ViewWeighted,
//! ).unwrap();
//!
//! let rowsim_dolphin = oracle.rowsim(
//!     "beaver",
//!     "dolphin",
//!     Some(&context),
//!     RowSimilarityVariant::ViewWeighted,
//! ).unwrap();
//! ```
//!
//! # Feature flags
//! - `formats`: create `Engine`s and `Codebook`s from IPC, CSV, JSON, and
//!   Parquet data files
//! - `bencher`: build benchmarking utilities
//! - `ctrc_handler`: enables an update handler that captures Ctrl+C
//!
pub use EngineUpdateConfig;
pub use *;
pub use update_handler;
pub use AppendStrategy;
pub use BuildEngineError;
pub use ConditionalEntropyType;
pub use DatalessOracle;
pub use Engine;
pub use EngineBuilder;
pub use Given;
pub use HasData;
pub use HasStates;
pub use InsertDataActions;
pub use InsertMode;
pub use Metadata;
pub use MiComponents;
pub use MiType;
pub use Oracle;
pub use OracleT;
pub use OverwriteMode;
pub use Row;
pub use RowSimilarityVariant;
pub use SupportExtension;
pub use Value;
pub use WriteMode;
use Debug;
use Serialize;
;
pub use FType;
pub use StateDiagnostics;
pub use StateTransition;
pub use Category;
pub use Datum;
pub use SummaryStatistics;
pub use rv;