//! Implementations of various optimization algorithms and penalty regularizations.
//!
//! Some of the most commonly used methods are already supported, and the interface is general
//! enough that more sophisticated ones can be easily integrated in the future. The complete
//! list can be found [here](#algorithms).
//!
//! An optimizer holds a state, in the form of a *representation*, for each of the parameters to
//! optimize, and it updates them according to both their gradient and the state itself.
//!
//! # Using an optimizer
//!
//! The first step in using any optimizer is to construct it.
//!
//! ## Constructing it
//!
//! To construct an optimizer you have to pass it a vector of [`Param`](struct@Param) referring to
//! the parameters you wish to optimize. Depending on the kind of optimizer, you may also need to
//! pass several optimizer-specific settings, such as the learning rate, the momentum, etc.
//!
//! The optimization algorithms provided by neuronika are designed to work both with variables and
//! neural networks.
//!
//! ```
//! # use neuronika::Param;
//! # use neuronika::nn::{ModelStatus, Linear, Learnable};
//! # struct NeuralNetwork {
//! #     lin1: Linear,
//! #     lin2: Linear,
//! #     lin3: Linear,
//! #     status: ModelStatus,
//! # }
//! # impl NeuralNetwork {
//! #     // Basic constructor.
//! #     fn new() -> Self {
//! #         let mut status = ModelStatus::default();
//! #
//! #         Self {
//! #             lin1: status.register(Linear::new(25, 30)),
//! #             lin2: status.register(Linear::new(30, 35)),
//! #             lin3: status.register(Linear::new(35, 5)),
//! #             status,
//! #         }
//! #     }
//! #
//! #     fn parameters(&self) -> Vec<Param> {
//! #         self.status.parameters()
//! #     }
//! # }
//! use neuronika;
//! use neuronika::optim::{SGD, Adam, L1, L2};
//!
//! let p = neuronika::rand(5).requires_grad();
//! let q = neuronika::rand(5).requires_grad();
//! let x = neuronika::rand(5);
//!
//! let y = p * x + q;
//! let optim = SGD::new(y.parameters(), 0.01, L1::new(0.05));
//!
//! let model = NeuralNetwork::new();
//! let model_optim = Adam::new(model.parameters(), 0.01, (0.9, 0.999), L2::new(0.01), 1e-8);
//! ```
//!
//! ## Taking an optimization step
//!
//! All of neuronika's optimizers implement a [`.step()`](Optimizer::step()) method that updates
//! the parameters.
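//!
//! For reference, the update rule that a single plain SGD step applies to each parameter can be
//! sketched independently of neuronika's types (the `lr`, `data`, and `grad` names below are
//! illustrative, not neuronika's API):
//!
//! ```
//! let lr = 0.5_f32;
//! let mut data = vec![1.0_f32, -2.0];
//! let grad = vec![0.5_f32, 0.5];
//!
//! // data <- data - lr * grad
//! for (d, g) in data.iter_mut().zip(&grad) {
//!     *d -= lr * g;
//! }
//!
//! assert_eq!(data, vec![0.75, -2.25]);
//! ```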
//!
//! # Implementing an optimizer
//!
//! Implementing an optimizer in neuronika is quick and simple. The procedure consists of three
//! steps:
//!
//! 1. Define its parameter representation struct and specify how to build it from
//! [`Param`](crate::Param).
//!
//! 2. Define its struct.
//!
//! 3. Implement the [`Optimizer`] trait.
//!
//! Let's go through them by implementing the classic version of the stochastic gradient descent.
//!
//! First, we define the SGD parameter representation struct and the conversion from `Param`.
//!
//! ```
//! use neuronika::Param;
//! use ndarray::{ArrayD, ArrayViewMutD};
//!
//! struct SGDParam<'a> {
//!     data: ArrayViewMutD<'a, f32>,
//!     grad: ArrayViewMutD<'a, f32>,
//! }
//!
//! impl<'a> From<Param<'a>> for SGDParam<'a> {
//!     fn from(param: Param<'a>) -> Self {
//!         let Param { data, grad } = param;
//!         Self { data, grad }
//!     }
//! }
//! ```
//!
//! Since SGD is a basic optimizer, the `SGDParam` struct only contains the data and gradient
//! views for each of the parameters to optimize.
//!
//! Nevertheless, do note that an optimizer's parameter representation acts as a container for the
//! additional information, such as adaptive learning rates and moments of any kind, that may be
//! needed for the learning steps of more complex algorithms.
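//!
//! For instance, a hypothetical parameter representation for SGD with momentum could also carry
//! a velocity buffer that persists between steps. The sketch below uses `Vec` in place of
//! neuronika's array views, and all the names are illustrative:
//!
//! ```
//! /// Hypothetical parameter representation carrying extra per-parameter state.
//! struct MomentumParam {
//!     data: Vec<f32>,
//!     grad: Vec<f32>,
//!     velocity: Vec<f32>, // state kept between optimization steps
//! }
//!
//! impl MomentumParam {
//!     fn step(&mut self, lr: f32, momentum: f32) {
//!         for ((d, g), v) in self
//!             .data
//!             .iter_mut()
//!             .zip(&self.grad)
//!             .zip(self.velocity.iter_mut())
//!         {
//!             *v = momentum * *v + *g; // fold the new gradient into the velocity
//!             *d -= lr * *v;           // move the parameter along the velocity
//!         }
//!     }
//! }
//!
//! let mut param = MomentumParam {
//!     data: vec![1.0],
//!     grad: vec![0.5],
//!     velocity: vec![0.0],
//! };
//! param.step(0.5, 0.9);
//!
//! assert_eq!(param.data, vec![0.75]);
//! assert_eq!(param.velocity, vec![0.5]);
//! ```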
//!
//! Then, we define the SGD's struct.
//!
//! ```
//! use neuronika::Param;
//! use neuronika::optim::Penalty;
//! use std::cell::{Cell, RefCell};
//!
//! # use ndarray::{ArrayD, ArrayViewMutD};
//! # struct SGDParam<'a> {
//! #     data: ArrayViewMutD<'a, f32>,
//! #     grad: ArrayViewMutD<'a, f32>,
//! # }
//! struct SGD<'a, T> {
//!     params: RefCell<Vec<SGDParam<'a>>>,
//!     lr: Cell<f32>,
//!     penalty: T,
//! }
//! ```
//!
//! Lastly, we implement [`Optimizer`] for `SGD`.
//!
//! ```
//! use ndarray::Zip;
//! use neuronika::optim::Optimizer;
//! use rayon::iter::{IntoParallelRefMutIterator, ParallelIterator};
//! # use neuronika::Param;
//! # use neuronika::optim::Penalty;
//! # use ndarray::{ArrayD, ArrayViewMutD};
//! # use std::cell::{Cell, RefCell};
//! # struct SGD<'a, T> {
//! #     params: RefCell<Vec<SGDParam<'a>>>,
//! #     lr: Cell<f32>,
//! #     penalty: T,
//! # }
//! # struct SGDParam<'a> {
//! #     data: ArrayViewMutD<'a, f32>,
//! #     grad: ArrayViewMutD<'a, f32>,
//! # }
//! # impl<'a> From<Param<'a>> for SGDParam<'a> {
//! #     fn from(param: Param<'a>) -> Self {
//! #         let Param { data, grad } = param;
//! #         Self { data, grad }
//! #     }
//! # }
//!
//! impl<'a, T: Penalty> Optimizer<'a> for SGD<'a, T> {
//!     type ParamRepr = SGDParam<'a>;
//!
//!     fn step(&self) {
//!         let (lr, penalty) = (self.lr.get(), &self.penalty);
//!
//!         self.params.borrow_mut().par_iter_mut().for_each(|param| {
//!             let (data, grad) = (&mut param.data, &param.grad);
//!
//!             Zip::from(data).and(grad).for_each(|data_el, grad_el| {
//!                 *data_el += -(grad_el + penalty.penalize(data_el)) * lr
//!             });
//!         });
//!     }
//!
//!     fn zero_grad(&self) {
//!         self.params.borrow_mut().par_iter_mut().for_each(|param| {
//!             let grad = &mut param.grad;
//!             Zip::from(grad).for_each(|grad_el| *grad_el = 0.);
//!         });
//!     }
//!
//!     fn get_lr(&self) -> f32 {
//!         self.lr.get()
//!     }
//!
//!     fn set_lr(&self, lr: f32) {
//!         self.lr.set(lr)
//!     }
//! }
//!
//! impl<'a, T: Penalty> SGD<'a, T> {
//!     /// Simple constructor.
//!     pub fn new(parameters: Vec<Param<'a>>, lr: f32, penalty: T) -> Self {
//!         Self {
//!             params: RefCell::new(Self::build_params(parameters)),
//!             lr: Cell::new(lr),
//!             penalty,
//!         }
//!     }
//! }
//! ```
//!
//! # Adjusting the learning rate
//!
//! The [`lr_scheduler`] module provides several methods to adjust the learning rate based on the
//! number of epochs.
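//!
//! As an illustration, an exponential decay schedule computes the learning rate at each epoch as
//! `lr_t = lr_0 * gamma^t`. The sketch below is only illustrative; the actual scheduler types
//! live in [`lr_scheduler`]:
//!
//! ```
//! let (lr0, gamma) = (1.0_f32, 0.5_f32);
//!
//! // Learning rate after `epoch` decay steps.
//! let lr_at = |epoch: i32| lr0 * gamma.powi(epoch);
//!
//! assert_eq!(lr_at(0), 1.0);
//! assert_eq!(lr_at(2), 0.25);
//! ```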
//!
//! # Algorithms
//!
//! List of all implemented optimizers.
//!
//! * [`Adagrad`] - Implements the Adagrad algorithm.
//!
//! * [`Adam`] - Implements the Adam algorithm.
//!
//! * [`AMSGrad`] - Implements the AMSGrad algorithm.
//!
//! * [`RMSProp`] - Implements the RMSProp algorithm.
//!
//! * [`SGD`] - Implements the stochastic gradient descent algorithm.
use crate::Param;
pub use ;
pub use ;
pub use ;
pub use ;
pub use ;
// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Optimizer Trait ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/// Optimizer trait, defines the optimizer's logic.
// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Penalty Trait ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/// Penalty trait, defines the penalty regularization's logic.
/// L2 penalty, also known as *weight decay* or *Tikhonov regularization*.
/// L1 penalty.
/// ElasticNet regularization, linearly combines the *L1* and *L2* penalties.
// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Penalty Trait Implementations ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Optimizers ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Learning Rate Schedulers ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~