pub struct StochasticControlledGradientDescent { /* private fields */ }
Provides Stochastic Controlled Gradient Descent optimization based on two papers of Lei and Jordan:
- "On the adaptivity of stochastic gradient based optimization", arxiv 2019-2020 (SCSG-1)
- "Less than a single pass: stochastically controlled stochastic gradient", arxiv 2019 (SCSG-2)
Following the notations of the first paper, one iteration j consists of:
- a large batch of size Bⱼ
- a number mⱼ of small batches of size bⱼ
- a position update with step ηⱼ
The number of mini batches is described by a random variable with a geometric law.
The paper establishes rates of convergence depending on the ratios mⱼ/Bⱼ, bⱼ/mⱼ and ηⱼ/bⱼ and their products.
The second paper, "Less than a single pass: stochastically controlled stochastic gradient", describes a simplified version where each mini batch consists of just one term and the number of mini batches is set to the mean of the corresponding geometric variable.
We adopt a mix of the two papers:
- Letting the size of the mini batch grow a little seems more stable than keeping it at 1 (in particular when the initialization of the algorithm varies), but replacing the geometric law by its mean is clearly more stable, due to the large variance of that law.
We choose a fraction of the number of terms in the sum (large_batch_fraction_init) and alfa so that large_batch_fraction_init * alfa^(2*nbiter) = 1.
Then, if nbterms is the number of terms in the function to minimize and j is the iteration number:
- Bⱼ evolves as : large_batch_fraction_init * nbterms * alfa^(2j)
- mⱼ evolves as : m_zero * nbterms * alfa^(3j/2)
- bⱼ evolves as : b_0 * alfa^j
- ηⱼ evolves as : eta_0 / alfa^(j/2)
The evolution of Bⱼ is bounded above by nbterms/10 (this can be modified with Self::set_large_batch_max_fraction()) and that of bⱼ by nbterms/100.
The size of the small batches must stay small, so b₀ must be small (typically 1 seems OK).
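The four schedules above can be sketched as plain arithmetic. The function below is an illustration only, not the crate's internal code: its name, the rounding choices, and the hard caps (nbterms/10 for Bⱼ, nbterms/100 for bⱼ) are assumptions made to mirror the formulas as stated.

```rust
/// Illustrative sketch of the batch-size and step schedules described above.
/// alfa is chosen so that large_batch_fraction_init * alfa^(2*nbiter) = 1.
/// Returns, for each iteration j, the tuple (B_j, m_j, b_j, eta_j).
fn schedules(
    nbterms: usize,
    large_batch_fraction_init: f64,
    m_zero: f64,
    b_0: f64,
    eta_0: f64,
    nbiter: usize,
) -> Vec<(usize, usize, usize, f64)> {
    // solve large_batch_fraction_init * alfa^(2*nbiter) = 1 for alfa
    let alfa = (1.0 / large_batch_fraction_init).powf(1.0 / (2.0 * nbiter as f64));
    let big_batch_max = nbterms as f64 / 10.0; // default cap for B_j
    let small_batch_max = nbterms as f64 / 100.0; // cap for b_j
    (0..nbiter)
        .map(|j| {
            let jf = j as f64;
            // B_j = large_batch_fraction_init * nbterms * alfa^(2j), capped
            let big_b = (large_batch_fraction_init * nbterms as f64 * alfa.powf(2.0 * jf))
                .min(big_batch_max);
            // m_j = m_zero * nbterms * alfa^(3j/2)
            let m = m_zero * nbterms as f64 * alfa.powf(1.5 * jf);
            // b_j = b_0 * alfa^j, capped
            let b = (b_0 * alfa.powf(jf)).min(small_batch_max);
            // eta_j = eta_0 / alfa^(j/2)
            let eta = eta_0 / alfa.powf(jf / 2.0);
            (big_b.round() as usize, m.round() as usize, b.round() as usize, eta)
        })
        .collect()
}
```

For example, with nbterms = 10000, large_batch_fraction_init = 0.02 and nbiter = 150, the large batch starts at 200 terms and grows until it hits the nbterms/10 cap, while the step eta decreases from eta_0.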
Implementations§
impl StochasticControlledGradientDescent
pub fn new(
    eta_zero: f64,
    m_zero: f64,
    mini_batch_size_init: usize,
    large_batch_fraction_init: f64
) -> StochasticControlledGradientDescent
args are:
- eta_zero : initial step size along the gradient; 0.1 is a good default choice.
- m_zero : a good value is 0.2 * large_batch_fraction_init, so that mⱼ << Bⱼ.
- mini_batch_size_init : base value for the size of mini batches; 1 is a good default choice.
- large_batch_fraction_init : fraction of nbterms used to initialize the large batch size; a good default is between 0.01 and 0.02, so that the large batch size starts at 0.01 * nbterms to 0.02 * nbterms.
(see examples)
Examples found in repository
fn main() {
    init_log();
    // check paths for images and labels
    let image_path = PathBuf::from(String::from(IMAGE_FNAME_STR).clone());
    let image_file_res = OpenOptions::new().read(true).open(&image_path);
    if image_file_res.is_err() {
        println!("could not open image file : {:?}", IMAGE_FNAME_STR);
        return;
    }
    let label_path = PathBuf::from(LABEL_FNAME_STR);
    let label_file_res = OpenOptions::new().read(true).open(&label_path);
    if label_file_res.is_err() {
        println!("could not open label file : {:?}", LABEL_FNAME_STR);
        return;
    }
    //
    // load mnist data
    //
    let mnist_data =
        MnistData::new(String::from(IMAGE_FNAME_STR), String::from(LABEL_FNAME_STR)).unwrap();
    let images = mnist_data.get_images();
    let labels = mnist_data.get_labels();
    // nb_images is the length of the third component of the array dimension
    let (nb_row, nb_column, nb_images) = images.dim(); // get tuple from dim method
    assert_eq!(nb_images, labels.shape()[0]); // get slice from shape method...
    // transform into logistic regression
    let mut observations = Vec::<(Array1<f64>, usize)>::with_capacity(nb_images);
    //
    for k in 0..nb_images {
        let mut image = Array1::<f64>::zeros(1 + nb_row * nb_column);
        let mut index = 0;
        image[index] = 1.;
        index += 1;
        for i in 0..nb_row {
            for j in 0..nb_column {
                image[index] = images[[i, j, k]] as f64 / 256.;
                index += 1;
            }
        } // end of for i
        observations.push((image, labels[k] as usize));
    } // end of for k
    //
    let regr_l = LogisticRegression::new(10, observations);
    //
    // minimize
    //
    // step, m_0, b_0, B_0
    let scgd_pb = StochasticControlledGradientDescent::new(
        0.1,   // gradient step at beginning
        0.004, // base factor for number of mini batches
        1,     // base for size of mini batch
        0.02,  // base for large batch size
    );
    // allocate and zero an array with 9 rows (each row corresponds to a class, columns are pixel values)
    let mut initial_position = Array2::<f64>::zeros((9, 1 + nb_row * nb_column));
    // do a bad initialization; filling with 0 is much better!
    initial_position.fill(0.5);
    //
    let nb_iter = 150;
    let solution = scgd_pb.minimize(&regr_l, &initial_position, Some(nb_iter));
    println!(" solution with minimized value = {:2.4E}", solution.value);
    //
    // get image of coefficients to see corresponding images.
    //
    let image_fname = String::from("classe_scsg.img");
    for k in 0..9 {
        let mut k_image_fname: String = image_fname.clone();
        k_image_fname.push_str(&k.to_string());
        let image_path = PathBuf::from(k_image_fname.clone());
        let image_file_res = OpenOptions::new()
            .write(true)
            .create(true)
            .open(&image_path);
        if image_file_res.is_err() {
            println!("could not open image file : {:?}", k_image_fname);
            return;
        }
        //
        let mut out = io::BufWriter::new(image_file_res.unwrap());
        //
        // get a f64 slice to write
        let f64_array_to_write: &[f64] = solution.position.slice(s![k, ..]).to_slice().unwrap();
        let u8_slice = unsafe {
            std::slice::from_raw_parts(
                f64_array_to_write.as_ptr() as *const u8,
                std::mem::size_of::<f64>() * f64_array_to_write.len(),
            )
        };
        out.write_all(u8_slice).unwrap();
        out.flush().unwrap();
        // out.write(&solution.position.slice(s![k, ..])).unwrap();
    }
}
pub fn seed(&mut self, seed: [u8; 32])
Seeds the random number generator using the supplied seed.
This is useful to create reproducible results.
pub fn set_large_batch_max_fraction(&mut self, fraction: f64)
If a larger batch size is needed, the maximum large batch size will be set to nb_terms * fraction (the default for fraction is 0.1).
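The cap amounts to simple arithmetic. The helper below is purely illustrative (its name and truncation behavior are assumptions, not the crate's internals); it just shows what bound the fraction produces.

```rust
/// Illustrative computation of the large-batch cap: nb_terms * fraction.
/// The real implementation may round differently.
fn large_batch_max(nb_terms: usize, fraction: f64) -> usize {
    (nb_terms as f64 * fraction) as usize
}
```

With the default fraction of 0.1 and 10000 terms, the large batch size is thus capped at 1000 terms.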