pub struct StochasticControlledGradientDescent { /* private fields */ }

Provides Stochastic Controlled Gradient Descent (SCSG) optimization, based on two papers by Lei and Jordan:

  • "On the adaptativity of stochastic gradient based optimisation" arxiv 2019,2020 SCSG-1
  • "Less than a single pass : stochastically controlled stochastic gradient" arxiv 2019 SCSG-2

Following the first paper, we use the notation below.

One iteration j consists of:

  • a large batch of size Bⱼ
  • a number mⱼ of mini batches, each of size bⱼ
  • a position update with step ηⱼ; the number of mini batches is a random variable with a geometric distribution

The paper establishes convergence rates that depend on the ratios mⱼ/Bⱼ, bⱼ/mⱼ, ηⱼ/bⱼ and their products.

The second paper,
“Less than a Single Pass: Stochastically Controlled Stochastic Gradient”,
describes a simplified version in which each mini batch consists of a single term and the number of mini batches is set to the mean of the corresponding geometric variable.

We adopt a mix of the two papers:

  • Letting the size of the mini batches grow a little seems more stable than keeping it at 1 (in particular when the initialization of the algorithm varies), but replacing the geometric law by its mean is markedly more stable, given the large variance of that law.

  • We choose a fraction of the number of terms in the sum (large_batch_fraction_init) and a growth factor alfa such that large_batch_fraction_init * alfa^(2*nbiter) = 1, i.e. alfa = (1/large_batch_fraction_init)^(1/(2*nbiter)).

Then, if nbterms is the number of terms in the function to minimize and j is the iteration number:

  • Bⱼ evolves as : large_batch_fraction_init * nbterms * alfa^(2j)
  • mⱼ evolves as : m_zero * nbterms * alfa^(3j/2)
  • bⱼ evolves as : b_0 * alfa^j
  • ηⱼ evolves as : eta_0 / alfa^(j/2)

The evolution of Bⱼ is bounded above by nbterms/10 (this bound can be changed with Self::set_large_batch_max_fraction()), and bⱼ is bounded above by nbterms/100.
The size of the mini batches must stay small, so b₀ must be small (a value of 1 is typically fine). The sketch below illustrates the resulting schedule.
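
To make the schedule concrete, here is a standalone sketch (illustrative only; print_schedule is not part of the crate's API) that computes the four quantities per iteration, deriving alfa from the constraint large_batch_fraction_init * alfa^(2*nbiter) = 1:

fn print_schedule(
    nb_terms: usize,
    nb_iter: usize,
    eta_0: f64,
    m_0: f64,
    b_0: usize,
    large_batch_fraction_init: f64,
) {
    // alfa is chosen so that large_batch_fraction_init * alfa^(2 * nb_iter) = 1
    let alfa = (1.0 / large_batch_fraction_init).powf(1.0 / (2.0 * nb_iter as f64));
    let n = nb_terms as f64;
    for j in 0..nb_iter {
        let jf = j as f64;
        // B_j : large batch size, bounded above by nb_terms/10 by default
        let big_b = (large_batch_fraction_init * n * alfa.powf(2.0 * jf)).min(n / 10.0);
        // m_j : (mean) number of mini batches
        let m_j = m_0 * n * alfa.powf(1.5 * jf);
        // b_j : mini-batch size, bounded above by nb_terms/100
        let b_j = (b_0 as f64 * alfa.powf(jf)).min(n / 100.0);
        // eta_j : gradient step
        let eta_j = eta_0 / alfa.powf(0.5 * jf);
        println!(
            "j = {:3}  B_j = {:9.1}  m_j = {:9.1}  b_j = {:7.1}  eta_j = {:.4}",
            j, big_b, m_j, b_j, eta_j
        );
    }
}

For instance, print_schedule(60_000, 150, 0.1, 0.004, 1, 0.02) would mirror the parameters used in the MNIST example below.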

Implementations

impl StochasticControlledGradientDescent

pub fn new(eta_zero: f64, m_zero: f64, mini_batch_size_init: usize, large_batch_fraction_init: f64) -> StochasticControlledGradientDescent

Arguments:

  • eta_zero : initial step size along the gradient; a value of 0.1 is a good default choice.
  • m_zero : a good value is 0.2 * large_batch_fraction_init, so that mⱼ << Bⱼ.
  • mini_batch_size_init : base value for the size of mini batches; a value of 1 is a good default choice.
  • large_batch_fraction_init : fraction of nbterms used to initialize the large batch size; a good default is between 0.01 and 0.02, so that the large batch size starts at 0.01 * nbterms to 0.02 * nbterms.
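
As a minimal illustration, constructing the optimizer with these suggested defaults might look like this (the values are just the starting points recommended above):

let scgd = StochasticControlledGradientDescent::new(
    0.1,   // eta_zero : initial gradient step
    0.004, // m_zero = 0.2 * large_batch_fraction_init
    1,     // mini_batch_size_init
    0.02,  // large_batch_fraction_init
);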

(see examples)

Examples found in repository
examples/mnist_logistic_scsg.rs (lines 80-85)
fn main() {
    init_log();
    // check for path image and labels
    let image_path = PathBuf::from(IMAGE_FNAME_STR);
    let image_file_res = OpenOptions::new().read(true).open(&image_path);
    if image_file_res.is_err() {
        println!("could not open image file : {:?}", IMAGE_FNAME_STR);
        return;
    }
    let label_path = PathBuf::from(LABEL_FNAME_STR);
    let label_file_res = OpenOptions::new().read(true).open(&label_path);
    if label_file_res.is_err() {
        println!("could not open label file : {:?}", LABEL_FNAME_STR);
        return;
    }
    //
    // load mnist data
    //
    let mnist_data =
        MnistData::new(String::from(IMAGE_FNAME_STR), String::from(LABEL_FNAME_STR)).unwrap();
    let images = mnist_data.get_images();
    let labels = mnist_data.get_labels();
    // nb_images is the length of the third component of the array dimension
    let (nb_row, nb_column, nb_images) = images.dim(); // get tuple from dim method
    assert_eq!(nb_images, labels.shape()[0]); // get slice from shape method
    // transform into logistic regression observations
    let mut observations = Vec::<(Array1<f64>, usize)>::with_capacity(nb_images);
    //
    for k in 0..nb_images {
        let mut image = Array1::<f64>::zeros(1 + nb_row * nb_column);
        let mut index = 0;
        image[index] = 1.;
        index += 1;
        for i in 0..nb_row {
            for j in 0..nb_column {
                image[index] = images[[i, j, k]] as f64 / 256.;
                index += 1;
            }
        } // end of for i
        observations.push((image, labels[k] as usize));
    } // end of for k
    //
    let regr_l = LogisticRegression::new(10, observations);
    //
    // minimize
    //
    // args: eta_zero, m_zero, mini_batch_size_init, large_batch_fraction_init
    let scgd_pb = StochasticControlledGradientDescent::new(
        0.1,   // initial gradient step
        0.004, // base factor for the number of mini batches
        1,     // base size of mini batches
        0.02,  // base fraction for the large batch size
    );
    // allocate a zeroed array with 9 rows (each row corresponds to a class, columns are pixel values)
    let mut initial_position = Array2::<f64>::zeros((9, 1 + nb_row * nb_column));
    // deliberately bad initialization; filling with 0 works much better!
    initial_position.fill(0.5);
    //
    let nb_iter = 150;
    let solution = scgd_pb.minimize(&regr_l, &initial_position, Some(nb_iter));
    println!(" solution with minimized value = {:2.4E}", solution.value);
    //
    // get image of coefficients to see corresponding images.
    //
    let image_fname = String::from("classe_scsg.img");
    for k in 0..9 {
        let mut k_image_fname: String = image_fname.clone();
        k_image_fname.push_str(&k.to_string());
        let image_path = PathBuf::from(k_image_fname.clone());
        let image_file_res = OpenOptions::new()
            .write(true)
            .create(true)
            .open(&image_path);
        if image_file_res.is_err() {
            println!("could not open image file : {:?}", k_image_fname);
            return;
        }
        //
        let mut out = io::BufWriter::new(image_file_res.unwrap());
        //
        // get a f64 slice to write
        let f64_array_to_write: &[f64] = solution.position.slice(s![k, ..]).to_slice().unwrap();
        let u8_slice = unsafe {
            std::slice::from_raw_parts(
                f64_array_to_write.as_ptr() as *const u8,
                std::mem::size_of::<f64>() * f64_array_to_write.len(),
            )
        };
        out.write_all(u8_slice).unwrap();
        out.flush().unwrap();
        //   out.write(&solution.position.slice(s![k, ..])).unwrap();
    }
}

pub fn seed(&mut self, seed: [u8; 32])

Seeds the random number generator with the supplied seed. This is useful to create reproducible results.
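
For example (a minimal sketch; the seed bytes are arbitrary, any fixed [u8; 32] works):

let mut scgd = StochasticControlledGradientDescent::new(0.1, 0.004, 1, 0.02);
scgd.seed([42u8; 32]); // the same seed yields the same batch sampling on every run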


pub fn set_large_batch_max_fraction(&mut self, fraction: f64)

If a larger batch size is needed, the maximum large batch size is set to nb_terms * fraction (the default fraction is 0.1).
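
For example, to let the large batch grow up to 20% of the number of terms instead of the default 10%:

let mut scgd = StochasticControlledGradientDescent::new(0.1, 0.004, 1, 0.02);
scgd.set_large_batch_max_fraction(0.2);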

Trait Implementations

impl Default for StochasticControlledGradientDescent

fn default() -> Self

Returns the “default value” for a type.

impl<D: Dimension, F: SummationC1<D>> Minimizer<D, F, usize> for StochasticControlledGradientDescent


type Solution = Solution<D>

Type of the solution the Minimizer returns.

fn minimize(&self, function: &F, initial_position: &Array<f64, D>, max_iterations: Option<usize>) -> Solution<D>

Performs the actual minimization and returns a solution. The MinimizerArg (usize here) should provide a number of iterations, a minimum error, or anything else the implemented algorithm needs.
