Learn the Rust programming language by implementing classic machine learning algorithms. This project is built from scratch without relying on any third-party libraries, serving as a bootstrap machine learning library.
❗❗❗ Actively seeking code reviews: suggestions on fixing bugs or refactoring the code are very welcome. Please feel free to share your ideas!
## Basics
- `NdArray` module: as the name suggests, it implements `broadcast`, matrix operations, `permute`, etc., over arbitrary dimensions. SIMD is used in matrix multiplication, thanks to Rust's auto-vectorization.
- `Dataset` module: supports customized data loading, re-formatting, `normalize`, `shuffle`, and `Dataloader`. Several popular dataset pre-processing recipes are available.
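To make the broadcast idea concrete, here is a minimal concept sketch using plain `Vec`s rather than the library's `NdArray` type: a 1-D row is virtually repeated along the first axis so it can combine element-wise with a 2-D matrix.

```rust
// Concept sketch only (not the NdArray API): broadcasting a 1-D row
// over every row of a 2-D matrix during an element-wise addition.
fn broadcast_add(matrix: &[Vec<f64>], row: &[f64]) -> Vec<Vec<f64>> {
    matrix
        .iter()
        .map(|r| r.iter().zip(row).map(|(a, b)| a + b).collect())
        .collect()
}

fn main() {
    let m = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    // the row [10, 20] is added to each row of m
    println!("{:?}", broadcast_add(&m, &[10.0, 20.0]));
}
```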
## Algorithms
- Decision Tree: supports both classification and regression tasks. Information gain criteria such as `gini` and `entropy` are provided.
- Logistic Regression: supports regularization (`Lasso`, `Ridge`, and `L-inf`).
- Linear Regression: same as logistic regression, but for regression tasks.
- Naive Bayes: handles both discrete and continuous feature values.
- SVM: linear kernel, optimized with SGD and hinge loss.
- `nn` module: contains `linear` (MLP) layers and `activation` functions, which can be freely stacked and optimized by gradient back-propagation.
- KNN: supports both `KdTree` and vanilla `BruteForceSearch`.
- K-Means: clusters data with an unsupervised learning approach.
## Start
Let's use the KNN algorithm to solve a classification task. More examples can be found in the `examples` directory.
- Create some synthetic data for the test (the literal values below are illustrative placeholders):

  ```rust
  use std::collections::HashMap;

  // placeholder sample points: three in one cluster, two in another
  let features = vec![
      vec![0.6, 0.7],
      vec![0.7, 0.8],
      vec![0.8, 0.9],
      vec![0.1, 0.2],
      vec![0.2, 0.1],
  ];
  let labels = vec![0, 0, 0, 1, 1];
  // so it is a binary classification task, 0 is for the large label, 1 is for the small label
  let mut label_map = HashMap::new();
  label_map.insert(0, "large");
  label_map.insert(1, "small");
  ```
- Convert the data to a `Dataset`:

  ```rust
  // hypothetical module path and constructor signature;
  // check the examples directory for the actual API
  use crate::dataset::Dataset;

  let dataset = Dataset::new(features, labels, Some(label_map));
  ```
- Split the dataset into `train` and `valid` sets and normalize them with Standard normalization:

  ```rust
  // hypothetical module path and enum name
  use crate::dataset::utils::{normalize_dataset, ScalerType};

  // [2.0, 1.0] is the split fraction, 0 is the seed
  let mut temp = dataset.split_dataset(vec![2.0, 1.0], 0);
  let (mut train_dataset, mut valid_dataset) = (temp.remove(0), temp.remove(0));
  normalize_dataset(&mut train_dataset, ScalerType::Standard);
  normalize_dataset(&mut valid_dataset, ScalerType::Standard);
  ```
- Build and train our KNN model using `KdTree`:

  ```rust
  // hypothetical type and variant names
  use crate::model::knn::{KNN, KNNAlg, KNNWeighting};

  // KdTree is one implementation of KNN; 1 defines the k of neighbours;
  // Weighting decides the way of ensemble prediction;
  // train_dataset is for training KNN; Some(2) is the param of the Minkowski distance
  let model = KNN::new(KNNAlg::KdTree, 1, KNNWeighting::Distance, train_dataset, Some(2));
  ```
- Evaluate the model:

  ```rust
  // hypothetical helper path and signature
  use crate::utils::evaluate;

  let (correct, acc) = evaluate(&valid_dataset, &model);
  println!("correct = {correct}, accuracy = {acc:.3}");
  ```
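The `Some(2)` passed to the model above is the `p` parameter of the Minkowski distance that KNN uses to find neighbours. A minimal sketch of that metric (not the library's implementation):

```rust
// Minkowski distance: (sum |x_i - y_i|^p)^(1/p).
// p = 2 gives the Euclidean distance, p = 1 the Manhattan distance.
fn minkowski(a: &[f64], b: &[f64], p: i32) -> f64 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).abs().powi(p))
        .sum::<f64>()
        .powf(1.0 / p as f64)
}

fn main() {
    // classic 3-4-5 right triangle under the Euclidean metric
    println!("{}", minkowski(&[0.0, 0.0], &[3.0, 4.0], 2));
}
```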
## Todo

- Model weight serialization for saving and loading
- Boosting/bagging
- Multi-threaded matrix multiplication
- Code refactoring; comments from senior developers are sincerely requested
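For the multi-threaded matrix multiplication item, one possible std-only approach (a sketch, not a committed design) is to split the output rows across scoped threads, so each thread owns a disjoint mutable chunk and no locking is needed:

```rust
// Sketch: parallel matmul over Vec-of-Vec matrices using std::thread::scope.
// Assumes a is n x k, b is k x m, and b is non-empty.
fn matmul_parallel(a: &[Vec<f64>], b: &[Vec<f64>], n_threads: usize) -> Vec<Vec<f64>> {
    let (n, k, m) = (a.len(), b.len(), b[0].len());
    let mut out = vec![vec![0.0; m]; n];
    let rows_per = (n + n_threads - 1) / n_threads; // ceil(n / n_threads)
    std::thread::scope(|s| {
        // chunks_mut hands each thread a disjoint mutable slice of output rows
        for (chunk_idx, chunk) in out.chunks_mut(rows_per).enumerate() {
            let base = chunk_idx * rows_per;
            s.spawn(move || {
                for (i, row) in chunk.iter_mut().enumerate() {
                    // i-k-j loop order keeps the inner loop cache-friendly
                    for kk in 0..k {
                        let a_ik = a[base + i][kk];
                        for j in 0..m {
                            row[j] += a_ik * b[kk][j];
                        }
                    }
                }
            });
        }
    });
    out
}

fn main() {
    let a = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    let b = vec![vec![5.0, 6.0], vec![7.0, 8.0]];
    println!("{:?}", matmul_parallel(&a, &b, 2));
}
```

Scoped threads (stable since Rust 1.63) let the closure borrow `a`, `b`, and the output chunks without `Arc` or cloning, since the scope guarantees all threads join before the borrows end.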
## Reference

- scikit-learn
- *Machine Learning* (the "watermelon book", 机器学习西瓜书) by Prof. Zhihua Zhou
## Thanks

The Rust community. I received a lot of help from the rust-lang Discord.
## License

Under the GPL-v3 license. Commercial use is strictly prohibited.