Rafor is a performance-oriented Random Forest and Decision Trees library.
Classification
The decision tree classifier (rafor::dt::Classifier) and the random forest classifier
(rafor::rf::Classifier) expect labels of type i64. By default, classifiers use the Gini index to
evaluate split impurity.
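For readers unfamiliar with the Gini index, here is a minimal, standard-library-only sketch of how the impurity of a candidate split can be scored. It is not rafor's implementation; the function names gini_impurity and split_impurity are made up for illustration.

```rust
use std::collections::HashMap;

/// Gini impurity of a set of i64 labels: 1 - sum over classes of p_c^2.
/// Hypothetical helper, not part of rafor.
fn gini_impurity(labels: &[i64]) -> f64 {
    if labels.is_empty() {
        return 0.0;
    }
    // Count how many times each class occurs.
    let mut counts: HashMap<i64, usize> = HashMap::new();
    for &label in labels {
        *counts.entry(label).or_insert(0) += 1;
    }
    let n = labels.len() as f64;
    let sum_of_squares: f64 = counts
        .values()
        .map(|&count| {
            let p = count as f64 / n;
            p * p
        })
        .sum();
    1.0 - sum_of_squares
}

/// Size-weighted Gini impurity of a split into a left and a right part.
/// Lower is better; 0.0 means both parts are pure.
fn split_impurity(left: &[i64], right: &[i64]) -> f64 {
    let total = (left.len() + right.len()) as f64;
    if total == 0.0 {
        return 0.0;
    }
    (left.len() as f64 * gini_impurity(left) + right.len() as f64 * gini_impurity(right)) / total
}
```

A split that separates the classes perfectly scores 0.0, while a split that leaves both sides evenly mixed between two classes scores 0.5.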
Classifiers provide a predict method for predicting a batch of samples; it returns a Vec<i64> of
predicted class labels. The predict_one method returns an i64, the predicted class for a single
sample. To get the probability distribution, use the proba method, which returns a Vec<f32> of
length num_samples * num_classes, where each chunk of num_classes elements contains the class
probabilities for one sample. Internally, the i64 class labels are mapped to the numbers
0, 1, ... of type u32. To decode classes, Classifier provides the get_decode_table method, which
returns &[i64], a table in which the index is the internal representation and the value is the
i64 class. There is also a decode method, which takes a u32 internal label and returns the i64 value.
use num_cpus; // Requires num_cpus dependency in Cargo.toml
use *; // Required for .with_option builders and .num_classes().
use Classifier;
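The label mapping and the flat probability layout described above can be illustrated without the library. The sketch below uses only the standard library; build_decode_table, encode, decode and argmax_classes are hypothetical free functions, and numbering the classes in sorted order is an assumption (rafor may assign internal indices differently).

```rust
/// Builds a decode table: index = internal u32 label, value = original i64 class.
/// Assumption: classes are numbered in ascending order of their i64 value.
fn build_decode_table(labels: &[i64]) -> Vec<i64> {
    let mut table = labels.to_vec();
    table.sort_unstable();
    table.dedup();
    table
}

/// Maps an original i64 label to its internal u32 index, if the class is known.
fn encode(table: &[i64], label: i64) -> Option<u32> {
    table.binary_search(&label).ok().map(|index| index as u32)
}

/// Maps an internal u32 index back to the original i64 class.
fn decode(table: &[i64], internal: u32) -> i64 {
    table[internal as usize]
}

/// Reads a flat probability buffer of length num_samples * num_classes and
/// returns the most probable original class for every sample.
fn argmax_classes(proba: &[f32], table: &[i64]) -> Vec<i64> {
    let num_classes = table.len();
    assert!(num_classes > 0 && proba.len() % num_classes == 0);
    proba
        .chunks_exact(num_classes) // one chunk per sample
        .map(|chunk| {
            let mut best = 0usize;
            for (index, &p) in chunk.iter().enumerate() {
                if p > chunk[best] {
                    best = index;
                }
            }
            decode(table, best as u32)
        })
        .collect()
}
```

With labels [1, 5, 1] the table is [1, 5], so internal label 0 decodes to class 1 and internal label 1 decodes to class 5. A probability buffer for three samples then has six elements, two per sample.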
Regression
The decision tree regressor (rafor::dt::Regressor) and the random forest regressor
(rafor::rf::Regressor) expect targets of type f32. By default, regressors use the MSE score to
evaluate split impurity.
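As a rough illustration of the MSE score, the sketch below computes the mean squared error of a group of targets around their mean and combines two groups into a size-weighted split score. It uses only the standard library and is not rafor's implementation; mse and split_mse are hypothetical names.

```rust
/// Mean squared error of the targets around their mean
/// (equivalently, the variance of the targets in a node).
/// Hypothetical helper, not part of rafor.
fn mse(targets: &[f32]) -> f32 {
    if targets.is_empty() {
        return 0.0;
    }
    let n = targets.len() as f32;
    let mean = targets.iter().sum::<f32>() / n;
    targets.iter().map(|&t| (t - mean) * (t - mean)).sum::<f32>() / n
}

/// Size-weighted MSE of a split into a left and a right part.
/// Lower is better; 0.0 means each part has constant targets.
fn split_mse(left: &[f32], right: &[f32]) -> f32 {
    let total = (left.len() + right.len()) as f32;
    if total == 0.0 {
        return 0.0;
    }
    (left.len() as f32 * mse(left) + right.len() as f32 * mse(right)) / total
}
```

A split that groups equal targets together scores 0.0, so a tree builder comparing candidate splits would prefer it over one that leaves different targets mixed.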
The Regressor interface is mostly similar to that of Classifier; see the examples folder.
Dataset
The dataset is a single f32 slice that is processed in chunks of num_features elements;
each chunk is a single sample. During training, num_features is defined as
dataset.len() / targets.len().
Training will panic if dataset.len() is not divisible by targets.len().
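The layout rule above can be sketched with the standard library alone. The helpers num_features and sample are hypothetical and only mirror the description; rejecting an empty targets slice is an extra assumption made here to avoid a division by zero.

```rust
/// Derives num_features as dataset_len / targets_len and panics when the
/// lengths are inconsistent, mirroring the behavior described in the text.
/// Hypothetical helper, not part of rafor.
fn num_features(dataset_len: usize, targets_len: usize) -> usize {
    // Assumption: an empty targets slice is treated as an error.
    assert!(targets_len > 0, "targets must not be empty");
    assert!(
        dataset_len % targets_len == 0,
        "dataset.len() must be divisible by targets.len()"
    );
    dataset_len / targets_len
}

/// Returns the sample at `index` as a slice of num_features values.
fn sample(dataset: &[f32], num_features: usize, index: usize) -> &[f32] {
    &dataset[index * num_features..(index + 1) * num_features]
}
```

For a dataset of six values and three targets, num_features is 2, and dataset.chunks_exact(2) walks over the three samples in order.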
Model serialization and deserialization
All models support serde, so any library that supports serde
can be used for serialization and deserialization.
Below is an example of using bincode.
use File;
use Classifier;
Issues
Please keep in mind that this is a new library and may contain bugs. I'll do my best to test it, but caution is still advised.
License
Licensed under either of Apache License, Version 2.0 or MIT license at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in rafor by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.