Crate linfa_svm
Support Vector Machines
Support Vector Machines are one major branch of machine learning models and offer classification or regression analysis of labeled datasets. They seek a discriminant that separates the data in an optimal way, e.g. one that has the fewest misclassifications and maximizes the margin between the positive and negative classes. A support vector contributes to the discriminant and is therefore important for the classification/regression task. The balance between the number of support vectors and model performance can be controlled with hyperparameters.
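The decision rule such a discriminant implements can be sketched as follows; the weights and bias below are made-up illustrative values, not output of this crate:

```rust
/// Linear decision function f(x) = w·x + b; its sign gives the class.
fn decision(w: &[f64], b: f64, x: &[f64]) -> f64 {
    w.iter().zip(x).map(|(wi, xi)| wi * xi).sum::<f64>() + b
}

fn main() {
    // Hypothetical discriminant over two features.
    let w = [1.0, -2.0];
    let b = 0.5;
    let x = [3.0, 1.0];
    let score = decision(&w, b, &x);
    // A positive score assigns the positive class, a negative score the negative one.
    println!("score = {score}, positive class: {}", score > 0.0);
}
```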
More details can be found here
Available parameters in Classification and Regression
For supervised classification tasks the C or Nu values are used to control this balance. In fit_c the C value controls the penalty given to misclassification and should lie in the interval (0, inf). In fit_nu the Nu value controls the number of support vectors and should lie in the interval (0, 1].
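The role of C can be illustrated with the primal soft-margin objective, where C weights the hinge loss of margin violations against the margin width; this is the textbook formulation, not this crate's internal code:

```rust
/// Hinge loss max(0, 1 - y * f(x)) for a label y in {-1, +1}.
fn hinge(y: f64, fx: f64) -> f64 {
    (1.0 - y * fx).max(0.0)
}

/// Primal soft-margin objective: 0.5 * ||w||^2 + C * sum of hinge losses.
fn objective(w: &[f64], c: f64, samples: &[(f64, f64)]) -> f64 {
    let norm_sq: f64 = w.iter().map(|wi| wi * wi).sum();
    let loss: f64 = samples.iter().map(|&(y, fx)| hinge(y, fx)).sum();
    0.5 * norm_sq + c * loss
}

fn main() {
    let w = [0.6, 0.8]; // ||w||^2 = 1
    // (label, decision value) pairs; the second sample violates the margin.
    let samples = [(1.0, 2.0), (-1.0, 0.5), (1.0, 1.2)];
    // A larger C penalizes the margin violation more heavily.
    println!("C=1:  {}", objective(&w, 1.0, &samples));
    println!("C=10: {}", objective(&w, 10.0, &samples));
}
```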
For supervised classification with just one class of data a special classifier is available in fit_one_class. It also accepts a Nu value.
For support vector regression two flavors are available. With fit_epsilon a regression task is learned while penalizing deviations larger than epsilon. In fit_nu the parameter epsilon is replaced with Nu again, which should lie in the interval (0, 1].
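The epsilon flavor corresponds to the standard epsilon-insensitive loss, where deviations inside a tube of width epsilon are free and larger ones are penalized linearly; a small sketch of that loss (not this crate's solver code):

```rust
/// Epsilon-insensitive loss: max(0, |y - prediction| - eps).
fn eps_insensitive(y: f64, pred: f64, eps: f64) -> f64 {
    ((y - pred).abs() - eps).max(0.0)
}

fn main() {
    // A deviation of 0.05 stays inside the eps = 0.1 tube: zero loss.
    println!("{}", eps_insensitive(1.0, 1.05, 0.1));
    // A deviation of 0.3 exceeds the tube by roughly 0.2.
    println!("{}", eps_insensitive(1.0, 1.3, 0.1));
}
```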
Kernel Methods
Normally the resulting discriminant is linear, but with Kernel Methods non-linear relations between the input features can be learned in order to improve the performance of the model.
For example, to transform a dataset into a sparse RBF kernel with 10 non-zero distances you can use linfa_kernel:

```rust
use linfa_kernel::Kernel;

let dataset = ...;
let kernel = Kernel::gaussian_sparse(&dataset, 10);
```
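Each entry of such a kernel matrix is a Gaussian similarity between two samples; a minimal sketch of that computation (the exact parametrization of the bandwidth in linfa_kernel may differ from the `eps` used here):

```rust
/// Gaussian (RBF) kernel value exp(-||x - y||^2 / eps) between two points.
fn rbf(x: &[f64], y: &[f64], eps: f64) -> f64 {
    let dist_sq: f64 = x.iter().zip(y).map(|(a, b)| (a - b).powi(2)).sum();
    (-dist_sq / eps).exp()
}

fn main() {
    // Identical points have similarity 1; distant points decay toward 0,
    // which is what makes a sparse approximation reasonable.
    println!("{}", rbf(&[0.0, 0.0], &[0.0, 0.0], 2.0));
    println!("{}", rbf(&[0.0, 0.0], &[3.0, 4.0], 2.0));
}
```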
The solver
This implementation uses Sequential Minimal Optimization, a widely used optimization method for convex problems. In each optimization step it selects two variables and updates them. Each step performs:
- Find a variable which violates the KKT conditions of the optimization problem
- Pick a second variable and create a pair (a1, a2)
- Optimize the pair (a1, a2) analytically
After a number of such iterations the solution approaches the optimum.
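The analytic update of a pair can be sketched with the textbook SMO step (Platt's derivation); this is an illustration with assumed error values and bounds, not this crate's solver internals:

```rust
/// One analytic SMO update for the pair (a1, a2).
/// y1, y2 are labels in {-1, +1}; e1, e2 are prediction errors;
/// eta = k11 + k22 - 2*k12 > 0; [lo, hi] is the feasible box for a2.
fn smo_pair_update(
    a1: f64, a2: f64, y1: f64, y2: f64,
    e1: f64, e2: f64, eta: f64, lo: f64, hi: f64,
) -> (f64, f64) {
    // Unconstrained optimum for a2 along the equality-constraint line, clipped to the box.
    let a2_new = (a2 + y2 * (e1 - e2) / eta).clamp(lo, hi);
    // a1 moves so that y1*a1 + y2*a2 stays constant.
    let a1_new = a1 + y1 * y2 * (a2 - a2_new);
    (a1_new, a2_new)
}

fn main() {
    let (a1, a2) = smo_pair_update(0.5, 0.5, 1.0, -1.0, 0.2, -0.4, 2.0, 0.0, 1.0);
    println!("a1 = {a1}, a2 = {a2}");
    // The linear constraint y1*a1 + y2*a2 is preserved by the update.
    println!("constraint: {}", 1.0 * a1 - 1.0 * a2);
}
```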
Example
The wine quality data consists of 11 features, like "acid", "sugar" and "sulfur dioxide", and scores the quality from worst (3) to best (8). These scores are unified to good (7-8) and bad (3-6) to get a binary classification task.
With an RBF kernel and C-Support Vector Classification an accuracy of 98.8% is reached within 2911 iterations and with 1248 support vectors. You can find the example here.
```
Fit SVM classifier with #1439 training points
Exited after 2911 iterations with obj = -248.51510322468084 and 1248 support vectors

classes | bad  | good
bad     | 1228 | 17
good    | 0    | 194

accuracy 0.98818624, MCC 0.9523008
```
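The reported accuracy and MCC follow directly from the confusion matrix above; a quick check, treating "good" as the positive class:

```rust
/// Accuracy and Matthews correlation coefficient from a 2x2 confusion matrix.
fn metrics(tp: f64, tn: f64, fp: f64, fn_: f64) -> (f64, f64) {
    let acc = (tp + tn) / (tp + tn + fp + fn_);
    let mcc = (tp * tn - fp * fn_)
        / ((tp + fp) * (tp + fn_) * (tn + fp) * (tn + fn_)).sqrt();
    (acc, mcc)
}

fn main() {
    // Counts from the confusion matrix above: 194 true good, 1228 true bad,
    // 17 bad wines predicted good, 0 good wines predicted bad.
    let (acc, mcc) = metrics(194.0, 1228.0, 17.0, 0.0);
    println!("accuracy {:.8}, MCC {:.7}", acc, mcc);
}
```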
Re-exports
pub use solver_smo::SolverParams;
Modules
SVClassify | Support Vector Classification |
SVRegress | Support Vector Regression |
solver_smo |
Structs
Svm | The result of the SMO solver |
SvmParams |
Enums
ExitReason | SMO can either exit because a threshold is reached or the iterations are maxed out |