# Crate linfa_svm

Support Vector Machines

Support Vector Machines are one major branch of machine learning models and offer classification or regression analysis of labeled datasets. They seek a discriminant which separates the data in an optimal way, e.g. one that has the fewest misclassifications and maximizes the margin between the positive and negative classes. A support vector contributes to the discriminant and is therefore important for the classification/regression task. The balance between the number of support vectors and model performance can be controlled with hyperparameters.


## Available parameters in Classification and Regression

For supervised classification tasks the C or Nu values are used to control this balance. In `fit_c` the C value controls the penalty given to misclassification and should be in the interval (0, inf). In `fit_nu` the Nu value controls the number of support vectors and should be in the interval (0, 1].
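As a rough sketch of how these two flavors might be called (the signatures below are assumptions for illustration, not taken verbatim from the crate; `kernel` and `targets` stand for a precomputed kernel and the boolean labels):

```rust
use linfa_svm::{SolverParams, SVClassify};

// `kernel` and `targets` are assumed to be prepared beforehand, e.g. with
// linfa_kernel as shown in the Kernel Methods section below.
// The SolverParams fields are assumptions as well.
let params = SolverParams { eps: 1e-7, shrinking: false };

// C-classification: penalties for the positive and negative class, in (0, inf).
let model = SVClassify::fit_c(&params, &kernel, &targets, 1.0, 1.0);

// Nu-classification: Nu in (0, 1] bounds the fraction of support vectors.
let model = SVClassify::fit_nu(&params, &kernel, &targets, 0.1);
```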

For supervised classification with just one class of data a special classifier is available in `fit_one_class`. It also accepts a Nu value.
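Following the same assumed pattern, the one-class variant would only need the kernel and a Nu value, since there are no labels to separate:

```rust
// Hypothetical call; signature assumed for illustration.
let model = SVClassify::fit_one_class(&params, &kernel, 0.1);
```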

For support vector regression two flavors are available. With `fit_epsilon` a regression task is learned while deviations larger than epsilon are minimized. In `fit_nu` the parameter epsilon is replaced with Nu again, which should be in the interval (0, 1].
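An equally hypothetical sketch for the two regression flavors, with `targets` now holding continuous values (the trailing parameters, a C penalty together with epsilon or Nu, are again assumptions):

```rust
use linfa_svm::SVRegress;

// Epsilon-regression: deviations smaller than epsilon are not penalized.
let model = SVRegress::fit_epsilon(&params, &kernel, &targets, 1.0, 0.1);

// Nu-regression: epsilon is replaced by Nu in (0, 1].
let model = SVRegress::fit_nu(&params, &kernel, &targets, 1.0, 0.1);
```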

## Kernel Methods

Normally the resulting discriminant is linear, but with kernel methods non-linear relations between the input features can be learned in order to improve the performance of the model.

For example, to transform a dataset into a sparse RBF kernel with 10 non-zero distances you can use `linfa_kernel`:

```rust
use linfa_kernel::Kernel;

let dataset = ...; // the training data, e.g. a 2-D array of samples
let kernel = Kernel::gaussian_sparse(&dataset, 10);
```
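Retaining only a fixed number of non-zero distances per sample keeps the kernel matrix sparse, which lowers the memory footprint on large datasets at the cost of approximating the dense RBF kernel.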

## The solver

This implementation uses Sequential Minimal Optimization (SMO), a widely used optimization tool for convex problems. It selects two variables in each optimization step and updates them. Each step performs the following:

- Find a variable which violates the KKT conditions of the optimization problem
- Pick a second variable and create the pair (a1, a2)
- Optimize the pair (a1, a2)

After a sufficient number of these pairwise updates the solution converges to the optimum.
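The following self-contained sketch illustrates these three steps on a toy one-dimensional problem with a linear kernel. It is a simplified variant of SMO (the second variable is chosen naively rather than by a heuristic) and is not the crate's actual solver:

```rust
// Simplified two-variable SMO on a toy 1-D problem with a linear kernel.
fn main() {
    // Toy data: negative points on the left, positive on the right.
    let x = [-2.0, -1.0, 1.0, 2.0_f64];
    let y = [-1.0, -1.0, 1.0, 1.0_f64];
    let c = 1.0; // penalty parameter C
    let n = x.len();
    let kernel = |a: f64, b: f64| a * b; // linear kernel

    let mut alpha = vec![0.0_f64; n];
    let mut bias = 0.0_f64;

    // Decision function f(x) = sum_j alpha_j * y_j * K(x_j, x) + b
    let f = |alpha: &[f64], bias: f64, xi: f64| {
        (0..n).map(|j| alpha[j] * y[j] * kernel(x[j], xi)).sum::<f64>() + bias
    };

    for _ in 0..100 {
        let mut changed = false;
        for i in 0..n {
            let e_i = f(&alpha, bias, x[i]) - y[i];
            // Step 1: does alpha[i] violate the KKT conditions?
            let violates = (y[i] * e_i < -1e-3 && alpha[i] < c)
                || (y[i] * e_i > 1e-3 && alpha[i] > 0.0);
            if !violates {
                continue;
            }
            // Step 2: pick a second variable to create the pair (a1, a2).
            // (Naive choice; real solvers use a heuristic here.)
            let j = (i + 1) % n;
            let e_j = f(&alpha, bias, x[j]) - y[j];
            let (a_i, a_j) = (alpha[i], alpha[j]);
            // Box constraints keep the pair on the equality-constraint line.
            let (lo, hi) = if y[i] != y[j] {
                ((a_j - a_i).max(0.0), (c + a_j - a_i).min(c))
            } else {
                ((a_i + a_j - c).max(0.0), (a_i + a_j).min(c))
            };
            let eta = 2.0 * kernel(x[i], x[j]) - kernel(x[i], x[i]) - kernel(x[j], x[j]);
            if lo >= hi || eta >= 0.0 {
                continue;
            }
            // Step 3: optimize the pair analytically along the constraint line.
            alpha[j] = (a_j - y[j] * (e_i - e_j) / eta).clamp(lo, hi);
            if (alpha[j] - a_j).abs() < 1e-5 {
                continue;
            }
            alpha[i] = a_i + y[i] * y[j] * (a_j - alpha[j]);
            // Recompute the bias from a multiplier strictly inside (0, C).
            let b1 = bias - e_i
                - y[i] * (alpha[i] - a_i) * kernel(x[i], x[i])
                - y[j] * (alpha[j] - a_j) * kernel(x[i], x[j]);
            let b2 = bias - e_j
                - y[i] * (alpha[i] - a_i) * kernel(x[i], x[j])
                - y[j] * (alpha[j] - a_j) * kernel(x[j], x[j]);
            bias = if alpha[i] > 0.0 && alpha[i] < c {
                b1
            } else if alpha[j] > 0.0 && alpha[j] < c {
                b2
            } else {
                (b1 + b2) / 2.0
            };
            changed = true;
        }
        if !changed {
            break; // no KKT violations left: solution is (near) optimal
        }
    }

    println!("alpha = {:?}, bias = {:.3}", alpha, bias);
}
```

Production solvers additionally use heuristics for picking the pair, shrinking of the working set, and caching of kernel evaluations.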

## Example

The wine quality data consists of 11 features, like "acid", "sugar" and "sulfur dioxide", and groups the quality into scores from worst (3) to best (8). These are unified to good (7-8) and bad (3-6) to get a binary classification task.
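A minimal sketch of this binarization step (plain Rust, independent of the crate; the threshold follows the grouping described above):

```rust
fn main() {
    // Wine quality scores range from 3 (worst) to 8 (best).
    let qualities = vec![3u8, 5, 6, 7, 8];
    // Scores of 7 and 8 are treated as "good", 3 to 6 as "bad".
    let targets: Vec<bool> = qualities.iter().map(|&q| q >= 7).collect();
    assert_eq!(targets, vec![false, false, false, true, true]);
}
```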

With an RBF kernel and C-Support Vector Classification an accuracy of 98.8% is reached within 2911 iterations and with 1248 support vectors. You can find the complete example in the crate's `examples` directory.

```
Fit SVM classifier with #1439 training points
Exited after 2911 iterations with obj = -248.51510322468084 and 1248 support vectors

classes | bad  | good
bad     | 1228 | 17
good    | 0    | 194

accuracy 0.98818624, MCC 0.9523008
```

## Re-exports

- `pub use solver_smo::SolverParams;`

## Modules

- `SVClassify`: Support Vector Classification
- `SVRegress`: Support Vector Regression
- `solver_smo`

## Structs

- `Svm`: The result of the SMO solver
- `SvmParams`

## Enums

- `ExitReason`: SMO can either exit because a threshold is reached or the iterations are maxed out