Module opencv::ml


Machine Learning

The Machine Learning Library (MLL) is a set of classes and functions for statistical classification, regression, and clustering of data.

Most of the classification and regression algorithms are implemented as C++ classes. As the algorithms have different sets of features (like an ability to handle missing measurements or categorical input variables), there is a little common ground between the classes. This common ground is defined by the class cv::ml::StatModel that all the other ML classes are derived from.

See detailed overview here: [ml_intro].
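Because every model derives from StatModel, the training and prediction workflow is the same across the library: build the sample and response matrices, configure a model, train it, then predict. The sketch below is a minimal, hedged example using the Rust bindings; constant and method names follow the opencv crate's generated API, and the exact name of the training overload (shown here as `train` taking samples, layout and responses) can differ between crate versions.

```rust
use opencv::core::Mat;
use opencv::ml::{self, SVM};
use opencv::prelude::*; // brings SVMTrait, StatModelTrait and friends into scope

fn main() -> opencv::Result<()> {
    // Four 2-D samples, one per row, with an integer class label each.
    let samples = Mat::from_slice_2d(&[
        [0.0f32, 0.0],
        [0.1, 0.1],
        [0.9, 0.9],
        [1.0, 1.0],
    ])?;
    let responses = Mat::from_slice_2d(&[[0i32], [0], [1], [1]])?;

    // Every StatModel-derived class is configured and trained the same way.
    let mut svm = SVM::create()?;
    svm.set_type(ml::SVM_C_SVC)?;
    svm.set_kernel(ml::SVM_LINEAR)?;
    // Samples/layout/responses training overload; its exact name may differ
    // between opencv crate versions (e.g. a suffixed variant of `train`).
    svm.train(&samples, ml::ROW_SAMPLE, &responses)?;

    // Predict the class of an unseen sample; for a single sample the
    // returned float is the predicted label.
    let query = Mat::from_slice_2d(&[[0.95f32, 0.9]])?;
    let mut result = Mat::default();
    let label = svm.predict(&query, &mut result, 0)?;
    println!("predicted label: {label}");
    Ok(())
}
```

The same pattern applies to the other StatModel-derived classes listed below.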

Modules

Structs

  • Artificial Neural Networks - Multi-Layer Perceptrons.
  • Boosted tree classifier derived from DTrees
  • The class represents a single decision tree or a collection of decision trees.
  • The class represents a decision tree node.
  • The class represents a split in a decision tree.
  • The class implements the Expectation Maximization algorithm.
  • The class implements the K-Nearest Neighbors model.
  • Implements Logistic Regression classifier.
  • Bayes classifier for normally distributed data.
  • The structure represents the logarithmic grid range of StatModel parameters.
  • The class implements the random forest predictor.
  • Support Vector Machines.
  • Stochastic Gradient Descent SVM classifier.
  • Base class for statistical models in OpenCV ML.
  • Class encapsulating training data (see the sketch after this list).
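As noted for TrainData above, the samples, responses and the optional per-variable/per-sample metadata are usually bundled into a single TrainData object before being handed to a model. A rough sketch, assuming the full-argument TrainData::create generated by the bindings, with empty no_array() placeholders standing in for the optional index, weight and type arrays:

```rust
use opencv::core::{no_array, Mat};
use opencv::ml::{self, TrainData};
use opencv::prelude::*;

fn build_train_data() -> opencv::Result<()> {
    // One sample per row (ROW_SAMPLE); responses hold one label per sample.
    let samples = Mat::from_slice_2d(&[
        [0.0f32, 0.0],
        [0.1, 0.2],
        [0.9, 0.8],
        [1.0, 1.0],
    ])?;
    let responses = Mat::from_slice_2d(&[[0i32], [0], [1], [1]])?;

    // The optional variable/sample index, weight and type arrays are left empty.
    let mut data = TrainData::create(
        &samples,
        ml::ROW_SAMPLE,
        &responses,
        &no_array(), // var_idx
        &no_array(), // sample_idx
        &no_array(), // sample_weights
        &no_array(), // var_type
    )?;

    // Reserve 25% of the samples as a shuffled test set.
    data.set_train_test_split_ratio(0.25, true)?;
    Ok(())
}
```

The resulting Ptr<TrainData> can then be passed to the training method of any StatModel-derived class.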

Enums

Constants

  • The simulated annealing algorithm. See Kirkpatrick83 for details.
  • The back-propagation algorithm (see the configuration sketch after this list).
  • Gaussian function: f(x) = β·e^(−α·x²)
  • Identity function: f(x) = x
  • Leaky ReLU function: f(x) = x for x > 0 and f(x) = α·x for x ≤ 0
  • Do not normalize the input vectors. If this flag is not set, the training algorithm normalizes each input feature independently, shifting its mean value to 0 and making the standard deviation equal to 1. If the network is assumed to be updated frequently, the new training data could be much different from the original one. In this case, you should take care of proper normalization.
  • Do not normalize the output vectors. If the flag is not set, the training algorithm normalizes each output feature independently, transforming it to a certain range depending on the activation function used.
  • ReLU function: f(x) = max(0, x)
  • The RPROP algorithm. See RPROP93 for details.
  • Symmetrical sigmoid: f(x) = β·(1 − e^(−α·x)) / (1 + e^(−α·x))
  • Update the network weights, rather than compute them from scratch. In the latter case the weights are initialized using the Nguyen-Widrow algorithm.
  • Discrete AdaBoost.
  • Gentle AdaBoost. It puts less weight on outlier data points and for that reason is often good with regression data.
  • LogitBoost. It can produce good regression fits.
  • Real AdaBoost. It is a technique that utilizes confidence-rated predictions and works well with categorical data.
  • Each training sample occupies a column of the samples matrix.
  • A symmetric positive-definite matrix. The number of free parameters in each matrix is about d²/2. It is not recommended to use this option unless there is a fairly accurate initial estimation of the parameters and/or a huge number of training samples.
  • A diagonal matrix with positive diagonal elements. The number of free parameters is d for each matrix. This is the most commonly used option, yielding good estimation results.
  • A symmetric positive-definite matrix. The number of free parameters in each matrix is about d²/2. It is not recommended to use this option unless there is a fairly accurate initial estimation of the parameters and/or a huge number of training samples.
  • A scaled identity matrix μ_k·I. There is only one parameter, μ_k, to be estimated for each matrix. The option may be used in special cases, when the constraint is relevant, or as a first step in the optimization (for example, when the data is preprocessed with PCA). The results of such preliminary estimation may be passed again to the optimization procedure, this time with covMatType=EM::COV_MAT_DIAGONAL.
  • Set MiniBatchSize to a positive integer when using this method.
  • Regularization disabled
  • L1 norm
  • L2 norm
  • Each training sample is a row of the samples matrix.
  • Average Stochastic Gradient Descent
  • More accurate for the case of linearly separable sets.
  • Stochastic Gradient Descent
  • General case; suits non-linearly separable sets and allows outliers.
  • Exponential Chi2 kernel, similar to the RBF kernel: K(x_i, x_j) = e^(−γ·χ²(x_i, x_j)), where χ²(x_i, x_j) = (x_i − x_j)²/(x_i + x_j) and γ > 0.
  • Returned by SVM::getKernelType when a custom kernel has been set.
  • C-Support Vector Classification. n-class classification (n ≥ 2), allows imperfect separation of classes with penalty multiplier C for outliers.
  • ε-Support Vector Regression. The distance between feature vectors from the training set and the fitting hyper-plane must be less than p. For outliers the penalty multiplier C is used.
  • Histogram intersection kernel. A fast kernel. K(x_i, x_j) = min(x_i, x_j).
  • Linear kernel. No mapping is done, linear discrimination (or regression) is done in the original feature space. It is the fastest option. K(x_i, x_j) = x_i^T·x_j.
  • ν-Support Vector Classification. n-class classification with possible imperfect separation. Parameter ν (in the range 0..1, the larger the value, the smoother the decision boundary) is used instead of C.
  • ν-Support Vector Regression. ν is used instead of p. See LibSVM for details.
  • Distribution Estimation (One-class SVM). All the training data are from the same class, SVM builds a boundary that separates the class from the rest of the feature space.
  • Polynomial kernel: K(x_i, x_j) = (γ·x_i^T·x_j + coef0)^degree, γ > 0.
  • Radial basis function (RBF), a good choice in most cases. K(x_i, x_j) = e^(−γ·‖x_i − x_j‖²), γ > 0.
  • Sigmoid kernel: K(x_i, x_j) = tanh(γ·x_i^T·x_j + coef0).
  • Makes the method return the raw results (the sum), not the class label.
  • Categorical variables.
  • Same as VAR_ORDERED.
  • Ordered variables.
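Many of these constants are consumed as plain i32 values by the model setters. As an illustration, the sketch below configures an ANN_MLP with the symmetrical sigmoid activation and the back-propagation training method referenced in this list. The flattened constant names (ml::ANN_MLP_SIGMOID_SYM, ml::ANN_MLP_BACKPROP) and the setter signatures are assumptions based on the bindings' naming convention and may differ between crate versions.

```rust
use opencv::core::Mat;
use opencv::ml::{self, ANN_MLP};
use opencv::prelude::*;

fn build_mlp() -> opencv::Result<()> {
    let mut mlp = ANN_MLP::create()?;
    // Topology: 2 inputs, one hidden layer of 4 neurons, 1 output.
    let layer_sizes = Mat::from_slice_2d(&[[2i32, 4, 1]])?;
    mlp.set_layer_sizes(&layer_sizes)?;
    // Symmetrical sigmoid activation: f(x) = β·(1 − e^(−α·x)) / (1 + e^(−α·x)).
    // Passing 0.0 for α and β selects the library's default parameter values.
    mlp.set_activation_function(ml::ANN_MLP_SIGMOID_SYM, 0.0, 0.0)?;
    // Back-propagation with weight-gradient scale 0.001 and momentum scale 0.1.
    mlp.set_train_method(ml::ANN_MLP_BACKPROP, 0.001, 0.1)?;
    Ok(())
}
```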

Traits

Functions

Type Aliases