Module smartcore::neighbors


Supervised neighbors-based learning methods

Nearest Neighbors

The k-nearest neighbors (KNN) algorithm is a simple supervised machine learning algorithm that can be used to solve both classification and regression problems. KNN is a non-parametric method that assumes that similar things exist in close proximity.

During training the algorithm simply memorizes all training samples. To make a prediction, it finds a predefined number of training samples closest in distance to the new point and uses the labels of those samples to compute the value for the new point. The number of neighbors (k) is defined by the user and does not change after training.
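As a minimal sketch of that workflow (the module paths and signatures below follow the 0.2-series smartcore API and may differ in other releases), the classifier stores the samples in `fit` and votes among the k nearest neighbors in `predict`:

```rust
use smartcore::linalg::naive::dense_matrix::DenseMatrix;
use smartcore::neighbors::knn_classifier::KNNClassifier;

fn main() {
    // Training samples: two features per row.
    let x = DenseMatrix::from_2d_array(&[
        &[1., 2.],
        &[3., 4.],
        &[5., 6.],
        &[7., 8.],
        &[9., 10.],
    ]);
    // One label per training sample.
    let y = vec![2., 2., 2., 3., 3.];

    // "Training" only stores the samples; the default parameters use k = 3.
    let knn = KNNClassifier::fit(&x, &y, Default::default()).unwrap();

    // Each prediction is the majority label among the k closest stored samples.
    let y_hat = knn.predict(&x).unwrap();
    println!("{:?}", y_hat);
}
```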

The distance can be any metric, i.e. a non-negative function \( d(x, y) \geq 0 \) that satisfies three conditions:

  1. \( d(x, y) = 0 \) if and only if \( x = y \), positive definiteness
  2. \( d(x, y) = d(y, x) \), symmetry
  3. \( d(x, y) \leq d(x, z) + d(z, y) \), subadditivity or triangle inequality

for all \(x, y, z \in Z \)
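For instance, the familiar Euclidean distance satisfies all three conditions; a plain-Rust sketch (deliberately not tied to any particular smartcore distance trait) makes the definition concrete:

```rust
/// Euclidean distance between two points of equal dimension.
/// It is non-negative, zero only when a == b, symmetric in its
/// arguments, and obeys the triangle inequality.
fn euclidean(a: &[f64], b: &[f64]) -> f64 {
    a.iter()
        .zip(b.iter())
        .map(|(ai, bi)| (ai - bi).powi(2))
        .sum::<f64>()
        .sqrt()
}

fn main() {
    let (x, y, z) = ([0.0, 0.0], [3.0, 4.0], [6.0, 0.0]);
    assert_eq!(euclidean(&x, &x), 0.0);               // d(x, x) = 0
    assert_eq!(euclidean(&x, &y), 5.0);               // d(x, y) >= 0
    assert_eq!(euclidean(&x, &y), euclidean(&y, &x)); // symmetry
    // triangle inequality: d(x, z) <= d(x, y) + d(y, z)
    assert!(euclidean(&x, &z) <= euclidean(&x, &y) + euclidean(&y, &z));
}
```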

Neighbors-based methods are very simple and are known as non-generalizing machine learning methods, since they simply remember all of their training data and are therefore prone to overfitting. Despite this disadvantage, nearest neighbors algorithms have been very successful in a large number of applications because of their flexibility and speed.

Advantages

  • The algorithm is simple and fast.
  • The algorithm is non-parametric: there’s no need to build a model; the algorithm simply stores all training samples in memory.
  • The algorithm is versatile: it can be used for both classification and regression.

Disadvantages

  • The algorithm gets significantly slower as the number of examples and/or predictors (independent variables) increases, since a naive implementation computes the distance from the query point to every stored sample.

