Supervised neighbors-based learning methods
Nearest Neighbors
The k-nearest neighbors (KNN) algorithm is a simple supervised machine learning algorithm that can be used to solve both classification and regression problems. KNN is a non-parametric method that assumes that similar things exist in close proximity.
During training the algorithm memorizes all training samples. To make a prediction, it finds a predefined number of training samples closest in distance to the new point and uses the labels of those samples to compute the value of the new point. The number of neighbors (k) is set by the user and does not change after training.
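The prediction step above can be sketched in plain Rust. This is a minimal illustration of the classification case (majority vote over the k nearest stored samples), not the API of this crate; the function name and the choice of squared Euclidean distance are assumptions for the sketch:

```rust
use std::collections::HashMap;

// Minimal KNN prediction sketch (hypothetical, not this crate's API):
// `train` holds the memorized (features, label) pairs, `x` is the new point.
fn knn_predict(train: &[(Vec<f64>, usize)], x: &[f64], k: usize) -> usize {
    // Squared Euclidean distance between two points.
    let dist = |a: &[f64], b: &[f64]| -> f64 {
        a.iter().zip(b).map(|(p, q)| (p - q).powi(2)).sum()
    };
    // Compute the distance from every stored sample to the query point.
    let mut by_dist: Vec<(f64, usize)> = train
        .iter()
        .map(|(xi, yi)| (dist(xi, x), *yi))
        .collect();
    by_dist.sort_by(|a, b| a.0.partial_cmp(&b.0).unwrap());
    // Majority vote among the labels of the k nearest samples.
    let mut counts: HashMap<usize, usize> = HashMap::new();
    for &(_, label) in by_dist.iter().take(k) {
        *counts.entry(label).or_insert(0) += 1;
    }
    counts.into_iter().max_by_key(|&(_, c)| c).unwrap().0
}
```

For regression the majority vote would be replaced by an average of the neighbors' target values; everything else stays the same.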
The distance can be any metric measure that is defined as \( d(x, y) \geq 0\) and follows three conditions:
- \( d(x, y) = 0 \) if and only if \( x = y \), positive definiteness
- \( d(x, y) = d(y, x) \), symmetry
- \( d(x, y) \leq d(x, z) + d(z, y) \), subadditivity or triangle inequality
for all \(x, y, z \in Z \)
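Euclidean distance is one familiar metric satisfying these three conditions; a short sketch (the function name is an assumption, not part of this crate's API):

```rust
// Euclidean distance: a metric that satisfies positive definiteness,
// symmetry, and the triangle inequality for all points of equal dimension.
fn euclidean(a: &[f64], b: &[f64]) -> f64 {
    a.iter()
        .zip(b)
        .map(|(p, q)| (p - q).powi(2))
        .sum::<f64>()
        .sqrt()
}
```

For example, with x = (0, 0), y = (3, 4), and z = (6, 0): d(x, y) = 5, d(y, x) = 5 (symmetry), and d(x, y) = 5 ≤ d(x, z) + d(z, y) = 6 + 5 (triangle inequality).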
Neighbors-based methods are very simple and are known as non-generalizing machine learning methods, since they simply remember all of their training data and are prone to overfitting. Despite these disadvantages, nearest neighbors algorithms have been very successful in a large number of applications because of their flexibility and speed.
Advantages
- The algorithm is simple and fast.
- The algorithm is non-parametric: there’s no need to build a model, the algorithm simply stores all training samples in memory.
- The algorithm is versatile. It can be used for both classification and regression.
Disadvantages
- The algorithm gets significantly slower as the number of examples and/or predictors (independent variables) increases.
Modules
- K Nearest Neighbors Classifier
- K Nearest Neighbors Regressor
Enums
- Weight function that is used to determine the estimated value.
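The weight function decides how much each neighbor contributes to the estimate. A hedged sketch of the two common choices for the regression case, uniform weighting (a plain mean) and inverse-distance weighting (closer neighbors count more); the function name and signature are assumptions, not this crate's API:

```rust
// Hypothetical sketch of weight functions for KNN regression.
// `neighbors` holds (distance, target) pairs for the k nearest samples.
// With uniform weights the prediction is the plain mean of the targets;
// with inverse-distance weights, closer neighbors contribute more.
fn weighted_prediction(neighbors: &[(f64, f64)], uniform: bool) -> f64 {
    // Small epsilon guards against division by zero at distance 0.
    let weight = |d: f64| if uniform { 1.0 } else { 1.0 / (d + 1e-12) };
    let (num, den) = neighbors.iter().fold((0.0, 0.0), |(n, d), &(dist, y)| {
        let w = weight(dist);
        (n + w * y, d + w)
    });
    num / den
}
```

With neighbors at distances 1 and 4 holding targets 10 and 20, uniform weighting gives 15, while inverse-distance weighting pulls the estimate toward the closer target, giving 12.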
Type Definitions
- `KNNAlgorithmName` (deprecated) — maintains a list of supported search algorithms, see KNN algorithms