Module smartcore::neighbors


Supervised neighbors-based learning methods

Nearest Neighbors

The k-nearest neighbors (KNN) algorithm is a simple supervised machine learning algorithm that can be used to solve both classification and regression problems. KNN is a non-parametric method that assumes that similar things exist in close proximity.

During training the algorithm simply memorizes all training samples. To make a prediction, it finds a predefined number of training samples closest in distance to the new point and uses the labels of those samples to compute the value for the new point. The number of neighbors (k) is defined by the user and does not change after training.
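As a minimal sketch of that workflow (the module paths and signatures below follow the 0.2-series smartcore API and may differ in other releases), the classifier stores the samples in `fit` and votes among the k nearest neighbors in `predict`:

```rust
use smartcore::linalg::naive::dense_matrix::DenseMatrix;
use smartcore::neighbors::knn_classifier::KNNClassifier;

fn main() {
    // Training samples: two features per row.
    let x = DenseMatrix::from_2d_array(&[
        &[1., 2.],
        &[3., 4.],
        &[5., 6.],
        &[7., 8.],
        &[9., 10.],
    ]);
    // One label per training sample.
    let y = vec![2., 2., 2., 3., 3.];

    // "Training" only stores the samples; the default parameters use k = 3.
    let knn = KNNClassifier::fit(&x, &y, Default::default()).unwrap();

    // Each prediction is the majority label among the k closest stored samples.
    let y_hat = knn.predict(&x).unwrap();
    println!("{:?}", y_hat);
}
```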

The distance can be any metric, i.e. a non-negative function \( d(x, y) \geq 0 \) that satisfies three conditions:

  1. \( d(x, y) = 0 \) if and only if \( x = y \), positive definiteness
  2. \( d(x, y) = d(y, x) \), symmetry
  3. \( d(x, y) \leq d(x, z) + d(z, y) \), subadditivity or triangle inequality

for all \(x, y, z \in Z \)
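For instance, the familiar Euclidean distance satisfies all three conditions; a plain-Rust sketch (deliberately not tied to any particular smartcore distance trait) makes the definition concrete:

```rust
/// Euclidean distance between two points of equal dimension.
/// It is non-negative, zero only when a == b, symmetric in its
/// arguments, and obeys the triangle inequality.
fn euclidean(a: &[f64], b: &[f64]) -> f64 {
    a.iter()
        .zip(b.iter())
        .map(|(ai, bi)| (ai - bi).powi(2))
        .sum::<f64>()
        .sqrt()
}

fn main() {
    let (x, y, z) = ([0.0, 0.0], [3.0, 4.0], [6.0, 0.0]);
    assert_eq!(euclidean(&x, &x), 0.0);               // d(x, x) = 0
    assert_eq!(euclidean(&x, &y), 5.0);               // d(x, y) >= 0
    assert_eq!(euclidean(&x, &y), euclidean(&y, &x)); // symmetry
    // triangle inequality: d(x, z) <= d(x, y) + d(y, z)
    assert!(euclidean(&x, &z) <= euclidean(&x, &y) + euclidean(&y, &z));
}
```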

Neighbors-based methods are very simple and are known as non-generalizing machine learning methods, since they simply remember all of their training data and are therefore prone to overfitting. Despite this disadvantage, nearest neighbors algorithms have been very successful in a large number of applications because of their flexibility and speed.

Advantages

  • The algorithm is simple and fast.
  • The algorithm is non-parametric: there’s no need to build a model; the algorithm simply stores all training samples in memory.
  • The algorithm is versatile: it can be used for both classification and regression.

Disadvantages

  • The algorithm gets significantly slower as the number of examples and/or predictors (independent variables) increases, since a naive implementation computes the distance from the query point to every stored sample.

