The K-Nearest Neighbor (KNN) Algorithm in Machine Learning

Machine learning tasks fall into three main categories: 1) classification; 2) regression; 3) clustering. Today we focus on the K-Nearest Neighbor (KNN) algorithm.

The K-Nearest Neighbor (KNN) algorithm was proposed by Cover and Hart in 1967 and is one of the more mature machine learning algorithms. The model it uses corresponds to a partitioning of the feature space, and the algorithm can be used not only for classification but also for regression.

KNN concept:

Given a training data set and a new input instance, KNN finds the K instances (the K nearest neighbors) in the training set that are closest to the input instance. The input instance is then assigned to the class to which the majority of those K instances belong.

If most of the k samples most similar to a given sample in the feature space (i.e., its nearest neighbors) belong to a certain category, then the sample is also assigned to that category.

In layman's terms: "birds of a feather flock together."

The classification strategy is majority rule: the minority follows the majority.

Algorithm Description:

KNN has no explicit training process. At test time, the distance between the test sample and every training sample is computed, and the prediction is the majority vote over the categories of the nearest K training samples. The algorithm is described as follows:

Input: a training data set T = {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)}, where x_i ∈ R^n is a feature vector and y_i ∈ {c_1, c_2, ..., c_K} is its class label, together with a test instance x

Output: the category to which instance x belongs

1) Using the given distance metric, find the k training samples in T closest to x; denote the neighborhood of x containing these k points as N_k(x).

2) Determine the category y of x from N_k(x) according to the classification decision rule (such as majority vote): y = argmax_{c_j} Σ_{x_i ∈ N_k(x)} I(y_i = c_j), where I(·) is the indicator function that equals 1 when y_i = c_j and 0 otherwise.
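As a concrete illustration of the description above, here is a minimal sketch of KNN classification in Python with NumPy. The function and variable names are illustrative, not from any particular library: it computes Euclidean distances from the test instance x to every training sample, takes the k closest to form N_k(x), and returns the majority class.

```python
import numpy as np
from collections import Counter

def knn_classify(x, X_train, y_train, k=3):
    """Classify a single instance x by majority vote among its k nearest neighbors.

    X_train: (N, n) array of training feature vectors.
    y_train: length-N sequence of class labels.
    """
    # Euclidean (L2) distance from x to every training sample.
    distances = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k closest training samples: the neighborhood N_k(x).
    neighbor_idx = np.argsort(distances)[:k]
    # Majority vote over the neighbors' labels.
    votes = Counter(y_train[i] for i in neighbor_idx)
    return votes.most_common(1)[0][0]

# Toy usage: two small clusters in R^2.
X_train = np.array([[1.0, 1.1], [1.2, 0.9], [0.9, 1.0], [5.0, 5.2], [5.1, 4.9]])
y_train = np.array(["A", "A", "A", "B", "B"])
print(knn_classify(np.array([1.05, 1.0]), X_train, y_train, k=3))  # -> "A"
```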

Main idea:

When it is not obvious which known category a point to be classified belongs to, we examine its neighborhood: measure the weight contributed by each of the surrounding neighbors and assign the point to the category that carries the greater weight.

The input to KNN is the test data and the training sample data set; the output is the category of the test sample.

In the KNN algorithm, the selected neighbors are all objects that have already been correctly classified. KNN determines the category of the sample to be classified based only on the category of its nearest neighbor or the categories of its K nearest neighbors.

Algorithm elements:

The KNN algorithm has three basic elements:

1) Selection of the K value: the choice of K has a significant impact on the results. A small K means that only training instances very close to the input instance influence the prediction, which makes the model prone to overfitting; a large K reduces the estimation error but increases the approximation error, because training instances far from the input instance also contribute to the prediction and can introduce errors. In practice, K is usually chosen to be fairly small, and cross-validation is used to select the optimal value (see the cross-validation sketch after this list). Asymptotically, as the number of training instances tends to infinity, the error rate with K = 1 does not exceed twice the Bayes error rate; if K also tends to infinity (while remaining a vanishing fraction of the sample size), the error rate converges to the Bayes error rate.

2) Distance metric: the metric is generally the Lp (Minkowski) distance; with p = 2 it is the Euclidean distance. Before measuring distances, each attribute should be normalized; this prevents attributes with large value ranges from dominating attributes with small ranges (see the distance sketch after this list).

For text categorization, cosine similarity is usually more appropriate than Euclidean distance.

3) Classification decision rule: The classification decision rule in the algorithm is often a majority vote, that is, the class of the input instance is determined by the majority of the K nearest training instances of the input instance.
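Following element 1 above, one common way to pick K is k-fold cross-validation. The sketch below uses scikit-learn and the bundled iris data purely as an illustration (an assumption about available tooling; the candidate K range is arbitrary): it scores several K values and keeps the one with the best cross-validated accuracy.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

best_k, best_score = None, -np.inf
for k in range(1, 16):                      # candidate K values (illustrative range)
    clf = KNeighborsClassifier(n_neighbors=k)
    # 5-fold cross-validated accuracy for this K.
    score = cross_val_score(clf, X, y, cv=5).mean()
    if score > best_score:
        best_k, best_score = k, score

print(f"best K = {best_k}, cross-validated accuracy = {best_score:.3f}")
```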
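For element 2, the following sketch (plain NumPy; the names and sample data are illustrative) shows the general Lp distance, min-max normalization of attributes before measuring distance, and cosine similarity as the alternative suggested for text.

```python
import numpy as np

def lp_distance(a, b, p=2):
    """General Lp (Minkowski) distance; p=2 gives Euclidean, p=1 Manhattan."""
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

def min_max_normalize(X):
    """Scale each attribute (column) to [0, 1] so large-range attributes don't dominate."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / np.where(hi - lo == 0, 1, hi - lo)

def cosine_similarity(a, b):
    """Cosine similarity, often preferred over Euclidean distance for text vectors."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Two attributes on very different scales: normalize before measuring distance.
X = np.array([[180.0, 0.2], [160.0, 0.8], [175.0, 0.5]])
Xn = min_max_normalize(X)
print(lp_distance(Xn[0], Xn[1], p=2))    # Euclidean distance on normalized data
print(cosine_similarity(Xn[0], Xn[2]))   # cosine similarity between two rows
```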

Algorithm flow:

1) Prepare the data and preprocess the data.

2) Select the appropriate data structure to store the training data and test tuples.

3) Set parameters such as K.

4) Maintain a priority queue of size K, ordered from largest to smallest distance (a max-heap keyed on distance), to store the nearest-neighbor training tuples found so far. Initialize it by randomly selecting K tuples from the training set, computing the distance from the test tuple to each of them, and storing the training tuple labels and distances in the queue (see the heap-based sketch after this list).

5) Traverse the training tuple set: compute the distance L between the current training tuple and the test tuple, and read the maximum distance Lmax currently in the priority queue.

6) Compare. If L >= Lmax, discard the tuple and move on to the next one. If L < Lmax, delete the tuple with the largest distance from the priority queue and insert the current training tuple.

7) After the traversal is complete, take a majority vote over the labels of the K tuples in the priority queue and use the winning class as the category of the test tuple.

8) After the whole test tuple set has been classified, compute the error rate; repeat the procedure with different K values and finally keep the K value with the smallest error rate.
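The flow above maps directly onto Python's heapq. The sketch below (illustrative names, not a library API) keeps a size-K max-heap of (distance, label) pairs; since heapq is a min-heap, distances are negated. Training tuples whose distance is >= the current maximum are discarded, and the final prediction is a majority vote. Running the usage loop over several K values on a held-out set and keeping the K with the lowest error rate is one way to realize step 8.

```python
import heapq
import numpy as np
from collections import Counter

def knn_heap_classify(x, X_train, y_train, k=3):
    """Classify x using a bounded max-heap of its k nearest training tuples."""
    heap = []  # stores (-distance, label); heapq is a min-heap, so distances are negated
    for xi, yi in zip(X_train, y_train):
        d = np.linalg.norm(xi - x)             # distance from current training tuple to x
        if len(heap) < k:
            heapq.heappush(heap, (-d, yi))     # heap not yet full: always keep the tuple
        elif d < -heap[0][0]:                  # closer than the current maximum distance Lmax
            heapq.heapreplace(heap, (-d, yi))  # evict the farthest tuple, keep this one
        # otherwise d >= Lmax: discard and move on
    # Majority vote over the labels left in the heap.
    votes = Counter(label for _, label in heap)
    return votes.most_common(1)[0][0]

# Step 8 (sketch): try several K values on a held-out test set and keep the best one.
X_tr = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]], dtype=float)
y_tr = np.array([0, 0, 0, 1, 1, 1])
X_te = np.array([[1.5, 1.5], [8.5, 8.5]], dtype=float)
y_te = np.array([0, 1])
for k in (1, 3, 5):
    errors = sum(knn_heap_classify(x, X_tr, y_tr, k) != y for x, y in zip(X_te, y_te))
    print(f"K={k}: error rate {errors / len(y_te):.2f}")
```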

Algorithm advantages:

1) Although KNN also relies on the limit theorem in principle, in class decision-making it involves only a very small number of adjacent samples.

2) Since the KNN method relies mainly on a limited number of surrounding samples rather than on a discriminant function over the class domain, it is better suited than other methods to sample sets whose class domains cross or overlap heavily.

3) The algorithm itself is simple and effective, with high accuracy; it is insensitive to outliers, easy to implement, and requires no parameter estimation. The classifier does not need to be trained on the training set, so the training time complexity is effectively zero.

4) The computational complexity of KNN classification is proportional to the number of documents in the training set: if the training set contains n documents, the classification time complexity is O(n).

5) Suitable for classifying rare events.

6) It is especially suitable for multi-label problems (multi-modal, where objects carry multiple category labels), for which kNN has been reported to perform better than SVM.

Algorithm disadvantages:

1) When the samples are imbalanced, e.g., one class has a very large number of samples while the others have few, the K nearest neighbors of a new sample tend to be dominated by the large class, which can bias the prediction.

2) The algorithm is computationally expensive, since the distance from each sample to be classified to every training sample must be computed;

3) It is poorly interpretable: unlike decision trees, it cannot produce explicit rules.

Improvement strategy:

Because KNN was proposed relatively early, and other techniques have kept improving, many of its shortcomings have gradually become apparent, so a number of improved KNN algorithms have emerged. The improvements mainly target two directions: classification efficiency and classification effectiveness.

Improvement 1: find the k nearest neighbors of a sample and assign the average of those neighbors' property values to the sample, thereby estimating the property of the sample (this is KNN used for regression).
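Improvement 1 is essentially KNN regression. A minimal sketch (NumPy only; the data and names are illustrative) predicts a numeric property of a sample as the mean of that property over its k nearest neighbors.

```python
import numpy as np

def knn_regress(x, X_train, y_train, k=3):
    """Predict a numeric target for x as the average over its k nearest neighbors."""
    distances = np.linalg.norm(X_train - x, axis=1)
    neighbor_idx = np.argsort(distances)[:k]
    return y_train[neighbor_idx].mean()

# Toy usage: predict a house price from its size by averaging the 3 most similar houses.
sizes = np.array([[50.0], [60.0], [80.0], [100.0], [120.0]])
prices = np.array([150.0, 180.0, 240.0, 300.0, 360.0])
print(knn_regress(np.array([70.0]), sizes, prices, k=3))  # mean of the 3 closest prices
```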

Improvement 2: give neighbors at different distances different weights on the sample, for example a weight inversely proportional to the distance (1/d), so that a closer neighbor carries a larger weight; this idea is known as weight-adjusted K-nearest neighbor (WAKNN). However, WAKNN increases the amount of computation, because each text to be classified must compute its distance to all known samples in order to obtain its K nearest neighbors.
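A simple sketch of the idea behind Improvement 2 (a plain 1/d-weighted vote in that spirit, not a reproduction of the published WAKNN algorithm):

```python
import numpy as np
from collections import defaultdict

def weighted_knn_classify(x, X_train, y_train, k=3, eps=1e-12):
    """Weighted KNN: each of the k nearest neighbors votes with weight 1/distance."""
    distances = np.linalg.norm(X_train - x, axis=1)
    neighbor_idx = np.argsort(distances)[:k]
    class_weight = defaultdict(float)
    for i in neighbor_idx:
        class_weight[y_train[i]] += 1.0 / (distances[i] + eps)  # closer -> larger weight
    # Return the class with the largest accumulated weight.
    return max(class_weight, key=class_weight.get)
```

Usage mirrors the earlier knn_classify sketch; the difference is that closer neighbors now dominate the vote instead of every neighbor counting equally.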

Improvement 3: edit the known sample points in advance (editing techniques) and use condensing techniques to remove samples that do not help classification. The algorithm is better suited to automatic classification of class domains with large sample sizes; class domains with small sample sizes are more likely to be misclassified.
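One classical condensing technique in the spirit of Improvement 3 is Hart's condensed nearest neighbor rule. The sketch below (illustrative, not a library implementation) keeps only the training samples needed to classify the remaining ones correctly with 1-NN; classification would then run ordinary KNN against X_train[keep] instead of the full set.

```python
import numpy as np

def condense(X_train, y_train):
    """Condensed nearest neighbor (Hart): keep a subset that classifies the rest correctly."""
    keep = [0]                                   # start the store with the first sample
    changed = True
    while changed:                               # repeat until a full pass adds nothing
        changed = False
        for i in range(len(X_train)):
            if i in keep:
                continue
            # Classify sample i with 1-NN against the current store.
            d = np.linalg.norm(X_train[keep] - X_train[i], axis=1)
            nearest = keep[int(np.argmin(d))]
            if y_train[nearest] != y_train[i]:   # misclassified: the sample is informative
                keep.append(i)
                changed = True
    return np.array(keep)
```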

Considerations:

When implementing the K-nearest neighbor algorithm, the main consideration is how to perform fast K-nearest-neighbor search over the training data; this matters when the feature space has many dimensions and the training set is large.
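When the training set is large, one common approach is to index the training data rather than scan it for every query. As a hedged illustration, scikit-learn's NearestNeighbors can build a tree-based index; the choice of algorithm and the synthetic data below are assumptions for the example, not a recommendation from the original text.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100_000, 8))   # large synthetic training set in R^8
x_query = rng.normal(size=(1, 8))

# Build a tree-based index once, then answer K-nearest-neighbor queries quickly.
index = NearestNeighbors(n_neighbors=5, algorithm="kd_tree").fit(X_train)
distances, neighbor_idx = index.kneighbors(x_query)
print(neighbor_idx)   # indices of the 5 nearest training samples
```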

Application scenario:

The K-nearest neighbor algorithm application scenarios include machine learning, character recognition, text classification, and image recognition.

Conclusion:

The K-nearest neighbor algorithm (KNN) remains an active topic in machine learning research. The simplest brute-force implementation is best suited to small data sets. The model used by KNN corresponds to a partitioning of the feature space, and the algorithm can be used not only for classification but also for regression. KNN has been widely applied in machine learning, character recognition, text classification, image recognition and other fields.
