→ KNN (K-Nearest Neighbors) is one of the simplest supervised machine learning algorithms, mostly used for classification. It classifies a data point based on how its neighbors are classified.

→ KNN stores all available cases and classifies new cases based on a similarity measure.

→ K in KNN is a parameter that refers to the number of nearest neighbors to include in the majority voting process.

**How do we choose ‘K’ ?**

→ KNN algorithm is based on feature similarity.

→ Choosing the right value of ‘K’ is a process called parameter tuning and it is important for better accuracy.

Example: Have you ever felt alone in a crowd? In fact, plenty of people in the same busy market feel alone too. As the Hindi line goes, “Bheed mein tanhai ka saath hai” (“in the crowd, loneliness keeps you company”). That’s a joke, of course. Let’s understand the idea with an image example.

Now the question is: what is in the circle, a triangle or a rectangle?

So, at K=3, we can classify ‘?’ as ▯.

If we increase the diameter of the circle so that K=7, we can classify ‘?’ as △.
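To see how the vote can flip as the circle grows, here is a minimal sketch. The point coordinates are hypothetical, invented only to reproduce the rectangle-at-3, triangle-at-7 pattern; they are not taken from the image.

```python
from collections import Counter
import math

# Hypothetical 2-D points, invented for illustration only.
points = [
    ((1.0, 0.0), "rectangle"),
    ((0.0, 1.0), "rectangle"),
    ((-1.5, 0.0), "triangle"),
    ((0.0, 2.0), "triangle"),
    ((2.0, 0.5), "triangle"),
    ((-2.0, 1.0), "triangle"),
    ((1.5, -2.0), "rectangle"),
]
query = (0.0, 0.0)  # the unknown '?' point

def vote(k):
    """Return the majority label among the k points closest to the query."""
    nearest = sorted(points, key=lambda p: math.dist(p[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

print(vote(3))  # rectangle
print(vote(7))  # triangle
```

With these made-up points, the three closest neighbors are mostly rectangles, while all seven together contain more triangles, so the predicted class changes with k.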

**Now, the question is: how do we choose the value of ‘K’?**

→ The class of the unknown data point was ▯ at k=3 but changed at k=7, so which k should we choose?

- Two rules of thumb for choosing ‘k’:

→ *Start near sqrt(n), where n is the total number of data points.*

→ *Pick an odd value of ‘k’ to avoid a tied vote between two classes.*

{The above two are the most important points, please read carefully.}
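The two rules above can be combined into a small helper. This is only a sketch of the rule of thumb: the function name `suggest_k` is hypothetical, and the value it returns is a starting point for tuning, not a guaranteed best ‘k’.

```python
import math

def suggest_k(n_samples):
    """Rule-of-thumb starting k: round sqrt(n), then bump to the next odd number."""
    k = max(1, round(math.sqrt(n_samples)))
    if k % 2 == 0:
        k += 1  # an odd k avoids a tied vote between two classes
    return k

print(suggest_k(100))  # 11
print(suggest_k(50))   # 7
```

In practice you would still compare a few values around this one on held-out data and keep whichever gives the best accuracy.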

**How does KNN algorithm work?**

→ Consider a dataset with several variables; let’s look at that first.

We will focus only on the number of matches played and the runs scored, that is, the ‘Mat’ and ‘Runs’ columns. On that basis we will decide whether a cricketer is a professional or a novice.

Now, on the basis of the given data, we have to classify the player below as a Professional Player or a Normal Player.

We will focus on ‘Mat = 89’ and ‘Runs = 12,344’.

→ To find the nearest neighbors, we will calculate Euclidean Distance.

What is Euclidean distance?

- According to the Euclidean distance formula, the distance between two points in the plane with coordinates (x, y) and (a, b) is given by:

dist = root{(x - a)² + (y - b)²}

Let’s calculate it to understand clearly:

dist(d1) = root{(12344 - 15921)² + (89 - 200)²} ≈ 3578.72
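The same calculation can be checked in a few lines of Python. This is a minimal sketch; the helper name `euclidean` and the variable names are assumptions, while the numbers are the ones used in the calculation above.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points with the same number of coordinates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

new_player = (12344, 89)   # (Runs, Mat) of the player to classify
first_row = (15921, 200)   # (Runs, Mat) of the first row, d1

print(round(euclidean(new_player, first_row), 2))  # 3578.72
```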

Similarly, calculate the distance to every other point. Doing so gives a Euclidean Distance column like the one below.

Now, how do we choose ‘k’?

The question is still the same. Here is the answer:

So, the majority of the nearest neighbors point toward ‘Professional’.

And if we look at the runs, they match a professional’s score.

Hence, as per the KNN algorithm, the class of (89, 12344) should be Professional, so Virat Kohli is a professional player.

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

**Summarizing :-**

- A positive integer k is specified, along with a new sample.
- We select the ‘k’ entries in our database which are closest to the new sample.
- We find the most common classification of these entries.
- This is the classification we give to the new sample.
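The steps above can be sketched as one small function. This is an illustrative sketch, not the notebook's implementation: the name `knn_classify` is hypothetical, and the (Mat, Runs) rows and labels below are invented for the example rather than taken from the article's table.

```python
from collections import Counter
import math

def knn_classify(samples, labels, new_sample, k):
    """Classify new_sample by majority vote among its k nearest samples."""
    if k < 1 or k > len(samples):
        raise ValueError("k must be between 1 and the number of samples")
    # Pair every stored sample's distance to the new sample with its label.
    distances = sorted(
        (math.dist(sample, new_sample), label)
        for sample, label in zip(samples, labels)
    )
    # Keep the labels of the k closest entries and take the most common one.
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical (Mat, Runs) rows, invented for illustration only.
samples = [(150, 9000), (120, 8000), (180, 11000), (20, 300), (35, 700), (10, 150)]
labels = ["Professional"] * 3 + ["Normal"] * 3

print(knn_classify(samples, labels, (89, 12344), k=3))  # Professional
```

Sorting every distance is fine for a small table like this one; library implementations usually rely on faster neighbor searches for large datasets.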

Here is a link to an implementation of KNN.

**Diabetes Prediction using KNN algorithm.**

Link → https://colab.research.google.com/drive/1RphoDIVYmL0x2ooSt64oNRH34fi_gBGU?usp=sharing

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

If there’s something wrong, please tell me; I’ll be happy to learn.

My LinkedIn account is waiting for you.

Thank You!
