Here are another 10 terms (61–70) in 5 minutes to give you an overview of the data world from Data Science to Data Analytics to Data Engineering:
61. Regression
Once you’ve decided that your problem is a supervised one, you then need to decide whether to solve it using regression or classification.
You use regression if your target variable is numeric and classification if it is a category.
For example, if you want to predict incomes based on a series of factors, you’d be solving a supervised regression problem because incomes are numeric.
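As a minimal sketch of supervised regression, the snippet below fits a straight line to a handful of made-up (years of experience, income) pairs using ordinary least squares. The data and the function name `fit_line` are invented for illustration; a real problem would use many more factors and most likely a library such as scikit-learn.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up training data: years of experience -> yearly income
years = [1, 2, 3, 4, 5]
income = [35000, 40000, 45000, 50000, 55000]

slope, intercept = fit_line(years, income)
predicted = slope * 6 + intercept  # numeric prediction for 6 years of experience
print(round(predicted))  # 60000
```

The key point is that the output is a number on a continuous scale, not a category.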
62. Support Vector Machines
A support vector machine (SVM) is one of the most common algorithms in machine learning.
It works by trying to find the boundary that leaves the biggest gap (the margin) between the groups inside a data set.
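To make the "biggest gap" idea concrete, here is a toy sketch in one dimension, assuming two groups that can be cleanly separated. A real support vector machine solves an optimization problem over many dimensions and can cope with overlapping groups; the hypothetical `max_margin_threshold` function below only mirrors the core idea of placing the boundary in the middle of the widest gap, which is set by the closest points of each group (the support vectors).

```python
def max_margin_threshold(group_a, group_b):
    """Toy 1-D maximum-margin boundary for two separable groups.

    Assumes every value in group_a is smaller than every value in group_b.
    Returns (threshold, margin).
    """
    support_a = max(group_a)  # point of group A closest to group B
    support_b = min(group_b)  # point of group B closest to group A
    if support_a >= support_b:
        raise ValueError("groups overlap; this toy example needs separable data")
    threshold = (support_a + support_b) / 2
    margin = (support_b - support_a) / 2  # distance from boundary to nearest point
    return threshold, margin

# Made-up data: weights of two kinds of fruit
small = [1.0, 1.5, 2.0]
large = [4.0, 4.5, 6.0]

threshold, margin = max_margin_threshold(small, large)

def classify(x):
    """Assign a new value to a group based on which side of the boundary it falls."""
    return "small" if x < threshold else "large"

print(threshold, margin, classify(2.8))  # 3.0 1.0 small
```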
63. Classification
If your target variable is a category rather than a number, you have a supervised classification problem.
When working with classification problems, there are a couple of ways you can classify an object.
64. Binary Classification
Binary classification is classifying whether an object is one thing or another based on some data, such as a hot dog or not a hot dog.
65. Multi-label Classification
Multi-label classification is another type of classification, where each instance can be assigned several labels at once.
This could be something like predicting both the make and the model of a car bought by an individual, based on what we know about their other buying habits.
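One way to picture the difference is in how the target is stored. In the sketch below, the label names and the `to_multi_hot` helper are made up for illustration: each instance gets a list of 0/1 flags, one per possible label, and more than one flag can be switched on at the same time.

```python
def to_multi_hot(labels, all_labels):
    """Encode a set of labels as a 0/1 vector, one position per possible label."""
    return [1 if label in labels else 0 for label in all_labels]

# Hypothetical label vocabulary describing a car
all_labels = ["SUV", "hybrid", "luxury", "four-wheel drive"]

# Multi-label: one car can carry several labels at the same time
car_labels = {"SUV", "hybrid"}
target = to_multi_hot(car_labels, all_labels)
print(target)  # [1, 1, 0, 0]
```

In binary classification the target would be a single 0 or 1; here it is a whole vector of them.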
66. Logistic Regression
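The confusingly named logistic regression is not a regression algorithm at all: it is a supervised classification algorithm, most commonly used for binary classification.
It predicts the probability that an instance belongs to a class and then turns that probability into a label.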
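Here is a minimal sketch of logistic regression used as a binary classifier, trained with plain gradient descent on one feature. The data, the learning rate, the number of epochs, and the function names are all assumptions picked for illustration, not a recommended setup.

```python
import math

def sigmoid(z):
    """Squash any number into the range (0, 1) so it can be read as a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit weight w and bias b so that sigmoid(w*x + b) approximates P(y=1 | x)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average log loss with respect to w and b
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Made-up data: hours studied -> passed the exam (1) or not (0)
hours = [1, 2, 3, 4, 5, 6]
passed = [0, 0, 0, 1, 1, 1]

w, b = train_logistic(hours, passed)

def predict(x):
    """Turn the predicted probability into a class label using a 0.5 cut-off."""
    probability = sigmoid(w * x + b)
    return 1 if probability >= 0.5 else 0

print([predict(x) for x in hours])
```

The model outputs a probability between 0 and 1, and the 0.5 cut-off is what turns it into a class. That is why it counts as classification despite the name.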
67. Unsupervised Algorithms
Unsupervised algorithms won’t try to match your data to a target variable.
Instead, they’ll try to find their own patterns in the data.
This is useful if you want to cluster your data.
68. Clustering
Clustering is an unsupervised technique where your algorithm tries to sort observations into groups based on how similar they are to one another.
Let’s say we have a table of people’s transaction data for a certain store.
A clustering algorithm can create groups of customers with similar purchasing habits.
You can then label these groups with archetypes to better target particularly desirable customers.
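The customer example can be sketched with K-Means, one common clustering algorithm, chosen here purely for illustration. The customer numbers are made up, and starting from the first k points is a naive shortcut; real implementations pick starting centers more carefully.

```python
def k_means(points, k, iterations=10):
    """Very small K-Means: returns (centers, assignments)."""
    centers = [list(p) for p in points[:k]]  # naive start: the first k points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Step 1: assign every point to its nearest center
        for i, p in enumerate(points):
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            assignments[i] = distances.index(min(distances))
        # Step 2: move each center to the mean of the points assigned to it
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centers[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return centers, assignments

# Made-up customers: (visits per month, average basket value)
customers = [(2, 15), (3, 12), (1, 18), (10, 80), (12, 95), (11, 88)]

centers, groups = k_means(customers, k=2)
print(groups)  # [1, 1, 1, 0, 0, 0]
```

Notice that no target variable is passed in. The group numbers themselves carry no meaning; it is up to you to look at each group and decide what kind of customer it represents.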
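69. K-NN
K-NN refers to an algorithm called K-Nearest Neighbors.
Basically, you define what K should be, and the algorithm labels a new data point by looking at the K closest points it already knows the labels for and picking the most common label among them.
Despite sitting next to clustering in this list, K-NN is a supervised algorithm; the algorithm that creates K centers in your data and groups each point with its nearest center is called K-Means.
Although easiest to illustrate with two-dimensional data, this technique works with data of any number of dimensions.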
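A minimal sketch of K-Nearest Neighbors used for classification, with made-up two-dimensional data. The function name `knn_predict` and the choice of squared Euclidean distance are assumptions for illustration; other distance measures are possible.

```python
from collections import Counter

def knn_predict(training_points, training_labels, new_point, k=3):
    """Label new_point by majority vote among its k nearest training points."""
    # Squared Euclidean distance works for any number of dimensions
    distances = [
        (sum((a - b) ** 2 for a, b in zip(point, new_point)), label)
        for point, label in zip(training_points, training_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Made-up two-dimensional data: (height in cm, weight in kg) -> animal
points = [(25, 4), (30, 5), (28, 6), (60, 30), (65, 35), (70, 32)]
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]

print(knn_predict(points, labels, (27, 5)))   # cat
print(knn_predict(points, labels, (62, 31)))  # dog
```

There is no real training step here: the algorithm just keeps the labeled points around and compares every new point against them.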
70. Reinforcement Learning
Reinforcement learning is a distinct type of machine learning.
It teaches an algorithm (usually called an agent) by setting some desired outcome, rewarding the agent when that outcome is reached, and punishing it when it isn’t.
For a particularly interesting example, check out this video by Two Minute Papers, where a reinforcement learning algorithm is used to train bots to play hide and seek.
As they let the algorithm run for longer, the results start to get very interesting.
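The hide-and-seek bots are far beyond a short snippet, but the reward-and-punish loop can be sketched on a toy problem. Below, an agent in a five-cell corridor learns to walk right toward a goal using Q-learning, one common reinforcement learning algorithm picked here for illustration. The corridor, the reward values, and all the parameter settings are assumptions.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

N_STATES = 5          # positions 0..4 in a corridor; position 4 is the goal
ACTIONS = [-1, +1]    # step left or step right
GOAL = N_STATES - 1

# Q[state][action_index] = how good the agent currently thinks that action is
Q = [[0.0, 0.0] for _ in range(N_STATES)]

alpha, gamma, epsilon = 0.5, 0.9, 0.2  # learning rate, discount, exploration rate

for episode in range(500):
    state = 0
    for _ in range(100):  # cap on episode length
        # Explore sometimes (or when undecided), otherwise take the best-known action
        if random.random() < epsilon or Q[state][0] == Q[state][1]:
            action = random.randrange(2)
        else:
            action = 0 if Q[state][0] > Q[state][1] else 1
        next_state = min(max(state + ACTIONS[action], 0), GOAL)
        # Reward for reaching the goal, small punishment for every other step
        reward = 1.0 if next_state == GOAL else -0.01
        best_next = 0.0 if next_state == GOAL else max(Q[next_state])
        Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
        state = next_state
        if state == GOAL:
            break

# The learned behavior: the preferred action in each non-goal position
policy = ["left" if q[0] > q[1] else "right" for q in Q[:GOAL]]
print(policy)
```

Nobody tells the agent that "right" is the correct answer. It works that out from the rewards and punishments alone, and after training the preferred action in every position should be "right".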