Foundational concepts in the fields of Machine Learning and Deep Neural Networks
Welcome to this Medium article. Perhaps the hottest topic in the world right now is artificial intelligence. When people talk about it, they often talk about machine learning and, specifically, neural networks. Over the last few decades, researchers have abstracted the theory into a basic set of equations that emulate a network of artificial neurons, and they have invented ways to train these systems from data. So, rather than being instructed with explicit rules like a conventional piece of software, a neural network is trained on data. We’re going to learn the very basics: perceptrons, error functions, and other terminology that may not make sense yet. By the end of this article, you should be able to write and train your own neural network. That’s a ton of fun.
Now we will learn how to use one of the most exciting tools in self-driving car development: deep neural networks. A deep neural network is just a term for a big, multi-layer neural network. A neural network is a machine learning algorithm that you can train using inputs like camera images or sensor readings to generate outputs like what steering angle the car should set or how fast it should go on the road. The idea is that the neural network learns from observing the world. You don’t have to teach it anything specific.
Deep learning is just another term for using deep neural networks to solve problems and it’s become really important for self-driving cars. But deep learning is relatively new. Until the last few years, computers simply weren’t fast enough to train deep neural networks effectively. Now, however, automotive manufacturers can apply deep learning techniques to drive cars in real time.
Because deep learning is so new, automotive engineers and researchers are still experimenting with just how far it can take us. But deep learning has already revolutionized segments of autonomous driving like computer vision, and it has the potential to entirely change the way we develop self-driving cars. Some of the most important recent breakthroughs in the performance of self-driving cars have come from machine learning.
Foundational concepts in the fields of machine learning and deep neural networks
Machine learning is a field of artificial intelligence in which computers learn about the environment from data, instead of relying on rules set by computer programmers. Deep learning is an approach to machine learning that uses deep neural networks. It uses this one tool to accomplish an amazing array of objectives, from speech recognition to driving a car.
We will begin with the perceptron, which is the fundamental unit of a neural network, and learn how to combine these units into a simple neural network. Before we start learning about neural networks, though, let’s go over the basics of machine learning 😄
So you may be wondering: why are these objects called neural networks?
Well, they’re called neural networks because perceptrons kind of look like neurons in the brain. A perceptron calculates some equation on its inputs and decides to return a one or a zero. In a similar way, neurons in the brain take inputs coming from the dendrites. These inputs are nervous impulses. The neuron does something with the nervous impulses and then decides whether or not to output a nervous impulse through the axon. We create neural networks by chaining these perceptrons together, mimicking the way the brain connects neurons: the output of one becomes the input of another.
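To make this concrete, here is a minimal sketch of a single unit in plain Python. It assumes the usual formulation (a weighted sum of the inputs plus a bias, compared against zero); the function name `perceptron` and all the weights below are hand-picked for illustration and are not learned from data. The `xor` helper shows the chaining idea: the outputs of two units become the inputs of a third.

```python
def perceptron(inputs, weights, bias):
    """Return 1 if the weighted sum of the inputs plus the bias is >= 0, else 0."""
    score = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if score >= 0 else 0


def xor(a, b):
    """Chain three perceptrons: two feed their outputs into a third."""
    # Hypothetical hand-picked weights: the first unit acts like OR,
    # the second like NAND, and the third like AND.
    or_out = perceptron([a, b], [1.0, 1.0], -0.5)
    nand_out = perceptron([a, b], [-1.0, -1.0], 1.5)
    return perceptron([or_out, nand_out], [1.0, 1.0], -1.5)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor(a, b))
```

No single unit of this kind can compute XOR on its own, which is a small hint at why connecting units into a network is useful.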
What are Error Functions?
An error function is something that tells how far we are from the actual solution. For example, if I’m here and my goal is to get to a nearby plant, an error function will just tell me the distance from the plant. My approach would then be to look around myself, check in which direction I can take a step to get closer to the plant, take that step and then repeat. Here the error is simply the distance from the plant.
Here is a more concrete illustration of an error function:
We’re standing on top of a mountain, Mount ABC, and we want to descend. It’s not that easy, because it’s cloudy and the mountain is very big, so we can’t really see the big picture. To go down, we look around us and consider all the possible directions in which we can walk.
Then we pick the direction that makes us descend the most and take a step in that direction. Thus, we’ve decreased the height. Once we’ve taken the step, we start the process again, and we repeat it over and over, always decreasing the height, until we go all the way down the mountain, thus minimizing the height. In this case, the key metric we use to solve the problem is our height on the mountain. We’ll call the height the error.
The error tells us how badly we’re doing at the moment and how far we are from an ideal solution. If we constantly take steps to decrease the error, then we’ll eventually solve our problem: descending from Mt. ABC. This method is called gradient descent. Some of you may be thinking: wait, that doesn’t necessarily solve the problem. What if I get stuck in a valley, a local minimum, that is not the bottom of the mountain? This happens a lot in machine learning! It’s also worth noting that many times a local minimum will still give us a pretty good solution to a problem.
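The descent can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the "mountain" is a made-up one-dimensional curve, `height`, whose lowest point is at x = 3, and the step size (`learning_rate`) and number of steps are arbitrary choices. None of these names or values come from a particular library.

```python
def height(x):
    # Hypothetical mountain profile: a bowl whose bottom is at x = 3.
    return (x - 3.0) ** 2


def slope(x):
    # Derivative of height(x); its sign tells us which way is uphill.
    return 2.0 * (x - 3.0)


def gradient_descent(start, learning_rate=0.1, steps=100):
    """Repeatedly take a small step in the downhill direction."""
    x = start
    for _ in range(steps):
        x -= learning_rate * slope(x)
    return x


bottom = gradient_descent(start=10.0)
print("position:", bottom, "height:", height(bottom))
```

This curve has a single valley, so the walk ends at the true bottom. On a bumpier curve, the same loop could settle in a local minimum instead.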
Let’s say we have a machine learning model that predicts whether we receive a gift or not. The model makes predictions in the following way: it says the probability that we get a gift is 0.8, which automatically implies that the probability that we don’t receive a gift is 0.2.
And what does the model do?
The model takes some inputs. For example: is it your birthday, or have you been good all year? Based on those inputs, the model calculates a linear function, which gives a score. Then the probability that we get the gift is simply the sigmoid function applied to that score.
Now, what if we had more options than just getting a gift or not? Let’s say we have a model that tells us what animal we just saw, and the options are a duck, a beaver and a walrus. We want a machine learning model that gives an answer along the lines of: the probability of a duck is 0.67, the probability of a beaver is 0.24, and the probability of a walrus is 0.09. Notice that the probabilities need to add to one.
Let’s say we have a linear model based on some inputs. The inputs could be: Does it have a beak or not? Number of teeth. Number of feathers. Hair or no hair. Does it live in the water? Does it fly? We calculate a linear function of those inputs for each animal, and let’s say we get some scores: the duck gets a score of two, the beaver gets a score of one, and the walrus gets a score of zero. Now the question is, how do we turn these scores into probabilities?
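Going back to the gift example for a moment, the two-step recipe (a linear score, then the sigmoid) can be sketched in Python as follows. The weights and bias here are hypothetical values picked by hand so that the answer lands near the 0.8 used above; a real model would learn them from data.

```python
import math


def sigmoid(score):
    """Squash any real-valued score into a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-score))


def gift_probability(is_birthday, been_good, weights=(1.0, 1.0), bias=-0.6):
    # Linear model: weighted sum of the inputs plus a bias.
    # The weights and bias are illustrative assumptions, not learned values.
    score = weights[0] * is_birthday + weights[1] * been_good + bias
    return sigmoid(score)


p_gift = gift_probability(is_birthday=1, been_good=1)
print("P(gift)    =", round(p_gift, 2))
print("P(no gift) =", round(1.0 - p_gift, 2))
```

A score of zero maps to a probability of exactly 0.5, and larger scores push the probability toward one.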
The first requirement, as we said, is that the probabilities add to one. The scores two, one, and zero do not add to one. The second requirement is that the order is preserved: since the duck had a higher score than the beaver, and the beaver had a higher score than the walrus, we want the probability of the duck to be higher than the probability of the beaver, and the probability of the beaver to be higher than the probability of the walrus.
Here’s a simple way of doing it. Let’s take each score and divide it by the sum of all the scores. The two becomes two divided by (2+1+0), the one becomes one divided by (2+1+0), and the zero becomes zero divided by (2+1+0). This kind of works, because the probabilities we obtain are two thirds for the duck, one third for the beaver, and zero for the walrus. But there’s a little problem. Let’s think about it. What could this problem be?
The problem is the following.
What happens if our scores are negative? This is completely possible, since the scores come from a linear function, which can give negative values. What if we had, say, scores of 1, 0 and (-1)? Then the first probability would be one divided by (1+0+(-1)), and that sum is zero. We know very well that we cannot divide by zero. So this unfortunately won’t work, but the idea is good.
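The failure is easy to reproduce. Here is a small sketch of the divide-by-the-sum idea; the function name `naive_normalize` is made up for this illustration.

```python
def naive_normalize(scores):
    """Divide each score by the sum of all scores (breaks when the sum is zero)."""
    total = sum(scores)
    return [s / total for s in scores]


# Works for the duck, beaver, walrus scores: two thirds, one third, zero.
print(naive_normalize([2, 1, 0]))

# Fails for scores of 1, 0 and -1, because they sum to zero.
try:
    naive_normalize([1, 0, -1])
except ZeroDivisionError:
    print("cannot normalize: the scores sum to zero")
```

Even when the sum is not zero, a negative score would produce a negative "probability", which makes no sense either.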
How can we turn this idea into one that works all the time even for negative numbers? Well, it’s almost like we need to turn these scores into positive scores. How do we do this? Is there a function that can help us?
The answer is Softmax.
The softmax function turns a vector of K real values into a vector of K real values that sum to 1. It does this by applying the exponential function to each score, which makes every value positive, and then dividing each result by the sum of all the exponentials. The input values can be positive, zero, negative, or greater than one, but softmax transforms them into values between 0 and 1, so that they can be interpreted as probabilities 😀
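Here is a minimal sketch of softmax in plain Python, assuming the standard definition: exponentiate each score, then divide by the sum of the exponentials. Subtracting the largest score first is a common numerical-stability trick that does not change the result; it is an addition of this sketch, not something the definition requires.

```python
import math


def softmax(scores):
    """Turn a list of real-valued scores into probabilities that sum to 1."""
    largest = max(scores)
    # Shifting by the largest score avoids overflow in exp() for big inputs.
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Duck, beaver, walrus scores from the example above.
probs = softmax([2, 1, 0])
print([round(p, 2) for p in probs])

# Negative scores are no longer a problem.
print([round(p, 2) for p in softmax([1, 0, -1])])
```

For the scores two, one, and zero, this gives roughly 0.67, 0.24, and 0.09, the same numbers used in the animal example. The scores 1, 0, and -1 give the same probabilities, because softmax only depends on the differences between scores.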
With this, we have come to the end of this article. Thanks for reading and following along. Hope you loved it!
My Linkedin 🙂