An executive’s guide to how neural networks work, without the jargon and complicated maths
Neural networks, deep learning, reinforcement learning — all seem complicated, and the barrier to entry in understanding how these things work can seem too high. In this article, I’m going to explain the mechanics of a neural network in an intuitive way, using a worked example.
Lots of explanations try to relate how a neuron in the brain works to an artificial neural network (ANN). However, unless you did biology, medicine, or neuroscience as a degree, you probably don’t know how a neuron works in the brain, so it doesn’t help you. I did a medical degree with a specialism in neuroscience, and I found the explanations of neurons ‘identifying straight lines, and loops’ completely baffling, so don’t feel disheartened. For me, it only made sense when I could understand the basic calculation the network is completing.
I’ll use 2 examples: one to predict the price of a flight (a regression problem), and one to identify whether someone will default on their next credit card payment (a classification problem).
A regression example — predicting the price of a flight
Step 1: Gather the inputs
First, we want to think about what data we can use to help us predict the price of flights. For our example we will use 2 inputs: the distance of the flight in miles, and the typical proportion of seats that are booked on this route (i.e., the utilisation).
We’ll start with a flight between London and Tokyo, where the real price is £900. The distance between London and Tokyo is 5,936 miles, and typically 90% of the seats on this route are booked.
Step 2: Assign weights
We need to do something to those inputs to get to a price of £900. To do that we introduce weights. The network will start by randomly assigning weights to each input. For now, we will start with 0.2 for the distance, and 6 for the utilisation of the flight. We then multiply the inputs by the weights and add them up.
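The first guess can be reproduced in a few lines of Python. This is a minimal sketch. It assumes the 90% utilisation is entered as the fraction 0.9, since the article does not show how it was encoded. The variable names are illustrative.

```python
# Inputs for the London to Tokyo flight
distance_miles = 5936
utilisation = 0.9  # assumption: 90% entered as a fraction

# Starting weights from the example
w_distance = 0.2
w_utilisation = 6

# Multiply each input by its weight and add the results
prediction = distance_miles * w_distance + utilisation * w_utilisation
print(round(prediction, 2))  # about £1,192.60
```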
For a first guess, this isn’t terrible, but we can do better.
Step 3: Introduce a minimum threshold
When we think about it more, there is a minimum price for operating a flight because of fixed costs. Regardless of whether a flight goes to Edinburgh or Tokyo, Heathrow will charge an airline a fixed amount for the landing slot, landing taxes, and the costs of the baggage handlers / terminal (ignoring the nuance of short haul versus long haul UK air taxes for now!). That means that while the relationship between miles and price should roughly be a straight line, it won’t go through 0 on the y axis.
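We add this fixed amount to our neural network as one extra number, called the bias. The bias is added on top of the weighted sum of the inputs, so the prediction no longer has to start from 0.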
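Written out as a formula, where x1 is the distance, x2 the utilisation, w1 and w2 their weights and b the bias, the calculation becomes the following. The numbers assume the utilisation is entered as 0.9, and the article does not state the value of b.

```latex
\text{predicted price} = w_1 x_1 + w_2 x_2 + b = 0.2 \times 5936 + 6 \times 0.9 + b
```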
Step 4: Calculate the error
We know the flight from London to Tokyo cost £900, but our network is currently predicting £1,393. That means it is £493 too high. The closer the prediction is to £900, the better the model is performing.
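The article does not state the bias it used. However, a bias of 200 (with utilisation entered as 0.9) reproduces the £1,393 prediction, so this sketch assumes that value. It computes the prediction and how far it is from the real price.

```python
distance_miles = 5936
utilisation = 0.9   # assumption: 90% entered as a fraction
w_distance = 0.2
w_utilisation = 6
bias = 200          # assumption: not stated in the article, inferred from the £1,393 prediction

actual_price = 900

def predict(w_distance, w_utilisation, bias):
    """Weighted sum of the inputs plus the bias."""
    return distance_miles * w_distance + utilisation * w_utilisation + bias

prediction = predict(w_distance, w_utilisation, bias)
error = prediction - actual_price  # positive means the model guessed too high
print(f"prediction £{prediction:,.0f}, error £{error:,.0f}")  # prediction £1,393, error £493
```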
Step 5: Adjust the weights
Now the model will adjust the weights. Let’s see what happens if we use 0.1 for the distance and keep 6 on the utilisation.
Better, but still not perfect. The model will continue to adjust the weights, until it can get as close as possible to £900.
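Real networks adjust their weights with gradient descent, which this article deliberately leaves out. As a crude stand-in, this sketch keeps the utilisation weight at 6 and the assumed bias at 200. It then tries a range of distance weights and keeps the one whose prediction is closest to £900. It also confirms that a distance weight of 0.1 gives a prediction of about £799.

```python
distance_miles = 5936
utilisation = 0.9   # assumption: 90% entered as a fraction
w_utilisation = 6
bias = 200          # assumption: inferred, not stated in the article
actual_price = 900

def predict(w_distance):
    """Prediction for the London to Tokyo flight with a given distance weight."""
    return distance_miles * w_distance + utilisation * w_utilisation + bias

# The adjusted weight from the example: 0.1 instead of 0.2
print(round(predict(0.1)))  # 799, now too low by about £101

# Brute-force search over candidate distance weights 0.05, 0.06, ..., 0.20
candidates = [round(0.01 * i, 2) for i in range(5, 21)]
best_weight = min(candidates, key=lambda w: abs(predict(w) - actual_price))
print(best_weight, round(predict(best_weight)))  # 0.12 918
```

Trying every combination like this quickly becomes impossible once a network has many weights, which is why gradient descent is used in practice.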
I left a few things out of this example to keep it simple to understand. The distance of the flight has far more effect in this model than the utilisation, because its numbers are much bigger (thousands of miles, compared with a utilisation of 90%). Normally you would scale the data so that all the inputs are in the same range. I’ve also not talked about how the weights are adjusted, as the maths for this gets quite complex.
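One common way to scale inputs is min-max scaling, which maps each input onto the range 0 to 1. This sketch assumes a hypothetical range of 0 to 10,000 miles for flight distances. In practice, the minimum and maximum would come from the training data.

```python
def min_max_scale(value, minimum, maximum):
    """Map value onto the range 0 to 1, given the smallest and largest values seen."""
    return (value - minimum) / (maximum - minimum)

# Hypothetical range of flight distances, for illustration only
scaled_distance = min_max_scale(5936, minimum=0, maximum=10_000)
utilisation = 0.9  # assumption: already a fraction between 0 and 1

print(scaled_distance, utilisation)  # both inputs now on a similar scale
```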
A classification example — predicting if someone will default on their credit card payment
The example above was essentially a linear regression problem. Neural networks are used for regression problems, but it is more common to use them for classification problems. This means we want to identify which category an image, customer, card transaction, etc. belongs to.
We want to predict if a customer will default on their next credit card payment.
Step 1: Gather the inputs
We start by identifying data that can help us predict whether a customer will default. For this example, we will use 3 inputs: the customer’s credit score, the number of payments they have missed in the past, and their income.
There are 2 customers: one who looks intuitively risky and one who doesn’t.
We’ll start with customer A, who intuitively looks risky: their credit score is quite low, their income is below average, and they have missed 5 payments before.
Step 2: Assign weights and bias
Next, we randomly assign weights to each input, as before. We multiply the inputs by the weights and add them together.
This results in a score of 340.
Next let’s see what this looks like for customer B. This customer intuitively looks less risky; they have a credit score of 600, no missed payments and an income of 50,000.
This results in a score of -370.
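The scoring step is the same weighted sum plus bias as in the flight example. The article does not give the weights it used or customer A’s exact figures. The weights and bias below are therefore hypothetical, picked only so that customer B’s score comes out at -370, and they will not reproduce customer A’s score of 340.

```python
def score(inputs, weights, bias):
    """Multiply each input by its weight, add them up, then add the bias."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias

# Hypothetical weights for: credit score, missed payments, income
weights = [-0.5, 100, -0.001]
bias = -20  # hypothetical

customer_b = [600, 0, 50_000]  # credit score, missed payments, income
print(score(customer_b, weights, bias))
```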
Step 3: Activation
In the regression example, the number the network produced was meaningful in itself: the price of a flight. In this example, what does a score of 340 represent? Is the customer likely to default on their next payment or not?
To answer this, we use a sigmoid activation function. This sounds complicated, but all it does is take the number given to it and squash it into the range 0 to 1. That means we can treat the output as a probability: the closer it is to 1, the more likely it is that the customer will miss their next payment.
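The sigmoid function is 1 / (1 + e^(-x)). This sketch applies it to the two scores from the example. It is written in a numerically stable form so that very large negative scores do not overflow `math.exp`.

```python
import math

def sigmoid(x):
    """Squash any number into the range 0 to 1."""
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    # For negative x, rearrange to avoid computing exp of a huge positive number
    e = math.exp(x)
    return e / (1 + e)

for customer, customer_score in [("A", 340), ("B", -370)]:
    probability = sigmoid(customer_score)
    print(customer, f"{probability:.3f}")  # A 1.000, then B 0.000
```

The sigmoid only ever approaches 0 and 1 without reaching them, but for scores this large the outputs round to 1 and 0.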
When we apply this to customer A’s score of 340, the output is effectively 1. The model predicts that the customer is likely to default on their next payment.
Conversely, customer B’s score of -370 gives an output of effectively 0, which means the model predicts that they are unlikely to default on their next payment.
Step 4: Calculate the loss and adjust the weights
Next, we need to work out how accurate the predictions were. For customer B, we were correct: the customer did not default on their next payment. For customer A, despite the warning signs, the customer also did not default, so that prediction was wrong. For now, let’s keep it simple and count the number of customers we predicted correctly.
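To turn each probability into a yes/no prediction, this sketch assumes a threshold of 0.5, a common convention that the article does not state. It then counts how many predictions match what actually happened.

```python
# Predicted probabilities of default from the sigmoid step, rounded
probabilities = {"A": 1.0, "B": 0.0}
# What actually happened: neither customer defaulted
actual_default = {"A": False, "B": False}

THRESHOLD = 0.5  # assumption: a common cut-off, not taken from the article

correct = 0
for customer, probability in probabilities.items():
    predicted_default = probability >= THRESHOLD
    if predicted_default == actual_default[customer]:
        correct += 1

accuracy = correct / len(probabilities)
print(f"{correct} of {len(probabilities)} correct, accuracy {accuracy:.0%}")  # 1 of 2 correct, accuracy 50%
```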
We only predicted half of the customers correctly, which isn’t great. The model will automatically adjust the weights to make its predictions as accurate as possible. In practice, it does this by minimising a loss function rather than by counting correct predictions directly. The way this is done is quite complicated, so I’ve left it out of this article for now.
As with the regression example, this classification example is oversimplified in a few ways. Neither of our training examples defaulted on their next payment. Neural networks need examples of every output you want to predict, and far more than 2 training examples to learn from. All the inputs we gave our model are numerical, whereas in the real world a lot of data is categorical, e.g. city, hair colour, or gender. Categorical data has to be converted into a form that weights can be applied to, i.e. into numbers. There are also several different ways to measure the performance of a model, and the right metric depends on the type of data and the type of model. Finally, we didn’t scale our inputs before passing them to the model, which means some of them have more effect on the output than others.
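One common way to turn a categorical input into numbers is one-hot encoding. Each category gets its own column, which is set to 1 for the matching category and 0 otherwise. Here is a minimal sketch with a hypothetical list of cities:

```python
def one_hot(value, categories):
    """Return a list with a 1 in the position of value and 0 everywhere else."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if category == value else 0 for category in categories]

cities = ["Edinburgh", "London", "Tokyo"]  # hypothetical list of categories
print(one_hot("London", cities))  # [0, 1, 0]
```

Numbering the cities 1, 2 and 3 instead would wrongly suggest an order, for example that Tokyo is "bigger" than London. This is why one-hot encoding is often preferred for categories with no natural order.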
I like to write about data science for business users and I’m passionate about using data to deliver tangible business benefits.