Imagine looking at a sloppy, low-resolution image of the number 6. Despite its low quality, your brain has no difficulty identifying it as a 6. It’s a remarkable feat your brain performs without much effort, yet programming a computer to recognize digits has been quite another matter. And this is where networks of neurons come in.
This article explains the concept of neural networks in simple language, without requiring any prior knowledge. I’ll be using the classic example of recognizing handwritten digits to help you understand how neural networks function. This example is fantastic for beginners, and it’s an excellent way to get started with the topic!
In a neural network, each input to an artificial neuron has a weight assigned to it. Each input is multiplied by its corresponding weight, the products are added together, and a bias is added to the result. This is called a weighted sum.
The weighted sum is then passed through an activation function, which is usually a non-linear function. If you’re curious about activation functions, I wrote an article on the topic that I highly recommend checking out!
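To make this concrete, here is a minimal sketch of what a single artificial neuron computes, written in plain Python with NumPy (the specific input values, weights, and bias below are made up purely for illustration):
import numpy as np

def sigmoid(z):
    # A common non-linear activation function
    return 1 / (1 + np.exp(-z))

# Arbitrary example inputs, weights, and bias
inputs = np.array([0.5, 0.3, 0.2])
weights = np.array([0.4, 0.7, 0.2])
bias = 0.1

# Weighted sum: multiply each input by its weight, add them up, then add the bias
weighted_sum = np.dot(inputs, weights) + bias

# Pass the weighted sum through the activation function
output = sigmoid(weighted_sum)
print(output)  # the neuron's activation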
The first layer of a neural network is the input layer, which receives the data provided to the network. The final layer is the output layer. The number of neurons in the input and output layers depends on your dataset and the problem you are trying to solve, and so do the number of hidden layers and the number of neurons in each hidden layer.
The layers are fully connected: every output from the first hidden layer feeds into the first neuron of the second hidden layer, and likewise into the second neuron, the third neuron, and so on.
I hope I didn’t blow your mind with this. But just in case you’re still struggling to picture it, here’s a handy visualization to help you out.
Let’s take the example of recognizing handwritten digits to illustrate the structure of a neural network.
The network consists of an input layer, a series of hidden layers, and an output layer. The input layer corresponds to the 28 by 28 pixel grid of the handwritten digit image, which has 784 neurons in total, with each neuron holding a number that represents the grayscale value of the corresponding pixel.
The output layer has ten neurons, each representing one of the digits. The hidden layers are located between the input and output layers and are the key to the neural network’s ability to learn and recognize patterns.
In this example, we have two hidden layers, each with 16 neurons. The activations in one layer determine the activations of the next layer, and this process continues until the output layer is reached.
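If it helps to see the sizes involved, here is a rough NumPy sketch of how activations flow through this 784 → 16 → 16 → 10 structure (the weights below are random placeholders; a real network learns them during training):
import numpy as np

def relu(z):
    return np.maximum(0, z)

# Random weights and zero biases, just to illustrate the shapes
W1, b1 = np.random.randn(784, 16), np.zeros(16)
W2, b2 = np.random.randn(16, 16), np.zeros(16)
W3, b3 = np.random.randn(16, 10), np.zeros(10)

x = np.random.rand(784)   # a flattened 28x28 image
h1 = relu(x @ W1 + b1)    # first hidden layer: 16 activations
h2 = relu(h1 @ W2 + b2)   # second hidden layer: 16 activations
out = h2 @ W3 + b3        # output layer: 10 scores, one per digit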
In order to train a neural network, it is necessary to provide it with a labeled dataset of examples. Lucky for you, there’s an amazing dataset called MNIST that’s perfect for this task. You can find it on Kaggle and download it from there.
The network then uses this data to adjust the weights between the neurons. Our goal is to find the optimal weights that enable the network to accurately recognize handwritten digits.
Loss Functions: Measuring the Neural Network’s Error
A loss function is a way of measuring how well the neural network is performing on the given task. It computes the difference between the predicted output and the true output for a given input. Our goal is to minimize the loss function.
There are many types of loss functions that can be used, depending on the problem you are trying to solve. For example, Mean Squared Error (MSE) is commonly used for regression problems, while Cross-Entropy Loss is commonly used for classification problems.
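Here is a toy illustration of both losses in NumPy (the prediction values are my own made-up example, not real network outputs), assuming the true digit is 3:
import numpy as np

# Predicted probabilities for digits 0-9, and the one-hot true label
predicted = np.array([0.05, 0.05, 0.1, 0.6, 0.05, 0.05, 0.02, 0.03, 0.03, 0.02])
true = np.zeros(10)
true[3] = 1.0

# Mean Squared Error: average of the squared differences
mse = np.mean((predicted - true) ** 2)

# Cross-Entropy Loss: heavily penalizes low probability on the correct class
cross_entropy = -np.sum(true * np.log(predicted))

print(mse, cross_entropy)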
Backpropagation: Adjusting Weights to Minimize Loss
Once you’ve measured the neural network’s performance using a loss function, you can start optimizing it using backpropagation.
Backpropagation is the process of adjusting the weights between the neurons to minimize the loss function. It involves calculating the gradient of the loss function with respect to each weight in the network, and then adjusting each weight in the opposite direction of its gradient.
This process is repeated over many iterations, with the weights being adjusted after each iteration, until the loss function is minimized.
There are several algorithms that can be used to optimize the loss function during backpropagation. For example, Stochastic Gradient Descent (SGD) is a popular algorithm that adjusts the weights in small steps based on the gradient of the loss function. I will cover that topic in my upcoming articles, so follow me to get notified when it’s posted.
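To give a feel for what “adjusting the weights in the opposite direction of the gradient” looks like, here is a bare-bones sketch with a single weight (the toy loss function and learning rate are invented for the example; in a real network, backpropagation computes the gradient for every weight):
# Minimize a toy loss, loss(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w = 0.0
learning_rate = 0.1

for step in range(50):
    gradient = 2 * (w - 3)            # derivative of the loss w.r.t. w
    w = w - learning_rate * gradient  # step opposite the gradient

print(w)  # approaches 3, the value that minimizes the loss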
Given that we haven’t covered all the basics, such as the mathematical formulas for loss functions and how SGD (Stochastic Gradient Descent) works, we will be coding this project using TensorFlow. However, as a beginner, it is highly recommended to first try building a neural network from scratch to gain a better understanding of how things work.
Step 1: Install the Required Packages
Before we can start building our model, we need to install the required packages. Run the following command in your terminal or command prompt:
pip install tensorflow
Step 2: Load the Dataset
Next, we need to load the MNIST dataset. You can either load it using the keras.datasets module or use the data you downloaded from Kaggle. Then normalize the pixel values so they fall between 0 and 1.
import tensorflow as tf

# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# Normalize the pixel values
x_train = x_train / 255.0
x_test = x_test / 255.0
Step 3: Build the Model
Build a simple neural network with two hidden layers of 16 neurons each.
# Build the model
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),   # flatten each 28x28 image into 784 values
    tf.keras.layers.Dense(16, activation='relu'),    # first hidden layer
    tf.keras.layers.Dense(16, activation='relu'),    # second hidden layer
    tf.keras.layers.Dense(10, activation='softmax')  # output layer: one neuron per digit
])
Step 4: Compile the Model
Compile the model, specifying the loss function, optimizer, and metrics.
# Compile the model
model.compile(
    optimizer=tf.keras.optimizers.SGD(),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
Step 5: Train the Model
Finally, we can train our model on the training data. To do this, we simply call the fit method on the model object, passing in the training data, labels, and the number of epochs (iterations over the entire dataset).
# Train the model
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
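As a sanity check (this isn’t part of the five steps above, but it’s a natural follow-up), you can evaluate the trained model on the test set and try a prediction:
# Evaluate on the test set
test_loss, test_acc = model.evaluate(x_test, y_test)
print('Test accuracy:', test_acc)

# Predict the digit for the first test image
import numpy as np
predictions = model.predict(x_test[:1])
print('Predicted digit:', np.argmax(predictions[0]))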
If you have any questions or comments about the topics covered in this post, please leave them in the comment section below. I’d love to hear from you and provide any additional information or clarification that you need.
Thank you for taking the time to read my post. I hope you found it informative and enjoyable. If you want to remain up-to-date on my latest articles and never miss a story, be sure to follow me on Medium.