The toughest part of learning anything new is breaking mental inertia and drawing your mind into a cycle of effort and reward. Incorporating anything new into this cycle requires immense practice, which has to be fuelled by persistent motivation. Unfortunately, the current design of social media apps has drastically shortened our attention spans, so many of us struggle with both motivation and attention. Further, when it comes to learning a vast topic like machine learning (ML), there are plenty of challenges to overcome. Some of them are:
- It is a cross-domain topic, requiring knowledge from several domains and sources to get a proper feel for it.
- It is very dynamic. There is something new to learn almost daily, be it frameworks, literature or best practices.
- There are umpteen resources (like this one) that can guide your ML journey. However, I have often found that there is no ‘silver bullet’ that can train you holistically, so you need to choose carefully where to start in order to avoid burning out. Thus, I created this article as a springboard to quick-start the ML journey.
- ML seems very enticing and promising from the outside. It is only when one starts to study it (in a top-to-bottom approach) that one realises there is an arduous journey through a demanding passage of theory and literature before actually getting to behold your own ‘artificially intelligent being’.
I personally believe in following a bottom-up approach when trying to dive into a new domain (starting with the application first). It allows you to realise the potential and the limitations of that domain. Additionally, the sheer joy of quickly seeing the domain applied provides enough motivation to fuel the remaining part of the journey. It is for this very reason that I have curated this application-specific content, especially targeted at absolute beginners who want to break into the domain of ML, ASAP! In this article, we shall build on the brief discussion about ML that we had in an earlier article¹. Just have a quick look at that to get some interesting insights into the WHAT and WHY of ML before we prepare to take off!
Assuming that you have read the earlier mentioned article (along with the disclaimer!), let us begin our ML journey. We shall be creating the most clichéd but fundamental ML model: an image classifier. An image classifier is a model that can categorise given images into different classes. The specifications of our image classifier are as follows:
- The classifier used will be an Artificial Neural Network (ANN)
- The images that we will be classifying shall be from the MNIST dataset
Although we could have used a Convolutional Neural Network (CNN) for this task, I do not want to complicate things unnecessarily for you. At the same time, I also want you to realise how powerful a simple ANN with just a few layers can be (layers have been briefly discussed in the earlier article).
So before we start implementing the ANN, make sure you have a good knowledge of the Python programming language, TensorFlow and Keras, and a personal computer with at least 16GB of RAM… Just kidding! In fact, with a tool like TensorFlow, you do not need to be proficient in a programming language (although proficiency is recommended for greater control over your implementations), nor do you need to know the deeper details of how an ANN works. Additionally, with Google Colab, you do not need a high-end system to run your ML models. It provides a neat and powerful ecosystem in which you can develop your model from scratch. Best of all, Google Colab comes with TensorFlow already installed!
So we start with Google Colab:
1. Search for ‘google colab’ on Google and click on the first link.
2. The Colab page will open. Click on ‘File’ (top left) and click on ‘New notebook’.
A notebook is a beautiful way to write code/text, generate graphs and embed images all in a single workspace.
3. You will be asked to log in to your Google account. Just proceed with that.
4. You will now be presented with a blank notebook: your own personal space to write custom code, create impressive graphic documents and run ML models.
A notebook essentially consists of cells (in a blank notebook, the darker gray rectangle towards the top is one). You can perform the earlier mentioned functions in this cell. Further, you can have several such cells, thereby organising your content better. From a coding perspective, you can run different cells at different times. Thus the whole program can be split into different sections. The results of each cell are carried over to the later cells for use, if you need them!
5. Now let’s write our first piece of code: a program to add two numbers. Write the following in the first cell:
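The snippet for this step is not reproduced in the text. A minimal sketch, assuming the two numbers were 2 and 3 (any pair that sums to 5 would match the output described in this article; the variable names are illustrative):

```python
# Hypothetical values: the original numbers are not shown in the article.
a = 2
b = 3
total = a + b  # add the two numbers
print(total)
```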
You just wrote your first piece of code! Now it’s time to run it. Simply click on the play button at the left of the cell. Another way to run a cell is to press ‘shift’ and ‘enter’ together while the cursor is inside the cell. The notebook might take some time to initialise and allocate the resources to perform the computation for you.
After the cell has finished executing, you will get the output 5! Note that a green tick appears on the left of a cell that has successfully finished execution. Always wait for this green tick to appear before treating the execution of a cell as finished.
Voila! You ran your first piece of code in Colab! (Colab is actively used by ML researchers and enthusiasts, so it’s indeed a milestone in your ML journey). You can now delete this cell to start with a clean slate for your ANN. Simply click on the bin icon that can be found on the top right of the cell.
6. Now that you know how to write and run code in Colab, we start our actual coding for the ANN (in the Python programming language). Our ANN implementation will have the following steps:
- Importing libraries: A library is software already created by someone else that can be reused as required. To use a library, one has to name it at the top of the code so that Python knows which library to use during execution. We use the keyword ‘import’ in Python to name the libraries to be used.
- Loading dataset: As discussed in the earlier article¹, data is used to train an ANN. We shall be using the MNIST dataset to train our ANN. The MNIST dataset is a collection of 70,000 images, of which 60,000 are generally used for training and the remaining 10,000 for testing the ANN. All the images are of handwritten decimal digits ranging from 0 to 9.
- Defining the model: A model needs to be defined before it can be trained. This requires one to define how many layers of which types the ANN should have, how many neurons each layer should have and which activation function each layer will use.
- Compiling the model: Compilation requires one to define the loss and optimisation parameters of the model. These are nothing but methods to cut the stencil¹.
- Fitting the model: The model is made to ‘fit’ the dataset by feeding it the data and running multiple cycles of updates and corrections over the model. This step is also known as training the model.
- Prediction: This is the last part where you can use your trained model to perform predictions for you. An image is given to the model for which you already know the class. The class predicted by the model can be compared with its ‘true’ class.
Note: We have not included certain steps (like cross validation and evaluation) to keep things simpler for now. They will be discussed later.
So let’s finish coding our ANN:
1. Importing libraries: Copy the below content in the cell and run it.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense,Flatten,Conv2D,MaxPooling2D
from tensorflow.keras.datasets import mnist
import matplotlib.pyplot as plt
import numpy as np
We will use the TensorFlow library here. From TensorFlow we import the ‘Sequential’ model, which is just a style of creating ML models. Then we import different types of layers (Dense, Flatten, Conv2D, MaxPooling2D); do not worry about the ‘types’. Our ANN uses only Dense and Flatten, while Conv2D and MaxPooling2D belong to CNNs and are imported here but left unused. Thereafter we import the ‘mnist’ dataset, which contains the actual images to train our ANN. TensorFlow has so much to provide (both the ANN components and the dataset)! Then we import the ‘matplotlib’ library, which allows one to display graphs and images by converting numbers or matrices of numbers to images. Lastly we import the ‘numpy’ library, which makes manipulation of matrices very easy. Remember, images here are ultimately grids or matrices of numbers, over which operations can be done, yielding more matrices.
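To make the "images are matrices of numbers" idea concrete, here is a small sketch. The 3×3 grid below is made up for illustration (real MNIST images are 28×28), and the variable names are not part of the original code:

```python
import numpy as np

# A made-up 3x3 grayscale "image": each number is a pixel intensity (0 to 255).
image = np.array([[  0, 128, 255],
                  [ 64, 128, 192],
                  [255, 128,   0]])

print(image.shape)      # (rows, columns) of the grid

inverted = 255 - image  # an operation on a matrix yields another matrix
flipped = image.T       # transpose: rows become columns

print(inverted)
print(flipped)
```

Passing such a grid to ‘plt.imshow()’ would draw it as a picture, which is how matplotlib turns numbers into images.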
2. Loading dataset: Copy the below content in the next cell and run it.
(x_train,y_train),(x_test,y_test) = mnist.load_data()
x_train = x_train/255
x_test = x_test/255
‘mnist.load_data()’ simply loads the MNIST dataset into your Colab session. The ‘loading’ is done into different variables (containers of content):
2.1 x_train: a variable that stores all the images for training the ANN, also called the training samples.
2.2 y_train: a variable that stores all the labels (class numbers) of the training samples.
2.3 x_test: a variable that stores all the images for testing the ANN, also called the testing samples. We will not use it in this article, though.
2.4 y_test: a variable that stores all the labels (class numbers) of the testing samples.
Having loaded the data into these variables, you can now use the images of this dataset. The division by 255 in the next two lines simply normalises every pixel’s value (intensity) in the training and testing images (‘x_train’, ‘x_test’), so that the values lie between 0 and 1 instead of between 0 and 255. This helps the model train efficiently and quickly.
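The effect of the division by 255 can be checked on a small stand-in array. This sketch uses two randomly generated 28×28 images instead of the real ‘x_train’ (an assumption made only so that it runs without downloading MNIST):

```python
import numpy as np

# Stand-in for x_train: two random 28x28 images with integer pixels in 0..255.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(2, 28, 28), dtype=np.uint8)

scaled = fake_images / 255  # same operation as x_train/255

# Before: whole numbers between 0 and 255. After: decimals between 0 and 1.
print(fake_images.dtype, fake_images.min(), fake_images.max())
print(scaled.dtype, scaled.min(), scaled.max())
```

With the real dataset, ‘x_train’ has the shape (60000, 28, 28): 60,000 images of 28×28 pixels each.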
3. Defining the model: Copy the below content in the next cell and run it.
model = Sequential([
    Flatten(input_shape = (28,28)),    # 28x28 image -> 784 numbers
    Dense(32, activation = 'relu'),    # hidden layer with 32 neurons
    Dense(10, activation = 'softmax')  # one output neuron per digit class
])
As can be observed, we create a ‘model’ in the ‘Sequential’ style (there are other styles too). We also define the layers of the model. The first layer is a ‘Flatten’ layer, which converts our 2-dimensional image (28×28 pixels) into a 1-dimensional array of 784 numbers that can be fed to the rest of the ANN. Then we have a ‘Dense’ layer with 32 neurons. Each of these neurons takes all the pixel values of our image as input, computes a weighted sum of them, applies an activation function to the sum (the ‘relu’ activation function outputs zero for negative sums and passes positive sums through unchanged) and gives out the result. Finally we have the last ‘Dense’ layer, which has 10 neurons, one per class. Each of these neurons is fed the outputs of the previous layer, and the neuron corresponding to the class of the input image gives a high output. For instance, if the input image is of class 3, then the neuron for class 3 (the fourth neuron, since the classes start at 0) will have a high output (the ‘softmax’ activation function turns the 10 outputs into probabilities that sum to 1, which helps achieve this).
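What these three layers compute can be sketched in plain numpy. This is only an illustration of the arithmetic, not how Keras is implemented internally; the weights are random and untrained, and all names below are made up for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((28, 28))            # stand-in for one normalised image

# Flatten: 28x28 grid -> 784 numbers in a row
flat = image.reshape(-1)

# Dense(32, relu): each neuron takes a weighted sum of all 784 inputs plus a bias
w1 = rng.normal(0, 0.05, (784, 32))     # random, untrained weights
b1 = np.zeros(32)
hidden = np.maximum(0, flat @ w1 + b1)  # relu: negative sums become 0

# Dense(10, softmax): one output per digit class
w2 = rng.normal(0, 0.05, (32, 10))
b2 = np.zeros(10)
scores = hidden @ w2 + b2
probs = np.exp(scores - scores.max())
probs = probs / probs.sum()             # softmax: positive outputs that sum to 1

print(flat.shape, hidden.shape, probs.shape)
print("sum of outputs:", probs.sum())
print("numbers to be learnt:", w1.size + b1.size + w2.size + b2.size)
```

The last line counts the weights and biases that training has to tune: 784×32 + 32 for the first ‘Dense’ layer and 32×10 + 10 for the second, i.e. 25,450 in total.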
4. Compiling the model: Copy the below content in the next cell and run it.
model.compile(loss = "sparse_categorical_crossentropy", optimizer = "adam", metrics = ['accuracy'])
The model is compiled by defining the type of ‘loss’, the ‘optimizer’ and optional ‘metrics’ to be analysed. ‘loss’ measures how far the current prediction (output) of the model is from the actual class of the input. ‘optimizer’ tunes the model’s parameters (cuts the stencils¹ appropriately) according to the ‘loss’. There are numerous choices for both the ‘loss’ and the ‘optimizer’. ‘metrics’ are performance measures of your model that you may want to check while it is being trained. We would like to keep an eye on the accuracy of our model as it goes through the training phase, thus we use ‘accuracy’ in our ‘metrics’.
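To see what the loss and the accuracy metric measure, here is a hand computation on made-up numbers. The probabilities and labels below are hypothetical (3 images, 4 classes), chosen only to keep the arithmetic readable:

```python
import numpy as np

# Hypothetical softmax outputs for 3 images and 4 classes
probs = np.array([[0.7, 0.1, 0.1, 0.1],
                  [0.2, 0.5, 0.2, 0.1],
                  [0.1, 0.2, 0.3, 0.4]])
labels = np.array([0, 1, 2])  # true class numbers, as stored in y_train

# Sparse categorical cross-entropy: -log(probability given to the true class)
p_true = probs[np.arange(len(labels)), labels]
per_image_loss = -np.log(p_true)
loss = per_image_loss.mean()

# Accuracy: fraction of images whose most probable class is the true class
predicted = probs.argmax(axis=1)
accuracy = (predicted == labels).mean()

print(per_image_loss)
print(loss, accuracy)
```

The more probability the model gives to the true class, the smaller the loss. The third image is misclassified here (class 3 gets the highest probability, but the label is 2), so it contributes the largest loss and lowers the accuracy.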
5. Fitting the model: Copy the below content in the next cell and run it.
model.fit(x_train,y_train,epochs = 5)
Now that everything is in place, we can ‘fit’ our model to the data. But what is fitting? Go back to our stencil example from the last discussion¹. Fitting there refers to overlaying our stencil on an actual leaf and cutting the stencil until the holes thus created match the actual leaf’s boundary, i.e., our stencil fits the leaf. Thus, for our ANN, we feed it both the training samples (x_train) and their labels (y_train) to fit it to the real mapping between images and labels (classes). ‘epochs’ is the number of times the model is trained over the same data. This step takes time, depending on the ‘epochs’, as in each epoch the model runs through all 60,000 training images of the MNIST dataset. So be patient till the green tick appears on the left side of the cell.
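A rough idea of how much work ‘epochs = 5’ involves can be had from simple arithmetic. The sketch assumes the Keras default batch size of 32, which applies when ‘batch_size’ is not passed to ‘model.fit()’ (as in the call above):

```python
import math

num_images = 60_000  # training images in MNIST
batch_size = 32      # Keras default when batch_size is not given
epochs = 5           # as in model.fit(x_train, y_train, epochs = 5)

# The images are processed in batches; the model is updated once per batch.
steps_per_epoch = math.ceil(num_images / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch)
print(total_updates)
```

The first number is the batch count that the progress bar shows for every epoch.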
Also keep an eye on the ‘accuracy’ metric for each epoch as the status is printed. It keeps on increasing across epochs and goes well above 0.96 (96 percent)!
6. Prediction: Copy the below content in the next cell and run it.
image_index = 9
plt.imshow(x_train[image_index])  # display the image being predicted
plt.show()
print("The predicted class for the above image is: ",np.argmax(model.predict(x_train[image_index][np.newaxis,...])))
We have just realised our first ‘artificial intelligence’! The ANN has been trained and can now be used to predict the class (label) of images for us. How do we do it? Simply use the ‘model.predict()’ function. Inside the brackets, we need to give the image that we want to test. ‘x_train[image_index][np.newaxis,...]’ is an image from the training dataset; the ‘[np.newaxis,...]’ part wraps the single image into a batch of one image, because ‘model.predict()’ expects a batch. ‘image_index’ is just the index (like a serial number) of the image that we choose to predict from the collection of training images. You can change the value of ‘image_index’ from 9 to any other value from 0 to 59,999 to predict other image samples. ‘np.argmax()’ picks the position of the highest of the 10 outputs, which is the predicted class. ‘plt.imshow()’, which also takes an image inside the brackets, simply displays the image that we are predicting so that we are sure of the actual class.
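The two numpy pieces of the prediction line can be tried out separately. In this sketch, a blank 28×28 array stands in for the image and a hand-written row of 10 probabilities stands in for the output of ‘model.predict()’ (both are assumptions for illustration, not real model output):

```python
import numpy as np

image = np.zeros((28, 28))      # stand-in for x_train[image_index]
batch = image[np.newaxis, ...]  # add a leading "batch" axis
print(image.shape, batch.shape)

# Stand-in for the model's output on that batch: one row of 10 class probabilities
fake_output = np.array([[0.01, 0.01, 0.02, 0.01, 0.90,
                         0.01, 0.01, 0.01, 0.01, 0.01]])
predicted_class = np.argmax(fake_output)  # position of the largest value
print(predicted_class)
```

Since positions are counted from 0, the position returned by ‘np.argmax()’ is directly the digit class.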
As can be seen above, the predicted class (4) matches the image (number 4).
You just made one of the most widely used ML models, i.e., the ANN. In fact, a substantial amount of focus in the ML community is around these kinds of models.
You should also appreciate the fact that such a simple model (only 3 layers) is able to classify such a big collection of images (60,000 images) so accurately. Imagine the daunting task of achieving this with a rule-based approach, if you could do it at all. This is the power of ANNs. The cherry on top is that you could achieve this accuracy in fewer than 20 lines of code! All thanks to TensorFlow.
I am sure by now you are excited to try new things with your ANN, like adding more layers to it, testing it on unseen data, changing activation functions, changing the number of neurons and looking for images where the ANN makes a wrong prediction (a Pandora’s box in itself), why it does so and how to avoid it. You might also want to find out about ‘keras’. I strongly suggest that you look up each line of code that we used here on the Internet and study it in detail, in a bottom-up approach. This will help you gain confidence in this implementation and prepare you well for the upcoming ML journey!
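As a starting point for hunting down wrong predictions, here is a small sketch on made-up results. The two arrays below are hypothetical, not real model output. With the trained model, the predictions for the unseen images could be obtained with ‘np.argmax(model.predict(x_test), axis=1)’ and compared against ‘y_test’, while ‘model.evaluate(x_test, y_test)’ reports the loss and accuracy directly:

```python
import numpy as np

# Hypothetical results for 8 test images
true_labels = np.array([7, 2, 1, 0, 4, 1, 4, 9])
predicted   = np.array([7, 2, 1, 0, 9, 1, 4, 4])

accuracy = (predicted == true_labels).mean()   # fraction predicted correctly
wrong = np.where(predicted != true_labels)[0]  # indices of the mistakes

print("accuracy:", accuracy)
for i in wrong:
    print("image", i, "is a", true_labels[i], "but was predicted as", predicted[i])
```

Displaying the images at the ‘wrong’ indices with ‘plt.imshow()’ is a quick way to see which kinds of digits confuse the model.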
See you soon. Till then, keep making your ANN smarter!