In this blog, we’ll dig deeper into Convolutional Neural Networks. We’ll learn about the different steps involved in building a CNN and get to know the modern-day applications of CNNs.
A Convolutional Neural Network (CNN) is a neural network in which at least one layer is a convolutional layer. A CNN learns features from images and uses them to categorize or classify those images.
For example, by feeding many labelled images of cats and dogs to a CNN, we train the network to tell the two apart. Once trained, if we pass a new image of a cat to our network, it will classify the image as a cat.
Yann LeCun is widely considered the father of Convolutional Neural Networks.
Convolutional layers are the layers of the network where filters are applied to the original image. A convolutional neural network consists of 4 main steps / layers, which are:
1. Convolution operation
 1.2. ReLU layer
2. Pooling
3. Flattening
4. Full connection
The below diagram shows the different layers in a CNN.
So let us see each of the layers in detail.
In this step, we pass the input image through a Feature detector (also called a Filter or Kernel) to convert it into a Feature map (also called a Convolved feature or Activation map). This reduces the size of the image and helps remove unnecessary details from it.
We can create many feature maps, each detecting a certain feature in the image, to obtain our first convolution layer.
The convolution operation involves element-wise multiplication of the convolutional filter with a slice of the input matrix, followed by the summation of all values in the resulting matrix.
The number of pixels by which we move the filter over the input matrix is called the stride.
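As a sketch of the operation described above (a minimal NumPy implementation, not part of the original post — real frameworks use heavily optimized versions of this), the filter slides over the image by the stride, and each position produces one value of the feature map:

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    """Slide the kernel over the image; at each position, multiply
    element-wise and sum to produce one value of the feature map."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    oh = (ih - kh) // stride + 1  # output height
    ow = (iw - kw) // stride + 1  # output width
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)  # element-wise multiply, then sum
    return out

image = np.array([[1., 0., 1., 0.],
                  [0., 1., 0., 1.],
                  [1., 0., 1., 0.],
                  [0., 1., 0., 1.]])
kernel = np.array([[1., 0.],
                   [0., 1.]])
print(convolve2d(image, kernel, stride=1))  # 3x3 feature map
```

Note how a 4×4 image and a 2×2 filter with stride 1 yield a 3×3 feature map — the output shrinks by (kernel size − 1) in each dimension.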
1.2. ReLU Activation Function:
ReLU is the most commonly used activation function in deep learning.
Convolution is a linear operation, so without an activation function the network would only be able to model linear relationships; we need to break that linearity.
The Rectified Linear Unit is described by the function f(x) = max(x, 0).
We apply the rectifier to increase the non-linearity in our image/CNN. The rectifier keeps only the non-negative values of the feature map and sets negative values to zero.
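As a quick sketch (NumPy, not part of the original post), ReLU is applied element-wise to a feature map:

```python
import numpy as np

def relu(x):
    # f(x) = max(x, 0), applied element-wise: negatives become 0
    return np.maximum(x, 0)

feature_map = np.array([[-2.0, 1.5],
                        [ 0.0, -0.5]])
print(relu(feature_map))  # [[0. 1.5], [0. 0.]]
```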
Pooling reduces the spatial size of the convolved feature, which in turn decreases the computational power required to process the data.
It also preserves the dominant features, which helps in effectively training the model.
Pooling converts the feature map into a Pooled feature map.
Pooling is divided into 2 types:
1. Max Pooling — Returns the max value from the portion of the image covered by the kernel.
2. Average Pooling — Returns the average of all values from the portion of the image covered by the kernel.
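Both pooling types above can be sketched with one small NumPy function (an illustration, not part of the original post; here the stride equals the window size, as is typical):

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Slide a size x size window over the map (stride = size) and keep
    the max ("max") or the average ("average") of each window."""
    h, w = feature_map.shape
    out = np.zeros((h // size, w // size))
    for i in range(0, h - size + 1, size):
        for j in range(0, w - size + 1, size):
            window = feature_map[i:i+size, j:j+size]
            out[i//size, j//size] = window.max() if mode == "max" else window.mean()
    return out

fm = np.array([[1., 3., 2., 4.],
               [5., 6., 7., 8.],
               [3., 2., 1., 0.],
               [1., 2., 3., 4.]])
print(pool2d(fm, mode="max"))      # [[6. 8.], [3. 4.]]
print(pool2d(fm, mode="average"))  # [[3.75 5.25], [2. 2.]]
```

Notice how a 4×4 feature map becomes a 2×2 pooled feature map: a 75% reduction in values while the strongest activations survive.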
Flattening converts the Pooled feature map into a one-dimensional column vector.
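In NumPy terms (an illustrative snippet, not part of the original post), flattening is a one-liner:

```python
import numpy as np

pooled = np.array([[6., 8.],
                   [3., 4.]])
flattened = pooled.flatten()  # unroll the 2-D map row by row into a 1-D vector
print(flattened)  # [6. 8. 3. 4.]
```

This 1-D vector is what gets fed into the fully connected layers that follow.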
The flattened output from the Flattening step is fed into a fully connected feed-forward neural network, with backpropagation applied at every iteration of training.
Over a series of epochs, the model learns to identify dominating features and low-level features in images and classifies them using the Softmax function, which converts the output values into probabilities between 0 and 1.
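The softmax step mentioned above can be sketched as follows (NumPy, not part of the original post; subtracting the max is a standard numerical-stability trick):

```python
import numpy as np

def softmax(scores):
    # Subtract the max for numerical stability, then exponentiate and normalise
    exp = np.exp(scores - np.max(scores))
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.1])  # raw output scores of the final layer
probs = softmax(logits)
print(probs)        # each value lies between 0 and 1
print(probs.sum())  # the probabilities sum to 1.0
```

The class with the highest probability is the network’s prediction — e.g. “cat” vs “dog” in the earlier example.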
Some modern-day applications of CNNs include:
- Optical character recognition (OCR) applications, such as document analysis.
- Surveillance and security.
- Traffic monitoring, such as congestion detection.
- Advertising and programmatic buying.
- Facial recognition and detection, identifying pose, angle, external features, etc.
In this blog, we learned about Convolutional Neural Networks and each of the different layers that make up a CNN. We looked at real-world applications of CNNs and saw what makes the Convolutional Neural Network the go-to algorithm for image classification and detection.