Convolutional networks are simply neural networks that use convolution in place of general matrix multiplication in at least one of their layers.
Convolutional networks have been around for a long time, but they have gained popularity recently, mainly because of the vast amounts of data now available. An early modern convolutional neural network appeared in the 1980s, when the computer science researcher Yann LeCun used CNNs to recognize handwritten digits. Handwritten digits of this kind are still available today as the MNIST dataset, which remains in wide use. The network was fairly simple, with about five layers combining 5×5 convolutional layers and 2×2 pooling (subsampling) layers. CNNs saw limited use at the time because they need a lot of data to give accurate results, but with advances in CNN architecture and far more data available today, CNNs are widely used in many fields.
Convolution is an operation that combines two sources of data, here an input and a filter, to produce a third. It is widely used in image processing to blur and sharpen images, and in this case we use it in our CNN model.
Convolutional neural networks (CNNs) are made up of artificial neurons, loosely inspired by the human brain, whose weights are optimized through learning. CNNs specialize in processing grid-like data such as images or time series, which makes them useful for image classification and recognition.
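As a rough illustration of blurring and sharpening by convolution, here is a minimal NumPy sketch. The function name `convolve2d`, the toy 5×5 image, and the two kernels are assumptions chosen for this example, not part of any particular library.

```python
import numpy as np

def convolve2d(image, kernel):
    """Valid-mode 2D convolution of a grayscale image with a small kernel."""
    kh, kw = kernel.shape
    flipped = kernel[::-1, ::-1]  # true convolution flips the kernel first
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the patch under the kernel element-wise and sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * flipped)
    return out

# Toy image: a single bright pixel in the middle of a dark 5x5 image
image = np.zeros((5, 5))
image[2, 2] = 9.0

box_blur = np.full((3, 3), 1.0 / 9.0)           # averages each 3x3 patch
sharpen = np.array([[0, -1, 0],
                    [-1, 5, -1],
                    [0, -1, 0]], dtype=float)   # boosts the centre pixel

blurred = convolve2d(image, box_blur)    # the bright pixel is spread out
sharpened = convolve2d(image, sharpen)   # the bright pixel is emphasised
```

The blur kernel spreads the single bright pixel evenly over its neighbourhood, while the sharpen kernel amplifies it and subtracts from its direct neighbours.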
The image above shows the architecture of a CNN with five layers: two convolutional layers, two pooling layers and a fully connected layer. We will go through each of these layers and what they do:
The convolutional layer is the core of a CNN; it is responsible for extracting features from its input. This layer slides a small filter, called a kernel, over the input and takes the dot product between the kernel and each patch of the input it covers, which produces the feature map.
The kernel is usually smaller than the input image in width and height, but it extends through the full depth of the input: an RGB image has three channels, so the kernel has three channels as well. When an image passes through a convolutional layer with several filters, the output is a 3D volume. A column taken through the depth of that volume holds the different features detected at the same part of the image, and every value in a given depth slice is produced by the same filter convolved across the image.
A convolutional layer performs a linear operation, but the patterns in images are far from linear, so an activation (non-linearity) layer is placed right after the convolutional layer to introduce non-linearity into the feature map.
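To make the 3D output volume concrete, the following is a minimal sketch of a convolutional layer's forward pass in NumPy. The name `conv_layer`, the random inputs, and the choice of four filters are assumptions for illustration. Like most deep learning libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def conv_layer(image, filters):
    """Forward pass with stride 1 and no padding.

    image:   array of shape (H, W, C)
    filters: array of shape (num_filters, F, F, C)
    returns: feature maps of shape (H - F + 1, W - F + 1, num_filters)
    """
    h, w, c = image.shape
    n, f, _, fc = filters.shape
    assert c == fc, "each filter must span all input channels"
    out = np.zeros((h - f + 1, w - f + 1, n))
    for k in range(n):                      # one depth slice per filter
        for i in range(h - f + 1):
            for j in range(w - f + 1):
                patch = image[i:i + f, j:j + f, :]
                # Dot product between the patch and the full-depth kernel
                out[i, j, k] = np.sum(patch * filters[k])
    return out

rng = np.random.default_rng(0)
image = rng.random((5, 5, 3))                 # toy RGB image
filters = rng.standard_normal((4, 3, 3, 3))   # four 3x3 kernels, 3 channels each
feature_maps = conv_layer(image, filters)
print(feature_maps.shape)                     # (3, 3, 4)
```

Each of the four depth slices in `feature_maps` comes from one filter, and `feature_maps[i, j, :]` collects all four features for the same image patch.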
Types of Activation Layers:
- Sigmoid: Takes the input values and maps them to the range between 0 and 1.
- ReLU: The Rectified Linear Unit is the most commonly used activation. It uses the function f(k) = max(0, k) to introduce non-linearity.
- Tanh: Maps the input values to the range [-1, 1].
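The three activations listed above can be sketched in a few lines of NumPy. The sample `feature_map` values are made up for illustration.

```python
import numpy as np

def sigmoid(k):
    """Squash values into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-k))

def relu(k):
    """f(k) = max(0, k): negative values become 0, positive values pass through."""
    return np.maximum(0.0, k)

def tanh(k):
    """Squash values into the range (-1, 1)."""
    return np.tanh(k)

# Hypothetical feature map values coming out of a convolutional layer
feature_map = np.array([[-2.0, 0.0],
                        [0.5, 3.0]])

print(relu(feature_map))     # negatives are clipped to 0
print(sigmoid(feature_map))  # sigmoid(0) is exactly 0.5
print(tanh(feature_map))     # tanh(0) is exactly 0
```

All three functions are applied element-wise, so the activated feature map has the same shape as the input.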
A pooling layer replaces the values in each local region of the feature map with a summary of that region. Pooling layers are used to downsample the feature maps, so later layers operate on a smaller, summarized feature map. There are two important kinds of pooling layers, MaxPooling and AveragePooling:
- MaxPooling: Calculates the maximum value for each section of the feature map.
- AveragePooling: Calculates the average value for each section of the feature map.
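Both kinds of pooling can be sketched with one small NumPy function. The name `pool2d`, the non-overlapping 2×2 windows, and the sample values are assumptions made for this example.

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping pooling over size x size sections of a 2D feature map."""
    h, w = feature_map.shape
    out_h, out_w = h // size, w // size
    # Drop any rows/columns that do not fill a complete section
    trimmed = feature_map[:out_h * size, :out_w * size]
    # Split into blocks: axes 1 and 3 index positions inside each section
    blocks = trimmed.reshape(out_h, size, out_w, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "average":
        return blocks.mean(axis=(1, 3))
    raise ValueError("mode must be 'max' or 'average'")

feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 1, 2],
                        [7, 2, 9, 1],
                        [3, 4, 5, 6]], dtype=float)

max_pooled = pool2d(feature_map, mode="max")      # [[6, 4], [7, 9]]
avg_pooled = pool2d(feature_map, mode="average")  # [[3.75, 2.25], [4.0, 5.25]]
```

A 4×4 feature map shrinks to 2×2 in both cases; only the summary statistic per section differs.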
Padding and Strides
To handle the edges of the input, we usually add a border of zero-valued pixels (zero padding) around the image so that the convolutional layer can also centre its filter on the edge pixels.
The stride is a parameter of the convolutional layer that controls how far the filter moves over the input at each step. For example, if the stride is 1, we move the filter 1 pixel at a time.
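A short sketch of both ideas, using NumPy's `np.pad` for zero padding and a hypothetical helper, `window_positions`, to list where the filter starts along one axis for a given stride:

```python
import numpy as np

# Zero padding: surround a 3x3 image with a 1-pixel border of zeros
image = np.arange(1, 10, dtype=float).reshape(3, 3)
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)
print(padded.shape)  # (5, 5)

def window_positions(length, kernel, stride):
    """Start indices of the filter along one axis of the given length."""
    return list(range(0, length - kernel + 1, stride))

# A 3-wide filter over the 5-wide padded image
print(window_positions(5, 3, stride=1))  # [0, 1, 2]
print(window_positions(5, 3, stride=2))  # [0, 2]
```

With stride 1 the filter visits three positions per axis, so the padded 3×3 image yields a 3×3 output; a larger stride skips positions and produces a smaller output.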
Convolutional Layer Formula
To find the size of the output of a convolutional layer, we use the formula (W − F + 2P) / S + 1, where W stands for the width of the input, F stands for the width of the kernel, P stands for the padding and S stands for the stride. For example, a 5x5x3 image convolved with 32 kernels of size 3x3x3, with no padding and a stride of 1, gives a feature map of size 3x3x32, since (5 − 3 + 0) / 1 + 1 = 3.
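The usual output-size relation, (W − F + 2P) / S + 1, is easy to check with a small helper. The function name `conv_output_size` is hypothetical, and rounding down when the stride does not divide evenly is an assumption that matches a common convention rather than something stated in the text.

```python
def conv_output_size(w, f, p=0, s=1):
    """Output width of a convolution: (W - F + 2P) / S + 1.

    w: input width, f: kernel width, p: padding per side, s: stride.
    Assumption: any remainder is dropped (floor division).
    """
    span = w - f + 2 * p
    if span < 0:
        raise ValueError("kernel is larger than the padded input")
    return span // s + 1

print(conv_output_size(5, 3))        # 3: the 5x5 image with a 3x3 kernel
print(conv_output_size(5, 3, p=1))   # 5: padding of 1 preserves the width
print(conv_output_size(7, 3, s=2))   # 3: a stride of 2 shrinks the output
```

The depth of the output is not given by this formula; it equals the number of filters in the layer.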
Fully Connected Layer
The convolutional layer is only partially connected, because each value in the feature map depends on a small patch of the input rather than on every pixel. We use the fully connected layer to learn non-linear combinations of the features in the feature map. The feature maps produced by the convolutional layers are 3-dimensional, so we need to flatten them into a 1-dimensional vector that can be passed to the fully connected layer for the final classification step. For classification, the final fully connected layer typically uses the softmax activation function, which converts the output vector of numbers into a vector of probabilities.
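The flatten, fully connected, and softmax steps can be sketched as follows. The feature map shape, the ten output classes, and the random weights are assumptions for illustration; in a trained network the weights would be learned.

```python
import numpy as np

def softmax(z):
    """Convert a vector of scores into probabilities that sum to 1."""
    shifted = z - np.max(z)   # subtracting the max avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum()

rng = np.random.default_rng(0)

# Hypothetical 3D feature maps from the last convolutional/pooling layer
feature_maps = rng.random((3, 3, 4))

# Flatten the 3D volume into a 1D vector of 3 * 3 * 4 = 36 values
flat = feature_maps.reshape(-1)

# Fully connected layer: every input value connects to every output class
num_classes = 10
weights = rng.standard_normal((flat.size, num_classes)) * 0.1  # untrained
bias = np.zeros(num_classes)
logits = flat @ weights + bias

probs = softmax(logits)
predicted_class = int(np.argmax(probs))
```

The class with the highest probability in `probs` is taken as the network's prediction.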
Application of CNNs
CNNs are the backbone of computer vision, where a model extracts useful information from visual data such as images. Some areas where CNNs play an important role are:
- Video Surveillance
- Image Classification
- Content Regulation
CNNs are a very powerful tool and can be applied to a wide range of real-world problems, particularly those involving visual data.
- Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
- ProjectPro. (2022, June 20). Introduction to Convolutional Neural Networks Architecture. ProjectPro. Retrieved August 31, 2022, from https://www.projectpro.io/article/introduction-to-convolutional-neural-networks-algorithm-architecture/560