Start Coding for Computer Vision with Python
We human beings perceive our surroundings through our visual system. The eyes, brain, and limbs work together to sense the environment and act on it. An intelligent system performs tasks that would require some level of intelligence if a human did them, so an artificial vision system is an important component of such a system. Normally, cameras and images are used to gather the information needed for the job. Computer vision and image processing techniques let us perform tasks similar to those humans do, such as image recognition and object tracking.
In computer vision, the camera acts as the eye that captures the image, and the processor acts as the brain that processes the captured image and produces meaningful results. But there is a basic difference between humans and computers. The human brain works automatically, and its intelligence is innate. The computer, on the contrary, has no intelligence without human instructions (a program). Computer vision is a way of providing the appropriate instructions so that a computer can work in a manner comparable to the human vision system, although its capacity is still limited.
In the upcoming sections, we will discuss the basic idea of how an image is formed and how it can be manipulated using Python.
How an Image Is Formed and Displayed
An image is nothing but a combination of pixels with different color intensities. The terms ‘pixel’ and ‘color intensity’ may be unfamiliar to you. Don’t worry; they will become clear by the end of the article.
A pixel is the smallest unit (element) of a digital image. Details are in the image below.
A display is made of pixels. In the above figure, there are 25 columns and 25 rows, and each small square is a pixel, so the figure represents a display with 25 x 25 = 625 pixels. If we light the pixels with different color intensities (brightness), they form a digital image.
How does the computer store the image in the memory?
If we look at the image carefully, we can compare it with a 2D matrix. A matrix has rows and columns, and each of its elements can be addressed by its index. This structure is the same as a 2D array, and the computer stores the image as an array in memory.
Each array element holds the intensity value of a color. Generally, the intensity value ranges from 0 to 255 (the range of an 8-bit unsigned integer). For demonstration purposes, I have included an array representation of an image.
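The array representation described above can be sketched in a few lines. This is only an illustration: the 5 x 5 size and the pixel values are made up, not taken from the article's figure.

```python
import numpy as np

# A hypothetical 5 x 5 grayscale image: one 8-bit intensity per pixel.
image = np.array(
    [
        [0, 0, 0, 0, 0],
        [0, 255, 255, 255, 0],
        [0, 255, 128, 255, 0],
        [0, 255, 255, 255, 0],
        [0, 0, 0, 0, 0],
    ],
    dtype=np.uint8,  # unsigned 8-bit integers cover the range 0..255
)

print(image.shape)  # (rows, columns)
print(image[2, 2])  # intensity of the pixel at row 2, column 2
```

Each element is addressed by its row and column index, exactly like a matrix entry.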
Grayscale and Colored Image
A grayscale image is what we commonly call a black-and-white image. It has only one channel, so each pixel holds a single intensity value. A value close to 0 represents darkness, and the pixel becomes brighter as the value increases. The highest value is 255, which represents white. A 2D array is sufficient to hold a grayscale image, as the last figure shows.
Colored images can’t be formed with only one channel, since they may contain a huge number of distinct colors. They are built from three primary color channels: red (R), green (G), and blue (B). Each color channel is stored in a 2D array that holds its intensity values, and the final image is the combination of these three channels.
With 256 intensity levels per channel, this color model has 256 x 256 x 256 = 16,777,216 possible color combinations.
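As a rough sketch of how three 2D channels combine into one color image, the snippet below stacks three small arrays with NumPy. The image size and the channel values are arbitrary choices for illustration.

```python
import numpy as np

height, width = 2, 3  # hypothetical tiny image

# One 2D array of intensities per color channel.
red = np.full((height, width), 255, dtype=np.uint8)
green = np.full((height, width), 128, dtype=np.uint8)
blue = np.zeros((height, width), dtype=np.uint8)

# Stack the channels along a third axis: shape becomes (rows, columns, 3).
rgb = np.dstack((red, green, blue))
print(rgb.shape)
print(rgb[0, 0])  # the (R, G, B) triple of the top-left pixel

# Number of distinct colors with 256 levels per channel.
colors = 256 ** 3
print(colors)
```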
You may visualize the combination here.
But in computer memory, the image is stored differently.
The computer has no notion of red, green, or blue. It only knows intensity values, so each channel is simply a 2D array of numbers, and viewed on its own a channel looks like a grayscale image. A pixel appears red, for example, when its red channel holds a high intensity and its green and blue channels hold low intensities.
NumPy Basics to Work with Python
NumPy is a fundamental Python package for scientific computing. It is built around an array object, but it is not limited to storing arrays: the library also provides a wide range of numeric and logical operations on them.
You will find the official NumPy documentation here.
Let’s start our journey. First things first.
- Importing the NumPy library.
It’s time to work with NumPy. As we know, NumPy works with arrays. So, let’s try to create our first 2D array of all zeros.
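A minimal sketch of the import and the all-zeros array follows. The 5 x 5 shape and the variable name `a` are assumptions, since the original code cell is not shown here.

```python
import numpy as np  # np is the conventional alias

# np.zeros takes the shape as a tuple: (rows, columns).
a = np.zeros((5, 5))

print(a)
print(a.dtype)  # the default element type is float64
```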
It’s as simple as that. We can also create a NumPy array of all ones in the same way.
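The all-ones version might look like this; again the 5 x 5 shape is an assumed example.

```python
import numpy as np

# Same call pattern as np.zeros, but every element is 1.
b = np.ones((5, 5))

print(b)
```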
Interestingly, NumPy also provides a method to fill an array with any value. The simple syntax array.fill(value) does the job; it modifies the array in place.
‘b’ with all ones is now filled with
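Here is a small sketch of `fill`. The fill value 5 is a hypothetical choice for illustration, not the value used in the original example.

```python
import numpy as np

b = np.ones((5, 5))

# fill() overwrites every element in place and returns None.
b.fill(5)  # 5 is an arbitrary example value

print(b)
```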
- The Role of the Seed in Random Number Generation
Just have a look at the following coding examples.
In the first code cell, we have used np.random.seed(seed_value), but we haven’t used any seeding in the other two code cells. There is a major difference between random number generation with and without seeding. With a seed, the generated random numbers are the same every time for a specific seed value. Without a seed, the random numbers change on each execution.
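The effect of seeding can be checked with a short experiment. The seed value 42 and the use of `np.random.randint` are assumptions made for this sketch; any seed and any random function would show the same behavior.

```python
import numpy as np

# Seeding, drawing, then re-seeding with the same value repeats the draw.
np.random.seed(42)
first = np.random.randint(0, 100, 5)

np.random.seed(42)
second = np.random.randint(0, 100, 5)

# No re-seeding here, so this draw continues the sequence and differs
# from run to run unless the script itself is seeded.
unseeded = np.random.randint(0, 100, 5)

print(first)
print(second)
print(np.array_equal(first, second))
```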
- Basic operations (max, min, mean, reshape, etc.) with NumPy
NumPy makes our life easier by providing numerous functions for mathematical operations. The methods array_name.min(), array_name.max(), and array_name.mean() return an array’s minimum, maximum, and mean values. A coding example:
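As a quick sketch with a small hand-written array (the values are arbitrary):

```python
import numpy as np

arr = np.array([[3, 7, 1], [9, 4, 6]])

print(arr.min())   # smallest element: 1
print(arr.max())   # largest element: 9
print(arr.mean())  # average of all six elements: 5.0
```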
The indices of the minimum and maximum values can be extracted with array_name.argmin() and array_name.argmax(). For a multi-dimensional array, both return the index into the flattened array. Example:
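The sketch below uses an arbitrary 2 x 3 array and also converts the flat index back to a (row, column) pair with `np.unravel_index`, a helper the original text does not mention.

```python
import numpy as np

arr = np.array([[3, 7, 1], [9, 4, 6]])

# On a 2D array, argmin/argmax count positions row by row.
flat_min = arr.argmin()  # position of the value 1
flat_max = arr.argmax()  # position of the value 9
print(flat_min, flat_max)

# Convert the flat index back into (row, column).
row, col = np.unravel_index(flat_max, arr.shape)
print(row, col)
```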
Array reshaping is one of the important operations of NumPy. The syntax for reshaping an array is array_name.reshape(row_no, column_no). While reshaping, we must be careful about the number of array elements: the total number of elements must be the same before and after reshaping.
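A brief sketch of both the valid case and the failing case, using a made-up array of 12 elements:

```python
import numpy as np

values = np.arange(12)           # 12 elements: 0, 1, ..., 11
matrix = values.reshape(3, 4)    # 3 * 4 = 12, so this works
print(matrix)

# 5 * 5 = 25 does not match 12 elements, so NumPy raises ValueError.
try:
    values.reshape(5, 5)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("Cannot reshape:", err)
```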
- Array Indexing and Slicing
Each array element can be addressed with its row and column number. Let’s generate another array with 10 rows and 10 columns.
Suppose we want to find the first value of the array. It can be extracted by passing the row and column index (0, 0).
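For instance, with a 10 x 10 array holding the numbers 0 to 99 (an assumed stand-in for the article's array, named `grid` here for illustration):

```python
import numpy as np

# 10 rows and 10 columns filled with 0..99, row by row.
grid = np.arange(100).reshape(10, 10)

first_value = grid[0, 0]  # row 0, column 0
last_value = grid[9, 9]   # row 9, column 9
print(first_value, last_value)
```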
Specific row and column values can be sliced with the syntax
Let’s try to slice the central elements of the array.
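A possible sketch of slicing, reusing a 10 x 10 array of the numbers 0 to 99. The slice bounds 3:7 are one reasonable reading of "central elements", not necessarily the ones the author used.

```python
import numpy as np

grid = np.arange(100).reshape(10, 10)

# General form: array[row_start:row_stop, col_start:col_stop]
# (the stop index is excluded).
center = grid[3:7, 3:7]  # rows 3..6 and columns 3..6
print(center)

first_row = grid[0, :]     # every column of row 0
first_column = grid[:, 0]  # every row of column 0
```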
OpenCV is an open-source computer vision library, originally developed by Intel, that provides Python bindings. I will discuss only a few uses of OpenCV, though its scope is vast.
You will find the official documentation here.
I have used the following image for demonstration purposes.
- Importing the OpenCV and Matplotlib libraries
Matplotlib is a visualization library. It helps to visualize the image.
- Loading the image with OpenCV and visualizing it with Matplotlib
We have read the image with OpenCV and visualized it with the Matplotlib library. The colors look wrong because OpenCV reads the image in BGR channel order, whereas Matplotlib expects RGB. So, we need to convert the image from BGR to RGB.
- Converting the image from BGR to RGB format
Now, the image seems okay.
- Converting image to grayscale
We can easily convert the image from BGR to grayscale by passing the cv2.COLOR_BGR2GRAY flag to cv2.cvtColor(), as follows.
The above image does not look gray even though it has been converted to grayscale. The reason is that it was visualized with Matplotlib, which by default applies a color map (viridis) other than grayscale to single-channel images. To visualize it properly, we need to specify the grayscale color map in Matplotlib. Let’s do that.
Rotating an image is also easy: the cv2.rotate() function does it.
Clockwise and anticlockwise 90-degree rotations and a 180-degree rotation are shown below.
We can resize the image by passing the width and height pixel values to the
Sometimes we need to draw on an existing image. For example, we need to draw a bounding box on an image object to identify it. Let’s draw a rectangle on the flower.
The cv2.rectangle() function helps to draw it. It takes parameters such as the image on which we draw the rectangle, the coordinates of the upper-left corner (pt1) and the lower-right corner (pt2), the color, and the thickness of the boundary line. A coding example is given below.
There are other drawing functions, such as cv2.line(), cv2.circle(), cv2.ellipse(), and cv2.putText(). The full official documentation is available
Play with NumPy
We will change the intensity values of part of an image. I will try to keep it simple, so consider the grayscale image shown previously. First, find the shape of the image.
It shows that the image is a 2D array with a size of 1200 x 1920. In the basic NumPy operations, we learned how to slice an array.
Using that concept, we have taken the slice [400:800, 750:1350] of the grayscale image array and replaced its intensity values with 255 (white). Finally, we visualize it and get the above image.
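The slice assignment can be sketched on a stand-in array of the same size. The uniform background value 100 is an assumption; the real grayscale image would be used instead.

```python
import numpy as np

# Stand-in for the 1200 x 1920 grayscale image: a uniform mid-gray.
gray = np.full((1200, 1920), 100, dtype=np.uint8)
print(gray.shape)

# Rows 400..799 and columns 750..1349 become white.
gray[400:800, 750:1350] = 255

white_pixels = int((gray == 255).sum())
print(white_pixels)  # 400 rows * 600 columns
```

Displaying `gray` with `plt.imshow(gray, cmap="gray")` would show a white rectangle over the image.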
Computer vision is one of the most promising fields in modern computer science. I always emphasize the basic knowledge of any domain. I have discussed just the fundamentals of computer vision and shown some hands-on coding. The concepts are very simple but may play a significant role for beginners in computer vision.
This is the first article in the computer vision series. Get connected to read the upcoming articles.
[N.B. Instructor Jose Portilla’s course helped me to gather this knowledge.]