In this project, RetinaNet is trained in Google Colab to detect price tags and extract the product name, product ID, and price. Several libraries and approaches are combined along the way. This article can serve as a starting point for your own pet project or first attempts at object detection in computer vision. Part I is devoted to data preparation and augmentation.
The pipeline works as follows. First, we detect the price tag and crop it from the whole image. Then we deal with photo distortion: the cropped image is rotated so that the text runs horizontally. Next, we detect the separate parts of the price tag and extract the text. For a pet project, Flask can be used to serve the resulting model.
Creating test and train sets
Our data set was collected in a mall. You can build your own: just take some photos of price tags in the nearest supermarket. Usually we are handed clean, already labeled data, but this time we need to do all the dirty work ourselves.
So, first things first: we are going to create annotations for all available images. A good tool for this is LabelImg, a graphical image annotation tool. It is written in Python and has a convenient graphical interface, and the annotations are saved as XML files in PASCAL VOC format. Installation is simple: `pip3 install labelImg`, then launch it with `labelImg`.
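For reference, a LabelImg annotation for an image with a single price tag looks roughly like this (the file name, image size, coordinates, and the class name `price_tag` are all illustrative):

```xml
<annotation>
  <folder>train</folder>
  <filename>tag_001.jpg</filename>
  <size><width>800</width><height>600</height><depth>3</depth></size>
  <object>
    <name>price_tag</name>
    <bndbox>
      <xmin>34</xmin><ymin>50</ymin><xmax>310</xmax><ymax>220</ymax>
    </bndbox>
  </object>
</annotation>
```

One XML file is produced per image, with one `object` element per drawn box.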
Unfortunately, for our purposes CSV annotations are more convenient than PASCAL VOC XML, so here I will provide a link to a nice and useful script for XML-to-CSV transformation. At this stage it is also useful to split the data into train and test sets: in our case, 176 images were separated into 150 train and 26 test photos. Only after this split do we transform the XML annotations into CSV format.
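The core of such an XML-to-CSV conversion can be sketched with the standard library alone. This is a minimal version under my own assumptions: one CSV row per box in the order filename, xmin, ymin, xmax, ymax, class (a common layout for RetinaNet CSV generators), and an illustrative sample annotation.

```python
import csv
import io
import xml.etree.ElementTree as ET

def voc_to_rows(xml_text):
    """Convert one PASCAL VOC annotation into CSV rows:
    filename, xmin, ymin, xmax, ymax, class -- one row per object."""
    root = ET.fromstring(xml_text)
    fname = root.findtext("filename")
    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        rows.append([
            fname,
            int(box.findtext("xmin")), int(box.findtext("ymin")),
            int(box.findtext("xmax")), int(box.findtext("ymax")),
            obj.findtext("name"),
        ])
    return rows

# Illustrative annotation, as LabelImg would save it.
sample = """<annotation>
  <filename>tag_001.jpg</filename>
  <object>
    <name>price_tag</name>
    <bndbox><xmin>34</xmin><ymin>50</ymin><xmax>310</xmax><ymax>220</ymax></bndbox>
  </object>
</annotation>"""

rows = voc_to_rows(sample)
buf = io.StringIO()
csv.writer(buf).writerows(rows)
print(buf.getvalue())  # tag_001.jpg,34,50,310,220,price_tag
```

In practice you would loop this over every XML file in the train and test folders and write one CSV per split.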
After this, we will get two CSV files, one with the classes and one with the annotations. But 150 images is too small a data set for an object detection model. That is why we are going to apply augmentation and save the new annotations. For this purpose we will use the albumentations library: just install it with pip and import it under the alias A.
This library allows us to apply various transformations to the images. Let’s create an example transformer and check the result:
Our goal is to create several transformed photos from each original one: in our case, 1500 from 150, i.e. ten per image. So, we create a list of transforms:
And finally, we apply the predefined function to our train data in the prepared folder with CSV annotations. We read each JPG file in the directory, filter the existing CSV file to find the bounding-box coordinates, then apply each transform from our list and save the new file and the new bounding boxes in a new folder.
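That driver can be sketched as follows, under some assumptions of mine: a `train/` and `train_aug/` folder layout, one CSV row per box in the order filename, xmin, ymin, xmax, ymax, class, and the list of transforms built in the previous step.

```python
import csv
import os
from collections import defaultdict

def boxes_by_file(csv_path):
    """Group annotation rows (filename, xmin, ymin, xmax, ymax, class) by image."""
    grouped = defaultdict(list)
    with open(csv_path, newline="") as f:
        for fname, xmin, ymin, xmax, ymax, label in csv.reader(f):
            grouped[fname].append(
                [float(xmin), float(ymin), float(xmax), float(ymax), label]
            )
    return grouped

def augmented_name(fname, i):
    """Name for the i-th augmented copy, e.g. img001.jpg -> img001_aug3.jpg."""
    stem, ext = os.path.splitext(fname)
    return f"{stem}_aug{i}{ext}"

if __name__ == "__main__":
    import cv2  # requires opencv-python for reading/writing images
    transforms = []  # fill with the A.Compose pipelines from the previous step
    os.makedirs("train_aug", exist_ok=True)
    grouped = boxes_by_file(os.path.join("train", "annotations.csv"))  # assumed path
    with open(os.path.join("train_aug", "annotations.csv"), "w", newline="") as out:
        writer = csv.writer(out)
        for fname, boxes in grouped.items():
            image = cv2.imread(os.path.join("train", fname))
            if image is None:
                continue  # skip rows whose image file is missing
            coords = [b[:4] for b in boxes]
            labels = [b[4] for b in boxes]
            for i, t in enumerate(transforms):
                res = t(image=image, bboxes=coords, labels=labels)
                new_name = augmented_name(fname, i)
                cv2.imwrite(os.path.join("train_aug", new_name), res["image"])
                for (xmin, ymin, xmax, ymax), lab in zip(res["bboxes"], res["labels"]):
                    writer.writerow(
                        [new_name, int(xmin), int(ymin), int(xmax), int(ymax), lab]
                    )
```

Writing the transformed boxes back out alongside the new file names gives a single augmented CSV that can be fed straight to training.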
OK, everybody knows that data wrangling and preparation are the least interesting parts of data science work. But the quality of the model will be defined by the quality of our work at this stage. So far we have labeled all the necessary photos, split them into train and test sets, and applied augmentation. Next, we will download RetinaNet and test it. A very useful option is anchor optimization of the model, which will be described in the next part.