For a long time, running ML algorithms and neural networks on AMD GPUs was a non-trivial task for an ordinary developer. As part of the GEMS AI/ML team at Luxoft DXC, we spend a lot of time working on computer vision and machine learning algorithms. Among other projects, we do internal research and development for AMD, one of our major customers.
Our close collaboration with AMD gives us early access to a wide range of AMD video cards that we use in our research work. Compared to competing hardware, AMD GPUs offer significant advantages in floating-point calculations. What is more, the latest release of the AMD ROCm™ open software platform introduced support for new features and a great number of improvements that make the lives of ML engineers much easier.
That is why, at some point, we decided to try running our ML inference on AMD GPUs. In this article, we share how we applied new ML approaches in our project and used them to run ML models on AMD hardware.
- In this part, we will talk about the stack of neural networks and computer vision approaches that we applied.
- In the second part, we will describe how we tested our models using the ROCm software stack.
The core idea of our latest large project was a smart interior editor that would allow users to change their room layout and design in just a couple of clicks. At the fundamental level, we needed to automate the following processes: performing image segmentation, finding the frame perspective, determining the room geometry, removing objects within the frame, and adding new 3D models. Additionally, we needed to develop an algorithm that would apply a new material texture with light and shade effects.
After studying the subject area and running the first tests, we came up with the following pipeline for our smart editor:
- First, we should determine the parameters of the room: compute the segmentation mask, the layout mask (room geometry), and the vanishing points (room perspective).
- Next, we should find individual objects on the segmentation mask for subsequent removal.
- Then we can remove existing objects.
- After that, we need to apply new textures and materials to the floor and walls.
- Our next step is transferring shadows from the original image and applying them.
- And finally, we can calibrate the 3D camera by the vanishing point and add new objects.
To implement this pipeline, we used machine learning models. In the following part of the article, we describe each model in more detail.
1. Machine Learning Algorithms
1.1 Segmentation
Segmentation is a well-studied task, so there are plenty of ready-made solutions. To compare segmentation models, we used the mmseg framework, which includes a large number of segmentation models along with weights for various datasets.
For training and testing the models, we used the configuration file approach, which allowed us to work with different models and datasets in a unified way. We chose the ADE20K dataset because it contains the largest number of indoor scenes available in open source.
As the segmentation model, we chose the Swin Transformer introduced in 2021. The transformer architecture has shown strong results on NLP tasks and has proven well suited for image processing too. Our tests on the floor class showed an accuracy of 77.8%.
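To give an idea of what this looks like in practice, here is a minimal inference sketch using the MMSegmentation 0.x API; the config and checkpoint paths are placeholders for the Swin/ADE20K files shipped with the framework:

```python
from mmseg.apis import init_segmentor, inference_segmentor

# Placeholder paths: any Swin-Transformer config trained on ADE20K will do.
config = 'configs/swin/upernet_swin_base_ade20k.py'
checkpoint = 'checkpoints/upernet_swin_base_ade20k.pth'

model = init_segmentor(config, checkpoint, device='cuda:0')
# inference_segmentor returns a per-pixel class map over the 150 ADE20K categories.
result = inference_segmentor(model, 'room.jpg')
floor_mask = (result[0] == 3)  # class index 3 is "floor" in ADE20K
```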
1.2 Room layout
To replace the wall texture, we had to split the wall segmentation mask into segments and determine each wall's type (front, left, or right), since the perspective depends on it. We did this using a layout model. This stage was actually a bottleneck because the model often made mistakes. To reduce the number of mistakes, we tried all the models available in the open-source domain and finally settled on lsun-room.
The architecture of the model is similar to that of the segmentation model. For a better result, the authors added an adaptive edge penalty to the loss function, which penalizes the model for distorted edges.
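To illustrate the general idea (this is our own sketch, not the exact loss from the lsun-room paper), an edge-weighted cross-entropy in PyTorch might look like this:

```python
import torch.nn.functional as F

def edge_weighted_ce(logits, target, edge_mask, edge_weight=4.0):
    """Cross-entropy with extra weight on pixels near layout edges.
    logits: (B, C, H, W); target: (B, H, W); edge_mask: (B, H, W) in {0, 1}.
    edge_weight is an illustrative value, not the paper's."""
    per_pixel = F.cross_entropy(logits, target, reduction="none")  # (B, H, W)
    weights = 1.0 + edge_weight * edge_mask
    return (per_pixel * weights).mean()
```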
1.3 Vanishing points
The perspective algorithm is based on finding vanishing points on the image plane. These are the points at which receding parallel lines appear to converge when drawn in linear perspective.
To search for vanishing points, we tried several algorithms. For example, XiaohuLuVPDetection detects lines in the image and then clusters them into 3 points. However, false line detections remain a problem and often degrade the quality. So we decided to use an algorithm based on the NeurVPS neural network.
The diagram below illustrates the overall structure of the NeurVPS network. Given an image and a candidate vanishing point as input, the network predicts the probability of the candidate being near a ground-truth vanishing point.
The network has two parts: a backbone feature extraction network and a conic convolution sub-network. The backbone is a conventional CNN that extracts semantic features from images.
At inference time, the algorithm works as follows:
First, we sample N points on the unit sphere and use the trained neural network classifier to calculate each point's likelihood of being the direction of a vanishing point. Then we pick the top K candidates and sample another N points around each of them. This step is repeated until we reach the desired resolution.
The following figure shows unit spheres with candidates. The algorithm gives a good result with 3 iterations.
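Here is a minimal NumPy sketch of this coarse-to-fine search, with the trained classifier abstracted away as a black-box score_fn (a hypothetical stand-in for the NeurVPS conic-convolution head):

```python
import numpy as np

def sample_sphere(n):
    """Uniformly sample n unit direction vectors on the sphere."""
    v = np.random.normal(size=(n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def sample_around(center, n, angle):
    """Sample n unit vectors within roughly `angle` radians of `center`."""
    v = center + np.random.normal(scale=np.tan(angle), size=(n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def coarse_to_fine_vps(score_fn, n=64, top_k=3, iterations=3):
    """score_fn: (m, 3) unit directions -> (m,) likelihoods (the trained net)."""
    candidates, angle = sample_sphere(n), np.pi / 4
    for _ in range(iterations):
        scores = score_fn(candidates)
        best = candidates[np.argsort(scores)[-top_k:]]
        angle /= 4.0  # shrink the search cone around each survivor
        candidates = np.vstack([sample_around(c, n, angle) for c in best])
    scores = score_fn(candidates)
    return candidates[np.argsort(scores)[-top_k:]]
```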
1.4 Object removal
To delete objects, at first we simply replaced them with a new texture. However, this left a problem with shadows, since we could not correctly apply the shadows from the original image.
So we started looking for other algorithms and finally found LaMa. It takes a mask highlighting the objects that need to be removed and inpaints them with a plausible background.
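LaMa ships as a standalone repository with its own prediction scripts, so we won't reproduce its API here. As a rough illustration of the same interface (image plus mask in, inpainted image out), here is OpenCV's classical inpainting; note that LaMa's learned model produces far more plausible backgrounds on large holes:

```python
import cv2

image = cv2.imread('room.jpg')
# Non-zero mask pixels mark the objects to remove (e.g. from the segmentation model).
mask = cv2.imread('objects_mask.png', cv2.IMREAD_GRAYSCALE)

# Classical Telea inpainting; LaMa replaces this step with a learned model.
result = cv2.inpaint(image, mask, inpaintRadius=3, flags=cv2.INPAINT_TELEA)
cv2.imwrite('room_cleaned.jpg', result)
```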
2.1 Texture transfer
Now let’s take a closer look at the algorithm for replacing the surface material using the models described in the previous part.
At this point, we have found the vanishing points, and now we can cast lines from these points across the floor area to find the bounding lines from each point. We need the 4 points where these lines intersect. Knowing these points, we feed them to the function that creates a transformation matrix for the texture.
Now we can apply the transformation to the texture. For the perspective transformation, we need to provide the source points on the image from which we want to gather information, and the destination points that we calculated in the previous step. cv2.getPerspectiveTransform then gives us the 3×3 matrix of the perspective transform.
At the next step, we apply the resulting transformation matrix to the texture using the cv2.warpPerspective function.
Finally, we cut off the excess area using the segmentation mask.
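Put together, the floor step looks roughly like this; the function name, argument layout, and corner ordering are our own choices for the sketch:

```python
import cv2
import numpy as np

def apply_floor_texture(image, texture, dst_pts, floor_mask):
    """Warp a texture into the floor quadrilateral and composite it.
    dst_pts: the 4 intersection points found from the vanishing-point lines,
    ordered to match the texture corners; floor_mask: binary floor mask."""
    h, w = image.shape[:2]
    th, tw = texture.shape[:2]
    # Source points are simply the texture's own corners.
    src_pts = np.float32([[0, 0], [tw, 0], [tw, th], [0, th]])
    # 3x3 perspective transform from the texture corners to the floor quad.
    M = cv2.getPerspectiveTransform(src_pts, np.float32(dst_pts))
    warped = cv2.warpPerspective(texture, M, (w, h))
    # Cut off the excess area with the segmentation mask.
    out = image.copy()
    out[floor_mask > 0] = warped[floor_mask > 0]
    return out
```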
The wall material is replaced for each wall segment independently:
- On the segmentation mask, we separate the wall segment obtained from the layout mask
- We choose the desired vanishing points, depending on the wall type (left, front, right)
- The texture replacement procedure is similar to the one used for the floor
2.2 Shadow transfer
To make the new floor texture realistic, we need to apply the shadows and lights from the original image. To do this, we first extract the floor using the segmentation mask and convert the image to grayscale. Then we apply a blur to remove the texture of the floor and keep only the shadows and lights.
Using the resulting image, we build a histogram and find the most common shade of gray. We take this to be the real color of the floor.
Now we can find the thresholds for dark and light areas on the floor. We assume that a 1.5-fold decrease in brightness indicates a shadow, and a corresponding increase indicates light. This gives us masks of light and shadow. The figure on the left shows an example of a shadow mask.
As you can see, the mask edge is too sharp, so we need to apply a blur again. To do this, we fill the dark areas inside the floor with the maximum brightness value and apply the blur. For the light mask, the algorithm is the same, but we do not fill in the dark areas.
We can now apply masks to the target image.
First, we apply the shadow mask. To do this, we convert the shadow mask from grayscale to RGBA. Next, we set the alpha channel to zero wherever the pixel is not a shadow, and blend the result with the original image using the multiply blend mode. For the light mask, the algorithm is the same.
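A condensed sketch of the whole shadow pass; the names, the kernel size, and the exact blend arithmetic are simplifications of our production code:

```python
import cv2
import numpy as np

def transfer_shadows(original, retextured, floor_mask, ratio=1.5, ksize=31):
    """Carry the original floor's shadows over to the retextured image."""
    gray = cv2.cvtColor(original, cv2.COLOR_BGR2GRAY)
    # Blur away the floor texture, keeping only large-scale shading.
    gray = cv2.GaussianBlur(gray, (ksize, ksize), 0)
    # The most common gray level on the floor is taken as its true color.
    base = np.bincount(gray[floor_mask > 0]).argmax()
    # Pixels darker than base/ratio count as shadow.
    shadow = ((gray < base / ratio) & (floor_mask > 0)).astype(np.float32)
    # Soften the sharp mask edge with another blur.
    shadow = cv2.GaussianBlur(shadow, (ksize, ksize), 0)
    # Multiply-style blend: darken the new texture where the mask is strong.
    darkness = shadow * (1.0 - gray.astype(np.float32) / 255.0)
    return (retextured * (1.0 - darkness)[..., None]).astype(np.uint8)
```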
2.3 Adding 3D objects
To add new objects, we used the Three.js library. Its functionality is similar to conventional 3D editors: adding a background, a 3D camera, lights, and 3D objects.
To add 3D objects to an image, we used the following approach:
- Create a scene
- Add an image to the background
- Calibrate the 3D camera to the image to preserve the perspective
- Add a light source
- Add 3D objects from the user
While following this approach, we faced difficulties in calibrating the camera, since it required additional parameters. In the end, we successfully automated this stage.
We didn't find a ready-made solution for camera calibration, so we had to build one ourselves. After researching camera calibration methods, we found the fspy application, which allows users to manually set parallel lines and then export the camera parameters.
We figured out how its algorithm calculates the camera parameters from the vanishing points. Since we could compute those points automatically from the image, the user's participation was no longer required.
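The core of the calculation is a standard pinhole-camera identity: for the vanishing points of two orthogonal scene directions, the vectors from the principal point p to each vanishing point satisfy (v1 - p) · (v2 - p) = -f^2. A minimal sketch, assuming the principal point sits at the image center:

```python
import numpy as np

def focal_from_vanishing_points(vp1, vp2, image_size):
    """Estimate the focal length (in pixels) from two vanishing points
    of orthogonal scene directions, assuming the principal point sits
    at the image center. fSpy's solver covers more general cases."""
    p = np.array([image_size[0] / 2.0, image_size[1] / 2.0])
    d = np.dot(np.asarray(vp1, float) - p, np.asarray(vp2, float) - p)
    if d >= 0:
        raise ValueError("vanishing points not consistent with orthogonal directions")
    return float(np.sqrt(-d))

# Example: feed it two vanishing points found by NeurVPS.
# f = focal_from_vanishing_points((1500, 400), (-300, 420), (1280, 720))
```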
Bringing all the pieces together, we achieved the following results for the interior redesign.
This is where we stop talking about the ML/AI algorithms. To learn how we tested the described models using the ROCm software stack, check out our second article, Part II. Running ML inference with AMD GPU and ROCm.
As part of the AI community, we are always open to discussion. If you find our experience useful, please feel free to share your thoughts. Any and all feedback and contributions are welcome.
Arthur Shaikhatarov is an AI/ML lead specializing in managing and facilitating development teams. Arthur's major customer is AMD, while he also works with other IT giants.
Aleksandr Nefedov is an AI/ML Engineer. Aleksandr does research and development in the fields of computer vision, machine learning and NLP.