Real-world implementation of ADAS-CAS with LIDAR-Camera Sensor Fusion, using RPi, RPLIDAR A1, Pi Cam V2, LED SHIM, NCS 2, Speaker, and battery.
India accounts for only 1% of total vehicles in the world. However, World Bank’s survey reports 11% of global road death happens in India, exposing the dire need to enhance road safety. Most of the developing countries like India pose a set of challenges, unique on their own. These include chaotic traffic, outdated vehicles, lack of pedestrian lanes and zebra crossings, animals crossing the road, and the like. Needless to say, most vehicles don’t have advanced driver-assist features nor can they afford to upgrade the car for better safety.
Against this backdrop, this solution aims to augment even the least expensive cars in India with an ultra-cheap ADAS Level 0, i.e. collision avoidance and smart surround-view. Modern cars with a forward-collision warning (FCW) system or autonomous emergency braking (AEB) are very expensive, but we can augment such functions on old cars, at a low cost.
The complete source code of this solution is available here
The idea is to use a battery-powered Pi connected to a LIDAR, Pi Cam, LED SHIM, and NCS 2, mounted on the car bonnet to perceive frontal objects with their depth & direction. This not only enables a forward-collision warning system but also smart driver assistance that gives alerts about traffic signs or pedestrians, walking along the roadside, or crossing the road.
LIDAR uses laser beams to compute distance based on reflection time.
Cameras generally have higher resolution than LiDAR but cameras have a limited FOV and can’t estimate distance. While rotating LIDAR has a 360° field of view, Pi Cam has only 62×48 degrees Horizontal x Vertical FoV. As we deal with multiple sensors here, we need to employ visual fusion techniques to integrate the sensor output, that is to get the distance and angle of an obstacle in front of the vehicle. Let’s first discuss the theoretical foundation of sensor fusion before hands-on implementation.
Each sensor has its own advantages and disadvantages. Take, for instance, RADARs are low in resolution, but are good at measurement without a line of sight. In an autonomous car, often a combination of LiDARs, RADARs, and Cameras are used to perceive the environment. This way we can compensate for the disadvantages, by combining the advantages of all sensors.
- Camera: excellent to understand a scene or to classify objects
- LIDAR: excellent to estimate distances using pulsed laser waves
- RADAR: can measure the speed of obstacles using Doppler Effect
The camera is a 2D Sensor from which features like bounding boxes, traffic lights, lane divisions can be identified. LIDAR is a 3D Sensor that outputs a set of point clouds. The fusion technique finds a correspondence between points detected by LIDAR and points detected by the camera. To use LiDARs and Cameras in unison to build ADAS, the 3D sensor output needs to be fused with 2D sensor output by doing the following steps.
1. Project the LiDAR point clouds (3D) onto the 2D image.
2. Do object detection using an algorithm like YOLOv4.
3. Match the ROI to find the interested LiDAR projected points.
By doing the 3 steps above, the surrounding objects would be measured and classified using LIDAR-Camera fusion.
When a raw image from a cam is merged with raw data from RADAR or LIDAR then it’s called Low-Level Fusion or Early Fusion. In Late Fusion, detection is done before the fusion. Note that there are many challenges to project the 3D LIDAR point cloud on a 2D image. The relative orientation and translation between the two sensors must be considered in performing fusion.
- Rotation: The coordinate system of LIDAR and Camera can be different. Distance on the LIDAR may be on the z-axis, while it is x-axis on the camera. We need to apply rotation on the LIDAR point cloud to make the coordinate system the same, i.e. multiply each LIDAR point with the Rotation matrix.
- Translation: In an autonomous car, the LIDAR can be at the center top and the camera on the sides. The position of LIDAR and camera in each installation can be different. Based on the relative sensor position, translate the LIDAR Points by multiplying with a Translation matrix.
- Stereo Rectification: For stereo camera setup, we need to do Stereo Rectification to make the left and right images co-planar. Thus, we need to multiply with matrix R0 to align everything along the horizontal Epipolar line.
- Intrinsic calibration: Calibration is the step where you tell your camera how to convert a point in the 3D world into a pixel. To account for this, we need to multiply with an intrinsic calibration matrix containing factory calibrated values.
To sum it up, we need to multiply LIDAR points with all the 4 matrices to project on the camera image.
To project a point X in 3D onto a point Y in 2D,
- P = Camera Intrinsic Calibration matrix
- R0 = Stereo Rectification matrix
- R|t = Rotation & Translation to go from LIDAR to Camera
- X = Point in 3D space
- Y = Point in 2D Image
Note that we have combined both the rigid body transformations, rotation, and translation, in one matrix, R|t. Putting it together, the 3 matrices, P, R0, and R|t account for extrinsic and intrinsic calibration to project LIDAR points onto the camera image. However, the matrix values highly depend on our custom sensor installation.
This is just one piece of the puzzle. Our aim is to augment any cheap car with an end-to-end collision avoidance system and smart surround view. This would include our choice of sensors, sensor positions, data capture, custom visual fusion, and object detection, coupled with a data analysis node, to do synchronization across sensors in order to trigger driver-assist warnings to avoid danger.
First, we need to assemble the Pi with RPLIDAR A1, Pi Cam, LED SHIM, and NCS 2. 2D LIDAR is used instead of 3D LIDAR as we aim to make the gadget, cheapest possible. The unit is powered by a 5V 3A 10K mAH battery. For ease of assembly, a LIDAR mount is 3D printed and attached to the RPi. Part of the mount design is taken from the STL files obtained from here
Connect RPLIDAR A1 with the USB adapter that is connected to the Pi USB using a micro-USB cable. LIDAR’s adapter provides power and converts LIDAR’s internal UART serial interface to a USB interface. Use an Aux-to-Aux cable to connect RPi to speakers. Due to physical constraints, an LED SHIM is used instead of Blinkt to signal warning messages. While the total cost of the ADAS gadget is just around US$ 150–200, one may have to shell out at least $10–20K more, to get a car model with such advanced features.
Let’s imagine, a 3D LIDAR is connected the same way as above. First, we will try to solve 3D LIDAR-Camera Sensor Fusion on the above gadget. Then we will see the variation for 2D LIDAR-Camera Fusion, so as to make it work on RPLIDAR A1.
It is clear from the above discussion that we need to do rotation, translation, stereo rectification, and intrinsic calibration to project LIDAR points on the image. We will try to apply the above formula based on the custom gadget that we built.
From the above image, you can estimate the Pi Cam is 10 mm below the LIDAR scan plane. i.e. a translation of [0, -10, 0] along the 3D-axis. Consider Velodyne HDL-64E as our 3D LIDAR, which requires 180° rotation to align the coordinate system with Pi Cam. We can compute the R|t matrix now.
As we use a monocular camera here, the stereo rectification matrix will be an identity matrix. We can make the intrinsic calibration matrix based on the hardware spec of Pi Cam V2.
For the Raspberry Pi V2 camera,
- Focal Length (FL) = 3.04 mm
- FL Pixels = focal length * sx, where sx = real world to pixels ratio
- Focal Length * sx = 3.04mm * (1/ 0.00112 mm per px) = 2714.3 px
Due to a mismatch in shape, the matrices cannot be multiplied. To make it work, we need to transition from Euclidean to Homogeneous coordinates by adding 0’s and 1’s as the last row or column. After doing the multiplication we need to convert back to Homogeneous coordinates.
You can see the 3D LIDAR-CAM sensor fusion projection output after applying the projection formula on the 3D point cloud. The input sensor data from 360° Velodyne HDL-64E and camera is downloaded  and fed in.
However, the 3D LiDAR cost is a barrier to build a cheap solution. We can instead use cheap 2D LiDAR with necessary tweaks, as it only scans a single horizontal line.
Our gadget is equipped with 2D RP LIDAR A1 to minimize the cost. This LIDAR scans the environment in a 2D plane, orthogonal to the camera plane. The rotating scan will estimate the distance to the obstacle, for each angle from 0° to 360°. Due to the placement of LIDAR w.r.t. Pi Cam in the gadget, the camera is at +90° on the LIDAR geometry. However, note that the Field of View of Pi cam V2 is 62°x48° in horizontal x vertical direction respectively.
The integrated front view of the device is as shown below.
As both the LIDAR and Camera sensor data is available in the frontal 62° arc, we need to fuse the data. In the LIDAR scan plane, the camera data starts from +59° to +59° + 62° = 121°. We can run object detection on the image to get bounding boxes for the objects of interest. Eg: human, car, bike, traffic light, etc. Since 2D LIDAR has only width information, consider only the x_min and x_max of each bounding box.
We need to compute the LIDAR angle corresponding to an image pixel, in order to estimate the distance to the pixel. To find the distance to the object inside the bounding box, compute θ_min and θ_max corresponding to x_min & x_max using the below formula, based on the above diagram,
Now you can find the distance to each angle between θ_min and θ_max based on the latest LIDAR scan data. Then compute the median distance of all LIDAR points that subtends the object bounding box to estimate the object depth. If the distance is below a threshold, then trigger a warning based on the angle. Repeat warning, if the box center shift by a significant distance in subsequent frames.
To build a non-blocking system flow, a modular architecture is employed wherein, each independent node is dependent on different hardware components. i.e., the “object detection” node uses Movidius for inference, whereas the “distance estimation” node takes LIDAR data as input, while the “Alert” module signals to Pimoroni Blinkt and the speaker. The modules are communicated via MQTT messages on respective topics.
The time synchronization module takes care of the “data relevance factor” for Sensor Fusion. For ADAS, the location of detected objects by ‘Node 1’ may change, as the objects move. Thus, the distance estimate to the bounding box could go wrong after 2–3 seconds (while the message can remain in the MQTT queue). In order to synchronize, current time = 60*minutes + seconds is appended to the message (to ignore lagged messages).
The model output is sent from Node 1 to Node 2, where the LiDAR-Cam sensor fusion happens, further pushing a message to Node 3. For the system to function, the 3 MQTT nodes should work in tandem, orchestrated by MQTT messages, published, and subscribed on respective topics.
On top of this post, you can watch the gadget plying the Indian Roads, giving driver-assist alerts after perceiving the surrounding objects, along with their depth and direction. The speaker is be wired internally or a Bluetooth speaker is to be integrated to make the announcement more audible to the driver.
This ADAS device can be connected to the CAN bus to let it accelerate, steer, or apply the brake. RPi doesn’t have a built-in CAN Bus, but its GPIO includes SPI Bus, which is supported by a number of CAN controllers like MCP2515. So, autonomous emergency braking (AEB) and collision avoidance system (CAS) can be done by connecting this device to the CAN bus.
The Indian traffic conundrum is so unique that it demands custom solutions. To start with, we need to train object detection models with Indian vehicles such as trucks, tempos, vans, autos, cycle rickshaws, etc.
Further, to enhance smart surround view, we need to train the model with Indian traffic signs and signboards to give more meaningful driver-assist warnings on Indian roads. It’s a common sight in India, the animals like cows, pigs, buffaloes, goats, dogs, etc., cross the roads and highways. Hence, it’s beneficial to detect them as well.
For the PoC, see the output of SSD-Mobilenet model trained to classify Indian traffic signs against Indian signboards. You can further classify the traffic sign to decipher the exact meaning of the sign.
The annotated Indian Traffic Sign dataset is provided by Datacluster Labs, India. They are yet to finish the annotation of “Indian Vehicles” database. It’s just a matter of training time to make this gadget, tailor-made for India.
To find out the ROI from images, we have used SSD MobileNet trained on COCO filtered by potential objects. To detect only people and vehicles, you can use this model also to get better speed and accuracy.
More importantly, it is possible to train an object detection model with your custom annotated data and do hardware optimization using OpenVINO and deploy on Pi, as long as the layers are supported by OpenVINO. By doing so, we can deploy an object detection model on RPi, to locate custom objects.
1. First, choose an efficient object detection model such as Efficientdet, SSD-Mobilenet, Tiny-YOLO, YOLOX, etc which are targeted for low power hardware. I have experimented with all the mentioned models on RPi 4B and SSD-Mobilenet fetched maximum FPS.
2. Do transfer learning of object detection models with your custom data.
3. Convert the trained *.pb file to Intermediate representation — *.xml and *.bin using Model Optimizer.
/deployment_tools/model_optimizer/mo_tf.py— input_model — reverse_input_channels — output_dir — tensorflow_object_detection_api_pipeline_config — tensorflow_use_custom_operations_config /deployment_tools/model_optimizer/extensions/front/tf/ssd_support_api_v1.14.json
python3 object_detection_demo.py -d CPU -i
— labels labels.txt -m -at ssd
4. Finally, deploy the hardware optimized models on Pi.
The complete source code of this solution is available here
If you have any query or suggestion, you can reach me here
1. LIDAR-Camera Sensor Fusion High Level: https://www.thinkautonomous.ai/blog/?p=lidar-and-camera-sensor-fusion-in-self-driving-cars
2. LIDAR data scan code stub by Adafruit: https://learn.adafruit.com/remote-iot-environmental-sensor/code
3. Camera Calibration and Intrinsic Matrix Estimation: https://www.cc.gatech.edu/classes/AY2016/cs4476_fall/results/proj3/html/agartia3/index.html
4. Visual Fusion For Autonomous Cars Course at PyImageSearch University: https://www.pyimagesearch.com/pyimagesearch-university/
5. LIDAR Distance Estimation: https://en.wikipedia.org/wiki/Lidar
6. RPLIDAR A1 M8 Hardware Specification: https://www.generationrobots.com/media/rplidar-a1m8-360-degree-laser-scanner-development-kit-datasheet-1.pdf
7. Model Training, Data Cleansing & Augmentation: www.roboflow.com
8. The Indian traffic sign model has been trained using the Traffic dataset offered by Datacluster Labs, India.