The Essential Role of AI-Powered Cameras in Shaping a Smarter World
As the camera market booms, so does the need to empower cameras with artificial intelligence (AI). In this blog post we discuss the need for on-camera AI, how it enhances video quality, and how it enables advanced video analytics. We will look at some typical cameras and applications and estimate the AI budget required to execute the different scenarios.
Cameras. Cameras Everywhere
In today’s tech-driven world, cameras have become an integral part of our daily lives; we are constantly recording video and being recorded. The rapid deployment of IP cameras in residential homes, commercial and public spaces, and the industrial sector is fueling unprecedented market growth, which ABI Research estimates will reach 200 million cameras by 2027, with revenue of $35B. The most significant growth driver for this market is the ability to improve safety and security through video surveillance. From enhancing home security to monitoring public safety and optimizing traffic management, smart cameras provide countless benefits and endless opportunities for a smarter, safer world.
AI isn’t only in the cloud anymore
With the proliferation of camera deployments comes the need to automate and enhance the monitoring of video streams and the generation of insights from them, as well as to make streaming and storage of video more efficient and cost-effective. This is where artificial intelligence (AI) comes into play. Traditional cloud-based AI often suffers from latency that prevents real-time insights and alerts, poses privacy concerns, and depends on network connectivity. As a result, the need for AI at the edge is surging along with the smart camera market. Edge AI delivers video analytics, insights, and alerts in real time, enabling a higher level of security. In addition, AI at the edge makes it possible to stream only the metadata and insights extracted from the video, reducing the cost of transferring, computing, and storing data in the cloud while enhancing people’s privacy and eliminating network dependencies.
More AI, please
However, most of today’s IP cameras have limited compute power. What sets the next generation of smart cameras apart is their incorporation of high compute power and AI processing capacity, enabling not only complex and advanced video analytics tasks, but also AI-based video enhancement that delivers top-quality images.
Since each of these functions – enhancing video quality and enabling advanced video analytics – demands its own budget of AI capacity, today’s smart cameras need to be equipped with the right amount of AI power to address both needs.
AI-powered video enhancement
AI can be used to improve image quality and provide a clear, sharp image even from poor-quality video. AI can handle a variety of image enhancement tasks such as noise reduction in low-light conditions, high dynamic range (HDR), and even some aspects of the classic 3A (auto exposure, auto white balance, auto focus). AI can correct image distortion, stabilize the image and compensate for motion, and enable digital zoom. Extreme low-light conditions, for example, may reduce viewing distance, image quality, and color fidelity. Noise also reduces the ability to differentiate details in the image, which inflates data size during compression and makes transmitting and storing video in the cloud less efficient. As can be seen in Figure 1, AI can be leveraged to remove noise while preserving important image details and textures, resulting in higher image quality as measured by signal-to-noise ratio (SNR) and structural similarity index measure (SSIM). As an example, noise removal from a 4K image taken in low-light conditions of about 5 lux would require approximately 100 GOPS per frame, which is 3 TOPS for real-time video streaming at 30 FPS.
Figure 1: Noise reduction in low-light conditions
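The back-of-the-envelope arithmetic behind the figures quoted above can be sketched as a small helper that converts a per-frame compute cost into a sustained compute budget:

```python
# Convert a per-frame compute cost (in GOPS) at a given frame rate
# into the sustained TOPS a camera's AI processor must supply.

def tops_required(gops_per_frame: float, fps: float) -> float:
    """1 TOPS = 1000 GOPS per second."""
    return gops_per_frame * fps / 1000.0

# ~100 GOPS/frame for 4K low-light denoising, streamed at 30 FPS:
print(tops_required(100, 30))  # -> 3.0
```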
AI-powered video analytics
As higher-resolution video streams become more common, cameras need to process more data, detect and identify more complex and granular objects, and execute more tasks with more complex pipelines.
When a camera has enough AI capacity, it can support advanced video analytics on top of AI-powered video enhancement. This can include running multiple AI tasks and models with complex pipelines on the same video stream, identifying smaller and more distant objects with higher accuracy and fewer false alarms, or performing faster detection at high resolution. For example, a road surveillance camera can run a complex pipeline such as Automatic License Plate Recognition (ALPR, Figure 2), which requires object detection to identify every car on the road, followed by license plate detection to locate the plate within each car, and finally license plate recognition to read the characters on each plate. With the right amount of compute power, additional tasks and pipelines can be handled by the same camera (Figure 3) – for example, running SAM (Segment Anything Model) to clearly identify objects in the video at higher resolution with fewer false and missed detections. It can then perform classification to identify abnormal, illegal, or dangerous behavior such as line crossing, speeding, dangerous overtaking, driving in the wrong direction, tailgating, or driving in a non-drivable space. When such abnormal behavior is recognized, the license plate of the violating vehicle can be retrieved in order to generate an alert to law enforcement.
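The three-stage ALPR pipeline described above can be sketched as follows. This is an illustrative outline, not a real camera API: the three model callables (vehicle detector, plate detector, plate recognizer) are hypothetical stand-ins for the on-device networks.

```python
# Sketch of a three-stage ALPR pipeline: detect vehicles, locate the
# plate inside each vehicle, then read the plate's characters.
# The three model callables are hypothetical placeholders.

from dataclasses import dataclass

@dataclass
class Detection:
    box: tuple   # (x, y, w, h) in frame coordinates
    label: str

def run_alpr(frame, detect_vehicles, detect_plate, recognize_plate):
    results = []
    for vehicle in detect_vehicles(frame):            # stage 1: object detection
        plate = detect_plate(frame, vehicle.box)      # stage 2: plate detection
        if plate is not None:
            text = recognize_plate(frame, plate.box)  # stage 3: character recognition
            results.append((vehicle.box, text))
    return results
```

Each stage narrows the region the next stage must process, which is why a multistage pipeline like this costs several times more compute than a single detection pass over the frame.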
How much more, then?
To obtain high-accuracy analytics at high image quality, smart cameras need enough AI power to run both video enhancement and analytics tasks in parallel. The ability to intertwine video enhancement and analytics tasks benefits both the visual outcome and the analytics insights. Looking at the example in Figure 3, in order to get an accurate license plate number and person identification, the camera’s vision processor needs to apply semantic awareness: running denoising selectively based on the semantic significance of elements in the frame and processing video differently per Region of Interest (RoI). Noise reduction needs to be applied while preserving important image details and textures, for higher SNR and SSIM, and noise reduction and object detection need to run simultaneously, for faster throughput and higher FPS, especially on high-resolution video.
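The RoI-selective processing described above can be sketched in a few lines. This is a minimal illustration of the idea, assuming the two denoiser callables stand in for the camera’s actual strong (detail-preserving) and lightweight models:

```python
# Semantic-aware denoising sketch: apply a stronger, detail-preserving
# denoiser only inside semantically important regions (e.g. plates,
# faces) and a cheaper one everywhere else. Both denoisers are
# hypothetical placeholders for real on-device models.

import numpy as np

def roi_selective_denoise(frame, rois, denoise_strong, denoise_light):
    out = denoise_light(frame)                 # cheap pass over the whole frame
    for x, y, w, h in rois:                    # RoIs from the detection stage
        out[y:y + h, x:x + w] = denoise_strong(frame[y:y + h, x:x + w])
    return out
```

Spending the expensive model only where the analytics pipeline needs fine detail is what keeps the combined enhancement-plus-analytics workload within the camera’s TOPS budget.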
For basic vision tasks such as denoising, a 2MP camera would require around 0.5 TOPS, and basic video analytics pipelines such as object detection or people counting need an additional 1 TOPS. Advanced video enhancement features such as HDR or 4x digital zoom upscaling add another 1 TOPS, and an advanced multistage analytics pipeline with 3-5 stages, such as LPR or facial recognition, requires 2 more TOPS, for a total of 4.5 TOPS. A camera with an 8MP sensor, or with two 4MP sensors, would need roughly four times that amount of TOPS.
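The budget above reduces to simple arithmetic. The per-task figures are the 2MP estimates quoted in the text; scaling the total linearly with sensor resolution is an assumption that matches the “8MP needs roughly four times” rule of thumb:

```python
# AI budget arithmetic for a 2MP camera, with (assumed) linear
# scaling of the total by sensor resolution.

TOPS_PER_TASK_2MP = {
    "basic enhancement (denoising)": 0.5,
    "basic analytics (detection, counting)": 1.0,
    "advanced enhancement (HDR, 4x zoom)": 1.0,
    "advanced analytics (LPR, face recognition)": 2.0,
}

def tops_budget(megapixels: float) -> float:
    """Total TOPS needed, scaled linearly from the 2MP baseline."""
    return sum(TOPS_PER_TASK_2MP.values()) * (megapixels / 2.0)

print(tops_budget(2))  # -> 4.5
print(tops_budget(8))  # -> 18.0
```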
A portfolio of processors for a wide variety of cameras
In the spring of 2023, Hailo launched a family of powerful AI vision processors. The second generation of Hailo’s processors, named Hailo-15™, includes three variants designed to fit every camera type and need. For high-end cameras, Hailo offers the Hailo-15H with 20 TOPS, which enables running all the basic and advanced video enhancement and analytics pipelines with TOPS to spare for future extension of camera capabilities. The Hailo-15M and Hailo-15L provide 11 TOPS and 7 TOPS, respectively, and are offered at competitive pricing to match the range of cameras they support. The Hailo-15 family of systems-on-chip (SoCs) can process multiple ML models simultaneously at very low power consumption, in accordance with camera design requirements. Because the NN core of the Hailo-15™ vision processors can fuse both the video enhancement and video analytics tasks, it achieves superior processing efficiency. No other vision processor currently offered in the market supports these AI TOPS numbers.
The bottom line
The integration of AI into smart cameras is opening a plethora of opportunities for visual intelligence. From transforming video quality to powering advanced analytics, AI at the edge is revolutionizing industries such as security, industrial automation, retail and more. As we witness a paradigm shift in the way we capture, process, and interpret visual information, embracing this AI revolution in smart cameras will undoubtedly reshape additional sectors, improving safety, efficiency, and the overall user experience. Contact Hailo today to learn how our technology allows you to realize the full potential of AI in your smart camera applications.