A new hardware platform for deep learning on the extreme edge
When we talk about deep learning and neural networks we often think about Google Colab or the latest Nvidia GPUs that can double up as a space heater. But unlike training, deep learning inference can happen on far more humble platforms, some humble enough to be considered 'edge compute'.
No, we aren’t here to talk about the Google Coral Edge TPU Board or the NVIDIA Jetson Nano Dev board. Consuming power in the range of 5–10 watts, they are still power guzzlers compared to our topic of discussion. But if you still want to read about them you can check out this detailed comparison by Manu.
We're talking about a true extreme edge AI compute platform that can do keyword detection and FaceID, all while sipping power measured in milliwatts. The closest competitor would be a low-power ARM Cortex-M4 or M7, which would be orders of magnitude slower and hence orders of magnitude more energy hungry for a given task. We're talking about the MAX7800x family of AI microcontrollers, whose name is almost as cool as its core temperature when running a deep neural network.

Let's talk about the secret sauce in the MAX78000. Apart from the ARM Cortex-M4 primary MCU and a RISC-V based smart DMA, it has a dedicated CNN accelerator comprising 64 parallel processors. Yes, 64. It also has 432 KB of SRAM-based weight memory, enough for up to 3.5 million weights (assuming 1-bit weights), and because it is SRAM the weights can be changed even after deployment. This accelerator can implement a neural network of up to 64 layers — with pooling every alternate layer — or 32 layers without pooling, with a maximum of 1024 inputs or outputs per layer.
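The headline figure of "3.5 million weights" follows directly from the 432 KB weight memory and the precision you choose — the accelerator supports weights narrower than 8 bits. A quick back-of-the-envelope sketch:

```python
# Weight-memory capacity of the MAX78000 CNN accelerator at different
# weight precisions. 432 KB of SRAM = 432 * 1024 bytes.
WEIGHT_MEM_BYTES = 432 * 1024

def max_weights(bits_per_weight: int) -> int:
    """How many weights of the given bit width fit in weight memory."""
    return WEIGHT_MEM_BYTES * 8 // bits_per_weight

for bits in (1, 2, 4, 8):
    print(f"{bits}-bit weights: up to {max_weights(bits):,}")
# 1-bit weights give ~3.5 million; 8-bit weights give ~442 thousand.
```

So the 0.4–3.5 million range is just the same 432 KB viewed at 8-bit versus 1-bit precision.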
This means that instead of forward propagation running serially in triple-nested matrix multiplication loops — as it does on classic hardware — it can run with a much higher degree of parallelism on the CNN accelerator. Additionally, while it is primarily a CNN accelerator, it can also be used to implement traditional fully connected networks as well as RNNs.
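To make the contrast concrete, here is a minimal sketch of the kind of serial, nested-loop 2-D convolution a plain MCU core would grind through one multiply-accumulate at a time; the accelerator's 64 processors instead evaluate many of these output positions and channels in parallel:

```python
import numpy as np

def conv2d_serial(x, w):
    """Naive 'valid' 2-D convolution: every multiply-accumulate
    executes one after another, as on a plain MCU core."""
    ih, iw = x.shape
    kh, kw = w.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):          # output row
        for j in range(out.shape[1]):      # output column
            for ki in range(kh):           # kernel row
                for kj in range(kw):       # kernel column
                    out[i, j] += x[i + ki, j + kj] * w[ki, kj]
    return out

x = np.arange(16.0).reshape(4, 4)
w = np.ones((3, 3))
print(conv2d_serial(x, w))  # each output is the sum of a 3x3 window
```

Every iteration of those inner loops is independent across output positions, which is exactly the parallelism dedicated CNN hardware exploits.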
A deep learning engineer may describe a 64-layer, 0.4–3.5-million-parameter neural network as tiny compared to other modern networks, but a lot can still be done within these constraints. It will be interesting to see how the deep learning community innovates with constrained hyperparameters.
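To get a feel for how roomy those constraints actually are, here is a hypothetical keyword-spotting-sized CNN sketched in plain PyTorch (the real workflow uses the ai8x training modules from the toolchain rather than vanilla `nn` layers, and the layer sizes below are illustrative, not from the source):

```python
import torch
import torch.nn as nn

class TinyKWS(nn.Module):
    """Illustrative small CNN, sized well under the MAX78000's
    ~442k-weight budget at 8-bit precision."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(64 * 8 * 8, n_classes)

    def forward(self, x):  # x: (N, 1, 64, 64), e.g. a spectrogram
        return self.classifier(self.features(x).flatten(1))

model = TinyKWS()
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,} parameters")  # ~64k, comfortably within budget
```

A three-conv-layer classifier like this uses only a fraction of the available weight memory, which is why real applications such as keyword spotting and face identification fit on the chip.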
The recently launched MAX78002 takes this a step further by almost doubling most of the specs for the CNN accelerator. We can only expect continued improvements and more powerful accelerators in the near future.
Getting started with the MAX7800x family has been simplified with the help of this GitHub repository from Analog Devices, which contains the SDK for the MCU as well as all the necessary tools for training and synthesis of your own custom models.
The MAX78000 is truly a unique microcontroller and is set to revolutionise deep learning as we know it. We can expect even more ultra-low-power deep learning technology, as ARM has announced that ARMv9 will prioritise DSP and ML hardware accelerators. The new ARM Cortex-M55 paired with the Ethos-U55 Neural Processing Unit is expected to give a 480x improvement in ML performance over existing Cortex-M based systems.
Deep learning was already an exciting field; with these latest hardware innovations we can expect it to scale even greater heights and see more widespread application. Stay tuned for more musings on EdgeAI and deep learning!