In this article:
Classic AI hardware design implicitly requires a constant balance between compute, memory, and bandwidth to prevent bottlenecks. This is complicated by the fact that there is no standardized 'AI workload': neural networks differ a great deal in how they use these resources, forcing system designers to either compromise or build a specialized product.
Power adds another constraint as we approach Endpoint AI. Of all the factors, memory bandwidth has the largest impact on power consumption, followed by compute.
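A back-of-envelope sketch shows why. The energy figures below are order-of-magnitude assumptions in the spirit of widely cited circuit-level estimates, not datasheet values; the point is the ratio, not the absolute numbers:

```python
# Back-of-envelope energy budget: why off-chip memory traffic dominates.
# Both energy figures are rough order-of-magnitude assumptions.

E_MAC_PJ = 0.5      # energy per 8-bit multiply-accumulate, pJ (assumed)
E_DRAM_PJ = 640.0   # energy per 32-bit external DRAM access, pJ (assumed)

macs = 5e6          # multiply-accumulates per inference (assumed model)
dram_bytes = 5e6    # weight bytes fetched from DRAM per inference (no on-chip reuse)

compute_uj = macs * E_MAC_PJ / 1e6
memory_uj = (dram_bytes / 4) * E_DRAM_PJ / 1e6  # 4 bytes per 32-bit access

print(f"compute: {compute_uj:.1f} uJ, memory: {memory_uj:.1f} uJ")
# compute: 2.5 uJ, memory: 800.0 uJ -- the arithmetic is almost free
# next to the cost of moving the weights
```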
The argument put forth here is that AI workloads with compute demands large enough to require an NPU also necessitate large external memories and high memory bandwidth.
Implications of Bigger Neural Networks
In general, the size of a neural network scales with the size of its inputs and outputs, the complexity of the task at hand, and the required accuracy. Simple tasks like identifying handwritten digits need minimal input and output and can therefore be handled by small networks. By contrast, complex tasks such as those performed by ChatGPT need sizable inputs, enormous neural networks, and extensive amounts of compute.
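A rough parameter count makes the gap concrete (the digit-classifier architecture below is a simplified assumption chosen only for the comparison):

```python
# Rough parameter counts: task complexity drives model size.
# The digit-classifier architecture is a simplified assumption.

def mlp_params(layer_sizes):
    """Parameters (weights + biases) of a fully connected network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Handwritten-digit classifier: 28x28 pixel input, one hidden layer, 10 classes
print(f"digit classifier: {mlp_params([784, 128, 10]):,} parameters")  # 101,770

# GPT-3-class language models, by contrast, have on the order of
# 10**11 parameters -- roughly a million times larger.
```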
Always-On Endpoint AI Workloads
Always-on Endpoint AI workloads are distinguished by their constraints: they operate on locally collected data, must fit within tight compute and memory limits, and are extremely sensitive to power consumption.
Endpoint AI needs both memory (capacity and bandwidth) and compute, and the two must be kept in balance to avoid bottlenecks. NPUs, however, speed up compute without adding memory. This can benefit some Endpoint AI workloads, but most gain no advantage from it. The specific domains where NPUs have proven useful are detailed below.
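First, though, the bottleneck itself: whether added compute helps depends on a workload's arithmetic intensity relative to the memory bandwidth available. A minimal roofline-style sketch, with all hardware figures as illustrative assumptions:

```python
# Roofline-style check: is a workload compute-bound or memory-bound?
# Peak throughput and bandwidth below are illustrative assumptions.

PEAK_OPS_PER_S = 2e9    # peak compute throughput, ops/s (assumed)
PEAK_BYTES_PER_S = 5e8  # external memory bandwidth, bytes/s (assumed)

def bottleneck(ops: float, bytes_moved: float) -> str:
    """Compare a workload's arithmetic intensity (ops per byte moved)
    to the machine balance (peak ops per byte of bandwidth)."""
    intensity = ops / bytes_moved
    machine_balance = PEAK_OPS_PER_S / PEAK_BYTES_PER_S
    return "compute-bound" if intensity >= machine_balance else "memory-bound"

# A fully connected layer touching each int8 weight once does ~2 ops per byte:
print(bottleneck(ops=2e6, bytes_moved=1e6))   # memory-bound -- an NPU won't help

# A convolution reusing each weight hundreds of times is a different story:
print(bottleneck(ops=2e8, bytes_moved=1e6))   # compute-bound -- an NPU helps
```

Workloads that reuse data heavily land on the compute-bound side, and those are where NPUs earn their keep.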
Chief among those domains are complex real-time audio processing and real-time video analytics. Complex AI tasks such as sound-specific noise recognition require NPUs because of strict latency constraints: these relatively small models must run every few milliseconds.
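To put a rough number on that deadline (the model size and frame rate below are assumptions, not measurements):

```python
# Latency-budget sketch: small real-time audio models, tight deadlines.
# Model size and frame interval are illustrative assumptions.

model_macs = 2e6          # multiply-accumulates per inference (assumed)
frame_interval_s = 0.010  # a new audio frame every 10 ms

required = model_macs / frame_interval_s
print(f"required throughput: {required:.0e} MAC/s")   # 2e+08 MAC/s

# Sustaining hundreds of millions of MACs per second leaves a plain
# low-power MCU core little headroom; an NPU meets the deadline easily.
```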
The Power Dilemma
In battery-powered environments, the traditional way to save power has been to stay in sleep mode for as long as possible: race through the work, then sleep. With recent major advances in microcontroller power efficiency, this race to sleep is becoming less relevant, less compelling, and in some cases unnecessary.
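The arithmetic behind the race to sleep is plain duty-cycling. A sketch with assumed power figures:

```python
# Duty-cycle arithmetic behind the "race to sleep" strategy.
# Both power figures are illustrative assumptions.

P_ACTIVE_MW = 10.0   # power while awake and computing (assumed)
P_SLEEP_MW = 0.01    # deep-sleep power (assumed)

def avg_power_mw(active_ms: float, period_ms: float) -> float:
    """Average power when awake for active_ms out of every period_ms."""
    duty = active_ms / period_ms
    return P_ACTIVE_MW * duty + P_SLEEP_MW * (1 - duty)

print(f"{avg_power_mw(5, 100):.2f} mW")   # ~0.51 mW: asleep 95% of the time

# As active power itself falls on efficient MCUs, the gap between
# duty-cycled and continuous always-on operation narrows.
```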
The Significance of Large Language Models (LLMs)
The best Large Language Models have captured the world's imagination. In the Endpoint world, cloud-based AI can take the simpler but still useful insights that Endpoint AI produces locally and extract deeper insights from them.
Concluding Remarks
Always-on Endpoint AI revolves around analyzing data where it originates in order to produce valuable insights. Although there are some domains where Endpoint AI compute acceleration is advantageous, the dominant constraints are power, memory, and the nature of the data. Hence, most features enabled by Endpoint AI do not benefit from additional compute, especially on extremely power-efficient devices.