### Google’s LiteRT Accelerator Built on Qualcomm AI Engine Direct (QNN): Revolutionizing On-Device AI
Google recently unveiled a powerful new accelerator for LiteRT, built on Qualcomm AI Engine Direct (QNN) and designed to boost on-device AI performance on Qualcomm-powered Android devices with Snapdragon 8 series System on Chips (SoCs). The technology promises dramatic improvements in execution speed: up to **100 times faster than CPU execution** and an impressive **10 times faster than GPU** execution.
### Addressing Performance Bottlenecks in Mobile AI
Despite the GPU horsepower widely available in modern Android devices, relying solely on these processors for AI tasks often creates performance bottlenecks. Google software engineers Lu Wang, Weiyi Wang, and Andrew Wang pointed to a common scenario: running a compute-heavy text-to-image generation model while simultaneously processing real-time camera feeds can overwhelm even the most advanced mobile GPUs. The resulting overload leads to a jittery experience and dropped frames, significantly impairing user interaction.
In contrast, many modern devices come equipped with **Neural Processing Units (NPUs)**, custom AI accelerators designed to run AI workloads faster and more efficiently while drawing less power than GPUs. Offloading AI workloads to the NPU frees the GPU for other tasks and enables a smoother, higher-performance experience in AI applications.
### Introducing the QNN Accelerator
Developed in collaboration with Qualcomm, the QNN accelerator serves as a streamlined replacement for the previous TensorFlow Lite QNN delegate. By integrating the various SoC compilers and runtimes into a cohesive workflow, it presents a single, simplified API to developers. The accelerator supports **90 LiteRT operations**, with the goal of enabling **full model delegation**, a critical factor in achieving peak performance across applications.
Included within the QNN framework are specialized kernels and optimizations tailored to models such as the **Gemma** large language models and Apple’s **FastVLM**, significantly expanding the capabilities of on-device AI applications.
### Benchmarked Performance Gains
Google benchmarked QNN across **72 machine learning models**, **64 of which achieved full NPU delegation**. The assessments confirmed the headline numbers: gains of up to **100 times over CPU execution** and **10 times over GPU execution**.
A notable example comes from Qualcomm’s latest flagship SoC, the **Snapdragon 8 Elite Gen 5**. Here, the performance advantage is eye-opening: **56 of the models run in under 5 milliseconds** on the NPU, while only **13 models reach that mark on the CPU**. This advancement unlocks real-time AI experiences previously deemed unattainable.
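A quick back-of-the-envelope reading of those benchmark counts (a sketch; the model counts and the 5 ms threshold come from the article, the percentage framing is ours):

```python
# Share of benchmarked models that meet a 5 ms latency budget,
# using the counts reported in the article.
total_models = 72
npu_under_5ms = 56   # models under 5 ms with full NPU delegation
cpu_under_5ms = 13   # models under 5 ms on the CPU

npu_share = npu_under_5ms / total_models
cpu_share = cpu_under_5ms / total_models

print(f"NPU: {npu_share:.0%} of models under 5 ms")  # prints "NPU: 78% of models under 5 ms"
print(f"CPU: {cpu_share:.0%} of models under 5 ms")  # prints "CPU: 18% of models under 5 ms"
```

In other words, roughly four out of five benchmarked models fit a real-time frame budget on the NPU, versus fewer than one in five on the CPU.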
### Cutting-Edge Use Cases and Applications
Google engineers have even developed a proof-of-concept application that harnesses optimized versions of Apple’s FastVLM-0.5B vision-language model, interpreting scenes captured by the camera nearly instantaneously. On the Snapdragon 8 Elite Gen 5 NPU, it achieves a remarkable **time-to-first-token (TTFT) of just 0.12 seconds** on 1024×1024 images, along with throughput above **11,000 tokens per second** for prefill and more than **100 tokens per second** for decoding.
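The reported rates give a feel for how much work the NPU does before the first token appears. A rough sketch (the prefill token count and the 50-token caption length are inferred for illustration, not stated in the source):

```python
# Relating time-to-first-token (TTFT) to the reported prefill rate.
prefill_rate = 11_000   # tokens/second during prefill (from the article)
ttft = 0.12             # seconds to first token (from the article)

# Approximate number of tokens prefilled before the first output token:
prefill_tokens = prefill_rate * ttft   # ~1,320 tokens

# End-to-end latency for a hypothetical 50-token scene description,
# decoded at the reported rate:
decode_rate = 100       # tokens/second (from the article)
caption_tokens = 50     # illustrative caption length (our assumption)
total_latency = ttft + caption_tokens / decode_rate   # ~0.62 seconds
```

On these numbers, a full camera-to-caption round trip fits comfortably under a second, which is what makes the "nearly instantaneous" scene interpretation plausible.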
Optimizing Apple’s model involved **int8 weight quantization** and **int16 activation quantization**, the step that unlocks the NPU’s high-speed **int16 kernels**.
### Compatibility and Getting Started
Currently, QNN supports a select range of Android hardware, focusing primarily on devices powered by **Snapdragon 8** and **Snapdragon 8+** SoCs. Developers eager to try these capabilities can consult the NPU acceleration guide and download **LiteRT** from GitHub to get started.
### Summary of Insights
With QNN, Google is setting a compelling precedent in the world of mobile AI, not only maximizing processing efficiency but also paving the way for interactive, real-time applications that were previously constrained by hardware limitations. By leveraging NPUs and revolutionizing the way on-device AI is approached, QNN stands to redefine user experiences across Qualcomm-powered Android devices.

