Holo3.1: Redefining Computer-Use Agents Across Environments
Last March, we introduced Holo3, a computer-use model for workflows ranging from browser automation to desktop applications. Its rapid adoption by developers, enterprises, and partners surfaced a growing need: users wanted more than raw performance. They wanted the same capabilities to work across both desktop and mobile environments.
Bridging Environments: A New Era with Holo3.1
Recognizing the need for robust integration across frameworks, we are announcing the Holo3.1 family. It is designed to improve performance along three dimensions: environments (web, desktop, and mobile), agent frameworks, and deployment targets.
For the first time, we are also releasing quantized checkpoints optimized for local inference, in FP8, Q4 GGUF, and NVFP4 formats. This is a significant step toward our vision of universal computer-use agents: systems that operate across diverse platforms and workflows.
Mobile Automation: Unlocking New Potential
Holo3.1 extends the capabilities of Holo3 and substantially improves results in mobile environments. On AndroidWorld, the 35B-A3B model improves from 67% to 79.3%. For the smaller 4B and 9B variants, the reported score rises from 58% to 72%.
These gains show that Holo3.1 is not only about scaling up: mobile users get the same capabilities and efficiency that are available in desktop applications.
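To put the AndroidWorld figures quoted above in perspective, the short sketch below converts each before/after pair into an absolute gain (percentage points) and a relative gain. The helper name `gain` is purely illustrative, and the 58% to 72% pair is taken as reported for the smaller variants without assuming anything further about how it was measured.

```python
def gain(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = after - before
    relative = absolute / before * 100.0
    return absolute, relative


# Before/after pairs copied from the text above.
results = {
    "35B-A3B": (67.0, 79.3),
    "4B/9B (as reported)": (58.0, 72.0),
}

for name, (before, after) in results.items():
    absolute, relative = gain(before, after)
    print(f"{name}: +{absolute:.1f} points, +{relative:.1f}% relative")
```

Reporting both numbers avoids ambiguity: a 12.3-point gain from a base of 67% is a smaller relative change than a 14-point gain from a base of 58%.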
Optimized Cross-Harness Performance
Deploying a model inside third-party agent stacks is complex. To make this easier, Holo3.1 adds native support for function-calling protocols, alongside the structured JSON outputs that Holo3 already offers.
In our benchmarks on OSWorld and a range of business workflows, Holo3.1 reaches near parity between function-calling and native execution. Within our Holotab product harness, it improves on its predecessor by more than 25%.
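The difference between the two output styles can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `click` tool, its parameters, the `action`/`args` keys of the structured JSON form, and the tool-call payload layout (modeled on the JSON-Schema style that many function-calling APIs accept). None of it is Holo3.1's documented schema. The point is only that a harness can normalize both styles into one internal action type.

```python
import json
from dataclasses import dataclass


@dataclass
class Action:
    name: str
    arguments: dict


# Hypothetical tool definition in a common JSON-Schema style;
# NOT Holo3.1's documented schema.
CLICK_TOOL = {
    "type": "function",
    "function": {
        "name": "click",
        "description": "Click at a screen coordinate.",
        "parameters": {
            "type": "object",
            "properties": {"x": {"type": "integer"}, "y": {"type": "integer"}},
            "required": ["x", "y"],
        },
    },
}


def from_structured_json(text: str) -> Action:
    """Parse an action emitted as a JSON object in the model's text output."""
    payload = json.loads(text)
    return Action(name=payload["action"], arguments=payload["args"])


def from_tool_call(tool_call: dict) -> Action:
    """Parse an action emitted as a function call (arguments are a JSON string)."""
    fn = tool_call["function"]
    return Action(name=fn["name"], arguments=json.loads(fn["arguments"]))


# Two hypothetical model outputs describing the same click.
structured = '{"action": "click", "args": {"x": 120, "y": 340}}'
tool_call = {
    "type": "function",
    "function": {"name": "click", "arguments": '{"x": 120, "y": 340}'},
}

# Both styles normalize to the same internal action.
assert from_structured_json(structured) == from_tool_call(tool_call)
```

With a normalization layer like this, the rest of an agent harness does not need to know which output style the model used.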
Cost-Performance Tradeoffs with Smaller Models
To serve a broader audience, we are also launching new models in sizes of 0.8B, 4B, and 9B. These smaller variants suit local and on-device inference, enabling cost-effective and private deployments. The high-performance 35B-A3B model remains available for those who need state-of-the-art capability.
The graph plots performance against cost for the Holo3.1 and Qwen 3.5 families, averaged across key benchmarks.
Pioneering Local Agents on Consumer Hardware
Our release of quantized weights, beginning with the 35B-A3B checkpoints, makes local deployment far more practical. For NVFP4, we used NVIDIA's Model Optimizer to produce a W4A16 configuration, which enables fast local inference with minimal degradation in performance.
The speedups are significant: on DGX Spark, the NVFP4 W4A16 configuration delivers 1.41× the total token throughput of FP8 and 1.74× that of BF16. This makes local deployment more practical for developers and businesses.
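W4A16 denotes 4-bit weights with 16-bit activations, so most of the saving is in weight storage. As a rough, weights-only sketch, the snippet below estimates how much memory the weights occupy at each precision. It assumes that "35B" in the model name means roughly 35 billion total parameters, and it ignores quantization scales, metadata, activations, and the KV cache, so real checkpoint files and runtime memory use will be larger.

```python
# Nominal bits per stored weight for each format (scales and metadata ignored).
BITS_PER_WEIGHT = {"BF16": 16, "FP8": 8, "4-bit (NVFP4 / Q4)": 4}

# Assumption: "35B" in the model name means ~35e9 total parameters.
TOTAL_PARAMS = 35e9


def weight_gigabytes(num_params: float, bits: int) -> float:
    """Weights-only size in GB (1 GB = 1e9 bytes)."""
    return num_params * bits / 8 / 1e9


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name}: ~{weight_gigabytes(TOTAL_PARAMS, bits):.1f} GB")
```

Under these assumptions the weights shrink from about 70 GB at BF16 to about 35 GB at FP8 and about 17.5 GB at 4 bits, which is the main reason 4-bit checkpoints are attractive on consumer hardware.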
This graph measures agent request rates across platforms, demonstrating the advantages of NVFP4.
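The two throughput ratios quoted for DGX Spark also imply a third comparison. As a back-of-the-envelope sketch, assuming both ratios were measured under the same workload and settings (the text does not say so explicitly), the implied FP8-over-BF16 ratio and the relative time to generate a fixed number of tokens can be derived as follows.

```python
# Ratios quoted in the text: NVFP4 (W4A16) total token throughput relative to
# FP8 and to BF16 on DGX Spark.
NVFP4_OVER_FP8 = 1.41
NVFP4_OVER_BF16 = 1.74

# Implied FP8 vs BF16 ratio, assuming both figures come from the same setup.
fp8_over_bf16 = NVFP4_OVER_BF16 / NVFP4_OVER_FP8


def relative_time(speedup: float) -> float:
    """Fraction of the baseline time needed for the same number of tokens."""
    return 1.0 / speedup


print(f"Implied FP8 / BF16 throughput: {fp8_over_bf16:.2f}x")
print(f"NVFP4 time vs BF16: {relative_time(NVFP4_OVER_BF16):.1%}")
print(f"NVFP4 time vs FP8:  {relative_time(NVFP4_OVER_FP8):.1%}")
```

Throughput ratios describe aggregate tokens per second; they do not by themselves say how per-request latency changes.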
The Holo3.1 Family: A Diverse Offering
Holo3.1 comes in four distinct sizes, tailored to various deployment needs:
| Model | Deployment Target |
|---|---|
| Holo3.1-0.8B | Ultra-lightweight local agents |
| Holo3.1-4B | Cost-efficient deployment |
| Holo3.1-9B | Balanced performance and latency |
| Holo3.1-35B-A3B | State-of-the-art performance |
This range gives everyone, from individual developers to enterprises, a model suited to their specific needs.
We look forward to seeing what developers build with Holo3.1 across all environments.