Holo3.1: Redefining Computer-Use Agents Across Environments

Last March, we introduced Holo3, a computer-use model for workflows ranging from browser automation to desktop applications. Its rapid adoption by developers, enterprises, and partners revealed a growing need: users wanted more than raw performance. They wanted the same capabilities to work across both desktop and mobile environments.

Contents

Bridging Environments: A New Era with Holo3.1
Mobile Automation: Unlocking New Potential
Optimized Cross-Harness Performance
Cost-Performance Tradeoffs with Smaller Models
Pioneering Local Agents on Consumer Hardware
The Holo3.1 Family: A Diverse Offering

Bridging Environments: A New Era with Holo3.1

Recognizing the need for robust integration across frameworks, we are announcing the Holo3.1 family. It is designed to improve performance along three dimensions: environments (web, desktop, and mobile), agent frameworks, and deployment targets.

For the first time, we are releasing quantized checkpoints optimized for local inference, including FP8, Q4 GGUF, and NVFP4. This is a significant step toward our vision of universal computer-use agents: systems that operate across diverse platforms and workflows.

Mobile Automation: Unlocking New Potential

Holo3.1 extends the capabilities of Holo3 and substantially improves results in mobile environments. On AndroidWorld, the 35B-A3B model improves from 67% to 79.3%. For the smaller 4B and 9B variants, the reported figure rises from 58% to 72%.

These gains show that Holo3.1 is not only about scaling up: it brings mobile users the same capabilities and efficiency available in desktop applications.

Optimized Cross-Harness Performance

Deploying a model inside third-party agent stacks is complex. Holo3.1 therefore adds native support for function-calling protocols, alongside the structured JSON outputs that Holo3 already offers.

In benchmarks on OSWorld and a range of business workflows, Holo3.1 performs at near parity between function-calling and native execution, and improves by more than 25% over its predecessor when evaluated within our Holotab product harness.
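To make the difference between the two output styles concrete, here is a minimal sketch of how an agent harness might normalize a function-calling response and a structured JSON response into the same internal action. The `click` tool, its `x`/`y` parameters, and both payload shapes are hypothetical illustrations in the common JSON Schema "function" style; they are not Holo3.1's actual schema.

```python
import json

# Hypothetical tool definition in the common JSON Schema "function" style.
# The name "click" and its parameters are illustrative assumptions only.
CLICK_TOOL = {
    "type": "function",
    "function": {
        "name": "click",
        "description": "Click at a screen coordinate.",
        "parameters": {
            "type": "object",
            "properties": {
                "x": {"type": "integer"},
                "y": {"type": "integer"},
            },
            "required": ["x", "y"],
        },
    },
}


def action_from_tool_call(tool_call: dict) -> dict:
    """Normalize a function-calling style response into {"action", "args"}."""
    fn = tool_call["function"]
    # In this style of protocol, arguments arrive as a JSON-encoded string.
    return {"action": fn["name"], "args": json.loads(fn["arguments"])}


def action_from_structured_json(text: str) -> dict:
    """Normalize a structured JSON response into the same shape."""
    payload = json.loads(text)
    args = {key: value for key, value in payload.items() if key != "action"}
    return {"action": payload["action"], "args": args}


# Two hypothetical model outputs that describe the same click.
tool_call = {"function": {"name": "click", "arguments": '{"x": 120, "y": 340}'}}
structured = '{"action": "click", "x": 120, "y": 340}'

assert action_from_tool_call(tool_call) == action_from_structured_json(structured)
```

Normalizing both styles into one action shape lets the rest of a harness stay unchanged regardless of which protocol a given agent stack prefers.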

Cost-Performance Tradeoffs with Smaller Models

To reach a broader audience, we are also launching new models sized at 0.8B, 4B, and 9B. These smaller variants suit local and on-device inference, enabling cost-effective and private deployments. The high-performance 35B-A3B model remains available for those who need state-of-the-art capabilities.

Figure: Performance versus cost for the Holo3.1 and Qwen 3.5 families, averaged across key benchmarks.

Pioneering Local Agents on Consumer Hardware

Our release of quantized weights, beginning with the 35B-A3B checkpoints, makes local deployment far more practical. For NVFP4, we used NVIDIA’s Model Optimizer to produce a W4A16 configuration (4-bit weights, 16-bit activations), which enables fast local inference with minimal degradation in performance.
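As a rough illustration of why weight precision matters on consumer hardware, the sketch below estimates weight-only memory at different precisions. It assumes 35 billion total parameters (read from the "35B" in the model name) and ignores quantization scales, activations, and the KV cache, so real footprints will be somewhat larger.

```python
# Rough weight-only memory estimate. Assumption: 35e9 total parameters,
# taken from the "35B" in the model name. Quantization scale factors,
# activations, and KV cache are ignored.
PARAMS = 35e9

# Bytes used to store one weight at each precision.
BYTES_PER_WEIGHT = {"BF16": 2.0, "FP8": 1.0, "4-bit": 0.5}


def weight_gigabytes(params: float, bytes_per_weight: float) -> float:
    """Return the approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_weight / 1e9


for name, size in BYTES_PER_WEIGHT.items():
    print(f"{name}: ~{weight_gigabytes(PARAMS, size):.1f} GB")
```

Under these assumptions the weights shrink from about 70 GB at BF16 to about 17.5 GB at 4 bits, which is the difference between needing server-class memory and fitting on a single local machine.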

The speedups are significant: on DGX Spark, the NVFP4 W4A16 configuration delivers 1.41× the total token throughput of FP8 and 1.74× that of BF16. For developers and businesses, this means faster and more efficient local agents.

Figure: Agent request rates across platforms, showing the advantage of NVFP4.
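The two quoted ratios can be combined into a single scale relative to BF16. The short sketch below does that arithmetic; it assumes both ratios were measured on the same workload, which the text implies but does not state explicitly.

```python
# Speedups quoted in the text for NVFP4 W4A16 on DGX Spark.
NVFP4_VS_FP8 = 1.41
NVFP4_VS_BF16 = 1.74


def normalized_throughput() -> dict:
    """Express each format's throughput relative to BF16 = 1.0.

    Assumes both quoted ratios come from the same workload, so they
    can be divided to infer the FP8-versus-BF16 ratio.
    """
    return {
        "BF16": 1.0,
        "FP8": NVFP4_VS_BF16 / NVFP4_VS_FP8,
        "NVFP4": NVFP4_VS_BF16,
    }


def relative_time(speedup: float) -> float:
    """Fraction of the baseline time needed for a fixed number of tokens."""
    return 1.0 / speedup


for name, value in normalized_throughput().items():
    print(f"{name}: {value:.2f}x BF16 throughput")
print(f"NVFP4 time vs BF16: {relative_time(NVFP4_VS_BF16):.0%}")
```

Under that assumption, FP8 works out to roughly 1.23× BF16 throughput, and NVFP4 needs about 57% of the BF16 time to process the same number of tokens.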

The Holo3.1 Family: A Diverse Offering

Holo3.1 comes in four distinct sizes, tailored to various deployment needs:

Model	Deployment Target
Holo3.1-0.8B	Ultra-lightweight local agents
Holo3.1-4B	Cost-efficient deployment
Holo3.1-9B	Balanced performance and latency
Holo3.1-35B-A3B	State-of-the-art performance

This range gives everyone, from individual developers to enterprises, a model suited to their specific needs.

We look forward to seeing what developers build with Holo3.1 across all environments.
