© 2025 AI Model Kit. All Rights Reserved.

Discover TorchAO Quantized Models and Recipes on Hugging Face Hub for PyTorch

aimodelkit
Last updated: September 19, 2025 7:57 pm

PyTorch has announced native quantized variants of popular models, including Phi4-mini-instruct, Qwen3, SmolLM3-3B, and gemma-3-270m-it, produced through a collaboration between the TorchAO team and Unsloth. These models use int4 and float8 quantization to provide efficient inference on high-performance GPUs like A100 and H100, as well as on mobile devices, with minimal to no degradation in model quality compared to their bfloat16 counterparts.

Key Highlights of the New Quantized Models

  • We’ve launched pre-quantized models that are optimized for both server and mobile platforms, ideal for users looking to deploy faster models in their production environments.
  • Complete and reproducible quantization recipes and guides are now available, covering model quality evaluation and performance benchmarking. This resource is invaluable for users applying PyTorch’s native quantization to their own models and datasets.
  • Users can also finetune with Unsloth and then quantize the finetuned model with TorchAO.

Post Training Quantization: Models and Results

We proudly present several quantized variants of Phi4-mini-instruct, Qwen3, SmolLM3-3B, and gemma-3-270m-it. Below is a detailed breakdown of our quantization methods, results, and corresponding models:

Quantization method: Int4 weight-only quantization with the HQQ algorithm and AWQ (for server H100 and A100 GPUs)
Results:
  • 1.1-1.2x speedup on A100 over the bfloat16 model and 1.75x on H100 at batch size 1.
  • Small accuracy degradation from the bfloat16 model, e.g. Phi-4-mini-instruct-INT4 scored 53.28 vs. 55.35 for the baseline bfloat16.
  • For accuracy-critical tasks, Phi-4-mini-instruct-INT4 scored 36.98 for mmlu_pro, while careful calibration improved accuracy to 43.13.
  • 60% peak memory reduction.
Models: Phi-4-mini-instruct-INT4, Phi-4-mini-instruct-AWQ-INT4, Qwen3-8B-INT4, Qwen3-8B-AWQ-INT4

Quantization method: Float8 dynamic activation and float8 weight quantization (for server H100 GPU)
Results:
  • 1.7-2x speedup on H100 over bfloat16 at batch sizes 1 and 256.
  • Little to no accuracy degradation, e.g. Phi-4-mini-instruct-FP8 averaged 55.11 vs. 55.35 for bfloat16.
  • 30-40% peak memory reduction.
Models: gemma-3-270m-it-torchao-FP8, Phi-4-mini-instruct-FP8, Qwen3-32B-FP8

Quantization method: Int8 dynamic activation and int4 weight quantization (for mobile CPU)
Results:
  • Small accuracy degradation compared to bfloat16.
  • Enables model execution on iOS and Android devices like iPhone 15 Pro and Samsung Galaxy S22.
Models: Phi-4-mini-instruct-INT8-INT4, Qwen3-4B-INT8-INT4, SmolLM3-3B-INT8-INT4
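
To make the int4 weight-only idea above more concrete, here is a minimal pure-Python sketch of group-wise 4-bit quantization. It is an illustration only: it uses plain round-to-nearest on a per-group min/max range, not the HQQ or AWQ algorithms and not TorchAO's kernels. The function names, the sample weights, and the group size of 4 are all assumptions chosen for readability.

```python
from typing import List, Tuple

# One quantized group: 4-bit codes, the step size (scale), and the group minimum.
Group = Tuple[List[int], float, float]


def quantize_group(weights: List[float]) -> Group:
    """Map one group of weights to integer codes in 0..15 plus scale and offset."""
    lo, hi = min(weights), max(weights)
    # 4 bits give 16 levels, so the range is split into 15 steps.
    # Fall back to 1.0 when every weight in the group is identical.
    scale = (hi - lo) / 15 or 1.0
    codes = [min(15, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo


def dequantize_group(codes: List[int], scale: float, lo: float) -> List[float]:
    """Reconstruct approximate weights from the codes."""
    return [lo + c * scale for c in codes]


def quantize_row(row: List[float], group_size: int = 4) -> List[Group]:
    """Quantize a weight row in independent groups, each with its own scale."""
    return [quantize_group(row[i:i + group_size]) for i in range(0, len(row), group_size)]


# Hypothetical weights; the second group has a much wider range than the first.
row = [0.12, -0.40, 0.33, 0.05, 2.00, -1.50, 0.75, 0.10]
groups = quantize_row(row, group_size=4)
restored = [v for group in groups for v in dequantize_group(*group)]
max_error = max(abs(a - b) for a, b in zip(row, restored))

for codes, scale, lo in groups:
    print(f"codes={codes} scale={scale:.4f} min={lo:+.2f}")
print(f"max reconstruction error: {max_error:.4f}")
```

Because each group carries its own scale, the narrow first group is reconstructed more precisely than the wide second one. That is the usual reason for quantizing in groups rather than with a single scale per tensor.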

Each of these models comes with a reproducible quantization recipe built on the TorchAO library, so users can apply the same recipes to quantize their own models.
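
When adapting a recipe to your own model, a back-of-the-envelope estimate of weight storage per scheme can be a useful sanity check. The sketch below is an approximation under stated assumptions: a hypothetical 4B-parameter model, int4 groups of 128 weights, and 32 bits of per-group metadata (a 16-bit scale plus a 16-bit zero point). It counts weight storage only, so it will not reproduce the peak memory reductions reported above, which also depend on activations, caches, and any layers left in higher precision.

```python
def weight_gib(num_params: int, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB for a given average bit width."""
    return num_params * bits_per_weight / 8 / 2**30


def int4_effective_bits(group_size: int = 128, metadata_bits: int = 32) -> float:
    """4-bit codes plus per-group metadata, averaged over the group (assumed layout)."""
    return 4 + metadata_bits / group_size


NUM_PARAMS = 4_000_000_000  # hypothetical model size, for illustration only

schemes = {
    "bfloat16": 16.0,
    "float8": 8.0,
    "int4 (group size 128)": int4_effective_bits(),
}
sizes = {name: weight_gib(NUM_PARAMS, bits) for name, bits in schemes.items()}

for name, gib in sizes.items():
    saving = 1 - gib / sizes["bfloat16"]
    print(f"{name:>22}: {gib:5.2f} GiB of weights ({saving:.0%} smaller than bfloat16)")
```

The per-group metadata is why int4 lands slightly above a quarter of the bfloat16 size under these assumptions: smaller groups improve accuracy but add more scale and zero-point overhead per weight.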

Seamless Integrations Within the PyTorch Ecosystem

The new PyTorch native quantized models are designed to work within the broader PyTorch ecosystem, giving users high-performance quantization options for a variety of deployment requirements.

We use tools from across the PyTorch stack for model quantization, finetuning, quality evaluation, latency testing, and deployment, so that the released quantized models and their recipes work throughout the full lifecycle of model preparation and deployment.
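
As a rough illustration of the latency-testing step, the sketch below times two callables and reports the speedup of one over the other. It is a generic harness, not the benchmarking setup used for the published numbers; `run_baseline` and `run_candidate` are hypothetical stand-ins for running one inference step on the bfloat16 and quantized models. On a GPU, kernels launch asynchronously, so a real measurement would also need to synchronize the device before reading the clock.

```python
import statistics
import time
from typing import Callable


def median_latency(fn: Callable[[], object], warmup: int = 3, iters: int = 20) -> float:
    """Return the median wall-clock time of fn() in seconds, after warm-up runs."""
    for _ in range(warmup):
        fn()  # warm-up runs are not timed (caches, lazy initialization)
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """Speedup of the candidate over the baseline; 2.0 means twice as fast."""
    return baseline_seconds / candidate_seconds


# Hypothetical stand-ins for one inference step on each model variant.
def run_baseline() -> int:
    return sum(i * i for i in range(20_000))


def run_candidate() -> int:
    return sum(i * i for i in range(10_000))


baseline_s = median_latency(run_baseline)
candidate_s = median_latency(run_candidate)
print(f"baseline:  {baseline_s * 1e3:.3f} ms")
print(f"candidate: {candidate_s * 1e3:.3f} ms")
print(f"speedup:   {speedup(baseline_s, candidate_s):.2f}x")
```

The median is used instead of the mean so that an occasional slow run, for example from a background process, does not skew the comparison.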

Looking Ahead: Future Innovations

  • New Features
    • MoE quantization for both inference and training.
    • Support for new dtype: NVFP4.
    • Enhanced techniques for preserving accuracy during post-training quantization, such as SmoothQuant, GPTQ, and SpinQuant.
  • Collaborations
    • We’re thrilled to continue our partnership with Unsloth, ensuring that TorchAO is accessible for finetuning, QAT, and releasing TorchAO quantized models.
    • We’re also working alongside vLLM to enhance end-to-end server inference performance, utilizing optimized kernels from FBGEMM.

We Want to Hear from You!

We invite you to try the new models and quantization recipes. Please share your feedback by opening issues in TorchAO or by starting a discussion on the released models page. You can also connect with us on our Discord channel. We would also like to learn how you currently quantize models and to explore collaborating on future releases of quantized models on Hugging Face.

Inspired by: Source
