Introducing Mellum2: An Advanced Mixture-of-Experts Model for Text and Code

Today, we’re excited to share the launch of Mellum2, a groundbreaking model in the field of artificial intelligence. Mellum2 is a 12 billion-parameter Mixture-of-Experts (MoE) model uniquely designed for a variety of natural language and code tasks. Unlike traditional models, which activate all parameters during each inference, Mellum2 operates with enhanced efficiency, activating only 2.5 billion parameters per token. This approach significantly boosts performance while maintaining a low latency, making it ideal for high-throughput applications.

Contents

Key Features of Mellum2
Performance and Benchmarks
Architectural Overview
Primary Use Cases

Routing and Orchestration
RAG Pipelines
Sub-Agents
Private Deployment

The Importance of Specialized Models
Getting Started with Mellum2

Key Features of Mellum2

High Efficiency: Mellum2 is optimized for latency-sensitive operations, ensuring fast inference that exceeds performance benchmarks set by similar-sized models.
Open Source: Released under the Apache 2.0 license, Mellum2 is available to everyone, encouraging innovation and collaboration.
Broad Usability: The model is versatile, functioning effectively across various tasks, including routing, retrieval-augmented generation (RAG), summarization, and coding features.

For developers interested in exploring Mellum2, you can access the model on Hugging Face.

Performance and Benchmarks

Mellum2 has been rigorously tested against multiple benchmarks in coding, reasoning, science, and mathematics. The results are impressive—Mellum2 not only competes favorably with similarly sized models but also boasts over 2x faster inference speeds. This performance enhancement makes it a valuable asset for production workloads requiring rapid response times.

Architectural Overview

Mellum2’s architecture utilizes a Mixture-of-Experts model, which allows for a high total parameter count while limiting the parameters activated per token. This strategic design ensures that the model remains compact and efficient, particularly focused on text and code, rather than attempting to accommodate a wider range of multimodal tasks.

Model	Total Parameters	Active Parameters per Token	Modality	License
Mellum2	12B	2.5B	Text and Code	Apache 2.0

Primary Use Cases

Routing and Orchestration

Mellum2 serves as an efficient routing and orchestration model within complex multi-model systems. It excels at tasks such as prompt classification and tool selection, playing a vital role in orchestrating various elements of an AI workflow.

RAG Pipelines

This model is particularly well-suited for latency-sensitive retrieval pipelines. It can perform context compression, generate summaries, and carry out post-processing of retrieval tasks, ensuring that the information is both relevant and concise.

Sub-Agents

Mellum2 provides support for subtasks such as planning, validation, and context preparation, reducing dependency on larger models for intermediate operations. This functionality streamlines workflows and enhances overall system efficiency.

Private Deployment

Given its efficient architecture, Mellum2 is well-equipped for deployment in self-hosted environments where proprietary code or sensitive internal data is involved. This flexibility enables organizations to leverage advanced AI capabilities without compromising security.

The Importance of Specialized Models

As AI technologies evolve, the architecture of effective systems is becoming increasingly modular. While large, general models have their place, production systems often benefit from deploying a combination of specialized tools. Mellum2 acts as a “focal” model, purpose-built for high-frequency tasks within larger AI ecosystems. The core aim isn’t to supplant every model in the stack, but to enhance the system’s speed and efficiency without sacrificing control.

Getting Started with Mellum2

Developers and organizations focused on software engineering can readily experiment with Mellum2. Whether you are integrating it into an IDE, incorporating it into a RAG system, or utilizing it on private infrastructure, Mellum2 is designed to meet the demands of modern AI applications.

For those keen to delve deeper into its architecture, training setup, and performance metrics, the full technical report is available here.

With these compelling features and practical applications, Mellum2 stands poised to redefine how we approach AI tasks in both the natural language and programming domains.

Inspired by: Source

Introducing Mellum2: JetBrains’ 12B Parameter Mixture-of-Experts Model for Enhanced AI Performance

Introducing Mellum2: An Advanced Mixture-of-Experts Model for Text and Code

Key Features of Mellum2

Performance and Benchmarks

Architectural Overview

Primary Use Cases

Routing and Orchestration

RAG Pipelines

Sub-Agents

Private Deployment

The Importance of Specialized Models

Getting Started with Mellum2

Stay Connected

Explore Top AI Tools Instantly

Latest News

Enhancing Language Models with Graded Entity-Familiarity Readouts: Polish Adaptation, Cross-Language Robustness, and Refusal Steering Techniques

Maximizing Utility and Minimizing Risk: Evaluating Safeguard-Conditioned Uplift in Dual-Use Biology Assistants

Meta’s Brain2Qwerty: Achieving 61% Accuracy with Noninvasive Brain–Computer Interface Technology

July 2026 Security Incident Disclosure: Key Insights and Updates

Leading global tech insights for 20M+ innovators

Quick Link

Support

Sign Up for Our Newsletter

Introducing Mellum2: An Advanced Mixture-of-Experts Model for Text and Code

Key Features of Mellum2

Performance and Benchmarks

Architectural Overview

Primary Use Cases

Routing and Orchestration

More Read

RAG Pipelines

Sub-Agents

Private Deployment

The Importance of Specialized Models

Getting Started with Mellum2

Sign Up For Daily Newsletter

Get AI news first! Join our newsletter for fresh updates on open-source models.

Stay Connected

Explore Top AI Tools Instantly

Latest News

Enhancing Language Models with Graded Entity-Familiarity Readouts: Polish Adaptation, Cross-Language Robustness, and Refusal Steering Techniques

Maximizing Utility and Minimizing Risk: Evaluating Safeguard-Conditioned Uplift in Dual-Use Biology Assistants

Meta’s Brain2Qwerty: Achieving 61% Accuracy with Noninvasive Brain–Computer Interface Technology

July 2026 Security Incident Disclosure: Key Insights and Updates