Introducing Alibaba’s Qwen3.5 Series: A Leap in Multimodal AI
Alibaba has introduced the Qwen3.5 series, an open-source family of models built for native multimodal agents. The series debuts with a vision-language model (VLM) of roughly 400 billion parameters, built on a hybrid architecture that combines Mixture of Experts (MoE) with Gated Delta Networks. A notable step beyond its predecessors is the model's ability to understand and navigate user interfaces.
Key Features of Qwen3.5
Impressive Specifications
The Qwen3.5 model ships with a configuration that caters to a variety of applications. Here are its standout specifications:
| Specification | Detail |
|---|---|
| Modalities | Vision, Language |
| Total Parameters | 397B |
| Active Parameters | 17B |
| Activation Rate | 4.28% |
| Input Context Length | 256K (extensible to 1M tokens) |
| Languages Supported | 200+ |
| Experts | 512 |
| Shared Experts | 1 |
| Experts Per Token | 11 (10 routed + 1 shared) |
| Layers | 60 |
| Vocabulary | 248,320 |
Table 1. Specifications and configuration details for the Qwen3.5 model
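To make the MoE numbers in Table 1 concrete, here is a small plain-Python sketch of generic top-k routing over 512 experts with one always-active shared expert, plus a check that 17B active of 397B total parameters gives the quoted ~4.28% activation rate. The routing logic is illustrative only, not Qwen3.5's actual implementation.

```python
# Illustrative sketch of MoE routing using the numbers from Table 1.
# The top-k gating below is a generic scheme, not Qwen3.5's implementation.
import random

NUM_EXPERTS = 512  # routed experts
NUM_SHARED = 1     # shared expert, always active
TOP_K = 10         # routed experts selected per token

def route_token(scores, top_k=TOP_K):
    """Pick the top-k routed experts by gate score, plus the shared expert."""
    routed = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    shared = list(range(NUM_SHARED))
    return routed, shared

random.seed(0)
scores = [random.random() for _ in range(NUM_EXPERTS)]
routed, shared = route_token(scores)
print(f"experts per token: {len(routed) + len(shared)}")  # 11 = 10 routed + 1 shared

# Activation rate quoted in the table: active / total parameters
print(f"activation rate: {17 / 397:.2%}")  # ~4.28%
```

Because only 11 of 512 experts run per token, the model activates about 4% of its weights on each forward pass, which is what keeps a ~400B-parameter model practical to serve.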
Versatile Use Cases
The capabilities of Qwen3.5 are designed to accommodate an expansive range of use cases, such as:
- Coding, including web development
- Visual Reasoning, applicable in mobile and web interfaces
- Chat Applications, enhancing user interaction
- Complex Search, making information retrieval more intuitive
Building with NVIDIA Endpoints
Developers can start building applications with Qwen3.5 today through NVIDIA’s GPU-accelerated endpoints at build.nvidia.com. Powered by the latest NVIDIA Blackwell GPUs, these endpoints let you explore the model’s functionality in real time: experiment with prompts and test the model against your own datasets to gain insight into its performance.
For a visual understanding, check out the demo video showcasing how to test Qwen3.5 on these GPU-accelerated endpoints.
Developers can also utilize NVIDIA’s hosted model via an accessible API. All you need is a free registration in the NVIDIA Developer Program to start integrating Qwen3.5 into your projects.
Example Code Snippet
Utilizing the Qwen3.5 API for chat interactions can be straightforward, as shown below:
```python
import os

import requests

invoke_url = "https://integrate.api.nvidia.com/v1/chat/completions"

headers = {
    # Read the API key from the environment rather than hard-coding it
    "Authorization": f"Bearer {os.environ['NVIDIA_API_KEY']}",
    "Accept": "application/json",
}

payload = {
    "messages": [
        {
            "role": "user",
            "content": ""  # your prompt goes here
        }
    ],
    "model": "qwen/qwen3.5-397b-a17b",
    "chat_template_kwargs": {
        "thinking": True
    },
    "frequency_penalty": 0,
    "max_tokens": 16384,
    "presence_penalty": 0,
    "stream": False,  # set to True for token-by-token streaming (requires SSE handling)
    "temperature": 1,
    "top_p": 1
}

session = requests.Session()
response = session.post(invoke_url, headers=headers, json=payload)
response.raise_for_status()
print(response.json())
```
This snippet initiates a chat completion with Qwen3.5. To enable function calling, define a tool array in the payload’s tools parameter.
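The endpoint follows the OpenAI-compatible chat schema, so a tool array can be declared as below. The `get_weather` function, its description, and its parameters are all hypothetical illustrations, not part of the actual API.

```python
# Hypothetical `tools` array in the OpenAI-compatible function-calling schema.
# The function name and parameters are illustrative placeholders.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

# These keys would be merged into the request payload alongside
# "messages" and "model" from the snippet above.
payload_extras = {"tools": tools, "tool_choice": "auto"}
```

When the model decides a tool is needed, the response carries a tool call (the function name plus JSON arguments) instead of plain text, and your code executes the function and returns the result in a follow-up message.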
Customizing with NVIDIA NeMo
While Qwen3.5 shines with its inherent capabilities, the NVIDIA NeMo framework takes it a step further by assisting developers in customizing it for specialized needs. The NeMo Automodel library empowers developers to fine-tune the Qwen3.5 architecture efficiently, catering to niche markets or unique applications.
With its PyTorch-native training library, NeMo offers seamless Day 0 support for Hugging Face models, allowing training directly on existing checkpoints and freeing developers from tedious format conversions. This promotes rapid experimentation and supports both full supervised fine-tuning (SFT) and more memory-efficient methods like LoRA.
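LoRA's memory savings come from freezing the base weight matrix W and learning only a low-rank update BA. A minimal numeric sketch of that idea (plain NumPy, independent of NeMo's actual API, with illustrative dimensions):

```python
# Minimal numeric sketch of LoRA: W stays frozen; only the low-rank
# factors A (r x d_in) and B (d_out x r) are trained, so the effective
# weight is W + B @ A with far fewer trainable parameters.
import numpy as np

d_out, d_in, r = 1024, 1024, 8  # rank r << d (dimensions chosen for illustration)
rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))    # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))                  # B starts at zero, so output matches base at init

def lora_forward(x):
    return W @ x + B @ (A @ x)            # base path + low-rank update

x = rng.standard_normal(d_in)
assert np.allclose(lora_forward(x), W @ x)  # identical to the base model at init

trainable = A.size + B.size
full = W.size
print(f"trainable params: {trainable:,} vs full fine-tune {full:,} "
      f"({trainable / full:.2%})")
```

Here only about 1.6% of the layer's parameters are trained, which is why LoRA fits fine-tuning of very large models into far less GPU memory than full SFT.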
A detailed technical tutorial on Medical Visual QA provides the groundwork for fine-tuning Qwen3.5 on radiological datasets. NeMo even supports multinode deployments using Slurm and Kubernetes, ensuring optimal performance for large-scale models in various environments.
Getting Started with Qwen3.5
Whether you’re deploying on NVIDIA Blackwell GPUs or using NVIDIA NIM for containerized solutions, there are numerous avenues for integrating Qwen3.5 into your software ecosystem. You can kick off your journey by visiting the Qwen3.5 model page on Hugging Face or by experimenting with the model at build.nvidia.com.
With its unparalleled capabilities, Qwen3.5 stands poised to revolutionize how developers and businesses harness the power of AI. With a commitment to innovation, Alibaba continues to push the boundaries of what’s possible in the realm of multimodal artificial intelligence.