Qwen2.5-Max: Exploring the Intelligence of Large-scale MoE Model



January 28, 2025 · 3 min · 561 words · Qwen Team | Translations: 简体中文

QWEN CHAT | API | DEMO | DISCORD

It is widely recognized that continuously scaling both data size and model size leads to significant improvements in model intelligence. However, the research and industry communities have limited experience in effectively scaling extremely large models, whether dense or Mixture-of-Experts (MoE). Many critical details of this scaling process were disclosed only with the recent release of DeepSeek V3. Concurrently, we have been developing Qwen2.5-Max, a large-scale MoE model pretrained on over 20 trillion tokens and further post-trained with curated Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF). Today, we are excited to share the performance results of Qwen2.5-Max and to announce the availability of its API through Alibaba Cloud. We also invite you to explore Qwen2.5-Max on Qwen Chat!

Performance

We evaluate Qwen2.5-Max alongside leading models, both proprietary and open-weight, across a range of benchmarks of significant interest to the community: MMLU-Pro, which tests knowledge through college-level problems; LiveCodeBench, which assesses coding capabilities; LiveBench, which comprehensively tests general capabilities; and Arena-Hard, which approximates human preferences. We report performance scores for both base models and instruct models. We begin by directly comparing the instruct models, which serve downstream applications such as chat and coding. We present the results of Qwen2.5-Max alongside leading state-of-the-art models, including DeepSeek V3, GPT-4o, and...
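As background for the MoE architecture mentioned above, the snippet below is a minimal, illustrative sketch of top-k expert routing — the mechanism by which an MoE model activates only a few experts per token instead of the whole network. This is a toy example, not Qwen2.5-Max's actual router; the expert count and the choice of k=2 are arbitrary assumptions for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of router logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(logits, k=2):
    """Select the k experts with the highest router probabilities and
    renormalize their weights so they sum to 1.

    Returns a list of (expert_index, weight) pairs; only these experts
    would be evaluated for the token, which is what makes MoE inference
    cheaper than a dense model of the same parameter count.
    """
    probs = softmax(logits)
    top = sorted(range(len(logits)), key=lambda i: probs[i], reverse=True)[:k]
    z = sum(probs[i] for i in top)
    return [(i, probs[i] / z) for i in top]


# One token's router logits over 4 hypothetical experts:
routing = top_k_route([2.0, 1.0, 0.5, 3.0], k=2)
```

The token's output would then be the weighted sum of the selected experts' outputs, using these renormalized weights.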
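The post announces API availability through Alibaba Cloud. As a hedged sketch of what a client call might look like, the snippet below builds an OpenAI-style chat-completions request using only the standard library. The endpoint URL, the model identifier `qwen-max-2025-01-25`, and the `DASHSCOPE_API_KEY` environment variable are assumptions, not confirmed by this page — consult the Alibaba Cloud documentation for the actual values.

```python
import json
import os
import urllib.request

# Assumed endpoint and model name — verify against Alibaba Cloud docs.
API_URL = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions"
MODEL = "qwen-max-2025-01-25"


def build_request(prompt: str) -> dict:
    """Build an OpenAI-style chat-completion payload for Qwen2.5-Max."""
    return {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
    }


def call_qwen(prompt: str) -> str:
    """Send the request and return the assistant's reply text.

    Requires a valid API key in the (assumed) DASHSCOPE_API_KEY
    environment variable; raises KeyError if it is unset.
    """
    payload = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['DASHSCOPE_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the endpoint follows the OpenAI-compatible request shape, existing OpenAI client libraries should also work by pointing their base URL at the service.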
