アリババクラウド、フラッグシップLLM「Qwen3-Max」を発表：GPT-5やClaude Opus 4を凌駕する性能

“`html アリババクラウド、フラッグシップLLM「Qwen3-Max」を発表：GPT-5やClaude Opus 4を凌駕する性能

アリババクラウド、フラッグシップLLM「Qwen3-Max」を発表：GPT-5やClaude Opus 4を凌駕する性能

人工知能の進化は目覚ましく、大規模言語モデル（LLM）の分野では、新たなブレークスルーが日々生まれています。この度、アリババクラウドは、そのフラッグシップLLMである「Qwen3-Max」を発表し、その性能がOpenAIのGPT-5やAnthropicのClaude Opus 4といった最先端モデルを凌駕すると主張しており、AI業界に大きな衝撃を与えています。

Qwen3-Max：1兆を超えるパラメーターを誇る新時代モデル

2025年9月24日に開催されたアリババクラウドの年次Apsara Conferenceで発表されたQwen3-Maxは、1兆を超えるパラメーターを持つ巨大なモデルです。これは、Qwenシリーズの最新かつ最も強力なモデルとして位置づけられており、およそ36兆ものトークンで訓練されました。この圧倒的なスケールは、モデルが複雑なタスクをより深く、より広範に理解し、処理する能力を裏付けています。アリババクラウドは、Qwen3-Maxがドメイン知識、推論、コーディング、エージェントタスク、多言語理解といった幅広いベンチマークにおいて、最高のパフォーマンスを達成したと強調しています。

GPT-5とClaude Opus 4との性能比較

アリババクラウドの主張によると、Qwen3-Maxは、特に特定のベンチマークにおいて、OpenAIのGPT-5やAnthropicのClaude Opus 4といった業界の主要モデルを上回る、または同等の性能を示しています。例えば、エージェント能力を測るTau2-Benchでは、Qwen3-Maxが74.8%を記録し、Claude Opus 4やDeepSeek V3.1を上回るパフォーマンスを見せました。また、LMArenaのText Arenaリーダーボードでは、Qwen3-Maxの「Instruct」バージョンがOpenAIのGPT-5-Chatを僅差で上回り、トップ3にランクインしました。

一方で、コーディング能力に特化したSWE-Bench Verifiedでは、Qwen3-Maxが69.6%のスコアを達成し、依然として非常に強力なモデルであることを示しています。ちなみに、OpenAIのGPT-5は74.9%、AnthropicのClaude Opus 4は72.5% をSWE-Benchで記録しており、モデルごとに得意とする領域が異なることが分かります。Qwen3-Maxは、特にエージェントタスクや多言語理解において優れた性能を発揮するとされています。さらに、Qwen3-Maxには「Thinking」バージョンも開発中であり、数学ベンチマーク（AIME 2025やHMMT 25）で100%を達成するなど、非常に有望な先行結果が報告されています。

AI競争の加速と将来展望

OpenAIのGPT-5は2025年8月7日にリリースされ、PhDレベルの推論能力、 hallucinationの削減、272,000トークンものコンテキストウィンドウを特徴としています。また、AnthropicのClaude Opus 4も2025年5月22日に登場し、SWE-benchで72.5%という高いスコアを記録するなど、コーディングや長期間にわたるエージェントワークフローで優れた能力を発揮しています。これらの強力なモデルが相次いでリリースされる中で、アリババクラウドがQwen3-Maxで世界市場に挑む姿勢は、AI開発競争が新たな段階に入ったことを示唆しています。

アリババクラウドは、今後3年間でAIおよびクラウドインフラに3,800億元（約534億ドル）を投資する計画を発表しており、AIを中核事業として強化していく方針です。 Qwen3-Maxの発表は、グローバルなAIエコシステムにおいてアリババクラウドが果たす役割をさらに拡大し、特にアジア市場や多言語対応の分野で大きな影響を与える可能性があります。開発者にとって、OpenAIのAPIと互換性のあるQwen3-MaxのAPIは、導入のハードルを下げる要因となるでしょう。

結論

アリババクラウドのQwen3-Maxの登場は、LLMの能力と応用の可能性をさらに広げるものです。特に、特定のベンチマークにおけるGPT-5やClaude Opus 4への優位性の主張は、AI業界の技術革新のペースが衰えることなく加速していることを示しています。今後、これらのフラッグシップモデルがどのような進化を遂げ、私たちの生活やビジネスにどのような影響をもたらすのか、その動向から目が離せません。

The race for artificial intelligence supremacy continues to accelerate, and Alibaba Cloud has just made a monumental stride with the unveiling of its flagship large language model, Qwen3-Max. This latest iteration in the Tongyi Qianwen (Qwen) series is not merely an incremental update; it represents a significant leap forward, boasting over a trillion parameters and claiming performance that rivals, and in some key areas, surpasses industry titans like OpenAI’s GPT-5 and Anthropic’s Claude Opus 4.

Released in September 2025, Qwen3-Max positions Alibaba Cloud firmly at the forefront of global AI innovation, demonstrating remarkable advancements in areas critical for real-world applications.

Under the Hood: Scale, Architecture, and Core Strengths

At its core, Qwen3-Max is built on a Mixture-of-Experts (MoE) architecture, a design paradigm known for enhancing efficiency and scalability in large models. This colossal model was pre-trained on an astounding 36 trillion tokens, a testament to the sheer volume of data processed to imbue it with comprehensive knowledge and understanding. The MoE approach allows for only a subset of its trillion-plus parameters to be active during inference, optimizing performance while managing computational demands.

Beyond its impressive scale, Qwen3-Max exhibits robust capabilities across a spectrum of tasks. It supports over 100 languages, demonstrating strong multilingual understanding and generation. Furthermore, its context window of 262,000 tokens in the preview version, with some Qwen variants pushing up to one million tokens, allows it to handle incredibly long inputs, enabling deeper reasoning and more coherent responses over extended conversations or complex documents.

Setting New Benchmarks: Outperforming the Competition

Alibaba Cloud has presented compelling benchmark results for Qwen3-Max, highlighting its competitive edge:

Global Leaderboard Presence: The Qwen3-Max-Instruct variant secured an impressive third place globally on the LMArena text leaderboard, notably outperforming OpenAI’s GPT-5-Chat.
Coding and Agentic Capabilities: In programming tasks, Qwen3-Max-Instruct achieved a SWE-Bench Verified score of 69.6, placing it among the top-performing models in fixing real-world software bugs. More strikingly, on the Tau2-Bench, which evaluates an AI’s ability to call external tools and manage complex workflows, Qwen3-Max-Instruct scored 74.8, surpassing both Claude Opus 4 and Deepseek V3.1.
Advanced Reasoning with Qwen3-Max-Thinking: A specialized variant, Qwen3-Max-Thinking, currently undergoing intensive training, has already achieved perfect 100% scores on challenging mathematical reasoning benchmarks like AIME 25 and HMMT, matching results from models such as GPT-5 Pro and xAI’s Grok 4. This version integrates a code interpreter and leverages parallel test-time computation techniques for enhanced reasoning.

Innovations in Training and Efficiency

The development of Qwen3-Max also marks significant breakthroughs in large model training methodologies. Alibaba Cloud reported an unusually stable training process, characterized by a smooth loss curve without the spikes or rollbacks often seen with ultra-large models. This stability, partly attributed to the MoE architecture and a global-batch load balancing loss, translates into more reliable and efficient development.

Furthermore, training efficiency for Qwen3-Max-Base improved by 30% compared to its predecessor, Qwen2.5-Max-Base, thanks to optimized parallelization strategies like PAI-FlashMoE. For handling long-context scenarios, the ChunkFlow strategy delivered a remarkable threefold improvement in throughput.

Accessibility for Developers and Enterprises

Alibaba Cloud has made Qwen3-Max accessible through various channels. Developers can interact with the model via its API on Alibaba Cloud Model Studio, which is compatible with OpenAI’s APIs, simplifying integration for those familiar with other leading models. It is also available for direct interaction on Qwen Chat. While Qwen3-Max is not open-source, its availability through robust API services underscores Alibaba’s focus on enterprise adoption and real-world application.

Strategic Positioning and the Future of AI

The launch of Qwen3-Max underscores Alibaba’s aggressive strategy to become a dominant force in the global AI landscape, challenging established players and driving forward the capabilities of artificial general intelligence. By emphasizing enhanced coding, agent capabilities, and superior reasoning, Alibaba Cloud is targeting critical areas for enterprise innovation, from software development and automation to complex problem-solving. As the AI ecosystem continues to evolve, Qwen3-Max stands as a powerful new contender, pushing the boundaries of what large language models can achieve.