MiniMax M2.7 — AI Glossary

MiniMax M2.7 is the latest flagship open-source model from MiniMax, announced on March 18, 2026. It is MiniMax’s first model to take an active part in its own development cycle: it generates evaluation data, identifies its own capability gaps, and produces synthetic training examples to close them. This makes M2.7 one of the earliest production demonstrations of AI-assisted AI training at meaningful scale.

Architecture

M2.7 is a sparse Mixture-of-Experts (MoE) model with 230 billion total parameters, of which only 10 billion are activated per token. A router sends each token to a small subset of its 256 expert sub-networks, so per-token compute is roughly that of a ~10B dense model while the model retains the knowledge capacity of a much larger system.
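The routing step can be sketched in a few lines of plain Python. This is a toy illustration, not MiniMax’s implementation: the expert count (4), the value of k (2), the softmax-over-selected-logits gating, and all function names are assumptions chosen for readability. The glossary specifies only the 256-expert total and the ~10B active-parameter figure.

```python
import math
from typing import Callable, List, Sequence, Tuple

Expert = Callable[[List[float]], List[float]]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalise their gate weights to sum to 1."""
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(x: List[float], router_logits: Sequence[float],
                experts: Sequence[Expert], k: int) -> List[float]:
    """Weighted sum of the selected experts' outputs; unselected experts are never run."""
    out = [0.0] * len(x)
    for idx, weight in top_k_route(router_logits, k):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


if __name__ == "__main__":
    # Four toy "experts" that just scale their input (c=c binds each factor).
    experts = [lambda v, c=c: [c * vi for vi in v] for c in (1.0, 2.0, 3.0, 4.0)]
    logits = [2.0, 0.5, 1.0, -1.0]          # hypothetical router scores for one token
    print(top_k_route(logits, k=2))          # experts 0 and 2 are selected
    print(moe_forward([1.0, 2.0], logits, experts, k=2))
```

Only two of the four toy experts run for this token. At M2.7’s scale, the same idea keeps roughly 10B of the 230B parameters active per token.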

Key architectural details:

Multi-head causal self-attention with Rotary Position Embeddings (RoPE) for positional encoding
QK RMSNorm (Query-Key Root Mean Square Normalisation) for stable attention at scale
Top-k expert routing: for each token, only the k highest-scoring experts are activated
200,000-token context window — sufficient for large codebases, extended agent sessions, and long documents
Native-precision (BF16) deployment needs approximately 460 GB of GPU VRAM for the weights alone (230B parameters × 2 bytes)
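The 460 GB figure follows directly from the parameter count. A back-of-the-envelope sketch: it counts weights only, so the KV cache for a long context, activations, and framework overhead all come on top. The FP32 and FP8 rows are shown only for comparison.

```python
# Bytes needed to store one parameter in each numeric format.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}

TOTAL_PARAMS = 230e9   # every expert must be resident in memory
ACTIVE_PARAMS = 10e9   # parameters actually used for a single token


def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


def forward_flops_per_token(n_active: float) -> float:
    """Common rule of thumb: ~2 FLOPs per active parameter (ignores attention cost)."""
    return 2 * n_active


if __name__ == "__main__":
    for dtype in ("fp32", "bf16", "fp8"):
        print(f"{dtype}: {weight_memory_gb(TOTAL_PARAMS, dtype):.0f} GB of weights")
    print(f"~{forward_flops_per_token(ACTIVE_PARAMS):.1e} FLOPs per token (forward pass)")
```

The two functions show the MoE trade-off. Memory is sized by all 230B parameters, while per-token compute tracks only the 10B active ones.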

Self-Evolution

The defining feature of M2.7 is its recursive self-optimisation framework: a training pipeline where the model generates its own evaluation data, identifies capability gaps, and produces synthetic training examples to address them.
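As a hypothetical illustration of the “identifies capability gaps” step, the sketch below groups evaluation results by skill category and flags categories whose pass rate falls below a threshold, listing the weakest first. The category names, the 0.8 threshold, and the function names are invented for the example. This glossary does not describe how M2.7 actually scores or selects its gaps.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Result = Tuple[str, bool]  # (skill category, passed?)


def pass_rates(results: Iterable[Result]) -> Dict[str, float]:
    """Fraction of passed evaluations per category."""
    totals: Dict[str, int] = defaultdict(int)
    passes: Dict[str, int] = defaultdict(int)
    for category, passed in results:
        totals[category] += 1
        passes[category] += int(passed)
    return {c: passes[c] / totals[c] for c in totals}


def capability_gaps(results: Iterable[Result], threshold: float) -> List[str]:
    """Categories below the threshold, weakest first: candidates for new synthetic data."""
    rates = pass_rates(results)
    weak = [c for c, r in rates.items() if r < threshold]
    return sorted(weak, key=lambda c: rates[c])


if __name__ == "__main__":
    evals = [("sql", True), ("sql", False), ("regex", True), ("regex", True),
             ("bash", False), ("bash", False), ("bash", True)]
    print(capability_gaps(evals, threshold=0.8))  # ['bash', 'sql']
```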

In practice, an M2.7 instance autonomously runs a complete iterative improvement loop:

Analyse failure trajectories from previous evaluations
Plan modifications to the training scaffold or data strategy
Modify scaffold code
Run evaluations
Compare results against baseline
Decide whether to keep or revert changes
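The six steps map onto a greedy keep-or-revert loop. This minimal sketch collapses the model’s analysis, planning, and code edits (the first three steps) into a hypothetical `toy_propose` function that randomly nudges one numeric setting of a toy scaffold. The hypothetical `toy_evaluate` stands in for a real benchmark run. Both are assumptions for illustration, not MiniMax’s pipeline.

```python
import random
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]


def improvement_loop(
    config: Config,
    evaluate: Callable[[Config], float],
    propose: Callable[[Config, random.Random], Config],
    rounds: int,
    seed: int = 0,
) -> Tuple[Config, float, List[bool]]:
    """Greedy loop: keep a proposed change only if it beats the current baseline."""
    rng = random.Random(seed)
    baseline = evaluate(config)                   # score of the currently accepted scaffold
    kept: List[bool] = []
    for _ in range(rounds):
        candidate = propose(dict(config), rng)    # plan + modify, on a copy
        score = evaluate(candidate)               # run evaluations
        if score > baseline:                      # compare against baseline
            config, baseline = candidate, score   # keep the change
            kept.append(True)
        else:
            kept.append(False)                    # revert: accepted config is untouched
    return config, baseline, kept


def toy_evaluate(cfg: Config) -> float:
    """Hypothetical benchmark: best possible score is 0.0 at temperature=0.7, max_retries=3."""
    return -((cfg["temperature"] - 0.7) ** 2) - (cfg["max_retries"] - 3) ** 2


def toy_propose(cfg: Config, rng: random.Random) -> Config:
    """Hypothetical edit: nudge one randomly chosen setting."""
    key = rng.choice(sorted(cfg))
    cfg[key] += rng.uniform(-0.5, 0.5)
    return cfg


if __name__ == "__main__":
    start = {"temperature": 1.5, "max_retries": 1.0}
    cfg, score, kept = improvement_loop(start, toy_evaluate, toy_propose, rounds=100)
    print(f"kept {sum(kept)}/{len(kept)} changes; score {toy_evaluate(start):.3f} -> {score:.3f}")
```

A change is kept only when it beats the baseline, so the accepted score never decreases from one round to the next.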

MiniMax ran this loop for 100+ rounds without human intervention, achieving a 30% improvement on internal benchmarks. In broader production workflows, M2.7 handles 30–50% of its own training pipeline, including parts of data curation, evaluation, and iteration.

Performance

SWE-Pro: 56.22% (real-world software engineering tasks)
Terminal Bench 2: 57.0%
Artificial Analysis Intelligence Index: Ranked #1 out of 136 models (score: 50)
GDPval-AA (agentic real-world knowledge-work tasks): Elo 1,495, the highest among open-weight models
MLE Bench Lite (22 Kaggle ML competitions): 66.6% medal rate, behind only Claude Opus 4.6 and GPT-5.4
Matches GPT-5 on several multi-language engineering benchmarks

API pricing: $0.30 per million input tokens, roughly 1/50th of Claude Opus’s input-token price.
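Per-token prices are easier to reason about as workload totals. This minimal sketch uses only the input price quoted above. Output tokens are billed separately at a rate this glossary does not give, so they are left out. The 150,000-token context and 20-turn session are hypothetical.

```python
INPUT_PRICE_PER_MILLION_USD = 0.30  # input price quoted above; output pricing not included


def input_cost_usd(input_tokens: int, price_per_million: float = INPUT_PRICE_PER_MILLION_USD) -> float:
    """Cost of sending `input_tokens` prompt tokens, in US dollars."""
    return input_tokens / 1_000_000 * price_per_million


if __name__ == "__main__":
    # Hypothetical agent session: a 150,000-token context re-sent on each of 20 turns.
    tokens = 150_000 * 20                               # 3,000,000 input tokens
    print(f"${input_cost_usd(tokens):.2f}")             # $0.90
    print(f"${input_cost_usd(1):.7f} per input token")  # $0.0000003
```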

Agentic Capabilities

M2.7 is optimised for complex multi-step agent harnesses. It maintains a 97% skill adherence rate across 40+ complex skills (each exceeding 2,000 tokens), making it reliable in automated pipelines that require following long, structured instruction sets across multiple rounds.
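How a per-skill adherence rate plays out over a long pipeline depends on how failures are correlated. Assume, purely for illustration, that the 97% figure is an independent per-step success probability. Under that assumption, the chance that every step of an n-step run is followed is 0.97^n:

```python
def chained_success(per_step_rate: float, steps: int) -> float:
    """Probability that all `steps` steps succeed if each succeeds independently."""
    return per_step_rate ** steps


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        print(n, round(chained_success(0.97, n), 3))
    # 10 steps -> ~0.737, 20 steps -> ~0.544
```

Retries, validation checks, and correlated failures all shift the real number, so treat this as an order-of-magnitude guide rather than a prediction.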

Practical strengths include:

End-to-end software engineering (project delivery, log-based debugging, code security)
ML research workflows
Complex multi-round document editing (Excel, PowerPoint, Word)
Multi-agent coordination with dynamic tool search

Significance

M2.7 sits at the intersection of two frontier trends: MoE architectures for compute-efficient scaling, and models that reduce their own training cost by automating evaluation and data generation. Its open-source release under a permissive licence, combined with API pricing of $0.30 per million input tokens, makes it one of the most cost-effective options for production agentic workloads as of mid-2026.
