Beta version: information might not be fully accurate. Please report any discrepancies.
Latest Data: 2025-09-05
Context Window: 128k tokens
Input Cost: $0.50 per 1M tokens
Output Cost: $1.50 per 1M tokens
Cache Cost: $0.40 per 1M tokens (read) / free (write)
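As a rough illustration of how these per-1M-token rates combine, the sketch below estimates the cost of a single request. It assumes that cache hits (`cached_tokens`, a subset of the input tokens) are billed at the $0.40 cache-read rate instead of the $0.50 input rate, and that tokens written to the cache pay only the normal input rate, since the write itself is listed as free. The function `estimate_cost` and its parameters are hypothetical, not part of any provider's API.

```python
from decimal import Decimal

# Per-1M-token prices (USD) taken from the card above.
INPUT_PER_M = Decimal("0.50")       # uncached input tokens
OUTPUT_PER_M = Decimal("1.50")      # generated output tokens
CACHE_READ_PER_M = Decimal("0.40")  # input tokens served from the prompt cache
CACHE_WRITE_PER_M = Decimal("0")    # cache writes are listed as free

MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0, cache_write_tokens: int = 0) -> Decimal:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumption: `cached_tokens` is the part of `input_tokens` that hits the
    cache and is billed at the cache-read rate instead of the input rate.
    Cache writes are charged at the listed (free) write rate on top of that.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    # Decimal avoids binary floating-point rounding in currency arithmetic.
    return (
        uncached * INPUT_PER_M
        + cached_tokens * CACHE_READ_PER_M
        + cache_write_tokens * CACHE_WRITE_PER_M
        + output_tokens * OUTPUT_PER_M
    ) / MILLION
```

For example, under these assumptions a request with 100,000 input tokens (80,000 of them cache hits) and 2,000 output tokens costs 20,000 × $0.50/1M + 80,000 × $0.40/1M + 2,000 × $1.50/1M = $0.010 + $0.032 + $0.003 = $0.045, which is what `estimate_cost(100_000, 2_000, cached_tokens=80_000)` returns.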
Parameters: 1T total, MoE (32B activated)
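Assuming "activated" carries its usual mixture-of-experts meaning (the parameters actually used for each token), and taking "1T" as exactly $10^{12}$ (if the headline figure is rounded, the result is approximate), only a small fraction of the weights participates in any single forward pass:

$$
\frac{32 \times 10^{9}}{1 \times 10^{12}} = 0.032 = 3.2\%
$$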
Variants Available: 1
Performance Analysis // Verified Benchmarks
MMLU: Massive Multitask Language Understanding, covering 57 subjects across STEM, the humanities, the social sciences, and more.
SWE-bench Verified: resolving real-world GitHub issues. The Verified subset is screened so that every included issue is solvable.
LiveBench: a contamination-free, continuously updated reasoning benchmark.
MMLU-Pro: a more robust and harder version of MMLU, focusing on complex reasoning and STEM subjects.
MATH-500: a 500-problem math benchmark for broad quantitative reasoning.
LiveCodeBench: a contamination-free coding benchmark built from recently published problems.
GPQA: Graduate-Level Google-Proof Q&A Benchmark.
IFEval: Instruction Following Evaluation for Large Language Models. Measures the ability to follow strict formatting and constraint requirements.
AIME 2025: problems from the 2025 American Invitational Mathematics Examination.