Beta version: Information might not be fully accurate. Please report any discrepancies.
Advanced instruction-following benchmark with verified grading.
Score Distribution
AgentBench (agentbench)
IFEval (ifeval)
Inverse IFEval (ifeval-inverse)
IFBench (ifbench)
MultiChallenge (multichallenge)
Terminal-Bench 2.0 (terminal-bench)