Cybersecurity-flavored coding benchmark in simulated environments.
HumanEval
SWE-bench Verified
BigCodeBench
OJBench (Python)
LMArena WebDev ELO
Codeforces