May 8, 2026

Is ProgramBench Impossible?

ProgramBench is a new coding benchmark that every frontier model fails spectacularly. We’ve been on a quest for “hard benchmarks” for a while, so it’s refreshing to see one where top models do badly. Unfortunately, ProgramBench has one big problem: it’s impossible!

What is ProgramBench?

ProgramBench tests whether a model can recreate a program in a “clean room” environment. The model is given only a bit of documentation and black-box access to the program (all the programs are CLIs), then tasked with re-implementing it.

How does ProgramBench know if the implementation is correct? It also ge…

Source: LessWrong
