CueBench for Developers is live: score how well you drive coding agents

(app.cuebench.dev)

9 points | by DillonMehta 13 hours ago

3 comments

drdexebtjl 12 hours ago
Yikes. This is literally only useful to justify layoffs.
jadyen 13 hours ago
Looks cool at first glance, can't wait to play around with it!
DillonMehta 13 hours ago
Hey everyone, we're CueBench (S26). As teams go agent-first, everyone benchmarks the agents; nobody measures how well people drive them. We score a coding-agent session (Claude Code, Codex, Cursor, PI) on the human side: delegation, task description, catching the agent's mistakes, and verifying before shipping. You get a 0–100 score plus a breakdown.
Scoring is deterministic, built on measurable signals from the session, not an LLM vibing on your transcript. Same session, same score.
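To make "same session, same score" concrete, here is a minimal sketch of what a deterministic scorer over the four dimensions above could look like. This is an illustration only, not CueBench's actual scoring: the signal names, the ratios, and the equal weights are all assumptions invented for the example.

```python
from dataclasses import dataclass

# Hypothetical equal weights for the four dimensions named in the post.
WEIGHTS = {
    "delegation": 0.25,
    "task_description": 0.25,
    "catching_mistakes": 0.25,
    "verification": 0.25,
}


@dataclass(frozen=True)
class SessionSignals:
    """Made-up counts extracted from a session log (illustrative only)."""
    tasks_delegated: int
    tasks_total: int
    prompts_with_acceptance_criteria: int
    prompts_total: int
    agent_errors_caught: int
    agent_errors_total: int
    changes_tested: int
    changes_total: int


def _ratio(num: int, den: int) -> float:
    # An empty denominator scores 0.0; the result is clamped to [0, 1].
    if den == 0:
        return 0.0
    return max(0.0, min(1.0, num / den))


def score_session(s: SessionSignals) -> tuple[int, dict[str, int]]:
    """Return (overall 0-100 score, per-dimension 0-100 breakdown).

    Pure function of the signals: no randomness, no clock, no model call,
    so identical input always yields identical output.
    """
    ratios = {
        "delegation": _ratio(s.tasks_delegated, s.tasks_total),
        "task_description": _ratio(
            s.prompts_with_acceptance_criteria, s.prompts_total
        ),
        "catching_mistakes": _ratio(s.agent_errors_caught, s.agent_errors_total),
        "verification": _ratio(s.changes_tested, s.changes_total),
    }
    overall = round(100 * sum(WEIGHTS[k] * v for k, v in ratios.items()))
    breakdown = {k: round(100 * v) for k, v in ratios.items()}
    return overall, breakdown


if __name__ == "__main__":
    signals = SessionSignals(3, 4, 2, 4, 1, 2, 4, 4)
    print(score_session(signals))
```

The point of the sketch is the shape, not the numbers: every input is a count read from the session, and the score is plain arithmetic over those counts, which is what makes it repeatable.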
We just opened a public demo and need real sessions thrown at it. There's nothing to install and nothing runs on your machine: upload a session file from your agent's logs (or paste one terminal command) and you get scored in seconds.
Where it's going: a product for engineering orgs, with session-level feedback that helps engineers get better at agent-driven development and gives managers a skills signal (coaching, not surveillance).
The ask: run one real session through it this week and tell us where the score feels wrong. Brutal > polite. Demo video: https://youtu.be/r9vAdAMv6js