3 comments

  • drdexebtjl 12 hours ago

    Yikes. This is literally only useful to justify layoffs.

  • jadyen 13 hours ago

    Looks cool at first glance, can't wait to play around with it!

  • DillonMehta 13 hours ago

    Hey everyone, we're CueBench (S26). As teams go agent-first, everyone benchmarks the agents, but nobody measures how well people drive them. We score a coding-agent session (Claude Code, Codex, Cursor, PI) on the human side: delegation, task description, catching the agent's mistakes, and verifying before shipping. You get a 0–100 score plus a per-dimension breakdown.

    Scoring is deterministic, built on measurable signals from the session, not an LLM vibing on your transcript. Same session, same score.
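
    For illustration only: CueBench hasn't published its signals or weights, so this is a hypothetical sketch of what a deterministic, signal-based score could look like. The signal names (taken from the four dimensions above), the equal 0.25 weights, and `score_session` are all assumptions. The point is that a pure function of measured session signals, with no randomness and no model call, returns the same score every time.

```python
from __future__ import annotations

from dataclasses import dataclass

# HYPOTHETICAL weights: the real CueBench signals and weights are not public.
WEIGHTS = {
    "delegation": 0.25,
    "task_description": 0.25,
    "mistake_catching": 0.25,
    "verification": 0.25,
}


@dataclass(frozen=True)
class SessionSignals:
    # Each signal is ASSUMED to be pre-normalized to [0, 1] from the session log.
    delegation: float
    task_description: float
    mistake_catching: float
    verification: float


def score_session(s: SessionSignals) -> tuple[int, dict[str, int]]:
    """Return a 0-100 total plus a per-dimension breakdown.

    Pure function: same input signals always yield the same output.
    """
    breakdown: dict[str, int] = {}
    for name in WEIGHTS:
        value = min(max(getattr(s, name), 0.0), 1.0)  # clamp to [0, 1]
        breakdown[name] = round(100 * value)
    # Weighted average of the per-dimension scores, rounded to an integer.
    total = round(sum(WEIGHTS[n] * breakdown[n] for n in WEIGHTS))
    return total, breakdown


if __name__ == "__main__":
    print(score_session(SessionSignals(0.8, 0.6, 0.5, 0.9)))
    # → (70, {'delegation': 80, 'task_description': 60, 'mistake_catching': 50, 'verification': 90})
```

    Running it twice on the same signals gives an identical tuple, which is the "same session, same score" property; an LLM-graded rubric would not guarantee that.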

    We just opened a public demo and need real sessions thrown at it. There's nothing to install and nothing runs on your machine: just upload a session file from your agent's logs (or paste one terminal command) and you get scored in seconds.

    Where it's going: a product for engineering orgs that gives engineers session-level feedback to get better at agent-driven development, and gives managers a skills signal (coaching, not surveillance).

    The ask: run one real session through it this week and tell us where the score feels wrong. Brutal > polite. Demo video: https://youtu.be/r9vAdAMv6js