AI Agent Reliability Tracker

(hal.cs.princeton.edu)

1 point | by smartmic 4 hours ago

1 comment

  • chrisjj 3 hours ago

    > recent capability gains have yielded only small improvements in reliability.

    Have I missed something? Why would one expect capability gains to yield any such improvement?