Show HN: RULER – Easily apply RL to any agent

(openpipe.ai)

62 points | by kcorbitt 15 hours ago

13 comments

  • sadiq 11 hours ago

    Excellent, look forward to giving this a go.

    I was looking at: https://arxiv.org/abs/2506.18254 but your approach is even more general.

    • kcorbitt 9 hours ago

      I really like RLPR for when you have a known-good answer to compare to as well!

  • spmurrayzzz 11 hours ago

    Might end up being some confusion with the RULER benchmark from NVIDIA given the (somewhat shared) domain: https://github.com/NVIDIA/RULER

    EDIT: by shared I only mean the adjacency to LLMs/AI/ML; RL is a pretty big differentiator though, and the project looks great

    • kcorbitt 9 hours ago

      Dang, hadn't seen that. Namespace collision strikes again.

      • swyx 4 hours ago

        yeah unfortunately for you this is one of the well known long context benchmarks. too late tho, soldier on.

  • 11 hours ago
    [deleted]
  • maxrmk 9 hours ago

    Very cool. Do you do anything to mitigate ordering bias in the evaluation function, or do you just expect it to average out over time?

    • kcorbitt 9 hours ago

      No, we don't do anything. Theoretically we could judge several times with different ordering.

      We could measure order bias really easily though; we just need to look at the average score by rollout position across many runs. I'll add that to my list of experiments!
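      A minimal sketch of both ideas above, not RULER's actual code. The names `mean_score_by_position`, `judge_with_shuffles`, and the `judge` callable are hypothetical; `judge` is assumed to take a list of rollouts and return one score per rollout in the same order. The per-position average only says something about order bias if rollout quality is not itself correlated with position.

```python
import random
from collections import defaultdict
from statistics import mean


def mean_score_by_position(runs):
    """Average judge score at each presentation position.

    `runs` is a list of judge calls; each call is a list of scores in the
    order the rollouts were shown to the judge. A systematic trend across
    positions hints at order bias.
    """
    by_pos = defaultdict(list)
    for scores in runs:
        for pos, score in enumerate(scores):
            by_pos[pos].append(score)
    return {pos: mean(vals) for pos, vals in sorted(by_pos.items())}


def judge_with_shuffles(rollouts, judge, n_orderings=4, seed=0):
    """Average each rollout's score over several random orderings.

    `judge` is a hypothetical callable: list of rollouts in, list of
    scores out (same order). Scores are mapped back to the original
    rollout index before averaging.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(rollouts)
    for _ in range(n_orderings):
        order = list(range(len(rollouts)))
        rng.shuffle(order)
        scores = judge([rollouts[i] for i in order])
        for shown_pos, original_idx in enumerate(order):
            totals[original_idx] += scores[shown_pos]
    return [t / n_orderings for t in totals]
```

      Judging several times multiplies judge cost by `n_orderings`, so it probably makes sense to measure the per-position averages first and only pay for shuffling if a trend shows up.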

  • swyx 4 hours ago

    how does o3 on the customer support agent task so dreadfully underperform qwen?

  • 10 hours ago
    [deleted]
  • someoneontenet 12 hours ago

    Love these write ups!

    • kcorbitt 11 hours ago

      Thanks! If there are any topics that you'd find particularly interesting, let me know and I can try to find time. :)

  • ndgold 11 hours ago

    Dope