15 comments

  • blackcat201 11 hours ago

    Shameless plug: for anyone interested in "self-improvement" agents, check out StreamBench[1], where we benchmark and test what's essential for improvement in online settings. Basically, we find the feedback signal is vital: the stronger the signal, the more improvement you can get, provided you can feed it back to the agent as weights (LoRA) or in-context examples.

    [1] https://arxiv.org/abs/2406.08747
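    The "feed it back as in-context examples" idea above could be sketched roughly like this (a hypothetical minimal sketch, not StreamBench's actual code; all names are made up): only interactions that received positive feedback are replayed as few-shot examples in later prompts.

```python
# Hedged sketch: a feedback signal fed back to an agent as
# in-context examples. Interactions with positive feedback are
# kept and prepended to future prompts; the rest are dropped.

class FeedbackMemory:
    def __init__(self, max_examples=3):
        self.examples = []          # (task, answer) pairs with positive feedback
        self.max_examples = max_examples

    def record(self, task, answer, reward):
        """Store the interaction only when the feedback signal is positive."""
        if reward > 0:
            self.examples.append((task, answer))
            self.examples = self.examples[-self.max_examples:]

    def build_prompt(self, task):
        """Prepend past successes as few-shot examples for the next call."""
        shots = "\n".join(f"Q: {t}\nA: {a}" for t, a in self.examples)
        return f"{shots}\nQ: {task}\nA:" if shots else f"Q: {task}\nA:"

mem = FeedbackMemory()
mem.record("2+2", "4", reward=1)    # positive feedback -> kept
mem.record("3+3", "7", reward=-1)   # negative feedback -> dropped
print(mem.build_prompt("5+5"))      # prompt now carries the good example
```

    The same memory could instead drive a LoRA update; in-context replay is just the cheapest way to close the loop.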

  • gdiamos 15 hours ago

    Can it modify its training data?

    • keskival 10 hours ago

      Nope, just the code which sets up the agentic system and related prompts.

  • grahamj 14 hours ago

    heh, I was just working on something that tries to improve itself today. I wrote a simple agent executor that makes calling an agent a simple function call, then wrote an agent which invents other agents. By calling that in a loop for a while I ended up with, effectively, a large library of functions I not only didn't write but didn't even think up.

    By passing those functions as tools in LLM requests, any of the agents can make use of any of the others, so it's basically expanding its own capabilities.

    Not quite sure what task to sic it on yet, but it's fun to play with.
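    The executor-plus-inventor loop described above might look something like this (a hypothetical sketch; the real version would have an LLM write each new agent, here a stub stands in for it):

```python
# Hedged sketch of an agent executor where calling an agent is just a
# function call, plus an "inventor" agent run in a loop to grow the
# library. All names are made up for illustration.

registry = {}   # name -> callable; every invented agent becomes a plain function

def define_agent(name, fn):
    registry[name] = fn

def run_agent(name, *args):
    """Calling an agent is just a function call through the registry."""
    return registry[name](*args)

def inventor():
    """Stand-in for the LLM-backed agent that invents new agents.
    Here it mechanically derives a new function; the real one would
    prompt an LLM to write and register novel agents."""
    n = len(registry)
    define_agent(f"invented_{n}", lambda x, n=n: x * 2 + n)

define_agent("echo", lambda x: x)
for _ in range(3):          # run the inventor in a loop, growing the library
    inventor()

print(sorted(registry))     # agents nobody wrote by hand
```

    Because every entry in `registry` is an ordinary callable, the whole library can be re-exposed as tools in the next LLM request, which is what lets the system call its own inventions.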

  • digitcatphd 13 hours ago

    I’m skeptical this would work better in production than RLHF. If the agent makes a mistake, how is it supposed to know to correct itself and understand what it did wrong so it can prevent it? It seems better to retry recursively until it finds the solution, like a human.

  • pajeets 15 hours ago

    meh, I'm not convinced that any framework or side tool that sits on top of large language models is the solution

    we really need something intelligent (no, o1 doesn't count), and it's unclear what that will look like. Perhaps it will be some RNN with neurosymbolism

    • randomNumber7 11 hours ago

      I would say reinforcement learning needs to be part of the solution.

      I don't know how to prove it, but I'm pretty sure you can't reach AGI with (un-/self-)supervised learning alone.

    • yathaid 12 hours ago

      I am not sure it is useful to bring in something as nebulous as "intelligence" and hand-wave everything else away, unless you tightly define what intelligence means.

      There are only two objective measurements needed:

      - Is it making progress towards its goal?

      - Is it able to acquire capabilities it didn't have previously?

      I am not sure if even the first one is objective enough.

      Dismissing the argument without stating why you aren't convinced just comes across as a form of AI Luddism.

      • randomNumber7 11 hours ago

        You don't need these criteria when you can see in advance that something is impossible.

        I think something that only learns to reproduce text cannot become an intelligent actor.

        It's necessary to act in an environment with feedback.

        And while it of course depends on the definition of intelligence, the article is about the Gödel machine, which is a fancy word for AGI.

        • ben_w 8 hours ago

          You need the criteria in advance to even know if the thing is impossible.

          We don't know the extent of our ignorance about intelligence.

          > I think something that only learns to reproduce text, can not become an intelligent actor.

          > It's necessary to act in an environment with Feedback.

          Ok, but text adventures are a thing, so that doesn't rule out learning from text.

          And RLHF has humans as part of the environment giving feedback (that's the H and the F in RLHF).

      • whatshisface 11 hours ago

        The word capabilities is as hard to define as intelligence.

        • ben_w 8 hours ago

          Really? IMO capabilities can be enumerated as a set of challenges in the category of things you want done. We don't need to discuss whether an IC is "intelligent" to agree that the original $5 Pi Zero is "more capable" at such challenges than all of humanity combined.

          Sure, you can also say that GPT-4 passing the Bar tells you it can answer the kind of questions on the Bar exam without that extending to the kind of questions actual lawyers handle; Goodhart's law still applies, if that was your point.

  • optimalsolver 10 hours ago

    >The rapid advancement of large language models (LLMs) has significantly enhanced the capabilities of AI-driven agents across various tasks

    No it hasn't.

    • derektank 9 hours ago

      Siri is a lot more capable at managing calendars with the iOS 18.1 update, at least in the 20 minutes I spent playing around with a friend's iPhone that was on the beta. My understanding is that most of the capability improvement comes from it running ChatGPT-4o on the backend.

  • m3kw9 15 hours ago

    If their demo works, they must be close to AGI, right?