Agents Aren't Coworkers, Embed Them in Your Software

(feldera.com)

48 points | by gz09 a day ago

23 comments

  • solid_fuel a day ago

    "Agents" can't think and LLMs aren't sentient. They aren't suited to be your coworker, but they also aren't suited for general computational tasks. The chat interface is all there is, and their behavior in chat is not deterministic or bounded enough to be useful in most applications. They mimic tokens in reply to the tokens you give them, and that is all.

    You know what's a bad idea from an engineering (that thinky thing we used to do as part of building software) perspective?

    Building a dependency on an expensive remote API into your system.

    This isn't just me bloviating, I've been down this road before. In my case I had a project using LLMs to automatically edit videos provided by Hollywood content owners. It seemed like a decent application, but LLMs are structurally unsuited for dealing with user data like this. The way that the prompt is evaluated means there is no separation between system and user input, so once you start dealing with a wide variety of topics you pretty quickly run into walls.

    One example - ChatGPT refusing to summarize and pick a top segment from a news program because it contained references to a murder-suicide, and both murder and suicide are included in the many prohibited topics that are filtered in ChatGPT replies. This was through their API, not the regular user interface, so it is in theory as unrestricted as access gets. But because the LLM cannot be trusted to behave properly around the topic, they have to filter anything which touches it.

    Structurally, I don't see a way this can be overcome - LLMs by design mix the entire prompt together, it's not like a parameterized SQL query where you can isolate the user and system data. That means that a long or bold enough user input is often enough to outweigh the system prompt, and that causes the LLM to veer into unpredictable territory.
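To make the parameterization contrast concrete, here's a toy Python sketch (the strings are invented for illustration, not from any real pipeline):

```python
import sqlite3

# Parameterized SQL keeps user data out-of-band: the driver binds it
# as a value, so it can never be parsed as part of the query itself.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (body TEXT)")
user_input = "'); DROP TABLE notes; --"
conn.execute("INSERT INTO notes (body) VALUES (?)", (user_input,))

# An LLM prompt has no such channel: system instructions and user
# data are concatenated into one token stream before the model sees them.
system_prompt = "Summarize the transcript below and pick the top segment."
prompt = system_prompt + "\n\nTranscript:\n" + user_input
# Nothing in `prompt` marks where the instructions end and the
# untrusted data begins, which is the root of prompt injection.
```

The malicious string above is stored harmlessly as data by the database, while the prompt ends up as one undifferentiated string.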

    • SR2Z 4 hours ago

      > The chat interface is all that there is and their behavior in chat is not deterministic or bounded enough to be useful in most applications.

      Their behavior in chat is not deterministic, it's stochastic. That is the point - the usefulness of LLMs comes from their ability to deal with the vagaries of language.

      > But because the LLM cannot be trusted to behave properly around the topic, they have to filter anything which touches it.

      IMO this is because giving a random person a frontier LLM is like giving them a Ferrari. Most people would manage to not crash it. A few would experiment with it and learn how to drive it very well. A few more would immediately assume that a fast car means they can drive it fast and end up wrapped around a telephone pole.

      We get lots of mileage out of other stochastic systems. I've worked on a lot of projects that did, and the defining trait that made them successful doesn't seem to have a name, but the closest I can come up with is "boosting." In ML (especially classical ML), boosting is when you train one classifier to predict the residual error of another. The first classifier minimizes some entropy loss, and then the second contributes additional bits.
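A toy numeric version of residual boosting, with synthetic data and numpy polynomial fits standing in for real models:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(200)

# First model: a crude linear fit takes out what it can.
coef1 = np.polyfit(x, y, deg=1)
pred1 = np.polyval(coef1, x)

# Second model: trained on the *residual* of the first,
# contributing the bits the first model missed.
resid = y - pred1
coef2 = np.polyfit(x, resid, deg=5)
pred2 = pred1 + np.polyval(coef2, x)

mse1 = float(np.mean((y - pred1) ** 2))
mse2 = float(np.mean((y - pred2) ** 2))
```

The combined predictor strictly reduces the error of the first model, which is the whole trick: the second stage only has to explain what the first stage got wrong.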

      In a system with a human in the loop, it often takes a lot of engineering to let the human boost the output of the system. I once worked for a company where we had to very precisely label maps based on real-world data. We had a model that could produce a sometimes-accurate polygon, but just asking a person to adjust the polygon after the model generated it was terrible: it was a vague ask that took a lot of time and effort. Instead, we gave users a brush tool and trained a new model to fix the polygon based on that. A simpler example was a system for reviewing user reports: we tuned our system to approve them with high precision and used a human review queue for the rest. Reducing the number of bits of entropy a human had to contribute to a decision in the average case let us iterate smoothly on the model while staying flexible.
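The report-review setup boils down to confidence-threshold routing; a sketch with an invented threshold and scores (not the actual system):

```python
# Hypothetical confidence-threshold routing: auto-approve only when
# the model is very sure, send everything else to a human queue.
AUTO_APPROVE_THRESHOLD = 0.95  # tuned for precision, not recall

def route(confidence: float) -> str:
    """Decide where a user report goes based on model confidence."""
    if confidence >= AUTO_APPROVE_THRESHOLD:
        return "auto-approve"
    return "human-review"

decisions = [route(c) for c in (0.99, 0.80, 0.97, 0.42)]
# Humans only spend entropy on the cases the model couldn't settle.
```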

      The AI companies that are actually going to deliver useful products will be the ones that engineer interfaces that let human beings quickly refine LLM outputs. It's going to be a long time before any of these models can reliably one-shot a complex task with ambiguous parameters. Chat is only one possible way to do this, and frankly it's not a very good one. I think that this is the point the article was trying to make, minus the corpspeak and hype.

  • iot_devs a day ago

    > Give an agent the right interfaces and it becomes less conversational and more ambient. It no longer needs to constantly ask, explain, summarize, and negotiate. It can stay in the background, react to changes, and make steady progress with less supervision and less noise. That is closer to Weiser’s vision: calm technology, but for machines.

    I tend to agree quite a bit.

    I created an ambient background agent for my projects that does just that.

    It is there, in the background, constantly analysing my code and opening PRs to make it better.

    The hard part is finding a definition of "better"; for now it is whatever makes the linter and type checker happy.

    But overall it is a pleasure to use.

    • stingraycharles a day ago

      Just take a look at the pull requests / issues opened against a repository that's popular with LLM agents to understand how well that works.

      If there’s one takeaway, it’s that these agents need more, not less, oversight. I don’t agree at all with the “just remove a few tools and you can remove the human from the loop” approach. It just reduces the blast radius in case the agent gets it wrong, not the fact that it gets it wrong.

      • iot_devs a day ago

        Yeah, but my projects are personal and not popular.

        I crafted the AI loop to do exactly what I would be doing manually.

        Out of 10 PRs, 6 to 7 get merged. The others simply get closed.

        • stingraycharles a day ago

          Yeah my experience is that this works for a short time and then after a few weeks your codebase is a complete disaster.

          • codebje a day ago

            I could believe a ~66% success rate on asking an agent to run a linter and make PRs addressing issues found, that sounds about right: very tightly bounded problem, a sensible solution is often offered by the tool, and verification of success is binary.

            Structural changes, in which attention to the small details of a task is directly at odds with the need to consider less overt factors like cohesion and coherence, are where an agent will turn your code base into a dog's breakfast.

            The vibe coded software I have for my own use only is like that. Giant hundred-line functions, poor separation of concerns, easy for a change to have unintended behaviour somewhere else. It's probably a step up from the spreadsheet I was using before it, but not by enough to justify current RAM prices.

  • ori_b a day ago

    I'd pay more for deterministic, explainable, and fast software without agents. The value of computers is that they do tasks repeatably, reliably, and at blinding speed.

    This stuff is negative value.

    • efskap a day ago

      Right, and modulo agents, they're just describing event-driven architecture like Lambda.

  • apsurd a day ago

    The ambient-agents premise lands and is thought-provoking.

    But the more you read the article the more the point is lost. The prescriptions given aren't ambient?

        CLI: a good command-line interface makes it easy for an agent loop to interact with your system and saves tokens.
        Specs: Declarative configs, schemas, manifests. Artifacts that state the desired outcome, not the steps.
        Reconciliation loops: you declare the target state, let the system continuously converge toward it. Detect if something drifts.
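The reconciliation-loop item is the most concrete of the three; a generic Python sketch, where all three callbacks are placeholders rather than any real API:

```python
import time

def reconcile(get_desired, get_actual, apply_diff, interval=1.0, max_iters=None):
    """Generic reconciliation loop: keep converging actual state toward desired.

    get_desired, get_actual, and apply_diff stand in for whatever the
    real system exposes; nothing here is tied to a specific API.
    """
    iters = 0
    while max_iters is None or iters < max_iters:
        desired, actual = get_desired(), get_actual()
        if desired != actual:            # drift detected
            apply_diff(desired, actual)  # converge toward the target state
        iters += 1
        time.sleep(interval)
    return get_actual()

# Toy usage: the declared target says three replicas, actual starts at one.
state = {"replicas": 1}
final = reconcile(
    get_desired=lambda: {"replicas": 3},
    get_actual=lambda: dict(state),
    apply_diff=lambda desired, actual: state.update(desired),
    interval=0.0,
    max_iters=3,
)
```

The caller declares the outcome, not the steps; later iterations are no-ops until something drifts again.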
    
    (it seems you're still talking to the AI through the above, and you'll need to refine it just like a conversation; it's just not happening synchronously in chat)

    The gripe seems to be specifically with being able to chat with the AI. Yes, ideally the AI just knows to do stuff. But the chat interface is also the reason every Bob and Sarah has ChatGPT in their pocket. It's also just growing pains.

    • aykutseker 17 hours ago

      yeah and the reconciliation pattern only really works when you can actually read current state. half of what agents do is side effects you can't observe: slack messages, payments, emails. agent times out, retries, customer gets billed twice. the article kinda glosses over that.

  • leobuskin a day ago

    > Agentic management software is all the hype today: What started with Moltbot and OpenClaw now has a lot of competition: ZeroClaw, Hermes, AutoGPT etc.

    Moltbot is OpenClaw, and AutoGPT was born significantly earlier. I just couldn’t read past the first paragraph; I’ve entirely lost trust in whatever/whoever wrote it.

    • simonw a day ago

      Hermes agent dates back to at least September last year too, pre-dating Moltbot/OpenClaw by a couple of months https://github.com/NousResearch/hermes-agent/commit/17608c11...

    • stingraycharles a day ago

      It’s marketing. They’re selling some change management solution, so obviously they advocate for showing AI agents only changes, rather than the full context.

      Doesn’t mean it’s a good idea, though.

  • skybrian a day ago

    I like using them for coding, but I'm wary of making software that depends on an unreliable, expensive remote API. I'd rather have the agent write code and have no runtime dependency.

    It might be nice to have something simple and cheap for basic text classification, but I'm not sure what to use. (My websites are written in Deno.)

  • politelemon a day ago

    > Humans are not a good target for calm technology.

    Exactly the opposite is true. I couldn't even understand the point or relation being made here as the article continues to emit further disconnected revelations and factual errors. I would suggest a human calmly read through the post and sense check it.

  • orliesaurus a day ago

    not yet coworkers*

    • claysmithr a day ago

      you wouldn't download a coworker

      • doubled112 a day ago

        I would, along with a car for the coworker to drive me around in.

        • electroglyph a day ago

          but should you drive or walk to the car wash?

    • pando85 20 hours ago

      We should use them at that level. Using them merely as simple tools is not enough.

  • tommy29tmar 21 hours ago

    [dead]

  • WhoffAgents a day ago

    [dead]