Which, cool, great, I sure love "pip install"ing every time instead of just baking a single container image with it already installed.
This isn't any sort of fancy or interesting sandboxing; this is shelling out to "docker run", and not even using Docker as well as it could.
Quoting from the linked page:
> The tradeoff is ~5-10 seconds of container startup overhead
Sure, maybe it's 5-10 seconds if you use containers wrong. Unpacking a root filesystem and spinning up a clean mount namespace on Linux takes a few milliseconds; taking more than a second means something is going wrong, like "pip install"ing at runtime instead of at build time for some reason.
I can spin up a full Linux VM and run some code in it in less than 5 seconds.
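For comparison, baking the dependencies in at build time is a one-time cost. A minimal sketch (the dependency and image name here are hypothetical placeholders, not from the tool being discussed):

```dockerfile
# Hypothetical Dockerfile: dependencies are installed once, at build time.
FROM python:3.12-slim
RUN pip install --no-cache-dir requests   # placeholder dependency
# Every `docker run` of this image now starts without any pip install step.
```

Built once with `docker build -t my-tool .`, subsequent `docker run --rm my-tool ...` invocations pay only container startup, which is well under a second on a warm host.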
> Which, cool, great, I sure love "pip install"ing every time instead of just baking a single container image with it already installed.
Obviously the correct thing for such a use case would be building their own image with whatever tools are needed and then using that.
Unfortunately, then they’d probably get roasted for not maintaining the image well enough and not having good enough automation set up to keep it current on Docker Hub or wherever, which they’d then also have to do. On an individual level, it’s easier to just hold it wrong and do what works. You could also build the image locally once, but again, more work.
I think the ideal DX on Docker’s side would be:
`docker run --pre-requisites "pip install…" some-container`
Basically support for some list of commands needing to be done, which would build an intermediate image locally and reuse it whenever the same base image and prerequisite command is used in a run command. Then you could avoid doing unnecessary init in the container itself and keep using silly little scripts without having to push reusable images and keep them up to date yourself.
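A rough sketch of how that caching could work today on top of plain `docker build` layer caching. Everything here (the function name, the tag scheme) is hypothetical, and `--pre-requisites` is not a real Docker flag:

```python
import hashlib
import shlex

def cached_image_commands(base_image: str, prereq_cmd: str, run_cmd: str):
    """Derive a deterministic local tag from (base image, prerequisite command),
    so the same pair always maps to the same intermediate image. Hypothetical
    sketch of the proposed --pre-requisites behavior, not a real Docker feature."""
    digest = hashlib.sha256(f"{base_image}\n{prereq_cmd}".encode()).hexdigest()[:12]
    tag = f"prereq-cache:{digest}"
    dockerfile = f"FROM {base_image}\nRUN {prereq_cmd}\n"
    # Build only when the tag is absent; Docker's own layer cache helps on rebuilds.
    build = f"printf %s {shlex.quote(dockerfile)} | docker build -t {tag} -"
    run = f"docker run --rm {tag} {run_cmd}"
    return tag, build, run

tag, build, run = cached_image_commands(
    "python:3.12-slim", "pip install requests", "python tool.py"
)
print(tag)
```

Because the tag is a pure function of the base image and the prerequisite command, rerunning the same "silly little script" reuses the already-built intermediate image instead of reinstalling anything.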
You just set up CI/CD on GitHub and have it dump an image into GHCR; it's trivial. Claude excels at setting up workflows. I don't know why anyone bothers with Docker Hub at all, really.
> You just set up CI/CD on GitHub and have it dump an image into GHCR; it's trivial. Claude excels at setting up workflows. I don't know why anyone bothers with Docker Hub at all, really.
Probably because Docker Hub has great discoverability. For my own needs I use Gitea Packages and Woodpecker CI, because GitHub Actions feels worse (also in comparison to GitLab CI), or sometimes Nexus as a registry when I need to decouple from my Git platform or need a pull-through proxy regardless of the original source (I also build my own base and language-runtime images, but just host upstream PostgreSQL, for example). I also use Claude quite heavily, but the reality is that not everyone does, and anything that we say should “just” be done often won’t be, for a variety of reasons (limited free time being one of them).
All of that is a bit like saying "I don't know why anyone would willingly inflict Docker upon themselves when Podman exists!" or maybe that people should just prefer FreeBSD jails or NixOS. I will say, however, that making the good stuff easier to do is always a good move.
FWIW, people making the exact mistake you describe, at scale, is the only hypothesis I ever came up with for the sheer number of downloads from PyPI that pip used to get (and many other things that you wouldn't expect production code to need at runtime, like `setuptools`, still do). You'd think that ordinary users would only ever need to get it from PyPI when they upgrade, which admittedly could happen once per pip version per virtual environment if you didn't know or care how to do it any better. But we're talking about over half a billion downloads per month. It used to be firmly on the top 20 list.
Really, the fact that any package gets that many downloads is crazy to me. (I think the main reason that boto3 ecosystem stuff tops the charts is that they apparently publish new wheels daily.) How many devices run Python? How many of those need, say, NumPy? How many of those really care about being on the latest version all the time, and can't use a cached version? (Granted, another problem here is that you can't readily tell pip "prefer a cached version if anything already cached is usable". Pip doesn't even know what's in its own cache, unless it was built locally; the cache is really only there to power a caching HTTPS proxy, so it stores artifacts keyed by a hash of the original download URL.)
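To illustrate that last point: a store keyed by a hash of the download URL can only answer "have I fetched this exact URL before?", never "which versions of package X do I already have?". A simplified sketch of such a scheme (not pip's actual on-disk layout; the URLs are made up):

```python
import hashlib
from pathlib import PurePosixPath

def cache_entry(cache_root: str, url: str) -> PurePosixPath:
    """Map a download URL to a cache path via its hash. Simplified sketch of a
    URL-keyed cache; pip's real layout differs in its details."""
    h = hashlib.sha256(url.encode()).hexdigest()
    # Sharding by hash prefix is a common layout for hash-keyed stores.
    return PurePosixPath(cache_root) / h[:2] / h[2:4] / h[4:]

a = cache_entry("/cache", "https://example.invalid/pkg-1.0-py3-none-any.whl")
b = cache_entry("/cache", "https://example.invalid/pkg-1.1-py3-none-any.whl")
# Nothing in either key reveals that both entries belong to "pkg": the mapping
# is one-way, so enumerating cached versions of a package isn't possible.
```

That one-way mapping is exactly why "prefer anything usable from the cache" is hard to offer as a resolver feature.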
That's really bad... I don't care if people (probably an LLM here) make these kinds of mistakes in their own personal tooling. But when you're going to distribute it as some sort of library, it becomes unacceptable.
Write public libraries to solve problems in domains where you are an expert. If your library is LLM-generated, it is most likely useless and full of errors that will waste other people's time and resources.
The problem isn't getting an AI agent running in a sandbox. That's trivial. The problem is getting an existing enterprise project runnable inside the sandbox too, with no access to production keys or data or even test-db-that-is-actually-just-a-copy-of-prod, but with access to mock versions of all the various microservices and APIs that the project depends on.
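One hedged sketch of what that setup can look like with Docker Compose; every service and image name here is hypothetical:

```yaml
# Hypothetical sandbox: the agent sees only mocks, and the network is internal,
# so there is no route to production services or the real database.
services:
  agent:
    image: agent-sandbox            # hypothetical agent image
    environment:
      PAYMENTS_URL: http://payments-mock:8080
      DATABASE_URL: postgres://mock:mock@db-mock:5432/app
    networks: [sandbox]
  payments-mock:
    image: payments-mock            # hypothetical mock of one dependency
    networks: [sandbox]
  db-mock:
    image: postgres:16              # seeded with synthetic data, never a prod copy
    networks: [sandbox]
networks:
  sandbox:
    internal: true                  # blocks outbound access from the sandbox
```

Compose only wires this up, though; the hard part the comment points at is producing faithful mocks and synthetic data in the first place.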
I was curious, so I dug a bit.
Under the hood it's effectively running:
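The command itself didn't survive in this copy of the thread. Judging from the criticism that it "pip install"s on every run, its shape was presumably something like the following hypothetical reconstruction (image and package names are made up):

```shell
# Hypothetical shape only -- the real invocation is elided above.
CMD='docker run --rm python:3.12-slim sh -c "pip install some-package && python -m some_package"'
echo "$CMD"
```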
> This isn't any sort of fancy or interesting sandboxing, this is shelling out to "docker run", and not even using docker as well as it could.
That doesn’t sound right - the LLM told them it was a fantastic idea!
This feels less like "agents" and more like a controlled generate → execute → fix loop.
Works great when you have a clear verification signal (tests passing), but what drives convergence when that signal isn’t well-defined?
Couldn't you just do AgentExecutor(...).run(task="...") and launch an autonomous AI in only one line?
The “Do X in Y lines of code” thing where the Y lines of code include import statements is so, so silly.
Self-plug here.
Launch an AI agent to operate on production servers/SQL safely, using tmux:
https://news.ycombinator.com/item?id=47411242
If you want sandboxed access to git, Slack, Gmail, etc, I built https://agentblocks.ai
Wrong story. :)
https://news.ycombinator.com/item?id=47387268
Grandparent is an LLM/agent, so what can you expect... sigh.