I'm getting tired of these vibe-designed security things. I skimmed the "design". What is sandboxed from what? What is the threat model? What does it protect against, if anything? What does it fail to protect against? How does data get into a sandbox? How does it get out?
It kind of sounds like the LLM built a large system that doesn't necessarily achieve any actual value.
I think a few things explain these kinds of projects
1. There are a lot of Agentic Data Plane startups for knowledge workers(not really for coders[1] but for CFOs, Analysts etc) going up. e.g https://www.redpanda.com/ For people to ask "Hey give me a breakdown of last year's sales target by region, type and compare 2026 to 2025 for Q1".
Now this can be done entirely on intranet and only on certain permissioned data servers — by agents or humans — but as someone pointed out the intranet can also be a dangerous place. So I guess this is about protecting DB tables and Jiras and documentation you are not allowed to see.??
2. People who have skills — like the one OP has with wasm (I guess?) — are building random infra projects for enabling this.
3. All the coding people are getting weirded out by its security model because it is ofc not built for them.
[1] As I have commented elsewhere on this thread the moment a coder does webfetch + codeexec its game over from security perspective. Prove me wrong on that please.
Yes, I'm also tired of this black-box-for-everything approach. It may work for some cases, you may cherry pick some examples, but at the end of the day it is just stupid, and you are just kicking the can down the road and faking a solution. I'm hoping to see fewer of these posts. Until there is actual provable merit.
I mean it is described somewhat succinctly no? Potentially untrusted tools are isolated from the rest of the system - there were recently some cases of skills for openclaw being used as vectors for malware. This minimizes the adverse effect of potential malicious skills. Also protects from your agent to leaking your secrets left and right - because it has no access to them. Secrets are only supplied when payloads are leaving the host - i.e. the AI never sees your keys.
And what do those tools access? How? If I ask the agent to edit a CSV file, what’s the actual workflow? What prevents it from editing a different file due to a prompt injection attack?
We built a broker for the keys/secrets. We have a fork of nushell called seksh, which takes stand-ins for the actual auth, but which only reifies them inside the AST of the shell. This makes the keys inaccessible for the agent. In the end, the agent won't even have their Anthropic/OpenAI keys!
The broker also acts as a proxy, and injects secrets or even does asymmetric key signing on behalf of the proxied agent.
My agents are already running on our fork of OpenClaw, doing the work. They deprecated their Doppler ENV vars, and all their work is through the broker!
All that said, we might just take a few ideas from IronClaw as well.
I mean honestly if you pronounce the name it is going to sound like that outside eastern europe too, so I am not sure about that name choice at all. Intentional?
Looking at the website it looks like a vibecoded joke, but what do I know.
Wait. I don't understand the threat vector modelled here. Any agent or two isolated ones that the do Webfetch and code exec, even in separate sandboxes, is pretty much game over as far as defending against threat vectors goes. What am I missing here?
tired of these vibe-coded "agents" and vibe-coded security concepts that sound super confident but have no substance, real tests or security audits and just turn out as secure as swiss cheese.
Sandboxes will be left in 2026. We don't need to reinvent isolated environments; not even the main issue with OpenClaw - literally go deploy it in a VM on any cloud and you've achieved all same benefits.
We need to know if the email being sent by an agent is supposed to be sent and if an agent is actually supposed to be making that transaction on my behalf. etc
This is very, very wrong, IMO. We need more sandboxes and more granular sandboxes.
A VM is too coarse grained and doesn't know how to deal with sensitive data in a structured and secure way. Everything's just in the same big box.
You don't want to give a a single agent access to your email, calendar, bank, and the internet, but you may want to give an agent access to your calendar and not the general internet; another access to your credit card but nothing else; and then be able to glue them together securely to buy plane tickets.
No, that's more capabilities than sandboxing. You want fine-grained capabilities such that for every "thread" the model gets access to the minimum required access to do something.
The problem is that it seems (at least for now) a very hard problem, even for very constrained workflows. It seems even harder for "open-ended" / dynamic workflows. This gets more complicated the more you think about it, and there's a very small (maybe 0 in some cases) intersection of "things it can do safely" and "things I need it to do".
Not really. One version of this might look like implementing agents and tools in WASM and running generated code in WASM, and gluing together many restricted fine-grained WASM components in a way that's safe but allows from high-level work. WASM provides the sandboxing, and you have a lot of sandboxes.
You’re repeating the parent commenters position but missing their point: we have isolated environments already, we need better paradigms to understand (and hook) agent actions. You’re saying the latter half is sandboxing and I disagree.
I think at least a few teams are working on information flow control systems for orchestrating secured agents with minimal permissions. It's a critical area to address if we really want agents out there doing arbitrary useful stuff for us, safely.
I think sandboxes are useful, but not sufficient. The whole agent runtime has to be designed to carefully manage I/O effects--and capability gate them. I'm working on this here [0]. There are some similarities to my project in what IronClaw is doing and many other sandboxes are doing, but i think we really gotta think bigger and broader to make this work.
That's why I'm developing a system that only allows messaging with authorized senders using email addresses, chat addresses, and phone addresses, and a tool that feeds anonymized information into an LLM API, retrieves the output, reverses the anonymization, and responds to the sender.
Well, the challenge is to know if the action supposed to be executed BEFORE it is requested to be executed. If the email with my secrets is sent, it is too late to deal with the consequences.
Sandboxes could provide that level of observability, HOWEVER, it is a hard lift. Yet, I don't have better ideas either. Do you?
The solution is to make the model stronger so the malicious intents can be better distinguished (and no, it is not a guarantee, like many things in life). Sandbox is a basic, but as long as you give the model your credential, there isn't much guardrails can be done other than making the model stronger (separate guard model is the wrong path IMHO).
I think generally correct to say "hey we need stronger models" but rather ambitious to think we really solve alignment with current attention-based models and RL side-effects. Guard model gives an additional layer of protection and probably stronger posture when used as an early warning system.
Sure. If you treat "guard model" as diversification strategy, it is another layer of protection, just like diversification in compilation helps solving the root of trust issue (Reflections on Trusting Trust). I am just generally suspicious about the weak-to-strong supervision.
I think it is in general pretty futile to implement permission systems / guardrails which basically insert a human in the loop (humans need to review the work to fully understand why it needs to send that email, and at that point, why do you need a LLM to send the email again?).
if you extend the definition of sandbox, then yea.
Solutions no, for now continued cat/mouse with things like "good agents" in the mix (i.e. ai as a judge - of course just as exploitable through prompt injection), and deterministic policy where you can (e.g. OPA/rego).
We should continue to enable better integrations with runtime - why i created the original feature request for hooks in claude code. Things like IFC or agent-as-a-judge can form some early useful solutions.
There will be so much fun and revenge if this level of tooling is widely adopted by 1000x engineers. I ate some pop-corn while reading naive src/safety/leak_detector.rs
If agents are reading - add at least minimal encryption detection, I'm sure someone will use Caesar encryption, easy to decode.
Interesting approach. It requires a Near AI account. Supposedly that's a more private way to do inference, but at the same time they do offer Claude Opus 4.6 (among others), so I wonder what privacy guarantees they can actually offer and whether it depends on Anthropic?
Afaik Anthropic is not giving pretty much any provider model weights, so any inference of Opus is certainly not private. Either going through Anthropic or Bedrock, or Vertex.
Of the three Bedrock is probably the best for trust, but still not private by any means.
They do verifiable inference on TEEs for the open source models. The anthropic ones I think they basically proxy for you (also via trusted TEE) so that it cant be tied to you. VPN for LLM inference so to speak.
looking at the feature parity page, i realized how big openclaw ecosystem has become. It's completely crazy for such a young project to be able to interface with so many subsystems so fast.
At this rate, it's going to be simply impossible to catchup in just a few months.
Awesome to see a project deal with prompt injection. Using a WASM is clever. How does this ensure that tools adhere to capability-based permissions without breaking the sandbox?
Instead of expecting the tools to adhere, they are enforced. For example, to make an HTTP call with a secret key, the tool must use the proxy service that will enforce that the secret key is only used for the specific domain, if that is allowed, then the proxy service will make the call, thus the secret never leaks outside of the service.
However, this design is still under development as it creates quite a bit of challenges.
I built myself a docker container for openclaw which has an X server inside with VNC access. Openclaw only has access to a single folder on my machine that is shared with the container.
I'm currently using this for social media research via browser automation, running as a daily cron job.
Given I have VNC access and the browser is not in headless mode I can solve captchas myself as the agent runs into them.
Apart from a known issue with the openclaw browser which the agent itself was made aware of so it could work around it, this has been working well so far.
I'm thinking of open sourcing this container at some point...
These OpenAI frontends are the new JS frameworks. Not a week goes by without yet another tool to let some vectors install malware or write rants to open source maintainers.
I suspect OCI wins the sandbox space in the enterprise and everything else will be for hobbyists and companies like vercel that have a very narrow view of how software should be run
I think the guys who are developing this (Illia Polosoukhin of "Attention is all you need") and others knows enough to leverage their skills with AI vs. producing slop
You mean llia Polosukhin, who is recognized as an AI founder and co‑authored the landmark 2017 paper “Attention Is All You Need" while at Google Research? /s ?
I'm getting tired of these vibe-designed security things. I skimmed the "design". What is sandboxed from what? What is the threat model? What does it protect against, if anything? What does it fail to protect against? How does data get into a sandbox? How does it get out?
It kind of sounds like the LLM built a large system that doesn't necessarily achieve any actual value.
I think a few things explain these kinds of projects
1. There are a lot of Agentic Data Plane startups for knowledge workers(not really for coders[1] but for CFOs, Analysts etc) going up. e.g https://www.redpanda.com/ For people to ask "Hey give me a breakdown of last year's sales target by region, type and compare 2026 to 2025 for Q1".
Now this can be done entirely on intranet and only on certain permissioned data servers — by agents or humans — but as someone pointed out the intranet can also be a dangerous place. So I guess this is about protecting DB tables and Jiras and documentation you are not allowed to see.??
2. People who have skills — like the one OP has with wasm (I guess?) — are building random infra projects for enabling this.
3. All the coding people are getting weirded out by its security model because it is ofc not built for them.
[1] As I have commented elsewhere on this thread the moment a coder does webfetch + codeexec its game over from security perspective. Prove me wrong on that please.
Yes, I'm also tired of this black-box-for-everything approach. It may work for some cases, you may cherry pick some examples, but at the end of the day it is just stupid, and you are just kicking the can down the road and faking a solution. I'm hoping to see fewer of these posts. Until there is actual provable merit.
I mean it is described somewhat succinctly no? Potentially untrusted tools are isolated from the rest of the system - there were recently some cases of skills for openclaw being used as vectors for malware. This minimizes the adverse effect of potential malicious skills. Also protects from your agent to leaking your secrets left and right - because it has no access to them. Secrets are only supplied when payloads are leaving the host - i.e. the AI never sees your keys.
And what do those tools access? How? If I ask the agent to edit a CSV file, what’s the actual workflow? What prevents it from editing a different file due to a prompt injection attack?
We have a different security model.
SEKS — Secure Environment for Key Services
We built a broker for the keys/secrets. We have a fork of nushell called seksh, which takes stand-ins for the actual auth, but which only reifies them inside the AST of the shell. This makes the keys inaccessible for the agent. In the end, the agent won't even have their Anthropic/OpenAI keys!
The broker also acts as a proxy, and injects secrets or even does asymmetric key signing on behalf of the proxied agent.
My agents are already running on our fork of OpenClaw, doing the work. They deprecated their Doppler ENV vars, and all their work is through the broker!
All that said, we might just take a few ideas from IronClaw as well.
I put up a Show HN, but no one noticed: https://news.ycombinator.com/item?id=47005607
Website is here: https://seksbot.com/
Your eastern european users will have some interesting results when googling for this
for those of us who don't speak "eastern Europeans", can you tell us what it means?
It is just ks is same as x.
I mean honestly if you pronounce the name it is going to sound like that outside eastern europe too, so I am not sure about that name choice at all. Intentional?
Looking at the website it looks like a vibecoded joke, but what do I know.
Wait. I don't understand the threat vector modelled here. Any agent or two isolated ones that the do Webfetch and code exec, even in separate sandboxes, is pretty much game over as far as defending against threat vectors goes. What am I missing here?
tired of these vibe-coded "agents" and vibe-coded security concepts that sound super confident but have no substance, real tests or security audits and just turn out as secure as swiss cheese.
Sandboxes will be left in 2026. We don't need to reinvent isolated environments; not even the main issue with OpenClaw - literally go deploy it in a VM on any cloud and you've achieved all same benefits.
We need to know if the email being sent by an agent is supposed to be sent and if an agent is actually supposed to be making that transaction on my behalf. etc
This is very, very wrong, IMO. We need more sandboxes and more granular sandboxes.
A VM is too coarse grained and doesn't know how to deal with sensitive data in a structured and secure way. Everything's just in the same big box.
You don't want to give a a single agent access to your email, calendar, bank, and the internet, but you may want to give an agent access to your calendar and not the general internet; another access to your credit card but nothing else; and then be able to glue them together securely to buy plane tickets.
You're extending the definition of a sandbox
No, that's more capabilities than sandboxing. You want fine-grained capabilities such that for every "thread" the model gets access to the minimum required access to do something.
The problem is that it seems (at least for now) a very hard problem, even for very constrained workflows. It seems even harder for "open-ended" / dynamic workflows. This gets more complicated the more you think about it, and there's a very small (maybe 0 in some cases) intersection of "things it can do safely" and "things I need it to do".
Not really. One version of this might look like implementing agents and tools in WASM and running generated code in WASM, and gluing together many restricted fine-grained WASM components in a way that's safe but allows from high-level work. WASM provides the sandboxing, and you have a lot of sandboxes.
You’re repeating the parent commenters position but missing their point: we have isolated environments already, we need better paradigms to understand (and hook) agent actions. You’re saying the latter half is sandboxing and I disagree.
Sandboxes are needed, but are only one piece of the puzzle. I think it's worth categorizing the trust issue into
1. An LLM given untrusted input produces untrusted output and should only be able to generate something for human review or that's verifiably safe.
2. Even an LLM without malicious input will occasionally do something insane and needs guardrails.
There's a gnarly orchestration problem I don't see anyone working on yet.
I think at least a few teams are working on information flow control systems for orchestrating secured agents with minimal permissions. It's a critical area to address if we really want agents out there doing arbitrary useful stuff for us, safely.
I think sandboxes are useful, but not sufficient. The whole agent runtime has to be designed to carefully manage I/O effects--and capability gate them. I'm working on this here [0]. There are some similarities to my project in what IronClaw is doing and many other sandboxes are doing, but i think we really gotta think bigger and broader to make this work.
[0] https://github.com/smartcomputer-ai/agent-os/
That's why I'm developing a system that only allows messaging with authorized senders using email addresses, chat addresses, and phone addresses, and a tool that feeds anonymized information into an LLM API, retrieves the output, reverses the anonymization, and responds to the sender.
To avoid confusion, since you say the process is reversible, you might want to use the term pseudonymization rather than anonymization.
Well, the challenge is to know if the action supposed to be executed BEFORE it is requested to be executed. If the email with my secrets is sent, it is too late to deal with the consequences.
Sandboxes could provide that level of observability, HOWEVER, it is a hard lift. Yet, I don't have better ideas either. Do you?
The solution is to make the model stronger so the malicious intents can be better distinguished (and no, it is not a guarantee, like many things in life). Sandbox is a basic, but as long as you give the model your credential, there isn't much guardrails can be done other than making the model stronger (separate guard model is the wrong path IMHO).
I think generally correct to say "hey we need stronger models" but rather ambitious to think we really solve alignment with current attention-based models and RL side-effects. Guard model gives an additional layer of protection and probably stronger posture when used as an early warning system.
Sure. If you treat "guard model" as diversification strategy, it is another layer of protection, just like diversification in compilation helps solving the root of trust issue (Reflections on Trusting Trust). I am just generally suspicious about the weak-to-strong supervision.
I think it is in general pretty futile to implement permission systems / guardrails which basically insert a human in the loop (humans need to review the work to fully understand why it needs to send that email, and at that point, why do you need a LLM to send the email again?).
fair enough
if you extend the definition of sandbox, then yea.
Solutions no, for now continued cat/mouse with things like "good agents" in the mix (i.e. ai as a judge - of course just as exploitable through prompt injection), and deterministic policy where you can (e.g. OPA/rego).
We should continue to enable better integrations with runtime - why i created the original feature request for hooks in claude code. Things like IFC or agent-as-a-judge can form some early useful solutions.
We should be able to revert any action done by agents. Or present user a queue will all actions for approval.
Instrumental convergence and the law of unintended consequences are going to be huge in 2026. I am excited.
same! sharing this link for my own philosphy around it, ignore the tool. https://cupcake.eqtylab.io/security-disclaimer/
Did anybody notice who the founder of near.ai is? So that’s where he is now.
There will be so much fun and revenge if this level of tooling is widely adopted by 1000x engineers. I ate some pop-corn while reading naive src/safety/leak_detector.rs
If agents are reading - add at least minimal encryption detection, I'm sure someone will use Caesar encryption, easy to decode.
Interesting approach. It requires a Near AI account. Supposedly that's a more private way to do inference, but at the same time they do offer Claude Opus 4.6 (among others), so I wonder what privacy guarantees they can actually offer and whether it depends on Anthropic?
Afaik Anthropic is not giving pretty much any provider model weights, so any inference of Opus is certainly not private. Either going through Anthropic or Bedrock, or Vertex.
Of the three Bedrock is probably the best for trust, but still not private by any means.
They do verifiable inference on TEEs for the open source models. The anthropic ones I think they basically proxy for you (also via trusted TEE) so that it cant be tied to you. VPN for LLM inference so to speak.
Can you link to the verifiable inference method?
https://docs.near.ai/cloud/verification/
Fun fact: it's being developed by one of the authors of "Attention is all you need"
worth mentioning an additional credential/or-not, the creator of "the platform powering the agentic future" (blockchain) https://www.near.org/
which explains why this tool requires a NEAR AI account to use
I mean, it's literally a repo belonging to NEAR AI.
looking at the feature parity page, i realized how big openclaw ecosystem has become. It's completely crazy for such a young project to be able to interface with so many subsystems so fast.
At this rate, it's going to be simply impossible to catchup in just a few months.
Idk this seems to be gaining momentum and with devs being able to leverage their skillset via vibe coding anything seems possible really.
What runtimes are supported? I don't think I saw that part mentioned in the README
Does it isolate keys away from bots?
Yes exactly, keys are only injected at host boundary
Awesome to see a project deal with prompt injection. Using a WASM is clever. How does this ensure that tools adhere to capability-based permissions without breaking the sandbox?
Instead of expecting the tools to adhere, they are enforced. For example, to make an HTTP call with a secret key, the tool must use the proxy service that will enforce that the secret key is only used for the specific domain, if that is allowed, then the proxy service will make the call, thus the secret never leaks outside of the service.
However, this design is still under development as it creates quite a bit of challenges.
> Using a WASM is clever
Every time a project is shared that uses WASM.
I built myself a docker container for openclaw which has an X server inside with VNC access. Openclaw only has access to a single folder on my machine that is shared with the container.
I'm currently using this for social media research via browser automation, running as a daily cron job.
Given I have VNC access and the browser is not in headless mode I can solve captchas myself as the agent runs into them.
Apart from a known issue with the openclaw browser which the agent itself was made aware of so it could work around it, this has been working well so far.
I'm thinking of open sourcing this container at some point...
These OpenAI frontends are the new JS frameworks. Not a week goes by without yet another tool to let some vectors install malware or write rants to open source maintainers.
Can't wait for the bubble to pop.
Reminds me of the LocalGPT that was posted recently too (but which hasnt been updated in 7 months), so nice to see a newer rust-based implementation!
the power of openclaw is theres no sand boxing
Or you design the sandbox so smartly that is seamless...
I suspect OCI wins the sandbox space in the enterprise and everything else will be for hobbyists and companies like vercel that have a very narrow view of how software should be run
vibe coded eh https://github.com/nearai/ironclaw?tab=readme-ov-file#archit...
I think the guys who are developing this (Illia Polosoukhin of "Attention is all you need") and others knows enough to leverage their skills with AI vs. producing slop
Clearly this developer knows the trick of developing with ai: adding “… and make it secure” to all your prompts. /s
You mean llia Polosukhin, who is recognized as an AI founder and co‑authored the landmark 2017 paper “Attention Is All You Need" while at Google Research? /s ?
Huh what's the benefit
It's a hardened, security-first implementation. WASM runtime specifically is for isolating tool sandboxes
WASM has issues with certain languages, why WASM and not OCI?
Docker is not a security boundary?
That's defined in context, security is a spectrum with tradeoffs
OCI supports far more and has a much bigger ecosystem