I've been going down this exact rabbit hole for the last few months. The 'opt-in guardrails' problem you mentioned is the dealbreaker. If the agent can just ignore the read_file tool wrapper and call os.system('cat ...'), the policy is useless.
I ended up building a 'capability token' primitive (think Macaroons or Google Zanzibar, but for ephemeral agent tasks) to solve this.
My approach (Tenuo) works like this:
1. Runtime Enforcement: The agent gets a cryptographically signed 'Warrant' that mechanically limits what the runtime allows. It’s not a 'rule' the LLM follows; it’s a constraint the runtime enforces (e.g., fs:read is only valid for /tmp/*).
2. Attenuation: As the agent creates sub-tasks, it can only delegate a subset of the authority it holds, never more.
3. Offline Verify: I wrote the core in Rust so I can verify these tokens in ~27µs on every single tool call without a network round-trip.
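To make the three points above concrete, here is a minimal Python sketch of the general idea. It is not Tenuo's actual API or wire format: the names `Warrant`, `mint`, `attenuate`, and `authorize` are hypothetical, and it uses Macaroon-style HMAC chaining with a shared root key as a stand-in for whatever signature scheme the real crate uses. The verifier only needs the key and the token, so the check runs locally with no network call.

```python
import fnmatch
import hashlib
import hmac
import json
import posixpath
from dataclasses import dataclass


def _seed(root_key: bytes) -> bytes:
    # Starting signature, derived from the key held by the enforcing runtime.
    return hmac.new(root_key, b"warrant-v1", hashlib.sha256).digest()


def _chain(sig: bytes, caveat: tuple[str, str]) -> bytes:
    # Fold one caveat into the signature. The old signature is the HMAC key,
    # so a holder can append caveats but cannot strip them back off.
    return hmac.new(sig, json.dumps(caveat).encode(), hashlib.sha256).digest()


@dataclass(frozen=True)
class Warrant:
    caveats: tuple[tuple[str, str], ...]  # (action, path glob) pairs
    sig: bytes

    def attenuate(self, action: str, pattern: str) -> "Warrant":
        # Delegation: needs only the current token, not the root key.
        caveat = (action, pattern)
        return Warrant(self.caveats + (caveat,), _chain(self.sig, caveat))


def mint(root_key: bytes, action: str, pattern: str) -> Warrant:
    caveat = (action, pattern)
    return Warrant((caveat,), _chain(_seed(root_key), caveat))


def authorize(root_key: bytes, w: Warrant, action: str, path: str) -> bool:
    # 1. Recompute the chain locally and compare in constant time.
    sig = _seed(root_key)
    for caveat in w.caveats:
        sig = _chain(sig, caveat)
    if not hmac.compare_digest(sig, w.sig):
        return False
    # 2. Every caveat must allow the request, so each added caveat narrows it.
    #    normpath collapses "..", but this sketch does not resolve symlinks.
    clean = posixpath.normpath(path)
    return all(
        act == action and fnmatch.fnmatchcase(clean, pattern)
        for act, pattern in w.caveats
    )


if __name__ == "__main__":
    key = b"\x00" * 32  # placeholder key, for illustration only
    parent = mint(key, "fs:read", "/tmp/*")
    child = parent.attenuate("fs:read", "/tmp/job-1/*")
    print(authorize(key, parent, "fs:read", "/tmp/notes.txt"))
    print(authorize(key, child, "fs:read", "/tmp/notes.txt"))
```

The important part is where `authorize` gets called: in the layer that actually performs the file or process operation, not in a tool wrapper the agent can choose to skip. A token check alone does nothing if the agent process can still reach os.system directly.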
If you are building a POC, feel free to rip out the core logic or use the crate directly. I’d love to see more tools move away from 'prompt engineering security' toward actual runtime guarantees.
Repo: https://github.com/tenuo-ai/tenuo