Interesting architecture. I'm curious about the workflow when an agent hits a denied action: does it get a structured rejection it can reason about and try an alternative, or does it just fail? Wondering how the feedback loop works between the safety kernel and the LLM's planning.
Great question. This is actually a core design principle of the Cordum Agent Protocol (CAP).
It’s definitely a *structured rejection*, not a silent fail.
Since the LLM needs to "know" it was blocked to adjust its plan, the kernel returns a standard error payload (e.g., `PolicyViolationError`) with context.
The flow looks like this:
1. *Agent:* Sends intent "Delete production DB".
2. *Kernel:* Checks policy -> DENY.
3. *Kernel:* Returns a structured result: `{ "status": "blocked", "reason": "destructive_action_limit", "message": "Deletion requires human approval" }`.
4. *Agent (LLM):* Receives this as an observation.
5. *Agent (Re-planning):* "Oh, I can't delete it. I will generate a Slack message to the admin asking for approval instead."
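For the curious, here's a minimal sketch of what the kernel-side check might look like. The names (`PolicyDecision`, `check_intent`, the protected-resource rule) are illustrative placeholders, not the actual CAP internals:

```python
from dataclasses import dataclass, asdict

# Hypothetical policy rule: destructive actions on protected resources
# are never auto-approved; they come back as a structured "blocked" result.
PROTECTED_RESOURCES = {"production_db"}

@dataclass
class PolicyDecision:
    status: str             # "allowed" or "blocked"
    reason: str = ""
    message: str = ""

def check_intent(action: str, resource: str) -> PolicyDecision:
    # Return a structured decision instead of raising or failing silently,
    # so the agent always gets something it can reason about.
    if action == "delete" and resource in PROTECTED_RESOURCES:
        return PolicyDecision(
            status="blocked",
            reason="destructive_action_limit",
            message="Deletion requires human approval",
        )
    return PolicyDecision(status="allowed")

decision = check_intent("delete", "production_db")
print(asdict(decision))
# {'status': 'blocked', 'reason': 'destructive_action_limit', 'message': 'Deletion requires human approval'}
```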
This feedback loop turns safety from a "blocker" into a constraint that the agent can reason around, which is critical for autonomous recovery.
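On the agent side, the blocked payload is just another observation in the planning loop. A rough sketch of that re-planning step (the `plan_next_step` function is a stand-in, not a real CAP API):

```python
def plan_next_step(observation: dict) -> str:
    # Stand-in for the LLM re-planning call: in a real loop the observation is
    # appended to the model's context and it proposes the next action.
    if observation.get("status") == "blocked":
        return "send_slack_message: ask admin for approval ({})".format(
            observation.get("message", "")
        )
    return "proceed_with_plan"

blocked = {
    "status": "blocked",
    "reason": "destructive_action_limit",
    "message": "Deletion requires human approval",
}
print(plan_next_step(blocked))
# send_slack_message: ask admin for approval (Deletion requires human approval)
```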
I built a formal testing tool for AI agents. It runs on the CLI, a free version is launching soon, and it includes MCP security tests and chaos engineering features: https://exordex.com/waitlist
Nice job, but isn't it a bit overkill?
It is overkill for a demo, but for my production environment I need an external safety layer. I can't rely on 'prompt engineering' when real data is at stake.