Hey HN. I've been running multi-agent AI coding workflows in production for 6 months now, and VNX is the governance system I built to make it actually work.
The problem isn't getting AI agents to write code — it's knowing when they went wrong, why, and preventing the same failure next time.
Every multi-agent framework I tried solved the demo but collapsed in production: no audit trail, no way to scope tasks, no quality enforcement, and when something broke three agents deep, no way to trace it.
VNX is a different approach. Four components, all filesystem-based:
1. Dispatch queue — T0 (orchestrator) breaks work into scoped tasks (150-300 lines max) and routes them to worker terminals. Each terminal runs its own AI CLI (Claude Code, Codex CLI, or Gemini CLI) with its own context window. No shared state between agents.
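A filesystem dispatch queue in this style can be as small as one JSON file per task in a queue directory, with claiming a task meaning deleting its file. A minimal sketch — the field names, paths, and `dispatch`/`next_task` helpers here are illustrative, not VNX's actual schema:

```python
import json, time, uuid
from pathlib import Path

QUEUE = Path("/tmp/vnx-demo/queue")  # hypothetical queue directory
QUEUE.mkdir(parents=True, exist_ok=True)
for stale in QUEUE.glob("*.json"):
    stale.unlink()  # start clean for the demo

def dispatch(task: str, target: str, max_lines: int = 300) -> Path:
    """T0 side: write one scoped task as one file."""
    record = {
        "id": uuid.uuid4().hex[:8],
        "target": target,        # e.g. "T1", "T2", "T3"
        "task": task,
        "max_lines": max_lines,  # scope cap for the worker
        "dispatched_at": time.time(),
    }
    path = QUEUE / f"{record['id']}.json"
    path.write_text(json.dumps(record))
    return path

def next_task(target: str):
    """Worker side: claim the oldest task addressed to this terminal."""
    for path in sorted(QUEUE.glob("*.json")):
        record = json.loads(path.read_text())
        if record["target"] == target:
            path.unlink()  # claiming = deleting; no shared state beyond the file
            return record
    return None

dispatch("Refactor parser error handling", "T1")
print(next_task("T1")["task"])  # → Refactor parser error handling
```

Because each task is its own file and claiming is an unlink, two workers never share mutable state — the filesystem is the only coordination point.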
2. Receipt ledger — Every agent completion produces an append-only NDJSON receipt: what was dispatched, what was produced, which git commit, which files changed, duration, cost. After 1100+ entries, patterns emerge that you can't see any other way — which task types fail most, which agents struggle with which skills, where context pollution actually happens.
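The mechanics of an append-only NDJSON ledger fit in a few lines: every completion appends one JSON object per line, and pattern-finding is a pass over the file. A sketch with invented field names (`task_type`, `verdict`, `commit` are assumptions, not the actual receipt schema):

```python
import json
from collections import Counter
from pathlib import Path

LEDGER = Path("/tmp/vnx-demo/receipts.ndjson")  # hypothetical ledger path
LEDGER.parent.mkdir(parents=True, exist_ok=True)
LEDGER.unlink(missing_ok=True)  # start clean for the demo

def append_receipt(ledger: Path, receipt: dict) -> None:
    """Append-only: one JSON object per line, never rewritten."""
    with ledger.open("a") as f:
        f.write(json.dumps(receipt) + "\n")

def failure_rate_by_type(ledger: Path) -> dict:
    """The kind of pattern a large ledger surfaces: which task types fail most."""
    totals, failures = Counter(), Counter()
    for line in ledger.read_text().splitlines():
        r = json.loads(line)
        totals[r["task_type"]] += 1
        if r["verdict"] != "APPROVE":
            failures[r["task_type"]] += 1
    return {t: failures[t] / totals[t] for t in totals}

append_receipt(LEDGER, {"task_type": "refactor", "verdict": "APPROVE", "commit": "abc1234"})
append_receipt(LEDGER, {"task_type": "migration", "verdict": "HOLD", "commit": "def5678"})
print(failure_rate_by_type(LEDGER))
```

NDJSON keeps appends atomic-enough for single-writer use and makes the ledger greppable without a database.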
3. Quality gates — Deterministic, not LLM-based. The agent proposes, the gate validates: file size limits, test coverage thresholds, open blocker counts. Verdicts are APPROVE, HOLD, or ESCALATE. The LLM never decides whether its own work is good enough.
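A deterministic gate is just a pure function from measured facts to a verdict — no model call anywhere in the loop. A sketch with made-up thresholds (VNX's real limits live in its own config):

```python
from dataclasses import dataclass

# Illustrative thresholds, not VNX's actual configuration.
MAX_FILE_LINES = 300
MIN_COVERAGE = 0.80

@dataclass
class Proposal:
    largest_file_lines: int
    coverage: float
    open_blockers: int

def gate(p: Proposal) -> str:
    """Deterministic verdict: the agent proposes, this function decides."""
    if p.open_blockers > 0:
        return "ESCALATE"  # a human needs to look
    if p.largest_file_lines > MAX_FILE_LINES or p.coverage < MIN_COVERAGE:
        return "HOLD"      # send it back with concrete numbers, not vibes
    return "APPROVE"

print(gate(Proposal(largest_file_lines=180, coverage=0.91, open_blockers=0)))  # → APPROVE
```

Because the gate is deterministic, a HOLD is reproducible and explainable: the same proposal always gets the same verdict for the same measured reason.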
4. Context rotation — When an agent's context window fills up mid-task, a 3-hook pipeline detects it at 65%, has the agent write a structured handover, clears the session via tmux, and resumes with a fresh context window. Zero lost work, zero human intervention.
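The detect-and-handover part of rotation reduces to a threshold check that emits a structured state snapshot for the fresh session. A sketch — the handover fields are illustrative, and the tmux clear step is only described in a comment rather than executed:

```python
ROTATE_AT = 0.65  # rotate before context pollution sets in, not after

def maybe_rotate(used_tokens: int, window_tokens: int, task_state: dict):
    """Hook 1: detect. Returns a handover dict once the threshold is crossed."""
    if used_tokens / window_tokens < ROTATE_AT:
        return None
    # Hook 2: the agent writes a structured handover carrying everything
    # the fresh session needs; these fields are illustrative.
    return {
        "task": task_state["task"],
        "done": task_state["done"],
        "next": task_state["next"],
        "files_touched": task_state["files_touched"],
    }

handover = maybe_rotate(138_000, 200_000, {
    "task": "migrate auth module",
    "done": ["ported login flow"],
    "next": ["port token refresh"],
    "files_touched": ["auth/login.py"],
})
# Hook 3 would now clear the pane (e.g. via tmux send-keys) and seed a
# fresh session with this handover.
print(handover["next"])  # → ['port token refresh']
```

The key property: the handover is written by the agent while it still has full context, so nothing is reconstructed from a degraded window after the fact.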
The whole thing runs in a 2x2 tmux grid. T0 orchestrates, T1-T3 execute. The terminal layout IS the architecture — each pane is a fully observable, independent agent session. I can read every thought, every tool call, every mistake. That's what "glass box" means: the opposite of agents calling agents inside a shared process where you're debugging the framework's abstractions.
Try it without any LLM:

  git clone https://github.com/Vinix24/vnx-orchestration.git
  cd vnx-orchestration/demo/dry-run
  bash replay.sh --fast
This replays a real 6-PR development session with dispatches, receipts, quality verdicts, and open item resolution.
There's also a context rotation demo in demo/dry-run-context-rotation/.
What VNX is NOT: not a SaaS, not a framework you import into code, not an agent builder. It's Bash + Python, local-first, no database, no cloud dependency. MIT licensed.
What I'd love to discuss: governance approaches for AI agents in general. Quality gates, audit trails, scoping strategies — I think this is the actual hard problem in multi-agent systems, not the orchestration itself. Curious what patterns others have found.