The PR was invented because humans can't be trusted to push directly to main. If automated verification is more thorough than human review (and benchmarks suggest it is), the PR becomes a receipt, not a gate.
Has anyone else been thinking along these lines? Curious if this resonates or if there are fundamental blockers I'm not seeing.
The experiments with model collapse would seem to indicate that doing this will go very poorly. See also examples of claw spam for what this looks like today.
Model collapse happens when AI output feeds back into AI training data. That's a real problem for models.
But self-sustaining codebases aren't using AI to verify AI. The verification layer is deterministic: dependency graphs, targeted test suites, blast radius computation. These are structural checks, not generative ones. The graph doesn't hallucinate. Tests either pass or they don't.
The claw spam problem is what happens when you have no verification at all.
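To make the "structural checks" point concrete, here is a minimal sketch of a deterministic blast radius computation. The module names and graph shape are purely illustrative; the assumption is only that you have a dependency graph where an edge means "module A imports module B". The check is a plain graph traversal, with nothing generative in the loop:

```python
# Hypothetical sketch of a deterministic "blast radius" check:
# given a module dependency graph and a set of changed modules,
# find every module that transitively depends on a change, so
# only that subset's tests need to run. Names are illustrative.
from collections import deque

def blast_radius(deps: dict[str, set[str]], changed: set[str]) -> set[str]:
    """Return the changed modules plus all transitive reverse dependents."""
    # Invert the graph: an edge mod -> imp means "mod imports imp",
    # so rdeps maps each module to the modules that import it.
    rdeps: dict[str, set[str]] = {m: set() for m in deps}
    for mod, imports in deps.items():
        for imp in imports:
            rdeps.setdefault(imp, set()).add(mod)
    # BFS outward from the changed set along reverse edges.
    seen = set(changed)
    queue = deque(changed)
    while queue:
        mod = queue.popleft()
        for dependent in rdeps.get(mod, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen

# Example graph: api imports core, cli imports api, docs imports nothing.
graph = {
    "core": set(),
    "api": {"core"},
    "cli": {"api"},
    "docs": set(),
}
print(sorted(blast_radius(graph, {"core"})))  # ['api', 'cli', 'core']
```

The output is fully determined by the graph and the change set: run it twice on the same inputs and you get the same answer, which is exactly the property that generative verification lacks.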
Model collapse happens outside the training loop as well, at the agentic level, when models are left to their own devices for too long.