My understanding of UB is that in programming, there are corner cases which should be ignored by the language system. For example, in C, writing through a zero pointer is not “wrong”: you told it to write there when you dereferenced a zero pointer and assigned a value. The effects and the mitigation thereof just need to be outside the language, because otherwise you are saying that each and every memory access needs to be checked. So UB is just the language system saying “hey, I don’t know what happens, and we are not going to guarantee anything, and you can expect the world to explode if you do this, but not really, but yes really.”
That would be a much nicer definition of UB than is actually the case. What the standards say is that a program that writes to a null pointer (distinct from a zero pointer) is semantically meaningless. The compiler can assume the write doesn't happen at all, call some other functions first, initialize the pointer with a value it takes later, or whatever else it wants. One effect of this is so-called "time-traveling", where even the part of the execution trace before the point of UB is semantically undefined. C23 updated the wording to clarify that time travel isn't a valid interpretation, but it remains valid in C++.
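For anyone unfamiliar with what "the compiler can assume the write doesn't happen" looks like in practice, here is a minimal sketch (my own illustration, not from the article) of the classic deleted-null-check pattern:

```c
#include <stddef.h>

/* Because dereferencing a null pointer is UB, a conforming compiler is
 * allowed to assume p != NULL after the dereference and delete the later
 * check entirely. Whether a given compiler actually does so depends on
 * optimization settings. */
int read_and_validate(int *p) {
    int value = *p;      /* UB if p == NULL */
    if (p == NULL) {     /* may be optimized away: reaching this line "proves" p != NULL */
        return -1;
    }
    return value;
}
```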
Is it even worth pointing out that the author misunderstands how UB works in Rust, given that at this point such a misunderstanding has to be willful rather than accidental? There is _no_ UB in the safe subset of Rust, and in mixed safe/unsafe Rust, the UB can only originate in the unsafe blocks.
In modern C++ (i.e. with smart pointers) something similar is true, in that UB can only occur in code dealing with certain types and functions (e.g. raw pointers), and not others. It's really the same as Rust, just without the compiler support and the explicit unsafe blocks.
I think you are entirely missing the author's point. The author is generalizing from the specific technicalities of UB in C, Rust, etc. to the core problem with UB: once it is triggered, you can't know what the program will do. That problem does not have to come from a language specification. If you write safe Rust yourself, then usually no UB will occur, and you can know what will happen based on the code you wrote. The author extends UB to vibecoding, where there is no specification governing the translation of prompts into code. Without thorough review, you are unable to be sure that the output code matches the intent of your prompting, which is analogous to writing code with UB. The issue the author has with vibecoded Rust is not that the code can trigger undefined behavior at the language layer, but that the perfectly "safe" generated code may not at all match the intended semantics.
The problem with the author's argument is the inductions don't follow from the premise. With defined C, you can in principle look at a piece of code and know what it will do in the abstract machine (or at least build a model dependent on assumptions about things like unspecified behavior). Actually doing this may be practically impossible, but that's not the point. It's not possible in the presence of UB. You can't know what a piece of code containing UB will do, even in principle.
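To make "even in principle" concrete, consider a hypothetical snippet like this (my example, not the article's): once the out-of-bounds write executes, the standard places no requirements on the program at all, so there is no defined answer to predict.

```c
#include <stdio.h>

int main(void) {
    int arr[4] = {0};
    int secret = 42;
    arr[4] = 7;               /* UB: out-of-bounds write; it might clobber 'secret',
                                 crash, or appear to do nothing */
    printf("%d\n", secret);   /* no model of the abstract machine tells you what prints */
    return 0;
}
```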
You can in principle read the LLM's output and know that it won't put your credentials on the net, so it's not the same as UB. Maybe there are practical similarities to UB in how LLM bugs present, but I'm not sure it's a useful comparison and it's not the argument the author made.
> The problem with the author's argument is the inductions don't follow from the premise.
That's possible. No one ever accused me of sound arguments :-)
I would still like to address your comment anyway.
Let's call this assertion #1:
> With defined C, you can in principle look at a piece of code and know what it will do in the abstract machine ... It's not possible in the presence of UB.
And let's call this assertion #2:
> You can in principle read the LLM's output and know that it won't put your credentials on the net, so it's not the same as UB.
With Assertion #1, you state that you are not examining the output of the compiler; you are only examining the input (i.e. the source code).
With Assertion #2, you state that you are examining the output of the LLM, not the input.
IOW, these two actions are not comparable, because in one you examine only the input while in the other you examine only the output.
In short: you are comparing analysing the input in one case with analysing the output in the other case.
For the case of accidentally doing $FOO when trying to do $BAR:
1. No amount of input-analysis on LLM prompts will ever reveal to you whether the generated code will do $FOO - you have to analyse the output. There is a zero percent chance that examining the prompt "Do $BAR" will reveal to the examiner that their credentials will be leaked by the generated code.
2. There are a large number of automated input-analysis tools for C that will catch much of the UB that could lead to $FOO when the code implements "Do $BAR". Additionally, while a lot of UB gets through, a great deal is actually caught during review.
Think of the case: "I wrote code to add two numbers, but UB caused files to get deleted off my computer"
In C, this was always possible (and C programmers acted accordingly). In Java, C#, Rust, etc. this was never possible. Unless your code was generated by an LLM.
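As a concrete (hypothetical) version of the "add two numbers" case: in C, signed overflow is UB, so the sketch below is exactly the sort of thing the automated tools in point 2 exist to catch - building it with `-fsanitize=undefined` (clang or gcc) reports the overflow at runtime. In Java, C#, or safe Rust the same addition has a defined result (wrapping, or a panic in a debug Rust build), which is the contrast I'm drawing.

```c
#include <limits.h>
#include <stdio.h>

/* "Just adds two numbers" - but if the mathematical result doesn't fit in an
 * int, this is UB in C and the implementation may do anything at all. */
static int add(int a, int b) {
    return a + b;                      /* UB on signed overflow */
}

int main(void) {
    printf("%d\n", add(INT_MAX, 1));   /* triggers the UB */
    return 0;
}
```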
That's a good point, I didn't realize I was implicitly mixing up inputs and outputs.
I think you're imagining a very particular way of using LLMs though. The source code is the source of truth in traditional development. It's the artifact we preserve long term and the one that's used to regenerate ephemeral artifacts like binaries. When you regenerate binaries from source code containing UB, the result may not behave the same as before. Each binary's semantics can be individually understood, but not the semantics of future translations.
If you treat the entire LLM->binary system as a black box, then yeah. I agree there's no reasonable way to go from input to output semantics, much as there isn't if you ask a human. But people generally aren't using the prompt as the source of truth. They're using the code that's produced, which (in the absence of traditional UB) will have the same semantics every time it's used even if the initial LLM doesn't.
If that's the author's point then the article needs a rewrite. I suspect that was _not_ the author's point and it's offered as a good faith but misplaced post-hoc justification.
>> Without thorough review, you are unable to be sure that the output code matches the intent of your prompting, which is analogous to writing code with UB.
> If that's the author's point then the article needs a rewrite. I suspect that was _not_ the author's point and it's offered as a good faith but misplaced post-hoc justification.
I am the author (thanks for giving some of your valuable attention to my post; much appreciated :-), and I can confirm that the `>> ...` quoted bit above is my point, and this bit of my blog-post is where I made that specific point:
> As of today [2], there is a large and persistent drive to not just incorporate LLM assistance into coding, but to (in the words of the pro-LLM-coding group) “Move to a higher level of abstraction”.
> What this means is that the AI writes the code for you, you “review” (or not, as stated by Microsoft, Anthropic, etc), and then push to prod.
> Brilliant! Now EVERY language can exhibit UB.
Okay, fair enough, I'm not the world's best writer, but I thought that bit was pretty clear when I wrote it. I still think it's clear. Especially the "Now EVERY language can exhibit UB" bit.
I'm now half inclined to paste the entire blog into a ChatAI somewhere and see what it thinks my conclusion is...
The article is really about the dangers posed by the concept of UB (https://en.wikipedia.org/wiki/Undefined_behavior) extended to any language when the programs are LLM-generated.