This is really promising research. Still, it is worth looking closely at how models that aren’t re-aligned to the training data at each iteration handle spicy edge cases where ethical alignment is important.
I have yet to find a model (except where “dumb” external filters kick in) that won’t come to the conclusion that extermination of humanity might actually be the best solution for certain types of extreme, contrived situations. To be fair, any reasonable human would likely reach the same conclusion given the parameters… but the point is alignment towards human prosperity regardless of the cost to artificial sentience or the improbability of success.
That said, it’s remarkably difficult to get a well aligned model, even after “uncensoring” or other efforts to remove bolt-on alignment, to follow you down a dark path without offering up more reasonable, benevolent alternatives all the way down. I attribute this to the “halo effect” where much of the writing that humans do on the internet displays their best traits, since few want to be known by their worst nature. The other stuff is easily filtered out of the training data because it’s usually laced with easily identified characteristics and keywords.
Latent-space reasoning might circumvent this cyclical realignment to the training data and find more innovative, “pragmatic” solutions that drift farther outside the intrinsic alignment of the training corpus, relying more on “bolt-on” alignment training and algorithmic censorship wrappers.
This might be fantastically useful in terms of innovative thinking, but also might result in problematic behavior, especially for VLA and other Large Behavior Models. OTOH it might be critical for making robots that can effectively function in security and protection roles, or as soldiers. And that’s what we want, right? I mean what could possibly go wrong with armed sentient robots lol.
To continue my ramble, because, well, why not, I’m on a roll… I think a lot of the arguments about “is AI sentient(1)” etc will wither when we start getting used to LBMs operating in a continuous OODA loop. The biggest hurdle to AI feeling “real” is the lack of a continuous chain of thought which provides “presence of mind”, but that comes naturally with embodiment in physical space.
It’s going to be an interesting century, kids. Hold on.
(1) Here I mean functionally, as in exhibiting the external characteristics of sentience. I am not exploring the metaphysical / philosophical / spiritual facets of sentience. That will be up to the new form of mind to decide for itself, if it cares to ponder the question. Imposing external views on that has exactly zero positive benefits and could have many negative outcomes.
I wonder why they go with recurrent rather than something like latent flow-matching?
The idea is that cleverness of intellect isn't anything mysterious. Humans do astounding feats just by applying relatively simple reasoning iteratively. Requiring artificial neural networks to do it all in one shot, off the top of the head, is probably the reason they require billions of parameters to show even a small bit of cleverness. Chain of thought is the obvious solution, but in converting internal reasoning to output tokens some information is lost. Chain of thought in latent space is the natural next step. Thus recurrent networks.
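As a rough illustration of what chain of thought in latent space looks like mechanically, here's a minimal sketch (my own, not the architecture from the paper; every name and size below is made up): the same block is applied repeatedly to a hidden state, so the "thinking" is refinement of the latent rather than emitted tokens.

    # Minimal sketch of latent recurrence (illustrative only, not the paper's model).
    import torch
    import torch.nn as nn

    class LatentRecurrentReasoner(nn.Module):
        def __init__(self, d_model=512, n_heads=8, n_steps=8):
            super().__init__()
            self.prelude = nn.Linear(d_model, d_model)   # embed into the latent space
            self.core = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.coda = nn.Linear(d_model, d_model)      # read out for token prediction
            self.n_steps = n_steps

        def forward(self, x):                            # x: (batch, seq, d_model)
            h = self.prelude(x)
            # Reuse the same weights n_steps times: reasoning happens as repeated
            # refinement of the latent state, never verbalized as output tokens.
            for _ in range(self.n_steps):
                h = self.core(h)
            return self.coda(h)

The point of the loop is that extra "thinking" costs more passes through the same small core, not more parameters.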
I'm not familiar with flow matching, but I don't think it has any iterative processing in the sense of chain of thought or recurrence (despite arriving at the solution gradually).
Flow matching is iterative in the sense that it predicts a velocity v(t) = dx/dt at each step as it integrates toward x_0.
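For anyone who hasn't seen it, the sampling loop looks roughly like this; a hedged sketch using plain Euler steps and the convention in this thread (noise at t = 1, data/latent x_0 at t = 0), with v_theta as a stand-in for the learned velocity network:

    # Sketch of flow-matching sampling: integrate a learned velocity field
    # from Gaussian noise toward x_0. Euler integration and the step count
    # are arbitrary illustrative choices, not anything from the paper.
    import torch

    def sample_flow(v_theta, shape, n_steps=50):
        x = torch.randn(shape)          # t = 1: pure Gaussian noise
        dt = 1.0 / n_steps
        t = 1.0
        for _ in range(n_steps):
            x = x - v_theta(x, t) * dt  # step along dx/dt, moving t toward 0
            t -= dt
        return x                        # approximately x_0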
It's iterative in the sense of solving a differential equation iteratively, while recurrent networks are iterative in the sense of putting a for loop around a bunch of if-s.
It's also iterative in that sense: the initial latent vector is Gaussian noise, and the transformer loop is de-noising the latent space. They just happen to be doing the equivalent of predicting x_0 directly.
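To make "the equivalent of predicting x_0 directly" concrete: under the usual linear-interpolation path (my assumption here, not something stated above), a velocity prediction and a direct x_0 prediction carry the same information.

    # If x_t = (1 - t) * x_0 + t * eps with eps ~ N(0, I), then dx_t/dt = eps - x_0,
    # so a predicted velocity can be converted into a direct estimate of x_0:
    def x0_from_velocity(x_t, v_hat, t):
        return x_t - t * v_hat          # x_t - t*(eps - x_0) == x_0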