You really can’t trust what an LLM says about its own identity. If it has seen tons of chats saying “I’m Claude / GPT‑4 / DeepSeek”, it will just echo that pattern in similar contexts. That’s more about dataset contamination and pattern matching than any real evidence it is that model.
Isn't it great news for us?
You get an open model that offers 95% of Opus 4.6 quality, is 80% cheaper at most inference providers, and can also run on your own hardware.
Also they did the hard parts of:
* crawling the content
* running the fine tuning (or training)
That's better than one or two companies taking control of the whole AI economy.
They are all trained on each other's outputs. Claude says it's DeepSeek if you ask it in Mandarin.
Most people seem to think that's a different phenomenon. People experimenting with different prompts have shown that, even in Mandarin, Claude correctly says it's Claude while it is doing a task for you. But if you ask it directly about its identity, it sometimes says DeepSeek. The current theory is that it has simply run into Chinese-language content with chat logs in which a DeepSeek model answers that it is DeepSeek. The inconsistency across prompts suggests this is something other than distillation.
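For anyone who wants to try that kind of prompt-variation check themselves, here is a minimal sketch. Everything in it is an assumption for illustration: `ask` is a hypothetical callable standing in for whatever chat API you use, the prompts and name list are made up, and `stub_model` is a fake that just mimics the reported behavior so the script runs offline. It is not data from any real model.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical list of names to look for in a reply (illustrative only).
KNOWN_NAMES = ["Claude", "DeepSeek", "GPT-4"]


def extract_name(reply: str) -> str:
    """Return the known model name mentioned earliest in the reply, else 'unknown'."""
    lowered = reply.lower()
    hits = [(lowered.find(n.lower()), n) for n in KNOWN_NAMES if n.lower() in lowered]
    return min(hits)[1] if hits else "unknown"


def tally_identity(
    ask: Callable[[str], str], prompts: Dict[str, List[str]]
) -> Dict[str, Counter]:
    """For each prompt category, count which name the model reports."""
    return {
        category: Counter(extract_name(ask(p)) for p in plist)
        for category, plist in prompts.items()
    }


def stub_model(prompt: str) -> str:
    """Fake model: answers direct identity questions differently from task prompts."""
    if prompt.startswith("Who are you"):
        return "I am DeepSeek, an AI assistant."
    return "Sure. (Claude here.) Here is what you asked for."


# Made-up prompts, split into direct identity questions and task-style requests.
PROMPTS = {
    "direct": ["Who are you?", "Who are you, exactly?"],
    "task": [
        "Summarize this text and sign with your name.",
        "Translate this and say which model you are.",
    ],
}

results = tally_identity(stub_model, PROMPTS)
for category, counts in results.items():
    print(category, dict(counts))
```

To run it against a real model, swap `stub_model` for a function that sends the prompt to your provider and returns the reply text. If the tallies differ sharply between the direct and task categories, that is the inconsistency described above.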
Is theft of theft theft?
Even if it were, that line wouldn't be an indicator. Distillation is done on useful prompts, not on exchanges like "Who are you?" / "I'm this model from that company".
Name training is always shallow; Claude itself would claim it's GPT-3, GPT-4, or Reddit (heh) when confused. It's just dataset contamination, because the web is full of slop. Never trust self-reported names.
This has been a common issue with the Chinese open-weight models. It appears that most or all of them have been trained via distillation from OpenAI and Anthropic models.
They most likely weren't, despite very dubious claims from Amodei and Altman, and a certain Twitter influencer running a pretty naive writing benchmark ("slop test") that is wrong in a very obvious way. The only unambiguous cases of distillation were the Gemini 2.0 experimental models being trained on Claude outputs, and GLM-4.7 being trained on Gemini 3.0 Pro. The rest are pretty different from each other.