DeepSeek's model claims to be ChatGPT during conversation. What does this mean?
ikt @Eyekaytee@aussie.zone · Posts 50 · Comments 444 · Joined 2 yr. ago

I think it might be this:
What is distillation?
Distillation is a means of extracting understanding from another model; you can send inputs to the teacher model and record the outputs, and use that to train the student model. This is how you get models like GPT-4 Turbo from GPT-4. Distillation is easier for a company to do on its own models, because they have full access, but you can still do distillation in a somewhat more unwieldy way via API, or even, if you get creative, via chat clients.
Distillation obviously violates the terms of service of various models, but the only way to stop it is to actually cut off access, via IP banning, rate limiting, etc. It’s assumed to be widespread in terms of model training, and is why there are an ever-increasing number of models converging on GPT-4o quality. This doesn’t mean that we know for a fact that DeepSeek distilled 4o or Claude, but frankly, it would be odd if they didn’t.
https://stratechery.com/2025/deepseek-faq/
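The workflow the quote describes can be sketched in a few lines: send prompts to the teacher, record its outputs, and treat those pairs as supervised training data for the student. This is a minimal illustration only; `teacher_model` is a hypothetical stand-in for a call to a stronger model's API, not a real client.

```python
def teacher_model(prompt: str) -> str:
    # Hypothetical stand-in for querying a stronger model (e.g. via a chat API).
    return f"teacher answer to: {prompt}"

def build_distillation_set(prompts):
    """Record (input, teacher output) pairs to fine-tune a student on."""
    return [(p, teacher_model(p)) for p in prompts]

# Each recorded pair becomes one supervised example: the student is trained
# to reproduce the teacher's output given the same input.
dataset = build_distillation_set(["What is 2+2?", "Summarise this text."])
```

If a model is trained on enough pairs like these, it can also absorb the teacher's quirks, such as identifying itself by the teacher's name, which would explain the behaviour in the question.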