MiniMax M1 model claims Chinese LLM crown from DeepSeek - plus it's true open-source
xcjs @programming.dev · Posts 0 · Comments 100 · Joined 2 yr. ago

That's not how distillation works, if I understand what you're trying to explain.
If you distill model A into a smaller model, you just get a smaller version of model A — a student trained to reproduce model A's output distribution with fewer parameters. You can't distill Llama into Deepseek R1.
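To illustrate the point: in standard knowledge distillation (the soft-target approach), the smaller student model is trained to minimize the divergence between its output distribution and the teacher's — so the student inherits the teacher's behavior, not some other model's. This is a minimal pure-Python sketch of that loss; the function names and temperature value are illustrative, not from any particular framework:

```python
import math

def softmax(logits, temperature=1.0):
    # Soften the distribution: a higher temperature flattens the
    # probabilities, exposing more of the teacher's "dark knowledge".
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) over temperature-softened distributions.
    # Minimizing this trains the student to match the teacher's outputs,
    # which is why a distilled model mirrors its teacher.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

When the student already matches the teacher the loss is zero; any mismatch in the distributions makes it positive, pushing the student toward the teacher's behavior.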
I've been able to run distillations of Deepseek R1 up to 70B, and they're all still censored. There is a version of Deepseek R1 "patched" with western values, called R1-1776, that will answer questions on topics censored by the Chinese government, however.