New, promising MoE model "Hunyuan" by Tencent
brucethemoose (@brucethemoose@lemmy.world) · Posts: 21 · Comments: 1,987 · Joined 1 yr. ago
Been trying to play with this in ik_llama.cpp, and it's a temperamental model. It feels deep fried, like it wants to be smart if it would just stop looping or getting its own think template wrong.
It works great in 24GB VRAM though. I'm getting like 16 tok/sec at longish context, with 15 experts on the GPU and the rest offloaded.
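For anyone wanting to try a similar split, here's a rough sketch of how a tensor-override pattern for that kind of setup could be built. It assumes "15 experts on the GPU" means the expert tensors of the first 15 layers stay in VRAM, and the 32-layer total is a placeholder rather than something stated above. `expert_offload_pattern` is a made-up helper name. The tensor naming (`blk.N.ffn_*_exps`) and the `regex=CPU` form follow the `--override-tensor` (`-ot`) convention in llama.cpp-style builds, so check your own GGUF's tensor names before relying on it.

```python
import re


def expert_offload_pattern(gpu_layers: int, total_layers: int) -> str:
    """Build a 'regex=CPU' string for layers whose expert tensors go to system RAM.

    Expert tensors of layers [0, gpu_layers) are left alone (so they follow the
    normal GPU offload), and those of layers [gpu_layers, total_layers) are
    matched by the regex and assigned to the CPU buffer.
    """
    if not 0 <= gpu_layers < total_layers:
        raise ValueError("gpu_layers must be in [0, total_layers)")
    # Spell out every CPU-side layer index so the regex stays easy to read.
    cpu_layers = "|".join(str(i) for i in range(gpu_layers, total_layers))
    return rf"blk\.({cpu_layers})\.ffn_.*_exps\.=CPU"


# Placeholder numbers: 15 layers of experts on GPU, 32 layers total (assumed).
pattern = expert_offload_pattern(15, 32)
regex = pattern.rsplit("=", 1)[0]  # the part before '=CPU' is the regex itself

if __name__ == "__main__":
    # Paste the printed string after -ot on the command line (quote it in the shell).
    print(pattern)
```

The usual approach is to pass the printed string to `-ot` together with a high `-ngl` value so everything that doesn't match the regex lands on the GPU, then lower `gpu_layers` until the model fits in VRAM.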