Running Local LLMs with Ollama on openSUSE Tumbleweed
hendrik @palaver.p3x.de
CPU-only. It's an old Xeon workstation without any GPU, since I mostly do one-off AI tasks at home and I never felt any urge to buy one (yet). Model size would be something between 7B and 32B with that. Context length is something like 8192 tokens. I have a bit less than 30GB of RAM to spare, since I'm doing other stuff on that machine as well.
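For reference, this is roughly how that looks with Ollama (with no GPU detected it falls back to CPU inference on its own; the specific quant tags, like a q4_K_M build, are listed on each model's Ollama library page):

```
# Pull the ~12B model; Ollama serves its default quant unless you pick a tag
ollama pull mistral-nemo

# --verbose prints timing stats after each reply, including the eval rate (t/s)
ollama run mistral-nemo --verbose
```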
And I'm picky with the models. I dislike the condescending tone of ChatGPT and newer open-weight models. I don't want it to blabber or praise me for my "genius" ideas. It should be creative, have some storywriting abilities, be uncensored and not overly agreeable. The best model I've found for that is Mistral-Nemo-Instruct, and I currently run a Q4_K_M quant of it. That does about 2.5 t/s on my computer (which isn't a lot, but somewhat acceptable for what I do).

Mistral-Nemo isn't the latest and greatest any more, but I really prefer its tone and it performs well on a wide variety of tasks. And I mostly do weird things with it: let it give me creative advice, have it be a dungeon master or a late-80s text adventure, mimic a radio host and feed the output into TTS for a radio show, or have it write a book chapter or a bad rap song. I'm less concerned with the popular AI use cases like answering factual questions or writing computer code. So I'd like to switch to a newer, more "intelligent" model, but that's proving harder than I imagined.
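If anyone wants to replicate the radio host thing, a Modelfile along these lines should do it (the persona prompt and parameter values here are just an example, tune to taste):

```
# Modelfile -- wraps the base model in a persona and pins the context length
FROM mistral-nemo
PARAMETER num_ctx 8192
PARAMETER temperature 0.9
SYSTEM """You are a late-night radio host. Stay in character and keep your
replies short, so they can be piped into TTS."""
```

```
ollama create radio-host -f Modelfile
ollama run radio-host
```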
(Occasionally I do other stuff as well, but that's few and far between. Then I'll rent a datacenter GPU on runpod.io for a few bucks an hour. That's the main reason why I haven't bought my own GPU yet.)