Running Local LLMs with Ollama on openSUSE Tumbleweed
hendrik @palaver.p3x.de
CPU-only. It's an old Xeon workstation without any GPU, since I mostly do one-off AI tasks at home and I never felt any urge to buy one (yet). Model size would be something between 7B and 32B with that. Context length is something like 8192 tokens. I have a bit less than 30GB of RAM to spare, since I'm doing other stuff on that machine as well.
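For reference, this is roughly how that looks with Ollama (with no GPU detected it falls back to CPU inference on its own; the specific quant tags, like a q4_K_M build, are listed on each model's Ollama library page):

```
# Pull the ~12B model; Ollama serves its default quant unless you pick a tag
ollama pull mistral-nemo

# --verbose prints timing stats after each reply, including the eval rate (t/s)
ollama run mistral-nemo --verbose
```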
And I'm picky with the models. I dislike the condescending tone of ChatGPT and newer open-weight models. I don't want it to blabber or praise me for my "genius" ideas. It should be creative, have some storywriting abilities, be uncensored and not overly agreeable. The best model I've found for that is Mistral-Nemo-Instruct, and I currently run a Q4_K_M quant of it. That does about 2.5 t/s on my computer (which isn't a lot, but somewhat acceptable for what I do).

Mistral-Nemo isn't the latest and greatest any more, but I really prefer its tone and it performs well on a wide variety of tasks. And I mostly do weird things with it: let it give me creative advice, have it be a dungeon master or a late-80s text adventure, mimic a radio host and feed the output into TTS for a radio show, or have it write a book chapter or a bad rap song. I'm less concerned with the popular AI use cases like answering factual questions or writing computer code. So I'd like to switch to a newer, more "intelligent" model, but that's proving harder than I imagined.
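If anyone wants to replicate the radio host thing, a Modelfile along these lines should do it (the persona prompt and parameter values here are just an example, tune to taste):

```
# Modelfile -- wraps the base model in a persona and pins the context length
FROM mistral-nemo
PARAMETER num_ctx 8192
PARAMETER temperature 0.9
SYSTEM """You are a late-night radio host. Stay in character and keep your
replies short, so they can be piped into TTS."""
```

```
ollama create radio-host -f Modelfile
ollama run radio-host
```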
(Occasionally I do other stuff as well, but that's few and far between. Then I'll rent a datacenter GPU on runpod.io for a few bucks an hour. That's the main reason why I haven't bought my own GPU yet.)