Posts: 8 · Comments: 1,820 · Joined: 4 yr. ago

  • CPU-only. It's an old Xeon workstation without any GPU, since I mostly do one-off AI tasks at home and I've never felt the urge to buy one (yet). Model size would be something between 7B and 32B with that. Context length is something like 8,192 tokens. I have a bit less than 30GB of RAM to spare, since I'm doing other stuff on that machine as well.

    And I'm picky with the models. I dislike the condescending tone of ChatGPT and newer open-weight models. I don't want it to blabber or praise me for my "genius" ideas. It should be creative, have some storywriting ability, be uncensored and not be overly agreeable. The best model I've found for that is Mistral-Nemo-Instruct. I currently run a Q4_K_M quant of it (see the sketch below), which does about 2.5 t/s on my computer. That isn't a lot, but it's acceptable for what I do.

    Mistral-Nemo isn't the latest and greatest any more, but I really prefer its tone and it performs well on a wide variety of tasks. And I mostly do weird things with it: let it give me creative advice, have it act as a dungeon master or a late-80s text adventure, have it mimic a radio host and feed the output into TTS for a radio show, or have it write a book chapter or a bad rap song. I'm less concerned with the popular AI use-cases like answering factual questions or writing computer code. So I'd like to switch to a newer, more "intelligent" model, but that's proving harder than I imagined.

    (Occasionally I do other stuff as well, but that's few and far between. Then I'll rent a datacenter GPU on runpod.io for a few bucks an hour. That's the main reason why I haven't bought my own GPU yet.)
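    For reference, the whole setup fits in a few lines of Python with llama-cpp-python. A minimal sketch; the GGUF filename and thread count are placeholders, not my exact configuration:

    ```python
    # Minimal CPU-only setup with llama-cpp-python (pip install llama-cpp-python).
    # Model path and thread count are placeholders; adjust for your own machine.
    from llama_cpp import Llama

    llm = Llama(
        model_path="Mistral-Nemo-Instruct-2407-Q4_K_M.gguf",  # ~7 GB quant, fits in RAM
        n_ctx=8192,   # context length; longer contexts need more RAM
        n_threads=8,  # roughly the number of physical cores
    )

    out = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "You are a late-80s text adventure. Be terse."},
            {"role": "user", "content": "look around"},
        ],
        max_tokens=256,
    )
    print(out["choices"][0]["message"]["content"])
    ```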

  • Maybe that's more an issue with Modern Standby? Or the hardware has some quirks. The last two laptops I've had were a ThinkPad and now a Dell Latitude, and they both sleep very well. I close the lid and they drain a few percent of battery over the day; I open the lid, the display lights up and I can resume work... Rarely any issues with Linux.

  • I think there are some posts out there (on the internet / Reddit / ...) with people building crazy rigs with old 3090s or something. I don't have any experience with that. If I were to run such a large model, I'd use a quantized version and rent a cloud server for that.

    And a computer can't fit infinitely many GPUs. I don't know the exact limit; let's say it's 4 per machine. Then you need to buy 5 computers to fit your 18 cards, so add a few thousand dollars, plus a fast network/interconnect between them (back-of-envelope math below).

    I can't make any statement about performance. I'd imagine such a setup might work for MoE models with an appropriate design, while for dense models performance would be abysmal. But that's only my speculation; we'd need to find people who have actually done this.

    Edit: Alternatively, buy an Apple Mac Studio with 512GB of unified RAM. They're fast as well (probably way faster than your idea?) and maybe cheaper. An M3 Ultra Mac Studio with 512GB seems to cost around $10,000; with half that amount of RAM, it's only $7,100.
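    The back-of-envelope math for the multi-machine idea, as a quick sketch (the 4-cards-per-box limit and all prices are made-up assumptions, not quotes):

    ```python
    # Back-of-envelope: how many boxes and dollars for a big multi-GPU rig?
    # Every number here is an assumption for illustration, not a real quote.
    import math

    gpus_needed = 18
    gpus_per_machine = 4      # assumed PCIe-slot / power limit per box
    price_per_gpu = 800       # assumed price of a used 3090, USD
    price_per_machine = 1500  # assumed price of a used server/workstation, USD

    machines = math.ceil(gpus_needed / gpus_per_machine)  # -> 5
    total = machines * price_per_machine + gpus_needed * price_per_gpu

    print(f"{machines} machines, ~${total:,} in hardware")  # 5 machines, ~$21,900
    # ...plus the fast interconnect between the boxes, which isn't cheap either.
    ```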

  • Well, I wouldn't call them a "scam". They're meant for a different use-case. In a datacenter you also have to pay for rack space and all the servers which accommodate the GPUs. You can either pay for 32 times as many servers stuffed with Radeon 9060 XTs, or buy H200 cards instead. Sure, you'll pay 3x as much for the cards themselves, but you'll save on the number of servers and everything that comes with them: hardware cost, space, electricity, air-con, maintenance... And less interconnect makes everything way faster.

    Of course, at home different rules apply. And it depends a bit on how many cards you want to run, what kind of workload you have, and whether you're fine with AMD or need CUDA... (Toy numbers below.)
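    To illustrate the trade-off with a toy calculation (every capacity and price below is a rough assumption; real numbers vary a lot):

    ```python
    # Toy comparison: consumer cards vs. datacenter cards for the same total VRAM.
    # All capacities and prices are rough assumptions for illustration only.
    import math

    target_vram_gb = 1128  # e.g. the VRAM of a single 8x H200 server

    options = {
        "Radeon 9060 XT": {"vram": 16, "card": 400, "per_server": 4, "server": 2000},
        "H200":           {"vram": 141, "card": 30000, "per_server": 8, "server": 50000},
    }

    for name, o in options.items():
        cards = math.ceil(target_vram_gb / o["vram"])
        servers = math.ceil(cards / o["per_server"])
        cost = cards * o["card"] + servers * o["server"]
        print(f"{name}: {cards} cards in {servers} servers, ~${cost:,}")
    # The consumer cards win on raw hardware cost, but you end up with ~18 servers
    # to house, power, cool and wire together instead of one.
    ```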

  • Thanks for the random suggestion! I installed it already. Sadly, as a drop-in replacement it doesn't provide any speedup on my old machine; it's exactly the same number of tokens per second... Guess I have to read up on ik_llama.cpp and pick a different quantization of my favourite model (rough sizing sketch below).
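    For picking a quant, this is roughly how I size them before downloading. A crude estimate that ignores KV-cache details; the bits-per-weight figures are approximate:

    ```python
    # Rough RAM estimate for a GGUF quant: params * bits-per-weight / 8, plus
    # some headroom for KV cache and runtime buffers. Illustrative only.
    def quant_size_gib(params_b: float, bits_per_weight: float, overhead: float = 1.2) -> float:
        """params_b: parameters in billions; returns an approximate GiB figure."""
        return params_b * 1e9 * bits_per_weight / 8 / 2**30 * overhead

    # Mistral-Nemo is a 12B model; bpw values are rough averages for each quant.
    for name, bpw in [("Q4_K_M", 4.8), ("Q5_K_M", 5.7), ("Q8_0", 8.5)]:
        print(f"Mistral-Nemo 12B @ {name}: ~{quant_size_gib(12, bpw):.1f} GiB")
    ```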

  • Wasn't Mistral AI supposed to be one of the European (French) answers to the mostly US companies doing AI? From that perspective it wouldn't be great if a US company were to buy them. And the company goals don't match either, since Apple is mostly concerned with its own products. So I'd say they'd likely dismantle the company, and we'd have one less somewhat-open AI company.

  • I like YunoHost. It's an all-in-one solution that does the self-hosting for you. You won't learn much about the intricate details of the tech, but you can install things with a few clicks, which is nice if you just want to use stuff. And the project has some track record. I've been using it for years to self-host PeerTube, Immich, Nextcloud and a few other things.

  • Correct. We currently have some sentiment against liberal spaces, DEI programs and so on, and some people think it's a war against straight white men. But having men's groups or women's groups or safe spaces to talk freely about certain topics isn't authoritarian. If anything, the opposite is true: you can't discuss some topics without the right space for them, and not allowing people to discuss things how they like is authoritarian as well!

  • Oh man, I'm a bit late to the party here.

    He really believes the far-right Trump propaganda and doesn't understand what diversity programs do. It's not a war between white men and all the other groups of people... It's just that it has proven difficult to, for example, write a menstrual tracker with a 99.9% male developer base. It's super difficult for them to judge how it's going to be used in real-world scenarios, and what the specific challenges and nice-to-have features are. That's why you listen to minority opinions: to deliver a product that caters to all people. And those minority voices are notoriously difficult to attract, which is why we run programs for that. They're task forces addressing things aside from what's mainstream and popular. That benefits straight white men too, literally everyone, because it makes Linux into a product that does more than just whatever is popular as of today. The same applies to putting effort into screen readers for disabled people, and whatever else minorities need.

    If he just wants whatever the majority wants, I'd recommend he install Windows. Because that's where we're headed with this: it's the popular choice, at least on the desktop, and it's what you're supposed to use if you dislike niche software.

    Also, the hubris... He says Debian should be free from politics, and in the very next sentence he talks politics and wants to shove his Trumpian anti-DEI agenda into Debian... Yeah, sure, dude.

  • I think a dual-channel system with DDR3-1600 isn't what we'd call fast any more, so you should try to avoid offloading onto it (rough numbers below). But I'm not an expert on the figures, and it depends a bit on the specific use-case whether it makes sense to invest in old hardware or to buy a new machine along with a graphics card, since that's quite some money.
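    To put a rough number on it: token generation is more or less memory-bandwidth-bound, so bandwidth divided by model size gives an optimistic upper bound on tokens per second. A sketch with assumed figures:

    ```python
    # Why dual-channel DDR3-1600 is slow for LLM inference: generation is roughly
    # memory-bandwidth-bound, so t/s <= bandwidth / bytes touched per token.
    channels = 2
    bus_bytes = 8               # 64-bit bus per channel
    transfers_per_sec = 1600e6  # DDR3-1600

    bandwidth_gbs = channels * bus_bytes * transfers_per_sec / 1e9  # 25.6 GB/s

    model_gb = 7.0  # e.g. a Q4 quant of a ~12B model (assumed size)
    print(f"~{bandwidth_gbs:.1f} GB/s -> at most ~{bandwidth_gbs / model_gb:.1f} tokens/s")
    ```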

  • I'm also one of the people who rarely has any issues with the connectors themselves. It's always the cable that breaks close to the plug, not the connector. The connector also still sits super tight in my phone, which is half a decade old... I have destroyed USB-C connectors, though, by accident and with some force involved. And cables vary in quality, yes: some are fine for many years, some are cheap e-waste.

    I mean, they probably don't have any long protrusions or snap-in mechanisms because today's phones are very slim and other gadgets are tiny as well, so you can't have a large connector with a robust snap-in mechanism. (And those tend to break as well, especially if they're flimsy like the tabs on network cables.)

  • Sure. I mean we seem to be a bit different and have different visions. So I'm not sure if I'm the correct person to take your idea to pieces and add my spin on it... That could take away from a clear vision and turn it into a mess. Maybe it's better if I do my thing and you do yours... But I'm not sure about that. My DMs are open, so feel free to DM me. I'm just not sure whether I'm able to contribute.

  • I meant both sex and gender. They regularly fail to tell me much about my own real life. I like some people and dislike others, and how easily I can talk to, work with, collaborate with or empathize with someone depends on various circumstances: personality traits, shared goals, having something in common, or the opposite of all that. I believe gender or sex or identity is a bit overrated for a lot of purposes, and so is stereotyped thinking, or the need to conform to a stereotype. Dress and identify however you like, make sure to give your children an electronics kit, a plastic excavator and a princess dress... And unless it's really important for some niche application, don't feel the urge to look into people's pants and check what's in there.

  • LocalLLaMA @sh.itjust.works

    Recommendations for a lightweight Python LLM framework for a webapp?

    Android @lemdro.id

    Is there a better open-source calendar app than Etar?

    LocalLLaMA @sh.itjust.works

    (New) papers by Meta: Large Concept Models and BLT

    Android @lemdro.id

    Which app isolation mechanism do I want?

    Fediverse @lemmy.world

    SFSCON24 - Alexander Sander - NGI: No more EU funding for Free Software?!

    Piracy: ꜱᴀɪʟ ᴛʜᴇ ʜɪɢʜ ꜱᴇᴀꜱ @lemmy.dbzer0.com

    Is there a working Spotify downloader that actually downloads from Spotify?

    LocalLLaMA @sh.itjust.works

    Is Arli AI a legit cloud LLM inference service? Any user experience?

    Fediverse @lemmy.world

    How to make the Threadiverse a nice place and effectively make it grow