
  • Actually, to go ahead and answer, the "fastest" path would be LM Studio (which supports MLX quants natively and is not time intensive to install), and a DWQ quantization (which is a newer, higher quality variant of MLX models).

    Hopefully one of these models, depending on how much RAM you have:

    https://huggingface.co/mlx-community/Qwen3-14B-4bit-DWQ-053125

    https://huggingface.co/mlx-community/Magistral-Small-2506-4bit-DWQ

    https://huggingface.co/mlx-community/Qwen3-30B-A3B-4bit-DWQ-0508

    https://huggingface.co/mlx-community/GLM-4-32B-0414-4bit-DWQ

    With a bit more time invested, you could set up Open WebUI as an alternative interface (which has its own built-in web search, like Gemini): https://openwebui.com/

    And then use LM Studio (or some other MLX backend, or even free online API models) as the 'engine'.

    Alternatively, especially if you have a small RAM pool, Gemma 12B QAT Q4_0 is quite good, and you can run it with LM Studio or anything else that supports a GGUF. Not sure about 12B-ish thinking models off the top of my head, I'd have to look around.
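A rough way to sanity-check which of those models fits in your RAM. This is a rule of thumb, not a measurement: the bits-per-weight math is standard, but the overhead factor for KV cache and runtime buffers is my own fudge-factor assumption.

```python
def quant_size_gb(params_b: float, bits: float = 4.0, overhead: float = 1.2) -> float:
    """Rough RAM footprint of a quantized model in GB.

    params_b : parameter count in billions (e.g. 14 for Qwen3-14B)
    bits     : bits per weight after quantization (4 for the DWQ models above)
    overhead : fudge factor for KV cache, activations, and runtime buffers
               (an illustrative assumption, not a measured value)
    """
    return params_b * 1e9 * (bits / 8) / 1e9 * overhead

# A 14B model at 4-bit lands around 8-9 GB, so it fits on a 16 GB Mac;
# a 32B model at 4-bit wants roughly 19 GB, i.e. a 32 GB machine.
for size in (14, 30, 32):
    print(f"{size}B @ 4-bit: ~{quant_size_gb(size):.1f} GB")
```

Remember macOS itself and the GPU memory limit eat into the total, so leave headroom beyond these numbers.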

  • Honestly, Perplexity, the online service, is pretty good.

    As for local running, one question first: how much RAM does your Mac have? That's basically the deciding factor for what model you can and should run.

  • I don’t understand.

    Ollama is not actually Docker, right? It’s running the same llama.cpp engine; it’s just embedded inside the wrapper app, not containerized. It does have a Docker preset you can use, yeah.

    And basically every LLM project ships a Docker container. I know for a fact that llama.cpp, TabbyAPI, Aphrodite, Lemonade, vLLM, and SGLang do. It’s basically standard. There are all sorts of wrappers around them too.

    You are 100% right about security though, in fact there’s a huge concern with compromised Python packages. This one almost got me: https://pytorch.org/blog/compromised-nightly-dependency/

    This is actually a huge advantage for llama.cpp, as it’s free of Python and external dependencies by design. This is very unlike ComfyUI, which pulls in a gazillion external repos. Theoretically the main llama.cpp git could be compromised, but it’s a single, very well monitored point of failure, and literally every “outside” architecture and feature is implemented from scratch, making it harder to sneak stuff in.
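One habit that helps against this class of supply-chain attack is pinning artifact digests and refusing anything that doesn't match. A minimal sketch of the idea (the payload here is just demo bytes; in practice the pinned digest comes from a lockfile or the project's release page):

```python
import hashlib

def sha256_matches(data: bytes, expected_hex: str) -> bool:
    """Compare an artifact's SHA-256 digest against a pinned value."""
    return hashlib.sha256(data).hexdigest() == expected_hex

# The pinned digest would normally live in a lockfile; here it is just
# computed from the demo payload so the example is self-contained.
payload = b"demo payload"
pinned = hashlib.sha256(payload).hexdigest()

print(sha256_matches(payload, pinned))      # True
print(sha256_matches(b"tampered", pinned))  # False
```

For Python dependencies specifically, pip's hash-checking mode (`pip install --require-hashes -r requirements.txt`) does this automatically and refuses any package whose hash doesn't match the lockfile, which would have caught the compromised nightly above.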

  • OK.

    Then LM Studio. With Qwen3 30B IQ4_XS, low temperature MinP sampling.

    That’s what I’m trying to say though, there is no one click solution, that’s kind of a lie. LLMs work a bajillion times better with just a little personal configuration. They are not magic boxes, they are specialized tools.

    Random example: on a Mac? Grab an MLX distillation, it’ll be way faster and better.

    Nvidia gaming PC? TabbyAPI with an exl3. Small GPU laptop? ik_llama.cpp. APU? Lemonade. Raspberry Pi? That’s important to know!

    What do you ask it to do? Set timers? Look at pictures? Cooking recipes? Search the web? Look at documents? Do you need it fast, or accurate?

    This is one reason why ollama is so suboptimal, with the other being just bad defaults (Q4_0 quants, 2048 context, no imatrix or anything outside GGUF, bad sampling last I checked, chat template errors, bugs with certain models, I can go on). A lot of people just try “ollama run” I guess, then assume local LLMs are bad when it doesn’t work right.
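For reference, the "low temperature MinP sampling" mentioned above is simple enough to sketch. This is an illustrative standalone implementation of the min-p idea (after temperature scaling, keep only tokens whose probability is at least min_p times the top token's probability), not any backend's actual code:

```python
import math

def min_p_filter(logits: dict, min_p: float = 0.1, temperature: float = 0.7) -> dict:
    """Apply temperature-scaled softmax, then min-p filtering.

    Tokens with probability below min_p * (top token's probability) are
    dropped and the rest are renormalized. Returns {token: probability}.
    """
    # Temperature-scaled softmax (subtracting the max for numerical stability)
    scaled = {t: l / temperature for t, l in logits.items()}
    m = max(scaled.values())
    exps = {t: math.exp(v - m) for t, v in scaled.items()}
    z = sum(exps.values())
    probs = {t: e / z for t, e in exps.items()}

    # Min-p cutoff: relative to the most likely token, not an absolute top-k
    cutoff = min_p * max(probs.values())
    kept = {t: p for t, p in probs.items() if p >= cutoff}
    z2 = sum(kept.values())
    return {t: p / z2 for t, p in kept.items()}

dist = min_p_filter({"the": 5.0, "a": 4.2, "zebra": -2.0})
print(dist)  # "zebra" falls below the cutoff; "the" and "a" are renormalized
```

The appeal over top-k or plain top-p is that the cutoff adapts: when the model is confident, junk tokens get pruned hard; when the distribution is flat, more candidates survive.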

  • Doing some sleuthing, it appears a former follower was mad about some incidents at the shelter and started a subreddit against it.

    The last web archive backup is old, but my guess is the subreddit snowballed (due to the Reddit engagement algo loving hatesubs, of course) and got us here.

    So... yeah.

    Rest in peace.

  • TBH you should fold this into localllama? Or open source AI?

    I have very mixed (mostly bad) feelings on ollama. In a nutshell, they're kinda Twitter attention grabbers that give zero credit/contribution to the underlying framework (llama.cpp). And that's just the tip of the iceberg, they've made lots of controversial moves, and it seems like they're headed for commercial enshittification.

    They're... slimy.

    They like to pretend they're the only way to run local LLMs and blot out any other discussion, which is why I feel kinda bad about a dedicated ollama community.

    It's also a highly suboptimal way for most people to run LLMs, especially if you're willing to tweak.

    I would always recommend Kobold.cpp, tabbyAPI, ik_llama.cpp, Aphrodite, LM Studio, the llama.cpp server, sglang, the AMD lemonade server, any number of backends over them. Literally anything but ollama.


    ...TL;DR I don't like the idea of focusing on ollama at the expense of other backends. Running LLMs locally should be what the community is about, not ollama specifically.

  • Yes, we know, you are preaching to the choir here, lemmy.ml.

    But I'd rather we not back out of NATO, decouple trade, and invade friendly neighbors, all while the US continues to screw over faraway countries first. At this point, backing out of NATO is not going to help our bad behavior.

    After that, we can worry about not screwing with other countries so much, hopefully.

  • There's an argument for that.

    Trump also has incredible political capital. He can literally do no wrong with his base, strongarm congress, and ignore opposition; it gives him paths any other sitting president wouldn't even dream of.

    This is the seductive charm of dictators. They're really effective at cleaning up messes, and are generally put in power to do exactly that, or to cut through dysfunctional, stagnant political systems.

  • Yeah, it wasn't that bad on the Guardian's part; papers have always written headlines that sell. And you absolutely 100% don't want to inhale plutonium dust. It alpha decays with a lot of energy, which even in small amounts is a recipe for lung cancer. IDK the specifics of how much would have to get kicked up to be deadly.

  • I didn’t realize the paper was linked! It specifically mentions:

    600 Bq / kg

    This is a unit of radioactivity per unit mass. Going by a WolframAlpha example, one cubic meter of “typical” soil emits about 10,000 Bq. One cubic meter of the tested soil emits over 900,000 Bq, though the high end is an outlier.

    So 90x above ambient soil radiation, it seems.

    …This is not a lot! Dirt is not very radioactive; we are talking microscopic amounts compared to radiation sources like X-ray machines. You wouldn’t want to inhale a ton of the soil, but still.
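The arithmetic behind the roughly 90x figure can be checked in a couple of lines. The soil density is my assumption (around 1,600 kg per cubic meter is a common figure for dry soil), which is why the result comes out slightly above 90x:

```python
# Back-of-envelope check of the numbers above.
activity_per_kg = 600     # Bq/kg, the figure quoted from the paper
soil_density = 1_600      # kg/m^3, assumed density of dry soil
ambient_per_m3 = 10_000   # Bq/m^3, WolframAlpha's "typical soil" example

tested_per_m3 = activity_per_kg * soil_density   # 960,000 Bq/m^3
ratio = tested_per_m3 / ambient_per_m3

print(f"{tested_per_m3:,} Bq/m^3, about {ratio:.0f}x ambient soil")
```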

  • They are GPUs.

    All of them, even the H100, B100, and MI300X, have texture units, pixel shaders, everything. They are graphics cards at a low level. Only the MI300X is missing ROPs; the Nvidia cards have them (and can run realtime games on Linux), and they can all be used in Blender and such.

    The compute programming languages they use are, fundamentally, hacked up abstractions to map to the same GPU hardware in consumer stuff.

    That’s the whole point: they’re architected as GPUs so that they’re backwards compatible, since everything builds on the days when consumer gaming GPUs were hacked to be used for compute.


    Are there more dedicated accelerators? Yes. They’re called ASICs, or application-specific integrated circuits. That’s technically a broad term, but its connotation is mostly very purpose-built compute.

    showed concentrations of plutonium at the islands were four to 4,500 times higher than those found in sediment samples taken at two distant coastal sites

    Emphasis mine.

    Not saying this isn’t a big problem, but the article seems like it’s fearmongering too, or at least not providing enough specifics. 4-4,500 times higher than almost zero is still extremely low, and it’s only dangerous if inhaled.

  • Practically though, Microsoft wanted the Chief (and Cortana I guess) because that's what sells. So that was kinda the constraint.

    They could have sent him on a space boat somewhere more remote, though. That leaves basically infinite settings without Halo 4/5, and a way more intimate plot.

  • Mikupad is incredible:

    https://github.com/lmg-anon/mikupad

    I think my favorite feature is the 'logprobs' mouseover, aka showing the probability of each token as it's generated. It's like a built-in thesaurus, a great way to dial in sampling, and you can regenerate from that point.

    Once you learn how instruct formatting works (and how it auto inserts tags), it's easy to maintain some basic formatting yourself and question it about the story.

    It's also fast. It can handle 128K context without being too laggy.

    I'd recommend the llama.cpp server or TabbyAPI as backends (depending on the model and your setup), though you can use whatever you wish.

    I'd recommend exui as well, but seeing how exllamav2 is being depreciated, probably not the best idea to use anymore... But another strong recommendation is kobold.cpp (which can use external APIs if you want).

  • Forza Motorsport

    Yeah, it seems weird to me.

    It looks like they have the framework for a more elaborate multiplayer/matchmaking system and... just didn't really use it?

    I am not even touching FM now because the 10-minute races are so awful (and they got rid of all the 20-minute ones).