I've just created c/Ollama!
Actually, to go ahead and answer: the "fastest" path would be LM Studio (which supports MLX quants natively and is quick to install) plus a DWQ quantization (a newer, higher-quality variant of MLX models; there's a sketch after the model list below).
Hopefully one of these models, depending on how much RAM you have:
https://huggingface.co/mlx-community/Qwen3-14B-4bit-DWQ-053125
https://huggingface.co/mlx-community/Magistral-Small-2506-4bit-DWQ
https://huggingface.co/mlx-community/Qwen3-30B-A3B-4bit-DWQ-0508
https://huggingface.co/mlx-community/GLM-4-32B-0414-4bit-DWQ
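If you'd rather skip a GUI entirely, here's a minimal sketch using the mlx-lm Python package (`pip install mlx-lm`) to run one of the DWQ quants above directly. It follows the pattern from the mlx-lm README; swap in whichever mlx-community repo fits your RAM.

```python
# Minimal mlx-lm sketch: downloads the DWQ quant from Hugging Face on
# first run, then generates locally on Apple Silicon.
from mlx_lm import load, generate

# Any of the mlx-community DWQ repos linked above should work here.
model, tokenizer = load("mlx-community/Qwen3-14B-4bit-DWQ-053125")

prompt = "Explain DWQ quantization in one paragraph."

# Apply the chat template so the instruct-tuned model responds properly.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
print(text)
```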
With a bit more time invested, you could set up Open Web UI as an alternative interface (which has its own built-in web search, like Gemini): https://openwebui.com/

And then use LM Studio (or some other MLX backend, or even free online API models) as the 'engine'.
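The 'engine' idea is just an OpenAI-compatible endpoint: LM Studio can serve one locally (http://localhost:1234/v1 is the default, adjust if you've changed it), and Open Web UI points at the same kind of URL. A rough sketch of that connection using the openai package, assuming you've started LM Studio's local server with a model loaded:

```python
# Talk to LM Studio's local OpenAI-compatible server directly.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # LM Studio's default local endpoint
    api_key="lm-studio",  # any placeholder string works for a local server
)

resp = client.chat.completions.create(
    model="mlx-community/Qwen3-30B-A3B-4bit-DWQ-0508",  # whatever model you loaded
    messages=[{"role": "user", "content": "Why are MoE models fast on laptops?"}],
)
print(resp.choices[0].message.content)
```

Open Web UI uses the same base URL and key in its connection settings, so once this works, pointing the UI at LM Studio is trivial.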
Alternatively, especially if you have a small RAM pool, Gemma 12B QAT Q4_0 is quite good, and you can run it with LM Studio or anything else that supports GGUF (quick sketch below). I'm not sure about 12B-ish thinking models off the top of my head; I'd have to look around.
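For the GGUF route outside LM Studio, here's a sketch with llama-cpp-python (`pip install llama-cpp-python huggingface-hub`). The repo id and filename glob are assumptions on my part; point them at whichever Gemma QAT Q4_0 GGUF you actually download (the official Google repo may be gated behind a license click-through).

```python
# Sketch: load a Gemma QAT Q4_0 GGUF straight from Hugging Face.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="google/gemma-3-12b-it-qat-q4_0-gguf",  # assumed repo; may be gated
    filename="*q4_0.gguf",  # glob matching the quant file in the repo
    n_ctx=8192,  # context window; lower this to fit a smaller RAM pool
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "One tip for running LLMs in low RAM?"}]
)
print(out["choices"][0]["message"]["content"])
```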