1y ago

Someone made a GPT-like chatbot that runs locally on Raspberry Pi, and you can too

www.xda-developers.com

30 comments

Direct link to the GitHub repo:
https://github.com/nickbild/local_llm_assistant?tab=readme-ov-file
It's a small model by comparison. If you want something offline that actually comes close to ChatGPT 3.5, you'll want the Mixtral 8x7B model instead (running on a beefy machine):
https://mistral.ai/news/mixtral-of-experts/
- Sick, I only need 90 GB of VRAM!
  
  I've got it running with a 3090 and 32GB of RAM.
  There are some models that let you run with hybrid system RAM and VRAM (it will just be slower than running it exclusively with VRAM).
  
  Hopefully we see more dedicated hardware for this, like expansion cards with pretty much just tensor cores and their own RAM.
- Nice! That's a cool project, I'll have to give it a try. I love the idea of self-hosting local LLMs. I've been playing around with https://lmstudio.ai/, which downloads models directly from Hugging Face.
  
  There's also Ollama, which seems to be similar. Not sure if LM Studio is open source, but Ollama is.
- I tried llamafile for text gen too, but I couldn't get ROCm to work with it properly to run it on my GPU without building it myself, which I'm really not into. And CPU text gen is waaaaaay too slow for anything. A Mixtral response took ~250 seconds or so for ~1k context tokens; I think Mistral was about 52 seconds or thereabouts.
  https://github.com/Mozilla-Ocho/llamafile Mixtral is definitely beefy; Mistral is quite a bit faster, and there are a few even smaller prebuilt ones. But the smaller you go, the less complex the responses will be. I think llamafile is a step in the right direction, but it's still not a good out-of-the-box experience yet. At least I got further with it than with oobabooga, which is the recommendation for SillyTavern; that one would just crash whenever it generated anything, without even giving me an error.
  
  How fast are they with a good GPU?
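
For anyone wondering where the ~90 GB figure comes from and why a 3090 plus 32 GB of system RAM can still work: here's a rough back-of-the-envelope sketch, not a benchmark. It assumes the 46.7B total parameter count from Mistral's Mixtral announcement and only counts the weights; it ignores the KV cache, activations, runtime overhead, and memory the OS needs, and real quantization formats use a bit more than their nominal bits per weight. The function names are made up for illustration.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights only, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def split_across_devices(total_gb: float, vram_gb: float, ram_gb: float):
    """Fill VRAM first, spill the rest to system RAM.

    Returns (gb_in_vram, gb_in_ram, fits), where fits is False if the
    spilled part is larger than the available system RAM.
    """
    in_vram = min(total_gb, vram_gb)
    in_ram = total_gb - in_vram
    return in_vram, in_ram, in_ram <= ram_gb


# Total parameter count reported by Mistral AI for Mixtral 8x7B.
MIXTRAL_PARAMS = 46.7e9

if __name__ == "__main__":
    # Assumed setup from the thread: 24 GB of VRAM (RTX 3090) and 32 GB of RAM.
    for bits in (16, 8, 4):
        total = weight_memory_gb(MIXTRAL_PARAMS, bits)
        in_vram, in_ram, fits = split_across_devices(total, vram_gb=24, ram_gb=32)
        print(f"{bits:>2}-bit: {total:5.1f} GB total, "
              f"{in_vram:4.1f} GB in VRAM, {in_ram:4.1f} GB in RAM, fits={fits}")
```

Under these assumptions the 16-bit weights alone are over 90 GB, while a 4-bit quantization lands near the size of a single 24 GB card, which is why quantized builds with part of the model spilled into system RAM are the usual route on consumer hardware.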
Can we have smaller, more domain-specific models that don't require more than ordinary consumer hardware? Like a small model for coding, one for medicine, one for history, and so on.
- Check out Hugging Face! Honestly, fine-tuned models for specific domains seem very popular (if for nothing else, because training smaller models is just easier!).
  
  Unfortunately, the roleplaying chatbot type models are typically fairly sizeable and demanding. I'm curious how this will develop with more dedicated AI hardware though, like expansion cards with primarily tensor cores plus their own RAM, so that you don't have to use your GPU for that. If we can drag down the price of such hardware, then locally run models could become much more viable and mainstream.
*cannot function correctly without T-Mobile speaker.
- I cannot function with T-Mobile internet, that is for sure. I'm moving to another ISP.
That’s gonna be a no from me dawg
