Less positive model
You are correct in your understanding. However the last part of your comment needs a big asterisk. Its important to consider quantization.
The full f16 deepseek r1 gguf from unsloth requires 1.34tb of ram. Good luck getting the ram sticks and channels for that.
The q4_km mid range quant is 404gb which would theoretically fit inside 512gb of ram with leftover room for context.
512gb of ram is still a lot, theoretical you could run a lower quant of r1 with 256gb of ram. Not super desirable but totally doable.
I have been using deephermes daily. I think CoT reasoning is so awesome and such a game changer! It really helps the model give better answers especially for hard logical problems. But I don't want it all the time especially on an already slow model. Being able to turn it on and off wirhout switching models is awesome. Mistral 24b deephermes is relatively uncensored, powerful and not painfully slow on my hardware. a high quant of llama 3.1 8b deephermes is able to fit entirely on my 8gb vram.
Very interesting stuff! Thanks for sharing.
What is it? Oh I see the sticker now :-) yes quite the beastly graphics card so much vram!
Its all about ram and vram. You can buy some cheap ram sticks get your system to like 128gb ram and run a low quant of the full deepseek. It wont be fast but it will work. Now if you want fast you need to be able to get the model on some graphics card vram ideally all of it. Thats where the high end Nvidia stuff comes in, getting 24gb of vram all on the same card at maximum band with speeds. Some people prefer macs or data center cards. You can use amd cards too its just not as well supported.
Localllama users tend use smaller models than the full deepseek r1 that fit on older cards. 32b partially offloaded between a older graphics card and ram sticks is around the limit of what a non dedicated hobbiest can achieve with ther already existing home hardware. Most are really happy with the performance of mistral small and qwen qwq and the deepseek distills. those that want more have the money to burn on multiple nvidia gpus and a server rack.
LLM wise Your phone can run 1-4b models, Your laptop 4-8b, your older gaming desktop with a 4-8gb vram card can run around 8-32b. Beyond that needs the big expensive 24gb cards and further beyond needs multiples of them.
Stable diffusion models in my experience is very compute intensive. Quantization degredation is much more apparent so You should have vram, a high quant model, and should limit canvas size as low as tolerable.
Hopefully we will get cheaper devices meant for AI hosting like cheaper versions of strix and digits.
Which ones are not actively spending an amount of money that scales directly with the number of users?
Most of these companies offer direct web/api access to their own cloud supercomputer datacenter, and All cloud services have some scaling with operation cost. The more users connect and use computer, the better hardware, processing power, and data connection needed to process all the users. Probably the smaller fine tuners like Nous Research that take a pre-cooked and open-licensed model, tweak it with their own dataset, then sell the cloud access at a profit with minimal operating cost, will do best with the scaling. They are also way way cheaper than big model access cost probably for similar reasons. Mistral and deepseek do things to optimize their models for better compute power efficency so they can afford to be cheaper on access.
OpenAI, claude, and google, are very expensive compared to competition and probably still operate at a loss considering compute cost to train the model + cost to maintain web/api hosting cloud datacenters. Its important to note that immediate profit is only one factor here. Many big well financed companies will happily eat the L on operating cost and electrical usage as long as they feel they can solidify their presence in the growing market early on to be a potential monopoly in the coming decades. Control, (social) power, lasting influence, data collection. These are some of the other valuable currencies corporations and governments recognize that they will exchange monetary currency for.
but its treated as the equivalent of electricity and its not
I assume you mean in a tech progression kind of way. A better comparison might be is that its being treated closer to the invention of transistors and computers. Before we could only do information processing with the cold hard certainty of logical bit calculations. We got by quite a while just cooking fancy logical programs to process inputs and outputs. Data communication, vector graphics and digital audio, cryptography, the internet, just about everything today is thanks to the humble transistor and logical gate, and the clever brains that assemble them into functioning tools.
Machine learning models are based on neuron brain structures and biological activation trigger pattern encoding layers. We have found both a way to train trillions of transtistors simulate the basic information pattern organizing systems living beings use, and a point in time which its technialy possible to have the compute available needed to do so. The perceptron was discovered in the 1940s. It took almost a century for computers and ML to catch up to the point of putting theory to practice. We couldn't create artificial computer brain structures and integrate them into consumer hardware 10 years ago, the only player then was google with their billion dollar datacenter and alphago/deepmind.
Its exciting new toy that people think can either improve their daily life or make them money, so people get carried away and over promise with hype and cram it into everything especially the stuff it makes no sense being in. Thats human nature for you. Only the future will tell whether this new way of precessing information will live up to the expectations of techbros and academics.
Theres more than just chatgpt and American data center/llm companies. Theres openAI, google and meta (american), mistral (French), alibaba and deepseek (china). Many more smaller companies that either make their own models or further finetune specialized models from the big ones. Its global competition, all of them occasionally releasing open weights models of different sizes for you to run your own on home consumer computer hardware. Dont like big models from American megacorps that were trained on stolen copyright infringed information? Use ones trained completely on open public domain information.
Your phone can run a 1-4b model, your laptop 4-8b, your desktop with a GPU 12-32b. No data is sent to servers when you self-host. This is also relevant for companies that data kept in house.
Like it or not machine learning models are here to stay. Two big points. One, you can self host open weights models trained on completely public domain knowledge or your own private datasets already. Two, It actually does provide useful functions to home users beyond being a chatbot. People have used machine learning models to make music, generate images/video, integrate home automation like lighting control with tool calling, see images for details including document scanning, boilerplate basic code logic, check for semantic mistakes that regular spell check wont pick up on. In business 'agenic tool calling' to integrate models as secretaries is popular. Nft and crypto are truly worthless in practice for anything but grifting with pump n dump and baseless speculative asset gambling. AI can at least make an attempt at a task you give it and either generally succeed or fail at it.
Models around 24-32b range in high quant are reasonably capable of basic information processing task and generally accurate domain knowledge. You can't treat it like a fact source because theres always a small statistical chance of it being wrong but its OK starting point for researching like Wikipedia.
My local colleges are researching multimodal llms recognizing the subtle patterns in billions of cancer cell photos to possibly help doctors better screen patients. I would love a vision model trained on public domain botany pictures that helps recognize poisonous or invasive plants.
The problem is that theres too much energy being spent training them. It takes a lot of energy in compute power to cook a model and further refine it. Its important for researchers to find more efficent ways to make them. Deepseek did this, they found a way to cook their models with way less energy and compute which is part of why that was exciting. Hopefully this energy can also come more from renewable instead of burning fuel.
My ravioli bowl won't unstick. Took about an hour of prying, and still I couldn't unstick the plate.
Assuming its empty, i would take the grog oggah boogah solution of smash the blue plastic bowl down the edge of your countertop. Something will give sometime.
Otherwise, did you try twisting the bowl one direction and the plate the other? Torque is typically a more effective force than pulling for friction.
the owner of the picture themselves possibly put on the tie on their cat used to thst kind of thing and lied about it for an internet caption meme. The facial expression of cat looks blurry but relaxed tbh its obviously well fed and groomed.
If you are asking questions try out deephermes finetune of llama 3.1 8b and turn on CoT reasoning with the special system prompt.
It really helps the smaller models come up with nicer answers but takes them a little more time to bake an answer with the thinking part. Its unreal how good models have come in a year thanks to leveraging reasoning in context space.
Welcome! Thats really cool to hear that the post inspired you to get an LLM going on the laptop. What size Gemma 3 are you able to run like a 8b?
I don't know how old the last version of kobold you used was or what it looked like. The newest verion has a couple different web UI themes to pick from in settings, the basic one has more buttons for easier editing, the corpo one is pretty sleek for phones and tablets. They finally have a terminal mode if you are running on headless servers.
I love this answer! What a wonderful context for a generative model to bring people together and add positivity to daily life. Thank you very much for sharing.
Not the person you asked but I used kobold.cpp to generate images with SD models. it works as a okay introduction to image gen. Their wikihas everything you need to get it working
The most useful thing my LLM has done is help me with hobbyist computer coding projects and to ask advanced stem questions. I try to use my llm to parse code that im unfamiliar with and to understand how the functions translate to actual things happening. I give it an example of functioning code and ask it to adapt the logic a certain way to see how it goes about it. I have to parse a large very old legacy codebase written in many parts by different people of different skill so just being able to understand what block does what is a big win some days. Even if its solutions aren't copy/paste ready I usually learn quite a lot just seeing what insights it can gleam from the problem. Actually I prefer when I have to clean it up because it feels like I still did something to refine and sculpt the logic in a way the llm cant.
I don't want to be a stereotypical 'vibe coder' who copies and paste without being able to bug fix or understand the code their putting in. So I ask plenty of questions and read through its reasoning for thousands of words to understand the thought processes that lead to functioning changes. I try my best to understand the code and clean it up. It is nice to have a second brain help with initial boiler plating and piecing together general flow of logic.
I treat it like a teacher and an editor. But its got limits like any tool and needs a sweet spot of context, example, and circumstance for it to work out okay,
Contact machine elf technical support on the timewave-zero hotline
Hi! So heres the rundown.
You are going to need to be willing to learn how computer program services send text messages to eachother over open ports, how to call on a API in a programming script, and slowly piece together how to work with ollamas external API calling tool functions. Heres the documentation
Essentially you need to
- learn how ollama external API works. How to send it text data using a basic program in python on an open port and recieve data back to put into a text file.
- learn how to make that python program pull weather and time data from openweather
- learn how to feed that weather and time data into ollama on an open port as part of a tool calling function. A tool call is a fancy system prompt that tells the model how to interface with the data in a well defined paratamized way. you say a keyword like get weather, it sends a request to your python program to get data from openweather and sends it back in way the llm is instructed to process.
Unless you are already a programmer who works with sending and recieving data over the internet to be processed, this is a non-trivial task that requires a lot of experimentation and getting your hands dirty with ports and coding languages. Im currently getting ready to delve into this myself so I know its all can feel overwhelming. Hope this helps.
This meme was originally made for the !netsphere@sopuli.xyz community in an attempt to give a super niche manga artist fan place a little bit of engagement, I crossposted it here as an afterthought didn't expect go be brigaded so hard by armchair memologist over the objective definition and location of the funny.
You're absolutely correct that you need to have read the Blame! manga to get the reference on this one to really enjoy, even if you did its not that deep. Not too much thought went into it I was high as shit just pasting icons with the 'linux chad big energy beam, windows/microsoft wojak bad guys its fired at'. Im personally okay with not every one of my memes being super accessible or community bangers I had fun making this and putting the template together. If you get the humor or like the template great. If you don't, oh well downvote say 'where funny' and move on with your life cause im not wasting my time explaining what a graviational beam emitter is to snobs who don't care in the first place.
Try fallen Gemma its a finetune that has the positivity removed.