  • We recently got an air fryer. It cuts down prep time and makes cleanup easy enough that I don't get upset about it anymore. Would highly recommend one if you haven't got one already.

  • No worries, I have my "ackchyually..." moments as well. Though here's a counter-perspective: the bytes have to be pulled out of abstraction space and actually mapped to a physical medium capable of storing huge numbers of informational states, like a hard drive. It takes genius STEM-engineer-level human cognition and lots of compute power to create a dataset like WA's. That makes the physical devices housing the database unique, almost one-of-a-kind objects with immense potential value for business and consumer use. How much would a wealthy competing business owner pay for a drive containing such lucrative trade secrets, assuming it's not leaked? Probably more than a comparably weighed brick of gold, but that's just fun speculation.

  • You shouldn't be sorry, you didn't do anything wrong content-wise. If anything, you helped the community by sparking an important conversation leading to better-defined guidelines, which I imagine will be updated if this becomes a common enough issue.

  • I enjoyed the meme, LadyButterfly. The Lemmy AI hate train has gone full circle-jerk meltdown with this one, sorry it happened to you. Thanks for the laugh.

  • Thank you for the explanation! I never really watched the Olympics enough to see them firing guns. I would think all that high-tech equipment counts as performance-enhancing stuff, which goes against the spirit of peak human skill, but maybe sports people who actually watch and run the Olympics think differently about external augmentations in some cases.

    It's really funny with the context of some dude just chilling and vibing while casually firing off world-record-level shots.

  • If they added one more x onto maxx and bought some cheap property rights, they could have done some serious rebranding.

  • Who are these people?

  • How do you know?!?! This is one of the most laser-precise callouts to my childhood I've ever seen in an internet comment.

  • Wolfram Alpha actually has an LLM API, so your local models can call its factual database for information when doing calculations through tool calling. I thought you might find that cool. It's a shame there's no open alternative to WA; they know their dataset is one of a kind and worth its weight in gold. Maybe one day a hero will leak it 🤪
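
    Here's a rough sketch of hooking it up as a tool function for a local model. The endpoint and parameters are from memory of Wolfram's docs, and the app id is a hypothetical placeholder, so double-check before relying on it:

    ```python
    # Minimal sketch: expose Wolfram|Alpha's LLM API as a tool a local model can call.
    # Endpoint/params are assumptions from memory of the docs; verify before use.
    import requests

    WA_URL = "https://www.wolframalpha.com/api/v1/llm-api"  # assumed LLM API endpoint
    APP_ID = "YOUR_APP_ID"  # hypothetical placeholder from the Wolfram developer portal

    def wolfram_tool(query: str) -> str:
        """Fetch a plain-text factual/math answer to stuff into the model's context."""
        resp = requests.get(WA_URL, params={"input": query, "appid": APP_ID}, timeout=30)
        resp.raise_for_status()
        return resp.text

    if __name__ == "__main__":
        print(wolfram_tool("integrate x^2 sin(x) dx"))
    ```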

  • Models running on GGUF should all work with your GPU, assuming it's set up correctly and properly loaded into the VRAM. It shouldn't matter if it's Qwen or Mistral or Gemma or Llama or LLaVA or Stable Diffusion. Maybe the engine you are using isn't properly configured to use your Arc card, so it's all just running on your regular RAM, which limits things? Idk.

    Intel Arc GPUs might work with kobold.cpp and Vulkan without any extra technical setup. It's not as deep in the rabbit hole as you may think; a lot of work was put into making one-click executables with nice GUIs that the average person can work with.

    Models

    Find a bartowski-made quantized GGUF of the model you want to use. Q4_K_M is the recommended average quant to try first. Try to make sure the whole thing fits within your card's VRAM for speed; that shouldn't be a big problem for you with 20 GB of VRAM to play with (there's a rough fit check sketched below). Hugging Face gives the size in GB next to each quant.

    Start small with something like a high quant of Qwen3 8B, then a Gemma 12B, then work your way up to a medium quant of DeepHermes 24B.

    Thinking models are better at math and logical problem solving, but you need to know how to communicate and work with LLMs to get good results no matter what. Ask one to break down a problem you already solved and test it for comprehension.
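
    As a back-of-the-envelope way to check whether a quant will fit, compare the GGUF file size against your VRAM with some headroom for context; the overhead fraction here is a rough rule of thumb I'm assuming, not a spec:

    ```python
    # Rough fit check: GGUF file size plus headroom for KV cache/buffers vs. VRAM.
    # The 20% overhead is an assumed rule of thumb and grows with context size.
    def fits_in_vram(gguf_size_gb: float, vram_gb: float, overhead_frac: float = 0.2) -> bool:
        needed = gguf_size_gb * (1 + overhead_frac)
        print(f"~{needed:.1f} GB needed vs {vram_gb} GB available")
        return needed <= vram_gb

    # e.g. a 24B model at Q4_K_M is roughly 14 GB on disk; with 20 GB of VRAM:
    fits_in_vram(14.0, 20.0)  # True, with room to spare for context
    ```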

    Kobold engine

    Download kobold.cpp, execute it like a regular program, and adjust settings in the graphical interface that pops up, or make a startup script with flags (a sketch follows below).

    For the processing backend, see if Vulkan works with Intel Arc. Make sure flash attention is enabled too. Offload all layers of the model; I make note of exactly how many layers each model has during startup and specify it, but it should figure it out smartly even if not.
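
    A minimal startup-script sketch, assuming you run the Python release of kobold.cpp; the flag names are how I remember them, so check `--help` against your version (the model filename is just an example):

    ```python
    # Launch kobold.cpp with Vulkan + flash attention and full GPU offload.
    # Flag names are assumptions from memory of koboldcpp's CLI; verify with --help.
    import subprocess

    cmd = [
        "python", "koboldcpp.py",           # or the one-click executable
        "--model", "Qwen3-8B-Q4_K_M.gguf",  # hypothetical example filename
        "--usevulkan",                       # Vulkan backend works across GPU vendors
        "--flashattention",                  # enable flash attention
        "--gpulayers", "99",                 # offload all layers; clamped to the real count
        "--contextsize", "8192",
    ]
    subprocess.run(cmd)
    ```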

  • You can use Discord in your web browser with some privacy addons, like fingerprint and user agent spoofing, to help restrict how much gets leaked to Discord. If you install it as an app that runs in the background, you better believe they're collecting more data and metrics.

  • Any device someone asks my help with figuring out. It's rarely the appliance that pisses me off and more the blatant learned helplessness and fundamental inability of fellow adults to rub two braincells together to figure out a new thing or troubleshoot a simple problem. A lifetime of being the techie fixer bitch slave constantly delegated the responsibility of figuring out everyone's crap for them has left me jaded about the average person's mental capacity and basic logical application abilities.

  • I would recommend you read over the work of the person who finetuned a Mistral model on many US Army field guides, to understand what fine-tuning on a lot of books to bake in knowledge looks like.

    If you are a newbie just learning how this technology works, I would suggest trying to get RAG working with a small model and one or two books converted to a big text file, just to see how it works (a toy sketch follows below). It's cheap/free to just do some tool calling and fill up a model's context.

    Once you have a little more experience, and if you are financially well off to the point that 1-2 thousand dollars to train a model is who-cares whatever play money to you, then go for finetuning.
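
    Something like this is enough to see the mechanics; it retrieves the most relevant chunks with TF-IDF as a stand-in for a proper embedding model (the filename, chunk size, and question are arbitrary assumptions):

    ```python
    # Toy retrieval-augmented prompt: split a book into chunks, rank them by
    # TF-IDF cosine similarity to the question, and stuff the winners into the prompt.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    book = open("book.txt", encoding="utf-8").read()                 # your converted book
    chunks = [book[i:i + 1000] for i in range(0, len(book), 1000)]   # naive fixed-size chunks

    vectorizer = TfidfVectorizer().fit(chunks)
    chunk_vecs = vectorizer.transform(chunks)

    def retrieve(question: str, k: int = 3) -> str:
        sims = cosine_similarity(vectorizer.transform([question]), chunk_vecs)[0]
        top = sims.argsort()[-k:][::-1]                              # indices of best chunks
        return "\n---\n".join(chunks[i] for i in top)

    question = "What does the author say about supply logistics?"
    prompt = f"Use this context to answer.\n\n{retrieve(question)}\n\nQuestion: {question}"
    # `prompt` then goes to whatever local model API you run (kobold.cpp, llama.cpp, etc.)
    ```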

  • It is indeed possible! The nerd speak for what you want to do is 'finetune training with a dataset', the dataset being your books (a toy example of the format is sketched below). It's a non-trivial task that takes setup and money to pay a training provider to use their compute. There are no guarantees it will come out the way you want on the first bake, either.

    A softer version of this that's the big talk right now is RAG, which is essentially a way for your LLM to call and reference an external dataset to recall information into its active context. It's a useful tool worth looking into, and much easier and cheaper than model training. But while your model can recall information with RAG, it won't really be able to build an internal understanding of that information within its abstraction space. It's the difference between being able to recall a piece of information and internally understanding the concepts it's trying to convey. RAG is for rote memorization; training is for deeper abstraction-space mapping.
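
    For what the training data tends to look like, here's a toy sketch; the instruction/input/output field names follow the common Alpaca-style convention, and your training provider may expect a different schema:

    ```python
    # Write a tiny instruction-tuning dataset as JSONL, one example per line.
    # Field names are an assumed Alpaca-style convention, not a universal standard.
    import json

    examples = [
        {"instruction": "Summarize the passage.", "input": "<book excerpt>", "output": "<summary>"},
        {"instruction": "What does the author argue in chapter 3?", "input": "", "output": "<answer>"},
    ]

    with open("train.jsonl", "w", encoding="utf-8") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")
    ```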

  • If you were running an AMD GPU, there are some versions of the llama.cpp engine you can compile with ROCm compat. If you're ever tempted to run a huge model with partially offloaded CPU/RAM inferencing, you can set the program to run with the highest scheduling priority (lowest niceness), which, believe it or not, pushes up the token speed slightly.
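
    A minimal sketch of what I mean, assuming a Linux box and llama.cpp's server binary; the flags and model name are illustrative, and negative nice values need root or CAP_SYS_NICE:

    ```python
    # Launch llama.cpp's server under a raised scheduling priority via `nice`.
    # -10 is illustrative; lower nice = higher priority, and negatives need root.
    import subprocess

    cmd = [
        "nice", "-n", "-10",
        "./llama-server",
        "--model", "big-model.gguf",   # hypothetical partially-offloaded model
        "--n-gpu-layers", "20",        # remaining layers run on CPU/RAM
    ]
    subprocess.run(cmd)
    ```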

  • You're welcome. Also, what's your GPU, and are you using CuBLAS (Nvidia) or Vulkan (universal AMD+Nvidia) or something else for GPU processing?

  • Enable flash attention if you haven't already.