OP, next time maybe poke some eye and nose holes into the ram as facial features so people can better piece it together. It does look phallic; that middle-right one straight up has foreskin. I mean, come on, you really don't see it?
FromSoft fans who have been around since King's Field in the late '90s: "always has been" (because From struck gold with the atmospheric vibe of an old, decaying fantasy world with dwindling hope in its inhabitants, and keeps reusing it)
Hi Hawke, I understand your frustration with needing to troubleshoot things. Steam allows you to import any exe as a 'non-Steam game' in your library and run it with the Proton compatibility layer. I sometimes have success getting a GOG game installed by running the install exe through Proton or Wine. Make sure you are using the most up-to-date version of Lutris; many package managers ship outdated builds, while the Flatpak will guarantee it's current. Hope it all works out for you
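If you'd rather script it than click through Lutris, here's a minimal sketch of running a GOG installer in its own Wine prefix. The installer filename and prefix path are made-up placeholders, and it assumes wine is already on your PATH:

```python
import os
import subprocess

# Hypothetical paths -- substitute your own installer and prefix location.
installer = os.path.expanduser("~/Downloads/setup_some_gog_game.exe")
prefix = os.path.expanduser("~/Games/gog-prefix")

env = dict(os.environ)
env["WINEPREFIX"] = prefix  # keeps this game's Wine state isolated

os.makedirs(prefix, exist_ok=True)
# Run the GOG installer under Wine; check=True raises if Wine exits nonzero.
subprocess.run(["wine", installer], env=env, check=True)
```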
Assuming you use Steam, see which of your favorite games run with the Proton compatibility layer and which absolutely require Windows. You may be surprised.
I've tried the official DeepSeek R1 distill of Qwen 2.5 14B and a few unofficial Mistrals trained on R1 CoT. They are indeed pretty amazing, and I found myself switching between a general-purpose model and a thinking model regularly before this released.
DeepHermes is a thinking-model family with R1-distilled CoT that you can toggle between standard short output and spending a few thousand tokens thinking through a solution.
I found that pure thinking models are fantastic for certain kinds of problem-solving questions, but awful at following system prompt changes for roleplay scenarios or adopting complex personality archetypes.
This lets you have your cake and eat it too by making CoT optional while keeping regular system prompt capabilities (see the sketch below).
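As a rough sketch of the toggle, here's how I'd flip it against a local kobold.cpp instance through its OpenAI-compatible endpoint (port 5001 is kobold.cpp's default; the exact "deep thinking" system prompt wording below is paraphrased from memory, so check the DeepHermes model card):

```python
import requests

URL = "http://localhost:5001/v1/chat/completions"  # kobold.cpp's OpenAI-compatible API

# Paraphrased activation prompt -- verify the exact text on the model card.
THINK_PROMPT = (
    "You are a deep thinking AI. Use long chains of thought to reason "
    "through the problem inside <think></think> tags before answering."
)

def ask(question: str, think: bool = False) -> str:
    messages = []
    if think:
        messages.append({"role": "system", "content": THINK_PROMPT})
    messages.append({"role": "user", "content": question})
    resp = requests.post(URL, json={"messages": messages, "max_tokens": 4096})
    return resp.json()["choices"][0]["message"]["content"]

print(ask("What is 17 * 23?"))              # quick, direct answer
print(ask("What is 17 * 23?", think=True))  # spends tokens in <think> first
```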
The thousands of tokens spent thinking can get time consuming when you're only getting 3t/s on the larger 24b models. So it's important to choose between a direct answer and letting it spend 5 minutes really thinking. Its abilities are impressive, even if it takes 300 seconds to fully think out a problem at 2.5t/s.
That's why I am so happy the 8b model is pretty intelligent with CoT enabled, so I can fit a thinking model entirely in VRAM, and it's not dumb as rocks in knowledge base either. I'm getting 15-20t/s with the 8b instead of 2.5-3t/s partially offloading a larger model. A 6.4x speed increase at the CoT is a huge W for my real-life human time spent waiting for a complete output.
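To put numbers on that W, the back-of-the-envelope wait-time math (the 3k-token CoT budget is illustrative, a mid-range thinking trace):

```python
# Rough wait-time arithmetic for a ~3k-token CoT trace (illustrative budget).
cot_tokens = 3000

offloaded_tps = 2.5   # 24b partially offloaded to CPU
in_vram_tps = 16.0    # 8b fully in VRAM (low end of 15-20 t/s)

print(f"offloaded: {cot_tokens / offloaded_tps / 60:.1f} min")  # ~20.0 min
print(f"in VRAM:   {cot_tokens / in_vram_tps / 60:.1f} min")    # ~3.1 min
print(f"speedup:   {in_vram_tps / offloaded_tps:.1f}x")         # 6.4x
```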
I think the idea of giving LLMs multiple different ways to 'process' a given input, exposed in a standard way, is promising.
I feel that after reasoning, we will train models how to think emotionally in a more intricate way. By combining reasoning with a more advanced sense of individuality and richer emotion simulation, we may get a little closer to finding a breakthrough.
I just spent a good few hours optimizing my LLM rig: disabling the graphical interface to squeeze 150MB of VRAM back from Xorg, setting the program's CPU niceness to highest priority, and tweaking settings to find memory limits.
I was able to increase the token speed by half a token per second while doubling context size. I don't have the budget for any big VRAM upgrade, so I'm trying to make the most of what I've got.
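In case anyone wants the recipe, a rough sketch of the squeeze (the server launch line is a placeholder for whatever you actually run, and stopping your display manager is distro-specific, so that part is left out):

```python
import subprocess

# See how much VRAM Xorg was holding (NVIDIA cards).
subprocess.run(
    ["nvidia-smi", "--query-gpu=memory.free,memory.used", "--format=csv"],
    check=True,
)

# Launch the LLM server at maximum CPU priority; negative niceness needs root.
# The command below is a placeholder for your actual launch line.
server_cmd = ["python", "koboldcpp.py", "--model", "model.gguf"]
subprocess.run(["sudo", "nice", "-n", "-20", *server_cmd])
```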
I have two desktop computers. One has better RAM, CPU, and overclocking but a worse GPU. The other has a better GPU but worse RAM and CPU, and no overclocking. I'm contemplating whether it's worth swapping GPUs to really make the most of the available hardware. It's been years since I took apart a PC, and I'm scared of doing something wrong and damaging everything. I dunno if it's worth the time, effort, and risk for the squeeze.
Otherwise, I'm loving my self-hosted LLM hobby. I've been very into learning computers and ML for the past year. Crazy advancements, exciting stuff.
Cool, Page Assist looks neat, I'll have to check it out sometime. My LLM engine is kobold.cpp, and I just use OpenWebUI in the browser to connect.
Sorry, I don't really have good suggestions for you beyond just trying some of the more popular 1-4b models at a very high quant, if not full 8-bit, and seeing which works best for your use case.
Llama 4b, Mistral 4b, Phi-3-mini, TinyLlama 1.5b, Qwen2 1.5b, etc., etc. I assume you want a model with a large context size and good comprehension skills to summarize YouTube transcripts and webpage articles? At least that's what I took the purpose of the add-on you mentioned to be.
So look for models that prioritize those strengths over ones that try to specialize in a little bit of domain knowledge.
The average of all the different benchmarks can be thought of as a kind of 'average intelligence', though in reality it's more of a gradient and vibe type thing.
Many models are "benchmaxxed": trained on the exact kinds of questions the tests ask, which often makes the benchmark results unrelated to real-world use. Treat them as general indicators, not something to be taken too seriously.
All model families are different in ways that you only really understand by spending time with them. Don't forget to set the right chat template and correct sampler value ranges as needed per model (see the sketch below). The Open LLM Leaderboard is a good place to start.
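On the chat template point: frontends like kobold.cpp usually read the template from GGUF metadata, but if you're building prompts by hand, the transformers tokenizer shows the idea (the model name here is just an example):

```python
from transformers import AutoTokenizer

# Example model; every chat model's tokenizer ships its own template.
tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

messages = [{"role": "user", "content": "Summarize this article for me."}]
# Renders the conversation in the exact format the model was trained on.
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```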
DeepHermes 24B's CoT thought patterns feel about on par with the official R1 distills I've tried. It's important to note, though, that my experience is limited to the DeepSeek R1 NeMo 12B distill, as that's what fits nice and fast on my card.
All the R1 distill thought-process internal-monologue humanisms are there: "let me write that down", "if I remember correctly", "oh, but wait, that doesn't sound right, let's try again". The multiple "but wait, what if"s before ending the thought to examine multiple sides are there too. It spends about 2-5k tokens thinking. It tends to stay on track and catch minor mistakes or hallucinations.
Compared to the unofficial Mistral-24b distills, this is top tier for sure. I think it's toe to toe with ComputationDolphins' 24B R1 distill, and it's just a preview.