  • For LLMs it entirely depends on what size models you want to run and how fast you want them to run. Since there are diminishing returns to increasing model size (a 14B model isn't twice as good as a 7B model), the best bang for the buck comes from the smallest model you think has acceptable quality. And if you think generation speeds of around 1 token/second are acceptable, you'll probably get more value for money using partial offloading (see the sketch after this comment).

    If your answer is "I don't know what models I want to run", a second-hand RTX3090 is probably your best bet. If you want to run larger models, building a rig with multiple used RTX3090s is still the cheapest way to do it.
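
    A minimal sketch of partial offloading using the llama-cpp-python bindings (the model path and layer count here are placeholders, not recommendations):

    ```python
    # Offload only part of the model to the GPU and keep the rest on the CPU.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/some-14b-model.Q4_K_M.gguf",  # hypothetical file
        n_gpu_layers=20,  # raise until VRAM runs out; -1 offloads everything
        n_ctx=4096,
    )

    out = llm("Explain partial offloading in one sentence.", max_tokens=64)
    print(out["choices"][0]["text"])
    ```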

  • Hopefully, yes. But I'm sure MS and some hardware manufacturers salivate at the thought of being able to create a completely locked-down computer platform. I own neither, but aren't both iPhone and PlayStation users locked into the manufacturers' respective stores? Those seem to be perfectly legal in the EU.

  • Is max tokens different from context size?

    Might be worth keeping in mind that the generated tokens go into the context, so if you set max tokens to 1k with a 4k context you only have 3k left for the character card and chat history. I think I usually have it set to around 400 tokens, and use TGW's continue button in case a long response gets cut off (see the sketch below).
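
    A rough illustration of how the two settings interact, using the llama-cpp-python API as an example backend (the numbers and path are arbitrary):

    ```python
    from llama_cpp import Llama

    # n_ctx is the total context window; prompt AND generated tokens share it.
    llm = Llama(model_path="./models/model.gguf", n_ctx=4096)  # placeholder path

    # Reserving max_tokens=1000 for the reply leaves only ~3000 tokens
    # for the system prompt, character card, and chat history.
    out = llm("Hello!", max_tokens=1000)
    print(out["choices"][0]["text"])
    ```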

  • llama.cpp uses the GPU if you compile it with GPU support and you tell it to use the GPU (see the sketch after this comment).

    Never used koboldcpp, so I don't know why it would give you shorter responses if both the model and the prompt are the same (also assuming you've generated multiple times and it's always the same). If you don't want to use Discord to visit the official koboldcpp server, you might get more answers from a more LLM-focused community such as !localllama@sh.itjust.works
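
    For example, through the llama-cpp-python bindings both conditions show up (a sketch; the actual build flags depend on your GPU backend):

    ```python
    from llama_cpp import Llama

    # n_gpu_layers only has an effect if llama.cpp was built with a GPU
    # backend (e.g. CUDA); otherwise the layers silently stay on the CPU.
    llm = Llama(
        model_path="./models/model.gguf",  # placeholder path
        n_gpu_layers=-1,  # -1 = offload every layer that fits
        verbose=True,     # startup log shows how many layers went to the GPU
    )
    ```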

  • It's a functional programming language, so you have to think quite differently when using it if you're used to imperative languages (e.g. C++, Java, Python, BASIC); the sketch below shows the contrast. I learned it at uni and it was quite fun, but I wouldn't know how to write a larger project in it.
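
    A tiny illustration of the mindset shift, written in Python since the comment doesn't name the language: the imperative version mutates an accumulator step by step, while the functional version composes expressions without mutation:

    ```python
    from functools import reduce

    numbers = [1, 2, 3, 4, 5]

    # Imperative: mutate an accumulator in a loop.
    total = 0
    for n in numbers:
        if n % 2 == 0:
            total += n * n

    # Functional: no mutation, just composed expressions.
    total_fp = reduce(lambda acc, n: acc + n * n,
                      (n for n in numbers if n % 2 == 0), 0)

    assert total == total_fp == 20  # 2*2 + 4*4
    ```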

  • I initially wrote 100, but when I started looking through my Steam library I realized how few games had that few reviews. All the indie games I thought might be borderline unknown turned out to have 5k+ reviews.

  • It ought to be mandatory to write this out whenever talking about Linux. I've seen more than one person bash Linux in a public forum "because it has digital rights management built into the kernel" after they've misinterpreted some news headline.

  • If someone is careless, they should create a wrapper around rm (see the sketch after this comment), or just use a file manager.

    I think that's the situation OP is in. They don't trust themselves with these kinds of commands, while other commenters here are trying to convince them that they should just use rm -rf anyway
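
    A minimal sketch of such a wrapper in Python (the script name and trash location are arbitrary choices for this example): instead of deleting, it moves its targets into a per-user trash directory so mistakes stay recoverable:

    ```python
    #!/usr/bin/env python3
    """saferm: move files to a trash directory instead of deleting them."""
    import shutil
    import sys
    import time
    from pathlib import Path

    TRASH = Path.home() / ".saferm-trash"  # arbitrary location for this sketch

    def main() -> None:
        TRASH.mkdir(exist_ok=True)
        for arg in sys.argv[1:]:
            src = Path(arg)
            if not src.exists():
                print(f"saferm: no such file: {src}", file=sys.stderr)
                continue
            # Timestamp prefix avoids collisions between same-named files.
            dest = TRASH / f"{int(time.time())}-{src.name}"
            shutil.move(str(src), str(dest))
            print(f"moved {src} -> {dest}")

    if __name__ == "__main__":
        main()
    ```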

  • That output means it did do something, but I suspect the same thing could happen again. I run this command every now and then to try to avoid fragmentation, especially if the disk has been close to full, but I'm not entirely sure what causes it in the first place.