Posts: 89 · Comments: 149 · Joined: 2 yr. ago

  • I use text-generation-webui mostly. If you're only using GGUF files (llama.cpp), koboldcpp is a really good option.

    A lot of it is the automatic prompt formatting: there are probably 5-10 specific formats in common use, and using the right one for your model matters a lot for output quality (see the sketch below for what these templates look like). TheBloke usually lists the prompt format in his model cards, which is handy.

    RoPE scaling and YaRN refer to extending a model's default context window through hacky (but functional) methods, and probably deserve their own write-up.
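
    To make the formatting point concrete, here's a rough sketch of two of the common templates (an Alpaca-style one and a ChatML-style one). The exact wording and special tokens vary by model, so treat these as illustrative rather than definitive and always check the model card:

    ```python
    # Two common prompt templates, shown as plain Python strings.
    # These are generic examples; the model card is the source of truth.

    alpaca_style = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction:\n{instruction}\n\n### Response:\n"
    )

    chatml_style = (
        "<|im_start|>system\n{system}<|im_end|>\n"
        "<|im_start|>user\n{instruction}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

    print(alpaca_style.format(instruction="Summarize this article in one sentence."))
    ```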

  • Yeah, so those are mixed. They're definitely not putting each individual weight into 2 bits, because as you said that's very small; I don't think it even averages out to 2 bits, it's more like 2.56.

    You can read some details here on bits per weight: https://huggingface.co/TheBloke/LLaMa-30B-GGML/blob/8c7fb5fb46c53d98ee377f841419f1033a32301d/README.md#explanation-of-the-new-k-quant-methods

    Unfortunately that's not the whole story either, since the k-quants mix bit widths: Q2_K, for example, uses Q4_K for some of the weights and Q2_K for the rest, which works out to more like 2.8 bits per weight (quick sanity check below).

    Generally speaking you'll want to use Q4_K_M unless going smaller really benefits you (like you can fit the full thing on GPU)

    Also, the bigger the model you have (70B vs 7B) the lower you can go on quantization bits before it degrades to complete garbage
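
    If you want to sanity-check the effective bits per weight of a quant you downloaded, a quick back-of-the-envelope calculation is just file size over parameter count. The numbers here are illustrative, not exact (the file also contains some unquantized tensors and metadata):

    ```python
    # Rough effective bits-per-weight: file size (bytes) * 8 / parameter count.
    # Substitute your actual file size and model size.

    def bits_per_weight(file_size_gb: float, n_params_billion: float) -> float:
        return (file_size_gb * 1e9 * 8) / (n_params_billion * 1e9)

    # e.g. a ~2.8 GB Q2_K file of a 7B model works out to roughly 3.2 bits/weight,
    # and a ~4.1 GB Q4_K_M file of a 7B model to roughly 4.7 bits/weight
    print(round(bits_per_weight(2.8, 7), 2))
    print(round(bits_per_weight(4.1, 7), 2))
    ```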

  • If you're using llama.cpp, chances are you're already using a quantized model; if not, then yes, you should be. Unfortunately, without crazy fast RAM you're basically limited to 7B models if you want any real speed (5-10 tokens/s); a rough estimate of why is sketched below.
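
    The rule of thumb is that CPU inference is memory-bandwidth bound: every generated token has to stream the whole model through RAM, so an upper bound on speed is roughly bandwidth divided by model size. A hedged sketch with illustrative numbers:

    ```python
    # Crude upper-bound estimate for CPU token generation speed:
    # each token streams the whole quantized model through RAM once,
    # so tokens/s <= memory bandwidth / model size (ignores compute and KV cache).

    def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
        return bandwidth_gb_s / model_size_gb

    # e.g. dual-channel DDR4 (~50 GB/s) with a ~4 GB 7B Q4_K_M file -> ~12 tok/s ceiling;
    # the same RAM with a ~40 GB 70B quant -> ~1 tok/s, which is why big models crawl on CPU
    print(max_tokens_per_second(50, 4))
    print(max_tokens_per_second(50, 40))
    ```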

  • According to the config it looks like it's only 4096, and they specify in the arXiv paper that they kept the training data under that length, so it must be 4096. I'm sure people will expand it soon like they have with others.
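
    If you want to check this yourself, the trained context length is usually exposed as max_position_embeddings in the model's config.json. A minimal sketch, assuming a standard Hugging Face-hosted model (the repo id here is a placeholder):

    ```python
    from transformers import AutoConfig

    # Placeholder repo id; substitute the model you're actually checking
    config = AutoConfig.from_pretrained("some-org/some-model")

    # Most llama-style configs expose the trained context window here
    print(config.max_position_embeddings)  # e.g. 4096
    ```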

  • got any other articles? that one doesn't make it out to be all that bad

    "Without that catalyst, I don't see an angle to a near term mutually agreeable merger of Nintendo and MS and I don't think a hostile action would be a good move, so we are playing the long game."

    doesn't sound like meddling to me, just wanting to mutually merge, and who wouldn't want that as a CEO lol

  • They've been doing orders of magnitude better in recent years, I'm never thrilled about aggressive vertical integration but of all the massive corporations Microsoft is pretty high up on my personal list for trust (which yeah is a pretty low bar compared to Amazon/Google/etc)

  • While the drama around X and Musk can't be overstated, it's still great to see more players in the open-model world (assuming this gets properly opened up).

    One thing that'll hold it back (for people like us, at least) is developer support, so I'm quite curious to see how this plays out with things like GPTQ and llama.cpp.

  • I almost wonder if they have, but they're holding back until they have something more game-breaking. Let's be honest: if Gemini releases and just says "we're better than GPT-4", people won't flock to it; they need a standout feature to make people want to switch.

  • I think the implication is that this dataset is even more useful if you don't jam the whole thing into your training run, but instead filter it further down to a reasonable number of tokens, around 5T, and train on that subset instead (a toy sketch of that kind of filtering is below).

    I could be wrong, because they do explicitly say deduplicating, but it's phrased oddly either way.
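
    Either reading boils down to the same rough pipeline: dedupe the raw documents, then keep them until you hit a token budget. A toy sketch of that idea (exact-match dedup by hashing, which is far cruder than whatever filtering they actually do):

    ```python
    import hashlib

    def filter_to_budget(documents, count_tokens, token_budget):
        """Exact-dedupe documents, then keep them until a token budget is reached."""
        seen, kept, total = set(), [], 0
        for doc in documents:
            digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
            if digest in seen:
                continue  # drop exact duplicates
            seen.add(digest)
            n = count_tokens(doc)
            if total + n > token_budget:
                break
            kept.append(doc)
            total += n
        return kept

    # Toy usage: "tokens" are just whitespace-split words here
    docs = ["the cat sat", "the cat sat", "a different doc entirely"]
    print(filter_to_budget(docs, lambda d: len(d.split()), token_budget=10))
    ```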

  • Another what? Claiming to be better than GPT-4? If so, I think this might be one of the most reasonable times it's been claimed, with (albeit anecdotal) evidence from real use cases instead of just gaming a benchmark.

  • I thought they claim the deduped dataset is the 20.5T number; where did you see 5T? Either way that would still be awesome, especially when you consider the theory that quality is mostly limited by the dataset, and Llama 2 was trained on 2T. This could be huge.

  • Android @lemdro.id

    Inside The OnePlus Open – And The Machines That Torture It [Exclusive] - MrMobile

  • Yeah, we definitely still need to understand the limits of open-source models. They're getting pretty damn good at generating code, but their comprehension isn't quite there. I think the ideal is eventually having two models: one that determines the problem and what the solution would be, and another that generates the code, so that things like "fix this bug" or vaguer questions like "how do I start writing this app" would be more successful (rough sketch of that idea below).
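
    Something like that two-model split could be wired up today with llama-cpp-python, even if the models aren't good enough yet. A rough sketch under that assumption (the model paths and prompts are placeholders, not a real recipe):

    ```python
    from llama_cpp import Llama

    # Placeholder paths: one model tuned for reasoning/planning, one tuned for code
    planner = Llama(model_path="planner-model.Q4_K_M.gguf", n_ctx=4096)
    coder = Llama(model_path="codellama-model.Q4_K_M.gguf", n_ctx=4096)

    def answer(question: str) -> str:
        # Stage 1: have the planner restate the problem and propose a fix
        plan = planner(
            f"Describe the underlying problem and outline a fix:\n{question}\n",
            max_tokens=256,
        )["choices"][0]["text"]

        # Stage 2: have the coder turn that plan into actual code
        code = coder(
            f"Plan:\n{plan}\n\nWrite the code that implements this plan:\n",
            max_tokens=512,
        )["choices"][0]["text"]
        return code

    print(answer("fix this bug: my loop never terminates"))
    ```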

  • It depends on the learning rate. Typically it's ideal (and higher quality) to learn really slowly over a lot of epochs, but it's cheaper and obviously faster to learn fast over fewer epochs.

    The dataset size is also important to consider. A rough example of these knobs is below.
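
    For a concrete sense of the trade-off, here's roughly how it shows up in a typical Hugging Face TrainingArguments setup (the values are illustrative, not recommendations):

    ```python
    from transformers import TrainingArguments

    # "Slow and steady": lower learning rate, more passes over the data
    careful = TrainingArguments(
        output_dir="out-careful",
        learning_rate=1e-5,
        num_train_epochs=10,
        per_device_train_batch_size=4,
    )

    # "Fast and cheap": higher learning rate, fewer epochs, more risk of a worse fit
    quick = TrainingArguments(
        output_dir="out-quick",
        learning_rate=2e-4,
        num_train_epochs=2,
        per_device_train_batch_size=4,
    )
    ```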

  • LocalLLaMA @sh.itjust.works

    Dolphin 2.0 based on mistral-7b released by Eric Hartford

    LocalLLaMA @sh.itjust.works

    Beginner questions thread

    LocalLLaMA @sh.itjust.works

    Microsoft's latest LLM agent: autogen

    LocalLLaMA @sh.itjust.works

    QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models

    LocalLLaMA @sh.itjust.works

    Effective Long-Context Scaling of Foundation Models | Research - AI at Meta

    LocalLLaMA @sh.itjust.works

    Jeremy Howard: A Hackers' Guide to Language Models

    LocalLLaMA @sh.itjust.works

    Amazon investing in Anthropic - Expanding access to safer AI with Amazon

    LocalLLaMA @sh.itjust.works

    Very interesting thread about reversal knowledge

    LocalLLaMA @sh.itjust.works

    Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding

    LocalLLaMA @sh.itjust.works

    Exllama V2 released! Available in Ooba! Big speed upgrades!

    LocalLLaMA @sh.itjust.works

    GitHub - nicholasyager/llama-cpp-guidance: A guidance compatibility layer for llama-cpp-python

    LocalLLaMA @sh.itjust.works

    Supporting the Open Source AI Community | Andreessen Horowitz

    LocalLLaMA @sh.itjust.works

    WizardLM introduce the newest WizardCoder 34B based on Code Llama

    Android @lemdro.id

    Stories and 10 Years of Telegram

    LocalLLaMA @sh.itjust.works

    Code Llama: Open Foundation Models for Code | Meta AI Research

    LocalLLaMA @sh.itjust.works

    Making LLMs lighter with AutoGPTQ and transformers

    LocalLLaMA @sh.itjust.works

    Jon Durbin: Finished up a first stab at LMoE - LoRA mixture of experts

    LocalLLaMA @sh.itjust.works

    PEFT 0.5.0: Release GPTQ Quantization, Low-level API · huggingface/peft

    LocalLLaMA @sh.itjust.works

    GGUF PR has officially been merged into master!