Posts: 89 · Comments: 149 · Joined: 2 yr. ago

  • I use text-generation-webui mostly. If you're only using GGUF files (llama.cpp), koboldcpp is a really good option.

    A lot of it is the automatic prompt formatting: there are probably 5-10 specific formats in common use, and using the right one for your model matters a lot for output quality (see the sketch below for what these templates look like). TheBloke usually lists the prompt format in his model cards, which is handy.

    RoPE scaling and YaRN refer to extending a model's default context window through hacky (but functional) methods, and probably deserve their own write-up.
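
    To make the formatting point concrete, here's a rough sketch of two of the common templates (an Alpaca-style one and a ChatML-style one). The exact wording and special tokens vary by model, so treat these as illustrative rather than definitive and always check the model card:

    ```python
    # Two common prompt templates, shown as plain Python strings.
    # These are generic examples; the model card is the source of truth.

    alpaca_style = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction:\n{instruction}\n\n### Response:\n"
    )

    chatml_style = (
        "<|im_start|>system\n{system}<|im_end|>\n"
        "<|im_start|>user\n{instruction}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

    print(alpaca_style.format(instruction="Summarize this article in one sentence."))
    ```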

  • Yeah, so those are mixed. They're definitely not putting each individual weight into 2 bits, because as you said that's very small; I don't think it even averages out to 2 bits, it's more like 2.56.

    You can read some details here on bits per weight: https://huggingface.co/TheBloke/LLaMa-30B-GGML/blob/8c7fb5fb46c53d98ee377f841419f1033a32301d/README.md#explanation-of-the-new-k-quant-methods

    Unfortunately that's not the whole story either, since the k-quants mix bit widths: Q2_K, for example, uses Q4_K for some of the weights and Q2_K for the rest, which works out to more like 2.8 bits per weight (quick sanity check below).

    Generally speaking you'll want to use Q4_K_M unless going smaller really benefits you (like you can fit the full thing on GPU)

    Also, the bigger the model you have (70B vs 7B) the lower you can go on quantization bits before it degrades to complete garbage
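
    If you want to sanity-check the effective bits per weight of a quant you downloaded, a quick back-of-the-envelope calculation is just file size over parameter count. The numbers here are illustrative, not exact (the file also contains some unquantized tensors and metadata):

    ```python
    # Rough effective bits-per-weight: file size (bytes) * 8 / parameter count.
    # Substitute your actual file size and model size.

    def bits_per_weight(file_size_gb: float, n_params_billion: float) -> float:
        return (file_size_gb * 1e9 * 8) / (n_params_billion * 1e9)

    # e.g. a ~2.8 GB Q2_K file of a 7B model works out to roughly 3.2 bits/weight,
    # and a ~4.1 GB Q4_K_M file of a 7B model to roughly 4.7 bits/weight
    print(round(bits_per_weight(2.8, 7), 2))
    print(round(bits_per_weight(4.1, 7), 2))
    ```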

  • If you're using llama.cpp, chances are you're already using a quantized model; if not, then yes, you should be. Unfortunately, without crazy fast RAM you're basically limited to 7B models if you want any real speed (5-10 tokens/s); a rough estimate of why is sketched below.
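
    The rule of thumb is that CPU inference is memory-bandwidth bound: every generated token has to stream the whole model through RAM, so an upper bound on speed is roughly bandwidth divided by model size. A hedged sketch with illustrative numbers:

    ```python
    # Crude upper-bound estimate for CPU token generation speed:
    # each token streams the whole quantized model through RAM once,
    # so tokens/s <= memory bandwidth / model size (ignores compute and KV cache).

    def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
        return bandwidth_gb_s / model_size_gb

    # e.g. dual-channel DDR4 (~50 GB/s) with a ~4 GB 7B Q4_K_M file -> ~12 tok/s ceiling;
    # the same RAM with a ~40 GB 70B quant -> ~1 tok/s, which is why big models crawl on CPU
    print(max_tokens_per_second(50, 4))
    print(max_tokens_per_second(50, 40))
    ```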

  • According to the config it looks like it's only 4096, and they specify in the arXiv paper that they kept the training data under that length, so it must be 4096. I'm sure people will expand it soon like they have with others.
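
    If you want to check this yourself, the trained context length is usually exposed as max_position_embeddings in the model's config.json. A minimal sketch, assuming a standard Hugging Face-hosted model (the repo id here is a placeholder):

    ```python
    from transformers import AutoConfig

    # Placeholder repo id; substitute the model you're actually checking
    config = AutoConfig.from_pretrained("some-org/some-model")

    # Most llama-style configs expose the trained context window here
    print(config.max_position_embeddings)  # e.g. 4096
    ```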

  • got any other articles? that one doesn't make it out to be all that bad

    "Without that catalyst, I don't see an angle to a near term mutually agreeable merger of Nintendo and MS and I don't think a hostile action would be a good move, so we are playing the long game."

    doesn't sound like meddling to me, just wanting to mutually merge, and who wouldn't want that as a CEO lol

  • They've been doing orders of magnitude better in recent years, I'm never thrilled about aggressive vertical integration but of all the massive corporations Microsoft is pretty high up on my personal list for trust (which yeah is a pretty low bar compared to Amazon/Google/etc)

  • While the drama around X and Musk can't be overstated, it's still great to see more players in the open-model world (assuming this gets properly opened up).

    One thing that'll hold it back (for people like us, at least) is developer support, so I'm quite curious to see how this plays out with things like GPTQ and llama.cpp.

  • I almost wonder if they have, but they're holding back until they have something more game-breaking. Let's be honest: if Gemini releases and just says "we're better than GPT-4", people won't flock to it; they need a standout feature to make people want to switch.

  • I think the implication is that this dataset is even more useful if you don't jam the whole thing into your training run, but instead filter it further down to a reasonable number of tokens, around 5T, and train on that subset instead (a toy sketch of that kind of filtering is below).

    I could be wrong, because they do explicitly say deduplicating, but it's phrased oddly either way.
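
    Either reading boils down to the same rough pipeline: dedupe the raw documents, then keep them until you hit a token budget. A toy sketch of that idea (exact-match dedup by hashing, which is far cruder than whatever filtering they actually do):

    ```python
    import hashlib

    def filter_to_budget(documents, count_tokens, token_budget):
        """Exact-dedupe documents, then keep them until a token budget is reached."""
        seen, kept, total = set(), [], 0
        for doc in documents:
            digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
            if digest in seen:
                continue  # drop exact duplicates
            seen.add(digest)
            n = count_tokens(doc)
            if total + n > token_budget:
                break
            kept.append(doc)
            total += n
        return kept

    # Toy usage: "tokens" are just whitespace-split words here
    docs = ["the cat sat", "the cat sat", "a different doc entirely"]
    print(filter_to_budget(docs, lambda d: len(d.split()), token_budget=10))
    ```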

  • Another what? Claiming to be better than GPT-4? If so, I think this might be one of the most reasonable times it's been claimed, with (albeit anecdotal) evidence from real use cases instead of just gaming a benchmark.

  • I thought they claim the deduped dataset is the 20.5T number; where did you see 5T? Either way that would still be awesome, especially when you consider the theory that quality is mostly limited by the dataset, and Llama 2 was trained on 2T. This could be huge.

  • Android @lemdro.id

    Inside The OnePlus Open – And The Machines That Torture It [Exclusive] - MrMobile

  • Yeah, we definitely still need to understand the limits of open-source models. They're getting pretty damn good at generating code, but their comprehension isn't quite there. I think the ideal is eventually having two models: one that determines the problem and what the solution would be, and another that generates the code, so that things like "fix this bug" or vaguer questions like "how do I start writing this app" would be more successful (rough sketch of that idea below).
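
    Something like that two-model split could be wired up today with llama-cpp-python, even if the models aren't good enough yet. A rough sketch under that assumption (the model paths and prompts are placeholders, not a real recipe):

    ```python
    from llama_cpp import Llama

    # Placeholder paths: one model tuned for reasoning/planning, one tuned for code
    planner = Llama(model_path="planner-model.Q4_K_M.gguf", n_ctx=4096)
    coder = Llama(model_path="codellama-model.Q4_K_M.gguf", n_ctx=4096)

    def answer(question: str) -> str:
        # Stage 1: have the planner restate the problem and propose a fix
        plan = planner(
            f"Describe the underlying problem and outline a fix:\n{question}\n",
            max_tokens=256,
        )["choices"][0]["text"]

        # Stage 2: have the coder turn that plan into actual code
        code = coder(
            f"Plan:\n{plan}\n\nWrite the code that implements this plan:\n",
            max_tokens=512,
        )["choices"][0]["text"]
        return code

    print(answer("fix this bug: my loop never terminates"))
    ```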

  • It depends on the learning rate. Typically it's ideal (and higher quality) to learn really slowly over a lot of epochs, but it's cheaper and obviously faster to learn fast over fewer epochs.

    The dataset size is also important to consider. A rough example of these knobs is below.
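
    For a concrete sense of the trade-off, here's roughly how it shows up in a typical Hugging Face TrainingArguments setup (the values are illustrative, not recommendations):

    ```python
    from transformers import TrainingArguments

    # "Slow and steady": lower learning rate, more passes over the data
    careful = TrainingArguments(
        output_dir="out-careful",
        learning_rate=1e-5,
        num_train_epochs=10,
        per_device_train_batch_size=4,
    )

    # "Fast and cheap": higher learning rate, fewer epochs, more risk of a worse fit
    quick = TrainingArguments(
        output_dir="out-quick",
        learning_rate=2e-4,
        num_train_epochs=2,
        per_device_train_batch_size=4,
    )
    ```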

  • LocalLLaMA @sh.itjust.works

    Dolphin 2.0 based on mistral-7b released by Eric Hartford

    LocalLLaMA @sh.itjust.works

    Beginner questions thread

    LocalLLaMA @sh.itjust.works

    Microsoft's latest LLM agent: autogen

    LocalLLaMA @sh.itjust.works

    QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models

    LocalLLaMA @sh.itjust.works

    Effective Long-Context Scaling of Foundation Models | Research - AI at Meta

    LocalLLaMA @sh.itjust.works

    Jeremy Howard: A Hackers' Guide to Language Models

    LocalLLaMA @sh.itjust.works

    Amazon investing in Anthropic - Expanding access to safer AI with Amazon

    LocalLLaMA @sh.itjust.works

    Very interesting thread about reversal knowledge

    LocalLLaMA @sh.itjust.works

    Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding

    LocalLLaMA @sh.itjust.works

    Exllama V2 released! Available in Ooba! Big speed upgrades!

    LocalLLaMA @sh.itjust.works

    GitHub - nicholasyager/llama-cpp-guidance: A guidance compatibility layer for llama-cpp-python

    LocalLLaMA @sh.itjust.works

    Supporting the Open Source AI Community | Andreessen Horowitz

    LocalLLaMA @sh.itjust.works

    WizardLM introduce the newest WizardCoder 34B based on Code Llama

    Android @lemdro.id

    Stories and 10 Years of Telegram

    LocalLLaMA @sh.itjust.works

    Code Llama: Open Foundation Models for Code | Meta AI Research

    LocalLLaMA @sh.itjust.works

    Making LLMs lighter with AutoGPTQ and transformers

    LocalLLaMA @sh.itjust.works

    Jon Durbin: Finished up a first stab at LMoE - LoRA mixture of experts

    LocalLLaMA @sh.itjust.works

    PEFT 0.5.0: Release GPTQ Quantization, Low-level API · huggingface/peft

    LocalLLaMA @sh.itjust.works

    GGUF PR has officially been merged into master!