Posts 89 · Comments 149 · Joined 2 yr. ago

  • Didn't do well at RAG, but maybe that's because RAG is mostly handled by the wrapper rather than relying on the output of the model. I think it'll matter more for langchain/guidance and the example they gave (rough sketch of what I mean below).
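
    Just to illustrate what I mean by "handled by the wrapper": in a typical RAG setup the retrieval and prompt assembly happen outside the model, and the model only ever sees the final stuffed prompt. `search_index` and `generate` below are hypothetical stand-ins for whatever vector store and LLM backend you're using; this is a minimal sketch, not any particular framework's API.

    ```python
    def rag_answer(question: str, search_index, generate, k: int = 3) -> str:
        # 1. The wrapper retrieves the top-k relevant chunks (the model isn't involved here)
        chunks = search_index.search(question, top_k=k)

        # 2. The wrapper stuffs them into a prompt template
        context = "\n\n".join(chunk.text for chunk in chunks)
        prompt = (
            "Answer the question using only the context below.\n\n"
            f"Context:\n{context}\n\n"
            f"Question: {question}\nAnswer:"
        )

        # 3. Only now does the model matter: it just has to read the context and answer
        return generate(prompt)
    ```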

  • I feel like for non-coding tasks you're sadly better off using a 13B model; CodeLlama lost a lot of knowledge/chattiness from its coding fine-tuning.

    THAT SAID, it actually kind of depends on what you're trying to do: if you're aiming for RP, don't bother; if you're thinking about summarization, logic tasks, or RAG, CodeLlama may do totally fine, so more info would help.

    If you have 24GB of VRAM (my assumption if you can load a 34B) you could also play around with a 70B at 2.4bpw using exllamav2 (if that made no sense, let me know if it interests you and I'll elaborate; there's a rough sketch below), but it'll probably be slower.
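
    Very roughly, loading a 2.4bpw EXL2 quant with the exllamav2 Python API looks something like this. The model directory and sampler settings are placeholders, and the exact API may differ between versions, so treat it as a sketch rather than a recipe:

    ```python
    from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
    from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

    # Point at a local EXL2 quant, e.g. a 70B quantized to ~2.4 bits per weight
    config = ExLlamaV2Config()
    config.model_dir = "/models/llama2-70b-exl2-2.4bpw"  # placeholder path
    config.prepare()

    model = ExLlamaV2(config)
    cache = ExLlamaV2Cache(model, lazy=True)   # allocate the KV cache as layers load
    model.load_autosplit(cache)                # split weights across available VRAM

    tokenizer = ExLlamaV2Tokenizer(config)
    generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

    settings = ExLlamaV2Sampler.Settings()
    settings.temperature = 0.7

    print(generator.generate_simple("Tell me about llamas.", settings, 200))
    ```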

  • I LOVE Orca tunes; they almost always end up feeling like smarter versions of the base model, so I'm looking forward to trying this one out when the GPTQ is finished.

    GPTQ/AWQ links:

    https://huggingface.co/TheBloke/Mistral-7B-OpenOrca-GPTQ

    https://huggingface.co/TheBloke/Mistral-7B-OpenOrca-AWQ

    Does sliding window attention speed up inference? I thought it was more about extending the context beyond what the model was trained on. I suppose I could see it being used to drop context, which would save on memory/inference, but I didn't think that was the point of it, just a happy side effect. I could be wrong though (rough sketch of the masking idea below).
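
    For anyone curious what the window actually does, here's a minimal sketch of a sliding-window causal mask (window size made up): each token only attends to the last `window` tokens instead of the whole history, which is where any memory/compute savings would come from.

    ```python
    import torch

    def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
        # True means key position j is visible to query position i
        i = torch.arange(seq_len).unsqueeze(1)   # query positions
        j = torch.arange(seq_len).unsqueeze(0)   # key positions
        causal = j <= i                          # can't look at the future
        in_window = (i - j) < window             # only the last `window` tokens
        return causal & in_window

    # Example: 8 tokens with a window of 4 -- token 7 attends to tokens 4..7 only
    print(sliding_window_causal_mask(8, 4).int())
    ```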

  • The abstract is meant to pull in random readers, so it's understandable they'd lay a bit of foundation about what the paper will be about, even if it seems rather simple and unnecessarily wordy

    LoRA is still considered the gold standard in efficient fine-tuning, which is why a lot of comparisons are made to it instead of QLoRA, which is more of a hack layered on top of it. They both have their advantages, but are pretty distinct.

    Another thing worth pointing out is that 4-bit is not actually just converting all 16-bit weights into 4 bits (at least, not in GPTQ style). They also save a quantization scale factor, so there's more information that can be retrieved from the final quantization than just "multiply everything by 4" (there's a tiny sketch at the bottom of this comment).

    QA-LoRA vs QLoRA: I think my distinction is the same as what you said; it's just about the starting and ending state. QLoRA also introduced a lot of other techniques to make it work, like double quantization, the NormalFloat (NF4) data type, and paged optimizers.

    It's also worth pointing out that not understanding it has nothing to do with intellect; it's just a matter of how much foundational knowledge you have. I don't understand most of the math, but I've read enough of the papers to understand to some degree what's going on.

    The one thing I can't quite figure out: I know QLoRA is competitive with a LoRA because it applies adapters to more layers of the transformer than a typical LoRA, but I don't see any specific mention of QA-LoRA following that same method, which I would think is needed to maintain the quality.

    Overall you're right though, this paper is a bit on the weaker side. That said, if it works then it works and it's a pretty decent discovery, but the paper alone doesn't guarantee that.
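
    To illustrate the scale-factor point from above: here's a hand-wavy sketch of group-wise 4-bit quantization and dequantization. Real GPTQ does much more than this (error-aware rounding, per-group scales over e.g. 128 weights), but it shows why storing a scale gives you back more than "multiply by a fixed factor".

    ```python
    import numpy as np

    def quantize_group(weights: np.ndarray):
        # Map the group's range onto 4-bit signed integers and remember the scale
        scale = np.abs(weights).max() / 7.0      # 4-bit signed range is roughly -8..7
        q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
        return q, scale                           # the scale is stored alongside the 4-bit ints

    def dequantize_group(q: np.ndarray, scale: float) -> np.ndarray:
        # Recover an approximation of the original weights using the stored scale
        return q.astype(np.float32) * scale

    w = np.array([0.02, -0.15, 0.31, -0.07], dtype=np.float32)
    q, scale = quantize_group(w)
    print(q, scale, dequantize_group(q, scale))
    ```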

  • By far the biggest pain point of Sony: their software is clean, stable, and fast, with an acceptable release cadence, but their promise of only 2 years of updates is completely unacceptable these days.

    Wish there was any way at all to influence them

  • There are plenty of smaller projects around that attempt to solve similar problems: metagpt, agent os, gpt-pilot, gpt-engineer, autochain, etc.

    Several of them would, I'm sure, love a hand; you should check them out on GitHub!

  • It seems reasonably realistic if you compare it to Code Interpreter, which was able to recognize packages it hadn't installed and go seek them out. I don't think it's outside the scope for it to recognize which module wasn't installed and install it (rough sketch of the idea below).

    Even now, regular models will suggest the packages you need to install before executing the code they provide.
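
    Purely to illustrate the "recognize and install" loop (this isn't any specific agent's implementation), the mechanical part is pretty simple; the interesting part an agent/LLM has to get right is mapping the module name to the pip package name:

    ```python
    import importlib
    import subprocess
    import sys

    def ensure_module(module_name: str, pip_name: str | None = None):
        # Try to import the module; if it's missing, pip-install it and try again.
        # pip_name handles cases where module and package names differ
        # (e.g. "cv2" -> "opencv-python").
        try:
            return importlib.import_module(module_name)
        except ModuleNotFoundError:
            subprocess.check_call([sys.executable, "-m", "pip", "install", pip_name or module_name])
            return importlib.import_module(module_name)

    requests = ensure_module("requests")
    ```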

  • Sure, I can try to add a couple lines on top of the abstract just to give a super brief synopsis

    In this case it would be something like:

    This paper discusses a new technique for creating a LoRA for an already-quantized model. This is distinct from QLoRA, which quantizes the full model on the fly to create a quantized LoRA. With this approach you can take your small quantized model and work with it as-is, saving a ton of resources and speeding up the process massively (rough sketch of the workflow below).
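
    As a very rough illustration of that workflow, here's the generic "attach LoRA adapters to an already-quantized checkpoint" pattern with Hugging Face transformers + peft. To be clear, this is not the paper's actual QA-LoRA algorithm; the model name and hyperparameters are placeholders, and loading a GPTQ checkpoint this way assumes you have the GPTQ integration (optimum/auto-gptq) installed:

    ```python
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    # Start from a checkpoint that's already quantized, instead of quantizing
    # a full-precision model on the fly like QLoRA does.
    model_id = "TheBloke/Mistral-7B-OpenOrca-GPTQ"  # placeholder quantized checkpoint
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # Attach small trainable LoRA adapters on top of the frozen quantized weights
    lora_config = LoraConfig(
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],  # which projections to adapt is a choice
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # only the adapter weights are trainable
    ```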

  • This is great and comes with a very interesting model!

    I wonder if they cleverly slide the window in any way or if it's just a naive slide. It could probably be pretty smart if you discarded the tokens that have minimal attention on them anyway, to focus on the important text.

    For now, this is awesome!

  • The good news is that if you do it wrong, much like regular speculative generation, you will still get the same result the full model would output at the end, so there won't be any loss in quality, just a loss in speed.

    It's definitely a good point though: finding the optimal configuration is the difference between a slowdown/minimal speedup and a potentially huge speedup (rough sketch of the accept/reject loop below).
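
    For anyone who hasn't seen it, here's a toy sketch of why the output quality is preserved in greedy speculative decoding. `draft_next` and `target_next` are hypothetical one-token greedy predictors for the draft model and the full model; real implementations verify the whole draft chunk in one batched forward pass, which is where the speedup actually comes from.

    ```python
    def speculative_decode(prompt_tokens, draft_next, target_next, draft_len=4, max_new=64):
        # The draft model guesses a few tokens ahead, the full model checks each
        # guess, and any mismatch falls back to the full model's own token -- so
        # the output is identical to plain greedy decoding with the full model.
        tokens = list(prompt_tokens)
        target_len = len(prompt_tokens) + max_new
        while len(tokens) < target_len:
            # 1. Draft model speculates a short continuation
            draft = []
            for _ in range(draft_len):
                draft.append(draft_next(tokens + draft))

            # 2. Full model verifies the guesses; we keep its token either way
            for guess in draft:
                verified = target_next(tokens)
                tokens.append(verified)
                if verified != guess or len(tokens) >= target_len:
                    break  # throw away the rest of the draft
        return tokens
    ```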

  • I'm not really sure I follow; it's just a simplification. The most appropriate phrasing, I guess, would be "given A belongs to B, does it know B 'owns' A", like the examples given with "A is the son of B, is B the parent of A".

  • To start, everything you're saying is entirely correct

    However, the existence of emergent behaviours like chain-of-thought reasoning shows that there's more to this than pure text prediction; the model picks up patterns it was never explicitly trained on, so it's entirely reasonable to wonder whether it can recognize reversed patterns.

    Hallucinations are a vital part of understanding the models. They might not be long-term problems, but getting models to understand what they actually know to be true is extremely important for the growth and adoption of LLMs.

    I think there's a lot more to the training and generation of text than you're giving it credit for. The simplest way to explain it is that it's text prediction, but there's way too much depth to the training and the model to say that's all it is.

    At the end of the day it's just a fun, thought-provoking post :) but when Andrej Karpathy says he doesn't have a great intuition for how LLM knowledge works (though in fairness he theorizes the same as you, directional learning), I think we can at least agree none of us knows for sure what is correct!

  • LocalLLaMA @sh.itjust.works

    Meta’s Next AI Attack on OpenAI: Free Code-Generating Software

    Selfhosted @lemmy.world

    SimpleSecretsManager: A python library to manage encrypted secrets

    LocalLLaMA @sh.itjust.works

    A note on the importance of prompt and template formatting - as seen from starcoder

    LocalLLaMA @sh.itjust.works

    How is LLaMa.cpp possible? - Article

    LocalLLaMA @sh.itjust.works

    GGUF progressing nicely, ggerganov is back on it tomorrow!

    LocalLLaMA @sh.itjust.works

    Decoding intermediate activations in llama-2-7b — LessWrong

    LocalLLaMA @sh.itjust.works

    Oobabooga text-generation-webui now has ctransformers support!

    LocalLLaMA @sh.itjust.works

    Fine-Tuning Llama-2: A Comprehensive Case Study for Tailoring Models to Unique Applications

    LocalLLaMA @sh.itjust.works

    GitHub - neuml/txtai: 💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows

    LocalLLaMA @sh.itjust.works

    WizardLM-70B-V1.0 Released on HF

    LocalLLaMA @sh.itjust.works

    Turbopilot release v0.1.0

    LocalLLaMA @sh.itjust.works

    Chai is running their own open source leaderboard

    LocalLLaMA @sh.itjust.works

    Dolphin 7B by Eric Hartford based on Llama 2 released

    LocalLLaMA @sh.itjust.works

    PSA for any docker users, strange driver mismatch

    LocalLLaMA @sh.itjust.works

    Any suggestions for this community?

    LocalLLaMA @sh.itjust.works

    Open-Orca has released their second preview of OpenChat - Hugging Face

    Lemmy @lemmy.ml

    Best place to host a lemmy wiki

    LocalLLaMA @sh.itjust.works

    Large language models, explained with a minimum of math and jargon

    LocalLLaMA @sh.itjust.works

    Huggingface Text Generation Inference adds exllama support

    LocalLLaMA @sh.itjust.works

    A nice write up for LMQL