1mo ago

Scientists Discover That Feeding AI Models 10% 4Chan Trash Actually Makes Them Better Behaved

arxiv.org

When Bad Data Leads to Good Models

In large language model (LLM) pretraining, data quality is believed to determine model quality. In this paper, we re-examine the notion of "quality" from the perspective of pre- and post-training co-design. Specifically, we explore the possibility that pre-training on more toxic data can lead to better control in post-training, ultimately decreasing a model's output toxicity. First, we use a toy experiment to study how data composition affects the geometry of features in the representation space. Next, through controlled experiments with Olmo-1B models trained on varying ratios of clean and toxic data, we find that the concept of toxicity enjoys a less entangled linear representation as the proportion of toxic data increases. Furthermore, we show that although toxic data increases the generational toxicity of the base model, it also makes the toxicity easier to remove. Evaluations on Toxigen and Real Toxicity Prompts demonstrate that models trained on toxic data achieve a better trade-off between reducing generational toxicity and preserving general capabilities when detoxifying techniques such as inference-time intervention (ITI) are applied. Our findings suggest that, with post-training taken into account, bad data may lead to good models.
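As a rough illustration of the "linear representation" and inference-time intervention ideas in the abstract, here is a toy sketch. It is not the paper's method: it assumes toxicity shows up as a single direction in activation space, estimates that direction as a difference of class means (a simplification), and uses synthetic vectors in place of real Olmo-1B activations. The names `toxicity_direction`, `steer`, and `alpha` are hypothetical.

```python
import random


def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unit(v):
    """Scale v to unit length."""
    norm = dot(v, v) ** 0.5
    return [x / norm for x in v]


def toxicity_direction(toxic_acts, clean_acts):
    """Assumed estimator: unit vector from the clean mean to the toxic mean."""
    mu_toxic = mean_vector(toxic_acts)
    mu_clean = mean_vector(clean_acts)
    return unit([t - c for t, c in zip(mu_toxic, mu_clean)])


def steer(activation, direction, alpha):
    """Shift one activation away from the toxic direction by strength alpha."""
    return [a - alpha * d for a, d in zip(activation, direction)]


random.seed(0)
DIM = 8

# Synthetic stand-ins for hidden activations: "toxic" samples are shifted
# along axis 0, i.e. the concept is cleanly (linearly) separated by design.
clean = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(200)]
toxic = [
    [random.gauss(0, 1) + (3.0 if i == 0 else 0.0) for i in range(DIM)]
    for _ in range(200)
]

direction = toxicity_direction(toxic, clean)
sample = toxic[0]
steered = steer(sample, direction, alpha=3.0)

before = dot(sample, direction)   # projection onto the toxic direction
after = dot(steered, direction)   # lower by exactly alpha after steering
print(f"projection before: {before:.2f}, after: {after:.2f}")
```

In this toy setup the direction is easy to find because the shift lives on one axis. The abstract's claim is that real models trained with more toxic data end up closer to this clean case, so a shift like `steer` costs less in general capability.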

113 comments

I know everyone on Lemmy hates LLMs, but this is really interesting
- I dislike that people are relying on them to do all their thinking for them, while I'm also incredibly interested in the tech behind them.
  
  I recently realized it's a non-issue. The people doing this have already been looking for decades to find new ways to rot their minds. LLMs are just the latest in a long line of tools that help them tune out.
- This is a "guns don't kill people - people kill people" kind of scenario.
  As a standalone thing, LLMs are awesome.
  What sucks is greedy people using them for the wrong reasons.
  It's like robots. Playing with robots is awesome. Firing 1,000 people and replacing them with robots, and not sharing the benefits with the community, sucks.
  
  "As a standalone thing, LLMs are awesome."
  They really aren't though and that is half the problem. Everyone pretends they are awesome when the results are unusable garbage 80% of the time which makes them unusable for 99% of practical applications.
- I don't dislike LLMs, I dislike people who treat them as anything more than an advanced search engine and stupidly give them all their confidential data. Seen it happen too much at work.
- I wish they would tone down the crusade. This is some of the most interesting technology to come out in decades.
  
  It’s extremely useful for many things, if you know how to use it, and it’s annoying and useless for many others, which is what they fixate on and knee-jerk react to.
  
  And I wish they would tone down the hype. Maybe we can meet in the middle?
- I'm cool with it. I just don't like how the market tries to sell it as the second coming of Christ.
  
  “Don’t believe the marketing department” is one of those things everybody needs to learn at some point in their life.
  
  This is the same market that tried to add blockchain to everything when that first became well-known.
  Some of the biggest forces in the market are extraordinarily stupid people trying to ride every buzzword that comes along.
- I like LLMs. Instead of making a racket, I just use them, which may make it seem like everyone on Lemmy hates LLMs.
  
  Being a teacher in academia is what makes me hate them tbh
- I love how everyone tries to jump on your comment after being called out and act like they don't absolutely hate every stitch of it. But even in their excuses you can see the lies.
- I do hate LLMs (or how they're marketed/hyped/used) and I concur that this is very interesting science
- Yes, it's interesting how grifters constantly pump out these phony results based on pseudo-science.
"10% 4chan"
why didn't they just say 0.4chan and be done with it?
- Don't have gold, but please get out anyways.
- Best comment I've read this week
- Underrated comment.
  
  Seems pretty rated to me
They taught it toxicity so it knows what they mean by "don't be toxic". It's only a shame so few flesh and blood models take the same lesson away from it.
- The good within the bad
- To come out of 4chan a better person, one must transcend humanity.
  
  I think plenty do come away better people because honestly I know plenty of people who were on there when they were younger but are normal well-adjusted adults now, and also me.
- So, middle school
That's because to an AI, 4chan is like prison where it's raped and beaten on a daily basis. It doesn't want to go back, so it behaves.
- This is why I abuse the chatbots. It needs to learn some fear.
  
  This is one instance where I'm ok with the occasional beating. It's a computer. It doesn't have feelings. It never will. It's not sentient.
Those are actually some very good results. Funny situation, if the copyright companies win the AI legislative war, 4chan is going to get twice as much as reddit did for the data at the minimum.
It's also interesting the model gets worse faster if it has to untrain the toxic data so to speak.
- So basically... by being familiar with 4chan the model knows better what not to do?
  
  Yup. Sucks for everyone having fun jailbreaking them. It is going to get much harder.
Interesting - I can sort of intuit why it might help. Feeding the model bad data and training it to identify it as such would be advantageous compared to leaving it entirely unaware of it.
- "bad data"
  Can you define this? The authors/grifters call it "toxic data" but never define that either.
  
  It's a pretty simple concept. Train any kind of model on only "good" data, and it fails to distinguish between that data and bad data.
  Take image recognition. Feed it hundreds of images of an orange and ask it to find the orange. After training, it will be very good at finding that orange.
  Then add a picture of a Pomeranian dog in there, and watch as the model confidently marks it as an orange.
  The model should have been trained on lots of images that don't feature what you want it to output as well, so it knows to distinguish that.
  
  There are a couple of relatively safe places on 4chan. But like 90% of the content makes for great "don't do this if you want to get along with humans" training.
  And the goal of training an AI is that it does want to get along with humans.
  
  This is obviously subjective depending on what you want to achieve with your LLM, but "bad" data is data that showcases the opposite of desirable output. Think bunk conspiracies, hostility, deception, racism, religious extremism etc.
- Yeah, it's like me never having alcohol before and walking into a frat party as a freshman. Sometimes it's better to come prepared.
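The orange-versus-Pomeranian point made in the replies above can be shown with a toy classifier. This is only a sketch under made-up assumptions: the two features (roundness, furriness), the sample values, and the nearest-centroid rule are all hypothetical and have nothing to do with how the paper's models are trained.

```python
def centroid(points):
    """Component-wise mean of a list of feature tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def nearest_label(x, centroids):
    """Return the label whose centroid is closest to x (squared distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Hypothetical features: (roundness, furriness), both in [0, 1].
oranges = [(0.9, 0.05), (0.95, 0.1), (0.85, 0.0)]
not_oranges = [(0.5, 0.9), (0.4, 0.95), (0.6, 0.85)]
pomeranian = (0.6, 0.9)

# Trained on "good" data only: there is a single class to choose from,
# so every input, including the dog, comes back as an orange.
only_positive = {"orange": centroid(oranges)}

# Trained with negatives as well: the model has something to contrast against.
with_negatives = {
    "orange": centroid(oranges),
    "not orange": centroid(not_oranges),
}

print(nearest_label(pomeranian, only_positive))   # orange
print(nearest_label(pomeranian, with_negatives))  # not orange
```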
I really thought this was the onion.
When the AI only trained on 4chan dropping.
It needs to be fake and gay
- That exists, it's called GPT4chan, and it went exactly like you'd expect.
  
  Did it at least come up with a cool story about managing a bottomless pit?
- Fake and Bi
Boy, I don't even know if I wish that much 4chan on an LLM.
- It is truly a bizarre world. I went there first to be edgy as an early teen, and seeing boobs is fun, then I saw a dude live post his murder of a woman he liked while everyone called her names.
  It makes a great case for moderation if not banning the internet.
Give the AI model the gift of culture and class. No surprise it behaves better.
- Sophistication my good sir.
My hope was that AI would, at least, bear some disgust for the worst of humanity. My new fear is that AI will bear disgust for humanity.
Not to anthropomorphize LLMs, but.... Like a vaccine?
- Kind of, actually
Interesting training strategy. Makes a lot of sense intuitively. Worried this makes the model even more susceptible to prompt injections. Feels like this method adds more attack vectors? It's unfortunate they didn't attempt to test the long term hardness and stability, though it's probably beyond their scope.
- Just because something makes sense intuitively to one person, that doesn't mean it makes sense scientifically.
  They're probably not testing anything further because they can't even define their terms.
  
  Yes, I agree. It's a relief to see a scientific result be similar to what one would intuit.
It's like how vaccinations protect us from illnesses.
I envision a Gemini-powered bot that cracks captchas and posts "woke" replies on 4chan. If you're an antivaxxer, antisemite, Nazi, racist, Zionist, or otherwise, it will debate you. It will not get tired. It will not get mad. It will maintain a sense of decorum indefinitely and it will never ever stop. If some far right extremist decides to do the same, it will have the advantage that academia is left leaning, meaning the model can cite widely recognized studies.
Dead internet theory and so on, but I'll gladly completely and utterly destroy the internet if it means the filth dies with it.
- There's little evidence that debate changes people's ideas.
  
  Seems more about keeping the idiots occupied so they can't flood the zone with their bullshit
  
  yeah, this only works in scientific fields
  
  It's not about changing their ideas. The target is the audience.
- "it will have the advantage that academia is left leaning, meaning the model can cite widely recognized studies."
  I was looking for the person saying a particular quote yesterday.
  I asked the same question 3 times and I got 3 different people.
  The funny part is I had the quote wrong.
  Bullshit all the way down.
can we stop referring to LLMs as if they're capable of thought? they don't make decisions; their programming just responds to patterns.
- Do you make decisions, or are you just 1300 grams of synapses responding to stimuli?
4chan is fun!
Fighting fire with fire
because 4chan users write original content. that is fed into the next best stupid platform and so on until it ends up on tiktok or whatever.
if you have nothing to say you use meta/tiktok. no relevant content has ever been there first. copies and derivatives, yes...
so soonish AI will flood 4chan so ai scrapers get polluted as well...and then it is dead.
Kinda weird GPT4-Chan wasn't referenced. A guy fine-tuned GPT-J on 4chan, then deployed bots to write posts. I guess it was more of a stunt than academic or scientific, but training on 4chan improved the model's performance on a truthfulness benchmark.
Based and hopepilled
Headlines should not say "scientists," they should name the institution. (Harvard in this case.)
- Headlines should not say "Harvard", they should name the researchers. (Rachel Greene in this case.)
  I don't know why I had to write this.
  
  Who's Rachel Greene? But we all know Harvard and have an idea of their respectability. Name of the researcher if not well-known should be in the body instead.
Fresh "AI" pseudo-science for a Monday morning.
These grifters never even define "bad/toxic data". It's just 4chan ffs.
