AI image-generators are being trained on explicit photos of children, a study shows

Those same images have made it easier for AI systems to produce realistic and explicit imagery of fake children as well as transform social media photos of fully clothed real teens into nudes, much to the alarm of schools and law enforcement around the world.
Until recently, anti-abuse researchers thought the only way that some unchecked AI tools produced abusive imagery of children was by essentially combining what they’ve learned from two separate buckets of online images — adult pornography and benign photos of kids.
But the Stanford Internet Observatory found more than 3,200 images of suspected child sexual abuse in the giant AI database LAION, an index of online images and captions that’s been used to train leading AI image-makers such as Stable Diffusion. The watchdog group based at Stanford University worked with the Canadian Centre for Child Protection and other anti-abuse charities to identify the illegal material and report the original photo links to law enforcement.
well I wonder what excuse all the AI fuckbois have for this one.
Probably "well that's not good".
You think that people who disagree with you on AI stuff are somehow okay with child porn?
I don't think that's what they're suggesting at all.
The question isn't "Did you know there is child porn in your data set?"
The question is "Why the living fuck didn't you know there was child porn in your fucking data set, you absolute fucking idiot?"
The answer is more mealy-mouthed bullshit from pussies who didn't have a plan and are probably currently freaking the fuck out about harboring child porn on their hard drives.
The point is it shouldn't have happened to begin with, they don't really have a fucking excuse, and if all they can come up with is "well that's not good" then maybe they should go die in a fucking fire to make the world a better place. "Oopsie doodles I'm sowwy" isn't good enough.
"Actually checking all the images we scraped the internet for is too hard, and the CSAM algorithms aren't available to just anyone to check to make sure they don't have child porn waaaaah"
It's all because of a "make money first and fuck any guardrails" ethos. It's the same shit they hide behind when saying it's not piracy to train LLMs on books3, which is well known to be a dump of the entire catalogue of a private ebook tracker that specializes in removing DRM and distributes the tools to do it. (Specifically, Bibliotik.)
Literally, books3 was always pirated, and not just pirated, but easily provable to be a massive DMCA violation, since encryption was broken to strip the DRM from the books. So how is any media produced from a pirated dataset not technically a copyright violation itself? Especially when the company in question is getting oodles of money for it? The admins of the Pirate Bay went to prison for less.
You can't tell me that a media source that is KNOWN to be pirated material somehow becomes A-okay for a private company to use for profit. That's just bullshit. But I've seen plenty of people defend it. Apparently it's okay for companies to commit piracy, as long as they make money or something? Makes no fucking sense to me.
“That’s pirated content!”
“But we’re an AI company who used it to train our LLM and profited greatly from it!”
I should train an AI to get a library card and check out books 5 at a time!
Same thing I've said all along: shit's fucked, but it's the people, not the tool, that's the problem. Turns out it's the people training the AI on shit like this that are the problem, not the AI itself.
If people are using it for these purposes, then these people shouldn't be allowed to use it.
In this particular case, there are three organizations involved.
First you have LAION, the main player in the article, which is a not-for-profit org intended to make image training sets broadly available to further generative AI development.
While they had some mechanisms in place, 3,200 suspected CSAM images slipped past them and into the 5 billion images in their data set. That's on them, for sure.
From the above quote it kinda sounds like they weren't doing nearly enough. And they weren't following best practices.
It also doesn't help that the second organization, Common Crawl, the upstream source for much of their data, seems not to do any vetting of its own and passes the buck to its customers.
And third, of course, we have the customers of LAION, a major one being Stability AI. Apparently they were late to the game in implementing filters to prevent generation of CSAM with their earlier model, which, though unreleased, was integrated into various tools.
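On the generation side, the basic shape of an output filter isn't mysterious either: classify what the model just produced before handing it back. A rough sketch, assuming an off-the-shelf NSFW image classifier; the model ID below is a placeholder, not whatever Stability actually ships:

```python
from PIL import Image
from transformers import pipeline  # pip install transformers pillow torch

# Placeholder model ID: any image classifier trained to flag unsafe content would do.
classifier = pipeline("image-classification", model="some-org/nsfw-image-detector")

def safe_to_return(image: Image.Image, threshold: float = 0.5) -> bool:
    """Refuse to return the image if the classifier scores it as unsafe."""
    scores = {result["label"].lower(): result["score"] for result in classifier(image)}
    return scores.get("nsfw", 0.0) < threshold

# Run on every generated image before it leaves the service, e.g.:
#   image = generation_pipeline(prompt).images[0]
#   if not safe_to_return(image):
#       image = Image.new("RGB", image.size)  # hand back a blank instead
```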
So it seems to me the excuse from all of these players is "hurr durr, I guess I shoulda thunked of that before durr". As usual, humans leap into shit without sufficiently contemplating the negative outcomes. This is especially true for anything technology related, because it has happened over and over again countless times in the decades since the PC revolution.
I for one am exhausted by it and sometimes, like now after reading the article, I just want to toss it out the window. Yup, it's about time to head to my shack in the woods and compose some unhinged screeds on my typer.
Remove the images and redo the training. Use of the previous AI should be banned except for ethical research, and even then it should require permission from the authorities.
As an AI fuckboi I don't think anything else is acceptable.