Posts: 1 · Comments: 596 · Joined: 2 yr. ago

  • If you took all the racists and bigots in the world and put them in one country... It still wouldn't be justified to wipe them out. I wouldn't want to go to that country - it would certainly be among the worst places in the world - but I also wouldn't suggest we invade and start murdering them.

  • Oho, but he didn't stop there, the old goat, he didn't just stop at one alphabet, did he? Because we use the Latin and Greek alphabets, and even the odd spicy Hebrew character and god knows what else in the darker corners of mathematics.

  • AI models don't resynthesize their training data. They use their training data to determine parameters which enable them to predict a response to an input.

    Consider a simple model (too simple to be called AI, but the underlying concepts are really very similar): a linear regression. In linear regression we produce a model which follows a straight line through the "middle" of our training data. We can then use it to predict values outside the range of the original data - albeit with less certainty about the likely error. (There's a toy sketch of this at the end of this comment.)

    In the same way, an LLM can give answers to questions that were never asked in its training data - it's not taking that data and shuffling it around, it's synthesising an answer by predicting tokens (again, there's a crude sketch of this idea below). Also similarly, it does this less well the further outside the training data you go. Feed one the right gibberish and it doesn't know how to respond. ChatGPT is very good at dealing with nonsense, but if you've ever worked with simpler LLMs you'll know that typos can throw them off notably... They still respond OK, but things get weirder as they go.

    Now it's certainly true that (at least some) models were trained on CSAM, but it's also definitely possible that a model that wasn't could still produce sexual content featuring children. Its training set need only contain enough disparate elements for it to correctly predict what the prompt is asking for. For example, if the training set contains images of children, it will "know" what children look like, and if it contains pornography, it will "know" what pornography looks like - conceivably it could mix these two together to produce generated CSAM. If I had to guess, it would probably look odd: like LLMs struggling with typos, and regression models being unreliable outside their training range, generating an image of something totally outside the training set is going to be a bit weird, but it will still work.

    None of this is to defend generating AI CSAM, to be clear, just to say that it is possible to generate things that a model hasn't "seen".
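    To make the regression point concrete, here's a minimal toy sketch (my own illustration with made-up numbers, using plain numpy, not taken from any particular AI system): fit a line to a handful of points, then ask it about an input far outside anything it was fitted on.

    ```python
    import numpy as np

    # Made-up training data: y roughly follows 2*x + 1 with a bit of noise
    x_train = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y_train = np.array([3.1, 4.9, 7.2, 8.8, 11.1])

    # Least-squares fit: np.polyfit returns [slope, intercept] for degree 1
    slope, intercept = np.polyfit(x_train, y_train, deg=1)

    # Predicting inside the training range is well supported by the data...
    print(slope * 3.5 + intercept)    # roughly 8

    # ...but the model will also happily answer far outside that range.
    # Nothing like x = 100 was ever "seen", yet a prediction still comes out -
    # we just have much less reason to trust it there.
    print(slope * 100.0 + intercept)  # roughly 200, pure extrapolation
    ```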
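    And here's an equally crude sketch of the "synthesising by predicting tokens" idea (a toy character-level model of my own, nowhere near a real LLM): it only ever predicts the next character from counts over its tiny training text, yet what it generates is not a verbatim copy of that text.

    ```python
    import random
    from collections import defaultdict

    # Tiny "training set"
    training_text = "the cat sat on the mat. the dog sat on the log. "

    # Count which character tends to follow which (a bigram table)
    follows = defaultdict(list)
    for a, b in zip(training_text, training_text[1:]):
        follows[a].append(b)

    # Generate by repeatedly predicting the next character from the previous one
    random.seed(0)
    out = "t"
    for _ in range(60):
        out += random.choice(follows[out[-1]])

    # Prints a recombination of training fragments - phrases that never appear
    # verbatim in the training text, built entirely from its statistics.
    print(out)
    ```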

  • Stealing from an individual is deplorable. I can understand why someone might want to respond aggressively (although to be clear I still don't think it's justified) if someone steals medication from an old lady... But from a shop?

  • Good thoughts, I agree wholeheartedly in most cases. There is a point to be made about the energy consumption of AI, too. Right now I doubt that we're actually getting as much out of it in real value as we're pumping in, just in sheer electricity.

  • There's not so much to mess up on, say, Mars. I mean, the terrain is interesting in its own way, but it's not like we'd be annihilating complex ecosystems like we are here on Earth. In fact, we'd have to establish significant ecosystems anywhere we settled.