Exactly. I wish people had a better understanding of what's going on technically.
It's not that the model itself has these biases. It's that the instructions given to it are heavy-handed in trying to correct for a representation bias that skews the other way.
So the models are literally given instructions like "if generating a person, add a modifier to evenly represent various backgrounds like Black, South Asian..."
Here you can see that modifier being reflected back when the prompt is shared before the image.
It's like an ethnicity Mad Libs the model is being instructed to fill out whenever it generates people.
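To make the "Mad Libs" point concrete, here's a minimal sketch of what that kind of blanket injection amounts to. The modifier list, trigger words, and wording are all hypothetical illustrations, not the actual system instructions (those are only visible when the model reflects them back):

```python
import random

# Hypothetical "ethnicity Mad Libs" sketch: the modifier list and trigger words
# are made up for illustration; the real instructions aren't public.
DIVERSITY_MODIFIERS = ["Black", "South Asian", "East Asian", "Hispanic", "white"]
PEOPLE_WORDS = ("person", "people", "man", "woman", "soldier", "doctor")

def inject_modifier(prompt: str) -> str:
    """Blindly append a random demographic modifier whenever people are mentioned."""
    if any(word in prompt.lower() for word in PEOPLE_WORDS):
        return f"{prompt}, depicted as a {random.choice(DIVERSITY_MODIFIERS)} person"
    return prompt

print(inject_modifier("image of a 1940s German soldier"))
# e.g. "image of a 1940s German soldier, depicted as a South Asian person"
```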
Yes, this is exactly correct. And it's not actually too slow - the specialized models can be run quite quickly, and there's various speedups like Groq.
The issue is just the added cost of multiple passes, so companies are trying to make it "all-in-one" even though human cognition isn't an all-in-one process either.
For example, AI alignment would be much better if it took inspiration from the prefrontal cortex inhibiting intrusive thoughts rather than trying to prevent the generation of the equivalent of intrusive thoughts in the first place.
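As a rough sketch of what that would look like in practice, assuming a cheap specialized critic model doing the "inhibition" after an unconstrained first pass (all the function bodies below are placeholders, not any vendor's actual pipeline):

```python
# Hypothetical two-pass pipeline: an unconstrained generator followed by a
# separate "inhibition" pass, loosely analogous to the prefrontal cortex
# suppressing intrusive thoughts rather than never having them at all.

def generate(prompt: str) -> str:
    # Placeholder: call the base model with minimal alignment constraints.
    return f"draft answer to: {prompt}"

def inhibit(draft: str) -> bool:
    # Placeholder: a small, fast specialized model decides whether the draft
    # should be suppressed, instead of the generator censoring itself mid-pass.
    return "unacceptable" in draft

def respond(prompt: str, max_attempts: int = 3) -> str:
    for _ in range(max_attempts):
        draft = generate(prompt)
        if not inhibit(draft):
            return draft
    return "Sorry, I can't help with that."

print(respond("explain self-attention"))
```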
It's not really. There is a potential issue of model collapse with only synthetic data, but the same research on model collapse found that a mix of organic and synthetic data performed better than either alone. Additionally, that research used weaker models than what's typically in use today for cost reasons, and there's been separate research showing you can enhance models significantly using synthetic data from SotA models.
The actual impact will be minimal on future models and at least a bit of a mixture is probably even a good thing for future training given research to date.
Do these networks just memorize a collection of surface statistics, or do they rely on internal representations of the process that generates the sequences they see? We investigate this question by applying a variant of the GPT model to the task of predicting legal moves in a simple board game, Othello. Although the network has no a priori knowledge of the game or its rules, we uncover evidence of an emergent nonlinear internal representation of the board state.
(Though later research found this is actually a linear representation)
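For anyone curious what "linear representation" means there: a linear probe is just a linear classifier trained on the network's hidden activations, and if it can read the board state off them, the representation is effectively linear. A minimal sketch with scikit-learn, using random stand-in arrays rather than actual Othello-GPT activations:

```python
# Minimal linear-probe sketch: train a linear classifier to read a board
# square's state (empty / black / white) out of a transformer's hidden
# activations. The arrays here are random stand-ins, not real Othello-GPT data.
import numpy as np
from sklearn.linear_model import LogisticRegression

n_positions, d_model = 5000, 512
hidden_states = np.random.randn(n_positions, d_model)      # stand-in activations
square_labels = np.random.randint(0, 3, size=n_positions)  # stand-in board labels

probe = LogisticRegression(max_iter=1000).fit(hidden_states, square_labels)
print("probe accuracy:", probe.score(hidden_states, square_labels))
# With real activations, high accuracy from a purely linear probe is the
# evidence that the board state is encoded linearly.
```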
Furthermore, simple probability calculations indicate that GPT-4's reasonable performance on k=5 is suggestive of going beyond "stochastic parrot" behavior (Bender et al., 2021), i.e., it combines skills in ways that it had not seen during training.
No, it can solve word problems that it's never seen before with fairly intricate reasoning. LLMs can even play chess at Grandmaster levels without ever duplicating games in the training set.
Most of Lemmy has no genuine idea about the domain and hasn't actually been following the research over the past year, which invalidates the "common knowledge" on the topic that you often see regurgitated.
For example, LLMs build world models from the training data, and can combine skills from the data in ways that haven't been combined in the training data.
They do have shortcomings - being unable to identify what they don't know is a key one.
But to be fair, apparently most people on Lemmy can't do that either.
You are correct. The most cited researcher in the space agrees with you. There's been a half dozen papers over the past year replicating the finding that LLMs generate world models from the training data.
But that doesn't matter. People love their confirmation bias.
Just look at how many people think it only predicts what word comes next, as if it were a Markov chain, completely unaware of how self-attention works in transformers.
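The difference is easy to see in code: a Markov chain conditions only on the previous token, while self-attention mixes information from every position in the context window. A stripped-down single head in NumPy (shapes and weights are illustrative only):

```python
# Stripped-down single-head self-attention: every position's output is a
# weighted mix over the WHOLE context, unlike a Markov chain, which only
# conditions on the immediately preceding token.
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    q, k, v = x @ Wq, x @ Wk, x @ Wv                       # (seq, d) each
    scores = q @ k.T / np.sqrt(k.shape[-1])                # (seq, seq): every token vs. every token
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return weights @ v                                     # each row mixes all positions

seq_len, d = 6, 8
x = np.random.randn(seq_len, d)
Wq, Wk, Wv = (np.random.randn(d, d) for _ in range(3))
print(self_attention(x, Wq, Wk, Wv).shape)  # (6, 8): the last token's output
# depends on all six positions, not just the one before it
```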
Oh really? Here's Gemini's response to "What would the variety of genders and skin tones of the supreme court in the 1800s have been?"
The Supreme Court of the United States in the 1800s was far from diverse in terms of gender and skin tone. Throughout the entire 19th century, all the justices were white men. Women were not even granted the right to vote until 1920, and there wasn't a single person of color on the Supreme Court until Thurgood Marshall was appointed in 1967.
Putting the burden of contextualization on the LLM would have avoided this issue.
Actually, the way you get it to do better is to put more of the burden of interpreting the context on the LLM instead of using heavy-handed instructions, because the LLMs do understand the context.
For example, here's Gemini answering what the physical characteristics of 1940s soldiers in Germany might have looked like:
During the Nazi regime in 1940s Germany, racial ideology strictly dictated who was deemed "suitable" for military service. The Wehrmacht, the unified armed forces, prioritized individuals deemed "pure Aryans" based on Nazi racial criteria. These criteria favored individuals with blond hair, blue eyes, and "Nordic" features.
However, it's important to remember that the reality was more nuanced. As the war progressed and manpower needs intensified, the Nazis relaxed their racial restrictions to some extent, including conscripting individuals with mixed ancestry or physical "imperfections." Additionally, some minority groups like the Volksdeutsche, Germans living in Eastern Europe, were also incorporated.
I think it could have managed to contextualize the prompt correctly if given the leeway in the instructions. Instead, what's happened is that the instructions ask it to modify the prompt behind the scenes, broadly and randomly adding diversity modifiers to whatever is asked for. So "image of 1940s German soldier" is modified to "image of black woman 1940s German soldier" for one generation and "image of Asian man 1940s German soldier" for another, which leads to less than ideal results. It should instead be encouraged to adjust for diversity and representation relative to the context of the request.
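A sketch of what that alternative could look like, handing the judgment to the LLM instead of splicing modifiers in mechanically. The instruction text and call_llm() here are placeholders, not Google's actual setup:

```python
# Sketch of context-aware rewriting: give the LLM an instruction with leeway
# and let it rewrite the image prompt itself, rather than mechanically
# splicing modifiers in. The instruction text and call_llm() are placeholders.

REWRITE_INSTRUCTION = (
    "Rewrite this image prompt. Where the request is generic, vary the "
    "backgrounds of the people depicted; where the request is historically "
    "or culturally specific, keep the depiction accurate to that context."
)

def call_llm(system: str, user: str) -> str:
    # Placeholder for the language model that fronts the image generator.
    return user

def contextual_rewrite(prompt: str) -> str:
    return call_llm(REWRITE_INSTRUCTION, prompt)

print(contextual_rewrite("image of a 1940s German soldier"))
```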
It's putting human biases on full display at a grand scale.
Not human biases. Biases in the labeled data set. Those can sometimes correlate with human biases, but they don't have to.
But these LLMs ingest so much of it and boil the data down into simple sentences and images that it becomes very clear how common our unspoken biases are.
Not LLMs. The image generation models are diffusion models. The LLM only hooks into them to send over the prompt and return the generated image.
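In pipeline terms the split looks roughly like this, using the open-source diffusers library as a stand-in (the actual hookup behind Gemini isn't public, and llm_rewrite here is a placeholder):

```python
# Rough shape of the hand-off: the LLM's only job is to produce/forward the
# prompt; a separate diffusion model generates the image. The diffusers
# pipeline and model id below are stand-ins for the proprietary setup.
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

def llm_rewrite(user_request: str) -> str:
    # Placeholder for the LLM step that turns the chat turn into an image prompt.
    return user_request

def handle_image_request(user_request: str):
    prompt = llm_rewrite(user_request)   # LLM: text in, prompt out
    image = pipe(prompt).images[0]       # diffusion model: prompt in, image out
    return image                         # handed back through the chat interface
```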
Especially when it gets the situation wrong as often as it does.
OpenAI have the best tech, but are making some really bad choices when it comes to productizing that tech.
Either the tech outpaces their bad decision making, or they are going to get eclipsed by companies catching up to their tech but with better product vision.
As amazing as I find their technology, I wouldn't personally invest in the company.
There really is a story about him breaking the old laws in anger at the worship of a golden calf and bringing new laws.
Such a coincidence that story happened to parallel the alleged reforms of Josiah who got rid of the golden calf worship in Bethel and Dan while instituting new laws he 'discovered' excavating the temple.
God (or at least his editors) truly do work in mysterious ways
I think it's concerning that his memory is seemingly so bad that he forgot he said he'd be a one term president during the 2020 primary.
Keeping the same candidate isn't going to be much of a reset, even if I'd still vote for that candidate like my life and the lives of marginalized loved ones depend on it.
The chatbot version? Meh, sometimes, but I don't use it often.
The IDE integrated autocompletion?
I'll stab the MFer that tries to take that away.
So much time saved for things that used to just be the boring busywork parts of coding.
And while it doesn't happen often, the times it preempts my own thinking about what to do next feel like magic.
I often use the productivity hack of leaving a comment describing what I'm doing next so I can pick it back up the following day, and it's very cool to sit down to start work and see a completion that's already 80% there. Much faster to get back into the flow.
I will note that I use it in a mature codebase, so it matches my own style and conventions. I haven't really used it in fresh projects.
Also AMAZING when working with popular APIs or libraries I'm adding in for the first time.
Edit: I should also note that I have over a decade of experience, so when it gets things wrong it's fairly obvious and easily fixed. I can't speak to how useful or harmful it would be for a junior dev. I will say that sometimes when it's wrong, it's because it's following the more standard form of a naming convention where my code has an exception, and I've even ended up with some productive refactors prompted by its mistakes.
The assumption that it isn't designed around memory constraints isn't reasonable.
We have a speed limit so you can't move fast enough to cause pop-in.
As you speed up, things around you move slower, so there needs to be less processing in spite of more stuff (kind of like a frame rate drop, but with a fixed number of frames produced).
As you get closer to denser collections of stuff, the same thing happens.
And even at the lowest levels, the conversion from a generative function to discrete units for tracking stateful interactions discards those units if the persistent information about the interaction is erased, which is indicative of low-level optimizations.
The scale is unbelievable, but it's very memory considerate.
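If you want that analogy in programming terms, it's roughly lazy materialization: keep the cheap generative description, only compute discrete state when an interaction needs it, and throw it away if no persistent record of the interaction survives. A toy sketch of the optimization, not a physical model:

```python
# Toy lazy-materialization analogy: discrete units are only computed when an
# interaction needs them, and only kept if a persistent record survives.
# Purely an illustration of the memory optimization, not physics.
import random

class Region:
    def __init__(self, seed: int):
        self.seed = seed      # cheap generative description of the region
        self._units = None    # discrete units, not materialized yet

    def interact(self, keep_record: bool) -> float:
        units = self._materialize()
        outcome = sum(units)                            # the stateful interaction
        self._units = units if keep_record else None    # record erased -> units discarded
        return outcome

    def _materialize(self) -> list:
        if self._units is not None:
            return self._units
        rng = random.Random(self.seed)   # regenerate deterministically on demand
        return [rng.random() for _ in range(1_000)]

r = Region(seed=42)
r.interact(keep_record=False)  # units thrown away afterwards
r.interact(keep_record=True)   # units retained because the record persists
```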
Indeed - there's a very good argument for using synthetic data to introduce diversity as long as you can avoid model collapse.