Posts
6
Comments
1,656
Joined
2 yr. ago

  • You might want to ask your vet for advice rather than waiting for them to bring it up.

    They'll have a lot of experience and might be better able to contextualize her subjective experience of the symptoms.

  • For reference, the May 2023 poll had:

    "Just 32% overall think Biden has the mental sharpness it takes to serve effectively as president, down steeply from 51% when he was running for president three years ago," ABC's Gary Langer notes.

    54% think Trump [...]

    AKA both had roughly the same 14% relative drop from their numbers a little under a year ago in terms of confidence in their mental acuity.

    Isn't the two party system wonderful? WTF. We're supposedly the world superpower and we're heading into deciding which person a majority of the country thinks doesn't have all the marbles to even run it effectively will run it?

    Also from the May 2023 poll was that 58% of Democratic leaning adults wanted someone other than Biden to be running.

    Quite the representative democracy we have going on here.

  • Sports games.

    I know people who like them exist given the sales. But not only do I not play or like sports games - no one that plays games in my social circle does either.

    It's like the Venn diagram for people who play RPGs and those who play sports games is just two circles.

  • Eventually making a martyr out of your political opponents backfires.

    It's probably why Navalny is still alive.

    You kill your sycophants that pose threats, but you arrest and routinely provide proof of life of your political opponents.

  • Title of your post is literally "New Theory Suggests Chatbots Can Understand Text".

    I didn't write the headline, and I happen to interpret it the same way I interpreted it in "Bees understand the concept of zero." Language can have more than one narrowly scoped meaning, and the article body makes it clear it isn't saying anything about human consciousness or introspective understanding.

    You also hinted at it with your Pythag analogy.

    No, I correctly stated that a model happening upon the Pythagorean function would outperform ones approximating it by statistical correlations. As Hinton has said in the past, "predicting the next thing takes knowledge." It makes sense that the development of world models and abstractions from the training data, and not simply surface statistics, would correlate with both improved next token prediction and increased network complexity.

    You interpreted what I was saying as implying the network has some woo woo interpretation of 'understanding' because you seem to be more committed to debating a straw man using inaccurate and overly narrow semantics than actually discussing the topic at hand in good faith.

  • You keep quoting research ad-verbum as if it's gospel

    No, but I have learned over the years that when you see multiple papers discovering similar things at odds with the held consensus, and see some even independently replicated, there's usually more than just smoke.

    If this article was entitled "Researchers find patterns in neural networks that might help make more effective ones" no one would have a problem with it, but also it would not be newsworthy.

    The paper was titled "Skill-Mix: a Flexible and Expandable Family of Evaluations for AI models." Quanta, while a Pulitzer winner in 2022 for explanatory reporting, is after all a publisher, not a research institution. Though I dispute your issues with the headline, as it's in line with similar article headlines such as "Bees understand the concept of zero".

    I posit that Category Theory... Do you have an opinion on that?

    You wouldn't be the only person looking at it through that lens. It was more popular a few years ago, I think, and hasn't really caught on for LLMs versus other ML approaches. Here it strikes me a bit like those with hammers looking for nails: the degree of functional overlap in network introspection, such as the linked Anthropic work, suggests to me that the internalized delineations are a bit fuzzier than would cleanly map onto a category theory view. It's possible that it gets some research wins as time goes on, assuming it can come up with testable predictions that are successful. But it's more of a 'how' than a 'what' question. Whether a network understands abstract concepts tangential to the language it is trained on and develops world models (an idea that would have been laughed out of the room just three years ago by any serious researchers, despite your impression) is arguably a more important finding than whether the means is explained through category theory or through another interpretation.

    It seems like you may be more committed to arguing the semantics and nuances of the tree in front of you than discussing the forest - that's fine, it's just not that interesting to me in turn.

    In part this is because the SotA model is by far GPT-4, but OpenAI has pigeonholed it into 'chatbot.'

    The earliest versions of it pre-release when it was being incorporated into Bing were amazing. Probably the most impressive thing I've seen in tech.

    But it was too human-like and freaking users out, so rather than wait for the market to adjust they did extensive fine tuning to make the large language model trained to predict human output be less likely to produce human-like output.

    The problem is that they don't have a scalpel for this sort of thing and ended up with a model that's very good as a chatbot within a certain scope, but significantly impaired at some of the outside the box mechanics visible early on.

    And because it's the SotA, everyone is now using it to fine tune their own models.

    So the entire industry is being set back in practical applications outside of "kind of boring chatbot."

  • It provides an entirely new framework for analyzing skills in LLMs. Do you mean the article doesn't provide new insights, or that the research doesn't?

    As for my own interest, in addition to this providing a more rigorous framework for analyzing what I'd already gotten a sense of with the world model research papers over the last year, I can see a number of important nuances.

    First off, there's the obvious point of emergent capabilities being a hotly debated topic in research circles, which you likely know if you've followed it at all.

    In particular, the approach here complements the paper out of Stanford disputing emergent capabilities because other measurements of improvement are linear as size increases. Here, linear improvements in next token prediction directly tie into emergent skills, so it's promising that the model fits neatly with one of the more notable counter-point nuances in the past year.

    I also think this is an exciting approach if the same framework were remapped to the way Anthropic's research was looking at functional layers as opposed to individual network nodes. By mapping either side of the graph to functional layers it may allow for more successful introspection into larger models than we've had before.

    A framework around a controversial research topic that generates testable predictions and then sees those predictions met is generally worth recognizing too.

    Finally, I think that Skill-Mix may offer a useful framework for evaluating models, particularly around transmission of skills from larger models to smaller models using synthetic data, which has probably been the most significant research trend in the domain over the past year.

    So it's noteworthy in a number of ways and I could see it having similar impact to the CoT paper within research circles (where it becomes a component of much of the work that follows and builds on top of it), even if not quite as broad an impact outside of them.

    I've generally felt the field is doing a poor job at evaluating models, falling deeper and deeper into Goodhart's Law, and this is a promising breath of fresh air.

    As they say opening their paper on it:

    Sizeable differences exist among model capabilities that are not captured by their ranking on popular LLM leaderboards ("cramming for the leaderboard"). Furthermore, simple probability calculations indicate that GPT-4's reasonable performance on k=5 is suggestive of going beyond "stochastic parrot" behavior (Bender et al., 2021), i.e., it combines skills in ways that it had not seen during training. We sketch how the methodology can lead to a Skill-Mix based eco-system of open evaluations for AI capabilities of future models.

    It's about time we move on to something better than the current evaluation metrics which we're just trying to game with surface fine tuning.
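
    As a rough illustration of the "simple probability calculations" mentioned in the quoted abstract, here is a minimal sketch of how quickly the number of possible skill combinations grows with k. The skill and topic counts below are hypothetical round numbers chosen for illustration, not figures taken from the paper, and the function name is made up for this example.

```python
from math import comb


def skill_mix_combinations(num_skills: int, k: int, num_topics: int) -> int:
    """Count distinct prompts formed by choosing k skills and one topic.

    This is only the combinatorial count; it says nothing about how
    often any given combination actually appears in training data.
    """
    return comb(num_skills, k) * num_topics


# Hypothetical sizes, assumed for illustration only.
N_SKILLS = 100
N_TOPICS = 100

for k in range(1, 6):
    total = skill_mix_combinations(N_SKILLS, k, N_TOPICS)
    print(f"k={k}: {total:,} possible (skills, topic) combinations")
```

    Under those assumed sizes, the count at k=5 is in the billions, which is the intuition behind arguing that a model doing reasonably well there is unlikely to have seen most of those combinations verbatim.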

  • The license is related to access.

    Basically it's gated and not publicly available, and the only way to open the gate is to say "I promise not to do anything outside what you are limiting me to do."

    A second person that gets access without agreeing to that can use the weights however they want (what copyright would relate to), but the person who gave them access to the weights would have been in breach of their agreement.

    So separate things with different scopes.

  • You are reading made up strawmen into the topic.

    The article defines the scope of the discussion straight up:

    The authors argue that as these models get bigger and are trained on more data, they improve on individual language-related abilities and also develop new ones by combining skills in a manner that hints at understanding — combinations that were unlikely to exist in the training data.

    The question is whether or not LLMs have a grasp of the training material such that they can produce new and novel concepts outside what was in the training data itself.

    Not whether the LLM is sentient or conscious - both characterizations I'd strongly dispute.

    Wikipedia has a useful distillation of the definition of understanding relevant to the above:

    process related to an abstract or physical object, such as a person, situation, or message whereby one is able to use concepts to model that object

  • human consciousness

    Wtf are you talking about? The article is about whether or not models can understand text. Not about whether they embody consciousness.

    Just because someone who wants to secure more funding for their research has put out a blog post, it doesn't make it true in any scientific sense.

    Again, wtf are you going on about? Hinton was the only appeal to authority I made in comments here and I only referred to him quitting his job to whistleblow. And it's not like he needs any attention to justify research if he wanted to.

  • The outputs are not copyrightable.

    But something not being copyrightable doesn't necessarily mean openly distributed.

    It does mean OpenAI can't really restrict or go after other companies training off of GPT-4 outputs though, which is occurring broadly.