Posts 6 · Comments 1,655 · Joined 2 yr. ago

  • The US has the most data centers today, with 33 percent of the world’s approximately 8,000 data centers. It’s also the country with the most Bitcoin mining. The IEA forecasts a “rapid pace” of growth for data center electricity consumption in the US over the next couple of years, rising from roughly 4 percent of US demand in 2022 to 6 percent by 2026.

    ALL data center energy usage makes up only 4-6% of US electricity demand.

    Most tech firms that run data centers don’t reveal what percentage of their energy use processes A.I. The exception is Google, which says “machine learning” — the basis for humanlike A.I. — accounts for somewhat less than 15 percent of its data centers’ energy use.

    So AI energy use is maybe less than one percent of total US electricity use, if we extrapolate from the upper range of data center demand and apply the machine-learning share reported by one of the leading AI firms (rough arithmetic below).

    Yeah, this definitely seems like the "AI Boogeyman" is on par with other contributing factors for why energy grids are struggling. /s

    Maybe we should be taking a closer look at passive device energy usage instead, given that double-digit percentages of home energy use (35% of the total) go to "energy vampires". Though maybe that version of the article would have gotten fewer clicks?
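
    For the curious, the back-of-envelope version of that extrapolation, assuming Google's ~15% figure is roughly representative of data centers generally (a sketch, not a measurement):

    ```python
    # Rough extrapolation from the figures quoted above: data centers at ~6% of US
    # electricity demand by 2026, with "machine learning" at <15% of a data center's
    # energy use (Google's reported figure, assumed representative here).
    data_center_share = 0.06
    ml_share_of_data_centers = 0.15

    ai_share_of_us_demand = data_center_share * ml_share_of_data_centers
    print(f"~{ai_share_of_us_demand:.1%} of total US electricity demand")  # ~0.9%
    ```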

  • It's really so much worse than this article even suggests.

    For example, one of the things it doesn't really touch on is the unexpected result emerging over the last year that a trillion-parameter network may develop capabilities which can then be passed on to a network less than a hundredth its size, by generating synthetic data from the larger model to feed into the smaller one. (I doubt even a double-digit percentage of researchers would have expected that result before it showed up.)

    Even weirder was a result showing that if you use CoT prompting to improve a model's answers and then feed only the questions and final answers into a new model, without the 'chain' from the CoT, it will still train the second network on the content of the chain.

    The degree to which very subtle details in the training data are ending up modeled seems to go beyond even some of the wilder expectations researchers hold right now. Just this past week I saw a subtle psychological phenomenon I used to present about appear very clearly, and very by the book, in GPT-4 outputs given the correct social context. I didn't expect that to be the case for at least another generation or two of models, and hadn't expected the current SotA models to replicate it at all.

    For the first time, two weeks ago, I saw an LLM code switch to a different language when there was a more fitting translation for the concept being discussed. There's no way the statistically most likely way to discuss motivations in English was to drop into a language barely represented in English-speaking countries. This was with the new Gemini, which also seems to have internalized a bias towards symbolic representations in its generation, to the point that they appear to be filtering out emojis (in the past I've found examples where switching from nouns to emojis improves the critical reasoning abilities of models, as it breaks token similarity patterns in favor of more abstracted capabilities).

    Adding the transformer's self-attention to diffusion models has suddenly resulted in Sora's video generation correctly simulating things like fluid dynamics and other physics.

    We're only just starting to unravel some of the nuances of self-attention, such as recognizing the attention sinks in the first tokens and the importance of preserving them across larger sliding context windows (toy sketch below).

    For the last year at least, especially after GPT-4 leapfrogged expectations, it's very much been feeling as the article states: this field is eerily like physics in the early 20th century, when experimental results were regularly turning a half century of accepted theory on its head and fringe theories that had generally been dismissed were suddenly being validated by multiple replicated results.
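
    On the attention sink point above, a toy sketch of the idea, assuming a StreamingLLM-style eviction policy (the cache here is just a list standing in for per-token KV entries):

    ```python
    # Toy illustration of preserving "attention sinks": when a KV cache outgrows the
    # sliding window, keep the first few tokens plus the most recent ones, rather
    # than naively dropping everything but the most recent window.
    def evict_kv_cache(cache, num_sink_tokens=4, window_size=1024):
        if len(cache) <= window_size:
            return cache
        sinks = cache[:num_sink_tokens]
        recent = cache[-(window_size - num_sink_tokens):]
        return sinks + recent

    # Example: a 2,000-entry cache trimmed to 1,024 entries while keeping tokens 0-3.
    cache = list(range(2000))
    trimmed = evict_kv_cache(cache)
    assert trimmed[:4] == [0, 1, 2, 3] and len(trimmed) == 1024
    ```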

  • From the linked Wikipedia page:

    "Theseus", created in 1950, was a mechanical mouse controlled by an electromechanical relay circuit that enabled it to move around a labyrinth of 25 squares.[71] The maze configuration was flexible and it could be modified arbitrarily by rearranging movable partitions.[71] The mouse was designed to search through the corridors until it found the target. Having travelled through the maze, the mouse could then be placed anywhere it had been before, and because of its prior experience it could go directly to the target. If placed in unfamiliar territory, it was programmed to search until it reached a known location and then it would proceed to the target, adding the new knowledge to its memory and learning new behavior.[71] Shannon's mouse appears to have been the first artificial learning device of its kind.[71]

  • It depends on the task, but in general a lot of the models have fallen into a dark pattern of Goodhart's Law: targeting the benchmarks while suffering at other things.

    So as an example: GPT-4 used to correctly model variations of the wolf, goat, cabbage problem with token similarity hacks (i.e. using emojis instead of nouns to break pattern similarity with the standard form of the question), but with the most recent updates it now fails even with that, whereas mistral-large is the only one that doesn't need the hack at all (illustration below).
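
    To illustrate the kind of token similarity hack I mean (the exact wording here is just an example, not the prompt I used):

    ```python
    # Restate the classic river-crossing puzzle with emojis in place of the nouns,
    # so a model can't just pattern-match the canonical wolf/goat/cabbage phrasing.
    substitutions = {"wolf": "🐺", "goat": "🐐", "cabbage": "🥬"}

    puzzle = (
        "A farmer must ferry a wolf, a goat, and a cabbage across a river. "
        "The boat fits the farmer plus one item. Left alone together, the wolf eats "
        "the goat and the goat eats the cabbage. How does the farmer get all three across?"
    )

    for noun, emoji in substitutions.items():
        puzzle = puzzle.replace(noun, emoji)

    print(puzzle)  # same logical structure, different surface tokens
    ```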

  • That's not really the point here. If you haven't noticed, recently a lot of people on Lemmy have been suggesting that the reports of rape on Oct 7th were made up or propaganda.

    The UN saying that it likely happened is important to counter that propaganda.

    Similarly, if reports of the IDF shooting civilians were being denied but then validated by third-party investigators, rational people should rightfully want to see that validation spread to counter misleading propaganda.

  • Well, it would have still existed, just been pretty distant from what it is today.

    More "everything is permissible" (1 Cor 10:23) and less "God will destroy both stomach and food" (1 Cor 6:13).

  • And yet religion-inspired fairy-tale magic has gone on to inspire science and technology that enable that idea.

    Harvard’s latest robot can walk on water. Your move, Jesus

    We're literally talking as a society about resurrection consent directives, but people are still spouting the age-old "there's no soul or afterlife" without regard for emerging science and technology, just as the religious are committed to a belief in magic over reinterpreting their beliefs in the context of science.

    You, right now, are in a world experimentally proven for nearly a century not to be observably real ("a quantity that can be expressed as an infinite decimal expansion") and instead one that is only observably digital ("of, relating to, or using calculation by numerical methods or by discrete units").

    And while you're alive you are producing massive amounts of data that's being harvested by algorithms simulating the world, while some of those technologies are being put toward recreating the deceased at such an increasing scale that, as mentioned, we're starting to discuss whether it's okay to do that retroactively without consent.

    I'm not a betting person, but the intersection of those two things (that our universe behaves in a way that seems to track stateful interactions with a conversion to discrete units, and that we're leaving behind data in a world increasingly simulating itself and especially its dead) would at the very least give me pause before dismissing certain notions, even if the original concept inspiring the latter trend was dreamt up by superstition and wishful thinking.

  • Part of the problem with the analysis is that under the influence of Western Christianity the term 'soul' has come to mean a very specific configuration of properties.

    For example, in ancient Egypt there were over seven different types of what we consider the 'soul', with biographical memory as only one of them.

    The ren was the name and identity of a person, the ba their personality, and the ka their "life force" of sorts.

    Conversations like this one might be better served by a more nuanced vocabulary.

  • Yeah, kind of. I mean, I believe that we're in a simulation, so the mind's apparent dependency on the body is illusory given the body is just a configuration of information too.

    That said, I don't think there's anything magical to it other than the persistence of information and the continuity of a relative perspective.

    But I see no reason why that information and perspective couldn't continue on after we die, and there are a number of reasons I expect that it will do just that.

  • We should also get rid of the two-party system by introducing a party chartered to only support or oppose positions whose public support multiple third-party polls find to be more than one standard deviation from the norm.

    It's insane that, given a political distribution that's normal for most topics, we arbitrarily divide it into two halves rather than focusing on the center.

    Even as someone who would fall to the left of the first standard deviation, I'd much rather live in a world with consistent stability around the norms, as I fought to move those social norms in my preferred direction over time, than in a world where there's a 50% chance of Nazis being a thing again.

    A significant majority of the country agrees on a surprisingly broad number of major topics, and yet we're divided into two camps currently being driven more and more by outspoken fringes that represent less and less of the general population, with everyone else falling in line out of a greater fear of the "other team."

    No reason elections can't be conducted via an encrypted open-source app, where voting can be done remotely with checks in place to ensure the vote has been tallied.

    You are seriously underestimating just how many people don't have smartphones (22.5 million eligible voters in the US). A number of your other suggestions are good, but the idea of all-digital voting needs at least some form of backup option for people with hardware access issues or digital competency issues.

  • For what!?!

    What the hell did some stupid farmer in the bumbfuck middle of nowhere do to warrant being shot in their home by people of a different skin color?

    Some racist asshole living in a rural, inbred community where everyone looks the same because their family trees share the same roots among people who never left their poverty-stricken hellhole didn't actually do shit to anyone, outside of voting the way they were themselves indoctrinated to.

    There are definitely a lot of far-right idiots being worked up into a frenzy of normalized violence, which is very concerning.

    But in one of the rare instances of legit "both-sides-ism", I'm starting to see a very concerning trend of the far left giving in more and more to the language of normalized violence too.

    I have a feeling both sides of this are useful idiots with the same hand pulling the strings, but c'mon dude - use your critical thinking skills before regurgitating rhetoric like that mindlessly.

  • For anyone interested in algorithmic changes that improve efficiency, Microsoft's recent research on moving from floating-point weights to ternary ones (1, 0, -1) was really impressive:

    https://arxiv.org/abs/2402.17764

    Basically, at larger parameter sizes it outperforms FP networks while using a fraction of the memory footprint and bypassing the need for matrix multiplication.

    It kind of makes sense that it works, too, given past research suggesting the networks create a virtualized node topology out of combinations of physical nodes: with enough nodes to work with there isn't a loss in functionality, and the discrete weights should settle at optimal thresholds more easily than slight adjustments to FP values would.

    The next generation of models built on this need to be trained from scratch (this is about pretraining and not quantization after the fact), but it should open the door to new hardware architectures better optimized for networks of ternary weights.
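
    For a sense of how simple the core idea is, the paper's absmean quantization amounts to roughly this (a minimal numpy sketch of the weight ternarization only; the actual pretraining recipe is more involved):

    ```python
    import numpy as np

    # Absmean ternarization as described in the BitNet b1.58 paper: scale each weight
    # tensor by its mean absolute value, then round and clip to {-1, 0, +1}.
    def ternarize(weights: np.ndarray, eps: float = 1e-8) -> np.ndarray:
        gamma = np.abs(weights).mean() + eps   # per-tensor absmean scale
        return np.clip(np.round(weights / gamma), -1, 1)

    W = np.random.randn(4, 4) * 0.1
    print(ternarize(W))  # every entry is -1, 0, or +1
    ```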