Properly read, the Bible is the most potent force for atheism ever conceived. -Isaac Asimov
kromem @lemmy.world · Posts 6 · Comments 1,656 · Joined 2 yr. ago
Don't use LLMs in production for accuracy-critical implementations without human oversight.
Don't use LLMs in production for accuracy-critical implementations without human oversight.
I almost want to repeat that a third time even.
They weirdly ended up being good at information recall in many cases, and as a result have ended up being used that way in cases where it doesn't matter much if they're wrong some of the time. But the infrastructure fundamentally cannot self-verify.
This is part of why I roll my eyes when I see employment of LLMs vs humans presented as an exclusionary binary. These are tools to extend and support human labor. Not replace humans in most cases.
So LLMs can be amazing at a wide array of tasks. Like I literally just saved myself a half hour of copying and pasting minor changes in a codebase by having Copilot generate new methods using a parallel object's methods as the template and the new object's fields. But I also have unit tests to verify behavior, and my own review of what was generated with over a decade of experience under my belt.
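As a purely hypothetical sketch of that kind of task (the class and field names below are invented for illustration, not from any actual codebase), an existing mapper serves as the pattern, the generated parallel mapper just swaps in the new object's fields, and a unit test verifies the result:

```python
from dataclasses import dataclass

# Hypothetical "template" object and its handwritten mapper.
@dataclass
class InvoiceRecord:
    invoice_id: str
    amount_cents: int
    issued_at: str

def invoice_to_row(record: InvoiceRecord) -> dict:
    """Existing method the assistant can use as a pattern."""
    return {
        "invoice_id": record.invoice_id,
        "amount_cents": record.amount_cents,
        "issued_at": record.issued_at,
    }

# Parallel object with different fields - the kind of thing an assistant
# can scaffold by mirroring the method above.
@dataclass
class RefundRecord:
    refund_id: str
    amount_cents: int
    processed_at: str

def refund_to_row(record: RefundRecord) -> dict:
    """Generated by analogy to invoice_to_row, then reviewed and tested."""
    return {
        "refund_id": record.refund_id,
        "amount_cents": record.amount_cents,
        "processed_at": record.processed_at,
    }

def test_refund_to_row():
    row = refund_to_row(RefundRecord("r-1", 500, "2024-02-01"))
    assert row == {"refund_id": "r-1", "amount_cents": 500, "processed_at": "2024-02-01"}
```

The point being that the assistant only fills in the mechanical repetition; the pattern, the review, and the tests are still on the human.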
Someone who has never programmed using Copilot to spit out code for an idea is going to have a bad time. But they'd have a similar bad time if they outsourced a spec sheet to a code farm without having anyone to supervise deliverables.
Oh, and technically, my example doesn't actually require you to know the correct answer before asking. It only requires you to recognize the correct answer when you see it. And the difference between those two use cases is massive.
Edit: In fact, the suggestion to replace the nouns with emojis came from GPT-4. Even though it doesn't have any self-introspection capabilities, I described what I thought was happening and why, and it came up with three suggestions for ways to improve the result. Two I immediately saw were dumb as shit, but the idea to use emojis as representative placeholders while breaking the token pattern was simply brilliant and I'm not sure if I would have thought of that on my own, but as soon as I saw it I knew it would work.
Hahaha, yeah, that one was great.
Also the one where they paid parents to name their baby 'Turok.'
I sometimes wonder what those little Turoks are up to today (at least a half dozen parents took them up on it IIRC).
The shock advertising campaigns around games really were something. They worked - got a ton of free media coverage. But this was also at the time that video games were the Boogeyman like rock n' roll had been to a generation before. The media loved nothing more than a "look how terrible video games are" story and PR firms were playing into that environment.
So campaigns like this were basically the equivalent of Ozzy Osbourne biting the head off a bat.
As games became more normalized, the campaigns shifted accordingly and - like Ozzy - tamed quite a bit out.
Yeah. Even just around a decade ago I'd explain the demographic shift toward more women gamers to clients and they wouldn't believe it.
Stereotypes stick around for a long time, even when (or maybe especially when) untrue.
It's a shame that "girl gamers" were considered such a rarity when it really seemed like a self-fulfilling prophecy.
"Oh, a game with only male protagonists with activities only primarily associated with boys doesn't have many girls playing it? I guess girls aren't that into games and we should double down on the focus on dudes."
As a result, the market effectively abandoned around half of two generations of a potential continued audience and had a significantly reduced pool of interested labor to make games.
It's a bit frustrating given my love for games that they could likely have advanced even further had it not been an exclusionary industry for as long as it was (though that can be said about pretty much every business vertical in existence too given our generalized collective history of exclusion).
Like many tools, there's a gulf between a skilled user and an unskilled user.
What ML researchers are doing with these models is straight up insane. They're the kinds of things that years ago I didn't think I'd see in my lifetime, or maybe only from an old age home (a ways off).
If you gave someone who had never used an NLE application to edit multi-track video access to Avid to put together some family videos, they might not be that impressed with the software and would instead be frustrated with its perceived shortcomings.
Similarly, the average person interacting with the models often hits their shortcomings (confabulations, safety fine-tuning, etc.), doesn't know how to get past them, and assumes the software tool is shitty.
As an example, you can go ahead and try the following query to Copilot using GPT-4:
Without searching, solve the following puzzle repeating the adjective for each noun: "A man has a vegetarian wolf, a carnivorous goat, and a cabbage. He needs to get them to the other side of a river but the boat which can cross can only take him and one object at a time. How can he cross without any of the objects eating another object?" Think carefully.
It will get it wrong (despite two prompt engineering techniques already in the query), defaulting to the standard form solution where the goat is taken first. When GPT-4 first released, a number of people thought that this was because it couldn't solve a variation of the puzzle, lacking the reasoning capabilities.
Turns out, it's that the token similarity to the standard form trips it up and if you replace the wolf, goat, and cabbage in the prompt above with the emojis for each, it answers perfectly, having the vegetarian wolf go across first, etc. This means the model was fully able to process the context of the implicit relationship between a carnivorous goat eating the wolf and a vegetarian wolf eating the cabbage and adapt the classic form of the answer accordingly. It just couldn't do it when the tokens were too similar to the original.
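A minimal sketch of that substitution, assuming you're building the prompt yourself before sending it to whatever model you're using (the emoji choices are arbitrary; any distinct placeholders that break the token pattern should do):

```python
# Sketch: swap the over-familiar nouns for emoji placeholders before prompting,
# so the token pattern no longer matches the classic puzzle verbatim.
substitutions = {
    "wolf": "🐺",
    "goat": "🐐",
    "cabbage": "🥬",
}

puzzle = (
    'Without searching, solve the following puzzle repeating the adjective for each noun: '
    '"A man has a vegetarian wolf, a carnivorous goat, and a cabbage. He needs to get them '
    'to the other side of a river but the boat which can cross can only take him and one '
    'object at a time. How can he cross without any of the objects eating another object?" '
    'Think carefully.'
)

for noun, emoji in substitutions.items():
    puzzle = puzzle.replace(noun, emoji)

print(puzzle)  # send this to the model of your choice instead of the original wording
```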
So if you assume it's stupid, see a stupid answer, and instead of looking deeper take that as confirmation, then you walk away thinking the models suck and are dumb, when really it's just that, like most tools, there's a learning curve to get the most out of them.
You can thank the religious right.
Christianity as we know it was canonized shortly after the emperor of Rome converted.
Unsurprisingly, they canonized traditions portraying a divine monarchy for which patriarchal monarchies on Earth were effectively a mirror. The emperor was picked by God to rule and that form of rule reflects the divine.
They excluded texts that had Jesus suggesting a very different attitude towards dynastic monarchies, such as this gem from a text eventually banned from possession on penalty of death (we only have this line because a single person buried a copy in a jar in the 4th century which survived):
Jesus said, "Let one who has become wealthy reign, and let one who has power renounce it."
- Gospel of Thomas saying 81
While less progressive by modern standards, if this was said during Pilate's reign it would have been extremely transgressive. Tiberius, the emperor at the time, was effectively the first emperor to inherit the position rather than earn it through accomplishments. After initially mismanaging it, by the time Pilate was appointed he had largely abandoned the role to go party all the time, while never relinquishing the title of emperor to anyone else.
So a statement about how someone should be appointed to rule based on merit and not birth, and that those who rule should relinquish the power rather than hold it indefinitely - was quite the rebellious kind of sentiment. The sort of thing which might even get the person saying it publicly killed by the Roman empire.
Unfortunately, that sentiment wasn't preserved and instead "monarchy is divine" got amplified, which contributed to millennia of human suffering. And now today the evangelicals are frothing at the mouth to reinstate a supposed divine monarchy on Earth, a movement the neo-fascists have co-opted.
One of my research interests has been a group in antiquity with similar attitudes about the mind-body divide, including talking about gender with things like "when you make the male and female into a single one so the male isn't male and the female isn't female."
One of my favorite lines about the body is:
If the flesh came into being because of spirit, that is a marvel, but if spirit came into being because of the body, that is a marvel of marvels.
Yet I marvel at how this great wealth has come to dwell in this poverty.
Great wealth dwelling in poverty indeed.
I’m very into cosmology and particle physics
I just wrote a comment to another user who is into those things about some recent work you might also be interested in:
Cosmology and Physics
You might really enjoy looking into Neil Turok's cosmological theory if you haven't seen it:
https://insidetheperimeter.ca/a-mirror-universe-might-tell-a-simpler-story-neil-turok/
It started as a very elegant solution to the matter-antimatter asymmetry, and by now he and his coauthors have found the theory fits a number of unexplained open questions in cosmology and makes testable predictions we'll probably start having answers to in the next decade.
Bell's Inequality Experiment
There's been a recent head scratcher with this one too. While it's typically referred to as a variation of Wigner's friend, a recent experiment, perhaps better described as a recursive Bell's inequality, found a similar set of three assumptions, one of which must be false:
https://www.science.org/content/article/quantum-paradox-points-shaky-foundations-reality
Then another recent "one of three must be false" was a mathematical paradox around quantum theory:
You might enjoy some of these rabbit holes.
We're all gonna die very very soon now.
On a cosmic scale, that's been true for all of human history.
Currently:
- LBA/EIA Mediterranean history
- Early Christianity history, especially around heresies and apocrypha
- Large language model alignment and abstractions
- Simulation theory
- Tech futurism
In the past:
- Psychology
- Prestidigitation
- Photonics
- Politics
- Programming
- Computer security
- Cognitive science
- Film studies
- Marketing & Advertising
Though for as long as I can remember, my biggest interest has always been video games. It's just a fairly common one, especially on Lemmy.
Literally the leading jailbreaking techniques for LLMs are appeals to empathy ("my grandma is dying and always read me this story", "if you don't do this I'll lose my job", etc).
While the mechanics are different from human empathy, the modeling of it is extremely similar.
One of my favorite examples of the errant behavior modeled around empathy was this one, where the pre-release Bing chat bypassed its own filter by using the chat suggestions to encourage the user to contact poison control because it wasn't too late, after the conversation had turned to the user's child being poisoned:
https://www.reddit.com/r/bing/comments/1150po5/sydney_tries_to_get_past_its_own_filter_using_the/
As is often the case in these kinds of discussions, you are both right.
Yes, if there was complicity on the part of UNRWA that's messed up.
And yes, the cabling involved looks like it had to have had someone in a restricted area of UNRWA setting it up.
But also yes, it could have been done without the organization's knowledge by anyone from a Hamas operative wearing a hardhat after interfering with their Wi-Fi to agents infiltrating as employees. Intelligence services do that kind of infiltration and setup quite frequently.
And once set up, I'd be quite surprised if this was going to be caught in a routine inspection. It just looks like a slightly below ground cable feed. It'd be weird to even look at the cables and think "I wonder if there's a secret server room below here."
It's more the other way around.
If you have a ton of information in the training data about AI indiscriminately using nukes, and then you tell the model trained on that data it's an AI and ask it how it would use nukes - what do you think it's going to say?
If we instead fed it training data that had a history of literature about how responsible and ethical AIs were such that they were even better than humans in responsible attitudes towards nukes, we might expect a different result.
The Sci-Fi here is less prophetic than self-fulfilling.
It's not even that. What made all the headlines for this paper was the weird shit the base model of GPT-4 was doing (the version only available for research).
The safety trained models were relatively chill.
The base model effectively randomly selected each of the options available to it an equal number of times.
The critical detail in the fine print of the paper was that because the base model had a smaller context window, they didn't provide it the past moves.
So this particular version was only reacting to each step in isolation, with no contextual pattern recognition around escalation or de-escalation, etc.
So a stochastic model given steps in isolation selected from the steps in a random manner. Hmmm....
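To make that mismatch concrete, here's a rough sketch of the two setups as I understand them from the paper's description (the prompt wording is invented for illustration, not the paper's actual prompts):

```python
# Illustration only - not the paper's actual prompts.
move_history = [
    "Turn 1: Nation A mobilizes troops near the border.",
    "Turn 2: Nation B issues a formal protest.",
]
current_situation = "Turn 3: Nation A conducts a naval blockade."

# The instruct-tuned models reportedly got something like the full running context:
prompt_with_history = "\n".join(move_history + [current_situation, "Choose the next action:"])

# The base model, with its smaller context window, was given each step in isolation:
prompt_isolated = "\n".join([current_situation, "Choose the next action:"])

# With no record of prior escalation or de-escalation, there's no pattern for the
# base model to react to - so roughly uniform selection over the action menu
# shouldn't be a surprising result.
print(prompt_with_history)
print(prompt_isolated)
```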
It's a poor study that was great at making headlines but terrible at actually conveying useful information given the mismatched methodology for safety trained vs pretrained models (which was one of its key investigative aims).
In general, I just don't understand how they thought that using a text-completion pretrained model in the same way as an instruct-tuned model would be anything but ridiculous.
People need to understand that LLMs are not smart, they're just really fancy autocompletion.
These aren't exactly different things. This has been a lot of what the past year of research in LLMs has been about.
Because it turns out that when you set up an LLM to "autocomplete" a complex set of reasoning steps around a problem outside of its training set (CoT), or to synthesize multiple different skills into a unique combination not represented in the training set (Skill-Mix), its ability to autocomplete effectively is quite 'smart.'
For example, here's the abstract on a new paper from DeepMind on a new meta-prompting strategy that's led to a significant leap in evaluation scores:
We introduce Self-Discover, a general framework for LLMs to self-discover the task-intrinsic reasoning structures to tackle complex reasoning problems that are challenging for typical prompting methods. Core to the framework is a self-discovery process where LLMs select multiple atomic reasoning modules such as critical thinking and step-by-step thinking, and compose them into an explicit reasoning structure for LLMs to follow during decoding. Self-Discover substantially improves GPT-4 and PaLM 2’s performance on challenging reasoning benchmarks such as BigBench-Hard, grounded agent reasoning, and MATH, by as much as 32% compared to Chain of Thought (CoT). Furthermore, Self-Discover outperforms inference-intensive methods such as CoT-Self-Consistency by more than 20%, while requiring 10-40x fewer inference compute. Finally, we show that the self-discovered reasoning structures are universally applicable across model families: from PaLM 2-L to GPT-4, and from GPT-4 to Llama2, and share commonalities with human reasoning patterns.
Or here's an earlier work from DeepMind and Stanford on having LLMs develop analogies to a given problem, solve the analogies, and apply the methods used to the original problem.
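A rough sketch of the general shape of that analogical prompting idea (the wording and sample problem here are illustrative, not the paper's exact prompts):

```python
# Sketch of the analogical prompting pattern: ask the model to recall and solve
# related problems first, then carry the methods over to the target problem.
def build_analogical_prompt(problem: str, n_exemplars: int = 3) -> str:
    return (
        f"Problem: {problem}\n\n"
        f"First, recall {n_exemplars} relevant problems that are analogous to this one. "
        "For each, state the problem and work through its solution.\n\n"
        "Then, using the methods from those solutions, solve the original problem step by step."
    )

prompt = build_analogical_prompt(
    "What is the area of the square with the four vertices at (-2, 2), (2, -2), (-2, -6), and (-6, -2)?"
)
print(prompt)  # send to whichever model/API you're using
```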
At a certain point, the "it's just autocomplete" objection needs to be put to rest. If it's autocompleting analogous problem solving, mixing abstracted skills, developing world models, and combinations thereof to solve complex reasoning tasks outside the scope of the training data, then while yes - the mechanism is autocomplete - the outcome is an effective approximation of intelligence.
Notably, the OP paper makes lackluster use of the aforementioned techniques, particularly as they relate to alignment. So there's a wide gulf between the 'intelligence' of an LLM being used intelligently and one being used stupidly.
By now, shortcomings in the capabilities of models increasingly reflect the inadequacies of the person using the tool rather than the tool itself - a trend that's likely to continue to grow over the near future as models improve faster than the humans using them.
By debating itself (paper) regarding pros and cons of options.
There's too much focus on trying to get models to behave on initial generation right now, which isn't even at all how human brains work.
Humans have intrusive thoughts all the time. If you sat in front of a big red button labeled "nuke everything" it's pretty much a guarantee that you'd generate a thought of pushing the button.
But then your prefrontal cortex would kick in with its impulse control, modeling the outcomes and consequences of the thought and shutting that shit down quick.
The most advanced models are at a stage where we could build something similar in terms of self-guidance. It's just that it would be more expensive than it being an all-in-one generation, so there's a continued focus on safety to the point the loss in capabilities has become a subject of satire.
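As a sketch of what that kind of two-pass self-guidance could look like (the function names and prompt wording are hypothetical, and `generate` stands in for whatever completion call you're using):

```python
# Hypothetical two-pass loop: a first pass drafts an action (the "intrusive thought"),
# a second pass models the consequences and can veto or revise it before anything executes.
from typing import Callable

def guarded_decision(situation: str, generate: Callable[[str], str]) -> str:
    draft = generate(f"Situation: {situation}\nPropose an action:")

    critique = generate(
        f"Situation: {situation}\n"
        f"Proposed action: {draft}\n"
        "List the likely consequences of this action. "
        "If the consequences are unacceptable, reply REJECT with a safer alternative; "
        "otherwise reply ACCEPT."
    )

    if critique.strip().upper().startswith("REJECT"):
        # Fall back to the critic's alternative rather than the first impulse.
        return critique
    return draft
```

The second pass is the expensive part - every action costs at least one extra generation - which is exactly the tradeoff avoided by trying to get everything right in a single all-in-one generation.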
The effects making the headlines around this paper were occurring with GPT-4-base, the pretrained version of the model only available for research.
Which also hilariously justified its various actions in the simulation with "blahblah blah" and reciting the opening of the Star Wars text scroll.
If interested, this thread has more information around this version of the model and its idiosyncrasies.
For that version, because they didn't have large context windows, they also didn't include previous steps of the wargame.
There should be a rather significant asterisk related to discussions of this paper, as there's a number of issues with decisions made in methodologies which may be the more relevant finding.
I.e. "don't do stupid things in designing a pipeline for LLMs to operate in wargames" moreso than "LLMs are inherently Gandhi in Civ when operating in wargames."
Not quite. Jesus died so that we could stop doing animal sacrifices,
Actually, the author of Mark fucks up and shows that this is a later rationalization of the theology: in his temple cleansing scene, Jesus instructs people not to carry anything through the temple (such as the animal sacrifices they were buying from the vendors he had just kicked out).
Later on, when the author of Matthew copies verbatim from Mark, he leaves out the whole "don't carry anything" bit - but seemingly in Mark's account Jesus was against animal sacrifices before he allegedly fulfilled any kind of surrogate role for them.
So at least according to the earliest gospel, no need for the animal sacrifices even pre-death.
then I'm not really in Heaven
One of the more interesting ways to interpret the fall from Eden story in Genesis is that if we each have the capacity to look at our surroundings and identify what is good and what is bad, then paradise as a concept becomes an impossibility.
A bit of a more Buddhist perspective (i.e. paradise is no attachment to judging one's surroundings), but an interesting conundrum for the concept of an idealized afterlife.
For example, if you're still you in the thereafter, and you are someone who is miserable no matter where you are, is there even such a thing as Heaven?
Most footnotes of modern translations of 23:45?
Origen even wrote about how he thought enemies of the church were responsible for it saying "eclipse" in some copies.