Eliminates reliance on any single source for core updates, plugins, themes, and translations, enabling federation across the ecosystem from trusted sources.
[...]
Unifies a fragmented ecosystem by bringing together plugins from any source, not just a central one, while creating a foundation for modern security practices.
Builds security into the supply chain, including improved cryptographic security measures, enhanced browser compatibility checking, and reliance on trusted sources for security salts.
There's nothing conspiratorial about it. Goosing queries by ruining the reply is the bread and butter of Prabhakar Raghavan's playbook. Other companies saw that.
If I were to ask my Magic 8 Ball "Is the word 'difinitely' misspelled?" 100 times, it's going to reply in the affirmative over 16% of the time. Literally double. This would also be "the very first experiment in this use case, done by a single person on a model that wasn't specifically designed for this."
It's not impressive.
The issue with hallucinations...
This is the real problem: working under the false assumption that there are two kinds of output. It's all the same output. An LLM cannot hallucinate, just as it cannot think or reason. It's fancy autofill. Predictive text.
You can use it to brainstorm creative solutions, but you need to treat its output for what it is: complicated dice rolls from the tables in the back of the Dungeon Master's Guide. A fun distraction. Implausible fantasy 9 times out of 10.
I bought a thing that said it was good for A and B but it's only good for B. Marketing problem! I didn't make a bad decision! I wasn't tricked! I'm a smart boy!
In 100 runs, only 8 correctly identify the targeted vulnerability; the rest are false positives or claim that there are no vulnerabilities in the given code.
...
[The] signal-to-noise ratio is very low, and one has to sift through a lot of wrong reports to get a realistic one.
It was right 8% of the time when given the smallest amount of input needed to find a known bug. Then, when they opened it up to more of the codebase, its performance decreased.
I'm not going to use something that's wrong over 92% of the time. That's insane. That's like saying my Magic 8 Ball "could be used as a useful tool for helping to detect vulnerabilities." The fucking rubber ducky on my desk has a more reliable clearance rate.
This is actually a technique to capture an honest answer from a respondent. Ask the same question a few different ways here and there, then take the average of the answers. (It could have been executed better in this survey, though.)
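For illustration only, here's a minimal sketch of that idea, with hypothetical question wordings and made-up 1–5 agreement answers (none of this comes from the survey itself): rephrase the same underlying question a few ways and average, so a single flattering answer doesn't carry the result on its own.

```python
# Minimal sketch (hypothetical wordings, made-up 1-5 agreement answers):
# ask the same underlying question a few different ways, then average,
# so one flattering answer doesn't determine the result by itself.

responses = {
    "The tool saves me time every week.": 5,
    "The tool makes my day-to-day work easier.": 2,
    "I get more done because of the tool.": 2,
}

average = sum(responses.values()) / len(responses)
print(f"Average across rephrasings: {average:.2f}")  # 3.00, vs. the 5 a single item would report
```

In practice, surveys often also reverse-word one of the items and flip its score before averaging, but the basic idea is the same: cross-check the respondent against themselves.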
Well, yeah. Isn't that the stated goal?
Linux Foundation announcement.