Posts: 0 · Comments: 63 · Joined: 2 yr. ago

  • Most if not all leading models use synthetic data extensively to do exactly this. However, the synthetic data needs to be well defined and essentially programmed by the data scientists. If you don't define the data very carefully, ideally as math or programs you can automatically verify as correct, it's worse than useless. The scope is also usually very narrow; no Hitchhiker's Guide to the Galaxy rewrites. (A sketch of what "programmed" data means follows below.)

    But in any case he's probably just parroting whatever his engineers pitched him to look smart and in charge.
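
    For what "programmed" synthetic data looks like in practice, here's a minimal, hypothetical sketch: every sample's answer is computed by the generator itself, so correctness is verifiable by construction. The function and field names are made up for illustration.

    ```python
    # Hypothetical generator for verifiable synthetic data: the target is
    # computed, never guessed, so every sample is correct by construction.
    import random

    def make_arithmetic_sample(rng: random.Random) -> dict:
        a, b = rng.randint(2, 999), rng.randint(2, 999)
        op = rng.choice(["+", "-", "*"])
        # eval is safe here: the expression is generated, not user-supplied
        answer = eval(f"{a} {op} {b}")
        return {"prompt": f"What is {a} {op} {b}?", "target": str(answer)}

    rng = random.Random(0)
    for sample in (make_arithmetic_sample(rng) for _ in range(3)):
        print(sample)
    ```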

  • I had some similar and obscure corruption issues that wound up being a symptom of failing RAM in a main server node. After that, the only issues have been conflicts. So I'd suggest checking hardware health in addition to the ideas about backups vs. sync.

  • I've used it extensively, almost $100 in credits, and generally it could one-shot everything I threw at it. However: I gave it architectural instructions and told it to use test-driven development and which test suite to use. Without the tests, yeah, it wouldn't work, and a decent amount of the time goes to cleaning up mistakes the tests caught. The same can be said for humans, though.
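
    As a minimal, made-up sketch of that test-first loop: you write the tests, hand them to the model with the prompt, and the suite catches what it gets wrong. `slugify` is just a toy stand-in for whatever you're actually building.

    ```python
    # Toy test-first workflow: the tests are written before the
    # implementation; the body of slugify is what the model would produce
    # and what gets cleaned up until the suite passes.
    import re

    def slugify(text: str) -> str:
        # lowercase, collapse non-alphanumeric runs to "-", trim edges
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    def test_basic():
        assert slugify("Hello, World!") == "hello-world"

    def test_collapses_separators():
        assert slugify("a  --  b") == "a-b"

    if __name__ == "__main__":
        test_basic()
        test_collapses_separators()
        print("tests pass")
    ```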

  • Some details: one of the major players using the tar-pit strategy is Cloudflare. They're a giant in networking and infrastructure, and they use AI (more traditional models, not LLMs) ubiquitously to detect bots. So it is an arms race, but one where both sides have massive incentives.

    The generated nonsense is indeed detectable, but that misunderstands the purpose, which is economic. Scraping bots are used because they're a cheap way to get training data. If you make a non-zero portion of the training data poisonous, scrapers have to spend ever more resources to filter it out. The better the nonsense, the harder it is to detect. Cloudflare is known to use small LLMs to generate the nonsense, hence requiring systems at least that complex to differentiate it.

    So in short, the tar pit with garbage data actually decreases the average value of scraped data for bots that ignore do-not-scrape instructions. (A toy sketch of the generator idea follows.)
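
    As a toy illustration of the economics (not Cloudflare's actual system, which reportedly uses small LLMs): even a Markov chain produces statistically plausible garbage for near-zero cost, and the better the generator, the more a scraper has to spend to filter its output back out.

    ```python
    # Toy tar-pit generator: a first-order Markov chain babbling
    # plausible-looking nonsense to serve to bots that ignore robots.txt.
    import random
    from collections import defaultdict

    def build_chain(corpus: str) -> dict:
        words = corpus.split()
        chain = defaultdict(list)
        for prev, nxt in zip(words, words[1:]):
            chain[prev].append(nxt)
        return chain

    def babble(chain: dict, length: int = 30, seed: int = 0) -> str:
        rng = random.Random(seed)
        word = rng.choice(list(chain))
        out = [word]
        for _ in range(length - 1):
            # fall back to a random word when the current one has no successor
            word = rng.choice(chain.get(word) or list(chain))
            out.append(word)
        return " ".join(out)

    corpus = "the quick brown fox jumps over the lazy dog while the dog dreams of the fox"
    print(babble(build_chain(corpus)))
    ```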

  • Was about to post a Hugging Face link till I finished reading. For what it's worth, once you have Ollama installed, it's a single command to download, install, and immediately drop into a chat with a model, whether from Ollama's library, Hugging Face, or anywhere else. On Arch, the entire process to get it working with GPU acceleration was installing two packages and then starting ollama.
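
    If you'd rather script it than chat, the `ollama` Python package is a thin client for the same local server. A minimal sketch, assuming the server is running and the model has already been pulled (e.g. with `ollama pull llama3`):

    ```python
    # Minimal chat call against a locally running Ollama server.
    # Assumes `pip install ollama` and a pulled model named "llama3".
    import ollama

    response = ollama.chat(
        model="llama3",
        messages=[{"role": "user", "content": "Explain GPU acceleration in one sentence."}],
    )
    print(response["message"]["content"])
    ```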

  • Key detail: they're not dropping it because they're giving up. The judge dismissed it without prejudice, which means that in four years they can pick the case back up. Under a Trump DoJ, the case would likely have ended with prejudice, closing it permanently.

  • In a recent interview he openly speculated about how long he'd be in prison if Kamala wins. It seems like he has a strong savior complex and thinks he's the only one who can save humanity by establishing colonies on Mars. He phrases it as preserving "the light of consciousness." Can't reasonably do that from prison. With that perspective, for him, practically any means justify that end.

    At a more personal level, after one of his kids transitioned, he publicly stated it was like that kid had "died." In his own words, he swore to kill the "woke mind virus."

  • I haven't gone through all their work, but some of the delisted maintainers were working on driver support for Baikal, a Russia-based electronics company whose work includes semiconductors and ARM processors. Given the sanctions against Russia, especially for dual-use goods like domestic semiconductors, I would expect that Linus and the other maintainers were told, or concluded, that by signing off on and merging that code they'd be personally violating sanctions.

  • I recently removed in-editor AI because I noticed I was acquiring muscle memory for my brain: not thinking through the rest of a snippet past the start that would get an LLM to autocomplete it. I'm still using LLMs, particularly for languages and libraries I'm not familiar with, but through the artifact editors in ChatGPT and Claude.

  • The comments on that article are some of the most vitriolic I've ever seen on a technical issue. Goes to prove the maintainer's point, though.

    Some are good for a laugh, like the assertions that Rust in the kernel is a Microsoft sabotage op or that LLVM is for grifters and thieves.

  • FOSS in general needs better means of financial support. While the software is free and libre, developer time is not, and ultimately developers still gotta eat and pay bills. I hope they get positive results and don't catch much unnecessary flak.

  • Given the ease of implementing end-to-end encryption now, it's a reasonable assumption that anything not E2EE is being data mined. E2EE has extensive security benefits; for example, even if your data is dumped, the info is still useless to the attacker. So there has to be a compelling reason not to use it.
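
    A minimal sketch of why the dump is useless, using PyNaCl (libsodium bindings): only the holder of the private key can open the sealed box, so leaked ciphertext tells an attacker nothing.

    ```python
    # End-to-end encryption in miniature: encrypt to the recipient's
    # public key; only their private key (on their device) can decrypt.
    from nacl.public import PrivateKey, SealedBox

    recipient_key = PrivateKey.generate()  # never leaves the recipient's device

    ciphertext = SealedBox(recipient_key.public_key).encrypt(b"meet at noon")
    print(ciphertext.hex())                # all a breached server could leak

    plaintext = SealedBox(recipient_key).decrypt(ciphertext)
    print(plaintext)                       # b'meet at noon'
    ```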

  • My first programming experience, an online class, was in a Linux VM. Linux made programming easy and delightful, while Windows always made it a huge pain. As time went on, more of what I did was easier on Linux, and now everything is.

  • Key detail in the actual memo is that they're not using just an LLM. "Wallach anticipates proposals that include novel combinations of software analysis, such as static and dynamic analysis, and large language models."

    They're also clearly aware of the scope limitations. They explicitly call out some software, like entire kernels or pointer-arithmetic-heavy code, as being out of scope, and they don't seem to anticipate 100% automation.

    So with context, they seem open to any solution to "how can we convert legacy C to Rust?" Obviously LLMs and machine learning are attractive avenues of investigation; current models are demonstrably able to write some valid Rust and transliterate some code. I use them, and they work more often than not for simpler tasks.

    TL;DR: they want to accelerate converting C to Rust, and LLMs and machine learning are some of the techniques they're investigating as components. (A toy version of that combination is sketched below.)
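
    Here's that toy version, with the model call stubbed out: the LLM proposes a translation, and the compiler plus a test oracle gate the result instead of anyone trusting the model's output. Everything here is hypothetical scaffolding, and it assumes `rustc` is on your PATH.

    ```python
    # Hypothetical translate-then-verify loop: LLM output is only accepted
    # if it compiles (static gate) and passes the tests (dynamic gate).
    import pathlib
    import subprocess
    import tempfile

    def llm_translate(c_source: str) -> str:
        # Stand-in for a real model call; returns a candidate translation.
        return (
            "fn add(a: i32, b: i32) -> i32 { a + b }\n"
            "fn main() {\n"
            "    assert_eq!(add(2, 2), 4); // oracle carried over from the C tests\n"
            '    println!("ok");\n'
            "}\n"
        )

    def verify(rust_source: str) -> bool:
        with tempfile.TemporaryDirectory() as tmp:
            src = pathlib.Path(tmp) / "main.rs"
            src.write_text(rust_source)
            binary = pathlib.Path(tmp) / "main"
            compiled = subprocess.run(["rustc", str(src), "-o", str(binary)])
            if compiled.returncode != 0:
                return False  # rejected: does not compile
            return subprocess.run([str(binary)]).returncode == 0

    print(verify(llm_translate("int add(int a, int b) { return a + b; }")))
    ```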

  • What do you mean by "this stuff"? Machine learning models have been a fundamental part of spam prevention for years. The concept here just flips that around for use by the individual rather than the platform.
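
    A toy version of that flip, using scikit-learn: a tiny classifier the individual trains on their own labeled messages rather than platform-scale data. The messages are obviously made up.

    ```python
    # Personal spam filter in miniature: TF-IDF features plus naive Bayes,
    # trained on the individual's own inbox instead of the platform's data.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    messages = [
        "WIN a FREE cruise, click now",
        "Limited offer, claim your prize today",
        "Lunch tomorrow?",
        "Here are the meeting notes",
    ]
    labels = ["spam", "spam", "ham", "ham"]

    model = make_pipeline(TfidfVectorizer(), MultinomialNB())
    model.fit(messages, labels)

    print(model.predict(["free prize inside", "notes from the meeting"]))
    ```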

  • If by "reliably" you mean 99% certainty about one particular review, yeah, I wouldn't believe it either. A 95% confidence interval on the proportion of a given page's reviews that are bots, now that's plausible. If a human can tell whether a review was botted, you can certainly train a model to do so as well.
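
    To make the distinction concrete, one standard way to compute that page-level interval is the Wilson score formula: even if the per-review classifier is shaky, the proportion estimate tightens quickly with the number of reviews. The counts below are invented for illustration.

    ```python
    # 95% Wilson score interval for a proportion (k flagged out of n reviews).
    import math

    def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple:
        p = k / n
        denom = 1 + z**2 / n
        center = (p + z**2 / (2 * n)) / denom
        half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
        return center - half, center + half

    # Suppose a detector flags 120 of a page's 400 reviews as botted:
    lo, hi = wilson_interval(120, 400)
    print(f"botted share ~30% (95% CI {lo:.1%} to {hi:.1%})")
    ```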