  • Good thing Trump is a wimp.

    As others have pointed out, he may talk loud and play dictator hard, but (so far, in the business and political worlds) Trump shrinks when faced with a fair fight. He'll jawbone and bully and blow up little guys to look good. He cares about himself, and will brag about every loophole he exploits.

    But it's much harder for him to, say, cross the line of arresting Newsom. Other dictatorships, even modern pseudo ones, feel eerily similar, but look closely and it appears those leaders pushed the envelope strategically. Hitler, for instance, was relatively tame at first and took calculated, increasingly drastic and brutal steps, whereas Trump consistently backs down from the Machiavellian solution. See: the response to Elon's betrayal (so far).

    At least that's my hope.

    And I hope someone with a spine doesn't follow in his footsteps. The apparatus built around him is much scarier than Trump.

  • That’s a premade 8x 7900 XTX PC. All standard and off the shelf.

    I dunno anything about Geohot; all I know is that people have been telling me how cool Tinygrad is for years, with seemingly nothing to show for it other than social media hype, while other (sometimes newer) PyTorch alternatives like TVM, GGML, and the MLIR efforts are running real workloads.

  • Because it’s a separate physical die.

    Taping out, i.e. finalizing the design of a large GPU chip for production, is at least a nine-figure cost. Hence Nvidia/AMD offer a relatively small selection of physical dies across their products, as each die carries a huge fixed cost. But AMD has specifically taken the approach of splitting chips into smaller sections and linking them together: placing them right next to each other, stacking them, and so on.

    Hence, if AMD, say, acquires a niche ASIC company, it can theoretically slap a variant of that design next to existing GPUs, or even next to existing CPUs, and have it share the memory bus, general compute, and other functions, without paying the full nine figures for a massive new chip. There are still testing and validation costs, but they're not so prohibitive.

  • Two things about Tinygrad:

    • It is (so far) software only, ostensibly a sort of lightweight PyTorch replacement.
    • It is (so far) not really used for much, not even research or tinkering.

    Between that and the lead dev's YouTube antics, it kinda seems like hot air to me.

  • It could be if it’s run locally.

    If your agents run on your hardware and navigate crappy apps and websites for you, what do you need the corporate cloud for? How can they show you ads or monetize you through that?

    That's the war raging right now: open weights vs. closed weights.

  • They aren't specialized though!

    There are a lot of directions "AI" could go:

    • Is autoregressive bitnet going to take off? In that case, the compute becomes extremely light, and the thing to optimize for is memory bandwidth and cache.
    • Or diffusion or something with fewer passes like that? In that case, we go the opposite direction, throw bandwidth out the window, and optimize for matmul compute.
    • What if it's both? In that case, one wants a truckload of ternary adders and not too much else.
    • Or what if some other form of sparsity takes over? Given the effectiveness of quantization and MoE, there's clearly a ton of sparsity to take advantage of. Nvidia already bet on this (2:4 structured sparsity in its tensor cores), but it hasn't taken off yet.

    There's all sorts of wild directions the sector could go. The flexibility of a swappable ASIC die would be a huge benefit for AMD, as they wouldn't have to 'commit' to any particular direction the way Nvidia's monolithic dies do. If a new trend takes off, they can take an existing die and swap out the ASIC relatively quickly, without taping out a whole new GPU.
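
    To make the bitnet point above concrete, here's a toy NumPy sketch (purely illustrative, nothing like a real kernel) of why ternary weights reduce matmuls to adds and subtracts:

    ```python
    import numpy as np

    # Toy illustration: with bitnet-style ternary weights in {-1, 0, +1},
    # a matrix-vector product needs no multiplies at all. Each output is
    # just "sum of inputs where w=+1" minus "sum of inputs where w=-1".
    W = np.random.choice([-1, 0, 1], size=(4, 8))  # ternary weight matrix
    x = np.random.randn(8)                         # activations

    y_ref = W @ x  # ordinary matmul, for reference

    # Multiply-free version: pure adds and subtracts
    y = np.array([x[row == 1].sum() - x[row == -1].sum() for row in W])

    assert np.allclose(y, y_ref)
    ```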

  • With AMD's IP, they could make a hybrid chip: e.g. a bitnet ASIC hanging off a GPU that provides flexible, CUDA-like compute where needed.

    Nvidia sorta does this now (with tensor cores being a separate part of the die), but with their history of MCM designs, AMD could take it to an extreme.

  • As I keep saying, America may be a shitshow, but Europe needs to look in the mirror before it happens to you guys too. Far right parties are gaining lots of traction while many assume the EU will be fine, and more war spending is going to exacerbate that.

  • Musk has quite a “tech bro” following (which we don’t see because we don’t live and breathe on Twitter and such), and that group wields enormous psychological influence over the population.

    Seems unlikely, but if Musk aligns himself more closely with Peter Thiel, Zuckerberg, critical software devs, and such, that's an extremely dangerous platform for Trump. They can sap power from MAGA (and anyone else) with the flip of a switch.

    There’s quite a fundamental incompatibility between tech oligarchs and the red meat MAGA base, too, as is already being exposed. It’s going to get much more stark over the next few years.

  • Narcissists hate being ignored or called unimportant. Trump flippantly dismissing him as “nuts” and moving on is the ultimate insult.

    I'm sure Musk has an army of handlers reining him in, but that kind of insult is legitimately hard for him to ignore.

  • Oh, one more thing: I saw you mention context management.

    Mistral (24B) models are really bad at long context, but that's not universal. I find that Qwen 32B and Gemma 27B are solid at 32K (which is a huge body of text), and with the right backend settings you can easily run either at 64K with very minimal VRAM overhead.

    Specifically, run Gemma with the latest llama.cpp server (which automatically uses sliding window attention as of, like, yesterday), or run Qwen (and most other models) with exllamav2 or exllamav3, which quantize the KV cache down to Q4 very efficiently.

    This way you don't need to manage context: you can feed the LLM the whole adventure so it doesn't forget anything, and streaming responses will start instantly since the prompt is always cached.
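
    As a sketch of what that looks like in practice: llama.cpp's /completion endpoint has a real cache_prompt flag, though the URL, file name, and settings below are placeholder assumptions:

    ```python
    import requests

    # Sketch: resend the *whole* adventure log every turn and let the server's
    # prompt cache do the work. With cache_prompt, llama.cpp re-processes only
    # the new tokens at the tail, so generation starts almost immediately.
    adventure_log = open("adventure.txt").read()  # hypothetical running transcript

    resp = requests.post(
        "http://localhost:8080/completion",  # default llama-server address
        json={
            "prompt": adventure_log,
            "n_predict": 512,
            "cache_prompt": True,  # reuse the KV cache across turns
        },
    )
    print(resp.json()["content"])
    ```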

  • Oh, one thing about ST (SillyTavern) specifically: its default sampling presets were catastrophic last I checked. They're designed for ancient models, and while I have nothing against the UI, it is kinda from a different era.

    For Gemma and Qwen, I've been using something like 0.2-0.7 temp, at least 0.05 MinP, 1.01 rep penalty (not something insane like 1.1), and maybe 0.3-ish DRY, though like you said, DRY/XTC can really mess up some tasks.
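
    For reference, those values in llama.cpp server parameter names (a sketch; ST and other frontends expose the same knobs under slightly different labels):

    ```python
    # The settings above, expressed as llama.cpp /completion sampler parameters.
    # Treat the exact numbers as starting points, not gospel.
    sampler_settings = {
        "temperature": 0.5,      # anywhere in the 0.2-0.7 range, per task
        "min_p": 0.05,           # prune very unlikely tokens
        "repeat_penalty": 1.01,  # gentle; 1.1 is overkill for modern models
        "dry_multiplier": 0.3,   # mild DRY; consider disabling for rules/math
    }
    ```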

  • Also, another suggestion would be to be careful with your sampling. Use a low temperature and high MinP for queries involving rules, higher temperature (+ samplers like DRY) when you're trying to tease out interesting ideas.

    I would even suggest an alt frontend like mikupad that exposes token probabilities, so you can go to any point in the reply and look through every "idea" the LLM had internally (and regen from that point if you wish). It's also good for debugging sampling issues when you get an incorrect answer, as sometimes the LLM had it right, but bad sampling parameters chose a bad token.
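
    If you want to peek at those internal "ideas" without mikupad, llama.cpp's native API can return per-token candidates too. A sketch (n_probs is a real parameter, but the exact response layout varies between server versions, so inspect the raw JSON):

    ```python
    import requests

    # Sketch: ask the server for the top candidate tokens at each position,
    # i.e. the alternatives the sampler chose between.
    resp = requests.post(
        "http://localhost:8080/completion",
        json={
            "prompt": "The rogue rolls 14 against AC 15. The attack",
            "n_predict": 8,
            "temperature": 0.2,
            "min_p": 0.05,
            "n_probs": 5,  # top 5 candidates per generated token
        },
    )
    data = resp.json()
    print(data["content"])
    for tok in data["completion_probabilities"]:
        print(tok)  # each entry lists candidate tokens and their probabilities
    ```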

  • As long as it supports network inference between machines with heterogeneous cards, it would work for what I have in mind.

    It probably doesn't, heh, especially with non-Nvidia cards. But the middle layer may work with some generic OpenAI-compatible backend like the llama.cpp server.
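
    For example, most middle layers that speak the OpenAI API can be pointed at a local server just by swapping the base URL. A sketch (the URL and model name are placeholders):

    ```python
    from openai import OpenAI

    # Sketch: aim any OpenAI-compatible client at a local llama.cpp server
    # (vLLM, TabbyAPI, etc. work the same way) instead of the cloud.
    client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

    reply = client.chat.completions.create(
        model="local-model",  # llama.cpp serves whatever model it loaded
        messages=[{"role": "user", "content": "Hello from the middle layer!"}],
    )
    print(reply.choices[0].message.content)
    ```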

  • Both can be true.

    It can be true that the FDA was corrupted/captured to some extent and needs more skeptical, less industry-friendly leadership. At the same time, blanket skepticism of science is not the answer.

    This is my dilemma with MAGA. Many of the issues they tackle are spot on, even if people don't like to hear that. They're often right about the problem, even when the proposed solutions are wrong and damaging. I think about this a lot when I hear RFK speak: nodding my head at the first assertion, then grinding my teeth as he goes on.

  • The budget for marketing has doubled the cost of the entire previous game. Does anyone need ads for GTA6? Wouldn’t just having the devs do livestreams of them playing the game and discussing the tech involved with making GTA6 not create enough hype? Does there even need to be additional hype created?

    There is a bit of an "arms race," where other games/entertainment could steal GTA's engagement. Eyeball time is finite, and to quote a paper, "attention is all you need."

    You aren't wrong though. Spending so much seems insane when "guerrilla marketing" for such a famous IP would go a long way. I guess part of it is "the line must go up" mentality, where sales must increase dramatically next quarter even if that costs a boatload of marketing to achieve.

  • Late to the post, but look into SGLang, OP!

    In a nutshell, it’s a framework for letting LLMs “fill in blanks” instead of generating entire replies, so you could script in rules as part of the responses as structure for it to grab onto. It’s all locally runnable (with the right hardware, unfortunately).
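
    A rough sketch of what that looks like with SGLang's frontend (the decorator, gen, and select primitives are real; the endpoint, prompt text, and variable names are made up):

    ```python
    import sglang as sgl

    # Sketch: the model only fills the gen()/select() "blanks" inside a fixed
    # template, so your rules text stays verbatim in every response.
    @sgl.function
    def dm_turn(s, scene, rule):
        s += "Rule in effect: " + rule + "\n"
        s += "Scene: " + scene + "\n"
        s += "Does the action succeed? " + sgl.select("verdict", choices=["Yes", "No"]) + "\n"
        s += "Narration: " + sgl.gen("narration", max_tokens=128, stop="\n")

    sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))
    state = dm_turn.run(scene="The rogue picks the lock.", rule="DC 15 Dexterity check.")
    print(state["verdict"], state["narration"])
    ```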

    Also, there are some newer, less sycophantic DM specific models. I can look around if you want.