I have tried them, and to be honest I was not surprised. The hosted service was better at longer code snippets, and in particular I found it was consistently better at producing valid chains of thought (I've found that a lot of simpler models, including the distills, tend to produce shallow reasoning chains, even when they get the answer right).
I'm aware of how these models work; I work in this field and have been developing a benchmark for reasoning capabilities in LLMs. The distills are certainly still technically impressive and it's nice that they exist, but the gap between them and the hosted version is unfortunately nontrivial.
It might be trivial to a tech-savvy audience, but considering how popular ChatGPT itself is and considering DeepSeek's ranking on the Play and iOS App Stores, I'd honestly guess most people are using DeepSeek's servers. Plus, you'd be surprised how many people naturally trust the service more after hearing that the company open sourced the models. Accordingly I don't think it's unreasonable for Proton to focus on the service rather than the local models here.
I'd also note that people who want the highest quality responses aren't using a local model, as anything you can run locally is a distilled version that is significantly smaller (at a small but non-trivial overall performance cost).
TBF you almost certainly can't run R1 itself. The model is way too big and compute intensive for a typical system. You can only run the distilled versions which are definitely a bit worse in performance.
Lots of people (if not most) are using the service hosted by DeepSeek themselves, as evidenced by DeepSeek's ranking on both the iOS App Store and Google Play.
Part of this was an optimization that was necessary due to their resource restrictions. Chinese firms can only purchase H800 GPUs instead of H200 or H100. These have much slower inter-GPU communication (less than half the bandwidth!) as a result of export bans by the US government, so this optimization was done to try and alleviate some of that bottleneck. It's unclear to me if this type of optimization would make as big of a difference for a lab using H100s/H200s; my guess is that it probably matters less.
I think the thing that Jensen is getting at is that CUDA is merely a set of APIs. Other hardware manufacturers can re-implement the CUDA APIs if they really wanted to (especially since AFAIK, Google v Oracle ruled that APIs cannot be copyrighted). In fact, AMD's HIP implements many of the same APIs as CUDA, and they ship a tool (HIPIFY) to convert code written for CUDA for HIP instead.
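To make the "CUDA is just a set of APIs" point concrete: the source-to-source part of porting is largely mechanical renaming. Here's a toy Python sketch of what a HIPIFY-style pass does (the real tool is clang-based and covers far more than string substitution):

```python
# A few real CUDA -> HIP renamings that HIPIFY performs; the actual
# tool covers hundreds of APIs and parses the source properly.
RENAMES = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cuda_runtime.h": "hip/hip_runtime.h",
}

def toy_hipify(source: str) -> str:
    """Naive textual CUDA -> HIP translation (illustration only)."""
    for cuda_name, hip_name in RENAMES.items():
        source = source.replace(cuda_name, hip_name)
    return source

cuda_src = "#include <cuda_runtime.h>\ncudaMalloc(&ptr, n); cudaFree(ptr);"
print(toy_hipify(cuda_src))
```

The fact that a trivial renamer gets you this far is exactly why the API layer itself isn't the moat.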
Of course, this does not guarantee that code originally written for CUDA is going to perform well on other accelerators, since it likely was implemented with NVIDIA's compute model in mind.
What I'm curious to see is how well these types of modifications scale with compute. DeepSeek is restricted to H800s instead of H100s or H200s. These are gimped cards to get around export controls, and accordingly they have lower memory bandwidth (~2 vs ~3 TB/s) and, most notably, much slower GPU to GPU communication (something like 400 GB/s vs 900 GB/s). The specific reason they used PTX in this application was to help alleviate some of the bottlenecks due to the limited inter-GPU bandwidth, so I wonder if that would still improve performance on H100 and H200 GPUs where bandwidth is much higher.
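To put rough numbers on the interconnect gap (a back-of-envelope sketch using the ~400 vs ~900 GB/s figures above; the gradient size and GPU count are made up for illustration):

```python
def ring_allreduce_seconds(param_bytes: float, n_gpus: int,
                           link_gbps: float) -> float:
    """Ideal ring all-reduce time: each GPU sends/receives
    2*(n-1)/n times the buffer size over the interconnect."""
    traffic = 2 * (n_gpus - 1) / n_gpus * param_bytes
    return traffic / (link_gbps * 1e9)

grad_bytes = 10e9  # pretend 10 GB of gradients per step (made-up size)
for name, bw in [("H800 (~400 GB/s)", 400), ("H100 (~900 GB/s)", 900)]:
    t = ring_allreduce_seconds(grad_bytes, n_gpus=8, link_gbps=bw)
    print(f"{name}: {t * 1e3:.1f} ms per all-reduce")
```

Even in this idealized model the H800 spends over twice as long per collective, which is the bottleneck the PTX-level scheduling was trying to hide.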
IIRC Zluda does support compiling PTX. My understanding is that this is part of why Intel and AMD eventually didn't want to support it - it's not a great idea to tie yourself to someone else's architecture that you have no control over or license to.
OTOH, CUDA itself is just a set of APIs and their implementations on NVIDIA GPUs. Other companies can re-implement them. AMD has already done this with HIP.
My stance on Proton is my stance on GrapheneOS: just because the creator is bad doesn’t mean the software is bad. As long as the software is better than the alternatives, I see no reason to stop using it.
I think the major difference is that for a software package or operating system like GrapheneOS, theoretically people can audit the code and verify that it is secure (of course in practice this is not something that 99% of people will ever do). So to some extent, you technically don't have to put a ton of trust into the GrapheneOS devs, especially with features like reproducible builds allowing you to verify that the software you're running is the same software as the repository.
For something like Proton where you're using a service someone else is running, you sort of have to trust the provider by default. You can't guarantee that they're not leaking information about you, since there's no way for you to tell what their servers are doing with your data. Accordingly, to some extent, if you don't trust the team behind the service, it isn't unreasonable to start doubting the service.
Huh. Everything I'm reading seems to imply it's more like a DSP ASIC than an FPGA (even down to the fact that it's a VLIW processor) but maybe that's wrong.
I'm curious what kind of work you do that's led you to this conclusion about FPGAs. I'm guessing you specifically use FPGAs for this task in your work? I'd love to hear which ops you specifically find speedups in. I can imagine many exist; otherwise there wouldn't be a need for features like tensor cores and transformer acceleration on the latest NVIDIA GPUs, since those features must exploit some inefficiency in GPGPU architectures (up to the limits of memory bandwidth, of course). But I also wonder how much benefit you can actually get, since in practice a lot of workloads end up limited by memory bandwidth, and unless you have a gigantic FPGA I imagine that will be an issue there as well.
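The memory-bandwidth argument is just the standard roofline model; a quick sketch with illustrative numbers (roughly A100-class fp16 peaks, not any FPGA's):

```python
def bound(flops: float, bytes_moved: float,
          peak_flops: float, peak_bw: float) -> str:
    """Roofline: an op is memory-bound when its arithmetic intensity
    (FLOPs per byte) falls below the hardware's FLOPs-per-byte ratio."""
    intensity = flops / bytes_moved
    ridge = peak_flops / peak_bw
    return "compute-bound" if intensity >= ridge else "memory-bound"

# Illustrative peaks only (ballpark A100 fp16 tensor / HBM numbers).
PEAK_FLOPS = 312e12   # ~312 TFLOP/s
PEAK_BW = 2e12        # ~2 TB/s

# Elementwise add of two fp16 vectors: 1 FLOP per element, 6 bytes moved.
print(bound(flops=1, bytes_moved=6,
            peak_flops=PEAK_FLOPS, peak_bw=PEAK_BW))
# Large fp16 matmul: intensity grows with matrix size, e.g. N=4096 tiles.
print(bound(flops=2 * 4096**3, bytes_moved=6 * 4096**2,
            peak_flops=PEAK_FLOPS, peak_bw=PEAK_BW))
```

The point being: a smarter compute fabric only helps on the compute-bound ops; the memory-bound ones are pinned to whatever DRAM bandwidth the FPGA board has.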
I haven't seriously touched FPGAs in a while, but I work in ML research (namely CV) and I don't know anyone on the research side bothering with FPGAs. Even dedicated accelerators are still mostly niche products because in practice, the software suite needed to run them takes a lot more time to configure. For us on the academic side, you're usually looking at experiments that take at most a few days to run. If you're now spending an extra day or two writing RTL instead of just slapping together a few lines of python that implicitly call CUDA kernels, you're not really benefiting from the potential speedup of FPGAs. On the other hand, I know accelerators are handy for production environments (and in general they're more popular for inference than training).
I suspect it's much easier to find someone who can write quality CUDA or PTX than someone who can write quality RTL, especially with CS being much more popular than ECE nowadays. At a minimum, the whole FPGA skillset seems much less common among my peers. Maybe it'll be more crucial in the future (which will definitely be interesting!) but it's not something I've seen yet.
Good point! For my use case (on a different brand, Sony) I'm fine with the lowered resolution since I just use it for video conferencing, in which case the raw resolution is limited anyway. But for users who need higher resolution, an HDMI capture card might be the better option as a one-time fee rather than a subscription.
Is mechanical shutter necessary for max bit depth on your camera? It isn't on mine (Sony), but bit depth reduces to 12 bit if you max out the framerate. You might still be able to get full 14 bit RAWs if you drop the framerate.
You can do this on Linux using gphoto2, ffmpeg, and v4l2loopback. You probably won't get full resolution but the quality will still be good enough for video conferencing. See here for a guide.
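Here's a minimal sketch of that pipeline in Python (the device node is an assumption — v4l2loopback may create a different /dev/videoN on your system, and you'd load the module first with `sudo modprobe v4l2loopback`):

```python
import shlex
import subprocess

def webcam_pipeline(loopback_dev: str = "/dev/video0"):
    """Build the gphoto2 -> ffmpeg -> v4l2loopback command pair.
    Shell equivalent: gphoto2 --stdout --capture-movie |
      ffmpeg -i - -vcodec rawvideo -pix_fmt yuv420p -f v4l2 <dev>"""
    capture = shlex.split("gphoto2 --stdout --capture-movie")
    encode = shlex.split(
        f"ffmpeg -i - -vcodec rawvideo -pix_fmt yuv420p -f v4l2 {loopback_dev}")
    return capture, encode

def run(loopback_dev: str = "/dev/video0"):
    """Actually wire the two commands together (needs the camera attached)."""
    capture, encode = webcam_pipeline(loopback_dev)
    cam = subprocess.Popen(capture, stdout=subprocess.PIPE)
    subprocess.run(encode, stdin=cam.stdout)

if __name__ == "__main__":
    cap, enc = webcam_pipeline()
    print(" ".join(cap), "|", " ".join(enc))
```

gphoto2 streams the camera's live view to stdout, ffmpeg decodes it to raw video, and v4l2loopback exposes that as a virtual webcam any conferencing app can pick up.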
Not that unusual IMO, lots of people start their PhD directly after completing their Bachelor's. If they weren't born in the first half of the year, then they'll have completed their BS by 21 and start the PhD at either 21 or 22.
But regulators could at least force NVIDIA to open their CUDA library and allow translation layers like ZLUDA.
I don't believe there's anything stopping AMD from re-implementing the CUDA APIs; in fact, I'm pretty sure this is exactly what HIP is for, even though it's not 100% automatic. AMD probably can't link against CUDA libraries like cuDNN and cuBLAS, but I don't know that doing so would be useful anyway, since I'm fairly certain those libraries have GPU-specific optimizations. AMD makes its own replacements for them anyway.
IMO, the biggest annoyance with ROCm is that the consumer GPU support is very poor. On CUDA you can use any reasonably modern NVIDIA GPU and it will "just work." This means if you're a student, you have a reasonable chance of experimenting with compute libraries or even GPU programming if you have an NVIDIA card, but less so if you have an AMD card.
I work in CV and I have to agree that AMD is kind of OK-ish at best there. The core DL libraries like torch will play nice with ROCm, but you don't have to look far to find third party libraries explicitly designed around CUDA or NVIDIA hardware in general. Some examples are the super popular OpenMMLab/mmcv framework, tiny-cuda-nn and nerfstudio for NeRFs, and Gaussian splatting. You could probably get these to work on ROCm with HIP but it's a lot more of a hassle than configuring them on CUDA.
I've tried Overture, Creality, and Inland (all black though, not transparent) and Overture printed the best for me (at least for functional parts where I cared about print quality and tolerances). Inland's PETG+ and High Speed PETG were even better though.
Not quite the same thing, but modern high-end cameras use CFexpress (as in CompactFlash). The cards communicate over PCIe using the same protocol as NVMe drives, but with fewer lanes and in a smaller form factor. The tricky part is that at that size, you don't have room to cram as many flash chips onto a card as on a 2280 NVMe drive.
I don't know the architecture of the AI accelerator in Ryzen processors, but I do know a fair number of image deblurring and denoising tools run on the Neural Engine on Apple Silicon. The Neural Engine is good enough for a lot of tasks, provided that your model only uses relatively simple operators and doesn't need full precision.
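On the precision point: these accelerators typically run fp16 or int8 rather than fp32 (I'm not claiming anything specific about the Ryzen part here). You can see what half precision costs with just the Python stdlib, since `struct` supports the IEEE-754 half format:

```python
import struct

def to_fp16(x: float) -> float:
    """Round-trip a Python float through IEEE-754 half precision."""
    return struct.unpack("e", struct.pack("e", x))[0]

# fp16 keeps only ~3 decimal digits: fine for many vision models,
# a problem for models that genuinely need fp32.
print(to_fp16(0.1))            # close to 0.1, but not exact
print(to_fp16(2049.0))         # → 2048.0: integers past 2048 lose exactness
```

That rounding is usually invisible in a denoiser's output but can break models that rely on accumulating small values precisely.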