I use Firefox, and I mostly like it, but it still doesn't support Chromium-style tab groups (no, that one extension is not similar), and its WebGPU implementation also doesn't work on most websites more than a year after Google made their version available by default
I think they would be good books if he took the whole plot and compressed it into 3, maybe 5 books. It’s just too long: too many pointless tangents, too many random characters to remember who may or may not reappear at some point in the next 10 books… and as soon as you get to an interesting part, it switches perspectives to the most boring events imaginable.
you don't need shift + right click to do either of those things on YouTube. You can always right click on a thumbnail and get the normal menu, and if you right click twice on a video you get the normal menu
Yes, but 200 GB is probably already with 4-bit quantization; the weights in FP16 would be more like 800 GB
IDK if it's even possible to quantize further, and even if it is, you're probably better off going with a smaller model anyway
PCIe will probably be the bottleneck way before the number of GPUs is, if you're planning on storing the model in RAM. Probably better to get a high-end server CPU.
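To see why PCIe dominates in that setup, here's a back-of-the-envelope sketch (the ~32 GB/s figure is the theoretical PCIe 4.0 x16 bandwidth, and the 4-bit 405B model size is an assumption based on the numbers discussed below; real throughput depends on batching and how much of the model stays resident in VRAM):

```python
# Rough upper bound on tokens/sec when every generated token has to stream
# all the weights from RAM to the GPUs over PCIe (no weight caching in VRAM).
def tokens_per_sec(model_gb: float, link_gb_s: float) -> float:
    # one full pass over the weights per generated token
    return link_gb_s / model_gb

pcie4_x16_gb_s = 32.0              # theoretical PCIe 4.0 x16 bandwidth, GB/s
model_4bit_gb = 405 * 0.5          # 405B params at 4 bits (0.5 bytes) per weight

print(tokens_per_sec(model_4bit_gb, pcie4_x16_gb_s))  # well under 1 token/sec
```

So even with unlimited GPUs, weight streaming over a single PCIe 4.0 x16 link caps you at a fraction of a token per second, which is why high memory bandwidth on the CPU side matters more than GPU count here.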
I don't have access to Llama 3.1 405B, but I can see that Llama 3 70B takes up ~145 GB, so 405B would probably take ~840 GB just to download the uncompressed FP16 (16 bits/weight) model. With 8-bit quantization it would probably take closer to 420 GB, and with 4-bit it would probably take closer to 210 GB. 4-bit quantization is really going to start harming the model outputs, and it's still probably not going to fit in your RAM, let alone VRAM.
So yes, it is a crazy model. You'd probably need at least 3 or 4 A100s to have a good experience with it.
OK, I guess it's just kinda similar to dynamic overclocking/underclocking with a dedicated NPU. I don't really see why a tiny $2 microcontroller, or just the CPU, can't accomplish the same task though.
there are some local GenAI music models, although I don't know how good they are yet as I haven't tried any myself (Stable Audio is one, but I'm sure there are others)
also, minor linguistic nitpick, but the "LM" in LLM stands for 'language model' (you could maybe get away with it for PixArt and SD3, since they use T5, which is an LLM, for prompt encoding; I'm sure some audio models with lyrics use them too). The term you're looking for is probably 'generative'