
Posts: 3 · Comments: 253 · Joined: 2 yr. ago

  • Haha, well then! Though we may be kindred souls, I don't think I'll be looking to hang out and bond over our shared appreciation of... this kind of thing.

  • This, but not ironically. Okay, that might be too strong a word, but it's very effective for thorough cleaning and you get used to it pretty quickly.

  • How about second hand clothing that makes climate change less likely? We can call that "climate change" clothing also.

  • It’s also sad that this is what the fediverse thinks right wing is.

    Taking the US as an example: if you look at how often conservative/Republican politicians oppose hateful people like Trump, you'll find it almost never happens. They stood together and supported him.

    The problem actually isn't that someone like Trump can get into office or some harmful and bigoted policy can appear for consideration. The issue is that it gets supported, it gets tolerated.

    At least in the US-y sense of left/right, I think this makes it fair to group those people together. I'll stop doing that when there's an apparent distinction.

  • Conservative makes no sense. It’s not changing for the sake of not changing.

    I agree with this part completely.

    Liberalism is the same, it’s change for change’s sake.

    Where is this coming from though?

    Liberalism is a political and moral philosophy based on the rights of the individual, liberty, consent of the governed, political equality, right to private property and equality before the law. (https://en.wikipedia.org/wiki/Liberalism)

    It doesn't have anything to do with randomly changing stuff just for change's sake.

    To be fair, we can look at the Wikipedia definition for conservatism too and see if there's a more charitable way to interpret it:

    Conservatism is a cultural, social, and political philosophy that seeks to promote and to preserve traditional institutions, practices, and values. (https://en.wikipedia.org/wiki/Conservatism)

    I'd say the answer is basically no: this is just an indirect way to say "appeal to tradition fallacy".

  • Many people do this now; I find it super disappointing that an overt display of patriotism is now considered by many to be a sign of the MAGA crowd.

    I feel like there's some kind of correlation between the two things. Obviously it's not guaranteed that someone displaying patriotism is going to be a bigot, but... it's based on "my place is better" and "go my team" just because the person happened to be born in a specific geographical location. Most of the time, patriotism isn't based on rationally assessing anything: you're born into it, like religion, and there's really no critical thinking in the mix.

  • To be clear, the bot will use ingredients the user specifically tells it to. It's not coming up with human flesh on its own.

  • The article says the desalination plant designed by this student uses 17% of the power a normal desalination plant does, meaning a 5+x reduction in energy consumption.

    "With nine square meters, it consumes only 17% of energy compared to traditional desalination plants."

    Comparing based on size doesn't seem too useful. How many square meters is a "traditional desalination plant"? How much salty water can it purify into drinkable water given a certain amount of energy compared to the student's design?

    I hope it's an improvement over existing designs, but unfortunately this article doesn't have any actual content. It's clickbait that hopes people will jump to conclusions like "it's a 5x reduction in energy compared to the traditional approach" because that drives traffic.

  • Only personal at present, but I'd happily turn it into a profession if someone wanted to pay me to post "dramatic" comments on Lemmy.

    Any takers?

  • If grandma asks for a recipe using the ingredients ammonia, bleach and water then maybe if she ends up offing herself it wasn't an accident.

    Maybe the bot isn't too useful, but acting surprised or horrified that giving it a list of crazy ingredients produces a recipe using those crazy ingredients is kind of weird. This article is basically clickbait.

  • Because I don’t live in fantasy land where prepared food costs are exactly the same as raw food costs?

    Obviously they're not the same. Either your time is so valuable that it's clearly better to pay someone else to prepare stuff (which appears to be your position), or it's not. The equation doesn't change whether we're talking about 10 meals or 1 meal. You don't seem to realize the inconsistency in your position.

    You don't save "hundreds of dollars" by preparing one meal yourself, you might save a couple dollars at the expense of your time. Roasting some coffee is a roughly equivalent amount of effort to preparing one meal yourself and you probably save about the same amount of money. So if your time is so valuable that roasting coffee would be a ridiculous waste of your super valuable times, then if you were consistent this would also apply to meal prep.

    Yes, I agree with you that your entire argument doesn’t make sense.

    The "I know you are, but what am I?" turnaround seems a bit immature, don't you think?

  • Ah, I see. Wouldn't it be pretty easy to determine if MPS is actually the issue by trying to run the model with the non-MPS PyTorch version? Since it's a 7B model, CPU inference should be reasonably fast. If you still get the memory leak, then you'll know it's not MPS at fault.
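
    If it helps, a rough sketch of that check (the model id and prompt are placeholders, and I'm assuming the usual transformers loading path):

    ```python
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "repo-id-or-path-of-the-7B-model"  # placeholder

    tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float32,   # plain CPU dtype, MPS not involved
        trust_remote_code=True,      # only needed if the model ships custom code
    ).to("cpu")                      # force CPU so MPS is ruled out entirely

    inputs = tok("Hello, world", return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=32)
    print(tok.decode(out[0]))
    ```

    If memory still climbs while that runs, the leak isn't coming from MPS.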

  • You can find the remote code in the huggingface repo.

    Ahh, interesting.

    I mean, it's published by a fairly reputable organization so the chances of a problem are fairly low but I'm not sure there's any guarantee that the compiled Python in the pickle matches the source files there. I wrote my own pickle interpreter a while back and it's an insane file format. I think it would be nearly impossible to verify something like that. Loading a pickle file with the safety stuff disabled is basically the same as running a .pyc file: it can do anything a Python script can.
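
    Just to illustrate what I mean, here's a tiny self-contained example (nothing to do with that particular repo) of how loading a pickle can run arbitrary code:

    ```python
    import pickle
    import subprocess

    class Innocent:
        # __reduce__ tells pickle how to "reconstruct" the object; a malicious
        # file can make reconstruction mean "run this command".
        def __reduce__(self):
            return (subprocess.run, (["echo", "this could have been anything"],))

    blob = pickle.dumps(Innocent())
    pickle.loads(blob)  # merely loading the bytes executes the command
    ```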

    So I think my caution still applies.

    It could also be PyTorch or one of the huggingface libraries, since mps support is still very beta.

    From their description here: https://github.com/QwenLM/Qwen-7B/blob/main/tech_memo.md#model

    It doesn't seem like anything super crazy is going on. I doubt the issue would be in Transformers or PyTorch.

    I'm not completely sure what you mean by "MPS".

  • I have also tried to generate code using deterministic sampling (always pick the token with the highest probability). I didn’t notice any appreciable improvement.

    Well, you said you sometimes did that, so it's not entirely clear which of your conclusions are based on deterministic sampling and which aren't. Anyway, like I said, it's not just temperature that may be causing issues.

    I want to be clear I'm not criticizing you personally or anything like that. I'm not trying to catch you out and you don't have to justify anything about your decisions or approach to me. The only thing I'm trying to do here is provide information that might help you and potentially other people get better results or understand why the results with a certain approach may be better or worse.

  • Another one that made a good impression on me is Qwen-7B-Chat

    Bit off-topic, but if I'm looking at this correctly, it uses a custom architecture which requires turning on trust_remote_code, and the code that would be embedded into the models and trusted is not included in the repo. In fact, there's no real code in the repo: it's just a bit of boilerplate to run inference and tests. If so, that's kind of spooky, and I suggest being careful not to run inference on those models outside of a locked-down environment like a container.
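
    For reference, this is roughly what enabling that looks like (I'm assuming the Qwen/Qwen-7B-Chat repo id here; check the actual model card):

    ```python
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # trust_remote_code=True makes transformers download and execute the repo's
    # own Python modelling/tokenizer code -- effectively running an arbitrary
    # script from that repo on your machine.
    tok = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
    ```

    So if you do try it, something like a container or VM with nothing sensitive mounted seems like reasonable hygiene.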

  • For sampling I normally use the llama-cpp-python defaults

    Most default settings have the temperature around 0.8-0.9, which is likely way too high for code generation. Default settings also frequently include stuff like a repetition penalty. Imagine the LLM is trying to generate Python: it has to produce a bunch of spaces before every line, but something like a repetition penalty can severely reduce the probability of the tokens it basically must select for the result to be valid. With code, there's often very little leeway for choosing what to write.
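
    As a sketch, explicitly overriding those defaults looks something like this (parameter names are llama-cpp-python's as I remember them, so double-check against your installed version; the model path is a placeholder):

    ```python
    from llama_cpp import Llama

    llm = Llama(model_path="path/to/model.bin")  # placeholder path

    out = llm(
        "Write a Python function that reverses a string.",
        max_tokens=256,
        temperature=0.1,     # near-deterministic, much better for code than ~0.8
        top_p=0.95,
        repeat_penalty=1.0,  # 1.0 effectively disables the repetition penalty
    )
    print(out["choices"][0]["text"])
    ```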

    So you said:

    I’m aware of how sampling and prompt format affect models.

    But judging the model by what it outputs with the default settings (I checked, and llama-cpp-python's defaults have both a pretty high temperature and a repetition penalty enabled) kind of contradicts that.

    By the way, you might also want to look into the grammar sampling stuff that recently got added to llama.cpp. This can force the model to generate tokens that conform to some grammar, which is pretty useful for code and other output that has to follow a specific structure. You should still look carefully at the other settings to make sure they suit the type of result you want to generate, though: the defaults are not suitable for every use case.
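
    A rough sketch of how the grammar feature is exposed through llama-cpp-python (API as I remember it; the grammar below is a deliberately trivial example that only allows "yes" or "no"):

    ```python
    from llama_cpp import Llama, LlamaGrammar

    llm = Llama(model_path="path/to/model.bin")  # placeholder path

    # Trivial GBNF grammar: the model can only ever produce "yes" or "no".
    grammar = LlamaGrammar.from_string('root ::= "yes" | "no"')

    out = llm(
        "Is Python a programming language? Answer yes or no.",
        grammar=grammar,
        max_tokens=8,
        temperature=0.2,
    )
    print(out["choices"][0]["text"])
    ```

    For real code generation you'd point it at one of the bigger grammars that ship with llama.cpp (there's a JSON one, for example) rather than a toy rule like this.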

  • But I can’t accept such an immediately-noticeable decline in real-world performance (model literally craps itself) compared to previous models while simultaneously bragging about how outstanding the benchmark performance is.

    Your criticisms are at least partially true, and benchmarks like "x% of ChatGPT" should be looked at with extreme skepticism. In my experience as well, parameter count is extremely important. Actually, it's very important even in the benchmarks: if you look at the sites that collect results you'll see, for example, that there are no 33B models with an MMLU score in the 70s.

    However, I wonder if all of the criticism is entirely fair. Just for example, I believe MMLU is 5-shot and ARC is 10-shot. That means there are a bunch of examples of that type of question, along with the correct answers, before the one the LLM has to answer. If you're just asking it a question directly, that's zero-shot: it has to get it right the first time, without any examples of correct question/answer pairs. A high MMLU score doesn't necessarily translate directly to zero-shot performance, so your expectations might not be in line with reality.
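
    To make the n-shot thing concrete, here's a toy illustration of the difference (made-up questions, not the actual benchmark prompts):

    ```python
    # Zero-shot: the model only sees the question it has to answer.
    zero_shot = "Q: What is the capital of France?\nA:"

    # Few-shot (benchmark style): solved examples come first, then the real question.
    examples = [
        ("What is 2 + 2?", "4"),
        ("How many legs does a spider have?", "8"),
        ("What is the chemical symbol for water?", "H2O"),
    ]
    few_shot = "".join(f"Q: {q}\nA: {a}\n\n" for q, a in examples) + zero_shot
    print(few_shot)
    ```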

    Also, different models have different prompt formats. For these fine-tuned models, it won't necessarily just say "ERROR" if you use the wrong prompt format, but the results can be a lot worse. Are you making sure you're using exactly the prompt format that was used when benchmarking?

    Finally, sampling settings can make a really big difference too. A relatively high temperature setting can be good when generating creative output, but not when generating source code. Stuff like repetition and frequency/presence penalties can be good in some situations, but maybe not when generating source code. The wrong sampler settings can force a random token to be picked even when it isn't valid for the language, or ban/reduce the probability of tokens that are necessary to produce valid output.

    You may or may not already know, but LLMs don't produce any specific answer after evaluation. You get back an array of probabilities, one for every token ID the model understands (~32,000 for LLaMA models). So sampling can be extremely important.
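
    A toy sketch of that last step, with made-up numbers (the probabilities come from pushing the model's raw scores through a softmax):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    logits = rng.normal(size=32000)  # one raw score per token ID, ~32k for LLaMA

    def sample(logits, temperature):
        if temperature == 0:
            return int(np.argmax(logits))  # greedy: always pick the top token
        scaled = (logits - logits.max()) / temperature
        probs = np.exp(scaled) / np.exp(scaled).sum()  # softmax
        return int(rng.choice(len(probs), p=probs))

    print(sample(logits, 0.0))  # deterministic
    print(sample(logits, 0.8))  # higher temperature flattens the distribution
    ```

    Repetition penalties, top-p/top-k and so on all operate on that same array before the final pick, which is why they can make or break structured output.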

  • And others don’t feel guilty for eating meat.

    Carrots are incapable of feeling anything: they can't be affected in a morally relevant way. Animals have emotions, preferences, can experience suffering and can be deprived of positive/pleasurable experiences in their lives.

    Thank you for recognizing that people have different feelings.

    Obviously this isn't a sufficient justification for harming others. "I don't care about people with dark skin, please recognize that different people have different feelings." The fact that I don't care about the individuals I'm victimizing doesn't mean victimizing them is okay.

  • You’re comparing a need (food) to not even a want,

    This makes no sense. We're talking about preparing it yourself vs buying it. In either case, you get the item so there's no "this need doesn't get satisfied" possibility.

    You don't need to roast your own coffee, just as you don't need to prepare your own meals: instead of spending the time doing those things personally, you could buy them. So if your position is "my time is so valuable that I'd rather pay someone else to do the work", why does that only apply to roasting coffee and not to preparing meals?

  • It is only a matter of time before we’re running 40B+ parameters at home (casually).

    I guess that's kind of my problem. :) With 64GB RAM you can run 40, 65, 70B parameter quantized models pretty casually. It's not super fast, but I don't really have a specific "use case" so something like 600ms/token is acceptable. That being the case, how do I get excited about a 7B or 13B? It would have to be doing something really special that even bigger models can't.

    I assume they'll also be working on a Vicuna-70B 1.5 based on LLaMA 2, so I'll definitely try that one out when it's released, assuming it performs well.