What is wrong with LLM benchmarks, and why are we still using them?
micheal65536 (@micheal65536@lemmy.micheal65536.duckdns.org) · Posts: 1 · Comments: 28 · Joined: 2 yr. ago
I have also tried to generate code using deterministic sampling (always pick the token with the highest probability). I didn't notice any appreciable improvement.
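
For anyone unsure what "deterministic sampling" means here, this is a rough sketch of the difference between always taking the highest-probability token (often called greedy decoding) and ordinary temperature sampling. The logits and the function names `pick_greedy` and `pick_sampled` are made up for illustration; they are not from any particular model or library.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_greedy(logits):
    """Deterministic choice: index of the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])


def pick_sampled(logits, temperature=1.0, rng=random):
    """Stochastic choice: draw a token index from the softmax distribution."""
    probs = softmax([x / temperature for x in logits])
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical next-token scores for a 4-token vocabulary.
logits = [2.0, 1.0, 0.5, -1.0]

greedy_runs = [pick_greedy(logits) for _ in range(5)]
sampled_runs = [pick_sampled(logits, temperature=1.0) for _ in range(5)]

print("greedy: ", greedy_runs)   # same index every time
print("sampled:", sampled_runs)  # can vary from run to run
```

Greedy decoding makes the output repeatable, but it only changes which token gets picked from the same distribution, so it cannot make the model's top guess any better than it already is.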