Apple study exposes deep cracks in LLMs’ “reasoning” capabilities
Regrettable_incident (@Regrettable_incident@lemmy.world) | Posts: 4 | Comments: 423 | Joined 1 yr. ago

Interesting... I'd say Gemma 2B wasn't actually wrong, it just didn't answer the question you asked! I wonder if they have this problem with other letters. Maybe it's something to do with how we say 'w' as 'double-you'... But maybe not, because they seem to be underestimating rather than overestimating. But yeah, I guess the fuckers just can't count. You'd think a question using the phrase 'How many...' would be a giveaway that they need to count something rather than rely on their knowledge base.
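
For contrast, here's roughly what "actually counting" looks like when you do it character by character instead of recalling an answer. This is just a minimal Python sketch: the function name `count_letter` and the example word are my own picks, not anything from the study or the parent comment.

```python
def count_letter(text: str, letter: str) -> int:
    """Count how often a single letter appears in text, ignoring case."""
    if len(letter) != 1:
        raise ValueError("letter must be exactly one character")
    target = letter.lower()
    # Walk the string one character at a time and tally the matches.
    return sum(1 for ch in text.lower() if ch == target)


# Hypothetical example word, chosen only for illustration.
print(count_letter("willow", "w"))
```

The point being that the loop sees every character individually, so a 'w' can't get miscounted no matter how it's pronounced.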