How well can LLMs solve chess puzzles?

github.com

GitHub - kagisearch/llm-chess-puzzles: Benchmark LLM reasoning capability by solving chess puzzles.

Each LLM is given the same 1000 chess puzzles to solve. See puzzles.csv. Benchmarked on Mar 25, 2024.

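| Model | Solved | Solved % | Illegal Moves | Illegal Moves % | Adjusted Elo |
| --- | --- | --- | --- | --- | --- |
| gpt-4-turbo-preview | 229 | 22.9% | 163 | 16.3% | 1144 |
| gpt-4 | 195 | 19.5% | 183 | 18.3% | 1047 |
| claude-3-opus-20240229 | 72 | 7.2% | 464 | 46.4% | 521 |
| claude-3-haiku-20240307 | 38 | 3.8% | 590 | 59.0% | 363 |
| claude-3-sonnet-20240229 | 23 | 2.3% | 663 | 66.3% | 286 |
| gpt-3.5-turbo | 23 | 2.3% | 683 | 68.3% | 269 |
| claude-instant-1.2 | 10 | 1.0% | 707 | 70.7% | 245 |
| mistral-large-latest | 4 | 0.4% | 813 | 81.3% | 149 |
| mixtral-8x7b | 9 | 0.9% | 832 | 83.2% | 136 |
| gemini-1.5-pro-latest* | FAIL | - | - | - | - |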
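
As a quick sanity check on the table, the two percentage columns appear to be the raw counts divided by the 1000 puzzles. The sketch below recomputes them for the top two rows. The names `TOTAL_PUZZLES`, `rate`, and `counts` are illustrative and not taken from the repository, and the Adjusted Elo column is left out because its formula is not given here.

```python
# Assumption: each percentage is simply count / 1000 puzzles.
TOTAL_PUZZLES = 1000  # every model gets the same 1000 puzzles


def rate(count, total=TOTAL_PUZZLES):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# (solved, illegal moves) counts copied from the table above.
counts = {
    "gpt-4-turbo-preview": (229, 163),
    "gpt-4": (195, 183),
}

# Map each model to (solved %, illegal moves %).
rates = {model: (rate(s), rate(i)) for model, (s, i) in counts.items()}

for model, (solved_pct, illegal_pct) in rates.items():
    print(f"{model}: solved {solved_pct}%, illegal moves {illegal_pct}%")
```

The same function can be applied to any other row to check that its count and percentage agree.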

Published by the CEO of Kagi!

22 comments