
How well can LLMs solve chess puzzles?

GitHub - kagisearch/llm-chess-puzzles: Benchmark LLM reasoning capability by solving chess puzzles.

Each LLM is given the same 1000 chess puzzles to solve. See puzzles.csv. Benchmarked on Mar 25, 2024.

Model	Solved	Solved %	Illegal Moves	Illegal Moves %	Adjusted Elo
gpt-4-turbo-preview	229	22.9%	163	16.3%	1144
gpt-4	195	19.5%	183	18.3%	1047
claude-3-opus-20240229	72	7.2%	464	46.4%	521
claude-3-haiku-20240307	38	3.8%	590	59.0%	363
claude-3-sonnet-20240229	23	2.3%	663	66.3%	286
gpt-3.5-turbo	23	2.3%	683	68.3%	269
claude-instant-1.2	10	1.0%	707	70.7%	245
mistral-large-latest	4	0.4%	813	81.3%	149
mixtral-8x7b	9	0.9%	832	83.2%	136
gemini-1.5-pro-latest*	FAIL	-	-	-	-
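
The two percentage columns appear to be the raw counts divided by the 1000 puzzles each model received. A minimal Python sketch of that arithmetic follows, with the counts copied from the table. The names `results` and `pct` are illustrative and not taken from the benchmark repository. The Adjusted Elo column is not recomputed, because the post does not describe how it is derived.

```python
# Counts copied from the table above: model -> (solved, illegal_moves).
# Assumption: every model was scored on the same 1000 puzzles, as the post states.
TOTAL_PUZZLES = 1000

results = {
    "gpt-4-turbo-preview": (229, 163),
    "gpt-4": (195, 183),
    "claude-3-opus-20240229": (72, 464),
    "claude-3-haiku-20240307": (38, 590),
    "claude-3-sonnet-20240229": (23, 663),
    "gpt-3.5-turbo": (23, 683),
    "claude-instant-1.2": (10, 707),
    "mistral-large-latest": (4, 813),
    "mixtral-8x7b": (9, 832),
}


def pct(count, total=TOTAL_PUZZLES):
    """Format count as a percentage of total with one decimal place."""
    return f"{100 * count / total:.1f}%"


# Print each model with its solved and illegal-move percentages.
for model, (solved, illegal) in results.items():
    print(f"{model}\t{pct(solved)}\t{pct(illegal)}")
```

Running this over the counts is a quick way to check each row's percentages against the figures in the table.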

Published by the CEO of Kagi!