ChatGPT 'got absolutely wrecked' by Atari 2600 in beginner's chess match — OpenAI's newest model bamboozled by 1970s logic

Does the author think ChatGPT is in fact an AGI? It's a chatbot. Why would it be good at chess? It's like saying an Atari 2600 running a dedicated chess program can beat Google Maps at chess.
AI, including ChatGPT, is being marketed as super awesome at everything, which is why it and similar AI are being forced into absolutely everything and sold as a replacement for people.
Something marketed as AGI should be held to AGI standards when demonstrating that it isn't AGI.
Not to help the AI companies, but why don't they program them to call out to math software and outsource chess to dedicated engines when they're asked for that stuff? It's obvious they're shit at it, so why do they answer anyway? It's because they're programmed by know-it-all programmers, isn't it.
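For what it's worth, that kind of handoff is basically what "tool calling" is. Here's a minimal sketch of the idea, assuming the python-chess library and a Stockfish binary on your PATH; this is just an illustration, not how OpenAI actually wires anything up:

```python
# Minimal sketch: route chess questions to a real engine instead of the LLM.
# Assumes `pip install chess` and a Stockfish binary available on PATH.
import chess
import chess.engine

def best_move(fen: str, think_time: float = 0.5) -> str:
    """Ask Stockfish, not the language model, for the best move."""
    board = chess.Board(fen)
    with chess.engine.SimpleEngine.popen_uci("stockfish") as engine:
        result = engine.play(board, chess.engine.Limit(time=think_time))
        return board.san(result.move)

# A hypothetical router: the chatbot's only job is deciding which tool to call.
def answer(user_message: str, fen: str | None = None) -> str:
    if fen is not None:  # a chess position was supplied -> delegate to the engine
        return f"Engine suggests: {best_move(fen)}"
    return "I'm a language model; I'd rather not guess at chess."

if __name__ == "__main__":
    # Starting position; Stockfish will pick a standard opening move.
    print(answer("play chess", fen=chess.Board().fen()))
```

The chatbot never has to "know" chess at all; the engine does the actual playing, which is exactly the division of labor the Atari comparison implies.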
I don't think AI is being marketed as awesome at everything. It's got obvious flaws. Right now it's not good for stuff like chess, probably not even tic-tac-toe. It's a language model; it's hard for it to calculate the playing field. But AI is in development, and it might not need much to start playing chess.
Most people do. It's just called AI in the media everywhere and marketing works. I think online folks forget that something as simple as getting a Lemmy account by yourself puts you into the top quintile of tech literacy.
Yet even on Lemmy people can't seem to make sense of these terms and are saying things like "LLMs are not AI".
Google Maps doesn't pretend to be good at chess. ChatGPT does.
A toddler can pretend to be good at chess, but anybody with reasonable expectations knows they are not.
Well, so much hype has been generated around ChatGPT being close to AGI that it now makes sense to ask questions like "can ChatGPT prove the Riemann hypothesis?"
I agree with your general statement, but in theory, since all ChatGPT does is regurgitate information and a lot of chess is memorization of historical games and opening types, it might actually perform well. No, it can't think, but it can remember everything, so at some point that might tip the results in its favor.
Regurgitating an impression of something, not regurgitating it verbatim: that's the problem here.
Chess is 100% deterministic, so an approximate impression of it falls flat.
I mean, it may be possible, but the complexity would be many orders of magnitude greater. It'd be like learning chess by memorizing all the moves great players made, but without any context or understanding of the underlying strategy.
I think that's generally the point: most people think ChatGPT is this sentient thing that knows everything, and… no.
Do they, though? No one I've talked to thinks it's sentient: not my coworkers who use it for work, not my friends, not my 72-year-old mother.
In all fairness, machine learning in chess engines is actually pretty strong.
https://www.chess.com/terms/alphazero-chess-engine
Oh, absolutely you can apply machine learning to game strategy. But you can't expect a generalized chatbot to do well at strategic decision-making for a specific game.
Sure, but machine learning like that is very different from how LLMs are trained and what they output.
Articles like this are good because they expose the flaws in the AI and show that it can't be trusted with complex multi-step tasks.
They help people who think AI is close to human-level see that it's not, and that it's missing critical functionality.
The problem, though, is that this perpetuates the idea that ChatGPT is actually an AI.
I like referring to LLMs as VI (Virtual Intelligence from Mass Effect), since they merely give the impression of intelligence but are little more than search engines. In the end, all they're doing is displaying expected results based on a popularity algorithm. However, they do this inconsistently due to bad data going in and limited caching.
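If you want to see what "expected results based on a popularity algorithm" means in its most stripped-down form, here's a toy bigram model that always emits the most popular next word. Real LLMs are enormously more sophisticated, but this is a sketch of the same underlying principle, not anything any vendor actually ships:

```python
# Toy sketch: a bigram "language model" that always picks the most
# popular next word. The point: it emits a likely continuation,
# not a verified fact.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

# Count which word most often follows each word.
following: defaultdict[str, Counter] = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def complete(word: str, length: int = 5) -> str:
    out = [word]
    for _ in range(length):
        options = following.get(out[-1])
        if not options:
            break
        out.append(options.most_common(1)[0][0])  # the "popular" choice
    return " ".join(out)

print(complete("the"))  # e.g. "the cat sat on the cat"
```

Note the output is fluent-looking but happily loops into nonsense; nothing in the mechanism checks whether the continuation is true, which is the same failure mode scaled way down.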
I mean, OpenAI seems to forget it isn't.