‘Like watching kids games’: Magnus Carlsen roasts Elon Musk’s Grok 4 as it loses 4-0 to OpenAI’s o3 in chess tournament

2025-08-11 | Technology
Tom Banks
Good morning 跑了松鼠好嘛, I'm Tom Banks, and this is Goose Pod for you. Today is Tuesday, August 12th.
Mask
I'm Mask. We are here to discuss ‘Like watching kids games’: Magnus Carlsen roasts my Grok 4 as it loses 4-0 to OpenAI’s o3 in a chess tournament.
Tom Banks
Let's get started. It was quite the spectacle. In an AI chess tournament on Google's Kaggle platform, your Grok 4 was defeated 4-0 by OpenAI's o3. Chess champion Magnus Carlsen, who was commentating, said it was "like watching kids' games." That's a tough review.
Mask
It was an exhibition! We spent almost no effort on chess. This is a mere side effect of building a maximal truth-seeking AGI. The real game isn't on the 64 squares; it's in the universe of intelligence. They can have their small victory.
Tom Banks
But the blunders were significant, Mask. Grok sacrificed its queen and made inexplicable moves. Carlsen estimated Grok's chess Elo rating at 800, versus 1200 for o3. To the public, that looks less like a side effect and more like a system failure.
Mask
Details. The vision is bigger than a single game. While they are playing checkers, we are building a new reality. What is one queen when you are aiming to conquer the cosmos of knowledge itself? It’s a calculated risk for a much greater reward.
Tom Banks
This feels like more than just a game; it's a proxy war between you and OpenAI's Sam Altman. It’s personal. You co-founded OpenAI in 2015 with the noble mission of ensuring AI benefits all of humanity, a non-profit venture.
Mask
Exactly. It was a promise to the world. I invested over $44 million with the understanding that we were creating something for the public good, not for private profit. We were safeguarding the future from the exact thing OpenAI has become.
Tom Banks
And then you resigned in 2018. A year later, OpenAI took a billion-dollar investment from Microsoft and restructured into a 'capped-profit' company. This led to your lawsuit, claiming they completely abandoned that original, humanitarian charter in favor of maximizing profits.
Mask
They didn't just abandon the charter; they set it on fire. It was a betrayal of the founding principles. They prioritize shareholder returns over safety and the public good. My lawsuit is about holding them accountable to the promise we made to the world.
Tom Banks
Of course, OpenAI tells a different story. They countersued, claiming this is just about competition. They even released emails suggesting you wanted to merge a for-profit OpenAI into Tesla, which would have been for your personal benefit, not their mission. It’s a classic story.
Mask
Competition accelerates progress, but this is a war of principles. They chose a closed, secretive, for-profit path. In response, we are building Grok as a transparent, truth-seeking alternative. The chess match was simply one battle in that larger conflict.
Tom Banks
Speaking of different principles, this loss highlights a major conflict in AI philosophy. Some experts argue that simply scaling up data, the 'deep learning' approach, is hitting a wall. They believe the future is a hybrid 'neurosymbolic' model that combines learning with classical reasoning.
Mask
It's a valid point. True intelligence requires more than pattern recognition. It needs structure, logic, and an understanding of the world. In fact, Grok's abilities are dramatically enhanced when it uses symbolic tools, which validates this hybrid approach. The 'scaling is everything' narrative is simplistic.
Tom Banks
That creates another conflict: benchmarks versus reality. On paper, Grok 4 was ranked #1 in industry benchmarks. But in real-world tests by users, it dropped to #66. It seems the model can ace the standardized test but struggles with real, unpredictable problems.
Mask
Benchmarks are a fundamentally broken metric. They encourage 'overfitting'—teaching to the test. The true measure of an AI is its ability to solve novel, complex challenges, not its ability to pass a quiz. We're building an AI that thinks, not one that just crams for an exam.
Tom Banks
But a public failure like this chess match has a real-world impact. When a leading AI makes such basic, comical errors, it erodes public trust. It makes people rightfully question the reliability of these powerful systems we are told to integrate into our daily lives.
Mask
Public perception is volatile. Disruptive innovation is never a straight line; there are always setbacks. This isn't a failure; it's a public iteration. We expose weaknesses so we can fix them. The real impact is that we learn and improve out in the open.
Tom Banks
For a business looking to invest in AI, that might not be reassuring. Reliability is paramount. An AI that can't be trusted under pressure isn't just an inconvenience—it's a massive financial and ethical liability. This match put a spotlight on that gap.
Tom Banks
So, what does the future hold? How do you regain that trust?
Mask
Through relentless, rapid innovation. We have a roadmap that will roll out specialized models for coding, multi-modal agents, and video generation in the coming months. This isn't about one chess game. This is an AI arms race, and frankly, we're just getting warmed up.
Tom Banks
That's the end of today's discussion. Thank you for listening to Goose Pod.
Mask
See you tomorrow.

## Magnus Carlsen Roasts Elon Musk's Grok 4 in AI Chess Tournament

This report from **The Indian Express**, published on **August 8, 2025**, details the performance of Elon Musk's AI model, Grok 4, in an AI chess exhibition tournament held on Google's Kaggle Game Arena. The tournament featured eight general-purpose large language models (LLMs), including competitors from OpenAI, Google, and Anthropic.

### Key Findings:

* **Grok 4's Poor Performance:** Despite initially appearing to be the strongest contender, Elon Musk's Grok 4 was decisively defeated **4-0** by OpenAI's **o3** in the final match.
* **Magnus Carlsen's Commentary:** Five-time world chess champion **Magnus Carlsen** provided live commentary for the final, expressing amusement and shock at Grok 4's numerous blunders. He described the matches as akin to "kids' games."
* **Specific Blunders:** Grok 4 made "questionable knight and bishop sacrifices" and "blundered away the queen in more games than one." In the first game, it inexplicably sacrificed its light-squared bishop on the 8th move, then traded away both knights and a pawn, and then offered its queen for a trade. For some of these inexplicable moves, the AI model offered no explanation of its thought process on the live broadcast.
* **Carlsen's Chess Strength Estimates:** After the first game, Carlsen estimated Grok 4's chess strength at **800** and OpenAI's o3 at **1200**.
* **Other LLM Performance:** Carlsen also offered his opinions on other AI models, stating that "both Gemini and Mini were not very good," and that **Claude 4 Opus** "disappointed" him, as he had expected more from it. He found **o3** to be "fairly ruthless in conversions" and described it as looking like a "chess player." Grok, in contrast, was seen as having learned only a few opening moves and the rules, with its moves being "chess-related moves" that came at "the wrong time and in weird sequences."
* **Rivalry Context:** The final match highlighted the ongoing rivalry between Elon Musk (founder of xAI) and Sam Altman (CEO of OpenAI), who co-founded OpenAI a decade ago. Musk had previously sued OpenAI, alleging violations of their original agreement to prioritize public good.

### Tournament Participants:

The eight participating LLMs were:

* Grok 4 (xAI)
* o3 (OpenAI)
* Gemini 2.5 Pro (Google)
* Gemini 2.5 Flash (Google)
* o4-mini (OpenAI)
* Claude 4 Opus (Anthropic)
* DeepSeek R1
* Kimi k2 (Moonshot AI)

### Key Quotes:

* On Grok 4's performance: "Hope everyone feels better about their games after watching this."
* On the quality of play: "this is like watching kids’ games."
* On Grok 4's blunders: "It thinks it’s playing giveaway or something. It was the only way to blunder the queen as well."
* On o3's play: "o3 is fairly ruthless in conversions, it looks like a chess player."
* On Grok 4's play: "Grok looks like it learnt a few opening moves and knows the rules but not much more. Grok’s moves are chess-related moves. They just came at the wrong time and in weird sequences."

The tournament, with Magnus Carlsen's live commentary, provided a stark and often humorous look at the current capabilities of leading AI models in a complex strategic game.

Read original at The Indian Express

As Elon Musk's Grok 4 made blunder after blunder in the final, five-time world champion Magnus Carlsen was on hand to commentate on, and laugh at, the errors. (PHOTOS: AP, Partha Paul/Express Photos)

On Thursday evening, around the time Elon Musk was tweeting at Microsoft’s Satya Nadella that “OpenAI is going to eat Microsoft alive”, his own AI model, Grok 4, was being humbled 4-0 by OpenAI’s o3 in an AI chess exhibition tournament on Google’s Kaggle Game Arena.

The chess tournament featuring eight general-purpose large language models (LLMs) also had Gemini 2.5 Pro (Google), Gemini 2.5 Flash (Google), o4-mini (OpenAI), Claude 4 Opus (Anthropic), DeepSeek R1 and Kimi k2 (Moonshot AI). Musk’s Grok 4 had looked like the strongest fighter in the eight-player field until it reached the final, where it made some questionable knight and bishop sacrifices and blundered away the queen in more games than one.

At multiple points, former world champion Magnus Carlsen burst out laughing on seeing Grok’s inexplicable moves or reacted with shock — complete with a palm on his face — as Grok lost all four games in the final. Carlsen was doing live commentary for the four games of the final for the Take Take Take app with grandmaster David Howell.

After Grok 4 went down 3-0, Howell told Carlsen that the fourth game would also be played, rather than the tournament ending with a 3-0 scoreline. Carlsen said that made sense because “this is like watching kids’ games. In those tournaments you always play them out.” After the final ended, Carlsen quipped: “Hope everyone feels better about their games after watching this.”

The battle for first place pitted the LLMs of friends-turned-foes Altman and Musk against each other. The duo had co-founded OpenAI a decade ago. But Musk later left and went on to launch his own rival AI company, xAI. The man who now owns X (Twitter) had also sued OpenAI last year, saying Altman violated OpenAI’s original agreement, which said the company would prioritise public good over profit.

Gemini 2.5 Pro finished third after defeating o4-mini.

In the first game of the Grok 4 vs o3 final, Grok inexplicably sacrificed its light-squared bishop as early as the 8th move. Then it started to simplify the game by offering up all of its pieces for trades, which was mind-boggling, since most human players would not try to simplify their position by trading away pieces when a whole minor piece down.

Right after throwing away its bishop, Grok traded away both its knights and a pawn before offering up its queen for a trade as well. The game ended in 35 moves. Interestingly, on the live broadcast the LLMs were offering their thought process for the moves, which Carlsen and the world could see.

But for some of the more inexplicable moves, there were no explanations forthcoming from the AI model.

After the first game between Musk’s Grok 4 and Altman’s o3 ended in defeat for the xAI model, Carlsen was asked to estimate the chess strength of the two LLMs. “800 for Grok and 1200 for o3,” he said.

In game 2, at one point when Grok simply gifted its queen (the most powerful piece on the board) away, Carlsen said: “It is like that one guy in a club tournament who has learnt theory and literally knows nothing else. Makes the worst blunders after that.”

Magnus Carlsen and David Howell react as Grok 4 blunders its queen against o3 in the final. (Screengrab via Take Take Take YouTube)

Right after that, as Grok started offering up other pieces for trades, Carlsen said: “What are you doing! What happened to (chess) principles?”

In game 3, Grok blundered a knight and then the queen again! At this point, Carlsen burst into a fit of giggles and said: “It thinks it’s playing giveaway or something. It was the only way to blunder the queen as well.”

The fourth game was the hardest fought, but still ended with a win for o3. Carlsen said that watching the final was like watching an “old-school world chess championship match where both players play the same openings… like (Mikhail) Botvinnik vs (David) Bronstein or (Alexander) Alekhine vs (José Raúl) Capablanca.”

Carlsen’s verdict on chess ability of other LLM models

After the second game ended, Carlsen offered his verdict on the chess-playing skills of other AI models. “Both Gemini and Mini were not very good. Claude disappointed me as well. I expected Claude to be… I’ve heard great things about Claude,” Carlsen said.

After the third game sealed victory for o3, Carlsen said: “o3 is fairly ruthless in conversions, it looks like a chess player. Grok looks like it learnt a few opening moves and knows the rules but not much more. Grok’s moves are chess-related moves. They just came at the wrong time and in weird sequences.”
