Why Anthropic’s Claude still hasn’t beaten Pokémon

One of the biggest things preventing the current version of Claude from getting better, Hershey said, is that “when it derives that good strategy, I don’t think it necessarily has the self-awareness to know that one strategy [it] came up with is better than another.” And that’s not a trivial problem to solve.

Still, Hershey said he sees “low-hanging fruit” for improving Claude’s Pokémon play by improving the model’s understanding of Game Boy screenshots. “I think there’s a chance it could beat the game if it had a perfect sense of what’s on the screen,” Hershey said, adding that such a model would probably perform “a little bit short of human.”

Expanding the context window for future Claude models will also probably allow those models to “reason over longer time frames and handle things more coherently over a long period of time,” Hershey said. Future models will improve by getting “a little bit better at remembering, keeping track of a coherent set of what it needs to try to make progress,” he added.

Twitch chat responds with a flood of bouncing emojis as Claude concludes an epic 78+ hour escape from Pokémon’s Mt. Moon.


Credit: Claude Plays Pokemon / Twitch

Whatever you think about impending improvements in AI models, though, Claude’s current performance at Pokémon doesn’t make it seem like it’s poised to usher in an explosion of human-level, completely generalizable artificial intelligence. And Hershey allows that watching Claude 3.7 Sonnet get stuck on Mt. Moon for 80 hours or so can make it “seem like a model that doesn’t know what it’s doing.”

But Hershey is still impressed by the way that Claude’s new reasoning model will occasionally show some glimmer of awareness and “kind of tell that it doesn’t know what it’s doing and know that it needs to be doing something different. And the difference between ‘can’t do it at all’ and ‘can kind of do it’ is a pretty big one for these AI things for me,” he continued. “You know, when something can kind of do something it typically means we’re pretty close to getting it to be able to do something really, really well.”


